The leading edge of Big Data in most enterprises is search: how do business analysts, sales and marketing folk, product managers, and others find what they need, when what they need is not in a database? In a multi-channel enterprise, where everyone is constantly exchanging not just emails but IMs and documents, and saving business-critical information in places like wikis and blogs as well as in databases, finding what you need, when you need it, is getting harder. Having to remember to look in all those different places, and then to rely on the disparate search capabilities of each, puts the seeker at a disadvantage.
Enterprise search tools do a few critical things in the era of big data. Chief among them, they extend search beyond databases to semi-structured content:
- office documents, especially if in XML format (e.g. from word processors and spreadsheets)
- plain text files following any standard format such as CSV
- blog entries, social media postings, IMs, etc.
and even, in some cases, to truly unstructured data (audio, video).
In seeking an enterprise search solution, IT needs to look at several factors. First and foremost, will the tool integrate search across all the channels you already care about?
Secondly, if all channels are covered, look at the ease of use of the search function: Can searches be easily narrowed? Are found items weighted, and can the weighting scheme be adjusted to match your business realities? In a generic tool, blog postings may all be weighted equally in the results regardless of who wrote them, for example, but if you have subject-matter experts (SMEs) it is useful to be able to create a taxonomy describing them and have the search engine weight their postings higher in the subjects they are expert in.
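To make the SME weighting idea concrete, here is a minimal sketch of how a results list might be re-ranked against an expert taxonomy. It is an illustration only, not any vendor's mechanism: the `Hit` record, the `SME_TAXONOMY` mapping, the author names, and the `SME_BOOST` factor are all hypothetical, and a real product would expose its own configuration for this.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One search result (hypothetical shape, for illustration)."""
    author: str
    topic: str
    base_score: float  # relevance score from the generic engine


# Hypothetical taxonomy: which authors are experts on which subjects.
SME_TAXONOMY = {
    "alice": {"pricing"},
    "bob": {"logistics"},
}

# Assumed multiplier applied when an SME writes on their own subject.
SME_BOOST = 1.5


def weighted_score(hit, taxonomy=SME_TAXONOMY, boost=SME_BOOST):
    """Boost the base score only if the author is an SME on the hit's topic."""
    if hit.topic in taxonomy.get(hit.author, set()):
        return hit.base_score * boost
    return hit.base_score


def rerank(hits):
    """Return hits ordered by weighted score, highest first."""
    return sorted(hits, key=weighted_score, reverse=True)


hits = [
    Hit("carol", "pricing", 0.8),    # not an SME: score unchanged
    Hit("alice", "pricing", 0.6),    # SME on pricing: boosted
    Hit("alice", "logistics", 0.7),  # SME, but not on this topic
]
ranked = rerank(hits)
```

In this sketch alice's pricing post moves ahead of carol's despite a lower base score, while her logistics post gets no boost because the taxonomy does not list her as an expert there.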
Thirdly, look carefully at the infrastructure required: how many dedicated virtual servers, how much space for indices? If you press vendors for the typical index size expressed as a percentage of the space consumed by the material indexed, you can make realistic judgments about the impact on your data centers of adding enterprise search. Note, though, that with the data center already groaning under the weight of all the data flooding in, adding what it takes to make that data truly useful is likely to be part of the new status quo in the big-data center.
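The index-size arithmetic described above can be sketched in a few lines. The figures used here (a 2,000 GB corpus, a 30% index ratio, two index copies) are invented for illustration and are not typical values from any vendor; substitute the ratio your vendor quotes and your own corpus size.

```python
def estimate_index_gb(corpus_gb, index_ratio_pct):
    """Index size implied by a vendor-quoted ratio (index as % of source data)."""
    return corpus_gb * index_ratio_pct / 100.0


def estimate_total_gb(corpus_gb, index_ratio_pct, copies=1):
    """Total index storage if the index is kept in several copies."""
    return estimate_index_gb(corpus_gb, index_ratio_pct) * copies


# Hypothetical inputs: 2,000 GB of indexed material, 30% index ratio.
single_copy = estimate_index_gb(2000, 30)
# Same corpus with two copies of the index (e.g. one per data center).
two_copies = estimate_total_gb(2000, 30, copies=2)

print(f"one copy: {single_copy:.0f} GB, two copies: {two_copies:.0f} GB")
```

Running the same calculation against projected corpus growth gives a rough view of how the storage demand for search will track the data it indexes.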