Unstructured data is still growing; search technology alone is not the solution

Today I received an e-mail from Computerworld about IBM Watson and “How to Put Watson to Work for Powerful Information and Insights”. It’s a white paper with information on IBM Watson Explorer (IBM has rebranded all of its products related to search and text analytics with the prefix “Watson”).

One of the first things mentioned in the white paper is “Unstructured data has been growing at 80%, year-over-year” (Yeah, yeah… the magical 80% again).
Mmm… where have I heard that before? Oh yeah, every vendor of enterprise search solutions has been propagating it for the last decade or more.

The mantra of search vendors is “no matter how much information/data/documents you have, our solution can find the most relevant information”.

I’ve been a “search consultant” for years now, and I know that getting the most relevant information on top is very difficult. The more information a company has, the harder it gets.
Just adding files from file systems, SharePoint, e-mail etc. to the search environment will not make all information findable. Sure, when you are looking for a specific document and you know some very specific features of it, you can probably retrieve it. But when you are looking for information on a certain topic, you will be buried in all kinds of irrelevant, non-authoritative content.

So, to get the right information to your employees or customers, you have to take measures.

  1. Know the processes that need information from the search engine.
  2. Identify “important”/“authoritative” sources and content.
    Boost that kind of content.
  3. Make it clear to users what they can expect to find: which sources are indexed?
    Where can they get help?
  4. Clean, clean, clean.
    Yep… Get rid of information that does not matter at all.

I could mention many more points, but this post is not intended to be a complete guide :).

One remark: using a search engine to simply index as much content from as many sources as you can find in your company can be very, very useful. It gives an information manager insight into what content is available in which content system, and what its quality is. That can be a first step in improving information governance: no hearsay, but pure data and analytics.

Do not trust vendors who say that the problem is not the volume of information/content, but the product you are currently using for search.
The foundation for good findability is good information governance: delete outdated, irrelevant and non-authoritative content, and take care to structure the content that matters well!
