Queries are getting longer. What’s the impact?

Recently I’ve been working on a project for a Dutch financial company. It concerns the search functionality of there website. The business case is clear: support self service and getting answers to common questions to take the load (and costs) of the call center.

Of course we are taking search log analysis VERY seriously because there is much to be learned from them.

Some statistics: 400.000+ user queries per month, 108.00+ “unique” queries, Top 5000 queries cover only 7% of the total queries. The long tail is big.
So focussing on the top queries will only cover 7.500 of the 108.000 queries.
68% of the queries have 3 or less terms. When we remove the “stopwords” the queries with 3 terms or less are 78%.

We did some relevancy testing (manually, so very time consuming) and we know that the queries with 2 or 3 terms perform quite good.
The analysis of a part of the long tail helps us identify stopwords and synonyms. So far… so good.

These numbers made me more curious. I want to know what the trend is on the number of terms used in formulating queries. Are people “talking” in a more natural way to search engines? (See: Longer Search Queries Are Becoming the Norm: What It Means for SEO) . I am trying to find more resources on this, so let me know if you know about them.

Why is this important?

A lot of search engines work “keyword based” when trying to find relevant results. They look if the keywords appear in a document and if so, it becomes relevant. When combining those keywords with an “AND”, the more terms you use, the less results you will find. If there are a lot of “meaningless” terms in the query, the chance that you will find what you are looking for becomes less and less. Stopwords can help out here, but one cannot cover all variants.
OK, you say, “Why don’t you combine the terms with an ‘OR’?”.  Indeed that will bring back more possible relevant documents, but with the engine we use (Google Search Appliance), the relevancy is poor.
The issue here is referred to with the concepts “Precision” and “Recall” (see: Wikipedia “Precision and Recall“).

When coping with longer queries – in natural language – the search engine needs to be smarter. The user’s intent has to be determinated so that the essence of the search request is revealed. That essence can then be used to find relevant documents/information in unstructured content.
Instead of (manually) feeding the search engine with stopwords, synonyms etc., the search engine needs to be able to figure this out by itself.

Now I know that the “search engine” is something that ignorant users (sorry for the insult) see as one “thing”. We as search consultants know that there is a lot going on in the total solution (normalization, stemming, query rewriting etc.) and that a lot depends very much on the content, but still…. the “end solution” needs to be able to cope with the large queries.

Bottom line is that search solutions need to be able to handle short queries (a few terms) as well as complete “questions” if the end user is using more and more terms.
What current products support that? We talked to a couple of companies that say that they support “natural language processing”. A lot of times this comes down to analyzing questions that are asked to the call center and creating FAQ’s that match the questions so that a search will come up with the FAQ. Although effective, that’s not completely the idea. This demands a lot of manual actions, while the search has to be “magic” and work on the existing content without changes.

My customer is now looking at IBM Watson to solve their long term plans. They want to make search more “conversational” and support the queries on the website as well as a “virtual assistant” that acts like a chat.

Will search become more conversational? Will users type in their queries as normal questions? How will search vendors react to that?

Replacing a search appliance with… a search appliance?

With the news on the Google Search Appliance leaving the stage of (Enterprise) search solutions – of which there is still no record on the official Google for Work Blog – there are a couple of companies that are willing to fill the “gap”.

I think that a lot of people out there think that the appliance model is why companies choose for Google. I think that’s not the case.

A lot of people like Google when they use it to search the Internet. That’s why I hear a lot of “I want my enterprise search to be like Google!“. That’s pretty fair from a consumer perspective – every employee and employer are also consumers, right? We enterprise search consultants – and the search vendors – need to live up to the expectations. And we try to do so. We know that enterprise search is a different beast than web search, but still, it’s good having a company that sets the bar.

There are a few companies that deliver appliance models for search, namely Mindbreeze and Maxxcat. They are hopping on the flow and they do deliver very good search functionality with the appliance model.

But… wait! Why did those customers of Google choose the Google Search Appliance? Did they want “Google in a Box”? I don’t think so. They wanted “Google-like search experience”. The fact that it came in a yellow box was just “the way it was”. Now I know that the “Business” really liked it. It was kind of nifty, right? The fact was that in many cases IT was reluctant.

IT-infrastructure has been “virtualized” for years now. That hardware based solution does not fit into that. IT wants less dedicated servers to provide the functionality. They want every single server to be virtualized so that uptime/fail-over and performance can be monitored and tuned with the solution that are “in place”.

Bottom line? There are not many companies that choose for an appliance because it is an appliance. They choose a solution and take it for granted that it’s an appliance. IT is very reluctant towards this.

I’ve been (yes the past tense) a Google Search Appliance consultant for years. I see those boxes do great things. But for anything that could not be configured in the (HTML) admin interface, one has to go back to Google Support (which is/was great by the way!). There’s no way for a search team to analyse bugs or change to configuration on a deeper layer than the admin console.

So… If you own a Google Search Appliance, you have enough time to evaluate your search needs. Do this consciously. It may well be that there is a better solution out there, even open source nowadays.