Queries are getting longer. What’s the impact?

Recently I’ve been working on a project for a Dutch financial company. It concerns the search functionality of their website. The business case is clear: support self-service and help visitors answer common questions themselves, taking load (and cost) off the call center.

Of course we are taking search log analysis VERY seriously, because there is a lot to be learned from the logs.

Some statistics: 400.000+ user queries per month, 108.000+ “unique” queries, and the top 5000 queries cover only 7% of the total query volume. The long tail is big.
So focusing on the top queries only covers about 7.500 of the 108.000 unique queries.
68% of the queries have 3 or fewer terms. When we remove the stopwords, 78% of the queries have 3 terms or fewer.
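For those who want to run this kind of log analysis themselves, here is a minimal sketch of how such numbers can be derived (assuming a plain-text export with one query per line; the file name and the stopword list are placeholders, not our actual setup):

```python
from collections import Counter

# Hypothetical input: one query per line, plus an illustrative stopword list.
with open("queries.log", encoding="utf-8") as f:
    queries = [line.strip().lower() for line in f if line.strip()]

stopwords = {"de", "het", "een", "van", "the", "a", "of"}  # illustrative only

counts = Counter(queries)
total = len(queries)
unique = len(counts)

# Share of all traffic covered by the top 5000 distinct queries.
top_5000 = sum(n for _, n in counts.most_common(5000))
print(f"{total} queries, {unique} unique, top 5000 cover {top_5000 / total:.1%}")

# Distribution of query length, with and without stopwords.
short = sum(1 for q in queries if len(q.split()) <= 3)
short_no_stop = sum(
    1 for q in queries
    if len([t for t in q.split() if t not in stopwords]) <= 3
)
print(f"<= 3 terms: {short / total:.0%}, after stopword removal: {short_no_stop / total:.0%}")
```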

We did some relevance testing (manually, so very time-consuming) and we know that queries with 2 or 3 terms perform quite well.
Analyzing a part of the long tail helps us identify stopwords and synonyms. So far… so good.

These numbers made me curious. I want to know what the trend is in the number of terms people use when formulating queries. Are people “talking” to search engines in a more natural way? (See: Longer Search Queries Are Becoming the Norm: What It Means for SEO.) I am trying to find more resources on this, so let me know if you know of any.

Why is this important?

A lot of search engines work “keyword based” when trying to find relevant results: they check whether the keywords appear in a document and, if so, consider it relevant. When those keywords are combined with an “AND”, the more terms you use, the fewer results you will find. If the query contains a lot of “meaningless” terms, the chance that you will find what you are looking for becomes smaller and smaller. Stopword lists can help out here, but they cannot cover all variants.
OK, you say, “Why don’t you combine the terms with an OR?” Indeed, that will bring back more potentially relevant documents, but with the engine we use (the Google Search Appliance) the relevance of those results is poor.
The trade-off at play here is known as “precision” versus “recall” (see: Wikipedia, “Precision and recall”).
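To make that trade-off concrete, here is a toy sketch (the documents, query and relevance judgments are invented; this is not how the GSA works internally). With AND, the engine returns few but mostly relevant documents; with OR, it returns almost everything and precision collapses:

```python
# Toy illustration of AND vs OR matching and the precision/recall trade-off.
docs = {
    1: "how do I apply for a mortgage",
    2: "mortgage interest rates explained",
    3: "travel insurance for long holidays",
    4: "what do I need to apply for travel insurance",
}
relevant = {1, 2}  # documents judged relevant for the mortgage question

def matches(query, mode):
    """Return the doc ids whose text contains all (AND) or any (OR) query terms."""
    terms = query.lower().split()
    op = all if mode == "AND" else any
    return {d for d, text in docs.items() if op(t in text.lower().split() for t in terms)}

for mode in ("AND", "OR"):
    hits = matches("how do I apply for a mortgage", mode)
    precision = len(hits & relevant) / len(hits) if hits else 0.0
    recall = len(hits & relevant) / len(relevant)
    print(mode, sorted(hits), f"precision={precision:.2f}", f"recall={recall:.2f}")
```

On this tiny example the AND run returns only document 1 (high precision, half the relevant documents missed), while the OR run returns all four documents (full recall, but half of them irrelevant).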

When coping with longer queries – in natural language – the search engine needs to be smarter. The user’s intent has to be determined so that the essence of the search request is revealed. That essence can then be used to find relevant documents and information in unstructured content.
Instead of (manually) feeding the search engine stopwords, synonyms etc., the engine needs to be able to figure this out by itself.
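One common way to let an engine figure out the essence of a long query by itself is to weight the query terms by how informative they are in the indexed content, for example with inverse document frequency (IDF), and drop the low-weight terms. A minimal sketch of that idea, with invented document frequencies:

```python
import math

# Hypothetical document frequencies from an index of 100,000 documents.
doc_freq = {"how": 95000, "do": 92000, "i": 97000, "cancel": 800,
            "my": 90000, "travel": 4000, "insurance": 3500}
n_docs = 100_000

def essence(query, keep=3):
    """Keep the `keep` query terms with the highest IDF (the most informative ones)."""
    terms = query.lower().split()
    idf = {t: math.log(n_docs / (1 + doc_freq.get(t, 1))) for t in terms}
    return sorted(terms, key=lambda t: idf[t], reverse=True)[:keep]

print(essence("how do I cancel my travel insurance"))
# -> ['cancel', 'insurance', 'travel']: the filler words score near zero and drop out
```

The filler words fall away automatically, without anyone having to maintain a stopword list by hand.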

Now I know that the “search engine” is something that ignorant users (sorry for the insult) see as one “thing”. We as search consultants know that there is a lot going on in the total solution (normalization, stemming, query rewriting etc.) and that a lot depends on the content, but still… the end solution needs to be able to cope with these long queries.

The bottom line is that search solutions need to be able to handle short queries (a few terms) as well as complete “questions”, now that end users are using more and more terms.
Which current products support that? We talked to a couple of companies that say they support “natural language processing”. A lot of the time this comes down to analyzing the questions that are asked of the call center and creating FAQs that match those questions, so that a search will surface the matching FAQ. Although effective, that is not quite the idea: it demands a lot of manual work, whereas the search has to be “magic” and work on the existing content without changes.

My customer is now looking at IBM Watson for their long-term plans. They want to make search more “conversational” and support both the queries on the website and a “virtual assistant” that acts like a chat.

Will search become more conversational? Will users type in their queries as normal questions? How will search vendors react to that?

This entry was written by Edwin Stauthamer, posted on Friday, March 18, 2016 at 06:03 pm, filed under Kennis, Technologie.

One Response to “Queries are getting longer. What’s the impact?”

  • Martin White says:

    The stated business case is to handle common questions, and yet the top 5000 queries only cover 7% of the queries. Really? My immediate reaction is that the website IA is so poor that people are resorting to search in desperation! To me there is also the question of Known Object vs Exploratory. No one goes to a website for maximum recall. They need a few items that are relevant (hence precision is important) and have no requirement for every document on the site about loans. Take a look at http://www.bankwest.com.au for the way the search box is surrounded by ‘did you mean?’ prompts. All financial services organisations have the same problems because of the diversity of their product range, and most of the best use all the tricks of the trade to make sure that people are offered advice on search. http://www.westpac.com.au displays other queries that site visitors have used for (say) mortgage information. Jumping from GSA to Watson is not going to help. If search was this important why are they still using a GSA? Other issues could be that there is a significant amount of irrelevant content on the site or the metatagging is poor. Has the client compared their GSA search with just using Google to search the domain? That can often help in understanding where the problems lie. Hopefully a few ideas that could be helpful.

