It’s been pretty quiet around here… Not that I haven’t published anything, though.
Last week I had my first encounter with a potential client that changed their policy on open source search because of a recent event.
They were in the middle of an RFI (request for information) to see which options could meet their enterprise search requirements, when Google announced the end-of-life of its flagship enterprise search product: the Google Search Appliance.
This led them to ask: “What if we choose a commercial or closed source product for our enterprise search solution and the vendor decides to discontinue it?”
The news from Google has gotten a lot of attention on the internet, through blog posts and tweets. Of course there are commercial vendors trying to step into this “gap” like Mindbreeze and SearchBlox.
I have seen this happen before, in the time of the “great enterprise search take-overs”. Remember HP and Autonomy, IBM and Vivisimo, Oracle and Endeca, Microsoft and FAST ESP?
At that time organizations also started wondering what would happen to their investments in these high-class, high-priced “pure search” solutions.
For this potential client, the GSA was on the list of possible solutions (especially because of the required connectors and the “document preview” feature). Now it’s gone.
Because of this, they started to embrace the strength of the open source alternatives, like Elasticsearch and Solr. It’s even becoming policy.
Surely open source will take some effort to get all the required functionality up and running, and they will need an implementation partner. But… they will own every piece of software that is developed for them.
I wonder if there are other examples out there of companies switching to open source search solutions, like Apache Solr, because of this kind of unexpected “turn” by a commercial / closed source vendor.
Has Google unwillingly set the enterprise search world on the path of open source search solutions like Apache Solr or Elasticsearch?
Last week I was invited to an “Expert meeting E-Discovery”. I’ve been in the search business for many years and I regularly encounter the concepts and practices of “E-Discovery” as well as “Enterprise search” (and e-commerce search, Search-Based Applications, etc.).
So I decided to get some information about what people think about the difference between Enterprise search and E-Discovery.
Definition of E-Discovery (Wikipedia):
Electronic discovery (also e-discovery or ediscovery) refers to discovery in litigation or government investigations which deals with the exchange of information in electronic format (often referred to as electronically stored information or ESI). These data are subject to local rules and agreed-upon processes, and are often reviewed for privilege and relevance before being turned over to opposing counsel.
Definition of Enterprise search (Wikipedia):
Enterprise search is the practice of making content from multiple enterprise-type sources, such as databases and intranets, searchable to a defined audience.
When you look at the definitions, the difference is in the goal. E-Discovery deals with legal matters, gathering evidence; Enterprise search is general purpose, gathering answers or information to be used in some business process.
But one can also see the similarities. Both deal with digital information, multiple sources and a defined audience.
What could be seen as different is that, according to these definitions, E-Discovery does not speak of a technical solution that indexes all (possibly relevant) information and makes it searchable. Enterprise search is much closer to a technical solution.
So… not many differences there, but I am beginning to have a hunch about why people could see them as different. My quest continues.
I found two articles that are pretty clear about the differences:
I think that the differences that are mentioned come from a conceptual view of E-Discovery vs. Enterprise search, not from a technical (solutions) point of view (and even on the conceptual point they are wrong). I also think that the authors of the articles compare the likes of the Google Search Appliance to specialized E-Discovery tools like ZyLab. They simply ignore the fact that there are many more solutions out there that do “Enterprise search” but are far more sophisticated than the Google Search Appliance.
Below I will get into the differences mentioned in those articles from a technical or solution point of view.
So when I look at the differences from my point of view (implementation and technical), I see three topics:
What none of the articles mention is the complexity of getting all the information out of all the systems that contain the content. Abstracting away from the possible tools for E-Discovery or Enterprise search, the ability to connect to many different content systems is probably the most essential thing. When you cannot get information out of a content system, the most sophisticated search tool will not help you.
Enterprise search vendors are well aware of that. That’s why they invest so heavily in developing connectors for many content systems. There is no “one ring to rule them all” here. If there are E-Discovery vendors that have connectors to get all the information out of all content systems, I would urge them to get into the Enterprise search business.
My conclusion is that there are a couple of products/solutions that can fulfill both Enterprise search needs and E-Discovery needs. Specifically I want to mention HPE IDOL (the former Autonomy suite) and Solr.
From a cost perspective, Solr (open source) can even be the best alternative to expensive E-Discovery tools. When combining Solr with solutions that build on top of it, like LucidWorks Fusion, there is even less to build on your own.
I am only talking about two specific Enterprise search products because I want to make a point. I know that there are a lot more Enterprise search vendors/solutions that can fulfill E-Discovery needs.
Reading the article on CMSWire, “Enterprise search is bringing me down” by Martin White, I also wonder why companies acknowledge that they have much informations (forgive me the term, but what can you make of the combination of “documents”, “databases”, “records”, “intranets”, “webpages”, “products”, “people cards” etc. … yep, “informations”) spread around that is valuable for everyone, yet do so little with it. There are plenty of solutions and products that can help them achieve the goal of re-using that informations and putting it to good use.
Still, most organizations focus on maintaining (storing, archiving) that informations in silos (CMS, DMS, file shares, e-mail, databases) and do not combine it to see what business value the combination of that informations can (and will) bring. It’s pretty simple: if the informations cannot be found, it’s useless. Why store it, maintain it? Just get rid of it!
But as humans… we do not like to delete informations. It’s like the workings of our brain. In our brain we keep on storing informations, and at some point we use that informations to make a decision, have a conversation, sing a song, or whatever we want to do with it, because we can retrieve it and make use of it!
Is it the cost?
Okay, building a search solution is not cheap. But if you can find a vendor/solution that can grow along the way, it doesn’t have to be expensive from day one. There are commercial vendors and open source solutions that can deliver what you want. Just know what you want (in the end) and then discuss this with the implementation partners or product vendors. Maybe a “one size fits all” can be the way to go. Maybe cooperation with an open source implementation partner can make it feasible in the long run?
But… always keep in mind the business case and the value that it can deliver for your business. You are already investing in the storage and maintenance of your informations right now, right? What are those costs? Why not make the informations usable? Remember that search solutions are also very good “cleaning agents”: they surface all informations and make it clear that something has to be done about deleting informations. I don’t even want to start on the gaps in information security that a good enterprise search system can surface…
Is it the complexity?
Whenever you want to take on a big project, at first it seems quite complex and you don’t even want to get started. It is no different from doing things in your own house. But once you have made a plan – do it room by room – you will see results in a short amount of time. And you are happy that you redecorated the room. That will give you energy to take on the next room, right? It is the same with a search solution. If you take it on source by source or target group by target group, you will see improvement. And that will give you positive feedback to start on another source or target group!
Is it lack of vision?
… Yes… It’s the lack of vision. Vision doesn’t start with building a “Deus ex machina” that will do everything you ever wanted. It starts with small steps that will make that vision come true. It’s about making employees and customers happy. That can be achieved by having a vision, having a big plan, taking small steps, and scaling fast.
Is the future of Enterprise search the creation of SBAs (Search-Based Applications) that optimize a couple of processes / business lines instead of optimizing the whole company? The Big Data movement is surely propagating that: they deliver solutions for analyzing big data sets that revolve around a specific business process or goal. But you don’t want lots of applications doing practically the same thing, right? That will cost you a lot of money. Well-designed SBAs work on the same set of informations while delivering value to many processes and target groups. The underlying infrastructure should be capable of serving many different business processes and information needs.
I still believe in the Google adage that all informations should be made accessible and usable for everyone, within and/or (you don’t want your internal informations out there for everyone to explore) outside of a company.
In my opinion everyone inside and outside a company should have a decent solution that gives access to the informations that are valuable and sometimes needed to perform their daily tasks. Google can do this on the internet, so why don’t you use the solutions at hand that can bring that to your customers and employees?
I won’t say it will be easy, but what has “easy” ever delivered? Surely not an edge over your competitors. Because then we would all be entrepreneurs with big revenues, right?
But as Martin wrote in his article, I sometimes get tired of explaining (and selling) the value of a good search solution to people who just don’t get it. Still, they use Google every day and take that value for granted… without realizing that they could have that for themselves, in their own company and for their customers.
Last week, one of my customers pointed me to an article on Search Engine Land titled “The rise of personal assistants and the death of the search box”.
Google’s Behshad Behzadi explains why he thinks that the convenient little search box that sits at the top right corner of nearly every page on the web will be replaced. The article was written by Eric Enge and, of course, reflects his interpretation.
“Google’s goal is to emulate the “Star Trek” computer, which allowed users to have conversations with the computer while accessing all of the world’s information at the same time.”
I think that’s a great goal, and these things could be happening in the not too distant future. Of course we all know Siri, Cortana and Google Now, so this is not so hard to imagine. Below is a timeline of the growth of Google.com:
At this time we are talking more and more to our computers. For most people it still feels weird, but “It’s becoming more and more acceptable to talk to a phone, even in groups.”
So… search applications are getting to know our voice and the way we speak is the way we search.
That demands a lot from search engines. They need to become more intelligent to be able to interpret the questions and match them with a vast amount of possible answers hidden in documents, knowledge bases, graphs, databases, etc.
Having found possible answers, the search application needs to present them in a meaningful way and get a dialog going to be sure that it has interpreted the question correctly.
This future got me wondering about “enterprise search”. All this exciting stuff is happening on the internet. Search behind the firewall is lagging behind. The vast information and development power that is available on the internet is not available in the enterprise.
An answer engine needs constant development: better language interpretation, more knowledge graphs (facts and figures) to drive connections, machine learning to process the queries, the clicks visitors perform, other user feedback, etc.
The question is whether on-premise enterprise search solutions can ever deliver the same experience as the solutions that run in the cloud. It’s impossible to come up with a product that installs on-premise and has the same rich features that Google delivers online. One could try, but then the question is whether the product can keep up with the improvements.
So with the “death of the search box”, will we also see “the death of the on-premise search solution”? Google is dropping support for its on-premise search solution, the Google Search Appliance, for a reason. The move to the cloud and personal assistants is driving that.
Recently I’ve been working on a project for a Dutch financial company. It concerns the search functionality of their website. The business case is clear: support self-service and getting answers to common questions, to take the load (and costs) off the call center.
Of course we are taking search log analysis VERY seriously because there is much to be learned from them.
Some statistics: 400,000+ user queries per month; 108,000+ “unique” queries; the top 5,000 queries cover only 7% of the total. The long tail is big.
So focusing on the top queries will only cover 7,500 of the 108,000 queries.
68% of the queries have three terms or fewer. When we remove the stopwords, the share of queries with three terms or fewer rises to 78%.
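Numbers like these are easy to reproduce from a raw query log. Here is a minimal Python sketch; the one-query-per-line log format and the tiny stopword list are assumptions for illustration:

```python
from collections import Counter

# Hypothetical stopword list; in practice this comes from log analysis.
STOPWORDS = {"the", "a", "of", "how", "do", "i"}

def coverage_stats(queries, top_n=5000, max_terms=3):
    """Compute three of the statistics mentioned above.

    queries: iterable of raw query strings (one per search).
    Returns (top-N coverage, share of queries with <= max_terms terms,
    the same share after stopword removal)."""
    counts = Counter(q.strip().lower() for q in queries)
    total = sum(counts.values())
    top_coverage = sum(c for _, c in counts.most_common(top_n)) / total

    short = short_no_stop = 0
    for q, c in counts.items():
        terms = q.split()
        if len(terms) <= max_terms:
            short += c
        if len([t for t in terms if t not in STOPWORDS]) <= max_terms:
            short_no_stop += c
    return top_coverage, short / total, short_no_stop / total
```

Running this monthly over the search log gives the trend line for free, which is exactly what the next question is about.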
We did some relevancy testing (manually, so very time-consuming) and we know that the queries with 2 or 3 terms perform quite well.
The analysis of a part of the long tail helps us identify stopwords and synonyms. So far… so good.
These numbers made me curious. I want to know what the trend is in the number of terms used to formulate queries. Are people “talking” to search engines in a more natural way? (See: Longer Search Queries Are Becoming the Norm: What It Means for SEO.) I am trying to find more resources on this, so let me know if you know of any.
Why is this important?
A lot of search engines work “keyword based” when trying to find relevant results: they check whether the keywords appear in a document, and if so, the document becomes relevant. When combining those keywords with an “AND”, the more terms you use, the fewer results you will find. If there are a lot of “meaningless” terms in the query, the chance that you will find what you are looking for becomes smaller and smaller. Stopwords can help out here, but one cannot cover all variants.
OK, you say, “Why don’t you combine the terms with an ‘OR’?” Indeed, that will bring back more possibly relevant documents, but with the engine we use (the Google Search Appliance), the relevancy is then poor.
This trade-off is known as “Precision” versus “Recall” (see Wikipedia: “Precision and recall”).
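To make the trade-off concrete, here is a toy example in Python; the documents, query, and relevance judgments are all made up for illustration:

```python
# Toy collection: doc id -> set of terms. 'relevant' holds the ids a
# human judged relevant for the query "cancel insurance policy".
docs = {
    1: {"cancel", "insurance", "policy"},
    2: {"insurance", "policy", "premium"},
    3: {"cancel", "subscription"},
    4: {"travel", "tips"},
}
relevant = {1, 2}
query = {"cancel", "insurance", "policy"}

def retrieve(docs, query, mode="AND"):
    """AND: every query term must appear; OR: any term suffices."""
    if mode == "AND":
        return {d for d, terms in docs.items() if query <= terms}
    return {d for d, terms in docs.items() if query & terms}

def precision_recall(retrieved, relevant):
    if not retrieved:
        return 0.0, 0.0
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)
```

With AND, only document 1 comes back: precision 1.0, but recall only 0.5. With OR, documents 1, 2 and 3 come back: recall 1.0, but precision drops to about 0.67. Longer queries push the AND strategy toward empty result sets, and the OR strategy toward noise.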
When coping with longer queries – in natural language – the search engine needs to be smarter. The user’s intent has to be determined so that the essence of the search request is revealed. That essence can then be used to find relevant documents/information in unstructured content.
Instead of (manually) feeding the search engine with stopwords, synonyms etc., the search engine needs to be able to figure this out by itself.
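One simple way an engine could learn stopwords by itself is from document frequency: a term that occurs in nearly every document carries little discriminating power. A sketch of that idea; the threshold value is a made-up starting point that would have to be tuned per corpus:

```python
from collections import Counter

def stopword_candidates(docs, df_threshold=0.8):
    """Flag terms whose document frequency exceeds df_threshold.

    docs: iterable of token lists, one per document. A term that
    occurs in more than df_threshold of all documents is returned
    as a stopword candidate for a human (or the engine) to review."""
    df = Counter()
    n = 0
    for tokens in docs:
        n += 1
        df.update(set(tokens))  # count each term once per document
    return {t for t, c in df.items() if c / n > df_threshold}
```

Synonym discovery can be bootstrapped in a similar corpus-driven way (e.g. from co-occurrence statistics), which is exactly the kind of self-tuning the paragraph above asks for.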
Now I know that “the search engine” is something that ignorant users (sorry for the insult) see as one “thing”. We as search consultants know that there is a lot going on in the total solution (normalization, stemming, query rewriting, etc.) and that a lot depends very much on the content, but still… the “end solution” needs to be able to cope with longer queries.
Bottom line is that search solutions need to be able to handle short queries (a few terms) as well as complete “questions” if the end user is using more and more terms.
What current products support that? We talked to a couple of companies that say they support “natural language processing”. A lot of the time this comes down to analyzing the questions that are asked of the call center and creating FAQs that match those questions, so that a search will bring up the FAQ. Although effective, that’s not quite the idea. It demands a lot of manual work, while the search has to be “magic” and work on the existing content without changes.
My customer is now looking at IBM Watson for their long-term plans. They want to make search more “conversational” and support the queries on the website as well as a “virtual assistant” that acts like a chat.
Will search become more conversational? Will users type in their queries as normal questions? How will search vendors react to that?
With the news of the Google Search Appliance leaving the stage of (Enterprise) search solutions – of which there is still no mention on the official Google for Work Blog – there are a couple of companies that are willing to fill the “gap”.
I think a lot of people out there believe that the appliance model is why companies chose Google. I think that’s not the case.
A lot of people like Google when they use it to search the Internet. That’s why I hear a lot of “I want my enterprise search to be like Google!” That’s pretty fair from a consumer perspective – every employee and employer is also a consumer, right? We enterprise search consultants – and the search vendors – need to live up to the expectations. And we try to do so. We know that enterprise search is a different beast than web search, but still, it’s good having a company that sets the bar.
There are a few companies that deliver appliance models for search, namely Mindbreeze and Maxxcat. They are jumping into the gap, and they do deliver very good search functionality with the appliance model.
But… wait! Why did those customers of Google choose the Google Search Appliance? Did they want “Google in a box”? I don’t think so. They wanted a “Google-like search experience”. The fact that it came in a yellow box was just “the way it was”. Now, I know that the business really liked it. It was kind of nifty, right? The fact is that in many cases IT was reluctant.
IT infrastructure has been “virtualized” for years now, and a hardware-based solution does not fit into that. IT wants fewer dedicated servers providing the functionality. They want every single server to be virtualized so that uptime/fail-over and performance can be monitored and tuned with the solutions that are already in place.
Bottom line? There are not many companies that choose an appliance because it is an appliance. They choose a solution and take it for granted that it’s an appliance. IT is very reluctant about this.
I’ve been (yes, past tense) a Google Search Appliance consultant for years. I have seen those boxes do great things. But for anything that could not be configured in the (HTML) admin interface, one had to go back to Google Support (which is/was great, by the way!). There’s no way for a search team to analyze bugs or change the configuration at a deeper level than the admin console.
So… if you own a Google Search Appliance, you have enough time to evaluate your search needs. Do this consciously. It may well be that there is a better solution out there – even open source, nowadays.
It was funny to read the chapter on “Federated search” in Martin White’s book “Enterprise Search – second edition”.
He defines “Federated search” as:
…which is an attempt to provide one single search box linked to one single search application that has an index of every single item of information and data that the organization has
I have been working in the field of search for quite a few years now. When talking about federated search, I do not see this as “one search box with all the information (structured and unstructured) stored in one single index”. In fact, some search vendors/solutions even have something called “search federators”; think of the HP Autonomy “federation engine” and the “One Box” feature of the Google Search Appliance (now discontinued).
I think of federated search as just the opposite of that. In a “federated search” environment the essence is that all information is NOT stored in one big index.
Since there are content systems whose information you cannot get into your “single index” (or only at high cost and with security issues), and since some systems have good search functionality of their own, there is a different way of connecting content in “information silos”.
The goal is to give the user complete insight into every piece of possibly relevant information. For that to work, all the information doesn’t have to be stored in one single index (the “Hall” approach that Martin mentions – and I agree that this does not have to be the goal and is not even realistic).
Instead, the search application can also reach out to different (search) systems at once, sending a query that is distributed over those systems.
The results don’t have to be presented in one single result list. An intelligent, well-designed search user interface (or maybe more of a search “portal” in this case) can present the results from the different sources next to each other, using “progressive disclosure” to peruse the results of one (search) system at a time, but in a unified interface.
Wikipedia agrees with me on this:
Federated search is an information retrieval technology that allows the simultaneous search of multiple searchable resources. A user makes a single query request which is distributed to the search engines, databases or other query engines participating in the federation. The federated search then aggregates the results that are received from the search engines for presentation to the user
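That definition maps directly onto a fan-out pattern: send the one query to each backend in parallel, then collect the results per source for side-by-side display. A minimal Python sketch; the two backend functions are hypothetical stand-ins for the real search APIs of the silo systems:

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical backends: each maps a query string to a result list.
# In reality these would call e.g. a DMS or intranet search API.
def search_intranet(q):
    return [f"intranet:{q}"]

def search_dms(q):
    return [f"dms:{q}"]

BACKENDS = {"intranet": search_intranet, "dms": search_dms}

def federated_search(query, timeout=2.0):
    """Fan one query out to all backends; group results per source.

    Results stay grouped per source (for side-by-side display with
    progressive disclosure) instead of being merged into one ranked
    list. Sources that fail or exceed the timeout yield no entry."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(BACKENDS)) as pool:
        futures = {pool.submit(fn, query): name
                   for name, fn in BACKENDS.items()}
        done, _ = wait(futures, timeout=timeout)
        for fut in done:
            name = futures[fut]
            try:
                results[name] = fut.result()
            except Exception:
                results[name] = []
    return results
```

The per-source timeout is the important design choice: one slow silo must not hold the whole results page hostage, which is one of the classic weaknesses of federated search.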
Of course, Federated search has some very serious disadvantages, but mentioning them is not the goal of this article.
So in my opinion, an “Enterprise search” solution can/will consist of a combination of a central index (holding as much information as is economically and technically possible) and federated search to other (search) systems, to complete the 360° view of all information in an organization.
I just want to get the definitions straight.
It was the year 2005 when Google decided that they could use their superior search to make information in enterprises/behind the firewall searchable.
The Google Mini could index up to 300,000 documents. The functionality was limited, but it was great at crawling web-based content, just like Google.com. The Mini was mainly used to index intranets and public websites. That was before Google introduced Site Search as a product. The Mini did not have features like faceting, or connectors to crawl documents from sources other than websites or databases.
Google must have realized that the Mini could not fulfill Enterprise search demands (many different content sources, the need for faceting, tuning relevance, coping with millions of documents, etc.), so it released the Google Search Appliance.
The first versions of the GSA were very similar to the Mini. Google added some connectors, faceting, mirroring, and APIs to manage the appliance.
One important feature was the ability to scale to millions of documents, distributed over several appliances. The limit of the number of documents one appliance could index was 10 million.
The proposition of the GSA shook up the enterprise search market. Management of the GSA was easy, and so enterprise search became easy. Or at least, so it seemed. “Google knows search and now it is bringing that knowledge to the enterprise. We can have search in our business as good as Google.com.” Not so fast: there is a big difference between search on the web and search in the enterprise (read “Differences Between Internet vs. Enterprise Search”).
In 2012 Google pulled the Mini from its offerings and focused on selling more GSAs and improving their enterprise capabilities. I assume the two were not that different at all, and that there was a lot more money to be made with the GSA.
After that, more energy was put into improving the GSA. After version 6 (the Mini stopped at version 5) came version 7, with more connectors and features like wildcard search (truncation with ‘*’), entity recognition, document preview (Documill), etc. A minor detail is that the OOTB search interface of the GSA was never improved; it still reflected Google.com back in 2005.
In recent years it became clear that Google didn’t know what to do with this anomaly in its cloud offerings. Attention dropped, employees were relocated to other divisions (mainly Google Apps and Cloud), and the implementation partners were left on their own when it came to sales support. There was not much improvement in terms of added features.
In early 2015, Google revived its attention and dedicated more resources to the GSA again. It was clear (at that time) that the profits from the GSA were good and could be even better. Better sales support was promised to the partners (global partner meetings) and sales went slightly up. In 2015, version 7.4 was released with some small improvements, but with a brand new connector framework (Plexi adaptors). Several technology partners invested in developing connectors to support this new model. A small detail was that the new connector framework relied heavily on crawling by the GSA, with the adaptors acting more like a “proxy”. The old connector framework was pretty independent of the GSA, sending the full contents of documents to the GSA. (Because of the open source character of the connectors, other companies started to use them in their own offerings – LucidWorks using the SharePoint connector, for example.)
I’ve been working with the GSA for a long time and I must say that the solution made a lot of customers happy. The GSA really is easy to administer, and its performance and stability are near perfect.
On Thursday, February 4th, 2016, Google sent an e-mail to all GSA owners and partners stating that the GSA is “end-of-life”. Google will continue to offer support and renewals until 2019, but there will be no further innovation on the product. This came as a blow to the existing customers (who have invested a lot of money very recently) and to the partners.
Google doesn’t have an alternative for enterprise search yet. It must be working on a cloud offering. That will certainly be able to search through Google Drive (duh…) and cloud services like Salesforce, Dropbox, Box, etc., since the data of those applications already resides in the cloud.
Also see the article “Why Google’s enterprise search is on a sunset march to the cloud“.
Google has proven not to be an enterprise search solution provider. It tried with the Google Search Appliance, but it (sadly) failed. The GSA was a good product that fit well in many areas. But Google is a cloud company and does not have other on-premise solutions.
Google must have come to the conclusion that enterprise search is hard and that the investments don’t measure up to the profits. Google doesn’t publish revenue numbers for GSA deals, but it must be a small part of its revenue.
The GSA lacks some features that would make it “enterprise ready”, and the backlog of feature requests would give Google years of work to catch up with the current vendors.
Google is a cloud-born company that thinks in large volumes of users. Its offerings are all cloud based and focus on millions of users, each paying a small amount of money on a usage basis. When operating at that scale, minimal margins are OK because of the volume.
Enterprise search doesn’t work that way. The license model of the GSA (based on the number of documents) holds back opening up large amounts of documents (but that’s not only the case for the GSA; other search vendors have that model too).
Having said that, there are a couple of search vendors that are ready to step up and use Google’s retreat from the enterprise search market as their “golden egg”:
This blog reflects my personal opinions and not those of my employer.
At the end of 2015, the second edition of the book “Enterprise Search” was published. The book was written by Martin White, a respected member of the enterprise search community, author of several books on the subject, and a gifted speaker at many events worldwide.
I am in touch with Martin every now and then, via Twitter, but also “in real life” at seminars. Recently I sat on an expert panel at the Enterprise Search Europe event in London, which was moderated by Martin.
Martin mentions this site in his list of sources that publish information about enterprise search. That made me realize that my publications are appreciated internationally as well. A nice compliment.
But… most of my blogs are in Dutch – not really readable for an international audience. To participate more fully in the international enterprise search community, it is better if I start writing in English.
This was not an easy decision. My activities generally take place in the Netherlands (with the occasional excursion to Belgium), and of course I also want to be found in the Netherlands. I assume, however, that Dutch readers mostly search on enterprise search terms, so my site will still be found.
So… this will be the last article in Dutch on this website.
I will keep blogging in Dutch on Blog-IT, where translations of my blogs on this site will appear.