A modern intranet: connecting people to people?

Today I read “Will Intranets Disappear With the Rise of the Bots?“. The author writes about how “old” intranets were all about one-way communication and providing essential content.

But:

“Intranets designed and built around document libraries, one-way communications and links to deeper knowledge are no longer the proud, highly esteemed centerpieces of old.”

According to the article, this doesn’t cut it anymore. Doing business and work nowadays asks for more fluid information, fast two-way communication etc. to support decision making and innovation:

“A functioning intranet has become more about people: finding people, interacting with people, building relationships and networks” and “Decision making needs and the drive to improve the customer experience require a more fluid and faster intranet than one that is essentially a library”.

The article goes on about bots and how those bots will assist us in getting answers to common questions and helping us with doing simple tasks.

While reading this I thought to myself: “But what about the ever-present problem of capturing tacit knowledge?” The goal of knowledge management is to get the right information to the right people (or process) at the right time; basically, to achieve “doing things right the first time”. There are two important use cases for managing knowledge:

  1. To make sure that new people who join the company know what the company has been doing, what works/worked and what does not, and where to get the information they need. Simply to make them productive as quickly as possible.
  2. To make sure that the company is doing the right things right. Think about R&D and product/business development. It makes no sense to develop products you already have or to do research on a topic that has been covered in the past and whose outcome is already known.

So when the author says:

Knowledge is instantly available, discussed, shared and fully used in the time it takes to add metadata to a document

and connecting people in a social environment is more important than securing information for future reference, we risk asking the same questions to people over and over again. Also, when experienced people leave the company, the existing knowledge leaves with them. Connecting to people also poses the risk of pulling them out of their current process. This can lead to lower productivity because of the constant disturbance of notifications, instant messaging etc.

So, I still believe in “document libraries” with high quality information and data that any employee can access and use whenever he or she needs it. We simply need to manage the knowledge, information and data so that it is readily accessible.

When the article speaks of “bots” in that context, I translate that to “a fucking good search engine” that understands what’s important and what’s not (in the context of the user/question). Enterprise search solutions also have the ability to provide pro-active suggestions of relevant content (research, people with knowledge). It all depends on how deeply you want to integrate the different technologies.

So, connecting people remains important in a company. But for a company to survive in the long run, it needs to secure its information and “knowledge”. Surely we need more smart solutions to connect people to content, content to content, content to people and people to people.


Everything you wanted to know about the Dark Web… and more

Today I acquired a copy of the “Dark Web Notebook” (Investigative tools and tactics for law enforcement, security, and intelligence organizations) by Stephen E Arnold.

I know, the grumpy old man from rural Kentucky who speaks negatively about almost all large “Blue Chip” companies and “self-appointed search experts”.
I read his articles with a lot of scepticism because he seems to “know it all”.

But… with this book he surprised me.

The Dark Web is something we have all heard about, but most of us don’t know what it actually is, including myself. Until now.

If you are curious, you should get a copy of this book. Purchase it for $49 at https://gum.co/darkweb

From the introduction in the book:

The information in this book will equip a computer-savvy person to break the law. The purpose of the book is to help those in law enforcement, security, and intelligence to protect citizens and enforce applicable laws. The point of view in the Dark Web Notebook is pragmatic and pro-enforcement.

You have been warned!

HPe/IDOL (Former Autonomy IDOL) is still alive and kicking

With all the rumble about Solr, Elasticsearch and other search vendors like Coveo and Attivio, one could easily forget about that long-existing behemoth in the (enterprise) search niche: HPE/IDOL.

IDOL stands for “Intelligent Data Operating Layer”. It is a very sophisticated big data and unstructured text analytics platform that has been around for more than two decades.

HPE is still investing heavily in this technology, which consists of a very rich ecosystem of modules:

  • connectors
  • classifiers
  • taxonomy generators
  • clustering engine
  • summarization
  • language detection
  • video and audio search
  • alerting (agents)
  • visualization (Business intelligence for human information (BIFHI))
  • DIH/DAH for distribution (scalability) and mirroring (availability) of content and queries

Recently (December 2016) HPE added machine learning and Natural Language Processing to the capabilities.

IDOL can be used for knowledge search, e-commerce search, customer self service search and other use cases that require fast, accurate and relevant search.

Next to the “on premise” solution, HPE also enabled the IDOL platform to be used in the cloud with a range of services: Haven OnDemand. With this platform developers can quickly build search & data analytics applications. There are dozens of APIs available, amongst them:

  • Speech to text
  • Sentiment analysis
  • Image/video recognition
  • Geo/Spatial search
  • Graph analysis

So IDOL is still very much alive and kicking!

Looking for a specialist that can support you with first class search and text analytics based on HPE IDOL in the Netherlands? KnowledgePlaza Professional Services is a fully certified HPE Partner.

The seven (7) “deadly” sins of text analytics

John Martin of “BeyondRecognition” posted a couple of interesting articles on LinkedIn concerning the use of Text Analytics or Text Mining to classify files and documents.

Of course his “catch” is that one needs visual recognition as well as text-based pattern recognition; BeyondRecognition delivers visual recognition technology.
In nearly every article the “problem” of having “image-only” PDFs or TIFFs is mentioned: when there is no text, text mining will not work. We all know that it is very easy to OCR PDFs and TIFFs. One step further is image recognition within photos. Both technologies will give us text and metadata to associate with the files.

But still, the articles make some good points that have to be taken into account when using text-based classification solutions:

Parts 5 through 7 are still to come…


Open source search thriving on Google Search Appliance withdrawal?

Last week I had my first encounter with a potential client that changed their policy on open source search because of a recent event.

They were in the middle of an RFI (request for information) to see what options there are for their enterprise search demands, when Google announced the end-of-life of their flagship enterprise search product: the Google Search Appliance.

This got them thinking: “What if we choose a commercial or closed source product for our enterprise search solution and the vendor decides to discontinue it?”

The news from Google has gotten a lot of attention on the internet, through blog posts and tweets. Of course there are commercial vendors trying to step into this “gap”, like Mindbreeze and SearchBlox.

I have seen this happen before, in the time of the “great enterprise search take-overs”. Remember HP and Autonomy, IBM and Vivisimo, Oracle and Endeca, Microsoft and FAST ESP?
At that time organizations also started wondering what would happen to their investments in these high-class, high-priced “pure search” solutions.

In the case of the mentioned potential client, the GSA was on their list of possible solutions (especially because of the needed connectors and the “document preview” feature). Now it’s gone.

Because of this, they started to embrace the strength of the open source alternatives, like Elasticsearch and Solr. It’s even becoming a policy.
Surely open source will take some effort to get all the required functionality up and running, and they will need an implementation partner. But… they will own every piece of software that is developed for them.

I wonder if there are other examples out there of companies switching to open source search solutions, like Apache Solr, because of this kind of unexpected “turn” by a commercial / closed source vendor.

Has Google unwillingly set the enterprise search world on the path of open source search solutions like Apache Solr or Elasticsearch?


Enterprise search or Search Based Applications or… a vision?

Reading the article on CMSWire “Enterprise search is bringing me down” by Martin White, I also wonder why companies acknowledge that they have much informations (forgive me the term, but what can you make of the combination of “documents”, “databases”, “records”, “intranets”, “webpages”, “products”, “people cards” etc. … yep, “informations”) spread around that they see are (or is) valuable for everyone. There are plenty of solutions and products that can help them achieve the goal of re-using that informations and putting it to good use.

Still, most organizations focus on maintaining (storing, archiving) that informations in silos (CMS, DMS, file shares, e-mail, databases) and not on combining it to see what business value the combination of that informations can (and will) bring. It’s pretty simple: if the informations cannot be found, it’s useless. Why store it, maintain it? Just get rid of it!

But as humans… we do not like to delete informations. It’s like the working of our brain. In our brain we keep on storing informations and at some point we make use of that informations to make a decision, have a conversation, sing a song or whatever we want to do with that informations because we can retrieve it and make use of it!

Is it the costs?
Okay, building a search solution is not cheap. But if you can find a vendor/solution that can grow along the way, it does not have to be expensive from day one. There are commercial vendors and open source solutions that can deliver what you want. Just know what you want (in the end) and then discuss this with implementation partners or product vendors. Maybe a “one size fits all” can be the way to go. Maybe cooperation with an open source implementation partner can make it feasible in the long run?
But… always keep in mind the business case and the value that it can deliver for your business. You are already investing in the storage and maintenance of your informations right now, right? What are those costs? Why not make the informations usable? Remember that search solutions are also very good “cleaning agents”: they surface all informations and make it clear that something has to be done about deleting informations. I don’t even want to start on the gaps in information security that a good enterprise search system can surface…

Is it the complexity?
Whenever you want to take on a big project, at first it seems quite complex and you don’t even want to get started. It is no different with doing things in your own house. But once you have made a plan – do it room by room – then you see results in a short amount of time. And you are happy that you redecorated the room. That will give you energy to take on the next room, right? It is the same with a search solution. If you take it on source by source or target group by target group, you will see improvement. And that will give you positive feedback to start on another source or target group!

Is it lack of vision?
… Yes… It’s the lack of vision. Vision doesn’t start with building a “Deus ex machina” that will do everything you ever wanted. It starts with small steps that make the vision come true. It’s about making employees and customers happy. That can be achieved by having a vision, having a big plan, making small steps and scaling fast.

Is the future of enterprise search the creation of SBAs (Search Based Applications) that optimize a couple of processes / business lines instead of optimizing the whole company? The Big Data movement is surely propagating that: they deliver solutions for analyzing big data sets that revolve around a specific business process or goal. But you don’t want lots of applications doing practically the same thing, right? That will cost you a lot of money. Well-designed SBAs work on the same set of informations while delivering value to many processes and target groups. The underlying infrastructure should be capable of serving many different business processes and information needs.

I still believe in the “Google adage” that all informations should be made accessible and usable for everyone, within and/or (you don’t want your internal informations out there for everyone to explore) outside of a company.

In my opinion everyone inside and outside a company should have a decent solution that gives access to the informations that are valuable and sometimes needed to perform their daily tasks. Google can do this on the internet, so why don’t you use the solutions at hand that can bring that to your customers and employees?

I won’t say it will be easy, but what has “easy” ever delivered? Surely not your edge over your competitors. Because then we would all be entrepreneurs with big revenues, right?

But as Martin wrote in the article, I sometimes get tired of explaining (and selling) the value of a good search solution to people who just don’t get it. Still they are using Google every day and take that value for granted… without realizing that they could have that for themselves, in their own company and for their customers.

“The future of search is to build the ultimate assistant”

Last week, one of my customers pointed me to an article on Search Engine Land, titled: “The rise of personal assistants and the death of the search box“.

Google’s Behshad Behzadi explains why he thinks that the convenient little search box that sits at the top right corner of nearly every page on the web will be replaced. The article was written by Eric Enge and of course reflects his interpretation.

“Google’s goal is to emulate the “Star Trek” computer, which allowed users to have conversations with the computer while accessing all of the world’s information at the same time.”

I think that’s a great goal, and these things could be happening in the not too distant future. Of course we all know Siri, Cortana and Google Now, so this is not so hard to imagine. Below is a timeline of the growth of Google.com:

[Timeline image from “The rise of personal assistants and the death of the search box”]

At this time we are talking more and more to our computers. For most people it still feels weird, but “It’s becoming more and more acceptable to talk to a phone, even in groups.”

So… search applications are getting to know our voice and the way we speak is the way we search.

That demands a lot from search engines. They need to get more intelligent to be able to interpret the questions and match them with a vast amount of possible answers hidden in documents, knowledge bases, graphs, databases etc.
Having found possible answers, the search application needs to present them in a meaningful way and get a dialog going to be sure that it has interpreted the question correctly.

This future got me wondering about “enterprise search”. All this exciting stuff is happening on the internet. Search behind the firewall is lagging behind. The vast information and development power that is available on the internet is not available in the enterprise.
An answering machine needs constant development: better language interpretation, more knowledge graphs (facts and figures) to drive connections, machine learning to process the queries, the clicks visitors perform, other user feedback etc.

The question is whether on-premise enterprise search solutions can ever deliver the same experience as the solutions that run in the cloud. It’s impossible to come up with a product that installs on-premise and has the same rich features that Google is delivering online. One could try, but then the question is whether the product can keep up with the improvements.

So, with the “death of the search box”, will this also lead to “the death of the on-premise search solution”? Google is dropping support for their on-premise search solution, the Google Search Appliance, for a reason. The move to the cloud and personal assistants is driving that.

Queries are getting longer. What’s the impact?

Recently I’ve been working on a project for a Dutch financial company. It concerns the search functionality of their website. The business case is clear: support self service and getting answers to common questions to take the load (and costs) off the call center.

Of course we are taking search log analysis VERY seriously because there is much to be learned from them.

Some statistics: 400,000+ user queries per month, 108,000+ “unique” queries, and the top 5,000 queries cover only 7% of the total queries. The long tail is big.
So focusing on the top queries will only cover 7,500 of the 108,000 queries.
68% of the queries have 3 or fewer terms. When we remove the “stopwords”, the share of queries with 3 terms or fewer rises to 78%.
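As a sketch of how such numbers can be pulled out of a raw log, here is a minimal Python example. The queries are made up for illustration; a real export would have hundreds of thousands of rows, as in the statistics above.

```python
from collections import Counter

# Hypothetical sample of raw log entries (invented; a real log export
# would have hundreds of thousands of rows).
queries = [
    "hypotheek rente", "hypotheek rente", "creditcard aanvragen",
    "hypotheek rente", "pinpas blokkeren", "creditcard aanvragen",
    "wat is de rente van een hypotheek",
]

counts = Counter(q.strip().lower() for q in queries)
total = sum(counts.values())

def top_n_coverage(counts, n, total):
    """Share of all query traffic covered by the top-n distinct queries."""
    return sum(c for _, c in counts.most_common(n)) / total

print(f"{len(counts)} unique queries out of {total} total")
print(f"top-2 coverage: {top_n_coverage(counts, 2, total):.0%}")
```

The same coverage function, run with n = 5,000 against a full month of logs, gives the “top queries only cover 7%” figure and shows how heavy the long tail really is.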

We did some relevancy testing (manually, so very time consuming) and we know that the queries with 2 or 3 terms perform quite well.
The analysis of a part of the long tail helps us identify stopwords and synonyms. So far… so good.

These numbers made me more curious. I want to know what the trend is in the number of terms used in formulating queries. Are people “talking” in a more natural way to search engines? (See: “Longer Search Queries Are Becoming the Norm: What It Means for SEO”.) I am trying to find more resources on this, so let me know if you know about them.

Why is this important?

A lot of search engines work “keyword based” when trying to find relevant results: they check whether the keywords appear in a document and, if so, the document becomes relevant. When combining those keywords with an “AND”, the more terms you use, the fewer results you will find. If there are a lot of “meaningless” terms in the query, the chance that you will find what you are looking for becomes smaller and smaller. Stopwords can help out here, but one cannot cover all variants.
OK, you say, “Why don’t you combine the terms with an ‘OR’?” Indeed, that will bring back more possibly relevant documents, but with the engine we use (the Google Search Appliance), the relevancy is then poor.
The issue here is referred to by the concepts “precision” and “recall” (see: Wikipedia, “Precision and recall”).
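To make the AND/OR trade-off and the precision/recall concepts concrete, here is a toy Python sketch. The documents, the query and the relevance judgments are all invented, and real engines rank results rather than just match them, but the mechanics are the same.

```python
# Toy corpus and human relevance judgments (all invented for illustration).
docs = {
    1: "mortgage interest rates explained",
    2: "how to block your debit card",
    3: "current interest rates for savings",
    4: "mortgage application checklist",
}
relevant = {1, 3}  # documents a human judged relevant to "mortgage rates"

def match(query_terms, mode):
    """Boolean keyword matching: 'and' requires every term, 'or' any term."""
    hits = set()
    for doc_id, text in docs.items():
        words = set(text.split())
        present = [term in words for term in query_terms]
        ok = all(present) if mode == "and" else any(present)
        if ok:
            hits.add(doc_id)
    return hits

def precision_recall(retrieved, relevant):
    """Precision: share of retrieved docs that are relevant.
    Recall: share of relevant docs that were retrieved."""
    if not retrieved:
        return 0.0, 0.0
    true_positives = len(retrieved & relevant)
    return true_positives / len(retrieved), true_positives / len(relevant)

and_hits = match(["mortgage", "rates"], "and")  # narrow: misses doc 3
or_hits = match(["mortgage", "rates"], "or")    # broad: pulls in doc 4
print(precision_recall(and_hits, relevant))  # high precision, lower recall
print(precision_recall(or_hits, relevant))   # full recall, lower precision
```

Adding one more “meaningless” term to the AND query would shrink the result set further; switching to OR restores recall but lets in noise. That is exactly the behavior described above.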

When coping with longer queries – in natural language – the search engine needs to be smarter. The user’s intent has to be determined so that the essence of the search request is revealed. That essence can then be used to find relevant documents/information in unstructured content.
Instead of (manually) feeding the search engine with stopwords, synonyms etc., the search engine needs to be able to figure this out by itself.
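One way an engine could figure out stopwords by itself is from document frequency: terms that occur in nearly every document carry little meaning for matching. A minimal sketch, with a tiny invented corpus standing in for the full index:

```python
from collections import Counter

# Tiny invented corpus; a real system would use the full index.
# Idea: terms that occur in a very high fraction of all documents
# carry little meaning and can be treated as stopwords automatically.
docs = [
    "what is the interest rate of a mortgage",
    "what is the cost of a credit card",
    "how do i block the debit card",
    "what documents do i need for a mortgage",
]

doc_freq = Counter()
for doc in docs:
    doc_freq.update(set(doc.split()))  # count each term once per document

def derive_stopwords(doc_freq, n_docs, threshold=0.7):
    """Return terms that occur in more than `threshold` of all documents."""
    return {term for term, df in doc_freq.items() if df / n_docs > threshold}

print(derive_stopwords(doc_freq, len(docs)))
```

The threshold is a tuning knob, and synonym discovery needs heavier machinery (co-occurrence statistics, word embeddings), but the principle is the same: let the content teach the engine instead of maintaining the lists by hand.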

Now I know that the “search engine” is something that ignorant users (sorry for the insult) see as one “thing”. We as search consultants know that there is a lot going on in the total solution (normalization, stemming, query rewriting etc.) and that a lot depends very much on the content, but still… the “end solution” needs to be able to cope with long queries.

Bottom line: search solutions need to be able to handle short queries (a few terms) as well as complete “questions” as end users use more and more terms.
What current products support that? We talked to a couple of companies that say they support “natural language processing”. A lot of the time this comes down to analyzing questions that are asked to the call center and creating FAQs that match the questions, so that a search will come up with the FAQ. Although effective, that’s not completely the idea. It demands a lot of manual work, while the search has to be “magic” and work on the existing content without changes.

My customer is now looking at IBM Watson for their long-term plans. They want to make search more “conversational” and support the queries on the website as well as a “virtual assistant” that acts like a chat.

Will search become more conversational? Will users type in their queries as normal questions? How will search vendors react to that?

Replacing a search appliance with… a search appliance?

With the news of the Google Search Appliance leaving the stage of (enterprise) search solutions – of which there is still no mention on the official Google for Work Blog – there are a couple of companies willing to fill the “gap”.

I think that a lot of people out there think that the appliance model is why companies choose Google. I think that’s not the case.

A lot of people like Google when they use it to search the internet. That’s why I hear a lot of “I want my enterprise search to be like Google!” That’s pretty fair from a consumer perspective – every employee and employer is also a consumer, right? We enterprise search consultants – and the search vendors – need to live up to those expectations. And we try to do so. We know that enterprise search is a different beast than web search, but still, it’s good to have a company that sets the bar.

There are a few companies that deliver appliance models for search, namely Mindbreeze and Maxxcat. They are riding this wave, and they do deliver very good search functionality with the appliance model.

But… wait! Why did those Google customers choose the Google Search Appliance? Did they want “Google in a Box”? I don’t think so. They wanted a “Google-like search experience”. The fact that it came in a yellow box was just “the way it was”. Now, I know that “the business” really liked it. It was kind of nifty, right? The fact was that in many cases IT was reluctant.

IT infrastructure has been “virtualized” for years now; a hardware-based solution does not fit into that. IT wants fewer dedicated servers to provide the functionality. They want every single server to be virtualized so that uptime, fail-over and performance can be monitored and tuned with the solutions that are “in place”.

Bottom line? There are not many companies that choose an appliance because it is an appliance. They choose a solution and take it for granted that it’s an appliance. IT is very reluctant about this.

I’ve been (yes, the past tense) a Google Search Appliance consultant for years. I’ve seen those boxes do great things. But for anything that could not be configured in the (HTML) admin interface, one had to go back to Google Support (which is/was great, by the way!). There’s no way for a search team to analyse bugs or change the configuration on a deeper layer than the admin console.

So… if you own a Google Search Appliance, you have enough time to evaluate your search needs. Do this consciously. It may well be that there is a better solution out there, perhaps even an open source one nowadays.


Definition of “Federated search”

A funny thing struck me when reading the section on “Federated search” in Martin White’s book “Enterprise search – second edition”.

He defines “Federated search” as:

…which is an attempt to provide one single search box linked to one single search application that has an index of every single item of information and data that the organization has created.

I have been working in the field of search for a couple of years now. When talking about federated search, I do not see this as “one search box with all the information (structured and unstructured) stored in one single index”. The fact is that some search vendors/solutions even have something called “search federators”; think of the HP Autonomy “federation engine” and the “OneBox” feature of the Google Search Appliance (now discontinued).

I think of federated search as just the opposite of that. In a “federated search” environment the essence is that all information is NOT stored in one big index.

Since there are content systems from which you cannot get the information into your “single index” (or only at high cost and with security issues), and since some systems have good search functionality of their own, there is a different way of connecting the content in “information silos”.
The goal is to present the user a complete insight into every piece of possibly relevant information. For that to work, not all the info has to be stored in one single index (the “Hall” approach that Martin mentions; I agree that this does not have to be the goal and is not even realistic).

Instead, the search application can reach out to different (search) systems at once, distributing the query over those (search) systems.
The results don’t have to be presented in one single result list. An intelligent, well-designed search user interface (or maybe more like a search “portal” in this case) can present the results from the different sources next to each other, using “progressive disclosure” to peruse results from one (search) system at a time, but in a unified interface.
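As a sketch of that fan-out approach, this Python fragment sends one query to several backends in parallel and keeps the results grouped per source, ready for side-by-side presentation. The backend names and their canned results are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for real backends (a central index, a DMS with its own search
# API, a CRM database); the names and canned results are invented.
def search_intranet(query):
    return [f"intranet: page about {query}"]

def search_dms(query):
    return [f"dms: document mentioning {query}"]

def search_crm(query):
    return [f"crm: customer record matching {query}"]

BACKENDS = {"intranet": search_intranet, "dms": search_dms, "crm": search_crm}

def federated_search(query):
    """Send one query to every backend in parallel and keep the results
    grouped per source, for side-by-side presentation."""
    with ThreadPoolExecutor(max_workers=len(BACKENDS)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in BACKENDS.items()}
        return {name: future.result() for name, future in futures.items()}

results = federated_search("pension plan")
```

A real implementation would be HTTP calls with a timeout and error handling per source, so that one slow or failing silo never blocks the whole result page.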

Wikipedia agrees with me on this:

Federated search is an information retrieval technology that allows the simultaneous search of multiple searchable resources. A user makes a single query request which is distributed to the search engines, databases or other query engines participating in the federation. The federated search then aggregates the results that are received from the search engines for presentation to the user

Of course, Federated search has some very serious disadvantages, but mentioning them is not the goal of this article.

So, in my opinion, an “enterprise search” solution can/will consist of a combination of a central index (that holds as much info as is economically and technically possible) and federated search to other (search) systems, to complete the 360-degree view of all information in an organization.

I just want to get the definitions straight.