A modern intranet: connecting people to people?

Today I read “Will Intranets Disappear With the Rise of the Bots?”. The author writes about how “old” intranets were all about one-way communication and providing essential content.

But:

“Intranets designed and built around document libraries, one-way communications and links to deeper knowledge are no longer the proud, highly esteemed centerpieces of old.”

According to the article, this no longer cuts it. Doing business and work nowadays asks for more fluid information, fast two-way communication, etc. to support decision making and innovation:

“A functioning intranet has become more about people: Finding people, interacting with people, building relationships and networks.” and “Decision making needs and the drive to improve the customer experience require a more fluid and faster intranet than one that is essentially a library.”

The article goes on to discuss bots and how those bots will assist us in getting answers to common questions and help us do simple tasks.

While reading this I thought to myself: “But what about the ever-present problem of capturing tacit knowledge?” The goal of knowledge management is to get the right information to the right people (or process) at the right time; basically, to achieve “doing things right the first time”. There are two important use cases for managing knowledge:

  1. To make sure that new people who join the company know what the company has been doing, what works or worked and what does not, and where to get the information they need. Simply put: to make them productive as quickly as possible.
  2. To make sure that the company is doing the right things right. Think about R&D and product/business development. It makes no sense to develop products you already have, or to do research on a topic that has been covered in the past and whose outcome is already known.

So when the author says:

Knowledge is instantly available, discussed, shared and fully used in the time it takes to add metadata to a document

and connecting people in a social environment is more important than securing information for future reference, we risk asking the same questions to people over and over again. Also, when experienced people leave the company, the existing knowledge leaves with them. Connecting to people also poses the risk of pulling them out of their current process. This can lead to lower productivity because of the constant disturbance of notifications, instant messaging, etc.

So, I still believe in “document libraries” with high-quality information and data that any employee can access and use whenever he or she needs it. We simply need to manage the knowledge, information and data so that it is readily accessible.

When the article speaks of “bots” in that context, I translate that to “a fucking good search engine” that understands what’s important and what isn’t (in the context of the user/question). Enterprise search solutions also have the ability to provide proactive suggestions for relevant content (research, people with knowledge). It all depends on how deeply you want to integrate the different technologies.

So, connecting people remains important in a company. But for a company to survive in the long run, it needs to secure its information and “knowledge”. Surely we need more smart solutions to connect people to content, content to content, content to people and people to people.

 

Everything you wanted to know about the Dark Web… and more

Today I acquired a copy of the “Dark Web Notebook” (Investigative tools and tactics for law enforcement, security, and intelligence organizations) by Stephen E Arnold.

I know, the grumpy old man from rural Kentucky who speaks negatively about almost all large “Blue Chip” companies and “self-appointed search experts”.
I read his articles with a lot of scepticism because he seems to “know it all”.

But… with this book he surprised me.

The Dark Web is something we have all heard about, but most of us don’t know what it actually is, myself included. Until now.

If you are curious, you should get a copy of this book. Purchase it for $49 at https://gum.co/darkweb.

From the introduction in the book:

The information in this book will equip a computer-savvy person to break the law. The purpose of the book is to help those in law enforcement, security, and intelligence to protect citizens and enforce applicable laws. The point of view in the Dark Web Notebook is pragmatic and pro-enforcement.

You have been warned!

Enterprise Search vs. E-Discovery from a solution point of view

Last week I was invited to an “Expert meeting E-Discovery”. I’ve been in the search business for many years and I regularly encounter the concepts and practices of “E-Discovery” as well as “Enterprise search” (and e-commerce search, Search-Based Applications, etc.).

So I decided to get some information about what people think about the difference between Enterprise search and E-Discovery.

Definition of E-Discovery (Wikipedia):

Electronic discovery (also e-discovery or ediscovery) refers to discovery in litigation or government investigations which deals with the exchange of information in electronic format (often referred to as electronically stored information or ESI). These data are subject to local rules and agreed-upon processes, and are often reviewed for privilege and relevance before being turned over to opposing counsel.

Definition of Enterprise search (Wikipedia):

Enterprise search is the practice of making content from multiple enterprise-type sources, such as databases and intranets, searchable to a defined audience.

When you look at the definitions, the difference is in the goal. E-Discovery deals with legal matters, gathering evidence; Enterprise search is general-purpose, gathering answers or information to be used in some business process.
But one can also see the similarities. Both deal with digital information, multiple sources and a defined audience.

What could be seen as different is that, according to these definitions, E-Discovery does not mention a technical solution that indexes all (possibly relevant) information and makes it searchable. Enterprise search is much closer to a technical solution.

So… not many differences there, but I am beginning to have a hunch about why people could see them as different. My quest continues.

I found two articles that are pretty clear about the differences.

I think that the differences that are mentioned stem from a conceptual view of E-Discovery vs. Enterprise search, not from a technical (solutions) point of view (and even on the conceptual level they are wrong). I also think that the authors of the articles compare the likes of the Google Search Appliance to specialized E-Discovery tools like ZyLab. They simply ignore the fact that there are a lot more solutions out there that do “Enterprise search” but are far more sophisticated than the Google Search Appliance.

Below I will get into the differences mentioned in those articles from a technical or solution point of view.

From “Enterprise Search vs. E-Discovery Search: Same or Different?”:

  1. Business objective is a key consideration
    “Recall vs. Precision” (getting all the relevant information vs. getting the most relevant information)
    It is true that a typical Enterprise search implementation will focus on precision. To support efficient answering of common queries and to speed up information-driven processes in a company, precision is important.
    This does not mean that the products used for Enterprise search cannot deliver all relevant information for a query. HPE IDOL as well as Solr can retrieve all relevant information fast.
  2. Number of search queries matters
    “Simple vs. complex queries”
    Here a couple of keyword examples are given to illustrate how people use Enterprise search. I’ve been working with companies (intelligence) that use Enterprise search solutions (like HPE IDOL/Autonomy) with far more complex queries to get all possibly relevant information back.
    The complex queries that are illustrated can be handled easily by Solr.
  3. The cost of relevancy
    “Transparent query expansion”
    For every search manager it is important to know why results show up for a specific query. This is needed to tune the engine and the results that are displayed to the users.
    Solr is open source, and that’s why the community invests heavily in making it transparent why results come up for a specific (complex) query; a minimal sketch of such a debug request follows this list.
    Furthermore there are tools that can be used with Solr that can make E-Discovery even better. Think of the clustering engine Carrot2. That solution makes relations in information visible without even knowing up front that those relations exist.
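
To make that transparency concrete, here is a minimal sketch of asking Solr to explain its scoring via the standard `debugQuery` parameter. The core name and query are illustrative assumptions, not part of any specific deployment:

```python
# Minimal sketch: asking Solr to explain why results match a query.
# Assumes a local Solr with a core named "documents"; the core name
# and the query itself are illustrative placeholders.
import requests

params = {
    "q": "contract AND (termination OR cancellation)",
    "debugQuery": "true",  # adds per-document score explanations to the response
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/documents/select", params=params).json()

# The "explain" section shows, per document, which terms matched and how
# each term contributed to the final relevance score.
for doc_id, explanation in resp["debug"]["explain"].items():
    print(doc_id, explanation)
```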

From “3 Reasons Enterprise Search is not eDiscovery”:

  1. Lengthy deployment
    “All information for one audience” vs. “all information for everyone”
    For this… see the first bullet (“Business case”) under the “Analysis” section below.
    But also: an Enterprise search deployment can take some time because you have to find ways to get information out of certain content systems. Will this be easier when using an E-Discovery solution? Do they have ways to get content out of ALL content systems? If so… please share this with the world and let that vendor get into the Enterprise search business. They will have the “golden egg”!
  2. Misses key data sources
    E-Discovery vs. “Intranet search”
    The whole promise of “Enterprise search” is to make all information in a company findable by all employees. The authors of the articles must have missed some information about this. Period.
  3. Not actionable
    “Viewing” vs. “Follow-up”
    The platforms that make up a really good Enterprise search solution are designed to support many information needs. They can support many different search-based applications (SBAs). E-Discovery could well be such a search-based application. It has specific needs in formulating queries, exploring content, saving the results, annotating them and even recording queries along with their explanations.

Analysis

So when I look at the differences from my point of view (implementation and technical), I see four topics:

  • Business case
    The business case for an E-Discovery solution is clear: you have to implement/use this because you HAVE to. It’s a legal thing. The company has to give access to the data. Of course there is still the choice of doing this manually, but if there is too much information, the cost of labour will exceed the cost of a technical solution.
    When we look at Enterprise search (all information within the company, for all employees), no one will start implementing a technical solution without insight into the costs and benefits. Implementing a large (many sources, many documents, many users) Enterprise search solution is very costly.
  • Audience
    The audience (target group) for E-Discovery consists of the investigators who have to find out whether there is any relevant information concerning an indictment or acquittal in a legal case. This group is highly trained, and it can be assumed that they can work with complex queries, complex user interfaces, complex reporting tools, etc. The focus is on getting all relevant documents, no matter how hard it is to formulate the right queries and traverse the possible results.
    The audience for Enterprise search is “everyone”. This could be skilled information specialists, but also the people from marketing, R&D and other departments, just trying to find the right template, customer report, annual financial report or even the latest menu from the company restaurant.
    The user experience has to be carefully designed so that it is usable for a broad audience with different information needs. Sometimes the most relevant answer or document is enough, but in other use cases getting all the information on a topic is needed.
  • Security
    For E-Discovery in a legal context it’s simple: every piece of information has to be accessible. So no difficult questions about who can see what.
    In Enterprise search, security is a pain in the *ss: many different content systems, many different security mechanisms and many different users that have different identities in different systems. A minimal sketch of the common way to handle this follows this list.
  • Functionality
    To provide the right tools for an E-Discovery goal, a solution needs to take care of some specific demands. I am pretty sure that the search solutions I mentioned can take care of most of them. It’s all in the creation of the user interface and the supporting add-ons to make it happen.
    Although a typical Enterprise search implementation may not offer this out of the box, the products used do, and the possibilities for creating custom reports and actions (explain, store, etc.) exist.
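
To make the security point concrete: a common approach is “early binding” security trimming, where each document is indexed together with the ACL tokens that may read it, and every query is filtered by the current user’s tokens. The sketch below assumes a Solr backend; the core name and the “acl” field are illustrative assumptions, not any product’s fixed API.

```python
# Minimal sketch of "early binding" security trimming against Solr.
# Assumes documents were indexed with an "acl" field listing the groups
# allowed to read them; all names here are illustrative.
import requests

def secure_search(query: str, user_groups: list[str]) -> dict:
    # Restrict results to documents whose ACL matches one of the user's groups.
    acl_filter = "acl:(" + " OR ".join(f'"{g}"' for g in user_groups) + ")"
    params = {"q": query, "fq": acl_filter, "wt": "json"}
    return requests.get(
        "http://localhost:8983/solr/intranet/select", params=params
    ).json()

# The same query returns different results for users with different permissions.
hits = secure_search("annual report", ["sales", "everyone"])
print(hits["response"]["numFound"])
```

The hard part in practice is not the filter itself but keeping those ACL fields synchronized with the many source systems, which is exactly why connectors matter so much (see below).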

Connectors?

What none of the articles mention is the complexity of getting all the information out of all the systems that contain the content. Abstracting away from the possible tools for E-Discovery or Enterprise search, the tools for connecting to many different content systems are probably the most essential thing. When you cannot get information out of a content system, the most sophisticated search tool will not help you.
Enterprise search vendors are well aware of that. That’s why they invest so heavily in developing connectors for many content systems. There is no “one ring to rule them all” here. If there are E-Discovery vendors that have connectors to get all information from all content systems, I would urge them to get into the Enterprise search business.

Conclusion

My conclusion is that there are a couple of products/solutions that can fulfill Enterprise search needs as well as E-Discovery needs. Specifically I want to mention HPE IDOL (the former Autonomy suite) and Solr.
From a cost perspective, Solr (open source) can even be the best alternative to expensive E-Discovery tools. When combining Solr with solutions that build on top of it, like LucidWorks Fusion, there is even less to build on your own.

PS

I am only talking about two specific Enterprise search products because I want to make a point. I know that there are a lot more Enterprise search vendors/solutions that can fulfill E-Discovery needs.

Queries are getting longer. What’s the impact?

Recently I’ve been working on a project for a Dutch financial company. It concerns the search functionality of their website. The business case is clear: support self-service and getting answers to common questions, to take the load (and costs) off the call center.

Of course we are taking search log analysis VERY seriously, because there is much to be learned from it.

Some statistics: 400,000+ user queries per month; 108,000+ “unique” queries; the top 5,000 queries cover only 7% of the total queries. The long tail is big.
So focusing on the top queries will only cover 7,500 of the 108,000 queries.
68% of the queries have three or fewer terms. When we remove the stopwords, 78% of the queries have three terms or fewer.
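
As an illustration, this kind of breakdown is straightforward to reproduce from a raw query log. A minimal sketch, assuming a plain text file with one query per line (the filename and format are assumptions):

```python
# Minimal sketch of the log analysis described above.
# Assumes "queries.log" holds one raw user query per line (illustrative).
from collections import Counter

with open("queries.log", encoding="utf-8") as f:
    queries = [line.strip().lower() for line in f if line.strip()]

counts = Counter(queries)
total = len(queries)

top_5000 = sum(c for _, c in counts.most_common(5000))
print(f"{len(counts):,} unique queries out of {total:,}")
print(f"top 5,000 queries cover {top_5000 / total:.1%} of all traffic")

# Share of queries with three or fewer terms.
short = sum(1 for q in queries if len(q.split()) <= 3)
print(f"{short / total:.1%} of queries have three or fewer terms")
```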

We did some relevancy testing (manually, so very time-consuming) and we know that the queries with two or three terms perform quite well.
The analysis of a part of the long tail helps us identify stopwords and synonyms. So far… so good.

These numbers made me curious. I want to know what the trend is in the number of terms used to formulate queries. Are people “talking” to search engines in a more natural way? (See: Longer Search Queries Are Becoming the Norm: What It Means for SEO.) I am trying to find more resources on this, so let me know if you know of any.

Why is this important?

A lot of search engines work “keyword based” when trying to find relevant results. They check whether the keywords appear in a document, and if so, the document is considered relevant. When combining those keywords with an “AND”, the more terms you use, the fewer results you will find. If there are a lot of “meaningless” terms in the query, the chance that you will find what you are looking for becomes smaller and smaller. Stopwords can help out here, but one cannot cover all variants.
OK, you say, “Why don’t you combine the terms with an ‘OR’?” Indeed, that will bring back more possibly relevant documents, but with the engine we use (the Google Search Appliance), the relevancy is poor.
The issue here is captured by the concepts “precision” and “recall” (see: Wikipedia, “Precision and recall”); a small worked example follows below.
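
For readers who have not run into these terms before, a minimal sketch (the document sets are made up for illustration):

```python
# Precision and recall in miniature. "AND" narrows retrieval (precision up,
# recall down); "OR" widens it (recall up, precision down). Sets are made up.
relevant  = {"doc1", "doc2", "doc3", "doc4"}  # what the user actually needs
retrieved = {"doc2", "doc3", "doc9"}          # what the engine returned

hits = relevant & retrieved
precision = len(hits) / len(retrieved)  # 2/3: how much of the result list is useful
recall    = len(hits) / len(relevant)   # 2/4: how much of the useful material was found
print(f"precision={precision:.2f}, recall={recall:.2f}")
```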

When coping with longer queries – in natural language – the search engine needs to be smarter. The user’s intent has to be determined so that the essence of the search request is revealed. That essence can then be used to find relevant documents/information in unstructured content.
Instead of (manually) feeding the search engine with stopwords, synonyms etc., the search engine needs to be able to figure this out by itself.

Now I know that the “search engine” is something that ignorant users (sorry for the insult) see as one “thing”. We as search consultants know that there is a lot going on in the total solution (normalization, stemming, query rewriting, etc.) and that a lot depends very much on the content, but still… the “end solution” needs to be able to cope with long queries.

The bottom line is that search solutions need to be able to handle short queries (a few terms) as well as complete “questions”, as end users use more and more terms.
Which current products support that? We talked to a couple of companies that say they support “natural language processing”. A lot of the time this comes down to analyzing the questions asked of the call center and creating FAQs that match those questions, so that a search will surface the FAQ. Although effective, that’s not quite the idea. It demands a lot of manual work, while the search is supposed to be “magic” and work on the existing content without changes.

My customer is now looking at IBM Watson for their long-term plans. They want to make search more “conversational” and support queries on the website as well as a “virtual assistant” that acts like a chat.

Will search become more conversational? Will users type in their queries as normal questions? How will search vendors react to that?

Definition of “Federated search”

Something funny struck me while reading Martin White’s book “Enterprise Search – second edition”, in the part about “Federated search”.

He defines “Federated search” as:

…which is an attempt to provide one single search box linked to one single search application that has an index of every single item of information and data that the organization has created.

I have been working in the field of search for a couple of years now. When talking about federated search, I do not see it as “one search box with all the information (structured and unstructured) stored in one single index”. In fact, some search vendors/solutions even have features called “search federators”; think of the HP Autonomy “federation engine” and the “One Box” feature of the Google Search Appliance (now discontinued).

I think of federated search as just the opposite of that. In a “federated search” environment the essence is that all information is NOT stored in one big index.

Since there are content systems from which you cannot get the information into your “single index” (or only at high cost and with security issues), and since some systems have good search functionality of their own, there is a different way of connecting content in “information silos”.
The goal is to present the user with complete insight into every piece of possibly relevant information. For that to work, not all the information has to be stored in one single index (the “Hall” approach that Martin mentions; I agree that this does not have to be the goal and is not even realistic).

Instead of that, the search application can also reach out to different (search) systems at once, distributing a single query over those (search) systems; a minimal fan-out sketch follows below.
The results don’t have to be presented in one single result list. An intelligent, well-designed search user interface (or maybe more like a search “portal” in this case) can present the results from the different sources next to each other, using “progressive disclosure” to let the user peruse results from one (search) system at a time, but in a unified interface.
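
A minimal sketch of that fan-out, assuming three backends with a simple JSON search API (the endpoint URLs and response shape are illustrative assumptions, not real systems):

```python
# Minimal sketch of federated search fan-out: one query, several backends,
# results kept per source for side-by-side presentation.
from concurrent.futures import ThreadPoolExecutor
import requests

BACKENDS = {
    "intranet": "https://search.example.com/intranet/api",
    "dms":      "https://search.example.com/dms/api",
    "crm":      "https://search.example.com/crm/api",
}

def query_backend(name: str, url: str, q: str) -> tuple[str, list]:
    try:
        resp = requests.get(url, params={"q": q}, timeout=5)
        return name, resp.json().get("results", [])
    except requests.RequestException:
        return name, []  # one silent source should not break the whole page

def federated_search(q: str) -> dict[str, list]:
    # Send the same query to every backend in parallel.
    with ThreadPoolExecutor(max_workers=len(BACKENDS)) as pool:
        futures = [pool.submit(query_backend, n, u, q) for n, u in BACKENDS.items()]
        return dict(f.result() for f in futures)

# Each source keeps its own result list; the UI can render them side by side.
results = federated_search("annual report 2016")
```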

Wikipedia agrees with me on this:

Federated search is an information retrieval technology that allows the simultaneous search of multiple searchable resources. A user makes a single query request which is distributed to the search engines, databases or other query engines participating in the federation. The federated search then aggregates the results that are received from the search engines for presentation to the user.

Of course, Federated search has some very serious disadvantages, but mentioning them is not the goal of this article.

So, in my opinion, an “Enterprise search” solution can/will consist of a combination of a central index (holding as much information as is economically and technically possible) and federated search to other (search) systems, to complete the 360-degree view of all information in an organization.

I just want to get the definitions straight.

 

Enterprise Search – History repeats itself

Two decades ago there was one major vendor of search solutions for businesses: Verity. Verity delivered a solution for making all information within an organization both searchable and findable, independent of the source. This kind of solution is also known as “Enterprise Search”.
Autonomy acquired Verity in the early 2000s, and a few years ago HP acquired Autonomy.

Since then, many new vendors have entered the “enterprise search” market:

  • Coveo
  • Endeca (now Oracle)
  • Exalead (now Dassault Systèmes)
  • LucidWorks (Fusion)
  • and more

In my time as a search consultant I have implemented many solutions and followed the developments of various vendors, including new ones.

Every Enterprise search solution has the same points of attention:

  1. How do you get the information from the various systems into the index (crawling, feeding, connectors)?
  2. How do you make sure that users of the search solution can only find the results they are actually allowed to see (in line with the permissions that apply in the source system)?

The “old” solutions such as Autonomy have many connectors for hooking up information systems, complete with solutions for permissions, updates, scalability, availability, etc.

The “new” vendors run into the same problems the “old” vendors have already solved. How do you determine which user may see which result? What if a source system is unavailable? Do you simply delete all content that is no longer reachable because a connector cannot get to it?

Last week I ran into exactly such a problem. In an environment where we had implemented Google’s solution (Google Search Appliance (GSA) + Adaptor for SharePoint), the adaptor (= connector) turned out to be no longer available. Because the adaptor was gone, the GSA could no longer reach that source.

The result? All documents (4 million) were removed from the index. It takes about two weeks to re-acquire that content. You can imagine what that does to the user experience.

It surprises me to see that all vendors of Enterprise search solutions keep reinventing the wheel, because they think they can do it better or differently. The “not invented here” syndrome seems to prevail, instead of (re)using what others have already come up with and building on top of that.

Of course I understand the commercial side. I just do not understand how new vendors expect to build a new solution without using the knowledge and solutions that already exist.

Even a donkey doesn’t bump into the same stone twice, does it?

This also underlines the importance of involving an “Enterprise search” expert when your organization wants to look into implementing search solutions.
Search engine vendors often highlight only a few aspects of a total solution.

Simple things we can do to make “tacit” knowledge visible

One Common Model

Different authors have come back to a general concept along these lines:

  • Instill a knowledge vision
  • Manage the conversations
  • Mobilize knowledge activists
  • Create the right context for knowledge creation
  • Globalize local knowledge

At AnswerHub, we try to work within all these areas on a regular basis, although the idea of “globalizing local knowledge” — essentially, making sure that certain bits of information aren’t tied up in one person’s head/e-mail — is one of the true value-adds of our software.

All the steps above are crucial, although the terminology can feel a little “business-school-y” from time to time. What exactly does it mean to “instill a knowledge vision,” for example? How do you “mobilize knowledge activists?” Let’s see if we can break this down into some day-to-day examples.

Simple Things We Can Do To Uncover Tacit Knowledge

  1. Set one meeting a week aside as Discovery Day: Pick three people beforehand; their goal is to give five-minute presentations (no longer than that) on an aspect of work that isn’t part of the day-to-day grind but really intrigues them. After the 15 minutes of presenting, the other participants in the meeting go to one of the three presenters (whichever one interested them most), and the presenters explain their idea a bit more in depth. This is a way to promote the idea of learning, looking outside the day-to-day, and fostering discovery among employees.
  2. Set one meeting at the beginning of the month as a Gaps meeting: If you want to avoid this being a meeting, you can turn it into a Shared Doc/Tumblr/etc. Essentially, everyone is supposed to list some of the biggest knowledge gaps that prevented them from doing their best work in the previous month, as well as 2-3 new things they learned in the work context. If everyone contributes in the first five days of the month, you now have a picture of your biggest knowledge gaps — as well as what you’re doing well. You can plan for the coming month off of that!
  3. Lucky Lottery Partnerships: At the beginning of a six-week cycle, bring clusters of 25-50 employees together and draw them off in a lottery into groups of six to eight. Within the next six weeks, the newfound groups need to share new types of knowledge and demonstrate how they did so; this can be weekly meetings, coffee dates, a poster or white paper, or something else. It can feel like more work — that’s where you need top-down buy-in — but in reality, it helps cement a culture where pursuit of learning / new knowledge is paramount. That type of culture will thrive long-term.
  4. Pulse checks: The idea here is to quickly (brevity is key) figure out how your people are most comfortable learning. Would they rather learn from peers, from experts? From SlideShares, from videos? In quick bursts or day-long seminars? Remember: a key differentiator between top companies (in terms of engagement) and low-ranking companies — and this is at any size — is the access to and context around new opportunities to learn/grow. Your employees want that to be provided, so you need to figure out what makes them learn the best.
  5. Process Software: The ultimate goal with tacit knowledge capture is taking local knowledge — only Bob knows how to do this, so when Bob is out of office or Bob takes another job, we’re doomed — and making it global knowledge. Software, like AnswerHub, can be a powerful tool for doing just that. The key elements therein are:
    • Making sure Bob isn’t threatened and still realizes his value
    • Figuring out the most comfortable way for everyone else to learn Bob’s skills
    • Setting up a few different modalities/programs for Bob’s knowledge to be disseminated
    • Creating organic communication channels where people can ask Bob questions
    • Having a space where the shared knowledge is now physically shared.

In a five-step process (with help from technology), you just went from information silos and locally contained knowledge to shared knowledge throughout your organization. It’s hard, but it’s definitely achievable.

- See more at: http://answerhub.com/article/5-steps-to-tapping-into-your-tacit-knowledge

StateofEnterpriseSearch.nl presents: Webinar

Last week I attended a webinar titled “The State of Enterprise Search”.

The webinar was organized by BA insight.

In a kind of “round table” setting, very well-known people in the “Enterprise Search” field discussed topics brought in by the moderator. The participants were:

  • Martin White
  • Sue Feldman
  • Jeff Fried

The webinar was recorded and can be watched at http://vimeo.com/78551770.

In recent weeks, BA insight has also published two reports: State of Search in the Enterprise, Part 1 & Part 2.

Happy watching and reading!

The costs of purchasing and implementing a search solution are not transparent

In the article “Budgeting for Enterprise Search – Time for the Guessing to End”, Martin White offers a look behind the scenes of IT system costs.

His point is that the costs of acquiring a search solution or “search engine” cannot be determined unambiguously.

Most organizations have no experience yet with implementing such a solution, so they cannot draw on past figures. The license costs of the various solutions are also unclear, because every vendor uses a different model: number of documents, DTAP environments, per server, number of searches, extra costs for connectors and modules, etc.

It is therefore not wise to get your information from the vendors of these search solutions. Software vendors usually also sell their products and solutions via “certified resellers” or partners. If a partner only sells software from one vendor, the information will be colored as well. After all, it is all about closing a deal as quickly as possible.

In the world of search solutions, however, there are also implementation experts who know multiple products: the so-called “multi-vendor” consultants. They know the market, the different pricing models and the differences between the products. They are not tied to one vendor and can therefore advise independently.

Bottom line? Contact a “multi-vendor” implementation partner when you are looking for an enterprise search solution.

Ask dr. Search

We have a new section on this site: Ask dr. Search.

With this new section, we want to offer everyone who has a question about

  • search and findability in general
  • a (technical) solution within an existing product
  • the use of metadata to make information easier to find
  • broader information management issues

a platform to ask their questions.

Go to Ask dr Search to learn more, or ask a question right away.