B2B Wins #26: The problem with enterprise search
The tech is promising, but tech isn’t the problem
Enterprise search has been awful for forever. It’s not because the technology isn’t very good, even though it’s not. Enterprise search has been awful because the technology relies on experts to deploy and maintain it. If there’s one thing that enterprises struggle with, it’s hiring and retaining experts dedicated to running tech. No matter how many bells and whistles are deployed, the vast majority of features on enterprise search engines, from open source to the latest bespoke tech, go unused. The problem with enterprise search is the enterprise.
A Brief History of Enterprise Search
Enterprise Search technologies help companies find information trapped in disparate systems across their businesses. This technology can be deployed on a company's public-facing website—the website search engine—to help customers find stuff or on an intranet to help functional departments locate information.
Back before the earth cooled, companies were largely left to build their own search engines on Lucene-based open-source technologies like Solr or Elasticsearch. There were a handful of vendors in this space, but the vast majority of those were just service companies repackaging Solr or Elasticsearch with relatively marginal functionality built in.
Sometime in the mid-to-late aughts, a handful of vendors, companies like Lucidworks, Coveo, and Sinequa, emerged. They were doing more than just repackaging open source. While the core of these vendors’ offerings remained the open-source platforms, they had added functions and services that made the offerings appealing to enterprise executives. That is, until the 800-pound gorilla stepped in.
Around 2008, Google launched a service called Google Site Search. This complemented their hardware-based solution, Google Search Appliance, which had launched earlier in the decade. With this offering, any company could deploy Google’s algorithms on their websites, scouring their content for all the answers. Google’s search engine became the default for enterprise search for the better part of the next decade, until it was sunset in 2018. The rest of the search industry floundered at the margins, fortunate that Google exited when it did.
A Problem: Small data sets
Search engines like Google are good because they have virtually every piece of data on the Internet available to serve up answers in the form of links to pages. It was recently reported that Google had 400 billion documents indexed. When Google wants to present definitive information on a topic you queried, there is no doubt that the answer is out there; they just have to show it to you.
If you’re a very large company, you might have a couple of million documents to fuel your search engine. Smaller companies will be dealing with much smaller data sets. You could have a few thousand or fewer documents on any one topic. Searchers should quickly find the answers to common questions in such a repository.
However, users’ needs are diverse, and language is imprecise. The likelihood that you have the answer to the myriad questions that could be asked is relatively low. This sparse-information problem is made worse because the documents themselves can be of relatively low quality: no standard terminology, written by multiple authors of differing skill, in various formats, scattered across many databases and apps.
Even the emergence of AI technologies, particularly machine learning, has not done much to solve this problem. Machine learning requires data to learn from, and sparse data is its kryptonite. Various strategies can be employed, most notably semi-supervised learning, to overcome the sparse-data problem, but these require skills and people, which, again, are in short supply within most companies.
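To make the semi-supervised idea concrete, here is a minimal self-training sketch: train on the few labeled documents, pseudo-label only the unlabeled ones the model is confident about, and retrain. The toy one-dimensional “documents”, the nearest-centroid classifier, and the 0.3 confidence margin are all illustrative assumptions, not a production recipe.

```python
# Self-training: a common semi-supervised strategy for sparse labels.
# Toy setup: each document is reduced to a single score; only four
# documents carry labels, the rest are unlabeled.
import statistics

labeled = [(0.1, "faq"), (0.2, "faq"), (0.9, "spec"), (0.8, "spec")]
unlabeled = [0.15, 0.25, 0.3, 0.7, 0.75, 0.85]

def centroids(data):
    # Mean score per class, from whatever labels we currently have.
    by_class = {}
    for x, y in data:
        by_class.setdefault(y, []).append(x)
    return {y: statistics.mean(xs) for y, xs in by_class.items()}

def predict(x, cents):
    # Nearest centroid wins; confidence = margin over the runner-up.
    ranked = sorted(cents.items(), key=lambda kv: abs(x - kv[1]))
    label = ranked[0][0]
    margin = abs(x - ranked[1][1]) - abs(x - ranked[0][1])
    return label, margin

MARGIN = 0.3  # adopt a pseudo-label only with a comfortable margin
pool = list(unlabeled)
while pool:
    cents = centroids(labeled)
    scored = [(x, *predict(x, cents)) for x in pool]
    adds = [(x, lab) for x, lab, m in scored if m >= MARGIN]
    if not adds:
        break  # nothing confident left; stop rather than guess
    labeled.extend(adds)
    adopted = {x for x, _ in adds}
    pool = [x for x in pool if x not in adopted]

print(centroids(labeled))
```

On this toy data, every unlabeled document clears the margin on the first pass, so the four labels grow to ten. The point the paragraph makes stands, though: someone has to pick the features, the classifier, and the confidence threshold, and babysit the loop.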
AI may help enterprise search
Even the most out-of-touch enterprise executive has heard about the advances in language models over the past 18 months. One would think this would help solve all the problems. Yes and no.
New vendors in the market have adopted the Large Language Model (LLM) approach for search. Phind, Perplexity, and Google’s Gemini are all search engines wrapped with language models. I don’t count ChatGPT in this list because, unlike the above, ChatGPT rarely cites sources for its output, which is critical for validation and further research.
These new entrants do what Google has done for decades: provide great search results. In addition, they expand on those results with plain-language output.* Of course, they still have billions of documents on the Internet as a source. Do they work in much smaller enterprise use cases?
A relatively new entrant—Glean—may shed some light on how to use AI in enterprise search. Glean, started by some folks from Google, seeks to crack the enterprise search and knowledge management code. At the core of their technology appears to be knowledge graphs and semantic search. On top of that, they’ve layered a language model.
One thing about Glean’s solution that got my attention was its connector ecosystem. A big challenge in knowledge management at the enterprise is the previously mentioned array of databases and applications. Connecting to them can be a challenge.
Does Glean have some magic that automatically makes all those connections, or are the connectors simply a toolkit that overworked IT folks have to make work? They say it “works out of the box”. So did my snowblower. After I assembled it. With a wrench. In the cold and dark.
Since they don’t have a demo on their website, it’s not clear how they’re solving the problem or whether they solve it well. An article by Curtis Conley on LinkedIn** speaks to his success using Glean at Blender. It appears that one of the things Glean has done well is to provide a good user experience and design in addition to some interesting technology. Curtis is a fan, so maybe there’s a there there.
Even if technologies are magical, powered by AI in a fantastical new way, have we finally found the solution to enterprise search? Is knowledge management finally achievable?
In short, no.
The real problem with enterprise search
The problem with enterprise search is the enterprise. To fully appreciate the low esteem that enterprise executives hold for this topic, you have to look no further than the terms that Gartner and Forrester have had to use to sell their Enterprise Search reports.
Gartner has rebranded enterprise search to “Insight Engines”. Forrester uses the term “Cognitive Search”. Those certainly sound lofty enough.
“Insight Engines! I need two!”
So why the low esteem for Enterprise Search? It all comes down to how large organizations work. Everything boils down to budgets and goals. If you have any misalignment between the two, something doesn’t get done.
First, let’s address the goals part of the equation. What is the goal of enterprise search? To make stuff more findable? For whom? Sales. Marketing. Engineering. Support. Assuming each of these organizations has business goals—lower costs, more output (revenue, leads, code, satisfied customers)—finding the right stuff should help them achieve those goals.
Now, let’s look at the budget (cost) side of the equation. Should the company spend more money to get more of those goals? Sounds good. Okay, now who in the company should pay for this? Should any one of those organizations? Nope, we don’t want a thousand IT flowers to bloom. Let’s have the CIO do it.
The conversation goes like this:
CEO: “Hey, CIO, show me the revenue, leads, etc., that you’re going to generate if I give you $5M to do enterprise search.”
CIO: “Hey, SVPs, ante up your business value so that I can count your business results in my business case.”
SVPs: [Silence]
CFO: [Quietly closes purse strings. Exits conference room]
This misalignment of goals and budgets is at the heart of the problem. But let’s assume that in a rare moment of solidarity, the executive leadership team can assemble the investment rationale and implement the new tech. All is good, right?
Maybe.
Those first 12-18 months are heady. New tech. Great search results. High-fives all around. I can see this documented in Conley’s article, especially those user quotes.
After those 18 months, once the implementation budget is spent, the tech's maintenance falls to the CIO and his search manager. But as Conley points out, there is rarely a search manager. That’s also been my experience. At best, it’s a slice of someone’s day job. Business ownership is organizationally diffuse. Configurations and updates age poorly. Fewer glowing quotes come in.
This is the real problem with Enterprise Search. Nobody owns the business value. Somebody owns the budget. Therefore, the budget gets rapidly smaller.
How can we solve this misalignment so enterprise search can get its due? I wish this were easy. It requires leadership and a willingness to explore new ways of doing business. The CEO and CFO need to lead.
Because IT capabilities like enterprise search are utilities, the pricing of those utilities has to be adjusted when new capabilities are brought on board—not just for that initial investment but for the next long-term planning horizon—say, five years. If the business case is made for the initial investment, it must include operating costs in the out years. Sadly, we forget about this when next year’s budget is set. It’s always set lower. Annual budgeting is the killer.
This simple and difficult change to how we budget is the true limiting factor to better enterprise search. It will be the thing that unlocks value not just for the project but for users and business functions in the future. Without this change, all those glowing user quotes will eventually diminish until you have the same crappy experience you started with.
* They’re as likely to hallucinate as ChatGPT. A recent search for a quote on Perplexity attributed the quote to someone who never said it, citing an article devoid of said quote. How do I know this? I always check my sources, especially fallible ones like LLMs.
** Also, thanks to Curtis for introducing me to First Order Retrievability.
“This work required a fine eye for detail—and tons of tools. By the time I moved to MythBusters in 2003, I had well over 300 items in my model-making kit. Of course, I love tools. I also love arranging them, to the point where I came up with a name for my organizing metric: first-order retrievability…
…The finished boxes housed everything I needed, but I repeatedly rebuilt the insides until finally no tool had to be moved out of the way to get to another. That's first-order retrievability... [allowing] me to be as fast, creative, and efficient as I wanted to be.”