Enterprise Search + Discovery

Enterprise search is the practice of making content from multiple enterprise sources, such as databases and intranets, searchable to a defined audience.

Enterprise search systems index data and documents from a variety of sources, such as file systems, intranets, document management systems, e-mail, and databases.

Many enterprise search systems integrate structured and unstructured data in their collections. Enterprise search systems also use access controls to enforce a security policy on their users.

Components of an Enterprise Search System

In an enterprise search system, content goes through various phases from source repository to search results:

Content Awareness

Content awareness (or “content collection”) usually follows either a push or a pull model.

In the push model, the source system is integrated with the search engine so that it connects to the engine and pushes new content directly to the engine’s APIs. This model is used when real-time indexing is important.

In the pull model, the search software gathers content from sources using a connector, such as a web crawler or a database connector. The connector typically polls the source at regular intervals to look for new, updated, or deleted content.
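
The change detection a pull connector performs on each poll can be sketched as follows. This is a minimal illustration, not any particular product's connector: the PullConnector class, its poll method, and the idea of representing a source as a mapping from document id to a version marker are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PullConnector:
    """Hypothetical connector that diffs successive snapshots of a source."""

    # Maps document id -> version marker (for example a timestamp or hash)
    # as observed during the previous poll.
    seen: dict = field(default_factory=dict)

    def poll(self, snapshot: dict) -> dict:
        """Compare the current snapshot with the previous one."""
        new = sorted(d for d in snapshot if d not in self.seen)
        updated = sorted(
            d for d in snapshot if d in self.seen and snapshot[d] != self.seen[d]
        )
        deleted = sorted(d for d in self.seen if d not in snapshot)
        self.seen = dict(snapshot)  # remember state for the next poll
        return {"new": new, "updated": updated, "deleted": deleted}


connector = PullConnector()
print(connector.poll({"a.txt": "v1", "b.txt": "v1"}))
print(connector.poll({"a.txt": "v2", "c.txt": "v1"}))
```

In a real deployment the poll would be triggered on a schedule and the snapshot would come from the source system; both are left out here to keep the sketch self-contained.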

Content Processing and Analysis

Content from different sources may arrive in many different formats or document types, such as XML, HTML, Office document formats, or plain text. The content processing phase converts the incoming documents to plain text using document filters.
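
As a rough illustration of a document filter, the sketch below reduces an HTML document to plain text using Python's standard html.parser module. The TextFilter class and html_to_text function are names invented for this example, and a production filter would handle many more formats and edge cases.

```python
from html.parser import HTMLParser


class TextFilter(HTMLParser):
    """Hypothetical document filter: collects visible text from HTML."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style content.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    text_filter = TextFilter()
    text_filter.feed(html)
    text_filter.close()
    return " ".join(text_filter.parts)


page = "<html><head><style>p{color:red}</style></head><body><h1>Policy</h1><p>Travel &amp; expenses</p></body></html>"
print(html_to_text(page))
```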

It is also often necessary to normalize content in various ways to improve recall or precision. Common techniques include stemming, lemmatization, synonym expansion, entity extraction, and part-of-speech tagging.
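
Two of these techniques, stemming and synonym expansion, can be sketched in a few lines. The suffix rules and the SYNONYMS table below are deliberately naive assumptions made for illustration; real systems typically rely on established stemmers and curated synonym lists.

```python
# Hypothetical synonym table; a real deployment would load a curated list.
SYNONYMS = {"laptop": {"notebook"}, "notebook": {"laptop"}}


def naive_stem(token: str) -> str:
    """Strip a few common English suffixes, keeping at least three letters."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def expand(tokens: list) -> set:
    """Stem each token, then add any synonyms of the stem."""
    expanded = set()
    for token in tokens:
        stem = naive_stem(token)
        expanded.add(stem)
        expanded |= SYNONYMS.get(stem, set())
    return expanded


print(sorted(expand(["laptops", "indexing"])))
```

Stemming mostly helps recall, since "laptops" and "laptop" now match each other; the cost is some precision, because unrelated words can collapse to the same stem.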

As part of processing and analysis, tokenization splits the content into tokens, which are the basic matching units. It is also common to normalize tokens to lower case to provide case-insensitive search, and to normalize accents to improve recall.
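
A minimal sketch of tokenization with lower-casing and accent normalization, using only the Python standard library, might look like this. The function names are illustrative, and splitting on runs of word characters is a simplifying assumption; real analyzers are usually language-aware.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text: str) -> list:
    """Normalize accents and case, then split into word tokens."""
    normalized = strip_accents(text).lower()
    return re.findall(r"\w+", normalized)


print(tokenize("Café RÉSUMÉ search-engine 2024"))
```

With these rules a query for "resume" matches a document containing "Résumé", which is the recall improvement described above.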

Indexing

The resulting text is stored in an index, which is optimized for quick lookups without storing the full text of the document. The index may contain the dictionary of all unique words in the corpus as well as information about ranking and term frequency.
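
One common structure for such an index is an inverted index, which maps each unique term to the documents that contain it. The sketch below is an assumption about how this could be laid out, not a description of any specific engine: build_index and the nested dictionary format are invented for the example.

```python
from collections import Counter, defaultdict


def build_index(docs: dict) -> dict:
    """Map each term to {document id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for term, frequency in Counter(tokens).items():
            index[term][doc_id] = frequency
    return dict(index)


# Documents are assumed to be tokenized already.
docs = {
    "d1": ["search", "engine", "search"],
    "d2": ["enterprise", "search"],
}
index = build_index(docs)
print(sorted(index))    # the dictionary of unique terms
print(index["search"])  # documents containing "search", with term frequencies
```

Only the terms and their statistics are kept, so a lookup does not need to scan the original text.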

Query Processing

Using a web page, the user issues a query to the system. The query consists of any terms the user enters as well as navigational actions such as faceting and paging information.
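
When the query arrives from a web page, its parts often travel as URL parameters. The sketch below separates the search terms from the faceting and paging information; the parameter names (q, facet, page, size) and the "name:value" facet syntax are assumptions made for this example.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs


@dataclass
class Query:
    terms: list
    facets: dict
    page: int
    page_size: int


def parse_query(query_string: str) -> Query:
    """Split a URL query string into terms, facet filters, and paging."""
    params = parse_qs(query_string)
    terms = params.get("q", [""])[0].lower().split()
    facets = {}
    for item in params.get("facet", []):
        name, _, value = item.partition(":")  # assumed "name:value" syntax
        facets[name] = value
    page = int(params.get("page", ["1"])[0])
    page_size = int(params.get("size", ["10"])[0])
    return Query(terms, facets, page, page_size)


print(parse_query("q=Quarterly+Report&facet=type:pdf&facet=author:lee&page=2"))
```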

Matching

The processed query is then compared to the stored index, and the search system returns results (or “hits”) referencing source documents that match. Some systems are able to present the document as it was indexed.
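
Matching can be sketched as a lookup of each query term in an inverted index followed by an intersection of the resulting document sets. The index layout (term mapped to document id and term frequency), the requirement that every term must match, and the scoring by summed term frequency are all simplifying assumptions; real engines use richer ranking functions.

```python
def match(index: dict, terms: list) -> list:
    """Return (document id, score) hits containing every query term."""
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates &= set(posting)  # keep documents that contain all terms
    scored = [(doc, sum(p[doc] for p in postings)) for doc in candidates]
    # Highest score first; ties broken by document id for stable output.
    return sorted(scored, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical index: term -> {document id: term frequency}.
index = {
    "search": {"d1": 2, "d2": 1},
    "engine": {"d1": 1},
    "enterprise": {"d2": 1},
}
print(match(index, ["search"]))
print(match(index, ["enterprise", "search"]))
```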

Differences from Web Search

Beyond the difference in the kinds of materials being indexed, enterprise search systems typically include functionality that is not associated with mainstream web search engines. This includes:

Adapters to index content from a variety of repositories, such as databases and content management systems.

Federated search, which consists of:

Transforming a query and broadcasting it to a group of disparate databases or external content sources with the appropriate syntax,
Merging the results collected from the databases,
Presenting them in a succinct and unified format with minimal duplication,
Providing a means, performed either automatically or by the portal user, to sort the merged result set.
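
The four federated search steps listed above can be sketched with two in-memory stand-ins for external sources. Everything here is hypothetical: the source functions, their query syntaxes, the hit fields (url, title, score), and the rule of keeping the highest-scoring copy of a duplicate.

```python
def wiki_search(query: str) -> list:
    """Stand-in source that expects a plain lower-case keyword."""
    data = [
        {"url": "wiki/vpn", "title": "VPN setup", "score": 0.9},
        {"url": "wiki/wifi", "title": "Wi-Fi guide", "score": 0.4},
    ]
    return [hit for hit in data if query in hit["title"].lower()]


def ticket_search(query: str) -> list:
    """Stand-in source that expects a 'field:value' query."""
    data = [
        {"url": "wiki/vpn", "title": "VPN setup", "score": 0.7},
        {"url": "tickets/42", "title": "VPN outage", "score": 0.8},
    ]
    field, _, value = query.partition(":")
    return [hit for hit in data if value in hit[field].lower()]


# Each source is paired with a function that rewrites the query in its syntax.
SOURCES = [
    (lambda q: q.lower(), wiki_search),
    (lambda q: "title:" + q.lower(), ticket_search),
]


def federated_search(query, sources, sort_key="score", descending=True):
    merged = {}
    for translate, search in sources:
        for hit in search(translate(query)):      # transform and broadcast
            best = merged.get(hit["url"])
            if best is None or hit["score"] > best["score"]:
                merged[hit["url"]] = hit          # merge, dropping duplicates
    return sorted(merged.values(), key=lambda h: h[sort_key], reverse=descending)


for hit in federated_search("VPN", SOURCES):
    print(hit["url"], hit["score"])
```

Deduplicating by URL is only one possible choice; sources that describe the same document under different identifiers would need a more careful comparison.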

Enterprise bookmarking, collaborative tagging systems for capturing knowledge about structured and semi-structured enterprise data.

Entity extraction, which seeks to locate and classify elements in text into predefined categories such as the names of persons, organizations, and locations, expressions of time, quantities, monetary values, and percentages.
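
For a few of these categories, a pattern-based extractor gives a feel for the idea. The sketch below covers only monetary values, percentages, and clock times with hand-written regular expressions, which are assumptions for illustration; recognizing persons, organizations, and locations generally requires trained models or dictionaries.

```python
import re

# Hypothetical patterns for three easily recognizable entity categories.
PATTERNS = {
    "MONEY": re.compile(r"\$\d[\d,]*(?:\.\d+)?"),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?%"),
    "TIME": re.compile(r"\b\d{1,2}:\d{2}\b"),
}


def extract_entities(text: str) -> list:
    """Return (category, matched text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), label, m.group()))
    return [(label, value) for _, label, value in sorted(found)]


print(extract_entities("Revenue rose 12.5% to $1,200.50 by 09:30."))
```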

Faceted search, a technique for accessing a collection of information represented using a faceted classification, allowing users to explore by filtering available information.
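
Faceting boils down to two operations: counting the values of a field across the current results, and narrowing the results when the user picks a value. A minimal sketch follows, assuming documents are plain dictionaries with made-up fields such as type and department.

```python
from collections import Counter


def facet_counts(docs: list, field: str) -> Counter:
    """Count how many documents carry each value of the given field."""
    return Counter(doc[field] for doc in docs if field in doc)


def apply_filters(docs: list, filters: dict) -> list:
    """Keep only documents matching every selected facet value."""
    return [
        doc for doc in docs
        if all(doc.get(name) == value for name, value in filters.items())
    ]


docs = [
    {"id": "d1", "type": "pdf", "department": "finance"},
    {"id": "d2", "type": "pdf", "department": "legal"},
    {"id": "d3", "type": "html", "department": "finance"},
]
print(facet_counts(docs, "type"))
narrowed = apply_filters(docs, {"type": "pdf"})
print(facet_counts(narrowed, "department"))
```

Recomputing the counts on the narrowed set is what lets users keep exploring after each selection.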

Access control, usually in the form of an access control list (ACL), is often required to restrict access to documents based on individual user identities.

There are many types of access control mechanisms for different content sources, which makes this a complex task to address comprehensively in an enterprise search environment.
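
In its simplest form, ACL enforcement at query time means trimming the hit list to the documents the current user may see. The sketch below assumes each document's ACL is a set of user and group names and that documents without an ACL are hidden by default; both are illustrative choices, and real sources expose far more varied permission models.

```python
def visible_hits(hits: list, acl: dict, user: str, groups=()) -> list:
    """Keep only hits whose ACL names the user or one of the user's groups."""
    principals = {user, *groups}
    # Documents with no ACL entry are hidden (deny by default).
    return [doc_id for doc_id in hits if principals & acl.get(doc_id, set())]


# Hypothetical ACLs: document id -> set of allowed users and groups.
acl = {
    "d1": {"group:finance"},
    "d2": {"alice"},
    "d3": {"group:legal"},
}
hits = ["d1", "d2", "d3", "d4"]
print(visible_hits(hits, acl, "alice", ["group:finance"]))
print(visible_hits(hits, acl, "bob", ["group:legal"]))
```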

Text clustering, which groups the top several hundred search results into topics that are computed on the fly from the search result descriptions, typically titles, excerpts (snippets), and metadata.

This technique lets users navigate the content by topic rather than by the metadata that is used in faceting. Clustering compensates for the problem of incompatible metadata across multiple enterprise repositories, which hinders the usefulness of faceting.
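
A very small sketch of on-the-fly clustering is shown below: each result title is compared with the first member of every existing cluster, and joins the first cluster whose word overlap (Jaccard similarity) is high enough. The greedy strategy, the stopword list, and the 0.25 threshold are assumptions chosen for illustration and are much simpler than the algorithms used in practice.

```python
import re

STOPWORDS = {"the", "a", "for", "of", "to", "and", "in", "how"}


def tokens(text: str) -> set:
    return {t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(snippets: list, threshold: float = 0.25) -> list:
    """Greedily group snippets by word overlap with each cluster's first member."""
    clusters = []  # list of (seed token set, member snippets)
    for snippet in snippets:
        snippet_tokens = tokens(snippet)
        for seed, members in clusters:
            if jaccard(seed, snippet_tokens) >= threshold:
                members.append(snippet)
                break
        else:
            clusters.append((snippet_tokens, [snippet]))
    return [members for _, members in clusters]


results = [
    "VPN setup guide",
    "Expense report policy",
    "VPN troubleshooting guide",
    "Submitting an expense report",
]
for group in cluster(results):
    print(group)
```

Because the grouping relies only on the result text, it works even when the underlying repositories disagree on metadata fields.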

Create a Single Source of Truth

Boost your team’s productivity by unifying all your content platforms (ECM, Google Drive, Salesforce, etc.) into a personalized search experience.

With its ease of deployment, pre-tuned relevance, and intuitive interface, iig has the solution for your organizational search needs.

How to Get Started with Enterprise Search

With its ease of deployment, pre-tuned relevance, intuitive interface, and hassle-free administration, iig delivers a powerful yet simple way to apply the relevance, scale, and speed of search to your organizational search needs.

When you create a single source of truth, people can spend less time finding and more time doing.

Learn how to spin up the all-new Enterprise Search on your own hardware at no charge during the beta period and start instantly indexing content across a variety of common cloud applications, such as Salesforce, Dropbox, Drive, and GitHub, or connect to any other source with the Custom Source API.

Take the next step:

Get a rundown of downloading, prerequisites, and beta details
Review common cloud and enterprise data sources
Understand different search capabilities: weights, autocomplete, facets, and more
Monitor and customize the search experience
Learn how to get involved in the Enterprise Search community