3 Architecture

This chapter covers

High-level architecture and Elasticsearch’s building blocks
Search and indexing mechanics
How an inverted index works
Relevancy and similarity algorithms
Routing algorithm

In the last chapter, we experimented with some fundamental Elasticsearch features: we indexed documents, executed search queries, walked through analytical functions, and more. We did all this without knowing much about the server’s internals. The good news is that we don’t need to break a sweat to get started with Elasticsearch.

Elasticsearch, like any other search engine, takes deep dives to master. That said, the product is designed to work out of the box with intuitive APIs and tools, so you can use the software without much prerequisite knowledge. Before we get carried away with the easy-to-use, hands-on aspects of Elasticsearch, it will benefit us in the long run to understand the high-level architecture, the inner workings of the server, and the interplay of its moving parts.

3.1 A 10,000-foot overview

3.1.1 Data in

3.1.2 Processing the data

3.1.3 Data out

3.2 The building blocks

3.2.1 Document

3.2.2 Removal of types

3.2.3 Index

3.2.4 Data streams

3.2.5 Shards and replicas

3.2.6 Nodes and clusters

3.3 Inverted indexes

3.3.1 Example
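
The following is a minimal sketch of the idea behind an inverted index, written in Python purely for illustration. It assumes a toy analyzer that lowercases text and splits on non-alphanumeric characters, a rough stand-in for Elasticsearch’s standard analyzer. The names `tokenize`, `build_inverted_index`, and `search` are hypothetical and are not part of any Elasticsearch API. Each term maps to the sorted list of document IDs that contain it, so a query looks terms up directly instead of scanning every document.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Toy analyzer (assumption): lowercase, then split on non-alphanumerics.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs):
    # Map each term to the sorted list of document IDs containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    # Return IDs of documents that contain every query term (intersection).
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    1: "Hello WORLD",
    2: "Hello, mate",
    3: "World of search",
}
index = build_inverted_index(docs)
print(search(index, "hello world"))
```

Searching for "hello world" intersects the postings for `hello` ([1, 2]) and `world` ([1, 3]), leaving document 1. Intersection mirrors a `match` query with `operator` set to `and`; the default `or` operator would correspond to a union of the postings instead.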

3.4 Relevancy

3.4.1 Relevancy scores

3.4.2 Relevancy (similarity) algorithms
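
Elasticsearch’s default similarity algorithm is Okapi BM25. The sketch below computes a simplified BM25 score for a single document, assuming pre-tokenized documents and the default parameters `k1 = 1.2` and `b = 0.75`. It uses the inverse document frequency log(1 + (N - n + 0.5) / (n + 0.5)) and a term frequency that saturates and is normalized by document length. It deliberately ignores Lucene details such as lossy field-length encoding and boosts, so its numbers will not match Elasticsearch’s `explain` output exactly. The function name `bm25_score` is hypothetical.

```python
import math


def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Simplified BM25 score of one tokenized document for a tokenized query."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs  # average document length
    dl = len(doc_terms)                            # this document's length
    score = 0.0
    for term in query_terms:
        n = sum(1 for d in corpus if term in d)    # documents containing the term
        if n == 0:
            continue
        # Rarer terms get a higher inverse document frequency.
        idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
        freq = doc_terms.count(term)
        # k1 controls term-frequency saturation; b scales length normalization.
        tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
        score += idf * tf
    return score


corpus = [
    ["elasticsearch", "in", "action"],
    ["elasticsearch", "elasticsearch", "search"],
    ["lucene", "in", "action"],
]
scores = [bm25_score(["elasticsearch"], doc, corpus) for doc in corpus]
print(scores)
```

The second document outranks the first because it mentions `elasticsearch` twice at the same length, yet the second occurrence adds less than the first did, which is the saturation effect of `k1`. The third document does not contain the term and scores zero.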

3.5 Routing algorithm
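
In simplified form, Elasticsearch picks a document’s shard as shard = hash(_routing) % number_of_primary_shards, where `_routing` defaults to the document’s `_id`. The sketch below illustrates that formula under stated assumptions: Elasticsearch actually hashes with Murmur3 (and also accounts for settings such as `index.number_of_routing_shards`), whereas this example uses `zlib.crc32` only as a deterministic stand-in from the Python standard library. The function name `route_to_shard` is hypothetical.

```python
import zlib


def route_to_shard(routing_value: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a routing value (simplified sketch)."""
    # Stand-in hash (assumption): Elasticsearch uses Murmur3; crc32 is used
    # here because it is deterministic across runs and ships with Python.
    h = zlib.crc32(routing_value.encode("utf-8"))
    # crc32 returns a non-negative int in Python 3, so the result is in range.
    return h % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(100)]
with_five = {d: route_to_shard(d, 5) for d in doc_ids}
with_six = {d: route_to_shard(d, 6) for d in doc_ids}
moved = sum(1 for d in doc_ids if with_five[d] != with_six[d])
print(f"{moved} of {len(doc_ids)} documents would land on a different shard")
```

Because the result depends on the number of primary shards, changing that number would send many existing documents to different shards. This is one reason the primary shard count cannot simply be changed in place on an existing index.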

3.6 Scaling

3.6.1 Scaling up (vertical scaling)

3.6.2 Scaling out (horizontal scaling)

3.7 Summary