Once the Elasticsearch cluster is ready for production, umpteen things can go wrong, from users complaining about slow searches to unstable nodes, network problems, over-sharding troubles, memory problems, and more. Maintaining the cluster’s health in a GREEN (healthy) state is paramount. Constantly keeping an eye on the cluster’s health and performance is one of the primary jobs of an administrator.
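As a minimal sketch of what keeping an eye on the cluster's health can look like, the following Python snippet reads the response of the cluster health API (`GET _cluster/health`) and turns its `status` field into a short summary. The helper names `summarize_health` and `fetch_health`, the `http://localhost:9200` address, and the sample response are assumptions made for illustration; a secured cluster would also need HTTPS and credentials, which this sketch omits.

```python
import json
from urllib.request import urlopen


def summarize_health(health: dict) -> str:
    """Turn a _cluster/health response body into a one-line summary."""
    status = health.get("status", "unknown")
    unassigned = health.get("unassigned_shards", 0)
    if status == "green":
        return "GREEN: all primary and replica shards are allocated"
    if status == "yellow":
        return f"YELLOW: all primaries allocated, {unassigned} replica shard(s) unassigned"
    if status == "red":
        return f"RED: at least one primary shard is unassigned ({unassigned} unassigned in total)"
    return f"UNKNOWN status: {status!r}"


def fetch_health(base_url: str = "http://localhost:9200", timeout: float = 5.0) -> dict:
    """Call GET _cluster/health on an unsecured cluster (assumed address)."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


# Hypothetical response fragment, used here so the sketch runs without a cluster.
sample = {"cluster_name": "demo", "status": "yellow", "unassigned_shards": 3}
print(summarize_health(sample))
```

Against a live cluster, `summarize_health(fetch_health())` would report the current state instead of the sample.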
Troubleshooting an unstable cluster requires a good understanding of the inner workings of Elasticsearch, networking concepts, node communication, and memory settings, as well as the many APIs for nodes, clusters, cluster allocation, and other purposes. Similarly, understanding the document models, appropriate refresh intervals, and so forth, and tweaking the configuration to match, helps tune the cluster for better performance.
In this chapter, we look at common problems such as slow queries and slow ingestion to understand the reasons behind them. Because Elasticsearch has a complex distributed architecture, there are several places to look for fixes. We discuss the most obvious and commonly applied solutions.