
This is an excerpt from Manning's book Learn Kubernetes in a Month of Lunches MEAP V07.
14.1 How Prometheus monitors Kubernetes workloads
Metrics in Prometheus are completely generic - each component you want to monitor has an HTTP endpoint which returns all the values which are important to that component. A web server includes metrics for the number of requests it serves, and a Kubernetes node includes metrics for how much memory is available. Prometheus doesn't care what's in the metrics; it just stores everything the component returns. What's important to Prometheus is the list of targets it needs to collect from. Figure 14.1 shows how that works in Kubernetes, using Prometheus' built-in service discovery.
Figure 14.1 Prometheus uses a pull model to collect metrics, automatically finding targets
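To make the pull model concrete, here's a sketch of the kind of plain-text response a metrics endpoint returns in Prometheus' exposition format - the metric names and values are illustrative, not taken from any particular component:

```
# HELP http_requests_total Total HTTP requests handled (hypothetical web server metric)
# TYPE http_requests_total counter
http_requests_total{code="200"} 1027
http_requests_total{code="500"} 3
# HELP node_memory_available_bytes Memory available on the node (hypothetical node metric)
# TYPE node_memory_available_bytes gauge
node_memory_available_bytes 2147483648
```

Every time Prometheus scrapes the endpoint it stores each of these as a time series, keyed by the metric name and its labels.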
The focus of this chapter is getting Prometheus working nicely with Kubernetes, to give you a dynamic monitoring system that keeps working as your cluster expands with more nodes running more applications. I won't go into much detail on how you add monitoring to your applications or what metrics you should record - Appendix 2 of this book is the chapter Adding observability with containerized monitoring from Learn Docker in a Month of Lunches, which will give you that additional detail.
We'll start by getting Prometheus up and running. The Prometheus server is a single component which takes care of service discovery, metrics collection, and storage, and it has a basic web UI which you can use to check the status of the system and run simple queries.
```
# switch to this chapter's folder:
cd ch14

# create the Prometheus Deployment and ConfigMap:
kubectl apply -f prometheus/

# wait for Prometheus to start:
kubectl wait --for=condition=ContainersReady pod -l app=prometheus -n kiamol-ch14-monitoring

# get the URL for the web UI:
kubectl get svc prometheus -o jsonpath='http://{.status.loadBalancer.ingress[0].*}:9090' -n kiamol-ch14-monitoring

# browse to the UI and look at the /targets page
```

Prometheus calls metric collection scraping. When you browse to the Prometheus UI you'll see there are no scrape targets, although there is a category called `test-pods` which lists zero targets - figure 14.2 shows my output. The `test-pods` name comes from the Prometheus configuration you deployed, in a ConfigMap which the Pod reads from.

Figure 14.2 No targets yet, but Prometheus will keep checking the Kubernetes API for new Pods
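The chapter's manifests define how that ConfigMap gets into the Pod. As a minimal sketch of the general pattern (the ConfigMap name, image tag and file path here are assumptions, not the book's exact spec), the configuration is mounted as a file which the Prometheus container reads at startup:

```yaml
# sketch only - resource names and image tag are assumptions
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: kiamol-ch14-monitoring
spec:
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.46.0
          args:    # point Prometheus at the mounted config file
            - '--config.file=/etc/prometheus/prometheus.yml'
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus
      volumes:
        - name: config
          configMap:
            name: prometheus-config   # the ConfigMap holding the scrape configuration
```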
Configuring Prometheus to find targets in Kubernetes is fairly straightforward, although the terminology is confusing at first. Prometheus uses jobs to define a related set of targets to scrape, which could be multiple components of an application. The scrape configuration can be as simple as a static list of domain names which Prometheus polls to grab the metrics, or it can use dynamic service discovery. Listing 14.1 shows the beginning of the `test-pods` job configuration, which uses the Kubernetes API for service discovery.

Listing 14.1 - prometheus-config.yaml, scrape configuration with Kubernetes
```yaml
scrape_configs:                   # this is YAML inside the ConfigMap
  - job_name: 'test-pods'         # one job which is used for test apps
    kubernetes_sd_configs:        # find targets from the Kubernetes API
      - role: pod                 # search for Pods
    relabel_configs:              # and apply these filtering rules
      - source_labels:
          - __meta_kubernetes_namespace
        action: keep              # only include Pods where the namespace
        regex: kiamol-ch14-test   # is the test namespace for this chapter
```

It's the `relabel_configs` section which needs explanation. Prometheus stores metrics with labels, which are key-value pairs that identify the source system and other relevant information. You'll use labels in queries to select or aggregate metrics, and you can also use them to filter or modify metrics before they get stored in Prometheus. This is relabelling, and conceptually it's similar to the data pipeline in Fluent Bit - it's your chance to discard data you don't want and reshape the data you do want.

Regular expressions rear their unnecessarily complicated heads in Prometheus too, but it's rare that you need to make changes - the pipeline you set up in the relabelling phase should be generic enough to work for all your apps. The full pipeline in the configuration file applies these rules:
My output is in figure 14.3, where I've opened two browser windows so you can see what happened when the app was deployed. Prometheus saw the timecheck Pod being created, and it matched all the rules in the relabel stage, so it got added as a target. The Prometheus configuration is set to scrape targets every 30 seconds. The timecheck app has a `/metrics` endpoint which returns a count for how many timecheck logs it has written. When I queried that metric in Prometheus, the app had written 22 log entries.

Figure 14.3 Deploying an app to the test namespace - Prometheus finds it and starts collecting metrics
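As a rough illustration of what Prometheus collects here (the metric name is an assumption - the real timecheck counter may be named differently), the response from the app's `/metrics` endpoint would look something like this, and typing the metric name into the query box in the Prometheus UI returns the latest stored value:

```
# HELP timecheck_total Number of timecheck log entries written (hypothetical metric name)
# TYPE timecheck_total counter
timecheck_total 22
```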
There are two very important things to realize here: the application itself needs to provide the metrics, because Prometheus is just a collector; and those metrics represent the activity for one instance of the application. The timecheck app isn't a web application - it's just a background process, so there's no Service directing traffic to it. Prometheus gets the Pod IP address when it queries the Kubernetes API, and it makes the HTTP request directly to the Pod. You can configure Prometheus to query Services too, but then your target would be a load balancer across multiple Pods, and you want Prometheus to scrape each Pod independently.
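For comparison, the difference between the two approaches comes down to the discovery role in the scrape configuration - the pod role from Listing 14.1 gives you one target per Pod IP, while a service role would give you a single load-balanced target (the `test-services` job name below is hypothetical, just to show the shape):

```yaml
# pod discovery (what the chapter's config uses) - one target per Pod IP,
# so each instance is scraped independently:
- job_name: 'test-pods'
  kubernetes_sd_configs:
    - role: pod

# service discovery - one target per Service, so scrapes are load-balanced
# across the Pods and you lose the per-Pod detail:
- job_name: 'test-services'
  kubernetes_sd_configs:
    - role: service
```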
You'll use the metrics in Prometheus to power dashboards showing the overall health of your apps, and you may aggregate across all the Pods to get the headline values - but you need to be able to drill down too, to see if there are differences between the Pods. That will help you identify if some instances are performing badly, and that will feed back into your health checks. We can scale up the timecheck app to see the importance of collecting at the individual Pod level.
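Here's a hedged sketch of that exercise, assuming the Deployment is called timecheck and reusing the hypothetical counter name from earlier - scale up, then compare the aggregate and per-instance views in the Prometheus UI:

```
# scale up the timecheck app (the Deployment name is an assumption):
kubectl scale deploy/timecheck --replicas=3 -n kiamol-ch14-test

# example PromQL queries in the Prometheus UI:
#   sum(timecheck_total)                 - headline value across all Pods
#   sum(timecheck_total) by (instance)   - one series per scraped Pod
```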