Before getting into the details of Fluentd, we should first focus on the motivations for using a tool such as Fluentd. How can logging help us? What are log analytics, and why is log unification necessary? These are among the questions we will work to answer in this chapter. We’ll highlight the kinds of activities logging can help or enable us to achieve.
Let’s also take a step back and understand some contemporary thinking around how systems are measured and monitored; understanding these ideas will mean we can use our tools more effectively. After all, a tool is only as good as the user creating the configuration or generating log events to be used.
As we do this, it is worth exploring how Fluentd has evolved and understanding why it holds its position within the industry. If you are considering Fluentd as a possible tool or looking to make a case for its adoption, then it is helpful to understand its “origin story,” as this will inform how Fluentd may be perceived.

Given that you’re looking at this book, we presume you have at least heard of Fluentd and probably have a vague sense of what it is. Let’s start with the “elevator pitch” as to what Fluentd and Fluent Bit are.
The primary purpose of Fluentd and its sibling Fluent Bit is to capture log events from a diverse range of possible sources (infrastructure such as network switches, OS, custom applications, and prebuilt applications, including Platform as a Service and Software as a Service). It then gets those events to an appropriate tool where the log events can be processed to extract meaning and insight, and possibly trigger actions. Fluentd’s primary job is not to perform detailed log analytics itself, although it can derive meaning, and deeper analysis could be incorporated into its configuration if needed.
By unifying the log events from all the sources of logs impacting the operation of our solution, we have the opportunity to see the big picture. For example, was the error in the database the cause of an error returned to a user by the application, or was the database error a symptom of the operating system not being able to write to storage?
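To make this concrete before we dive deeper, the following is a minimal, illustrative sketch of a Fluentd configuration (the file path and tag are example values): one source directive captures events by tailing an application log file, and one match directive routes everything to standard output. In later chapters the output would more likely be a tool such as Elasticsearch.

```
# Minimal illustrative configuration (example paths and tag names)
<source>
  @type tail                         # follow a log file as new lines arrive
  path /var/log/app/app.log          # example application log file
  pos_file /var/log/fluent/app.pos   # records how far we have read
  tag app.log                        # label events for routing
  <parse>
    @type none                       # treat each line as unstructured text
  </parse>
</source>

<match app.**>
  @type stdout                       # echo matched events to standard output
</match>
```

The tag is the key to Fluentd's routing: match directives select events by their tag patterns.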
We’ve described Fluentd in terms of log events, so what qualifies as a log event? A log event is best described as the following:
- Log events are human-readable information that is primarily textual in nature. The textual information can range from unstructured to highly structured.
- Each log event has a place in time, defined by a timestamp that is usually absolute (e.g., 01:00:00 1 Jan 1970) but could be relative (e.g., +0.60); alternatively, time can be inferred from the log event's position in a series of events.
- Each event also has an explicit or implicit association with a location—the component producing it—which may be physical or logical.
Let’s illustrate the point. Anyone with some coding experience will probably recognize the screenshot shown in figure 1.1 as an extract of log output. In this case, the output is generated by Fluentd. As you can see, there is a timestamp for the event; a location, which comes from the host the events are occurring on; and some additional semistructured content.
Those who have worked with middleware (e.g., Apache Camel, MuleSoft, Oracle SOA Suite) will appreciate the idea of describing Fluentd as an enterprise service bus specialized in logs. Figure 1.2 suggests this, with the concept of input and output and capabilities to route and transform the log events. This will become ever more apparent as the book progresses.
Figure 1.2 Illustration showing different types of Fluentd plugins and their relationship to the core

NOTE
If you’d like to explore this analogy further, you might consider reading the liveBook version of Open-Source ESBs in Action by Tijs Rademakers and Jos Dirksen (Manning, 2008) at http://mng.bz/Nx6n.

We create log entries for a wide range of reasons. Some of the use cases for logs are only needed a fraction of the time but are invaluable when needed. Nearly every use case we can think of will fall into one of the following categories:
- Debugging—Knowing which parts of the code are being executed in a scenario makes it easy to isolate a bug. Yes, we have debuggers, and so on, but often it’s just as easy to drop a few log lines in to help. Some of these log messages will be left in to provide assurance that things are running fine during production. Other lines of log messages may be disabled while we’re not developing and testing software. Note that we would never recommend trying to connect to a production environment with a debugger. Allowing a production system to log information intended for debugging should be done with an understanding of the possible consequences (later in the book, we’ll explore why this is so).
- Unexpected data values or abnormal conditions occurring—When code encounters data values that are out of bounds, sometimes it is better to flag the condition and keep going. Examples include the following:
- Using the default condition in a switch statement when the code should only receive a value you have allowed for in the switch. As a result of a change or bug elsewhere, your code needs to gracefully handle the situation and make it known (e.g., the classic problem of a presentation layer [UI] differing from the backend's supported data values):
```
switch (caseSwitch)
{
    case 1:
        // do something expected
        break;
    case 2:
        // do something expected
        break;
    default:
        System.Diagnostics.Debug.Write("Unexpected " + caseSwitch);
        // unexpected path – log this as it may be indicative
        // of a bug
        break;
}
```
- Applying defensive coding. For example, before using an object variable, checking that it isn’t null—a standard action when first loading configuration data to ensure everything is as expected.
- Reporting when code handling connections experiences an error and is going to back off and retry. Logging this means we can later understand from the logs the cause of a slow response that impacts the user experience.
- Audit and security—We live in a world where internal and external actors try to get hold of data for illegitimate use. To help us watch for misuse, we need to know what is going on. Events need to be recorded, if not reported. Sometimes this is to search for abnormal behavior patterns, and other times to show that the system did everything as it should. We often see this kind of use case referred to as forensic logging or application security monitoring and security information and event management (SIEM). Bringing log events together that can create an audit trail is important. A single out-of-norm event may be insignificant. But when you can see the same kind of event reoccurring regularly in an unusual manner, over time it may point to something more suspicious.
- Root cause analysis—Sometimes we see a problem, but the cause isn’t apparent. Often this is because we are looking only at the logs from a small set of components. For example, an application based on its logs appears to slow down over time, but there is no evidence of a memory leak. Only when we bring logs together from all the sources can we identify a cause and separate other problems as side effects. For example, our application could be fine. Still, we use another service on the same server, which never releases CPU threads properly, resulting in the server slowly running out of resources to run all applications. But this can’t be seen until all the information is presented together.
- Determining the cause of performance issues—Tools such as Prometheus (https://prometheus.io/) and Grafana (https://grafana.com/) are well known for gathering metric data to provide insight into the performance of software being run. While the data may show you what is happening, it doesn’t necessarily tell you why. It is textual logs that describe what is happening—whether that is database query logs or application thread traces.
- Anomaly detection—While a system may appear to operate perfectly fine and yields the expected results when a solution is tested, anomalies occur in the results during the system’s regular operation. Logging can facilitate the detection of such issues by helping to find correlations in the log events when anomalies arise, providing an indicator of the cause. An example of this was the occurrence of the Intel Pentium FDIV bug in the 1990s, where an error in the design of specific Pentium processors meant that while the software ran perfectly, some calculations in specific conditions produced an incorrect result. If we log events such as the outcomes of important calculations even when the software is running as expected, it becomes easier to spot any possible anomalies and examine activities to identify the origin of the anomaly (for more detail, see https://en.wikipedia.org/wiki/Pentium_FDIV_bug). Another example of an anomaly that can be seen is running our apps in production environments where we share resources with other processes. Our test environments show that everything is fine, but in production, we experience out-of-memory errors. These scenarios can result from test conditions being subtly different than production, where we may have been able to use more memory than is available in production conditions. Seeing what else is running and the details around the errors can help diagnose resource conflict issues. Not as high profile as a chip flaw, but still an issue that can be challenging to isolate.
- Operational effectiveness and troubleshooting—Mature, well-produced log events can include the use of error codes. An error code can be linked to a particular problem and guidance on how to resolve the issue.
- Determine when to trigger subsequent actions—Use log events to recognize specific needs and initiate processes automatically instead of requiring manual intervention. This can be particularly helpful in legacy estates where the software and hardware environments are fragile and poorly understood but operationally critical; people become risk-averse to change (or may not even be able to implement change for off-the-shelf solutions). Therefore, to implement tasks like preventive measures for errors, we need to implement solutions outside the application being monitored. This could be simply watching for completion messages reporting success, at which point the next operation or error prevention can be started.

Ideas around log management and the application of logging have been evolving a fair bit over the last four or five years; this is partly driven by the rapid progression of containerization. Docker and Kubernetes and the growth of individual small services (microservices/macroservices/mini-services) to support dynamic and hyper-scaling mean environments and deployed applications are far more transient in nature. Other factors, such as the broader adoption (to varying degrees) of DevOps, have also played a part. The net result is that a couple of concepts have developed that are worth noting.
Observability was probably the first of the modern monitoring concepts to develop. Discussions around observability started to gain mainstream recognition around 2016 and showed up in what have become referential texts, such as Google’s site reliability engineering (SRE) guide (available at https://landing.google.com/sre/sre-book/toc/). The idea isn’t new; it’s just been well defined.
Observability essentially states that we should track or observe and measure what software is doing to manage and understand a system. Industry thinking has evolved this premise to the tracking of four specific signals, often referred to as the four golden signals of SRE: latency, errors, traffic, and saturation. These four signals are sometimes referred to as metrics, measures, or indicators (the language is used interchangeably; personally, the term signal feels very binary, and life is rarely that). Here is what the signals mean:
- Latency—How long it takes to address a request. Growing latency indicates potential performance issues coming from increasing demand or from a lack of performance tuning of the software or its configuration.
- Errors—Problems that impact the service, how frequently they occur, and whether they are self-recovering (e.g., failing to get a DB connection means falling back and trying again). Fluentd will come into its own handling errors, as we will see as we progress through the book.
- Traffic—The level of demand being placed on the system. Increased traffic can indicate growing demand or possibly malicious intent; a drop in traffic can indicate a loss of effectiveness somewhere.
- Saturation—Reflects how full or heavily used a system is (e.g., CPU and disk utilization). Once a system passes a certain saturation threshold, performance degradation will be experienced as the operating system has to dedicate more effort to manage its limited resources.
While deriving all four signals from logs alone is not desirable (e.g., service degradation would require us to hold multiple performance measures over time and compare them), halfway-decent logging can yield approximations of the signals thanks to timestamping. Latency, for example, could be derived from the time difference between the first and last log events of a request; throughput could be indicated by the volume of log entries over a period.
Another perspective of observability that has become popular in the industry relates to the character of the things we monitor. The type of information gathered when monitoring can be described by one of several definitions. As a result, observability is made up of three pillars, or core ideas:
- Metrics—Typically numerical, quantifying the state of things; the data points are sampled regularly from the environment (e.g., CPU utilization).
- Logs—Primarily textual but event-based, therefore having characteristics of time and description (e.g., Simple Network Management Protocol [SNMP] traps).
- Traces—Tracking execution flows and the time it takes for transactions and subtransactions to execute different steps. Trace data is largely numerical, being made up of timestamps captured as execution enters and leaves different parts of the solution. To give these times context, identifiers such as a transaction ID and the entry and exit points are included.
Everyone will be familiar with metrics, as we have all at some point needed to see how hard a CPU is working or have experienced constraints because of a lack of memory or how much storage is available on our hard disks.
Tracing is probably most strongly associated with the OpenTracing initiative (https://opentracing.io/) and the Cloud Native Computing Foundation (CNCF) project Jaeger (https://jaegertracing.io/). OpenTracing has combined with a project called OpenCensus (https://opencensus.io/) to form OpenTelemetry (https://opentelemetry.io/). Yet logging may contribute to this space, as specific log entries may act as a measuring point within a trace—particularly within legacy solutions. There is the risk that people will merge thinking about tracing with logging. It is often desirable to correlate trace performance information back to logs, so logs can be used as a key diagnostic tool in determining where the low performance occurs. However, the tooling available to each pillar has distinct differences and strengths. We can see this by considering Jaeger’s visualization of execution paths (traces) versus Fluentd’s ability to parse log events and trigger actions. While these CNCF projects have brought tracing to the fore, the idea isn’t new, and many service bus solutions (such as Oracle SOA Suite and MuleSoft) have some sort of mechanism for tracing. The difference is that OpenTracing and OpenTelemetry are trying to drive standardization.
We are seeing signs that these standards are being adopted by open source implementation frameworks and commercial solutions. How does this relate to Fluentd? Depending upon the log output, it can represent a means to trace execution (e.g., record a transaction, an identifier, an execution point in the codebase, and a time). In other words, a trace is a specialized log. This relationship and the deployment models being supported make Fluentd and Fluent Bit capable of being part of an OpenTelemetry solution. As a result, the OpenTelemetry Protocol (OTLP) is being incorporated into Fluentd. All these measures play a part at different levels of a solution (infrastructure to business logic), as figure 1.3 illustrates.
- Business application monitoring—This presents pure abstracted business application monitoring or business activity monitoring (BAM) and relates to the measurement of application/business tasks described by things like Business Process Execution Language (BPEL).
- Application monitoring—This reflects traditional monitoring of applications and middleware/workflow technologies such as Oracle’s SOA Suite or Microsoft’s BizTalk underpinning BPEL implementations.
- Virtual machine/container monitoring—This measures whether the engine that shares host computing services gives appropriate levels of resources to the guest environment(s). It monitors to ensure that the virtualized hardware is running smoothly.
- Host/infrastructure monitoring—This detects hardware problems, such as storage capacity, overheating CPUs, fan failures, and so on.
NOTE
More information about BAM can be found in the liveBook version of Activiti in Action by Tijs Rademakers (Manning, 2012) at http://mng.bz/DxgR.
Of these two concepts, I believe the four signals are better considered as measures. By measuring the data that each signal describes, the signal will indicate whether something is right or wrong. More importantly, do the changes in the signals show a trend or pattern indicating, at the very least, that the solution being monitored is not degrading? Ideally, we want a trend indicating continued improvement. Regardless, this information will not tell you the root cause of a problem. For example, signals showing a highly saturated system won't tell you why the system is saturated, which can occur if code is stuck in an infinite loop. For this, you still need to understand what the software is doing. This is not to say signals are wrong; they are, without a doubt, the best way to provide a cue that there's an issue. But it is through the lens of the three pillars, I believe, that a deeper appreciation of what is or isn't happening can be achieved, with sight of cause and effect in the way software is behaving.
You may have observed that, in the reasons for logging (for debugging, audit, etc.), various activities will be handled by more than one or two individuals in an organization. Once an organization grows beyond a certain size, we have specialists working in different areas. The specialization of roles brings pressure for different tooling. While many monitoring tools have plugin features, and so on, they may not support every individual need. This can mean we end up with multiple tools in an enterprise IT landscape, and in some organizations, people and organizational politics will further complicate the IT tooling landscape. Yet they all need a blend of data from the same source systems.

Fluentd, Logstash, and other related tools are sometimes referred to as log unification tools. But what is meant by this, and what value(s) should a unification tool have? Let’s look more closely at the value of unification and differentiate it from some other associated ideas.
The Cambridge English Dictionary describes unification as “the act or process of bringing together or combining things or people” (http://mng.bz/lax2). This is what we use Fluentd for—collecting log events from diverse sources and bringing them together with a single tool so the log events can be processed and sent to the appropriate endpoint solution(s).
This ability is essential, as it provides many significant benefits; we have touched on some of these when looking at the application of logs. As we bring these value points together, we can roughly group them into log sourcing and log-based insights.
- It eases the task of locating and retrieving logs and log events. Through a single platform, locating relevant log events becomes far easier. We can route the log events to a convenient location/tool, rather than needing to access multiple platforms with potentially many different locations and ways of accessing the log events.
- With virtualization, containerization, and more recently functions as a service, the hosting of logic becomes transient, so the means to easily gather log information before it is lost is more critical than ever. Using Fluentd, we can configure lightweight processes into these transient environments that push log events to a durable location.
- A single technology brings log events together regardless of the source or target. As a result, log event management becomes easier and more accessible. We don’t have to master how all the different ways to log events can be captured and stored (e.g., Syslog, SNMP, Log4J, and the many other log forms and protocols), as Fluentd makes this easier.
- Operating systems are complex, made up of many discrete processes and applications. Often, discrete components come with their own logs. We need to bring these together to trace an event through the different components. Some of this has been solved with operating systems and network equipment adopting a small group of standards like Syslog and SNMP traps. It would be easy to think that Syslog and SNMP can meet all our logging needs. But software is more than a bunch of OS components that can use SNMP or Syslog, so we need to bring these sources together at another level of unification. For example, Syslog is predominantly a Linux solution; its use of UDP means there is a risk of event loss; UDP imposes size limits; and its data structures and predefined values are infrastructure-centric—to name just a few of Syslog’s constraints.
- In the era of the network and the internet, our applications pass events through many different managed devices, creating a real increase in the number of places where our communications could be disrupted. Unifying the log events at this scale of distribution brings the problem to manageable proportions.
- It is easier to create holistic view(s) of log events, allowing us to see the cause and effect more easily.
- With logs unified into an analytics platform, the data can be capitalized on through further analytical processes.
- A unification platform creates the opportunity for us to move from a reactive, post-event analysis approach to identifying issues and then proactively acting on them as they occur. This potentially can extend to a position where we identify warning signs and proactively perform actions to avoid a problem. The ability to become proactive comes from the unification tool’s ability to filter, route, and apply meaning to log events.
- Infrastructure as a Service and Platform as a Service have brought whole new levels of dynamic change and routing complexity. As a result, the unifying of logs reduces the scale of the challenge of tracking what could be impacting our solution.
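As a sketch of what this unification looks like in practice, the following hypothetical configuration brings together two very different sources—Syslog from infrastructure and HTTP-submitted events from custom applications—and hands everything to a single downstream output (ports and tags are example values):

```
<source>
  @type syslog        # built-in Syslog listener for infrastructure events
  port 5140
  bind 0.0.0.0
  tag infra
</source>

<source>
  @type http          # accept events posted over HTTP, e.g., from custom apps
  port 9880
  bind 0.0.0.0
</source>

<match **>
  @type stdout        # one downstream treatment, regardless of origin
</match>
```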
While we have discussed the why and what of log unification, we should also differentiate it from other concepts associated with processing log events, particularly log analytics.
NOTE
For more information about SNMP, see the liveBook version of Software Telemetry by Jamie Riedesel (Manning, 2021) at https://livebook.manning.com/book/software-telemetry/chapter-2/155.
Many tools in the logging space come into the category of log analytics, where the focus is on applying data-analysis techniques such as pattern searching, using complex rules across many data records. Such processing is often associated with big data and search engine technologies. The best known of these are probably Splunk, a purely commercial product, and Elasticsearch, an open source solution with commercial options.
The log events need to be ingested into an analytics engine before log analysis can be performed. Such analytical processes may include event correlation (e.g., determining which systems or components generate the most errors, or whether the fault frequency relates to a particular event during the day). Getting log events into the engine can be done manually if necessary. Typically, analytics products like Splunk provide tools that harvest or aggregate log events into the analytics engine using one of the more common protocols. These services are then deployed to multiple locations to gather different log sources. This is a simple act of aggregation; the harvesting is not intelligent, and there is no possibility of handling the log events effectively until they are in the analytics engine. Harvesters typically don’t have the same levels of connectivity and configuration seen with unification tools.
The differentiator is that a log analytics engine’s strength is applying search and computational science to many logs, not gathering and routing log events. The strength of unification tools, by contrast, is sourcing and delivering log events; they typically have relatively simplistic analytical capabilities, such as event counts over time.
Both technologies share some capabilities regarding the transformation of data and the application of meaning to it (i.e., the process of data becoming usable information). Without these abilities, neither solution could be very effective. Both have strong event-filtering capabilities, but these are applied in different ways.

The industry has been talking about software stacks since 2000 (some have attributed this term to David Axmark and Michael “Monty” Widenius, cofounders of MySQL), when the best-known stack was named: the LAMP (Linux, Apache, MySQL, PHP) stack. By software stack, we mean a standard combination of products (typically open source) used together to deliver software solutions. Another well-known stack is MEAN (MongoDB, Express, AngularJS, Node.js). A complete list of stacks can be found at https://en.wikipedia.org/wiki/Solution_stack.
The best-known stack within the software landscape for log processing is ELK (Elasticsearch, Logstash, Kibana). This combination of products provides the ability to perform log analytics with Elasticsearch, visualization through Kibana, and log routing and aggregation with Logstash. The ELK stack has fitted together so well because all three components, while open source, have been developed by Elastic (www.elastic.co), which has been successful, like Red Hat, with an open source–based business model.
While a single vendor for these components leads to them being neatly integrated and complementing each other’s features, it also means that development effort can be heavily influenced by the vendor’s business model and objectives. For Elastic, this is to sell more services and enterprise extensions to the different parts of the ELK stack. This issue can be addressed by the open source product being governed by an external and neutral organization such as Apache, CNCF, or the Linux Foundation. But ELK is not subject to such governance.
Unfortunately, Logstash, as part of this stack, has been impacted by the perception that it is biased to Elasticsearch as a target solution for log events (which may or may not be valid). Logstash does have plugins for products other than Elasticsearch. However, it could be argued that these plugins have had to come from vendors wanting to compete with Elasticsearch in the ELK stack, or Elastic has had to implement them to remain competitive. In comparison to Elastic, the founders of Fluentd didn’t have their own analytics product as a preferred location for log events to be sent. We could also consider the adoption of Fluentd by CNCF as an implicit recognition of being free from these biases. It also helps that the community around Fluentd has produced more plugins, making it more flexible than Logstash.
This has led to a variant stack known as EFK (Elasticsearch, Fluentd, Kibana) that is gaining traction. As Fluentd has plugins for Elasticsearch and Kibana, this alternate stack is viewed as equally capable but with greater flexibility for unification. OpenShift, for example, adopted EFK to manage log events (see http://mng.bz/YwDj).
As shown in figure 1.4, both ELK and EFK have lightweight, smaller variants of the unification capability. Beats’ relationship to Logstash is the same as Fluent Bit’s relationship to Fluentd (more on Beats and Fluent Bit later in this chapter).
Figure 1.4 ELK vs. EFK software stacks, illustrating how the stacks differ and which products are involved in each stack

In table 1.1, we have tried to draw out the differentiators of the two products. Both have a lot in common, which is why it is possible to replace Logstash with Fluentd in the stack. However, there are differences worth highlighting.
Table 1.1 Fluentd and Logstash comparison (see the original table figure). Surviving fragments of the table note the availability of commercial support (a more robust option, as support can cover the full stack) and highly configurable cache options, with file and memory caching out of the box.
Fluentd has a small C-based kernel, but the bulk of the product is built using Ruby. This brings a tradeoff: Ruby runs on an interpreter (although several variants, such as JRuby—used by Logstash—run on the Java Virtual Machine, Truffle, and so on, instead of the original interpreter). Ruby uses a packaging system known as RubyGems to provide additional libraries (gems) and even applications. To enable Fluentd to be used in Internet of Things (IoT) situations, a smaller resource footprint is needed for devices like a smart meter or Raspberry Pi. The objective of creating a minimal-footprint version of Fluentd led to the creation of Fluent Bit. Fluent Bit provides a subset of the Fluentd features, focusing on taking log events and routing them to a more centralized location, where the log events can then be processed (filtered, transformed, enriched, etc.) more effectively—as you would expect of Fluentd. Table 1.2 summarizes the differences between Fluentd and Fluent Bit.
Table 1.2 Fluentd vs. Fluent Bit comparison (see the original table figure).
Despite these differences, Fluent Bit and Fluentd are more than capable of working together, as we’ll see later in the book. IoT isn’t the only use case that lends itself well to the use of Fluent Bit. When considering microservices, small footprints and rapid startup times are highly desirable for some containers. We’ll explore the deployment possibilities later in the book for microservices and the use of Fluentd or Fluent Bit.
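To sketch how the two cooperate: Fluent Bit agents typically send events using the forward protocol to a central Fluentd node, which might be configured along these lines (the port shown is the conventional forward port; treat the rest as example values):

```
# Central Fluentd aggregator receiving events from Fluent Bit agents
<source>
  @type forward       # speaks the same forward protocol Fluent Bit emits
  port 24224
  bind 0.0.0.0
</source>

<match **>
  @type file          # persist everything to a durable location
  path /var/log/fluent/aggregated
</match>
```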
The relationship between Beats and Logstash does differ a bit from that between Fluentd and Fluent Bit. For a start, Beats are actually a set of individual small-footprint components, each collecting data for one specific purpose. Each Beat is built upon a Go library called libbeat, compared with Logstash’s use of Java. The Beats family is made up of the following:
- Filebeat—Collects log files (with specific modules to handle Apache, server logs, etc.)
- Packetbeat—Collects network packet data (DNS, HTTP, ICMP, etc.)
- Metricbeat—Collects server metrics
- Heartbeat—Provides an uptime monitor
- Auditbeat—Collects audit events to monitor activities through systemd (http://mng.bz/6Z9o) and Auditd (http://mng.bz/oa5d) on Linux
- Winlogbeat—Integrates into the Windows OS to capture Windows event logs, including PowerShell and Sysmon events, among others
- Functionbeat—Works with serverless solutions, currently just on AWS (Amazon Web Services)
The libbeat library has been made available as open source. This has made it a lot easier (and given the assurance of code independence) for third parties, including the open source community, to build more Beat solutions using the framework. All the Beats use a shared data structure definition to communicate the data collected.

With infrastructure becoming increasingly configuration-driven rather than physical boxes and cables, the points where data can enter and leave an environment can multiply quickly, as adding them is simply a matter of configuration. It is preferable to limit the number of points at which data passes between public and private networks—this is just one of many reasons for having backend (or reverse) proxies. With logging agents in the pure aggregation model, each node wants to talk directly to the point of aggregation. This can be mitigated if the solution can tolerate network proxies. But would it not be better to use a proxy that better understands what is being routed, such as Fluentd?
Definition
Proxies are servers that retrieve resources on behalf of a client from one or more servers. The retrieved resources are then returned to the requestor, appearing to originate from the proxy server itself. Proxies are described as backend or reverse proxies if deployed closer to the server performing the computation than to the (usually lightweight) client. Proxies are usually implemented to optimize network load by caching traffic and to apply security by controlling where data enters and leaves a network.
The log routing capabilities of Fluentd, as we’ll see, allow us to use Fluentd nodes as routers/consolidators of logs, meaning we can control network exposure, as well as several other considerations.
Security considerations within Fluentd go beyond configuring routing to control network points for ingress and egress of logs. Fluentd supports the use of SSL/TLS certificates, so that the data being sent between Fluentd nodes, or between Fluentd and other networked services (e.g., MongoDB), is secure. This increases security through authenticity checks and the ability to encrypt the data. Today, security needs to be an aspect of everything we do rather than a bolt-on; we’ll address such issues directly where appropriate throughout the book.
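Bringing the routing and TLS points together, a consolidator node might be sketched like this (the hostname is a placeholder): it accepts events from nodes inside the private network and acts as the single, encrypted egress point.

```
<source>
  @type forward              # receive from nodes inside the private network
  port 24224
</source>

<match **>
  @type forward              # single controlled egress point
  transport tls              # encrypt traffic leaving the network
  tls_verify_hostname true   # check the certificate matches the host
  <server>
    host logs.example.com    # placeholder central collector
    port 24224
  </server>
</match>
```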

Another perspective worth considering is the life cycle of a log event. When a software component of some kind generates a log entry, to get value from it, it needs to be passed through a life cycle, shown in figure 1.5.
As figure 1.5 shows, we start with capturing the log event (information source capture), and as the event flows down, it gains more meaning and value. Based on what we’ve already discussed, any log unification tool, including Fluentd, is most effective in the information source capture and the structure and route phases. In the aggregate and analyze phase, such tools offer analysis features focused on individual events but lean on dedicated analytics products for working across aggregated logs. The visualize data phase is where these tools are weakest. Given these tools’ routing and connectivity capabilities, the notify and alert phase is easily realized by connecting suitable services. Not only that, but there is also the potential for this phase to be moved earlier in the life cycle, as we don’t always need the analytics products to decide whether it is necessary to notify and alert.
As shown in the figure, tools like Fluentd support the upper half of the life cycle very well (from capture to aggregate and some of the analyze stage). The lower half is well supported by log analytics (aggregate and analyze, visualize data, notify and alert).

In this section, we will look at the events that led to the creation of Fluentd and its rapid growth in adoption. Figure 1.6 shows a timeline of key events in the evolution of Fluentd.
Fluentd’s origins go back to 2011 when big data, through the use of Hadoop, was impacting mainstream IT. As a Silicon Valley startup, Treasure Data was established to create value around Hadoop-based processing of semi-structured data. Treasure Data found it needed a tool to help it capture data from multiple sources and ingest the data into a Hadoop data store. As a result, it set about building Fluentd and made it available as free and open source software (FOSS) using the Apache 2 License (www.apache.org/licenses/LICENSE-2.0). This made it easy to build upon, extend, and exploit the tool. As a result, developers (other than just those working for Treasure Data) contributed to and extended Fluentd.
NOTE
To learn more about Hadoop, check out the liveBook version of Mastering Large Datasets with Python by John T. Wolohan (Manning, 2020) at http://mng.bz/do2o.
In 2013, Fluentd got a big boost due to the recommendation by AWS for data collection across and onto their platform. This was further helped by Google using Fluentd with its BigQuery product and then incorporating Fluentd into its monitoring solution.
The next major event for Fluentd was its adoption by the Cloud Native Computing Foundation (CNCF). CNCF’s existence was strongly influenced by Google in conjunction with the Linux Foundation to give Kubernetes a vendor-neutral home. Kubernetes is designed to run multiple containers across one or more servers, with containers hosting one or more different applications. Not to mention that containers can be started and torn down on the different servers as needed. From this, it is clear that corralling and routing log data is a critical challenge that can be answered well by Fluentd’s capabilities.
Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) solutions have influenced and been influenced by CNCF projects. It is only natural that those technologies have the best chance of being incorporated or supported by cloud platform offerings. When it comes to Fluentd, we have seen the major vendors (AWS, Azure, Google, Oracle, DigitalOcean, Alibaba, etc.) do one of the following:
- Directly leverage Fluentd for their own needs (e.g., Google, Oracle)
- Package it up as part of a larger offering (Bitnami, Google Stackdriver)
- Make their various services accessible as inputs or outputs for Fluentd (AWS—S3, RDS, CloudWatch, Beanstalk, etc.)
For example, AWS has output plugins for its storage services. AWS’s CloudWatch solution can both receive and send log information to Fluentd. As we have seen, Google embraced Fluentd early on.
Beyond the IaaS cloud offerings, there is a range of specialist PaaS services for performing log analytics, ranging from Loggly (www.loggly.com) to Datadog (www.datadoghq.com). These vendors have provided plugins into Fluentd, so it becomes effortless for customers to route log data to these services.
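As an illustration, sending log events to S3 with the community S3 plugin might look something like the following sketch (the bucket, region, and paths are example values; credentials are assumed to come from an IAM role):

```
<match app.**>
  @type s3                         # fluent-plugin-s3 output
  s3_bucket my-log-archive         # example bucket name
  s3_region us-east-1
  path logs/
  <buffer time>
    @type file                     # buffer chunks on disk before upload
    path /var/log/fluent/s3-buffer
    timekey 3600                   # start a new chunk every hour
    timekey_wait 10m               # allow stragglers before flushing
  </buffer>
</match>
```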

Fluentd and Fluent Bit can be used or adapted to almost any situation, from running in containers to being deployed on IoT devices to mainframe solutions. As we have seen, Fluent Bit’s footprint is small enough to operate on a vast range of IoT devices, which is reflected in part by Arm’s acquisition of Treasure Data. Fluentd and Fluent Bit together cover at least 90% of the OS platforms in use today. As already discussed, Fluentd works well with cloud offerings, but it is not bound to the cloud and can work in more traditional virtualized or dedicated server deployments.
The more relevant question should be, will deploying Fluentd or Fluent Bit make your job easier? Should you use Fluent Bit or Fluentd?
Fluentd and Fluent Bit can be used without any worries on everything from a basic laptop or desktop machine to physical and virtual servers running Windows and Linux OSes. This means that as we get hands-on in the rest of the book, putting Fluentd into action should be possible. The possible exception is chapter 8, when we run Kubernetes and Docker, which will need a bit more power, but we’re still talking about a midrange desktop or laptop. Beyond that, it helps to understand Fluentd’s limits.
Beyond the OS and hardware, platform constraints are minimal. The most basic environment just needs to be able to run the Ruby engine. Ruby is supported by a range of standard package-based installations (yum, Homebrew, apt, RubyInstaller for Windows, to name a few). Making the installation for all the standard OSes is straightforward, and the package managers should help resolve any dependency issues. But for the less common environments, Ruby also provides a “from source” installation guide (www.ruby-lang.org/). If there isn’t a prebuilt installation option for Fluentd itself—a rare situation given the prebuilt installers covered (RPM, Deb, MSI, and RubyGems)—then the Fluentd website (https://fluentd.org) provides details of how to achieve an installation from the source code.
Depending upon the configurations that need to be established, Fluentd has additional plugins that may be required. Fluentd plugins are typically deployed using RubyGems (an open source package manager for Ruby components; https://rubygems.org/). Gems can be installed from a local location if you need stringent network controls.
Optionally, there are prebuilt solutions that can be deployed to Docker and Kubernetes if preferred. We will address the question of deployment of Fluentd as part of Kubernetes later in the book.
The last option—while we believe it is possible, we haven’t heard of it being tried—is the creation of a platform-native binary of Fluentd through the use of GraalVM (https://www.graalvm.org/docs/getting-started/). GraalVM is a next-generation language virtual machine that incorporates Java (JVM) and several other language interpreter packs, including Ruby (https://github.com/oracle/truffleruby). GraalVM can also create platform-native binaries for Java and the other supported languages.
With Fluentd deployed, it needs to read one or more configuration files that tell Fluentd (typically installed as a daemon process in production) what it should do.
Definition
A daemon is a computer program that runs as a background process and is usually started and stopped by the operating system when it starts up and shuts down. This term is more commonly associated with Linux- and Unix-based operating systems. Often, applications designed to operate this way will have their name end with a d; for example, syslogd is a daemon that implements system logging (Syslog) in Linux. In Windows operating systems, these processes are referred to as Windows services.
So, any deployment location needs to be able to read the file and ideally allow updates to the file for Fluentd. Fluent Bit can use a configuration file or even interpret the configuration from a command-line parameter.

Fluentd does have a browser-delivered user interface. Figure 1.7 illustrates one of the UI screens to give a sense of what it is like. The UI supports tasks such as the following:
- Editing the configuration file
- Managing a Fluentd instance in terms of stopping and starting
- Getting plugins patched or installed
- Inspecting Fluentd’s logs
We will focus directly on the configuration file for this book, as this will help explore the more complex nuances and is more mature than the UI. We’ll take a brief tour of the UI in chapter 2 once we have completed the installation of Fluentd.
In addition to the web-based UI, there is an additional plugin for Microsoft’s Visual Studio Code that will help with syntax highlighting when editing configuration files. The plugin can be used to help address typical issues like missing brackets. This plugin can be downloaded from within Visual Studio Code or from http://mng.bz/GOeA. Other editors, such as Sublime Text, also have open source packages/plugins to support the syntax editing of the Fluentd configuration file.

As mentioned earlier, the breadth and depth of coverage of Fluentd’s plugins outweigh most, if not all, of the competition. We cannot cover every possible plugin in the following chapters, so we’ll focus on those that help illustrate core ideas and represent cases that most Fluentd deployments are likely to encounter.
But given the scope of plugins, it is worth getting a sense of what is available and what could be achieved. Fluentd plugins can be grouped into the following categories, and with each category, we have provided some examples:
- Inputs
- File storage—AWS S3, text files (HTTP log files, etc.)
- Data(base) source—MongoDB, MySQL, generic SQL (for all ANSI SQL DBs)
- Event sources—AWS Kinesis, Kafka, AWS CloudWatch, GCP Pub/Sub, RabbitMQ
- OS—System, HTTP Endpoint, dstat, SNMP
- App servers—IIS (Internet Information Services), WebSphere, Tomcat
- Outputs
- Log/Event Manipulation (parsers, filters, and formatters)
- Map—Log format mapper
- Numeric Monitor—Generates stats relating to logs
- Text to JSON
- Key/Value Parsing
- GeoIP—Translating IP addresses to geographic locations based on published information (for more information on the use of GeoIP, see the liveBook version of Securing DevOps by Julien Vehent [Manning, 2018] at http://mng.bz/raJJ)
- JWT—Working with JSON Web Tokens (more background on this can be found in the liveBook version of OpenID Connect in Action by Prabath Siriwardena [Manning, 2022] at http://mng.bz/VlMy)
- Redaction—The masking of data so sensitive data values can’t be seen by those not authorized to do so
- Formatters—The means to lay out the data into different structures and potentially different notations (e.g., XML to JSON)
- Storage
- Service Discovery—Configuration to find other nodes that understand Fluentd’s comms mechanisms
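To give a flavor of the log/event manipulation category, here is a sketch of the bundled parser filter turning a JSON string carried in a message field into structured fields (the tag and field name are example values):

```
<filter app.raw>
  @type parser          # re-parse a field of each event
  key_name message      # the field holding the unstructured text
  reserve_data true     # keep the other fields of the original record
  <parse>
    @type json          # assuming the payload is a JSON-encoded string
  </parse>
</filter>
```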
NOTE
A complete list of available Fluentd plugins is managed at www.fluentd.org/plugins/all.

Throughout this chapter, we have examined a number of the scenarios and use cases that Fluentd can help with. As we progress through the book, we will introduce scenarios and look at increasing complexity.
Rather than waiting until log events have been collected together before anything is done with the content, it is possible to create configurations so that events are processed as they are received. Such processing could include filtering to find the events that require immediate attention. If a system logs an event that typically only occurs shortly before the solution fails—for example, the OS goes into a panic state (for more on kernel panic, see https://wiki.osdev.org/Kernel_Panic)—then as soon as that event is detected, we could send a message to someone responsible for handling such events via near real-time channels like PagerDuty or Slack (we will illustrate the Slack scenario in chapter 4). But actionable log events can easily extend further, such as triggering a script to perform automated remediation (e.g., purging or archiving older log files so storage isn’t exhausted).
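A sketch of such an actionable configuration follows, using the bundled grep filter to keep only the events of interest and the community fluent-plugin-slack output to raise the alert (the tag, webhook URL, and channel are placeholders):

```
<filter system.**>
  @type grep
  <regexp>
    key message
    pattern /panic/     # keep only events mentioning a panic
  </regexp>
</filter>

<match system.**>
  @type slack                                       # community fluent-plugin-slack
  webhook_url https://hooks.slack.com/services/...  # placeholder webhook
  channel ops-alerts
  username fluentd
  message Critical kernel event: %s
  message_keys message
</match>
```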
The actionable event can also be extended to provide a means by which log events are made more meaningful. In larger, long-lived organizations, there are legacy solutions that are still business-critical (they are typically very large and embody lots of logic to ensure compliance with requirements that very few people understand). As a result, the replacement cost can be huge, and no one wants to take on the risk of making modifications, even to improve log messages and make support easier. But such problems can be addressed: those innocent-looking log messages that are harbingers of doom if someone doesn’t execute some remediation soon can be modified to have things like error codes attached, so ops people can easily find the relevant operational protocol.
The application of meaning can go further; some logs will have structures that don’t align with standard formats, such as JSON and XML. But Fluentd can be used to impose structure quickly and early, so downstream, the log events can be handled more efficiently. If an application accidentally logs sensitive data, the sooner such information is removed or masked, the better. Otherwise, all downstream log-processing solutions have to implement a far more stringent security setup, because they will be receiving sensitive data such as credit card data (PCI compliance), personal data (General Data Protection Regulation), or data subject to similar legislation. If such issues become a problem, and the source of the logs can’t be fixed, Fluentd can filter out or modify the log event to mask such content.
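As a sketch of early masking, the bundled record_transformer filter can overwrite a sensitive field before the event travels any further (the tag and field name are hypothetical):

```
<filter app.payments>
  @type record_transformer
  <record>
    card_number REDACTED   # overwrite the sensitive field at the source
  </record>
</filter>
```

Alternatively, record_transformer’s remove_keys option could drop the field entirely rather than masking it.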
Over the last 10 years, there has been an explosion of different programming languages. As a result, we often talk about polyglot environments where many different languages are used in an end-to-end solution; for example, R or Python may be used to extract deep meaning from data, while web interfaces could be written in JavaScript. Backend solutions could be Java, Scala, Clojure, .NET, or PHP. Thick client applications working with the same backend could be written with C#, VB.NET, or Swift. In these types of environments, we need a solution that is agnostic to the applications’ implementation languages. Fluentd provides this, and many languages have libraries that allow log events to be passed in an optimized manner directly to Fluentd.
The multiple-targets issue reflects the fact that in larger organizations it is common to have teams dedicated to specific tasks, such as information security. Different teams want to use different tools to support their specialism—for example, tools whose algorithms are particularly good at detecting patterns indicating malicious security activity.
Log events, like any operational data, need storage and consume network capacity when moved, which results in costs. That cost can be noticeable when large volumes of uncompressed or unfiltered text exit a cloud provider’s network and travel over a business’s internet connection. Yet, at the same time, we don’t want to be overly parsimonious with logging; otherwise, we will never appreciate what is happening. Fluentd can help with this by filtering log events and storing those of limited value locally, while sending onward to a central location the log information that can be of further help. Not only that, but the transmission can also be optimized through compression mechanisms (bulk log events can be highly compressed).
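A sketch of this pattern uses the bundled copy output to keep a local file copy of everything while forwarding a compressed stream to the center (the hostname and paths are placeholders); a grep filter ahead of this match could strip out the lowest-value events before they are sent at all:

```
<match app.**>
  @type copy
  <store>
    @type file                     # cheap local retention
    path /var/log/fluent/app-local
  </store>
  <store>
    @type forward
    compress gzip                  # reduce network cost of the onward hop
    <server>
      host logs.example.com        # placeholder central collector
      port 24224
    </server>
  </store>
</match>
```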
Previously we introduced the three pillars of observability (logs, traces, metrics). In some cases, we want to derive metrics from logs, such as how often a particular log event occurs, or whether a process is alive or dead, judged by looking at its logs for signs of life (i.e., whether events are still being created). With the right plugins, it is possible to generate such measures and share the data with Prometheus and Grafana.
This can be extended through the possibility of Fluentd monitoring its own deployed nodes—when you get into complex distributed use cases, this can also be highly desirable. After all, Fluentd is just another piece of software and is therefore as vulnerable to bugs as any other code.
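For example, with the community fluent-plugin-prometheus plugin, a sketch like the following counts log events as a Prometheus counter and also exposes Fluentd’s own internal metrics for scraping (metric names and the tag are example values):

```
<source>
  @type prometheus          # expose an HTTP /metrics endpoint for Prometheus
  port 24231
</source>

<source>
  @type prometheus_monitor  # publish Fluentd's own internal metrics
</source>

<filter app.**>
  @type prometheus
  <metric>
    name app_log_events_total
    type counter            # with no key set, increments once per event
    desc Total number of application log events seen
  </metric>
</filter>
```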
Company mergers and acquisitions can drive the need to consolidate operational resources, such as operational teams. Such consolidation will happen more quickly than any process to consolidate major IT systems. Through log unification, we can easily direct log data to the current operational support teams’ tools, reducing the time and effort needed to absorb new systems into the operations organization.
- Key concepts influencing modern thinking around monitoring come from ideas such as Google’s four golden signals and the three pillars of observability.
- Log analytics differs from log unification by focusing on a platform to mine the log data. In contrast, log unification is about bringing logs together and directing the content to necessary tools.
- Fluentd and Fluent Bit started as open source initiatives from Treasure Data before coming under the governance of the CNCF.
- Fluentd and Fluent Bit are not aligned to any analytics platform. Their association with the CNCF has helped the adoption of Fluentd by IaaS and PaaS vendors, either as part of a monitoring product or service or by supporting connectivity between Fluentd and their products.
- Fluentd has seen strong adoption in the microservices space, but it can fit equally well with a legacy landscape.
- Fluentd has a broad range of plugins available and a framework that enables custom plugins to be developed when needed.
- Fluent Bit trades off the highly pluggable nature for a tiny optimized footprint.
- Both Fluentd and Fluent Bit can support the majority of platforms with prebuilt artifacts. Both are open source solutions; it is possible to build the kernel and plugins on just about any conceivable platform.
- The application of logging is wide-ranging and offers value during the software’s entire life cycle.
- Fluentd supports a wide range of use cases, from debugging distributed solutions to operational monitoring.
- Fluentd fits into the EFK software stack; the ELK and EFK stacks differ in whether Logstash or Fluentd provides the log unification capability.