1 Crafting Experiences for Cloud Native Development

MEAP v1

This chapter covers

  • What developer experience is and why it matters
  • Paving the path from idea to production through the inner loop and the outer loop
  • Using continuous delivery to build higher-quality software faster and safer
  • Main challenges and friction points impacting the developer experience
  • Introducing the cloud native project used in the book

As organizations adopt cloud native technologies and Kubernetes, they are fundamentally transforming how they build, deploy, and manage software. This shift is driven by the need to deliver software faster, more reliably, and at scale. At the core of this transformation is the developer experience—the daily reality of how developers interact with tools, platforms, and processes to deliver value to customers through software.

While these technologies offer powerful capabilities, they often come with increased cognitive load and reduced developer productivity. Understanding and optimizing the developer experience has become crucial for organizations aiming to succeed in their cloud native journey.

This chapter lays the foundation for understanding developer experience in modern software engineering. We will explore why developer experience is important, examine the fundamental challenges that teams face when building applications for Kubernetes, and introduce key concepts such as the inner and outer development loops.


1.1 Why does developer experience matter?

In today's technology-driven landscape, where automated tools and solutions prevail, why is it essential to focus on developer experience? Even with the advancements in artificial intelligence (AI), application developers remain indispensable in the software delivery process.

This section will analyze the role of application developers and the challenges of building cloud native applications. We will look into why it's vital for organizations to help developers thrive despite complex tools, a lack of standardization, and the difficulty of implementing best practices in an environment where the only constant is change. Finally, it will provide a definition of developer experience that we'll use for the rest of the book.

1.1.1 The Role of Application Developers

Application developers make today's world move forward. Regardless of industry, every company likely relies on software to operate and thrive. The work of developers creates tangible value by addressing and solving customers' problems through software solutions. However, if software cannot swiftly adapt to meet customers' evolving needs, those customers will inevitably turn to more efficient providers.

The journey from idea to value begins with a clear problem statement and a well-defined requirement for solving it. Developers are tasked with translating these requirements into functional software. That involves building new applications, implementing more features, and fixing bugs (figure 1.1). The speed and quality of this transformation, from the initial requirement to the final software delivered to customers (what we call the path to production), are critical to the organization's success.

Figure 1.1 The path from idea to production, where software delivers value to the organization and its customers.

In this book, when we refer to customers, we mean any user of the applications developers build and continuously enhance. Customers can be private end-users, organizations, or even internal teams within the same company that develops the software. These applications span a wide range of use cases: from your home banking app to your favorite streaming platform, from services controlling offshore wind farms to your organization's internal portal, to the software solutions hospitals use to manage patient records and appointments. In short, we call all users of these diverse applications customers.

The journey from idea to value is crucial for delivering software that meets customers' needs effectively. Cloud native technologies are an essential enabler for building modern software solutions. However, they introduce a unique set of challenges that application developers must address. That's the topic of the next section.

1.1.2 The Challenges of Cloud Native Development

In the past 15 years, companies have drastically changed how they design, build, and ship software to their customers. New technologies and practices brought along some challenges for application developers in three main areas:

  • Infrastructure and platforms
  • Architectures and design
  • Organizations and practices

This section provides a generic overview of the main challenges. We'll analyze and address them throughout the book, so don't worry if something is unclear.

Infrastructure and Platforms

The cloud made it possible to consume infrastructure as a service, turning computing, network, and storage resources into commodities. Containers entered the stage and quickly became one of the most used methods for packaging and running applications while ensuring portability across environments, from development to production. Kubernetes raised the abstraction level on top of infrastructure and containers, laying the foundation for building platforms that application developers can consume via APIs.

We have witnessed an exponential increase in new tools in the cloud and Kubernetes ecosystems, and their sheer number and complexity can easily overwhelm developers. Too often, adopting these technologies means developers are expected to know all the details of Kubernetes and related tools, resulting in a substantial increase in cognitive load and reduced productivity.

The recent rediscovery of platform engineering has brought renewed focus on separation of concerns and the value of abstractions. Still, it's common for organizations to require developers to interact with low-level details of Kubernetes and cloud infrastructure, reducing the time they can spend producing value for their customers. Does that happen in your organization?
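
To make this concrete, here is a minimal sketch (in Go, assuming the Kubernetes client-go library) of listing Deployments directly through the Kubernetes API. The namespace and kubeconfig path are placeholders. This is the kind of low-level plumbing developers are often exposed to, and the kind of detail a well-designed platform should abstract away.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the local kubeconfig (the path is a placeholder for this sketch).
	config, err := clientcmd.BuildConfigFromFlags("", "/home/dev/.kube/config")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	// List Deployments in a namespace via the Kubernetes API.
	deployments, err := clientset.AppsV1().Deployments("minsalus").List(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, d := range deployments.Items {
		fmt.Printf("%s: %d/%d replicas ready\n", d.Name, d.Status.ReadyReplicas, d.Status.Replicas)
	}
}
```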

Note

Platform engineering is a specialized branch of software engineering dedicated to creating platforms that empower development teams throughout their daily, iterative journey from idea to value. In the context of cloud native technologies, platforms are often built on top of Kubernetes and designed to offer on-demand services to developers while hiding the complexity of internal infrastructure. In later chapters, we'll dive deeper into cloud native platforms and their importance for application developers. To learn more about platform engineering, check the book "Platform Engineering on Kubernetes" by Mauricio Salatino (Manning, 2023).

Architectures and Design

Software architectures have also evolved. We've been building increasingly distributed systems with increasingly demanding requirements for scalability and resilience. New use cases were unlocked thanks to new architectural styles and infrastructures, but the complexity of the systems and related development environments increased. That's especially true when those architectural styles are implemented incorrectly. We've seen countless transitions to microservices go wrong, resulting in unmanageable distributed monoliths. As a consequence, many organizations are now considering adopting modular monoliths.

However, the problem we've been trying to solve hasn't changed in decades. Software decomposition is hard yet necessary whether you're building a monolithic or microservice-based system. Distributed systems are complex. When implemented on top of cloud infrastructures, they can substantially impact daily application development workflows, creating friction for developers when building against cloud services in their development environments or needing to run all the necessary dependencies to work on new features.

Note

Software decomposition and modularization have been challenging since the early days of software development. In 1972, D.L. Parnas published a paper titled "On the criteria to be used in decomposing systems into modules". Still, today, we continue to struggle with designing loosely coupled, maintainable solutions. Whether you're building a monolithic application or microservices, correctly decomposing a system into modules is essential to the success of a software product.

Organizations and Practices

Organizations themselves underwent radical changes, trying to catch up with the rise and mainstream adoption of cloud computing. However, organizational transformations don't always succeed. The DevOps movement, which attempted to break the silos and friction between Development and Operations, has often been misunderstood and resulted in simply renaming the old Operations team to a DevOps team, creating a new silo named DevOps[1], or even pushing all operational responsibilities to the Development team. None of those changes effectively solve the underlying problem: streamlining the software delivery process.

Note

There is no universally accepted definition of DevOps. We find the one proposed by Ken Mugrage (principal technologist at ThoughtWorks) particularly interesting: "A culture where people, regardless of title or background, work together to imagine, develop, deploy, and operate a system."[2]

One of the key tenets of the cloud computing model is its on-demand, self-service nature. That convenience vanishes when organizations adopt heavy processes and require development teams to submit a ticket to an Infrastructure team whenever they need to provision a new database or a virtual machine. Does that sound familiar to you? Getting a change from code to production often goes through expensive and slow processes, requiring many handovers and manual approvals.

All these factors impact application developers' productivity and their experience trying to get a code change into the hands of their customers. In 2010, Jez Humble and David Farley formalized the concept of continuous delivery[3], a holistic approach to developing and delivering higher-quality software faster, safer, and in a repeatable way. Continuous delivery practices give us well-tested solutions to improve the path from idea to production, but the adoption might be challenging. Unfortunately, many companies still struggle to implement these ideas.

Cloud Native

All the challenges mentioned so far are common when discussing cloud native development. But what does cloud native mean? The Cloud Native Computing Foundation (CNCF) answers that question in its cloud native definition[4]:

“Cloud native practices empower organizations to develop, build, and deploy workloads in computing environments (public, private, hybrid cloud) to meet their organizational needs at scale in a programmatic and repeatable manner. It is characterized by loosely coupled systems that interoperate in a manner that is secure, resilient, manageable, sustainable, and observable.

Cloud native technologies and architectures typically consist of some combination of containers, service meshes, multi-tenancy, microservices, immutable infrastructure, serverless, and declarative APIs — this list is non-exhaustive.”

The definition continues by highlighting the benefits of cloud native:

“Combined with robust automation, cloud native practices allow organizations to make high-impact changes frequently, predictably, with minimal toil and clear separation of concerns.”

Application developers' productivity has become a serious issue for organizations that want to adapt faster while taking advantage of new approaches and tools constantly introduced in the cloud native ecosystem. This book focuses on the main pain points that developers face when working with cloud native applications and trying to adopt all these new tools and practices. How can we combine them to achieve a great developer experience? That's what this book is all about!

Note

If you'd like to learn more about the definition and properties of cloud native applications from a developer perspective, you can refer to Chapter 1 of the book "Cloud Native Spring in Action" (Manning, 2022) by Thomas Vitale.

1.1.3 Defining Developer Experience

What do we mean by developer experience? It sure sounds like a buzzword. Like other buzzwords in our field, it can be confusing because it means different things to different people.

We rely on the insightful work by F. Fagerholm and J. Münch, who suggested a comprehensive definition in their paper “Developer Experience: Concept and Definition”[5]:

“...developer experience could be defined as a means for capturing how developers think and feel about their activities within their working environments, with the assumption that an improvement of the developer experience has positive impacts on characteristics such as sustained team and project performance.”

This definition captures the essence of why developer experience matters: improving it has a positive impact on development teams and increases their productivity. That means the better the developer experience, the higher the value produced.

Many factors influence developers' activities within software engineering projects. The paper suggests dividing these factors into three groups:

  1. Development Infrastructure Factors: How developers perceive the development infrastructure. That includes interactions with tools, frameworks, platforms, and organizational processes.
  2. Work Feelings Factors: How developers feel about their work. That includes social aspects such as respect and a sense of belonging within their team and organization.
  3. Value Contribution Factors: How developers perceive the value of their contributions. That includes aligning personal goals with project objectives and their sense of purpose within the team and organization.

This book focuses on the first dimension, which is all about tools and software practices and how developers perceive them while translating requirements into running software. As a developer, you might be overwhelmed by the number of tools you need to learn and master to complete your daily tasks. Or you might get frustrated due to slow and suboptimal tools. Perhaps your organization has added unnecessary hurdles and constraints that impact the performance of your team, making it more challenging to go from idea to value.

We can now suggest a more specific definition focused on the dimension of experiences that this book will cover.

“Developer Experience captures how developers interact with, and are empowered by, their technical environment to deliver customer value through software. This includes their ability to maintain flow and productivity while using development tools, frameworks, platforms and organizational processes. The assumption is that a well-designed, low-friction development infrastructure enables developers to focus on problem-solving rather than wrestling with tooling complexity or inefficient processes—positively impacting software delivery outcomes and team sustainability.”

Good developer experiences don't emerge spontaneously—they must be deliberately designed to align with the software being developed and the toolchain in use. Yet most teams work with an inherited patchwork of tools, each designed with different intentions and constraints. That creates a "Frankenstein Experience": a cobbled-together environment that undermines developer productivity rather than enhancing it.

The first step towards better experiences is understanding, at a deeper level, the main development activities, their relations with the overall software delivery cycle, and what can go wrong. We'll explore this in the next section while learning about the inner loop and outer loop.


1.2 The Inner and Outer Loops

Let's consider the path from idea to production. It all starts with a clear problem statement and the definition of a requirement for solving it.

A requirement could involve creating a new application, adding a new feature to an existing distributed system, fixing a bug in a microservice, or refactoring existing code in a modular monolith. It would typically include acceptance criteria, which are the conditions that must be met for the requirement to be considered complete. Requirements can come from various sources, such as product managers, business analysts, customers, or developers.

Given a requirement, many activities must be performed to deliver a working software solution. These activities can be grouped into two main categories: the inner loop (also called inner dev loop) and the outer loop (also called outer dev loop). The requirement is the starting point for the development process and the input to the inner loop. The transition from the inner loop to the outer loop happens whenever a developer pushes a change into the version control system.

Finally, the outer loop can feed back learnings and insights from production, leading to new requirements and closing the software development lifecycle (figure 1.2).

Figure 1.2 The key parts of the software development lifecycle, triggered by a requirement and ending with value delivered to customers

1.2.1 Inner Loop

The inner loop is where developers spend most of their time. It's the cycle of activities that a developer performs to write, test, run, and debug code (figure 1.3). It is triggered by a requirement to implement a new feature, fix a bug, design a new system, or refactor existing code. Developers would take on a requirement from the backlog and start working on it. The focus of this loop is on rapid feedback and fast iteration.

Note

Other terms used to refer to the inner loop are development workflow, pre-commit workflow, or local development.

In the inner loop, developers go through these activities:

  • Code. Given a requirement, developers write code to implement the feature or fix the bug.
  • Test. For each code change, developers write tests to verify that the code works as expected.
  • Run. Developers build and run the code to see the change in action and validate its behavior.
  • Debug. If the code doesn't work as expected, developers debug the code to identify and fix the issue.

The activities in the inner loop take place in the development environment, which includes all the tools and services the developers need to make a change. Some might follow a Test-Driven Development (TDD) approach, where they write a failing test first, then write the code to make the test pass, and finally tidy up[6]. Others prefer to write the code first and then write the tests. When building a web application, developers might also establish an automated workflow to run the application and see the changes in real-time as they make them.
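
To make the test-first rhythm concrete, here is a hedged sketch in Go; the CanBook function and its booking rule are invented for this example. The test is written first and fails, then drives the minimal implementation that makes it pass.

```go
package booking

import (
	"testing"
	"time"
)

// Written first: these tests fail until CanBook is implemented,
// then drive the minimal implementation below.
func TestCannotBookInThePast(t *testing.T) {
	yesterday := time.Now().Add(-24 * time.Hour)
	if CanBook(yesterday) {
		t.Error("expected booking in the past to be rejected")
	}
}

func TestCanBookInTheFuture(t *testing.T) {
	nextWeek := time.Now().Add(7 * 24 * time.Hour)
	if !CanBook(nextWeek) {
		t.Error("expected booking in the future to be allowed")
	}
}

// Minimal implementation that makes the tests pass.
func CanBook(slot time.Time) bool {
	return slot.After(time.Now())
}
```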

Figure 1.3 The inner loop is triggered by a requirement; it consists of all the activities carried out in a development environment to make a change, and it ends with the change committed to the mainline.

Once satisfied with the change, developers push it into the remote version control system. This action triggers the outer loop. It can take several iterations in the inner loop to complete a requirement and meet the acceptance criteria. However, that shouldn't delay the process of frequently pushing changes. Following the practice of continuous integration[7], developers should make small, incremental changes and push them frequently into the remote version control system. The goal is to get fast feedback and avoid integration issues.

At a minimum, developers should push their changes at least once per day. Less than that, and it's hard to call it continuous. You might think: "But I'm working on a feature that will take me a week to complete. Should I push every day?" The answer is yes. It will help you avoid conflicts with other developers and integration issues. It will give you the confidence that your changes are not breaking the build or the tests. It will also enable your peers to give you feedback early in the process. "But the feature is not complete yet. Wouldn't that cause issues for users?" That's fine. You can use techniques such as keystone interfaces[8] and feature flags to hide incomplete features from users (we'll explore that later in the book) while retaining all the benefits of continuous integration.
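
As a preview of how incomplete work can stay hidden while still being integrated daily, here is a minimal feature-flag sketch in Go; the flag name, the environment-variable mechanism, and the handler are assumptions for illustration, and real projects often rely on a dedicated feature-flag service instead.

```go
package portal

import (
	"net/http"
	"os"
)

// featureEnabled reads a flag from the environment; a real system would
// typically use a feature-flag service or configuration platform instead.
func featureEnabled(name string) bool {
	return os.Getenv("FEATURE_"+name) == "true"
}

func rescheduleHandler(w http.ResponseWriter, r *http.Request) {
	// The half-finished rescheduling flow is integrated into the mainline
	// every day, but stays hidden from users until the flag is flipped.
	if !featureEnabled("RESCHEDULE_APPOINTMENTS") {
		http.NotFound(w, r)
		return
	}
	// ... new rescheduling logic under development goes here.
	w.WriteHeader(http.StatusOK)
}
```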

The action triggering the outer loop is the push of a change into the remote version control system. There are a few different strategies for pushing changes[9]. Since the inner loop aims for fast feedback, developers should push their changes frequently. When practicing continuous integration, all developers push their changes to the mainline in the remote version control system. In Git, that would typically be the main branch. This practice helps avoid long-lived branches, which can lead to integration issues and slow feedback loops. Developers might push their changes directly to the main branch when doing pair programming. In other contexts, a pull request might be required to review the changes before merging them (pre-integration reviews). In that case, developers would create a short-lived branch, make the changes, open a pull request, wait for the review, and then merge the changes to the mainline.

The trigger for the outer loop is the push to the mainline. When using pre-integration reviews or long-lived feature branches, we consider the additional activities required to merge the changes back to the mainline as part of the inner loop. If pre-integration reviews take too long, developers will start pushing changes less frequently, slowing down the feedback loop and increasing the cycle time (the time it takes to deliver a change to production). The feedback loop gets even slower when using feature branches instead of continuous integration, as developers would integrate their changes only at the end of the feature development process. Furthermore, low-frequency integration discourages code refactoring, since refactoring would lead to expensive conflicts when merging the changes back to the mainline.

Our primary focus in this book is on the experience of development teams working on software products full-time, adopting the discipline of continuous delivery and all its foundational practices. However, it's essential to understand that these practices might not apply to all contexts. For example, continuous integration would not work for open-source projects, where developers may not be part of the same team and might not have the same level of trust or time commitment to the project. In that case, feature branches would be a better fit. Developers would fork the repository, make the changes in their fork, and then open a pull request to merge the changes back to the original repository only after they have been reviewed and approved.

1.2.2 Outer Loop

The outer loop is the cycle of activities after a developer commits a change to the mainline in a version control system until the change is deployed to production and operational. The central part of the outer loop is the deployment pipeline, which is the key pattern in continuous delivery and represents the only path to production.

Based on the concepts described by Jez Humble and Dave Farley in their books[12], we can group the activities in the outer loop into three main stages: commit, acceptance, and production (figure 1.4).

Figure 1.4 The outer loop is triggered every time a new change is committed to the mainline; it goes through multiple stages until the change is released to customers and delivers value.

Commit Stage

This stage is triggered every time a developer pushes a change to the mainline. It includes activities such as compiling the code, running the tests (mostly unit and component tests), performing static code analysis, and creating a build artifact. The goal of this stage is to provide fast feedback to the developer about the quality of the change.

If the change doesn't pass the tests, the developer should fix the issue immediately by committing a new change or reverting to the previous state. Ideally, this stage should take less than 5 minutes to complete. If it takes longer, developers would have to wait too long for feedback, increasing the cycle time and causing friction.

The activities performed in this stage are run against a build environment (or continuous integration environment) and can be supported by a wide array of tools, including build services (or continuous integration services), such as Jenkins, GitLab CI, or GitHub Actions, among others. Developers use such tools but are not responsible for their configuration or maintenance, which is typically the responsibility of a platform team.

At the end of this stage, a build artifact is produced, representing a release candidate. Depending on the technology stack, it could be a binary executable, a container image, or something else. The release candidate is then promoted to the acceptance stage.

Acceptance Stage

This stage is triggered when a new release candidate is available. It includes activities such as running functional acceptance tests (validating the original acceptance criteria of the implemented requirement) and non-functional acceptance tests (assessing security, performance, capacity, and so on). The goal of this stage is to provide confidence that the change is ready to be released to production.

If the change doesn't pass the tests, the release candidate is discarded, and developers should fix the issue as soon as possible. Ideally, this stage should take less than 60 minutes to complete. In cases of high-frequency integrations, the commit stage would produce multiple release candidates while the acceptance stage is still running. That's why only the latest output from the commit stage triggers the acceptance stage at any given time.

The tests performed in this stage are run against production-like environments, which are as similar as possible to the production environment.
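
As an example, a functional acceptance test for the hypothetical Appointment Service might exercise its public API in that production-like environment. The endpoint, payload, and environment variable in this Go sketch are assumptions for illustration.

```go
package acceptance

import (
	"net/http"
	"os"
	"strings"
	"testing"
)

func TestBookAppointment(t *testing.T) {
	// Base URL of the production-like environment, injected by the pipeline.
	baseURL := os.Getenv("APPOINTMENT_SERVICE_URL")
	if baseURL == "" {
		t.Skip("APPOINTMENT_SERVICE_URL not set; skipping acceptance test")
	}

	body := strings.NewReader(`{"patientId":"p-123","doctorId":"d-456","slot":"2026-03-01T10:00:00Z"}`)
	resp, err := http.Post(baseURL+"/appointments", "application/json", body)
	if err != nil {
		t.Fatalf("request failed: %v", err)
	}
	defer resp.Body.Close()

	// Acceptance criterion: a valid booking request creates an appointment.
	if resp.StatusCode != http.StatusCreated {
		t.Fatalf("expected 201 Created, got %d", resp.StatusCode)
	}
}
```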

Many tools are available to support the activities in this stage. Developers use such tools but should not be responsible for configuring or maintaining them. Platform teams can provide the necessary capabilities for automating deployments and operations via self-service platforms.

Most activities in this stage are automated, but manual testing activities, such as exploratory or usability testing, might be helpful. In that case, the testers would install the release candidate on a production-like environment using the same deployment automation adopted in the automated tests. Such activities might be part of this stage or run independently from the outer loop to not affect the overall cycle time.

At the end of this stage, the release candidate is proven to be releasable to production and is promoted to the next stage. Overall, the deployment pipeline is only as good as the quality of the tests in both the commit and acceptance stages. The pipeline will not provide the expected feedback and confidence if the tests cannot reliably detect issues.

Production Stage

When a release candidate is promoted to this stage, it is ready to be released to production. This stage includes activities such as deploying the release candidate to the production environment, running smoke tests (validating the system's basic functionality), and monitoring the system for any issues. The goal of this stage is to provide confidence that the change is working as expected in production.
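
A smoke test can be as simple as verifying that the newly deployed service answers on its health endpoint. The following Go sketch assumes a hypothetical health endpoint and an environment variable injected by the pipeline.

```go
package smoke

import (
	"net/http"
	"os"
	"testing"
	"time"
)

func TestServiceIsUp(t *testing.T) {
	// e.g. https://minsalus.example.com/healthz (placeholder URL).
	url := os.Getenv("PRODUCTION_HEALTH_URL")
	if url == "" {
		t.Skip("PRODUCTION_HEALTH_URL not set; skipping smoke test")
	}

	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		t.Fatalf("service unreachable: %v", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		t.Fatalf("expected 200 OK from health endpoint, got %d", resp.StatusCode)
	}
}
```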

An essential benefit of continuous delivery is ensuring our software is always in a releasable state. When to release a specific version then becomes a business decision (not technical), and it can be triggered manually or automatically. If we decide to release automatically to production whenever a release candidate is promoted to this stage, then we would be practicing continuous deployment[13].

This stage is the last one in the deployment pipeline, providing the most value to the business since it's where customers use the software. The feedback from production can inform new requirements, thus closing the overarching loop from idea to production and back.


1.3 The 10 Friction Points of Developer Experience

The daily workflow of an application developer can be full of challenges, creating friction throughout the software development lifecycle. We identify ten main areas that affect the developer experience. We call them "The Ten Friction Points of Developer Experience" (figure 1.5). Let's see if these sound familiar to you or your development teams.

If you're a developer, each friction point might cause you pain. Throughout the book, we'll explore several tools and techniques to help make each point as frictionless as possible.

If you're a platform engineer working in a team that enables developers, consider how to reduce friction in each of these points. After all, developers are the users of your platform. If the experience you provide is not good, they will not use it.

If you're management, why would you want a better experience for your developers? Removing friction from each of these points not only benefits developers but also increases productivity, leads to more compliant and higher-quality results, and creates a working environment where developers want to be, resulting in higher employee retention.

Figure 1.5 Along the path to production, there are 10 main points that can cause friction and toil for developers.

Using these ten friction points, you can evaluate how your development teams rank their development experience. Appendix A includes a checklist to assess your current development experience, discover the most significant pain points, and, for each friction point, find references to the chapters in this book that help you mitigate it.

We know that the cloud native space is constantly evolving, and new projects are popping up every day. Hence, we designed this book and the checklist to focus on friction points and how to mitigate them rather than focusing too much on specific tools. While we will mention concrete tools and solutions to mitigate some of these challenges, we aim to address the issues in a flexible way, where multiple tools can be applied depending on the context and the skills of the teams adopting them.

1.3.1 Kicking Off a New Project

Starting a new project can be a daunting task. Whether it's a new service in an existing system or a brand-new (greenfield) project, you need to consider how to set up a Git repository, which architecture to adopt, which programming language and framework to use, and which conventions to follow. If the project is part of an existing, larger system, there may already be documentation or guidelines to follow, but they are not necessarily up to date or easy to find. If it's a new project, you need to make all these decisions from scratch or follow guidelines from other projects in the organization.

The initial setup of a project and the decisions that need to be made can cause friction for developers, leading to delays in starting the project or making the wrong decisions.

As a developer, you might wonder:

  • I need to build a new service. Where do I begin?
  • Who should I talk to about the architecture?
  • How can I collaborate with my teammates on the new project?
  • How do I set up the new project to comply with the organization's policies?
  • Which programming languages and frameworks should I use?

1.3.2 Setting Up a Development Environment

Whether you've just started a new project or are ready to implement a new feature on an existing one, you need to set up your development environment. That includes installing the necessary tools, configuring your IDE, and setting up the project to run locally. Too often, such configurations are carried out manually based on instructions that might be outdated or incomplete. That can cause friction among developers, leading to inconsistencies between their environments and making it hard to collaborate and reproduce issues.

As a developer, you might wonder:

  • Do I have all the necessary tools installed?
  • Am I using the correct version of the tools?
  • Why am I getting different results than my teammates?
  • I would like to introduce a new tool to my workflow. How can I share it with my team?
  • I work on macOS, but my teammates use Linux/Windows. How can we ensure consistency between our environments?

1.3.3 Making a Change

Given a requirement and a set of acceptance criteria, developers enter the inner loop and start making a change to fulfill the initial requirement. This point can be a source of friction due to many factors. First, the goal is to iterate quickly and get feedback as soon as possible, but that's hard to achieve if it takes a long time to compile and validate the change. Second, the change could benefit from adopting a new library or framework, but that's hard to do if there's no straightforward process to introduce and maintain new technologies. Third, the change might require altering the architecture/infrastructure or introducing a new external integration, such as a database or a machine learning inference service. However, that's challenging if the architecture is not flexible or integration is complex to set up.

As a developer, you might wonder:

  • How long does it take to compile the project the first time?
  • How can I quickly validate my change?
  • How can I introduce a new library or framework to my project?
  • What's the process for introducing a new external integration?
  • Is there a guideline to follow when changing the architecture?

1.3.4 Testing a Change

Writing automated tests for your change is crucial to ensure the code's quality and prevent regressions. Furthermore, it's an essential prerequisite to enable continuous delivery. Whether you're doing test-driven development or writing tests after the implementation, this can be a source of friction caused by several factors.

The goal of the inner loop is to iterate quickly and get feedback as soon as possible, but that's hard to achieve if it takes a long time to run the tests to validate the change. Depending on the project's history, it might be hard to write tests for the change, or the existing tests might be flaky or slow. If the change depends on external services (whether existing or newly introduced), you must also decide whether to mock them or use real services in the tests, and whether those services can run in the development environment.
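
One way to use real dependencies without relying on shared environments is to start them as throwaway containers from the tests themselves. The following sketch assumes Go and the testcontainers-go library; the image, credentials, and database name are placeholders.

```go
package repository

import (
	"context"
	"testing"

	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/wait"
)

func TestAppointmentRepository(t *testing.T) {
	ctx := context.Background()

	// Start a disposable PostgreSQL container for this test run.
	pg, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: testcontainers.ContainerRequest{
			Image:        "postgres:16-alpine",
			ExposedPorts: []string{"5432/tcp"},
			Env: map[string]string{
				"POSTGRES_PASSWORD": "secret",
				"POSTGRES_DB":       "appointments",
			},
			WaitingFor: wait.ForListeningPort("5432/tcp"),
		},
		Started: true,
	})
	if err != nil {
		t.Fatalf("failed to start database container: %v", err)
	}
	t.Cleanup(func() { _ = pg.Terminate(ctx) })

	host, _ := pg.Host(ctx)
	port, _ := pg.MappedPort(ctx, "5432")

	// ... connect to host:port.Port() and exercise the repository under test.
	_ = host
	_ = port
}
```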

As a developer, you might wonder:

  • How long does it take to run the tests?
  • How quick and simple is it to write tests for my change?
  • Should I mock external services to run the tests, or can I use containerized versions of the other services?
  • If I'm forced to use real external services for legacy reasons, how do I access them?
  • Is there any test data I should initialize before running the tests?

1.3.5 Running a Change

By now, you've made a change, written tests for it, and validated it. The next step is to run the change in the development environment to ensure it works as expected. That can be a source of friction due to several factors. Building an executable artifact for the application might be a slow process, especially if a container image needs to be built. In that case, the feedback loop can be long, reducing productivity and possibly increasing frustration.

Furthermore, the change might require an external service to be running, such as a database or a message broker, which might add to the slowness of the process. Even though it would be desirable to establish an automated workflow to run the application and reload it automatically whenever you make a change, the complexity of the setup might make it hard to achieve.

As a developer, you might wonder:

  • How long does it take to build the application?
  • How can I run the application with all the required external services?
  • Can I automate the process of running the application and reloading it when I make a change?
  • Why do I have to wait so long to see the results of my change?
  • Do I have to containerize the application to run it locally?

1.3.6 Debugging a Change

After running the change in the development environment, it may not work as expected. That is the moment when you need to debug the issue. Depending on the development environment setup and the technology stack, this can be a source of friction. If it's required to containerize the application to run it locally, debugging can be challenging.

You might have instrumented the code to produce traces or logs as part of the change. Still, if the logs are not easily accessible or if the traces are not well integrated with the development environment, they are not helpful for debugging.

Some dedicated tools might be needed to debug the application, such as enabling profiling or inspecting the network traffic. Still, they might not be easily accessible or documented if they are not part of the standard development environment.

As a developer, you might wonder:

  • Why is it so difficult to debug my application? And can I debug it if it runs in a container?
  • Are there any recommended tools I can use to debug the application?
  • Are the logs, metrics, and traces easily accessible?
  • How can I profile the application to identify performance bottlenecks?
  • How can I inspect the network traffic to identify communication issues?

1.3.7 Integrating a Change with the Mainline

Once satisfied with a change, you must integrate it with the mainline, closing the inner loop and triggering the outer loop. Several factors contribute to this being a source of friction. When using continuous integration and committing to the mainline at least once per day, integrating the change with the mainline should be as smooth as possible.

If pre-integration reviews are required, there's a risk of the change being blocked or delayed if reviewers are not available. If feature branches are used instead of continuous integration, you might get stuck at this point for several days or weeks, leading to a long feedback loop and slowing down the development process. You might be in a situation where you cannot close your current task but can't fully move on to a new one either because you're waiting for the integration to happen.

As a developer, you might wonder:

  • Can I integrate my change directly into the mainline?
  • What is preventing us from adopting continuous integration?
  • How long does it take to get a review for my change from my team?
  • How can I be productive working on a new task while waiting for the integration?
  • My feature takes two weeks to complete. How can I integrate daily with the mainline?

1.3.8 Validating a Change

The outer loop is triggered whenever a new change is integrated with the mainline. When practicing continuous delivery, the deployment pipeline validates the change and ensures that the mainline is always in a releasable state. That leads to new sources of friction. The commit stage of the pipeline might take a long time to run, leading to a long feedback loop.

If something goes wrong in the pipeline, it might be hard to understand what happened and how to fix it. Sometimes the test environment cannot be easily reproduced, for example when the tests require specific hardware (such as GPUs) or a considerable amount of resources (such as capacity tests). When external services are involved, the setup used in the development environment might not be the same as the one used in the pipeline, leading to inconsistencies, high maintenance costs, and additional complexity.

As a developer, you might wonder:

  • Can I get feedback on my change quickly after integration?
  • How long does it take to run the acceptance stage of the pipeline?
  • How can I understand what went wrong in the pipeline?
  • How can I reproduce the test environment locally?
  • What are the main differences between my development environment and the one used for tests?

1.3.9 Deploying a Change

When it comes to deploying a new release, there can be more sources of friction. The output of the commit stage is an executable artifact (a release candidate). In cloud native scenarios, this artifact will typically be a container image. Depending on the strategy for packaging the application, you might be responsible for the complete containerization process as a developer, or the underlying platform might offer that as a service. Once a new release candidate is built, the acceptance stage deploys the artifact in production-like environments to conduct functional and non-functional acceptance tests.

This deployment process might require developers to provide additional configuration and be exposed to the complexity of the underlying deployment platform. After the acceptance stage, the release candidate is promoted to production, and the deployment should follow the same strategy used in the acceptance stage. However, suppose the two environments are inconsistent or the deployments are performed differently. In that case, developers might not get early feedback about the deployability of their change, leading to delays and frustration.

As a developer, you might wonder:

  • How can I package an application as a container image?
  • Am I responsible for the full containerization process? If not, how do I know that my change doesn’t break the containerization process?
  • Am I supposed to change or provide additional configuration for the deployment?
  • How can I be sure my change doesn't break the deployability of the system?
  • Should I be an expert in containers and Kubernetes to understand how my change will be deployed?

1.3.10 Observing a Change

Once a change is deployed in production, the outer loop is closed, and the change is observed. That is when developers can get feedback on the change from real users and real traffic. More friction can arise at this point. If the change is not instrumented correctly, there will be insufficient telemetry to understand how the change is behaving in production.

Depending on the setup, it might be hard to access the logs, metrics, and traces or to correlate them to understand the impact of the change. That's especially true when a unified observability solution is lacking: incident resolution slows down while teams are busy correlating data from different sources. Developers might also be required to build dashboards and alerts to monitor the change if the platform doesn't offer that as a service, adding to their cognitive load and toil.
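
To give a flavor of what such instrumentation can look like, here is a minimal Go sketch combining structured logging (with the standard library's log/slog) and an OpenTelemetry span; the service name, attributes, and function are assumptions for illustration.

```go
package appointments

import (
	"context"
	"log/slog"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

func BookAppointment(ctx context.Context, patientID, doctorID string) error {
	// Create a span so the booking shows up in distributed traces.
	ctx, span := otel.Tracer("appointment-service").Start(ctx, "BookAppointment")
	defer span.End()
	span.SetAttributes(
		attribute.String("patient.id", patientID),
		attribute.String("doctor.id", doctorID),
	)

	// Structured log entry correlated with the same operation.
	slog.InfoContext(ctx, "booking appointment",
		slog.String("patient_id", patientID),
		slog.String("doctor_id", doctorID),
	)

	// ... booking logic goes here.
	return nil
}
```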

As a developer, you might wonder:

  • How can I instrument the application to collect telemetry?
  • Are production logs, metrics, and traces easily accessible to developers?
  • How can I correlate data from different sources to understand the impact of the change?
  • Do I have to build dashboards and alerts to monitor the change?
  • Why is it hard to understand the impact of the change in production?

In the next section, we'll look at a concrete scenario that we'll use throughout the book to tackle these friction points using different tools and practices.


1.4 MinSalus: Cloud Native Project

Our goal for this book is to provide you with concrete strategies and tools to improve the developer experience throughout the software development lifecycle. We'll use the MinSalus project as a case study to illustrate the concepts and practices we discuss, giving you a real-world use case to follow along with and try out for yourself.

MinSalus is a fictitious project that simulates a healthcare system. In particular, we'll focus on the workflows for patients to book and manage appointments with doctors. We believe that most people can relate to these scenarios, making it easier to understand the underlying technical challenges and solutions. Even if you're not in the healthcare industry, the principles we cover in this book can be applied to any cloud native project.

Remember that all the code used in this book is available on the book's GitHub repository. You can clone the repository and follow along with the examples or use the code as a reference for your own projects. We provide examples in Go and Java, but you can adapt the concepts to any programming language you're comfortable with. We welcome contributions to the repository, so if you find any issues or want to include examples in other languages, feel free to open a pull request.

1.4.1 The Context

Let's start by identifying users and software systems that are part of MinSalus. A patient uses MinSalus to book appointments with doctors and receives updates about schedule changes or follow-up appointments. Figure 1.6 shows a system context diagram for MinSalus and illustrates the notation that we'll use for architecture diagrams throughout the book, based on the C4 model created by Simon Brown. The C4 Model is an approach for visualizing software architecture (https://c4model.com). In the diagram, we can identify two abstractions:

  • Person. It represents one of the human users of the software system. In our example, it's a patient.
  • Software System. It represents the overall system that delivers value to its users. In our example, it's the MinSalus system.

Figure 1.6 The system context diagram for MinSalus, following the C4 model.

A system context diagram is a high-level diagram that shows the interactions between a system and its users, as well as other systems that it interacts with. It targets a broader audience, including non-technical stakeholders. At this stage, we don't consider the internal details of the system or the technologies used to implement it.

If you're a developer about to work on a new feature for MinSalus, the system context diagram is insufficient to understand the system's architecture. Let's zoom into the system to see its internal components and how they interact with each other. We can do this by creating a container diagram (figure 1.7). A container diagram shows the high-level components of a system, their responsibilities, and how they interact. It relies on a new abstraction from the C4 model:

  • Container. It represents an application or data service. In our example, we have several containers: frontend applications, backend applications, databases, event brokers, and other services.

Note

The container abstraction in the C4 model should not be confused with the concept of containers in technologies such as Podman or Docker.

Figure 1.7 The container diagram for MinSalus, following the C4 model.

The Patient Portal serves as the front end of MinSalus, allowing patients to interact with the system. It relies on a session store to manage user sessions. The Appointment Service is the backend application that manages the booking and scheduling of appointments, storing the data in a database. The Patient Service is the backend application that manages patient information and stores the data in a database. The Notification Service updates patients about their appointments, reminding them of upcoming visits or changes in the schedule. An event broker is used to communicate between services, allowing them to be decoupled and scalable.
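
To ground the diagram, here is a hedged Go sketch of how the Appointment Service might publish an event that the Notification Service later consumes; all type names, the topic, and the broker interface are assumptions for illustration rather than the book's actual code.

```go
package appointments

import (
	"context"
	"time"
)

// AppointmentBooked is published when a patient successfully books a slot.
type AppointmentBooked struct {
	AppointmentID string    `json:"appointmentId"`
	PatientID     string    `json:"patientId"`
	DoctorID      string    `json:"doctorId"`
	Slot          time.Time `json:"slot"`
}

// EventPublisher abstracts the event broker so the service stays decoupled
// from the concrete technology (e.g. Kafka, RabbitMQ, NATS).
type EventPublisher interface {
	Publish(ctx context.Context, topic string, event any) error
}

// Service books appointments and notifies the rest of the system via events.
type Service struct {
	publisher EventPublisher
}

func (s *Service) Book(ctx context.Context, booked AppointmentBooked) error {
	// ... persist the appointment in the database, then publish the event so
	// the Notification Service can inform the patient asynchronously.
	return s.publisher.Publish(ctx, "appointments.booked", booked)
}
```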

Figure 1.7 shows a partial view of the MinSalus system. We'll expand on this diagram as we progress through the book, adding more details about the services and their interactions. For example, the system will need an access control service to manage user authentication and authorization and an observability service to collect and analyze telemetry data.

1.4.2 The Process

Imagine you're a developer who has just joined the MinSalus team. On your first day, you receive a task to implement a feature to fulfill a new requirement. What do you do?

In the rest of the book, we'll guide you through the entire process, from receiving a requirement to releasing the feature to production. Step by step, we'll go through inner loop and outer loop activities, moving from development to integration to production environments.

We'll demonstrate several tools and practices that you can use to address the challenges highlighted in the Ten Friction Points of Developer Experience. We'll present different alternatives that you can choose from based on your team's context and constraints. We'll also discuss the trade-offs of each approach, helping you make informed decisions.

Are you ready? Let's start setting up the development environment and improving the inner loop experience. See you in the next chapter!


1.5 Summary

  • The path from idea to value is crucial for delivering software that effectively meets customers' needs. Cloud native technologies are an essential enabler for building modern software solutions. However, they introduce a unique set of challenges that application developers must address.
  • Many organizations require developers to interact with low-level details in Kubernetes and cloud infrastructures, reducing their time to produce more value for their customers and resulting in a substantial cognitive load increase.
  • Distributed systems are complex. When implemented on top of cloud infrastructures, they can substantially impact daily application development workflows, creating friction for developers when building against cloud services in their development environments or needing to run all the necessary dependencies to work on new features.
  • In the past few years, organizations underwent radical changes, trying to catch up with the rise and mainstream adoption of cloud computing. However, organizational transformations don't always succeed.
  • Continuous delivery is a holistic approach to developing and delivering higher-quality software faster, safer, and in a repeatable way. Continuous delivery practices give us well-tested solutions to improve the path from idea to production, but the adoption might be challenging. Unfortunately, many companies still struggle to implement these ideas.
  • As a developer, you might be overwhelmed by the many tools you need to learn and master to complete your daily tasks. Or you might get frustrated due to slow and suboptimal tools. Perhaps your organization has added unnecessary hurdles and constraints that impact your team's performance, making it more challenging to go from idea to value.
  • Good developer experiences don't emerge spontaneously—they must be deliberately designed to align with the software being developed and the toolchain in use. Yet most teams work with an inherited patchwork of tools, each designed with different intentions and constraints. That creates a "Frankenstein Experience": a cobbled-together environment that undermines developer productivity rather than enhancing it.
  • The path from idea to production starts with a clear problem statement and the definition of a requirement for solving it. A requirement could involve creating a new application, adding a new feature to an existing distributed system, fixing a bug in a microservice, or refactoring existing code in a modular monolith.
  • Given a requirement, many activities must be performed to deliver a working software solution. These activities can be grouped into two main categories: the inner loop and the outer loop.
  • The inner loop is where developers spend most of their time. It's the cycle of activities that a developer performs to write, test, run, and debug code. It includes practices such as test-driven development and continuous integration.
  • The outer loop is the cycle of activities after a developer commits a change to the mainline in a version control system until the change is deployed to production and operational. The central part of the outer loop is the deployment pipeline, which is the key pattern in continuous delivery and represents the only path to production.
  • The daily workflow of an application developer can be full of challenges, creating friction throughout the software development lifecycle. The Ten Friction Points of Developer Experience highlight the main areas that affect the developer experience and cause pain.
  • In the C4 model, a system context diagram is a high-level diagram that shows the interactions between a system and its users and other systems that it interacts with. It targets a broader audience, including non-technical stakeholders. At this stage, we don't consider the internal details of the system or the technologies used to implement it.
  • A container diagram shows the high-level components of a system, their responsibilities, and how they interact. In this context, a container is an application or data service.

[1] J. Humble, “There's No Such Thing as a ‘DevOps Team’” October 19, 2012, https://continuousdelivery.com/2012/10/theres-no-such-thing-as-a-devops-team

[2] K. Mugrage, “My definition of DevOps,” December 8, 2020, https://dev.to/kmugrage/my-definition-of-devops-2baj

[3] J. Humble, D. Farley, “Continuous Delivery”, Addison-Wesley Professional, 2010

[4] Cloud Native Computing Foundation, “CNCF Cloud Native Definition v1.1”, https://github.com/cncf/toc/blob/main/DEFINITION.md

[5] F. Fagerholm and J. Münch, “Developer Experience: Concept and Definition”, University of Helsinki, https://researchportal.helsinki.fi/en/publications/developer-experience-concept-and-definition

[6] Test-Driven Development is one of the practices of Extreme Programming (XP), a software development methodology created by Kent Beck and described in his book "Extreme Programming Explained: Embrace Change" (2nd Edition, Addison-Wesley Professional, 2004). Too often, TDD is misunderstood, leading its author to publish an interesting article about what is "Canon TDD" (https://tidyfirst.substack.com/p/canon-tdd).

[7] Continuous Integration is one of the practices of Extreme Programming (XP), a software development methodology created by Kent Beck. You can learn more about it in the book "Continuous Integration: Improving Software Quality and Reducing Risk" by Paul M. Duvall, Steve Matyas, and Andrew Glover (Addison-Wesley Professional, 2007).

[8] Martin Fowler, “Keystone Interface”, https://martinfowler.com/bliki/KeystoneInterface.html.

[9] Mainline integration, pre-integration reviews, and feature branches are practices described by Martin Fowler in his article "Patterns for Managing Source Code Branches" (https://martinfowler.com/articles/branching-patterns.html).

[11] Martin Fowler, "Continuous Integration Certification" (https://martinfowler.com/bliki/ContinuousIntegrationCertification.html).

[12] J. Humble, D. Farley, “Continuous Delivery”, Addison-Wesley Professional, 2010. D. Farley, “Continuous Delivery Pipelines”, 2021.

[13] Continuous deployment is a practice originally suggested by Timothy Fitz in 2009 (http://timothyfitz.com/2009/02/08/continuous-deployment).
