1 The what and why of Python packages

This chapter covers

  • Packaging code to make it more accessible to others
  • Using packages to make your own projects more manageable
  • Building Python packages for different platforms

Imagine that you’ve written a groundbreaking piece of Python software for use in self-driving cars. Your latest work is going to change the world, and you want as many people using it as possible. You’ve convinced CarCorp to use your solution, and they want to retrieve the code to get started with it.

When CarCorp calls to ask how to install and use your code, you go through all the gory details of copying each file to the right directory, making some files executable so they can be run as commands, and so on. Because you wrote the software, this is all second nature to you. To your surprise, the developers on the other end of the phone are a bit lost. What happened?

You’ve discovered the chasm that often exists between those who create software and those who use it. These days, people are used to visiting the app store on their iPhone when they need something new. You have a bit of work to do if you want to improve the user experience of your software!

In this book, you’ll learn how distributing your Python project as an installable package can make it more accessible to others. You’ll also learn how to create a repeatable process for managing your projects, reducing the effort you’ll spend maintaining them, so you can focus on your real aspiration: to change the world. You’ll do all this by building a real project using some popular packaging tools and automating several aspects of the process. Although the Python community has developed standards for some areas of packaging, the One True Way© of doing things has not yet emerged. Nor may it ever do so!

Even if you’ve created or published a Python package before, you’ll find something in this book for you. The suggestions and tools you’ll learn in this book are time-tested approaches to some of the more loosely defined packaging practices. Python packaging has a messy history and many competing options, so in addition to seeing and using the tools available now, you’ll also learn the methodology behind how they work so that you can keep adapting as the landscape matures. To that end, it’s important to first understand why software is packaged at all.

1.1 What is a package, anyway?

To save your relationship with CarCorp, you promise to come back in a few weeks with an overhauled process that will help them install your software in a snap. You know that some of your favorite Python projects, like pandas and requests, are available as packages online, and you want to provide the same ease of installation to your own consumers.

Packaging is the act of archiving software along with metadata that describes those files. Developers usually create these archives, or packages, with the intent of sharing or publishing them.

Important

The Python ecosystem uses the word package for two distinct concepts. The Python Packaging Authority (PyPA) differentiates the terms in the Python Packaging User Guide (https://packaging.python.org) as follows:

  • Import packages organize multiple Python modules into a directory for discovery purposes (http://mng.bz/wypg).
  • Distribution packages archive Python projects to be published for others to install (http://mng.bz/qoNz).

Import packages aren’t always distributed in an archive, though distribution packages often contain one or more import packages. Distribution packages are the main subject of this book and will be disambiguated from import packages where necessary to avoid confusion.
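As a rough illustration of the first meaning, the sketch below builds a tiny import package in a temporary directory and imports a module from it. The names navigation, routes, and describe are hypothetical and chosen only for this example. Nothing here is archived or published, so no distribution package is involved.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as project_dir:
    # An import package is a directory of modules (hypothetical name: navigation).
    package_dir = Path(project_dir) / "navigation"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    (package_dir / "routes.py").write_text(
        "def describe():\n    return 'route planning lives here'\n"
    )

    # Make the parent directory discoverable by the import system.
    sys.path.insert(0, project_dir)
    importlib.invalidate_caches()
    try:
        routes = importlib.import_module("navigation.routes")
        description = routes.describe()
        package_name = routes.__package__
    finally:
        sys.path.remove(project_dir)

print(package_name, "->", description)
```

The import works only because the directory happens to be on sys.path on this one machine. A distribution package is what lets someone else obtain and install the same directory.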

With a probably infinite number of ways to roll software and its metadata together, how do maintainers and users of that software manage expectations and reduce manual work? That’s where package management systems come in.

1.1.1 Standardizing packaging for automation

Package management systems, or package managers, standardize the archive and metadata format for software packages in a particular domain. Package managers provide tools to help consumers install dependencies at the project, programming language, framework, or operating system level. Most package managers ship with a familiar set of instructions to install, uninstall, or update packages. You may have used some of the following package managers:

Software repositories standardize packaging further by acting as centralized marketplaces to publish and host packages that others can install (see figure 1.1). Many programming language communities provide an official or de facto standard repository for installing packages. PyPI (https://pypi.org), RubyGems (https://rubygems.org/), and Docker Hub (https://hub.docker.com/) are a few popular software repositories.

Figure 1.1 Packages, package managers, and software repositories are all critical to sharing software.

If you own a smartphone, tablet, or desktop computer and you’ve installed apps from an app store, that’s packaging at work. Packages are software bundled together with metadata about that software, and that’s precisely what an app is. Software repositories host software that people can install, and that’s what an app store is.

So, packages are software and metadata rolled together in an agreed-upon format, codified in the relevant package management system. At a more granular level, packages also typically include a way to build the software on a user’s system, or they may provide several prebuilt versions of the software for a variety of target systems.

1.1.2 The contents of a distribution package

Figure 1.2 shows some of the files you might choose to put in a distribution package. Developers often include the source code files in a package, but they can also provide compiled artifacts, test data, and whatever else a consumer or colleague might need. When you distribute a package, your consumers get a one-stop shop for all the pieces they need to get started with your software.

Figure 1.2 A package often includes source code, a makefile for compiling the code, metadata about the code, and instructions for the consumer.

Distributing noncode files is an important capability. Although the code is often the reason to distribute anything in the first place, many users and tools depend on the metadata about the code to differentiate it from other code. Developers usually specify the name of a software project, its creator(s), the license under which it can be reused, and so on in the metadata. Importantly, the metadata often includes the version of the archive to distinguish it from previous and future publications of the project.
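For Python distribution packages specifically, this metadata is stored as plain text in an email-style header format, which the standard library can parse. The following is a minimal sketch, assuming a hypothetical project named car-navigation with made-up field values. A real package would carry more fields than these.

```python
from email.parser import HeaderParser

# Hypothetical metadata for an imaginary project; all values are illustrative.
metadata_text = """\
Metadata-Version: 2.1
Name: car-navigation
Version: 1.2.0
Author: A. Developer
License: MIT
"""

# Parse the header block into a mapping-like message object.
metadata = HeaderParser().parsestr(metadata_text)

# Pull out the fields that identify this particular publication of the project.
summary = {
    field: metadata[field]
    for field in ("Name", "Version", "Author", "License")
}
print(summary)
```

Tools can rely on fields like Name and Version to tell one archive from another without opening the code itself.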

Now that you’re familiar with what goes into a package, you’ll learn how this approach to sharing software solves specific problems in practice.

1.1.3 The challenges of sharing software

Your call with CarCorp is growing tense, and you realize you forgot to have them install all your project’s dependencies first. You back up a few steps and navigate them through the dependency installation. Unfortunately, you forgot to check which version you’ve been using for one of your major dependencies, and the latest version doesn’t seem to work. You walk them through installing each previous version until you finally find one that works. Crisis narrowly averted.

As you develop increasingly complex systems, the effort to make sure you’ve installed the required version of each dependency correctly grows quickly. In the worst cases, you might reach a point where you need two different versions of the same dependency, and they can’t coexist. This is affectionately known as “dependency hell.” Detangling a project from this point can prove challenging.
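The conflict at the heart of dependency hell can be sketched with a few lines of arithmetic on version ranges. This is a simplified model and not how any particular package manager resolves dependencies. Versions are plain tuples, each requirement is a half-open range, and the component names in the comments are hypothetical.

```python
def compatible_range(*ranges):
    """Intersect half-open version ranges [minimum, maximum).

    Returns the shared range, or None when no single version satisfies all.
    """
    lowest = max(minimum for minimum, _ in ranges)
    highest = min(maximum for _, maximum in ranges)
    return (lowest, highest) if lowest < highest else None


# Two hypothetical parts of a project that need the same dependency.
perception_needs = ((1, 0), (2, 0))  # at least 1.0, below 2.0
navigation_needs = ((2, 0), (3, 0))  # at least 2.0, below 3.0

# No version is both below 2.0 and at least 2.0, so the ranges cannot coexist.
conflict = compatible_range(perception_needs, navigation_needs)

# Loosening one requirement leaves room for a shared version.
relaxed_navigation_needs = ((1, 4), (3, 0))  # at least 1.4, below 3.0
shared = compatible_range(perception_needs, relaxed_navigation_needs)

print(conflict, shared)
```

With only two requirements the conflict is easy to spot by eye. The difficulty described above comes from the number of requirements growing with the size of the system.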

Even if you never run into dependency hell, without a standardized approach to packaging it can be difficult to share software so that anyone, anywhere, knows which dependencies they need to install for your project. Software communities create conventions and standards for managing packages, codifying those practices into the package management systems you use to get your work done.

Now that you understand why packaging is good for sharing software, read on to learn about some of the advantages that packaging can provide even if you aren’t always making your software publicly available.

1.2 How packaging helps you

If you’re new to packaging, it may seem so far like it’s mainly useful for sharing software with people across the globe. Although that’s certainly a good reason to package your code, you may also like some of the following benefits that packaging brings when developing software:

  • Stronger cohesion and encapsulation
  • Clearer definition of ownership
  • Looser coupling between areas of the code
  • More opportunity for composition

The following sections cover these benefits in detail.

1.2.1 Enforcing cohesion and encapsulation through packaging

A particular area of code should generally have one job. Cohesion measures how dutifully the code sticks to that job. The more stray functionality is floating around, the less cohesive the code is.

You’ve probably used functions, classes, modules, and import packages to organize your Python code (see Dane Hillard, “The Hierarchy of Separation in Python,” Practices of the Python Pro, Manning Publications, 2020, pp. 25–39, http://mng.bz/m2N0). These constructs each place a kind of named boundary around areas of code that have a particular job. When done well, naming communicates to developers what belongs inside the boundary and, importantly, what doesn’t.

Despite best efforts, names and people are rarely perfect. If you put all your Python code in a single application, chances are some code will eventually seep into areas it doesn’t belong. Think about some of the larger projects you’ve developed. How many times did you create a utils.py or helpers.py module containing a grab bag of functionality? The boundaries you create with a function or a module are readily overcome. These “utility” areas of the code tend to attract new “utilities,” with the cohesion trending down over time.

Imagine that your self-driving car system can use lidar (https://oceanservice.noaa.gov/facts/lidar.html) as one type of input. CarCorp’s vehicles don’t include lidar sensors. Being the diligent developer you are, you create a lidar-specific part of the code base to separate it from other concerns. Although assessing naming and regularly refactoring the code base can keep cohesion higher, it’s also a maintenance burden. Distribution packages increase the barrier to adding code where it may not belong in the first place. Because updating a package necessitates going through a cycle of packaging, publishing, and installing the update, it prompts developers to think more deeply about the changes they make. You will be less likely to add code to a package without explicit intent that’s worth the investment of the update cycle.

Creating cohesion and packaging a cohesive area of code is a gateway into encapsulation. Encapsulation helps you build the right expectations with your consumers about how to interact with your code by defining whether and how the code’s behavior is exposed. Think of a project you built and shared with someone to use. Now think about how many times you changed your code, and how many times they had to change their code in turn. How frustrating was it for them? How about for you? Encapsulation can reduce this kind of churn by defining an API contract that’s less subject to change. Figure 1.3 shows how you might create multiple packages out of cohesive areas of code.

Figure 1.3 Packaging can reduce unexpected interdependence between areas of code by introducing stronger boundaries.

You might’ve felt frustration in the past when you found that a piece of code meant only for use internal to a module was being used widely throughout the code. Each time you update that “internal” code, you need to update usages elsewhere. This high-churn environment can lead to bugs when you don’t propagate a change everywhere, leaving you or your team that much less productive.
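Within a single code base, Python’s main tools for marking code as internal are conventions: a leading underscore on a name and an `__all__` list in a module. The sketch below shows both, under the assumption of a hypothetical module named lidar_tools. The module is built in memory only so that the example is self-contained; in practice it would be an ordinary file.

```python
import sys
import types

source = '''
__all__ = ["scan"]  # the intentional, public API


def scan(points):
    """Public: keep only the readings that pass the internal filter."""
    return [point for point in points if _in_range(point)]


def _in_range(point, limit=100):
    """Internal helper: the leading underscore asks callers not to rely on it."""
    return 0 <= point <= limit
'''

# Build the hypothetical module in memory and register it for import.
lidar_tools = types.ModuleType("lidar_tools")
exec(source, lidar_tools.__dict__)
sys.modules["lidar_tools"] = lidar_tools

# A consumer doing a star-import receives only the names listed in __all__.
consumer_namespace = {}
exec("from lidar_tools import *", consumer_namespace)
exported = sorted(
    name for name in consumer_namespace if not name.startswith("__")
)

readings = lidar_tools.scan([5, 150, -1, 100])
print(exported, readings)
```

These conventions signal intent but don’t prevent anyone from importing lidar_tools._in_range directly, which is how “internal” code ends up used throughout a code base.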

Well-encapsulated, highly cohesive code will change rarely, even when used widely. This kind of code is sometimes labeled “mature.” Mature code is a great candidate for distributing as a package because you won’t need to republish it frequently. You can get a start in packaging by extracting some of the more mature code from your code base and then use what you know about cohesion and encapsulation to bring less mature code up to snuff.

1.2.2 Promoting clear ownership of code

Teams benefit from clear ownership over areas of code. Ownership often goes beyond maintaining the behavior of the code itself. Teams build automation to streamline unit testing, deployment, integration testing, performance testing, and more. That’s a lot of plates to keep spinning at once. Keeping the scope of a bounded area of code small so that a team can own all these aspects will ensure the code’s longevity. Packaging is one tool for managing scope.

The encapsulation you create through packaging code enables you to develop automation independent of other code. As an example, automation for a code base with little structure may require you to write conditional logic to determine which tests to run based on which files changed. Alternatively, you might run all the tests for every change, which can be slow. Creating packages that you can test and publish independently of other code will result in clearer mappings from source code to test code to publication code (see figure 1.4).

Figure 1.4 Teams can take full ownership over individual packages, defining how they want to manage the development, testing, and publishing life cycle.

A clear delineation of purpose for a package makes it likelier to have a clear delineation of ownership. If a team isn’t sure what they’re committing to by taking ownership of some code, they’re going to be wary. Try providing a package with a clear scope, story, and operator’s manual to see how the mood shifts.

1.2.3 Decoupling implementation from usage

You may have heard the term loose coupling used to describe the level of interdependence between areas of code.

Definition

Coupling is a measure of the interdependence between areas of code. Loosely coupled code provides multiple avenues of flexibility so you can implement and choose from a variety of execution strategies instead of being forced down a particular path. Two pieces of code with low coupling have little or no dependence on each other, and they can be changed at different rates.

The cohesion and encapsulation practices you read about earlier in this chapter are a way to reduce the likelihood of tight coupling due to poor code organization. Highly cohesive code will have tight coupling within itself and loose coupling to anything outside its boundary. Encapsulation exposes an intentional API, limiting any coupling to that API. Your choices about packaging and encapsulation, then, help you decouple your consumers from implementation details in your code. Packaging also makes it possible to decouple consumers from implementation through versioning, namespacing, and even the programming language in which software is written.

In a big ball of mud, you’re stuck running whatever code is in each module. If you or someone on your team updates a module, all code using that module needs to accommodate the change immediately. If the update changes a call signature or a return value, it may have a wide blast radius. Packaging significantly reduces this restriction (see figure 1.5).

Figure 1.5 Packaging provides flexibility so two areas of code can evolve at different rates.

Imagine if each update to the requests package required you to react immediately by updating your own code. That would be a nightmare! Because packages version the code they contain, and because consumers can specify which version they want to install, a package can be updated many times without impacting consuming code. Developers can choose precisely when to incur the effort of updating their code to accommodate a change in a more recent version of the package.
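A simplified sketch of that idea follows. The release numbers are hypothetical, versions are modeled as plain tuples, and the selection rule (newest release below an upper bound) is an assumption made for illustration. It is not a description of how any specific installer chooses versions.

```python
def newest_allowed(releases, below):
    """Return the newest release that is strictly older than the bound."""
    return max(release for release in releases if release < below)


# Hypothetical releases of a dependency, published over time.
releases = [(2, 27, 0), (2, 31, 0), (3, 0, 0)]

# The consumer has only verified their code against the 2.x series.
installed_now = newest_allowed(releases, below=(3, 0, 0))

# Later, once their code accommodates the changes, they raise the bound.
installed_after_upgrade = newest_allowed(releases, below=(4, 0, 0))

print(installed_now, installed_after_upgrade)
```

The publication of 3.0.0 changes nothing for the consumer until they decide to move the bound, which is the decoupling that versioning provides.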

Another point at which you can decouple code is namespacing. Namespaces attach values and behavior to human-readable names. When you install a package, you make it available at the namespace it specifies. As an example, the requests package is available in the requests namespace.

Different packages can have the same namespace. This means they could conflict if you install more than one of them, but it also makes something interesting possible: packages can act as full alternatives to one another. If a developer creates an alternative to a popular package that’s faster, safer, or more maintainable, you can install it in place of the original as long as the API is the same. As an example, several packages provide roughly equivalent MySQL (https://www.mysql.com) client functionality; specifically, they each implement some level of compatibility with PEP 249 (https://www.python.org/dev/peps/pep-0249/).
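PEP 249 compatibility is what makes this kind of swap workable, because consumer code written against the shared API doesn’t need to know which package sits behind it. The sketch below uses the standard library’s sqlite3 module, which also implements PEP 249, as a stand-in for a MySQL client so that it runs without a database server. The vehicles table and the fetch_vehicle_names function are hypothetical.

```python
import sqlite3


def fetch_vehicle_names(connection):
    """Read names using only the connection and cursor calls from PEP 249."""
    cursor = connection.cursor()
    try:
        cursor.execute("SELECT name FROM vehicles ORDER BY name")
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()


# sqlite3 stands in for any PEP 249 client; only this setup is specific to it.
connection = sqlite3.connect(":memory:")
setup_cursor = connection.cursor()
setup_cursor.execute("CREATE TABLE vehicles (name TEXT)")
setup_cursor.execute("INSERT INTO vehicles (name) VALUES ('sedan')")
setup_cursor.execute("INSERT INTO vehicles (name) VALUES ('coupe')")
connection.commit()
setup_cursor.close()

names = fetch_vehicle_names(connection)
connection.close()
print(names)
```

The function itself never mentions sqlite3. One caveat is that PEP 249 allows each package to choose its own placeholder style for query parameters, so queries that take parameters may still need adjusting when switching packages.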

Finally, Python packaging can even decouple usage in Python from the language in which a package is written! Many Python packages are written in C and even Fortran for improved performance or integration with legacy systems. Package authors can provide precompiled versions of these packages alongside versions that can be built from source by the consumer if needed. This also makes packages more portable, decoupling developers somewhat from the details of the computer or server they’re using. You’ll learn more about packaging build targets in chapter 3.

You might like to package some of your code to experiment with the freedom of version decoupling and see how your versioned packages evolve over time. Those that change quickly may point to low cohesion because the code has many reasons to change. On the other hand, rapid change may indicate only that the code is still maturing. At the very least, these data points will be observable! You’ll learn more about versioning in chapter 9.

1.2.4 Filling roles by composing small packages

The act of extracting code into multiple packages is a bit like decomposition. Successful decomposition requires a good handle on loose coupling. Decomposing code is an art that separates pieces of code so they can be recombined in new ways (for a wonderfully concise rundown of decomposition and coupling, see Josh Justice, “Breaking Up Is Hard to Do: How to Decompose Your Code,” Big Nerd Ranch, http://mng.bz/5mpq).

By packaging smaller areas of your code, you’ll start to identify code that accomplishes a very specific goal that can be generalized or broadened to fulfill a role. As an example, you can create one-off HTTP requests using a built-in Python utility like urllib.request.urlopen. Once you’ve done this a few times, you can see commonalities between the use cases and generalize the concept into a higher-level utility. So the requests package isn’t built to make just one specific HTTP request; it fills a general role as an HTTP client. Some of your code may be very specific now, but as you find new areas where you need similar behavior, you may see an opportunity to identify the role it fills, generalize a bit, and create a package that can fill that role.
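As a small sketch of that first step from one-off code toward a reusable role, the helper below wraps the repeated details of a JSON POST request using only the standard library. The function name build_json_post, the URL, and the payload are hypothetical, and the request is built but never sent so that the example runs offline.

```python
import json
import urllib.request


def build_json_post(url, payload):
    """Build (but do not send) a POST request that carries a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Every caller that once repeated these details can now share one helper.
route_request = build_json_post(
    "https://example.com/api/routes",
    {"origin": "A", "destination": "B"},
)

# Sending it would be urllib.request.urlopen(route_request); omitted here.
print(route_request.get_method(), route_request.full_url)
```

A helper like this is still far from a general HTTP client, but collecting a few such helpers is often how the outline of a role starts to appear.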

As you work on revamping your software for CarCorp, you remember that a major portion of the code deals with the car’s navigation systems. You realize that with a bit of tweaking, the navigation code will also work for Acme Auto’s vehicles. This code could fill the role of communicating with vehicle navigation systems. Because you’ve learned that packages can depend on other packages, and because your navigation system code is already fairly cohesive, you commit yourself to creating not one but two packages before your next CarCorp meeting.

Thinking about composition and decomposition highlights the fact that distribution packages can exist at any size, just as functions, classes, modules, and import packages do. Look to cohesion and decoupling as guiding lights to strike the right balance. One hundred distribution packages that each provide a single function would be a maintenance burden, and one distribution package that provides one hundred import packages would be about the same as having no package at all. If all else fails, always ask yourself, “What role do I want this code to fill?”

Now that you’ve learned that packaging can help you write cohesive, loosely coupled code with clear ownership that you can deliver to consumers in an accessible way, I hope you’re rolling up your sleeves to dive into the details.

Summary

  • Packages archive software files and metadata about the software, such as the name, creator, license, and version.
  • Package managers automate installing packages and managing the interdependencies between them.
  • The packaging process has a number of pitfalls that can be overcome with tools and a repeatable process.
  • Software repositories host published packages for others to install.
  • Packaging is a great way to separate and encapsulate code with high cohesion.
  • Packaging can be used as a decoupling tool to gain flexibility in developing and maintaining code.
  • Versioned packages are a great way to reduce churn across the code base for each individual update.