People who are interested in trying Free Software are frequently bewildered by the number of available software distributions. Perhaps doubly so because there seems to be so little difference between them. And compounded further because Free Software development is a cooperative endeavor, so the distributions that represent the best that Free Software has to offer tend to avoid criticizing their peers.

I do not intend to tell you what distribution to use, but I do want to encourage you to think about who you trust, because that is the thing that will vary the most from project to project. I’ll describe some of the things that I look at when I consider distributions, and how that affects my trust in a project.

I’ve been developing software and working in operations-focused roles since the mid 1990s, so expect a lot of software development philosophy to follow.

First of all, what is a distribution?

A distribution is a project that distributes software.

One of the reasons that it might be difficult to select a distribution is that people use that term to describe something other than the project. In particular, people tend to refer to the software itself as the distribution. I think that’s confusing, because the software is largely the same from distribution to distribution. If you install GNOME on Fedora or GNOME on Ubuntu, they are largely the same software, because the software is developed by the GNOME project. It’s merely being distributed by the distributions. (To the extent that there are differences, I think that is a flaw in one or the other release. It is almost never in the best interest of users or the primary developers of a project for a distribution to make significant changes to their software.)

Mature distributions include tens of thousands of applications and libraries. Offering all of those applications and libraries as a single collection makes them easier to discover and easier to install, but I think the most important function is managing updates. Every individual project is free to discontinue a release series, and to start a new release series at any time. And any new release series may or may not be backward compatible with previous releases. Each of the tens of thousands of projects has to be monitored for updates, and each update has to be reviewed to determine how it affects everything else in the collection.

A distribution sits in between tens of thousands of Free Software projects and millions of users in order to turn tens of thousands of update streams into just one update stream.

Users who are trying to select a distribution tend to ask for a distribution that does X or one that does Y, and the question doesn’t make sense, because the functionality they are asking about is typically developed in upstream projects, not in distributions. If the software has that functionality, it’ll be available to users regardless of who delivers the software to them.

In fact, significant development happening in the distribution rather than upstream creates a lot of friction in the process. It makes it less clear to users where they should report bugs. It frustrates upstream developers who get bug reports for software they did not write and do not maintain. And similarly, especially with LTS distributions, it leads to a lot of bug reports upstream for bugs that were fixed long ago, or reports on release series that are no longer maintained.

The best thing that a distribution can do is to bring users and developers closer together, and get out of the way. That means patching as little as possible, and shipping upstream releases without filtering them or delaying them.

Contrary to what you might expect, the less a distribution does, the better it is.

What differentiates distributions from each other?

The purpose of a distribution is to deliver software, and simplify the process of updating it. But there are details that differ from project to project. I’ll run down a list, ordered from least significant to most significant.

  • What is included? This item tends not to vary much from distribution to distribution. We’re all building distributions from the same pool of Free Software, and we’re including as much as we can subject to the time our maintainers have available and our notions of what is useful. There is some variation, though, because some parts of the systems we build are difficult or impossible to change after the system is built. That is, if you build a system with GNU libc, you probably won’t also build and distribute uClibc, because your users can’t exchange one for the other. These differences are relatively uncommon, and typically only affect your decision if you are after a very specific and uncommon project.

  • How is integration managed? Source code often does not transform deterministically into usable software. Most software starts out by discovering features in the environment where it is built, and adapting itself to those features. As a result, the features present in the result of a build are influenced by what other packages were present in the build environment, and often what behaviors were specified on the command line during the build. That means that a maintainer has to make choices about what build dependencies to specify, and what configuration to specify in order to create a binary package with a feature set that’s consistent with expectations, and consistent from build to build. Maintainers need to understand the default behavior of each package, and what their users need from the package in order to make sure that everything within the distribution is integrated well.

  • How much are the defaults changed? Some distributions are trying to create something unique, and others prefer to deliver software to users in the configuration that its developers intended, as much as possible.

  • How much is the software changed? Some distributions apply a large set of patches to the software they distribute, and others adopt a policy of pushing changes to the upstream developers first in order to reduce ongoing maintenance overhead and security risks.

  • What is the distribution’s release cadence? Some distributions, especially those that are oriented toward infrastructure workloads, might release infrequently and support each release for a long term. Those distributions will get new features much less often. Other distributions might release relatively frequently with somewhat shorter support periods. Still others adopt a “rolling” model where there are no distinct releases, just one “current” release that continually receives new features as they’re ready. Many users conclude that they want long-term releases for systems they can set and forget, and I want to caution readers on that point. Most of the projects included in a distribution are not maintained for long-term support. Shipping software to users after support is discontinued by the developers is typically bad for both the developers and the users.

  • Where is the build infrastructure? Some distributions provide a build infrastructure that isn’t directly accessible to the maintainers, while others allow maintainers to build software on their own systems and upload the results. Providing an infrastructure for builds that maintainers can’t directly access helps ensure that binary packages are the result of the source code and the build scripts, with less opportunity for humans to compromise the build process.

  • Where is the source? Community-oriented distributions offer transparency by publishing their build scripts and patches for review. Secure distributions provide shared infrastructure for source code, because that allows them to enforce policies like “protected branches,” which prevent developers from rewriting the history of source code.

  • How is software integrity ensured? Security-minded users want to see things like signed kernels and boot loaders (for Secure Boot), and signed packages. Some distributions sign their packages directly when they are built, while others might sign the metadata when the collection is published. In order to trust signatures, packages should be signed as early as possible after they are built, and neither the build system nor the signing system should be directly accessible to maintainers.

  • How are decisions made? In order to ensure that a distribution addresses the actual needs of its developers and users, decision making processes should be well documented and public.

  • Who uses it? One of the things you may want to consider when selecting a distribution is its user community. When you have questions, a larger community or a more technically experienced community may be better able to answer those questions. From that point of view, you might choose to select a distribution that’s used by mature organizations, or has a large set of known experienced users.

  • Is there a code of conduct, and does it align with your values? Does it encourage the kind of community that you want to be a part of?

  • Is the project sustainable? For many years, we’ve seen notable security events that weren’t the result of flaws in the software, but the result of changes in project membership. If you can take over a project with a large user base, you can ship software to a large user base who wouldn’t voluntarily download and run your software. Sustainability is a critical security concern. When you are selecting a software provider, you want to know not only that you can trust them, but that you can continue trusting them in the future. Large projects with diverse participants tend to be more secure against hostile takeover. Distributions that are derived from other distributions are often the work of much smaller teams who rely on larger projects to do the bulk of the work involved. Those projects might look attractive, but they might also be at greater risk of takeover due to normal turnover among participants.
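To make the integration point above concrete: a maintainer typically pins every optional feature explicitly rather than letting the build auto-detect them from whatever happens to be installed. This is a non-runnable sketch of a hypothetical package build script; the flags shown are illustrative, not from any specific project.

```shell
# Hypothetical build-script fragment: pin each optional feature explicitly
# so every rebuild produces the same feature set, regardless of what else
# is present in the build environment.
./configure \
    --prefix=/usr \
    --with-openssl \
    --without-ldap \
    --disable-experimental
make
```

If the maintainer instead ran a bare `./configure`, the resulting binary could silently gain or lose features whenever the build environment changed.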
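The “protected branches” policy mentioned above can be sketched with stock git: on a shared server-side repository, two standard configuration options refuse pushes that rewrite or delete history. The repository path here is a throwaway created for illustration.

```shell
# Sketch: server-side history protection with standard git options.
# receive.denyNonFastForwards rejects force-pushes that rewrite history;
# receive.denyDeletes rejects deletion of branches.
repo="$(mktemp -d)/central.git"
git init --bare "$repo"
git -C "$repo" config receive.denyNonFastForwards true
git -C "$repo" config receive.denyDeletes true
git -C "$repo" config --get receive.denyNonFastForwards   # prints "true"
```

Hosting platforms layer friendlier interfaces on top of this, but the underlying guarantee is the same: no single developer can quietly replace published history.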
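The signing workflow described above can be sketched with GnuPG: create a signing key, produce a detached signature immediately after the artifact is built, and let anyone verify it against the public key. The key name and file names are illustrative, and a real distribution would run this on a dedicated signing host that maintainers cannot log into; this assumes GnuPG 2.1 or later.

```shell
# Sketch: detached signing and verification with GnuPG.
export GNUPGHOME="$(mktemp -d)"     # throwaway keyring for the example
chmod 700 "$GNUPGHOME"
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-generate-key "builder@example.com" default default never

echo "pretend this is a package" > package.tar    # stand-in build artifact
gpg --batch --yes --local-user "builder@example.com" \
    --detach-sign package.tar                     # writes package.tar.sig

gpg --verify package.tar.sig package.tar          # exits 0 if the signature is good
```

The important property is not the tooling but the separation: the signature is only trustworthy if the signing key never sits on a machine that individual maintainers control.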

Practical examples

If that list seems abstract, and you aren’t sure how to evaluate a project, I’ve described some of those characteristics with respect to Fedora, because that is a system that I understand well.