<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://gordonmessmer.codeberg.page/dev-blog/feed.xml" rel="self" type="application/atom+xml" /><link href="https://gordonmessmer.codeberg.page/dev-blog/" rel="alternate" type="text/html" /><updated>2026-05-04T02:13:02-05:00</updated><id>https://gordonmessmer.codeberg.page/dev-blog/feed.xml</id><title type="html">WhtsThsWeirdMsg</title><subtitle>Writing about Linux, packaging, development, and systems engineering</subtitle><author><name>Gordon Messmer</name></author><entry><title type="html">Fedora Is Not a Product</title><link href="https://gordonmessmer.codeberg.page/dev-blog/2026/04/28/fedora-is-not-a-product.html" rel="alternate" type="text/html" title="Fedora Is Not a Product" /><published>2026-04-28T00:00:00-05:00</published><updated>2026-04-28T00:00:00-05:00</updated><id>https://gordonmessmer.codeberg.page/dev-blog/2026/04/28/fedora-is-not-a-product</id><content type="html" xml:base="https://gordonmessmer.codeberg.page/dev-blog/2026/04/28/fedora-is-not-a-product.html"><![CDATA[<p>This is going to sound crazy, but I believe that the purpose of a
distribution is to distribute software.</p>

<p>Hear me out…</p>

<details>

<summary>TL;DR</summary>

Synchronizing component lifecycles reduces complexity in a collection,
which benefits collections that are productized. Counter-intuitively,
it is harmful for collections that are not products. Modularity was
one of a series of mechanisms to support asynchronous work in Fedora,
illustrating that asynchronous work is something that developers
need. One way that Fedora could attract developers may be to provide a
federated, language-agnostic source code registry to help them build
and distribute software.

</details>
<p><br /></p>

<p>Reading <a href="https://docs.fedoraproject.org/en-US/modularity/history/">the
history</a> of
a now-defunct Fedora feature called “Modularity,” one passage in
particular caught my attention. The document described “a classic
problem that Linux distributions have faced: the ‘Too Fast/Too Slow’
problem.”</p>

<p>I think the problem is not that some components move too fast or too
slowly; it’s that independent developers work asynchronously,
and distributions try to synchronize them. That might sound like a
trivial distinction, but I think understanding the difference is key
to making the distribution more scalable and expanding the developer
community.</p>

<p>Let’s consider the example of a synchronous system.</p>

<p>Red Hat Enterprise Linux is a system that (within several categories)
synchronizes the lifecycle of its components. That’s one of the things
that Red Hat is selling. Rather than seeking support from thousands of
projects individually, customers have one vendor to work with.  Rather
than thousands of individual release cadences and maintenance windows,
customers have one cadence and maintenance window to which thousands
of components have been synchronized.</p>

<p>Synchronizing those lifecycles is expensive. Red Hat engineers take on
the responsibility of fixing bugs and security issues for the
components they support for years past the end of upstream
maintenance. Even with thousands of professional developers, Red Hat
Enterprise Linux supports a far smaller set of components than Fedora
does.</p>

<p>This process is one of the things that makes Red Hat Enterprise Linux
a product. The lifecycle and feature set are tuned to a specific
purpose, making them useful for a specific market segment.</p>

<p>In many ways, Fedora exists in contrast to that model. Rather than a
small, focused set of features, Fedora tries to distribute as much
software as possible. Rather than a large professional staff of
engineers, much of Fedora is maintained by volunteers. Rather than
unifying lifecycles, Fedora uses a rapid release cadence, a fairly
short maintenance window, and in some cases rolling release packages
to minimize the amount of downstream synchronization necessary to ship
a collection of components while their lifecycles naturally overlap.</p>

<p>The purpose of productizing a release is to allow an organization to
act as a vendor in place of upstream projects. It makes the
organization a middleman, standing between users and the
upstream projects, so that its customers have one support contract
to maintain. Fedora is not that. Fedora should do the opposite of
that. As a community project, Fedora should <em>minimize</em> the separation
between users and upstream projects. Fedora should bring those people
together.</p>

<p>Still, many Fedora policies imitate the superficial aspects of RHEL.
A new package must be “the latest version” (as if there is only one).
For most languages, components must not bundle their dependencies.
Each component will provide only one version unless extraordinary
steps are taken. Branches in the dist-git repositories represent
Fedora releases, not the upstream project’s release series. These
policies create something that’s product-shaped, but lacking the
staffing that allows RHEL to function as a product.</p>

<p>Those policies contrast with the world of package registries and
direct publishing channels used by developers outside the realm of
software distributions. Outside of distributions, developers are free
to publish multiple simultaneous release streams under one component
name (the same way that Fedora ships multiple simultaneous releases of
the distribution). Developers can readily use a version from any
supported release series of their dependencies. Developers are free to
bundle dependencies, for better or worse… Some developers use this
to avoid tracking updates, but others bundle to allow tighter
integration and faster delivery of features developed in the dependent
project. Strictly prohibiting bundling in Fedora ignores the value of
collaboration in upstream projects, and actually creates silos which
do not naturally exist in the Free Software model and which make small
projects less sustainable.</p>

<p>Many of the things that are difficult in Fedora are difficult because
we try to synchronize a massive collection of packages, like RHEL,
rather than enabling developers to work asynchronously, like a package
registry. Synchronization is difficult even for professional
developers, but it’s reasonable when creating a product and taking on
the maintainer role. Synchronizing the collection isn’t just difficult
for volunteer packagers; it’s harmful when responsibility for fixing
bugs remains with the upstream projects. Even with a rapid release
cycle and a short maintenance window, Fedora maintainers do have to
synchronize some packages. In some cases they approach that problem by
modifying a project’s dependency information to allow it to build
against a dependency version that the developers haven’t tested. That
not only annoys developers by periodically creating bug reports for
what is effectively a fork; the presence of that patch in Fedora also
makes it more difficult to automate future patch-level updates. When
maintainers don’t modify a package to synchronize it with the
dependencies in the release, they may need to create a duplicate
version of a package for the version they need. At best that process
involves tickets, and typically it involves asking the owner of the
dependency to do the work. All of this is redundant work, unnecessary
when developers are empowered to publish their own releases.</p>

<p>One of the problems that modularity solved was allowing packages and
sets of packages to evolve asynchronously with the rest of the
distribution. That’s necessary for lots of non-trivial software, where
there are deep integrations between software components. That deep
integration often indicates collaboration between projects, and we
should <em>encourage</em> that. Fedora’s policies against bundling packages
sometimes make project collaboration more difficult.</p>

<p>For example, vllm needs a very recent PyTorch but a slightly older
Python 3. That’s not impossible in Fedora. A maintainer could request
a fork of PyTorch.  But there is a lot of friction. In particular, if
the maintainer of the application wants a branch of a dependency owned
by someone else, and if that maintainer doesn’t want to manage another
package, it may be difficult to move forward.</p>

<p>Illustrations might help explain why Fedora’s synchronization makes
its repositories less useful to developers, and how a minor change could
make them significantly more useful.</p>

<p>In upstream projects with stable releases, it’s common to use branches
to represent each release series.</p>

<p><img src="/dev-blog/images/png/release-branches.drawio.png" alt="Diagram of upstream release branches" /></p>

<p>In some cases, especially for projects that have a cadence similar to
Fedora’s, Fedora’s release branches follow very similar paths.</p>

<p><img src="/dev-blog/images/png/distro-branches.drawio.png" alt="Diagram of Fedora release branches" /></p>

<p>In other cases, Fedora’s release branches might rebase from one
upstream release series to another. Or, Fedora might not have a branch
representing an upstream release series at all, because no Fedora
release is ready to use it. In either case, Fedora’s repositories
don’t lend themselves to reuse as a general purpose source code
registry because developers can’t reliably select a release series to
follow without synchronizing to a Fedora release.</p>

<p><img src="/dev-blog/images/png/distro-rebase.drawio.png" alt="Diagram of Fedora release branches after a component has been rebased" /></p>

<p>If Fedora’s dist-git repos provided branches that represented upstream
release series, developers could define a complex build, without being
limited to one centralized package registry (like PyPI), and without
writing long shell scripts to do it.</p>

<p>This hypothetical example might use Fedora 44 packages where no
specific branch is called for, but build specific releases of Python3,
PyTorch, and vllm, and the resulting RPMs could then be used to create
a container image in which some of those RPMs are installed.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>base: fedora-44-x86_64
build:
  - type: dist-git
    url: https://src.fedoraproject.org/rpms/
    packages:
      - python3.11:release-3.11
      - python-torch:release-2.11

  - type: git
    url: https://github.com/vllm-project/
    packages:
      - vllm:releases/v0.20.0

install:
  type: container
  base_image: registry.fedoraproject.org/fedora:44
  tag: vllm:latest
  registry: quay.io/vllm
  packages:
    - vllm
</code></pre></div></div>

<p>Minor changes to Fedora’s dist-git branching could transform Fedora
from a self-contained software distribution into the central hub in a
federated, language-agnostic package registry, providing an essential
tool that addresses the needs of an underserved developer community.</p>]]></content><author><name>Gordon Messmer</name></author><summary type="html"><![CDATA[This is going to sound crazy, but I believe that the purpose of a distribution is to distribute software.]]></summary></entry><entry><title type="html">Kernel packaging</title><link href="https://gordonmessmer.codeberg.page/dev-blog/2026/04/17/kernel-packaging.html" rel="alternate" type="text/html" title="Kernel packaging" /><published>2026-04-17T00:00:00-05:00</published><updated>2026-04-17T00:00:00-05:00</updated><id>https://gordonmessmer.codeberg.page/dev-blog/2026/04/17/kernel-packaging</id><content type="html" xml:base="https://gordonmessmer.codeberg.page/dev-blog/2026/04/17/kernel-packaging.html"><![CDATA[<h2 id="tldr">TL;DR</h2>

<p>If you are interested in building alternate kernel packages or
kernel module packages for Fedora, or if you’re interested in testing
alternate kernels or kernel modules on Fedora, let’s chat.</p>

<h2 id="fedoras-kernel-works-well-if-you-dont-need-out-of-tree-drivers">Fedora’s kernel works well if you don’t need out of tree drivers</h2>

<p>When I look through user forums, one of the problems I see described
most frequently is a blank screen on boot. Sometimes this is a
first-time setup and the user hasn’t followed all of the documented
installation steps, or they haven’t followed the correct list of
steps. (In theory, this should be a one-click operation for users that
have enabled third party repos, but the last time I checked that
doesn’t actually work because rpmfusion only provides AppStream data
for their primary repos, not for the NVIDIA driver repo that users can
enable as a third party repo.) Sometimes they’ve rebooted the system
without waiting for the invisible background process of building the
display driver module and the kernel initramfs to complete. Sometimes
the build fails and the user gets no indication
why; maybe the disk ran out of space.</p>

<p>Whatever the cause of the problem, I think that this is one of the top
reasons that Fedora is often not rated as a “beginner-friendly”
distribution.</p>

<p>Fedora’s policies prohibit alternate kernels, and packaging kernel
modules. I suspect that this is driven at least in part by
requirements imposed by the agreements under which their Secure Boot
signing keys are signed by the UEFI third-party signing CA. That’s all
perfectly reasonable, but it’s also a barrier to any kind of
experimentation or improvement.</p>

<p>I am convinced that Fedora users need pre-built kernel modules. As an
SRE, I believe that reliable systems build code, test the build, and
then deploy the tested build. Systems like akmods and dkms deploy
the code first, then build it in place and “test in prod.” It is
inevitable that such systems will fail regularly.</p>

<p>The Nova driver will eventually resolve this problem for most NVIDIA
users, but there will continue to be users who want out-of-tree drivers
for ZFS, VirtualBox, WiFi drivers that haven’t merged yet, etc.</p>

<h2 id="ready-to-run-signing-infrastructure">Ready-to-run signing infrastructure</h2>

<p>There’s no shortage of information about how to sign code with pesign,
but the guides are not always easy to follow. Some don’t actually work on
contemporary releases. Some are hardware specific.</p>

<p>The best way to promote a process is to make it as easy as possible.
If a process can simply be “fork and build”, it’s much more likely to
be adopted and deployed. I’ve developed a Terraform project that you
can fork and build to deploy a VPC in AWS in which a forgejo-runner
has an HSM with code signing certificates. Users who install the
signing certificate in their MOK can use kernels and kernel modules
produced on this infrastructure.</p>

<p>Below, you can find the Terraform project, a kernel RPM, a kernel
module RPM, an Atomic desktop configuration, and an Atomic desktop
container image, all of which can serve as starting points for
further development:</p>

<ul>
  <li><a href="https://codeberg.org/orb-project/signed-code-build-stack">https://codeberg.org/orb-project/signed-code-build-stack</a></li>
  <li><a href="https://codeberg.org/gordonmessmer/kernel-longterm/releases">https://codeberg.org/gordonmessmer/kernel-longterm/releases</a></li>
  <li><a href="https://codeberg.org/gordonmessmer/nvidia-open-kmod/releases">https://codeberg.org/gordonmessmer/nvidia-open-kmod/releases</a></li>
  <li><a href="https://codeberg.org/gordonmessmer/kernel-longterm-yumrepo">https://codeberg.org/gordonmessmer/kernel-longterm-yumrepo</a> (<a href="https://k-build-yum-dev.s3.us-west-2.amazonaws.com/">S3</a>)</li>
  <li><a href="https://copr.fedorainfracloud.org/coprs/gordonmessmer/kernel-longterm-6.18-plus/">https://copr.fedorainfracloud.org/coprs/gordonmessmer/kernel-longterm-6.18-plus/</a></li>
  <li><a href="https://pagure.io/fork/gordonmessmer/workstation-ostree-config">https://pagure.io/fork/gordonmessmer/workstation-ostree-config</a></li>
  <li><a href="https://quay.io/repository/gordonmessmer/atomic-desktop/silverblue">https://quay.io/repository/gordonmessmer/atomic-desktop/silverblue</a></li>
</ul>

<p>If you’ve installed an Atomic desktop, you can try the Fedora Remix:</p>

<p><code class="language-plaintext highlighter-rouge">sudo rpm-ostree rebase ostree-unverified-image:registry:quay.io/gordonmessmer/atomic-desktop/silverblue:43.20260411.0</code></p>

<h2 id="various-guides-to-signing-code-for-secure-boot">Various guides to signing code for Secure Boot</h2>

<ul>
  <li><a href="https://fedoraproject.org/wiki/User:Pjones/SecureBootSmartCardDeployment">https://fedoraproject.org/wiki/User:Pjones/SecureBootSmartCardDeployment</a> : Peter Jones described how to set up signing infrastructure for Fedora systems</li>
  <li><a href="https://forge.fedoraproject.org/infra/ansible/src/branch/main/playbooks/groups/buildhw.yml">https://forge.fedoraproject.org/infra/ansible/src/branch/main/playbooks/groups/buildhw.yml</a> : Fedora’s infrastructure playbooks describe its signing setup</li>
  <li><a href="https://forge.fedoraproject.org/infra/ansible/src/branch/main/roles/bkernel/tasks/main.yml">https://forge.fedoraproject.org/infra/ansible/src/branch/main/roles/bkernel/tasks/main.yml</a></li>
  <li><a href="https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/signing-a-kernel-and-modules-for-secure-boot_managing-monitoring-and-updating-the-kernel">https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/signing-a-kernel-and-modules-for-secure-boot_managing-monitoring-and-updating-the-kernel</a> - Red Hat documents signing kernels and modules</li>
  <li><a href="https://wiki.almalinux.org/development/private-keys/secure-boot.html">https://wiki.almalinux.org/development/private-keys/secure-boot.html</a> : AlmaLinux documents their setup</li>
  <li><a href="https://gist.github.com/chenxiaolong/520914b191f17194a0acdc0e03122e63">https://gist.github.com/chenxiaolong/520914b191f17194a0acdc0e03122e63</a> : Building Fedora RPMs that use pesign</li>
  <li><a href="https://gist.github.com/joostd/ac44db2d4e8e9bdbdde7cdab5c05c0fb">https://gist.github.com/joostd/ac44db2d4e8e9bdbdde7cdab5c05c0fb</a> : Signing EFI images with keys generated on a YubiHSM 2 device</li>
  <li><a href="https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-User-Documentation">https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-User-Documentation</a> : EDK II User Documentation includes “Signing UEFI Images.pdf” (V1.31), which describes how to sign UEFI images for the development and test of UEFI Secure Boot</li>
</ul>]]></content><author><name>Gordon Messmer</name></author><summary type="html"><![CDATA[TL;DR]]></summary></entry><entry><title type="html">Building a Robust Code Signing Infrastructure with AWS KMS</title><link href="https://gordonmessmer.codeberg.page/dev-blog/2026/04/10/signed-code-stack.html" rel="alternate" type="text/html" title="Building a Robust Code Signing Infrastructure with AWS KMS" /><published>2026-04-10T00:00:00-05:00</published><updated>2026-04-10T00:00:00-05:00</updated><id>https://gordonmessmer.codeberg.page/dev-blog/2026/04/10/signed-code-stack</id><content type="html" xml:base="https://gordonmessmer.codeberg.page/dev-blog/2026/04/10/signed-code-stack.html"><![CDATA[<p><img src="/dev-blog/images/png/kernel-build.drawio.png" alt="Code Signing Infrastructure" /></p>

<h2 id="motivation">Motivation</h2>

<p>One of the reasons that I often see other systems recommended to new
users is that the software required to support a large number of
common devices is harder to set up and more troublesome to maintain on
Fedora than it is on other systems.</p>

<p>That makes sense to me. I’m not a huge fan of DKMS/akmods. With an SRE
background, I tend to believe that software should follow a
build-&gt;test-&gt;deploy sequence, but installation from source (as in
DKMS/akmods) is more of a deploy-&gt;build-&gt;test sequence. If the build
process is disrupted (e.g.  power loss, reboot during the silent
background build, running out of free disk space, etc.), or if tests
fail, recovery options are limited because the software has already
been deployed. That’s not a recipe for reliability.</p>

<p>Moreover, I advocate the use of Secure Boot. While there are systems
in place to support local builds on a host that has Secure Boot
enabled, they require the signing key to be located on the host, and
usable by automated processes. If your package manager can build and
sign kernel modules, then a rootkit can do the same thing. These
systems defeat the purpose of the Secure Boot system.</p>

<p>I want:</p>
<ul>
  <li>a system that can build and sign code for use with Secure Boot</li>
  <li>using an HSM so that the signing key cannot be exfiltrated</li>
  <li>providing transparency logs so that the key cannot be quietly misused</li>
  <li>on infrastructure that can be readily forked, deployed, discussed, and improved</li>
</ul>

<h2 id="the-challenge-securing-code-signing-in-cicd">The Challenge: Securing Code Signing in CI/CD</h2>

<p>Code signing is fundamental to software security—it proves that
binaries haven’t been tampered with and come from a trusted
source. But some signing configurations have a critical vulnerability:
the private key must exist somewhere, and wherever it exists, it can
potentially be stolen.</p>

<p>This project tackles that problem head-on by building a code signing infrastructure where:</p>
<ul>
  <li><strong>Keys cannot be exfiltrated</strong></li>
  <li><strong>Keys cannot be used by unauthorized repositories</strong></li>
  <li><strong>Every signing operation is transparent and auditable</strong></li>
</ul>

<h2 id="the-solution-hardware-backed-signing-with-public-transparency">The Solution: Hardware-Backed Signing with Public Transparency</h2>

<p>The architecture uses AWS KMS (Key Management Service) asymmetric
keys—RSA-4096 keys backed by FIPS 140-2 Level 2 validated hardware
security modules (HSMs). The private key material never leaves the HSM
boundary. Ever. Not in memory, not on disk, not over the network.</p>

<h3 id="core-security-properties">Core Security Properties</h3>

<h4 id="1-key-exfiltration-is-cryptographically-impossible">1. Key Exfiltration is Cryptographically Impossible</h4>

<p>Unlike traditional signing where a private key file exists on disk
(even if encrypted), AWS KMS keys exist only within HSM boundaries:</p>

<ul>
  <li>The private key is generated inside the HSM</li>
  <li>All signing operations happen inside the HSM</li>
  <li>The private key material is never exposed in plaintext</li>
  <li>Even AWS administrators cannot extract the key</li>
  <li>The key can be used only via authenticated AWS API calls</li>
</ul>

<p>This means an attacker who fully compromises a build runner gets
<strong>nothing</strong>. No key file to steal. No memory to dump. No credentials
that grant signing access outside the controlled environment.</p>

<h4 id="2-cryptographic-transparency-logs">2. Cryptographic Transparency Logs</h4>

<p>Every signing operation is automatically logged to a public S3 bucket
via a Lambda function that processes CloudTrail events:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1.0"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"timestamp"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2024-01-15T12:00:00Z"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"kms_key_id"</span><span class="p">:</span><span class="w"> </span><span class="s2">"arn:aws:kms:us-east-1:...:key/..."</span><span class="p">,</span><span class="w">
  </span><span class="nl">"signing_algorithm"</span><span class="p">:</span><span class="w"> </span><span class="s2">"RSASSA_PKCS1_V1_5_SHA_256"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"message_digest_sha256"</span><span class="p">:</span><span class="w"> </span><span class="s2">"abc123..."</span><span class="p">,</span><span class="w">
  </span><span class="nl">"signature_base64"</span><span class="p">:</span><span class="w"> </span><span class="s2">"def456..."</span><span class="p">,</span><span class="w">
  </span><span class="nl">"record_hash"</span><span class="p">:</span><span class="w"> </span><span class="s2">"789ghi..."</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>These logs provide:</p>
<ul>
  <li><strong>Public auditability</strong>: Anyone can verify what was signed and when</li>
  <li><strong>Non-repudiation</strong>: The signature proves the key owner signed the digest</li>
  <li><strong>Tamper evidence</strong>: S3 versioning ensures logs cannot be silently modified</li>
  <li><strong>Cryptographic proof</strong>: Each log includes the signature that can be verified with the public key</li>
</ul>

<p>The Lambda function sanitizes the logs to exclude:</p>
<ul>
  <li>Source IP addresses</li>
  <li>IAM identities</li>
  <li>Request context</li>
  <li>Any sensitive AWS metadata</li>
</ul>

<p>Only the cryptographically verifiable facts are published.</p>
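<p>As an illustrative sketch of that sanitization step (this is not the project’s actual Lambda code; <code>eventTime</code> and <code>requestParameters</code> are real CloudTrail fields, but the record layout simply mirrors the JSON example above, and deriving <code>record_hash</code> from the canonical JSON of the other fields is an assumption):</p>

```python
import hashlib
import json

def build_transparency_record(event: dict, digest_hex: str, signature_b64: str) -> dict:
    """Build a public transparency-log record from a CloudTrail kms:Sign event.

    Sketch only: the record layout mirrors the example above, and the
    record_hash scheme (SHA-256 over the canonical JSON of the other
    fields) is an assumption, not the project's documented format.
    """
    params = event.get("requestParameters", {})
    record = {
        "version": "1.0",
        "timestamp": event["eventTime"],
        "kms_key_id": params.get("keyId"),
        "signing_algorithm": params.get("signingAlgorithm"),
        "message_digest_sha256": digest_hex,
        "signature_base64": signature_b64,
    }
    # Deliberately dropped: sourceIPAddress, userIdentity, and the rest
    # of the AWS request context -- only verifiable facts are published.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["record_hash"] = hashlib.sha256(canonical.encode()).hexdigest()
    return record
```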

<h2 id="the-architecture-defense-in-depth">The Architecture: Defense in Depth</h2>

<p>The system implements multiple security layers that work together:</p>

<h3 id="network-isolation">Network Isolation</h3>

<p>Runners operate in a public subnet but with security groups that block
all inbound traffic. The runner pulls CI jobs and pushes artifacts.</p>

<p>Note: During early development of the architecture, administrative
access is available via AWS Systems Manager Session Manager. The final
production deployment is intended to provide no interactive access to
the runner.</p>

<p>VPC endpoints keep AWS service traffic within the AWS network:</p>
<ul>
  <li>S3 endpoint (Gateway type): No data transfer charges, private S3 access</li>
  <li>KMS endpoint (Interface type): KMS operations never traverse the public internet</li>
  <li>Secrets Manager endpoint: Runner tokens retrieved privately</li>
  <li>CloudWatch Logs endpoint: Monitoring traffic stays private</li>
</ul>

<h3 id="iam-least-privilege">IAM Least Privilege</h3>

<p>Each runner has a dedicated IAM role with minimal permissions:</p>

<p><strong>Code Signing Runner</strong> can:</p>
<ul>
  <li>Call <code class="language-plaintext highlighter-rouge">kms:Sign</code></li>
  <li>Retrieve the public key via <code class="language-plaintext highlighter-rouge">kms:GetPublicKey</code></li>
  <li>Write to the transparency logs S3 bucket</li>
  <li>Read the Forgejo runner token from Secrets Manager</li>
</ul>

<h3 id="immutable-audit-trail">Immutable Audit Trail</h3>

<p>Multiple overlapping audit systems ensure complete accountability:</p>

<ol>
  <li><strong>CloudTrail</strong>: Logs every KMS API call with AWS identity, IP, timestamp</li>
  <li><strong>CloudWatch Logs</strong>: Real-time streaming of signing operations</li>
  <li><strong>S3 Transparency Logs</strong>: Public, versioned, immutable records</li>
  <li><strong>S3 Access Logs</strong>: Track who reads the transparency logs</li>
  <li><strong>Lambda Execution Logs</strong>: Record transparency log publication</li>
</ol>

<p>All logs are encrypted at rest, and S3 versioning means modifications
are visible in the version history.</p>

<h3 id="auditability-and-compliance">Auditability and Compliance</h3>

<p>The transparency logs enable:</p>

<p><strong>Public Accountability</strong>: Anyone can verify that signatures are legitimate by:</p>
<ol>
  <li>Fetching the transparency log entry</li>
  <li>Downloading the signed artifact</li>
  <li>Computing its SHA-256 hash</li>
  <li>Verifying it matches the logged digest</li>
  <li>Verifying the signature with the public key</li>
</ol>
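<p>Steps 3 and 4 can be sketched with the Python standard library (the artifact and log entry here are placeholders); step 5 requires the published public key:</p>

```python
import base64
import hashlib

def digest_matches_log(artifact: bytes, log_entry: dict) -> bool:
    """Steps 3-4: hash the downloaded artifact and compare it to the
    digest recorded in the transparency log."""
    return hashlib.sha256(artifact).hexdigest() == log_entry["message_digest_sha256"]

# Placeholder artifact and transparency-log entry for this sketch:
artifact = b"example signed artifact bytes"
entry = {
    "message_digest_sha256": hashlib.sha256(artifact).hexdigest(),
    "signature_base64": base64.b64encode(b"...").decode(),
}

print(digest_matches_log(artifact, entry))         # True: artifact is intact
print(digest_matches_log(artifact + b"!", entry))  # False: artifact was altered

# Step 5 needs the published KMS public key; with OpenSSL it would look
# roughly like this (file names are hypothetical):
#   base64 -d signature.b64 > sig.bin
#   openssl dgst -sha256 -verify kms_public.pem -signature sig.bin artifact.rpm
```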

<p><strong>Incident Response</strong>: If a compromised binary is discovered:</p>
<ol>
  <li>Find it in the transparency logs (indexed by date)</li>
  <li>Identify the exact timestamp</li>
  <li>Review CloudTrail for the signing operation</li>
  <li>Determine the source (instance ID, IAM role, Forgejo workflow)</li>
  <li>Investigate the build that produced the artifact</li>
</ol>

<p><strong>Compliance</strong>: The architecture supports:</p>
<ul>
  <li>SOC 2 (audit logging, encryption, access control)</li>
  <li>ISO 27001 (security controls, monitoring, incident response)</li>
  <li>FIPS 140-2 Level 2 (KMS hardware-backed keys)</li>
  <li>Non-repudiation requirements (cryptographic signatures + immutable logs)</li>
</ul>

<h2 id="building-trust-through-transparency">Building Trust Through Transparency</h2>

<p>Code signing is fundamentally about trust. Users need to trust that
the software they run is legitimate and hasn’t been tampered with. But
traditional signing approaches require trusting that the private key
is kept secure—a trust that’s regularly violated.</p>

<p>This infrastructure shifts the trust model. Instead of “trust that the key is secure,” it’s:</p>
<ul>
  <li><strong>Trust the cryptographic impossibility</strong> of extracting KMS keys</li>
  <li><strong>Trust the mathematical proof</strong> of signatures verified by public keys</li>
  <li><strong>Trust the audit trail</strong> in public transparency logs</li>
  <li><strong>Trust the infrastructure-as-code</strong> that can be reviewed and reproduced</li>
</ul>

<p>The security doesn’t depend on secrecy. The entire architecture is
public (this repository, the transparency logs, the public
keys). Security comes from cryptographic properties and defense in
depth.</p>

<p>For anyone building CI/CD infrastructure for security-critical
artifacts—whether kernel modules, container images, firmware, or
applications—this architecture provides a template for signing without
the risk of key compromise.</p>

<p>The code is here: <a href="https://codeberg.org/gordonmessmer/signed-code-build-stack">https://codeberg.org/gordonmessmer/signed-code-build-stack</a></p>

<p>The transparency logs are here: (configured per deployment)</p>

<hr />

<h2 id="technical-appendix">Technical Appendix</h2>

<h3 id="references">References</h3>

<ul>
  <li><strong>Infrastructure Repository</strong>: <a href="https://codeberg.org/gordonmessmer/signed-code-build-stack">https://codeberg.org/gordonmessmer/signed-code-build-stack</a></li>
  <li><strong>Copr Packages</strong>: <a href="https://copr.fedorainfracloud.org/coprs/gordonmessmer/aws-kms-pkcs11/">https://copr.fedorainfracloud.org/coprs/gordonmessmer/aws-kms-pkcs11/</a></li>
  <li><strong>Kernel Repository</strong>: <a href="https://codeberg.org/gordonmessmer/kernel-longterm-6.12-plus">https://codeberg.org/gordonmessmer/kernel-longterm-6.12-plus</a></li>
  <li><strong>AWS KMS Documentation</strong>: <a href="https://docs.aws.amazon.com/kms/">https://docs.aws.amazon.com/kms/</a></li>
  <li><strong>Forgejo Documentation</strong>: <a href="https://forgejo.org/docs/">https://forgejo.org/docs/</a></li>
  <li><strong>Fedora’s build infra definition</strong>: <a href="https://forge.fedoraproject.org/infra/ansible/src/branch/main/playbooks/groups/buildhw.yml">https://forge.fedoraproject.org/infra/ansible/src/branch/main/playbooks/groups/buildhw.yml</a></li>
</ul>]]></content><author><name>Gordon Messmer</name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Comments are hard</title><link href="https://gordonmessmer.codeberg.page/dev-blog/2026/03/09/comments-are-hard.html" rel="alternate" type="text/html" title="Comments are hard" /><published>2026-03-09T00:00:00-05:00</published><updated>2026-03-09T00:00:00-05:00</updated><id>https://gordonmessmer.codeberg.page/dev-blog/2026/03/09/comments-are-hard</id><content type="html" xml:base="https://gordonmessmer.codeberg.page/dev-blog/2026/03/09/comments-are-hard.html"><![CDATA[<p>As the joke goes, “There are 2 hard problems in computer science:
cache invalidation, naming things, and off-by-1 errors.”</p>

<p>In truth, one of the most difficult things in software development
might be comments, and the more experience you have developing
software, the harder it becomes.</p>

<p>Some things about comments should become obvious at some
point. Comments should explain why, not what. Well-written code
will tell readers what is happening, but not necessarily why. A
developer may look at a block of code and understand that this code
builds a data structure… Why does it do that? How is the data used
later? How much memory does the structure cost, and how much does that
improve run time? If you don’t know the answer to these “why”
questions, it’s hard to tell whether things work as expected in the
future. The same is true for commit messages. The diff tells a reader
what is being changed; the commit message doesn’t need to describe
that. Why is it being changed?</p>

<p>That’s the easy part.</p>

<p>The hard part is the question: what will the next developer to work on
this know, and what do they need to be told?</p>

<p>You might be familiar with the Dunning–Kruger effect. Their paper
illustrates that skill exists along a spectrum, and people at
both ends have a poor understanding of people at other points along
the way.</p>

<p>Without a clear understanding of what the typical developer
understands, it is very difficult to write comments that answer “why”
questions that most developers will have. An experienced developer
will tend to describe too little, assuming that most developers
already know what they need to know to contribute. It can be
especially difficult for experienced developers to overcome that
tendency, and offer those explanations, because explaining things that
seem obvious to them will, at some point, feel condescending.</p>

<p>My advice, or perhaps my request, to all developers is this: ask
yourself why code needs to do what it is doing, and explain that as if
to a student developer. Explain it more than you think you need to.
Write until it is embarrassing to write. Tell yourself that the
embarrassment is the cost of experience.</p>]]></content><author><name>Gordon Messmer</name></author><summary type="html"><![CDATA[As the joke goes, “There are 2 hard problems in computer science: cache invalidation, naming things, and off-by-1 errors.”]]></summary></entry><entry><title type="html">Memory Efficiency With Arena Allocators</title><link href="https://gordonmessmer.codeberg.page/dev-blog/2026/03/02/memory-efficiency-with-arena-allocators.html" rel="alternate" type="text/html" title="Memory Efficiency With Arena Allocators" /><published>2026-03-02T00:00:00-06:00</published><updated>2026-03-02T00:00:00-06:00</updated><id>https://gordonmessmer.codeberg.page/dev-blog/2026/03/02/memory-efficiency-with-arena-allocators</id><content type="html" xml:base="https://gordonmessmer.codeberg.page/dev-blog/2026/03/02/memory-efficiency-with-arena-allocators.html"><![CDATA[<p>If you’ve ever heard someone ask about lightweight desktops for older
hardware, you’ve probably heard GNOME referred to as a heavyweight
option.</p>

<p>One of my hobbies is resource efficiency projects.</p>

<p>Last week I circled back around to a project that I’d put on my list a
couple of years ago but never got around to: gnome-software. On a
typical workstation, gnome-software will often use more memory than
even gnome-shell itself. It is usually the single most
memory-expensive component of the GNOME desktop.</p>

<p>gnome-software serves several purposes. In the GNOME desktop overview,
it provides search results for applications that are available but not
installed. It also provides a GUI for software management. And it
provides notifications to the user when there are updates available to
install. Notifying users that updates are available is important to
maintaining a good security posture, so disabling the application
entirely isn’t a great option. But its memory use tends to cause some
users to seek out more resource friendly shells for their older
hardware.</p>

<p>I’d observed that the gnome-software process tended to increase in
size as it handled search requests from the GNOME desktop overview, so
I started by splitting the application search functionality out of
gnome-software, and into its own application. As a separate application,
it was much easier to profile the application and its memory use.
Many profiling tools will hide details that are very small relative
to the whole application, so getting the GTK+ code out of the process
made it easier to see where memory was being allocated.</p>

<p>valgrind didn’t report any leaks, so the memory allocations that
increased the resident size of the process were being tracked. I moved
on to valgrind’s massif tool to get information about where memory
was allocated. The tool confirmed that there were peaks of high memory
use, but it also indicated that most dynamic allocations were being
freed eventually. GNU libc has a “malloc_trim” API that can be used
to release freed memory back to the OS, but calling it released far
less memory than expected, given the amount of memory that valgrind
indicated had been freed.

<p>This suggested that I might be looking at a problem that is common and
well understood, but difficult to solve: dynamic allocations that
the application managed were interspersed with allocations that were
made and managed within shared libraries.</p>

<p>The basic problem is that memory can only be returned to the OS by
<code class="language-plaintext highlighter-rouge">free()</code> or <code class="language-plaintext highlighter-rouge">malloc_trim()</code> in relatively large, contiguous blocks.
As long as some memory within a block has not been freed, that block
cannot be released. A POSIX process typically shares an address space,
a memory allocator, and a heap with all of the shared libraries that
it uses.</p>

<p>Sometimes the easiest way to solve this problem is to use <code class="language-plaintext highlighter-rouge">fork()</code> to
create a new process that can handle a request, and then exit that
process when it’s done, which will reliably release any memory
allocated by the process and its shared libraries. But that isn’t a
good option if there’s expensive setup for the first request, because
forking for every request would mean repeating that expensive setup
each time.</p>

<p>What we really want is an arena allocator that keeps the memory
the application doesn’t manage in its own contiguous region. That
would allow shared libraries to allocate memory in a way that doesn’t
spread untracked allocations through the application’s main heap.</p>

<p>As I pondered that idea, I remembered… glibc does have an arena
allocator. It uses per-thread arenas to reduce lock contention during
allocation in threaded applications. And I wondered: how difficult
would it be to expose that to applications, so that they could hint
that they wanted allocations to use a different memory pool?</p>

<p>Such an API should be very simple. There should be a function to
request a new arena, and there should be a function to swap the
current arena for a new one. An application could then allocate a new
arena for shared libraries that are known to allocate memory, and
it could swap memory arenas before and after making calls into such
a shared library.</p>

<p>The idea was simple, but I wasn’t familiar with the design and
architecture of glibc. So I described the API that I wanted to add,
and asked Claude to implement that API in glibc’s malloc, consistent
with the coding standards used in the library.</p>

<p>Before diving into the implementation details, let’s visualize the
problem and solution.</p>

<h2 id="understanding-the-problem">Understanding the Problem</h2>

<p>To visualize why arena segregation matters, consider how memory
allocations are typically distributed.</p>

<h3 id="standard-glibc-interleaved-allocations">Standard glibc: Interleaved Allocations</h3>

<p><img src="/dev-blog/images/svg/arena-without-api-before.svg" alt="Interleaved allocations" /></p>

<p>Without an arena API, all allocations go to the main arena. Library
allocations (red) are scattered throughout, interleaved with
application allocations (green). Even with a relatively small number
of allocations by shared libraries, most memory pages contain at least
one library allocation.</p>

<h3 id="after-freeing-app-memory-standard-glibc">After Freeing App Memory (Standard glibc)</h3>

<p><img src="/dev-blog/images/svg/arena-without-api-after.svg" alt="After freeing without API" /></p>

<p>Even though the application frees 95% of the memory, each page still
contains at least one library allocation (red). Since the OS can only
reclaim entire pages, none of this memory can be returned.</p>

<h2 id="using-the-api">Using the API</h2>

<p>The API is designed to be simple and lightweight:</p>

<p><img src="/dev-blog/images/svg/api-usage-pattern.svg" alt="Arena API usage pattern" /></p>

<p>The typical pattern is:</p>
<ol>
  <li>Create a dedicated arena once during initialization</li>
  <li>Attach the arena before calling library functions</li>
  <li>The library’s allocations go to the dedicated arena</li>
  <li>Restore the previous arena after the call returns</li>
</ol>

<h3 id="with-arena-api-segregated-allocations">With Arena API: Segregated Allocations</h3>

<p><img src="/dev-blog/images/svg/arena-with-api-before.svg" alt="Segregated allocations with API" /></p>

<p>The arena API segregates allocations into separate arenas, so that the
allocations that the application manages are contiguous. Application
allocations (green) go to the main arena, while library allocations
(red) go to a dedicated library arena. There is no interleaving.</p>

<h3 id="after-freeing-app-memory-with-arena-api">After Freeing App Memory (With Arena API)</h3>

<p><img src="/dev-blog/images/svg/arena-with-api-after.svg" alt="After freeing with API" /></p>

<p>When the application frees its memory, the main arena pages contain no
active allocations. Entire pages are immediately returned to the
OS. Library allocations remain active in their isolated arena.</p>

<h2 id="the-development-process">The Development Process</h2>

<h3 id="design-by-blog">Design by blog</h3>

<p>At first I simply intended to write about arena allocators. Arena
allocators are often written about as a technique to reduce the risk
of memory leaks and simplify allocation tracking. An arena allocator
can free a collection of allocations associated with an arena all at
once.  Although memory reclamation issues caused by interleaved
allocations are common and well understood, the utility of arena
allocators in maintaining contiguous allocations is not frequently
mentioned in the references and discussions that I’ve seen.</p>

<p>I started with the intent merely to discuss that function of
arena allocators, which I described as follows:</p>

<p>An application might process a data stream and dynamically allocate
memory as it processes elements in that stream. If it uses a shared
library as it processes elements, the shared library might also
dynamically allocate memory for a private internal cache. In such a
case, the heap will contain application allocations interleaved with
allocations from the shared library. Even if the application reliably
tracks its allocations and frees them when it finishes processing the
data stream, the heap might still contain small allocations from the
shared library, which prevent libc from returning memory to the
operating system.</p>

<p>Hypothetically, a malloc implementation could allow an application to
register new memory arenas. The application could then set the
preferred arena for a thread to an arena dedicated to a shared library
before calling that shared library’s functions, and restoring the
default arena on return. By segregating the arenas used by a shared
library and by the rest of the process, an application could avoid
allocations that it can’t track within its own memory arena, which
would improve its ability to compact its memory.</p>

<p>Because the shared library’s allocations will be in a dedicated arena,
the application should be able to return memory from its own arenas to
the OS, reducing its resident size.</p>

<p>For example, the application might look something like:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;malloc.h&gt;</span><span class="cp">
</span>
<span class="k">static</span> <span class="n">arena_hd</span> <span class="o">*</span><span class="n">netio_hd</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="k">static</span> <span class="kt">void</span>
<span class="nf">app_register_netio_hd</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">netio_hd</span><span class="p">)</span> <span class="k">return</span><span class="p">;</span>
    <span class="n">netio_hd</span> <span class="o">=</span> <span class="n">malloc_new_arena</span> <span class="p">();</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">netio_hd</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// check errno and handle allocation failure</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">void</span>
<span class="nf">app_process_element</span><span class="p">(</span><span class="n">AppElement</span> <span class="o">*</span><span class="n">element</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">arena_hd</span> <span class="o">*</span><span class="n">current</span><span class="p">;</span>

    <span class="c1">// Switch to a dedicated arena</span>
    <span class="n">current</span> <span class="o">=</span> <span class="n">malloc_swap_thread_arena</span> <span class="p">(</span><span class="n">netio_hd</span><span class="p">);</span>
    <span class="n">netio_process_element</span> <span class="p">(</span><span class="n">element</span><span class="p">);</span>
    <span class="c1">// Restore the default arena</span>
    <span class="n">malloc_swap_thread_arena</span> <span class="p">(</span><span class="n">current</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="initial-design">Initial Design</h3>

<p>In that markdown file, I had described the API I wanted. I decided
to ask Claude to help me implement it quickly, to determine whether
the idea was worth pursuing.</p>

<p>My first prompt was detailed:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
In ../malloc-blog/malloc-arenas-proposed.md I described a problem, in
which an application and a shared library might allocate memory in
interleaved pages within an arena, and suggested that libc might
expose an API that allows an application to request an extra arena and
set a preferred arena before and after calling functions in a shared
library. The current directory contains glibc, and its malloc
implementation is in the malloc directory. I believe that this
implementation uses per-thread arenas. Review this malloc API and
suggest an idiomatic API extension that would allow an application to
request an arena and set a preferred arena for the current thread. The
API will need tests, and I'd also like a demo application consisting
of a main application and a simple shared library that demonstrates
the new API. It should allocate around 200MB of memory total, at 512
bytes per allocation, mostly in the application code but with some
allocations in the library.  Once the memory is allocated, the program
should print stats about its memory use including its resident
size. Then it should free the allocations from the application but not
the library and print stats again. Prioritize consistency with the
programming style in this codebase.

</code></pre></div></div>

<p>The first implementation looked pretty good at first glance, but
failed to build.  Claude was able to process the build failures,
determine that the problem was that it had defined functions that
should have been a public API with <code class="language-plaintext highlighter-rouge">libc_hidden_def</code> macros, and
corrected the problem.</p>

<p>Once the library and the demo compiled successfully, I was able to run
the demo and compare the results. Unfortunately, resident memory use
in the application with a standard glibc and the version that used the
new API was basically the same.</p>

<p>The initial memory information wasn’t very detailed, but I knew that
glibc supported a <code class="language-plaintext highlighter-rouge">malloc_stats()</code> function that might give me more
information.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
This works but each verion of the demo app we've tried shows no
significant difference between the version with the new API and the
version without the new API. Can you add malloc_stats and we'll see if
that provides any hints

</code></pre></div></div>

<p>The new build produced information that indicated that the new API
was successfully creating a new memory arena, but that no allocations
were expanding its size.</p>

<p>I reviewed the new <code class="language-plaintext highlighter-rouge">malloc_arena_new()</code> function and found that it was
nearly identical to <code class="language-plaintext highlighter-rouge">_int_new_arena()</code>, which I had read about in
reference material beforehand. I examined the differences closely and
determined that it was initially attempting to allocate a size that
was incorrect. However, that didn’t seem likely to be the cause. One
thing that I was less sure how to handle was arena ownership. There
was code in the initial implementation for reference counting and
free list management that looked appropriate for normal arena
handling, but which might result in an arena being released and
removed in the intended use pattern. In the design I’d proposed, a
thread should own multiple arenas.</p>

<p>I told Claude to re-sync with the changes I’d made, and to suggest
appropriate handling of reference counting and free list handling:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
I've made some changes to malloc_arena_new to make it more consistent
with _int_new_arena. The arena that we're creating with this API is
intended to be used in the current thread and temporarily swapped
while calling a library. So I think this API should avoid some of the
free list accounting normally associated with changing an arena. When
the arena is initially created, it should appear to be attached to one
thread even though it isn't. And when it is attached with
malloc_arena_attached, the free lists shouldn't be changed, nor should
the atttached thread count. Basically, one thread is using both arenas
concurrently.

</code></pre></div></div>

<p>Claude updated the API, removing the sections that I suspected did
not belong but that I didn’t understand well enough to adjust on my
own.</p>

<p>I rebuilt glibc and the demo app.</p>

<p>Still, no dice. The demo app was still allocating memory in the main
arena, while all available debugging information confirmed that the
thread_arena pointer was being updated to reference the new arena.</p>

<p>I prompted Claude again:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
OK, with the current state of malloc/ and demo/, there are no signs
that the new arena is being used. malloc_stats still shows a second
arena, but basically no utilization. system bytes = 167936 and in use
bytes = 2160. The library's call to malloc appears to be using the
main arena. Check over the malloc implementation to see how it selects
an arena. Maybe setting thread_arena is insufficient?

</code></pre></div></div>

<p>Claude pondered the malloc code further and found that single-threaded
applications bypassed the arena selection logic. That makes sense, of
course, since the per-thread arena feature is intended to reduce lock
contention in threaded applications. If this feature were adopted, I’d
want to enable arena selection when a new arena was created, but for
initial implementation purposes, Claude simply created a temporary
thread in the demo app.</p>

<p>With that, the demo app started working!</p>

<p>There were some minor inconsistencies in the output from the demo
application, so I reorganized some of the reporting code, which
finished the initial work on the feature.</p>

<h2 id="does-this-idea-really-work">Does This Idea Really Work?</h2>

<p>Theory is one thing, but does arena segregation actually solve the
memory fragmentation problem in practice?</p>

<p>The demo allocates 200MB total: 190MB for the application and 10MB in
a library, using interleaved 512-byte allocations. This creates
realistic fragmentation where library allocations are scattered
throughout memory.</p>

<p>Results from running the demo:</p>

<p><strong>Without arena API:</strong></p>
<ul>
  <li>After all allocations: RSS = 206MB</li>
  <li>After freeing app memory: RSS = 205MB (minimal reduction)</li>
  <li>After <code class="language-plaintext highlighter-rouge">malloc_trim()</code>: RSS = 96MB (still high)</li>
</ul>

<p>In the demo, <code class="language-plaintext highlighter-rouge">malloc_trim()</code> is able to find some regions large
enough to return to the OS, but a significant amount of memory remains
resident.</p>

<p><strong>With arena API:</strong></p>
<ul>
  <li>After all allocations: RSS = 206MB</li>
  <li>After freeing app memory: RSS = 15MB</li>
  <li>After <code class="language-plaintext highlighter-rouge">malloc_trim()</code>: RSS = 15MB</li>
</ul>

<p>The difference is dramatic: with arena segregation, the application
can reclaim the entirety of its 190MB allocation, while the library’s
10MB remains in use. Without interleaved allocations hidden from the
application, it is able to release memory from its main
arena. In the case of the demo, the application doesn’t even need to
call <code class="language-plaintext highlighter-rouge">malloc_trim()</code>!</p>

<h3 id="broader-implications">Broader Implications</h3>

<p>This kind of exploration—prototyping a new API in a complex codebase
to validate an architectural idea—would have been difficult to justify
without AI assistance. The learning curve for glibc’s malloc is steep:
understanding arena management, thread-local storage, optimization
paths, and symbol versioning all at once is a significant
investment. Without assistance, the time required to explore the idea
would have seemed much too high a barrier, with no means to evaluate
the chance of useful results.</p>

<p>Using Claude Code allowed me to explore, in just a couple of days
over a weekend, an idea that could otherwise have taken weeks or
months. With Claude, I could focus on the problem I wanted to solve
while getting guidance on implementation details.</p>

<p>The result is a working implementation that demonstrates both the problem
and the solution, ready for consideration by the glibc maintainers.</p>

<h2 id="next-steps">Next Steps</h2>

<p>The proof-of-concept demonstrates that arena segregation can dramatically
reduce memory fragmentation when application and library allocations are
interleaved. The next step is to propose this API to the glibc community
and gather feedback on the design and implementation.</p>

<p>If accepted, this simple API could help applications like gnome-software
reduce their memory footprint significantly, making GNOME more viable
on resource-constrained systems. And beyond GNOME, any long-running
application that loads shared libraries with different allocation
patterns could benefit from this approach.</p>

<p>The demo code and implementation are available in my glibc fork on
codeberg for anyone interested in experimenting with the API or
understanding the fragmentation problem in more detail.</p>

<p><a href="https://codeberg.org/gordonmessmer/glibc">https://codeberg.org/gordonmessmer/glibc</a></p>]]></content><author><name>Gordon Messmer</name></author><summary type="html"><![CDATA[If you’ve ever heard someone ask about lightweight desktops for older hardware, you’ve probably heard GNOME referred to as a heavy weight option.]]></summary></entry><entry><title type="html">Improving Memory Compaction with Dedicated Malloc Arenas</title><link href="https://gordonmessmer.codeberg.page/dev-blog/2026/03/01/malloc-arenas-illustrated.html" rel="alternate" type="text/html" title="Improving Memory Compaction with Dedicated Malloc Arenas" /><published>2026-03-01T00:00:00-06:00</published><updated>2026-03-01T00:00:00-06:00</updated><id>https://gordonmessmer.codeberg.page/dev-blog/2026/03/01/malloc-arenas-illustrated</id><content type="html" xml:base="https://gordonmessmer.codeberg.page/dev-blog/2026/03/01/malloc-arenas-illustrated.html"><![CDATA[<h2 id="mallocs-design-can-make-it-difficult-to-return-memory-to-the-os">malloc’s design can make it difficult to return memory to the OS</h2>

<p>A POSIX process shares an address space, a memory allocator, and its
heap with the shared libraries that it uses. Because the application
and the shared library are allocating memory in the same heap, it can
be difficult to develop compact, memory-efficient services, even if
there are no memory leaks.</p>

<p>An application might process a data stream and dynamically allocate
memory as it processes elements in that stream. For example, a stream
might describe a list of applications in a package repository, and the
process might allocate memory for entries detailing the release
history and for application icons. If the process uses a shared
library as it processes elements, the shared library might also
dynamically allocate memory for a private internal cache. In such a
case, the heap will contain application allocations interleaved with
allocations from the shared library.</p>

<p>Even if the application reliably tracks its allocations and frees them
when it finishes processing the data stream, the heap might still
contain small allocations from the shared library, which prevent libc
from returning memory to the operating system. A small number of
library allocations can keep a large amount of application memory
occupied and prevent it from being returned to the OS.</p>

<h3 id="one-arena-with-interleaved-allocations">One Arena with Interleaved Allocations</h3>

<p><img src="/dev-blog/images/svg/arena-single-interleaved.svg" alt="Single arena with interleaved allocations" /></p>

<p>When both application and library code allocate from the same arena, their
allocations become interleaved in memory.</p>

<p><img src="/dev-blog/images/svg/arena-single-fragmented.svg" alt="Arena after freeing application allocations" /></p>

<p>Freeing memory allocated by the application may not be sufficient to
return memory to the operating system and reduce RSS because library
allocations are scattered throughout the arena, fragmenting the
heap. In this illustration, a few small library allocations prevent a
large heap from being returned to the OS. This illustrates how a small
amount of uncontrolled allocation can have an outsized impact on
memory compaction.</p>

<hr />

<h2 id="dedicated-arenas-for-library-code">Dedicated Arenas for Library Code</h2>

<p>glibc already provides per-thread arenas, to reduce lock contention
when allocating memory in a threaded process. I’d like to propose
exposing an interface that allows an application to request an arena
handle, and to set a preferred arena for a thread.</p>

<p>Hypothetically, a malloc implementation could allow an application to
register new memory arenas. The application could then set the
preferred arena for a thread to an arena dedicated to a shared library
before calling that shared library’s functions, and restoring the
default arena on return.</p>

<p>By segregating the arenas used by a shared library and by the rest of
the process, an application could avoid allocations that it can’t
track within its own memory arena, which would improve its ability to
compact its memory.</p>

<p><img src="/dev-blog/images/svg/arena-separate-allocated.svg" alt="Separate arenas for application and library" /></p>

<p>Using dedicated arenas, application and library allocations can
avoid interleaved allocations. When the application’s allocations
are contiguous and its arena is free of untracked allocations,
the application can reduce its resident size when it releases
allocations.</p>

<p><img src="/dev-blog/images/svg/arena-separate-freed.svg" alt="After freeing with separate arenas" /></p>

<hr />

<h2 id="example-implementation">Example Implementation</h2>

<p>For example, the application might look something like:</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;malloc.h&gt;</span><span class="cp">
</span>
<span class="k">static</span> <span class="n">arena_hd</span> <span class="o">*</span><span class="n">netio_hd</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>

<span class="k">static</span> <span class="kt">void</span>
<span class="nf">app_register_netio_hd</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">netio_hd</span><span class="p">)</span> <span class="k">return</span><span class="p">;</span>
    <span class="n">netio_hd</span> <span class="o">=</span> <span class="n">malloc_new_arena</span><span class="p">();</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">netio_hd</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
        <span class="c1">// check errno and handle allocation failure</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">void</span>
<span class="nf">app_process_element</span><span class="p">(</span><span class="n">AppElement</span> <span class="o">*</span><span class="n">element</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">arena_hd</span> <span class="o">*</span><span class="n">current</span><span class="p">;</span>

    <span class="c1">// Switch to a dedicated arena</span>
    <span class="n">current</span> <span class="o">=</span> <span class="n">malloc_set_arena</span><span class="p">(</span><span class="n">netio_hd</span><span class="p">);</span>
    <span class="n">netio_process_element</span><span class="p">(</span><span class="n">element</span><span class="p">);</span>
    <span class="c1">// Restore the default arena</span>
    <span class="n">malloc_set_arena</span><span class="p">(</span><span class="n">current</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<hr />]]></content><author><name>Gordon Messmer</name></author><summary type="html"><![CDATA[malloc’s design can make it difficult to return memory to the OS]]></summary></entry><entry><title type="html">Memory Fragmentation in gnome-software Search Provider</title><link href="https://gordonmessmer.codeberg.page/dev-blog/2026/02/28/gnome-software-memory-fragmentation.html" rel="alternate" type="text/html" title="Memory Fragmentation in gnome-software Search Provider" /><published>2026-02-28T00:00:00-06:00</published><updated>2026-02-28T00:00:00-06:00</updated><id>https://gordonmessmer.codeberg.page/dev-blog/2026/02/28/gnome-software-memory-fragmentation</id><content type="html" xml:base="https://gordonmessmer.codeberg.page/dev-blog/2026/02/28/gnome-software-memory-fragmentation.html"><![CDATA[<h2 id="the-problem-in-one-sentence">The Problem in One Sentence</h2>

<p>TLS trust store allocations (~4MB) scattered through the heap prevent glibc from returning ~250MB of freed memory to the OS.</p>

<h2 id="problem-summary">Problem Summary</h2>

<p>When gnome-software handles search requests from gnome-shell, it allocates memory for:</p>
<ul>
  <li><strong>Release history</strong> - Application version history and metadata</li>
  <li><strong>Icons</strong> - Application icon images</li>
  <li><strong>TLS trust stores</strong> - gnutls certificate trust lists (via libsoup HTTPS connections)</li>
</ul>

<p>After the request completes, gnome-software frees the release history and icons, but the TLS trust store allocations remain. These small, scattered allocations prevent glibc from returning memory to the OS, causing the process to grow hundreds of MB larger than necessary.</p>

<h2 id="root-cause">Root Cause</h2>

<ol>
  <li>Each HTTPS request to Flathub triggers a new TLS connection</li>
  <li>Each TLS connection causes gnutls to load the system CA trust store (~150-200 certificates)</li>
  <li>This creates thousands of small ASN.1 allocations scattered through the heap</li>
  <li>When gnome-software frees app data, the trust store allocations remain</li>
  <li>These prevent glibc from returning freed memory regions to the OS (heap fragmentation)</li>
</ol>

<h2 id="color-coding">Color Coding</h2>

<p>Throughout the diagrams:</p>
<ul>
  <li>🟢 <strong>Green</strong> = Release history allocations</li>
  <li>🔵 <strong>Blue</strong> = Icon allocations</li>
  <li>🔴 <strong>Red</strong> = TLS trust store allocations (gnutls)</li>
  <li>⬜ <strong>Gray</strong> = Free memory</li>
</ul>

<h2 id="visual-explanation">Visual Explanation</h2>

<h3 id="phase-1-initial-state-idle">Phase 1: Initial State (Idle)</h3>

<p><img src="/dev-blog/images/svg/phase1-idle.svg" alt="Phase 1: Idle State" /></p>

<p>The process starts with minimal memory usage - only ~50 MB RSS.</p>

<h3 id="phase-2-handling-search-request">Phase 2: Handling Search Request</h3>

<p><img src="/dev-blog/images/svg/phase2-search-request.svg" alt="Phase 2: Search Request" /></p>

<p>During a search request, gnome-software allocates memory for:</p>
<ul>
  <li><strong>Release history</strong> (green) - Application metadata and version information</li>
  <li><strong>Icons</strong> (blue) - Application icon images</li>
  <li><strong>TLS trust stores</strong> (red) - gnutls certificate trust lists for HTTPS connections to Flathub</li>
</ul>

<p>Notice how TLS trust store allocations are <strong>interspersed</strong> with application data throughout the heap regions. The RSS grows to 280 MB.</p>

<h3 id="phase-3-after-cache-clear-30s-timeout">Phase 3: After Cache Clear (30s timeout)</h3>

<p><img src="/dev-blog/images/svg/phase3-after-clear.svg" alt="Phase 3: After Cache Clear" /></p>

<p>After the cache clear timeout, gnome-software frees the release history and icons, but the TLS trust store allocations remain.</p>

<p><strong>The Problem</strong>: Each heap region still contains TLS allocations, preventing glibc from returning the entire region to the OS via <code class="language-plaintext highlighter-rouge">madvise(MADV_DONTNEED)</code> or <code class="language-plaintext highlighter-rouge">sbrk()</code>.</p>

<p>The trust store allocations are only ~4-6 MB total, but they <strong>pin ~250 MB of heap regions</strong> in memory! The RSS only drops to 260 MB instead of returning to ~50 MB.</p>

<h2 id="call-stack">Call Stack</h2>

<p>The trust list allocations occur when downloading icons:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gs_icon_download (gs-remote-icon.c:265)
  → soup_session_send (soup-session.c:3264)
    → soup_connection_connect (soup-connection.c:865)
      → new_tls_connection (soup-connection.c:626)
        → g_tls_connection_gnutls_initable_init (gtlsconnection-gnutls.c:207)
          → g_tls_connection_get_database (gtlsconnection.c:504)
            → g_tls_database_gnutls_populate_trust_list (gtlsdatabase-gnutls.c:590)
              → gnutls_x509_trust_list_add_system_trust (certs.c:384)
                → p11_index_replace_all (index.c:727)
                  → asn1_der_decoding (decoding.c:1627)
                    → _asn1_add_single_node (structure.c:55)
</code></pre></div></div>]]></content><author><name>Gordon Messmer</name></author><summary type="html"><![CDATA[The Problem in One Sentence]]></summary></entry><entry><title type="html">How to select a distribution</title><link href="https://gordonmessmer.codeberg.page/dev-blog/2026/02/07/choosing-a-distribution.html" rel="alternate" type="text/html" title="How to select a distribution" /><published>2026-02-07T00:00:00-06:00</published><updated>2026-02-07T00:00:00-06:00</updated><id>https://gordonmessmer.codeberg.page/dev-blog/2026/02/07/choosing-a-distribution</id><content type="html" xml:base="https://gordonmessmer.codeberg.page/dev-blog/2026/02/07/choosing-a-distribution.html"><![CDATA[<p>People who are interested in trying Free Software are frequently
bewildered at the number of available software distributions. Perhaps
doubly so because there seems to be so little difference between them.
And compounded further because Free Software development is a
cooperative endeavor, so the distributions which represent the best
that Free Software has to offer tend to avoid criticism of their
peers.</p>

<p>I do not intend to tell you what distribution to use, but I do want to
encourage you to think about who you trust, because that is the thing
that will vary the most from project to project. I’ll describe some of
the things that I look at when I consider distributions, and how that
affects my trust in a project.</p>

<p>I’ve been developing software and working in operations-focused roles
since the mid 1990s, so expect a lot of software development
philosophy to follow.</p>

<h2 id="first-of-all-what-is-a-distribution">First of all, what is a distribution?</h2>

<p>A distribution is a project that distributes software.</p>

<p>One of the reasons that it might be difficult to select a distribution
is that people use that term to describe something other than the
project. In particular, people tend to refer to the software itself as
the distribution. I think that’s confusing, because the software is
largely the same from distribution to distribution. If you install
GNOME on Fedora or GNOME on Ubuntu, they are largely the same
software, because the software is developed by the GNOME project.
It’s merely being distributed by the distributions. (To the extent
that there <em>are</em> differences, I think that is a flaw in one or the
other release. It is almost never in the best interest of users or the
primary developers of a project for a distribution to make significant
changes to their software.)</p>

<p>Mature distributions include tens of thousands of applications and
libraries. Offering all of those applications and libraries as a
single collection makes them easier to discover and easier to install,
but I think the most important function is managing updates.  Every
individual project is free to discontinue a release series, and to
start a new release series at any time. And any new release series may
or may not be backward compatible with previous releases. Each of the
tens of thousands of projects has to be monitored for updates, and
each update has to be reviewed to determine how it affects everything
else in the collection.</p>

<p>A distribution sits in between tens of thousands of Free Software
projects and millions of users in order to turn tens of thousands of
update streams into just one update stream.</p>

<p>Users who are trying to select a distribution tend to ask for a
distribution that does X or one that does Y, and the question doesn’t
make sense, because the functionality they are asking about is
typically developed in upstream projects, not in distributions. If the
software has that functionality, it’ll be available to users
regardless of who delivers the software to them.</p>

<p>In fact, significant development happening in the distribution rather
than upstream creates a lot of friction in the process. It makes it
less clear to users where they should report bugs. It frustrates
upstream developers who get bug reports for software they did not
write and do not maintain. And similarly, especially with LTS
distributions, it leads to a lot of bug reports upstream for bugs that
were fixed long ago, or reports on release series that are no longer
maintained.</p>

<p>The best thing that a distribution can do is to bring users and
developers closer together, and get out of the way. That means
patching as little as possible, and shipping upstream releases without
filtering them or delaying them.</p>

<p>Contrary to what you might expect, the less a distribution does, the
better it is.</p>

<h2 id="what-differentiates-distributions-from-each-other">What differentiates distributions from each other?</h2>

<p>The purpose of a distribution is to deliver software, and simplify the
process of updating it. But there are details that differ from project
to project. I’ll run down a list, ordered from least significant to
most significant.</p>

<ul>
  <li>
    <p>What is included? This item tends not to vary much from distribution
to distribution. We’re all building distributions from the same pool
of Free Software, and we’re including as much as we can subject to the
time our maintainers have available and our notions of what is
useful. There is some variation, though, because some parts of the
systems we build are difficult or impossible to change after the
system is built. That is, if you build a system with GNU libc, you
probably won’t also build and distribute uClibc, because your users
can’t exchange one for the other. These differences are relatively
uncommon, and typically only affect your decision if you are after a
very specific and uncommon project.</p>
  </li>
  <li>
    <p>How is integration managed? Source code often does not transform
deterministically into usable software. Most software starts out by
discovering features in the environment where it is built, and
adapting itself to those features. As a result, the features present
in the result of a build are influenced by what other packages were
present in the build environment, and often what behaviors were
specified on the command line during the build. That means that a
maintainer has to make choices about what build dependencies to
specify, and what configuration to specify in order to create a binary
package with a feature set that’s consistent with expectations, and
consistent from build to build. Maintainers need to understand the
default behavior of each package, and what their users need from the
package in order to make sure that everything within the distribution
is integrated well.</p>
  </li>
  <li>
    <p>How much are the defaults changed? Some distributions are trying to
create something unique, and others prefer to deliver software to
users in the configuration that its developers intended, as much as
possible.</p>
  </li>
  <li>
    <p>How much is the software changed? Some distributions apply a large
set of patches to the software they distribute, and others adopt a
policy of pushing changes to the upstream developers first in order to
reduce ongoing maintenance overhead and security risks.</p>
  </li>
  <li>
    <p>What is the distribution’s release cadence? Some distributions,
especially those that are oriented toward infrastructure workloads,
might release infrequently and support each release for a long
term. Those distributions will get new features much less often. Other
distributions might release relatively frequently with somewhat
shorter support periods. Still others adopt a “rolling” model where
there are no distinct releases, just one “current” release that
continually receives new features as they’re ready. Many users
conclude that they want long-term releases for systems they can set and
forget, and I want to caution readers on that point. Most of the
projects included in a distribution are not maintained for long-term
support. Shipping software to users after support is discontinued by
the developers is typically bad for both the developers and the users.</p>
  </li>
  <li>
    <p>Where is the build infrastructure? Some distributions provide a
build infrastructure that isn’t directly accessible to the
maintainers, while others allow maintainers to build software on their
own systems and upload the results. Providing an infrastructure for
builds that maintainers can’t directly access helps ensure that binary
packages are the result of the source code and the build scripts, with
less opportunity for humans to compromise the build process.</p>
  </li>
  <li>
<p>Where is the source? Community-oriented distributions offer
transparency by publishing their build scripts and patches for review.
Secure distributions provide shared infrastructure for source code,
because that allows them to enforce policies like “protected
branches,” which prevent developers from rewriting the history of
source code.</p>
  </li>
  <li>
    <p>How is software integrity ensured? Security-minded users want to see
things like signed kernels and boot loaders (for Secure Boot), and
signed packages. Some distributions sign their packages directly when
they are built, while others might sign the metadata when the
collection is published. In order to trust signatures, packages should
be signed as early as possible after they are built, and both the build and
signing systems should not be directly available to maintainers.</p>
  </li>
  <li>
    <p>How are decisions made? In order to ensure that a distribution
addresses the actual needs of its developers and users, decision
making processes should be well documented and public.</p>
  </li>
  <li>
    <p>Who uses it? One of the things you may want to consider when
selecting a distribution is its user community. When you have
questions, a larger community or a more technically experienced
community may be better able to answer those questions. From that
point of view, you might choose to select a distribution that’s used
by mature organizations, or has a large set of known experienced
users.</p>
  </li>
  <li>
    <p>Is there a code of conduct, and does it align with your values? Does
it encourage the kind of community that you want to be a part of?</p>
  </li>
  <li>
    <p>Is the project sustainable? For many years, we’ve seen notable
security events that weren’t the result of flaws in the software, but
the result of changes in project membership. If you can take over a
project with a large user base, you can ship software to a large user
base who wouldn’t voluntarily download and run your
software. Sustainability is a critical security concern. When you are
selecting a software provider, you want to know not only that you can
trust them, but that you can continue trusting them in the future.
Large projects with diverse participants tend to be more secure
against hostile takeover. Distributions that are derived from other
distributions are often the work of much smaller teams who rely on
larger projects to do the bulk of the work involved. Those projects
might look attractive, but they might also be at greater risk of
takeover due to normal turnover among participants.</p>
  </li>
</ul>

<h3 id="practical-examples">Practical examples</h3>

<p>If that list seems abstract, and you aren’t sure how to evaluate a
project, I’ve described some of those characteristics with respect
to <a href="/dev-blog/2026/02/07/choosing-a-distribution-fedora.html">Fedora</a>, because that is a
system that I understand well.</p>]]></content><author><name>Gordon Messmer</name></author><summary type="html"><![CDATA[People who are interested in trying Free Software are frequently bewildered at the number of available software distributions. Perhaps doubly so because there seems to be so little difference between them. And compounded further because Free Software development is a cooperative endeavor, so the distributions which represent the best that Free Software have to offer tend to avoid criticism of their peers.]]></summary></entry><entry><title type="html">How to choose a distribution: Fedora</title><link href="https://gordonmessmer.codeberg.page/dev-blog/2026/02/07/choosing-a-distribution-fedora.html" rel="alternate" type="text/html" title="How to choose a distribution: Fedora" /><published>2026-02-07T00:00:00-06:00</published><updated>2026-02-07T00:00:00-06:00</updated><id>https://gordonmessmer.codeberg.page/dev-blog/2026/02/07/choosing-a-distribution-fedora</id><content type="html" xml:base="https://gordonmessmer.codeberg.page/dev-blog/2026/02/07/choosing-a-distribution-fedora.html"><![CDATA[<p>In <a href="/dev-blog/2026/02/07/choosing-a-distribution.html">Choosing a distribution</a>, I said that
it’s not my intent to tell readers what distribution to use, and it
still isn’t. But many of the characteristics I described might seem
abstract, so they may not answer the question for everyone.</p>

<p>Many of the characteristics I described guided me toward Fedora, first
as a user and later as a maintainer. I don’t have comments on all of
them, but I’ll offer examples to illustrate how I evaluate those
concerns in the context of Fedora.</p>

<ul>
  <li>
    <p>Fedora includes promising new technology when it reaches adequate
maturity, resulting in a highly technically capable system. Fedora
often has new features and capabilities before any other distribution.</p>
  </li>
  <li>
    <p>Fedora has a policy of staying close to upstream, and if I remember
correctly, it was adopted shortly after another distro realized that
one of the patches they’d been applying to openssl for years had
drastically crippled key generation, resulting in a major security
incident.</p>
  </li>
  <li>
    <p>Fedora’s family spans the spectrum of stable release
cadences. Fedora publishes a new stable release, every 6 months, with
a 13 month support period. CentOS Stream publishes a stable release
(based on Fedora) every 3 years, with a 5 year support period. Red Hat
Enterprise Linux publishes a stable release (based on CentOS Stream)
every 3 years, with a 10 year support period for each major release,
and minor releases every 6 months, some of which have extended support
periods of up to 4 years. No matter what your needs are, there’s
probably a Fedora-derived release with an appropriate cadence.</p>
  </li>
  <li>
    <p>Fedora’s build infrastructure is well managed, with distribution
scripts and patches in Git, and builds managed by Koji. The build
infrastructure is secured and private. Packages are not uploaded by
maintainers.</p>
  </li>
  <li>
    <p>Packages are directly signed, which is common for rpm-based
distributions, but uncommon for other distributions which usually only
sign metadata. Secure Boot is supported.</p>
  </li>
  <li>
    <p>Fedora has extensive documentation for maintainers of individual
packages, and for managing changes in the distribution. Changes are
discussed in detail on the mailing list, and approved changes are
communicated effectively to everyone who needs to coordinate work in
order to make them successful and keep the distribution stable.</p>
  </li>
  <li>
    <p>RHEL is common in a wide range of industries. CentOS Stream is being
adopted by some of the world’s largest and most successful development
organizations, including Meta. Fedora is being adopted by AWS as the
basis of future releases of Amazon Linux. Fedora’s user and developer
communities are a wealth of experience.</p>
  </li>
  <li>
    <p>Fedora’s code of conduct encourages users to be respectful of one
another, to be inclusive, and to be kind.</p>
  </li>
  <li>
    <p>Fedora is maintained by thousands of contributors, with
infrastructure provided by Red Hat. It is one of the most sustainable
projects that I can think of.</p>
  </li>
</ul>]]></content><author><name>Gordon Messmer</name></author><summary type="html"><![CDATA[In Choosing a distribution, I said that it’s not my intent to tell readers what distribution to use, and it still isn’t. But many of the characteristics I described might seem abstract, so they may not answer the question for everyone.]]></summary></entry><entry><title type="html">Complex Packaging Workflow</title><link href="https://gordonmessmer.codeberg.page/dev-blog/2026/02/07/complex-packaging-workflow.html" rel="alternate" type="text/html" title="Complex Packaging Workflow" /><published>2026-02-07T00:00:00-06:00</published><updated>2026-02-07T00:00:00-06:00</updated><id>https://gordonmessmer.codeberg.page/dev-blog/2026/02/07/complex-packaging-workflow</id><content type="html" xml:base="https://gordonmessmer.codeberg.page/dev-blog/2026/02/07/complex-packaging-workflow.html"><![CDATA[<p>Often, the most difficult part of bringing an application into Fedora
isn’t getting the application itself to build, it’s the large tree of
dependencies the application has adopted, which haven’t been imported
into Fedora yet.</p>

<p>Package registries are a core part of the workflow for developers in
many modern languages. Rust developers have crates.io, Python
developers have pypi.org, Node.js developers have npmjs.com (or
pnpm.io… or jsr.io… it’s complicated). Package registries allow
developers to easily get reusable libraries, generally pre-built and
ready to use.</p>

<p>Fedora’s package repositories offer similar functionality. They
provide a collection of reusable libraries that are ready to
use. There are several differences from the language-specific package
registries, including the requirement that packages in Fedora’s
collection have to be built from source in Fedora’s build systems.</p>

<p>In order to support that requirement, Fedora packagers need to wrap
each project’s build system in a common build system that their build
infrastructure understands. Package maintainers are effectively
providing an alternate registry.</p>

<p>One hurdle in this endeavor is that in a typical package registry, a
developer can publish multiple releases in parallel. When an
application needs a component from a registry, it will typically
request information by the name of the package, it will receive
information describing the available versions, and it will select a
version according to constraints provided by the developer. However,
Fedora does not function like a typical registry in this respect. In
Fedora, there is only one package by any name, so in order to provide
multiple versions, there must be multiple packages, each embedding part
of the version in its name.</p>
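<p>To illustrate the naming constraint, here is a small sketch of a mapping from a crate and a minor release series to a package name. The scheme is purely illustrative, not Fedora policy; real compat-package naming conventions vary by language ecosystem.</p>

```python
def compat_package_name(crate, series, default_series):
    """Map a crate and a minor release series to a package name.

    Only one package per name can exist in the collection, so any
    series other than the default embeds the version in the name.
    (Illustrative scheme only; real conventions differ.)
    """
    if series == default_series:
        return f"rust-{crate}"
    return f"rust-{crate}_{series}"

# The default series keeps the plain name; other series get
# version-qualified names so they can coexist in one collection.
print(compat_package_name("fs4", "0.13", default_series="0.13"))  # rust-fs4
print(compat_package_name("fs4", "0.11", default_series="0.13"))  # rust-fs4_0.11
```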

<p>Fortunately, we don’t have that constraint while preparing packages.
We can build local source repositories that function more like a
registry, and sort out the package name specifics once everything is
more or less ready to review.</p>

<p>That brings us to the first tools that can make packaging a little
easier.</p>

<h2 id="registry-packager">Registry packager</h2>

<p>Fedora includes tools designed to make it easier to bundle components
from crates.io and from PyPI. Since we aren’t always sure what version
or versions we will need when we begin building a complex application,
it might be helpful to assemble a description of how to build all of
them, similar to the data in the original registry.</p>

<p>I used Claude to construct <a href="https://codeberg.org/gordonmessmer/registry-packager">simple
wrappers</a> for
Fedora’s registry import tools. These wrappers create a local git
repository in which branches represent minor release series of the
component. Once the git repo is assembled, we can check out any branch
and build the latest release in that minor release series. (If it’s
necessary, we could also check out a previous patch release and build that…)</p>

<p>For example:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./crate_packager.py fs4
$ cd crate-repos/rust-fs4
$ git branch
* main
  release-0.11
  release-0.12
  release-0.13
  release-0.5
  release-0.6
  release-0.7
  release-0.8
  release-0.9
</code></pre></div></div>
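<p>The branch-per-series layout above can be derived mechanically from a crate's published version list. A sketch of that grouping (assuming simple <code>major.minor.patch</code> version strings; the wrapper's real logic may differ):</p>

```python
def series_branches(versions):
    """Group version strings into minor release series, returning
    the branch name and latest patch release for each series."""
    latest = {}
    for v in versions:
        major, minor, patch = (int(x) for x in v.split("."))
        key = (major, minor)
        if key not in latest or patch > latest[key]:
            latest[key] = patch
    return {
        f"release-{major}.{minor}": f"{major}.{minor}.{patch}"
        for (major, minor), patch in sorted(latest.items())
    }

# Published releases of a hypothetical crate:
versions = ["0.11.0", "0.11.1", "0.12.0", "0.13.0", "0.13.1"]
print(series_branches(versions))
# {'release-0.11': '0.11.1', 'release-0.12': '0.12.0', 'release-0.13': '0.13.1'}
```

<p>Checking out <code>release-0.11</code> then builds 0.11.1, the newest release in that series.</p>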

<h2 id="build-chains">Build chains</h2>

<p>Many of the packages imported in this manner will build, but some of
them will reveal more dependencies that need to be added. As the set
of dependencies grows, it can be difficult to track the set that’s
needed for a specific application and the order in which they need to
be built.</p>

<p>It would be helpful to have a tool to not only track this information,
but to manage the build of a list of rpm packages in sequence.</p>

<p>Once again, I used Claude to construct a simple program that wraps
mock-scm.</p>

<p><a href="https://github.com/rpm-software-management/mock">Mock</a> is a tool that
manages build environments in which package maintainers can build rpm
packages, and its “scm” extension supports building a package directly
from a source code repository, so that the package maintainer doesn’t
need to manually create a source RPM to start the process.</p>

<p>The wrapper is
“<a href="https://codeberg.org/gordonmessmer/rpm-build-assist">rpm-build-assist</a>”.
This program takes a YAML file that describes what release to use as
the base environment for builds, what type of source code repos are
used for the packages (which might be dist-git or source-git), where
the resulting RPM packages will be saved, and other details of the
build process.</p>

<p>As the packager works through the dependency set, they can simply add
new dependencies to the beginning of the list. The build-assist yaml
file will serve as a record of all of the packages that need to be
reviewed together, what version of each package is currently needed,
and the order in which they need to be built, while the script
automates the process of building them in sequence during development.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>base: fedora-44-x86_64
localrepo: /home/gordon/git/nodejs-electron-results

build:
  - type: dist-git
    url: /home/gordon/git
    packages:
      - rust-walrus:release-0.24
      - rust-wasmparser:release-0.240
      - rust-wasmprinter:release-0.243
...
      - nodejs-playwright:main
      - nodejs-husky:main
      - nodejs-electron:main
</code></pre></div></div>
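<p>Given a file like that, the build step itself is mostly mechanical: walk the list in order and hand each entry to mock's SCM support. A rough sketch of the loop follows. The config is shown as an already-parsed dict (e.g. from <code>yaml.safe_load</code>), the option names follow mock's <code>--scm-enable</code>/<code>--scm-option</code> interface, and the exact set of options a real build needs (and rpm-build-assist's actual invocation) may differ.</p>

```python
# config: the parsed build-assist file (e.g. yaml.safe_load(...)).
# Values here are trimmed from the example above.
config = {
    "base": "fedora-44-x86_64",
    "localrepo": "/home/gordon/git/nodejs-electron-results",
    "build": [
        {
            "type": "dist-git",
            "url": "/home/gordon/git",
            "packages": [
                "rust-walrus:release-0.24",
                "rust-wasmparser:release-0.240",
            ],
        }
    ],
}

def mock_commands(config):
    """Yield one mock invocation per package, in list order, so that
    earlier packages are available as dependencies for later ones."""
    for group in config["build"]:
        for entry in group["packages"]:
            package, _, branch = entry.partition(":")
            yield [
                "mock", "-r", config["base"],
                "--resultdir", config["localrepo"],
                "--scm-enable",
                "--scm-option", f"package={package}",
                "--scm-option", f"branch={branch}",
                "--scm-option", f"git_get=git clone {group['url']}/{package}",
            ]

for cmd in mock_commands(config):
    print(" ".join(cmd))
```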

<h2 id="automation">Automation</h2>

<p>Automation through CI can improve this workflow further by moving the
actual builds to dedicated compute infrastructure, and it also creates
opportunities for groups of maintainers to work together on a
collection of packages, coordinated in a shared source code
repository.</p>

<p>Claude helped here, too. Claude wrote a <a href="https://github.com/gordonmessmer/rpm-build-assist-action">basic container
action</a> for
use in GitHub runners. It has its own workflow to prepare a container
image that provides mock and rpm-build-assist, as well as the
action.yml that implements the reusable action.</p>

<p>Now, a repo that contains the build-assist.yaml file can also contain
a workflow that runs the build chain in GitHub CI.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>name: source-git build and test

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: source-git build
        uses: gordonmessmer/rpm-build-assist-action@main
</code></pre></div></div>

<p>All of these tools are “proofs of concept”, so there are lots of
opportunities to improve them. But even at an early stage, they might
be useful to Fedora package maintainers who are preparing complex
applications.</p>]]></content><author><name>Gordon Messmer</name></author><summary type="html"><![CDATA[Often, the most difficult part of bringing an application into Fedora isn’t getting the application itself to build, it’s the large tree of dependencies the application has adopted, which haven’t been imported into Fedora yet.]]></summary></entry></feed>