On building build systems

  1. General stuff
  2. Why do I care about this problem?
  3. Things to avoid in your build system
    1. Requires non-portable state on a laptop to function
    2. A wiki page for your build system
    3. Require a pre-populated server or dedicated infrastructure to build
  4. Container Images
    1. Build an image
    2. Bind mount your source, don’t copy it if you can avoid it
    3. Use the UID/GID of the user launching the container
    4. Make a “shell” option for accessing the container image interactively
  5. Task management
    1. Don’t over-depend, don’t under-depend
    2. Install the tools for the user, but ask first
  6. Repository Management
    1. Avoid needing specific directories on the host
    2. Avoid symlinks — entirely
    3. Avoid repository co-dependency
    4. Code generation should be a part of your build
    5. Generating dependencies lists should also be an automatic process
  7. Key Takeaways

Some useful patterns I’ve learned from using and writing build and repository management systems. Having created many repositories and grown projects from nothing into mature products multiple times, I believe I’ve learned the following lessons.

These aren’t hard and fast rules; build systems are unique snowflakes because they are unique software. Trying to normalize the problem ignores everyone’s needs, but some guidance seems useful to jot down while I’m thinking about it.

General stuff

If you plan to work with a team, the build system should be the first thing you tackle at the conception of the project. It is the first thing you must agree on if you wish to check in code and function as a team. Even if your build system is just go install, that’s a build instruction you all share and use equally.

It’s also important to treat the build system as a piece of software that lives on its own. The best build systems that I have used are grown from other build systems; either kept as a repository skeleton, copied from an existing repository, or repeated (largely from memory) by people who do this stuff frequently. The recipes and tropes are taken and modified as necessary, or recalled from pre-fabricated components.

The point here is that the best build systems are composed of re-used components, one way or another, and refined over time to fit the needs of a variety of situations. Being independent doesn’t mean the build system lives in another repo or anything; it is naturally going to be coupled to your source, but that’s not an excuse to make it worse.

Why do I care about this problem?

Your developers are an expensive resource, and for many of them the build consumes a large share of their actual working time. Many developers I’ve considered efficient have quite simply figured out how to get other work done effectively while software builds, juggling that all day across multiple sub-projects or repositories.

Builds are a time sink: the more time you spend wrestling with your build while you’re trying to program, the less time you spend improving the build or your software. This is largely language-agnostic; the problem is worse in some ecosystems and better in others, but most of the things I’ve written down apply to all of them.

Additionally, a broken build is a stop-the-world event. The build must be examined and everyone usually ends up stopping what they’re doing to address that. If it’s not the developer’s source code causing the breakage, it must be dug into, which takes time and does not move the product forward. If investment is not made, this effort is frequently multiplied by the number of builds until a working solution is achieved.

The real solution is to fix the build system.

Things to avoid in your build system

Please note that for the scope of this article, deployments are not builds. That’s a different subject, and the rules apply a little differently there.

Some things you really want to avoid in your build system:

Requires non-portable state on a laptop to function

When something like an encryption key, a configuration file that can’t be checked in for whatever reason, or anything else of the sort is required by all developers, it’s effectively a build dependency. This state almost always ends up being passed from developer to developer: the equivalent of an Excel document circulating on an email Cc: list. It’s something you should have stopped doing over a decade ago.

A wiki page for your build system

Check your build instructions into git or your favorite source control. If they’re too long, or they make the developer churn too hard, that’s a good reason to automate those processes so they are documented in code.

The problem with a wiki is that it doesn’t track the source code: if you ever check out an old version of the source, the wiki instructions are no longer relevant. Automating the instructions also forces you to consider exactly what resources the build depends on, the kind of unaccounted-for dependency that, in my experience, usually accompanies a wiki page. More on this below.

Require a pre-populated server or dedicated infrastructure to build

Exceptionally large and intertwined corporate codebases are an exception here for a lot of good reasons, but most projects shouldn’t need a specific server or set of servers to build. Most of us in this situation are just being lazy.

Your software should build on the target architecture after the build setup instructions are followed. Your goal for your developers is to keep those instructions as small as possible through automation.

Building on static servers leads to dependencies on stale artifacts that are not accounted for in the build system: packages containing shared libraries, or files (web assets and images, for example) that live on disk and must be copied to new servers before the build will function. These are build dependencies; they should be accounted for through automation (s3cmd, maybe, for our images?) or checked in.
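To make that concrete, such a dependency can be declared in the build itself. A hypothetical sketch in make; the bucket name, directory, and build command are made up for illustration:

```make
# Fetch web assets as a declared build dependency instead of relying on
# files pre-staged on a specific server. Illustrative names only.
assets:
	mkdir -p assets
	s3cmd sync s3://example-assets-bucket/images/ assets/

build: assets
	./do-build   # stand-in for your real build command

.PHONY: assets build
```

Now a fresh checkout builds anywhere s3cmd is configured, instead of only on the one server where the images happen to live.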

Container Images

Container images are a great way to encapsulate complicated behavior in your build; a game-changing technology for capturing build instructions, though our field could be using them more effectively.

I’ve elaborated below, but what I find works best for me is to:

  • Build an image full of dependencies and depend on it for builds
  • Bind mount the source tree as a part of a make task with a “real” build command (can also be a make task!) run in the container, and generate the artifact there
  • Either keep the resulting container as an image, or do the packaging in the container and emit the artifact to the source tree.
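The steps above can be sketched as a pair of make tasks. This is a sketch under assumptions: the image name, the /code mount point, and the inner make target are all placeholders:

```make
IMAGE ?= mybuildimage

# Build an image full of dependencies; the Dockerfile installs the toolchain only.
image:
	docker build -t $(IMAGE) .

# Bind mount the source and run the "real" build inside the container, as the
# invoking user, so artifacts land back in the tree with correct ownership.
build: image
	docker run --rm -u $$(id -u):$$(id -g) \
		-v $(PWD):/code -w /code $(IMAGE) make all

.PHONY: image build
```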

Build an image

However you do it, build an image. It’s fine to expect developers to have the tools they need to build and run your image. What they shouldn’t need to do is pollute their computers to build your software; a polluted machine makes it hard to build other software too, especially when two independent builds want mismatched dependency versions. Containerizing the build eliminates development friction both for the owners of the source tree and for everyone outside it.

Bind mount your source, don’t copy it if you can avoid it

Avoid copying your source into your image. Not only does it bloat the image and make it harder to clean up later, it also busts image build caches more frequently, since those caches typically key on file modification times or content differences. This means your build will take longer.

Bind mounting on the other hand is lightning fast and makes the external content on the host available to the container, allowing the developer to use their own tools with the source tree without having to bring them into the container image.

You can bind mount with docker’s -v flag, providing a path on the host and a path inside the container. Note that named Docker volumes are not bind mounts.

Use the UID/GID of the user launching the container

This can make your build ugly at times but is typically worth the effort. What we do here is mirror the UID/GID of the user running the docker command inside the container, so that writes to the bind mount come out to the host as modifiable by the user.

docker run -it -u $(id -u):$(id -g) \
  -v "${PWD}:/code" -w /code myimage make

This combined with the bind mount frees the user from having to work too hard in the container, allowing them to use their choice of tools outside of the container without being interfered with by the container’s permissions.

This has a few caveats in this form (and you can correct them, but it’s not always necessary):

  • You have no username, group name, or home directory inside the container. If your UID is below 1000, it may surprisingly map to a system or machine account defined in the image, though this is rare on modern unix systems.
  • Anything built by docker build or similar tools will likely not be owned by the right UID (since we don’t know it at image build time). Account for this when installing packages.
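For that second caveat, one approach is to open up permissions on anything the runtime UID will write to while the image is being built. A hypothetical Dockerfile sketch, with made-up paths and packages:

```dockerfile
FROM debian:bookworm-slim

# Toolchain installed at image build time, as root.
RUN apt-get update && apt-get install -y --no-install-recommends build-essential

# Caches and scratch space the runtime UID will write to: open the
# permissions up, since we don't know the developer's UID yet.
RUN mkdir -p /cache && chmod 0777 /cache

# A writable HOME for the anonymous runtime user (no real account exists).
ENV HOME=/tmp
```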

Make a “shell” option for accessing the container image interactively

This is the thing I see missing more often than any of the others: a simple shell task or option for accessing the container image and its stored toolchain with an interactive shell. It goes a long way when you need to resolve something special or one-off.

I think we’ve all seen code where the developer thought they needed something thousands of times and only ended up needing it once; build systems are bloated with this problem because they have a low degree of interactivity, requiring support for arguably pointless stuff to be baked into the build system. Making the system support interactivity saves everyone time and trouble.

Additionally, the pattern of docker run <image> <subtool> is annoying as piss, and more or less leads to the problem in the previous paragraph: a task for every subtool or process you need to run. A shell goes further by letting you be more expressive with those subtools, without compromising the integrity of the build system or the developer’s host.
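A shell task can be as small as a variant of the build task that swaps the command for an interactive shell. A sketch, with a made-up image name and mount point:

```make
IMAGE ?= mybuildimage

# Drop into the build toolchain interactively, with the source bind mounted
# and your own UID/GID, for one-off debugging and exploration.
shell:
	docker run --rm -it -u $$(id -u):$$(id -g) \
		-v $(PWD):/code -w /code $(IMAGE) bash

.PHONY: shell
```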

Task management

Managing the tasks that manage your repository is also important; tools like make are frequent collectors of dirt that must be refactored just like the rest of the software in your application.

Most of this will cover basic make usage but can be applied to other declarative task management systems.

Don’t over-depend, don’t under-depend

Don’t over-specify your dependencies. This is important mostly because if you do, you will create brittle tasks that cannot be re-used. This leads to bit-rot as new tasks roll in to add new behavior and old behavior falls out of scope.

Likewise, don’t blob up all your code (or worse, cut and paste) in a single task. At first this may make sense, but look to refactoring it out as soon as time permits.

As with software libraries, aim to make small tasks that do exactly what is needed, and compose them into the larger tasks that make up your build. That way, developers can do their own composing if they need to.
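In make, that composition looks like small single-purpose targets strung together by a larger one. The target names and commands here are illustrative only:

```make
# Small tasks that each do exactly one thing.
generate:
	./scripts/codegen.sh        # made-up codegen step

compile: generate
	go build ./...

lint:
	go vet ./...

# The composed build; developers can also run the pieces individually.
build: generate compile lint

.PHONY: generate compile lint build
```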

Install the tools for the user, but ask first

If you need a tool to bootstrap your build, it’s fine to install it, but make sure your users know what is happening. Don’t just throw them a sudo prompt.

If you install tools for the user, check whether they already exist before you do: look in $PATH, and only escalate privileges when necessary (this also allows users to run sudo safely themselves):

if ! command -v "$TOOL" &>/dev/null; then
  if [ "$(id -u)" -ne 0 ]; then
    echo 1>&2 "Need sudo to install $TOOL; please enter your sudo password."
    sudo install_tool
  else
    install_tool
  fi
fi
Should you need a practical example, Ruby’s bundler handles the sudo case nicely for you.

All of this is vastly preferred to “install this” on a wiki page.

Repository Management

As touched on at the beginning, build systems are conflated with repository management by necessity most of the time. While we have techniques like GitHub flow to manage our commits, the act of manipulating files in our repository is largely left to the build system, since it is usually already doing that work.

Avoid needing specific directories on the host

This goes back to not doing development on fixed infrastructure, more or less. Requiring specific directories on the host should be verboten. Use a container to solve this problem and stop polluting developer laptops.

Avoid symlinks — entirely

This is fairly specific to git, but it carries some additional traps. Symlinks are just plainly a tire fire in git. Avoid them at all costs. They rarely work on Windows, and they rarely work properly on unix even when all the stars are aligned. Git tracks them in a way that leaves it unable to do much when they break, and they can have long-lasting implications for your build as it walks through multiple symlinks that likely all point at a single place, which is how I’ve seen them used most frequently.

Symlinks can also mandate surprising behavior just to build the software. I’ve been in a few repositories where symlinks were used to fake a $GOPATH inside the tree for building, with the symlink then .gitignore‘d. The problem with this approach is that if the symlinks are not populated by automation (and in every case I’m thinking of, that was part of the problem), they can point at the wrong place, causing surprising build behavior.

It also encourages people to create their own snowflake directory trees by managing the symlinks themselves. As soon as I’ve symlinked outside the repository, I’ve essentially created a build dependency on someone’s specific laptop configuration.

Avoid repository co-dependency

Tools like git submodule, git subtree, and their ilk exist to patch over a broken build system. They invariably create more problems than orchestrating the cooperation of two repositories in your own unique way would. You will run into merge conflicts, accidental rollbacks of your code or module definitions, and all sorts of other joys I don’t want to waste time on here.

A custom system not only gets your users to think about how the code is used within the build, but also avoids automatic and accidental alterations of that work, if appropriately managed with tools and automation.

Code generation should be a part of your build

Code generation should always happen at make or similar build time. If I can modify the manifest or configuration that drives code generation, then build and commit without regenerating the output, I’ve created a bisecting problem for you later. At best, it will require you to regenerate to validate that the generated code is actually up to date. Lots of CI systems do this as a separate step; it’s usually easier to push it down into the build so both CI and the developers do it.
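In make, this falls out naturally if the generated file depends on its manifest, so that both CI and developers regenerate it whenever the manifest changes. The file names and the protoc invocation are assumptions for illustration:

```make
# Regenerate whenever the manifest (the .proto file) changes.
api.pb.go: api.proto
	protoc --go_out=. api.proto

build: api.pb.go
	go build ./...

.PHONY: build
```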

Generating dependencies lists should also be an automatic process

Since the act of upgrading a dependency can be a security or integration problem, it’s not always sound to make this completely automated. However, you can take the following actions:

  • Check and notify in CI for outdated packages or security fixes (npm audit, etc.)
  • Make a task that batch upgrades dependency sets which should track each other, or the whole repository (go get -u ./... or yarn upgrade, for example). Make this available for developers so they don’t try to min/max dependencies you don’t want managed that way, another frequent source of build problems.
  • Have a CI system that takes this into account (e.g., doesn’t work with a fixed dependency set), by allowing changes to the dependency list to be tested as a part of the test process.
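As a sketch, the batch-upgrade and audit actions above can be expressed as tasks, here for a hypothetical Go project:

```make
# Batch-upgrade the whole dependency set in one managed step.
update-deps:
	go get -u ./...
	go mod tidy

# Report available upgrades without applying them (useful in CI).
outdated:
	go list -m -u all

.PHONY: update-deps outdated
```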

Key Takeaways

  • Keep it in the repository.
  • Keep it simple for the developer.
  • Treat your build system like a product of its own, one that the product depends on.
  • Maintain it like a separate product. Think about how you will use it in the next project.
  • Don’t pollute.