Python Distributions for Bazel

I'm working with Bazel these days but I don't like how it handles Python. So here I reinvent that wheel.

The Traditional Way for Python in Bazel

Normal developers use rules_python. If I understand it correctly, this is essentially a construct to make all PyPI packages available as targets for Bazel. In order to do this, you must specify a Python toolchain for your workspace. Together with further best practices like aspect_rules_py, you end up with quite a few lines of boilerplate in each and every repository.
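
In practice, that boilerplate means a MODULE.bazel along these lines. This is only a sketch: extension paths and attributes vary across rules_python versions, and the version number is a placeholder.

# MODULE.bazel (sketch; version and names are placeholders)
bazel_dep(name = "rules_python", version = "0.31.0")

python = use_extension("@rules_python//python/extensions:python.bzl", "python")
python.toolchain(python_version = "3.11", is_default = True)

pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")
pip.parse(
    hub_name = "pip",
    python_version = "3.11",
    requirements_lock = "//:requirements_lock.txt",
)
use_repo(pip, "pip")

Packages from the hub are then referenced roughly as deps = ["@pip//requests"] in every py_binary or py_library.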

Bazel works in three phases: loading, analysis, and execution. Downloading stuff generally happens only in the first phase. Naturally, that includes the download of all the Python packages you will need.

Does it have to be that way? What if you have Python tools with conflicting dependencies? There should be a way to build Python environments with Bazel. Multiple environments.

Looking for Alternatives

No, not containers because I don't like them. Using containers for tools in Bazel feels heretical to me.

What about Python's venvs? They are not suitable because they are inherently tied to their host Python. Bazel wants to enable you to execute your builds remotely, and we cannot have ties to some host for that.

So we need an actual, independent Python distribution. This is tricky because Python has all kinds of dependencies (libc, libssl, etc.). Especially GNU's glibc, with its version checks, is annoying.

Then let's compile Python with statically linked libraries? Unfortunately, that breaks a lot of Python packages because dlopen(3) is no longer available. In practice, nobody uses a statically linked Python.

Ok, dynamically-linked musl libc then? We might package that together with our Python distribution. Unfortunately, this requires the musl loader on your system.

We don't want any system dependencies, because Bazel is supposed to handle all dependencies. No system libc. No loader. Yet dynamic linking via dlopen must work.

So this finally leads us to the conclusion: we want musl libc as a shared library. Therefore, we need the musl loader. However, we cannot assume the system will have it. We need some kind of wrapper.

Since I know there is some crazy stuff out there to make an αcτµαlly pδrταblε εxεcµταblε with cosmopolitan, it should be possible somehow.

The Trick for a Portable Python

Let's create a launcher that pretends to be the Python executable. One can build Python as a library and load it internally. If we build libpython with musl, it should be nicely portable.

How will that launcher work on systems with glibc, though? The fun thing is that the binary will still mostly run even on a glibc system. It only fails at the dlopen where we try to load libpython. In that case, however, the launcher can simply adapt the environment (pointing LD_LIBRARY_PATH at our packaged libraries) and execute itself with the bundled musl loader. The spawned clone uses the bundled musl libc, and loading libpython works.
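
To make the mechanism concrete, here is that fallback logic rendered in Python. The real launcher is a small native binary that re-executes its own executable; every path and soname below is an assumption for illustration only.

# Illustration only: the real launcher is a native binary; paths and
# sonames here are assumptions.
import ctypes
import os
import sys

ROOT = os.path.dirname(os.path.dirname(os.path.abspath(sys.argv[0])))
LAUNCHER = os.path.join(ROOT, "bin", "python3")                 # assumed launcher path
LIBPYTHON = os.path.join(ROOT, "lib", "libpython3.12.so.1.0")   # assumed soname
MUSL_LOADER = os.path.join(ROOT, "lib", "ld-musl-x86_64.so.1")  # assumed loader name

try:
    # Works on a musl host, or once we are re-executed under the bundled loader.
    ctypes.CDLL(LIBPYTHON)
except OSError:
    # On a glibc host the dlopen fails: point the dynamic linker at the
    # bundled libraries and re-execute the launcher under the bundled
    # musl loader. The clone then loads libpython without trouble.
    os.environ["LD_LIBRARY_PATH"] = os.path.join(ROOT, "lib")
    os.execv(MUSL_LOADER, [MUSL_LOADER, LAUNCHER] + sys.argv[1:])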

Everything is dynamically linked, so Python packages generally work. In contrast to a venv, you can tarball this Python distribution and move it around without problems. The result is a 37 MiB tar.gz. It includes nearly all the "batteries" (no GUI stuff), header files so one can compile C extensions, and libssl, libz, libsqlite3, and others.

With a little AI help, all this went quite smoothly. The really hard part was still to come.

Building Distributions

Bazel is straightforward if you work with files, but it gets tricky once you work with directories. It took me more than two weeks to find an approach for deriving Python distributions that I consider "not too hacky".

One big trick is to put most of the logic into a Python script, which I sneak into the Python distribution. Now one just needs to call:

my/bin/python3 -m ppd_instantiate another/ reqs.txt

This copies the whole distribution into the another/ folder and pip-installs whatever is listed in reqs.txt. It also deletes __pycache__ directories for size reasons.
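
The script itself is not part of this post, so here is only a hypothetical sketch of what a module like ppd_instantiate has to do. Apart from the module name and the command line above, everything is an assumption.

# Hypothetical sketch of a module like ppd_instantiate (not the real code).
import shutil
import subprocess
import sys
from pathlib import Path


def instantiate(target_dir: str, requirements_txt: str) -> None:
    # sys.executable is my/bin/python3, so two levels up is the distribution root.
    src_root = Path(sys.executable).resolve().parent.parent
    target = Path(target_dir)

    # Copy the whole distribution into the target folder.
    shutil.copytree(src_root, target, symlinks=True, dirs_exist_ok=True)

    # Install the requested packages with the copied interpreter.
    subprocess.run(
        [str(target / "bin" / "python3"), "-m", "pip", "install", "-r", requirements_txt],
        check=True,
    )

    # __pycache__ directories are pure size overhead in a relocatable tree.
    for cache in list(target.rglob("__pycache__")):
        shutil.rmtree(cache)


if __name__ == "__main__":
    instantiate(sys.argv[1], sys.argv[2])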

Bazel is still picky: you cannot declare a filegroup, like the another/ folder, and then expose an executable from inside that folder as its own target. Apparently, one has to create a wrapper script and attach the rest of the distribution as runfiles.

How does the wrapper script find the python3 executable in the runfiles? For now, I use find(1). That is the hacky part; there should be some way to avoid relying on find?! It still looks nice in a BUILD file, though, as the messy details are hidden in a library:

python_derived_dist(
    name = "my_python",
    python_exe = "@portable_python//:python",
    requirements_txt = "requirements.txt",
    tags = ["requires-network"],
)
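
Under the hood, a rule like python_derived_dist could be wired up roughly as follows. This is a sketch under assumptions rather than the actual implementation: the attribute names follow the BUILD example above, everything else is guessed.

def _python_derived_dist_impl(ctx):
    # A directory output (TreeArtifact) holds the whole derived distribution.
    dist_dir = ctx.actions.declare_directory(ctx.label.name + "_dist")
    ctx.actions.run(
        executable = ctx.executable.python_exe,
        # pip needs network access (hence the "requires-network" tag in the BUILD file).
        arguments = ["-m", "ppd_instantiate", dist_dir.path, ctx.file.requirements_txt.path],
        inputs = [ctx.file.requirements_txt],
        outputs = [dist_dir],
    )

    # The target's executable is a wrapper script; the distribution directory
    # travels along as runfiles. Locating python3 with find(1) is the hacky
    # part mentioned above.
    wrapper = ctx.actions.declare_file(ctx.label.name + ".sh")
    ctx.actions.write(
        wrapper,
        "#!/bin/sh\nexec \"$(find . -name python3 -type f | head -n 1)\" \"$@\"\n",
        is_executable = True,
    )
    return [DefaultInfo(
        executable = wrapper,
        runfiles = ctx.runfiles(files = [dist_dir]),
    )]

python_derived_dist = rule(
    implementation = _python_derived_dist_impl,
    executable = True,
    attrs = {
        "python_exe": attr.label(executable = True, cfg = "exec", allow_single_file = True),
        "requirements_txt": attr.label(allow_single_file = True),
    },
)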

So now we have a rule to create Python distributions with a custom set of additional packages installed. From that, one can create another rule, for example to build some Sphinx documentation:

sphinx_build(
    name = "sphinx_html",
    srcs = glob(["docs/*"]),
    python_distro = ":my_python",
)

And just like that, bazel build :sphinx_html will install your packages into your custom Python distribution and build the documentation.
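
For completeness, a sphinx_build rule on top of such a distribution could look roughly like this; again a sketch under assumptions, not the actual code. It relies on the fact that Sphinx can be invoked as a module (python -m sphinx).

def _sphinx_build_impl(ctx):
    out = ctx.actions.declare_directory(ctx.label.name)
    ctx.actions.run(
        executable = ctx.executable.python_distro,
        # python -m sphinx -b html <source dir> <output dir>
        arguments = ["-m", "sphinx", "-b", "html", ctx.files.srcs[0].dirname, out.path],
        inputs = ctx.files.srcs,
        outputs = [out],
    )
    return [DefaultInfo(files = depset([out]))]

sphinx_build = rule(
    implementation = _sphinx_build_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
        "python_distro": attr.label(executable = True, cfg = "exec"),
    },
)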

Concept Proven

Such Python distributions are nicely cached by Bazel. You can have as many variants as you want and mix them in your builds without toolchain considerations.

You can also make them available to other modules as a simple executable target. No need for users to write toolchain boilerplate.

That said, this portable Python is a proof of concept. It is not broadly tested, so take "portable" with a grain of salt.

Python as a target, not a toolchain, in a Bazel build system