The 5 Levels of Configuration Languages

Code is data and data is code. Years ago, I had a brief affair with Lisp and there I picked up this meme. Today, I believe there are also benefits in separating code and data.

Glimpses of this debate come up whenever people discuss the syntax for yet another configuration schema. There are 5 levels of power for configuration languages. If you design a new schema, be aware of all of them.

Level 1: String in a File

The file system is a key-value store. The Linux kernel does it with procfs and sysfs. Example from my shell:

$ cat /proc/sys/kernel/arch
x86_64
$ cat /proc/sys/kernel/sched_energy_aware
1

I could write 0 to the second one to change the kernel behavior.

This certainly is the simplest format, yet it works.
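
Consuming such a configuration from a program takes one file read per key. Here is a minimal sketch in Python. The helper names and the keys are made up for illustration, and a temporary directory stands in for a real settings directory:

```python
import tempfile
from pathlib import Path


def read_setting(root: Path, key: str) -> str:
    """Return the content of the file named `key`, surrounding whitespace stripped."""
    return (root / key).read_text().strip()


def write_setting(root: Path, key: str, value: str) -> None:
    """Store `value` in the file named `key`: one value per file."""
    (root / key).write_text(value + "\n")


# Hypothetical settings directory, created only for this demonstration.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_setting(root, "log_level", "debug")
    write_setting(root, "max_connections", "16")

    level = read_setting(root, "log_level")
    # Everything is a string at this level; the reader decides the type.
    limit = int(read_setting(root, "max_connections"))
    print(level, limit)
```

Note that the format carries no type information: the program has to know that max_connections is meant to be a number.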

Level 2: A List

For a little more expressive power, you can treat the file contents as a list. Maybe one item per line. Maybe a key-value pair per line. Maybe with sections like an INI file. Example file contents:

[database]
server = 192.0.2.62
port = 143
file = "payroll.dat"

This is already complex enough that not everything is intuitive. What happens if you duplicate a key? Can you do multiline strings?
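
The answers depend on the parser, which is part of the problem. As one data point, here is how Python's standard configparser module treats both cases. This is only one INI dialect among many, and the motd key and the second port line are additions for this sketch:

```python
import configparser

TEXT = """\
[database]
server = 192.0.2.62
motd = first line
    second line
port = 143
port = 144
"""

# The default parser is strict and refuses duplicate keys.
strict = configparser.ConfigParser()
try:
    strict.read_string(TEXT)
    outcome = "accepted"
except configparser.DuplicateOptionError:
    outcome = "rejected"

# A lenient parser keeps the last value it sees.
lenient = configparser.ConfigParser(strict=False)
lenient.read_string(TEXT)
port = lenient["database"]["port"]

# Indented continuation lines are joined into one multiline value.
motd = lenient["database"]["motd"]
print(outcome, port, repr(motd))
```

Other INI parsers may silently keep the first value, or not support continuation lines at all.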

The defining constraint is that you cannot have a list of lists. That would be the next level. However, think twice before going there, because with a little prefixing and suffixing of names you can do a lot here.
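
To illustrate the naming trick, the following sketch stores two server entries in a flat section and regroups them by prefix after parsing. The section name, the dot separator, and the keys are assumptions chosen for this example, not a convention of any particular tool:

```python
import configparser
from collections import defaultdict

TEXT = """\
[servers]
primary.host = 192.0.2.62
primary.port = 143
backup.host = 192.0.2.63
backup.port = 143
"""

parser = configparser.ConfigParser()
parser.read_string(TEXT)

# Regroup the flat keys: "primary.host" becomes servers["primary"]["host"].
grouped = defaultdict(dict)
for name, value in parser["servers"].items():
    prefix, _, field = name.partition(".")
    grouped[prefix][field] = value
servers = dict(grouped)
print(servers)
```

The file stays a flat list, while the program still gets its nested structure.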

Level 3: Nested Data Structures

This is probably the most popular level, where we find JSON, YAML, XML, TOML, etc. Example file contents:

{
  "database": {
    "host": "localhost",
    "port": 1234,
    "auth": {
      "user": "elon",
      "password": "mars2023"
    }
  }
}

It is fascinating how long people can debate the pros and cons of the alternatives on this level, even though they are more or less the same.

I actually like XML. It isn't "cool" like YAML anymore, but it has better tooling support (e.g. schema checking) and doesn't try to be too clever. Just try to stay away from namespaces and don't be afraid of using attributes.

In practice, many people later hit the limitation that you cannot compute anything. Maybe they need variables or want to generate a list of things. Then they retrofit the format with abominations like "Python expressions as values" or "Jinja templates for generation". At this point, you had better move up another level. This is where we transition from data to code, don't you think?
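
A sketch of what such a retrofit typically looks like: variables bolted onto JSON with string substitution. The "vars" key, the ${...} placeholder syntax, and the expand helper are all hypothetical, and Python's string.Template stands in for a real template engine:

```python
import json
from string import Template

RAW = """
{
  "vars": {"env": "staging", "base_port": "8000"},
  "database": {
    "host": "db.${env}.example.com",
    "url": "http://db.${env}.example.com:${base_port}/"
  }
}
"""


def expand(node, variables):
    """Recursively substitute ${name} placeholders in every string value."""
    if isinstance(node, dict):
        return {key: expand(value, variables) for key, value in node.items()}
    if isinstance(node, list):
        return [expand(value, variables) for value in node]
    if isinstance(node, str):
        return Template(node).substitute(variables)
    return node


doc = json.loads(RAW)
config = expand(doc["database"], doc["vars"])
print(config["url"])
```

It works, but the file is no longer plain JSON: every literal dollar sign now needs escaping, substituted values are always strings, and no JSON tool knows what the placeholders mean.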

Level 4: Total Programming Languages

This is the least known level and should probably be more popular. Total functional programming means you can compute things, but every program is guaranteed to terminate. Such languages are deliberately not Turing-complete.
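
To make the idea concrete, here is a toy expression language sketched in Python. It is not how any of the real languages at this level are implemented. It allows integer constants, variables, and three arithmetic operators, and since it has no loops, calls, or recursion, evaluation visits each node of a finite syntax tree once and must terminate:

```python
import ast
import operator

# The only operators the toy language knows.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


def evaluate(source: str, variables: dict) -> int:
    """Evaluate an arithmetic expression over integer constants and variables."""

    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        # Calls, attribute access, comparisons, etc. are not part of the language.
        raise ValueError(f"construct not allowed: {type(node).__name__}")

    return walk(ast.parse(source, mode="eval").body)


workers = evaluate("cores * 2 + 1", {"cores": 4})
print(workers)

try:
    evaluate("__import__('os')", {})
    rejected = False
except ValueError:
    rejected = True
```

The configuration author gets variables and arithmetic, while the tool reading the configuration keeps the guarantee that loading it cannot hang.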

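Languages at this level include typed ones like Dhall, which is total by design. XSLT and Jsonnet (a JSON extension) are often mentioned in the same breath, although strictly speaking both are Turing-complete. Here is an example in Starlark, the language of Bazel build files, which forbids recursion and unbounded loops: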

java_binary(
    name = "ProjectRunner",
    srcs = glob(["src/main/java/com/example/*.java"]),
)

A challenge here is that you are programming, but since the languages are not that popular, you don't have the usual language tooling available. So, the final level...

Level 5: Full Programming Language

Of course, you can use any scripting language to configure things. Python, JavaScript, Lua, Tcl, whatever. They are Turing-complete. For example, Conan is a package manager where you specify packages in Python:

from conan import ConanFile

class CompressorRecipe(ConanFile):
    settings = "os", "compiler", "build_type", "arch"
    generators = "CMakeToolchain", "CMakeDeps"

    def requirements(self):
        self.requires("zlib/1.2.11")

    def build_requirements(self):
        self.tool_requires("cmake/3.22.6")

Any Python programmer can easily add complex logic where they see fit.

People often discover the problem that the configuration determines what to import, but the imports also determine the configuration itself. This circular dependency leads to madness.

For example, in Conan you declare dependencies as in the example above. You might want to depend on some Python module which you use in this script. At that point you are already executing the script, though. Thus, Conan invented python_requires_extend, its own weird way to inject a superclass into an existing object.

How to avoid this madness? Introduce another low-level configuration file. Back to level one...
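
A sketch of that two-stage idea, under loud assumptions: the loader, the file contents, and the use of exec are invented for this illustration and are not how Conan works. Stage one is a plain list that can be read without executing anything. Stage two is the script, which runs only after everything from stage one is available:

```python
import importlib

# Stage 1: a plain list of required modules, readable without running code.
BOOTSTRAP = """\
math
json
"""

# Stage 2: the scripted configuration, free to use what stage 1 declared.
SCRIPT = """\
radius = 2
area = round(math.pi * radius ** 2, 2)
summary = json.dumps({"area": area})
"""


def load(bootstrap: str, script: str) -> dict:
    """Resolve the declared modules first, then execute the script with them."""
    names = [line.strip() for line in bootstrap.splitlines() if line.strip()]
    # A real tool would download or install dependencies here; this only imports.
    namespace = {name: importlib.import_module(name) for name in names}
    exec(script, namespace)
    return namespace


settings = load(BOOTSTRAP, SCRIPT)
print(settings["summary"])
```

The circularity disappears because the dependency list no longer lives inside the code that needs the dependencies.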

Which Level to Use?

The guiding principle is to use the lowest possible level to keep it simple. Unfortunately, it usually is not an easy decision because you don't know the future.

The corollary of my level structure: Don't waste time on discussions within a level. For example, JSON and YAML both have their problems and pitfalls but both are probably good enough.

Discussion on lobste.rs with great comments. Discussion on Hacker News with many more languages.