How you can handle The Diamond with CMake

CMake is a conservative and popular build system for C++, and thus the first choice if you are looking for boring technology. Yet it does not scale well to large projects because of how it handles dependencies. This post is about the classic "diamond" shape:

[Figure: diamond-shaped dependency graph]

By "large-scale project", I mean multiple teams and even more components, so many that you cannot structure them as a tree. Instead, it makes more sense to have a flat directory structure where you place components side by side. Dependencies between components will quickly grow into all kinds of shapes (although you should really avoid cycles), and sooner or later there will be a diamond among them.

To decouple the components, we would like to build and test each of them independently, so each gets a CMakeLists.txt. However, we still need a global one at the root so each of the subdirectories can find its dependencies.

CMakeLists.txt
components/A/CMakeLists.txt
components/base/CMakeLists.txt
components/B/CMakeLists.txt
components/root/CMakeLists.txt
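
A root file for this layout might look like the following minimal sketch. The project name everything is made up for illustration, and the real file in a given repository may well differ.

```cmake
# Root CMakeLists.txt (sketch). The project name is a placeholder.
cmake_minimum_required(VERSION 3.16)
project(everything LANGUAGES CXX)

# Pull in every component so that each one can link against the others' targets.
add_subdirectory(components/base)
add_subdirectory(components/A)
add_subdirectory(components/B)
add_subdirectory(components/root)
```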

This need for a root file is annoying. CMake has to parse every CMakeLists.txt during configuration, even when you only care about a single component.

Instead, I would prefer to enter a component directory and build there. Can CMake do this?

CMake has two mechanisms for dependencies. First, there is find_package. Its intention is to detect packages available on your system and configure the build accordingly. It comes in a "Module" and a "Config" mode, but the distinction is not relevant here. Neither is useful for us because both assume a prebuilt library. CMake will not build a dependency through find_package.
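
To illustrate why this does not help, here is a sketch of how A would consume base through find_package. It assumes that base has already been built and installed together with a package config file that exports a target named base::base. Both are assumptions for the sake of the example, not something the setup here provides.

```cmake
# components/A/CMakeLists.txt (sketch, assuming an installed base package)
cmake_minimum_required(VERSION 3.16)
project(A LANGUAGES CXX)

# Config mode: looks for an installed baseConfig.cmake or base-config.cmake.
# Configuration fails here if base was not built and installed beforehand.
find_package(base CONFIG REQUIRED)

add_library(A src/A.cpp)                    # hypothetical source file
target_link_libraries(A PUBLIC base::base)  # hypothetical exported target name
```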

The alternative is add_subdirectory. As the name suggests, it is meant for a directory tree. The root CMakeLists.txt uses it to pull in the component CMakeLists.txt files. If you point it at a directory that is not a subdirectory, it shows an error message:

CMake Error at CMakeLists.txt:8 (add_subdirectory):
  add_subdirectory not given a binary directory but the given source
  directory "../A" is not a subdirectory
  of "root".  When specifying an out-of-tree
  source a binary directory must be explicitly specified.

Well, add_subdirectory takes a second parameter that makes this work. Since CMake supports out-of-tree builds, the second parameter tells it where to put the build output of the dependency. Let's assume you create a build folder in A, the second parameter for the dependency base is sub/base, and you run cmake .. in there. CMake writes a single CMakeCache.txt at the top of the build tree and gives the dependency its own binary directory below it:

components/A/build/CMakeCache.txt
components/A/build/sub/base/
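
In A's CMakeLists.txt, the call could look like the sketch below. The source file name is hypothetical. A relative source directory is resolved against the current source directory, a relative binary directory against the current binary directory.

```cmake
# components/A/CMakeLists.txt (sketch)
cmake_minimum_required(VERSION 3.16)
project(A LANGUAGES CXX)

# ../base lies outside of components/A, so the binary directory is mandatory.
# With the build folder components/A/build it resolves to components/A/build/sub/base.
add_subdirectory(../base sub/base)

add_library(A src/A.cpp)  # hypothetical source file
target_link_libraries(A PUBLIC base)
```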

Looks ok. At least until you run into the diamond situation.

Since every component creates its own sub build folder, this happens recursively, so a build folder for base will exist twice. However, CMake is clever enough not to build a separate base inside sub/B. Instead, it builds base under sub/A and reuses the targets there.
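
Seen from the root component, the diamond arises roughly like this. The fragment assumes that A and B each pull in base the same way, with sub/base as the binary directory.

```cmake
# components/root/CMakeLists.txt (fragment, sketch)
add_subdirectory(../A sub/A)  # A runs add_subdirectory(../base sub/base) -> sub/A/sub/base
add_subdirectory(../B sub/B)  # B runs the same call                      -> sub/B/sub/base

# base/CMakeLists.txt is therefore processed twice within one configure run.
```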

The problem is that CMake complains about duplicate targets because it parses base/CMakeLists.txt twice. To avoid that, we need include guards as in C header files.

cmake_minimum_required(VERSION 3.16)

# Include guard: stop here if another component already added base.
if(TARGET base)
  return()
endif()

project(base
...
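
A variation, not taken from the setup above, is to put the guard on the caller's side instead. This sketch assumes the library target is named like the component, here base.

```cmake
# In a component that depends on base (sketch)
if(NOT TARGET base)
  # Only the first component that needs base adds it to the build.
  add_subdirectory(../base sub/base)
endif()
```

The downside is that every dependent has to repeat the check, whereas the guard inside base/CMakeLists.txt is written only once.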

A problem you might not notice initially is that CMake has no namespacing: target names are global. This means the files get littered with prefixes or suffixes like ${PROJECT_NAME}:

add_executable(unittests-${PROJECT_NAME}
  test/test_${PROJECT_NAME}.cpp)
target_link_libraries(unittests-${PROJECT_NAME}
  PRIVATE ${PROJECT_NAME})
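
A common convention that at least makes the prefixing more uniform is an alias target with a :: in its name. It is not part of the setup described here, and it is not real namespacing, since the alias is still a global name. A minimal sketch, assuming the library target is named after the project:

```cmake
# Inside a component's CMakeLists.txt, after project(...) (sketch)
add_library(${PROJECT_NAME} src/${PROJECT_NAME}.cpp)  # hypothetical source file

# Alias such as base::base. Consumers link against the alias.
add_library(${PROJECT_NAME}::${PROJECT_NAME} ALIAS ${PROJECT_NAME})
```

With current policy settings, a name containing :: that is passed to target_link_libraries must refer to an existing target, so a typo shows up as a CMake error instead of a linker error.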

Now the build succeeds. You can build from each component directory, and it builds only the dependencies it needs. We need no root CMakeLists.txt. Not elegant but usable. If you want to try it yourself, check out this Git repo.

At least as long as your dependencies are not that deep and you don't try to build on Windows with its limited path length, because every level of dependencies nests another sub folder into the build path.

Remarkably, these solutions map to the C solutions for the same problems: include guards and name prefixes. So if you design a build system, it makes sense to consider how modern languages solved the C problems more cleanly.

Other Build Systems

Build systems which mimic CMake, like Meson or xmake, behave similarly. Their primary purpose is to configure the build according to external dependencies, but for large projects we care about the internal dependencies.

Bazel (and its clones Buck, Pants, and Please) is designed for this use case, so it looks more elegant there. Instead of specifying a directory name for building a dependency, Bazel reuses the folder path relative to the workspace (often the repo). This explains why dependencies are specified with their whole path, like //components/A:A. Within the same file, the target name :A is sufficient, so here you see the benefits of namespaces.

A more esoteric build system like redo handles our use case because it is not burdened by complex features like out-of-tree builds. Its simplicity means that users have to build the more complex features on top.



Related posts: Pondering Amazon's Manyrepo Build System shows how Amazon went all-in on packages instead. The Three Owners of an Interface describes how base packages could appear.

CMake requires old-school include guards and prefixes at scale.