CMake is a conservative and popular build system for C++, and thus the first choice if you are looking for boring technology. Yet it does not scale well to large projects because of how it manages dependencies. This post is about the classic "diamond" shape: root depends on A and B, and both of them depend on base.
By "large-scale project", I mean multiple teams and even more components, so many that you cannot structure them as a tree. Instead, it makes more sense to have a flat directory structure where you place components side by side. Dependencies between components will quickly grow into all kinds of shapes (although you should really avoid cycles), and sooner or later there will be a diamond among them.
To decouple the components, we would like to build and test each of them independently, so each gets a CMakeLists.txt. However, we still need a global one at the root so each of the subdirectories can find its dependencies.
CMakeLists.txt
components/A/CMakeLists.txt
components/base/CMakeLists.txt
components/B/CMakeLists.txt
components/root/CMakeLists.txt
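For illustration, such a root file could look roughly like the sketch below. It is not taken from the post's repository; the project name "everything" is made up, and only the component paths come from the listing above.

```cmake
# Root CMakeLists.txt: a minimal sketch under the assumptions stated above.
cmake_minimum_required(VERSION 3.16)
project(everything LANGUAGES CXX)  # hypothetical project name

# Each call pulls one component's CMakeLists.txt into the single global build,
# which is how the components get to see each other's targets.
add_subdirectory(components/base)
add_subdirectory(components/A)
add_subdirectory(components/B)
add_subdirectory(components/root)
```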
This need for the root file is annoying: to configure the build, CMake has to parse every CMakeLists.txt. Instead, I would prefer to enter a component directory and build there. Can CMake do this?
CMake has two mechanisms for dependencies. First, there is find_package. Its intention is to detect packages available on your system and configure the build accordingly. It comes in a "Module" and a "Config" mode, but the distinction is not relevant here. Neither is useful for our case because both assume a prebuilt library. CMake will not build a dependency through find_package.
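As a reminder of what find_package is good at, here is a small sketch that links against a zlib already installed on the system. The project name "app" and the file main.cpp are hypothetical; ZLIB::ZLIB is the imported target that CMake's own FindZLIB module provides.

```cmake
cmake_minimum_required(VERSION 3.16)
project(app LANGUAGES CXX)  # "app" and main.cpp are hypothetical names

# Looks for an already installed zlib; configuration fails if none is found.
# Nothing is compiled here, the library must exist beforehand.
find_package(ZLIB REQUIRED)

add_executable(app main.cpp)
# Link against the imported target provided by the FindZLIB module.
target_link_libraries(app PRIVATE ZLIB::ZLIB)
```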
The alternative is add_subdirectory. Its name alone shows that it is meant for a directory tree. The root CMakeLists.txt uses it to find the component CMakeLists.txt files.
If you try to target a "non-sub" directory, it will show an error message:
CMake Error at CMakeLists.txt:8 (add_subdirectory):
add_subdirectory not given a binary directory but the given source
directory "../A" is not a subdirectory
of "root". When specifying an out-of-tree
source a binary directory must be explicitly specified.
Well, there is a second parameter for add_subdirectory to make it work. Since CMake supports out-of-tree builds, it uses the second parameter to decide where the out-of-tree build for the dependency goes. Let's assume you create a build folder in A, pass sub/base as the second parameter for the dependency base, and run cmake .. in there. CMake creates CMakeCache.txt files, and here it would create one for A and one for its dependency:
components/A/build/CMakeCache.txt
components/A/build/sub/base/CMakeCache.txt
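The setup just described could be written roughly as follows in A's CMakeLists.txt. This is a sketch, not the post's actual file: the source file name src/A.cpp and the PUBLIC linkage are assumptions.

```cmake
# components/A/CMakeLists.txt: a sketch under the assumptions stated above.
cmake_minimum_required(VERSION 3.16)
project(A LANGUAGES CXX)

# First argument: the dependency's source directory, which lies outside A's tree.
# Second argument: where its build output goes, relative to A's build directory.
add_subdirectory(../base sub/base)

add_library(A src/A.cpp)              # src/A.cpp is a hypothetical file name
target_link_libraries(A PUBLIC base)  # "base" is the target defined by the dependency
```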
Looks OK, at least until you run into the diamond situation. Since every component creates its own sub build folder, this happens recursively, so base would exist twice. However, CMake has some clever magic: inside sub/B it does not build its own base. Instead, it builds it in sub/A and reuses the targets from there. The problem is that CMake complains about duplicate variables because it parses base/CMakeLists.txt twice. To avoid that, we need include guards as in C header files.
cmake_minimum_required(VERSION 3.16)
# Include guard: skip this file if the target was already defined elsewhere.
if(TARGET base)
  return()
endif()
project(base
...
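The same check can also sit on the calling side. The following is only a sketch of that variation, not what the post's repository does; the file name src/B.cpp and the PUBLIC linkage are assumptions.

```cmake
# components/B/CMakeLists.txt: sketch of a guard at the call site.
cmake_minimum_required(VERSION 3.16)
project(B LANGUAGES CXX)

# Only pull in base if no other component has defined the target yet.
if(NOT TARGET base)
  add_subdirectory(../base sub/base)
endif()

add_library(B src/B.cpp)              # src/B.cpp is a hypothetical file name
target_link_libraries(B PUBLIC base)
```

The trade-off is that every dependent component has to repeat the check, whereas the guard inside base lives in exactly one place.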
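A problem you might not notice initially is that CMake has no namespacing. Target names are global to the whole build, so two components that both define a target called unittests would clash. This means the files get littered with prefixes or postfixes like ${PROJECT_NAME}: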
add_executable(unittests-${PROJECT_NAME}
    test/test_${PROJECT_NAME}.cpp)
target_link_libraries(unittests-${PROJECT_NAME}
    PRIVATE ${PROJECT_NAME})
Now the build succeeds. You can build from each component, and it builds only the dependencies that are necessary. We need no root CMakeLists.txt. Not elegant but usable, at least if your dependencies are not that deep or you don't try to build on Windows with its limited path length, because the nested sub folders make the build paths grow with every level of dependency. If you want to try it yourself, check out this git repo.
Remarkably, the workarounds map onto C idioms: include guards against double inclusion and name prefixes in place of namespaces. So if you design a build system, it makes sense to consider how modern languages solved these C problems more cleanly.
Other Build Systems
Build systems that mimic CMake, like Meson or xmake, behave similarly. Their primary purpose is to configure the build according to external dependencies, but for large projects we care about the internal dependencies.
Bazel (and its clones Buck, Pants, and Please) is designed for this use case, so it looks more elegant there. Instead of specifying a directory name to build a dependency, Bazel reuses the folder relative to the workspace (often the repo). This explains why dependencies are specified with their whole path, like //component/A:A. Within the same file, the target name :A is sufficient, so here you see the benefits of namespaces.
A more esoteric build system like redo achieves our use case here because it is not burdened by complex features like out-of-tree builds. Its simplicity means that users have to build the more complex features on top.
Related posts:
Pondering Amazon's Manyrepo Build System shows how Amazon went all-in on packages instead.
The Three Owners of an Interface describes how base packages could appear.