If you think about large repositories, you start to see similarities between version control systems and build systems:
A version control system manages files written by programmers. We want this data to be shared to collaborate. Git stores the files in content-addressable storage.
A build system manages how files are generated from other files. We want this data to be shared to avoid unnecessary rebuilds. Bazel stores the files in content-addressable storage.
Imagine we used the same data store for both. For example, we could use IPFS. There is a Git on IPFS discussion thread, so this is not a new idea. I have not found any activity around making Bazel use IPFS, but I don't see any fundamental problems.
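To make the shared-store idea concrete, here is a minimal sketch of a content-addressable store that holds both a source file and a build artifact. It is a toy, not how Git, Bazel, or IPFS actually lay out their data: the `BlobStore` class, the in-memory dictionary, and the choice of SHA-256 are all assumptions for illustration.

```python
import hashlib


class BlobStore:
    """Toy in-memory content-addressable store (hypothetical, for illustration)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content, so identical content
        # always lands under the same key and is stored only once.
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self) -> int:
        return len(self._blobs)


store = BlobStore()

# What a version control system would put into the store.
source_id = store.put(b"int main(void) { return 0; }\n")

# What a build cache would put into the same store (placeholder bytes).
artifact_id = store.put(b"pretend object code for main.o")

# A second developer committing the identical file adds nothing new.
same_source_id = store.put(b"int main(void) { return 0; }\n")
```

The store does not care whether a blob was typed by a programmer or produced by a compiler, which is the whole point of the comparison.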
Now that we store the source code and the build artifacts in the same storage, why do we use different tools?
A big assumption here is that your build is reproducible. Otherwise, build artifacts are not identical and a content-addressable store does not deduplicate well. For most build systems this assumption does not hold because they rely on external dependencies, like a compiler that is available from your system. The remedy is effectively to store all the tooling together with your code and build artifacts. Since we can already store plenty of assets like video files there, why not all the tools as well?
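One way to picture why the tooling matters: if a build step is identified by a hash over everything that can influence its output, then the compiler has to be part of that hash. The sketch below is an assumption about how such a key could be built, not Bazel's actual scheme; `action_key` and the placeholder byte strings are made up for illustration.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def action_key(tool_digest: str, input_digests: list[str], command: list[str]) -> str:
    """Hash the tool, the inputs, and the command line of one build step."""
    payload = json.dumps(
        {"tool": tool_digest, "inputs": input_digests, "command": command},
        sort_keys=True,
    )
    return digest(payload.encode("utf-8"))


# Placeholder bytes standing in for real compiler binaries.
compiler_v1 = digest(b"pretend compiler binary, version 1")
compiler_v2 = digest(b"pretend compiler binary, version 2")
source = digest(b"int main(void) { return 0; }\n")
cmd = ["cc", "-c", "main.c", "-o", "main.o"]

key_a = action_key(compiler_v1, [source], cmd)
key_b = action_key(compiler_v1, [source], cmd)  # same tool, same inputs
key_c = action_key(compiler_v2, [source], cmd)  # different compiler
```

Here `key_a` and `key_b` are equal, so a cached artifact can be shared, while `key_c` differs because the compiler changed. A build that picks up whatever compiler happens to be installed cannot compute such a key honestly, which is why the tools need to be in the store too.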
Going to an extreme, why do we even use files anymore? Essentially, we can fetch blobs from IPFS by their checksum IDs and use them to compute new blobs to put into IPFS again. The blobs can be native code, bytecode, source files, packages, functions, images, Excel sheets, or SQL code. OK, that would be too much for a first step, because practically all tools today work on a file system and not on blob storage.
This whole system reminds me of Urbit. (Well, since the last time I looked, Urbit has apparently turned into a company and is running on Ethereum now.) While the system is weird and crazy in a wonderful surreal way, the underlying idea is similar. It is just an even more fundamental change. Urbit starts from scratch with a brand new virtual machine and programming language. This is too much change for practical work.
If you think toward the other extreme: What would be the minimum viable product to achieve this? The DVC tool is close. My quick glance through the documentation tells me it reimplements git-lfs and make. I wonder why not use Subversion and SCons or Bazel instead for better scalability? Anyway, there are issues with this approach:
- The workflow would be to commit all generated files, which is the opposite of what is usually recommended. There is no way to distinguish the generated files from the original ones except by looking into the build configuration.
- The build system does not know what has to be rebuilt unless you also commit the local database. The version control system will not be able to merge this database though.
- You cannot handle the case of multiple platforms. Assume some files must be generated on Linux and others on Windows. How would you pack this into a single commit?
So my conclusion is that there is a gap which none of the current Free Software tools can fill. I don't even know of any proprietary tools that would fit. It might be a project to invest a decade of work in.
Discussion on lobste.rs was fruitful. Special thanks for pointing to Vesta and ClearCase.
Discussion on r/programming was rather negative.
I also wrote about a rough design of a monorepo version control system.