Herb Sutter published the article "Welcome to the Jungle", in which he claimed that "incoherent/weak memory models are a performance experiment that is in the process of failing in the marketplace". This statement throws down the gauntlet to the InvasIC project, which I am working on. Let me explain why InvasIC believes an incoherent/weak memory model can work.
Sutter says that "on the software side, all of the mainstream general-purpose languages and environments (C, C++, Java, .NET) have largely rejected weak memory models, and require a coherent model that is technically called “sequential consistency for data race free programs” as either their only supported memory model (Java, .NET) or their default memory model (ISO C++11, ISO C11)." As a programmer, I completely agree. Concurrent programming is hard enough with sequential consistency. The InvasIC project uses the X10 programming language, which builds on Java and C++. The current X10 language report does not specify a memory model, but since X10 compiles to Java and C++, it effectively shares a similar model.
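To make "sequential consistency" concrete, the classic store-buffering litmus test can be enumerated exhaustively. This is a toy model written for illustration, not the formal memory model of any language or processor, and the names PROGRAMS and outcomes are made up. Under sequential consistency every run is an interleaving of the two threads' program order, so r0 == r1 == 0 can never happen. Once each thread gets a FIFO store buffer (roughly the behavior x86-TSO permits; ARM allows even more reorderings), that outcome appears.

```python
# Store-buffering litmus test: x = y = 0 initially.
#   thread 0: x = 1; r0 = y        thread 1: y = 1; r1 = x
PROGRAMS = (
    (("store", "x", 1), ("load", "y", "r0")),
    (("store", "y", 1), ("load", "x", "r1")),
)


def outcomes(store_buffers):
    """Return every final (r0, r1) pair the litmus test can produce.

    store_buffers=False models sequential consistency: each store reaches
    shared memory immediately, so runs are interleavings of program order.
    store_buffers=True gives each thread a FIFO store buffer (a TSO-like
    model): stores drain to memory at arbitrary later points, and a load
    first checks its own thread's buffer.
    """
    results = set()

    def explore(pcs, memory, buffers, regs):
        stuck = True
        for t, prog in enumerate(PROGRAMS):
            if pcs[t] < len(prog):  # step: run thread t's next instruction
                stuck = False
                op, var, arg = prog[pcs[t]]
                next_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op == "store" and store_buffers:
                    queued = buffers[:t] + (buffers[t] + ((var, arg),),) + buffers[t + 1:]
                    explore(next_pcs, memory, queued, regs)
                elif op == "store":
                    explore(next_pcs, {**memory, var: arg}, buffers, regs)
                else:  # load: newest matching buffered store wins, else memory
                    value = memory[var]
                    for bvar, bval in buffers[t]:
                        if bvar == var:
                            value = bval
                    explore(next_pcs, memory, buffers, {**regs, arg: value})
            if buffers[t]:  # step: drain thread t's oldest buffered store
                stuck = False
                (var, val), rest = buffers[t][0], buffers[t][1:]
                drained = buffers[:t] + (rest,) + buffers[t + 1:]
                explore(pcs, {**memory, var: val}, drained, regs)
        if stuck:  # all instructions done and all buffers empty
            results.add((regs["r0"], regs["r1"]))

    explore((0, 0), {"x": 0, "y": 0}, ((), ()), {})
    return results


print(sorted(outcomes(store_buffers=False)))  # [(0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(store_buffers=True)))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The one extra outcome is exactly what a programmer has to reason about once the hardware or language stops guaranteeing sequential consistency.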
Additionally, Sutter claims that "on the hardware side, the theoretical performance benefits that come from letting caches work less synchronously have already been largely duplicated in other ways by mainstream processors having stronger memory models." In other words, the more complex architectures with stronger memory models (Intel, AMD) are just as fast as those with weaker ones (ARM). However, as with all technical decisions, there are tradeoffs. In this case, the tradeoff is performance versus cost and energy efficiency: Intel needs to employ many more tricks (branch prediction, prefetching, ...), which means many more transistors.
The question remains whether programming for incoherent memory is too expensive. My question is: why should the programmer pay the price if the compiler and system can do the job? The retort, of course, is that the compiler and system cannot do the job as well as the programmer. The obvious response is that they might be good enough in most cases. For example, garbage collection cannot replace manual memory management in all cases, but it is very convenient in most. Let us leave the discussion here, because at this point we are only talking about beliefs. The InvasIC project believes it might be worth it, but it will take us a few more years to get some real numbers.
Why does InvasIC believe it will work out?
X10 provides the at construct for communicating between "places". Within a place, shared memory is coherent and has the usual semantics. Only the implementation of at must deal with incoherent memory, and that implementation is provided by the runtime and the compiler, not by the programmer. An MPI implementation running on an incoherent memory architecture would have a similar effect. If its send/recv operations can be made faster than on a cluster, because some copies can be skipped, there should be an interesting market niche.
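A minimal sketch of the copy semantics behind this argument, written in Python rather than X10 and assuming a heavily simplified model: Place, at and append_and_sum are hypothetical names, each place's heap is just a dictionary, and communication between places is a deep copy. Code running at a place only ever sees data that has been copied there, which is why only the runtime's implementation of at would need to know that memory across places is not coherent.

```python
import copy


class Place:
    """Hypothetical model of an X10 place: a heap that is coherent for the
    activities running at this place and invisible to all other places."""

    def __init__(self, pid):
        self.id = pid
        self.heap = {}


def at(dest, captured, body):
    """Evaluate body at place dest, loosely modelled on X10's `at (p) e`.

    The captured values are deep-copied into dest before body runs, and the
    result is deep-copied back, so body never touches another place's memory.
    On real non-coherent hardware, this is where a runtime would also have to
    write back and invalidate caches; that step is not modelled here.
    """
    local = copy.deepcopy(captured)   # ship the captured environment to dest
    result = body(dest, local)        # runs against dest's memory only
    return copy.deepcopy(result)      # ship the result back to the caller


def append_and_sum(here, values):
    values.append(4)                  # mutates the copy at `here`, not the original
    here.heap["values"] = values      # the copy now lives in here's heap
    return sum(values)


places = [Place(0), Place(1)]
data = [1, 2, 3]                      # lives at place 0
total = at(places[1], data, append_and_sum)
print(total, data, places[1].heap["values"])  # 10 [1, 2, 3] [1, 2, 3, 4]
```

The original list at place 0 is untouched because only a copy crossed the place boundary; an MPI send/recv pair follows the same discipline, with the copies made explicit in the program.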