The Pitfalls and Benefits of GNU Make Parallelization

Ask Mr. Make
Summary:

Many build processes run for hours, with build managers commonly typing 'make' and going home for the night. GNU Make's solution to this problem is parallel execution: a simple command-line option that causes GNU Make to run jobs in parallel, using the dependencies in the Makefile to run them in the correct order.

In practice, GNU Make parallel execution is severely limited by a number of problems stemming from the fact that almost all Makefiles are written with the implicit assumption that their rules will run in series. Hardly any Makefile authors 'think parallel' when writing their Makefiles, which leaves hidden traps: either the build fails with a fatal error when GNU Make is run in parallel mode or, worse, the build succeeds but produces incorrect binaries.

This article looks at parallel GNU Make and points out the pitfalls and how to work around them to get maximum parallelism.

The Basics: -j or --jobs
To start GNU Make in parallel mode it's enough to specify either the -j or --jobs option on the command-line.  The argument to the option is the maximum number of processes that GNU Make will run in parallel.

For example, typing make --jobs=4 allows GNU Make to run up to four subprocesses in parallel. This gives a theoretical maximum speedup of four times, cutting build time to a quarter. In practice the achievable speedup is severely limited by restrictions in the Makefile; the maximum actual speedup can be calculated using Amdahl's Law (see below).
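As a rough illustration of why the fourfold figure is only an upper bound, here is a minimal Python sketch of Amdahl's Law. The function name amdahl_speedup and the sample fractions are assumptions made for this example; they are not measurements from any real build.

```python
def amdahl_speedup(parallel_fraction: float, jobs: int) -> float:
    """Speedup predicted by Amdahl's Law.

    parallel_fraction: share of the serial build time that can run in
                       parallel (0.0 to 1.0); a hypothetical input here.
    jobs: the number given to make --jobs (assumed to be at least 1).
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / jobs)


if __name__ == "__main__":
    # Illustrative fractions only, not taken from a real Makefile.
    for fraction in (1.0, 0.9, 0.5):
        speedup = amdahl_speedup(fraction, 4)
        print(f"{fraction:.0%} parallelizable, 4 jobs: {speedup:.2f}x")
```

With four jobs, a build that is only half parallelizable tops out at 1.6 times faster, well short of the theoretical four.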

One simple but very annoying problem with parallel GNU Make is that the jobs no longer run serially, and their order depends on timing, so the output from GNU Make is permuted somewhat randomly depending on the actual order of job execution.

Consider the following simple example:

.PHONY: all
all: t5 t4 t1
	@echo Making $@

t1: t3 t2
	touch $@

t2:
	cp t3 $@

t3:
	touch $@

t4:
	touch $@

t5:
	touch $@

It builds five targets t1, t2, t3, t4 and t5.  All are simply touched except for t2 which is copied from t3.

Running this through standard GNU Make without a parallel option gives the output:

touch t5
touch t4
touch t3
cp t3 t2
touch t1
Making all

The order of execution will be the same each time because GNU Make will follow the prerequisites depth-first and from left-to-right.  Note that the left-to-right execution (for example, in the all rule t5 is built before t4 which is built before t1) is a convention and is not guaranteed even in serial GNU Make.
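The depth-first, left-to-right walk can be sketched in a few lines of Python. This is a simplified model of the traversal described above, not GNU Make's actual implementation; the PREREQS table and the serial_order function are names invented for this example, and the table simply mirrors the prerequisite lists in the Makefile.

```python
# Prerequisites as declared in the example Makefile, in left-to-right order.
PREREQS = {
    "all": ["t5", "t4", "t1"],
    "t1": ["t3", "t2"],
    "t2": [],
    "t3": [],
    "t4": [],
    "t5": [],
}


def serial_order(goal, prereqs, done=None):
    """Return targets in the order a depth-first, left-to-right walk builds them."""
    if done is None:
        done = []
    # Build every prerequisite (and its own prerequisites) before the goal.
    for dep in prereqs[goal]:
        if dep not in done:
            serial_order(dep, prereqs, done)
    if goal not in done:
        done.append(goal)
    return done


if __name__ == "__main__":
    print(serial_order("all", PREREQS))
```

Running it prints ['t5', 't4', 't3', 't2', 't1', 'all'], the same order as the serial output shown above.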

Now if GNU Make is run in parallel mode (say, make --jobs=16), it can run multiple jobs at once. For example, t5, t4 and t1 can be worked on at the same time since there are no dependencies between them. Similarly, the Makefile declares no dependency between t3 and t2, and hence GNU Make may run them in parallel.
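One way to see which jobs the declared dependencies leave free to overlap is to group the targets into waves, where every target in a wave has all of its declared prerequisites in earlier waves. The Python sketch below does that; it is only a model (GNU Make starts each job as soon as it becomes ready rather than working in strict waves), and PREREQS and parallel_waves are names made up for the illustration.

```python
# Prerequisites as declared in the example Makefile.
PREREQS = {
    "all": ["t5", "t4", "t1"],
    "t1": ["t3", "t2"],
    "t2": [],
    "t3": [],
    "t4": [],
    "t5": [],
}


def parallel_waves(prereqs):
    """Group targets into waves whose declared prerequisites are already built."""
    remaining = dict(prereqs)
    built = set()
    waves = []
    while remaining:
        # A target is ready once every declared prerequisite has been built.
        ready = sorted(t for t, deps in remaining.items() if set(deps) <= built)
        if not ready:
            raise ValueError("dependency cycle in prerequisites")
        waves.append(ready)
        built.update(ready)
        for target in ready:
            del remaining[target]
    return waves


if __name__ == "__main__":
    print(parallel_waves(PREREQS))
```

This prints [['t2', 't3', 't4', 't5'], ['t1'], ['all']]. The recipe for t1 has to wait for t3 and t2, but nothing in the declared prerequisites orders t2, t3, t4 and t5 relative to one another.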

So the output of a parallel GNU Make on the same Makefile might be:

touch t4
touch t5
touch t3
cp t3 t2
touch t1
Making all

or even

touch t3
cp t3 t2
touch t4
touch t1
touch t5
Making all

This makes any process that examines log files to check for build problems (such as diffing log files) difficult to handle. (There's no easy solution to this in GNU Make; you'll have to live with it or look at a commercial solution that works around this problem, like Electric Cloud. That's the last time in this article that I'll mention the company I founded to deal with parallel build problems.)
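If all a log check needs to know is whether two builds ran the same set of commands, one partial workaround outside GNU Make is to compare the logs while ignoring line order. The Python sketch below is an assumption-laden illustration, not a fix for the underlying problem: same_commands is a hypothetical helper, and it discards ordering information and does not cope with multi-line output from different jobs being interleaved.

```python
from collections import Counter


def same_commands(log_a: str, log_b: str) -> bool:
    """True if both logs contain the same lines the same number of times."""
    return Counter(log_a.splitlines()) == Counter(log_b.splitlines())


if __name__ == "__main__":
    # The two parallel outputs shown earlier in this article.
    first = "touch t4\ntouch t5\ntouch t3\ncp t3 t2\ntouch t1\nMaking all\n"
    second = "touch t3\ncp t3 t2\ntouch t4\ntouch t1\ntouch t5\nMaking all\n"
    print(same_commands(first, second))
```

For the two parallel logs above this prints True, whereas a plain line-by-line diff would report differences on almost every line.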

About the author

John Graham-Cumming

John Graham-Cumming is Co-Founder at Electric Cloud, Inc. Prior to joining Electric Cloud, John was a Venture Consultant with Accel Partners, VP of Internet Technology at Interwoven, Inc. (IWOV), VP of Engineering at Scriptics Corporation (acquired by Interwoven), and Chief Architect at Optimal Networks, Inc. John holds BA and MA degrees in Mathematics and Computation and a Doctorate in Computer Security from Oxford University. John is the creator of the highly acclaimed open source POPFile project. He also holds two patents in network analysis and has others pending.
