Descrambling Parallel Build Logs


#include <fcntl.h>      // For open().
#include <sys/file.h>   // For flock().
#include <stdlib.h>     // For malloc().
#include <stdio.h>      // For tmpfile().
#include <unistd.h>     // For fork(), execvp(), dup2().
#include <sys/wait.h>   // For waitpid().

int main(int argc, char *argv[])
{
    pid_t child;
    int outfd = -1;
    int status = 0;
    int lockfd = -1;
    char buffer[64 * 1024];
    size_t bytesRead;
    FILE *outfile;

    // Expected invocation: locker lockfile -c "command"

    if (argc < 4) {
        fprintf(stderr, "usage: %s lockfile -c command\n", argv[0]);
        return 2;
    }

    outfile = tmpfile();
    if (outfile == NULL) {
        perror("tmpfile");
        return 29;
    }
    outfd = fileno(outfile);

    child = fork();
    if (child == (pid_t) 0) {
        // In the child.  Run the real shell, with stdout and stderr
        // redirected to the temporary file.

        char * const shArgv[4] = {
            "/bin/sh",
            "-c",
            argv[3],
            0
        };
        close(1);
        close(2);
        dup2(outfd, 1);
        dup2(outfd, 2);
        execvp("/bin/sh", shArgv);

        // The following should never execute.

        return 17;
    } else if (child == (pid_t) -1) {
        // Error in fork; this should never happen.

        return 23;
    }

    // In the parent.  Wait for the child to exit, then relay the output to
    // the real stdout.

    waitpid(child, &status, 0);
    lockfd = open(argv[1], O_WRONLY|O_CREAT, 0600);
    flock(lockfd, LOCK_EX);
    fseek(outfile, 0, SEEK_SET);

    while ((bytesRead = fread(buffer, 1, sizeof(buffer), outfile)) > 0) {
        write(1, buffer, bytesRead);
    }
    fclose(outfile);
    flock(lockfd, LOCK_UN);
    close(lockfd);

    // Relay the exit code from the child.  The raw wait status must be
    // decoded first; returning it directly would not yield the exit code.

    return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}

There's not much to this utility. It runs the specified command as a child process, redirecting the command's standard output and standard error to a temporary file. When the command finishes, the wrapper acquires an exclusive lock on the specified lock file. This is the crux of the wrapper: the lock ensures that the output from each command is written atomically, without interference from other commands that might be running at the same time. After copying the redirected output to the real standard output stream, the wrapper releases the lock and exits, propagating the exit status from the command.

At this point we have a way to ensure that the output from each command is handled atomically. We just need to make sure each command is invoked using our wrapper. The obvious solution is to modify the makefile to prefix each command with locker lockfile -c, but on any non-trivial build that would be a nightmare. Fortunately there is a better way.
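
One detail in the description above deserves a closer look: propagating the exit status. The value that waitpid() stores is an encoded wait status, not the exit code itself, and on common platforms the exit code sits in the upper bits, so handing the raw value back from main() can turn a failure into an apparent success. The following is a minimal sketch of the decoding step, not part of the wrapper as published. The helper names exit_code_from_status and run_and_decode are hypothetical, and the 128-plus-signal-number mapping is an assumption borrowed from the usual shell convention.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: turn a waitpid() status into a process exit code. */
static int exit_code_from_status(int status)
{
    if (WIFEXITED(status)) {
        /* Normal termination: extract the 8-bit exit code. */
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status)) {
        /* Killed by a signal: assumed shell-style 128 + signal number. */
        return 128 + WTERMSIG(status);
    }
    return 1;   /* Anything else is reported as a generic failure. */
}

/* Hypothetical helper: run "/bin/sh -c command" and return its exit code. */
static int run_and_decode(const char *command)
{
    int status = 0;
    pid_t child = fork();

    if (child == (pid_t) -1) {
        return 127;                     /* fork failed */
    }
    if (child == (pid_t) 0) {
        execl("/bin/sh", "sh", "-c", command, (char *) 0);
        _exit(127);                     /* exec failed */
    }
    if (waitpid(child, &status, 0) == (pid_t) -1) {
        return 127;                     /* wait failed */
    }
    return exit_code_from_status(status);
}
```

With a helper like this, the wrapper's final statement could return exit_code_from_status(status), so that make sees the same success or failure it would have seen without the wrapper.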

Overriding SHELL for fun and profit
Normally GNU make uses the shell to invoke a command only when the command line requires the shell for correct operation, such as commands that use process pipelines or that invoke shell built-in commands. Simple commands are invoked directly by GNU make, to avoid the overhead of creating an extraneous shell process. However, a little-known feature of GNU make is that it will use the shell for every command invocation if you have explicitly set the SHELL variable in your makefile. We can exploit this fact to automatically prefix every command in the build with our wrapper, without requiring any other changes to the makefile.

If SHELL is set, GNU make creates command lines by appending -c commandline to the value of SHELL. That is, if the original command line is gcc -o foo foo.c, then GNU make will actually invoke the command $(SHELL) -c "gcc -o foo foo.c". Therefore, we should simply set SHELL to locker lockfile. For example, we could modify our trivial test makefile like so:
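
A minimal sketch of a makefile that sets SHELL this way, assuming the wrapper has been compiled to ./locker in the build directory and that the lock file is simply named lockfile. The targets a, b and c and their echo and sleep commands are hypothetical stand-ins chosen for illustration, not the original test makefile:

```make
# Route every command through the wrapper (hypothetical paths).
SHELL = ./locker lockfile

all: a b c

# Each target prints two lines with a pause in between, so that
# unwrapped parallel runs would tend to interleave their output.
a b c:
	@echo "$@: start"; sleep 1; echo "$@: end"
```

Under these assumptions, a parallel run such as make -j 3 would invoke ./locker lockfile -c with each recipe line as the final argument, and the two lines printed for each target should appear together in the log.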

About the author

Eric Melski

Eric is Chief Architect for ElectricAccelerator, a high-performance implementation of make from Electric Cloud, Inc.  He obtained a BS in Computer Science from the University of Wisconsin in Madison in 1999.  In 2002 Eric co-founded Electric Cloud, where he has spent more than a decade developing distributed, parallel systems designed to accelerate build processes.  He is named on seven patents related to his work on build acceleration at Electric Cloud.
