Rebuilding When a File's Checksum Changes


if it doesn't exist, or if the checksum stored in the .md5 file has changed.  That works as follows.

The $(shell md5sum $*) checksums the file that matches the % part of %.md5. For example, if this rule is being used to generate the foo.h.md5 file, then % matches foo.h and foo.h is stored in $*.
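As a rough illustration of what $* holds, the target-to-stem mapping can be mimicked in plain shell with parameter expansion. This is only a sketch: the helper name stem_of is hypothetical, and Make computes the stem itself when it matches the pattern rule.

```shell
#!/bin/sh
# Hypothetical helper: derive the stem that a %.md5 pattern rule stores in $*
# by stripping the literal ".md5" suffix from the target name ($@).
stem_of() {
    printf '%s\n' "${1%.md5}"
}

stem_of foo.h.md5    # prints foo.h
stem_of foo.c.md5    # prints foo.c
```

The rule then checksums that file, much as md5sum "$(stem_of foo.h.md5)" would in the shell.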

The $(shell cat $@ 2>/dev/null) gets the contents of the current .md5 file, or an empty string if the file doesn't exist (the 2>/dev/null discards cat's error message). The $(filter-out ...) then compares the checksum retrieved from the .md5 file with the checksum generated by md5sum: it keeps only the words of the new md5sum output that do not appear in the stored contents. If the two checksums are the same, $(filter-out ...) is an empty string.
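For readers less familiar with $(filter-out ...), here is a rough plain-shell approximation of how it behaves on two checksum lines. It is a sketch only: the function name filter_out is hypothetical, it assumes words are separated by whitespace, and unlike GNU Make it ignores % patterns and simply compares whole words.

```shell
#!/bin/sh
set -f  # disable globbing so words such as file names are never expanded

# Rough stand-in for GNU Make's $(filter-out OLD,NEW) on plain words:
# print every word of NEW that does not also appear as a word in OLD.
filter_out() {
    old=$1
    new=$2
    result=
    for word in $new; do
        case " $old " in
            *" $word "*) ;;                          # in OLD: drop it
            *) result="${result:+$result }$word" ;;  # not in OLD: keep it
        esac
    done
    printf '%s\n' "$result"
}

stored='d41d8cd98f00b204e9800998ecf8427e  foo.h'
same='d41d8cd98f00b204e9800998ecf8427e  foo.h'
changed='65f8deea3518fcb38fd2371287729332  foo.h'

filter_out "$stored" "$same"      # prints an empty line: checksums match
filter_out "$stored" "$changed"   # prints 65f8deea3518fcb38fd2371287729332
filter_out "" "$changed"          # prints both words: no .md5 file yet
```

Only the changed hash survives the filter, because the file name appears in both lines; an empty result is what tells the rule to do nothing.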

If the checksum has changed, the rule actually runs md5sum $* > $@, which updates both the contents and the timestamp of the .md5 file. The stored checksum is then available the next time Make runs, and the changed timestamp on the .md5 file causes the related object file to be rebuilt.
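Putting the pieces together, the rule's behaviour can be approximated outside Make with a small shell function. This is an illustrative analogue, not the article's Makefile: update_md5 is a hypothetical name, and it compares the whole md5sum output line instead of using filter-out, which is enough as long as the .md5 file was written by this same function.

```shell
#!/bin/sh
# Rough shell analogue of the %.md5 rule (update_md5 is a hypothetical name):
# rewrite FILE.md5 only when FILE's checksum differs from the stored one,
# so the .md5 file's timestamp moves only when FILE's contents change.
update_md5() {
    src=$1
    sum_file=$src.md5
    new=$(md5sum "$src") || return 1               # checksum computed once
    old=$(cat "$sum_file" 2>/dev/null || true)     # empty if no .md5 file yet
    if [ "$new" != "$old" ]; then
        printf '%s\n' "$new" > "$sum_file"         # updates contents and mtime
    fi
}
```

For example, running update_md5 foo.h twice in a row writes foo.h.md5 the first time and leaves it, and its timestamp, untouched the second time.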

The Hack in Action

To see this in action we create files foo.c and foo.h and run GNU Make:

$ touch foo.c foo.h
$ ls
foo.c  foo.h  Makefile
$ make
cc    -c -o foo.o foo.c
$ ls
foo.c  foo.c.md5  foo.h  foo.h.md5  foo.o  Makefile

GNU Make has generated the object file foo.o and two .md5 files, foo.c.md5 and foo.h.md5. Each .md5 file contains the checksum of its source file:

$ cat foo.c.md5
d41d8cd98f00b204e9800998ecf8427e  foo.c
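Because touch created foo.c as an empty file, the stored value is simply the MD5 digest of zero bytes of input, which you can confirm directly:

```shell
#!/bin/sh
# touch creates an empty file, so foo.c.md5 holds the MD5 digest of zero bytes.
# When md5sum reads standard input it prints "-" in place of a file name.
printf '' | md5sum    # prints: d41d8cd98f00b204e9800998ecf8427e  -
```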

First, we verify that everything is up to date and then that changing the timestamp on either foo.c or foo.h causes foo.o to be rebuilt.

$ make
make: Nothing to be done for `all'.
$ touch foo.c
$ make
cc    -c -o foo.o foo.c
$ make
make: Nothing to be done for `all'.
$ touch foo.h
$ make
cc    -c -o foo.o foo.c

To demonstrate that changing the contents of a source file causes foo.o to be rebuilt, we can cheat by changing the contents of, say, foo.h and then touching foo.o. That way, foo.o is newer than foo.h, but foo.h's contents have changed since foo.o was last built.

$ make
make: Nothing to be done for `all'.
$ cat foo.h.md5
d41d8cd98f00b204e9800998ecf8427e  foo.h
$ cat > foo.h
// Add a comment
$ touch foo.o
$ make
cc    -c -o foo.o foo.c
$ cat foo.h.md5
65f8deea3518fcb38fd2371287729332  foo.h

Here you can see that foo.o was rebuilt even though it was newer than all the related source files, and that foo.h.md5 has been updated with the new checksum of foo.h.
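The same situation can be reproduced in a scratch directory without Make, to show what a timestamp comparison and a checksum comparison each conclude. This is an illustrative sketch: the scratch directory, the backdating step and the variable names are assumptions; only the file names follow the example above.

```shell
#!/bin/sh
# Reproduce the "cheat": foo.h's contents change, but foo.o ends up newer.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

: > foo.h                               # empty header, as created by touch
md5sum foo.h > foo.h.md5                # checksum recorded at the last build
printf '// Add a comment\n' > foo.h     # change the header's contents...
touch -t 200001010000 foo.h foo.h.md5   # ...backdate them to avoid timestamp ties...
touch foo.o                             # ...and make foo.o the newest file

# Make's usual test: is the prerequisite newer than the target?
if [ foo.h -nt foo.o ]; then timestamp=stale; else timestamp=up-to-date; fi

# The checksum test: does foo.h still match the stored checksum?
if [ "$(md5sum foo.h)" = "$(cat foo.h.md5)" ]; then
    checksum=up-to-date
else
    checksum=stale
fi

echo "timestamp test: $timestamp"       # prints: timestamp test: up-to-date
echo "checksum test:  $checksum"        # prints: checksum test:  stale
```

The timestamp test alone would skip the rebuild; only the checksum comparison notices that foo.h's contents changed.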

Improvements

Two improvements can be made to the code as it stands: the first is an optimization; the second makes the checksum ignore changes in whitespace in a source file.

When the checksum of a file has changed, the rule that updates the .md5 file ends up running md5sum twice on the same file with the same result, which is a waste of time. If you are using GNU Make 3.80 or later, you can use $(eval ...) to store the output of md5sum $* in a temporary variable called CHECKSUM and then just use the variable:

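# Run md5sum only once: keep its output in CHECKSUM and rewrite
# the .md5 file only when the stored checksum differs.
%.md5: FORCE
	@$(eval CHECKSUM := $(shell md5sum $*))$(if $(filter-out $(shell cat $@ 2>/dev/null),$(CHECKSUM)),echo $(CHECKSUM) > $@)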

The other improvement is to make the checksum insensitive to changes in whitespace. After all, it would be a pity if two developers' differing opinions about the right amount of indentation caused object files to be rebuilt when nothing else had changed.
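One possible approach, sketched here as an assumption rather than taken from the article, is to delete whitespace with tr before hashing, so re-indenting a file leaves its checksum unchanged. Be aware that this removes all whitespace, including inside string literals, so "a b" and "ab" hash the same; ws_md5 is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: checksum FILE after deleting every whitespace
# character, so re-indenting or re-wrapping lines does not change the hash.
# Output mimics md5sum's "<hash>  <file>" format.
ws_md5() {
    [ -r "$1" ] || return 1     # fail instead of hashing empty input
    sum=$(tr -d '[:space:]' < "$1" | md5sum) || return 1
    printf '%s  %s\n' "${sum%% *}" "$1"
}
```

In the rule, a pipeline such as tr -d '[:space:]' < $* | md5sum could take the place of md5sum $* inside the $(shell ...) call; because md5sum then reads standard input, its output ends in - rather than the file name, which the filter-out comparison tolerates.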

The md5sum utility itself does not have a way of ignoring whitespace, but it's easy enough

About the author

John Graham-Cumming

John Graham-Cumming is Co-Founder at Electric Cloud, Inc. Prior to joining Electric Cloud, John was a Venture Consultant with Accel Partners, VP of Internet Technology at Interwoven, Inc. (IWOV), VP of Engineering at Scriptics Corporation (acquired by Interwoven), and Chief Architect at Optimal Networks, Inc. John holds BA and MA degrees in Mathematics and Computation and a Doctorate in Computer Security from Oxford University. John is the creator of the highly acclaimed open source POPFile project. He also holds two patents in network analysis and has others pending.

AgileConnection is one of the growing communities of the TechWell network.

