Makefile Optimization: $(eval) and macro caching

About $(eval)

The $(eval) function was introduced in GNU Make 3.80 but was a little buggy; GNU Make 3.81 fixed those bugs, and $(eval) is now ready for prime time.  $(eval)'s argument is expanded and then parsed as if it were part of a Makefile.

That means that within an $(eval) (which could be inside a variable definition) you can programmatically define variables, create rules (explicit or pattern), include other Makefiles, etc.  It's a really powerful function.

See page 84 of the GNU Make Manual (section 8.8, The eval function http://www.gnu.org/software/make/manual/make.html#Eval-Function) for more details on the use of $(eval).

A simple example is

set = $(eval $1 := $2)

$(call set,FOO,BAR)

$(call set,A,B)

This results in FOO having the value BAR and A having the value B.  Obviously, this example could have been achieved without $(eval).  For more complex examples, see the discussion of $(eval) in Managing Projects with GNU Make by Robert Mecklenburg and published by O'Reilly http://www.oreilly.com/catalog/make3/book/ch04.pdf.
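
As a slightly larger sketch of the rule-creating use mentioned earlier, the following Makefile stamps out one variable and one rule per program name.  The names PROGRAMS, PROGRAM_template and the greeting text are hypothetical and chosen only for illustration; the pattern itself ($(foreach) over $(eval $(call ...))) follows the style shown in the GNU Make manual.

```make
# Hypothetical program names, used only for this illustration.
PROGRAMS := server client

# Declared first so that 'all' is the default goal.
.PHONY: all
all: $(PROGRAMS)

# Template text: $(1) is replaced by $(call); $$ survives the first
# expansion as a single $, so $(server_greeting) is expanded later.
define PROGRAM_template
$(1)_greeting := hello from $(1)
.PHONY: $(1)
$(1): ; @echo $$($(1)_greeting)
endef

# Expand the template once per program and parse the result as Makefile text.
$(foreach prog,$(PROGRAMS),$(eval $(call PROGRAM_template,$(prog))))
```

Running make on this file should print "hello from server" followed by "hello from client".  The recipe uses the one-line "target: ; command" form only to avoid depending on tab characters in the listing.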

An $(eval) side effect

One use of $(eval) is to create side effects.  For example, here's a variable that is actually an auto-incrementing counter (it uses the arithmetic function from the GNU Make Standard Library, see http://gmsl.sf.net/).

include gmsl

c-value := 0

counter = $(c-value)$(eval c-value := $(call plus,$(c-value),1))

Every time counter is used its value is incremented by one.  For example, the following sequence of $(info) functions outputs numbers in sequence starting from 0.  ($(info) was introduced in GNU Make 3.81 and is GNU Make's equivalent of print; it just prints its argument on stdout.)

$(info Starts at $(counter))

$(info Then it's $(counter))

$(info And then it's $(counter))

The actual output is

Starts at 0

Then it's 1

And then it's 2

You could use a simple side effect like that to find out just how often a particular macro is reevaluated by GNU Make.  You might be surprised at the result.  For example, when building GNU Make itself the variable srcdir is accessed 48 times and OBJEXT is accessed 189 times, and that's in a very small, simple project.
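
A minimal sketch of that kind of measurement, without the GNU Make Standard Library, might look like the following.  The names SRCS, OBJS and OBJS-hits are hypothetical; the idea is simply that each expansion of OBJS appends one word to a list, and $(words) counts the list at the end.

```make
# Hypothetical source list, used only for this illustration.
SRCS := a.c b.c c.c

# Every expansion of OBJS appends one word to OBJS-hits as a side effect.
OBJS-hits :=
OBJS = $(eval OBJS-hits += x)$(SRCS:.c=.o)

# Three places that each expand OBJS once.
_1 := $(OBJS)
_2 := $(OBJS)
_3 := $(filter %.o,$(OBJS))

$(info OBJS was expanded $(words $(OBJS-hits)) times)

.PHONY: all
all: ;
```

With the three uses shown, this should report that OBJS was expanded 3 times.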

All those accesses to an unchanging variable add up to time wasted by GNU Make looking at the same string over and over again.  If the variable being accessed is long (such as a long path), or contains calls to $(shell) or complex GNU Make functions, the performance of macro handling could affect the overall run time of a make.

That's especially important if you are trying to minimize build time by parallelizing the make, or if a developer is running an incremental build requiring just a few files to be rebuilt.  In both cases a long start up time by GNU Make could be very wasteful.

Caching macro values

Of course, GNU Make does provide a solution to this problem: use := instead of =.  A macro defined using := gets its value set once and for all: the right-hand side is evaluated once and the resulting value is stored in the variable.  Although this is faster, there are problems that cause it to be rarely used.  The main one is that the order in which variables are defined in the Makefile now matters.  For example,

FOO := $(BAR)

BAR := bar

and

BAR := bar

FOO := $(BAR)

result in FOO having two totally different values (in the first snippet FOO is empty and in the second FOO is bar).  Contrast that with the simplicity of writing

FOO = $(BAR)

BAR = bar

where FOO is bar.  Most Makefiles are written in this style and only very conscientious (and speed conscious) Makefile authors use :=.  (See also my article Makefile Optimization: := and $(shell) go together.)

On the other hand, almost all of these recursively defined macros only ever have one value when used.  The long evaluation time of a complex recursively defined macro is the price paid for the Makefile author's convenience.

What would be really nice is a way to cache the macro values so that the flexibility of the = style is preserved, but the macros are only evaluated once for speed.  Clearly, this would cause a little loss of flexibility: a macro couldn't take two different values (which is sometimes handy in a Makefile), but for most uses it would provide a significant speed up.

How much speed up?

Consider the following example Makefile.  It defines a variable C which is a long string (it's actually 1234567890 repeated 2,048 times with the alphabet repeated 2,048 times, plus spaces, for a total of 77,824 characters).  Here I used := so that I could build C's size quickly.  C is designed to emulate the sort of long strings that get generated inside Makefiles (e.g. long lists of source files with paths).

Then a variable FOO is defined that manipulates C using the built-in $(subst) function.  FOO emulates the sort of manipulation that occurs inside Makefiles.

Finally, $(FOO) is evaluated 200 times to emulate the use of FOO in a small but realistically sized Makefile.  The Makefile itself does nothing; there's a dummy, empty all rule at the end.

C := 1234567890 ABCDEFGHIJKLMNOPQRSTUVWXYZ

C += $C

C += $C

C += $C

C += $C

C += $C

C += $C

C += $C

C += $C

C += $C

C += $C

C += $C

FOO = $(subst 9,NINE,$C)$(subst 8,EIGHT,$C)$(subst 7,SEVEN,$C)$(subst 6,SIX,$C)$(subst 5,FIVE,$C)$(subst 4,THREE,$C)$(subst 2,TWO,$C)$(subst 1,ONE,$C)

_DUMMY := $(FOO)

... repeated 200 times ...

.PHONY: all

all:

On my laptop, using GNU Make 3.81, this Makefile takes an average of 3.1s to run (I averaged ten runs using the time function in zsh).  That's a pretty long time spent repeatedly manipulating C and FOO.

Using the counter trick above it's possible to figure out how many times FOO and C are evaluated in this Makefile. FOO was evaluated 200 times, and C 1600 times.  It's amazing how fast these evaluations can add up.
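
One way such counts could be gathered is sketched below.  It is a scaled-down stand-in, not the benchmark Makefile itself: FOO uses only two $(subst) calls, C is made recursively defined so that its expansions can be counted, the 200 assignments are generated with $(foreach) and $(eval) rather than written out, and the names FOO-hits, C-hits, ten and two-hundred are hypothetical.

```make
# Counters: each expansion appends one word to the matching list.
C-hits :=
FOO-hits :=

# Short stand-ins for the article's C and FOO, instrumented with counters.
C = $(eval C-hits += x)1234567890
FOO = $(eval FOO-hits += x)$(subst 9,NINE,$C)$(subst 8,EIGHT,$C)

# Build a list of 200 words: 10 x 10 iterations, two words each.
ten := 1 2 3 4 5 6 7 8 9 10
two-hundred := $(foreach a,$(ten),$(foreach b,$(ten),x x))

# Generate "_DUMMY := $(FOO)" 200 times; $$ defers the expansion of FOO
# until the generated line is parsed.
$(foreach i,$(two-hundred),$(eval _DUMMY := $$(FOO)))

$(info FOO expansions: $(words $(FOO-hits)))
$(info C expansions: $(words $(C-hits)))

.PHONY: all
all: ;
```

With two $(subst) calls per FOO this should report 200 expansions of FOO and 400 of C; the same reasoning gives 1600 for the eight $(subst) calls in the Makefile above.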

But the values of C and FOO actually only need to be calculated once (since they don't change).  Altering the definition of FOO to use :=:

FOO := $(subst 9,NINE,$C)$(subst 8,EIGHT,$C)$(subst 7,SEVEN,$C)$(subst 6,SIX,$C)$(subst 5,FIVE,$C)$(subst 4,THREE,$C)$(subst 2,TWO,$C)$(subst 1,ONE,$C)

drops the run time to 1.8s with C evaluated 9 times and FOO just once.

But, of course, that requires using := with the problems described above.  Another alternative is the following simple caching scheme.  First, I define a function cache which automatically caches a macro's value the first time it is evaluated and retrieves it from the cache for each subsequent attempt to retrieve it.

cache = $(if $(cached-$1),,$(eval cached-$1 := 1)$(eval cache-$1 := $($1)))$(cache-$1)

cache uses two macros: one to store the cached value of a macro (when caching macro A the cached value is stored in cache-A) and one to record whether the macro has been cached (when caching macro A the 'has been cached' flag is cached-A).

First, it checks to see if the macro has been cached.  If so, the $(if) does nothing.  If not, the cached flag is set for that macro in the first $(eval), and then the value of the macro is expanded (notice the $($1), which gets the name of the macro and then gets its value) and cached in the second $(eval).  Lastly, cache returns the value from the cache.
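
The behaviour just described can be checked with a small sketch that combines cache with a counting side effect.  The cache definition is the one given above; GREETING, NAME and hits are hypothetical names used only for this illustration.

```make
# The cache function exactly as defined in the article.
cache = $(if $(cached-$1),,$(eval cached-$1 := 1)$(eval cache-$1 := $($1)))$(cache-$1)

# A recursively defined macro that records each real expansion in hits.
hits :=
NAME = world
GREETING = $(eval hits += x)hello $(NAME)

# Three lookups through the cache; only the first expands GREETING.
$(info $(call cache,GREETING))
$(info $(call cache,GREETING))
$(info $(call cache,GREETING))

$(info expansions of GREETING: $(words $(hits)))

.PHONY: all
all: ;
```

This should print "hello world" three times and then report a single expansion of GREETING.  Because the expanded value is handed to $(eval) and parsed again, a value containing a literal $ or # may need extra care; that case is outside what this sketch covers.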

To update the Makefile, simply turn every reference to a macro into a call to the cache function.  For example, the Makefile above can be modified by changing all occurrences of $(FOO) to $(call cache,FOO) with a simple find and replace (note that there is no space after the comma, since $(call) would keep it as part of the macro name):

C := 1234567890 ABCDEFGHIJKLMNOPQRSTUVWXYZ

C += $C

C += $C

C += $C

C += $C

C += $C

C += $C

C += $C

C += $C

C += $C

C += $C

C += $C

FOO = $(subst 9,NINE,$C)$(subst 8,EIGHT,$C)$(subst 7,SEVEN,$C)$(subst 6,SIX,$C)$(subst 5,FIVE,$C)$(subst 4,THREE,$C)$(subst 2,TWO,$C)$(subst 1,ONE,$C)

_DUMMY := $(call cache,FOO)

... repeated 200 times ...

.PHONY: all

all:

Running this on my machine shows that there's now one access of FOO, the same 9 accesses of C and a run time of 2.4s.   It's not as fast as the := version (which took 1.8s), but it's still 24% faster.  On a big Makefile this simple technique could make a real difference.

Wrapping Up

The fastest way to handle macros is to use := whenever you can, but it requires care and attention and is probably best done only in a new Makefile (just imagine trying to go back and reengineer an existing Makefile with :=).

If you're stuck with = then the little cache function I've presented here can give you a speed boost that'll be especially appreciated by developers doing incremental short builds.

And I hope I've whetted your appetite for the $(eval) function.  It's a very powerful tool.
