## Making Mathematica faster with Compile

June 18th, 2011

Over at Sol Lederman’s fantastic new blog, Playing with Mathematica, he shared some code that produced the following figure.

Here’s Sol’s code with an AbsoluteTiming command thrown in.

f[x_, y_] := Module[{},
If[
Sin[Min[x*Sin[y], y*Sin[x]]] >
Cos[Max[x*Cos[y],
y*Cos[x]]] + (((2 (x - y)^2 + (x + y - 6)^2)/40)^3)/
6400000 + (12 - x - y)/30, 1, 0]
]

AbsoluteTiming[
\[Delta] = 0.02;
range = 11;
xyPoints = Table[{x, y}, {y, 0, range, \[Delta]}, {x, 0, range, \[Delta]}];
image = Map[f @@ # &, xyPoints, {2}];
]
Rotate[ArrayPlot[image, ColorRules -> {0 -> White, 1 -> Black}], 135 Degree]

This took 8.02 seconds on my current laptop (Windows 7, dual-core AMD Phenom II N620 at 2.8 GHz). Note that I am only measuring how long the calculation itself took; I am ignoring the time taken to render the image and to define the function.

Compiled functions make Mathematica code go faster

Mathematica has a Compile function which does exactly what you’d expect: it produces a compiled version of the function you give it (if it can!). Sol’s function gave it no problems at all.

f = Compile[{{x, _Real}, {y, _Real}}, If[
Sin[Min[x*Sin[y], y*Sin[x]]] >
Cos[Max[x*Cos[y],
y*Cos[x]]] + (((2 (x - y)^2 + (x + y - 6)^2)/40)^3)/
6400000 + (12 - x - y)/30, 1, 0]
];

AbsoluteTiming[
\[Delta] = 0.02;
range = 11;
xyPoints =
Table[{x, y}, {y, 0, range, \[Delta]}, {x, 0, range, \[Delta]}];
image = Map[f @@ # &, xyPoints, {2}];
]
Rotate[ArrayPlot[image, ColorRules -> {0 -> White, 1 -> Black}],
135 Degree]

This simple change takes computation time down from 8.02 seconds to 1.23 seconds, a 6.5 times speed-up for hardly any extra coding work. Not too shabby!
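If you want to check how thoroughly Compile digested your function, the standard CompiledFunctionTools` package that ships with Mathematica can print the virtual machine instructions it generated. A quick sketch using the definition of f above:

```mathematica
(* Load the standard package for inspecting compiled functions *)
Needs["CompiledFunctionTools`"]

(* Print the virtual machine instruction listing for f.
   Any call to MainEvaluate in the listing means that part of the
   function escaped compilation and will run at normal speed. *)
CompilePrint[f]
```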

Switch to C code to get it even faster

I’m not done yet though! By default the Compile command produces code for the so-called Mathematica Virtual Machine, but recent versions of Mathematica allow us to go even further.

Install Visual Studio Express 2010 (and the Windows 7.1 SDK if you are running 64-bit Windows) and you can ask Mathematica to convert the function to low-level C code, compile it, and produce a function object linked to the resulting compiled code. It sounds complicated but is a snap to actually do. Just add

CompilationTarget -> "C"

to the Compile command.

f = Compile[{{x, _Real}, {y, _Real}},
If[Sin[Min[x*Sin[y], y*Sin[x]]] >
Cos[Max[x*Cos[y],
y*Cos[x]]] + (((2 (x - y)^2 + (x + y - 6)^2)/40)^3)/
6400000 + (12 - x - y)/30, 1, 0]
, CompilationTarget -> "C"
];

AbsoluteTiming[\[Delta] = 0.02;
range = 11;
xyPoints =
Table[{x, y}, {y, 0, range, \[Delta]}, {x, 0, range, \[Delta]}];
image = Map[f @@ # &, xyPoints, {2}];]
Rotate[ArrayPlot[image, ColorRules -> {0 -> White, 1 -> Black}],
135 Degree]

On my machine this takes calculation time down to 0.89 seconds, which is 9 times faster than the original.

Making the compiled function listable

The current compiled function takes just one (x, y) pair at a time and returns a single result.

In[8]:= f[1, 2]

Out[8]= 1

It can’t directly accept a list of x values and a list of y values. For the two points (1, 2) and (10, 20), for example, I’d like to be able to do f[{1, 10}, {2, 20}] and get the results {1, 1}. Instead, what I get is an error:

f[{1, 10}, {2, 20}]

CompiledFunction::cfsa: Argument {1,10} at position 1 should be a machine-size real number. >>

To fix this I need to make my compiled function listable which is as easy as adding

RuntimeAttributes -> {Listable}

to the function definition.

f = Compile[{{x, _Real}, {y, _Real}},
If[Sin[Min[x*Sin[y], y*Sin[x]]] >
Cos[Max[x*Cos[y],
y*Cos[x]]] + (((2 (x - y)^2 + (x + y - 6)^2)/40)^3)/
6400000 + (12 - x - y)/30, 1, 0]
, CompilationTarget -> "C", RuntimeAttributes -> {Listable}
];

So now I can pass the entire array to this compiled function at once. No need for Map.

AbsoluteTiming[
\[Delta] = 0.02;
range = 11;
xpoints = Table[x, {x, 0, range, \[Delta]}, {y, 0, range, \[Delta]}];
ypoints = Table[y, {x, 0, range, \[Delta]}, {y, 0, range, \[Delta]}];
image = f[xpoints, ypoints];
]
Rotate[ArrayPlot[image, ColorRules -> {0 -> White, 1 -> Black}],
135 Degree]

On my machine this gets calculation time down to 0.28 seconds, a whopping 28.5 times faster than the original. In fact, rendering time is now much more of an issue than calculation time!
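One caveat with these timings: the AbsoluteTiming block includes the two Table calls that build the input arrays, not just the function application itself. A sketch that times only the call to f, assuming the listable compiled f defined above:

```mathematica
(* Build the input arrays outside the timed block *)
\[Delta] = 0.02; range = 11;
xpoints = Table[x, {x, 0, range, \[Delta]}, {y, 0, range, \[Delta]}];
ypoints = Table[y, {x, 0, range, \[Delta]}, {y, 0, range, \[Delta]}];

(* Time only the compiled, listable function application *)
First@AbsoluteTiming[image = f[xpoints, ypoints];]
```

Measured this way, the compiled versions look even better relative to the original, since the Table overhead is the same in every variant.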

Parallel anyone?

By adding

Parallelization -> True

to the Compile command I can parallelise the code using threads. Since I have a dual-core machine, this might be a good thing to do. Let’s take a look.

f = Compile[{{x, _Real}, {y, _Real}},
If[
Sin[Min[x*Sin[y], y*Sin[x]]] >
Cos[Max[x*Cos[y],
y*Cos[x]]] + (((2 (x - y)^2 + (x + y - 6)^2)/40)^3)/
6400000 + (12 - x - y)/30, 1, 0]
, RuntimeAttributes -> {Listable}, CompilationTarget -> "C",
Parallelization -> True];

AbsoluteTiming[
\[Delta] = 0.02;
range = 11;
xpoints = Table[x, {x, 0, range, \[Delta]}, {y, 0, range, \[Delta]}];
ypoints = Table[y, {x, 0, range, \[Delta]}, {y, 0, range, \[Delta]}];
image = f[xpoints, ypoints];
]
Rotate[ArrayPlot[image, ColorRules -> {0 -> White, 1 -> Black}],
135 Degree]

The first time I ran this it was actually SLOWER than the non-threaded version, coming in at 0.33 seconds. Subsequent runs varied and occasionally got as low as 0.244 seconds, which is only a few hundredths of a second faster than the serial compiled version.

If I make the problem bigger, however, by decreasing the step size \[Delta], then we start to see the benefit of parallelisation.

AbsoluteTiming[
\[Delta] = 0.01;
range = 11;
xpoints = Table[x, {x, 0, range, \[Delta]}, {y, 0, range, \[Delta]}];
ypoints = Table[y, {x, 0, range, \[Delta]}, {y, 0, range, \[Delta]}];
image = f[xpoints, ypoints];
]
Rotate[ArrayPlot[image, ColorRules -> {0 -> White, 1 -> Black}],
135 Degree]

The above calculation (sans rendering) took 0.988 seconds using the parallelised version of f and 1.24 seconds using the serial version; rendering took significantly longer! For comparison, let’s put a \[Delta] of 0.01 into the original code:

f[x_, y_] := Module[{},
If[
Sin[Min[x*Sin[y], y*Sin[x]]] >
Cos[Max[x*Cos[y],
y*Cos[x]]] + (((2 (x - y)^2 + (x + y - 6)^2)/40)^3)/
6400000 + (12 - x - y)/30, 1, 0]
]

AbsoluteTiming[
\[Delta] = 0.01;
range = 11;
xyPoints = Table[{x, y}, {y, 0, range, \[Delta]}, {x, 0, range, \[Delta]}];
image = Map[f @@ # &, xyPoints, {2}];
]
Rotate[ArrayPlot[image, ColorRules -> {0 -> White, 1 -> Black}], 135 Degree]

The calculation (again, ignoring rendering time) took 32.56 seconds, so our C-compiled, parallel version is almost 33 times faster than the original!

Summary

• The Compile function can make your code run significantly faster by compiling it for the Mathematica Virtual Machine (MVM). Note that not every function is suitable for compilation.
• If you have a C compiler installed on your machine then you can switch from the MVM to compiled C code with a single option. The resulting code is even faster.
• Making your compiled functions listable can increase performance further.
• Parallelising your compiled function is easy and can lead to even more speed, but only if your problem is of a suitable size.
• Sol Lederman has a very cool Mathematica blog – check it out! The code that inspired this blog post originated there.
Comments

1. Any suggestions compiling this on Mac…
Thanks.

2. Great post! However, something not so obvious here is that, for the compiled functions, the majority of the measured runtime is taken up by the generation of the input arrays using Table. Looking at the AbsoluteTiming of the function application by itself, I find on my computer (3.2GHz Core 2 Quad, 64-bit Windows, gcc 4.5.2) that the parallelized compiled version is a rather splendid ~150 times faster than the uncompiled one!

3. Hello,

great post! But what kind of functions are generally suitable for compilation? Or is it a matter of “just try it and you will see”? I would guess many functions are already “compiled” to run as fast as possible (and further compilation might even be counter-productive). But how do you recognise them?

4. Compile[] is nice and all, but why should we need it? Type-inference JIT compilation has been around for decades (e.g. ParcPlace Smalltalk, Self). Why can’t Mathematica cache thunks and the corresponding translated VM code automatically? That, and some reasonable form of aggregate data structure (or might I dream: classes), are my major complaints. But I still use it because there is nothing else like it.