## Archive for September, 2010

One of my friends on facebook posted the following source code comment which I am sorely tempted to use in some future projects.

// // Dear maintainer: // // Once you are done trying to 'optimize' this routine, // and have realized what a terrible mistake that was, // please increment the following counter as a warning // to the next guy: // // total_hours_wasted_here = 32 //

It turns out that this is one of many examples in a superb thread on StackOverflow.com

Part of my job at the University of Manchester is to help researchers improve their code in systems such as Mathematica, MATLAB, NAG and so on. One of the most common questions I get asked is* ‘Can you make this go any faster please?’ *and it always makes my day if I manage to do something significant such as a speed up of a factor of 10 or more. The users, however, are often happy with a lot less and I once got bought a bottle of (rather good) wine for a mere 25% speed-up (Note: Gifts of wine aren’t mandatory)

I employ numerous tricks to get these speed-ups including applying vectorisation, using mex files, modifying the algorithm to something more efficient,picking the brains of colleagues and so on. Over the last year or so, I have managed to help several researchers get significant speed-ups in their MATLAB programs by doing one simple thing – swapping the fsolve function for an equivalent from the NAG Toolbox for MATLAB.

The fsolve function is part of MATLAB’s optimisation toolbox and, according to the documentation, it does the following:

*“fsolve Solves a system of nonlinear equation of the form F(X) = 0 where F and X may be vectors or matrices.”*

The NAG Toolbox for MATLAB contains a total of 6 different routines that can perform similar calculations to fsolve. These 6 routines are split into 2 sets of 3 as follows

**For when you have function values only:**

c05nb – Solution of system of nonlinear equations using function values only (easy-to-use)

c05nc – Solution of system of nonlinear equations using function values only (comprehensive)

c05nd – Solution of system of nonlinear equations using function values only (reverse communication)

** **

**For when you have both function values and first derivatives**

c05pb – Solution of system of nonlinear equations using first derivatives (easy-to-use)

c05pc – Solution of system of nonlinear equations using first derivatives (comprehensive)

c05pd – Solution of system of nonlinear equations using first derivatives (reverse communication)

So, the NAG routine you choose depends on whether or not you can supply first derivatives and exactly which options you’d like to exercise. For many problems it will come down to a choice between the two ‘easy to use’ routines – c05nb and c05pb (at least, they are the ones I’ve used most of the time)

Let’s look at a simple example. You’d like to find a root of the following system of non-linear equations.

F(1)=exp(-x(1)) + sinh(2*x(2)) + tanh(2*x(3)) – 5.01;

F(2)=exp(2*x(1)) + sinh(-x(2) ) + tanh(2*x(3) ) – 5.85;

F(3)=exp(2*x(1)) + sinh(2*x(2) ) + tanh(-x(3) ) – 8.88;

i.e. you want to find a vector X containing three values for which F(X)=0.

To solve this using MATLAB and the optimisation toolbox you could proceed as follows, first create a .m file called fsolve_obj_MATLAB.m (henceforth referred to as the objective function) that contains the following

function F=fsolve_obj_MATLAB(x) F=zeros(1,3); F(1)=exp(-x(1)) + sinh(2*x(2)) + tanh(2*x(3)) - 5.01; F(2)=exp(2*x(1)) + sinh(-x(2) ) + tanh(2*x(3) ) - 5.85; F(3)=exp(2*x(1)) + sinh(2*x(2) ) + tanh(-x(3) ) - 8.88; end

Now type the following at the MATLAB command prompt to actually solve the problem:

options=optimset('Display','off'); %Prevents fsolve from displaying too much information startX=[0 0 0]; %Our starting guess for the solution vector X=fsolve(@fsolve_obj_MATLAB,startX,options)

MATLAB finds the following solution (Note that it’s only found one of the solutions, not all of them, which is typical of the type of algorithm used in fsolve)

X = 0.9001 1.0002 1.0945

Since we are not supplying the derivatives of our objective function and we are not using any special options, it turns out to be very easy to switch from using the optimisation toolbox to the NAG toolbox for this particular problem by making use of the routine c05nb.

First, we need to change the header of our objective function’s .m file from

function F=fsolve_obj_MATLAB(x)

to

function [F,iflag]=fsolve_obj_NAG(n,x,iflag)

so our new .m file, fsolve_obj_NAG.m, will contain the following

function [F,iflag]=fsolve_obj_NAG(n,x,iflag) F=zeros(1,3); F(1)=exp(-x(1)) + sinh(2*x(2)) + tanh(2*x(3)) - 5.01; F(2)=exp(2*x(1)) + sinh(-x(2) ) + tanh(2*x(3) ) - 5.85; F(3)=exp(2*x(1)) + sinh(2*x(2) ) + tanh(-x(3) ) - 8.88; end

Note that the ONLY change we made to the objective function was the very first line. The NAG routine c05nb expects to see some extra arguments in the objective function (iflag and n in this example) and it expects to see them in a very particular order but we are not obliged to use them. Using this modified objective function we can proceed as follows.

startX=[0 0 0]; %Our starting guess X=c05nb('fsolve_obj_NAG',startX)

NAG gives the same result as the optimisation toolbox:

X = 0.9001 1.0002 1.0945

Let’s look at timings. I performed the calculations above on an 800Mhz laptop running Ubuntu Linux 10.04, MATLAB 2009 and Mark 22 of the NAG toolbox and got the following timings (averaged over 10 runs):

Optimisation toolbox: 0.0414 seconds

NAG toolbox: 0.0021 seconds

So, for this particular problem NAG was faster than MATLAB by a factor of 19.7 times on my machine.

That’s all well and good, but this is a simple problem. Does this scale to more realistic problems I hear you ask? Well, the answer is ‘yes’. I’ve used this technique several times now on production code and it almost always results in some kind of speed-up. Not always 19.7 times faster, I’ll grant you, but usually enough to make the modifications worthwhile.

I can’t actually post any of the more complicated examples here because the code in question belongs to other people but I was recently sent a piece of code from a researcher that made heavy use of fsolve and a typical run took 18.19 seconds (that’s for everything, not just the fsolve bit). Simply by swapping a couple of lines of code to make it use the NAG toolbox rather than the optimisation toolbox I reduced the runtime to 1.08 seconds – a speed-up of almost 17 times.

Sadly, this technique doesn’t always result in a speed-up and I’m working on figuring out exactly when the benefits disappear. I guess that the main reason for NAG’s good performance is that it uses highly optimised, compiled Fortran code compared to MATLAB’s interpreted .m code. One case where NAG didn’t help was for a massively complicated objective function and the majority of the run-time of the code was spent evaluating this function. In this situation, NAG and MATLAB came out roughly neck and neck.

In the meantime, if you have some code that you’d like me to try it on then drop me a line.

**If you enjoyed this post then you may also like:**

A very welcome change in MATLAB 2010b is the merging of the splines and curve-fitting toolboxes to form one super-toolbox that does both splines AND curve-fitting in a product that’s simply called The Curve Fitting Toolbox. The separate curve-fitting and splines toolboxes were always a bugbear for me and many of my equivalents in other UK universities so now we have one less thing to moan about. Thanks to the HUGE number of MATLAB toolboxes, however, we’ll still have plenty of things to moan about at our quarterly academic maths and stats meetings.

I took a trawl through some of Mathworks standard toolboxes to see what other toolboxes I thought should be merged together or done away with completely. Here are a couple of pleas to The Mathworks that would make MY life (and the lives of the community I support) a whole lot easier.

### The Parallel Toolbox for MATLAB

My laptop has a dual core processor so, unless I am lucky enough to be using functions that are implicitly parallel, MATLAB will only make use of half of my available processing power. It gets worse when I get into work since it will only use a quarter of the power of my quad-core desktop and as for those lucky fellows who have just bought themselves dual hex-core MONSTERS (12 cores total)….well, MATLAB won’t exactly be using them to their full capacity will it? Less than 10% in fact, unless they shell out extra for the parallel computing toolbox (PCT).

The vast majority of computers sold today have more than one processing core and yet the vast majority of MATLAB installations out there only use one of them (unless you use these functions). MATLAB’s competitors such as Mathematica, Maple and Labview all have explicit multi-core support built in as standard. Maybe it’s time that MATLAB did that too!

**What I’d like Mathworks to do:**Merge the Parallel Computing Toolbox with core MATLAB.

### Optimisation and Global Optimisation

You’ve got a function and you want to know when it gets as big or as small as it can be so you turn to optimisation routines. MATLAB has some basic optimisation functions built into its core (fminsearch for example) but many people find that they need the extra power and versatility offered by the functions in either the optimisation toolbox or the global optimisation toolbox.

People who have never used optimisation routines before sometimes ask me what the difference between these toolboxes is. When I was first faced with this question I discussed the different classes of algorithms involved and used phrases like ‘genetic algorithms’, ‘multistart’, ‘simulated annealing’, ‘global and local minima’ and so on. These days I start off somewhat more simply:

**Me: **“The optimisation toolbox finds **A** minimum near to your starting guess. It may or may not be **THE** minimum of your function. The global optimisation toolbox, on the other hand, attempts to find **THE** minimum.”

**User:** “Well, I want **THE** minimum obviously. So, I guess I’ll take the global optimisation toolbox please.”

**Me: THE** minimum costs more money than just **A** minimum. Twice as much in fact, since you need to buy the standard optimisation toolbox **AND** the global optimisation toolbox if you want **THE** minimum.

**User:** *<Deep in thought while they consider how far their grant is going to stretch>*

**Me:** The standard optimisation toolbox won’t cost you anything here since we have a set of licenses for it on our network license server.

**User:** OK OK. I’ll make do with that. I suppose I could just make LOTS of starting guesses and run the standard optimisation toolbox routines in parallel on my 12-core monster? Then I can take the best result and there will be a better chance that it will be THE minimum, right?

**Me:** That’ll need the Parallel computing toolbox…which costs extra!

**What I’d like Mathworks to do:** Merge the standard and global optimisation toolboxes.

So, that’s a couple of things that I would like. Do you agree with them? Are there other toolbox merges you’d like to see? Comments welcome.

One of the new features in MATLAB 2010b that’s getting me very excited is the CUDA based GP-GPU (General Purpose computation on Graphical Processing Units) integration that’s become available in the Parallel Computing Toolbox. As soon as I had MATLAB 2010b installed on my CUDA capable laptop (Dell XPS M1330 with a GeForce 8400M GS) running Ubuntu I wanted to try out as much of this new functionality as my low-end hardware would allow me. I’ve installed and played with CUDA on this machine in the past and so I fired up MATLAB 2010b and issued the following command to ask MATLAB how many CUDA enabled devices it thought I had on my system

gpuDeviceCount ??? Error using ==> feval The CUDA driver was found, but it is too old. The CUDA driver on your system is version: 3. The required CUDA version is: 3.1 or greater.

The practical upshot of the above error message is that I needed to upgrade my NVidia graphics driver which was at version 195.36.24. I went for the latest version which, at the time of writing, is version 256.53. I did this from the **NVIDIA-Linux-x86-256.53.run** file which I got direct from NVidia and all I’ll say about the process is that it ruined an otherwise perfectly good Sunday morning. I did get there in the end though!

So, I had the shiniest version of the graphics driver up and running. Time to fire up MATLAB again:

>> gpuDeviceCount Warning: The device selected (device 1, "GeForce 8400M GS") does not have sufficient compute capability to be used. Compute capability 1.3 (or greater) is required, the selected device has compute capability 1.1. > In deviceCount at 7 In GPUDevice.GPUDevice>GPUDevice.count at 27 In gpuDeviceCount at 10

Or, to put it another way, “**Please insert new laptop to continue.**” With that, my MATLAB/CUDA experiments are brought to an end.

Can anyone recommend a reasonably priced laptop that contains a CUDA capable graphics card at compute level 1.3 or above?

**Update (16th September 2010): **Several people have emailed me to defend The Mathwork’s design decision so I’d like to make something very clear: I completely agree with The Matworks in their insistence on CUDA compute level 1.3 or above. As one correspondent pointed out, this ensures that not only does the hardware support double precision but it is also IEEE-compliant and IEEE-compliance is a good thing! This blog post was never meant to criticize The Mathworks over this, I wrote it partly to ensure that more people are aware of the requirements and partly because I needed to vent over the sudden obsolescence of my relatively new laptop.

I recently had a set of files that were named as follows

frame1.png frame2.png frame3.png frame4.png frame5.png frame6.png frame7.png frame8.png frame9.png frame10.png

and so on, right up to frame750.png. The plan was to turn these .png files into an uncompressed movie using mencoder via the following command (original source)

mencoder mf://*.png -mf w=720:h=720:fps=25:type=png -ovc raw -oac copy -o output.avi

but I ended up with a movie that jumped all over the place since the frames were in an odd order. In the following order in fact

frame0.csv frame100.csv frame101.csv frame102.csv frame103.csv frame104.csv frame105.csv frame106.csv frame107.csv frame108.csv frame109.csv frame10.csv frame110.csv

This is because globbing expansion (the ***.png** bit) is alphabetical in bash rather than numerical.

One way to get the frames in the order that I want is to zero-pad them. In other words I replace **file1.png** with **file001.png** and **file20.png** with **file020.png** and so on. Here’s how to do that in bash

#!/bin/bash num=`expr match "$1" '[^0-9]*\([0-9]\+\).*'` paddednum=`printf "%03d" $num` echo ${1/$num/$paddednum}

Save the above to a file called zeropad.sh and then do the following command to make it executable

chmod +x ./zeropad.sh

You can then use the zeropad.sh script as follows

./zeropad.sh frame1.png

which will return the result

frame001.png

All that remains is to use this script to rename all of the .png files in the current directory such that they are zeropadded.

for i in *.png;do mv $i `./zeropad.sh $i`; done

You may want to change the number of digits used in each filename from 3 to 5 (say). To do this just change %03d in zeropad.sh to %05d

Let me know if you find this useful or have an alternative solution you’d like to share (in another language maybe?)