Archive for the ‘Condor’ Category

February 19th, 2014

I occasionally get emails from researchers saying something like this:

‘My MATLAB code takes a week to run and the cleaner/cat/my husband keeps switching off my machine before it’s completed — could you help me make the code go faster please, so that I can get my results in between these events?’

While I am more than happy to try to optimise the code in question, what these users really need is some sort of checkpointing scheme. Checkpointing is also important for users of high performance computing systems that limit the length of each individual job.

The solution – Checkpointing (or ‘Assume that your job will frequently be killed’)

The basic idea behind checkpointing is to periodically save your program’s state so that, if it is interrupted, it can start again where it left off rather than from the beginning. In order to demonstrate some of the principles involved, I’m going to need some code that’s sufficiently simple that it doesn’t cloud what I want to discuss. Let’s add up some numbers using a for-loop.

%addup.m
%This is not the recommended way to sum integers in MATLAB -- we only use it here to keep things simple
%This version does NOT use checkpointing

mysum=0;
for count=1:100
    mysum = mysum + count;
    pause(1);           %Let's pretend that this is a complicated calculation
    fprintf('Completed iteration %d \n',count);
end

fprintf('The sum is %f \n',mysum);

Using a for-loop to perform an addition like this is something that I’d never usually suggest in MATLAB but I’m using it here because it is so simple that it won’t get in the way of understanding the checkpointing code.

If you run this program in MATLAB, it will take about 100 seconds thanks to that pause statement which is acting as a proxy for some real work. Try interrupting it by pressing CTRL-C and then restart it. As you might expect, it will always start from the beginning:

>> addup
Completed iteration 1
Completed iteration 2
Completed iteration 3
Operation terminated by user during addup (line 6)

>> addup
Completed iteration 1
Completed iteration 2
Completed iteration 3
Operation terminated by user during addup (line 6)

This is no big deal when your calculation only takes 100 seconds but is going to be a major problem when the calculation represented by that pause statement becomes something like an hour rather than a second.

Let’s now look at a version of the above that makes use of checkpointing.

%addup_checkpoint.m
if exist( 'checkpoint.mat','file' ) % If a checkpoint file exists, load it
    fprintf('Checkpoint file found - Loading\n');
    load('checkpoint.mat')

else %otherwise, start from the beginning
    fprintf('No checkpoint file found - starting from beginning\n');
    mysum=0;
    countmin=1;
end

for count = countmin:100
    mysum = mysum + count;
    pause(1);           %Let's pretend that this is a complicated calculation

    %save checkpoint
    countmin = count+1;  %If we load this checkpoint, we want to start on the next iteration
    fprintf('Saving checkpoint\n');
    save('checkpoint.mat');

    fprintf('Completed iteration %d \n',count);
end
fprintf('The sum is %f \n',mysum);

Before you run the above code, the checkpoint file checkpoint.mat does not exist and so the calculation starts from the beginning. After every iteration, a checkpoint file is created which contains every variable in the MATLAB workspace. If the program is restarted, it will find the checkpoint file and continue where it left off. Our code now deals with interruptions a lot more gracefully.

>> addup_checkpoint
No checkpoint file found - starting from beginning
Saving checkpoint
Completed iteration 1 
Saving checkpoint
Completed iteration 2 
Saving checkpoint
Completed iteration 3 
Operation terminated by user during addup_checkpoint (line 16)

>> addup_checkpoint
Checkpoint file found - Loading
Saving checkpoint
Completed iteration 4 
Saving checkpoint
Completed iteration 5 
Saving checkpoint
Completed iteration 6 
Operation terminated by user during addup_checkpoint (line 16)

Note that we’ve had to change the program logic slightly. Our original loop counter was

for count = 1:100

In the checkpointed example, however, we’ve had to introduce the variable countmin

for count = countmin:100

This allows us to start the loop from whatever value of countmin was in our last checkpoint file. Minor modifications like this are often necessary when converting code to use checkpointing, and you should check carefully that they do not introduce bugs.
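One simple check, sketched below (this test is my suggestion rather than part of the original demo), is to compare an uninterrupted run against a run restarted from a mid-calculation checkpoint; both should report the same sum, 5050 in this case.

%test_checkpoint.m
%A minimal sketch of such a test. Note that each run of addup_checkpoint
%takes around 100 seconds because of its pause statements.

%1) Run the checkpointed version from a clean state
if exist('checkpoint.mat','file'), delete('checkpoint.mat'); end
addup_checkpoint;           %should print 'The sum is 5050.000000'

%2) Simulate an interruption: construct a checkpoint file as it would
%   look after 50 completed iterations and restart from it
mysum = sum(1:50); countmin = 51;
save('checkpoint.mat','mysum','countmin');
clear mysum countmin
addup_checkpoint;           %should again print 'The sum is 5050.000000'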

Don’t checkpoint too often

The creation of even a small checkpoint file is a time-consuming process. Consider our original addup code but without the pause command.

%addup_nopause.m
%This version does NOT use checkpointing
mysum=0;
for count=1:100
    mysum = mysum + count;
    fprintf('Completed iteration %d \n',count);
end
fprintf('The sum is %f \n',mysum);

On my machine, this code takes 0.0046 seconds to execute. Compare this to the checkpointed version, again with the pause statement removed.

%addup_checkpoint_nopause.m

if exist( 'checkpoint.mat','file' ) % If a checkpoint file exists, load it
    fprintf('Checkpoint file found - Loading\n');
    load('checkpoint.mat')

else %otherwise, start from the beginning
    fprintf('No checkpoint file found - starting from beginning\n');
    mysum=0;
    countmin=1;
end

for count = countmin:100
    mysum = mysum + count;

    %save checkpoint
    countmin = count+1;  %If we load this checkpoint, we want to start on the next iteration
    fprintf('Saving checkpoint\n');
    save('checkpoint.mat');

    fprintf('Completed iteration %d \n',count);
end
fprintf('The sum is %f \n',mysum);

This checkpointed version takes 0.85 seconds to execute on the same machine — over 180 times slower than the original! The problem is that the time it takes to checkpoint is long compared to the calculation time.

If we make a modification so that we only checkpoint every 25 iterations, code execution time comes down to 0.05 seconds:

%Checkpoint every 25 iterations

if exist( 'checkpoint.mat','file' ) % If a checkpoint file exists, load it
    fprintf('Checkpoint file found - Loading\n');
    load('checkpoint.mat')

else %otherwise, start from the beginning
    fprintf('No checkpoint file found - starting from beginning\n');
    mysum=0;
    countmin=1;
end

for count = countmin:100
    mysum = mysum + count;
    countmin = count+1;  %If we load this checkpoint, we want to start on the next iteration

    if mod(count,25)==0
        %save checkpoint   
        fprintf('Saving checkpoint\n');
        save('checkpoint.mat');
    end

    fprintf('Completed iteration %d \n',count);
end
fprintf('The sum is %f \n',mysum);

Of course, the issue now is that we might lose more work if our program is interrupted between checkpoints. Additionally, in this particular case, the mod command used to decide whether or not to checkpoint is more expensive than simply performing the calculation, but hopefully that isn’t going to be the case when working with real-world calculations.

In practice, we have to strike a balance: checkpoint often enough that we don’t stand to lose too much work, but not so often that our program runs too slowly.
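One compromise that I haven’t used in the examples here is to checkpoint on elapsed time rather than on iteration count, so that the checkpointing overhead stays roughly constant however long an individual iteration takes. Here is a sketch of the idea, assuming that mysum and countmin have been initialised (or loaded from a checkpoint) as before:

%Sketch: checkpoint when more than checkpoint_interval seconds have
%passed since the last save
checkpoint_interval = 10;   %target time between checkpoints, in seconds
last_checkpoint = tic;

for count = countmin:100
    mysum = mysum + count;
    countmin = count+1;  %If we load this checkpoint, we want to start on the next iteration
    pause(1);            %Let's pretend that this is a complicated calculation

    if toc(last_checkpoint) > checkpoint_interval
        fprintf('Saving checkpoint\n');
        save('checkpoint.mat');
        last_checkpoint = tic;   %reset the timer
    end

    fprintf('Completed iteration %d \n',count);
end
fprintf('The sum is %f \n',mysum);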

Checkpointing code that involves random numbers

Extra care needs to be taken when running code that involves random numbers. Consider a modification of our checkpointed adding program that creates a sum of random numbers.

%addup_checkpoint_rand.m
%Adding random numbers the slow way, in order to demo checkpointing
%This version has a bug

if exist( 'checkpoint.mat','file' ) % If a checkpoint file exists, load it
    fprintf('Checkpoint file found - Loading\n');
    load('checkpoint.mat')

else %otherwise, start from the beginning
    fprintf('No checkpoint file found - starting from beginning\n');
    mysum=0;
    countmin=1;
    rng(0);     %Seed the random number generator for reproducible results
end

for count = countmin:100
    mysum = mysum + rand();
    countmin = count+1;  %If we load this checkpoint, we want to start on the next iteration
    pause(1); %pretend this is a complicated calculation

    %save checkpoint
    fprintf('Saving checkpoint\n');
    save('checkpoint.mat');

    fprintf('Completed iteration %d \n',count);
end
fprintf('The sum is %f \n',mysum);

In the above, we set the seed of the random number generator to 0 at the beginning of the calculation. This ensures that we always get the same set of random numbers and allows us to get reproducible results. As such, the sum should always come out to be 52.799447 to the number of decimal places used in the program.

The above code has a subtle bug that you won’t find if your testing is confined to interrupting with CTRL-C and then restarting in an interactive session of MATLAB. Proceed that way, and you’ll get exactly the sum you expect: 52.799447. If, on the other hand, you test your code by doing the following

  • Run for a few iterations
  • Interrupt with CTRL-C
  • Restart MATLAB
  • Run the code again, ensuring that it starts from the checkpoint

You’ll get a different result. This is not what we want!

The root cause of this problem is that we are not saving the state of the random number generator in our checkpoint file. Thus, when we restart MATLAB, all information concerning this state is lost. If we don’t restart MATLAB between interruptions, the state of the random number generator is safely tucked away behind the scenes.

Assume, for example, that you stop the calculation running after the third iteration. The random numbers you’d have consumed would be (to 4 d.p.)

0.8147
0.9058
0.1270

Your checkpoint file will contain the variables mysum, count and countmin but will contain nothing about the state of the random number generator. In English, this state is something like ‘The next random number will be the 4th one in the sequence defined by a starting seed of 0.’

When we restart MATLAB, the default seed is 0 so we’ll be using the right sequence (since we explicitly set it to be 0 in our code) but we’ll be starting right from the beginning again. That is, the 4th, 5th and 6th iterations of the summation will contain the first 3 numbers in the stream, thus double counting them, and so our checkpointing procedure will alter the results of the calculation.

In order to fix this, we need to additionally save the state of the random number generator when we save a checkpoint and also make correct use of this on restarting. Here’s the code:

%addup_checkpoint_rand_correct.m
%Adding random numbers the slow way, in order to demo checkpointing

if exist( 'checkpoint.mat','file' ) % If a checkpoint file exists, load it
    fprintf('Checkpoint file found - Loading\n');
    load('checkpoint.mat')

    %use the saved RNG state
    stream = RandStream.getGlobalStream;
    stream.State = savedState;

else % otherwise, start from the beginning
    fprintf('No checkpoint file found - starting from beginning\n');
    mysum=0;
    countmin=1;
    rng(0);     %Seed the random number generator for reproducible results
end

for count = countmin:100
    mysum = mysum + rand();
    countmin = count+1;  %If we load this checkpoint, we want to start on the next iteration
    pause(1); %pretend this is a complicated calculation

    %save the state of the random number generator
    stream = RandStream.getGlobalStream;
    savedState = stream.State;
    %save checkpoint
    fprintf('Saving checkpoint\n');
    save('checkpoint.mat');

    fprintf('Completed iteration %d \n',count);
end
fprintf('The sum is %f \n',mysum);

Ensuring that the checkpoint save completes

Events that terminate our code can occur extremely quickly — a power cut, for example. There is a risk that the machine is switched off while our checkpoint file is being written. How can we ensure that the file is complete?

The solution, which I found on the MATLAB checkpointing page of the Liverpool University Condor Pool site, is to first write a temporary file and then rename it. That is, instead of

save('checkpoint.mat');

we do

save('checkpoint_tmp.mat');
if strcmp(computer,'PCWIN64') || strcmp(computer,'PCWIN')
    %We are running on a Windows machine
    system( 'move /y checkpoint_tmp.mat checkpoint.mat' );
else
    %We are running on Linux or Mac
    system( 'mv checkpoint_tmp.mat checkpoint.mat' );
end

As the author of that page explains ‘The operating system should guarantee that the move command is “atomic” (in the sense that it is indivisible i.e. it succeeds completely or not at all) so that there is no danger of receiving a corrupt “half-written” checkpoint file from the job.’
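If you’d rather not shell out to the operating system, MATLAB’s built-in movefile command does the same rename in a platform-independent way. I would want to verify that it provides the same atomicity guarantee on your particular system before relying on it, but the pattern would look like this:

save('checkpoint_tmp.mat');
movefile('checkpoint_tmp.mat','checkpoint.mat','f'); %'f' forces replacement of any existing checkpoint.mat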

Only checkpoint what is necessary

So far, we’ve been saving the entire MATLAB workspace in our checkpoint files and this hasn’t been a problem since our workspace hasn’t contained much. In general, however, the workspace might contain all manner of intermediate variables that we simply don’t need in order to restart where we left off. Saving the stuff that we might not need can be expensive.

For the sake of illustration, let’s skip 100 million random numbers before adding one to our sum. For reasons only known to ourselves, we store these numbers in an intermediate variable which we never do anything with. This array isn’t particularly large at 763 Megabytes but its existence slows down our checkpointing somewhat. The correct result of this variation of the calculation is 41.251376 if we set the starting seed to 0; something we can use to test our new checkpoint strategy.

Here’s the code:

% A demo of how slow checkpointing can be if you include large intermediate variables

if exist( 'checkpoint.mat','file' ) % If a checkpoint file exists, load it
    fprintf('Checkpoint file found - Loading\n');
    load('checkpoint.mat')
    %use the saved RNG state
    stream = RandStream.getGlobalStream;
    stream.State = savedState;
else %otherwise, start from the beginning
    fprintf('No checkpoint file found - starting from beginning\n');
    mysum=0;
    countmin=1;
    rng(0);     %Seed the random number generator for reproducible results
end

for count = countmin:100
    %Create and store 100 million random numbers for no particular reason
    randoms = rand(10000);
    mysum = mysum + rand();
    countmin = count+1;  %If we load this checkpoint, we want to start on the next iteration
    fprintf('Completed iteration %d \n',count);
    
    if mod(count,25)==0
        %save the state of the random number generator
        stream = RandStream.getGlobalStream;
        savedState = stream.State;
        %save and time checkpoint
        tic
        save('checkpoint_tmp.mat');
        if strcmp(computer,'PCWIN64') || strcmp(computer,'PCWIN')
            %We are running on a windows machine
            system( 'move /y checkpoint_tmp.mat checkpoint.mat' );
        else
            %We are running on Linux or Mac
            system( 'mv checkpoint_tmp.mat checkpoint.mat' );
        end
        timing = toc;
        fprintf('Checkpoint save took %f seconds\n',timing);
    end
    
end
fprintf('The sum is %f \n',mysum);

On my Windows 7 Desktop, each checkpoint save takes around 17 seconds:

Completed iteration 25 
        1 file(s) moved. 
Checkpoint save took 17.269897 seconds

It is not necessary to include that huge random matrix in a checkpoint file. If we are explicit in what we require, we can reduce the time taken to checkpoint significantly. Here, we change

save('checkpoint_tmp.mat');

to

save('checkpoint_tmp.mat','mysum','countmin','savedState');

This has a dramatic effect on checkpointing time:

Completed iteration 25 
        1 file(s) moved. 
Checkpoint save took 0.033576 seconds

Here’s the final piece of code that uses everything discussed in this article:

%Final checkpointing demo

if exist( 'checkpoint.mat','file' ) % If a checkpoint file exists, load it
    fprintf('Checkpoint file found - Loading\n');
    load('checkpoint.mat')
    %use the saved RNG state
    stream = RandStream.getGlobalStream;
    stream.State = savedState;
else %otherwise, start from the beginning
    fprintf('No checkpoint file found - starting from beginning\n');
    mysum=0;
    countmin=1;
    rng(0);     %Seed the random number generator for reproducible results
end

for count = countmin:100
    %Create and store 100 million random numbers for no particular reason
    randoms = rand(10000);
    mysum = mysum + rand();
    countmin = count+1;  %If we load this checkpoint, we want to start on the next iteration
    fprintf('Completed iteration %d \n',count);
    
    if mod(count,25)==0 %checkpoint every 25th iteration
        %save the state of the random number generator
        stream = RandStream.getGlobalStream;
        savedState = stream.State;
        %save and time checkpoint
        tic
        %only save the variables that are strictly necessary
        save('checkpoint_tmp.mat','mysum','countmin','savedState');
        %Ensure that the save completed
        if strcmp(computer,'PCWIN64') || strcmp(computer,'PCWIN')
            %We are running on a windows machine
            system( 'move /y checkpoint_tmp.mat checkpoint.mat' );
        else
            %We are running on Linux or Mac
            system( 'mv checkpoint_tmp.mat checkpoint.mat' );
        end
        timing = toc;
        fprintf('Checkpoint save took %f seconds\n',timing);
    end
    
end
fprintf('The sum is %f \n',mysum);

Parallel checkpointing

If your code includes parallel regions using constructs such as parfor or spmd, you might have to do more work to checkpoint correctly. I haven’t considered the potential issues that may arise in such code in this article.

Checkpointing checklist

Here’s a reminder of everything you need to consider

  • Test to ensure that the introduction of checkpointing doesn’t alter results
  • Don’t checkpoint too often
  • Take care when checkpointing code that involves random numbers – you need to explicitly save the state of the random number generator.
  • Take measures to ensure that the checkpoint save is completed
  • Only checkpoint what is necessary
  • Code that includes parallel regions might require extra care

July 19th, 2012

This is a guest post by colleague and friend, Ian Cottam of The University of Manchester.

I recently implemented a lint program for Condor (unsurprisingly called condor_lint), analogous to the famous one for C of days of yore. That is, it points out the fluff in a Condor script and suggests improvements. It is based on our local knowledge of how our Condor Pool is set up here in Manchester, and also reflects recent changes to Condor.

Why I did it is interesting and may have wider applicability. Everything it reports is already written up on our extensive internal web site, which users rarely read. I suspect the usual modus operandi of our users is to find, or be given, a Condor script relevant to their domain and make the minimum modifications that get it ‘working’. Subsequently, its basic structure is never updated (apart from referencing new data files, etc.).

To be fair, that’s what we all do — is it not?

Ignoring our continually updated documentation means that a user’s job may make poor use of the Condor Pool, affecting others, and costing real money (in wasted energy) through such “bad throughput”.

Now, although we always run the user’s job if Condor’s basic condor_submit command accepts it, we first automatically run condor_lint. This directly tells them any “bad news” and also, in many cases, gives them the link to the specific web page that explains the issue in detail.

Clearly, even such “in your face” advice can still be ignored, but we are starting to see improvements.

Obviously such an approach is not limited to Condor, and we would be interested in hearing of “lint approaches” with other systems.

Links

Other WalkingRandomly posts by Ian Cottam

May 10th, 2012

A guest post by Ian Cottam (@iandavidcottam).

I have been a programmer for 40 years this month. I thought I would write a short essay on things I experienced over that time that went into the design of a relatively recent, small program: DropAndCompute. (The purpose and general evolution of that program were described in a blog entry here. Please just watch the video there if you are new to DropAndCompute.)

Once I had the idea for DropAndCompute (inspired equally by the complexity of Condor and the simplicity of Dropbox) I coded and tested it in about two days. (Typically, one bug lasted until it had a real user, but it was no big deal.) My colleague, Mark Whidby, later re-factored/re-coded it to better scale as it grew in popularity here at The University of Manchester. I expect Mark spent about two days on it too. The user interface and basic design did not change. (As the evolution blog discusses, we eventually made the use of Dropbox optional, but that does not detract from this tale.)

Physically dropping programs and their data:
In the early to mid 1970s, as well as doing my own programming work, I helped scientists in Liverpool to run their code. One approach we used to make their programs faster was to drop the deck of cards into the input well of a card reader which was remotely connected to the regional supercomputer centre at Manchester. (I never knew what the communication mechanism was – probably a wet string given the technology of the time.) A nearby line printer was similarly connected and our results could be picked up, usually the next day. DropAndCompute is a 21st century version of this activity, without the leg work and humping of large boxes of cards about.

That this approach was worth the effort was made obvious with one of the first card decks I ever submitted. We had been running the code on an ICL 1903A computer in Liverpool; Manchester had a CDC 6600 (hopefully my memory has not let me down – it did become a CDC 7600 at some stage). Running the code locally in Liverpool, with typical data, took around 55 CPU minutes. Dropping it into that card reader so that it automatically ran in Manchester resulted in the jaw dropping time of 4 CPU seconds. (I still had to wait overnight to pick up the results, something that resonates with today’s DropAndCompute users and Manchester’s Condor Pool, which is only large and powerful overnight.)

Capabilities:
Later, but still in the mid 1970s, I worked for Plessey on their System 250 high-reliability, multiprocessor system. It was the first commercial example of a capability architecture. With such an architecture there is no supervisor state or privileged code rings or similar. If you held the capability to do something (e.g. read, write or enter another code context) you could do it. If you didn’t hold the appropriate capability, you could not. The only tiny section of code in the System 250 that was special was where capabilities were generated. No one else could make them.

The server side of DropAndCompute generates capabilities to the user client side. They are implemented as zero length files whose information content is just in their names. For job 3159, you get 3159.kill, 3159.vacate and 3159.debug generated*. By dragging and dropping one or more of these zero length files (capabilities) onto the dropbox the remote lower level Condor command code is automatically executed. [* You could try to make your own capability, such as 9513.kill, but it won’t work.]

UNIX and Shell Glue Code:
My initial exposure to the UNIX tools philosophy in the late 1970s profoundly influenced me (and still does). In essence, it says that one should build new software by inventing ‘glue’ to stick existing components together. The UNIX Shell is often the language of choice for this, and was for me. DropAndCompute is a good example of where a little bit of glue produced a hopefully impressive synergy.

The Internet not The Web:
DropAndCompute uses the Internet (clearly). It is not a Web application. I only mention this as some younger programmers, who have grown up with the Web always being there, seem to think the best/only architecture to use for a software system design is one implemented through a web browser using web services. I am grateful to be able to remember pre-Web days, as much as I love what Tim Berners-Lee created for us.

Client-Server:
I’m not sure when I first became aware of client-server architecture. I know a famous computer scientist (the late David Wheeler*) once described it as simply the obvious way to implement software systems. For my part, I’m a believer that the less code the client (user) side needs to install, the better: there is less to go wrong in strange environments one has no control over. In the case of DropAndCompute there was nothing to install if the user already had Dropbox, and only Dropbox to download if they didn’t.
[* As well as being a co-inventor of the subroutine, David Wheeler led the team that designed the first operational capability-based computer: the Cambridge University CAP.]

Rosetta – Software as Magic:
Around a decade ago I worked for Transitive, a University of Manchester spin-out, and the company that produced Rosetta for Apple. With apologies to Arthur C Clarke: all great software appears to be magic to the user. The simpler the user interface, often the more complex the underlying system is to implement the magic. This is true, for example, for Apple iOS and OS X and for Dropbox (simpler, and yet, I would bet, internally more complex than its many competitors). One small part of OS X I helped with is Rosetta (or was, as Apple dropped it from the Lion 10.7 release of OS X). Rosetta dynamically (i.e. on-the-fly) translates PowerPC applications into Intel x86 code. There is no noticeable user interface: you double click your desired application, like any other, and, if needed, Rosetta is invoked to do its job in the background.

I have read many interesting web based discussions about Rosetta, several say, or imply, that it is a relatively simple piece of software: nothing could be further from the truth. It’s likely still a commercial secret how it works, but if it were simple, Apple’s transition to Intel would likely have been a disaster. It took a lot of smart people a long time to make the magic appear that simple.

I tried to keep DropAndCompute’s interface to the user as simple as possible, even where it added some complexity to its implementation. The National Grid Service in the UK did their own version of DropAndCompute, but, for my taste, added too many bells and whistles.

In Conclusion:
I hope this brief essay has been of interest and given some insight into how many years of software/system design experience come to be applied, even to small software systems, both consciously and subconsciously, and always, hopefully, with good taste. Hide complexity, keep it simple for the user, make them think it is simply magic!

March 21st, 2011

In my previous blog post I mentioned that I am a member of a team that supports High Throughput Computing (HTC) at The University of Manchester via a 1600+ core ‘condor pool’.  In order to make it as easy as possible for our researchers to make use of this resource one of my colleagues, Ian Cottam, created a system called DropAndCompute.  In this guest blog post, Ian describes DropAndCompute and how it evolved into the system we use at Manchester today.

The Evolution of “DropAndCompute” by Ian Cottam

DropAndCompute, as used at The University of Manchester’s Faculty of Engineering and Physical Sciences, is an approach to using network (or grid or cloud based) computational resources without having to know the operating system of the resource’s gateway or any command line tools of either the resource itself — Condor in our case — or in general. Most such gateways run a flavour of Unix, often Linux. Many of our users are either unfamiliar with Linux or just prefer a drag-and-drop interface, as I do myself despite using various flavours of Unix since Version 6 in the late 70s.

Why did I invent it? On its original web site description page wiki.myexperiment.org/index.php/DropAndCompute the following reasons are given:

  • A simple and uniform drag-and-drop graphical user interface, potentially, to many resource pools.
  • No use of terminal windows or command lines.
  • No need to login to remote hosts or install complicated grid-enabling software locally.
  • No need for the user to have an account on the remote resources (instead they are accommodated by having a shared folder allocated). Of course, nothing stops the users from having accounts should that be preferred.
  • No need for complicated Virtual Private Networks, IP Tunnelling, connection brokers, or similar, in order to access grid resources on private subnets (provided at least one node is on the public Internet, which is the norm).
  • Pop-ups notify users of important events (basically, log and output files being created when a job has been accepted, and when the generated result files arrive).
  • Somewhat increased security as the user only has (indirect) access to a small subset of the computational resource’s commands.

Version One
The first version was used on a Condor Pool within our interdisciplinary biocentre (MIB).  A video of it in use is shown below

Please do take the time to look at this video as it shows clearly how, for example, Condor can be used via this type of interface.

This version was notable for using the commercial service Dropbox; in fact, my being a Dropbox user inspired the approach and its name. Dropbox is trivial to install on any of the main platforms, on any number of computers owned by a user, and has a free version giving 2GB of synchronised and shared storage. In theory, only the computational resource supplier need pay for a 100GB account with Dropbox, have a local Condor submitting account, and share folders out with users of the free Dropbox-based service.

David De Roure, then at the University of Southampton and now Oxford, reviewed this approach here at blog.openwetware.org/deroure/?p=97, and offers his view as to why it is important in helping scientists start on the ‘ramp’ to using what can be daunting, if powerful, computational facilities.

Version Two

Quickly the approach migrated to our full, faculty-wide Condor Pool and the first modification was made. Now we used separate accounts for each user of the service on our submitting nodes; Dropbox still made this sharing scheme trivial to set up and manage, whilst giving us much better usage accounting information. The first minor problem came when some users needed more – much more in fact – than 2GB of space. This was solved by them purchasing their own 50GB or 100GB accounts from Dropbox.

Problems and objections

However, two more serious problems impacted our Dropbox based approach. First, the large volume of network traffic across the world to Dropbox’s USA based servers and then back down to local machines here in Manchester resulted in severe bottlenecks once our Condor Pool had reached the dizzy heights of over a thousand processor cores. We could have ameliorated this by extra resources, such as multiple submit nodes, but the second problem proved to be more of a showstopper.

Since the introduction of DropAndCompute several people – at Manchester and beyond – have been concerned about research data passing through commercial, USA-based servers. In fact, the UK’s National Grid Service (NGS), who have implemented their own flavour of DropAndCompute, did not use Dropbox for this very reason. The US Patriot Act means that US companies must surrender any data they hold if officially requested to do so by Federal Government agencies. Now one approach to this is to do user-level encryption of the data before it enters the user’s dropbox. I have demonstrated this approach, but it complicates the model and it is not so straightforward to use exactly the same method on all of the popular platforms (Windows, Mac, Linux).

Version Three

To tackle the above issues we implemented a ‘local version’ of DropAndCompute that is not Dropbox based. It is similar to the NGS approach but, in my opinion, much simpler to set up. The user merely has to mount a folder on the submit node on their local computer(s), and then use the same drag-and-drop approach to get the job initiated, debugged and run (or even killed, when necessary). This solves the above issues, but could be regarded as inferior to the Dropbox based approach in five ways:

1. The convenience and transparency of ‘offline’ use is lost. That is, Dropbox jobs can be prepared on, say, a laptop with or without net access, and when the laptop next connects the job submission just happens. Ditto for the results coming back.

2. When online and submitting or waiting for results with the local version, the folder windows do not update to give the user an indication of progress.

3. Users must rely on an email notification that a job has finished, or poll to check its status.

4. The initial setup is a little harder for the local version compared with using Dropbox.

5. The computation’s result files are not copied back automatically.

So far, only item 5 has been remarked on by some of our users, and it, like the others, could be improved with some programming effort.

A movie of this version is shown below; it doesn’t have any commentary, but essentially follows the same steps as the Dropbox based video. You will see the network folder’s window having to be refreshed manually – this is necessary on a Mac (but could be scripted); other platforms may be better – and results having to be dragged back from the mounted folder.

I welcome comments on any aspect of this –still evolving– approach to easing the entry ‘cost’ to using distributed computing resources.

Acknowledgments
Our Condor Pool is supported by three colleagues besides myself: Mark Whidby, Mike Croucher and Chris Paul. Mark, inter alia, maintains the current version of DropAndCompute that can operate locally or via Dropbox. Thanks also to Mike for letting me be a guest on Walking Randomly.

March 17th, 2011

Part of my job at the University of Manchester is to help support the use of High Throughput Computing (HTC) services.  I am part of a team that works within Manchester’s Faculty of Engineering and Physical Sciences (Physics, Chemistry, Maths, Engineering, Computer Science, Earth Sciences) but we don’t just work with ‘our’ schools; we also collaborate with teams from all over the University along with the central Research Support team.

For example, say you are a University of Manchester researcher and have written a Monte Carlo simulation in MATLAB where a typical run takes 5 hours to complete. In order to get good results you might want to run this simulation 1000 times, which is going to take your desktop machine quite a while; about 7 months in fact! You can either wait for 7 months or you can contact us for assistance. We’ll then do the following for you:

  • Attempt to optimise your MATLAB code. Recent speed-ups range from 10% to 20,000%
  • Give you access to our Condor pool (currently peaking at 1600 processor cores, more expected in the near future)
  • Assist you in modifying your code and writing script wrappers to make best use of those 1600 cores.
  • If appropriate, put you in touch with colleagues who support traditional HPC (High Performance Computing) supercomputers, GPUs (Graphics Processing Units) and similar technology.

Once we’ve done our bit, you throw your code at our Condor pool and get your results in less than an evening! Obviously we are not confined to MATLAB; we’ve also assisted users of Python, Mathematica, C, FORTRAN, Amber, Morphy and many more.  Our job is to help researchers do research more quickly!

For a mathy science geek with a penchant for mathematical software it really doesn’t get much better than this!  I get to play with high end hardware and the latest software, I get to learn new areas of science and mathematics and I get to work with world-leading experts in a multitude of fields.  More importantly, I get to make a real difference.

Yep, this is one part of my job that I really love.  Expect to see more articles on High Throughput Computing and technologies such as Condor on WalkingRandomly in the very near future.

July 13th, 2010

A bit of background to this post

I work in the IT department of the University of Manchester and we are currently developing a Condor Pool, which is basically a method of linking together hundreds of desktop machines to produce a high-throughput computing resource.  A MATLAB user recently submitted some jobs to our pool and complained that all of them gave identical results, which made no sense to him because his code used MATLAB’s rand command to mix things up a bit.

I was asked if I knew why this should happen, to which I replied ‘yes’.  I was then asked to advise the user on how to fix the problem and I did so.  The next request was for me to write some recommendations and tutorials on how users should use random numbers in MATLAB (and Mathematica, and possibly Python while I was at it) along with our Condor Pool, and I went uncharacteristically quiet for a while.

It turned out that I had a lot to learn about random numbers.  This is the first of a series of (probably 2) posts that will start off by telling you what I knew and move on to what I have learned.  It’s as much a vehicle for getting the concepts straight in my head as it is a tutorial.

Ask MATLAB for 10 Random Numbers

Before we go on, I’d like you to try something for me. You have to start on a system that doesn’t have MATLAB running at all, so if MATLAB is running then close it before proceeding. Now, open up MATLAB and before you do anything else issue the following command:

rand(10,1)

As many of you will know, the rand command produces random numbers from the uniform distribution between 0 and 1 and the command above is asking for 10 such numbers. You may reasonably expect that the 10 random numbers that you get will be different from the 10 random numbers that I get; after all, they are random right? Well, I got the following numbers when running the above command on MATLAB 2009b running on Linux.

ans =
0.8147
0.9058
0.1270
0.9134
0.6324
0.0975
0.2785
0.5469
0.9575
0.9649

Look familiar?

Now I’ve done this experiment with a number of people over the last few weeks and the responses can be roughly split into two different camps as follows:

1. Oh yeah, I know all about that – nothing to worry about. It’s pretty obvious why it happens isn’t it?
2. It’s a bug. How can the numbers be random if MATLAB always returns the same set?

What does random mean anyway?

If you are new to the computer generation of random numbers then there is something you need to understand: strictly speaking, these numbers (like all software-generated ‘random’ numbers) are not ‘truly’ random.  Instead they are pseudorandom – my personal working definition of which is “A sequence of numbers generated by some deterministic algorithm in such a way that they have the same statistical properties as ‘true’ random numbers”.  In other words, they are not random, they just appear to be, but the appearance is good enough most of the time.

Pseudorandom numbers are generated from deterministic algorithms with names like Mersenne Twister, L’Ecuyer’s mrg32k3a [1] and Blum Blum Shub whereas ‘true’ random numbers come from physical processes such as radioactive decay or atmospheric noise (the website www.random.org uses atmospheric noise for example).

For many applications, the distinction between ‘truly random’ and ‘pseudorandom’ doesn’t really matter since pseudorandom numbers are ‘random enough’ for most purposes.  What does ‘random enough’ mean, you might ask?  Well, as far as I am concerned it means that the random number generator in question has passed a set of well-defined tests for randomness – something like Marsaglia’s Diehard tests or, better still, L’Ecuyer and Simard’s TestU01 suite will do nicely.

The generation of random numbers is a complicated topic and I don’t know enough about it to do it real justice but I know a man who does.  So, if you want to know more about the theory behind random numbers then I suggest that you read Pierre L’Ecuyer’s paper simply called ‘Random Numbers’ (pdf file).

Back to MATLAB…

Always the same seed

So, which of my two groups are correct?  Is there a bug in MATLAB’s random number generator or not?

There is nothing wrong with MATLAB’s random number generator at all. The command rand(10,1) will always return the same 10 numbers if executed on startup because MATLAB always uses the same seed for its pseudorandom number generator (which at the time of writing is a Mersenne Twister) unless you tell it to do otherwise.

Without going into details, a seed is (usually) an integer that determines the internal state of a random number generator.  So, if you initialize a random number generator with the same seed then you’ll always get the same sequence of numbers and that’s what we are seeing in the example above.
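You can demonstrate this in a couple of lines, using the same RandStream syntax that appears below; seeding the generator twice with the same value produces identical ‘random’ numbers:

RandStream.setDefaultStream(RandStream('mt19937ar','seed',42));
a = rand(3,1);
RandStream.setDefaultStream(RandStream('mt19937ar','seed',42));
b = rand(3,1);
isequal(a,b)    %returns 1 (true): same seed, same sequence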

Sometimes, this behaviour isn’t what we want.  For example, say I am doing a Monte Carlo simulation and I want to run it several times to verify my results.  I’m going to want a different sequence of random numbers each time or the whole exercise is going to be pointless.

One way to do this is to initialize the random number generator with a different seed at startup, and a common way of achieving this is via the system clock.  The following comes straight out of the current MATLAB documentation, for example:

RandStream.setDefaultStream(RandStream('mt19937ar','seed',sum(100*clock)));

If you are using MATLAB 2011a or above then you can use the following, much simpler syntax to do the same thing

rng shuffle

Do this once per MATLAB session and you should be good to go (there is usually no point in doing it more than once per session, by the way… your numbers won’t be any ‘more random’ if you do so.  In fact, there is a chance that they will become less so!).

Condor and ‘random’ random seeds

Sometimes the system clock approach isn’t good enough either.  For example, at my workplace, Manchester University, we have a Condor Pool of hundreds of desktop machines which is perfect for people doing Monte Carlo simulations.  Say a single simulation takes 5 hours and it needs to be run 100 times in order to get good results.  On one machine that’s going to take about 3 weeks but on our Condor Pool it can take just 5 hours since all 100 simulations run at the same time but on different machines.

If you don’t think about random seeding at all then you end up with 100 identical sets of results using MATLAB on Condor for the reasons I’ve explained above.  Of course you know all about this, so you switch to using the clock seeding method, try again and… get 100 identical sets of results [2].

The reason for this is that the time on all 100 machines is synchronized using internet time servers.  So, when you start up 100 simultaneous jobs they’ll all have the same timestamp and, therefore, have the same random seed.

It seems that what we need to do is to guarantee (as far as possible) that every single one of our condor jobs gets a unique seed in order to provide a unique random number stream. One way to do this would be to incorporate the condor process ID into the seed generation, and there are many ways one could do that.
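For completeness, here is a sketch of that process-ID idea (not what I actually use, and CONDOR_JOB_INDEX is a made-up environment variable name that you would have to set yourself, for example from Condor’s $(Process) macro in your submit file):

%Sketch: combine the clock with a per-job index so that jobs starting
%at the same instant still get different seeds.
%CONDOR_JOB_INDEX is hypothetical -- set it from $(Process) in the
%submit file. Note that str2double returns NaN if it is unset.
jobindex = str2double(getenv('CONDOR_JOB_INDEX'));
seed = mod(floor(sum(100*clock)) + 65537*jobindex, 2^31);
RandStream.setDefaultStream(RandStream('mt19937ar','seed',seed));

Here, however, I’m going to take a different route.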

On Linux machines it is possible to obtain small numbers of random numbers using the special files /dev/random and /dev/urandom which are interfaces to the kernel’s random number generator.  According to the documentation, ‘The random number generator gathers environmental noise from device drivers and other sources into an entropy pool. The generator also keeps an estimate of the number of bits of noise in the entropy pool.  From this entropy pool random numbers are created.’

This kernel generator isn’t suitable for simulation purposes but it will do just fine for generating an initial seed for MATLAB’s pseudorandom number generator.  Here are the MATLAB commands:

[status seed] = system('od /dev/urandom --read-bytes=4 -tu | awk ''{print $2}''');
seed=str2double(seed);
RandStream.setDefaultStream(RandStream('mt19937ar','seed',seed));

In MATLAB 2011a or above you can change this to

[status seed] = system('od /dev/urandom --read-bytes=4 -tu | awk ''{print $2}''');
seed=str2double(seed);
rng(seed);

Put this at the beginning of the MATLAB script that defines your condor job and you should be good to go.  Don’t do it more than once per MATLAB session though – you won’t gain anything!

The end or the end of the beginning?

If you asked me the question ‘How do I generate a random seed for a pseudorandom number generator?’ then I think that the above solution answers it quite well.  If, however, you asked me ‘What is the best way to generate multiple independent random number streams that would be good for thousands of Monte Carlo simulations?’ then we need to rethink somewhat, for the following reasons.

Seed collisions: The Mersenne Twister algorithm currently used as the default random number generator for MATLAB uses a 32bit integer seed which means that it can take on 2^32 or 4,294,967,296 different values – which seems a lot!  However, by considering a generalisation of the birthday problem it can be seen that if you select such seeds at random then you have roughly a 40% chance of choosing two identical seeds after only 65,536 runs, and a better than even chance after around 77,000 runs.  In other words, if you perform 65,536 simulations then there is roughly a 40% chance that two such simulations will produce identical results.
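The arithmetic behind those figures is the standard birthday-problem approximation, which is easy to check for yourself:

%Probability that n seeds drawn at random from N possible values
%contain at least one repeat (birthday problem approximation)
n = 65536;  N = 2^32;
p = 1 - exp(-n*(n-1)/(2*N))   %approximately 0.39
%An even (50%) chance of a repeat needs n of about 1.18*sqrt(N),
%which is roughly 77,000 runs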

Bad seeds: I have read about (but never experienced) the possibility of ‘bad seeds’; that is, some seeds that produce very strange, non-random results, for the first few thousand iterates at least.  This has led some people to advise that you should ‘warm up’ your random number generator by asking for, and throwing away, a few thousand random numbers before you start using them. Does anyone know of any such bad seeds?

Overlapping or correlated sequences: All pseudorandom number generators are periodic (at least, all the ones that I know about are) – which means that after N iterations the sequence repeats itself.  If your generator is good then N is usually large enough that you don’t need to worry about this.  The Mersenne Twister used in MATLAB, for instance, has a huge period of (2^19937 – 1)/2 (half of the standard 32bit implementation).

The point I want to make is that you don’t get several different streams of random numbers, you get just one, albeit a very big one.  When you choose two different seeds you are essentially choosing two random points in this stream, and there is no guarantee how far apart those points are.  They could be separated by a distance of trillions of points or they could be right next to each other – we simply do not know – and this leads to the possibility of overlapping sequences.

Now, one could argue that the possibility of overlap is very small in a generator such as the Mersenne Twister and I do not know of any situation where it has occurred in practice but that doesn’t mean that we shouldn’t worry about it.  If your work is based on the assumption that all of your simulations have used independent, uncorrelated random number streams then there is a possibility that your assumptions could be wrong which means that your conclusions could be wrong.  Unlikely maybe, but still no way to do science.

Next Time

Next time I’ll be looking at methods for generating guaranteed independent random number streams using MATLAB’s in-built functions as well as methods taken from the NAG Toolbox for MATLAB.  I’ll also be including explicit examples that use this stuff in Condor.

Ask the audience

I assume that some of you will be in the business of performing Monte Carlo simulations and so you’ll probably know much more about all of this than me.  I have some questions:

  • Has anyone come across any ‘bad seeds’ when dealing with MATLAB’s Mersenne Twister implementation?
  • Has anyone come across overlapping sequences when using MATLAB’s Mersenne Twister implementation?
  • How do YOU set up your random number generator(s)?

I’m going to change my comment policy for this particular post in that I am not going to allow (most) anonymous comments.  This means that you will have to give me your email address (which, of course, I will not publish) which I will use once to verify that it really was you that sent the comment.

Notes and References

[1] L’Ecuyer P (1999) ‘Good parameter sets for combined multiple recursive random number generators’, Operations Research 47(1):159–164

[2] More usually you’ll get several different groups of results.  For example you might get 3 sets of results, A, B and C, and get 30 sets of A, 50 sets of B and 20 sets of C.  This is due to the fact that all 100 jobs won’t hit the pool at precisely the same instant.

[3] Much of this stuff has already been discussed by The Mathworks and there is an excellent set of articles over at Loren Shure’s blog – Loren on The Art of MATLAB.
