Archive for the ‘HPC’ Category

April 13th, 2016

I was sat in a meeting recently where people were discussing High Performance Computing options. Someone had found some money down the back of the sofa and asked the rest of us how we’d spend it if it were ours (For the record, I wanted some big-memory nodes — 512Gb RAM minimum).

Halfway through the meeting someone asked about the parallel filesystem in use on Sheffield’s HPC systems and I answered Lustre — not expecting any further comment. I don’t know much about parallel I/O systems but I do know that all of the systems I have access to make use of Lustre. Sheffield’s Iceberg, Manchester’s Computational Shared Facility and the UK National supercomputer, Archer to name three. The Wikipedia page for Lustre suggests that it is very widely used in the HPC community worldwide.

I was surprised, then, when this comment was met with derision from some in the room. One person remarked ‘It’s a bit…um…classic..isn’t it?’, giving the impression that it was old and outdated. There was not much love for dear old Lustre it seemed!

There was no time in this meeting for me ask ‘What’s wrong with it then? Works fine for me!’so I did what I often do in times like this, I asked twitter:

I found the responses to be very instructive so thought I would share them here. Here’s a comment from Ben Waugh

Not a great start! I’ve been hit by Lustre downtime too but, since I’m not a sysadmin, I haven’t got a genuine sense of how big a problem it is. It happens often enough that I’m aware of it but not so often that I feel the need to worry about it. Looking after large HPC systems is difficult and when one attempts to do do something difficult, the occasional failure is inevitable.

It seems that configuration might play a part in stability and performance. Peter Van Heusden kicked off with

Adrian Jackson , Research Architect, HPC and Parallel Programmer, is thinking along similar lines.

Adrian also followed up with an interesting report on Performance of Parallel IO on ARCHER from June 2015. Thanks Adrian!

Glen K Lockwood is a Computational scientist specialising in I/O platforms at @NERSC, the flagship computing centre for the U.S. Dept. of Energy’s Office of Science. Glen has this insight,

I like this comment because it fits with my own, much less experienced, view of the parallel I/O world. A classic demonstration of confirmation bias perhaps but I see this thought pattern in a lot of areas. The pattern looks like this:

It’s hard to do ‘thing’. Most smart people do ‘thing’ with ‘foo’ and, since ‘thing’ is hard, many people have experienced problems with ‘foo’. Hence, people bash ‘foo’ a lot. ‘foo’ sucks!

Alternatives exist of course but there may be trade-offs. Here’s Mark Basham, software scientist  in the UK.

I’ll give the final word to Nick Holway who sums up how I feel about parallel storage from a user’s point of view.

June 26th, 2015

I hang out with a lot of people who work in the field of High Performance Computing (HPC) and learned long ago that a great conversation starter when in a room full of HPC experts is to ask the question ‘So…what IS High Performance Computing’ and then offer some opinion of my own. This apparently simple question can lead to some very heated debates!

I’m feeling in a curious mood and am wondering if this would work as a conversation starter for a blog post. So, here goes….

‘What IS High Performance Computing? I think it’s anything that requires more computational resources (memory, disk, CPU/GPU) than is available on a typical half-decent laptop or desktop’

What’s your view? Comments are open!

February 26th, 2015

Environment modules are widely used in the High Performance Computing (HPC) world where sysadmins need to install dozens, or maybe hundreds of potentially conflicting applications, libraries and compilers on multi-user machines. The University of Manchester’s Computational Shared Facility (CSF), for example, makes extensive use of environment modules and would be extremely difficult to run without them.

Once the sysadmin has correctly installed an application (MATLAB 2014a say) and set up the corresponding module file, making it available to your shell is as easy as doing

module load apps/binapps/matlab/R2014a

Unloading the module is just as easy

module unload apps/binapps/matlab/R2014a

On a heavily used, multi-user system environment modules are invaluable! Every user can have whatever compilers, libraries and applications they like — they just load and unload whatever they need from the huge selection supported by their ever-friendly sysadmins.

Environment modules on Ubuntu

I needed to install environment modules on a VM running Ubuntu 14.04 for my own use. I found a very nice setup guide at http://www.setuptips.com/unix/setup-environment-modules-on-ubuntu/ but it didn’t work. On attempting to compile, I got the error message

cmdModule.c:644:15: error: 'Tcl_Interp' has no member named 'errorLine'

This is a known bug in version 3.2.9c of environment modules and has a work-around.

I also found a set up guide at http://nickgeoghegan.net/linux/installing-environment-modules which had some useful advice on configuration..

Combining information from these sources, I managed to get a working install. Here are the steps I did in full for a clean Ubuntu 14.04 image

#Install the tcl development package
sudo apt-get install tcl-dev

#Make the directories where my modules and packages are going to live
sudo mkdir /opt/modules
sudo mkdir /opt/packages

#Get the source code. This was the most up to date version as of 25th Feb 2015
wget http://downloads.sourceforge.net/project/modules/Modules/modules-3.2.9/modules-3.2.9c.tar.gz

#unpack and enter source directory
tar xvzf modules-3.2.9c.tar.gz
cd modules-3.2.9

#Configure using the workaround and selecting my module folder 
CPPFLAGS="-DUSE_INTERP_ERRORLINE" ./configure --with-module-path=/opt/modules/

#make and install
make
sudo make install

#Edit the modulefiles path. Comment out all lines starting /usr so that only /opt/modules is used
sudo sed -i 's~^/usr~#/usr~' /usr/local/Modules/3.2.9/init/.modulespath

#Configure the shell to use modules
sudo tee /etc/profile.d/modules.sh > /dev/null << 'EOF'
#----------------------------------------------------------------------#
# system-wide profile.modules #
# Initialize modules for all sh-derivative shells #
#----------------------------------------------------------------------#
trap "" 1 2 3

MODULES=/usr/local/Modules/3.2.9

case "$0" in
-bash|bash|*/bash) . $MODULES/init/bash ;;
-ksh|ksh|*/ksh) . $MODULES/init/ksh ;;
-sh|sh|*/sh) . $MODULES/init/sh ;;
*) . $MODULES/init/sh ;; # default for scripts
esac

trap - 1 2 3
EOF

#Add modules to your .bashrc file
echo '#For modules' >> ~/.bashrc
echo '. /etc/profile.d/modules.sh' >> ~/.bashrc

That takes care of the basic setup but modules is pretty useless at this stage. To make it useful, you need to install some extra software and the corresponding module file.

Installing a module file for Anaconda Python 2.1
This is a really simple example of how to set up a basic module file

I downloaded and installed Anaconda Python 2.1 to /opt/packages and created a file called anaconda2.1 in /opt/modules containing the following

#%Module1.0
proc ModulesHelp { } {
global dotversion
 
puts stderr "\tAnaconda Python 2.1 providing Python 2.7.8"
}
 
module-whatis "Anaconda Python 2.1"
prepend-path PATH /opt/packages/anaconda/bin

Now, when I do the command

module avail

I get

-------------------------- /opt/modules/ ---------------------------
anaconda2.1

I can load my anaconda2.1 module with the command

module load anaconda2.1

Now, when I type python at the command prompt, I’ll be using Anaconda’s python rather than the system python. Once I’m done, I can unload with

module unload anaconda2.1

This example is so trivial it’s almost not worth it — modules really come into their own when you need to support loads of compilers and corresponding libraries. There’s an example using gcc at http://nickgeoghegan.net/linux/installing-environment-modules.

August 4th, 2014

I work at The University of Manchester as part of IT Service’s Research Support department. This month, I was given the task of editing the newsletter that we send out to the academics we serve.

For me, this month’s highlights include the fact that our Condor Pool (which I help build and run) has now delivered over 20 million CPU hours to academics at Manchester by making use of desktop PCs around campus. Another HPC system we have at Manchester is our Computational Shared Facility which is jointly funded by the University and all of the research groups who use it. This system has just seen the installation of its 6000-th CPU core.

Other news includes the Greater Manchester Data Dive, an AGM for Research Software Engineers, Image based modelling and more.

If you are interested in seeing the sort of thing we get up to, check out this months newsletter at http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=20995

January 29th, 2013

I am having a friendly argument with a colleague over how you calculate the peak number of floating operations per second (flops) for devices that support Fused Multiply Add (FMA).  The FMA operation is d=a+b*c, an operation that can be done in one cycle on devices that support it.

I say that an FMA operation is two flops, he says it’s one.  So, when I calculate the theoretical peak of a device I get twice the value he does.  So, what do you think..is FMA one flop or two?

November 13th, 2012

Intel have finally released the Xeon Phi – an accelerator card based on 60 or so customised Intel cores to give around a Teraflop of double precision performance.  That’s comparable to the latest cards from NVIDIA (1.3 Teraflops according to http://www.theregister.co.uk/2012/11/12/nvidia_tesla_k20_k20x_gpu_coprocessors/) but with one key difference—you don’t need to learn any new languages or technologies to take advantage of it (although you can do so if you wish)!

The Xeon Phi uses good, old fashioned High Performance Computing technologies that we’ve been using for years such as OpenMP and MPI.  There’s no need to completely recode your algorithms in CUDA or OpenCL to get a performance boost…just a sprinkling of OpenMP pragmas might be enough in many cases.  Obviously it will take quite a bit of work to squeeze every last drop of performance out of the thing but this might just be the realisation of ‘personal supercomputer’ we’ve all been waiting for.

Here are some links I’ve found so far — would love to see what everyone else has come up with.  I’ll update as I find more

I also note that the Xeon Phi uses AVX extensions but with a wider vector width of 512 bytes so if you’ve been taking advantage of that technology in your code (using one of these techniques perhaps) you’ll reap the benefits there too.

I, for one, am very excited and can’t wait to get my hands on one!  Thoughts, comments and links gratefully received!