Archive for the ‘HPC’ Category

August 25th, 2020

At the time of writing, Microsoft Azure has 3 High Performance Computing instances available and I often find myself looking up their specifications and benchmark results when discussing their various methods with colleagues and clients.  All the information is out there but seems to be spread across several documents. To save me the trouble in future, here is everything I usually need in one place

HC nodes: Original source of data
* CPU: Intel Skylake
* Cores: 44 (non-hyperthreaded)
* 3.5 TeraFLOPS double precision peak
* 352 Gigabyte RAM
* 700 Gigabyte local SSD storage
* 190 GB/s memory bandwidth
* 100 GB/s infiniband

HB Nodes: Original source of data
* CPU: AMD EPYC 7000 series (Naples)
* Cores: 60 (non-hyperthreaded)
* ???? Peak double precision peak
* 240 Gigabyte RAM
* 700 Gigabyte local SSD storage
* 263 GB/s memory bandwidth
* 100 GB/s infiniband

HBv2 nodes: Original source of data
* CPU: AMD EPYC 7002 series (Rome)
* Cores: 120 (non-hyperthreaded)
* 4 TeraFlops Peak double precision
* 480 Gigabyte RAM
* 1.6 Terabyte local SSD storage
* 350 GB/s memory bandwidth
* 200 GB/s infiniband

Azure HPC Benchmarks

May 24th, 2019

HPC! To cloud or not to cloud….

Over the course of my career I have been involved with the provision of High Performance Computing (HPC) at almost every level. As a researcher and research software engineer I have been, and continue to be, a user of many large scale systems.  As a member of research computing support, I was involved in service development and delivery, user-support, system administration and HPC training. Finally, as a member of senior leadership teams, I have been involved in the financial planning and strategic development of institution-wide HPC  services.

In recent years, the question that pops up at every level of involvement with HPC is ‘Should we do this with our own equipment or should we do this in the cloud?’  

This is an extremely complex, multi-dimensional question where the parameters are constantly shifting.  The short, answer is always ‘well…it depends’ and it always does ‘depend’…on many things!  What do you want to do? What is the state of the software that you have available? How much money do you have? How much time do you have? What support do you have? What equipment do you currently have and who else are you working with and so on.

Today, I want to focus on just one of these factors. Cost.  I’m not saying that cost is necessarily the most important consideration in making HPC-related decisions but given that most of us have fixed budgets, it is impossible to ignore.

The value of independence

I have been involved in many discussions of HPC costs over the years and have seen several attempts at making the cloud vs on-premise cost-comparison.  Such attempts are often biased in one direction or the other.  It’s difficult not to be when you have a vested interest in the outcome.

When I chose to leave academia and join industry, I decided to work with the Numerical Algorithms Group (NAG), a research computing company that has been in business for almost 50 years. One reason for choosing them was their values which strongly resonate with my own.  Another was their independence.  NAG are a not-for-profit (yet commercial!) company who literally cannot be bought.  They solve the independence problem by collaborating with pretty much everybody in the industry.

So, NAG are more interested in helping clients solve their technical computing problems than they are in driving them to any given solution.  Cloud or on-premise?  They don’t mind…they just want to help you make the decision.

Explore HPC cost models yourself

NAG’s VP for HPC services and consulting, Andrew Jones (@hpcnotes), has been teaching seminars on Total Cost Of Ownership models for several years at events like the SC Supercomputing conference series.  To support these tutorials, Andrew created a simple, generic TCO model spreadsheet to illustrate the concepts and to show how even a simple TCO model can guide decisions. 

Many people asked if NAG could send them the TCO spreadsheet from the tutorial but they decided to go one better and converted it into a web-form where you can start exploring some of the concepts yourself. 

I also made a video about it.

If you want more, get in touch with either me (@walkingrandomly) or Andrew (@hpcnotes) on twitter or email hpc@nag.com.  We’ll also be teaching this type of thing at ISC 2019 so please do join us there too.

March 11th, 2019

One of the highlights of my attendance at the 2019 RICE Oil and Gas High Performance Computing event was the Women in HPC workshop. As a profession, HPC has very similar diversity issues to Research Software Engineering. In short, we need more diversity and the Women in HPC organisation is working towards that aim (side note – I am very happy that my employer, NAG, is a supporter of Women in HPC!)

The following call for participation was forwarded to me by Mozhgan Kabiri Chimeh who asked me to repost it.

The tenth international Women in HPC workshop at ISC19 will discuss methods to improve diversity and provide early career women with the opportunity to develop their professional skills and profile. The workshop will include:

  • Becoming an advocate and ally of under-represented women
  • Putting in place a framework to help women take leadership positions
  • Building mentoring programmes that work effectively for women.
  • Posters and lightning talks by women working in HPC
  • Short talks on: dealing with poor behaviour at work and how to help avoid it getting you down, how to deal with negative feedback, how to build writing into your daily routine and why it matters, etc.

Deadline for submissions: March 18th 2019

As part of the workshop, we invite submissions from women in industry and academia to present their work as a poster. Submissions are invited on all topics relating to HPC from users and developers. All abstracts should emphasize the computational aspects of the work, such as the facilities used, the challenges that HPC can help address and any remaining challenges etc.

Exclusive to WHPC at ISC19: Successful authors will have the opportunity to present their poster in the main ISC19 conference poster session.

For full details please see: https://www.womeninhpc.org/whpc-isc19/workshop/submit/

Workshop Organising Committee

  • Workshop Chair: Mozhgan Kabiri Chimeh (University of Sheffield, UK)
  • Co-chair: Toni Collis (Appentra S.L., Spain)
  • Co-chair: Misbah Mubarak (Argonne National Laboratory, USA)
  • Submissions Chair: Weronika Filinger (EPCC, University of Edinburgh, UK)
  • Submissions Vice Chair: Khomotso Maenetja (University of Limpopo, South Africa)
  • Mentoring Chair: Elsa Gonsiorowski (Lawrence Livermore National Laboratory, USA)
  • Invited Talks Chair: Gokcen Kestor (Pacific Northwest National Laboratory, USA)
  • Publicity Chair: Cristin Meritt (Alces Flight, UK)
  • Publicity Vice Chair: Aiman Shaikh (Science and Technology Facilities Council, UK)
  • Volunteer: Shabnam Sadegh (Technical University of Munich)
March 1st, 2018

I’m working on some MATLAB code at the moment that I’ve managed to reduce down to a bunch of implicitly parallel functions. This is nice because the data that we’ll eventually throw at it will be represented as a lot of huge matrices.  As such, I’m expecting that if we throw a lot of cores at it, we’ll get a lot of speed-up.  Preliminary testing on local HPC nodes shows that I’m probably right.

During testing and profiling on a smaller data set I thought that it would be fun to run the code on the most powerful single node I can lay my hands on.  In my case that’s an Azure F72s_v2 which I currently get for free thanks to a Microsoft Azure for Research grant I won.

These Azure F72s_v2 machines are NICE! Running Intel Xeon Platinum 8168 CPUs with 72 virtual cores and 144GB of RAM, they put my Macbook Pro to shame! Theoretically, they should be more powerful than any of the nodes I can access on my University HPC system.

So, you can imagine my surprise when the production code ran almost 3 times slower than on my Macbook Pro!

Here’s a Microbenchmark, extracted from the production code, running on MATLAB 2017b on a few machines to show the kind of slowdown I experienced on these super powerful virtual machines.


test_t = rand(8755,1);
test_c = rand(5799,1);
tic;test_res = bsxfun(@times,test_t,test_c');toc
tic;test_res = bsxfun(@times,test_t,test_c');toc

I ran the bsxfun twice and report the fastest since the first call to any function in MATLAB is often slower than subsequent ones for various reasons. This quick and dirty benchmark isn’t exactly rigorous but its good enough to show the issue.

  • Azure F72s_v2 (72 vcpus, 144 GB memory) running Windows Server 2016: 0.3 seconds
  • Azure F32s_v2 (32 vcpus, 64 GB memory) running Windows Server 2016: 0.29 seconds
  • 2014 Macbook Pro running OS X: 0.11 seconds
  • Dell XPS 15 9560 laptop running Windows 10: 0.11 seconds
  • 8 cores on a node of Sheffield University’s Linux HPC cluster: 0.03 seconds
  • 16 cores on a node of Sheffield University’s Linux HPC cluster: 0.015 seconds

After a conversation on twitter, I ran it on Azure twice — once on a 72 vCPU instance and once on a 32 vCPU instance. This was to test if the issue was related to having 2 physical CPUs. The results were pretty much identical.

The results from the University HPC cluster are more in line with what I expected to see — faster than a laptop and good scaling with respect to number of cores.  I tried running it on 32 cores but the benchmark is still in the queue ;)

What’s going on?

I have no idea! I’m stumped to be honest.  Here are some thoughts that occur to me in no particular order

  • Maybe it’s an issue with Windows Server 2016. Is there some environment variable I should have set or security option I could have changed? Maybe the Windows version of MATLAB doesn’t get on well with large core counts? I can only test up to 4 on my own hardware and that’s using Windows 10 rather than Windows server.  I need to repeat the experiment using a Linux guest OS.
  • Is it an issue related to the fact that there isn’t a 1:1 mapping between physical hardware and virtual cores? Intel Xeon Platinum 8168 CPUs have 24 cores giving 48 hyperthreads so two of them would give me 48 cores and 96 hyperthreads.  They appear to the virtualised OS as 2 x 18 core CPUs with 72 hyperthreads in total.   Does this matter in any way?

 

February 21st, 2018

In a previous blog post, I told the story of how I used Amazon AWS and AlcesFlight to create a temporary multi-user HPC cluster for use in a training course.  Here are the details of how I actually did it.

Note that I have only ever used this configuration as a training cluster.  I am not suggesting that the customisations are suitable for real work.

Before you start

Before attempting to use AlcesFlight on AWS, I suggest that you ensure that you have the following things working

Customizing the HPC cluster on AWS

AlcesFlight provides a CloudFormation template for launching cluster instances on Amazon AWS.  The practical upshot of this is that you answer a bunch of questions on a web form to customise your cluster and then you launch it.

We are going to use this CloudFormation template along with some bash scripts that provide additional customisation.

Get the customisation scripts

The first step is to get some customization scripts in an S3 bucket. You could use your own or you could use the ones I created.

If you use mine, make sure you take a good look at them first to make sure you are happy with what I’ve done!  It’s probably worth using your own fork of my repo so you can customise your cluster further.

It’s the bash scripts that allow the creation of a bunch of user accounts for trainees with randomized passwords.  My scripts do some other things too and I’ve listed everything in the github README.md.

git clone https://github.com/mikecroucher/alces_flight_customisation
cd alces_flight_customisation

Now you need to upload these to an s3 bucket. I called mine walkingrandomly-aws-cluster

aws s3api create-bucket --bucket walkingrandomly-aws-cluster --region eu-west-2 --create-bucket-configuration LocationConstraint=eu-west-2
aws s3 sync . s3://walkingrandomly-aws-cluster --delete

Set up the CloudFormation template

  • Head over to Alces Flight Solo (Community Edition) and click on continue to subscribe
  • Choose the region you want to create the cluster in, select Personal HPC compute cluster and click on Launch with CloudFormationConsole

personal_hpc_compute_cluster

  • Go through the CloudFormation template screens, creating the cluster as you want it until you get to the S3 bubcket for customization profiles box where you fill in the name of the S3 bucket you created earlier.
  • Enable the default profile

flight_customisation

  • Continue answering the questions asked by the web form.  For this simple training cluster, I just accepted all of the defaults and it worked fine

When the CloudFormation stack has been fully created, you can log into your new cluster as an administrator.  To get the connection details of the headnode, go to the EC2 management console in your web-browser, select the headnode and click on Connect.

When you log in to the cluster as administrator, the usernames and passwords for your training cohort will be in directory specified by the password_file variable in the configure.d/run_me.sh script. I set my administrator account to be called walkingrandomly and so put the password file in /home/walkingrandomly/users.txt.  I could then print this out and distribute the usernames and passwords to each training delegate.

This is probably not great sysadmin practice but worked on the day.  If anyone can come up with a better way, Pull Requests are welcomed!

alces_flight_login

Try a training account

At this point, I suggest that you try logging in as one of the training user accounts and make sure you can successfully submit a job.  When I first tried all of this, the default scheduler on the created cluster was SunGridEngine and my first attempt at customisation left me with user accounts that couldn’t submit jobs.

The current scripts have been battle tested with Sun Grid Engine, including MPI job submission and I’ve also done a very basic test with Slurm. However, you really should check that a user account can submit all of the types of job you expect to use in class.

Troubleshooting

When I first tried to do this, things didn’t go completely smoothly.  Here are some things I learned to help diagnose the problems

Full documentation is available at http://docs.alces-flight.com/en/stable/customisation/customisation.html

On the cluster, we can see where its looking for its customisation scripts with the alces about command


alces about customizer
Customizer bucket prefix: s3://walkingrandomly-aws-cluster/customizer

The log file at /var/log/clusterware/instance.log on both the head node and worker nodes is very useful.

Once, I did all of this using a Windows CMD bash prompt and the customisation scripts failed to run.  The logs showed this error

/bin/bash^M: bad interpreter: No such file or directory

This is a classic dos2unix error and could be avoided, for example, by using the Windows Subsystem for linux instead of CMD.exe.

February 21st, 2018

I needed a supercomputer…..quickly!

One of the things that we do in Sheffield’s Research Software Engineering Group is host training courses delivered by external providers.  One such course is on parallel programming using MPI for which we turn to the experts at NAG (Numerical Algorithms Group).  A few days before turning up to deliver the course, the trainer got in touch with me to ask for details about our HPC cluster.

Because Croucher’s law, I had forgotten to let our HPC sysadmin know that I’d need a bunch of training accounts and around 128 cores set-aside for us to play around with for a couple of days.

In other words, I was hosting a supercomputing course and had forgotten the supercomputer.

Building a HPC cluster in the cloud

AlcesFlight is a relatively new product that allows you to spin up a traditional-looking High Performance Computing cluster on cloud computing substrates such as Microsoft Azure or Amazon AWS.  You get a head node, a bunch of worker nodes and a job scheduler such as Slurm or Sun Grid Engine. It looks just the systems that The University of Sheffield provides for its researchers!

You also get lots of nice features such as the ability to scale the number of worker nodes according to demand, a metric ton of available applications and the ability to customise the cluster at start up.

The supercomputing budget was less than the coffee budget

…and I only bought coffee for myself and the two trainers over the two days!  The attendees had to buy their own (In my defence…the course was free for attendees!).

I used the following

  • A head node of:  t2.large (2 vCPUs, 8Gb RAM)
  • Initial worker nodes: 4 of c4.4xlarge (16 vCPUs and 30GB RAM each)
  • Maximum worker nodes: 8 of c4.4xlarge (16 vCPUs and 30GB RAM each)

This gave me a cluster with between 64 and 128 virtual cores depending on the amount that the class were using it.  Much of the time, only 4 nodes were up and running – the others spun up automatically when the class needed them and vanished when they hadn’t been used for a while.

I was using the EU (Ireland) region and the prices at the time were

  • Head node: On demand pricing of $0.101 per hour
  • Worker nodes: $0.24 (ish) using spot pricing. Each one about twice as powerful as a 2014 Macbook Pro according to this benchmark.

HPC cost: As such, the maximum cost of this cluster was $2.73 per hour when all nodes were up and running. The class ran from 10am to 5pm for two days so we needed it for 14 hours.  Maximum cost would have been $38.22.

Coffee cost: 2 instructors and me needed coffee twice a day. So that’s 12 coffees in total.  Around £2.50 or $3.37 per coffee so $40.44

The HPC cost was probably less than that since we didn’t use 128 cores all the time and the coffee probably cost a little more.

Setting up the cluster

Technical details of how I configured the cluster can be found in the follow up post at https://www.walkingrandomly.com/?p=6431

 

February 10th, 2018

The Meltdown bug which affects most modern CPUs has been called by some ‘The worst ever CPU bug’. Accessible explanations about what the Meltdown bug actually is are available here and here.

Software patches have been made available but some people have estimated a performance hit of up to 30% in some cases. Some of us in the High Performance Computing (HPC) community (See here for the initial twitter conversation) started to wonder what this might mean for the type of workloads that run on our systems. After all, if the worst case scenario of 30% is the norm, it will drastically affect the power of our systems and hence reduce the amount of science we are able to support.

In the video below, Professor Mark Handley from University College London gives a detailed explanation of both Meltdown and Spectre at an event held at Alan Turing Institute in London.

Another video that gives a great introduction to this topic was given by Jon Masters at  https://fosdem.org/2018/schedule/event/closing_keynote/ 

To patch or not to patch

To a first approximation, a patch causing a 30% performance hit on a system costing £1 million pounds is going to cost an equivalent of £300,000 — not exactly small change! This has led to some people wondering if we should patch HPC systems at all:

All of the UK Tier-3 HPC centres I’m aware of have applied the patches (Sheffield, Leeds and Manchester) but I’d be interested to learn of centres that decided not to.  Feel free to comment here or message me on twitter if you have something to add to this discussion and I’ll update this post where appropriate.

Research paper discussing the performance penalties of these patches on HPC workloads

A group of people have written a paper on Arxiv that looks at HPC performance penalties in more detail.  From the paper’s abstract:

The results show that although some specific functions can have execution times decreased by as much as 74%, the majority of individual metrics indicates little to no decrease in performance. The real-world applications show a 2-3% decrease in performance for single node jobs and a 5-11% decrease for parallel multi node jobs.
The full pdf is available at https://arxiv.org/abs/1801.04329

 

Other relevant results and benchmarks

Here are a few other links that discuss the performance penalty of applying the Meltdown patch.

Acknowledgements

Thanks to Adrian Jackson, Phil Tooley, Filippo Spiga and members of the UK HPC-SIG for useful discussions.

November 28th, 2017

The RCUK Cloud Working Group are hosting their 3rd free annual workshop in January 2018 and I’ll be attending.  At the time of writing, there are still places left and you can sign up at https://www.eventbrite.co.uk/e/research-councils-uk-cloud-workshop-tickets-39439492584  

From the event advertisement:

This workshop will focus on key areas to address in order for the potential of cloud computing for research to be fully realised:

  • Tackling technical challenges around the use of cloud: for example, porting legacy workloads, scenarios for hybrid cloud, moving large data volumes, use of object storage vs. POSIX file systems.
  • Cloud as enabler for new and novel applications: e.g. use of public cloud toolkits and services around Machine Learning, AI, use of FPGAs and GPU based systems, applications related to Internet of Things and Edge Computing
  • Perspectives from European and international collaborations and research programmes
  • Policy, legal, regulatory and ethical issues, models for funding – case studies for managing sensitive or personal data in the cloud
  • Addressing the skills gap: how to educate researchers in how to best take advantage of cloud; DevOps and ResOps

To give a flavour, you can read about last year’s workshop here or look at the programme from last time.

May 15th, 2017

For a while now, Microsoft have provided a free Jupyter Notebook service on Microsoft Azure. At the moment they provide compute kernels for Python, R and F# providing up to 4Gb of memory per session. Anyone with a Microsoft account can upload their own notebooks, share notebooks with others and start computing or doing data science for free.

They University of Cambridge uses them for teaching, and they’ve also been used by the LIGO people  (gravitational waves) for dissemination purposes.

This got me wondering. How much power does Microsoft provide for free within these notebooks?  Computing is pretty cheap these days what with the Raspberry Pi and so on but what do you get for nothing? The memory limit is 4GB but how about the computational power?

To find out, I created a simple benchmark notebook that finds out how quickly a computer multiplies matrices together of various sizes.

Matrix-Matrix multiplication is often used as a benchmark because it’s a common operation in many scientific domains and it has been optimised to within an inch of it’s life.  I have lost count of the number of times where my contribution to a researcher’s computational workflow has amounted to little more than ‘don’t multiply matrices together like that, do it like this…it’s much faster’

So how do Azure notebooks perform when doing this important operation? It turns out that they max out at 263 Gigaflops! azure_free_notebook

For context, here are some other results:

As you can see, we are getting quite a lot of compute power for nothing from Azure Notebooks. Of course, one of the limiting factors of the free notebook service is that we are limited to 4GB of RAM but that was more than I had on my own laptops until 2011 and I got along just fine.

Another fun fact is that according to https://www.top500.org/statistics/perfdevel/, 263 Gigaflops would have made it the fastest computer in the world until 1994. It would have stayed in the top 500 supercomputers of the world until June 2003 [1].

Not bad for free!

[1] The top 500 list is compiled using a different benchmark called LINPACK  so a direct comparison isn’t strictly valid…I’m using a little poetic license here.

March 29th, 2017

UK to launch 6 major HPC centres

Tomorrow, I’ll be attending the launch event for the UK’s new HPC centres and have been asked to deliver a short talk as part of the program. As someone who paddles in the shallow-end of the HPC pool I find this both flattering and more than a little terrifying. What can little-ole-me say to the national HPC glitterati that might be useful?

This blog post is an attempt at gathering my thoughts together for that talk.

The technology gap in research computing

Broadly speaking, my role in academia is to hang out with researchers, compute providers (cloud and HPC) and software vendors in an attempt to be vaguely useful in the area of research software. I’m a Research Software Engineer with a focus on Long Tail Science: The large number of very small research groups who do a huge amount of modern research.

Time and again, what I see can be summarized in this quote by Greg Wilsongwilson

This is very true in the world of High Performance Computing.

Geek Top Gear

I love technology and I love HPC in particular. I love to geek out on Flops, Ghz, SIMD instructions, GPUs, FPGAs…..all that stuff. I help support The University of Sheffield’s local HPC service and worked in Research IT at The University of Manchester for around a decade before moving to Sheffield.

I’ve given and seen many a HPC-related talk in my time and have certainly been guilty of delivering what I now refer to as the ‘Geek Top Gear’ speech.  For maximum effect, you need to do it in a Jeremy Clarkson voice and, if you’re feeling really macho, kiss your bicep at the point where you tell the audience how many Petaflops your system can do in Linpack.

*Begin Jeremy Clarkson Impression*

Our new HPC system has got 100,000 of the latest Intel Kaby Lake cores...which is a lot!

Usually running at 2.6Ghz, these cores can turbo-boost to 3.2Ghz for those moments when we need that extra boost of power. Obviously, being Kaby Lake, these cores have all the instruction extensions you’d expect with AVX2, FMA, SSE, ABM and many many other TLAs for all your SIMD needs. Of course every HPC system needs accelerators…..and we have the best of them: Xeon Phis with 68 cores each and NVIDIA GPUs with thousands of tiny little cores will handle every massively parallel job you can throw at them….Easily. We connect these many many cores together with high-speed interconnect fashioned from threads of pure unicorn hair and cool the whole thing with the tears of virigin nerds.

YEEEEEES! Our new HPC system is the best one since the last one and, achieving over a Gajillion Petaflops in the Linpack test (kiss bicep), it will change your life forever.

more_power

Any questions?

Audience member 1: What’s a core?
Audience member 2: Why does it run my R script slower than my laptop?
Audience member 3: Do you have Excel installed on it?

There is a huge gap between what many HPC providers like to focus on and what the typical long-tail researcher wants or needs. I propose that the best bridge for this gap is the Research Software Engineer (RSE).

Research Software Engineer as Alpine guide

In my fellowship proposal, I compared the role of a Research Software Engineer to that of an alpine guide:

Technological development in software is more like a cliff-face than a ladder – there are many routes to the top, to a solution. Further, the cliff face is dynamic – constantly and quickly changing as new technologies emerge and decline. Determining which technologies to deploy and how best to deploy them is in itself a specialist domain, with many features of traditional research.

Researchers need empowerment and training to give them confidence with the available equipment and the challenges they face. This role, akin to that of an Alpine guide, involves support, guidance, and load carrying. When optimally performed it results in a researcher who knows what challenges they can attack alone, and where they need appropriate support. Guides can help decide whether to exploit well-trodden paths or explore new possibilities as they navigate through this dynamic environment.

At Sheffield, we have built a pool of these Research Software Engineers to provide exactly this kind of support and it’s working extremely well so far. Not only are we helping individual research groups but we are also using our experiences in the field to help shape the University HPC environment in collaboration with the IT department.

Supercomputing: Irrelevant to many?

“Never bring an anecdote to a data-fight” so the saying goes and all I have from my own experiences are a bucket load of anecdotes, case studies and cursory log-mining experiments that indicate that even those who DO use HPC are not doing so effectively. Fortunately, others have stepped up to the plate and we have survey and interview data on how researchers are using compute resources.

How Do Scientists Develop and Use Scientific Software? is a report on a 2009 survey of 1972 researchers from around the world. They found that “79.9% of the scientists never use scientific software on a supercomputer

When I first learned of this number, I found it faintly depressing. This technology that I love so much and for which University IT departments dedicate special days to seems to be pretty much irrelevant to the majority of researchers. Could it be that even in an era of big data, machine learning and research software engineering that most people only need a laptop?

Only ever needing a laptop certainly doesn’t fit with what I’ve seen while working in the trenches. Almost every researcher I’ve met who does computational research wishes it was faster or that they had more memory to allow them to do larger problems. Speed is the easiest thing to sell to researchers in the world of RSE. They come for faster execution and leave with a side-order of version control, testing and documentation. A combination of software development and migration to even a small HPC system can easily result in 100x or even 1000x speed-ups for many researchers.

In my experience, it’s not that researchers don’t need HPC, it’s that the jump from their laptop-based workflow to one that makes good use of a HPC system is too large for them to bridge without a little help. Providing that help can result in some great partnerships such as the recent one between the Sheffield RSE group and the Sheffield Faculty of Arts and Humanities.

language

Want to know how that partnership started? I compiled an experimental R/Rcpp package that they were struggling with and then took them for coffee and said ‘That code took a while to run. Here’s how we can make it go faster….Now…what exactly are you doing because it looks cool?’ Fast forward a year or so and we are on the cusp of starting a great new project that will include traditional HPC and cloud computing as part of their R-based workflow.

My experiences seem to be reflected in the data. In  their 2011 article, A Survey of the Practice of Computational Science, Prabhu et al interviewed 114 randomly selected researchers from Princeton University. Princeton have a very strong, well supported HPC centre which provides both resources and the expertise to use them. Even at such a well equipped institution, the authors write that  ‘Despite enormous wait times, many scientists run their programs only on desktops’ although they did report much higher HPC usage compared to the Hanny et al survey.

Other salient quotes from the Prabhu interviews include

“only 18% of researchers who optimize code leveraged profiling tools to inform their optimization plans”

“only 7% of researchers leveraged any form of thread based shared memory CPU parallelism”

“Only 11% of researchers utilized loop based parallelism”

“Currently, many researchers fit their scientific models to only a subset of available parameters for faster program runs.”

“Across disciplines, an order of magnitude performance improvement was cited as a requirement for significant changes in research quality”

HPC: There’s plenty of room at the bottom

Potential users of HPC look different to those of 20 years ago. The popularity explosion of languages such as MATLAB, Python and R have democratized programming and the world is awash with inefficient research software. Most of the time, this lack of efficiency is not a problem (see ‘In defense of inefficient scientific code‘) but if a researcher needs to scale up what they are doing, it can become limiting. Researchers might wait for days for the results to come in and limit the scope of their investigations to fit the hardware they have access to — their laptop usually.

The paper of Prabu et al said that an order of magnitude (10x) speed up was cited by researchers as a requirement for significant changes in research quality. For an experienced Research Software Engineer with access to cloud or HPC facilities, a 10x speed-up is usually pretty easy to achieve for this new audience. 100x or even 1000x can be achieved fairly frequently if you employ multiple hardware and software techniques. Compared to squeezing out a few percent more performance from HPC-centric code such as LAPACK or CASTEP, it’s not even all that difficult. I recently sped up one researcher’s MATLAB code by a factor of 800x in a couple of days and I’m a fairly middling developer if I’m brutally honest.

The whole point of High Performance Computing is to accelerate science and right now there is more computational science around than there has ever been before. Furthermore, it’s easier than ever to accelerate! There’s plenty of room at the bottom.

Closing the computational gap with people, training and compute power

The UK’s 6 new HPC centers represent the cutting edge of hardware technology. They provide a crucial component of our national hardware infrastructure, will contribute to research in HPC itself and will doubtless be of huge benefit to computational science. Furthermore, all of the funded proposals include significant engagement with the national Research Software Engineering community – the vital bridge between many researchers and HPC.

Co-development of research software with effort from both RSEs and researchers can be an extremely powerful model. Combine this with further collaboration between RSEs and compute providers and we have an environment that I think is both very exciting and capable of helping to close the rich/poor compute divide.

As an RSE who works with both researchers and University-level HPC providers, I ask for 3 things to be considered by these new regional centres.

  • Enjoy your new compute-ferraris. I look forward to seeing how hard you can push them.
  • You will be learning new good practice in how to provide HPC services. Disseminate this to those of us running smaller services.
  • There’s plenty of room at the bottom! Help us to support the new wave of computational researchers.

Thanks to languages such as MATLAB, Python and R, general purpose programming has been fully democratized. I look forward to working with these new centres to help democratize high performance computing.