March 20th, 2017 | Categories: walking randomly | Tags:

I’ve worked with computers for a long time. Decades in fact, and yet I still routinely make the same rookie mistake when discussing how long a computer-related job is going to take. Programming, sysadmin, installing a game….whatever….things almost always take much longer than I expect them to. This is true even when I take the previous statement into account.

This morning, for example, I am supposed to be on annual leave but I needed to set up a license server for a new product that a colleague of mine has just bought. The plan was ‘Get up really early, take the dog out for morning walk and get this work done before my wife is even out of bed.’ I’d then make breakfast in bed and be the model geek-husband.

I figured it would take about 5 minutes….I’ve administered dozens of network license servers for thousands of users. Nothing fazes me in this area…..I am supremely confident.  Since I know that things take longer than expected, I gave myself an hour to do this 5 minute job and set the alarm clock. This morning, I woke up before the alarm and ended up with a whole 2 hours to do this 5 minute job.

How prepared am I?!

4 hours later, I still can’t get the chuffing thing to work and I’m wondering when on Earth I’ll ever learn…..

March 15th, 2017 | Categories: RSE, Science | Tags:

One aspect of my EPSRC Research Software Engineering fellowship is to spread basic good practice in research software to different academic fields. Last year, I was invited to participate in a Reproducible Research in Ecology workshop which was part of the 2016 International Society of Behavioural Ecology Conference. My contributions included a talk (Is your research software correct?) and a workshop on using projects and version control with R and RStudio.

The latest output from this stream of work is a paper in Behavioral Ecology called Striving for transparent and credible research: practical guidelines for behavioral ecologists which discusses various topics including preregistration, open science and, of course, research software practices with shout-outs to initiatives such as Software Carpentry, Research Software Engineering and the Software Sustainability Institute. The lead author is Malika Ihle with contributions from Isabel S Winney, Anna Krystalli and me.

February 27th, 2017 | Categories: RSE | Tags:

The job title ‘Research Software Engineer’ (RSE) wasn’t really a thing until 2012 when the term was invented in a Software Sustainability Institute collaborations workshop. Of course, there were lots of people doing Research Software Engineering before then but we had around 200 different job titles, varying degrees of support and career options tended to look pretty bleak.  A lot has happened since then including the 2016 EPSRC RSE Fellows, the first international RSE conference and a host of University-RSE groups popping up all over the country.

In my talk, Is your Research Software Correct?, I tell the audience: ‘If you need help, refer to your local RSE team. All good Universities have a central RSE team and if yours does not…..I refer you back to the word ‘good’.’ This always leads to healthy debate when talking at an institution that’s yet to get involved :)

Centrally funded, University-wide RSE teams are useful because they offer a way to maintain a pool of expertise that can be costed into grants. It’s the model we are starting to employ at the University of Sheffield following its success at trailblazing sites such as UCL and Manchester.

For this model to work, it is vital that we collaborate with researchers on getting RSE time costed into grants. In turn, researchers worry that they are asking funders for ‘something a bit strange’ which might lead to their project being turned down.

Asking for RSE Support in your grant is a Good Idea

There are two main arguments that I use when attempting to alleviate these concerns. The first is that we are quite successful in obtaining RSE funding, even in areas that you might not expect. The second is to point to funding calls where the funding council explicitly recommends RSE costing to be considered where appropriate.

The EPSRC has led the way in the UK with its RSE fellowship call, its funding of the Software Sustainability Institute (these days it’s funded by three research councils, including the BBSRC and ESRC) and various other initiatives.

Earlier this month, I was very happy to see that the BBSRC have explicitly mentioned Research Software Engineers in one of their latest calls: Machine Learning to Generate New Biological Understanding. In the call, the BBSRC say:

We note the significant contribution of staff such as Research Software Engineers (see external links) to interdisciplinary computational projects such as machine learning, and supports recognition of their contributions and encourages applicants to cost them appropriately on applications to this highlight.

I feel that this is a great move by the BBSRC and hope to see other funding councils follow their lead in future.

February 21st, 2017 | Categories: GPU, HPC, matlab, parallel programming | Tags:

My new toy is a 2017 Dell XPS 15 9560 laptop on which I am running Windows 10. Once I got over (and fixed) the annoyance of all the advertising in Windows Home, I quickly started loving this new device.

To get a handle on its performance, I used GPUBench in MATLAB 2016b and got the following results (this was the best of 4 runs…I note that MTimes performance for the CPU (Host PC), for example, varied between 130 and 150 Gflops).

  • CPU: Intel Core i7-7700HQ (6M cache, up to 3.8 GHz)
  • GPU: NVIDIA GTX 1050 with 4GB GDDR5


I last did this for my Retina MacBook Pro and am happy to see that the numbers are better across the board. The standout figure for me is the 1206 Gflops (That’s 1.2 Teraflops!) of single precision performance for Matrix-Matrix Multiply.

That figure of 1.2 Teraflops rang a bell for me and it took me a while to realise why…..

My laptop vs Manchester University’s old HPC system – Horace

Old timers like me (I’m almost 40) like to compare modern hardware with bygone supercomputers (1980s Crays vs mobile phones for example) and we know we are truly old when the numbers coming out of laptop benchmarks match the peak theoretical performance of institutional HPC systems we actually used as part of our career.

This has now finally happened to me! I was at the University of Manchester when it commissioned an HPC service called Horace and I was there when it was switched off in 2010 (only 6 and a bit years ago!). It was the University’s primary HPC service with a support team, helpdesk, sysadmins…the lot.  The specs are still available on Manchester’s website:

  • 24 nodes, each with 8 cores, giving 192 cores in total.
  • Each core had a theoretical peak compute performance of 6.4 double precision Gflop/s.
  • So a node had a theoretical peak performance of 51.2 Gflop/s.
  • The whole thing could theoretically manage 1.2 Teraflop/s.
  • It had four special ‘high memory’ nodes with 32 GB of RAM each.

Good luck getting that 1.2 Teraflops out of it in practice!
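The headline figure follows directly from the specs quoted above; a quick back-of-envelope check (all numbers are the ones from Manchester’s page, nothing measured):

```python
# Sanity-check Horace's theoretical peak from the published specs.
nodes = 24
cores_per_node = 8
gflops_per_core = 6.4   # double precision Gflop/s per core

node_peak = cores_per_node * gflops_per_core   # Gflop/s per node
total_peak = nodes * node_peak                 # Gflop/s for the whole system

print(f"Node peak:   {node_peak:.1f} Gflop/s")      # 51.2
print(f"System peak: {total_peak / 1000:.1f} Tflop/s")  # 1.2
```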

I get a big geek-kick out of the fact that my new laptop has the same amount of RAM as one of these ‘high memory’ nodes and that my laptop’s double precision CPU performance is on par with the combined power of 3 of Horace’s nodes. Furthermore, my laptop’s GPU can just about manage 1.2 Teraflop/s of single precision performance in MATLAB — on par with the total combined power of the HPC system*.

* (I know, I know….Horace’s numbers are for double precision and my GPU numbers are single precision — apples to oranges — but it still astonishes me that the headline numbers are the same — 1.2 Teraflops).


February 9th, 2017 | Categories: Microsoft, Windows | Tags:

I’d been an OS X user for just over 3 years, having migrated from a laptop that dual booted Windows 7 and Linux. I like my MacBook Pro a lot but time moves on and I needed a new laptop. For reasons that I’ll write about in more depth another time, I’ve decided to move back into the Microsoft ecosystem for a while and try using Windows 10 on a Dell XPS 15 as my daily driver.

Windows is a lot better for Research Software Engineers than it used to be (See Bash on Windows: The scripting game just changed for an example of why) and I find myself enjoying using it rather than suffering it just because my clients use it. Mostly!

Windows is cheap and tacky

So why am I disappointed? In short, it’s because Windows still hasn’t grown up. It’s cheap, it’s tacky and it’s constantly trying to sell me stuff.

It started with the lock screen:

Windows 10 nagging advert

Other people were quick to agree: adverts in Windows 10 are a problem.


The Start Menu is also full of third party applications that I’d rather not have…games like Candy Crush Soda Saga and Royal Revolt 2, for example. These used to be the sort of bloatware you’d get from OEMs when you bought a new, cheap laptop and the solution used to be ‘wipe the laptop and install a clean copy of Windows’ but now the bloatware is coming from Windows itself.  Sure, I can uninstall it but I shouldn’t have to.

We’re not in Mac OS X anymore, Toto!

Cleaning up Windows’ act

How to disable Windows 10’s built-in advertising from How-To Geek can help turn off all of this tat, and others have pointed to scripted options that I’ve not tried myself (I suggest caution before running PowerShell scripts you do not understand).

All of this shouldn’t be necessary. I paid over £2,000 for this laptop and I expect a professional experience from the operating system that it comes with.

I expected better. I’m disappointed.

January 19th, 2017 | Categories: Free software, programming, python, sage interactions | Tags:

There are lots of widgets in ipywidgets. Here’s how to list them:

from ipywidgets import *

Widget.widget_types

At the time of writing, this gave me

{'Jupyter.Accordion': ipywidgets.widgets.widget_selectioncontainer.Accordion,
 'Jupyter.BoundedFloatText': ipywidgets.widgets.widget_float.BoundedFloatText,
 'Jupyter.BoundedIntText': ipywidgets.widgets.widget_int.BoundedIntText,
 'Jupyter.Box': ipywidgets.widgets.widget_box.Box,
 'Jupyter.Button': ipywidgets.widgets.widget_button.Button,
 'Jupyter.Checkbox': ipywidgets.widgets.widget_bool.Checkbox,
 'Jupyter.ColorPicker': ipywidgets.widgets.widget_color.ColorPicker,
 'Jupyter.Controller': ipywidgets.widgets.widget_controller.Controller,
 'Jupyter.ControllerAxis': ipywidgets.widgets.widget_controller.Axis,
 'Jupyter.ControllerButton': ipywidgets.widgets.widget_controller.Button,
 'Jupyter.Dropdown': ipywidgets.widgets.widget_selection.Dropdown,
 'Jupyter.FlexBox': ipywidgets.widgets.widget_box.FlexBox,
 'Jupyter.FloatProgress': ipywidgets.widgets.widget_float.FloatProgress,
 'Jupyter.FloatRangeSlider': ipywidgets.widgets.widget_float.FloatRangeSlider,
 'Jupyter.FloatSlider': ipywidgets.widgets.widget_float.FloatSlider,
 'Jupyter.FloatText': ipywidgets.widgets.widget_float.FloatText,
 'Jupyter.HTML': ipywidgets.widgets.widget_string.HTML,
 'Jupyter.Image': ipywidgets.widgets.widget_image.Image,
 'Jupyter.IntProgress': ipywidgets.widgets.widget_int.IntProgress,
 'Jupyter.IntRangeSlider': ipywidgets.widgets.widget_int.IntRangeSlider,
 'Jupyter.IntSlider': ipywidgets.widgets.widget_int.IntSlider,
 'Jupyter.IntText': ipywidgets.widgets.widget_int.IntText,
 'Jupyter.Label': ipywidgets.widgets.widget_string.Label,
 'Jupyter.PlaceProxy': ipywidgets.widgets.widget_box.PlaceProxy,
 'Jupyter.Play': ipywidgets.widgets.widget_int.Play,
 'Jupyter.Proxy': ipywidgets.widgets.widget_box.Proxy,
 'Jupyter.RadioButtons': ipywidgets.widgets.widget_selection.RadioButtons,
 'Jupyter.Select': ipywidgets.widgets.widget_selection.Select,
 'Jupyter.SelectMultiple': ipywidgets.widgets.widget_selection.SelectMultiple,
 'Jupyter.SelectionSlider': ipywidgets.widgets.widget_selection.SelectionSlider,
 'Jupyter.Tab': ipywidgets.widgets.widget_selectioncontainer.Tab,
 'Jupyter.Text': ipywidgets.widgets.widget_string.Text,
 'Jupyter.Textarea': ipywidgets.widgets.widget_string.Textarea,
 'Jupyter.ToggleButton': ipywidgets.widgets.widget_bool.ToggleButton,
 'Jupyter.ToggleButtons': ipywidgets.widgets.widget_selection.ToggleButtons,
 'Jupyter.Valid': ipywidgets.widgets.widget_bool.Valid,
 'jupyter.DirectionalLink': ipywidgets.widgets.widget_link.DirectionalLink,
 'jupyter.Link': ipywidgets.widgets.widget_link.Link}

January 12th, 2017 | Categories: programming, RSE, Scientific Software | Tags:

If you are a researcher and are currently writing scripts or developing code then I have a suggestion for you. If you haven’t done it already, get yourself a willing volunteer and send them your code/analysis/simulation/voodoo and ask them to run it on their machine to see what happens. Bonus points are awarded for choosing someone who uses a different operating system from you!

This simple act is one of the things I recommend in my talk Is Your Research Software Correct? and it can often help improve both code and workflow.

It quickly exposes patterns that are not good practice. For example, scattered references to ‘/home/walkingrandomly/mydata.dat’ suddenly don’t seem like a great idea when your code buddy is running Windows. The ‘minimal tweaking’ required to move your analysis from your machine to theirs starts to feel a lot less minimal as you get to the bottom of the second page of instructions.
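A minimal sketch of the more portable pattern: take the data location as input rather than hard-coding it (the filename below is just the example from the paragraph above):

```python
# Take the data directory as a command-line argument instead of
# hard-coding '/home/walkingrandomly/mydata.dat'. pathlib joins paths
# with the right separator on Windows and Unix alike.
import sys
from pathlib import Path

# Use the first argument if given, otherwise the current directory
data_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
data_file = data_dir / "mydata.dat"   # example filename from the post

print(f"Looking for data in: {data_file}")
```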

Crashy McCrashFace

When I start working with someone new, the first thing I ask them to do is to provide access to their code and a simple script called runme (or similar) that will build and run their code and spit out an answer that we agree is OK. Many projects stumble at this hurdle! Perhaps my compiler is different to theirs and objects to their abuse (or otherwise) of the standards, or maybe they’ve forgotten to include vital dependencies or input data.
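As a sketch, a runme for a make-based project might be as simple as the following. The Makefile, the simulate binary and the data files are all placeholders, not from any real project; the point is that a single command takes a newcomer from source code to an agreed answer:

```shell
#!/bin/sh
# runme: build everything and run one small, agreed-upon test case.
# 'set -e' stops the script at the first failing step rather than
# ploughing on and producing a misleading answer.
set -e

make                                      # build the code
./simulate small_input.dat > result.txt   # a quick smoke test
diff result.txt expected_output.txt       # compare with the known-good answer
echo "OK: results match the agreed output"
```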

Email ping-pong ensues as we attempt to get the latest version…zip files get thrown about while everyone wonders where Bob is, because he totally got it working on Windows back in 2009.

git clone

‘Hey Mike, just clone the git repo and run the test suite. It should be fine because the latest continuous integration run didn’t throw up any issues. The benchmark code and data we’d like you to optimise is in the benchmarks folder along with the timings and results from our most recent tests. Ignore the papers folder, that just reproduces all of the results from our recent papers and links to Zenodo DOIs’


‘Are you OK Mike?’

‘I’m…..fine. Just have something in my eye’


January 10th, 2017 | Categories: HPC, parallel programming, RSE, Scientific Software, University of Sheffield | Tags:

I work at The University of Sheffield where I am one of the leaders of the new Research Software Engineering function. One of the things that my group does is help people make use of Sheffield’s High Performance Computing cluster, Iceberg.

Iceberg is a heterogeneous system with around 3440 CPU cores and a sprinkling of GPUs. It’s been in use for several years and has been upgraded a few times over that period. It’s a very traditional HPC system that makes use of Linux and a variant of Sun Grid Engine as the scheduler, and it has served us well.


A while ago, the sysadmin pointed me to a goldmine of a resource — Iceberg’s accounting log. This 15 Gigabyte file contains information on every job submitted since July 2009. That’s more than 7 years of the HPC usage of 3249 users — over 46 million individual jobs.

The file format is very straightforward. There’s one line per job and each line consists of a set of colon-separated fields.

The username is field 4 and the number of slots used by the job is field 35. On our system, slots correspond to CPU cores. If you want to run a 16 core job, you ask for 16 slots.

With one line of awk, we can determine the maximum number of slots ever requested by each user.

gawk -F: '$35>=slots[$4] {slots[$4]=$35};END{for(n in slots){print n, slots[n]}}' accounting > ./users_max_slots.csv
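For readers who find awk terse, the same calculation might look like this in Python. The field positions are as described above (username in field 4, slots in field 35, counting from 1); the sample records at the end are synthetic, for illustration only, not real log data:

```python
def max_slots_per_user(lines):
    """Return a {username: max_slots} dict from Grid Engine style
    accounting lines (colon-separated; user in field 4, slots in
    field 35, counting from 1)."""
    result = {}
    for line in lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) < 35:
            continue                      # skip blank or malformed lines
        user, slots = fields[3], int(fields[34])
        result[user] = max(result.get(user, 0), slots)
    return result

# Three synthetic records: two jobs for 'alice', one for 'bob'
sample = [
    ":".join(["short.q", "node001", "cs", "alice"] + ["0"] * 30 + ["16"]),
    ":".join(["short.q", "node002", "cs", "alice"] + ["0"] * 30 + ["4"]),
    ":".join(["long.q", "node003", "bio", "bob"] + ["0"] * 30 + ["1"]),
]
print(max_slots_per_user(sample))   # {'alice': 16, 'bob': 1}
```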

As a quick check, I grepped the output file for my username and saw that the maximum number of cores I’d ever requested was 20. I ran a 32 core MPI ‘Hello World’ job, reran the line of awk and confirmed that my new maximum was 32 cores.

There are several ways I could have filtered the number of users but, since I was having awk lessons from David Jones, let’s create a new file containing the users who have only ever requested 1 slot.

gawk -F: '$35>=slots[$4] {slots[$4]=$35};END{for(n in slots){if(slots[n]==1){print n, slots[n]}}}' accounting > users_where_max_is_one_slot.csv

Running wc on these files allows us to determine how many users are in each group:

wc users_max_slots.csv 

3250  6498 32706 users_max_slots.csv

One of those users turned out to be a blank line so 3249 usernames have been used on Iceberg over the last 7 years.

wc users_where_max_is_one_slot.csv 
2393  4786 23837 users_where_max_is_one_slot.csv

That is, 2393 of our 3249 users (just over 73%) over the last 7 years have only ever run 1 slot, and therefore 1 core, jobs.
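The percentage is straightforward to check from the two counts in the wc output above (after discounting the blank line):

```python
# Counts from the wc output above
users_total = 3249      # usernames ever seen on Iceberg
users_one_slot = 2393   # users whose maximum request was 1 slot

print(f"{users_one_slot / users_total:.1%}")   # 73.7%
```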

High Performance?

So 73% of all users have only ever submitted single core jobs. This does not necessarily mean that they have not been making use of parallelism. For example, they might have been running job arrays – hundreds or thousands of single core jobs performing parameter sweeps or Monte Carlo simulations.
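For the curious, a Grid Engine job array is a single submission script that the scheduler runs many times, each task a separate single-slot job. A sketch (the simulate program and its arguments are placeholders, not from any real workload):

```shell
#!/bin/bash
# Submit with: qsub this_script.sh
# Runs 1000 independent single-slot tasks; each task sees its own
# value of $SGE_TASK_ID and can use it to pick its own parameters.
#$ -t 1-1000
#$ -cwd

./simulate --seed "$SGE_TASK_ID" > "result_${SGE_TASK_ID}.txt"
```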

Maybe they were running parallel codes but only asked the scheduler for one core. In the early days this would have led to oversubscribed nodes, with possibly up to 16 jobs each trying to run 16 cores. These days, our sysadmin does some voodoo to ensure that jobs can only use the number of cores that have been requested, no matter how many threads their code is spawning. Either way, making this mistake is not great for performance.

Whatever is going on, this figure of 73% is surprising to me!

Thanks to David Jones for the awk lessons although if I’ve made a mistake, it’s all my fault!

Update (11th Jan 2017)

UCL’s Ian Kirker took a look at the usage of their general purpose cluster and found that 71.8% of their users have only ever run 1 core jobs.



December 31st, 2016 | Categories: Android | Tags:

A personal Android history

My first Android phone was the HTC Hero which I wrote about all the way back in 2009. It was very different to anything I’d had before and I liked it a lot. I even compared it to 1980s supercomputers in an article that subsequently got slashdotted. Android has changed a lot since then and I’ve kept up with most of the changes although I quickly switched to Samsung after the Hero. I started off with the Galaxy S1 but upgraded to the Galaxy S2 relatively quickly when the S1 died.  The S2 was a nice phone. I remember I liked that one a lot.

I then switched to the Galaxy Note series of phones and was regularly mocked by my friends for owning such a HUGE phone; If I had a pound for every time someone referenced a particular Trigger Happy TV sketch I’d be a rich man!  The large screen was perfect for keeping me entertained on the regular train commute between Sheffield and Manchester that I endured at the time. The Note 1 gave way to the Note 2, followed by the Note 3 — I upgraded fairly regularly back then.

Things are different now

For the first time since starting out with Android, I didn’t feel compelled to upgrade when the next version of my phone came out. The Note 4 passed me by and the next time I noticed a phone in the series was when my boss got the ill-fated Note 7.

Perhaps I’m just getting old but the truth is that my phone usage has stabilised around a few core applications — none of which require anything too fancy. Although I use my phone heavily, I don’t do anything that pushes its capabilities. Reading (Kindle, Guardian, Browser), Video (iPlayer, Netflix, YouTube), Audio (Music, DoggCatcher, Audible) and social media (Gmail and Twitter) are probably my most used apps. Other than that, it’s predominantly utility-type stuff such as Calendar, Camera, Maps, Coursera, Calculator and so on.  A slew of things I fire up occasionally such as Fitbit and Shazam and that’s pretty much it.

In the early days of Android, I used to play a lot of games but no longer do so. This is primarily due to a lack of time but also because most mobile games simply aren’t fun anymore. The industry’s switch to the Freemium model has changed game dynamics in a way that I don’t find palatable.

The Note 3 wasn’t just good enough for my usage pattern, it was better than I needed it to be! I’m perfectly happy with the HD resolution of my 32-inch TV so having the same resolution on a 5(ish)-inch phone feels like decadent luxury. There’s an awesome stylus I never use, more CPU horsepower than I need and a ton of sensors that I don’t have time to play with.

I don’t need to upgrade my phone anymore

As a Research Software Engineer I find that whatever computer I have is not quite good enough. I could always do with more cores, a faster clock speed, better GPU or more memory (No burning desire for dongles or a touch bar though!).  Phones are different. They got good enough for me years ago.

Breaking out of the phone upgrade treadmill is great: I can reduce my contract down to almost nothing and put the money saved from handset upgrades to something more important like financial independence.

So, when I lost my Note 3 and found myself back in the mobile phone market earlier this year, I was gutted!

My Big Android Mistake – The Samsung Galaxy S5 Neo

The logic went like this:

  • The Note 3 was good enough but I never used the stylus and modern Galaxy Note phones cost a fortune. They also explode!
  • All I need to do is find a phone that matches the Note 3 performance.
  • I can probably do that by getting a mid range phone these days — saving me money.
  • I’ll stick to Samsung since they’ve served me well so far.

I reminded myself of the Note 3 benchmarks and discovered that the S5 Neo had slightly better performance. This review told me that the S5 Neo had an AnTuTu Benchmark result of 37,854. When I ran this on my trusty Note 3, the score was 35,637.

The reviews for the S5 Neo were reasonably good, it was several hundred pounds cheaper than flagships such as the Note 7 or the Galaxy Edge and performance was on a par with my Note 3. So I got it.

Big Mistake! Huge!

Without a shadow of a doubt, the S5 Neo was the worst phone I’ve ever owned and I’ve been around! I’ve had Windows Mobile phones, you understand…not the modern Windows Phone that no one uses but Ye Olde Windows Mobile that was around when the iPhone was a twinkle in Steve Jobs’ eye.

It did this thing where I’d turn it on and, before I could finish typing my 4-digit PIN, it would switch itself off again. Bear in mind that I am not slow at doing this! It would do this randomly so that, at the point where I hit peak rage, someone would come over to see why I was so upset only for it to work perfectly when I showed them.

Everything lagged like nothing I’ve ever seen before. Messages about checking the back cover popped up randomly, apps crashed all the time; it was a frustrating experience! When I mentioned these problems at work, one of the PhD students said ‘S5 Neo? Oh yeah, my mom has that….Worst. Phone. Ever.’

A geek friend suggested that I flash the phone with CyanogenMod but there wasn’t an S5 Neo version. Woes!

Oddly, it seems to be very much a Marmite phone. Some people love it while others have had the same experience as me. This forum shows the love/hate divide quite nicely.

An attempt at destruction

A few days ago, the S5 Neo managed to push all my buttons and, having lost my temper with it, I threw it hard onto the floor….something I’ve never done with a mobile phone before. Unfortunately, I was in the living room and the phone bounced off the carpet and back into my hand. My attempt at its destruction was futile!

The ‘check battery cover’ message popped up.

Damn thing was taunting me!


The OnePlus 3T – A New Hope

Having a mobile phone that drives you to acts of rage against the machine is ridiculous so I vowed to get rid of it that day. First step — find a new phone. A better phone. Ideally, one that didn’t break the bank.

I saw a review of the OnePlus 3T that looked great! A search through various forums and Twitter suggested that this was a good alternative choice. I couldn’t see a downside so I took the plunge. It cost around £450 upfront, unlocked, from O2 but they also gave me £55 in scrap value for the S5 Neo.

Just over a week later, I can report that I am very happy so far. This appears to be the Android phone I’ve been looking for!

Review articles and benchmarks coming in the new year.

December 12th, 2016 | Categories: Open Data Science, RSE, Scientific Software | Tags:

I was in Stockholm last week to give an invited talk at the Workshop on Nordic Big Biomedical Data for Action. I was representing the Software Sustainability Institute and delivered the latest version of my talk Is Your Research Software Correct?

It was a great event which introduced me to some nice initiatives going on waaaay up north, such as Code Refinery, whose aims align well with those of the UK’s Software Sustainability Institute. Code Refinery was introduced by Radovan Bast — Slide deck at


Other talks included the introduction of a scalable, parallel version of BLAST, Big Data Processing for Genomics and Delivering Bioinformatics Software as Virtual Machine images. I also got the chance to geek out with some High Performance Computing and Bioinformatics people over interesting Swedish food.

Slides from most of the talks are available at