February 21st, 2017 | Categories: GPU, HPC, matlab, parallel programming | Tags:

My new toy is a 2017 Dell XPS 15 9560 laptop on which I am running Windows 10. Once I got over (and fixed) the annoyance of all the advertising in Windows Home, I quickly started loving this new device.

To get a handle on its performance, I used GPUBench in MATLAB 2016b and got the following results. (This was the best of 4 runs; I note that MTimes performance for the CPU (Host PC), for example, varied between 130 and 150 Gflops.)

  • CPU: Intel Core i7-7700HQ (6M Cache, up to 3.8GHz)
  • GPU: NVIDIA GTX 1050 with 4GB GDDR5


I last did this for my Retina MacBook Pro and am happy to see that the numbers are better across the board. The standout figure for me is the 1206 Gflops (That’s 1.2 Teraflops!) of single precision performance for Matrix-Matrix Multiply.

That figure of 1.2 Teraflops rang a bell for me and it took me a while to realise why…..

My laptop vs Manchester University’s old HPC system – Horace

Old timers like me (I’m almost 40) like to compare modern hardware with bygone supercomputers (1980s Crays vs mobile phones for example) and we know we are truly old when the numbers coming out of laptop benchmarks match the peak theoretical performance of institutional HPC systems we actually used as part of our career.

This has now finally happened to me! I was at the University of Manchester when it commissioned an HPC service called Horace and I was there when it was switched off in 2010 (only 6 and a bit years ago!). It was the University’s primary HPC service with a support team, helpdesk, sysadmins…the lot. The specs are still available on Manchester’s website:

  • 24 nodes, each with 8 cores giving 192 cores in total.
  • Each core had a theoretical peak compute performance of 6.4 double precision Gflop/s
  • So a node had a theoretical peak performance of 51.2 Gflop/s
  • The whole thing could theoretically manage 1.2 Teraflop/s
  • It had four special ‘high memory’ nodes with 32GB RAM each

Good luck getting that 1.2 Teraflops out of it in practice!

I get a big geek-kick out of the fact that my new laptop has the same amount of RAM as one of these ‘big memory’ nodes and that my laptop’s double precision CPU performance is on par with the combined power of 3 of Horace’s nodes. Furthermore, my laptop’s GPU can just about manage 1.2 Teraflop/s of single precision  performance in MATLAB — on par with the total combined power of the HPC system*.

* (I know, I know….Horace’s numbers are for double precision and my GPU numbers are single precision — apples to oranges — but it still astonishes me that the headline numbers are the same — 1.2 Teraflops).
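The arithmetic behind those headline figures is easy to check. Here is a minimal sketch in Python that uses only the numbers quoted above; the variable names are my own and nothing here is measured.

```python
# Horace's theoretical peak, taken from the specs quoted above
cores_per_node = 8
nodes = 24
gflops_per_core = 6.4  # double precision, theoretical peak

node_peak = cores_per_node * gflops_per_core  # Gflop/s for one node
system_peak = nodes * node_peak               # Gflop/s for the whole machine

# Single precision MTimes figure reported by GPUBench for the laptop GPU
laptop_gpu_gflops = 1206

print(f"Node peak:   {node_peak:.1f} Gflop/s")
print(f"3 nodes:     {3 * node_peak:.1f} Gflop/s")
print(f"System peak: {system_peak:.1f} Gflop/s")
print(f"Laptop GPU as a fraction of Horace's peak: {laptop_gpu_gflops / system_peak:.2f}")
```

Three nodes' worth of theoretical peak sits just above the 130 to 150 Gflops range that the laptop CPU managed for MTimes, which is why the comparison with 3 nodes seems fair.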


February 9th, 2017 | Categories: Microsoft, Windows | Tags:

I’ve been an OS X user for just over 3 years, ever since I migrated from a laptop that dual booted Windows 7 and Linux. I like my MacBook Pro a lot but time moves on and I needed a new laptop. For reasons that I’ll write about in more depth another time, I’ve decided to move back into the Microsoft ecosystem for a while and try using Windows 10 on a Dell XPS 15 as my daily driver.

Windows is a lot better for Research Software Engineers than it used to be (See Bash on Windows: The scripting game just changed for an example of why) and I find myself enjoying using it rather than suffering it just because my clients use it. Mostly!

Windows is cheap and tacky

So why am I disappointed? In short, it’s because Windows still hasn’t grown up. It’s cheap, tacky and is constantly trying to sell me stuff.

It started off in the lock screen

Windows 10 nagging advert

Other people were quick to agree. Adverts in Windows 10 are a problem


The Start Menu is also full of third party applications that I’d rather not have…Games like Candy Crush Soda Saga and Royal Revolt 2 for example. These used to be the sort of bloatware you’d get from OEMs when you bought a new, cheap laptop and the solution used to be ‘Wipe the laptop and install a clean copy of Windows’ but now the bloatware is coming from Windows itself. Sure, I can uninstall it but I shouldn’t have to.

We’re not in Mac OS X anymore, Toto!

Cleaning up Windows’ act

‘How to disable Windows 10 built in advertising’ from HowToGeek can help turn off all of this tat, and others have pointed to scripted options that I’ve not tried myself (I suggest caution before running PowerShell scripts you do not understand).

All of this shouldn’t be necessary. I paid over £2,000 for this laptop and I expect a professional experience from the operating system that it comes with.

I expected better. I’m disappointed.

January 19th, 2017 | Categories: Free software, programming, python, sage interactions | Tags:

There are lots of Widgets in ipywidgets. Here’s how to list them

from ipywidgets import *
Widget.widget_types

At the time of writing, this gave me

{'Jupyter.Accordion': ipywidgets.widgets.widget_selectioncontainer.Accordion,
 'Jupyter.BoundedFloatText': ipywidgets.widgets.widget_float.BoundedFloatText,
 'Jupyter.BoundedIntText': ipywidgets.widgets.widget_int.BoundedIntText,
 'Jupyter.Box': ipywidgets.widgets.widget_box.Box,
 'Jupyter.Button': ipywidgets.widgets.widget_button.Button,
 'Jupyter.Checkbox': ipywidgets.widgets.widget_bool.Checkbox,
 'Jupyter.ColorPicker': ipywidgets.widgets.widget_color.ColorPicker,
 'Jupyter.Controller': ipywidgets.widgets.widget_controller.Controller,
 'Jupyter.ControllerAxis': ipywidgets.widgets.widget_controller.Axis,
 'Jupyter.ControllerButton': ipywidgets.widgets.widget_controller.Button,
 'Jupyter.Dropdown': ipywidgets.widgets.widget_selection.Dropdown,
 'Jupyter.FlexBox': ipywidgets.widgets.widget_box.FlexBox,
 'Jupyter.FloatProgress': ipywidgets.widgets.widget_float.FloatProgress,
 'Jupyter.FloatRangeSlider': ipywidgets.widgets.widget_float.FloatRangeSlider,
 'Jupyter.FloatSlider': ipywidgets.widgets.widget_float.FloatSlider,
 'Jupyter.FloatText': ipywidgets.widgets.widget_float.FloatText,
 'Jupyter.HTML': ipywidgets.widgets.widget_string.HTML,
 'Jupyter.Image': ipywidgets.widgets.widget_image.Image,
 'Jupyter.IntProgress': ipywidgets.widgets.widget_int.IntProgress,
 'Jupyter.IntRangeSlider': ipywidgets.widgets.widget_int.IntRangeSlider,
 'Jupyter.IntSlider': ipywidgets.widgets.widget_int.IntSlider,
 'Jupyter.IntText': ipywidgets.widgets.widget_int.IntText,
 'Jupyter.Label': ipywidgets.widgets.widget_string.Label,
 'Jupyter.PlaceProxy': ipywidgets.widgets.widget_box.PlaceProxy,
 'Jupyter.Play': ipywidgets.widgets.widget_int.Play,
 'Jupyter.Proxy': ipywidgets.widgets.widget_box.Proxy,
 'Jupyter.RadioButtons': ipywidgets.widgets.widget_selection.RadioButtons,
 'Jupyter.Select': ipywidgets.widgets.widget_selection.Select,
 'Jupyter.SelectMultiple': ipywidgets.widgets.widget_selection.SelectMultiple,
 'Jupyter.SelectionSlider': ipywidgets.widgets.widget_selection.SelectionSlider,
 'Jupyter.Tab': ipywidgets.widgets.widget_selectioncontainer.Tab,
 'Jupyter.Text': ipywidgets.widgets.widget_string.Text,
 'Jupyter.Textarea': ipywidgets.widgets.widget_string.Textarea,
 'Jupyter.ToggleButton': ipywidgets.widgets.widget_bool.ToggleButton,
 'Jupyter.ToggleButtons': ipywidgets.widgets.widget_selection.ToggleButtons,
 'Jupyter.Valid': ipywidgets.widgets.widget_bool.Valid,
 'jupyter.DirectionalLink': ipywidgets.widgets.widget_link.DirectionalLink,
 'jupyter.Link': ipywidgets.widgets.widget_link.Link}
January 12th, 2017 | Categories: programming, RSE, Scientific Software | Tags:

If you are a researcher and are currently writing scripts or developing code then I have a suggestion for you. If you haven’t done it already, get yourself a willing volunteer and send them your code/analysis/simulation/voodoo and ask them to run it on their machine to see what happens. Bonus points are awarded for choosing someone who uses a different operating system from you!

This simple act is one of the things I recommend in my talk Is Your Research Software Correct and it can often help improve both code and workflow.

It quickly exposes patterns that are not good practice. For example, scattered references to ‘/home/walkingrandomly/mydata.dat’ suddenly don’t seem like a great idea when your code buddy is running Windows. The ‘minimal tweaking’ required to move your analysis from your machine to theirs starts to feel a lot less minimal as you get to the bottom of the second page of instructions.
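One way to avoid hard-coded home directories is to look the data location up at run time. The following is only a sketch of the idea in Python: the environment variable name MYPROJECT_DATA and the fallback folder called data are made up for this example.

```python
import os
from pathlib import Path


def data_path(filename, env_var="MYPROJECT_DATA"):
    """Return the path to a data file without hard-coding a home directory.

    The data directory is read from an environment variable (the name
    MYPROJECT_DATA is hypothetical) and falls back to a 'data' folder
    inside the current working directory.
    """
    base = Path(os.environ.get(env_var, Path.cwd() / "data"))
    return base / filename


# pathlib builds the path with the right separators on Linux, macOS and Windows
print(data_path("mydata.dat"))
```

Your code buddy then only has to set one variable, or drop the file into a data folder, rather than edit every script that mentions the file.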

Crashy McCrashFace

When I start working with someone new, the first thing I ask them to do is to provide access to their code and a simple script called runme or similar that will build and run their code and spit out an answer that we agree is OK. Many projects stumble at this hurdle! Perhaps my compiler is different to theirs and objects to their abuse (or otherwise) of the standards or maybe they’ve forgotten to include vital dependencies or input data.
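To make the idea concrete, here is a toy version of such a runme script in Python. Everything in it is illustrative: simulate() is a stand-in for the real research code, and the expected value and tolerance are whatever you and your collaborator have agreed on.

```python
import math


def simulate(n_steps=1000):
    """Stand-in for the real research code (purely illustrative).

    Approximates pi/4 with a partial sum of the Leibniz series.
    """
    return sum((-1) ** k / (2 * k + 1) for k in range(n_steps))


def main():
    expected = math.pi / 4  # the answer we agreed is OK
    tolerance = 1e-3        # how close counts as 'the same'
    result = simulate()
    ok = abs(result - expected) < tolerance
    verdict = "PASS" if ok else "FAIL"
    print(f"result={result:.6f} expected={expected:.6f} -> {verdict}")
    return 0 if ok else 1


if __name__ == "__main__":
    status = main()
    if status != 0:
        # A non-zero exit code lets other tools detect the failure
        raise SystemExit(status)
```

The point is not the calculation but the contract: one command, one agreed answer, and an unambiguous pass or fail.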

Email ping-pong ensues as we attempt to get the latest version…zip files with names like PhD_code_ver1b_ForMike_withdata_fixed.zip get thrown about while everyone wonders where Bob is because he totally got it working on Windows back in 2009.

git clone

‘Hey Mike, just clone the git repo and run the test suite. It should be fine because the latest continuous integration run didn’t throw up any issues. The benchmark code and data we’d like you to optimise is in the benchmarks folder along with the timings and results from our most recent tests. Ignore the papers folder, that just reproduces all of the results from our recent papers and links to Zenodo DOIs’


‘Are you OK Mike?’

‘I’m…..fine. Just have something in my eye’


January 10th, 2017 | Categories: HPC, parallel programming, RSE, Scientific Software, University of Sheffield | Tags:

I work at The University of Sheffield where I am one of the leaders of the new Research Software Engineering function. One of the things that my group does is help people make use of Sheffield’s High Performance Computing cluster, Iceberg.

Iceberg is a heterogeneous system with around 3440 CPU cores and a sprinkling of GPUs. It’s been in use for several years and has been upgraded a few times over that period. It’s a very traditional HPC system that makes use of Linux and a variant of Sun Grid Engine as the scheduler and has served us well.


A while ago, the sysadmin pointed me to a goldmine of a resource — Iceberg’s accounting log. This 15 Gigabyte file contains information on every job submitted since July 2009. That’s more than 7 years of the HPC usage of 3249 users — over 46 million individual jobs.

The file format is very straightforward. There’s one line per job and each line consists of a set of colon separated fields. The first few fields look something like this:


The username is field 4 and the number of slots used by the job is field 35. On our system, slots correspond to CPU cores. If you want to run a 16 core job, you ask for 16 slots.

With one line of awk, we can determine the maximum number of slots ever requested by each user.

gawk -F: '$35>=slots[$4] {slots[$4]=$35};END{for(n in slots){print n, slots[n]}}' accounting > ./users_max_slots.csv

As a quick check, I grepped the output file for my username and saw that the maximum number of cores I’d ever requested was 20. I ran a 32 core MPI ‘Hello World’ job, reran the line of awk and confirmed that my new maximum was 32 cores.

There are several ways I could have filtered the number of users but I was having awk lessons from David Jones so let’s create a new file containing the users who have only ever requested 1 slot.

gawk -F: '$35>=slots[$4] {slots[$4]=$35};END{for(n in slots){if(slots[n]==1){print n, slots[n]}}}' accounting > users_where_max_is_one_slot.csv

Running wc on these files allows us to determine how many users are in each group

wc users_max_slots.csv 

3250  6498 32706 users_max_slots.csv

One of those users turned out to be a blank line so 3249 usernames have been used on Iceberg over the last 7 years.

wc users_where_max_is_one_slot.csv 
2393  4786 23837 users_where_max_is_one_slot.csv

That is, 2393 of our 3249 users (just over 73%) over the last 7 years have only ever run 1 slot, and therefore 1 core, jobs.
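For anyone who prefers Python to awk, the same analysis can be sketched as follows. This is a rough equivalent rather than what I actually ran: the field positions (4 for the username, 35 for the slots) come from the description above, the helper make_line and the sample records are invented for the demonstration, and unlike the awk version it skips blank or truncated lines instead of counting them as a user.

```python
from collections import defaultdict


def max_slots_per_user(lines, user_field=4, slots_field=35):
    """Return {username: largest slot count ever requested}.

    Field numbers are 1-based to match the awk one-liners above.
    """
    max_slots = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) < slots_field:
            continue  # skip blank or truncated lines
        user = fields[user_field - 1]
        slots = int(fields[slots_field - 1])
        max_slots[user] = max(max_slots[user], slots)
    return dict(max_slots)


def make_line(user, slots):
    """Build a made-up accounting line with 35 colon separated fields."""
    fields = ["x"] * 35
    fields[3] = user
    fields[34] = str(slots)
    return ":".join(fields)


# Synthetic records for illustration only, not real accounting data
sample = [
    make_line("alice", 1),
    make_line("bob", 1),
    make_line("bob", 16),
    make_line("alice", 1),
]

result = max_slots_per_user(sample)
single_core_users = [user for user, slots in result.items() if slots == 1]
print(result)
print(f"{len(single_core_users) / len(result):.1%} of sample users only ran 1 slot jobs")

# The proportion quoted above for Iceberg
print(f"{2393 / 3249:.1%}")
```

To run it against a real log you would pass an open file object instead of the sample list, since the function only needs something that yields lines.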

High Performance?

So 73% of all users have only ever submitted single core jobs. This does not necessarily mean that they have not been making use of parallelism. For example, they might have been running job arrays – hundreds or thousands of single core jobs performing parameter sweeps or Monte Carlo simulations.

Maybe they were running parallel codes but only asked the scheduler for one core. In the early days this would have led to oversubscribed nodes, possibly with up to 16 jobs, each trying to use 16 cores. These days, our sysadmin does some voodoo to ensure that jobs can only use the number of cores that have been requested, no matter how many threads their code is spawning. Either way, making this mistake is not great for performance.

Whatever is going on, this figure of 73% is surprising to me!

Thanks to David Jones for the awk lessons although if I’ve made a mistake, it’s all my fault!

Update (11th Jan 2017)

UCL’s Ian Kirker took a look at the usage of their general purpose cluster and found that 71.8% of their users have only ever run 1 core jobs. https://twitter.com/ikirker/status/819133966292807680



December 31st, 2016 | Categories: Android | Tags:

A personal Android history

My first Android phone was the HTC Hero which I wrote about all the way back in 2009. It was very different to anything I’d had before and I liked it a lot. I even compared it to 1980s supercomputers in an article that subsequently got slashdotted. Android has changed a lot since then and I’ve kept up with most of the changes although I quickly switched to Samsung after the Hero. I started off with the Galaxy S1 but upgraded to the Galaxy S2 relatively quickly when the S1 died.  The S2 was a nice phone. I remember I liked that one a lot.

I then switched to the Galaxy Note series of phones and was regularly mocked by my friends for owning such a HUGE phone; If I had a pound for every time someone referenced a particular Trigger Happy TV sketch I’d be a rich man!  The large screen was perfect for keeping me entertained on the regular train commute between Sheffield and Manchester that I endured at the time. The Note 1 gave way to the Note 2 followed by the  Note 3 — I upgraded fairly regularly back then.

Things are different now

For the first time since starting out with Android, I didn’t feel compelled to upgrade when the next version of my phone came out. The Note 4 passed me by and the next time I noticed a phone in the series was when my boss got the ill-fated Note 7.

Perhaps I’m just getting old but the truth is that my phone usage has stabilised around a few core applications — none of which require anything too fancy. Although I use my phone heavily, I don’t do anything that pushes its capabilities. Reading (Kindle, Guardian, Browser), Video (iPlayer, Netflix, YouTube), Audio (Music, DoggCatcher, Audible) and social media (Gmail and Twitter) are probably my most used apps. Other than that, it’s predominantly utility-type stuff such as Calendar, Camera, Maps, Coursera, Calculator and so on.  A slew of things I fire up occasionally such as Fitbit and Shazam and that’s pretty much it.

In the early days of Android, I used to play a lot of games but no longer do so. This is primarily due to a lack of time but also because most mobile games simply aren’t fun anymore. The industry switch to the Freemium model has changed game dynamics in a way that I don’t find palatable.

The Note 3 wasn’t just good enough for my usage pattern, it was better than I needed it to be! I’m perfectly happy with the HD screen resolution of my 32 inch TV so having the same resolution on a 5(ish) inch phone feels like decadent luxury. There’s an awesome stylus I never use, more CPU horsepower than I need and a ton of sensors that I don’t have time to play with.

I don’t need to upgrade my phone anymore

As a Research Software Engineer I find that whatever computer I have is not quite good enough. I could always do with more cores, a faster clock speed, better GPU or more memory (No burning desire for dongles or a touch bar though!).  Phones are different. They got good enough for me years ago.

Breaking out of the phone upgrade treadmill is great: I can reduce my contract down to almost nothing and put the money saved from handset upgrades to something more important like financial independence.

So, when I lost my Note 3 and found myself back in the mobile phone market earlier this year, I was gutted!

My Big Android Mistake – The Samsung Galaxy S5 Neo

The logic went like this:

  • The Note 3 was good enough but I never used the stylus and modern Galaxy Note phones cost a fortune. They also explode!
  • All I need to do is find a phone that matches the Note 3 performance.
  • I can probably do that by getting a mid range phone these days — saving me money.
  • I’ll stick to Samsung since they’ve served me well so far.

I reminded myself of the Note 3 benchmarks and discovered that the S5 Neo had slightly better performance. This review told me that the S5 Neo had an AnTuTu Benchmark result of 37,854. When I ran this on my trusty Note 3, the score was 35,637.

The reviews for the S5 Neo were reasonably good, it was several hundred pounds cheaper than flagships such as the Note 7 or the Galaxy edge and performance was on-par with my Note 3. So I got it.

Big Mistake! Huge!

Without a shadow of a doubt, the S5 Neo was the worst phone I’ve ever owned and I’ve been around! I’ve had Windows Mobile phones you understand…not the modern Windows Phone that no one uses but Ye-Olde Windows Mobile that was around when the iPhone was a twinkle in Steve Jobs’ eye.

It did this thing where I’d turn it on and before I could finish typing my 4 digit pin, it would switch itself off again. Bear in mind that I am not slow at doing this! It would do this randomly so that at the point where I hit peak rage, someone would come over to see why I was so upset only for it to work perfectly when I showed them.

Everything lagged like nothing I’ve ever seen before. Messages about checking the back cover popped up randomly, apps crashed all the time; it was a frustrating experience! When I mentioned these problems at work, one of the PhD students said ‘S5 Neo? Oh yeah, my mom has that….Worst. Phone. Ever.’

A geek friend suggested that I flash the phone with CyanogenMod but there wasn’t an S5 Neo version. Woes!

Oddly, it seems to be very much a Marmite phone. Some people love it while others have had the same experience as me. This forum shows the love/hate divide quite nicely.

An attempt at destruction

A few days ago, the S5 Neo managed to push all my buttons and, having lost my temper with it, I threw it hard onto the floor….something I’ve never done with a mobile phone before. Unfortunately, I was in the living room and the phone bounced off the carpet and back into my hand. My attempt at its destruction was futile!

The ‘check battery cover’ message popped up.

Damn thing was taunting me!


The OnePlus 3T – A New Hope

Having a mobile phone that drives you to acts of rage against the machine is ridiculous so I vowed to get rid of it that day. First step — find a new phone. A better phone. Ideally, one that didn’t break the bank.

I saw a review of the OnePlus 3T that looked great! A search through various forums and Twitter suggested that this was a good, alternative choice. I couldn’t see a downside so I took the plunge. It cost around £450 upfront and unlocked from O2 but they also gave me £55 for scrap value of the S5 Neo.

Just over a week later, I can report that I am very happy so far. This appears to be the Android phone I’ve been looking for!

Review articles and benchmarks coming in the new year.

December 12th, 2016 | Categories: Open Data Science, RSE, Scientific Software | Tags:

I was in Stockholm last week to give an invited talk at the Workshop on Nordic Big Biomedical Data for Action. I was representing the Software Sustainability Institute and delivered the latest version of my talk Is Your Research Software Correct?

It was a great event which introduced me to some nice initiatives going on waaaay up north. Initiatives such as Code Refinery, whose aims align well with those of the UK’s Software Sustainability Institute. Code Refinery was introduced by Radovan Bast; his slide deck is at http://cicero.xyz/v2/remark/github/coderefinery/talk-intro/niasc-2016/talk.md/#1


Other talks included the introduction of a scalable, parallel version of BLAST, Big Data Processing for Genomics and Delivering Bioinformatics Software as Virtual Machine images. I also got the chance to geek out with some High Performance Computing and Bioinformatics people over interesting Swedish food.

Slides from most of the talks are available at http://www.nordicehealth.se/2016/12/04/workshop-on-nordic-big-biomedical-data-for-action/

October 13th, 2016 | Categories: math software, R, RSE, Scientific Software, University of Sheffield | Tags:

I was recently invited to give a talk at the Sheffield R Users Group and decided to give a brief overview of how R relates to other technologies. Subjects included Mathematica’s integration of R, how Intel’s compilers and Math Kernel Library can make R faster, and a range of Microsoft technologies including R Tools for Visual Studio, Microsoft R Open and the MRAN for reproducibility. I also touched upon the NAG Library, Maple’s code generation for R, GPUs and Spark.

Did I miss anything? If you were to give a similar talk, what might you have included?

September 22nd, 2016 | Categories: RSE | Tags:

“It gives me great pleasure to welcome you all to the first ever Research Software Engineering conference” 

Rob Haines‘ opening line was met by thunderous applause from 202 people representing 14 different countries with a delegation that included funders, industry, academic researchers and, of course, research software engineers. The hairs on the back of my neck stood up as it dawned on me that this was a historic moment and that I was there when it happened. I wasn’t the only one


I felt like I’d come home and that these people were my tribe…and what a tribe!


Microsoft loves Linux

The conference was a mixture of talks, workshops and networking opportunities with the opening plenary given by Matthew Johnson of Microsoft Research. Microsoft was the gold sponsor for the event and the swag bag included one of these

we're not in Kansas anymore

I reflect on the fact that I’m currently using my MacBook Pro as a Windows 10 machine to access the Linux subsystem. We’re not in Kansas anymore!

Microsoft is a keen supporter of the RSE movement although the job title they use is ‘Research Software Development Engineer’, a title they’ve used for several years now. An RSE (or RSDE) does much more software development than a typical researcher and more research than a typical software engineer.

The choice of job role is important since it defines how you are assessed for things such as promotion. This is an issue that some of us are working to address within academia because many RSEs are currently assessed using the same criteria as researchers.


Docker…we need more Docker

The conference included several practical workshops on all sorts of interesting topics but the most popular, by far, was the Docker workshop. It was so oversubscribed that access to the room had to be strictly controlled! Even I wasn’t allowed in and I was on the organising committee!

Fortunately, the materials are freely available on GitHub – https://github.com/mfernandes61/RSE_Docker_course/wiki


What a diff’rence a fellowship makes

I attended a discussion workshop called ‘The Role of the Research Software Engineer’ and gave a caffeine-fuelled lightning talk about the impact my EPSRC RSE fellowship has had within the University of Sheffield over its first six months. Slides are at https://mikecroucher.github.io/fellowship_difference/ but you might not get much from them since I like to talk over a set of images for things like this.

The EPSRC RSE Fellowship is the first of its kind and I believe that it’s had a huge impact on how the role of RSE is perceived by academic institutions. There were only 7 awards, however, so there is still so much more to be done.

Since members of the audience included representatives from various funding bodies, I wanted to help convince them that RSE fellowships are great value for money and they should consider launching their own.

Workshop materials

Here is a list of links to some of the workshop materials. If you know of one I’ve missed, please let me know.


For more information about what happened on the day see the following links

September 5th, 2016 | Categories: Open Data Science, RSE, Science, Scientific Software, tutorials, University of Sheffield | Tags:

One of the great things about being a Research Software Engineer is the diversity of work you can get involved with. I specialise in smaller interventions which means that I can be working with physicists on Monday, engineers on Tuesday, geneticists on Wednesday….you get the idea.

Last month, I got to work with some Ecologists along with Anna Krystalli. We undertook the arduous journey from Sheffield down to Exeter to deliver talks and workshops at a post-conference symposium on reproducibility in science, organised by Malika Ihle and Isabel Winney, at the International Symposium on Behavioural Ecology.

I gave my talk, Is your research software correct?, and also delivered a workshop on using projects and version control using R and RStudio in the Code Cafe style. For the full write up of the day, see the excellent blog post by Anna over at the Mozilla Science Lab blog.

Updates : More resources