December 6th, 2013

A question I get asked a lot is ‘How can I do nonlinear least squares curve fitting in X?’ where X might be MATLAB, Mathematica or a whole host of alternatives. Since this is such a common query, I thought I’d write up how to do it for a very simple problem in several systems that I’m interested in.

This is the Julia version. For other versions, see the list below.

The problem

Suppose you have the following data:

xdata = -2, -1.64, -1.33, -0.7, 0, 0.45, 1.2, 1.64, 2.32, 2.9
ydata = 0.699369, 0.700462, 0.695354, 1.03905, 1.97389, 2.41143, 1.91091, 0.919576, -0.730975, -1.42001

and you’d like to fit the function

$F(p_1,p_2,x) = p_1 \cos(p_2 x) + p_2 \sin(p_1 x)$

using nonlinear least squares. Your starting guesses for the parameters are p1 = 1 and p2 = 0.2.
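
In other words, we want the parameter values that minimise the sum of squared residuals over the ten data points:

$\min_{p_1,p_2} \sum_{i=1}^{10} \left( F(p_1,p_2,x_i) - y_i \right)^2$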

For now, we are primarily interested in the following results:

  • The fit parameters
  • Sum of squared residuals

Future updates of these posts will show how to get other results such as confidence intervals. Let me know what you are most interested in.
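For the curious, the usual recipe for such confidence intervals (which a future update could implement) estimates the parameter covariance from the residual vector and Jacobian returned by the solver:

$\hat{\sigma}^2 = \frac{\sum_i r_i^2}{n - p}, \qquad \operatorname{Cov}(\hat{p}) \approx \hat{\sigma}^2 (J^T J)^{-1}$

where $n$ is the number of data points and $p$ is the number of parameters; approximate intervals are then $\hat{p}_k \pm t^* \sqrt{\operatorname{Cov}(\hat{p})_{kk}}$ for a suitable Student-t quantile $t^*$.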

Julia solution using Optim.jl

Optim.jl is a free Julia package that contains a suite of optimisation routines written in pure Julia. If you haven’t done so already, you’ll need to install the Optim package:

Pkg.add("Optim")

The solution is almost identical to the example given in the curve fitting demo of the Optim.jl readme file:

using Optim

model(xdata,p) = p[1]*cos(p[2]*xdata)+p[2]*sin(p[1]*xdata)

xdata = [-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9]
ydata = [0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001]

beta, r, J = curve_fit(model, xdata, ydata, [1.0, 0.2])
# beta = best fit parameters
# r = vector of residuals
# J = estimated Jacobian at solution

@printf("Best fit parameters are: %f and %f",beta[1],beta[2])
@printf("The sum of squares of residuals is %f",sum(r.^2.0))

The result is

Best fit parameters are: 1.881851 and 0.700230
@printf("The sum of squares of residuals is %f",sum(r.^2.0))

Notes

I used Julia version 0.2 on 64-bit Windows 7 to run the code in this post.

October 7th, 2012

I felt like playing with Julia and MATLAB this Sunday morning.  I found some code that prices European Options in MATLAB using Monte Carlo simulations over at computeraidedfinance.com and thought that I’d port this over to Julia.  Here’s the original MATLAB code

function V = bench_CPU_European(numPaths)
%Simple European
steps = 250;
r = (0.05);
sigma = (0.4);
T = (1);
dt = T/(steps);
K = (100);

S = 100 * ones(numPaths,1);

for i=1:steps
 rnd = randn(numPaths,1);
 S = S .* exp((r-0.5*sigma.^2)*dt + sigma*sqrt(dt)*rnd);
end
V = mean( exp(-r*T)*max(K-S,0) )
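For reference, each pass through the loop advances every simulated path of a geometric Brownian motion by one time step, and the final line averages the discounted payoffs of a European put (the max(K-S,0) term):

$S_{t+dt} = S_t \exp\left( (r - \tfrac{1}{2}\sigma^2)\,dt + \sigma \sqrt{dt}\, Z \right), \qquad V = e^{-rT}\, \mathbb{E}\left[ \max(K - S_T, 0) \right]$

where $Z$ is a standard normal variate.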

I ran this a couple of times to see what results I should be getting and how long it would take for 1 million paths:

tic;bench_CPU_European(1000000);toc
V =
   13.1596
Elapsed time is 6.035635 seconds.
>> tic;bench_CPU_European(1000000);toc
V =
   13.1258
Elapsed time is 5.924104 seconds.
>> tic;bench_CPU_European(1000000);toc
V =
   13.1479
Elapsed time is 5.936475 seconds.

The result varies because this is a stochastic process but we can see that it should be around 13.1 or so and takes around 6 seconds on my laptop. Since it’s Sunday morning, I am feeling lazy and have no intention of considering if this code is optimal or not right now. I’m just going to copy and paste it into a Julia file and hack at the syntax until it becomes valid Julia code. The following seems to work:

function bench_CPU_European(numPaths)

steps = 250
r = 0.05
sigma = .4;
T = 1;
dt = T/(steps)
K = 100;

S = 100 * ones(numPaths,1);

for i=1:steps
 rnd = randn(numPaths,1)
 S = S .* exp((r-0.5*sigma.^2)*dt + sigma*sqrt(dt)*rnd)
end
V = mean( exp(-r*T)*max(K-S,0) )
end

I ran this on Julia and got the following

julia> tic();bench_CPU_European(1000000);toc()
elapsed time: 36.259000062942505 seconds
36.259000062942505

julia> bench_CPU_European(1000000)
13.114855104505445

The Julia code appears to be valid and gives the correct result of around 13.1, but at 36.25 seconds it is around 6 times slower than the MATLAB version. The dog needs walking so I’m going to think about this another time but comments are welcome.

Update (9pm 7th October 2012):   I’ve just tried this Julia code on the Linux partition of the same laptop and 1 million paths took 14 seconds or so:

tic();bench_CPU_European(1000000);toc()
elapsed time: 14.146281957626343 seconds

I built this version of Julia from source and so it’s at the current bleeding edge (version 0.0.0+98589672.r65a1, Commit 65a1f3dedc (2012-10-07 06:40:18)). The code is still slower than the MATLAB version but better than the older Windows build.

Update: 13th October 2012

Over on the Julia mailing list, someone posted a faster version of this simulation in Julia

function bench_eu(numPaths)
    steps = 250
    r = 0.05
    sigma = .4;
    T = 1;
    dt = T/(steps)
    K = 100;

    S = 100 * ones(numPaths,1);

    t1 = (r-0.5*sigma.^2)*dt
    t2 = sigma*sqrt(dt)
    for i=1:steps
        for j=1:numPaths
            S[j] .*= exp(t1 + t2*randn())
        end
    end

    V = mean( exp(-r*T)*max(K-S,0) )
end

On the Linux partition of my test machine, this got through 1000000 paths in 8.53 seconds, very close to the MATLAB speed:

julia> tic();bench_eu(1000000);toc()
elapsed time: 8.534484148025513 seconds

It seems that, when using Julia, you need to unlearn everything you’ve ever learned about vectorisation in MATLAB.

Update: 28th October 2012

Members of the Julia team have been improving the performance of the randn() function used in the above code (see here and here for details). Using the de-vectorised code above, execution time for 1 million paths in Julia is now down to 7.2 seconds on my machine on Linux. Still slower than the MATLAB 2012a implementation but it’s getting there. This was using Julia version 0.0.0+100403134.r0999, Commit 099936aec6 (2012-10-28 05:24:40):

tic();bench_eu(1000000);toc()
elapsed time: 7.223690032958984 seconds
7.223690032958984

The test machine’s specs:

  • Laptop model: Dell XPS L702X
  • CPU: Intel Core i7-2630QM @ 2GHz, software overclockable to 2.9GHz. 4 physical cores but a total of 8 virtual cores due to Hyperthreading.
  • RAM: 8 GB
  • OS: Windows 7 Home Premium 64 bit and Ubuntu 12.04
  • MATLAB: 2012a
  • Julia: The original Windows version was Version 0.0.0+94063912.r17f5, Commit 17f50ea4e0 (2012-08-15 22:30:58). Several versions used on Linux since; see text for details.

August 14th, 2012

I first mentioned Julia, a new language for high performance scientific computing, back in the February edition of A Month of Math Software and it certainly hasn’t stood still since then. A WalkingRandomly reader, Ohad, recently contacted me to tell me about a forum post announcing some Julia speed improvements.

Julia has a set of micro-benchmarks and the slowest of them is now only two times slower than the equivalent in C. That’s compiled-language performance from an easy-to-use scripting language. Astonishingly, Julia is faster than gfortran in a couple of instances. Nice work!

Comparison times between Julia and other scientific scripting languages (MATLAB, Python and R for instance) for these micro-benchmarks are posted on Julia’s website.  The Julia team have included the full benchmark source code used for all languages so if you are an expert in one of them, why not take a look at how they have represented your language and see what you think?

Let me know if you’ve used Julia at all, I’m interested in what people think of it.

January 6th, 2020

My stepchildren are pretty good at mathematics for their age and have recently learned about Pythagoras’ theorem

$c=\sqrt{a^2+b^2}$

The fact that they have learned about this so early in their mathematical lives is testament to its importance. Pythagoras is everywhere in computational science and it may well be the case that you’ll need to compute the hypotenuse of a triangle some day.

Fortunately for you, this important computation is implemented in every computational environment I can think of!
It’s almost always called hypot so it will be easy to find.
Here it is in action using Python’s numpy module

import numpy as np
a = 3
b = 4
np.hypot(a, b)

5.0

When I’m out and about giving talks and tutorials about Research Software Engineering, High Performance Computing and so on, I often get the chance to mention the hypot function and it turns out that fewer people know about this routine than you might expect.

Trivial Calculation? Do it Yourself!

Such a trivial calculation, so easy to code up yourself! Here’s a one-line implementation

def mike_hypot(a,b):
    return(np.sqrt(a*a+b*b))

In use it looks fine

mike_hypot(3,4)

5.0

Overflow and Underflow

I could probably work for quite some time before I found that my implementation was flawed in several places. Here’s one

mike_hypot(1e154,1e154)

inf

You would, of course, expect the result to be large but not infinity. Numpy doesn’t have this problem

np.hypot(1e154,1e154)

1.414213562373095e+154

My function also doesn’t do well when things are small.

mike_hypot(1e-200,1e-200)

0.0

but again, the more carefully implemented hypot function in numpy does fine.

np.hypot(1e-200,1e-200)

1.414213562373095e-200

Standards Compliance

Next up: standards compliance. It turns out that there is an official standard for how hypot implementations should behave in certain edge cases. The IEEE-754 standard for floating-point arithmetic has something to say about how any implementation of hypot handles NaNs (Not a Number) and inf (Infinity).

It states that any implementation of hypot should behave as follows (here’s a human-readable summary: https://www.agner.org/optimize/nan_propagation.pdf):

hypot(nan,inf) = hypot(inf,nan) = inf

numpy behaves well!

np.hypot(np.nan,np.inf)

inf

np.hypot(np.inf,np.nan)

inf

My implementation does not

mike_hypot(np.inf,np.nan)

nan

So in summary, my implementation is

  • Wrong for very large numbers
  • Wrong for very small numbers
  • Not standards compliant

That’s a lot of mistakes for one line of code! Of course, we can do better with a small number of extra lines of code as John D Cook demonstrates in the blog post What’s so hard about finding a hypotenuse?
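
To give a flavour of what a few extra lines buy you, here is a minimal sketch of the classic rescaling trick (my own illustration, not John’s code and certainly not the production algorithm): factor out the larger magnitude so that the squared term can no longer overflow.

import numpy as np

def mike_hypot2(a, b):
    # Rescale: hypot(a,b) = big * sqrt(1 + (small/big)**2).
    # The ratio small/big is at most 1, so squaring it cannot overflow,
    # and it cannot underflow in a way that ruins the result.
    a, b = abs(a), abs(b)
    big, small = max(a, b), min(a, b)
    if big == 0.0:
        return 0.0
    return big * np.sqrt(1.0 + (small / big) ** 2)

This sketch handles the huge and tiny inputs shown above, but it still gets the IEEE-754 NaN/inf cases wrong, which hints at why production implementations run to a hundred-plus lines.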

Hypot implementations in production

Production versions of the hypot function, however, are much more complex than you might imagine. The source code for the implementation used in openlibm (used by Julia, for example) was 132 lines long last time I checked. Here’s a screenshot of part of the implementation, saved for posterity. At the time of writing the code is at https://github.com/JuliaMath/openlibm/blob/master/src/e_hypot.c

 

[Screenshot: part of openlibm’s e_hypot.c implementation]

 

That’s what code looks like when it’s bullet-proof, bug-checked, and has been compiled on every platform you can imagine and survived.

There’s more!

Active Research

When I learned how complex production versions of hypot could be, I shouted out about it on twitter and learned that the story of hypot was far from over!

The implementation of the hypot function is still a matter of active research! See the paper here https://arxiv.org/abs/1904.09481

[Screenshot: tweet about the hypot paper]

Is Your Research Software Correct?

Given that such a ‘simple’ computation is so complicated to implement well, consider your own code and ask: Is Your Research Software Correct?

November 20th, 2018

In a recent blog post, Daniel Lemire explicitly demonstrated that vectorising random number generators using SIMD instructions could give useful speed-ups. This reminded me of one of the first times I played with the Julia language, when I learned that Julia’s random number generator used a SIMD-accelerated implementation of Mersenne Twister called dSFMT to generate random numbers much faster than MATLAB’s Mersenne Twister implementation.

Just recently, I learned that MATLAB now has its own SIMD accelerated version of Mersenne Twister which can be activated like so:

seed=1;
rng(seed,'simdTwister')

This new Mersenne Twister implementation gives different random variates from the original implementation (which, as I demonstrated in an older post, is the same as Numpy’s), as you might expect:

>> rng(1,'Twister')
>> rand(1,5)
ans =
0.4170 0.7203 0.0001 0.3023 0.1468
>> rng(1,'simdTwister')
>> rand(1,5)
ans =
0.1194 0.9124 0.5032 0.8713 0.5324

So it’s clearly a different algorithm and, on CPUs that support the relevant instructions, it’s about twice as fast!  Using my very unscientific test code:

format compact
number_of_randoms = 10000000
disp('Timing standard twister')
rng(1,'Twister')
tic;rand(1,number_of_randoms);toc
disp('Timing SIMD twister')
rng(1,'simdTwister')
tic;rand(1,number_of_randoms);toc

gives the following results for a typical run on my Dell XPS 15 9560, which supports AVX instructions:

number_of_randoms =
    10000000
Timing standard twister
Elapsed time is 0.101307 seconds.
Timing SIMD twister
Elapsed time is 0.057441 seconds.

The MATLAB documentation does not tell us which algorithm their implementation is based on but it seems to be different from Julia’s. In Julia, if we set the seed to 1 as we did for MATLAB and ask for 5 random numbers, we get something different from MATLAB:

julia> using Random
julia> Random.seed!(1);
julia> rand(1,5)
1×5 Array{Float64,2}:
 0.236033  0.346517  0.312707  0.00790928  0.48861

The performance of MATLAB’s new generator is on par with Julia’s, although I’ll repeat that these timing tests are far, far from rigorous.

julia> Random.seed!(1);

julia> @time rand(1,10000000);
  0.052981 seconds (6 allocations: 76.294 MiB, 29.40% gc time)
August 29th, 2018

Over the years I’ve been blogging, I have run a few recurring series of blog posts. In the early days, there was the short-lived Problem of The Week. Sometime later I inherited The Carnival of Maths, which I looked after for a couple of years before passing it over to Aperiodical.com, who have looked after it ever since. I also ran a series called A Month of Math Software for two and a half years before my enthusiasm for the topic ran out.

I am currently the Head of Research Computing at The University of Leeds — a senior management role that puts me in the fortunate position of being reasonably well-informed about the world of research computing.  Software, hardware, cloud, data science, dealing with sensitive data…everyone, it seems, has something to tell me.  I’m also continuing with my EPSRC fellowship part-time which means that I’m rather more hands on than a typical member of an executive leadership team.

While at JuliaCon 2018, I had the extremely flattering experience of a few people telling me that they had been long-time readers of WalkingRandomly and that they were disappointed that I didn’t post as often as I used to.

All of this has led to the desire to start a new regular series, one where I look at all aspects of research computing and compile them into a series of monthly posts. If you have anything you’d like included in next month’s post, contact me via the usual channels.

Botched code causes seven-year scientific argument

For the last couple of years, I have given a talk around the UK and Europe called ‘Is Your Research Software Correct?’ (unlike this other talk of mine, it has not yet been recorded but I’ll soon remedy that! Let me know if you can offer a venue with good recording facilities).

I start off by asking the audience to imagine…imagine that the results of your latest simulation or data analysis are in and they are amazing. Your heart beats faster; this result changes everything and you know it. This is why you entered science, this is what you always hoped for. Papers in journals like Nature or Science: no problem. A huge grant to follow up this work…a professorship…maybe, you dare to dream, this could lead to a Nobel Prize. Only one minor problem: the code is completely wrong and you just haven’t figured it out yet.

In the talk (based originally on an old blog post here) I go on to suggest and discuss some simple practices that might help the situation: scripting and coding instead of pointy-clicky solutions, version control, testing, open-sourcing your software, software citation and so on. None of it is mind-blowing but I firmly believe that if all of the advice were taken, we’d have fewer situations like this one…

Long story short: two groups were investigating what happens when you super-freeze water. They disagreed and much shouting happened for seven years. There was a bug in the code of one group. A great article discussing the saga is available over at Physics Today:

https://physicstoday.scitation.org/do/10.1063/PT.6.1.20180822a/full/

Standout quotes that may well end up in a future version of Is Your Research Software Correct include

“One of the real travesties,” he says, is that “there’s no way you could have reproduced [the Berkeley team’s] algorithm—the way they had implemented their code—from reading their paper.” Presumably, he adds, “if this had been disclosed, this saga might not have gone on for seven years”

and

Limmer maintains that he and his mentor weren’t trying to hide anything. “I had and was very willing to share the code,” he says. What he didn’t have, he says, was the time or personnel to prepare the code in a form that could be useful to an outsider. “When Debenedetti’s group was making their code available,” Limmer explains, “he brought people in to clean up the code and document and run tests on it. He had resources available to him to be able to do that.” At Berkeley, “it was just me trying to get that done myself.”

Which is a case study for asking for Research Software Engineer support in a grant if ever I saw one.

Julia gets all grown up — version 1.0 released at JuliaCon 2018

One of the highlights of the JuliaCon 2018 conference was the release of Julia version 1.0, a milestone that signifies that the new-language-on-the-block has reached a certain level of maturity. We celebrated the release at the University of Leeds by installing it on our most recent HPC system, ARC3.

In case you don’t know, Julia is a relatively new free and open source language for technical computing. It works on everything from a Raspberry Pi up to HPC systems with thousands of cores. It’s the reason for the letters ‘Ju’ in Project Jupyter and it aims to be an easy-to-use language (along the lines of Python, R or MATLAB) with the performance of languages like Fortran or C.

UK Research Software Engineering Association Webinar series

The UK Research Software Engineering Association is starting a new webinar series this month with a series of planned topics including an Introduction to Object-Oriented Design for Scientists, Interfacing to/from Python with C, FORTRAN or C++ and Meltdown for Dummies.

These webinars are free to join, and you do not need to register in advance. Full details including the link to join the webinar are available below.

For more information on the RSE webinar series, including information on how to propose a webinar and information on upcoming webinars, please see:

https://rse.ac.uk/events/rse-webinar-series

This page will also have links to recordings of past webinars when they become available.

Verification and Modernisation of Fortran Codes using the NAG Fortran Compiler

There is still a huge amount of research software written in Fortran. Indeed, codes written in Fortran are, by far, the most popular ones run on the UK’s national supercomputer service, Archer (see http://www.archer.ac.uk/status/codes/ for up-to-date stats).

Fortran compilers are not created equal and many professional Fortran developers will suggest that you develop against more than one. Gfortran is probably essential if you want your code to be usable by all, but the Intel Fortran Compiler can often produce the fastest executables on x86 hardware and the NAG Compiler is useful for ensuring correctness.

This webinar by NAG’s Wadud Miah promises to show what the NAG Fortran Compiler can do for your Fortran code.

New MacBook Pro has 6 CPU cores but….

Apple’s new MacBook Pro laptops have fantastic-looking CPUs in them, with the top of the line boasting 6 cores and turbo boost up to 4.8GHz. Sounds amazing for simulation work, but it seems that there are some thermal issues that prevent it from running at top speed for long.

Contact me to get your news included next month

That’s all I have for this first article in the series.  If you have any research computing news that you’d like included in the next edition, contact me.

August 24th, 2018

Audiences can be brutal

I still have nightmares about the first talk I ever gave as a PhD student. I was not a strong presenter, my grasp of the subject matter was still very tenuous and I was as nervous as hell. Six months or so into my studentship, I was to give a survey of the field I was studying to a bunch of very experienced researchers.  I spent weeks preparing…practicing…honing my slides…hoping that it would all be good enough.

The audience was not kind to me! Even though it was only a small group of around 12 people, they were brutal! I felt like they leaped upon every mistake I made, relished pointing out every misunderstanding I had and all-round gave me a very hard time. I had nothing like the robustness I have now and very nearly quit my PhD the very next day. I can only thank my office mates and enough beer to kill a pony for collectively talking me out of quitting.

I remember stopping three quarters of the way through saying ‘That’s all I want to say on the subject’ only for one of the senior members of the audience to point out that ‘You have not talked about all the topics you promised’. He made me go back to the slide that said something like ‘Things I will talk about’ or ‘Agenda’ or whatever else I called the stupid thing and say ‘Look…you’ve not mentioned points X, Y and Z’ [1].

Everyone agreed and so my torture continued for another 15 minutes or so.

Practice makes you tougher

Since that horrible day, I have given hundreds of talks to audiences that range in size from 5 up to 300+ and this amount of practice has very much changed how I view these events.  I always enjoy them…always!  Even when they go badly!

In the worst case scenario, the most that can happen is that I get given a mildly bad time for an hour or so of my life but I know I’ll get over it. I’ve gotten over it before. No big deal! Academic presentations on topics such as research computing rarely lead to life threatening outcomes.

But what if it was recorded?!

Anyone who has worked with me for an appreciable amount of time will know of my pathological fear of having one of my talks recorded. Point a camera at me and the confident, experienced speaker vanishes and is replaced by someone much closer to the terrified PhD student of my youth.

I struggle to put into words what I’m so afraid of but I wonder if it ultimately comes down to the fact that if that PhD talk had been recorded and put online, I would never have been able to get away from it. My humiliation would be there for all to see…forever.

JuliaCon 2018 and Rise of the Research Software Engineer

When the organizers of JuliaCon 2018 invited me to be a keynote speaker on the topic of Research Software Engineering, my answer was an enthusiastic ‘Yes’. As soon as I learned that they would be live streaming and recording all talks, however, my enthusiasm was greatly dampened.

‘Would you mind if my talk wasn’t live streamed and recorded’ I asked them.  ‘Sure, no problem’ was the answer….

Problem averted. No need to face my fears this week!

A fellow delegate of the conference pointed out to me that my talk would be the only one that wouldn’t be on the live stream. That would look weird and not in a good way.

‘Can I just be live streamed but not recorded’ I asked the organisers.  ‘Sure, no problem’ [2] was the reply….

Later on, the technician told me that I could have it recorded but that it would be instantly hidden from the world until I had watched it and agreed it wasn’t too terrible. Maybe this would be a nice first step in my record-a-talk-a-phobia therapy, he suggested.

So…on I went and it turned out not to be as terrible as I had imagined it might be.  So we published it. I learned that I say ‘err’ and ‘um’ a lot [3] which I find a little embarrassing but perhaps now that I know I have that problem, it’s something I can work on.

Rise of the Research Software Engineer

Anyway, here’s the video of the talk. It’s about some of the history of the Research Software Engineering movement and how I worked with some awesome people at The University of Sheffield to create an RSE group. If you are the computer person in your research group who likes software more than papers, you may be one of us. Come join the tribe!

Slide deck at mikecroucher.github.io/juliacon2018/

Feel free to talk to me on twitter about it: @walkingrandomly

Thanks to the infinitely patient and wonderful organisers of JuliaCon 2018 for the opportunity to beat one of my long standing fears.

Footnotes

[1] Pro-Tip: Never do one of these ‘Agenda’ slides…give yourself leeway to alter the course of your presentation midway through depending on how well it is going.

[2] So patient! Such a lovely team!

[3] Like A LOT! My mum watched the video and said ‘No idea what you were talking about but OMG can you cut out the ummms and ahhs’

June 13th, 2016

I was in a funk!

Not long after joining the University of Sheffield, I helped convince a raft of lecturers to switch to using the Jupyter notebook for their lecturing. It was an easy piece of salesmanship and a whole lot of fun to do. Lots of people were excited by the possibilities.

The problem was that the University-managed desktop was incapable of supporting an instance of the notebook with all of the bells and whistles included. As a cohort, we needed support for Python 2 and 3 kernels as well as R and even Julia. The R install needed dozens of packages and support for Bioconductor. We needed LaTeX support to allow export to PDF and so on. We also needed to keep up to date because Jupyter development moves pretty fast! When all of this was fed into the managed desktop packaging machinery, it died. They could give us a limited, basic install but not one with batteries included.

I wanted those batteries!

In the early days, I resorted to strange stuff to get through the classes but it wasn’t sustainable. I needed a miracle to help me deliver some of the promises I had made.

Miracle delivered – SageMathCloud

During the kick-off meeting of the OpenDreamKit project, someone introduced SageMathCloud to the group. This thing had everything I needed and then some! During that presentation, I could see that SageMathCloud would solve all of our deployment woes as well as providing some very cool stuff that simply wasn’t available elsewhere. One killer application, for example, was Google-Docs-style collaborative editing of Jupyter notebooks.

I fired off a couple of emails to the lecturers I was supporting (“Everything’s going to be fine! Trust me!”) and started to learn how to use the system to support a course. I fired off dozens of emails to SageMathCloud’s excellent support team and started working with Dr Marta Milo on getting her Bioinformatics course material ready to go.

TL;DR: The course was a great success and a huge part of that success was the SageMathCloud platform.

Giving back – A tutorial for lecturers on using SageMathCloud

I’m currently working on a tutorial for lecturers and teachers on how to use SageMathCloud to support a course. The material is licensed CC-BY and is available at https://github.com/mikecroucher/SMC_tutorial 

If you find it useful, please let me know. Comments and Pull Requests are welcome.

April 18th, 2016

In this article, I find myself in the rather odd position of interviewing myself as part of my series of interviews with the new cohort of EPSRC Research Software Engineering Fellows.

Could you tell us a little about yourself and how you became a Research Software Engineer?

I completed a PhD in theoretical physics in 2005 at The University of Sheffield where my area of research was photonic crystals. The most important thing I learned during my PhD is that I was a lot better at solving computational problems than I was at Physics. In particular, I seemed to be a lot better at solving other people’s problems rather than inventing and solving my own.

This led me to consider a job at The University of Manchester in the centralised Applications Support Team within IT Services. This team looked fantastic! Its role was to support Manchester’s extensive site-licensed application portfolio – MATLAB, Mathematica, Intel Compilers, SPSS, Abaqus…that sort of thing. It included aspects of licensing, sysadmin, installation support, high performance computing, consultancy, teaching – you name it! Sadly, 6 months after I started at Manchester, there was the first of many IT department restructures and the team was disbanded.

The centralised service was devolved into the faculties (it’s centralised again now!) and I ended up in the Faculty of Engineering and Physical Sciences. I took the responsibility of supporting a portfolio of applications with me. Broadly speaking, I became ‘the MATLAB guy’ at Manchester but also started extending support into the open source world: Python, R, Octave and so on. This was done organically: I went where the work took me. If lots of academics came with problems in foo, I learned about foo, solved problems in foo and started teaching how to do foo better. If ‘foo’ happened to be distasteful to me (such as VBA), well, tough! To be successful in support, you must do the job that’s put in front of you. It’s a little like an Accident and Emergency department.

Very early on, I learned that the path to a researcher’s heart was speed. They wanted results and they wanted them fast! There was a team at Manchester doing hardcore Fortran/MPI stuff and they had that market sewn up — there was no space for the likes of me there. So, I took advantage of being ‘The MATLAB guy’ and started offering programming services to whoever wanted them for free. I could often achieve 100x speed-ups with less than a day’s work and this made me pretty popular!

I got to see a lot of code and that’s where the problems started. I’d get code with names like phdresults_dec2006_ver12_broken_fixed_FORMIKE.m that wouldn’t run on my machine. I’d learn that my machine was the second machine the code had ever been run on and that I was the second person to ever see it. There would be thousands of lines of code with no version control and no tests, which made refactoring scary!

I learned that a huge amount of computational research was being done inefficiently. I feel that I need to be very clear: when I say ‘inefficient’, I’m not referring to Fortran code, say, that’s not making optimal use of the cache, SIMD instructions or has poor scaling over 128 cores. When I was young, this is where I thought the work would be. Sure, there’s some, but that’s not where you can help the most people.

A much more common situation, I find, is a researcher whose workflow includes a huge amount of manual work: PhD students (and in one notable case, a very senior professor!) who manually edit hundreds of spreadsheets, for example. A morning’s work spent on some simple automation can completely change their lives!

These experiences got me interested in how to improve the general level of software engineering practice in research. I became a Software Sustainability Institute fellow in 2013, discovered this huge, amazing community and the rest is history.

You recently left Manchester University to move to Sheffield? What was behind that?

Prof. Neil Lawrence met me in The Sheffield Tap one evening and said ‘How would you like to ditch your commute and change the world?’  He was interested in bringing some of the research software initiatives I’d worked on in Manchester over to Sheffield as part of his Open Data Science initiative.

Changing the world sounded interesting but he had me at ‘ditch the commute’ if I’m being honest. I’ve lived in Sheffield for 20 years and commuted to Manchester by train for 10 of those. On some days, my twitter feed @walkingrandomly degenerated into little more than rants against various train companies! I needed to stop.

Working with Neil and the University of Sheffield has been an amazing experience. There’s a vibrancy here that’s infectious and a strong desire to do better in Research Software. That Sheffield won 2 of the 7 Research Software Engineering fellowships on offer was like a dream come true. The other Sheffield RSE fellow is Paul Richmond and we’ve joined forces to provide the strongest research software service we can to The University of Sheffield.

What do you think is the role of a Research Software Engineer?

I’m going to lift the answer to this straight out of my fellowship application.

Technological development in software is more like a cliff-face than a ladder – there are many routes to the top, to a solution. Further, the cliff face is dynamic – constantly and quickly changing as new technologies emerge and decline. Determining which technologies to deploy and how best to deploy them is in itself a specialist domain, with many features of traditional research.

Researchers need empowerment and training to give them confidence with the available equipment and the challenges they face. This role, akin to that of an Alpine guide, involves support, guidance, and load carrying. When optimally performed it results in a researcher who knows what challenges they can attack alone, and where they need appropriate support. Guides can help decide whether to exploit well-trodden paths or explore new possibilities as they navigate through this dynamic environment.

These guides are highly trained, technology-centric, research-aware individuals who have a curiosity driven nature dedicated to supporting researchers by forging a research software support career. Such Research Software Engineers (RSEs) guide researchers through the technological landscape and form a human interface between scientist and computer. A well-functioning RSE group will not just add to an organisation’s effectiveness, it will have a multiplicative effect since it will make every individual researcher more effective. It has the potential to improve the quality of research done across all University departments and faculties.

Are there any downsides to being a Research Software Engineer?

Something I’ve learned from conducting these interviews is that there are several different types of ‘Research Software Engineer’. We are not a ‘one size fits all’ community! I think that one thing we all have in common is that we don’t fit the normal ‘money-in, papers-out’ model of many academics. This was brought up in Louise Brown’s interview and it strongly resonates with me. This situation makes it difficult for us to follow an academic-like career path.

It is extremely difficult, for example, to get promoted as a research programmer without attempting to become something you are not. Worse, it’s difficult to simply get a permanent job! Many RSEs are on short term contracts with low salaries. In short, you get much of the grief of working in academia without any of the benefits. Little wonder, then, that many of the best in the community choose to work in industry.

An alternative path for RSEs is to work in the University IT department. It’s the path that I took for example. This solves the short term contract issue but brings with it a whole new set of problems. Many IT managers simply don’t understand the value that an RSE can bring to a University. You can sum the issue up with the observation ‘Academics rarely complain to the head of IT that there’s no one around who can optimise their MATLAB code but they complain very quickly when MATLAB doesn’t work on the University managed desktop’. So, guess what I’d get assigned to?

We’ve established that RSEs aren’t ‘normal academics’ and they aren’t ‘normal IT support’ either so where do we fit? I’m trying to help figure that out and help provide an environment where RSEs can not only exist but thrive.

You’ve recently won an EPSRC RSE Fellowship! Can you give a brief overview of your project?

I aim to improve the research software of the ‘long tail scientist’. This term, attributed to Jim Downing of the Unilever Centre for Molecular Informatics, refers to the large number of small research units who perform a huge amount of research. Often, these small research units only have one or two people in them. They aren’t “big science” but there are LOTS of them!

Much of this research involves the generation of code by relatively untrained and inexperienced programmers. This code can benefit greatly from input by RSEs. An experienced RSE can, with relatively little effort, significantly enhance the quality and efficiency of such code whilst simultaneously providing training for the researcher who wrote it. For examples of what I mean, see my Testimonials page.

I will improve scientific software efficiency, sustainability and reproducibility at the University of Sheffield, by working alongside researchers on their research code in a consultative manner. Rather than working prescriptively, my approach is based on offering and implementing a series of nudges. Nudges are interventions that alter people’s behaviour in a predictable way without forbidding any options. In the context of research software, example nudges might include ‘Learn to write idiomatically in your language of choice – it can lead to faster execution’, ‘See how unit tests allow us to make changes with confidence’ or ‘Using version control, we always know which code produced what result’.

The gulf between the scientific computing ‘elite’ and those emailing spreadsheets is growing and I aim to close that gap. One researcher I worked with recently said ‘You provide the next step after we’ve been on a Software Carpentry course.’ and I think that describes what I’m trying to do quite well.

How long did it take you to write your Fellowship application? Any other thoughts/advice on the application process?

Writing my fellowship application was one of the most difficult writing exercises I’ve ever undertaken! It took just over a month to write and during that time I did very little else. It took up my days, my evenings, my weekends, my every waking thought. It even consumed my dreams. It was exhausting!

Something that surprised me was the number of people who I needed to help make it happen. Fellowships are often made out to be very individual things but my application involved work by over 40 people! This includes university administrators at all levels, project partners, advisors and mentors. I had to navigate areas of University life and systems that were completely alien to me. There is no way I could have done it alone.

It is essential to get institutional support for your application. At the most basic level, you need a manager who is happy for you to go AWOL for a few weeks. At a higher level, you need to be able to demonstrate to the funding body that your University is fully behind you and your project.

You also need to be emotionally resilient. I poured my heart and soul into my first draft and the feedback from one of my advisors was ‘Well, you solved the blank-page problem.’ That was the only positive thing she had to say! Everything else was a tearing apart. It was brutal! I think I might have cried a little.

Every time I did a rewrite, my mentors found more weaknesses and beat up on me a little. This feedback was essential and made the application so much stronger. As such, I think one piece of advice I’d give is ‘Find mentors you trust who are going to be crushingly hard on you’.  It’ll hurt but nowhere near as much as the comments of Reviewer 2 ;)

Who are your project partners?

My style of working is extremely collaborative. As such I have a lot of formal project partners: The Software Sustainability Institute, The University of Manchester, UCL, Microsoft Research, Dassault Systèmes, Wolfram Research, Mathworks, The N8 Research Partnership, Maplesoft and NAG.

Tell me about your RSE group.

Sheffield has two EPSRC RSE fellows and we’ve teamed up to form the Sheffield Research Software Engineering group. We’ve only existed for a month! At the moment it’s just us, but we have funds to recruit a few more people so watch this space.

Which programming languages and technologies do you regularly use?

I don’t get to choose what languages I use — the researchers I support do that for me. As such, I’m doing a lot of MATLAB, Python and R these days. For compiled languages, I tend to use either C or C++. There’s also some Mathematica and Maple sprinkled here and there.

I help support Sheffield’s High Performance Computing Service so also do a reasonable amount of Bash scripting and parallel computing.

Are there any languages/technologies that you used to use a lot but have now moved away from? Why?

I used to use Fortran back in the day but don’t seem to need it much now — it’s been a long time since I did a project with it. A couple of groups are offering to teach ‘Modern Fortran’ for us at Sheffield so perhaps I’ll take another look?

I used to like Perl, and even taught a one-day course on it 10 years ago but I strongly prefer Python and so, it seems, do the researchers I support.

Is there anything on your ‘to-learn’ list?

  • Cloud computing: I’ve started doing some small projects using Amazon EC2 and feel very much a newbie at the moment. I can figure out how to do things but am not sure if what I am doing is good practice or not.
  • Docker: I understand the basics but am yet to figure out how to use the technology properly for research.
  • Julia: I played with it a little a few years ago and really like it. There’s a lot of buzz around the language. No one has come to me with a Julia problem yet but I think it’s just a matter of time.
  • Modern OpenMP: I learned OpenMP a long time ago. It’s time for an update.
November 19th, 2015

Welcome to the 128th Carnival of Mathematics, the latest in a mathematical blogging tradition that’s been ongoing for over 8 years now!

Facts about 128

It’s said that every number is interesting and 128 is no exception. 128 is the largest number which is not the sum of distinct squares whereas it is the smallest number n such that dropping the first and the last digit of n leaves its largest prime factor (thanks, Number Gossip).

Wikipedia tells us that it is divisible by the total number of its divisors, making it a refactorable number. Additionally, 128 can be expressed by a combination of its digits with mathematical operators as 128 = 2^(8 − 1), making it a Friedman number in base 10.
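
Both facts are easy to check with a throwaway Python snippet (mine, not part of any carnival submission):

n = 128

# Refactorable: n is divisible by its number of divisors
divisors = [d for d in range(1, n + 1) if n % d == 0]
print(len(divisors), n % len(divisors) == 0)   # 8 True

# Friedman: 128 written using its own digits 1, 2 and 8
print(2 ** (8 - 1) == n)                       # True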

128 was also the number of kilobytes of memory available in the magnificent computer shown below.

[Photo of the computer, © Bill Bertram 2006, CC-BY-2.5]

 

The Princeton Companion to Applied Mathematics
I recently received a copy of The Princeton Companion to Applied Mathematics and it’s just beautiful; definitely recommended as a Christmas gift for the maths geek in your life. The companion’s editor, Nick Higham, has written a few blog posts about it: Companion authors speaking about their work, Famous Mathematicians and The Princeton Companion, and How to Use The Princeton Companion to Applied Mathematics.

We have a lot of problems, and that’s a good thing
‘Diane G’ submitted this advanced knowledge problem — great practice for advanced mathematics. This blog is amazing and posts practice problems every Monday and advanced problems every Wednesday.

Linear Programming
Laura Albert McLay of Punk Rock Operations Research (great blog title!) submitted two great posts: Should a football team run or pass? A game theory and linear programming approach and dividing up a large class into discussion sections using integer programming

Francisco Yuraszeck submitted 10 Things You need to know about Simplex Method saying ‘This article is about the basic concepts of Linear Programming and the Simplex Method for beginners in Operations Research.’

Computation
Stuart Mumford demonstrates various ways of computing the first 10,000 numbers in the Fibonacci Sequence using Python — and some are much faster than others. Laurent Gatto followed up with a version in R.

Cleve Moler, the original developer of MATLAB, looks at three algorithms for finding a zero of a function of a real variable.

Michael Trott of Wolfram Research looks at Aspect Ratios in Art: What Is Better Than Being Golden? Being Plastic, Rooted, or Just Rational? Investigating Aspect Ratios of Old vs. Modern Paintings

Andrew Collier explores Fourier Techniques in the Julia programming language.

Optimisation

The Numerical Algorithms Group’s John Muddle looks at solving The Travelling Rugby Fan Problem.

Robert Fourer gives us two articles on Quadratic Optimization Mysteries: Part 1 and Part 2. These are posts concerned with computational aspects of mathematical optimization, and specifically with the unexpected behavior of large-scale optimization algorithms when presented with several related quadratic problems.

Why Was 5 x 3 = 5 + 5 + 5 Marked Wrong
This image went viral recently

It generated a LOT of discussion. Brett Berry takes a closer look in Why Was 5 x 3 = 5 + 5 + 5 Marked Wrong.

Misc

Katie Steckles submitted an article that analyses the different visual themes explored by M.C. Escher in his artwork

Shecky R writes about our curious fascination with eccentric and top-notch mathematicians in Pursuing Alexander.

Brian Hayes has been Pumping the Primes and asks “Should we be surprised that a simple arithmetic procedure–two additions, a gcd, and an equality test–can pump out an endless stream of pure primality?”

Next time

Carnival of Maths #129 will be delivered by the team at Ganit Charcha. Head over to the main carnival website for more details.