April 23rd, 2019 | Categories: Linux | Tags:

My preferred workflow for writing technical documents these days is to write in Markdown (or Jupyter Notebooks) and then use Pandoc to convert to PDF, Microsoft Word or whatever format is required by the end client.

While working with a Markdown file recently, I found that the Pandoc conversion to PDF failed with the following error message

! Undefined control sequence.
l.117 \[ \coloneqq

This happens because Pandoc first converts the Markdown file to LaTeX, which is then compiled to PDF, and the \coloneqq command isn’t provided by any of the LaTeX packages that Pandoc uses by default.
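
For concreteness, a Markdown file containing display math along these lines (a minimal, made-up example rather than my original document) is enough to trigger the error

Define the quantity $y$ by

\[ y \coloneqq x^2 + 1 \]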

The \coloneqq command is provided by the mathtools package which, on Ubuntu, can be installed using

sudo apt-get install texlive-latex-recommended

Once we have the package installed, we need to tell Pandoc to make use of it. The way I did this was to create a Pandoc metadata file that contained the following

---
header-includes: |
            \usepackage{mathtools}
---

I called this file pandoc_latex.yml and passed it to Pandoc as follows

pandoc --metadata-file=./pandoc_latex.yml ./input.md -o output.pdf

On one system I tried this on, I received the following error message

pandoc: unrecognized option `--metadata-file=./pandoc_latex.yml'

which suggests that the --metadata-file option is a relatively recent addition to Pandoc. I have no idea when this option was added but if this happens to you, you could try installing the latest version from https://github.com/jgm/pandoc/

I used Pandoc 2.7.1.1 and it was fine, so I guess anything later than this should also be OK.
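
If upgrading isn’t an option, Pandoc’s long-standing --include-in-header (-H) flag should achieve the same effect on older versions. A minimal sketch, assuming a small file called header.tex that contains just \usepackage{mathtools}

pandoc -H ./header.tex ./input.md -o output.pdf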

April 17th, 2019 | Categories: Fortran, programming, retro computers | Tags:

It started with a tweet

While basking in some geek nostalgia on twitter, I discovered that my first ever microcomputer, the Sinclair Spectrum, once had a Fortran compiler

mira

However, that compiler was seemingly lost to history and was declared Missing in Action on World of Spectrum.

mira2

A few of us on Twitter enjoyed reading the 1987 review of this Fortran Compiler but since no one had ever uploaded an image of it to the internet, it seemed that we’d never get the chance to play with it ourselves.

I never thought it would come to this

One of the benefits of having 5000+ followers on Twitter is that there’s usually someone who knows something interesting about whatever you happen to tweet about and, in this instance, that someone was my fellow Fellow of the Software Sustainability Institute, Barry Rowlingson.  Barry was fairly sure that he’d recently packed a copy of the Mira Fortran Compiler away in his loft and was blissfully unaware of the fact that he was sitting on a missing piece of microcomputing history!

mira3

He was right! He did have it in the attic…and members of the community considered it valuable.

mira_box

As Barry mentioned in his tweet, converting a decades-old cassette to an archivable .tzx format is a process that could result in permanent failure.  The attempt on side 1 of the cassette didn’t work but fortunately, side 2 is where the action was!

makeTzx

It turns out that everything worked perfectly.  On loading it into a Spectrum emulator, Barry could enter and compile Fortran on this platform for the first time in decades! Here is the source code for a program that computes prime numbers

prime_source

Here it is running

running_primes

and here we have Barry giving the sales pitch on the advanced functionality of this compiler :)

high_res

How to get the compiler

Barry has made the compiler, and scans of the documentation, available at https://gitlab.com/b-rowlingson/mirafortran

April 10th, 2019 | Categories: Linear Algebra, NAG Library, python | Tags:

I recently wrote a blog post for my new employer, The Numerical Algorithms Group, called Exploiting Matrix Structure in the solution of linear systems. It’s a demonstration that shows how choosing the right specialist solver for your problem, rather than using a general-purpose one, can lead to a speed-up of well over 100 times!  The example is written in Python but the NAG routines used can be called from a range of languages including C, C++, Fortran and MATLAB.
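
The post itself uses NAG routines, but the general idea is easy to illustrate with any library that offers structure-aware solvers. Here’s a rough sketch using SciPy (not the code from the post) that solves the same tridiagonal system twice: once with a general dense solver and once with a banded solver that is told about the structure

import time
import numpy as np
from scipy.linalg import solve, solve_banded

n = 2000
# Build a tridiagonal test matrix: 2 on the diagonal, -1 on the off-diagonals
main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
b = np.random.rand(n)

# General-purpose dense solver: ignores the structure entirely
t0 = time.perf_counter()
x_dense = solve(A, b)
print("dense solve:  ", time.perf_counter() - t0, "seconds")

# Banded solver: works only with the three non-zero diagonals
ab = np.zeros((3, n))
ab[0, 1:] = off    # superdiagonal
ab[1, :] = main    # main diagonal
ab[2, :-1] = off   # subdiagonal
t0 = time.perf_counter()
x_banded = solve_banded((1, 1), ab, b)
print("banded solve: ", time.perf_counter() - t0, "seconds")

print("Results agree:", np.allclose(x_dense, x_banded))

The banded solve works on just the three stored diagonals and scales roughly linearly with n, whereas the dense solve is O(n³), which is where speed-ups of the size mentioned above come from.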

April 3rd, 2019 | Categories: GPU, RSE, University of Sheffield | Tags:

My friends over at the University of Sheffield Research Software Engineering group are running a GPU Hackathon sponsored by Nvidia. The event will be on August 19-23, 2019 in Sheffield, United Kingdom.  The call for proposals is at http://gpuhack.shef.ac.uk/

The Sheffield team have this to say about the event:

We are looking for teams of 3-5 developers with a scalable** application to port to or optimize on a GPU accelerator. Collectively the team must have complete knowledge of the application. If the application is a suite of apps, no more than two per team will be allowed and a minimum of 2 people per app must attend. Space will be limited to 8 teams.

** By scalable we mean node-to-node communication implemented, but don’t be discouraged from applying if your application is less than scalable. We are also looking for breadth of application areas.

The goal of the GPU hackathon is for current or prospective user groups of large hybrid CPU-GPU systems to send teams of at least 3 developers along with either:

  • A (potentially) scalable application that could benefit from GPU accelerators, or
  • An application running on accelerators that needs optimization.

There will be intensive mentoring during this 5-day hands-on workshop, with the goal that the teams leave with applications running on GPUs, or at least with a clear roadmap of how to get there.

March 11th, 2019 | Categories: HPC | Tags:

One of the highlights of my attendance at the 2019 RICE Oil and Gas High Performance Computing event was the Women in HPC workshop. As a profession, HPC has very similar diversity issues to Research Software Engineering. In short, we need more diversity and the Women in HPC organisation is working towards that aim (side note – I am very happy that my employer, NAG, is a supporter of Women in HPC!)

The following call for participation was forwarded to me by Mozhgan Kabiri Chimeh who asked me to repost it.

The tenth international Women in HPC workshop at ISC19 will discuss methods to improve diversity and provide early career women with the opportunity to develop their professional skills and profile. The workshop will include:

  • Becoming an advocate and ally of under-represented women
  • Putting in place a framework to help women take leadership positions
  • Building mentoring programmes that work effectively for women.
  • Posters and lightning talks by women working in HPC
  • Short talks on: dealing with poor behaviour at work and how to help avoid it getting you down, how to deal with negative feedback, how to build writing into your daily routine and why it matters, etc.

Deadline for submissions: March 18th 2019

As part of the workshop, we invite submissions from women in industry and academia to present their work as a poster. Submissions are invited on all topics relating to HPC from users and developers. All abstracts should emphasize the computational aspects of the work, such as the facilities used, the challenges that HPC can help address and any remaining challenges etc.

Exclusive to WHPC at ISC19: Successful authors will have the opportunity to present their poster in the main ISC19 conference poster session.

For full details please see: https://www.womeninhpc.org/whpc-isc19/workshop/submit/

Workshop Organising Committee

  • Workshop Chair: Mozhgan Kabiri Chimeh (University of Sheffield, UK)
  • Co-chair: Toni Collis (Appentra S.L., Spain)
  • Co-chair: Misbah Mubarak (Argonne National Laboratory, USA)
  • Submissions Chair: Weronika Filinger (EPCC, University of Edinburgh, UK)
  • Submissions Vice Chair: Khomotso Maenetja (University of Limpopo, South Africa)
  • Mentoring Chair: Elsa Gonsiorowski (Lawrence Livermore National Laboratory, USA)
  • Invited Talks Chair: Gokcen Kestor (Pacific Northwest National Laboratory, USA)
  • Publicity Chair: Cristin Meritt (Alces Flight, UK)
  • Publicity Vice Chair: Aiman Shaikh (Science and Technology Facilities Council, UK)
  • Volunteer: Shabnam Sadegh (Technical University of Munich)

January 23rd, 2019 | Categories: C/C++, Linux, programming | Tags:

I have been an advocate of the Windows Subsystem for Linux ever since it was released (see Bash on Windows: The scripting game just changed) since it allows me to use the best of Linux from my Windows laptop.  I no longer dual boot on my personal machines and rarely need to use Linux VMs on them either thanks to this technology.  I still use full-blown Linux a lot of course but these days it tends to be only on servers and HPC systems.

I recently needed to compile and play with some code that was based on the GNU Scientific Library. Using the Ubuntu 18.04 version of the WSL this is very easy. Install the GSL with

sudo apt-get install libgsl-dev

A simple code that evaluates Dawson’s integral over a range of x values is shown below. Call this dawson.cpp

#include<iostream>
#include<vector>
#include<gsl/gsl_sf.h>

int main(){

    double range = 6;  // Evaluate from -range to +range
    int N = 100000;    // Number of evaluations
    double step = 2 * range / N;
    std::vector<double> x(N);
    std::vector<double> result(N);

    // Evaluate Dawson's integral at N evenly spaced points
    for (int i = 0; i < N; i++){
        x[i] = -range + i * step;
        result[i] = gsl_sf_dawson(x[i]);
    }

    // Write the results as comma-separated x,value pairs
    for (int i = 0; i < N; i++){
        std::cout << x[i] << "," << result[i] << std::endl;
    }

    return 0;
}

Compile with

g++ -std=c++11 dawson.cpp -o ./dawson -lgsl -lgslcblas -lm

Run it and generate some results

./dawson > results.txt

If we plot results.txt we get

dawson
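
If you want to reproduce the plot, something like this works (a quick sketch using Python and matplotlib; it assumes the results.txt generated above is in the current directory)

import matplotlib
matplotlib.use("Agg")  # no display needed, e.g. under WSL
import matplotlib.pyplot as plt

x, y = [], []
with open("results.txt") as f:
    for line in f:
        a, b = line.split(",")
        x.append(float(a))
        y.append(float(b))

plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("Dawson's integral")
plt.title("Dawson's integral computed with GSL")
plt.savefig("dawson.png")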

This code is also available on GitHub: https://github.com/mikecroucher/GSL_example

November 20th, 2018 | Categories: Making MATLAB faster, matlab, random numbers, Scientific Software | Tags:

In a recent blog post, Daniel Lemire explicitly demonstrated that vectorising random number generators using SIMD instructions could give useful speed-ups.  This reminded me of one of the first times I played with the Julia language, where I learned that Julia’s random number generator used a SIMD-accelerated implementation of Mersenne Twister called dSFMT to generate random numbers much faster than MATLAB’s Mersenne Twister implementation.

Just recently, I learned that MATLAB now has its own SIMD accelerated version of Mersenne Twister which can be activated like so:

seed=1;
rng(seed,'simdTwister')

This new Mersenne Twister implementation gives different random variates from the original implementation (which I demonstrated is the same as NumPy’s implementation in an older post), as you might expect

>> rng(1,'Twister')
>> rand(1,5)
ans =
0.4170 0.7203 0.0001 0.3023 0.1468
>> rng(1,'simdTwister')
>> rand(1,5)
ans =
0.1194 0.9124 0.5032 0.8713 0.5324
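
As an aside, the ‘Twister’ output above is also a quick way to check the claim that MATLAB’s original generator matches NumPy’s: seeding NumPy’s legacy interface with 1 should reproduce the same first five values (a small sketch, not from the original post)

import numpy as np

np.random.seed(1)         # legacy global seeding uses Mersenne Twister
print(np.random.rand(5))  # should start 0.4170, 0.7203, 0.0001, 0.3023, ...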

So it’s clearly a different algorithm and, on CPUs that support the relevant instructions, it’s about twice as fast!  Using my very unscientific test code:

format compact
number_of_randoms = 10000000
disp('Timing standard twister')
rng(1,'Twister')
tic;rand(1,number_of_randoms);toc
disp('Timing SIMD twister')
rng(1,'simdTwister')
tic;rand(1,number_of_randoms);toc

gives the following results for a typical run on my Dell XPS 15 9560 which supports AVX instructions

number_of_randoms =
    10000000
Timing standard twister
Elapsed time is 0.101307 seconds.
Timing SIMD twister
Elapsed time is 0.057441 seconds.

The MATLAB documentation does not tell us which algorithm their implementation is based on but it seems to be different from Julia’s. In Julia, if we set the seed to 1 as we did for MATLAB and ask for 5 random numbers, we get something different from MATLAB:

julia> using Random
julia> Random.seed!(1);
julia> rand(1,5)
1×5 Array{Float64,2}:
 0.236033  0.346517  0.312707  0.00790928  0.48861

The performance of MATLAB’s new generator is on a par with Julia’s, although I’ll repeat that these timing tests are far, far from rigorous.

julia> Random.seed!(1);

julia> @time rand(1,10000000);
  0.052981 seconds (6 allocations: 76.294 MiB, 29.40% gc time)

October 1st, 2018 | Categories: programming, RSE | Tags:

A guest blog post by Catherine Smith of the University of Birmingham

In early 2017 I was in the audience at one of Mike Croucher’s ‘Is your research software correct?’ presentations. One of the first questions posed in the talk is ‘how reproducible is a mouse click?’. The answer, of course, is that it isn’t and therefore research processes should be automated and not involve anyone pressing any buttons. This posed something of a challenge to my own work, which is primarily about making buttons for researchers to press (and select, drag and drop etc.) in order to present their data in the appropriate scholarly way. This software, for making electronic editions of texts preserved in multiple sources, assists with the alignment and editing of material. Even so, the editor is always in control and that is the way it should be. The lack of automation means reproducibility is a problem for my software but, as Peter Shillingsburg, one of the pioneers of digital editing, says, ‘editing is an art not a science’: maybe art can therefore be excused, to an extent, from the constraints of automation and, despite their introduction of human decisions, the buttons may be permitted to stay. Nevertheless I still want to know that my software is doing what I think it is doing even if I can’t automate what editors choose to do with it. In the discussion that followed the talk I talked about the complication of testing my interface-heavy software. Mike agreed that it was a complex situation but concluded by saying “if you go away from here and write one test you will have made the world a better place”.

I did just that. In fact I did very little else for the next three months. What started with one Python unit test has so far led to 65 Python unit tests, 82 Javascript unit tests and 54 functional tests using Selenium. The timing of all of this was perfect in that I had just begun a project to migrate all of our web applications to Django. I had one application partially migrated and so I tested that one and even did some test-driven development on the sections that were not yet complete.

The tests themselves are great to have. This was my first project using Django and I made lots of mistakes in the first application. The tests have been invaluable in ensuring that, as I learned more and made improvements, the older code kept pace with those changes. Now that I have tests for some things I want tests for everything and I have developed a healthy fear of editing code that is not yet tested. There are other advantages as well. When I sat down to write my first test it very quickly became clear that the code I had written was not easily testable. I had to break down the large Django views into smaller chunks of code that could each be unit tested. I now write better structured code because of that time I invested in testing just some of it. I also learned a lot about how to approach migrating all of the remaining applications while writing the detailed tests for every aspect of the first one.

Django has an integrated test framework based on the Python unittest module but with the additional benefit of automatically creating a test database, using the models from the project, to which test data can be added. It was very straightforward to set up and run (see the Django docs https://docs.djangoproject.com/en/2.1/topics/testing/). I found Javascript unit testing less straightforward. There was not much Javascript in this first application so I used the qunit test framework and sinon.js for mocking. I have never automated the running of these tests and instead just open them in the browser. It’s not ideal but it works for now. I have other applications which are far more Javascript-heavy and for those I will need automated tests; there are plenty of frameworks around to choose from, so I will investigate those when I start writing the tests.

Probably the most important tests I have are the functional tests, which are written in Selenium. I had already heard of Selenium, having attended a Test-Driven Development workshop given several years ago by Harry Percival. I used his book, Test-Driven Development with Python, as a tutorial for all of the Selenium tests and some of the Django and Javascript tests too. Selenium tests are automated browser tests which really do allow you to test what happens when a user presses a button, types text into a text box, selects an item from a list, moves an element by dragging it etc. The result of every interaction in an interface can be tested with Selenium. The content of each page can also be checked. It is generally not necessary to test static HTML but I did test the contents of several dynamic pages which loaded different content depending on the permissions granted to a user. Selenium is also integrated with Django using the LiveServerTestCase, which means it has access to a copy of the database just like the Django unit tests. Selenium tests can be complex and there are several things to watch out for. Selenium doesn’t automatically wait for a page to load before executing the test statements against it; at every point where data is loaded, Selenium must be told to wait until a given condition is fulfilled, up to a maximum time limit, before continuing. I still have tests which occasionally fail because, on that particular run, a page or an ajax call is taking longer to load than I have allowed for. Run it another five times and it may well pass on every one. It is also important to make sure the browser is told to scroll to a point where an element can be seen before the instruction to interact with that element is given. It’s not difficult to do and is more predictable than waiting for a page to load but it still has to be remembered every time.
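
For readers who haven’t met explicit waits, they typically look something like this in Selenium’s Python bindings (a generic sketch with a made-up URL and element id, not code from the project described here)

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://localhost:8000/edition/")  # hypothetical URL

# Wait up to 10 seconds for the (hypothetical) button to become clickable
save_button = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.ID, "save-button"))
)
save_button.click()

driver.quit()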

The functional tests are by far the most complex of all the tests I wrote in my three-month testing marathon but they are the most important. I can’t automate the entire creation of a digital edition but with tests I can make sure my interface is presenting the correct data in the right way to the editors and that when they interact with that data everything behaves as it should. I really can say that the buttons and other interactive elements I have tested do exactly what I think they do. Now I just need to test all the rest of the buttons – one test at a time!

August 29th, 2018 | Categories: Research Computing Times | Tags:

Over the years I’ve been blogging, I have run a few recurring series of blog posts.  In the early days, there was the short-lived Problem of The Week.  Sometime later I inherited The Carnival of Maths, which I looked after for a couple of years before passing it over to Aperiodical.com who have looked after it ever since.  I also ran a series called A Month of Math Software for two and a half years before my enthusiasm for the topic ran out.

I am currently the Head of Research Computing at The University of Leeds — a senior management role that puts me in the fortunate position of being reasonably well-informed about the world of research computing.  Software, hardware, cloud, data science, dealing with sensitive data…everyone, it seems, has something to tell me.  I’m also continuing with my EPSRC fellowship part-time, which means that I’m rather more hands-on than a typical member of an executive leadership team.

While at JuliaCon 2018, I had the extremely flattering experience of a few people telling me that they had been long-time readers of WalkingRandomly and that they were disappointed that I didn’t post as often as I used to.

All of this has led to the desire to start a new regular series.  One where I look at all aspects of research computing and compile it into a series of monthly posts.  If you have anything you’d like included in next month’s post — contact me via the usual channels.

Botched code causes seven-year scientific argument

For the last couple of years, I have given a talk around the UK and Europe called ‘Is your Research Software Correct?’ (unlike this other talk of mine, it has not yet been recorded but I’ll soon remedy that! Let me know if you can offer a venue with good recording facilities).

I start off by asking the audience to Imagine….imagine that the results of your latest simulation or data analysis are in and they are amazing.  Your heart beats faster, this result changes everything and you know it. This is why you entered science, this is what you always hoped for. Papers in journals like Nature or Science — no problem. Huge grant to follow up this work…professorship….maybe, you dare to dream, this could lead to a Nobel Prize.  Only one minor problem — the code is completely wrong and you just haven’t figured it out yet.

In the talk (based originally on an old blog post here) I go on to suggest and discuss some simple practices that might help the situation.  Scripting and coding instead of pointy-clicky solutions, version control, testing, open source your software, software citation etc.  None of it is mind blowing but I firmly believe that if all of the advice was taken, we’d have fewer situations like this one…..

Long story short, two groups were investigating what happens when you super-freeze water.  They disagreed and much shouting happened for seven years.  There was a bug in the code of one group.  A great article discussing the saga is available over at Physics Today:

https://physicstoday.scitation.org/do/10.1063/PT.6.1.20180822a/full/

Standout quotes that may well end up in a future version of Is Your Research Software Correct include

“One of the real travesties,” he says, is that “there’s no way you could have reproduced [the Berkeley team’s] algorithm—the way they had implemented their code—from reading their paper.” Presumably, he adds, “if this had been disclosed, this saga might not have gone on for seven years”

and

Limmer maintains that he and his mentor weren’t trying to hide anything. “I had and was very willing to share the code,” he says. What he didn’t have, he says, was the time or personnel to prepare the code in a form that could be useful to an outsider. “When Debenedetti’s group was making their code available,” Limmer explains, “he brought people in to clean up the code and document and run tests on it. He had resources available to him to be able to do that.” At Berkeley, “it was just me trying to get that done myself.”

Which is a case study for asking for Research Software Engineer support in a grant if ever I saw one.

Julia gets all grown up — version 1.0 released at JuliaCon 2018

One of the highlights of the JuliaCon 2018 conference was the release of Julia version 1.0 — a milestone that signifies that the new-language-on-the-block has reached a certain level of maturity.   We celebrated the release at the University of Leeds by installing it on our most recent HPC system – ARC3.

In case you don’t know, Julia is a relatively new free and open source language for technical computing.  It works on everything from a Raspberry Pi up to HPC systems with thousands of cores.  It’s the reason for the letters Ju in Project Jupyter and it aims to be an easy-to-use language (along the lines of Python, R or MATLAB) with the performance of languages like Fortran or C.

UK Research Software Engineering Association Webinar series

The UK Research Software Engineering Association is starting a new webinar series this month with a series of planned topics including an Introduction to Object-Oriented Design for Scientists, Interfacing to/from Python with C, FORTRAN or C++ and Meltdown for Dummies.

These webinars are free to join, and you do not need to register in advance. Full details including the link to join the webinar are available below.

For more information on the RSE webinar series, including information on how to propose a webinar and information on upcoming webinars, please see:

https://rse.ac.uk/events/rse-webinar-series

This page will also have links to recordings of past webinars when they become available.

Verification and Modernisation of Fortran Codes using the NAG Fortran Compiler

There is still a huge amount of research software written in Fortran. Indeed, codes written in Fortran are, by far, the most popular ones run on the UK’s national supercomputer service, ARCHER (see http://www.archer.ac.uk/status/codes/ for up-to-date stats).

Fortran compilers are not created equally and many professional Fortran developers will suggest that you develop against more than one.  Gfortran is probably essential if you want your code to be usable by all but the Intel Fortran Compiler can often produce the fastest executables on x86 hardware and the NAG Compiler is useful for ensuring correctness.
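
As a rough illustration of the sort of checking involved (generic flags of my choosing rather than anything from the webinar, with mycode.f90 standing in for your source file), gfortran’s compile-time warnings and run-time checks can be switched on like this; the NAG compiler has its own, more extensive checking options (-C=all, if memory serves) which are described in its manual

gfortran -std=f2008 -Wall -Wextra -fcheck=all -g mycode.f90 -o mycode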

This webinar by NAG’s Wadud Miah promises to show what the NAG Fortran Compiler can do for your Fortran code.

New MacBook Pro has 6 CPU cores but….

Apple’s new MacBook Pro laptops have fantastic-looking CPUs in them, with the top of the line boasting 6 cores and turbo boost up to 4.8GHz.  Sounds amazing for simulation work but it seems that there are some thermal issues that prevent it from running at top speed for long.

Contact me to get your news included next month

That’s all I have for this first article in the series.  If you have any research computing news that you’d like included in the next edition, contact me.

August 24th, 2018 | Categories: Julia, RSE, Scientific Software, walking randomly | Tags:

Audiences can be brutal

I still have nightmares about the first talk I ever gave as a PhD student. I was not a strong presenter, my grasp of the subject matter was still very tenuous and I was as nervous as hell. Six months or so into my studentship, I was to give a survey of the field I was studying to a bunch of very experienced researchers.  I spent weeks preparing…practicing…honing my slides…hoping that it would all be good enough.

The audience was not kind to me! Even though it was only a small group of around 12 people, they were brutal! I felt like they leaped upon every mistake I made, relished pointing out every misunderstanding I had and all-round gave me a very hard time.  I had nothing like the robustness I have now and very nearly quit my PhD the very next day. I can only thank my office mates and enough beer to kill a pony for collectively talking me out of quitting.

I remember stopping three quarters of the way through saying ‘That’s all I want to say on the subject’ only for one of the senior members of the audience to point out that ‘You have not talked about all the topics you promised’.  He made me go back to the slide that said something like ‘Things I will talk about’ or ‘Agenda’ or whatever else I called the stupid thing and say ‘Look….you’ve not mentioned points X, Y and Z’ [1].

Everyone agreed and so my torture continued for another 15 minutes or so.

Practice makes you tougher

Since that horrible day, I have given hundreds of talks to audiences that range in size from 5 up to 300+ and this amount of practice has very much changed how I view these events.  I always enjoy them…always!  Even when they go badly!

In the worst case scenario, the most that can happen is that I get given a mildly bad time for an hour or so of my life but I know I’ll get over it. I’ve gotten over it before. No big deal! Academic presentations on topics such as research computing rarely lead to life threatening outcomes.

But what if it was recorded?!

Anyone who has worked with me for an appreciable amount of time will know of my pathological fear of having one of my talks recorded. Point a camera at me and the confident, experienced speaker vanishes and is replaced by someone much closer to the terrified PhD student of my youth.

I struggle to put into words what I’m so afraid of but I wonder if it ultimately comes down to the fact that if that PhD talk had been recorded and put online, I would never have been able to get away from it. My humiliation would be there for all to see…forever.

JuliaCon 2018 and Rise of the Research Software Engineer

When the organizers of JuliaCon 2018 invited me to be a keynote speaker on the topic of Research Software Engineering, my answer was an enthusiastic ‘Yes’. As soon as I learned that they would be live streaming and recording all talks, however, my enthusiasm was greatly dampened.

‘Would you mind if my talk wasn’t live streamed and recorded’ I asked them.  ‘Sure, no problem’ was the answer….

Problem averted. No need to face my fears this week!

A fellow delegate of the conference pointed out to me that my talk would be the only one that wouldn’t be on the live stream. That would look weird and not in a good way.

‘Can I just be live streamed but not recorded’ I asked the organisers.  ‘Sure, no problem’ [2] was the reply….

Later on, the technician told me that I could have it recorded but it would be instantly hidden from the world until I had watched it and agreed it wasn’t too terrible.  Maybe this would be a nice first step in my record-a-talk-a-phobia therapy, he suggested.

So…on I went and it turned out not to be as terrible as I had imagined it might be.  So we published it. I learned that I say ‘err’ and ‘um’ a lot [3] which I find a little embarrassing but perhaps now that I know I have that problem, it’s something I can work on.

Rise of the Research Software Engineer

Anyway, here’s the video of the talk. It’s about some of the history of the Research Software Engineering movement and how I worked with some awesome people at the University of Sheffield to create an RSE group. If you are the computer-person in your research group who likes software more than papers, you may be one of us. Come join the tribe!

Slide deck at mikecroucher.github.io/juliacon2018/

Feel free to talk to me on twitter about it: @walkingrandomly

Thanks to the infinitely patient and wonderful organisers of JuliaCon 2018 for the opportunity to beat one of my long-standing fears.

Footnotes

[1] Pro-Tip: Never do one of these ‘Agenda’ slides…give yourself leeway to alter the course of your presentation midway through depending on how well it is going.

[2] So patient! Such a lovely team!

[3] Like A LOT! My mum watched the video and said ‘No idea what you were talking about but OMG can you cut out the ummms and ahhs’