July 1st, 2019 | Categories: python | Tags:

I am a huge user of Anaconda Python and the way I usually get access to the Anaconda Prompt is to start typing ‘Anaconda’ in the Windows search box and click on the link as soon as it pops up. Easy and convenient. Earlier today, however, the Windows 10 menu shortcuts for the Anaconda command line vanished from my machine!

I’m not sure exactly what triggered this but I was heavily messing around with various environments, including the base one, and also installed Visual Studio 2019 Community Edition with all the Python extensions before I noticed that the menu shortcuts had gone missing. No idea what the root cause was.

Fortunately, getting my Anaconda Prompt back was very straightforward:

  • launch Anaconda Navigator
  • Click on Environments
  • Selected base (root)
  • Choose Not installed  from the drop down list
  • Type for console_ in the search box
  • Check the console_shortcut package
  • Click Apply and follow the instructions to install the package

console_shortcut

May 24th, 2019 | Categories: Cloud Computing, HPC, NAG Library | Tags:

HPC! To cloud or not to cloud….

Over the course of my career I have been involved with the provision of High Performance Computing (HPC) at almost every level. As a researcher and research software engineer I have been, and continue to be, a user of many large scale systems.  As a member of research computing support, I was involved in service development and delivery, user-support, system administration and HPC training. Finally, as a member of senior leadership teams, I have been involved in the financial planning and strategic development of institution-wide HPC  services.

In recent years, the question that pops up at every level of involvement with HPC is ‘Should we do this with our own equipment or should we do this in the cloud?’  

This is an extremely complex, multi-dimensional question where the parameters are constantly shifting.  The short, answer is always ‘well…it depends’ and it always does ‘depend’…on many things!  What do you want to do? What is the state of the software that you have available? How much money do you have? How much time do you have? What support do you have? What equipment do you currently have and who else are you working with and so on.

Today, I want to focus on just one of these factors. Cost.  I’m not saying that cost is necessarily the most important consideration in making HPC-related decisions but given that most of us have fixed budgets, it is impossible to ignore.

The value of independence

I have been involved in many discussions of HPC costs over the years and have seen several attempts at making the cloud vs on-premise cost-comparison.  Such attempts are often biased in one direction or the other.  It’s difficult not to be when you have a vested interest in the outcome.

When I chose to leave academia and join industry, I decided to work with the Numerical Algorithms Group (NAG), a research computing company that has been in business for almost 50 years. One reason for choosing them was their values which strongly resonate with my own.  Another was their independence.  NAG are a not-for-profit (yet commercial!) company who literally cannot be bought.  They solve the independence problem by collaborating with pretty much everybody in the industry.

So, NAG are more interested in helping clients solve their technical computing problems than they are in driving them to any given solution.  Cloud or on-premise?  They don’t mind…they just want to help you make the decision.

Explore HPC cost models yourself

NAG’s VP for HPC services and consulting, Andrew Jones (@hpcnotes), has been teaching seminars on Total Cost Of Ownership models for several years at events like the SC Supercomputing conference series.  To support these tutorials, Andrew created a simple, generic TCO model spreadsheet to illustrate the concepts and to show how even a simple TCO model can guide decisions. 

Many people asked if NAG could send them the TCO spreadsheet from the tutorial but they decided to go one better and converted it into a web-form where you can start exploring some of the concepts yourself. 

I also made a video about it.

If you want more, get in touch with either me (@walkingrandomly) or Andrew (@hpcnotes) on twitter or email hpc@nag.com.  We’ll also be teaching this type of thing at ISC 2019 so please do join us there too.

April 23rd, 2019 | Categories: Linux | Tags:

My preferred workflow for writing technical documents these days is to write in Markdown (Or Jupyter Notebooks) and then use Pandoc to convert to PDF, Microsoft Word or whatever format is required by the end client.

While working with a markdown file recently, the pandoc conversion to PDF failed with the following error message

! Undefined control sequence.
l.117 \[ \coloneqq

This happens because Pandoc first converts the Markdown file to LaTeX which then gets compiled to PDF and the command \coloneqq isn’t included in any of the LaTeX packages that Pandoc uses by default.

The coloneqq command is in the mathtools package which, on Ubuntu, can be installed using

apt-get install texlive-latex-recommended

Once we have the package installed, we need to tell Pandoc to make use of it. The way I did this was to create a Pandoc metadata file that contained the following

---
header-includes: |
            \usepackage{mathtools}
---

I called this file pandoc_latex.yml and passed it to Pandoc as follows

pandoc --metadata-file=./pandoc_latex.yml ./input.md -o output.pdf

On one system I tried this on, I received the following error message

pandoc: unrecognized option `--metadata-file=./pandoc_latex.yml'

which suggests that the –metadata-file option is a relatively recent addition to Pandoc. I have no idea when this option was added but if this happens to you, you could try installing the latest version from https://github.com/jgm/pandoc/

I used 2.7.1.1 and it was fine so I guess anything later than this should also be OK.

April 17th, 2019 | Categories: Fortran, programming, retro computers | Tags:

It started with a tweet

While basking in some geek nostalgia on twitter, I discovered that my first ever microcomputer, the Sinclair Spectrum, once had a Fortran compiler

mira

However, that compiler was seemingly lost to history and was declared Missing in Action on World of Spectrum.

mira2

A few of us on Twitter enjoyed reading the 1987 review of this Fortran Compiler but since no one had ever uploaded an image of it to the internet, it seemed that we’d never get the chance to play with it ourselves.

I never thought it would come to this

One of the benefits of 5000+ followers on Twitter is that there’s usually someone who knows something interesting about whatever you happen to tweet about and in this instance, that somebody was my fellow Fellow of the Software Sustainability InstituteBarry Rowlingson.  Barry was fairly sure that he’d recently packed a copy of the Mira Fortran Compiler away in his loft and was blissfully unaware of the fact that he was sitting on a missing piece of microcomputing history!mira3

He was right! He did have it in the attic…and members of the community considered it valuable.

mira_box

As Barry mentioned in his tweet, converting a 40 year old cassette to an archivable .tzx format is a process that could result in permanent failure.  The attempt on side 1 of the cassette didn’t work but fortunately, side 2 is where the action was!

makeTzx

It turns out that everything worked perfectly.  On loading it into a Spectrum emulator, Barry could enter and compile Fortran on this platform for the first time in decades! Here is the source code for a program that computes prime numbers

prime_source

Here it is running

running_primes

and here we have Barry giving the sales pitch on the advanced functionality of this compiler :)

high_res

How to get the compiler

Barry has made the compiler, and scans of the documentation, available at https://gitlab.com/b-rowlingson/mirafortran

April 10th, 2019 | Categories: Linear Algebra, NAG Library, python | Tags:

I recently wrote a blog post for my new employer, The Numerical Algorithms Group, called Exploiting Matrix Structure in the solution of linear systems. It’s a demonstration that shows how choosing the right specialist solver for your problem rather than using a general purpose one can lead to a speed up of well over 100 times!  The example is written in Python but the NAG routines used can be called from a range of languages including C,C++, Fortran, MATLAB etc etc

April 3rd, 2019 | Categories: GPU, RSE, University of Sheffield | Tags:

My friends over at the University of Sheffield Research Software Engineering group are running a GPU Hackathon sponsored by Nvidia. The event will be on August 19-23 2019  in Sheffield, United Kingdom.  The call for proposals is at http://gpuhack.shef.ac.uk/

The Sheffield team have this to say about the event:

We are looking for teams of 3-5 developers with a scalable** application to port to or optimize on a GPU accelerator. Collectively the team must have complete knowledge of the application. If the application is a suite of apps, no more than two per team will be allowed and a minimum of 2 people per app must attend. Space will be limited to 8 teams.

** By scalable we mean node-to-node communication implemented, but don’t be discouraged from applying if your application is less than scalable. We are also looking for breadth of application areas.

The goal of the GPU hackathon is for current or prospective user groups of large hybrid CPU-GPU systems to send teams of at least 3 developers along with either:

  • (potentially) scalable application that could benefit from GPU accelerators, or
  • An application running on accelerators that needs optimization.

There will be intensive mentoring during this 5-day hands-on workshop, with the goal that the teams leave with applications running on GPUs, or at least with a clear roadmap of how to get there.

March 11th, 2019 | Categories: HPC | Tags:

One of the highlights of my attendance at the 2019 RICE Oil and Gas High Performance Computing event was the Women in HPC workshop. As a profession, HPC has very similar diversity issues to Research Software Engineering. In short, we need more diversity and the Women in HPC organisation is working towards that aim (side note – I am very happy that my employer, NAG, is a supporter of Women in HPC!)

The following call for participation was forwarded to me by Mozhgan Kabiri Chimeh who asked me to repost it.

The tenth international Women in HPC workshop at ISC19 will discuss methods to improve diversity and provide early career women with the opportunity to develop their professional skills and profile. The workshop will include:

  • Becoming an advocate and ally of under-represented women
  • Putting in place a framework to help women take leadership positions
  • Building mentoring programmes that work effectively for women.
  • Posters and lightning talks by women working in HPC
  • Short talks on: dealing with poor behaviour at work and how to help avoid it getting you down, how to deal with negative feedback, how to build writing into your daily routine and why it matters, etc.

Deadline for submissions: March 18th 2019

As part of the workshop, we invite submissions from women in industry and academia to present their work as a poster. Submissions are invited on all topics relating to HPC from users and developers. All abstracts should emphasize the computational aspects of the work, such as the facilities used, the challenges that HPC can help address and any remaining challenges etc.

Exclusive to WHPC at ISC19: Successful authors will have the opportunity to present their poster in the main ISC19 conference poster session.

For full details please see: https://www.womeninhpc.org/whpc-isc19/workshop/submit/

Workshop Organising Committee

  • Workshop Chair: Mozhgan Kabiri Chimeh (University of Sheffield, UK)
  • Co-chair: Toni Collis (Appentra S.L., Spain)
  • Co-chair: Misbah Mubarak (Argonne National Laboratory, USA)
  • Submissions Chair: Weronika Filinger (EPCC, University of Edinburgh, UK)
  • Submissions Vice Chair: Khomotso Maenetja (University of Limpopo, South Africa)
  • Mentoring Chair: Elsa Gonsiorowski (Lawrence Livermore National Laboratory, USA)
  • Invited Talks Chair: Gokcen Kestor (Pacific Northwest National Laboratory, USA)
  • Publicity Chair: Cristin Meritt (Alces Flight, UK)
  • Publicity Vice Chair: Aiman Shaikh (Science and Technology Facilities Council, UK)
  • Volunteer: Shabnam Sadegh (Technical University of Munich)
January 23rd, 2019 | Categories: C/C++, Linux, programming | Tags:

I have been an advocate of the Windows Subsytem for Linux ever since it was released (See Bash on Windows: The scripting game just changed) since it allows me to use the best of Linux from my windows laptop.  I no longer dual boot on my personal machines and rarely need to use Linux VMs on them either thanks to this technology.  I still use full-blown Linux a lot of course but these days it tends to be only on servers and HPC systems.

I recently needed to compile and play with some code that was based on the GNU Scientific Library. Using the Ubuntu 18.04 version of the WSL this is very easy. Install the GSL with

sudo apt-get install libgsl-dev

A simple code that evaluates Dawson’s integral over a range of x values is shown below. Call this dawson.cpp

#include<iostream>
#include<vector>
#include<gsl/gsl_sf.h>

int main(){

double range = 6; // max/min values
int N = 100000; // Number of evaluations
double step = 2 * range / N;
std::vector<double> x(N);
std::vector<double> result(N);

for (int i=0;i<=N;i++){
     x[i] = -range + i*step;
     result[i] = gsl_sf_dawson(x[i]);
}


for (int i=0;i<=N;i++){
	std::cout << x[i] << "," << result[i] << std::endl;
}

return 0;
}

Compile with

g++ -std=c++11 dawson.cpp -o ./dawson -lgsl -lgslcblas -lm

Run it and generate some results

./dawson > results.txt

If we plot results.txt we get

dawson

This code is also available on GitHub: https://github.com/mikecroucher/GSL_example

November 20th, 2018 | Categories: Making MATLAB faster, matlab, random numbers, Scientific Software | Tags:

In a recent blog post, Daniel Lemire explicitly demonstrated that vectorising random number generators using SIMD instructions could give useful speed-ups.  This reminded me of the one of the first times I played with the Julia language where I learned that Julia’s random number generator used a SIMD-accelerated implementation of Mersenne Twister called dSFMT to generate random numbers much faster than MATLAB’s Mersenne Twister implementation.

Just recently, I learned that MATLAB now has its own SIMD accelerated version of Mersenne Twister which can be activated like so:

seed=1;
rng(seed,'simdTwister')

This new Mersenne Twister implementation gives different random variates to the original implementation (which I demonstrated is the same as Numpy’s implementation in an older post) as you might expect

>> rng(1,'Twister')
>> rand(1,5)
ans =
0.4170 0.7203 0.0001 0.3023 0.1468
>> rng(1,'simdTwister')
>> rand(1,5)
ans =
0.1194 0.9124 0.5032 0.8713 0.5324

So it’s clearly a different algorithm and, on CPUs that support the relevant instructions, it’s about twice as fast!  Using my very unscientific test code:

format compact
number_of_randoms = 10000000
disp('Timing standard twister')
rng(1,'Twister')
tic;rand(1,number_of_randoms);toc
disp('Timing SIMD twister')
rng(1,'simdTwister')
tic;rand(1,number_of_randoms);toc

gives the following results for a typical run on my Dell XPS 15 9560 which supports AVX instructions

number_of_randoms =
    10000000
Timing standard twister
Elapsed time is 0.101307 seconds.
Timing SIMD twister
Elapsed time is 0.057441 seconds

The MATLAB documentation does not tell us which algorithm their implementation is based on but it seems to be different from Julia’s. In Julia, if we set the seed to 1 as we did for MATLAB and ask for 5 random numbers, we get something different from MATLAB:

julia> using Random
julia> Random.seed!(1);
julia> rand(1,5)
1×5 Array{Float64,2}:
 0.236033  0.346517  0.312707  0.00790928  0.48861

The performance of MATLAB’s new generator is on-par with Julia’s although I’ll repeat that these timing tests are far far from rigorous.

julia> Random.seed!(1);

julia> @time rand(1,10000000);
  0.052981 seconds (6 allocations: 76.294 MiB, 29.40% gc time)
October 1st, 2018 | Categories: programming, RSE | Tags:

A guest blog-post by Catherine Smith of University of Birmingham

In early 2017 I was in the audience at one of Mike Croucher’s ‘Is your research software correct?’ presentations. One of the first questions posed in the talk is ‘how reproducible is a mouse click?’. The answer, of course, is that it isn’t and therefore research processes should be automated and not involve anyone pressing any buttons. This posed something of a challenge to my own work which is primarily about making buttons for researchers to press (and select, drag and drop etc.) in order to present their data in the appropriate scholarly way. This software, for making electronic editions of texts preserved in multiple sources, assists with the alignment and editing of material. Even so, the editor is always in control and that is the way it should be. The lack of automation means reproducibility is a problem for my software but as Peter Shillingsburg, one of the pioneers of digital editing, says ‘editing is an art not a science’: maybe art can therefore be excused, to an extent, from the constraints of automation and, despite their introduction of human decisions, the buttons may be permitted to stay. Nevertheless I still want to know that my software doing what I think it is doing even if I can’t automate what editors choose to do with it. In the discussion that followed the paper I was talking about the complication of testing my interface-heavy software. Mike agreed that it was a complex situation but concluded by saying “if you go away from here and write one test you will have made the world a better place”.

I did just that. In fact I did very little else for the next three months. What started with one Python unit test has so far led to 65 Python unit tests, 82 Javascript unit tests and 54 functional tests using Selenium. The timing of all of this was perfect in that I had just begun a project to migrate all of our web applications to Django. I had one application partially migrated and so I tested that one and even did some test-driven development on the sections that were not yet complete.

The tests themselves are great to have. This was my first project using Django and I made lots of mistakes in the first application. The tests have been invaluable in ensuring that, as I learned more and made improvements, the older code kept pace with those changes. Now that I have tests for some things I want tests for everything and I have developed a healthy fear of editing code that is not yet tested. There are other advantages as well. When I sat down to write my first test it very quickly became clear that the code I had written was not easily testable. I had to break down the large Django views into smaller chunks of code that could each be unit tested. I now write better structured code because of that time I invested in testing just some of it. I also learned a lot about how to approach migrating all of the remaining applications while writing the detailed tests for every aspect of the first one.T

Django has an integrated test framework based on the python unittest module but with the additional benefit of automatically creating a test database using the models from the project to which test data can be added. It was very straightforward to set up and run (see the Django docs https://docs.djangoproject.com/en/2.1/topics/testing/). I found Javascript unit testing less straight forward. There was not much Javascript in this first application so I used the qunit test framework and sinon.js for mocking. I have never automated the running of these tests and instead just open them in the browser. It’s not ideal but it works for now. I have other applications which are far more Javascript heavy and for those I will need to have automated tests, there are plenty of frameworks around to choose from so I will investigate those when I start writing the tests.

Probably the most important tests I have are the functional tests which are written in Selenium. I had already heard of Selenium, having attended a Test Driven Development workshop several years ago by Harry Percival. I used his book, Test-Driven Development with Python, as a tutorial for all of the Selenium tests and some of the Django and Javascript tests too. Selenium tests are automated browser tests which really do allow you to test what happens when a user presses a button, types text into a text box, selects an item from a list, moves an element by dragging it etc.. The result of every interaction in an interface can be tested with Selenium. The content of each page can also be checked. It is generally not necessary to test static html but I did test the contents of several dynamic pages which loaded different content depending on the permissions granted to a user. Selenium is also integrated within Django using the LiveServerTestCase which means it has access to a copy of the database just like the Django unit tests. Selenium tests can be complex and there are several things to watch out for. Selenium doesn’t automatically wait for a page to load before executing the test statements against it, at every point data is loaded Selenium must be told to wait until a given condition is fulfilled up to a maximum time limit before continuing. I still have tests which occasionally fail because, on that particular run, a page or an ajax call is taking longer to load than I have allowed for. Run it another five times and it may well pass on every one. It is also important to make sure the browser is told to scroll to a point where an element can be seen before the instruction to interact with that element is given. It’s not difficult to do and is more predictable that waiting for a page to load but it still has to be remembered every time.

The functional tests are by far the most complex of all the tests I wrote in my three month testing marathon but they are the most important. I can’t automate the entire creation of a digital edition but with tests I can make sure my interface is presenting the correct data in the right way to the editors and that when they interact with that data everything behaves as it should. I really can say that the buttons and other interactive elements I have tested do exactly what I think they do. Now I just need to test all the rest of the buttons – one test at a time!