Archive for the ‘Scientific Software’ Category

April 18th, 2016

This interview with the University of Cambridges’s Chris Richardson is part of my series of interviews on the new cohort of EPSRC Research Software Engineering Fellows.

Could you tell us a little about yourself and how you became a Research Software Engineer?

I did a PhD in GeoPhysics, followed by a PostDoc in Japan, over 15 years ago. Although it was a numerical project (on transport of magma), it was rather frustrating to code in Fortran.

I might joke now, that I could solve some of my PhD problems in 5 minutes using some of the more modern tools that we are developing! Partly, that is because of the higher level of abstraction, and the offloading of more routine tasks to libraries.

When I finished my PostDoc, I decided to apply for a Computer Officer role, which I knew would involve some dull, standard IT support but it also came with interesting stuff… code, numerics etc., without the pressure of teaching, publishing and raising grants.

I have been involved with many different computational projects over the years, including calculating neutron diffraction of adsorbed monolayers (Chemistry), and dynamic geomorphology – how a landscape changes over time as a response to tectonics (Geophysics).

More recently, I started working with the FEniCS team, first via an EPSRC dCSE grant, administered by NAG. The idea of dCSE was to improve code for running on HPC systems – we needed to get better parallel I/O into FEniCS, so it was a perfect match. It was an opportunity for me to learn C++ and really get involved in some of the internal issues in parallel coding.

After the end of the project, I continued to work on FEniCS from time to time, and I gradually got accepted as a member of the core developer team. We applied for a (now rebranded) ECSE project and also some projects with commercial partners.

What do you think is the role of a Research Software Engineer? Is it different from a ‘normal’ researcher?

I suppose it depends what you mean by “normal”. Mostly, I would expect an RSE to work collaboratively with other scientists and engineers, and not to lead scientific projects themselves. So, whilst not being a PI on a project, they need to communicate effectively with the scientific team they are working with, provide the technical expertise to realise the computational aspects of the project, and have some scientific understanding too.

Ultimately, it is important to have something invested in a collaboration, so it is not just a service, but a personal involvement, which might result in a joint publication or another form of recognition. And that is not so different from many “normal” researchers below the PI level.

You’ve recently won an EPSRC RSE Fellowship – congratulations! Can you give a brief overview of your project?

I am part of a team, writing a finite element analysis library. Finite element analysis (FEA) goes way back to the early days of computing, and has been used by engineers for decades, because it is very good at modelling physics in arbitrary shaped objects. One of the problems of FEA is that it is tricky to program, so our library FEniCS takes care of some of the more difficult aspects, whilst allowing the user a great deal of flexibility in describing the equations or meshes to solve on.

I am extending FEniCS to use: (a) complex numbers – useful for wave-like phenomena, (b) powerful non-linear solvers, which can solve difficult problems more quickly, (c) curved boundaries – needed for interface problems, e.g. surface tension, and (d) dynamic meshes, which can change in geometry and topology over time.

As well as “just doing programming”, I want to engage with the scientific community to apply these techniques to actual physical problems, and form collaborations with domain scientists in their specialist areas. Ultimately, I want to build up a small RSE team who can help scientists across diverse fields to solve their computational problems, using FEniCS, or other relevant packages.

I also hope to be involved in teaching and training more sustainable practices in the scientific community, helping people to use revision control, and write more reusable code.

gear

How long did it take you to write your Fellowship application?

The EPSRC call was very widely advertised, and I received multiple emails about it. Since the first hurdle was small – only an A4 sheet – I decided to give it a go.

I guess it was a standard EPSRC call template, and things like – ‘Early career stage researcher’ seemed like a possible blocker, but the A4 sheet was accepted, so I made a full application, and the feedback from reviewers was very positive. Writing the full application was very stressful, and really a full-time occupation for a few weeks. I felt like someone who had an exam but hadn’t revised for it. Some friends said to me: “make it easy for the reviewers” – which is good advice. By closely reading the call documents, I tried to tick off as many points as possible, and make my application match the criteria as well as I could.

I was quite nervous for the interview, and I’m sure I said a few stupid things, but I don’t think that’s particularly unusual. Everyone at EPSRC was very friendly.

Who are your project partners?

I’ve got academic project partners at Southampton and Oxford Universities in the UK, as well as at Simula Research in Norway, and Rice University in the US. Industrial partners are BP and Melior Innovations.

Who are the users of FEniCS?

FEniCS has got a lot of users across the world, but in many ways, it is difficult to know who they are. We have many different distribution channels, so we can’t really monitor downloads. One estimate is the number of questions we get on the user forum. These have come from 50+ countries around the world, and average about two questions per day.

How long have you been involved?

I have been involved for about the last 5 years, gradually building up from being a user, to a part-time contributor, to core developer.

Do you find it it difficult to get recognition/ full time employment for your work?

I’m not sure if I get recognition, I suppose becoming an EPSRC fellow is recognition, and I usually do get included as an author on scientific papers. Recently we started issuing release FEniCS notes in Arch. Num. Soft., which is a way to get recognition for pure library development.

How many Fenics developers and which version control system do you use?

There are about half a dozen core contributors, mostly Europe-based, and we use “git” on bitbucket.org.

Tell me about your group.

I am in a small institute, with researchers from Earth Sciences, Chemistry, Chemical Engineering, Engineering and Applied Maths. I am the only RSE in the building, but we are on the West Cambridge campus, which includes the University High Performance Computing Service (HPCS), just across the road. I am working with Filippo Spiga from HPCS and his team. I also spend quite a lot of time at the Engineering Department, where one of the other FEniCS developers (Dr Garth Wells) is based.

What’s your plan for the future?

The head of the BP Institute is a professor in Earth Sciences, and he has always been very supportive. He sees my RSE fellowship as an opportunity to expand our RSE activities in the future. I want to help researchers write RSE time into their grant proposals, so we can collaborate on a wide range of projects, continuing after the period of the fellowship.

Which programming languages and technologies do you regularly use?

For serious coding, almost everything is in C++11 now. Some people complain that it is not good for scientific computing, but there are some excellent libraries, such as Eigen3 and boost::multi_array which provide highly optimised matrix algebra, and access to multi-dimensional arrays, in much the same way as Fortran or C.

  • Python is very useful for doing things quickly, and has really taken over almost entirely from shell scripting.
  • SWIG – a bit esoteric, but essential for wrapping C++ to Python.
  • Google tests. Atlassian Bamboo CI. Run tests inside docker inside Bamboo.
  • ParaView – really useful for visualisation.

What would we like? We have a user forum, based on q2a, but it would be much nicer if it had a “Stack Exchange” like interface. Maybe we should investigate using Area51.

Are there any languages/technologies that you used to use a lot but have now moved away from? Why?

  • Fortran. I know it’s still widely used, and there are modern versions, but I’m not going back.
  • Shell scripting. I mostly use Python instead now – it’s easier to understand.

Is there anything on your ‘to-learn’ list?

  • How to use Python in ParaView
  • How to write threaded code that is efficient for Finite Element
  • Using hybrid OpenMP/MPI
  • Intel Xeon Phi

Do you have any advice for anyone who wants to become a Research Software Engineer?

I think you need to be multi-skilled. You need to understand people – psychology and culture; programming – obviously; and have some basic understanding of the science itself, even if you don’t know all the details.

April 18th, 2016

This interview with Oliver Henrich is part of my series of interviews on the new cohort of EPSRC Research Software Engineering Fellows.

Which University are you from?

I work at the School of Physics and Astronomy and the Edinburgh Parallel Computing Centre at the University of Edinburgh.

Could you tell us a little about yourself and how you became a Research Software Engineer?

I have a background in soft condensed matter physics, a relatively new and interdisciplinary field of science at the interface of physics, chemistry and biology. Soft matter is squidgy stuff that you know from your everyday lives: viscous liquids, polymers, foams, gels, granular materials, liquid crystals, but also biological materials. The behaviour of soft matter is difficult to predict, which is why computer simulations are a major tool of the trade. Over several postdoctoral appointments and a previous fellowship I evolved from an application scientist to a research software engineer (RSE). This is also why I still have a small personal research agenda, contrary to many other RSEs.

What do you think is the role of a Research Software Engineer? Is it different from a ‘normal’ researcher?

I think the roles of RSEs and researchers are very different. Researchers apply software as application scientists and publish their results in scientific publications. Developing new software is almost always just a means to an end of getting the next publication out. With the focus on science traditional researchers often lack the programming skills and rigorousness for developing sustainable, extendible and failure-proof software solutions. RSEs combine in-depth knowledge of IT technology with a scientific background. This skill set is also quite distinct from that of a Postdoctoral Research Associate. The role of RSEs is more akin to those of managers of experimental labs. Research software engineers are the caretakers of ‘virtual laboratories’, and in that sense do complementary and important infrastructural work for traditional researchers.

You’ve recently won an EPSRC RSE Fellowship – congratulations! Can you give a brief overview of your project?

My programme of software development consists roughly of two different tracks. The first strand of projects is related to Ludwig, a code for simulation of complex fluids which uses the lattice-Boltzmann method. Ludwig has unique capabilities and can model the flow of liquid crystals, bacterial and algae suspensions or liquid electrolytes in complex, nano- and microscopic geometries. Surprisingly little is known about the dynamics of these systems. My goal is to enable new research by extending Ludwig’s capabilities. We also want to make Ludwig part of an open source, scalable library for simulation of complex fluids. Something like this exists for conventional computational fluid dynamics in the form of the celebrated OpenFOAM framework.

The other strand of projects is about developing a community code for multiscale modelling of DNA and RNA. While we know a lot about DNA through genetic sequencing we know little as to how DNA and RNA behave dynamically in space. With sequencing we get the analogue of a 2D still photography of DNA, but what we need to understand its behaviour and functionality in more detail is in fact something like a 3D movie.

Besides these activities I am also involved in a number of other High-Performance Computing projects and outreach events. I am also planning to approach local coding clubs and give software enthusiasts an idea of the job role of RSEs.

col_disc_x_run1341

How long did it take you to write your Fellowship application? Any other thoughts or advice on the application process?

It took me about 2-3 weeks to write and compile all necessary documents, i.e. the track record, case for support, pathways to impact statement and the support statements and letters of support from my project partners. I have to admit this is a lot of work for a single person, comparable to writing an EPSRC Standard Grant application all alone. This, however, would be normally done in a team together with other researchers. Other fellowship schemes take this into account and make their applications more lightweight.

I think what is really important is to ask for guidance from the research council. It helped me a lot to understand what the aim of this specific call was and how I had to write my application.

Who are your project partners?

As I am based at the University of Edinburgh I work with a number of local people, primarily members of the local Soft Matter Group, EPCC and researchers at the School of Engineering. My other project partners further afield are at the University of Barcelona in Spain, Sandia National Laboratories in the USA and domestic universities like the University of Oxford, University College London and the University of Cambridge. One of my project partners is actually the incumbent Lucasian Professor of Mathematics at Cambridge, showing how far research software engineering is linked to cutting-edge science.

Tell me about your RSE group.

There is strictly speaking no such thing as my group. I am a member of the Edinburgh Parallel Computing Centre (EPCC), a growing group of about 90 software development experts and work with them on a per-project basis. At EPCC, now in its 26th year, the career path of RSE is relatively well established. Many people have been around for a long time and built unique skill sets. It is a big advantage for me to be able to draw on their long-term experience and expertise. I also work with various academic researchers, Postdocs, PhD students and MSc students at the School of Physics and Astronomy and the School of Engineering.

Which programming languages and technologies do you regularly use?

Most of the time I use C/C++ and MPI for my applications and Python for scripts for pre- and postprocessing. I am also working with OpenMP and CUDA C, but I am not one of the lead developers and tend to extend and enhance existing code. For our own in-house software we use advanced UNIX language features for automated testing suites and nightly build tests.

Are there any languages/technologies that you used to use a lot but have now moved away from? Why?

I used Fortran a lot in the past and moved now away from it. This, however, reflects more the specific applications I work with at the moment. Some of my colleagues at EPCC are full-time Fortran programmers and there are fantastic Fortran codes out there which are virtually irreplaceable and would be difficult to rewrite in the current funding situation. Object-oriented programming has become so ubiquitous and offers a wide range of convenient features which simplify maintenance and reuse of code. This is more naturally embedded in C++.

Is there anything on your ‘to-learn’ list?

The latest standard MPI 3.1 (pdf download) offers a lot of sophisticated single-sided communication features which I would like to become acquainted with. I would also like to get a better understanding of CUDA C and OpenCL, but fear this will never go beyond basic knowledge as I don’t seem to have enough time to engage intensively with applications that are written in these languages. As for the computational science I need to become more familiar with specific algorithms and computational concepts. This includes advanced sampling techniques for rare events and discrete, non-standard methods for fluid dynamics.

Do you have any advice for anyone who wants to become a Research Software Engineer?

I think most of the people in this new and emerging profession have evolved towards rather than chosen this line of work. So aiming directly for this career is somewhat unprecedented. I would emphasise the importance of working in a research environment with scientists. Ultimately the job requires a very specific skill set between a traditional academic researcher and a software expert. Hence, a PhD or postdoctoral appointment in a specific field of interest with a strong focus on software development could be a good starting point for such a career.

April 18th, 2016

This interview with Ian Bush is part of my series of interviews on the new cohort of EPSRC Research Software Engineering Fellows.

Which University are you from?

Oxford. The RSE fellowship will be in the Oxford e-Research Centre and will fund 80% of my time. The remaining 20% will be for my current post, as support staff for the HPC Cluster that ARC provides to the university.

Could you tell us a little about yourself and how you became a Research Software Engineer?

Drifted into it! Ultimately I’m a chemist/condensed matter physicist, my degree is in the former and my doctorate is more toward the later, but really the division is artificial. During my doctorate I had to code up various models to run on the two Cray 1 machines (nicknamed Ronnie and Reggie) then placed in ULCC, and as time went by I got more interested in the method of solving the problem rather than the solution of the problem itself; in the end I was investigating and implementing totally different algorithms from that originally proposed for my project. From there I went to what is now STFC Daresbury Laboratory supporting and developing parallel software on the Intel iPSC/860 machine there with a whole 64 cores. Hence my interest in parallel computing, and since then I have been working on the development, optimisation and support of a number of packages in the materials science/chemistry area. I’m probably most closely associated with CRYSTAL and DL_POLY (in total over 3000 groups worldwide hold licences for these applications) but I have code in a much larger number of widely used applications. If asked for my niche it would be that I couple good parallel programming skills with a good understanding of the scientific areas in which I specialise.

What do you think is the role of a Research Software Engineer? Is it different from a ‘normal’ researcher?

Yes it is different. An RSE’s prime role is in the enabling and facilitation of research, not in the carrying out of research itself. Given an idea, often (but not always, the RSE him/herself can be the driver) provided by a researcher, the RSE is the person who implements it, who makes it reality, and then usually hands it on/back to a researcher to exploit. Thus while the researcher gets the immediate benefit and publishes the papers, it would not be possible without the RSE, and I hope that EPSRC’s funding of these posts will drive the better understanding and recognition of these vital individuals and groups within the academic research infrastructure.

It requires a demanding skill set! While it is very much engineering in that the idea has to be turned into a design spec and then through the programming skill of the RSE into a correct, usable, maintainable, extendable and efficient solution to the researcher’s problem, it is more than that. People skills are required to interact constructively with the user to provide a solution tailored to their needs, often scientific insight into the user’s area is of great benefit and knowledge of the world without the RSE’s institution is vital to avoid reinventing the wheel – A RSE is not a jack but a master of all trades! But then again if it weren’t challenging it wouldn’t be fun …

You’ve recently won an EPSRC RSE Fellowship – congratulations! Can you give a brief overview of your project?

Well the funding provided by EPSRC is for me and a post-doc, 8 FTEs in total over a 5 year period. The theme running through the proposal is to push the scalability of software on the very highest end hardware, and so is very much based in high performance computing (HPC) and the exploitation of 10’s and 100’s of thousands of cores for the solution of scientific problems. Within this there are two threads, one programming, one based in teaching.

The programming thread is to develop a small set of codes so as to better utilise HPC, both in terms of scalability and in terms of solving scientifically exciting problems . These are mostly based in my traditional area of material science (CRYSTAL, CRYSCOR and DL_POLY), but one is a new collaboration with the Culham Centre for Fusion Energy, CCFE, to work on their gyrokinetic plasma simulation code, GS/2. All these codes have been demonstrated to scale in at least part of their functionality to many hundreds or thousands of cores, but there are scientific drivers for this to be improved.

As one example take CRYSTAL. This is an ab initio electronic structure code, so it (approximately) solves Schrödinger’s equation for a given system of interest, and from that solution many interesting properties of the material in question can be calculated. The issue here is that while the base solver is parallelised well and has been shown to amongst the best in class (see e.g. Orlando et al.  “A new massively parallel version of CRYSTAL for large systems on high performance computing architectures”, Journal of computational Chemistry, vol. 33, no. 28, pp. 2276–2284) the calculation of certain quantities is not parallelised, and thus for large systems while the base equation can be solved the scientifically interesting stuff can’t! During the fellowship, in collaboration with STFC RAL, the University of Turin and Imperial College London, the post-doc and I will work to address this. One of the most interesting areas is how materials interact with light. This can be modelled by Time Dependent Density Functional Theory (TD-DFT). A serial version of this exists within CRYSTAL but due to lack of expertise within the developers this has not been parallelised. As RSEs, I and the post-doc will provide the skill set to develop a fully parallel distributed memory implementation capable of scaling to many thousands of cores. This in turn provides many scientific opportunities. One possibility is the first fully ab initio modelling of radiation damage in proteins; the current code is attempting this but its current model of a protein is the small 29 atom hydrocarbon n-C9H20 due to the limitations of the serial implementation, rather than the 100s or 1000s of atoms that make up a real protein. The new code will allow understanding what is really happening when X-rays are incident on materials, for example in a synchrotron such as Diamond, as the X-rays will damage the material being studied as the measurements are being taken. It will also provided insights when other soft, organic systems, such as humans, are exposed to high energy electromagnetic radiation.

High energies are also obviously relevant to the work on the plasma modelling code GS/2. The background to this is ultimately the 10 billion euro experimental tokamak ITER being built in France to examine the possibility of using hydrogen fusion power as a cheap, plentiful supply of energy – in other words cheaper electricity bills! Again the current code scales well to a few thousand cores, but even at this level of computational power they are still far away from a fully realistic model of the plasma within a tokamak such as ITER. For instance it would be desirable to use a spatial resolution which is roughly two orders of magnitude finer than currently used, this resulting in a 100 fold increase in simulation time on the current thousands of cores that can be used currently. To solve such a problem in reasonable times instead needs 100s of thousands of cores, and scaling the application up to this size of machine is what will be studied by myself and the post-doc.

The teaching strand covers areas I think are poorly covered within the UK. Firstly, unlike in my youth, the majority of HPC users are now not programmers but application users, people who view the computer program as black box, a tool used to generate results to present in papers and at conferences. But like any complex tool its use is not easy, and I believe application specific training for the best use of parallel computers is something that is sorely lacking. Thus for the codes I will work on I shall develop such material under a creative commons licence, and  will present it to interested groups focusing in the first place on CDTs. For programmers I also feel that while teaching the syntax of OpenMP or MPI, the basic parallel programming tools within in computational science, is well covered, when and how to use them is not. After all a large supercomputer is one of the most powerful tools available to a programmer, we teach them what the tool is but not how to use it – it’s akin to giving someone a chainsaw and then telling them to cut down trees before you show them how to use it. So again under creative commons, and again initially focussed on CDTs, I shall develop material covering aspects such as how to think about distributing objects on multinode machines, and how to assess the potential scalability of different algorithms and data distributions.

Over and above that I hope to use the fellowship to raise the profile and recognition of the RSE initially within the byzantine institution that is Oxford, in the longer term without. I feel passionately that the lack of understanding, respect and job progression possibilities for people who provide the irreplaceable technical enabling of fundamental research is wasting any number of opportunities for UK PLC, not just in terms of scientific research that is just not possible without the RSE, but more importantly in the people who are lost from the field due to the lack of opportunities within it. I dearly hope that the existence and actions of RSE fellows will not only prove a first step in improving the current situation (and kudos to EPSRC for recognising the issue), but also as an example to aspiring, talented graduates that an RSE is a real career opportunity and path, with resulting benefit to all.

How long did it take you to write your Fellowship application? Do you have any advice on the application process?

Difficult to say, but the hard work was mainly over the last 2 weeks with 2 very long days just before the deadline … However long before then I had held a number of meetings with my collaborators, and I asked for letters of support as early as I possibly could.

Advice? Contact your collaborators as early as possible! The dependency list for writing the proposal when you have a large number of collaborators will be complicated, so get that sorted well before the deadline. Otherwise don’t underestimate how long it will take, and don’t overestimate how much you can put in 8 pages, it is not much space.

Who are your project partners?

Universities: Oxford, Turin, Bristol, Southampton, Imperial College London, University College London

Other: STFC RAL, STFC Daresbury, Culham Centre for Fusion Energy, HPC Materials Chemistry Consortium, NAG

Tell me about your RSE group.

The fellowship provides a seed for me to develop a research group within OeRC and Oxford, with 7 years of funding (3 provided by OeRC) for myself and 4 for a post-doc. The above paragraphs cover some of the more immediate goals for the group but of all of those possibly the most important is that raised last – “I hope to use the fellowship to raise the profile and recognition of the RSE initially within the byzantine institution that is Oxford, in the longer term without.” For me at least I hope to grow the group, and through collaborations with the group show the benefit of RSEs, both through the software engineering aspects and by providing RSEs with longer term employment possibilities coupled with real career progression opportunities. The cynic might say that I need to do that for my own career, the altruist that it is only in  the interest of UK PLC for me to achieve this – I hope I am more the later than the former!

Which programming languages and technologies do you regularly use?

Fortran 95 or later, C, C++, MPI, OpenMP, Doxygen, svn, git, netCDF, a variety of debuggers and profilers, almost entirely on Linux. A variety of batch scheduling systems. And god’s own editor, Emacs. Powerpoint when I must, but I prefer the Libreoffice tools and latex where appropriate.

Are there any languages/technologies that you used to use a lot but have now moved away from? Why?

Fortran 77. Because it is crap not conducive to enabling modern best practice in software engineering. I only mention it because it is clear that students are still being taught this due to lack of funded, experienced RSEs in science departments. This leads to the teaching of programming being left to the department’s “computer guy” who learnt his/her programming 25 years ago and hasn’t done it seriously since. THIS MUST STOP. It is to the detriment of both the students and computational research in general. Fortran 77 was of its time, but its time was half a century ago, nowadays use Fortran 2003 or C++ where performance is required, or one of the better scripting languages, e.g. python, when it is not.

Apart from that numerous message passing technologies (e.g. PVM, NX/2, PARMACS) which are just thankfully obsolete. SCCS, RCS, CVS … why does every project I am involved in have a different revision control system!?

Is there anything on your ‘to-learn’ list?

Python. Improve my C++. Money. I fundamentally don’t understand money and I fear it will become rather important in the new post. The French Defence, Winawer variation from white’s perspective.

Do you have any advice for anyone who wants to become a Research Software Engineer?

Be passionate about solving problems. Be involved with the researchers you are helping. Be patient as it ain’t going to happen quickly!

April 18th, 2016

This interview with the University of Nottingham’s Louise Brown is part of my series of interviews on the new cohort of EPSRC Research Software Engineering Fellows.

Could you tell us a little about yourself and how you became a Research Software Engineer?

I’ve had a peculiar career path. I’ve worked as a software engineer for 25 years. I did my PhD in Mechanical Engineering and then moved to a CS department where I worked for a company whilst based in the Drawing Recognition Research Group. I was working on CAD systems for industrial embroidery machines implementing algorithms for automatically converting image data into embroidery machine input data. I worked there for four years until my daughter was born after which I returned part time.

My daughter  was ill when she was small and when my son was born 2 years later, I stopped work completely for 2 years. I then worked part time from home for 13 years, firstly on ‘Easy Cross’ software for design of home needlecrafts.  I designed and developed add-on modules for different crafts including bobbin lace making, patchwork and knitwear design.  My second job during that time was working on a system for automatic reading of travel documents such as passports and air tickets.

I moved to my current post six years ago which advertised for an open-source software co-ordinator.  The job entails running the TexGen project, working as a software engineer but with a Research Fellow job title. After a while I started to become aware of the issues surrounding being employed as a researcher but spending most of the time programming.  For me the main issue was the lack of career progression, being employed at the top of the Research  Fellow scale but with little chance of promotion because there was limited chance of fulfilling the criteria for Senior Research Fellow. (The fellowship fixed this!).

To try and address this I started doing some teaching (MATLAB course for postgrads and first year undergrad drawing and design tutorials). I also joined committees to attempt to broaden my work profile.  I am a co-author on a few papers but not enough for academic promotion, although there is more effort being made now to include me when a specific piece of development has been carried out which enables the work described in the paper. I applied for a promotion last year but was turned down, mainly due to lack of publications but also due to my not having done any PhD supervision..

One problem was that my performance reviewer did not know what to suggest in order to move my career forward.  One possibility which I was encouraged to look at was fellowships but, having returned to academia relatively recently, I came into the category of ‘Early Career’ fellowships.  These normally specify a time limit since completion of a PhD and the 25 years since I completed made these unsuitable.  This fellowship was suitable because despite being labeled ‘Early Career’, it had no such time limits and it didn’t matter that I don’t fit the normal early career pigeon holes.

What do you think is the role of a Research Software Engineer? Is it different from a ‘normal’ researcher?

The fundamental difference is the fact that I do work which underpins research and without which the research wouldn’t happen but isn’t necessarily publishable in its own right. Without the work carried out by the Research Software Engineer the papers wouldn’t get published but they don’t necessarily get the credit. The research gets the publication and the credit!

I often end up spending time doing lots of odds and ends that are significant in that they support researchers and facilitate research but aren’t really quantifiable in the sense of a research output.

The outputs of a RSE are different to a ‘normal’ researcher.  For example, in performance review, the work done on outputting software isn’t one of the KPIs. I don’t fit the normal ‘money-in, papers-out’ model of many academics.

You’ve recently won an EPSRC RSE Fellowship – congratulations! Can you give a brief overview of your project?

The project is centred around TexGen software.  This is open source software for generating 3D geometric models of textiles and textile composites  The initial part of the project will be to develop a new major version of the software aimed at addressing the modelling requirements of the increasingly complex textiles and preforms used in composite materials. The project will identify new and emerging areas in textile technology and develop tools to meet the analysis needs of these new technologies and materials.

The Faculty have committed to fund a PhD student and a project has been proposed to research optimisation of 3D woven preforms with various cross-sectional shapes.

Also included in the proposal are elements surrounding programming education and outreach. I currently teach a MATLAB course for postgraduate students which is intended to be a conversion from other languages. The reality is that many of the students haven’t programmed before so the course could be split to better meet the needs of the varying requirements of the students.

9 layer laminate

How long did it take you to write your Fellowship application?

It was horrendous! It was really hard work. The Business development team in the Faculty were really helpful, giving very useful advice on how to write a proposal. It consumed all of my time for the weeks between acceptance of the intent to submit and the submission date.

One thing that really helped was that two weeks before the submission date, the Graduate School held a ‘Writing retreat’ – 3 days for research staff to ‘just write’. There was space set aside, refreshments, speakers/courses if you wanted them and one-to-one appointments available with a careers advisor and EndNote specialist. The research development team people came down for a couple of meetings  but there was a no phones rule and we were encouraged to turn our email off.

Having academics who were willing to spend time reading (and rereading!) the proposal was really important, especially as it was the first proposal that I’d written.  The first draft was shredded by one of them but in a way that was very positive.

Who are your project partners?

I don’t have any named on the proposal. For the last couple of years I have been a platform fellow for EPSRC Centre for Innovative Manufacturing in Composites (CIMComp) which has a large number of project partners, funding research projects underpinned by TexGen.  It is anticipated that this collaboration will continue.  Part of the aim of the project is also to seek out new project partners, particularly for areas of textile research other than composites.

Tell me about your RSE group.

We don’t have one at Nottingham! It’s just me. There are plans to identify other RSEs in the faculty to start building a community. Watch this space.

Which programming languages and technologies do you regularly use?

TexGen is written in C++ and has Python wrappers. A bit of MATLAB for teaching. OS – Windows. I work in Visual Studio and am comfortable in it.

TexGen is cross platform so I have to make sure it builds in Linux. It also makes use of open source libraries such as wxWidgets, SWIG, OpenCASCADE and VTK.

I tend to work on a ‘need to know’ basis, looking for new technologies when there’s a specific need.  When you are the only person in a project, you don’t have the luxury of learning lots of technologies.

Are there any languages/technologies that you used to use a lot but have now moved away from? Why?

I used to use C but rarely use it anymore. I also did assembler many years ago. When I was an undergrad, I did a software engineering course that used PDP-11 assembler.  It was the first time that the lecturer had taught the course and was incomprehensible!  I got the textbooks out of the library and worked my way through them, discovering how much I enjoyed assembler programming in the process.  On the back of that the lecturer invited me to do a PhD where I designed and built a filament winding machine.  The control system was run from a PC using 286 assembler.

Is there anything on your ‘to-learn’ list?

Getting better at the Linux side of things. I can get by but I am aware that there is much that I don’t know and could learn much better from someone who’s expert at using it rather than hunting around myself to work out how to do things.  There are people who run TexGen on the HPC system so I would like to be able to support them better.

I’m looking forward to becoming part of the RSE community and learning from the other members.

October 13th, 2015

The Sheffield Open Data Science Initiative

The University of Sheffield Open Data Science Initiative (ODSI) is really starting to take off. So what is it?

From the website, the aims of the ODSI are:

  1. Make new analysis methodologies available as widely and rapidly as possible with as few conditions on their use as possible (see the ML@SITraN group software pages and the local software page).
  2. Educate our commercial, scientific and medical partners in the use of these latest methodologies (see http://gpss.cc)
  3. Act to achieve a balance between data sharing for societal benefit and the right of an individual to own their data. (see our summary of our efforts in public understanding and debate)

My role within this initiative is to work on various aspects of research software throughout the University of Sheffield (and beyond!). I am a fellow of the Software Sustainability Institute and you could sum up everything I try to do with their motto Better Software, Better Research.

Join us on October 20th 2015

We have just started a programme of events which aims to bring together a wide variety of people interested in data, machine learning and research software (my favourite part!). The first such event is at The Data Hide on October 2015 at University of Sheffield.

There will be talk on Research Data Management for Computational Science by @ctjacobs_uk as well as lightning talks: What Kind of AI are we Creating? by @lawrenndMachine Learning for Chemical Simulations by Chris Handley and a demonstration of how great Reveal.js is by me.

This will be followed by food, beer and an opportunity to chat and geek out.

We would be honoured if you would join us.

Would you like to present at a future event?

Contact me to see what we can do together.

July 27th, 2015

If you change it, It will break

“It only works on the Windows version of MATLAB 2010a. The code doesn’t work on Linux or other versions of MATLAB.” explained the researcher. He needed to run his program hundreds of times and his solution was to lock himself into a computer room over the weekend, log into the two dozen managed desktop machines there and manually start his code running on each one. Along with colleagues from the University of Manchester, I was trying to understand why he couldn’t use the Linux-based 3000+ core Condor pool we’d built since it was perfectly suited to his workflow.

Now, when he said ‘doesn’t work‘ what he meant was ‘gives different results’ and the only ones he liked were the ones that came from MATLAB 2010a on Windows. Naturally, I offered to take a look at his code with a view to figuring out what was going on but he simply wasn’t interested. Once he determined that we couldn’t (or more accurately, wouldn’t) stop his current workflow he thanked us for our interest and left.

Different strokes results for different folks

This wasn’t the first time I had discovered research code that gave different results when run on different operating systems or runtimes and it probably won’t be the last. A relatively high profile case that caught my eye recently was a publication that demonstrated that the results of a program called FreeSurfer varied according to operating system, workstation type and software version.

I suspect that the phenomenon might be more prevalent than we think because I suspect (but confess to having no evidence) that a large number of computational research results come from research code that’s only ever been run on one operating system, with one set of dependencies on one particular piece of hardware.

Experience has shown me that some researchers deal with this lack of robustness by keeping their working environment as constant as possible. Don’t. Touch. Anything!

The shifting sands of Windows 10

A huge percentage of researchers conduct their research using Windows and the Windows environment is about to change in a rather fundamental way: Updates will be automatic and mandatory.

Since the operating system will be constantly shifting under their feet, researchers are no longer going to be able to keep that aspect of their environment stable.

I wonder if or how this will change things in the world of research software. Perhaps it will go by unnoticed, perhaps more test-suites will be written or perhaps something else?

Any thoughts?

July 23rd, 2015

You’ve written a computer program in your favourite language as part of your research and are getting some great-looking results. The results could change everything! Perhaps they’ll influence world-economics, increase understanding of multidrug resistanceimprove health and well-being for the population of entire countries or help with the analysis of brain MRI scans.

Thanks to you and your research, the world will be a better place. Life is wonderful; this is why you went into research.

It’s just a shame that you’re completely wrong but don’t yet know it.

What went wrong?

If you click on any of the studies linked to above, you’ll find a common theme – problems with software. These days it’s close to impossible to do science without either using or developing specialist software. Using research software can be difficult, complex and extremely time consuming. Developing it is orders of magnitude more difficult.

What can be done?

When I’m writing code, my first and main assumption is always ‘I can be an idiot and will make mistakes.’ Some people I’ve worked with assume that I’m either being self-deprecating or have a self-confidence problem when I talk like this. The reality is that it’s simply true. I’m fallible: my knowledge of everything is incomplete and if I haven’t had at least two cups of coffee in the morning, I’m essentially good for nothing.

Rather than lament my weaknesses, I try to develop methods of working that mitigate my inevitable stupidity. These methods are actually very simple.

  • Write tests. Every programming language worth its salt provides testing frameworks (e.g. Python, MATLAB, R). Learn how to use them and use them whenever you can. Whenever you make a change to your code or install it somewhere new, run your tests to see if anything has broken.
  • Get a code buddy. Find yourself another programmer and hand them your code with the remit ‘Tell me where you think I could do better’. This will be a painful experience. Suck it up because your code will almost certainly be better as a result. There is only one true measure of code quality!
  • Use version control. It doesn’t matter if its git, SVN, Mercurial or whatever the particular flavour of the month is. Choose a system, learn it and use it (for the record, I use git and have a twitter account called @git_tricks that posts tips on how to use it). When you use your code to get results, refer back to the actual commit that you used to get those results. This greatly assists the reproducibility of your research. If you cannot reproduce your own results with your own code and data, neither can anyone else.
  • Share code as openly as possible. Ideally, ‘openly’ should mean on the public internet. GitHub, blog posts, personal websites etc. Whenever I’ve posted code here on WalkingRandomly, mistakes usually get caught very quickly. Geeks love telling other geeks that they’ve made a mistake. Sure, your pride takes a hit but you quickly become immune to such things. The code is better, you learn something useful and the geeks that point out your errors feel good about themselves. Everyone’s a winner.

Sadly, the great majority of scientists I work with really don’t want to share their code openly for numerous reasons and so much of the stuff I’ve worked on is in the dark. Sometimes, collaborators don’t even want to share code with me.I’m about to start work on one optimisation case where the researcher tells me that they are not allowed to email me their code. So, he’s bringing his laptop to me and will sit next to me for a few hours while I try to figure out if I can help or not. Such is the lot of a working research software engineer.

Along with organisations such as the Sheffield Open Data Science Initiative and The Software Sustainability Institute, I am trying to improve this state of affairs but have to admit that progress is slower than I’d like.

These steps won’t guarantee that your code is correct but they are great steps in the right direction. For more in-depth advice, I refer you to Greg Wilson’s paper Best Practices for Scientific Computing.

June 4th, 2015

The ever-superb John D. Cook recently found this lovely looking curve in a book he’s currently readingmystery_curve

 

John posted some Python code that reproduced this curve. I stole borrowed his code, put it in a Jupyter notebook and wrapped it in an interactive widget to allow me to play with the parameters and see what other curves I could come up with. The result looks like this.

Screen Shot 2015-06-04 at 15.09.26

If you’d like something where those sliders work, you need to run the notebook I’ve created in Project Jupyter. Here are 2 ways to do that.

Once you have the notebook open, click on Cell->Run All and play with the sliders that pop up.

Other posts about these curves:

April 28th, 2015

I recently found myself in need of a portable install of the Jupyter notebook which made use of a portable install of R as the compute kernel. When you work in institutions that have locked-down managed Windows desktops, such portable installs can be a life-saver! This is particularly true when you are working with rapidly developing projects such as Jupyter and IRKernel.

It’s not perfect but it works for the fairly modest requirements I had for it. Here are the steps I took to get it working.

Download and install Portable Python

I downloaded Portable Python 2.7.6.1 from http://portablepython.com/ and installed into a directory called Portable Python 2.7.6.1

Update IPython and install the extra modules we need

This version of Portable Python comes with a portable IPython instance but it is too old to support alternative kernels. As such, we need to install a newer version.

Open a cmd.exe command prompt and navigate to Portable Python 2.7.6.1\App\Scripts.

Enter the command

easy_install ipython.exe

You’ll now find that you can launch the ipython.exe terminal from within this directory:

C:\Users\walkingrandomly\Desktop\Portable Python 2.7.6.1\App\Scripts>ipython
Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)]
Type "copyright", "credits" or "license" for more information.

IPython 3.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: exit()

If you try to launch the notebook, however, you’ll get error messages. This is because we haven’t taken care of all the dependencies. Let’s do that now. Ensuring you are still in the Portable Python 2.7.6.1\App\Scripts folder, execute the following commands.

easy_install pyzmq
easy_install jinja2
easy_install tornado
easy_install jsonschema

You should now be able to launch the notebook using

ipython notebook

Install portable R and IRKernel

  • I downloaded Portable R 3.2 from http://sourceforge.net/projects/rportable/files/ and installed into a directory called R-Portable
  • Move this directory into the Portable Python directory. It needs to go inside Portable Python 2.7.6.1\App (see this discussion to learn how I discovered that this location was the correct one)
  • Launch the Portable R executable which should be at Portable Python 2.7.6.1\App\R-Portable\R-portable.exe and install the IRKernel packages by doing
install.packages(c("rzmq","repr","IRkernel","IRdisplay"), repos="http://irkernel.github.io/")

Install additional R packages

The version of Portable R I used didn’t include various necessary packages. Here’s how I fixed that.

  • Launch the Portable R executable which should be at Portable Python 2.7.6.1\App\R-Portable\R-portable.exe and install the following packages 
    install.packages('digest')
    install.packages('uuid')
    install.packages('base64enc')
    install.packages('evaluate')
    install.packages('jsonlite')

Install the R kernel file
Create the directory structure Portable Python 2.7.6.1\App\share\jupyter\kernels\R_kernel

Create a file called kernel.json that contains the following

{"argv": ["R-Portable/App/R-Portable/bin/i386/R.exe","-e","IRkernel::main()",
"--args","{connection_file}"],
 "display_name":"Portable R"
}

This file needs to go in the R_kernel directory created earlier. Note that the kernel location specified in kernel.json uses Linux style forward slashes in the path rather than the backslashes that Windows users are used to. I found that this was necessary for the kernel to work –it was ignored by the notebook otherwise.

Finishing off

Everything created so far, including R, is in the folder Portable Python 2.7.6

I created a folder called PortableJupyter and put the Portable Python 2.7.6 folder inside it. I also created the folder PortableJupyter\notebooks to allow me to carry my notebooks around with the software that runs them.

There is a bug in Portable Python 2.7.6.1 relating to scripts like IPython.exe that have been installed using easy_install. In short, they stop working if you move the directory they’re installed in – breaking portability somewhat! (Details here)

The workaround is to launch Ipython by running the script Portable Python 2.7.6.1\App\Scripts\ipython-script.py

I didn’t want to bother with that so created a shortcut in my PortableJupyter folder called Launch notebook. The target of this shortcut was the following line

%windir%\system32\cmd.exe /c "cd notebooks && "%CD%/Portable Python 2.7.6.1/App\python.exe" "%CD%/Portable Python 2.7.6.1\App\Scripts\ipython-script.py" notebook"

This starts the notebook using the default web browser and puts you in the notebooks directory.

The pay off

My folder looks like this:

PortableJupyter_folder

If I click on the Launch Notebook shortcut, I get a Jupyter session with 2 kernel options

PortableJupyter_kernels

I can choose the Portable R kernel and start using R in the notebook!

PortableJupyter_screenshot

February 19th, 2015

I’ve been working at The University of Manchester for almost a decade and will be leaving at the end of this week! A huge part of my job was to support a major subset of Manchester’s site licensed application software portfolio so naturally I’ve made use of a lot of it over the years. As of February 20th, I will no longer be entitled to use any of it!

This article is the second in a series where I’ll look at some of the software that’s become important to me and what my options are on leaving Manchester.  Here, I consider MATLAB – a technical computing environment that has come to dominate my career at Manchester. For the last 10 years, I’ve used MATLAB at least every week, if not most days.

I had a standalone license for MATLAB and several toolboxes – Simulink, Image Processing, Parallel Computing, Statistics and Optimization. Now, I’ve got nothing! Unfortunately for me, I’ve also got hundreds of scripts, mex files and a few Simulink models that I can no longer run! These are my options:

Go somewhere else that has a MATLAB site license

  • I’ll soon be joining the University of Sheffield who have a MATLAB site license. A great option if you can do it.

Use something else

  • Octave – Octave is a pretty good free and open source clone of MATLAB and quite a few of my programs would work without modification. Others would require some rewriting and, in some cases, that rewriting could be extensive! There is no Simulink support.
  • Scilab – It’s free and it’s MATLAB-like-ish but I’d have to rewrite my code most of the time. I could also port some of my Simulink models to Scilab as was done in this link.
  • Rewrite all my code to use something completely different. What I’d choose would depend on what I’m trying to achieve but options include Python, Julia and R among others.

Compile!

  • If all I needed was the ability to run a few MATLAB applications I’d written, I could compile them using the MATLAB Compiler and keep the result. The whole point of the MATLAB Compiler is to distribute MATLAB applications to those who don’t have a MATLAB license. Of course once I’ve lost access to MATLAB itself, debugging and adding features will be  um……tricky!

Get a hobbyist license for MATLAB

  • MATLAB Home – This is the full version of MATLAB for hobbyists. Writing a non-profit blog such as WalkingRandomly counts as a suitable ‘hobby’ activity so I could buy this license. MATLAB itself for 85 pounds with most of the toolboxes coming in at an extra 25 pounds each. Not bad at all! The extra cost of the toolboxes would still lead me to obsess over how to do things without toolboxes but, to be honest, I think that’s an obsession I’d miss if it weren’t there! Buying all of the same toolboxes as I had before would end up costing me a total of £210+VAT.
  • Find a MOOC that comes with free MATLAB – Mathworks make MATLAB available for free for students of some online courses such as the one linked to here. Bear in mind, however, that the license only lasts for the duration of the course.

Academic Use

If I were to stay in academia but go to an institution with no MATLAB license, I could buy myself an academic standalone license for MATLAB and the various toolboxes I’m interested in. The price lists are available at http://uk.mathworks.com/pricing-licensing/

For reference, current UK academic prices are

  • MATLAB £375 + VAT
  • Simulink £375 + VAT
  • Standard Toolboxes (statistics, optimisation, image processing etc) £150 +VAT each
  • Premium Toolboxes (MATLAB Compiler, MATLAB Coder etc) – Pricing currently not available

My personal mix of MATLAB, Simulink and 4 toolboxes would set me back £1350 + VAT.

Commercial Use

If I were to use MATLAB professionally and outside of academia, I’d need to get a commercial license. Prices are available from the link above which, at the time of writing, are

  • MATLAB £1600 +VAT
  • Simulink £2400 + VAT
  • Standard Toolboxes £800 +VAT each
  • Premium Toolboxes – Pricing currently not available

My personal mix of MATLAB, Simulink and 4 toolboxes would set me back £7200 + VAT.

Contact MathWorks

If anyone does find themselves in a situation where they have MATLAB code and no means to run it, then they can always try contacting MathWorks and ask for help in finding a solution.