August 5th, 2016 | Categories: Linux, tutorials, Windows | Tags:

Like many people, I was excited to learn about the new Linux subsystem in Windows announced by Microsoft earlier this year (See Bash on Windows: The scripting game just changed).

Along with others, I’ve been playing with it on the Windows Insider builds but now that the Windows Anniversary Update has been released, everyone can get in on the action.

Activating the Linux Subsystem in Windows

Once you’ve updated to the Anniversary Update of Windows, here’s what you need to do.

Open settings

windows_settings

In settings, click on Update and Security

windows_update

In Update and Security, click on For developers in the left hand pane. Then click on Developer mode.

windows_for_developers

Take note of the Use developer features warning and click Yes if you are happy. Developer mode gives you greater power, and with great power comes great responsibility.

windows_develper_warning

Reboot the machine (may not be necessary here but it’s what I did).

Search for Features and click on Turn Windows features on or off

windows_features

Tick Windows Subsystem for Linux (Beta) and click OK

Screen Shot 2016-08-05 at 15.30.08

When it’s finished churning, reboot the machine.

Launch cmd.exe

Screen Shot 2016-08-05 at 15.36.14

Type bash, press enter and follow the instructions

Screen Shot 2016-08-05 at 15.37.58

The linux subsystem will be downloaded from the windows store and you’ll be asked to create a Unix username and password.

Try something linux-y

The short version of what’s available is ‘Every userland tool that’s available for Ubuntu’ with the caveat that anything requiring a GUI won’t work.

This isn’t emulation, it isn’t cygwin, it’s something else entirely. It’s very cool!

The gcc compiler isn’t installed by default so let’s fix that:

sudo apt-get install gcc

Using your favourite terminal based editor (I used vi), enter the following ‘Hello World’ code in C and call it hello.c.

/* Hello World program */

#include

int main()
{
    printf("Hello World from C\n");
    
    return(0);
}

Compile using gcc

gcc hello.c -o hello

Run the executable

./hello
Hello World from C

Now, transfer the executable to a modern Ubuntu machine (I just emailed it to myself) and run it there.

That’s right – you just wrote and compiled a C-program on a Windows machine and ran it on a Linux machine.

Now install cowsay — because you can:

sudo apt-get install cowsay
cowsay 'Hello from Windows'
 ____________________
< Hello from Windows >
 --------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

Update 1:

I was challenged by @linuxlizard to do a follow up tutorial that showed how to install the scientific Python stack — Numpy, SciPy etc.

It’s all there :)

sudo apt-get install python-scipy

Screen Shot 2016-08-05 at 16.42.30

Update 2

TensorFlow on LinuxOnWindows is also easy: http://www.hanselman.com/blog/PlayingWithTensorFlowOnWindows.aspx

July 27th, 2016 | Categories: RSE | Tags:

This post is also published over at the Software Sustainability Institute.

William Stein, lead developer of the computer algebra system, Sage, and its cloud-based spin-off, SageMathCloud, recently announced that he was quitting academia to go and form a company. In his talk, William says ‘I can’t figure out how to create Sage in academia. The money isn’t there. The mathematical community doesn’t care enough. The only option left is for me to build a company.’

His talk is below and slides are at http://wstein.org/talks/2016-06-sage-bp/bp.pdf

 

“Every great open source math library is built on the ashes of someone’s academic career.”

William’s departure is not unique. Here’s a tweet from Wes Mckinney, creator of pandas, one of the essential data science tools for Python.

Contact us

We are looking for similar stories; good research software people who felt that they had to leave academia because there wasn’t enough support, recognition or funding. Equally, we want to hear from you if you think academia is a rewarding environment for software development. Either way, please contact us at rse-study@software.ac.uk

July 12th, 2016 | Categories: HPC | Tags:

The High Performance Computing system at University of Sheffield has several different file systems available to it. We have:-

  • /fastdata – A lustre-based, shared filesystem with hundreds of terabytes of space. No backup. No quota.
  • /data – An NFS file system where each user has access to 100Gb of storage. Back-ups go back 7 days.
  • /home –  An NFS file system where each user has 10Gb. Backed up over 28 days. Mirrored.
  • /scratch – Local disk on each worker node. No back up. Uses ext4.

Lots of options with differing amounts of space, back-up policy and, as I’m about to demonstrate, performance characteristics. I suspect that many other HPC systems have a similar set up.

On our system, it’s very tempting to do everything in /fastdata. There’s lots of space, no quota, readable from all worker nodes simultaneously — good times! I try to encourage people to think about what they are doing, however. Bad things can happen if the lustre filesystem is hammered too much. Also, there can be a huge difference in performance for some operations across different filesystems.

Let’s take an example. I want to download and untar gcc 4.9.2. How long does that take on the three different filesystems?

On the scratch directory of a worker node

cd \scratch
mkdir testing123
cd testing123
wget ftp://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/releases/gcc-4.9.2/gcc-4.9.2.tar.gz
time tar xfz ./gcc-4.9.2.tar.gz 

real    0m6.237s
user    0m5.302s
sys 0m3.033s

On the lustre filesystem

cd /fastdata/
mkdir testing123
cd testing123
wget ftp://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/releases/gcc-4.9.2/gcc-4.9.2.tar.gz
time tar xfz ./gcc-4.9.2.tar.gz 

real    7m18.170s
user    0m6.751s
sys 0m56.802s

On the NFS filesystem

cd /data/myusername
mkdir testing123
cd testing123
wget ftp://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/releases/gcc-4.9.2/gcc-4.9.2.tar.gz
time tar xfz ./gcc-4.9.2.tar.gz 

real	16m37.343s
user	0m6.052s
sys	0m23.438s

For this particular operation, there is a two orders of magnitude difference between the worst and the best option.

I’m not an expert in filesystems and I have no idea what’s causing these differences or if I’d see a similar speed difference given a different file operation. I currently have no interest in doing a robust set of benchmarks. The point I’m making is that if you are using a system that has multiple filesystems it may be worth checking if there’s an advantage to using one over the other for your particular use case.

July 6th, 2016 | Categories: RSE | Tags:

I was recently invited to a Schloss Dagstuhl Workshop on ‘Engineering Academic Software’ by organisers Carole GobleJames HowisonClaude Kirchner and Oscar M. Nierstrasz. One week of geeking-out with research software people from all over the world in lovely surroundings with as much beer and cheese as you can eat — sounds good to me!

I gave a presentation about life on the frontline of Research Software Engineering support or the RSE Accident and Emergency department as I sometimes think of it. I spent some time discussing Sheffield’s new Research Software Engineering group formed by Paul Richmond and me off the back of our EPSRC Research Software Engineering Fellowships. I also discussed a worrying trend I’ve noticed in research software — top people are leaving academia for industry, not because they want to but because of a lack of support! Slides for my talk are at https://mikecroucher.github.io/dagstuhl_RSE_Sheffield/#/.

Highlights

I love attending seminars like this because I get to learn about all of the wonderful things that the community is up to.  Personal highlights included:

Effective computation in physics

Meeting Katy Huff, co-author of my favourite Python book, Effective computation in physics. The only problem with this book is the word ‘physics’ in the title since it suggests that it’s only useful if you are a physicist. Totally not the case! If you are doing science in Python, get this book! Fellow blogger John D Cook, interviewed both authors of the book back in 2015 – see the write-up at http://www.johndcook.com/blog/2015/08/08/effective-computation-in-physics/.

Software Heritage

Learning about the Software Heritage project that launched very recently. The project harvests and archives projects from various locations — github, Debian and the GNU Project for now. They say that ‘we preserve software, because it contains our technical and scientific knowledge.’ It’s shaping up to be a ‘Library of Alexandria of Software’. The full mission statement is over at https://www.softwareheritage.org/mission/

software_heritage

Software citation and credit

There was a lot of discussion about considering software as a first class scientific output and several projects were mentioned that help the situation. The force11 software citation principles address how software should be cited and depsy.org is ‘an open-source webapp that tracks research software impact‘. Dan Katz’s blog post ‘How should we add citations inside software‘ is also worth a read.

16252.1.B

What did we talk about?

Many of the participants are active on Twitter so there was a lot of live tweeting. The twitter hashtag for the workshop was #dagstuhleas. It’s been hijacked by spammers recently but there is a lot of great content there – https://twitter.com/search?q=dagstuhleas&src=typd

Elsewhere…

I’m not the only attendee to write about this workshop:

Alice Allen of the Astrophysics Source Code Library has written up a day by day account of the workshop in a way that captures what it’s like to attend a Dagstuhl seminar perfectly.

BetterSoftwareBetterResearchImage

The Software Sustainability Institute was also present in force. See what they had to say over at http://www.software.ac.uk/blog/2016-07-06-dagstuhl-perspectives-workshop-engineering-academic-software

Slides for all presentations can be found at http://materials.dagstuhl.de/index.php?semnr=16252 

I was in a funk!

Not long after joining the University of Sheffield, I had helped convince a raft of lecturers to switch to using the Jupyter notebook for their lecturing. It was an easy piece of salesmanship and a whole lot of fun to do. Lots of people were excited by the possibilities.

The problem was that the University managed desktop was incapable of supporting an instance of the notebook with all of the bells and whistles included. As a cohort, we needed support for Python 2 and 3 kernels as well as R and even Julia. The R install needed dozens of packages and support for bioconductor. We needed LateX support to allow export to pdf and so on. We also needed to keep up to date because Jupyter development moves pretty fast! When all of this was fed into the managed desktop packaging machinery, it died. They could give us a limited, basic install but not one with batteries included.

I wanted those batteries!

In the early days, I resorted to strange stuff to get through the classes but it wasn’t sustainable. I needed a miracle to help me deliver some of the promises I had made.

Miracle delivered – SageMathCloud

During the kick-off meeting of the OpenDreamKit project, someone introduced SageMathCloud to the group. This thing had everything I needed and then some! During that presentation, I could see that SageMathCloud would solve all of our deployment woes as well as providing some very cool stuff that simply wasn’t available elsewhere. One killer-application, for example, was Google-docs-like collaborative editing of Jupyter notebooks.

I fired off a couple of emails to the lecturers I was supporting (“Everything’s going to be fine! Trust me!”) and started to learn how to use the system to support a course. I fired off dozens of emails to SageMathCloud’s excellent support team and started working with Dr Marta Milo on getting her Bioinformatics course material ready to go.

TL; DR: The course was a great success and a huge part of that success was the SageMathCloud platform

Giving back – A tutorial for lecturers on using SageMathCloud

I’m currently working on a tutorial for lecturers and teachers on how to use SageMathCloud to support a course. The material is licensed CC-BY and is available at https://github.com/mikecroucher/SMC_tutorial 

If you find it useful, please let me know. Comments and Pull Requests are welcome.

May 31st, 2016 | Categories: Maple | Tags:

I occasionally write articles over at The University of Sheffield’s Research Software Engineering blog. This is a site I set up with Paul Richmond as part of our EPSRC Research Software Engineering Fellowships.

I recently helped a user of Maple get started with Sheffield’s HPC system and started writing up my notes as a series of blog posts. The first one is at http://rse.shef.ac.uk/blog/HPC-Maple-1/.

 

May 20th, 2016 | Categories: R, RSE | Tags:

I’ve just delivered a session called ‘R awareness’ to a group of IT staff at University of Manchester. The audience was a combination of desktop support, applications support and research software engineers and initial feedback indicates that it was well received.

The focus of the session was not the language itself but the software infrastructure that surrounds it. Multiple versions of R, packages, R Studio, Jupyter notebook, Microsoft R Open, SageMathCloud and the way that various applications such as Mathematica, Maple and Visual Studio interact with R.

I chose to deliver the material in the same way that The Code Cafe is delivered – self directed material where I act as facilitator. This seemed to work really well and there was a lot of conversation and interaction with the audience that I find is missing when doing a more traditional presentation.

Course material is at https://github.com/mikecroucher/R_awareness

 

May 10th, 2016 | Categories: mathematica, Open Data Science, University of Sheffield | Tags:

I learned about entropy as part of my undergraduate Physics education but it turns out that the concept of entropy turns up in many fields including linguistics, themodynamics, information theory, chemistry and artificial intelligence.

As part of Sheffield’s Open Data Science Initiative, computer scientist, Neil Lawrence, has teamed up with linguist, Dagmar Divjak, to organise a cross-faculty discussion meeting on the subject of entropy.

For more details on the day’s events, and to register, see http://opendsi.cc/ed2016/program

entropy

I wasted a little time producing the above logo for the event using Mathematica.

Here’s the source code:-

(*consider column one pixel at a time. Invert the pixel if a random number is below some threshold*)
flipbit[col_, prob_] := Module[{result, temp, x},
  result = col;
  Do[
   If[RandomReal[] <=  prob,
    If[result[[x]] == 1, result[[x]] = 0, result[[x]] = 1];
    ]
   , {x, 1, Length[col]}
   ];
  
  Return[result]
  ]

text = "Entropy";
image = Rasterize[Text[Style[text, White, Italic, 190]], 
   Background -> Black];
imageData = ImageData[Binarize[image]];
const = 1/Dimensions[imageData][[2]]*0.42;
(*Apply flipbit to all columns. Increase probability of flipping as you move along the x-axis*)
logo = 
  Transpose[
   MapIndexed[flipbit[#1, const*#2[[1]]] &, Transpose[imageData]]];
Image[logo]

Finally, I found this quote about entropy that I quite like:

You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.

John von Neumann to Claude Shannon a name for his new uncertainty function. Source: Wikiquotes

April 18th, 2016 | Categories: programming, RSE, Scientific Software | Tags:

The Engineering and Physical Sciences Research Council (EPSRC) is the UK’s main agency for funding research in engineering and the physical sciences. In 2015, they made a very unusual type of fellowship call – one that was targeted specifically at Research Software Engineers. This was the first fellowship of its kind in the world  and I believe it represents a strong commitment by EPSRC to the improvement of research software.

Research Software Engineers are the people behind research software. They make a huge contribution to science but often lack reward and recognition for the work that they do. This fellowship is a huge step in the right direction to providing some of that recognition. Quoting from the call document:

This call will support Research Software Engineer (RSE) Fellowships for a period of up to five years. The RSE Fellowship describes exceptional individuals with combined expertise in programming and a solid knowledge of the research environment. The Research Software Engineer works with researchers to gain an understanding of the problems they face, and then develops, maintains and extends software to provide the answers.

201 people responded to the call with an ‘Intent to submit’ outline application. Of these, 7 were successful. As part of my work with the EPSRC funded Research Software Engineering Network (RSE-N), I got in touch with the new cohort of RSE fellows and interviewed them about their projects and careers.

Follow the links below to see what they had to say.

April 18th, 2016 | Categories: programming, RSE, Scientific Software | Tags:

This interview with The University of Bristol’s Chrys Woods is part of my series of interviews on the new cohort of EPSRC Research Software Engineering Fellows.

Could you tell us a little about yourself and how you became a Research Software Engineer?

I have been coding since preschool when my Dad bought me a Texas Instruments TI-99/4A. This had a simple BASIC, but no tape or disk storage, meaning that all of the code was lost when the computer was switched off. After that, I had an Amiga as a teenager, and had fun coding little games in my spare time. I grew up in a seaside town on the East Coast, and the industry there was just fishing and making frozen food, so it didn’t occur to me that I could do programming as a job. It was just for fun. It was only when I went to University (Southampton) that I saw that programming could be useful for science. I undertook a 3rd-year computational chemistry research project with Jon Essex at Southampton, and from there I was hooked and wanted to become a computational chemist. In Jon’s group in the late 1990’s I helped to build Beowulf compute clusters from scratch (assembling shelves, doing all the cabling, building the cluster installer disks, job schedulers etc.), as well as developing lots of software in first Fortran 77 and then C++ and Python. From there, I moved to Bristol, and wrote lots of grant applications and managed to work for about 10 years on a series of EPSRC and BBSRC funded software development projects (sincere thanks to both funders for the grants). These all culminated in a framework for molecular simulation, called Sire (http://siremol.org), around which a reasonable community has formed (about 20+ people have developed the code over the years).

The experience of working with this community made me realise that software engineering was about helping other people develop and play with code. It showed me the importance of leading by example, e.g. adding in tests, using clean designs and APIs, and writing clear documentation (although I readily admit that I am a really bad documenter).

About 2 years ago I was offered a job in BrisSynBio (a BBSRC/EPSRC Synthetic Biology Research Centre), as a technical lead, systems administrator and RSE. I have really enjoyed this position, as it made me step back from my research and really work in “services” to support other researchers. This gave me a completely new perspective on research, as I saw the world from the view of e.g. administrators, finance, procurement and technical services. This showed me that successful research depends on a whole team of energised, committed and dedicated people, and that research software engineers can play an important role as part of the research development team.

What do you think is the role of a Research Software Engineer? Is it different from a ‘normal’ researcher?

For me, research software engineering is about helping junior researchers develop their code right the first time. Helping them to structure their code so that it is easier for them to write flexible, trustably correct and performance portable software without them having to be overburdened with learning a lot of computer science. Following this, good RSE work is helping researchers build flexible frameworks that allow researchers to play with their scientific ideas. The code should allow them to prototype and play with new ideas, to get them running quickly and efficiently first time without having to have people come and re-engineer everything later.

You’ve recently won an EPSRC RSE Fellowship – congratulations! Can you give a brief overview of your project?

My project is about providing a new member of a research team – the research software engineer. This will enable a new way of doing research. Research is team based, and I want to help change the culture so that RSEs are seen as a member of the research team and not a service. The EPSRC RSE project provides funding for me as an RSE to be embedded within research groups to work with them to develop new research. Twenty projects will initially be supported; 10 in the first wave that have been allocated, and then 10 that will be allocated in response to a call. The projects cover everything from modelling chemical reactions, designing new optical machines, creating great visualisations in an interactive 3D planetarium, modelling bacterial factories and engineering new scaffolds for future vaccines.

cwoods

How long did it take you to write your Fellowship application (Any other thoughts/advice on the application process?)

I found out about the call when on holiday in Switzerland from a friend. That got me started thinking about a model for how RSEs could be added to a research team. Once I’d worked out the model, I found the proposal to be very easy to write. Indeed, it was the opportunity to write the proposal that I had always wanted to write – to put software development and good software engineering front and centre in the proposal itself. When I got back from from holiday I alerted HPC users at Bristol about what I was planning, and then met up with researchers from across the University in 10 minute quick flash talks set out my proposal for RSE projects. People contributed projects quickly, and I was soon oversubscribed. Then, writing the proposal was just about putting it all down on paper.

The strangest part was that I had consciously left an academic role when I moved into BrisSynBio, and had accepted that I was never going to become “an academic”. The hardest part was talking with my wife and persuading her that I should go back into that world.

Where do you want to be in 5 years?

I want to be running a large and successful RSE group and contributing to the development of computational science/engineering as a complete discipline, i.e. being on the path to having departments with faculty, teaching of undergraduates in good software engineering best practice, researchers in software engineering, collaborating with scientists as parts of teams to develop the next generation of well-engineered code to support 21st century science. I also want to help inspire the next generation of potential RSEs and help (1) raise awareness that programming a computer can help you leave a seaside town and travel the world, and (2) maths, physics and programming are useful skills, that there is stable career pathway for scientific software developers and RSEs, that this is an exciting and dynamic career choice, it does let you work with intelligent and energetic people, and most importantly, it puts you in a position to shape how the technology of the future will be designed and developed.

Who are your project partners?

Cresset, a company that writes software for the pharmaceutical industry, and the Software Sustainability Institute. Also, all of the researchers who will be supported by the RSE projects.

Tell me about your RSE group.

We are now building the RSE group at Bristol. Currently it is me, a new junior RSE to be appointed, and some graduates on our new Graduate Accelerator Programme (GAP) who will be appointed later this year. We anticipate growing further over the next few years.

Which programming languages and technologies do you regularly use?

C++ is my favourite, especially the functional coding support in C++11/14, closely followed by Python. I find the combination of C++ and Python is extremely powerful. It allows easy writing of fast, performant parallel code in the C++ layer, yet retains flexibility in the scripting layer which treats C++ as a library of building blocks. All unit testing can be via Python scripts that stress these building blocks.

I teach a lot of python and strongly recommend it to newcomers, e.g. see http://chryswoods.com/main/courses.html

Are there any languages/technologies that you used to use a lot but have now moved away from? Why?

Fortran. I have a soft spot for F77, but it is very 20th century. It is missing modern containers, generics, templates, virtual functions, task based parallelism, easy wrapping with scripting languages, integration with unit testing suites, etc. etc. It is also missing easy handling of low-level memory, while also providing a high level memory interface.

Perl. I loved Perl. I teach Perl, but no-one comes any more. Python is better, and it is difficult to argue against. And then the Perl community turned in on itself in going from Perl 5 to Perl 6.

Is there anything on your ‘to-learn’ list?

Management. How to Teach. Any new programming paradigm. LLVM and stuff that bridges the gap between scripted and compiled languages. Anything else in the programming world that is cool.

And, MATLAB, R, etc., as I need to learn how to interface with the communities that use those tools.

Humility. We don’t always know what is best (even if we do think we are right).

Do you have any advice for anyone who wants to become a Research Software Engineer?

My advice applies to anyone who wants to work in a university as an academic. Never forget that each grant and each award is a gift from the public. You are not given this gift so that you are employed there forever. It is given so that, in some way, you can make a difference to society. Be in research because you want to make a difference. The counter to this, is that there is a life outside academia, and it is not a failure to move on to other roles. Sometimes, like my BrisSynBio position, they can make you stronger

For research software engineering, I would say learn to communicate with people. Being able to talk with people is just as important as being able to talk with the computer. Also, learn the paradigms of programming (structural, object, functional etc.), as once you get these, different computer languages are just different syntax. Finally, learn some maths and science. They may be harder to learn, but they are fundamental, and without understanding these, it is very hard to really appreciate the complexities of research code, or to see the potential optimisations or approximations that may be available.