October 7th, 2015 | Categories: Apple, Guest posts | Tags:

This is a guest article written by friend, ex-colleague and keen user of Apple and Dropbox products, Ian Cottam of Manchester University’s IT Services.

Twice recently I have been bitten by being an early adopter of new software releases. Of course, I partly do this so colleagues at The University of Manchester don’t have to. Further, I have three Apple Macs and only update my least used one initially.

The two updates that bit me are: Mac OS X 10.11.0 El Capitan and the new Teams feature of Dropbox. My advice is not to use either of these updates (yet), unless you really need to. Interestingly, I persevered and have stayed with the Teams feature
of Dropbox; but have uninstalled OS X El Capitan by reverting to 10.10 Yosemite from a Time Machine backup.

Let’s do OS X El Capitan first.

This operating system update experience is the worst I can remember, and I have a long memory. What the legions of beta testers were doing I cannot imagine. To be fair, I expect many of them reported issues to Apple and just assumed Apple would not release to the public at large before they were fixed. I know I thought Apple would not release an OS that would kernel panic for many users on boot up. How wrong we all were.

A major, low-level change that Apple made for El Capitan was in the area of security; that is: kernel extensions, as used by some third party applications, have to be digitally signed to be acceptable. So far, so sensible. Some examples that I use include: VirtualBox, VMware, ncryptedCloud, github.osxfuse and Avatron. To see if you use any third party ones too, you can type the following into Terminal:

kextstat | grep -v com.apple

I expect over time all of the above will be updated to be digitally signed. However, at the time of writing some of them are not. Now quite why El Capitan does not log such unsigned extensions and ignore them on boot up I do not know. Instead you get a kernel panic and your Mac has become a brick. Well, not quite a brick as you can boot into safe mode (where extensions are not loaded) and try and fix things. But which ones are causing the problem? Not to mention that I would like to continue working with most, if not all, of them.

Googling shows that VirtualBox before version 5 and possibly nCrypted Cloud can ’cause’ the kernel panic; and I have several colleagues who persevered with updating to El Capitan by removing VirtualBox 4. I went back to Yosemite. (The fact that my Time Machine backup wasn’t as up to date as I would have liked is purely my fault. My other two Macs back up to Time Machine drives automatically, but my new Macbook waits for an external USB drive to be plugged in. I did try a restore from one of the other Time Machine drives that was 100% up to-date, but sadly the results were poor – e.g. the display driver – and I re-started the process from the specific but slightly out of date back-up.)

If you think that you can get around or live with this issue, I would further caution you to Google for Microsoft Office Problems El Capitan, to be further shocked. Ditto: Microsoft Outlook 2011 Problems El Capitan, if you are, like me, an Outlook/Exchange user.

I should repeat that some of my colleagues have updated to El Capitan and have not hit my problems or have worked around them.

Now on to Dropbox and its new Teams feature.

Teams is Dropbox’s way of bringing some of the advantages of Dropbox for Business to Dropbox Basic (the free version with quite limited storage) and Dropbox Pro (paid for, giving either 1TB or 2TB storage limits). Dropbox’s reason for doing this is so more people will update to Dropbox for Business (unlimited storage and greater admin control, etc.).
My first reason for being wary of Teams is that any member of a Team – not just the lead – can press the Update to Dropbox for Business button at anytime, committing and converting all in the Team. Now no doubt you can back out after a short trial period, but I expect it is a messy business (no pun intended).

What does the Teams feature offer? The first is to create a team or group list, such that when sharing appropriate folders you don’t need to list everyone in the team every time. Similarly, when a new team member starts, and once their email address is added to the group list, they will get copies of all all the existing shared folders. Good feature. I created a Team with just me in it for initial testing.

The other feature of Teams – the one I was most interested in – is the ability to have separate Work and Personal dropboxes under a single account on your Mac, PC or Linux box. If you are coming to Dropbox fresh, I would say go ahead and set up this feature, as it’s extremely handy to keep work and personal stuff completely separate. However, I had a mixture of work and private folders totalling some 120GB: that’s quite a bit to separate out. Now you face the decision as to whether the dropbox you currently have becomes your business one or your personal one. As I pay for Dropbox Pro myself I thought this fairly arbitrary and chose to keep my existing account as the business one. Your mileage may vary but I discovered I had more personal stuff in my dropbox than business, so the other way around might have saved me quite some time.

Your actual folder is renamed from “Dropbox” to “Dropbox (Your Choice of Business Name)”. As many folk have moaned about that, they also create a link to it with the old name of “Dropbox”. The next step is to set up, in my case, a new personal account (using a different email address) and link it to my main one. This is fairly straightforward. It’s also well done how easily you switch between your business and personal boxes. If like me, you have a 1TB account, that amount is shared between the two, although it is not obvious at first and you may get the odd message about the small size of your new dropbox, which you can safely ignore.

The thing that took a long time was copying all the personal stuff from what is now my business account to the personal one. You have to use the Dropbox desktop client for this. If a given sub-folder is not a share, you can try just dragging it over. I realised though that many of mine were shared. The only safe, if tedious, route I could find was to add my new personal identity to these shares; then transfer ownership – assuming I owned the share – and finally after the sync finished, remove my original identity from the share, unticking the box that says Keep a Copy. There might be an easier way: I hope so. I had a lot of shares to work through, and of course you are doing this on just one of the machines you own.

When done, I set up my second Mac, which was fairly straightforward and eventually everything synchronised to the right place. The big issue I had was with a third Mac that was further out of step. You guessed it: it was the one I had restored to Yosemite as mentioned in the first half of this blog. In such circumstances, Dropbox thinks you want to put lots of the folders back in to the original and now business identity. You can imagine how long that took and took me to unwind it back to how I had it with the two other Macs.

Perhaps I was both unlucky and made a bad choice or two, but be warned: for existing users with a complex dropbox setup this is rather painful to go through. I’m glad I did, but don’t find it easy to recommend to colleagues (yet).

Dropbox describe Teams here https://www.dropbox.com/help/9124.

Be careful out there, and don’t be an early adopter unless you are prepared for the pain.

October 5th, 2015 | Categories: RSE | Tags:

I was recently invited to give a talk at a Machine Learning in Personalised Medicine summer school in Manchester and decided to expand my blog post Is Your Research Software Correct? into a full 1 hour presentation.

I’m happy to say that it was extremely well received – both by many people at the event and by a few people on twitter.

Here are the slides

September 5th, 2015 | Categories: math software, Numerics, python | Tags:

The test suite of a project I’m working on is poking around at the extreme edges of the range of double precision numbers. I noticed a difference between Windows and other platforms that I can’t yet fully explain. On Windows, the test suite was pumping out RuntimeWarnings that we don’t see in Linux or Mac. I’ve distilled the issue down to a single numpy command:


On Windows 7 Anaconda Python 2.3, this gives a RuntimeWarning and returns inf whereas on Linux and Mac OS X it evaluates to 709.78-ish

Numpy version is 1.9.2 in all cases.

64 bit Windows 7

Python 2.7.10 |Continuum Analytics, Inc.| (default, May 28 2015, 16:44:52) [MSC
v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> import numpy as np
>>> np.log1p(1.7976931348622732e+308)
__main__:1: RuntimeWarning: overflow encountered in log1p

64 bit Linux

Python 2.7.9 (default, Apr  2 2015, 15:33:21) 
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.log1p(1.7976931348622732e+308)

Mac OS X

Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, May 28 2015, 17:04:42) 
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> import numpy as np
>>> np.log1p(1.7976931348622732e+308)

The argument to log1p is getting close to the largest double precision number:

>>> sys.float_info.max
August 21st, 2015 | Categories: walking randomly | Tags:

I’m a geek and love hanging out with other geeks. There are many different flavours of geek: gym geeks, food geeks, beer geeks, math geeks, science geeks, greyhound geeks…you get the idea! There’s a quote attributed to Simon Pegg that sums up geekdom perfectly for me:

“Being a geek is all about being honest about what you enjoy and not being afraid to demonstrate that affection. It means never having to play it cool about how much you like something. It’s basically a license to proudly emote on a somewhat childish level rather than behave like a supposed adult. Being a geek is extremely liberating.” Simon Pegg

Pure, unbridled enthusiasm is extremely infectious. Since I work at a University, I am extremely fortunate in that I often find myself on the receiving end of a combination of outrageous optimism and advanced-geek enthusiasm – a powerful combination that results in people changing the world.

Long ago, I discovered that you could learn a lot about the world simply by sharing a coffee or beer with a geek or two and finding out what they think is cool. Keep a note of any google-able phrases they come up with and follow up later. Much more fun, and probably more efficient, than sitting through dozens of conference talks.

Here are a few computer-based things I learned from such conversations this week:


July 30th, 2015 | Categories: Windows | Tags:

Due to the demands of my job and the fact that I like shiny new technology, I’m pretty much operating system agnostic these days. I find myself flitting between Windows, Linux, Mac OS X, Android and iOS on a regular basis and find them all delightful and head-smackingly frustrating in equal measure.

One of my geeky guilty pleasures is taking some time out to kick the tyres of a new operating system and so I’m having a lot of fun with Windows 10 right now. I tend not to play with preview builds so all of this is new to me.

The Windows command prompt hasn’t seen much love in decades and yet it’s so important to the work I do. In Windows 10, it’s received a much needed update. Right at the top of the list are improvements to copy and paste. In older versions of windows, this is my workflow I go to a new machine:

  • Using CTRL-V, try to copy and paste into the command line. It doesn’t work. You get ^V appear instead
  • Sigh and mutter to yourself. Right click on the top of the window. Choose properties. Enable Quick Edit mode.
  • Press CTRL-V again. ^V appears. Press it a few more times ^V^V^V^V
  • Remember that in cmd.exe, unlike all other applications in Windows, the way to paste is to right click. Mutter again. Get on with life.

In Windows 10, quick edit mode is enabled by default and CTRL-V just works. Happy days!

There’s a whole host of other improvements including word-wrap, transparencies, the ability to resize the window and more. I feel like the Windows command prompt has taken its first step into a larger world.

Microsoft have also set up a discussion forum for the future of the Command Prompt.



July 27th, 2015 | Categories: RSE, Scientific Software, Windows | Tags:

If you change it, It will break

“It only works on the Windows version of MATLAB 2010a. The code doesn’t work on Linux or other versions of MATLAB.” explained the researcher. He needed to run his program hundreds of times and his solution was to lock himself into a computer room over the weekend, log into the two dozen managed desktop machines there and manually start his code running on each one. Along with colleagues from the University of Manchester, I was trying to understand why he couldn’t use the Linux-based 3000+ core Condor pool we’d built since it was perfectly suited to his workflow.

Now, when he said ‘doesn’t work‘ what he meant was ‘gives different results’ and the only ones he liked were the ones that came from MATLAB 2010a on Windows. Naturally, I offered to take a look at his code with a view to figuring out what was going on but he simply wasn’t interested. Once he determined that we couldn’t (or more accurately, wouldn’t) stop his current workflow he thanked us for our interest and left.

Different strokes results for different folks

This wasn’t the first time I had discovered research code that gave different results when run on different operating systems or runtimes and it probably won’t be the last. A relatively high profile case that caught my eye recently was a publication that demonstrated that the results of a program called FreeSurfer varied according to operating system, workstation type and software version.

I suspect that the phenomenon might be more prevalent than we think because I suspect (but confess to having no evidence) that a large number of computational research results come from research code that’s only ever been run on one operating system, with one set of dependencies on one particular piece of hardware.

Experience has shown me that some researchers deal with this lack of robustness by keeping their working environment as constant as possible. Don’t. Touch. Anything!

The shifting sands of Windows 10

A huge percentage of researchers conduct their research using Windows and the Windows environment is about to change in a rather fundamental way: Updates will be automatic and mandatory.

Since the operating system will be constantly shifting under their feet, researchers are no longer going to be able to keep that aspect of their environment stable.

I wonder if or how this will change things in the world of research software. Perhaps it will go by unnoticed, perhaps more test-suites will be written or perhaps something else?

Any thoughts?

July 23rd, 2015 | Categories: RSE, Scientific Software | Tags:

You’ve written a computer program in your favourite language as part of your research and are getting some great-looking results. The results could change everything! Perhaps they’ll influence world-economics, increase understanding of multidrug resistanceimprove health and well-being for the population of entire countries or help with the analysis of brain MRI scans.

Thanks to you and your research, the world will be a better place. Life is wonderful; this is why you went into research.

It’s just a shame that you’re completely wrong but don’t yet know it.

What went wrong?

If you click on any of the studies linked to above, you’ll find a common theme – problems with software. These days it’s close to impossible to do science without either using or developing specialist software. Using research software can be difficult, complex and extremely time consuming. Developing it is orders of magnitude more difficult.

What can be done?

When I’m writing code, my first and main assumption is always ‘I can be an idiot and will make mistakes.’ Some people I’ve worked with assume that I’m either being self-deprecating or have a self-confidence problem when I talk like this. The reality is that it’s simply true. I’m fallible: my knowledge of everything is incomplete and if I haven’t had at least two cups of coffee in the morning, I’m essentially good for nothing.

Rather than lament my weaknesses, I try to develop methods of working that mitigate my inevitable stupidity. These methods are actually very simple.

  • Write tests. Every programming language worth its salt provides testing frameworks (e.g. Python, MATLAB, R). Learn how to use them and use them whenever you can. Whenever you make a change to your code or install it somewhere new, run your tests to see if anything has broken.
  • Get a code buddy. Find yourself another programmer and hand them your code with the remit ‘Tell me where you think I could do better’. This will be a painful experience. Suck it up because your code will almost certainly be better as a result. There is only one true measure of code quality!
  • Use version control. It doesn’t matter if its git, SVN, Mercurial or whatever the particular flavour of the month is. Choose a system, learn it and use it (for the record, I use git and have a twitter account called @git_tricks that posts tips on how to use it). When you use your code to get results, refer back to the actual commit that you used to get those results. This greatly assists the reproducibility of your research. If you cannot reproduce your own results with your own code and data, neither can anyone else.
  • Share code as openly as possible. Ideally, ‘openly’ should mean on the public internet. GitHub, blog posts, personal websites etc. Whenever I’ve posted code here on WalkingRandomly, mistakes usually get caught very quickly. Geeks love telling other geeks that they’ve made a mistake. Sure, your pride takes a hit but you quickly become immune to such things. The code is better, you learn something useful and the geeks that point out your errors feel good about themselves. Everyone’s a winner.

Sadly, the great majority of scientists I work with really don’t want to share their code openly for numerous reasons and so much of the stuff I’ve worked on is in the dark. Sometimes, collaborators don’t even want to share code with me.I’m about to start work on one optimisation case where the researcher tells me that they are not allowed to email me their code. So, he’s bringing his laptop to me and will sit next to me for a few hours while I try to figure out if I can help or not. Such is the lot of a working research software engineer.

Along with organisations such as the Sheffield Open Data Science Initiative and The Software Sustainability Institute, I am trying to improve this state of affairs but have to admit that progress is slower than I’d like.

These steps won’t guarantee that your code is correct but they are great steps in the right direction. For more in-depth advice, I refer you to Greg Wilson’s paper Best Practices for Scientific Computing.

July 13th, 2015 | Categories: Maple, math software, programming, tutorials | Tags:

It is possible to write quick, interactive demonstrations in a variety of languages these days. Functions such as Mathematica’s Manipulate, Sage Math’s interact and IPython’s interact allow programmers to write functional graphical user interfaces with just a few lines of code.

Earlier this week, I hosted a session in the Faculty of Engineering at The University of Sheffield where Maplesoft showed us, among other things, their version of this technology. This blog post is an extension of my notes from this part of the session.

The series command expands a function as a power series around a point. For example, let’s expand sin(x) as a power series around the point x=0.

series(sin(x), x = 0, 10)

Screen Shot 2015-07-01 at 14.05.52
If we try to plot this, we get an error message

plot(series(sin(x), x = 0, 10), x = -2*Pi .. 2*Pi, y = -3 .. 3)

Warning, unable to evaluate the function to numeric values in the region; see the plotting command's help page to ensure the calling sequence is correct

This is because the output of the series command is a series data structure — something that the plot function cannot handle. We can, however, convert this to a polynomial which is something that the plot function can handle

convert(series(sin(x), x = 0, 10), polynom)

Wrapping the above with plot gives:

plot(convert(series(sin(x), x = 0, 10), polynom), x = -2*Pi .. 2*Pi, y = -3 .. 3);

Screen Shot 2015-07-08 at 08.33.03
Let’s see how close this is to the sin(x) curve by plotting them both together

plot([sin(x), convert(series(sin(x), x = 0, 10), polynom)], x = -2*Pi .. 2*Pi, y = -3 .. 3);

Screen Shot 2015-07-08 at 08.37.56
It would be nice if we could see how the approximation varies as we vary the number of terms in the expansion. Change the value 10 to a parameter a, pass the whole thing to the Explore function and we get an interactive widget.

Explore(plot([sin(x), convert(series(sin(x), x = 0, a), polynom)], x = -2*Pi .. 2*Pi, y = -3 .. 3), parameters = [a = 2 .. 20]);

Here’s a screenshot of it:

Screen Shot 2015-07-08 at 08.42.49
Adding extra parameters
It would also be nice to vary the point we expand around. Change the value 0 to b and add an extra parameter to Explore to get two sliders instead of one:

Explore(plot([sin(x), convert(series(sin(x), x = b, a), polynom)], x = -2*Pi .. 2*Pi, y = -3 .. 3), parameters = [a = 2 .. 20, b = -2*Pi .. 2*Pi]);

To see what this looks like, open the companion worksheet in Maple.

Adding labels to the sliders
We can change the labels on the sliders as follows

Explore(plot([sin(x), convert(series(sin(x), x = b, a), polynom)], x = -2*Pi .. 2*Pi, y = -3 .. 3), parameters = [[a = 2 .. 20, label = `Number Of Terms`], [b = -2*Pi .. 2*Pi, label = `Expansion location`]]);

To see what this looks like, open the companion worksheet in Maple.

Adding initial values
Finally, let’s set some starting values for each slider

Explore(plot([sin(x), convert(series(sin(x), x = b, a), polynom)], x = -2*Pi .. 2*Pi, y = -3 .. 3), parameters = [[a = 2 .. 20, label = `Number Of Terms`], [b = -2*Pi .. 2*Pi, label = `Expansion location`]], initialvalues = [a = 2, b = 1]);

The resulting interactive widget looks like this:

Screen Shot 2015-07-08 at 08.53.35


Not bad for one line of code!

Upload to the Maple Cloud

At The University of Sheffield, we are lucky because all of our staff and students have access to Maple on both university-owned and personally-owned equipment. If your audience isn’t as fortunate, they can access the resulting worksheet on the Maple Cloud.

July 2nd, 2015 | Categories: RSE, walking randomly | Tags:

With apologies to Ridley Scott and Rutger Hauer:

I’ve…seen things you people wouldn’t believe.

Compute kernels on fire while looking over the shoulder of Brian. I watched C-code glitter in the dark as it flowed from the automatic generator with no test-rig in sight. There was no version control, no back-up so all those…results…will be lost in time, like tears in rain.

Time to debug.

June 26th, 2015 | Categories: HPC | Tags:

I hang out with a lot of people who work in the field of High Performance Computing (HPC) and learned long ago that a great conversation starter when in a room full of HPC experts is to ask the question ‘So…what IS High Performance Computing’ and then offer some opinion of my own. This apparently simple question can lead to some very heated debates!

I’m feeling in a curious mood and am wondering if this would work as a conversation starter for a blog post. So, here goes….

‘What IS High Performance Computing? I think it’s anything that requires more computational resources (memory, disk, CPU/GPU) than is available on a typical half-decent laptop or desktop’

What’s your view? Comments are open!