April 28th, 2015 | Categories: Free software, python, R, Scientific Software, software deployment, Windows | Tags:

I recently found myself in need of a portable install of the Jupyter notebook which made use of a portable install of R as the compute kernel. When you work in institutions that have locked-down managed Windows desktops, such portable installs can be a life-saver! This is particularly true when you are working with rapidly developing projects such as Jupyter and IRKernel.

It’s not perfect but it works for the fairly modest requirements I had for it. Here are the steps I took to get it working.

Download and install Portable Python

I downloaded Portable Python 2.7.6.1 from http://portablepython.com/ and installed it into a directory called Portable Python 2.7.6.1.

Update IPython and install the extra modules we need

This version of Portable Python comes with a portable IPython instance but it is too old to support alternative kernels. As such, we need to install a newer version.

Open a cmd.exe command prompt and navigate to Portable Python 2.7.6.1\App\Scripts.

Enter the command

easy_install ipython

You’ll now find that you can launch the ipython.exe terminal from within this directory:

C:\Users\walkingrandomly\Desktop\Portable Python 2.7.6.1\App\Scripts>ipython
Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)]
Type "copyright", "credits" or "license" for more information.

IPython 3.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: exit()

If you try to launch the notebook, however, you’ll get error messages. This is because we haven’t taken care of all the dependencies. Let’s do that now. Ensuring you are still in the Portable Python 2.7.6.1\App\Scripts folder, execute the following commands.

easy_install pyzmq
easy_install jinja2
easy_install tornado
easy_install jsonschema
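Before launching, it can help to confirm that these four dependencies can actually be imported by the portable interpreter. The sketch below is a hypothetical helper, not part of Portable Python or IPython. It sticks to syntax that runs on both Python 2.7 and Python 3, and it assumes the import names zmq (for pyzmq), jinja2, tornado and jsonschema.

```python
# Hypothetical helper: report which notebook dependencies cannot be imported.
# One import name differs from its package name: pyzmq is imported as zmq.
NOTEBOOK_DEPS = ["zmq", "jinja2", "tornado", "jsonschema"]


def missing_modules(names):
    """Return the names from `names` that fail to import (works on Python 2.7 and 3)."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(NOTEBOOK_DEPS)
    if missing:
        print("Still missing: " + ", ".join(missing))
    else:
        print("All notebook dependencies found")
```

Saved under a name such as check_deps.py (hypothetical) in the Scripts folder, it could be run with ..\python.exe check_deps.py so that the portable interpreter, not a system one, does the checking.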

You should now be able to launch the notebook using

ipython notebook

Install portable R and IRKernel

  • I downloaded Portable R 3.2 from http://sourceforge.net/projects/rportable/files/ and installed it into a directory called R-Portable
  • Move this directory into the Portable Python directory. It needs to go inside Portable Python 2.7.6.1\App (see this discussion to learn how I discovered that this location was the correct one)
  • Launch the Portable R executable which should be at Portable Python 2.7.6.1\App\R-Portable\R-portable.exe and install the IRKernel packages by doing
install.packages(c("rzmq","repr","IRkernel","IRdisplay"), repos="http://irkernel.github.io/")

Install additional R packages

The version of Portable R I used didn’t include several packages that IRkernel needs. Here’s how I fixed that.

  • Launch the Portable R executable which should be at Portable Python 2.7.6.1\App\R-Portable\R-portable.exe and install the following packages:
    install.packages('digest')
    install.packages('uuid')
    install.packages('base64enc')
    install.packages('evaluate')
    install.packages('jsonlite')

Install the R kernel file
Create the directory structure Portable Python 2.7.6.1\App\share\jupyter\kernels\R_kernel.

Create a file called kernel.json that contains the following:

{"argv": ["R-Portable/App/R-Portable/bin/i386/R.exe","-e","IRkernel::main()",
"--args","{connection_file}"],
 "display_name":"Portable R"
}

This file needs to go in the R_kernel directory created earlier. Note that the kernel location specified in kernel.json uses Linux-style forward slashes in the path rather than the backslashes that Windows users are used to. I found that this was necessary for the kernel to work; the notebook ignored it otherwise.
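If you would rather not hand-edit JSON, the same kernel.json can be produced from Python. This is a minimal sketch with hypothetical helper names (make_kernel_spec, write_kernel_spec). It mirrors the argv and display_name shown above and converts any backslashes to forward slashes, since that is what worked in my setup. It runs on both Python 2.7 and Python 3.

```python
import json
import os


def make_kernel_spec(r_executable, display_name="Portable R"):
    """Build a dict matching the kernel.json above.

    Backslashes are converted to forward slashes because the notebook
    ignored the kernel when Windows-style paths were used.
    """
    r_path = r_executable.replace("\\", "/")
    return {
        "argv": [r_path, "-e", "IRkernel::main()", "--args", "{connection_file}"],
        "display_name": display_name,
    }


def write_kernel_spec(kernel_dir, spec):
    """Write spec to <kernel_dir>/kernel.json, creating the directory if needed."""
    if not os.path.isdir(kernel_dir):
        os.makedirs(kernel_dir)
    path = os.path.join(kernel_dir, "kernel.json")
    with open(path, "w") as handle:
        json.dump(spec, handle, indent=1)
    return path
```

For example, write_kernel_spec(r"Portable Python 2.7.6.1\App\share\jupyter\kernels\R_kernel", make_kernel_spec(r"R-Portable\App\R-Portable\bin\i386\R.exe")) would write the file into the directory created earlier.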

Finishing off

Everything created so far, including R, is in the folder Portable Python 2.7.6.1.

I created a folder called PortableJupyter and put the Portable Python 2.7.6.1 folder inside it. I also created the folder PortableJupyter\notebooks to allow me to carry my notebooks around with the software that runs them.

There is a bug in Portable Python 2.7.6.1 relating to scripts like ipython.exe that have been installed using easy_install. In short, they stop working if you move the directory they’re installed in, breaking portability somewhat! (Details here)

The workaround is to launch IPython by running the script Portable Python 2.7.6.1\App\Scripts\ipython-script.py.

I didn’t want to bother with that, so I created a shortcut in my PortableJupyter folder called Launch notebook. The target of this shortcut was the following line:

%windir%\system32\cmd.exe /c "cd notebooks && "%CD%/Portable Python 2.7.6.1/App\python.exe" "%CD%/Portable Python 2.7.6.1\App\Scripts\ipython-script.py" notebook"

This starts the notebook using the default web browser and puts you in the notebooks directory.
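To make explicit what the shortcut does: assuming its Start in folder is PortableJupyter, %CD% expands to that folder before anything runs, both paths are built from it, and the notebook server is then started from the notebooks subfolder. The sketch below restates that logic in Python. build_launch_command is a hypothetical name, and the code assumes the folder layout described above. It uses ntpath so that Windows-style paths are built identically on any OS.

```python
import ntpath

# Folder name as used in this post; adjust if your layout differs.
PYTHON_DIR = "Portable Python 2.7.6.1"


def build_launch_command(portable_root):
    """Return (argv, cwd) equivalent to the 'Launch notebook' shortcut.

    portable_root plays the role of %CD%, i.e. the PortableJupyter folder.
    """
    app_dir = ntpath.join(portable_root, PYTHON_DIR, "App")
    python_exe = ntpath.join(app_dir, "python.exe")
    # Run the -script.py file directly, sidestepping the broken ipython.exe wrapper.
    ipython_script = ntpath.join(app_dir, "Scripts", "ipython-script.py")
    notebooks_dir = ntpath.join(portable_root, "notebooks")
    return [python_exe, ipython_script, "notebook"], notebooks_dir
```

On Windows, passing the result to subprocess.Popen(argv, cwd=cwd) would start the notebook in much the same way as the shortcut.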

The pay off

My folder looks like this:

PortableJupyter_folder

If I click on the Launch notebook shortcut, I get a Jupyter session with two kernel options:

PortableJupyter_kernels

I can choose the Portable R kernel and start using R in the notebook!

PortableJupyter_screenshot

Back in December 2014, I learned that I’d be moving from The University of Manchester to The University of Sheffield to do the type of work I’ve always done: a combination of research software engineering and research software support.

I’ve been in Sheffield for two months now and am having a blast! There’s so much cool stuff going on here that it makes my head spin a little and the community at Sheffield have welcomed me with open arms. It truly is a wonderful place in which to work.

One of the departments I’ve started working with is the Sheffield Institute for Translational Neuroscience (SITraN). My contributions have been relatively minor so far: a bit of Python coding for a machine learning project called GPy and some code speed-up work in R for Winston Hide and his collaborators. When I hang out in SITraN, I usually sit with the machine learning people and listen in on their conversations about Python, MATLAB, GPUs, C++ and R; it’s essentially Nerdvana for someone like me.

On to the point of this blog post. SITraN now have their own blog, called SITraNsmissions, where they’ll be discussing various aspects of their work and how it applies the principles of neuroscience to help treat diseases such as Motor Neurone Disease (MND). In the video below, taken from SITraNsmissions’ first blog post, Professor Pamela Shaw gives an overview of the work that SITraN does.

April 8th, 2015 | Categories: Apple, GPU, matlab | Tags:

I recently got a 15-inch Retina MacBook Pro which contains an NVIDIA GeForce GT 750M GPU. It’s been a while since I last had a laptop with a decent GPU in it, so I wondered how it would perform in MATLAB using the Parallel Computing Toolbox.

Of course I didn’t read any documentation; I simply fired up MATLAB R2015a and issued the gpuDevice command.

>> gpuDevice
Error using gpuDevice (line 26)
There is a problem with the CUDA driver or with this GPU device. Be sure
that you have a supported GPU and that the latest driver is installed.

Caused by:
    The CUDA driver could not be loaded. The library name used was
    '/usr/local/cuda/lib/libcuda.dylib'. The error was:
    dlopen(/usr/local/cuda/lib/libcuda.dylib, 10): image not found

This was because I hadn’t installed the CUDA driver (and a load of other CUDA-related stuff)! Following these instructions did the trick!

>> gpuDevice()

ans = 

  CUDADevice with properties:

                      Name: 'GeForce GT 750M'
                     Index: 1
         ComputeCapability: '3.0'
            SupportsDouble: 1
             DriverVersion: 6.5000
            ToolkitVersion: 6.5000
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 2.1470e+09
           AvailableMemory: 444055552
       MultiprocessorCount: 2
              ClockRateKHz: 925500
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1

I headed over to the MATLAB File Exchange to get the GPU Bench App for MATLAB and fired it up. The summary of the results is below. Click on the image to see the detailed results.

NVIDIA 750M Performance

 

The double precision performance of this GPU is very poor: MUCH slower than the CPU in the MacBook Pro.
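The gpuDevice output above is enough to estimate why. Kepler GPUs of compute capability 3.0, such as the GT 750M, execute 192 single precision but only 8 double precision fused multiply-adds per multiprocessor per clock (per the throughput table in NVIDIA's CUDA C Programming Guide). Counting each FMA as two floating-point operations and taking MultiprocessorCount and ClockRateKHz from above, the theoretical peaks come out at roughly 711 GFLOPS single and 30 GFLOPS double precision, a 24:1 ratio. These are paper figures only; measured throughput, and any boost clock, will differ.

```python
def peak_gflops(sm_count, fma_units_per_sm, clock_khz):
    """Theoretical peak in GFLOPS, counting one fused multiply-add as 2 FLOPs."""
    return sm_count * fma_units_per_sm * 2 * clock_khz * 1e3 / 1e9


SM_COUNT = 2          # MultiprocessorCount reported by gpuDevice
CLOCK_KHZ = 925500    # ClockRateKHz reported by gpuDevice
FP32_PER_SM = 192     # assumed: compute capability 3.0 single precision FMA rate
FP64_PER_SM = 8       # assumed: compute capability 3.0 double precision FMA rate

single = peak_gflops(SM_COUNT, FP32_PER_SM, CLOCK_KHZ)
double = peak_gflops(SM_COUNT, FP64_PER_SM, CLOCK_KHZ)
print("single precision: %.1f GFLOPS" % single)  # single precision: 710.8 GFLOPS
print("double precision: %.1f GFLOPS" % double)  # double precision: 29.6 GFLOPS
```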

Looking on the bright side, the numbers for the CPU are pretty good for a laptop!

March 4th, 2015 | Categories: programming, python | Tags:

If you really want to learn the differences between Python 2 and Python 3, I suggest you try converting a non-trivial software project. I’m in the middle of doing one now and am learning all kinds of little gotchas over and above the standard stuff that everyone knows such as changes to print, integer division and removal of xrange.

The most recent one I learned about (about 10 minutes ago) amounted to this:

Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> [x for x in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> x
9

compared to

Python 3.4.0 (default, Apr 11 2014, 13:05:11) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> [x for x in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

This is well documented (This StackOverflow Q+A is great!) but I didn’t know about it and, in the code I was looking at, there was a heck of a lot of complication between the list comprehension and when ‘x’ was used. As such, it took me a while to figure out!
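The same behaviour can be demonstrated outside the interactive prompt. In this minimal sketch (the function names are made up for illustration), a list comprehension's loop variable does not show up in the enclosing function's locals under Python 3, whereas an ordinary for loop still binds its variable in both Python 2 and Python 3.

```python
def comprehension_leaks():
    """True if a list comprehension's loop variable escapes into this function's scope."""
    squares = [x * x for x in range(10)]
    return "x" in locals()


def for_loop_leaks():
    """A plain for loop binds its variable in the enclosing scope in Python 2 and 3."""
    for x in range(10):
        pass
    return "x" in locals()


print(comprehension_leaks())  # Python 3: False (Python 2: True)
print(for_loop_leaks())       # True in both versions
```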

Another change that had me scratching my head for a while is the fact that Python 3 ignores the __metaclass__ hook. I didn’t know this little fact but discovered it while debugging failing tests!
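Here is a minimal Python 3 sketch of that gotcha, using a made-up Registry metaclass that records every class it builds. The Python 2 style __metaclass__ attribute is silently treated as an ordinary class attribute, so only the class that uses the metaclass= keyword is actually created by Registry. Code that has to run on both versions typically needs a compatibility helper, six.with_metaclass being one common option.

```python
class Registry(type):
    """Hypothetical metaclass that records the name of every class it creates."""
    created = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.created.append(name)
        return cls


class OldStyle(object):
    # Python 2 hook: in Python 3 this is just an ordinary class attribute.
    __metaclass__ = Registry


class NewStyle(metaclass=Registry):
    # Python 3 syntax: the metaclass is passed as a keyword in the class header.
    pass


print(type(OldStyle).__name__)  # type
print(type(NewStyle).__name__)  # Registry
print(Registry.created)         # ['NewStyle']
```

A failing test is a typical symptom: whatever the metaclass was supposed to do (registration, validation, adding methods) simply never happens for classes still using the old attribute.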

Of course, once you know these little gotchas, you probably won’t be caught out by them again in your next Python 2->Python 3 porting project, but they got me wondering…

What changes from Python 2->Python 3 have really caught you out at some point?

February 26th, 2015 | Categories: HPC, Linux | Tags:

Environment modules are widely used in the High Performance Computing (HPC) world, where sysadmins need to install dozens, or maybe hundreds, of potentially conflicting applications, libraries and compilers on multi-user machines. The University of Manchester’s Computational Shared Facility (CSF), for example, makes extensive use of environment modules and would be extremely difficult to run without them.

Once the sysadmin has correctly installed an application (MATLAB R2014a, say) and set up the corresponding module file, making it available to your shell is as easy as doing

module load apps/binapps/matlab/R2014a

Unloading the module is just as easy

module unload apps/binapps/matlab/R2014a

On a heavily used, multi-user system environment modules are invaluable! Every user can have whatever compilers, libraries and applications they like — they just load and unload whatever they need from the huge selection supported by their ever-friendly sysadmins.

Environment modules on Ubuntu

I needed to install environment modules on a VM running Ubuntu 14.04 for my own use. I found a very nice setup guide at http://www.setuptips.com/unix/setup-environment-modules-on-ubuntu/ but it didn’t work. On attempting to compile, I got the error message

cmdModule.c:644:15: error: 'Tcl_Interp' has no member named 'errorLine'

This is a known bug in version 3.2.9c of environment modules and has a work-around.

I also found a setup guide at http://nickgeoghegan.net/linux/installing-environment-modules which had some useful advice on configuration.

Combining information from these sources, I managed to get a working install. Here are the full steps I followed on a clean Ubuntu 14.04 image:

#Install the tcl development package
sudo apt-get install tcl-dev

#Make the directories where my modules and packages are going to live
sudo mkdir /opt/modules
sudo mkdir /opt/packages

#Get the source code. This was the most up to date version as of 25th Feb 2015
wget http://downloads.sourceforge.net/project/modules/Modules/modules-3.2.9/modules-3.2.9c.tar.gz

#unpack and enter source directory
tar xvzf modules-3.2.9c.tar.gz
cd modules-3.2.9

#Configure using the workaround and selecting my module folder 
CPPFLAGS="-DUSE_INTERP_ERRORLINE" ./configure --with-module-path=/opt/modules/

#make and install
make
sudo make install

#Edit the modulefiles path. Comment out all lines starting /usr so that only /opt/modules is used
sudo sed -i 's~^/usr~#/usr~' /usr/local/Modules/3.2.9/init/.modulespath
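For anyone unfamiliar with sed, the expression s~^/usr~#/usr~ puts a # in front of every line that begins with /usr, which turns that line into a comment. Here is the equivalent transformation as a small Python function. The sample .modulespath content is hypothetical, made up for illustration.

```python
def comment_out_usr_lines(text):
    """Mimic sed 's~^/usr~#/usr~': prefix '#' to every line that starts with /usr."""
    lines = text.splitlines(True)  # keep the original line endings
    return "".join("#" + line if line.startswith("/usr") else line for line in lines)


# Hypothetical .modulespath contents, for illustration only.
sample = "/usr/local/Modules/versions\n/usr/local/Modules/$MODULE_VERSION/modulefiles\n/opt/modules/\n"
print(comment_out_usr_lines(sample))
```

As with the sed version, only lines whose very first characters are /usr are touched; /opt/modules/ is left active.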

#Configure the shell to use modules
sudo tee /etc/profile.d/modules.sh > /dev/null << 'EOF'
#----------------------------------------------------------------------#
# system-wide profile.modules #
# Initialize modules for all sh-derivative shells #
#----------------------------------------------------------------------#
trap "" 1 2 3

MODULES=/usr/local/Modules/3.2.9

case "$0" in
-bash|bash|*/bash) . $MODULES/init/bash ;;
-ksh|ksh|*/ksh) . $MODULES/init/ksh ;;
-sh|sh|*/sh) . $MODULES/init/sh ;;
*) . $MODULES/init/sh ;; # default for scripts
esac

trap - 1 2 3
EOF
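The case statement in that profile script picks an init script by matching $0, the name the shell was started as (login shells report it with a leading dash, e.g. -bash). This Python sketch mirrors that dispatch using fnmatch patterns checked in order, with the first match winning just as in case. init_script_for is a hypothetical name; MODULES matches the path used above.

```python
from fnmatch import fnmatchcase

MODULES = "/usr/local/Modules/3.2.9"

# (patterns, init script) pairs in the same order as the case statement.
CASES = [
    (("-bash", "bash", "*/bash"), "bash"),
    (("-ksh", "ksh", "*/ksh"), "ksh"),
    (("-sh", "sh", "*/sh"), "sh"),
]


def init_script_for(argv0):
    """Return the init script the profile would source for a shell named argv0."""
    for patterns, shell in CASES:
        if any(fnmatchcase(argv0, pattern) for pattern in patterns):
            return MODULES + "/init/" + shell
    return MODULES + "/init/sh"  # default for scripts


print(init_script_for("-bash"))  # /usr/local/Modules/3.2.9/init/bash
```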

#Add modules to your .bashrc file
echo '#For modules' >> ~/.bashrc
echo '. /etc/profile.d/modules.sh' >> ~/.bashrc

That takes care of the basic setup but modules is pretty useless at this stage. To make it useful, you need to install some extra software and the corresponding module file.

Installing a module file for Anaconda Python 2.1
This is a really simple example of how to set up a basic module file.

I downloaded and installed Anaconda Python 2.1 to /opt/packages and created a file called anaconda2.1 in /opt/modules containing the following

#%Module1.0
proc ModulesHelp { } {
global dotversion
 
puts stderr "\tAnaconda Python 2.1 providing Python 2.7.8"
}
 
module-whatis "Anaconda Python 2.1"
prepend-path PATH /opt/packages/anaconda/bin

Now, when I do the command

module avail

I get

-------------------------- /opt/modules/ ---------------------------
anaconda2.1
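module avail finds modulefiles by looking through the directories on the module path. A file only counts as a modulefile if its first line starts with the #%Module magic cookie, which is the #%Module1.0 at the top of anaconda2.1. A rough Python approximation of that listing, with hypothetical function names, might look like this. The real command also deals with versions, default markers and output formatting, all of which are ignored here.

```python
import os

MAGIC_COOKIE = "#%Module"


def looks_like_modulefile(first_line):
    """True if a file's first line carries the #%Module magic cookie."""
    return first_line.startswith(MAGIC_COOKIE)


def available_modules(module_dir):
    """Return modulefile names under module_dir (relative paths), sorted."""
    found = []
    for root, dirs, files in os.walk(module_dir):
        for name in files:
            path = os.path.join(root, name)
            with open(path, errors="replace") as handle:
                first_line = handle.readline()
            if looks_like_modulefile(first_line):
                rel = os.path.relpath(path, module_dir)
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)


if __name__ == "__main__":
    print("\n".join(available_modules("/opt/modules")))
```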

I can load my anaconda2.1 module with the command

module load anaconda2.1

Now, when I type python at the command prompt, I’ll be using Anaconda’s python rather than the system python. Once I’m done, I can unload with

module unload anaconda2.1
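The prepend-path line in the modulefile is what makes this work. Loading puts /opt/packages/anaconda/bin at the front of PATH, and because the shell searches PATH from left to right, Anaconda’s python is found first; unloading takes the entry out again. The following is a simplified Python model of that bookkeeping, with hypothetical function names. It ignores details such as how the real tool handles duplicate entries.

```python
import os

ANACONDA_BIN = "/opt/packages/anaconda/bin"


def prepend_path(path_value, directory):
    """Model of `prepend-path PATH directory`: put directory first."""
    return directory + os.pathsep + path_value if path_value else directory


def remove_path(path_value, directory):
    """Model of what unloading does to PATH: drop every occurrence of directory."""
    kept = [p for p in path_value.split(os.pathsep) if p != directory]
    return os.pathsep.join(kept)


if __name__ == "__main__":
    before = os.environ.get("PATH", "")
    loaded = prepend_path(before, ANACONDA_BIN)
    print(loaded.split(os.pathsep)[0])  # /opt/packages/anaconda/bin
    print(remove_path(loaded, ANACONDA_BIN) == before)
```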

This example is so trivial it’s almost not worth it — modules really come into their own when you need to support loads of compilers and corresponding libraries. There’s an example using gcc at http://nickgeoghegan.net/linux/installing-environment-modules.

February 19th, 2015 | Categories: math software, matlab, Scientific Software | Tags:

I’ve been working at The University of Manchester for almost a decade and will be leaving at the end of this week! A huge part of my job was to support a major subset of Manchester’s site licensed application software portfolio so naturally I’ve made use of a lot of it over the years. As of February 20th, I will no longer be entitled to use any of it!

This article is the second in a series where I’ll look at some of the software that’s become important to me and what my options are on leaving Manchester.  Here, I consider MATLAB – a technical computing environment that has come to dominate my career at Manchester. For the last 10 years, I’ve used MATLAB at least every week, if not most days.

I had a standalone license for MATLAB and several toolboxes – Simulink, Image Processing, Parallel Computing, Statistics and Optimization. Now, I’ve got nothing! Unfortunately for me, I’ve also got hundreds of scripts, mex files and a few Simulink models that I can no longer run! These are my options:

Go somewhere else that has a MATLAB site license

  • I’ll soon be joining the University of Sheffield who have a MATLAB site license. A great option if you can do it.

Use something else

  • Octave - Octave is a pretty good free and open source clone of MATLAB and quite a few of my programs would work without modification. Others would require some rewriting and, in some cases, that rewriting could be extensive! There is no Simulink support.
  • Scilab - It’s free and it’s MATLAB-like-ish but I’d have to rewrite my code most of the time. I could also port some of my Simulink models to Scilab as was done in this link.
  • Rewrite all my code to use something completely different. What I’d choose would depend on what I’m trying to achieve but options include Python, Julia and R among others.

Compile!

  • If all I needed was the ability to run a few MATLAB applications I’d written, I could compile them using the MATLAB Compiler and keep the result. The whole point of the MATLAB Compiler is to distribute MATLAB applications to those who don’t have a MATLAB license. Of course once I’ve lost access to MATLAB itself, debugging and adding features will be  um……tricky!

Get a hobbyist license for MATLAB

  • MATLAB Home - This is the full version of MATLAB for hobbyists. Writing a non-profit blog such as WalkingRandomly counts as a suitable ‘hobby’ activity so I could buy this license. MATLAB itself costs £85, with most of the toolboxes coming in at an extra £25 each. Not bad at all! The extra cost of the toolboxes would still lead me to obsess over how to do things without toolboxes but, to be honest, I think that’s an obsession I’d miss if it weren’t there! Buying MATLAB plus all of the same products I had before would cost me a total of £210 + VAT.
  • Find a MOOC that comes with free MATLAB - MathWorks make MATLAB available for free to students on some online courses such as the one linked to here. Bear in mind, however, that the license only lasts for the duration of the course.

Academic Use

If I were to stay in academia but go to an institution with no MATLAB license, I could buy myself an academic standalone license for MATLAB and the various toolboxes I’m interested in. The price lists are available at http://uk.mathworks.com/pricing-licensing/

For reference, current UK academic prices are

  • MATLAB £375 + VAT
  • Simulink £375 + VAT
  • Standard Toolboxes (statistics, optimisation, image processing etc) £150 + VAT each
  • Premium Toolboxes (MATLAB Compiler, MATLAB Coder etc) – Pricing currently not available

My personal mix of MATLAB, Simulink and 4 toolboxes would set me back £1350 + VAT.

Commercial Use

If I were to use MATLAB professionally and outside of academia, I’d need to get a commercial license. Prices are available from the link above which, at the time of writing, are

  • MATLAB £1600 + VAT
  • Simulink £2400 + VAT
  • Standard Toolboxes £800 + VAT each
  • Premium Toolboxes – Pricing currently not available

My personal mix of MATLAB, Simulink and 4 toolboxes would set me back £7200 + VAT.
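The totals quoted above are easy to check. Here is a short sketch using a hypothetical helper; all prices are in GBP excluding VAT, as listed in this post. Note that the £210 Home figure only adds up if Simulink is charged the same £25 as a toolbox (£85 plus five add-ons). That is my reading of the numbers rather than a quoted price.

```python
def total_cost(base_prices, toolbox_price, n_toolboxes):
    """Sum of the base products plus n toolboxes at a flat per-toolbox price (ex VAT)."""
    return sum(base_prices) + toolbox_price * n_toolboxes


# Home: MATLAB £85, then Simulink plus 4 toolboxes assumed at £25 each.
home = total_cost([85], 25, 5)
# Academic and commercial: MATLAB and Simulink, plus 4 standard toolboxes.
academic = total_cost([375, 375], 150, 4)
commercial = total_cost([1600, 2400], 800, 4)
print(home, academic, commercial)  # 210 1350 7200
```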

Contact MathWorks

If anyone does find themselves in a situation where they have MATLAB code and no means to run it, then they can always try contacting MathWorks and ask for help in finding a solution.

 

February 2nd, 2015 | Categories: math software, mathematica | Tags:

I’ve been working at The University of Manchester for almost a decade and will be leaving in just under three weeks’ time! A huge part of my job was to support a major subset of Manchester’s site-licensed application software portfolio so naturally I’ve made use of a lot of it over the years. As of February 20th, I will no longer be entitled to use any of it!

This article is the first in a series where I’ll look at some of the software that’s become important to me and what my options are on leaving Manchester.

Here, I consider Mathematica – a computer algebra system and technical computing environment that I’m very fond of. I’ve been a Mathematica user for over 15 years and yet, suddenly, I find myself license-less! So much code, so much time invested! What to do?

Options for all use cases

  • Before leaving University, contact the administrator of your site license. It could be that you are entitled to a discount on buying one of the various licenses on offer.
  • Use the CDF Player - With this free tool, you’ll be able to look at and interact (at least partially) with Mathematica notebooks.
  • Re-write all code to use something else. Which language to use is open to massive debate but the closest open source systems to Mathematica’s notebook-like interface are Jupyter (previously IPython) and Sage. The languages are, of course, rather different though!

Hobbyist use

General mucking around!

  • Buy the home edition - The home edition of Mathematica can be used for non-professional and non-academic purposes and, at the time of writing, costs £195 as a one-off cost or £95 per year.
  • Use Mathematica online: Home – Same rules as the home edition above but it’s a cloud-based, online version. Currently costs £95 per year.
  • Buy a Raspberry Pi – The Raspberry Pi comes with a free version of Mathematica! This means that you can buy an entire computer AND a copy of Mathematica for less than the standard home-use license. I had a play with Mathematica on the Raspberry Pi just over a year ago and it was very nice. Now that the faster, more powerful Raspberry Pi 2 has been released, this option is even more compelling!

Academic use

If you want to use Mathematica in an academic environment that doesn’t have a site license, you’ll need to purchase an individual academic license. At the time of writing, that will cost £860 + VAT.

Professional use

There are various grades of professional license and the cost varies according to how many compute kernels you need or Wolfram Alpha API calls you want to make. Current prices start at £2,035 + VAT.

 

 

January 27th, 2015 | Categories: Linear Algebra, matlab, programming, python | Tags:

Linear Algebra – Foundations to Frontiers (or LAFF to its friends) is a popular, high quality and free MOOC that, as the title suggests, teaches aspects of linear algebra in a way that takes the student from the very basics through to some cutting edge techniques. I worked through much of it last year and thoroughly enjoyed the approach it took — focusing on programming aspects from the very beginning. The course authors are also among the developers of the FLAME project, a high performance linear algebra library, and one of the interesting aspects of the LAFF course (for me at least) was that it taught linear algebra in a way that also allowed you to understand the approaches used in the algorithms behind FLAME.

Last year, all of the programming assignments in LAFF were done in Python, making use of the IPython notebook. This year, the software stack will be different and will be based on MATLAB. I understand that everyone who signs up to LAFF will be able to get a free MATLAB license from MathWorks for the duration of the course. Understandably, this caused quite a bit of discussion between the LAFF team and software/language geeks like me. In a recent Facebook thread, I asked about the switch and received the reply

‘MATLAB will be free during the course. There are open source equivalents, but Mathworks staff is supporting the use of MATLAB (staff for us). There were some who never got the IPython notebooks to work properly. We are really excited at the opportunity to innovate again and perhaps clear up snags in the programming issues we had. It was complicated to support IPython on all of the operating systems and machines that participants use. MATLAB promises to be easier and will allow us again to concentrate on the Linear Algebra’ – LAFF UTx

I’m sufficiently interested in this change from IPython to MATLAB that I’ll be signing up for the course again this year and I encourage you to do the same — I believe that the programming-centric teaching approach taken by LAFF is extremely well done and your time would be well-spent working through the course.

The course starts on 28th January 2015 so sign up now!

Here’s the trailer for last year’s course.

January 15th, 2015 | Categories: Open Data Science | Tags:

I recently had the good fortune to be involved in the creation of a European H2020 grant proposal called OpenDreamKit along with an international team from 15 institutions. My own contributions to this proposal were extremely modest and it was my first ever experience of being directly involved in an academic grant proposal. It’s the very first thing I’ve been involved with as part of my new appointment at The University of Sheffield.

Quoting from the proposal:

OpenDreamKit will deliver a flexible toolkit enabling research groups to set up Virtual Research Environments, customised to meet the varied needs of research projects in pure mathematics and applications and supporting the full research life-cycle from exploration, through proof and publication, to archival and sharing of data and code.

One of the many things that’s so great about this proposal is how it was written. Co-ordinated by Nicolas Thiéry, 33 contributors wrote it in LaTeX with version control provided by git and GitHub. The video below, produced using gource, is a visualisation of the GitHub repo over time and shows how we all danced around and with each other. My new manager, Neil Lawrence, who was much more deeply involved than I was, has good things to say about the process too.

The proposal was submitted yesterday after a lot of hard work and, as Nicolas Thiéry commented in one of his emails to the group, is “Open from start to end :-)”

The Sage Facebook page summed up my thoughts about this project perfectly: “See the collaboration behind the *proposal*, and imagine the collaboration in the software!”

December 12th, 2014 | Categories: Scientific Software, The internet, walking randomly | Tags:

Wakelet is a new content curation platform that I’ve been playing with recently and I have to say, I like it a lot! Here’s a screenshot from one of my wakes, ‘Best of WalkingRandomly’ where I’ve gathered together some of the most popular pages here.

WalkingRandomly wakelet

A ‘wake’ is a collection of images, notes, comments and links. It sounds simple, and it is, but I’ve found it very useful for all kinds of stuff. For example, whenever I find an interesting article about scientific computing, I usually post it on my Twitter feed (https://twitter.com/walkingrandomly). I’ve done this for hundreds of links but they are difficult to look up afterwards. With this in mind, I’ve started adding some of the best links to my Scientific Computing wake.

WalkingRandomly wakelet

 

Wakelet is developed by a group in Manchester and I first learned about it because one of my friends is a developer there. At first, I dutifully played with it because of his involvement, but I’ve continued using it because it’s really rather good! Read more about wakelet: