May 22nd, 2013 | Categories: Free software, programming, Science, Scientific Software | Tags:

Earlier this year I was awarded a fellowship from the Software Sustainability Institute, an organization that works to improve all aspects of research software.  During their recent Collaborations Workshop in Oxford, it occurred to me that I was aware of only a relatively tiny number of software projects at my own institution, The University of Manchester. I decided to change that and started contacting our researchers to see what software they had released freely to the world as part of their research activities.

Research software comes in many forms, from small but useful MATLAB, Python or R scripts with just a handful of users and one developer right through to fully-fledged applications used by large communities of researchers and supported by teams of specialist developers.  I’m interested in knowing about all of it.  After all, we live in a time when even a mistake in an Excel spreadsheet can change the world.

The list below is what’s been sent to me so far and is a mirror of an internal list that’s been doing the rounds at Manchester.  I’ll update it as more information becomes available.  If you are at Manchester and know of a project that I’ve missed, feel free to contact me.

Faculty of Life Sciences

  • antiSMASH – Genome annotation tool for secondary metabolite gene clusters.
  • MultiMetEval – Flux-balance analysis tool for comparative and multi-objective analysis of genome-scale metabolic models.
  • mzMatch/mzmatch.R/mzMatch.ISO – Comprehensive LC/MS metabolomics data processing toolbox.
  • Rank Products – Statistical tool for the identification of differentially expressed entities in molecular profiles.

Health Informatics

  • openCDMS – The openCDMS project is a community effort to develop a robust, commercial-grade, full-featured and open source clinical data management system for studies and trials.

IT Services

  • ParaFEM – A portable library for parallel finite element analysis. Contributions from MACE, SEAES, School of Materials.
  • The Reality Grid Steering Library – A software library for steering and monitoring numerical simulations, with APIs available for Fortran/C++/Java and steering clients available for installation on laptops and mobile devices.  Developed in collaboration with the School of Computer Science.

Manchester Institute of Biotechnology

  • Copasi – COPASI is a software application for simulation and analysis of biochemical networks and their dynamics.
  • Condor Copasi – Condor-COPASI is a web-based interface for integrating COPASI with the Condor High Throughput Computing (HTC) environment.

School of Chemistry

  • DOSY Toolbox – A free, open source programme for processing PFG NMR diffusion data (a.k.a. DOSY data).

School of Computer Science

  • KUPKB (The Kidney & Urinary Pathway Knowledge Base) – The KUPKB is a collection of omics datasets that have been extracted from scientific publications and other related renal databases. The iKUP browser provides a single point of entry for you to query and browse these datasets.
  • MethodBox – MethodBox provides a simple, easy to use environment for browsing and sharing surveys, methods and data.
  • myExperiment – myExperiment makes it easy to find, use and share scientific workflows and other Research Objects, and to build communities.
  • Open PHACTS Discovery Platform – Freely available, this platform integrates pharmacological data from a variety of information resources and provides tools and services to question this integrated data to support pharmacological research.
  • OWL API – A Java API and reference implementation for creating, manipulating and serialising OWL Ontologies. The latest version of the API is focused towards OWL 2. The OWL API is open source and is available under either the LGPL or Apache Licenses.
  • RightField – Semantic annotation by stealth. RightField is a tool for adding ontology term selection to Excel spreadsheets to create templates which are then reused by scientists to collect and annotate their data without any need to understand, or even be aware of, RightField or the ontologies used. Later the annotations can be collected as RDF.
  • SEEK – SEEK is a web-based platform, with associated tools, for finding, sharing and exchanging Data, Models and Processes in Systems Biology.
  • ServiceCatalographer – ServiceCatalographer is an open-source Web-based platform for describing, annotating, searching and monitoring REST and SOAP Web services.
  • Simple Spreadsheet Extractor – A simple ruby gem that provides a facility to read an XLS or XLSX Excel spreadsheet document and produce an XML representation of its content.
  • Taverna – Taverna is an open source and domain-independent Workflow Management System – a suite of tools used to design and execute scientific workflows and aid in silico experimentation.
  • Utopia Documents – Utopia Documents brings a fresh new perspective to reading the scientific literature, combining the convenience and reliability of the PDF with the flexibility and power of the web.

School of Mathematics

  • EIDORS – Electrical Impedance Tomography and Diffuse Optical Tomography Reconstruction Software.
  • IFISS – IFISS is a graphical package for the interactive numerical study of incompressible flow problems which can be run under Matlab or Octave.
  • Matrix Computation Toolbox – The Matrix Computation Toolbox is a collection of MATLAB M-files containing functions for constructing test matrices, computing matrix factorizations, visualizing matrices, and carrying out direct search optimization.
  • Matrix Function Toolbox – The Matrix Function Toolbox is a MATLAB toolbox connected with functions of matrices.
  • Matrix Logarithm – MATLAB Files. Two functions for computing the matrix logarithm by the inverse scaling and squaring method.
  • Matrix Logarithm with Frechet Derivatives and Condition Number – MATLAB files
  • NLEVP A Collection of Nonlinear Eigenvalue Problems – This MATLAB Toolbox provides a collection of nonlinear eigenvalue problems.
  • oomph-lib – An object-oriented, open-source finite-element library for the simulation of multiphysics problems.
  • Simfit – Free software for simulation, curve fitting, statistics, and plotting.
  • SmallOverlap – SmallOverlap is a GAP 4 package which implements new, highly efficient algorithms for computing with finitely presented semigroups and monoids whose defining presentations satisfy small overlap conditions (in the sense of J. H. Remmers).
  • Symmetric eigenvalue decomposition and the SVD – MATLAB files

School of Mechanical, Aerospace and Civil Engineering (MACE)

  • DualSPHysics – DualSPHysics is based on the Smoothed Particle Hydrodynamics model named SPHysics and makes use of GPUs.
  • SPHYSICS – SPHysics is a platform of Smoothed Particle Hydrodynamics (SPH) codes inspired by the formulation of Monaghan (1992) developed jointly by researchers at the Johns Hopkins University (U.S.A.), the University of Vigo (Spain), the University of Manchester (U.K.) and the University of Rome La Sapienza (Italy).
May 3rd, 2013 | Categories: control theory, math software, matlab | Tags:

I recently picked up a few control theory books from the University library to support a project I am involved with right now and was interested in the seemingly total dominance of MATLAB in this subject area.  Since I’m not an expert in control systems, I’m not sure if this is because MATLAB is genuinely the best tool for the job or if it’s simply because it’s been around for a very long time and so has become entrenched.  Comments from anyone who works in relevant fields would be most welcome.

On its own, MATLAB is insufficient to teach introductory control systems courses: you also need the Control System Toolbox as a bare minimum, but most books and courses also seem to require Simulink and the Symbolic Math Toolbox.  All of these are included in the student edition of MATLAB, which is very reasonably priced.

If you are not a registered student, however, and don’t work for someone who can provide you with MATLAB, it’s going to be very expensive!  As far as I can tell, your only option would be to purchase commercial licenses, which cost thousands of dollars/pounds for MATLAB and a few toolboxes.

What else is out there?

I have a strong interest in mathematical software and so I know that there are several products that have support for control theory. Here are some that I know of and have access to myself:

  • Mathematica – Its symbolic math support far exceeds that of MATLAB and it is on an equal footing numerically but its control systems support is much more recent and I don’t know of a textbook that utilizes it.  One benefit of Mathematica is that it doesn’t separate functionality out into toolboxes – everything is just built in.  Another benefit to tinkerers is the home edition which gives you the full product at a much lower price than commercial licenses.
  • Maple – This also has very strong symbolic and numeric math support.  It also comes with some Control Systems support built in.  Like Mathematica, it has a home edition for non-commercial tinkering and learning.
  • LabVIEW – A graphical programming language that I’m only just starting to get used to.  It has lots of users and advocates in my employer’s electrical and mechanical engineering departments.  There is no support for symbolic computing as far as I know.
  • Python – Python is a superb general purpose scripting language that’s also completely free.  Numerics are taken care of by NumPy, symbolics by SymPy and there is a control theory module, the development of which is coordinated by Richard Murray of Caltech (the same Richard Murray that co-wrote the book Feedback Systems: An Introduction for Scientists and Engineers).
  • Octave – Octave is a free implementation of the MATLAB .m language.  It also has a free control package.
  • Scilab – Scilab is a free numerical environment that also has a free control package.

I haven’t mentioned Simulink alternatives in this post since I’ve discussed them before.

Questions

Some questions that arise are:

  • Are there any other alternatives to those listed above?
  • Do these alternatives have sufficient functionality to support undergraduate courses in control systems and control theory?
  • What would be the best language to use if you were teaching control systems as a Massive Open Online Course (MOOC)?
  • Does it matter to employers which computational language you learned your control systems in as an undergraduate?

I find that the final point is very divisive among people I discuss it with.  On the one hand you have those who say ‘It’s the concepts that matter, the language you choose to implement them in is much less important’ and on the other hand you have those who say ‘It’s gotta be MATLAB, my father used MATLAB and his grandfather before him. Industry uses MATLAB, I only know MATLAB, we must teach MATLAB.’

May 1st, 2013 | Categories: math software, Month of Math Software | Tags:

As I type this, the sun is shining (finally!) and the skies are blue.  You’d think that it would be difficult to concentrate on writing this month’s mathematical software round-up but it has been such an interesting month that it turned out to be a breeze.  Thanks to everyone who submitted news items for this month’s review; your feedback and generosity are greatly appreciated.  I would have given up long ago without them.

If you have any news items for next month’s issue, please let me know via the usual channels.  Click here for the Month of Math Software Archives.

Things that are a bit like MATLAB

Things written for MATLAB

  • GAGA: GPU Accelerated Greedy Algorithms for Compressed Sensing is “a software package for solving large compressed sensing problems with millions of unknowns in fractions of a second by exploiting the power of graphics processing units”. It saw its first ever release in April.
  • Version 3.4.3.3481 of the Multiprecision Computing Toolbox for MATLAB was released in April bringing several enhancements including the addition of the incomplete gamma function, improvement to the accuracy of eigensolvers and speed up of determinant computations.

Spreadsheets

  • One of the most famous spreadsheet errors of all time was unearthed this month.  I’ll leave the explaining to the BBC and New Scientist.
  • Gnumeric is the free spreadsheet program from the GNOME Office project and April saw it updated to version 1.12.2.  Updates include a set of new computational functions, fixes to various file import tools and a new font selector.

Graphs and Plotting

  • GNUPlot is a free, open source plotting package that’s been around for over 25 years.  It has been ported to almost every computer system known to man including Ye Olde Windows Mobile, Android and Raspberry Pi along with all of the platforms you’d usually expect.  April 2013 saw version 4.6.3 and the list of changes is at http://www.gnuplot.info/announce_4.6.3.txt
  • DISLIN is a plotting library for C, Fortran 77 and Fortran 90/95 and is also callable from several other languages including Perl, Python and Java.  Developed by the Max Planck Institute for Solar System Research, DISLIN has just hit version number 10.3.2.  Take a look at the new goodness here.

Numerical libraries

Python

It’s been a big month for mathematical and scientific software in Python with several releases of note.

  • After 7 months of work, the SciPy team have unveiled version 0.12.0.  The full list of updates is at http://sourceforge.net/projects/scipy/files/scipy/0.12.0/ but standout features for me are a Basin Hopping Global Optimisation routine (never heard of that algorithm but sounds interesting), the ability to inspect the contents of MATLAB .mat files without actually reading them into memory and documented BLAS and LAPACK low-level interfaces.
  • According to its website, numexpr “evaluates multiple-operator array expressions many times faster than NumPy can.”  In other words, numexpr is one way to get Python code going faster.  Something that I didn’t realise until I wrote this entry is that it supports the high performance Intel Vector Math Library (VML).  April saw a release to version 1.4.2 with the new stuff listed at https://code.google.com/p/numexpr/wiki/ReleaseNotes
  • Pweave is a scientific report generator and a literate programming tool for Python, inspired by Sweave for R.  Version 0.21.2 of Pweave was released earlier this month — take a look at the release notes for details of what’s new.  Thanks to @mpastell for the news.
  • The IPython (Interactive computing in Python) team have released a bugfix update.  The details of version 0.13.2 are in the release notes.
  • Version 1.0 of the PyASTRAToolbox was released on 23rd April.  “The PyASTRAToolbox is a Python interface to the ASTRA Toolbox, a tomography toolbox based on high-performance GPU primitives for 2D and 3D tomography.”

Misc

  • Derek of Coding-guidelines.com has released version 0.5 of his Numbers tool which looks at the numeric literals contained in the source code of any program you pass to it. The numbers program extracts these literals, compares them against a database of ‘interesting’ values and prints out any matches; it can also print out values that don’t match.  The matching is fuzzy, the intent being to find mistakes.  To see why this might be interesting and useful, take a look at this blog post where Derek discovers that both Maxima and R use a wide variety of different literal values for pi.
  • Version 2.19-5 of Magma, the regularly updated, commercial computer algebra system with a focus on algebra, number theory, algebraic geometry and algebraic combinatorics has been released.
  • Version 6.1 of MapleSim has been released.  MapleSim is a physical modeling and simulation tool.
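
To give a feel for the kind of check the Numbers tool performs, here is a minimal Python sketch of fuzzy matching numeric literals against known constants. It is only an illustration of the idea: the table of constants, the regular expression, the tolerance and the function name `match_literals` are all assumptions made for this example and are not taken from Derek's tool.

```python
import math
import re

# Hypothetical table of "interesting" values; the real tool's database is not described here.
INTERESTING = {"pi": math.pi, "e": math.e, "sqrt(2)": math.sqrt(2.0)}

# Matches plain decimal literals such as 3.14159; exponents, hex and suffixes are ignored.
LITERAL = re.compile(r"(?<![\w.])\d+\.\d+(?![\w.])")

def match_literals(source, rel_tol=1e-3):
    """Return (literal, constant name) pairs whose values agree to within rel_tol."""
    hits = []
    for text in LITERAL.findall(source):
        value = float(text)
        for name, exact in INTERESTING.items():
            if math.isclose(value, exact, rel_tol=rel_tol):
                hits.append((text, name))
    return hits

# The first literal is a slightly wrong value of pi, the second is e to full precision.
sample = "area = 3.14158 * r * r; growth = 2.718281828459045 ** t; k = 1.5"
matches = match_literals(sample)
print(matches)
```

Because the match is fuzzy, the mistyped 3.14158 is still reported as a candidate for pi, which is exactly the sort of near miss a reviewer would want to look at.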

From the blogs

 

April 23rd, 2013 | Categories: Making MATLAB faster, matlab, programming | Tags:

I was recently working on some MATLAB code with Manchester University’s David McCormick.  Buried deep within this code was a function that was called many, many times, taking up a significant amount of overall run time.  We managed to speed up an important part of this function by almost a factor of two (on his machine) simply by inserting two brackets… a new personal record in overall application performance improvement per number of keystrokes.

The code in question is hugely complex, but the trick we used is really very simple.  Consider the following MATLAB code

>> a=rand(4000);
>> c=12.3;
>> tic;res1=c*a*a';toc
Elapsed time is 1.472930 seconds.

With the insertion of just two brackets, this runs quite a bit faster on my Ivy Bridge quad-core desktop.

>> tic;res2=c*(a*a');toc
Elapsed time is 0.907086 seconds.

So, what’s going on? Well, we think that in the first version of the code, MATLAB first calculates c*a to form a temporary matrix (let’s call it temp here) and then goes on to find temp*a'.  However, in the second version, we think that MATLAB calculates a*a' first and in doing so it takes advantage of the fact that the result of multiplying a matrix by its transpose will be symmetric, which is where we get the speedup.

Another demonstration of this phenomenon can be seen as follows

>> a=rand(4000);
>> b=rand(4000);
>> tic;a*a';toc 
Elapsed time is 0.887524 seconds.
>> tic;a*b;toc  
Elapsed time is 1.473208 seconds.
>> tic;b*b';toc
Elapsed time is 0.966085 seconds.

Note that the symmetric matrix-matrix multiplications are faster than the general, non-symmetric one.
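
One plausible reason a product of the form a*a' can be cheaper than a general product is that the result is symmetric, so only one triangle needs to be computed and the other can be mirrored. The pure-Python sketch below counts scalar multiplications for both approaches. It illustrates that assumption only; it is not a description of what MATLAB actually does internally, and the function names are made up for this example.

```python
def general_product(a, b):
    """Return (a*b, number of scalar multiplications), computing every entry."""
    rows, inner, cols = len(a), len(b), len(b[0])
    mults = 0
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for k in range(inner):
                total += a[i][k] * b[k][j]
                mults += 1
            out[i][j] = total
    return out, mults

def gram_product(a):
    """Return (a*a', number of scalar multiplications), computing one triangle only."""
    rows, inner = len(a), len(a[0])
    mults = 0
    out = [[0.0] * rows for _ in range(rows)]
    for i in range(rows):
        for j in range(i, rows):  # upper triangle, including the diagonal
            total = 0.0
            for k in range(inner):
                total += a[i][k] * a[j][k]
                mults += 1
            out[i][j] = total
            out[j][i] = total     # mirror into the lower triangle
    return out, mults

def transpose(a):
    return [list(column) for column in zip(*a)]

n = 40
a = [[float((7 * i + 3 * j) % 11) for j in range(n)] for i in range(n)]
full, full_mults = general_product(a, transpose(a))
half, half_mults = gram_product(a)
print(full == half, full_mults, half_mults)
```

For n = 40 the general product performs 64000 multiplications and the symmetry-aware version 32800, roughly half the work, which is at least consistent with the direction of the timings above.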

April 18th, 2013 | Categories: math software, mathematica, Windows 8 | Tags:

A friend of mine recently got hold of a Microsoft Surface Pro tablet and he let me have a play on it for a couple of hours.  So, I installed Mathematica 9 and ran the benchmark.  A screenshot of the result is below with the Surface’s result in blue.  Not bad for a tablet!

Touch controlled Manipulates were a lot of fun too.  If only I could run such things on my iPad as appeared to be promised in http://blog.wolfram.com/2012/02/17/a-preview-of-cdf-on-ipad/

My only other comment is that the Touch Cover is truly awful, reminding me of ye-olde ZX81, but I’ve been told that the Type Cover is much better.

Mathematica 9 benchmark on surface pro

April 4th, 2013 | Categories: The internet, walking randomly | Tags:

When I first started this blog, there were only really two methods by which readers could keep up with new content – by subscribing to the RSS feed or by regularly dropping by the site to see what’s new. Since then readers have steadily been requesting other ways to follow the blog and, for the most part, I have obliged.  Here’s a list of current methods:

  • Subscribe to the RSS feed – Join around 2500 others and subscribe to the WR RSS feed.  This number will probably be severely reduced once Google Reader shuts down.
  • Follow me on Twitter – I post every WR article to my Twitter feed along with whatever else I find interesting. Twitter is also a great way of contacting me and is the social media platform on which I am most active.
  • WalkingRandomly on Google+ – I link to WR articles just after they are posted along with other random musings.
  • WalkingRandomly on Facebook – A small following compared to the other channels but useful to some it seems.
  • Drop by the site whenever the mood strikes you
April 3rd, 2013 | Categories: Month of Math Software | Tags:

Welcome to the latest edition of A Month of Math Software where I look back over the last month and report on all that is new and shiny in the world of mathematical software.  I’ve recently restarted work after the Easter break and so it seems fitting that I offer you all Easter Eggs courtesy of Lijia Yu and R.  Enjoy!

General purpose mathematical systems

MATLAB add-ons

  • The multiprecision MATLAB toolbox from Advanpix has been upgraded to version 3.4.3.3431 with the addition of multidimensional arrays.
  • The superb, free chebfun project has now been extended to 2 dimensions with the release of chebfun2.

GPU accelerated computation

Statistics and visualisation 

Finite elements

  • Version 7.3 of deal.II is now available.  deal.II is a C++ program library targeted at the computational solution of partial differential equations using adaptive finite elements.

 

March 25th, 2013 | Categories: mathematica | Tags:

I’m working on a presentation involving Mathematica 9 at the moment and found myself wanting a gallery of all built-in plots using default settings.  Since I couldn’t find such a gallery, I made one myself.  The notebook is available here and includes 99 built-in plots, charts and gauges generated using default settings.  If you hover your mouse over one of the plots in the Mathematica notebook, it will display a ToolTip showing usage notes for the function that generated it.

The gallery only includes functions that are fully integrated with Mathematica so doesn’t include things from add-on packages such as StatisticalPlots.

A screenshot of the gallery is below.  I haven’t made an in-browser interactive version due to size.

Mathematica 9 charts

March 15th, 2013 | Categories: The internet | Tags:

Google Reader has been a part of my life for several years now, forming the basis of my news reading habits.  Barely a day goes by that I don’t use it via my Android phone, iPad or the web and I have dozens of feeds effortlessly synced across all platforms.  It is, along with Dropbox, one of the most useful cloud services I have signed up for… and now it’s gone.

I guess I shouldn’t complain too much.  After all, it is a free service just like Twitter, Facebook, Evernote, Dropbox, Gmail, etc., and so Google has every right to yank it away from me if that’s what they want to do.  What the cloud giveth, the cloud taketh away and all that.

What if your favourite cloud-based service was switched off?

This has led me to face up to something I’ve always had at the back of my mind but, until now, never really worried about too much: I rely far too much on services that are potentially ephemeral and that I have no control over.  The loss of Google Reader from my life is frustrating but hardly the end of the world.  The loss of something like Dropbox, Evernote, Facebook or Gmail would cause me a lot more pain.

The data I upload to these services may be mine but the platforms are not and since I don’t pay a penny for any of them (Dropbox being a major exception) I am not sure what my legal rights may be.  If, for example, a company such as Evernote were to suddenly say ‘This free-access stuff isn’t working out for us so we deleted all your stuff and closed your account, thanks for playing.’, would I have any legal recourse?  Even more importantly, would I have a local backup?

Longevity and owning your own platform.

Another issue to consider is longevity.  Over the years I have invested time and money in dozens of software applications and, apart from a few notable exceptions where the licensing was crazy, I can still run any one of them today.  Languishing in the depths of my hard drives are files so old that they can only be read by ancient applications written by long-dead software development companies yet I can still launch the application and access the data.  I can do this because I physically own the platform.  The only way someone could prevent me from using the software and data on this platform is to physically take it from me.

To prevent me from using a cloud based service, however, it seems that all it takes is for that service to become unpopular.

 

March 10th, 2013 | Categories: CUDA, GPU, Making MATLAB faster, matlab, random numbers | Tags:

Ever since I took a look at GPU accelerating simple Monte Carlo Simulations using MATLAB, I’ve been disappointed with the performance of its GPU random number generator. In MATLAB 2012a, for example, it’s not much faster than the CPU implementation on my GPU hardware.  Consider the following code

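function gpuRandTest2012a(n)
% Time the generation of an n-by-n matrix of random numbers on the CPU and GPU.

mydev=gpuDevice();                  % handle to the currently selected GPU
disp('CPU - Mersenne Twister');
tic
CPU = rand(n);                      % generated on the CPU
toc

% Make mrg32k3a the global random number stream on the GPU
sg = parallel.gpu.RandStream('mrg32k3a','Seed',1);
parallel.gpu.RandStream.setGlobalStream(sg);
disp('GPU - mrg32k3a');
tic
Rg = parallel.gpu.GPUArray.rand(n); % generated directly in GPU memory
wait(mydev);                        % wait for the GPU to finish before stopping the timer
toc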

Running this on MATLAB 2012a on my laptop gives me the following typical times (if you try this out yourself, the first run will always be slower for various reasons I’ll not go into here).

>> gpuRandTest2012a(10000)
CPU - Mersenne Twister
Elapsed time is 1.330505 seconds.
GPU - mrg32k3a
Elapsed time is 1.006842 seconds.

Running the same code on MATLAB 2012b, however, gives a very pleasant surprise with typical run times looking like this

CPU - Mersenne Twister
Elapsed time is 1.590764 seconds.
GPU - mrg32k3a
Elapsed time is 0.185686 seconds.

So, generation of random numbers using the GPU is now over 7 times faster than CPU generation on my laptop hardware, a significant improvement on the previous implementation.

New generators in 2012b

The MATLAB developers went a little further in 2012b though.  Not only have they significantly improved performance of the mrg32k3a combined multiple recursive generator, they have also implemented two new GPU random number generators based on the Random123 library.  Here are the timings for the generation of 100 million random numbers in MATLAB 2012b

CPU - Mersenne Twister
Elapsed time is 1.370252 seconds.
GPU - mrg32k3a
Elapsed time is 0.186152 seconds.
GPU - Threefry4x64-20
Elapsed time is 0.145144 seconds.
GPU - Philox4x32-10
Elapsed time is 0.129030 seconds.
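
To put the 2012b timings above into perspective, the short Python sketch below turns them into speedup factors and throughputs. The elapsed times are copied from the output above; the helper names are hypothetical, and the count of 100 million assumes the same rand(10000) call as before, which produces a 10000-by-10000 matrix.

```python
# Elapsed times in seconds, copied from the MATLAB 2012b output above.
cpu_seconds = 1.370252
gpu_seconds = {
    "mrg32k3a": 0.186152,
    "Threefry4x64-20": 0.145144,
    "Philox4x32-10": 0.129030,
}

n = 10000
count = n * n  # rand(n) returns an n-by-n matrix, i.e. 100 million numbers

def speedup(cpu, gpu):
    """How many times faster the GPU run was than the CPU run."""
    return cpu / gpu

def throughput(numbers, seconds):
    """Random numbers generated per second."""
    return numbers / seconds

for name, seconds in gpu_seconds.items():
    print(f"{name}: {speedup(cpu_seconds, seconds):.1f}x faster than the CPU, "
          f"{throughput(count, seconds) / 1e6:.0f} million numbers per second")
```

On these figures the three GPU generators come out at roughly 7.4, 9.4 and 10.6 times the speed of the CPU run.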

Bear in mind that I am running this on the relatively weak GPU of my laptop!  If anyone runs it on something stronger, I’d love to hear of your results.

  • Laptop model: Dell XPS L702X
  • CPU: Intel Core i7-2630QM @ 2 GHz, software overclockable to 2.9 GHz. 4 physical cores but 8 virtual cores in total due to Hyperthreading.
  • GPU: GeForce GT 555M with 144 CUDA cores.  Graphics clock: 590 MHz.  Processor clock: 1180 MHz. 3072 MB DDR3 memory.
  • RAM: 8 GB
  • OS: Windows 7 Home Premium 64 bit.
  • MATLAB: 2012a/2012b