October 24th, 2013 | Categories: programming, Science, Scientific Software, software deployment | Tags:

One of my favourite parts of my job at The University of Manchester is the code optimisation service that my team provides to researchers there. We take code written in languages such as MATLAB, Python, Mathematica and R and attempt (usually successfully) to make it go faster. It’s sort of a cross between training and consultancy, it’s a lot of fun and it lets us help a relatively large number of people in a short time. It also turns out to be very useful to the researchers, as some of my recent testimonials demonstrate.

Other researchers, however, need something more. They already have a nice piece of code with several users and a selection of papers. They also have a bucket-load of ideas about how to turn this nice code into something amazing. They have all this good stuff and yet they find that they are struggling to get their code to the next level. What these people need is some quality time with a team of research software engineers.

Enter the Software Sustainability Institute (SSI), an organisation that I have been fortunate enough to have a fellowship with throughout 2013. These guys are software-development ninjas, have extensive experience of working with academic researchers and, right now, they have an open call for projects. It’s free to apply and, if your application is successful, all of their services will be provided for free. If you’d like an idea of the sort of projects they work on, take a look at the extensive list of past projects.

So, if you are a UK-based researcher who has a software problem, and no one else can help, maybe you can hire* the SSI team.

*for free!

October 10th, 2013 | Categories: Julia, matlab, programming, python | Tags:

From the Wikipedia page on Division by Zero: “The IEEE 754 standard specifies that every floating point arithmetic operation, including division by zero, has a well-defined result”.

MATLAB supports this fully:

>> 1/0
ans =
   Inf
>> 1/(-0)
ans =
  -Inf
>> 0/0
ans =
   NaN

Julia is almost there but doesn’t handle the signed zero correctly (this is using Version 0.2.0-prerelease+3768 on Windows):

julia> 1/0
Inf

julia> 1/(-0)
Inf

julia> 0/0
NaN

Python throws an exception (Python 2.7.5 using the IPython shell):

In [4]: 1.0/0.0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
 in ()
----> 1 1.0/0.0

ZeroDivisionError: float division by zero

In [5]: 1.0/(-0.0)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
 in ()
----> 1 1.0/(-0.0)

ZeroDivisionError: float division by zero

In [6]: 0.0/0.0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
 in ()
----> 1 0.0/0.0

ZeroDivisionError: float division by zero
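For what it’s worth, NumPy’s floating point arrays follow the IEEE 754 rules here: dividing by zero produces inf and nan (together with RuntimeWarnings) rather than raising a ZeroDivisionError. A quick sketch, not part of the original comparison above:

import numpy as np

# Expect array([ inf, -inf,  nan]) plus RuntimeWarnings about
# division by zero and invalid values, rather than an exception.
result = np.array([1.0, 1.0, 0.0]) / np.array([0.0, -0.0, 0.0])
print(result)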

Update:
Julia does do things correctly, provided I make it clear that I am working with floating point numbers. The integer literal -0 is simply 0, so my earlier 1/(-0) never involved a signed zero at all:

julia> 1.0/0.0
Inf

julia> 1.0/(-0.0)
-Inf

julia> 0.0/0.0
NaN

October 9th, 2013 | Categories: Android, Mobile Mathematics | Tags:

I got a Samsung Galaxy Note 3 yesterday and, since I’m so interested in compute performance, I ran various benchmarks on it.  Here are the results.

AnTuTu Benchmark – The overall score was 35,637. The screenshot below shows the comparison, given by the app, between my device and the Samsung Galaxy Note 2 (my previous phone).

AnTuTu on Note3

Linpack Pro for Android – This was the app I used when I compared mobile phones to 1980s supercomputers back in 2010. My phone at the time, an HTC Hero, managed just 2.3 Mflops. The Samsung Note 3, on the other hand, managed as much as 1074 Mflops. That’s quite an increase over the space of just 3 years! I found that the results of this app are quite inconsistent, with individual runs ranging from 666 to 1074 Mflops.

RgbenchMM – I’ve written about this benchmark, developed by Rahul Garg, before. It gives more consistent results than Linpack Pro and my Note 3 managed as much as 4471 Mflops!

Notes

  • The device was plugged into the mains at the time of performing the benchmarks. I rebooted the device after running each one.
  • There are at least two types of Note 3 in circulation that I know of – a quad-core and an octa-core. According to CPU-Z, mine has a quad-core Qualcomm Snapdragon 800 with a top speed of 2.27 GHz.
  • Samsung have been accused of benchmark cheating by some.  See, for example, this post from The Register.
September 25th, 2013 | Categories: Julia, mathematica, matlab, programming, python | Tags:

I support scientific applications at The University of Manchester (see my LinkedIn profile if you’re interested in the details) and part of my job involves working on code written by researchers in a variety of languages. When I say ‘variety’ I really mean it – MATLAB, Mathematica, Python, C, Fortran, Julia, Maple, Visual Basic and PowerShell are some of the languages I’ve worked with this month, for instance.

Having to juggle the semantics of so many languages in my head sometimes leads to momentary confusion when working on someone’s program. For example, I’ve been doing a lot of Python work recently but this morning I was hacking away on someone’s MATLAB code. Buried deep within that program was a spot where it would have been very sensible to be able to do the equivalent of this:

>> a=rand(3,3)

a =
    0.8147    0.9134    0.2785
    0.9058    0.6324    0.5469
    0.1270    0.0975    0.9575

>> [x,y,z]=a(:,1)

Indexing cannot yield multiple results.

That is, I want to be able to take the first column of the matrix a and broadcast it out to the variables x, y and z. The code I’m working on uses MUCH bigger matrices and this kind of assignment is occasionally useful since the variable names x, y and z have slightly more meaning than a(1,1), a(2,1) and a(3,1).

The only concise way I’ve been able to do something like this using native MATLAB commands is to first convert to a cell array. In MATLAB 2013a, for instance:

>> temp=num2cell(a(:,1));
>> [x y z] = temp{:}

x =
    0.8147

y =
    0.9058

z =
    0.1270

This works but I think it looks ugly and introduces conversion overheads. The problem I had for a short time is that I subconsciously expected multiple assignment to ‘Just Work’ in MATLAB since the concept makes sense in several other languages I use regularly.

Python:

from pylab import rand
a=rand(3,3)
[x,y,z]=a[:,0]

Mathematica:

a = RandomReal[1, {3, 3}]
{x,y,z}=a[[All,1]]

Julia:

a=rand(3,3);
(x,y,z)=a[:,1]

I’ll admit that I don’t often need this construct in MATLAB but it would definitely be occasionally useful. I wonder what other opinions there are out there? Do you think multiple assignment is useful (in any language)?

September 18th, 2013 | Categories: Problem of the week, programming | Tags:

While on the train to work this morning, I wondered which English words have the most anagrams that are also valid English words. So, I knocked up a few lines of Mathematica code and came up with 4 sets of 7:

{{"ates", "east", "eats", "etas", "sate", "seat", "teas"},
{"pares", "parse", "pears", "rapes", "reaps", "spare", "spear"}, 
{"capers", "crapes", "pacers", "parsec", "recaps", "scrape", "spacer"},
{"carets", "caster", "caters", "crates", "reacts", "recast", "traces"}}

So, according to my program (and Mathematica’s dictionary), the best you can do is 7. I’m not going to post the code until later because I don’t want to influence the challenge, which is: ‘Write a program in your language of choice that queries a dictionary to find which English words have the most anagrams that are also valid English words.’

September 16th, 2013 | Categories: Linear Algebra, math software, matlab, python | Tags:

Last week I gave a live demo of the IPython notebook to a group of numerical analysts and one of the computations we attempted to do was to solve the following linear system using Numpy’s solve command.

\left( \begin{array}{ccc} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{array} \right) x = \left( \begin{array}{c} 15 \\ 15 \\ 15 \end{array} \right)

Now, the matrix shown above is singular and so we expect that we might have problems. Before looking at how Numpy deals with this computation, let’s take a look at what happens if you ask MATLAB to do it:

>> A=[1 2 3;4 5 6;7 8 9];
>> b=[15;15;15];
>> x=A\b
Warning: Matrix is close to singular
or badly scaled. Results may be
inaccurate. RCOND =  1.541976e-18. 
x =
  -39.0000
   63.0000
  -24.0000

MATLAB gives us a warning that the input matrix is close to being singular (note that it didn’t actually recognise that it is singular), along with an estimate of the reciprocal of the condition number. It tells us that the results may be inaccurate and we’d do well to check. So, let’s check:

>> A*x

ans =
   15.0000
   15.0000
   15.0000

>> norm(A*x-b)

ans =
2.8422e-14

We seem to have dodged the bullet since, despite the singular nature of our matrix, MATLAB was able to find a valid solution. MATLAB was right to have warned us though…in other cases we might not have been so lucky.

Let’s see how Numpy deals with this using the IPython notebook:

In [1]:
import numpy
from numpy import array
from numpy.linalg import solve
A=array([[1,2,3],[4,5,6],[7,8,9]])
b=array([15,15,15])
solve(A,b)

Out[1]:

array([-39.,  63., -24.])

It gave the same result as MATLAB [See note 1], presumably because it’s using the exact same LAPACK routine, but there was no warning of the singular nature of the matrix.  During my demo, it was generally felt by everyone in the room that a warning should have been given, particularly when working in an interactive setting.

If you look at the documentation for Numpy’s solve command you’ll see that it is supposed to throw an exception when the matrix is singular but it clearly didn’t do so here. The exception is sometimes thrown though:

In [4]:

C=array([[1,1,1],[1,1,1],[1,1,1]])
x=solve(C,b)

---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
 in ()
      1 C=array([[1,1,1],[1,1,1],[1,1,1]])
----> 2 x=solve(C,b)

C:\Python32\lib\site-packages\numpy\linalg\linalg.py in solve(a, b)
    326     results = lapack_routine(n_eq, n_rhs, a, n_eq, pivots, b, n_eq, 0)
    327     if results['info'] > 0:
--> 328         raise LinAlgError('Singular matrix')
    329     if one_eq:
    330         return wrap(b.ravel().astype(result_t))

LinAlgError: Singular matrix

It seems that Numpy only reports singularity when the underlying LAPACK routine hits an exactly zero pivot, which will rarely happen in the presence of rounding errors. Those I’ve spoken to consider that MATLAB’s approach of estimating the condition number and warning when it is high would be better behaviour, since it alerts the user to the fact that the matrix is badly conditioned.
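If you want something closer to MATLAB’s behaviour when working interactively, one workaround (nothing built into the solve command, just a sketch) is to estimate the condition number yourself and warn when it is large:

import numpy
from numpy import array
from numpy.linalg import solve, cond

A = array([[1,2,3],[4,5,6],[7,8,9]])
b = array([15,15,15])

x = solve(A, b)

# cond() uses the 2-norm by default; values approaching 1/eps
# suggest the matrix is close to singular and the computed
# solution may be inaccurate.
if cond(A) > 1/numpy.finfo(float).eps:
    print("Warning: matrix is close to singular or badly scaled")

print(x)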

Thanks to Nick Higham and David Silvester for useful discussions regarding this post.

Notes
[1] – The results really are identical, which you can see by rerunning the calculation after evaluating format long in MATLAB and numpy.set_printoptions(precision=15) in Python.

September 10th, 2013 | Categories: python | Tags:

Along with a colleague, I’ve been playing around with Anaconda Python recently and am very impressed with it. At the time of writing, it is at version 1.7 and comes with Python 2.7.5 by default, but you can install Python 3.3 using their conda package manager. After you’ve installed Anaconda, just start up a Windows command prompt (cmd.exe) and do:

conda update conda
conda create -n py33 python=3.3 anaconda

It will chug along for a while, downloading and installing packages, before leaving you with a Python 3.3 environment that is completely separate from the default 2.7.5 environment. All you have to do to activate Python 3.3 is issue the following command at the Windows command prompt:

activate py33
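To drop back to the default Python 2.7.5 environment, the corresponding deactivate command should do the trick:

deactivate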

To demonstrate that the standard Anaconda build remains untouched, launch cmd.exe, type ipython and note that you are still using Python 2.7.5:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\testuser>ipython
Python 2.7.5 |Anaconda 1.7.0 (64-bit)| (default, Jul  1 2013, 12:37:52) [MSC v.1500 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

IPython 1.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]:

Exit out of ipython and activate the py33 environment before launching ipython again. This time, note that you are using Python 3.3.2:

In [1]: exit()

C:\Users\testuser>activate py33
Activating environment "py33"...

[py33] C:\Users\testuser>ipython
Python 3.3.2 |Anaconda 1.7.0 (64-bit)| (default, May 17 2013, 11:32:27) [MSC v.1500 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

IPython 1.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]:

August 23rd, 2013 | Categories: Fractals, general math, mathematica | Tags:

In a recent blog post, John Cook considered when sequences such as the following converge for a given complex number z:

z1 = sin(z)
z2 = sin(sin(z))
z3 = sin(sin(sin(z)))

John’s article discussed a theorem that answered the question for a few special cases and this got me thinking: what would the complete set of convergent points look like? Since I was halfway through my commute to work and had nothing better to do, I thought I’d find out.

The following Mathematica code considers points in the square portion of the complex plane where both real and imaginary parts range from -8 to 8. If the sequence converges for a particular point, I colour it black.

LaunchKernels[4]; (*Set up for 4 core parallel compute*)
ParallelEvaluate[SetSystemOptions["CatchMachineUnderflow" -> False]];
convTest[z_, tol_, max_] := Module[{list},
  list = Quiet[
    NestWhileList[Sin[#] &, z, (Abs[#1 - #2] > tol &), 2, max]];
  If[
   Length[list] < max && NumericQ[list[[-1]]]
   , 1, 0]
  ]
step = 0.005;
extent = 8;
AbsoluteTiming[
 data = ParallelMap[convTest[#, 10*10^-4, 1000] &,
    Table[x + I y, {y, -extent, extent, step}, {x, -extent, extent,
      step}]
    , {2}];]
ArrayPlot[data]

 

Sine Fractal

I quickly emailed John to tell him of my discovery but, on actually getting to work, I found that the above fractal is very well known. There’s even a colour version on Wolfram’s MathWorld site. Still, it was a fun discovery while it lasted.
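For anyone without Mathematica to hand, here’s a rough NumPy/Matplotlib sketch of the same idea (a translation rather than the code behind the image above; it uses a simpler fixed-iteration convergence test, so expect it to be slow and to differ slightly around the edges):

import numpy as np
import matplotlib.pyplot as plt

step = 0.02
extent = 8.0
tol = 1e-3
max_iter = 500

# Grid of complex numbers covering the square [-8, 8] x [-8, 8]
re = np.arange(-extent, extent, step)
im = np.arange(-extent, extent, step)
X, Y = np.meshgrid(re, im)
Z = X + 1j * Y

# Iterate z -> sin(z). Points where the iteration blows up become
# inf/nan; points where successive iterates end up close together
# are treated as convergent and coloured black.
with np.errstate(over='ignore', invalid='ignore'):
    current = Z
    for _ in range(max_iter):
        current = np.sin(current)
    converged = np.isfinite(current) & (np.abs(np.sin(current) - current) < tol)

plt.imshow(converged, cmap='gray_r', origin='lower',
           extent=[-extent, extent, -extent, extent])
plt.axis('off')
plt.show()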

August 12th, 2013 | Categories: applications, math software, software deployment | Tags:

Right now it’s packaging season (not the official term!) at my university – a time of year when IT staff have to battle with silent installers, SCCM, MSI creation and AdminStudio in order to create the student desktop image for the next academic year. Packaging season makes me grouchy, it makes me work late and it makes me massively over-react to every minor installation issue caused by software vendors. Right now, however, I am not grouchy because of packaging season…I am grouchy because of concurrent network licensing (or floating licensing, if you are Wikipedia).

I like network licenses…they make many aspects of my job easier, but the way they are implemented by some software vendors makes them a pain. Over the years, I’ve bothered many a support desk with my network license tales of woe and thought that I would collate them all together in one blog post.

The more of these things your software does, the more pain you cause me and my colleagues.

1. You don’t use standard FlexLM/FlexNet. 

Like it or loathe it, FlexLM is used by the vast majority of software vendors out there. We run license servers that host dozens of FlexLM-based applications and we have got the administration of these down to a fine art. In fact, we’ve replaced the vast majority of the process with a script. If an application uses FlexLM, system administration and license monitoring are bordering on the trivial for us now. The further you stray from FlexLM, the more difficult our job becomes.

One thing guaranteed to ruin my day is a call from a vendor I’ve worked with for years who says ‘Great news Mike, we’re ditching FlexLM for our own, in-house license system.’ Fabulous! Rather than use our lovely, generic, one-size-fits-all scripts, we are going to have to do a load of extra work and testing just for you. I look forward to all the new and interesting bugs your system will generate.

2. You don’t support redundant license servers.

The idea behind redundant license servers is this: instead of your application relying on just one machine, it relies on three, only two of which have to be operational at any one time. This gives us resiliency, and resiliency is a good thing when you are teaching a lab with 120 students in it.

I’ll keep this simple.  If you don’t support redundant license servers, it means that you don’t believe that your software is important. It tells me that you are just playing at being a big, grown up piece of software but you don’t really think anyone will take you seriously.

3. You support redundant license servers but can’t select them via the installer.

At install time, there is no option to give three servers; the user can only give one. You then expect the user to copy a pre-prepared license file that has details of all three servers as a post-installation step.

What usually happens here is that users give the primary license server, find that the application will launch, and stop reading the installation instructions. At some point in the future, we take down the primary license server for maintenance and the vast majority of self-serve installations break!

4. You use the LM_LICENSE_FILE environment variable

The problem is, so does everyone else. We end up with a situation where the LM_LICENSE_FILE variable is pointing at several license servers and some client applications really don’t like that. Be a good FlexLM citizen and use a vendor-specific environment variable. For example, MATLAB uses MLM_LICENSE_FILE and I could give them a big hug just for that!

5. You ‘helpfully’ tell the user when the license is about to expire.

I’ve moaned about this before. 1000 users panicking and emailing the helpdesk…lovely!  Bonus points are awarded if you don’t allow any supported way of switching these warnings off.

6. Your new license doesn’t support old clients.

This should speak for itself and happens more than I’d like.  We can’t upgrade the entire estate instantaneously and even if we could, we probably wouldn’t want to.  Some users, for one reason or another, cling to old versions of your software like grim death. They don’t care that there is a new shiny version available, all they know is that I broke their application and they hate me for it.

When we discover that old versions of your application will stop working, it also delays roll out of the new version since we have to do a lot of user-communications and ensure that nothing mission-critical will stop working.  This makes power-users of your application hate me because they want the new shiny version.

7. You don’t have a silent installer.

Strictly speaking this isn’t a network license moan but it is closely related. We use network licensing because we deploy your software to lots of machines. When I say ‘lots’ I mean thousands. It turns out, however, that you don’t support scripted installation (sometimes called ‘silent installation’ or ‘unattended installation’). This means that your software is a lot more difficult to deploy than your competitor’s! I’m now a big fan of your competitor!

8. You have a silent installer but it’s a bad one.

If I install manually, via point and click, I can configure every aspect of your application.  Your silent installer, on the other hand, is just a /S switch that does a default install…no configuration possible.  Bonus points for ‘silent’ installers that include pop-up dialogue boxes that can’t be switched off.

While on the topic of silent installation, can I just ask that you directly support deployment by SCCM on Windows please?  It will help with next year’s packaging season big time!

Cheers,

Mike

August 6th, 2013 | Categories: math software, python | Tags:

The first stable version of KryPy was released in late July. KryPy is “a Python module for Krylov subspace methods for the solution of linear algebraic systems. This includes enhanced versions of CG, MINRES and GMRES as well as methods for the efficient solution of sequences of linear systems.”

Here’s a toy example taken from KryPy’s website that shows how easy it is to use.

from numpy import ones
from scipy.sparse import spdiags
from krypy.linsys import gmres

N = 10
A = spdiags(range(1,N+1), [0], N, N)
b = ones((N,1))

sol = gmres(A, b)
print (sol['relresvec'])
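As an aside, SciPy ships a GMRES implementation of its own in scipy.sparse.linalg, although it doesn’t hand back the residual history in the way shown above. A quick sketch of the same toy problem:

from numpy import ones
from scipy.sparse import spdiags
from scipy.sparse.linalg import gmres

N = 10
A = spdiags(list(range(1, N+1)), [0], N, N)
b = ones((N,))

# info == 0 indicates that the iteration converged
x, info = gmres(A, b)
print(x, info)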

Thanks to KryPy’s author, André Gaul, for the news.