Archive for the ‘python’ Category

February 28th, 2014

A lot of people don’t seem to know this….and they should. When working with floating point arithmetic, it is not necessarily true that a+(b+c) = (a+b)+c. Here is a demo using MATLAB

>> x=0.1+(0.2+0.3);
>> y=(0.1+0.2)+0.3;
>> % are they equal?
>> x==y

ans =
     0

>> % lets look
>> sprintf('%.17f',x)
ans =
0.59999999999999998

>> sprintf('%.17f',y)
ans =
0.60000000000000009

These results have nothing to do with the fact that I am using MATLAB. Here’s the same thing in Python

>>> x=(0.1+0.2)+0.3
>>> y=0.1+(0.2+0.3)
>>> x==y
False
>>> print('%.17f' %x)
0.60000000000000009
>>> print('%.17f' %y)
0.59999999999999998

If this upsets you, or if you don’t understand why, I suggest you read the following

Does anyone else out there have suggestions for similar resources on this topic?

December 17th, 2013

In a recent Stack Overflow query, someone asked if you could switch off the balancing step when calculating eigenvalues in Python. In the document A case where balancing is harmful, David S. Watkins describes the balancing step as ‘the input matrix A is replaced by a rescaled matrix A* = D-1AD, where D is a diagonal matrix chosen so that, for each i, the ith row and the ith column of A* have roughly the same norm.’  

Such balancing is usually very useful and so is performed by default by software such as  MATLAB or Numpy.  There are times, however, when one would like to switch it off.

In MATLAB, this is easy and the following is taken from the online MATLAB documentation

A = [ 3.0     -2.0      -0.9     2*eps;
     -2.0      4.0       1.0    -eps;
     -eps/4    eps/2    -1.0     0;
     -0.5     -0.5       0.1     1.0];

[VN,DN] = eig(A,'nobalance')
VN =

    0.6153   -0.4176   -0.0000   -0.1528
   -0.7881   -0.3261         0    0.1345
   -0.0000   -0.0000   -0.0000   -0.9781
    0.0189    0.8481   -1.0000    0.0443

DN =
    5.5616         0         0         0
         0    1.4384         0         0
         0         0    1.0000         0
         0         0         0   -1.0000

At the time of writing, it is not possible to directly do this in Numpy (as far as I know at least). Numpy’s eig command currently uses the LAPACK routine DGEEV to do the heavy lifting for double precision matrices.  We can see this by looking at the source code of numpy.linalg.eig where the relevant subsection is

lapack_routine = lapack_lite.dgeev
        wr = zeros((n,), t)
        wi = zeros((n,), t)
        vr = zeros((n, n), t)
        lwork = 1
        work = zeros((lwork,), t)
        results = lapack_routine(_N, _V, n, a, n, wr, wi,
                                  dummy, 1, vr, n, work, -1, 0)
        lwork = int(work[0])
        work = zeros((lwork,), t)
        results = lapack_routine(_N, _V, n, a, n, wr, wi,
                                  dummy, 1, vr, n, work, lwork, 0)

My plan was to figure out how to tell DGEEV not to perform the balancing step and I’d be done. Sadly, however, it turns out that this is not possible. Taking a look at the reference implementation of DGEEV, we can see that the balancing step is always performed and is not user controllable–here’s the relevant bit of Fortran

*     Balance the matrix
*     (Workspace: need N)
*
      IBAL = 1
      CALL DGEBAL( 'B', N, A, LDA, ILO, IHI, WORK( IBAL ), IERR )

So, using DGEEV is a dead-end unless we are willing to modifiy and recompile the lapack source — something that’s rarely a good idea in my experience. There is another LAPACK routine that is of use, however, in the form of DGEEVX that allows us to control balancing.  Unfortunately, this routine is not part of the numpy.linalg.lapack_lite interface provided by Numpy and I’ve yet to figure out how to add extra routines to it.

I’ve also discovered that this functionality is an open feature request in Numpy.

Enter the NAG Library

My University has a site license for the commercial Numerical Algorithms Group (NAG) library.  Among other things, NAG offers an interface to all of LAPACK along with an interface to Python.  So, I go through the installation and do

import numpy as np
from ctypes import *
from nag4py.util import Nag_RowMajor,Nag_NoBalancing,Nag_NotLeftVecs,Nag_RightVecs,Nag_RCondEigVecs,Integer,NagError,INIT_FAIL
from nag4py.f08 import nag_dgeevx

eps = np.spacing(1)
np.set_printoptions(precision=4,suppress=True) 

def unbalanced_eig(A):
    """
    Compute the eigenvalues and right eigenvectors of a square array using DGEEVX via the NAG library.
    Requires the NAG C library and NAG's Python wrappers http://www.nag.co.uk/python.asp
    The balancing step that's performed in DGEEV is not performed here.
    As such, this function is the same as the MATLAB command eig(A,'nobalance')

    Parameters
    ----------
    A : (M, M) Numpy array
        A square array of real elements.

        On exit: 
        A is overwritten and contains the real Schur form of the balanced version of the input matrix .

    Returns
    -------
    w : (M,) ndarray
        The eigenvalues
    v : (M, M) ndarray
        The eigenvectors

    Author: Mike Croucher (www.walkingrandomly.com)
    Testing has been mimimal
    """

    order = Nag_RowMajor
    balanc = Nag_NoBalancing
    jobvl = Nag_NotLeftVecs
    jobvr = Nag_RightVecs
    sense = Nag_RCondEigVecs

    n = A.shape[0]
    pda = n
    pdvl = 1

    wr = np.zeros(n)
    wi = np.zeros(n)

    vl=np.zeros(1);
    pdvr = n
    vr = np.zeros(pdvr*n)

    ilo=c_long(0)
    ihi=c_long(0)

    scale = np.zeros(n)
    abnrm = c_double(0)
    rconde = np.zeros(n)
    rcondv = np.zeros(n)

    fail = NagError()
    INIT_FAIL(fail)

    nag_dgeevx(order,balanc,jobvl,jobvr,sense, 
               n, A.ctypes.data_as(POINTER(c_double)), pda, wr.ctypes.data_as(POINTER(c_double)),
               wi.ctypes.data_as(POINTER(c_double)),vl.ctypes.data_as(POINTER(c_double)),pdvl, 
               vr.ctypes.data_as(POINTER(c_double)),pdvr,ilo,ihi, scale.ctypes.data_as(POINTER(c_double)),
               abnrm, rconde.ctypes.data_as(POINTER(c_double)),rcondv.ctypes.data_as(POINTER(c_double)),fail)

    if all(wi == 0.0):
            w = wr
            v = vr.reshape(n,n)
    else:
            w = wr+1j*wi
            v = array(vr, w.dtype).reshape(n,n)

    return(w,v)

Define a test matrix:

A = np.array([[3.0,-2.0,-0.9,2*eps],
          [-2.0,4.0,1.0,-eps],
          [-eps/4,eps/2,-1.0,0],
          [-0.5,-0.5,0.1,1.0]])

Do the calculation

(w,v) = unbalanced_eig(A)

which gives

(array([ 5.5616,  1.4384,  1.    , -1.    ]),
 array([[ 0.6153, -0.4176, -0.    , -0.1528],
       [-0.7881, -0.3261,  0.    ,  0.1345],
       [-0.    , -0.    , -0.    , -0.9781],
       [ 0.0189,  0.8481, -1.    ,  0.0443]]))

This is exactly what you get by running the MATLAB command eig(A,’nobalance’).

Note that unbalanced_eig(A) changes the input matrix A to

array([[ 5.5616, -0.0662,  0.0571,  1.3399],
       [ 0.    ,  1.4384,  0.7017, -0.1561],
       [ 0.    ,  0.    ,  1.    , -0.0132],
       [ 0.    ,  0.    ,  0.    , -1.    ]])

According to the NAG documentation, this is the real Schur form of the balanced version of the input matrix.  I can’t see how to ask NAG to not do this. I guess that if it’s not what you want unbalanced_eig() to do,  you’ll need to pass a copy of the input matrix to NAG.

The IPython notebook

The code for this article is available as an IPython Notebook

The future

This blog post was written using Numpy version 1.7.1. There is an enhancement request for the functionality discussed in this article open in Numpy’s git repo and so I expect this article to become redundant pretty soon.

December 6th, 2013

A question I get asked a lot is ‘How can I do nonlinear least squares curve fitting in X?’ where X might be MATLAB, Mathematica or a whole host of alternatives.  Since this is such a common query, I thought I’d write up how to do it for a very simple problem in several systems that I’m interested in

This is the Python version. For other versions,see the list below

The problem

xdata = -2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9
ydata = 0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001

and you’d like to fit the function

 F(p_1,p_2,x) = p_1 \cos(p_2 x)+p_2 \sin(p_1 x)

using nonlinear least squares.  You’re starting guesses for the parameters are p1=1 and P2=0.2

For now, we are primarily interested in the following results:

  • The fit parameters
  • Sum of squared residuals

Future updates of these posts will show how to get other results such as confidence intervals. Let me know what you are most interested in.

Python solution using scipy

Here, I use the curve_fit function from scipy

import numpy as np
from scipy.optimize import curve_fit

xdata = np.array([-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9])
ydata = np.array([0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001])

def func(x, p1,p2):
  return p1*np.cos(p2*x) + p2*np.sin(p1*x)

popt, pcov = curve_fit(func, xdata, ydata,p0=(1.0,0.2))

The variable popt contains the fit parameters

array([ 1.88184732,  0.70022901])

We need to do a little more work to get the sum of squared residuals

p1 = popt[0]
p2 = popt[1]
residuals = ydata - func(xdata,p1,p2)
fres = sum(residuals**2)

which gives

0.053812696547933969
November 26th, 2013

As soon as I heard the news that Mathematica was being made available completely free on the Raspberry Pi, I just had to get myself a Pi and have a play.  So, I bought the Raspberry Pi Advanced Kit from my local Maplin Electronics store, plugged it to the kitchen telly and booted it up.  The exercise made me miss my father because the last time I plugged a computer into the kitchen telly was when I was 8 years old; it was Christmas morning and dad and I took our first steps into a new world with my Sinclair Spectrum 48K.

How to install Mathematica on the Raspberry Pi

Future raspberry pis wll have Mathematica installed by default but mine wasn’t new enough so I just typed the following at the command line

sudo apt-get update && sudo apt-get install wolfram-engine

On my machine, I was told

The following extra packages will be installed:
  oracle-java7-jdk
The following NEW packages will be installed:
  oracle-java7-jdk wolfram-engine
0 upgraded, 2 newly installed, 0 to remove and 1 not upgraded.
Need to get 278 MB of archives.
After this operation, 588 MB of additional disk space will be used.

So, it seems that Mathematica needs Oracle’s Java and that’s being installed for me as well. The combination of the two is going to use up 588MB of disk space which makes me glad that I have an 8Gb SD card in my pi.

Mathematica version 10!

On starting Mathematica on the pi, my first big surprise was the version number.  I am the administrator of an unlimited academic site license for Mathematica at The University of Manchester and the latest version we can get for our PCs at the time of writing is 9.0.1.  My free pi version is at version 10!  The first clue is the installation directory:

/opt/Wolfram/WolframEngine/10.0

and the next clue is given by evaluating $Version in Mathematica itself

In[2]:= $Version

Out[2]= "10.0 for Linux ARM (32-bit) (November 19, 2013)"

To get an idea of what’s new in 10, I evaluated the following command on Mathematica on the Pi

Export["PiFuncs.dat",Names["System`*"]]

This creates a PiFuncs.dat file which tells me the list of functions in the System context on the version of Mathematica on the pi. Transfer this over to my Windows PC and import into Mathematica 9.0.1 with

pifuncs = Flatten[Import["PiFuncs.dat"]];

Get the list of functions from version 9.0.1 on Windows:

winVer9funcs = Names["System`*"];

Finally, find out what’s in pifuncs but not winVer9funcs

In[16]:= Complement[pifuncs, winVer9funcs]

Out[16]= {"Activate", "AffineStateSpaceModel", "AllowIncomplete", \
"AlternatingFactorial", "AntihermitianMatrixQ", \
"AntisymmetricMatrixQ", "APIFunction", "ArcCurvature", "ARCHProcess", \
"ArcLength", "Association", "AsymptoticOutputTracker", \
"AutocorrelationTest", "BarcodeImage", "BarcodeRecognize", \
"BoxObject", "CalendarConvert", "CanonicalName", "CantorStaircase", \
"ChromaticityPlot", "ClassifierFunction", "Classify", \
"ClipPlanesStyle", "CloudConnect", "CloudDeploy", "CloudDisconnect", \
"CloudEvaluate", "CloudFunction", "CloudGet", "CloudObject", \
"CloudPut", "CloudSave", "ColorCoverage", "ColorDistance", "Combine", \
"CommonName", "CompositeQ", "Computed", "ConformImages", "ConformsQ", \
"ConicHullRegion", "ConicHullRegion3DBox", "ConicHullRegionBox", \
"ConstantImage", "CountBy", "CountedBy", "CreateUUID", \
"CurrencyConvert", "DataAssembly", "DatedUnit", "DateFormat", \
"DateObject", "DateObjectQ", "DefaultParameterType", \
"DefaultReturnType", "DefaultView", "DeviceClose", "DeviceConfigure", \
"DeviceDriverRepository", "DeviceExecute", "DeviceInformation", \
"DeviceInputStream", "DeviceObject", "DeviceOpen", "DeviceOpenQ", \
"DeviceOutputStream", "DeviceRead", "DeviceReadAsynchronous", \
"DeviceReadBuffer", "DeviceReadBufferAsynchronous", \
"DeviceReadTimeSeries", "Devices", "DeviceWrite", \
"DeviceWriteAsynchronous", "DeviceWriteBuffer", \
"DeviceWriteBufferAsynchronous", "DiagonalizableMatrixQ", \
"DirichletBeta", "DirichletEta", "DirichletLambda", "DSolveValue", \
"Entity", "EntityProperties", "EntityProperty", "EntityValue", \
"Enum", "EvaluationBox", "EventSeries", "ExcludedPhysicalQuantities", \
"ExportForm", "FareySequence", "FeedbackLinearize", "Fibonorial", \
"FileTemplate", "FileTemplateApply", "FindAllPaths", "FindDevices", \
"FindEdgeIndependentPaths", "FindFundamentalCycles", \
"FindHiddenMarkovStates", "FindSpanningTree", \
"FindVertexIndependentPaths", "Flattened", "ForeignKey", \
"FormatName", "FormFunction", "FormulaData", "FormulaLookup", \
"FractionalGaussianNoiseProcess", "FrenetSerretSystem", "FresnelF", \
"FresnelG", "FullInformationOutputRegulator", "FunctionDomain", \
"FunctionRange", "GARCHProcess", "GeoArrow", "GeoBackground", \
"GeoBoundaryBox", "GeoCircle", "GeodesicArrow", "GeodesicLine", \
"GeoDisk", "GeoElevationData", "GeoGraphics", "GeoGridLines", \
"GeoGridLinesStyle", "GeoLine", "GeoMarker", "GeoPoint", \
"GeoPolygon", "GeoProjection", "GeoRange", "GeoRangePadding", \
"GeoRectangle", "GeoRhumbLine", "GeoStyle", "Graph3D", "GroupBy", \
"GroupedBy", "GrowCutBinarize", "HalfLine", "HalfPlane", \
"HiddenMarkovProcess", "", "", "", "", "", "", \
"", "", "", "", "", "", "", "", "", "", \
"", "", "", "", "", "", "", "", "", "", \
"", "", "", "", "", "", "", "", "", "", \
"ï ¯", "ï \.b2", "ï \.b3", "IgnoringInactive", "ImageApplyIndexed", \
"ImageCollage", "ImageSaliencyFilter", "Inactivate", "Inactive", \
"IncludeAlphaChannel", "IncludeWindowTimes", "IndefiniteMatrixQ", \
"IndexedBy", "IndexType", "InduceType", "InferType", "InfiniteLine", \
"InfinitePlane", "InflationAdjust", "InflationMethod", \
"IntervalSlider", "ï ¨", "ï ¢", "ï ©", "ï ¤", "ï \[Degree]", "ï ­", \
"ï ¡", "ï «", "ï ®", "ï §", "ï £", "ï ¥", "ï \[PlusMinus]", \
"ï \[Not]", "JuliaSetIterationCount", "JuliaSetPlot", \
"JuliaSetPoints", "KEdgeConnectedGraphQ", "Key", "KeyDrop", \
"KeyExistsQ", "KeyIntersection", "Keys", "KeySelect", "KeySort", \
"KeySortBy", "KeyTake", "KeyUnion", "KillProcess", \
"KVertexConnectedGraphQ", "LABColor", "LinearGradientImage", \
"LinearizingTransformationData", "ListType", "LocalAdaptiveBinarize", \
"LocalizeDefinitions", "LogisticSigmoid", "Lookup", "LUVColor", \
"MandelbrotSetIterationCount", "MandelbrotSetMemberQ", \
"MandelbrotSetPlot", "MinColorDistance", "MinimumTimeIncrement", \
"MinIntervalSize", "MinkowskiQuestionMark", "MovingMap", \
"NegativeDefiniteMatrixQ", "NegativeSemidefiniteMatrixQ", \
"NonlinearStateSpaceModel", "Normalized", "NormalizeType", \
"NormalMatrixQ", "NotebookTemplate", "NumberLinePlot", "OperableQ", \
"OrthogonalMatrixQ", "OverwriteTarget", "PartSpecification", \
"PlotRangeClipPlanesStyle", "PositionIndex", \
"PositiveSemidefiniteMatrixQ", "Predict", "PredictorFunction", \
"PrimitiveRootList", "ProcessConnection", "ProcessInformation", \
"ProcessObject", "ProcessStatus", "Qualifiers", "QuantityVariable", \
"QuantityVariableCanonicalUnit", "QuantityVariableDimensions", \
"QuantityVariableIdentifier", "QuantityVariablePhysicalQuantity", \
"RadialGradientImage", "RandomColor", "RegularlySampledQ", \
"RemoveBackground", "RequiredPhysicalQuantities", "ResamplingMethod", \
"RiemannXi", "RSolveValue", "RunProcess", "SavitzkyGolayMatrix", \
"ScalarType", "ScorerGi", "ScorerGiPrime", "ScorerHi", \
"ScorerHiPrime", "ScriptForm", "Selected", "SendMessage", \
"ServiceConnect", "ServiceDisconnect", "ServiceExecute", \
"ServiceObject", "ShowWhitePoint", "SourceEntityType", \
"SquareMatrixQ", "Stacked", "StartDeviceHandler", "StartProcess", \
"StateTransformationLinearize", "StringTemplate", "StructType", \
"SystemGet", "SystemsModelMerge", "SystemsModelVectorRelativeOrder", \
"TemplateApply", "TemplateBlock", "TemplateExpression", "TemplateIf", \
"TemplateObject", "TemplateSequence", "TemplateSlot", "TemplateWith", \
"TemporalRegularity", "ThermodynamicData", "ThreadDepth", \
"TimeObject", "TimeSeries", "TimeSeriesAggregate", \
"TimeSeriesInsert", "TimeSeriesMap", "TimeSeriesMapThread", \
"TimeSeriesModel", "TimeSeriesModelFit", "TimeSeriesResample", \
"TimeSeriesRescale", "TimeSeriesShift", "TimeSeriesThread", \
"TimeSeriesWindow", "TimeZoneConvert", "TouchPosition", \
"TransformedProcess", "TrapSelection", "TupleType", "TypeChecksQ", \
"TypeName", "TypeQ", "UnitaryMatrixQ", "URLBuild", "URLDecode", \
"URLEncode", "URLExistsQ", "URLExpand", "URLParse", "URLQueryDecode", \
"URLQueryEncode", "URLShorten", "ValidTypeQ", "ValueDimensions", \
"Values", "WhiteNoiseProcess", "XMLTemplate", "XYZColor", \
"ZoomLevel", "$CloudBase", "$CloudConnected", "$CloudDirectory", \
"$CloudEvaluation", "$CloudRootDirectory", "$EvaluationEnvironment", \
"$GeoLocationCity", "$GeoLocationCountry", "$GeoLocationPrecision", \
"$GeoLocationSource", "$RegisteredDeviceClasses", \
"$RequesterAddress", "$RequesterWolframID", "$RequesterWolframUUID", \
"$UserAgentLanguages", "$UserAgentMachine", "$UserAgentName", \
"$UserAgentOperatingSystem", "$UserAgentString", "$UserAgentVersion", \
"$WolframID", "$WolframUUID"}

There we have it, a preview of the list of functions that might be coming in desktop version 10 of Mathematica courtesy of the free Pi version.

No local documentation

On a desktop version of Mathematica, all of the Mathematica documentation is available on your local machine by clicking on Help->Documentation Center in the Mathematica notebook interface.  On the pi version, it seems that there is no local documentation, presumably to keep the installation size down.  You get to the documentation via the notebook interface by clicking on Help->OnlineDocumentation which takes you to http://reference.wolfram.com/language/?src=raspi

Speed vs my laptop

I am used to running Mathematica on high specification machines and so naturally the pi version felt very sluggish–particularly when using the notebook interface.  With that said, however,  I found it very usable for general playing around.  I was very curious, however, about the speed of the pi version compared to the version on my home laptop and so created a small benchmark notebook that did three things:

  • Calculate pi to 1,000,000 decimal places.
  • Multiply two 1000 x 1000 random matrices together
  • Integrate sin(x)^2*tan(x) with respect to x

The comparison is going to be against my Windows 7 laptop which has a quad-core Intel Core i7-2630QM. The procedure I followed was:

  • Start a fresh version of the Mathematica notebook and open pi_bench.nb
  • Click on Evaluation->Evaluate Notebook and record the times
  • Click on Evaluation->Evaluate Notebook again and record the new times.

Note that I use the AbsoluteTiming function instead of Timing (detailed reason given here) and I clear the system cache (detailed resason given here).   You can download the notebook I used here.  Alternatively, copy and paste the code below

(*Clear Cache*)
ClearSystemCache[]

(*Calculate pi to 1 million decimal places and store the result*)
AbsoluteTiming[pi = N[Pi, 1000000];]

(*Multiply two random 1000x1000 matrices together and store the \
result*)
a = RandomReal[1, {1000, 1000}];
b = RandomReal[1, {1000, 1000}];

AbsoluteTiming[prod = Dot[a, b];]

(*calculate an integral and store the result*)
AbsoluteTiming[res = Integrate[Sin[x]^2*Tan[x], x];]

Here are the results. All timings in seconds.

Test Laptop Run 1 Laptop Run 2 RaspPi Run 1 RaspPi Run 2 Best Pi/Best Laptop
Million digits of Pi 0.994057 1.007058 14.101360 13.860549 13.9434
Matrix product 0.108006 0.074004 85.076986 85.526180 1149.63
Symbolic integral 0.035002 0.008000 0.980086 0.448804 56.1

From these tests, we see that Mathematica on the pi is around 14 to 1149 times slower on the pi than my laptop. The huge difference between the pi and laptop for the matrix product stems from the fact that ,on the laptop, Mathematica is using Intels Math Kernel Library (MKL).  The MKL is extremely well optimised for Intel processors and will be using all 4 of the laptop’s CPU cores along with extra tricks such as AVX operations etc.  I am not sure what is being used on the pi for this operation.

I also ran the standard BenchMarkReport[] on the Raspberry Pi.  The results are available here.

Speed vs Python

Comparing Mathematica on the pi to Mathematica on my laptop might have been a fun exercise for me but it’s not really fair on the pi which wasn’t designed to perform against expensive laptops. So, let’s move on to a more meaningful speed comparison: Mathematica on pi versus Python on pi.

When it comes to benchmarking on Python, I usually turn to the timeit module.  This time, however, I’m not going to use it and that’s because of something odd that’s happening with sympy and caching.  I’m using sympy to calculate pi to 1 million places and for the symbolic calculus.  Check out this ipython session on the pi

pi@raspberrypi ~ $ SYMPY_USE_CACHE=no
pi@raspberrypi ~ $ ipython
Python 2.7.3 (default, Jan 13 2013, 11:20:46)
Type "copyright", "credits" or "license" for more information.

IPython 0.13.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import sympy
In [2]: pi=sympy.pi.evalf(100000) #Takes a few seconds
In [3]: %timeit pi=sympy.pi.evalf(100000)
100 loops, best of 3: 2.35 ms per loop

In short, I have asked sympy not to use caching (I think!) and yet it is caching the result.  I don’t want to time how quickly sympy can get a result from the cache so I can’t use timeit until I figure out what’s going on here. Since I wanted to publish this post sooner rather than later I just did this:

import sympy
import time
import numpy

start = time.time()
pi=sympy.pi.evalf(1000000)
elapsed = (time.time() - start)
print('1 million pi digits: %f seconds' % elapsed)

a = numpy.random.uniform(0,1,(1000,1000))
b = numpy.random.uniform(0,1,(1000,1000))
start = time.time()
c=numpy.dot(a,b)
elapsed = (time.time() - start)
print('Matrix Multiply: %f seconds' % elapsed)

x=sympy.Symbol('x')
start = time.time()
res=sympy.integrate(sympy.sin(x)**2*sympy.tan(x),x)
elapsed = (time.time() - start)
print('Symbolic Integration: %f seconds' % elapsed)

Usually, I’d use time.clock() to measure things like this but something *very* strange is happening with time.clock() on my pi–something I’ll write up later.  In short, it didn’t work properly and so I had to resort to time.time().

Here are the results:

1 million pi digits: 5535.621769 seconds
Matrix Multiply: 77.938481 seconds
Symbolic Integration: 1654.666123 seconds

The result that really surprised me here was the symbolic integration since the problem I posed didn’t look very difficult. Sympy on pi was thousands of times slower than Mathematica on pi for this calculation!  On my laptop, the calculation times between Mathematica and sympy were about the same for this operation.

That Mathematica beats sympy for 1 million digits of pi doesn’t surprise me too much since I recall attending a seminar a few years ago where Wolfram Research described how they had optimized the living daylights out of that particular operation. Nice to see Python beating Mathematica by a little bit in the linear algebra though.

 

October 10th, 2013

From the wikipedia page on Division by Zero: “The IEEE 754 standard specifies that every floating point arithmetic operation, including division by zero, has a well-defined result”.

MATLAB supports this fully:

>> 1/0
ans =
   Inf
>> 1/(-0)
ans =
  -Inf
>> 0/0
ans =
   NaN

Julia is almost there, but doesn’t handled the signed 0 correctly (This is using Version 0.2.0-prerelease+3768 on Windows)

julia> 1/0
Inf

julia> 1/(-0)
Inf

julia> 0/0
NaN

Python throws an exception. (Python 2.7.5 using IPython shell)

In [4]: 1.0/0.0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
 in ()
----> 1 1.0/0.0

ZeroDivisionError: float division by zero

In [5]: 1.0/(-0.0)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
 in ()
----> 1 1.0/(-0.0)

ZeroDivisionError: float division by zero

In [6]: 0.0/0.0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
 in ()
----> 1 0.0/0.0

ZeroDivisionError: float division by zero

Update:
Julia does do things correctly, provided I make it clear that I am working with floating point numbers:

julia> 1.0/0.0
Inf

julia> 1.0/(-0.0)
-Inf

julia> 0.0/0.0
NaN
September 25th, 2013

I support scientific applications at The University of Manchester (see my LinkedIn profile if you’re interested in the details) and part of my job involves working on code written by researchers in a variety of languages.  When I say ‘variety’ I really mean it – MATLAB, Mathematica, Python, C, Fortran, Julia, Maple, Visual Basic and PowerShell are some languages I’ve worked with this month for instance.

Having to juggle the semantics of so many languages in my head sometimes leads to momentary confusion when working on someone’s program.  For example, I’ve been doing a lot of Python work recently but this morning I was hacking away on someone’s MATLAB code.  Buried deep within the program, it would have been very sensible to be able to do the equivalent of this:

a=rand(3,3)

a =
    0.8147    0.9134    0.2785
    0.9058    0.6324    0.5469
    0.1270    0.0975    0.9575

>> [x,y,z]=a(:,1)

Indexing cannot yield multiple results.

That is, I want to be able to take the first column of the matrix a and broadcast it out to the variables x,y and z. The code I’m working on uses MUCH bigger matrices and this kind of assignment is occasionally useful since the variable names x,y,z have slightly more meaning than a(1,3), a(2,3), a(3,3).

The only concise way I’ve been able to do something like this using native MATLAB commands is to first convert to a cell. In MATLAB 2013a for instance:

>> temp=num2cell(a(:,1));
>> [x y z] = temp{:}

x =
    0.8147

y =
    0.9058

z =
    0.1270

This works but I think it looks ugly and introduces conversion overheads. The problem I had for a short time is that I subconsciously expected multiple assignment to ‘Just Work’ in MATLAB since the concept makes sense in several other languages I use regularly.

Python:

from pylab import rand
a=rand(3,3)
[a,b,c]=a[:,0]

Mathematica:

a = RandomReal[1, {3, 3}]
{x,y,z}=a[[All,1]]

Julia:

a=rand(3,3);
(x,y,z)=a[:,1]

I’ll admit that I don’t often need this construct in MATLAB but it would definitely be occasionally useful. I wonder what other opinions there are out there? Do you think multiple assignment is useful (in any language)?

September 16th, 2013

Last week I gave a live demo of the IPython notebook to a group of numerical analysts and one of the computations we attempted to do was to solve the following linear system using Numpy’s solve command.

 \left( \begin{array}{ccc} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{array} \right) x = \left( \begin{array}{c} 15 \\ 15 \\ 15 \\ \end{array} \right)

Now, the matrix shown above is singular and so we expect that we might have problems. Before looking at how Numpy deals with this computation, lets take a look at what happens if you ask MATLAB to do it

>> A=[1 2 3;4 5 6;7 8 9];
>> b=[15;15;15];
>> x=A\b
Warning: Matrix is close to singular
or badly scaled. Results may be
inaccurate. RCOND =  1.541976e-18. 
x =
  -39.0000
   63.0000
  -24.0000

MATLAB gives us a warning that the input matrix is close to being singular (note that it didn’t actually recognize that it is singular) along with an estimate of the reciprocal of the condition number. It tells us that the results may be inaccurate and we’d do well to check. So, lets check:

>> A*x

ans =
   15.0000
   15.0000
   15.0000

>> norm(A*x-b)

ans =
2.8422e-14

We seem to have dodged the bullet since, despite the singular nature of our matrix, MATLAB has able to find a valid solution. MATLAB was right to have warned us though…in other cases we might not have been so lucky.

Let’s see how Numpy deals with this using the IPython notebook:

In [1]:
import numpy
from numpy import array
from numpy.linalg import solve
A=array([[1,2,3],[4,5,6],[7,8,9]])
b=array([15,15,15])
solve(A,b)

Out[1]:

array([-39.,  63., -24.])

It gave the same result as MATLAB [See note 1], presumably because it’s using the exact same LAPACK routine, but there was no warning of the singular nature of the matrix.  During my demo, it was generally felt by everyone in the room that a warning should have been given, particularly when working in an interactive setting.

If you look at the documentation for Numpy’s solve command you’ll see that it is supposed to throw an exception when the matrix is singular but it clearly didn’t do so here. The exception is sometimes thrown though:

In [4]:

C=array([[1,1,1],[1,1,1],[1,1,1]])
x=solve(C,b)

---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
 in ()
      1 C=array([[1,1,1],[1,1,1],[1,1,1]])
----> 2 x=solve(C,b)

C:\Python32\lib\site-packages\numpy\linalg\linalg.py in solve(a, b)
    326     results = lapack_routine(n_eq, n_rhs, a, n_eq, pivots, b, n_eq, 0)
    327     if results['info'] > 0:
--> 328         raise LinAlgError('Singular matrix')
    329     if one_eq:
    330         return wrap(b.ravel().astype(result_t))

LinAlgError: Singular matrix

It seems that Numpy is somehow checking for exact singularity but this will rarely be detected due to rounding errors. Those I’ve spoken to consider that MATLAB’s approach of estimating the condition number and warning when that is high would be better behavior since it alerts the user to the fact that the matrix is badly conditioned.

Thanks to Nick Higham and David Silvester for useful discussions regarding this post.

Notes
[1] – The results really are identical which you can see by rerunning the calculation after evaluating format long in MATLAB and numpy.set_printoptions(precision=15) in Python

September 10th, 2013

Along with a colleague, I’ve been playing around with Anaconda Python recently and am very impressed with it. At the time of writing, it is at version 1.7 and comes with Python 2.7.5 by default but you can install Python 3.3 using their conda package manager.  After you’ve installed Anaconda, just start up a Windows command prompt (cmd.exe) and do

conda update conda
conda create -n py33 python=3.3 anaconda

It will chug along for a while, downloading and installing packages before leaving you with a Python 3.3 environment that is completely separated from the default 2.7.5 environment. All you have to do to activate Python 3.3 is issue the following command at the Windows command prompt

activate py33

To demonstrate that the standard anaconda build remains untouched, launch cmd.exe, type ipython and note that you are still using Python 2.7.5

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\testuser>ipython
Python 2.7.5 |Anaconda 1.7.0 (64-bit)| (default, Jul  1 2013, 12:37:52) [MSC v.1500 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

IPython 1.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]:

Exit out of ipython and activate the py33 environment before launching ipython again. This time, note that you are using Python 3.3.2

In [1]: exit()

C:\Users\testuser>activate py33
Activating environment "py33"...

[py33] C:\Users\testuser>ipython
Python 3.3.2 |Anaconda 1.7.0 (64-bit)| (default, May 17 2013, 11:32:27) [MSC v.1500 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

IPython 1.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]:
August 6th, 2013

The first stable version of KryPy was released in late July.  KryPy is “a Python  module for Krylov subspace methods for the solution of linear algebraic systems. This includes enhanced versions of CG, MINRES and GMRES as well as methods for the efficient solution of sequences of linear systems.”

Here’s a toy example taken from KryPy’s website that shows how easy it is to use.

from numpy import ones
from scipy.sparse import spdiags
from krypy.linsys import gmres

N = 10
A = spdiags(range(1,N+1), [0], N, N)
b = ones((N,1))

sol = gmres(A, b)
print (sol['relresvec'])

Thanks to KryPy’s author, André Gaul, for the news.

June 18th, 2013

I was recently chatting to a research group who were considering moving to Python from MATLAB for some of their research software output.  One team member was very worried about Python’s use of indentation to denote blocks of code and wondered if braces would ever find their way into the language?  Another team member pointed out that this was extremely unlikely and invited us to attempt to import braces from the __future__ module.

>>> from __future__ import braces
  File "", line 1
SyntaxError: not a chance