## Archive for the ‘python’ Category

For a while now, Microsoft have provided a free Jupyter Notebook service on Microsoft Azure. At the moment they provide compute kernels for Python, R and F# providing up to 4Gb of memory per session. Anyone with a Microsoft account can upload their own notebooks, share notebooks with others and start computing or doing data science for free.

They University of Cambridge uses them for teaching, and they’ve also been used by the LIGO people (gravitational waves) for dissemination purposes.

This got me wondering. How much power does Microsoft provide for free within these notebooks? Computing is pretty cheap these days what with the Raspberry Pi and so on but what do you get for nothing? The memory limit is 4GB but how about the computational power?

To find out, I created a simple benchmark notebook that finds out how quickly a computer multiplies matrices together of various sizes.

- The benchmark notebook is here on Azure https://notebooks.azure.com/walkingrandomly/libraries/MatrixMatrix
- and here on GitHub https://github.com/mikecroucher/Jupyter-Matrix-Matrix

Matrix-Matrix multiplication is often used as a benchmark because it’s a common operation in many scientific domains and it has been optimised to within an inch of it’s life. I have lost count of the number of times where my contribution to a researcher’s computational workflow has amounted to little more than ‘don’t multiply matrices together like that, do it like this…it’s much faster’

So how do Azure notebooks perform when doing this important operation? It turns out that **they max out at 263 Gigaflops**!

For context, here are some other results:

- A 16 core Intel Xeon E5-2630 v3 node running on Sheffield’s HPC system achieved around 500 Gigaflops.
- My mid-2014 Mabook Pro, with a Haswell Intel CPU hit, hit 169 Gigaflops.
- My Dell XPS9560 laptop, with a Kaby Lake Intel CPU, manages 153 Gigaflops.

As you can see, we are getting quite a lot of compute power for nothing from Azure Notebooks. Of course, one of the limiting factors of the free notebook service is that we are limited to 4GB of RAM but that was more than I had on my own laptops until 2011 and I got along just fine.

Another fun fact is that according to https://www.top500.org/statistics/perfdevel/, 263 Gigaflops **would have made it the fastest computer in the world until 1994**. It would have stayed in the top 500 supercomputers of the world until June 2003 [1].

Not bad for free!

[1] The top 500 list is compiled using a different benchmark called LINPACK so a direct comparison isn’t strictly valid…I’m using a little poetic license here.

There are lots of Widgets in ipywidgets. Here’s how to list them

from ipywidgets import * widget.Widget.widget_types

At the time of writing, this gave me

{'Jupyter.Accordion': ipywidgets.widgets.widget_selectioncontainer.Accordion, 'Jupyter.BoundedFloatText': ipywidgets.widgets.widget_float.BoundedFloatText, 'Jupyter.BoundedIntText': ipywidgets.widgets.widget_int.BoundedIntText, 'Jupyter.Box': ipywidgets.widgets.widget_box.Box, 'Jupyter.Button': ipywidgets.widgets.widget_button.Button, 'Jupyter.Checkbox': ipywidgets.widgets.widget_bool.Checkbox, 'Jupyter.ColorPicker': ipywidgets.widgets.widget_color.ColorPicker, 'Jupyter.Controller': ipywidgets.widgets.widget_controller.Controller, 'Jupyter.ControllerAxis': ipywidgets.widgets.widget_controller.Axis, 'Jupyter.ControllerButton': ipywidgets.widgets.widget_controller.Button, 'Jupyter.Dropdown': ipywidgets.widgets.widget_selection.Dropdown, 'Jupyter.FlexBox': ipywidgets.widgets.widget_box.FlexBox, 'Jupyter.FloatProgress': ipywidgets.widgets.widget_float.FloatProgress, 'Jupyter.FloatRangeSlider': ipywidgets.widgets.widget_float.FloatRangeSlider, 'Jupyter.FloatSlider': ipywidgets.widgets.widget_float.FloatSlider, 'Jupyter.FloatText': ipywidgets.widgets.widget_float.FloatText, 'Jupyter.HTML': ipywidgets.widgets.widget_string.HTML, 'Jupyter.Image': ipywidgets.widgets.widget_image.Image, 'Jupyter.IntProgress': ipywidgets.widgets.widget_int.IntProgress, 'Jupyter.IntRangeSlider': ipywidgets.widgets.widget_int.IntRangeSlider, 'Jupyter.IntSlider': ipywidgets.widgets.widget_int.IntSlider, 'Jupyter.IntText': ipywidgets.widgets.widget_int.IntText, 'Jupyter.Label': ipywidgets.widgets.widget_string.Label, 'Jupyter.PlaceProxy': ipywidgets.widgets.widget_box.PlaceProxy, 'Jupyter.Play': ipywidgets.widgets.widget_int.Play, 'Jupyter.Proxy': ipywidgets.widgets.widget_box.Proxy, 'Jupyter.RadioButtons': ipywidgets.widgets.widget_selection.RadioButtons, 'Jupyter.Select': ipywidgets.widgets.widget_selection.Select, 'Jupyter.SelectMultiple': ipywidgets.widgets.widget_selection.SelectMultiple, 'Jupyter.SelectionSlider': ipywidgets.widgets.widget_selection.SelectionSlider, 'Jupyter.Tab': ipywidgets.widgets.widget_selectioncontainer.Tab, 'Jupyter.Text': ipywidgets.widgets.widget_string.Text, 'Jupyter.Textarea': ipywidgets.widgets.widget_string.Textarea, 'Jupyter.ToggleButton': ipywidgets.widgets.widget_bool.ToggleButton, 'Jupyter.ToggleButtons': ipywidgets.widgets.widget_selection.ToggleButtons, 'Jupyter.Valid': ipywidgets.widgets.widget_bool.Valid, 'jupyter.DirectionalLink': ipywidgets.widgets.widget_link.DirectionalLink, 'jupyter.Link': ipywidgets.widgets.widget_link.Link}

This is my rant on **import *.** There are many like it, but this one is mine.

I tend to work with scientists so I’ll use something from mathematics as my example. What is the result of executing the following line of Python code?

result = sqrt(-1)

Of course, you have no idea if you don’t know which module **sqrt** came from. Let’s look at a few possibilities. Perhaps you’ll get an exception:

In [1]: import math In [2]: math.sqrt(-1) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () ----> 1 math.sqrt(-1) ValueError: math domain error

Or maybe you’ll just get a warning and a nan

In [3]: import numpy In [4]: numpy.sqrt(-1) /Users/walkingrandomly/anaconda/bin/ipython:1: RuntimeWarning: invalid value encountered in sqrt #!/bin/bash /Users/walkingrandomly/anaconda/bin/python.app Out[4]: nan

You might get an answer but the datatype of your answer could be all sorts of strange and wonderful stuff.

In [5]: import cmath In [6]: cmath.sqrt(-1) Out[6]: 1j In [7]: type(cmath.sqrt(-1)) Out[7]: complex In [8]: import scipy In [9]: scipy.sqrt(-1) Out[9]: 1j In [10]: type(scipy.sqrt(-1)) Out[10]: numpy.complex128 In [11]: import sympy In [12]: sympy.sqrt(-1) Out[12]: I In [13]: type(sympy.sqrt(-1)) Out[13]: sympy.core.numbers.ImaginaryUnit

Even the humble square root function behaves very differently when imported from different modules! There are probably other sqrt functions, with yet more behaviours that I’ve missed.

Sometimes, they seem to behave in very similar ways:-

In [16]: math.sqrt(2) Out[16]: 1.4142135623730951 In [17]: numpy.sqrt(2) Out[17]: 1.4142135623730951 In [18]: scipy.sqrt(2) Out[18]: 1.4142135623730951

Let’s invent some trivial code.

from scipy import sqrt x = float(input('enter a number\n')) y = sqrt(x) # important things happen after here. Complex numbers are fine!

I can input -1 just fine. Then, someone comes along and decides that they need a function from **math** in the ‘important bit’. They use **import ***

from scipy import sqrt from math import * x = float(input('enter a number\n')) y = sqrt(x) # important things happen after here. Complex numbers are fine!

They test using inputs like 2 and 4 and everything works (we don’t have automated tests — we suck!). Of course it breaks for -1 now though. This is easy to diagnose when you’ve got a few lines of code but it causes a lot of grief when there’s hundreds…or, horror of horrors, if the **‘from math import *’** was done somewhere in the middle of the source file!

I’m sometimes accused of being obsessive and maybe I’m labouring the point a little but I see this stuff, in various guises, all the time!

So, yeah, don’t use **import ***.

While waiting for the rain to stop before heading home, I started messing around with the heart equation described in an old WalkingRandomly post. Playing code golf with myself, I worked to get the code tweetable. In Python:

from pylab import *

x=r_[-2:2:0.001]

show(plot((sqrt(cos(x))*cos(200*x)+sqrt(abs(x))-0.7)*(4-x*x)**0.01)) pic.twitter.com/gbOTbYSaIG— Mike Croucher (@walkingrandomly) February 8, 2016

In R:

x=seq(-2,2,0.001)

y=Re((sqrt(cos(x))*cos(200*x)+sqrt(abs(x))-0.7)*(4-x*x)^0.01)

plot(x,y)#rstats pic.twitter.com/trpgEnNna4— Mike Croucher (@walkingrandomly) February 8, 2016

I liked the look of the default plot in R so animated it by turning 200 into a parameter that ranged from 1 to 200. The result was this animation:

Finding this animation based on previous tweets oddly mesmerising #rstats pic.twitter.com/e3q6lZqWcP

— Mike Croucher (@walkingrandomly) February 8, 2016

The code for the above isn’t quite tweetable:

options(warn=-1) for(num in seq(1,200,1)) { filename = paste("rplot" ,sprintf("%03d", num),'.jpg',sep='') jpeg(filename) x=seq(-2,2,0.001) y=Re((sqrt(cos(x))*cos(num*x)+sqrt(abs(x))-0.7)*(4-x*x)^0.01) plot(x,y,axes=FALSE,ann=FALSE) dev.off() }

This produces a lot of .jpg files which I turned into the animated gif with ImageMagick:

convert -delay 12 -layers OptimizeTransparency -colors 8 -loop 0 *.jpg animated.gif

The test suite of a project I’m working on is poking around at the extreme edges of the range of double precision numbers. I noticed a difference between Windows and other platforms that I can’t yet fully explain. On Windows, the test suite was pumping out RuntimeWarnings that we don’t see in Linux or Mac. I’ve distilled the issue down to a single numpy command:

np.log1p(1.7976931348622732e+308)

On Windows 7 Anaconda Python 2.3, this gives a RuntimeWarning and returns inf whereas on Linux and Mac OS X it evaluates to 709.78-ish

Numpy version is 1.9.2 in all cases.

**64 bit Windows 7**

Python 2.7.10 |Continuum Analytics, Inc.| (default, May 28 2015, 16:44:52) [MSC v.1500 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. Anaconda is brought to you by Continuum Analytics. Please check out: http://continuum.io/thanks and https://binstar.org >>> import numpy as np >>> np.log1p(1.7976931348622732e+308) __main__:1: RuntimeWarning: overflow encountered in log1p inf

**64 bit Linux**

Python 2.7.9 (default, Apr 2 2015, 15:33:21) [GCC 4.9.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> np.log1p(1.7976931348622732e+308) 709.78271289338397

**Mac OS X**

Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, May 28 2015, 17:04:42) [GCC 4.2.1 (Apple Inc. build 5577)] on darwin Type "help", "copyright", "credits" or "license" for more information. Anaconda is brought to you by Continuum Analytics. Please check out: http://continuum.io/thanks and https://binstar.org >>> import numpy as np >>> np.log1p(1.7976931348622732e+308) 709.78271289338397

The argument to log1p is getting close to the largest double precision number:

>>> sys.float_info.max 1.7976931348623157e+308

The ever-superb John D. Cook recently found this lovely looking curve in a book he’s currently reading

John posted some Python code that reproduced this curve. I ~~stole~~ borrowed his code, put it in a Jupyter notebook and wrapped it in an interactive widget to allow me to play with the parameters and see what other curves I could come up with. The result looks like this.

If you’d like something where those sliders work, you need to run the notebook I’ve created in Project Jupyter. Here are 2 ways to do that.

- Download the notebook from github
- Method 1: Upload this notebook to Try Jupyter.
- Method 2: Install Anaconda Python on your machine. Launch the notebook and open the file downloaded above.

Once you have the notebook open, click on **Cell**->**Run All **and play with the sliders that pop up.

Other posts about these curves:

- Random cyclic curves – Includes code written in Haskell
- A geogebra applet

I recently found myself in need of a portable install of the Jupyter notebook which made use of a portable install of R as the compute kernel. When you work in institutions that have locked-down managed Windows desktops, such portable installs can be a life-saver! This is particularly true when you are working with rapidly developing projects such as Jupyter and IRKernel.

It’s not perfect but it works for the fairly modest requirements I had for it. Here are the steps I took to get it working.

**Download and install Portable Python**

I downloaded Portable Python 2.7.6.1 from http://portablepython.com/ and installed into a directory called **Portable Python 2.7.6.1**

**Update IPython and install the extra modules we need**

This version of Portable Python comes with a portable IPython instance but it is too old to support alternative kernels. As such, we need to install a newer version.

Open a **cmd.exe** command prompt and navigate to **Portable Python 2.7.6.1\App\Scripts**.

Enter the command

easy_install ipython.exe

You’ll now find that you can launch the ipython.exe terminal from within this directory:

C:\Users\walkingrandomly\Desktop\Portable Python 2.7.6.1\App\Scripts>ipython Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] Type "copyright", "credits" or "license" for more information. IPython 3.1.0 -- An enhanced Interactive Python. ? -> Introduction and overview of IPython's features. %quickref -> Quick reference. help -> Python's own help system. object? -> Details about 'object', use 'object??' for extra details. In [1]: exit()

If you try to launch the notebook, however, you’ll get error messages. This is because we haven’t taken care of all the dependencies. Let’s do that now. Ensuring you are still in the **Portable Python 2.7.6.1\App\Scripts **folder, execute the following commands.

easy_install pyzmq easy_install jinja2 easy_install tornado easy_install jsonschema

You should now be able to launch the notebook using

ipython notebook

**Install portable R and IRKernel**

- I downloaded Portable R 3.2 from http://sourceforge.net/projects/rportable/files/ and installed into a directory called
**R-Portable** - Move this directory into the Portable Python directory. It needs to go inside
**Portable Python 2.7.6.1\App**(see this discussion to learn how I discovered that this location was the correct one) - Launch the Portable R executable which should be at
**Portable Python 2.7.6.1\App\R-Portable\R-portable.exe**and install the IRKernel packages by doing

install.packages(c("rzmq","repr","IRkernel","IRdisplay"), repos="http://irkernel.github.io/")

**Install additional R packages**

The version of Portable R I used didn’t include various necessary packages. Here’s how I fixed that.

- Launch the Portable R executable which should be at
**Portable Python 2.7.6.1\App\R-Portable\R-portable.exe**and install the following packagesinstall.packages('digest') install.packages('uuid') install.packages('base64enc') install.packages('evaluate') install.packages('jsonlite')

**Install the R kernel file**

Create the directory structure **Portable Python 2.7.6.1\App\share\jupyter\kernels\R_kernel**

Create a file called **kernel.json** that contains the following

{"argv": ["R-Portable/App/R-Portable/bin/i386/R.exe","-e","IRkernel::main()", "--args","{connection_file}"], "display_name":"Portable R" }

This file needs to go in the **R_kernel** directory created earlier. Note that the kernel location specified in kernel.json uses Linux style forward slashes in the path rather than the backslashes that Windows users are used to. I found that this was necessary for the kernel to work –it was ignored by the notebook otherwise.

**Finishing off**

Everything created so far, including R, is in the folder **Portable Python 2.7.6**

I created a folder called **PortableJupyter** and put the **Portable Python 2.7.6** folder inside it. I also created the folder **PortableJupyter\notebooks** to allow me to carry my notebooks around with the software that runs them.

There is a bug in Portable Python 2.7.6.1 relating to scripts like IPython.exe that have been installed using easy_install. In short, they stop working if you move the directory they’re installed in – breaking portability somewhat! (Details here)

The workaround is to launch Ipython by running the script **Portable Python 2.7.6.1\App\Scripts\ipython-script.py**

I didn’t want to bother with that so created a shortcut in my **PortableJupyter** folder called **Launch notebook.** The target of this shortcut was the following line

%windir%\system32\cmd.exe /c "cd notebooks && "%CD%/Portable Python 2.7.6.1/App\python.exe" "%CD%/Portable Python 2.7.6.1\App\Scripts\ipython-script.py" notebook"

This starts the notebook using the default web browser and puts you in the notebooks directory.

**The pay off**

My folder looks like this:

If I click on the Launch Notebook shortcut, I get a Jupyter session with 2 kernel options

I can choose the Portable R kernel and start using R in the notebook!

If you **really** want to learn the differences between Python 2 and Python 3, I suggest you try converting a non-trivial software project. I’m in the middle of doing one now and am learning all kinds of little gotchas over and above the standard stuff that everyone knows such as changes to print, integer division and removal of xrange.

The most recent one I learned about (about 10 minutes ago) amounted to this

#Python 2.7.6 (default, Mar 22 2014, 22:59:56) [GCC 4.8.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> [x for x in range(10)] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] >>> x 9

compared to

Python 3.4.0 (default, Apr 11 2014, 13:05:11) [GCC 4.8.2] on linux Type "help", "copyright", "credits" or "license" for more information. >>> [x for x in range(10)] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] >>> x Traceback (most recent call last): File "", line 1, in NameError: name 'x' is not defined

This is well documented (This StackOverflow Q+A is great!) but I didn’t know about it and, in the code I was looking at, there was a heck of a lot of complication between the list comprehension and when ‘x’ was used. As such, it took me a while to figure out!

Another change that had me scratching my head for a while is the fact that Python 3 ignores the __metaclass__ hook. I didn’t know this little fact but discovered it while debugging failing tests!

Of course, once you know these little gotchas, you’ll probably not be caught out by them again in your next Python 2->Python 3 porting project but they got me wondering…..

**What changes from Python 2->Python 3 have really caught you out at some point?**

Linear Algebra – Foundations to Frontiers (or LAFF to its friends) is a popular, high quality and free MOOC that, as the title suggests, teaches aspects of linear algebra in a way that takes the student from the very basics through to some cutting edge techniques. I worked through much of it last year and thoroughly enjoyed the approach it took — focusing on programming aspects from the very beginning. The course authors are also among the developers of the FLAME project, a high performance linear algebra library, and one of the interesting aspects of the LAFF course (for me at least) was that it taught linear algebra in a way that also allowed you to understand the approaches used in the algorithms behind FLAME.

Last year, all of the programming assignments in LAFF were done in Python, making use of the IPython notebook. This year, the software stack will be different and will be based on MATLAB. I understand that everyone who signs up to LAFF will be **able to get a free MATLAB license from Mathworks for the duration of the course**. Understandably, this caused quite a bit of discussion between the LAFF team and software/language geeks like me. In a recent Facebook thread, I asked about the switch and received the reply

*‘MATLAB will be free during the course. There are open source equivalents, but Mathworks staff is supporting the use of MATLAB (staff for us). There were some who never got the IPython notebooks to work properly. We are really excited at the opportunity to innovate again and perhaps clear up snags in the programming issues we had. It was complicated to support IPython on all of the operating systems and machines that participants use. MATLAB promises to be easier and will allow us again to concentrate on the Linear Algebra’ – LAFF UTx*

I’m sufficiently interested in this change from IPython to MATLAB that I’ll be signing up for the course again this year and I encourage you to do the same — I believe that the programming-centric teaching approach taken by LAFF is extremely well done and your time would be well-spent working through the course.

The course starts on 28th January 2015 so sign up now!

Here’s the trailer for last year’s course.

Given a symmetric matrix such as

What’s the nearest correlation matrix? A 2002 paper by Manchester University’s Nick Higham which answered this question has turned out to be rather popular! At the time of writing, Google tells me that it’s been cited 394 times.

Last year, Nick wrote a blog post about the algorithm he used and included some MATLAB code. He also included links to applications of this algorithm and implementations of various NCM algorithms in languages such as MATLAB, R and SAS as well as details of the superb commercial implementation by The Numerical algorithms group.

I noticed that there was no Python implementation of Nick’s code so I ported it myself.

Here’s an example IPython session using the module

In [1]: from nearest_correlation import nearcorr In [2]: import numpy as np In [3]: A = np.array([[2, -1, 0, 0], ...: [-1, 2, -1, 0], ...: [0, -1, 2, -1], ...: [0, 0, -1, 2]]) In [4]: X = nearcorr(A) In [5]: X Out[5]: array([[ 1. , -0.8084125 , 0.1915875 , 0.10677505], [-0.8084125 , 1. , -0.65623269, 0.1915875 ], [ 0.1915875 , -0.65623269, 1. , -0.8084125 ], [ 0.10677505, 0.1915875 , -0.8084125 , 1. ]])

This module is in the early stages and there is a lot of work to be done. For example, I’d like to include a lot more examples in the test suite, add support for the commercial routines from NAG and implement other algorithms such as the one by Qi and Sun among other things.

Hopefully, however, it is just good enough to be useful to someone. Help yourself and let me know if there are any problems. Thanks to Vedran Sego for many useful comments and suggestions.

- github repository for the Python NCM module, nearest_correlation
- Nick Higham’s original MATLAB code.
- NAG’s commercial implementation – callable from C, Fortran, MATLAB, Python and more. A superb implementation that is significantly faster and more robust than this one!