I was recently invited to give a talk at the Sheffield R Users Group and decided to give a brief overview of how R relates to other technologies. Subjects included Mathematica’s integration of R, Intel’s compilers, Math Kernel Library and how they can make R faster and a range of Microsoft technologies including R Tools for Visual Studio, Microsoft R Open and the MRAN for reproducibility. I also touched upon the NAG Library, Maple’s code generation for R, GPUs and Spark.
- Slide deck: _ and R: How R relates to other technologies
Did I miss anything? If you were to give a similar talk, what might you have included?
I was in a funk!
Not long after joining the University of Sheffield, I had helped convince a raft of lecturers to switch to using the Jupyter notebook for their lecturing. It was an easy piece of salesmanship and a whole lot of fun to do. Lots of people were excited by the possibilities.
The problem was that the University managed desktop was incapable of supporting an instance of the notebook with all of the bells and whistles included. As a cohort, we needed support for Python 2 and 3 kernels as well as R and even Julia. The R install needed dozens of packages and support for bioconductor. We needed LateX support to allow export to pdf and so on. We also needed to keep up to date because Jupyter development moves pretty fast! When all of this was fed into the managed desktop packaging machinery, it died. They could give us a limited, basic install but not one with batteries included.
I wanted those batteries!
In the early days, I resorted to strange stuff to get through the classes but it wasn’t sustainable. I needed a miracle to help me deliver some of the promises I had made.
Miracle delivered – SageMathCloud
During the kick-off meeting of the OpenDreamKit project, someone introduced SageMathCloud to the group. This thing had everything I needed and then some! During that presentation, I could see that SageMathCloud would solve all of our deployment woes as well as providing some very cool stuff that simply wasn’t available elsewhere. One killer-application, for example, was Google-docs-like collaborative editing of Jupyter notebooks.
I fired off a couple of emails to the lecturers I was supporting (“Everything’s going to be fine! Trust me!”) and started to learn how to use the system to support a course. I fired off dozens of emails to SageMathCloud’s excellent support team and started working with Dr Marta Milo on getting her Bioinformatics course material ready to go.
TL; DR: The course was a great success and a huge part of that success was the SageMathCloud platform
Giving back – A tutorial for lecturers on using SageMathCloud
I’m currently working on a tutorial for lecturers and teachers on how to use SageMathCloud to support a course. The material is licensed CC-BY and is available at https://github.com/mikecroucher/SMC_tutorial
If you find it useful, please let me know. Comments and Pull Requests are welcome.
Some numbers have something to say. Take the following, rather huge number, for example:
This number wants to tell you ‘Happy Holidays’, it just needs a little code to help it out. In Maple, this code is:
n := 18532529104068264480353131238404133659515101876112780772576330806424607039523076495646885634139967048751461005248758632306757568791464282975763655513845614593843019187655175699232981800640177552230121901623724542589154403221854439086181827152684585874764890938291566599716051702867105827305295569713835061785617174899049034655848488352249531058730460687733248824488684969031964141214711866905054239875930383262767247976845232997188307342087743859641917976242185446451606034726912968063437466250120212904972794971185874579656679344857677824: modnew := proc (x, y) options operator, arrow; x-y*floor(x/y) end proc: tupper := piecewise(1/2 < floor(modnew(floor((1/17)*y)*2^(-17*floor(x)-modnew(floor(y), 17)), 2)), 0, 1): points := [seq([seq(tupper(x, y), y = n+16 .. n, -1)], x = 105 .. 0, -1)]: plots:-listdensityplot(points, scaling = constrained, view = [0 .. 106, 0 .. 17], style = patchnogrid, size = [800, 800]);
The result is the following plot
Thanks to Samir for this one!
The mathematics is based on a generalisation of Tupper’s self-referential formula.
There’s more than one way to send a message with an equation, however. Here’s an image of one I discovered a few years ago — The equation that says Hi
Way back in 2008, I wrote a few blog posts about using mathematical software to generate christmas cards:
- Mathematical Christmas Cards – Walking Randomly Christmas Challenge
- A MATLAB Christmas card
- Christmas geetings – SAGE style
I’ve started moving the code from these to a github repository. If you’ve never contributed to an open source project before and want some practice using git or github, feel free to write some code for a christmas message along similar lines and submit a Pull Request.
The test suite of a project I’m working on is poking around at the extreme edges of the range of double precision numbers. I noticed a difference between Windows and other platforms that I can’t yet fully explain. On Windows, the test suite was pumping out RuntimeWarnings that we don’t see in Linux or Mac. I’ve distilled the issue down to a single numpy command:
On Windows 7 Anaconda Python 2.3, this gives a RuntimeWarning and returns inf whereas on Linux and Mac OS X it evaluates to 709.78-ish
Numpy version is 1.9.2 in all cases.
64 bit Windows 7
Python 2.7.10 |Continuum Analytics, Inc.| (default, May 28 2015, 16:44:52) [MSC v.1500 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. Anaconda is brought to you by Continuum Analytics. Please check out: http://continuum.io/thanks and https://binstar.org >>> import numpy as np >>> np.log1p(1.7976931348622732e+308) __main__:1: RuntimeWarning: overflow encountered in log1p inf
64 bit Linux
Python 2.7.9 (default, Apr 2 2015, 15:33:21) [GCC 4.9.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> np.log1p(1.7976931348622732e+308) 709.78271289338397
Mac OS X
Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, May 28 2015, 17:04:42) [GCC 4.2.1 (Apple Inc. build 5577)] on darwin Type "help", "copyright", "credits" or "license" for more information. Anaconda is brought to you by Continuum Analytics. Please check out: http://continuum.io/thanks and https://binstar.org >>> import numpy as np >>> np.log1p(1.7976931348622732e+308) 709.78271289338397
The argument to log1p is getting close to the largest double precision number:
>>> sys.float_info.max 1.7976931348623157e+308
It is possible to write quick, interactive demonstrations in a variety of languages these days. Functions such as Mathematica’s Manipulate, Sage Math’s interact and IPython’s interact allow programmers to write functional graphical user interfaces with just a few lines of code.
Earlier this week, I hosted a session in the Faculty of Engineering at The University of Sheffield where Maplesoft showed us, among other things, their version of this technology. This blog post is an extension of my notes from this part of the session.
- The Maple Worksheet for this blog post is available on github.
The series command expands a function as a power series around a point. For example, let’s expand sin(x) as a power series around the point x=0.
series(sin(x), x = 0, 10)
plot(series(sin(x), x = 0, 10), x = -2*Pi .. 2*Pi, y = -3 .. 3) Warning, unable to evaluate the function to numeric values in the region; see the plotting command's help page to ensure the calling sequence is correct
This is because the output of the series command is a series data structure — something that the plot function cannot handle. We can, however, convert this to a polynomial which is something that the plot function can handle
convert(series(sin(x), x = 0, 10), polynom)
Wrapping the above with plot gives:
plot(convert(series(sin(x), x = 0, 10), polynom), x = -2*Pi .. 2*Pi, y = -3 .. 3);
plot([sin(x), convert(series(sin(x), x = 0, 10), polynom)], x = -2*Pi .. 2*Pi, y = -3 .. 3);
It would be nice if we could see how the approximation varies as we vary the number of terms in the expansion. Change the value 10 to a parameter a, pass the whole thing to the Explore function and we get an interactive widget.
Explore(plot([sin(x), convert(series(sin(x), x = 0, a), polynom)], x = -2*Pi .. 2*Pi, y = -3 .. 3), parameters = [a = 2 .. 20]);
Here’s a screenshot of it:
Explore(plot([sin(x), convert(series(sin(x), x = b, a), polynom)], x = -2*Pi .. 2*Pi, y = -3 .. 3), parameters = [a = 2 .. 20, b = -2*Pi .. 2*Pi]);
To see what this looks like, open the companion worksheet in Maple.
Adding labels to the sliders
We can change the labels on the sliders as follows
Explore(plot([sin(x), convert(series(sin(x), x = b, a), polynom)], x = -2*Pi .. 2*Pi, y = -3 .. 3), parameters = [[a = 2 .. 20, label = `Number Of Terms`], [b = -2*Pi .. 2*Pi, label = `Expansion location`]]);
To see what this looks like, open the companion worksheet in Maple.
Adding initial values
Finally, let’s set some starting values for each slider
Explore(plot([sin(x), convert(series(sin(x), x = b, a), polynom)], x = -2*Pi .. 2*Pi, y = -3 .. 3), parameters = [[a = 2 .. 20, label = `Number Of Terms`], [b = -2*Pi .. 2*Pi, label = `Expansion location`]], initialvalues = [a = 2, b = 1]);
The resulting interactive widget looks like this:
Not bad for one line of code!
Upload to the Maple Cloud
At The University of Sheffield, we are lucky because all of our staff and students have access to Maple on both university-owned and personally-owned equipment. If your audience isn’t as fortunate, they can access the resulting worksheet on the Maple Cloud.
Update: 2nd July 2015 The code in github has moved on a little since this post was written so I changed the link in the text below to the exact commit that produced the results discussed here.
Imagine that you are a very new MATLAB programmer and you have to create an N x N matrix called A where A(i,j) = i+j
Your first attempt at a solution might look like this
N=2000 % Generate a N-by-N matrix where A(i,j) = i + j; for ii = 1:N for jj = 1:N A(ii,jj) = ii + jj; end end
On my current machine (Macbook Pro bought in early 2015), the above loop takes 2.03 seconds. You might think that this is a long time for something so simple and complain that MATLAB is slow. The person you complain to points out that you should preallocate your array before assigning to it.
N=2000 % Generate a N-by-N matrix where A(i,j) = i + j; A=zeros(N,N); for ii = 1:N for jj = 1:N A(ii,jj) = ii + jj; end end
This now takes 0.049 seconds on my machine – a speed up of over 41 times! MATLAB suddenly doesn’t seem so slow after all.
Word gets around about your problem, however, and seasoned MATLAB-ers see that nested loop, make a funny face twitch and start muttering ‘vectorise, vectorise, vectorise’. Emails soon pile in with vectorised solutions
% Method 1: MESHGRID. [X, Y] = meshgrid(1:N, 1:N); A = X + Y;
This takes 0.025 seconds on my machine — a healthy speed-up on the loop-with-preallocation solution. You have to understand the meshgrid command, however, in order to understand what’s going on here. It’s still clear (to me at least) what its doing but not as clear as the nice,obvious double loop. Call me old fashioned but I like loops…I understand them.
% Method 2: Matrix multiplication. A = (1:N).' * ones(1, N) + ones(N, 1) * (1:N);
This one is MUCH harder to read but you don’t worry about it too much because at 0.032 seconds it’s slower than meshgrid.
% Method 3: REPMAT. A = repmat(1:N, N, 1) + repmat((1:N).', 1, N);
This one appears to be interesting! At 0.009 seconds, it’s the fastest so far – by a healthy amount!
% Method 4: CUMSUM. A = cumsum(ones(N)) + cumsum(ones(N), 2);
Coming in at 0.052 seconds, this cumsum solution is slower than the preallocated loop.
% Method 5: BSXFUN. A = bsxfun(@plus, 1:N, (1:N).');
Ahhh, bsxfun or ‘The Widow-maker function’ as I sometimes refer to it. Responsible for some of the fastest and most unreadable vectorised MATLAB code I’ve ever written. In this case, it brings execution time down to 0.0045 seconds.
Whenever I see something that can be vectorised with a repmat, I try to figure out if I can rewrite it as a bsxfun. The result is usually horrible to look at so I tend to keep the original loop commented out above it as an explanation! This particular example isn’t too bad but bsxfun can quickly get hairy.
Loops in MATLAB aren’t anywhere near as bad as they used to be thanks to advances in JIT compilation but it can often pay, speed-wise, to switch to vectorisation. The price you often pay for this speed-up is that vectorised code can become very difficult to read.
If you’d like the code I ran to get the timings above, it’s on github (link refers to the exact commit used for this post) . Here’s the output from the run I referred to in this post.
Original loop time is 2.025441 Preallocate and loop is 0.048643 Meshgrid time is 0.025277 Matmul time is 0.032069 Repmat time is 0.009030 Cumsum time is 0.051966 bsxfun time is 0.004527
- MATLAB Version: 2015a
- Early 2015 Macbook Pro with 16Gb RAM
- CPU: 2.8Ghz quad core i7 Haswell
This post is based on a demonstration given by Mathwork’s Ken Deeley during a recent session at The University of Sheffield.
I’ve been working at The University of Manchester for almost a decade and will be leaving at the end of this week! A huge part of my job was to support a major subset of Manchester’s site licensed application software portfolio so naturally I’ve made use of a lot of it over the years. As of February 20th, I will no longer be entitled to use any of it!
This article is the second in a series where I’ll look at some of the software that’s become important to me and what my options are on leaving Manchester. Here, I consider MATLAB – a technical computing environment that has come to dominate my career at Manchester. For the last 10 years, I’ve used MATLAB at least every week, if not most days.
I had a standalone license for MATLAB and several toolboxes – Simulink, Image Processing, Parallel Computing, Statistics and Optimization. Now, I’ve got nothing! Unfortunately for me, I’ve also got hundreds of scripts, mex files and a few Simulink models that I can no longer run! These are my options:
Go somewhere else that has a MATLAB site license
- I’ll soon be joining the University of Sheffield who have a MATLAB site license. A great option if you can do it.
Use something else
- Octave – Octave is a pretty good free and open source clone of MATLAB and quite a few of my programs would work without modification. Others would require some rewriting and, in some cases, that rewriting could be extensive! There is no Simulink support.
- Scilab – It’s free and it’s MATLAB-like-ish but I’d have to rewrite my code most of the time. I could also port some of my Simulink models to Scilab as was done in this link.
- Rewrite all my code to use something completely different. What I’d choose would depend on what I’m trying to achieve but options include Python, Julia and R among others.
- If all I needed was the ability to run a few MATLAB applications I’d written, I could compile them using the MATLAB Compiler and keep the result. The whole point of the MATLAB Compiler is to distribute MATLAB applications to those who don’t have a MATLAB license. Of course once I’ve lost access to MATLAB itself, debugging and adding features will be um……tricky!
Get a hobbyist license for MATLAB
- MATLAB Home – This is the full version of MATLAB for hobbyists. Writing a non-profit blog such as WalkingRandomly counts as a suitable ‘hobby’ activity so I could buy this license. MATLAB itself for 85 pounds with most of the toolboxes coming in at an extra 25 pounds each. Not bad at all! The extra cost of the toolboxes would still lead me to obsess over how to do things without toolboxes but, to be honest, I think that’s an obsession I’d miss if it weren’t there! Buying all of the same toolboxes as I had before would end up costing me a total of £210+VAT.
- Find a MOOC that comes with free MATLAB – Mathworks make MATLAB available for free for students of some online courses such as the one linked to here. Bear in mind, however, that the license only lasts for the duration of the course.
If I were to stay in academia but go to an institution with no MATLAB license, I could buy myself an academic standalone license for MATLAB and the various toolboxes I’m interested in. The price lists are available at http://uk.mathworks.com/pricing-licensing/
For reference, current UK academic prices are
- MATLAB £375 + VAT
- Simulink £375 + VAT
- Standard Toolboxes (statistics, optimisation, image processing etc) £150 +VAT each
- Premium Toolboxes (MATLAB Compiler, MATLAB Coder etc) – Pricing currently not available
My personal mix of MATLAB, Simulink and 4 toolboxes would set me back £1350 + VAT.
If I were to use MATLAB professionally and outside of academia, I’d need to get a commercial license. Prices are available from the link above which, at the time of writing, are
- MATLAB £1600 +VAT
- Simulink £2400 + VAT
- Standard Toolboxes £800 +VAT each
- Premium Toolboxes – Pricing currently not available
My personal mix of MATLAB, Simulink and 4 toolboxes would set me back £7200 + VAT.
If anyone does find themselves in a situation where they have MATLAB code and no means to run it, then they can always try contacting MathWorks and ask for help in finding a solution.
I’ve been working at The University of Manchester for almost a decade and will be leaving in just less than 3 weeks time! A huge part of my job was to support a major subset of Manchester’s site licensed application software portfolio so naturally I’ve made use of a lot of it over the years. As of February 20th, I will no longer be entitled to use any of it!
This article is the first in a series where I’ll look at some of the software that’s become important to me and what my options are on leaving Manchester.
Here, I consider Mathematica – a computer algebra system and technical computing environment that I’m very fond of. I’ve been a Mathematica user for over 15 years and yet, suddenly, I find myself license-less! So much code, so much time invested! What to do?
Options for all use cases
- Before leaving University, contact the administrator of your site license. It could be that you are entitled to a discount on buying one of the various licenses on offer.
- Use the CDF Player – With this free tool, You’ll be able to look at and interact (at least partially) with Mathematica notebooks.
- Re-write all code to use something else. Which language to use is open to massive debate but the closest open source systems to Mathematica’s notebook-like interface are Jupyter (previously IPython) and Sage. The languages are, of course, rather different though!
General mucking around!
- Buy the home edition – The home edition of Mathematica can be used for non-professional and non-academic purposes and, at the time of writing, costs £195 as a one-off cost or £95 per year.
- Use Mathematica online: Home – Same rules as the home edition above but it’s a cloud-based, online version. Currently costs £95 per year.
- Buy a Raspberry Pi – The Raspberry Pi comes with a free version of Mathematica! This means that you can buy an entire computer AND a copy of Mathematica for less than the standard home-use license. I had a play with Mathematica on the Raspberry Pi just over a year ago and it was very nice. Now that the faster, more powerful Raspberry Pi 2 has been released this option is even more compelling!
If you want to use Mathematica in an academic environment that doesn’t have a site license, you’ll need to purchase an individual academic license. At the time of writing, that will cost £860 + VAT.
There are various grades of professional license and the cost varies according to how many compute kernels you need or Wolfram Alpha API calls you want to make. Current prices start at £2,035 +VAT
RLink is Mathematica’s interface to the R language – a feature that has been extremely popular since its debut in Mathematica version 9. It’s a great package but has one or two issues. For example, RLink makes use of a built in version of R which is currently stuck at the rather old version 2.14. Official support for the use of external versions of R and adding third-party libraries varies by operating system and version of Mathematica. Windows support is great — OS X support, not so much.
Expert Mathematica user Szabolcs Horvát has written the definitive guide on how to get RLink up and running with the latest version of R on all three major operating systems, building on earlier work by Leonid Shifrin and members of the Mathematica Stack Exchange community. Thanks to this work, we can now enjoy any version of R we like with Mathematica!