Archive for the ‘programming’ Category

September 25th, 2013

I support scientific applications at The University of Manchester (see my LinkedIn profile if you’re interested in the details) and part of my job involves working on code written by researchers in a variety of languages.  When I say ‘variety’ I really mean it – MATLAB, Mathematica, Python, C, Fortran, Julia, Maple, Visual Basic and PowerShell are some languages I’ve worked with this month for instance.

Having to juggle the semantics of so many languages in my head sometimes leads to momentary confusion when working on someone’s program.  For example, I’ve been doing a lot of Python work recently but this morning I was hacking away on someone’s MATLAB code.  Buried deep within the program, it would have been very sensible to be able to do the equivalent of this:

a=rand(3,3)

a =
    0.8147    0.9134    0.2785
    0.9058    0.6324    0.5469
    0.1270    0.0975    0.9575

>> [x,y,z]=a(:,1)

Indexing cannot yield multiple results.

That is, I want to be able to take the first column of the matrix a and broadcast it out to the variables x,y and z. The code I’m working on uses MUCH bigger matrices and this kind of assignment is occasionally useful since the variable names x,y,z have slightly more meaning than a(1,3), a(2,3), a(3,3).

The only concise way I’ve been able to do something like this using native MATLAB commands is to first convert to a cell. In MATLAB 2013a for instance:

>> temp=num2cell(a(:,1));
>> [x y z] = temp{:}

x =
    0.8147

y =
    0.9058

z =
    0.1270

This works but I think it looks ugly and introduces conversion overheads. The problem I had for a short time is that I subconsciously expected multiple assignment to ‘Just Work’ in MATLAB since the concept makes sense in several other languages I use regularly.

Python:

from pylab import rand
a=rand(3,3)
[a,b,c]=a[:,0]

Mathematica:

a = RandomReal[1, {3, 3}]
{x,y,z}=a[[All,1]]

Julia:

a=rand(3,3);
(x,y,z)=a[:,1]

I’ll admit that I don’t often need this construct in MATLAB but it would definitely be occasionally useful. I wonder what other opinions there are out there? Do you think multiple assignment is useful (in any language)?

September 18th, 2013

While on the train to work this morning, I wondered which English words have the most anagrams that are also valid English words. So, I knocked up few lines of Mathematica code and came up with 4 sets of 7:

{{"ates", "east", "eats", "etas", "sate", "seat", "teas"},
{"pares", "parse", "pears", "rapes", "reaps", "spare", "spear"}, 
{"capers", "crapes", "pacers", "parsec", "recaps", "scrape", "spacer"},
{"carets", "caster", "caters", "crates", "reacts", "recast", "traces"}}

So, according to my program (and Mathematica’s dictionary), the best you can do is 7.  I’m not going to post the code until later because I don’t want to influence the challenge which is ‘Write a program in your language of choice that queries a dictionary to find which English words have the most anagrams that are also valid English words.’

August 5th, 2013

While working on someone’s MATLAB code today there came a point when it was necessary to generate a vector of powers.  For example, [a a^2 a^3....a^10000] where a=0.999

a=0.9999;
y=a.^(1:10000);

This isn’t the only way one could form such a vector and I was curious whether or not an alternative method might be faster. On my current machine we have:

>> tic;y=a.^(1:10000);toc
Elapsed time is 0.001526 seconds.
>> tic;y=a.^(1:10000);toc
Elapsed time is 0.001529 seconds.
>> tic;y=a.^(1:10000);toc
Elapsed time is 0.001716 seconds.

Let’s look at the last result in the vector y

>> y(end)
ans =
   0.367861046432970

So, 0.0015-ish seconds to beat.

>> tic;y1=cumprod(ones(1,10000)*a);toc
Elapsed time is 0.000075 seconds.
>> tic;y1=cumprod(ones(1,10000)*a);toc
Elapsed time is 0.000075 seconds.
>> tic;y1=cumprod(ones(1,10000)*a);toc
Elapsed time is 0.000075 seconds.

soundly beaten! More than a factor of 20 in fact. Let’s check out that last result

>> y1(end)
ans =
   0.367861046432969

Only a difference in the 15th decimal place–I’m happy with that. What I’m wondering now, however, is will my faster method ever cause me grief?

This is only an academic exercise since this is not exactly a hot spot in the code!

August 2nd, 2013

In common with many higher educational establishments, the University I work for has site licenses for a wide variety of scientific software applications such as Labview, MATLAB, Mathematica, NAG, Maple and Mathcad– a great boon to students and researcher who study and work there.   The computational education of a typical STEM undergraduate will include exposure to at least some of these systems along with traditional programming languages such as Java, C or Fortran and maybe even a little Excel when times are particularly bad!

Some argue that such exposure to multiple computational systems is a good thing while others argue that it leads to confusion and a ‘jack of all trades and master of none situation.’  Those who take the latter viewpoint tend to want to standardize on one system or other depending on personal preferences and expertise.

MATLAB, Python, Fortran and Mathematica are a few systems I’ve seen put forward over the years with the idea being that students will learn the basics of one system in their first few weeks and then the entire subject curriculum will be interwoven with these computational skills.  In this way, students can use their computational skills as an aid to deeper subject understanding without getting bogged down with the technical details of several different computational systems.  As you might expect, software vendors are extremely keen on this idea and will happily parachute in a few experts to help universities with curriculum development for free since this will possibly lead to their system being adopted.

Maybe we’ll end up with electrical engineers who’ve only ever seen Labview, mathematicians who’ve only ever used Maple, mechanical engineers who’ve only ever used MATLAB and economists who can only use Excel.  While the computational framework(s) used to teach these subjects are less important than the teaching of the subject itself, I firmly believe that part of a well-rounded, numerate education should include exposure to several computational systems and such software mono-cultures should be avoided at all costs.

Part of the reason for writing this post is to ask what you think so comments are very welcome.

June 18th, 2013

I was recently chatting to a research group who were considering moving to Python from MATLAB for some of their research software output.  One team member was very worried about Python’s use of indentation to denote blocks of code and wondered if braces would ever find their way into the language?  Another team member pointed out that this was extremely unlikely and invited us to attempt to import braces from the __future__ module.

>>> from __future__ import braces
  File "", line 1
SyntaxError: not a chance
May 22nd, 2013

Earlier this year I was awarded a fellowship from the software sustainability institute, an organization that works to improve all aspects of research software.  During their recent collaborations workshop in Oxford, it occurred to me that I was aware of only a relatively tiny number of software projects at my own institution, The University of Manchester. I decided to change that and started contacting our researchers to see what software they had released freely to the world as part of their research activities.

Research software comes in many forms; from small but useful MATLAB, Python or R scripts with just a handful of users and one developer right through to fully-fledged applications used by large communities of researchers and supported by teams of specialist developers.  I’m interested in knowing about all of it.  After all, we live in a time when even a mistake in an Excel spreadsheet can change the world.

The list below is what’s been sent to me so far and is a mirror of an internal list that’s been doing the rounds at Manchester.  I’ll update it as more information becomes available.  If you are at Manchester and know of a project that I’ve missed, feel free to contact me.

Centre for Imaging Sciences

  • BoneFinder – BoneFinder is a fully automatic segmentation tool to extract bone contours from 2D radiographs. It is written in C++ and is available for Linux and Windows.

Faculty of Life Sciences

  • antiSMASH – Genome annotation tool for secondary metabolite gene clusters.
  • MultiMetEval – Flux-balance analysis tool for comparative and multi-objective analysis of genome-scale metabolic models.
  • mzMatch/mzmatch.R/mzMatch.ISO – Comprehensive LC/MS metabolomics data processing toolbox.
  • Rank Products – Statistical tool for the identification of differentially expressed entities in molecular profiles.

Health Informatics

  • openCDMS – The openCDMS project is a community effort to develop a robust, commercial-grade, full-featured and open source clinical data management system for studies and trials.

IT Services

  • idiffh – Research software can produce huge text files (e.g. logs). The GNU diff program needs to read the files into memory and therefore has an upper bound on file size. idiffh might only use a simple heuristic but is only bounded by the maximum file size (and free file store).
  • ParaFEM – A portable library for parallel finite element analysis. Contributions from MACE, SEAES, School of Materials.
  • Shadow – This is an Apple Mac OS X shell level application that can monitor Dropbox shared folders for file deletions and restore them.
  • The Reality Grid Steering Library – A software library for steering and monitoring numerical simulations, APIs available for Fortran/C++/Java and steering clients available for installation on laptops and mobile devices.  Developed in collaboration with the School of computer science.

Manchester Institute of Biotechnology

  • Copasi – COPASI is a software application for simulation and analysis of biochemical networks and their dynamics.
  • Condor Copasi – Condor-COPASI is a web-based interface for integrating COPASI with the Condor High Throughput Computing (HTC) environment.

School of Chemical Engineering & Analytical Science

School of Chemistry

  • DOSY Toolbox – A free, open source programme for processing PFG NMR diffusion data (a.k.a. DOSY data).

School of Computer Science

  • Clinical NERC – Clinical NERC is a simple customizable state-of-the-art named entity recognition, and classification software for clinical concepts or entities.
  • EasyChair – EasyChair is a free conference management system.
  • GPC – The University of Manchester GPC library is a flexible and highly robust polygon set operations library for use with C, C#, Delphi, Java, Perl, Python, Haskell, Lua, VB.Net (and other) applications.
  • HiPLAR – High Performance Linear Algebra in R.  A collaboration between Manchester and Imperial.
  • iProver – 7 times word champion in theorem proving.
  • INSEE – Interconnection Networks Simulation and Evaluation Environment
  • KUPKB (The Kidney & Urinary Pathway Knowledge Base) – The KUPKB is a collection of omics datasets that have been extracted from scientific publications and other related renal databases. The iKUP browser provides a single point of entry for you to query and browse these datasets.
  • ManTIME – ManTIME is an open-source machine learning pipeline for the extraction of temporal expressions from general domain texts.
  • MethodBox – MethodBox provides a simple, easy to use environment for browsing and sharing surveys, methods and data.
  • myExperiment – myExperiment makes it easy to find, use and share scientific workflows and other Research Objects, and to build communities.
  • Open PHACTS Discovery Platform – Freely available, this platform integrates pharmacological data from a variety of information resources and provides tools and services to question this integrated data to support pharmacological research.
  • OWL API – A Java API and reference implementation for creating, manipulating and serialising OWL Ontologies. The latest version of the API is focused towards OWL 2. The OWL API is open source and is available under either the LGPL or Apache Licenses.
  • OWL Tools – a collection of tools for working with OWL ontologies
  • OWL Webapps – a collection of web apps for working with OWL ontologies
  • RightField – Semantic annotation by stealth. RightField is tool for adding ontology term selection to Excel spreadsheets to create templates which are then reused by Scientists to collect and annotate their data without any need to understand, or even be aware of, RightField or the ontologies used. Later the annotations can be collected as RDF
  • SEEK – SEEK is a web-based platform, with associated tools, for finding, sharing and exchanging Data, Models and Processes in Systems Biology.
  • ServiceCatalographer – ServiceCatalographer is an open-source Web-based platform for describing, annotating, searching and monitoring REST and SOAP Web services.
  • Simple Spreadsheet Extractor – A simple ruby gem that provides a facility to read an XLS or XLSX Excel spreadsheet document and produce an XML representation of its content.
  • Taverna – Taverna is an open source and domain-independent Workflow Management System – a suite of tools used to design and execute scientific workflows and aid in silico experimentation.
  • TERN – TERN is a temporal expressions identification and normalisation software; designed for clinical data.
  • Utopia Documents – Utopia Documents brings a fresh new perspective to reading the scientific literature, combining the convenience and reliability of the PDF with the flexibility and power of the web.

School of Electrical and Electronic Engineering

  • Automatic classification of eye fixations - Identify fixations and saccades from point-of-gaze data without parametric assumptions or expert judgement. MATLAB code.
  • Bootstrap Threshold Software – Estimate a threshold and a robust SD from stimulus-response data with a normal cumulative distribution.  Written in C.
  • LDLTS – Laplace transform Transient Processor and Deep Level Spectroscopy.  A collaboration between Manchester and the Institute of Physics Polish Academy of Sciences in Warsaw
  • Model-Free Psychometric Function Software – Fit a stimulus-response curve and estimate a threshold and SD without a parametric model
  • Raspbian – Raspbian is a free operating system based on Debian optimized for the Raspberry Pi hardware.
  • Signal Wizard – Digital signal processing software.

School of Mathematics

  • EIDORS – Electrical Impedance Tomography and Diffuse Optical Tomography Reconstruction Software.
  • Fractional Matrix Powers – MATLAB functions to compute fractional matrix powers with Frechet derivatives and condition number estimate
  • IFISS – IFISS is a graphical package for the interactive numerical study of incompressible flow problems which can be run under Matlab or Octave.
  • MARKOVFUNMV – An adaptive black-box rational Arnoldi method for the approximation of Markov functions.
  • Matrix Computation Toolbox – The Matrix Computation Toolbox is a collection of MATLAB M-files containing functions for constructing test matrices, computing matrix factorizations, visualizing matrices, and carrying out direct search optimization.
  • Matrix Function Toolbox – The Matrix Function Toolbox is a MATLAB toolbox connected with functions of matrices.
  • Matrix Logarithm – MATLAB Files. Two functions for computing the matrix logarithm by the inverse scaling and squaring method.
  • Matrix Logarithm with Frechet Derivatives and Condition Number – MATLAB files
  • NLEVP A Collection of Nonlinear Eigenvalue Problems – This MATLAB Toolbox provides a collection of nonlinear eigenvalue problems.
  • oomph-lib – An object-oriented, open-source finite-element library for the simulation of multiphysics problems.
  • Simfit – Free software for simulation, curve fitting, statistics, and plotting.
  • SmallOverlap – SmallOverlap is a GAP 4 package which implements new, highly efficient algorithms for computing with finitely presented semigroups and monoids whose defining presentations satisfy small overlap conditions (in the sense of J.H.Remmers)
  • Symmetric eigenvalue decomposition and the SVD – MATLAB files

School of Mechanical, Aerospace and Civil Engineering (MACE)

  • DualSPHysics – DualSPHysics is based on the Smoothed Particle Hydrodynamics model named SPHysics and makes use of GPUs.
  • FLIGHT – FLIGHT specialises in the prediction and modelling of fixed-wing aircraft performance
  • SPHYSICS – SPHysics is a platform of Smoothed Particle Hydrodynamics (SPH) codes inspired by the formulation of Monaghan (1992) developed jointly by researchers at the Johns Hopkins University (U.S.A.), the University of Vigo (Spain), the University of Manchester (U.K.) and the University of Rome La Sapienza (Italy).
  • SWAB Online – Innovative and User Friendly Web Application in Running Fortran-based 1-D Shallow Water near Shore Wave Simulation Modelling

School of Physics and Astronomy

  • Herwig++ – Herwig++ is a new event generator, written in C++, built on the experience gained with the well-known event generator HERWIG, which was used by the particle physics community for nearly 30 years. Herwig++ is used by the LHC experiments to predict the results of their collisions and as an essential component of their data analysis. It is developed by a consortium of four main nodes, including Manchester, and its published write-up has been cited over 500 times.
  • im3shape - Im3shape measures the shapes of galaxies in astronomical survey images, taking into account that they have been distorted by a point-spread function.
  • MAD8/madinput – Mathematica code and MAD8 installer for performing optics calculations for particle accelerator design.
  • PolyParticleTracker – MATLAB code for particle tracking against complex optical backgrounds

April 23rd, 2013

I was recently working on some MATLAB code with Manchester University’s David McCormick.  Buried deep within this code was a function that was called many,many times…taking up a significant amount of overall run time.  We managed to speed up an important part of this function by almost a factor of two (on his machine) simply by inserting two brackets….a new personal record in overall application performance improvement per number of keystrokes.

The code in question is hugely complex, but the trick we used is really very simple.  Consider the following MATLAB code

>> a=rand(4000);
>> c=12.3;
>> tic;res1=c*a*a';toc
Elapsed time is 1.472930 seconds.

With the insertion of just two brackets, this runs quite a bit faster on my Ivy Bridge quad-core desktop.

>> tic;res2=c*(a*a');toc
Elapsed time is 0.907086 seconds.

So, what’s going on? Well, we think that in the first version of the code, MATLAB first calculates c*a to form a temporary matrix (let’s call it temp here) and then goes on to find temp*a’.  However, in the second version, we think that MATLAB calculates a*a’ first and in doing so it takes advantage of the fact that the result of multiplying a matrix by its transpose will be symmetric which is where we get the speedup.

Another demonstration of this phenomena can be seen as follows

>> a=rand(4000);
>> b=rand(4000);
>> tic;a*a';toc 
Elapsed time is 0.887524 seconds.
>> tic;a*b;toc  
Elapsed time is 1.473208 seconds.
>> tic;b*b';toc
Elapsed time is 0.966085 seconds.

Note that the symmetric matrix-matrix multiplications are faster than the general, non-symmetric one.

March 2nd, 2013

In a recent article, Matt Asher considered the feasibility of doing statistical computations in JavaScript.  In particular, he showed that the generation of 10 million normal variates can be as fast in Javascript as it is in R provided you use Google’s Chrome for the web browser.  From this, one might infer that using javascript to do your Monte Carlo simulations could be a good idea.

It is worth bearing in mind, however, that we are not comparing like for like here.

The default random number generator for R uses the Mersenne Twister algorithm which is of very high quality, has a huge period and is well suited for Monte Carlo simulations.  It is also the default algorithm for modern versions of MATLAB and is available in many other high quality mathematical products such as Mathematica, The NAG library, Julia and Numpy.

The algorithm used for Javascript’s math.random() function depends upon your web-browser.  A little googling uncovered a document that gives details on some implementations.  According to this document, Internet Explorer and Firefox both use 48 bit Linear Congruential Generator (LCG)-style generators but use different methods to set the seed.  Safari on Mac OS X uses a 31 bit LCG generator and Version 8 of Chrome on Windows uses 2 calls to rand() in msvcrt.dll.  So, for V8 Chrome on Windows, Math.random() is a floating point number consisting of the second rand() value, concatenated with the first rand() value, divided by 2^30.

The points I want to make here are:-

  • Javascript’s math.random() uses different algorithms between browsers.
  • These algorithms have relatively small periods.  For example, a 48-bit LCG has a period of 2^48 compared to 2^19937-1 for Mersenne Twister.
  • They have poor statistical properties.  For example, the 48bit LCG implemented in Java’s java.util.Random function fails 21 of the BigCrush tests.  I haven’t found any test results for JavaScript implementations but expect them to be at least as bad. I understand that Mersenne Twister fails 2 of the BigCrush tests but these are not considered to be an issue by many people.
  • You can’t manually set the seed for math.random() so reproducibility is impossible.
February 19th, 2013

From time to time I find myself having to write or modify windows batch files.  Sometimes it might be better to use PowerShell, VBScript or Python but other times a simple batch script will do fine.  As I’ve been writing these scripts, I’ve kept notes on how to do things for future reference.  Here’s a summary of these notes.  They were written for my own personal use and I put them here for my own convenience but if you find them useful, or have any comments or corrections, that’s great.

These notes were made for Windows 7 and may contain mistakes, please let me know if you spot any.  If you use any of the information here, we agree that its not my fault if you break your Windows installation.  No warranty and all that.

These notes are not meant to be a tutorial.

Comments

Comments in windows batch files start with REM. Some people use :: which is technically a label. Apparently using :: can result in faster script execution (See here and here). I’ve never checked.

REM This is a comment
:: This is a comment too...but different. Might be faster.

If statements

If "%foo%"=="bar" (
REM Do stuff
REM Do more stuff
)
else (
REM Do different stuff
)

Check for existence of file

if exist {insert file name here} (
    rem file exists
) else (
    rem file doesn't exist
)

Or on a single line (if only a single action needs to occur):

if exist {insert file name here} {action}

for example, the following opens notepad for autoexec.bat, if the file exists:

if exist c:\autoexec.bat notepad c:\autoexec.bat

Echoing and piping output
To get a newline in echoed output, chain commands together with &&

echo hello && echo world

gives

hello
world

To pipe output to a file use > or >> The construct 2>&1 ensures that you get both standard output and standard error in your file

REM > Clobbers log.txt, overwriting anything you already have in it
"C:\SomeProgram.exe" > C:\log.txt 2>&1

REM >> concatenates output of SomeProgram.exe to log.txt
"C:\SomeProgram.exe" >> C:\log.txt 2>&1

Environment Variables

set and setx

  • set – sets variable immediately in the current context only.  So, variable will be lost when you close cmd.exe.
  • setx – sets variable permanently but won’t be valid until you start a new context (i.e. open a new cmd.exe)

List all environment variables defined in current session using the set command

set

To check if the environment variable FOO is defined

if defined FOO (
 echo "FOO is defined and is set to %FOO%"
)

To permanently set the system windows environment variable FOO, use setx /m

setx FOO /m "someValue"

To permanently unset the windows environment variable FOO, set it to an empty value

setx FOO ""

A reboot may be necessary. Strictly speaking this does not remove the variable since it will still be in the registry and will still be visible from Control Panel->System->Advanced System Settings->Environment variables. However, the variable will not be listed when you perform a set command and defined FOO will return false.  To remove all trace of the variable, delete it from the registry.

Environment variables in the registry

On Windows 7:

  •  System environment variables are at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Environment
  •  User environment variables are at HKEY_CURRENT_USER\Environment

If you change environment variables using the registry, you will need to reboot for them to take effect.

Pausing
This command will pause for 10 seconds

TIMEOUT /T 10

Killing an application

This command will kill the notepad.exe window with the title Readme.txt

taskkill /f /im notepad.exe /fi "WINDOWTITLE eq Readme.txt"

Time stamping

The variable %time% expands to the current time which leads you to do something like the following to create time stamps between the execution of commands.

echo %time%
timeout /t 1
echo %time%

This works fine unless your code is in a block (i.e. surrounded by brackets), as it might be if it is part of an if-statement for example:

(
echo %time%
timeout /t 1
echo %time%
)

If you do this, the echoed time will be identical in both cases because the %time% entries get parsed at the beginning of the code block. This is almost certainly not what you want.

Setlocal EnableDelayedExpansion
(
echo !time!
timeout /t 1
echo !time!
)

Now we get the type of behaviour we expect.

Where is this script?

Sometimes your script will need to know where it is.  Say test.bat is at C:\Users\mike\Desktop\test.bat and contains the following

set whereAmI=%~dp0

When you run test.bat, the variable whereAmI will contain C:\Users\mike\Desktop\

Details on %dp0 can be found at StackOverflow.

Variable substrings
This came from StackOverflow’s Hidden features of Windows batch files which is a great resource.  They’ve tightened up on what constitutes a ‘valid question’ these days and so great Q+A such as this won’t be appearing there in future.

> set str=0123456789
> echo %str:~0,5%
01234
> echo %str:~-5,5%
56789
> echo %str:~3,-3%
3456
January 30th, 2013

In a recent blog post, I demonstrated how to use the MATLAB 2012a Symbolic Toolbox to perform Variable precision QR decomposition in MATLAB.  The result was a function called vpa_qr which did the necessary work.

>> a=vpa([2 1 3;-1 0 7; 0 -1 -1]);
>> [Q R]=vpa_qr(a);

I’ve suppressed the output because it’s so large but it definitely works. When I triumphantly presented this function to the user who requested it he was almost completely happy.  What he really wanted, however, was for this to work:

>> a=vpa([2 1 3;-1 0 7; 0 -1 -1]);
>> [Q R]=qr(a);

In other words he wants to override the qr function such that it accepts variable precision types. MATLAB 2012a does not allow this:

>> a=vpa([2 1 3;-1 0 7; 0 -1 -1]);
>> [Q R]=qr(a)
Undefined function 'qr' for input arguments of type 'sym'.

I put something together that did the job for him but felt that it was unsatisfactory.  So, I sent my code to The MathWorks and asked them if what I had done was sensible and if there were any better options.  A MathWorks engineer called Hugo Carr sent me such a great, detailed reply that I asked if I could write it up as a blog post.  Here is the result:

Approach 1:  Define a new qr function, with a different name (such as vpa_qr).  This is probably the safest and simplest option and was the method I used in the original blog post.

  • Pros: The new function will not interfere with your MATLAB namespace
  • Cons: MATLAB will only use this function if you explicitly define that you wish to use it in a given function.  You would have to find all prior references to the qr algorithm and make a decision about which to use.

Approach 2: Define a new qr function and use the ‘isa’ function to catch instances of ‘sym’. This is the approach I took in the code I sent to The MathWorks.

function varargout = qr( varargin )

if nargin == 1 && isa( varargin{1}, 'sym' )
    [varargout{1:nargout}] = vpa_qr( varargin{:} );
else
    [varargout{1:nargout}] = builtin( 'qr', varargin{:} );
end
  • Pros: qr will always select the correct code when executed on sym objects
  • Cons: This code only works for shadowing built-ins and will produce a warning reminding you of this fact. If you wish to extend this pattern for other class types, you’ll require a switch statement (or nested if-then-else block), which could lead to a complex comparison each time qr is invoked (and subsequent performance hit). Note that switch statements in conjunction with calls to ‘isa’ are usually indicators that an object oriented approach is a better way forward.

Approach 3: The MathWorks do not recommend that you modify your MATLAB install. However for completeness, it is possible to add a new ‘method’ to the sym class by dropping your function into the sym class folder.  For MATLAB 2012a on Windows, this folder is at

C:\Program Files\MATLAB\R2012a\toolbox\symbolic\symbolic\@sym

For the sake of illustration, here is a simplified implementation. Call it qr.m

function result = qr( this )
  result = feval(symengine,'linalg::factorQR', this);
end

Pros: Functions saved to a class folder take precedence over built in functionality, which means that MATLAB will always use your qr method for sym objects.

Cons: If you share code which uses this functionality, it won’t run on someone’s computer unless they update their sym class folder with your qr code. Additionally, if a new method is added to a class it may shadow the behaviour of other MATLAB functionality and lead to unexpected behaviour in Symbolic Toolbox.

Approach 4: For more of an object-oriented approach it is possible to sub-class the sym class, and add a new qr method.

classdef mySym < sym

    methods
        function this = mySym(arg)
            this = this@sym(arg);
        end

        function result = qr( this )
            result = feval(symengine,'linalg::factorQR', this);
        end
    end

end

Pros: Your change can be shipped with your code and it will work on a client’s computer without having to change the sym class.

Cons: When calling superclass methods on your mySym objects (such as sin(mySym1)), the result will be returned as the superclass unless you explicitly redefine the method to return the subclass.

N.B. There is a lot of literature which discusses why inheritance (subclassing) to augment a class’s behaviour is a bad idea. For example, if Symbolic Toolbox developers decide to add their own qr method to the sym API, overriding that function with your own code could break the system. You would need to update your subclass every time the superclass is updated. This violates encapsulation, as the subclass implementation depends on the superclass. You can avoid problems like these by using composition instead of inheritance.

Approach 5: You can create a new sym class by using composition, but it takes a little longer than the other approaches. Essentially, this involves creating a wrapper which provides the functionality of the original class, as well as any new functions you are interested in.

classdef mySymComp

    properties
        SymProp
    end

    methods
        function this = mySymComp(symInput)
            this.SymProp = symInput;
        end

        function result = qr( this )
            result = feval(symengine,'linalg::factorQR', this.SymProp);
        end
    end

end


Note that in this example we did not add any of the original sym functions to the mySymComp class, however this can be done for as many as you like. For example, I might like to use the sin method from the original sym class, so I can just delegate to the methods of the sym object that I passed in during construction:

classdef mySymComp

    properties
        SymProp
    end

    methods
        function this = mySymComp(symInput)
            this.SymProp = symInput;
        end

        function result = qr( this )
            result = feval(symengine,'linalg::factorQR', this.SymProp);
        end

        function G = sin(this)
            G = mySymComp(sin(this.SymProp));
        end
    end

end

Pros: The change is totally encapsulated, and cannot be broken save for a significant change to the sym api (for example, the MathWorks adding a qr method to sym would not break your code).

Cons: The wrapper can be time consuming to write, and the resulting object is not a ‘sym’, meaning that if you pass a mySymComp object ‘a’ into the following code:

isa(a, 'sym')

MATLAB will return ‘false’ by default.