Randomness in MATLAB : Are you ruining your results due to your choice of syntax?

September 26th, 2012 | Categories: math software, matlab, programming, random numbers | Tags:

Pop quiz: What does the following line of MATLAB code do?

rand('state',10)

If you said ‘It changes the seed of the random number generator to 10’ you get half a point.

‘Only half a point!?’ I hear you say accusingly ‘but it says so in my book [for example, 1-3], why not a full point?’

You only get a full point if you’d said something like ‘It changes the seed of the random number generator to 10 and it also changes the random number generator from the high quality, default Mersenne Twister generator to a lower quality legacy random number generator.

OK, how about this one?

rand('seed',10)

This behaves in a very similar manner– it changes both the seed and the type of the underlying generator. However, the random number generator it switches to this time is an even older one that was introduced as far back as MATLAB version 4.  It is not very good at all by modern standards!

A closer look

Open up a fresh copy of a recent version of MATLAB and ask it about the random number generator it’s using

>> RandStream.getGlobalStream
ans =
mt19937ar random stream (current global stream)
             Seed: 0
  NormalTransform: Ziggurat

mt1993ar refers to a particular variant of the Mersenne Twister algorithm— an industry strength random number generator that’s used in many software packages and simulations.  It’s been the default generator in MATLAB since 2007a.  Change the seed using the modern (since 2011a), recommended syntax and ask again:

>> rng(10)
>> RandStream.getGlobalStream
ans =
mt19937ar random stream (current global stream)
             Seed: 10
  NormalTransform: Ziggurat

This is behaving exactly as you’d expect, you ask it to change the seed and it changes the seed…nothing more, nothing less. Now, let’s use the older syntax

>> rand('state',10)
>> RandStream.getGlobalStream
ans =
legacy random stream (current global stream)
  RAND algorithm: V5 (Subtract-with-Borrow), RANDN algorithm: V5 (Ziggurat)

The random number generator has completely changed!   We are no longer using the Mersenne Twister algorithm, we are now using a ‘subtract with borrow’ [see reference 4 for implementation details] generator which has been shown to have several undesirable issues [5-7].

Let’s do it again but this time using the even older ‘seed’ version:

>> rand('seed',10)
>> RandStream.getGlobalStream
ans =
legacy random stream (current global stream)
  RAND algorithm: V4 (Congruential), RANDN algorithm: V5 (Ziggurat)

Now, this random number generator is ancient by computing standards.  It also has a relatively tiny period of only 2 billion or so.  For details see [4]

Why this matters

Now, all of this is well documented so you may wonder why I am making such a big issue out of it.  Here are my reasons

  • I often get sent MATLAB code for the purposes of code-review and optimisation.  I see the old seeding syntax a LOT and the program’s authors are often blissfully unaware of the consequnces.
  • The old syntax looks like all it should do is change the seed.  It doesn’t!  Before 2007a, however, it did!
  • The old syntax is written in dozens of books because it was once the default, correct syntax to use.
  • Many users don’t read the relevent section of the MATLAB documentation because they have no idea that there is a potential issue.  They read a book or tutorial..it says to use rand(‘state’,10) so they do.
  • MATLAB doesn’t use the old generators by default any more because they are not very good [4-7]!
  • Using these old generators may adversely affect the quality of your simulation.

The bottom line

Don’t do either of these to change the seed of the default generator to 10:

rand('state',10)
rand('seed',10)

Do this instead:

rng(10)

Only if you completely understand and accept the consequences of the older syntax should you use it.

References

1. ‘MATLAB – A practical introduction to programming and problem solving’, 2009,Stormy Attaway

2. MATLAB Guide (Second Edition), 2005, Desmond Higham and Nicholas Higham

3. Essential MATLAB for Engineers and Scientists (Fourth Edition), 2009, Hahn and Valentine

4. Numerical Computing with MATLAB, 2004, Cleve Moler (available online)

5.  Why does the random number generator in MATLAB fail a particular test of randomness? The Mathworks, retreived 26th September 2012

6. A strong nonrandom pattern in Matlab default random number generator, 2006, Petr Savicky, retreived 26th September 2012

7.  Learning Random Numbers: A Matlab Anomaly, 2008, Petr Savicky and Marko Robnik-Šikonja, Applied Artificial Intelligence, Vol22 issue 3, pp 254-265

Other posts on random numbers in MATLAB

  1. Umberto
    September 26th, 2012 at 15:24
    Reply | Quote | #1

    Hi

    thanks for the post. Reference n. 6 is a dead link.

    Best
    U

  2. September 26th, 2012 at 15:32
    Reply | Quote | #2

    Thanks for letting me know. Should be fixed now.

  3. September 26th, 2012 at 17:45
    Reply | Quote | #3

    Thanks very much for this helpful post.

    If a user types rand(‘state’,10) into the MATLAB Editor, the built-in Code Analyzer highlights the code and gives the message “Controlling RAND and RANDN with ‘seed’, ‘state’, or ‘twister’ is not recommended.

  4. September 26th, 2012 at 18:58
    Reply | Quote | #4

    Mike – you’re right that the changed syntax is not widely known.
    It makes it a pain to write code that works in both current and older versions of MATLAB. We decided to include both the old and new syntaxes in NLEVP http://www.mims.manchester.ac.uk/research/numerical-analysis/nlevp.html (see, e.g. gen_hyper2.m).
    –Nick Higham

  5. September 26th, 2012 at 19:33
    Reply | Quote | #5

    Hi Steve

    The code analyzer message is a great start but should be stronger IMHO. Lots of things arent recommended but people do them anyway with no real consequences. In practice, many people just ignore it.

    Cheers,
    Mike

  6. NAG fan
    September 27th, 2012 at 14:47
    Reply | Quote | #6

    Another implementation of the Mersenne Twister algorithm now with skip-ahead facility in the latest NAG release (as well as other RNGs such as L’Ecuyer MRG32K3a generator and Sobol) is available in the NAG Toolbox for MATLAB. I like the NAG Toolbox because as well as being able to use these functions in MATLAB for prototyping, when I write my production code in C++ I can access the identical functions via NAG C. I think these functions are also available from NAG in CUDA and Fortran, but you should check this out yourself if interest. See http://www.nag.co.uk/numeric/MB/start.asp for more on the NAG Toolbox.

  7. September 27th, 2012 at 17:27
    Reply | Quote | #7

    I tend to use rand(‘twister’,10) because it works in Octave and older versions of Matlab. In fact rng doesn’t work yet in the version of Matlab my department has installed.

    Sadly in the classic system, randn needs seeding separately and randn(‘twister’,10) doesn’t work.

  8. Peter Perkins
    September 29th, 2012 at 15:50
    Reply | Quote | #8

    Excellent post. It’s always hard to change a habit that’s deeply ingrained, but your post explains why it’s important in this case. Hopefully the rng function can become just as second nature.

    Two comments:

    1) It’s worth noting that the RNG function not only lets you set the seed, but also lets you switch generators. Besides the MT, there are two other choices of “modern” generators — L’Ecuyer’s combined recursive, and Mascagni’s multiplicative lagged Fibonacci. Those two are specifically designed for parallel generation and intended to address the potential problem raised in the very last link in the original post, “Probability of overlapping subsequences”. To really solve the parallel generation problem, you need to use use those generators via the RandStream interface to MATLAB’s generators, though — RNG is really aimed at the most common, simplest tasks. Lest this sound as if you have to learn yet another whole new thing, let me quickly say that most people will find that RNG does what they need, and this page

    http://www.mathworks.com/help/matlab/math/controlling-random-number-generation.html

    shows how to use it. The next-to-last link in the original posr, “Parallel Random Numbers in MATLAB” talks a lot about more complicated parallel issues better solved using RandStream.

    2) As with the code that people send you, I have seen the old “seed/state setting” syntax a lot in code on the MATLAB Central File Exchange. A surprisingly high percentage of the time, that random number generator “control” is unnecessary and even undesirable, because it can silently make things not (even pseudo)random at all. So sometimes the change that’s needed is not to replace rand(‘seed’,0) with “rng default”, but rather to just delete it. If possible, it’s best to leave random number generator “control” entirely up to the person calling the library/utility/whatever so that’s that there are no hidden surprises. Global persistent state is a tricky thing, and it’s best to ask yourself why you really need to put a command such as rng(10) in your code. Sometimes it’s important because you _want_ to get predictable results, or make sure your stream doesn’t overlap with things you did previously. But many times, you can just leave it out.

  9. October 5th, 2012 at 09:36
    Reply | Quote | #9

    When `rng` came out in R2011a, I loved it, but I have stopped using it since then since I keep running into backward compatibility issue when I share my code.

    The new stream notation is compatible back to R2008b, so you can still benefit from this without using `rng`. But the official syntax is hard to memorize (I use something like:
    RandStream.setDefaultStream(RandStream(‘mt19937ar’, ‘seed’, sd ));
    )
    More often, when I just want reproducibility, I use the easier to remember following syntax, which resets to the default state:
    reset( RandStream.getDefaultStream );