Archive for November, 2018

November 20th, 2018

In a recent blog post, Daniel Lemire explicitly demonstrated that vectorising random number generators using SIMD instructions could give useful speed-ups.  This reminded me of the one of the first times I played with the Julia language where I learned that Julia’s random number generator used a SIMD-accelerated implementation of Mersenne Twister called dSFMT to generate random numbers much faster than MATLAB’s Mersenne Twister implementation.

Just recently, I learned that MATLAB now has its own SIMD accelerated version of Mersenne Twister which can be activated like so:


This new Mersenne Twister implementation gives different random variates to the original implementation (which I demonstrated is the same as Numpy’s implementation in an older post) as you might expect

>> rng(1,'Twister')
>> rand(1,5)
ans =
0.4170 0.7203 0.0001 0.3023 0.1468
>> rng(1,'simdTwister')
>> rand(1,5)
ans =
0.1194 0.9124 0.5032 0.8713 0.5324

So it’s clearly a different algorithm and, on CPUs that support the relevant instructions, it’s about twice as fast!  Using my very unscientific test code:

format compact
number_of_randoms = 10000000
disp('Timing standard twister')
disp('Timing SIMD twister')

gives the following results for a typical run on my Dell XPS 15 9560 which supports AVX instructions

number_of_randoms =
Timing standard twister
Elapsed time is 0.101307 seconds.
Timing SIMD twister
Elapsed time is 0.057441 seconds

The MATLAB documentation does not tell us which algorithm their implementation is based on but it seems to be different from Julia’s. In Julia, if we set the seed to 1 as we did for MATLAB and ask for 5 random numbers, we get something different from MATLAB:

julia> using Random
julia> Random.seed!(1);
julia> rand(1,5)
1×5 Array{Float64,2}:
 0.236033  0.346517  0.312707  0.00790928  0.48861

The performance of MATLAB’s new generator is on-par with Julia’s although I’ll repeat that these timing tests are far far from rigorous.

julia> Random.seed!(1);

julia> @time rand(1,10000000);
  0.052981 seconds (6 allocations: 76.294 MiB, 29.40% gc time)