The test suite of a project I’m working on is poking around at the extreme edges of the range of double precision numbers. I noticed a difference between Windows and other platforms that I can’t yet fully explain. On Windows, the test suite was pumping out RuntimeWarnings that we don’t see in Linux or Mac. I’ve distilled the issue down to a single numpy command:
On Windows 7 Anaconda Python 2.3, this gives a RuntimeWarning and returns inf whereas on Linux and Mac OS X it evaluates to 709.78-ish
Numpy version is 1.9.2 in all cases.
64 bit Windows 7
Python 2.7.10 |Continuum Analytics, Inc.| (default, May 28 2015, 16:44:52) [MSC v.1500 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. Anaconda is brought to you by Continuum Analytics. Please check out: http://continuum.io/thanks and https://binstar.org >>> import numpy as np >>> np.log1p(1.7976931348622732e+308) __main__:1: RuntimeWarning: overflow encountered in log1p inf
64 bit Linux
Python 2.7.9 (default, Apr 2 2015, 15:33:21) [GCC 4.9.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> np.log1p(1.7976931348622732e+308) 709.78271289338397
Mac OS X
Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, May 28 2015, 17:04:42) [GCC 4.2.1 (Apple Inc. build 5577)] on darwin Type "help", "copyright", "credits" or "license" for more information. Anaconda is brought to you by Continuum Analytics. Please check out: http://continuum.io/thanks and https://binstar.org >>> import numpy as np >>> np.log1p(1.7976931348622732e+308) 709.78271289338397
The argument to log1p is getting close to the largest double precision number:
>>> sys.float_info.max 1.7976931348623157e+308
Given a symmetric matrix such as
What’s the nearest correlation matrix? A 2002 paper by Manchester University’s Nick Higham which answered this question has turned out to be rather popular! At the time of writing, Google tells me that it’s been cited 394 times.
Last year, Nick wrote a blog post about the algorithm he used and included some MATLAB code. He also included links to applications of this algorithm and implementations of various NCM algorithms in languages such as MATLAB, R and SAS as well as details of the superb commercial implementation by The Numerical algorithms group.
I noticed that there was no Python implementation of Nick’s code so I ported it myself.
Here’s an example IPython session using the module
In : from nearest_correlation import nearcorr In : import numpy as np In : A = np.array([[2, -1, 0, 0], ...: [-1, 2, -1, 0], ...: [0, -1, 2, -1], ...: [0, 0, -1, 2]]) In : X = nearcorr(A) In : X Out: array([[ 1. , -0.8084125 , 0.1915875 , 0.10677505], [-0.8084125 , 1. , -0.65623269, 0.1915875 ], [ 0.1915875 , -0.65623269, 1. , -0.8084125 ], [ 0.10677505, 0.1915875 , -0.8084125 , 1. ]])
This module is in the early stages and there is a lot of work to be done. For example, I’d like to include a lot more examples in the test suite, add support for the commercial routines from NAG and implement other algorithms such as the one by Qi and Sun among other things.
Hopefully, however, it is just good enough to be useful to someone. Help yourself and let me know if there are any problems. Thanks to Vedran Sego for many useful comments and suggestions.
- github repository for the Python NCM module, nearest_correlation
- Nick Higham’s original MATLAB code.
- NAG’s commercial implementation – callable from C, Fortran, MATLAB, Python and more. A superb implementation that is significantly faster and more robust than this one!
A lot of people don’t seem to know this….and they should. When working with floating point arithmetic, it is not necessarily true that a+(b+c) = (a+b)+c. Here is a demo using MATLAB
>> x=0.1+(0.2+0.3); >> y=(0.1+0.2)+0.3; >> % are they equal? >> x==y ans = 0 >> % lets look >> sprintf('%.17f',x) ans = 0.59999999999999998 >> sprintf('%.17f',y) ans = 0.60000000000000009
These results have nothing to do with the fact that I am using MATLAB. Here’s the same thing in Python
>>> x=(0.1+0.2)+0.3 >>> y=0.1+(0.2+0.3) >>> x==y False >>> print('%.17f' %x) 0.60000000000000009 >>> print('%.17f' %y) 0.59999999999999998
If this upsets you, or if you don’t understand why, I suggest you read the following
Does anyone else out there have suggestions for similar resources on this topic?