String sorting in R appears to use different ordering from everyone else

April 12th, 2018 | Categories: C/C++, matlab, programming, python, R | Tags:

Update
A discussion on twitter determined that this was an issue with Locales. The practical upshot is that we can make R act the same way as the others by doing

Sys.setlocale("LC_COLLATE", "C")

which may or may not be what you should do!

Original post

While working on a project that involves using multiple languages, I noticed some tests failing in one language and not the other. Further investigation revealed that this was essentially because R's default sort order for strings is different from everyone else's.

I have no idea how to say to R 'Use the sort order that everyone else is using'. Suggestions welcomed.

R 3.3.2

sort(c("#b","-b","-a","#a","a","b"))

[1] "-a" "-b" "#a" "#b" "a" "b"

Python 3.6

sorted({"#b","-b","-a","#a","a","b"})

['#a', '#b', '-a', '-b', 'a', 'b']


MATLAB 2018a

sort([{'#b'},{'-b'},{'-a'},{'#a'},{'a'},{'b'}])

ans =
1×6 cell array
{'#a'} {'#b'} {'-a'} {'-b'} {'a'} {'b'}

C++

int main(){ 

std::string mystrs[] = {"#b","-b","-a","#a","a","b"}; 
std::vector<std::string> stringarray(mystrs,mystrs+6);
std::vector<std::string>::iterator it; 

std::sort(stringarray.begin(),stringarray.end());

for(it=stringarray.begin(); it!=stringarray.end();++it) {
   std::cout << *it << " "; 
} 

return 0;
} 

Result:

#a #b -a -b a b
No comments yet.