String sorting in R appears to use different ordering from everyone else

April 12th, 2018 | Categories: C/C++, matlab, programming, python, R

Update: A discussion on Twitter established that this is a locale issue: R sorts strings according to the collation rules of the current locale. The practical upshot is that we can make R sort the same way as the others by running

Sys.setlocale("LC_COLLATE", "C")

which may or may not be what you should do!
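For comparison, here is a minimal Python sketch of the same idea. Python's plain sorted() ignores the locale, so the sketch opts in to locale-aware collation with locale.strxfrm and then sets LC_COLLATE to "C", roughly the Python analogue of the R call above. The helper name sort_with_collation and the choice of en_US.UTF-8 as a comparison locale are illustrative assumptions; that locale may not be installed, and its ordering varies between platforms.

```python
import locale

strings = ["#b", "-b", "-a", "#a", "a", "b"]


def sort_with_collation(items, locale_name):
    """Sort items with the collation rules of locale_name.

    The previous LC_COLLATE setting is restored afterwards, because
    setlocale changes state for the whole process.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query only, no change
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm turns each string into a key that compares
        # according to the active LC_COLLATE rules
        return sorted(items, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


# The "C" locale is always available and compares character codes
c_order = sort_with_collation(strings, "C")
print(c_order)  # ['#a', '#b', '-a', '-b', 'a', 'b']

# A language-specific locale may order punctuation differently
# (assumption: en_US.UTF-8 exists on this machine)
try:
    print(sort_with_collation(strings, "en_US.UTF-8"))
except locale.Error:
    print("en_US.UTF-8 is not installed on this system")
```

Changing the locale is process-wide, which is one reason the fix "may or may not be what you should do": any other code that depends on locale-aware ordering is affected too.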

Original post

While working on a project that involves several programming languages, I noticed some tests failing in one language but not in the others. Further investigation revealed that this was essentially because R's default sort order for strings is different from everyone else's.

I have no idea how to say to R 'Use the sort order that everyone else is using'. Suggestions welcomed.

R 3.3.2

sort(c("#b","-b","-a","#a","a","b"))

[1] "-a" "-b" "#a" "#b" "a" "b"

Python 3.6

sorted(["#b","-b","-a","#a","a","b"])

['#a', '#b', '-a', '-b', 'a', 'b']

MATLAB 2018a

sort({'#b','-b','-a','#a','a','b'})

ans =
1×6 cell array
{'#a'} {'#b'} {'-a'} {'-b'} {'a'} {'b'}


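C++

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main(){

std::string mystrs[] = {"#b","-b","-a","#a","a","b"};
std::vector<std::string> stringarray(mystrs,mystrs+6);
std::vector<std::string>::iterator it;

// Sort with std::string's operator<, which compares character codes
std::sort(stringarray.begin(), stringarray.end());

for(it=stringarray.begin(); it!=stringarray.end();++it) {
   std::cout << *it << " ";
}

return 0;
}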

#a #b -a -b a b
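
The order that Python, MATLAB and C++ agree on is what a plain comparison of character codes gives, since '#' (35) comes before '-' (45), which comes before the lowercase letters (97 onwards). A small Python sketch makes this visible; the variable names are illustrative only, and it demonstrates Python's behaviour rather than the internals of the other languages.

```python
strings = ["#b", "-b", "-a", "#a", "a", "b"]

# Map each string to the tuple of its Unicode code points
code_points = {s: tuple(ord(ch) for ch in s) for s in strings}

# Sorting on the code-point tuples reproduces Python's default order
by_code_point = sorted(strings, key=lambda s: code_points[s])
assert by_code_point == sorted(strings)

for s in by_code_point:
    print(s, code_points[s])
# #a (35, 97)
# #b (35, 98)
# -a (45, 97)
# -b (45, 98)
# a (97,)
# b (98,)
```

R's result differs because its comparison goes through the locale's collation rules instead of the raw character codes.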
  1. May 24th, 2018 at 23:07

    There’s a whole analysis of R’s sorting implementation, along with trying a bunch of implementations out in Julia, at this post:

  2. P. Fonseca
    May 28th, 2018 at 08:14

    And on the WL:

    In[] := Sort[{"#b", "-b", "-a", "#a", "a", "b"}]
    Out[] = {"#a", "-a", "a", "#b", "-b", "b"}