String sorting in R appears to use different ordering from everyone else
Update
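A discussion on Twitter established that this is a locale issue: R's sort() collates strings according to the current locale (the LC_COLLATE setting), whereas the other languages shown below compare strings by character code. The practical upshot is that we can make R behave the same way as the others by running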
Sys.setlocale("LC_COLLATE", "C")
which may or may not be what you should do!
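For comparison, here is a minimal Python sketch of the same idea. It assumes the standard library `locale` module and uses `locale.strxfrm` as a sort key to get locale-aware collation; the variable names are purely illustrative. With collation set to the "C" locale, the locale-aware sort should agree with Python's plain code-point sort for these ASCII strings.

```python
import locale

strings = ["#b", "-b", "-a", "#a", "a", "b"]

# Plain sorted() compares strings by Unicode code point and ignores the locale.
by_code_point = sorted(strings)

# Select the "C" collation locale, the same setting the R call above selects,
# then sort with a locale-aware key.
locale.setlocale(locale.LC_COLLATE, "C")
by_c_locale = sorted(strings, key=locale.strxfrm)

print(by_code_point)
print(by_c_locale)
```

Under other collation locales the second result can differ, and the exact order depends on the platform's locale data, which is why this is a change to make deliberately rather than by default.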
Original post
While working on a project that involves several programming languages, I noticed some tests failing in one language but not in the others. Further investigation revealed that this was essentially because R's default sort order for strings is different from everyone else's.
I have no idea how to tell R 'Use the sort order that everyone else is using'. Suggestions welcome.
R 3.3.2
sort(c("#b","-b","-a","#a","a","b"))
[1] "-a" "-b" "#a" "#b" "a" "b"
Python 3.6
sorted(["#b","-b","-a","#a","a","b"])
['#a', '#b', '-a', '-b', 'a', 'b']
MATLAB 2018a
sort([{'#b'},{'-b'},{'-a'},{'#a'},{'a'},{'b'}])
ans =
1×6 cell array
{'#a'} {'#b'} {'-a'} {'-b'} {'a'} {'b'}
C++
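#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main(){
    std::string mystrs[] = {"#b","-b","-a","#a","a","b"};
    std::vector<std::string> stringarray(mystrs,mystrs+6);
    std::vector<std::string>::iterator it;
    // std::sort uses operator< on std::string, which compares character values
    std::sort(stringarray.begin(),stringarray.end());
    for(it=stringarray.begin(); it!=stringarray.end();++it) {
        std::cout << *it << " ";
    }
    return 0;
}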
Result:
#a #b -a -b a b
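
The Python, MATLAB and C++ results agree because all three compare strings character by character using the numeric character codes. The following Python sketch makes that explicit for the characters involved; the helper name `code_points` is made up for illustration and is not part of any library.

```python
strings = ["#b", "-b", "-a", "#a", "a", "b"]

# Character codes of the symbols in the test data: '#' < '-' < 'a' < 'b'.
for ch in "#-ab":
    print(repr(ch), ord(ch))

def code_points(s):
    """Return the list of character codes for s (illustrative helper)."""
    return [ord(c) for c in s]

# Sorting on the lists of codes reproduces the default string ordering.
ordered = sorted(strings, key=code_points)
print(ordered)
```

Since '#' is 35, '-' is 45 and 'a' is 97, the "#" strings come first, then the "-" strings, then the plain letters. A locale-aware collation is free to weight punctuation differently, which is where R's default order comes from.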
