Natural Sort is a hazy concept; it basically means something like human-friendly sort; it could be useful if the strings being sorted are going to be visible to the user.
Difference between normal sort and natural sort
Let us look at an example, we consider the list [1, 2, 10] to be an ascending order list, and sorting algorithms would not disagree. But if the list happened to be a list of strings most sorting libraries would sort it like this [ “1”, “10”, “2”]. The other common sorting inconvenience is related to the fact that in ASCII order, capital letters come before lower case letters. So if you try to sort the list [“a”, “B”] in ascending order, you will get the result [“B”, “a”].
I had always assumed that a character by character sort is the only practical way to sort strings and have always relied on zero-padding to avoid this problem. It was only recently that I noticed a difference between the way Windows Explorer sorts file names and the way PowerShell sorts file names. I was trying to create an m3u playlist using PowerShell. When I use PowerShell to create the playlist, files are in the
wrong order. But if I create the playlist using drag-and-drop via Explorer, the order of files is
correct. Here is an example of how Explorer sorts file names.
Limitations of Natural Sort
The Case Problem
case problem is the result of a mismatch between ASCII or Unicode ordering and alphabetic ordering. This is one of the most common sorting complaints users have and it is solvable. I believe the Python Natsort library handles it in a locale-aware manner. If you are dealing with only English, you don’t need an external library to handle this.
Numbers with special meaning inside a string
The problem of numbers in a string is a lot more thorny. Let us look at an example of how numbers with special meaning get added to a string. A user wants to store multiple pieces of data relating to an object. Only one variable or field is available to store the data. If the only variable available happens to be a permissive data type like the String, the user will likely dump all the data into the single string.
In order to do a real natural sort on this kind of compound string, you need to recognize the components which make up the string and guess which component the user wants to sort by. For example, let us say I have two files
Elephant_Lesson2.ai. Here I have added the file name and lesson number into one string. What should be the natural order of these files? Should lesson1 come before lesson2 or should the sorting be alphabetical? If we decide that lesson1 must come before lesson2, we will have a hard time dealing with that 4 in the middle which has no special meaning.
Sometimes the value inside the string has a definite and single data type. For example, there are strings which contain only values like dates, measurements, coordinate values, etc. Naturally, the first thing to try when faced with these type of strings is to see if a more appropriate data type can be used. It is not possible for a general sorting algorithm to deal with all these specific cases; I believe they do deal with some of them.
You have to say that programmers are in a somewhat unenviable position. Natural sorting has its unpredictability. On the other hand, users expect programmers to take care of at least the two most common sorting inconveniences: ASCII ordering and the case of numbers tagged to the beginning of the string. As mentioned before popular applications like Windows Explorer are already doing it. It is probably best to use an algorithm which handles only these two cases and sorts the rest in a predictable manner.