Technical debt that lasts forever

I noticed that ls output is sorted case-sensitively on macOS; that is, “abc” is sorted after “Xyz.” It doesn’t appear there are any mechanisms to get ls to do a case-insensitive sort, either. To work around this in a script I was writing, I looked to sort to do this for me, and stumbled upon the always-confusing flag:

-f, --ignore-case: Convert all lowercase characters to their uppercase equivalent before comparison, that is, perform case-independent sorting.

Which brings about the question: what does -f have to do with case-insensitive sorting? The answer to this part of the mystery is more apparent in the coreutils version, which describes it as:

-f, --ignore-case: fold lower case to upper case characters

So the -f short-form flag is for “fold.” Case folding is a mechanism for comparing strings while mapping some characters to others, or in this case mapping lowercase to uppercase using Unicode’s Case Folding table.

The long-form version of this flag was added in 2001, citing as “add support for long options.” The short-form version was added in 1993, likely for compatibility with some pre-existing Unix version. The first version of the POSIX standard in “Commands and Utilities, Issue 4, Version 2” (1994, pg. 647) doesn’t even use the word “fold,” defining it as:

-f: Consider all lower-case characters that have upper-case equivalents, according to the current setting of LC_CTYPE, to be the upper-case equivalent for the purposes of comparison.

Sadly, the oldest version of sort that I can find is from 4.4BSD-lite2, which describes it the same as macOS does now, also without the word “fold” in sight. I’m guessing some older, more ancient documentation for proprietary Unix is floating around somewhere that describes the flags, too.

Amusingly, the option you’d think would be the case-insensitive comparison flag, -i, is instead “ignore all non-printable characters.” This is a great example of picking a precise-but-confusing name for something and getting stuck with it until the end of time.