This is not one of the super-technical articles on the site. But, that's on purpose. You see, this is one of those articles where no one in the computer industry really wants to address a particular problem, and it's annoying the heck out of me.
Evil Machines Part 1

Have you ever noticed how you name some files:
1.jpg
And when you ask a computer to sort them, it does this?
1.jpg
So of course, being frustrated, you rename all the files:
0000001.jpg
Which is just ugly, but of course you put up with it, because everyone puts up with it. Wasn't that cathartic? And now that we're all upset together, how about let's fix it? So, the way we can fix it is you take my free code, use it in your projects, print it out and give it to your programmer friends, and tell me any glaring bugs you find. And then we'll all be nice happy people again.
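If you want to watch the misbehavior happen, here is a minimal sketch (mine, not part of the code offered in this article) that sorts a few made-up file names with the default C++ string ordering. The helper name sortPlain and the file names are assumptions chosen purely for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: sorts names with std::string's default operator<,
// which compares character by character and knows nothing about the
// numbers embedded in the names.
std::vector<std::string> sortPlain(std::vector<std::string> names)
{
    std::sort(names.begin(), names.end());
    return names;
}
```

Calling sortPlain({"1.jpg", "2.jpg", "10.jpg"}) hands back 1.jpg, 10.jpg, 2.jpg, because the character '1' sorts before '2' no matter which digits follow it.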
Squashing The Bug

What's the culprit? Well, everybody's calling the C-library strcmp, a badly abused function that is behind this mess. And strcmp is well-intentioned. It knows how to compare simple strings, even. But the creators of strcmp assumed that our deepest desire was to do a binary-level compare of two pieces of raw data. But in reality, well, we don't, because we invented our own language, among other reasons. So, anyway, what we do now is define some rules, The Cardinal Rules of Comparing Strings:
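    #include <cstddef>  // NULL

    // Lower-case an ASCII letter; every other character passes through.
    inline char tlower(char b)
    {
        if (b >= 'A' && b <= 'Z') return b - 'A' + 'a';
        return b;
    }

    // Returns 1 if b is an ASCII digit, 0 otherwise.
    inline char isnum(char b)
    {
        if (b >= '0' && b <= '9') return 1;
        return 0;
    }

    // Reads a run of digits as one number. On return, a points at the
    // LAST digit of the run, so the caller's ++a steps past the number.
    // Very long digit runs will overflow the int.
    inline int parsenum(const char *&a)
    {
        int result = *a - '0';
        ++a;
        while (isnum(*a)) {
            result *= 10;
            result += *a - '0';
            ++a;
        }
        --a;
        return result;
    }

    // Returns -1, 0 or 1, like strcmp, but ignores ASCII case and
    // compares embedded numbers by value. Takes const pointers so that
    // string literals can be passed in.
    inline int StringCompare(const char *a, const char *b)
    {
        if (a == b) return 0;
        if (a == NULL) return -1;
        if (b == NULL) return 1;

        while (*a && *b) {
            int a0, b0; // will contain either a number or a letter

            // Numbers are offset by 256 so they sort after any single char.
            if (isnum(*a)) {
                a0 = parsenum(a) + 256;
            } else {
                a0 = tlower(*a);
            }
            if (isnum(*b)) {
                b0 = parsenum(b) + 256;
            } else {
                b0 = tlower(*b);
            }

            if (a0 < b0) return -1;
            if (a0 > b0) return 1;

            ++a;
            ++b;
        }

        // One string is a prefix of the other: the longer one sorts later.
        if (*a) return 1;
        if (*b) return -1;

        return 0;
    }

And now, will you please stop typing 00000008 when you really want to type 8? C'mon, I know you want to.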
And spread this code everywhere you find a programmer.
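To make the spreading a little easier, here is one way the function might be plugged into std::sort. Treat it as a sketch under a few assumptions: the block carries a condensed, const-correct copy of the routine so that it compiles on its own, the wrapper name sortNatural is hypothetical, and it needs C++11 or later for the lambda.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Condensed copy of the article's routine, repeated here only so this
// example is self-contained.
static char tlower(char b) { return (b >= 'A' && b <= 'Z') ? char(b - 'A' + 'a') : b; }
static bool isnum(char b) { return b >= '0' && b <= '9'; }

// Reads a run of digits as one number and leaves a on its last digit.
static int parsenum(const char *&a)
{
    int result = *a - '0';
    ++a;
    while (isnum(*a)) { result = result * 10 + (*a - '0'); ++a; }
    --a;
    return result;
}

static int StringCompare(const char *a, const char *b)
{
    if (a == b) return 0;
    if (a == NULL) return -1;
    if (b == NULL) return 1;
    while (*a && *b) {
        int a0 = isnum(*a) ? parsenum(a) + 256 : tlower(*a);
        int b0 = isnum(*b) ? parsenum(b) + 256 : tlower(*b);
        if (a0 < b0) return -1;
        if (a0 > b0) return 1;
        ++a;
        ++b;
    }
    if (*a) return 1;
    if (*b) return -1;
    return 0;
}

// Hypothetical wrapper (not from the article): sorts names so that
// embedded numbers are ordered by value.
std::vector<std::string> sortNatural(std::vector<std::string> names)
{
    std::sort(names.begin(), names.end(),
              [](const std::string &x, const std::string &y) {
                  // std::sort wants "x comes before y", i.e. a result below zero.
                  return StringCompare(x.c_str(), y.c_str()) < 0;
              });
    return names;
}
```

With this in place, sortNatural({"10.jpg", "2.jpg", "1.jpg"}) comes back as 1.jpg, 2.jpg, 10.jpg, and no leading zeros are needed.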
I would add this as the first line:
if (a == b) return 0;
First, it is a reasonable optimization when comparing a long string to itself. Second, it handles the case a==NULL && b==NULL correctly, making StringCompare antisymmetric: StringCompare(a, b) == -StringCompare(b, a).
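The NULL case can be seen in isolation with a small sketch. The two functions below are hypothetical stand-ins that reproduce only the opening checks of StringCompare, with and without the extra line; the comparison loop is replaced by a placeholder return.

```cpp
#include <cstddef>

// Opening checks WITHOUT the a == b test (hypothetical stand-in).
int preludeWithoutCheck(const char *a, const char *b)
{
    if (a == NULL) return -1;
    if (b == NULL) return 1;
    return 0; // placeholder for the real comparison loop
}

// Opening checks WITH the a == b test (hypothetical stand-in).
int preludeWithCheck(const char *a, const char *b)
{
    if (a == b) return 0; // same pointer, which includes NULL vs NULL
    if (a == NULL) return -1;
    if (b == NULL) return 1;
    return 0; // placeholder for the real comparison loop
}
```

Without the check, comparing NULL with NULL yields -1 whichever way round the arguments go, so the result cannot equal its own negation. With the check, both orders yield 0.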
See also this article in the forums. (not incorporated yet.)