true string indices

The other day cperciva answered why strchr returns a pointer. Many other languages do return an offset, but of course many of those languages don’t have pointers. Poor things. I happened to be writing a bunch of code using strchr recently, and needed both pointers and offsets.

Let’s imagine we have two similar functions, strchr and index.

``````
char *strchr(char *str, int ch);
size_t index(char *str, int ch);
``````

index is actually an archaic synonym for strchr, but it suits my purpose to make it return an integer.

We’d use these functions in approximately the same way. After all, there’s a straightforward equivalence between pointers and indices.

``````
char *s = "an eater";

char *p = strchr(s, ' ');
size_t i = index(s, ' ');

*p = 't';
s[i] = 't';
``````

Which technique is better? Trick question. Modifying a string literal is always undefined behavior. Other than that, though, they’re about the same. Regardless of which answer we have, it’s easy to calculate the other.

``````
p = s + i;
i = p - s;
``````
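To make the equivalence concrete, here’s a minimal sketch that works on a writable buffer instead of a literal. The helper names (ptr_to_index, index_to_ptr, fill_first_space) are made up for illustration, and the code assumes the character being searched for is actually present.

```c
#include <stddef.h>
#include <string.h>

/* Pointer to offset. Assumes p points into the string that starts at s. */
size_t ptr_to_index(const char *s, const char *p)
{
    return (size_t)(p - s);
}

/* Offset to pointer. Assumes i <= strlen(s). */
char *index_to_ptr(char *s, size_t i)
{
    return s + i;
}

/* Overwrite the first space in a writable buffer with 't' and return
 * its offset. Assumes the buffer contains at least one space. */
size_t fill_first_space(char *s)
{
    char *p = strchr(s, ' ');
    size_t i = ptr_to_index(s, p);

    s[i] = 't'; /* writes the same byte that *p = 't' would */
    return i;
}
```

Given `char s[] = "an eater";`, calling fill_first_space(s) returns 2 and leaves the buffer holding "anteater". The array declaration is what makes the write legal.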

Things start getting a little more interesting if we have to call strchr twice, however.

``````
char *s = "mississippi";
int n = 4;
char *p = s;
size_t i = 0;

while (--n) {
    p = strchr(p + 1, 'i');
    i = index(s + i + 1, 'i');
}
``````

Alas, this doesn’t work quite as intended with index. The cumulative offset from the beginning of the string is not what index returns. We’d need to introduce another variable, or change the prototype to include an offset.
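Here’s a rough sketch of the second option, changing the prototype to take a starting offset. The names str_index_from and nth_index are hypothetical (I’m avoiding the name index because some libcs still declare the legacy function), and both assume the character will be found.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical offset-aware search: start looking at str + start, but
 * report the result relative to the beginning of str.
 * Assumes start <= strlen(str) and that ch occurs at or after start. */
size_t str_index_from(const char *str, size_t start, int ch)
{
    return (size_t)(strchr(str + start, ch) - str);
}

/* Offset of the nth occurrence of ch, counting from 1.
 * Assumes n >= 1 and that str contains at least n occurrences. */
size_t nth_index(const char *str, int ch, int n)
{
    size_t i = str_index_from(str, 0, ch);

    while (--n)
        i = str_index_from(str, i + 1, ch);
    return i;
}
```

With this, nth_index("mississippi", 'i', 4) comes back as 10, the offset of the final i. The caller never has to keep a running total, but the function now needs three arguments where strchr gets by with two.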

What happens when we aren’t sure we’ll find the character searched for?

``````
if ((p = strchr(s, '.')))
    *p = 0;
``````

If there was a dot, now there’s not. If there was no dot, still there’s not. Don’t want to dereference NULL pointers.

Unfortunately, in a language with zero-based indices, zero is a valid return for index, ruling it out as an error return. The obvious choice would be SIZE_MAX, which is also what -1 turns into when converted to size_t. That will probably work for most code. But from a standards specification view, it’s not a great choice.

What’s the longest string one can make? SIZE_MAX - 1. (Why?) At first, it seems that reserving SIZE_MAX to indicate not found would work. But strchr and index can also be used to search for the nul byte. If we search the longest possible string for the terminating nul and get back SIZE_MAX, does that mean we found it or not? (Correction. Actually, SIZE_MAX should work. The largest index of a non-nul byte would be SIZE_MAX - 2, so the nul is one past that, at SIZE_MAX - 1. Math is hard.)
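For the sake of argument, here’s what the index flavor of the dot-chopping code might look like with SIZE_MAX as the error return. STR_NOT_FOUND, str_index_checked and chop_at_dot are names invented for this sketch, not anything a real libc provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sentinel: no byte of a string can sit at this offset. */
#define STR_NOT_FOUND SIZE_MAX

/* Offset of the first ch in str, or STR_NOT_FOUND if there is none.
 * Like strchr, searching for '\0' finds the terminator. */
size_t str_index_checked(const char *str, int ch)
{
    const char *p = strchr(str, ch);

    return p ? (size_t)(p - str) : STR_NOT_FOUND;
}

/* Truncate a writable string at its first dot, if it has one. */
void chop_at_dot(char *s)
{
    size_t i = str_index_checked(s, '.');

    if (i != STR_NOT_FOUND) /* a plain if (i) would miss a dot at offset 0 */
        s[i] = '\0';
}
```

The check has to be an explicit comparison against the sentinel. Searching "abc" for the nul returns 3, searching it for 'z' returns STR_NOT_FOUND, and a leading dot is found at offset 0.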

Pointers and indices occupy slightly different semantic spaces. A false pointer is always an invalid pointer. A false index is a valid index (most of the time, anyway).

Posted 24 Jun 2016 13:42 by tedu Updated: 25 Jun 2016 17:03