JoS Schtick

Joel Spolsky thinks you should learn C if you want to make it in programming. For instance, he says you should know C well enough to know why

while (*s++ = *t++);

copies a string. Personally I think you should know C well enough to know why

while (*s++ = *t++);

puts your program at serious risk from exploits based on buffer-overruns.

One Response to “JoS Schtick”

  1. Dominic Says:

    In case anybody’s curious, I believe it goes like this:

    Traditionally, C has no boolean type (C99 added _Bool and <stdbool.h>, but conditions still just test an integer): 0 is used for false, and any non-zero value for true.

    The assignment operator (=, not to be confused with the comparison operator, ==) yields the value it assigns as the value of the whole expression, so while (a = b) copies b into a and keeps looping for as long as that value is true, i.e. non-zero.
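    A minimal sketch of both points, using int arrays so the behaviour is easy to check. The helper names assigned_value and count_nonzero_prefix are hypothetical:

```c
/* Hypothetical helper: an assignment is an expression whose value is the
   value just stored, so it can be returned (or tested) directly. */
int assigned_value(int *a, int b)
{
    return *a = b;
}

/* Hypothetical helper: copies src into dst up to and including the first 0,
   using an assignment as the loop condition, and returns how many non-zero
   values were copied. The doubled parentheses tell readers (and GCC, which
   warns about a bare assignment here under -Wall) that = is intended. */
int count_nonzero_prefix(int *dst, const int *src)
{
    int n = 0;
    while ((dst[n] = src[n]))
        n++;
    return n;
}
```

    Given src = {3, 1, 4, 0, 9}, count_nonzero_prefix stores 3, 1, 4 and then 0 into dst, stops because the stored value was 0, and returns 3.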

    A string in C is an array of chars, and a char is just a byte because every character in the whole goddamn world is an ASCII character.

    This array is terminated with a 0, which is how you can tell where the string ends and random other memory begins.
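    A small sketch of the terminator in practice (the array names are arbitrary): a string written out char by char and the equivalent string literal take up the same four bytes.

```c
#include <string.h>

/* Three characters plus the terminating 0, written out explicitly. */
static const char by_hand[4] = { 'c', 'a', 't', '\0' };

/* The literal "cat" occupies the same 4 chars: the compiler appends the 0. */
static const char literal[] = "cat";
```

    sizeof literal is 4, while strlen(literal) is 3: strlen stops at the 0, sizeof counts it. strcmp(by_hand, literal) returns 0 because the two arrays hold the same bytes.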

    s and t are pointers into two arrays: confusingly, s points into the target array and t into the source array. *s refers to the value pointed to by s, and *t to the value pointed to by t.

    The operator ++ increments the variable it is attached to. Used as a prefix, e.g. ++s, the variable is incremented and the expression yields the new value. Used as a suffix, e.g. s++, the expression yields the old value and the variable is incremented afterwards.
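    A quick sketch of the difference, using a hypothetical helper compare_increments that records what each form evaluates to:

```c
/* Hypothetical helper: stores the value of i++ in out[0] and the value of
   a following ++i in out[1], then returns the final value of i. */
int compare_increments(int start, int out[2])
{
    int i = start;
    out[0] = i++;   /* postfix: yields start, then i becomes start + 1 */
    out[1] = ++i;   /* prefix: i becomes start + 2, and that is the value */
    return i;
}
```

    Starting from 5, out[0] receives 5 (the value before i++), out[1] receives 7 (the value after ++i), and the function returns 7.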

    When a pointer is incremented, the pointer itself moves forward by one element (e.g. to the next value in an array). This is different from incrementing the value the pointer refers to. *s++ means *(s++), so it moves the pointer; if we wanted to increment the value instead, we'd have to write (*s)++.
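    A sketch contrasting the two, using a hypothetical helper read_then_bump_next on an int array:

```c
/* Hypothetical helper: returns the value that *p++ read, and increments
   the element after it via (*p)++. */
int read_then_bump_next(int arr[])
{
    int *p = arr;      /* p points at arr[0] */
    int first = *p++;  /* parsed as *(p++): reads arr[0], then p moves to arr[1] */
    (*p)++;            /* parentheses needed: increments arr[1], p stays put */
    return first;
}
```

    Given {10, 20, 30}, it returns 10 and leaves the array as {10, 21, 30}: the pointer moved past the first element, and only the second element's value changed.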

    In C, the body of a while loop can be empty (here it is just the trailing semicolon). The side effects you want might be produced entirely by evaluating the loop condition.
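    Another empty-bodied loop, as a sketch: a hypothetical end_of_string helper that finds a string's terminating 0 with all of the work done in the condition.

```c
/* Hypothetical helper: returns a pointer to the terminating 0 of s. */
const char *end_of_string(const char *s)
{
    while (*s++)
        ;          /* empty body: the condition reads a char and advances s */
    return s - 1;  /* the last test read the 0, then s moved one past it */
}
```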

    That is what is happening here. Evaluating the condition *s++ = *t++ copies the char referenced by t to the location referenced by s, and moves both t and s forward to the next element of their respective arrays. The value of the whole expression is the char that was just copied, which is non-zero until the end of the source string is reached. The loop therefore keeps executing until it copies the terminating 0; at that point the condition is false and the loop stops, leaving the target string properly terminated.
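    Wrapped in a function, the loop looks roughly like this. It is a sketch: the name copy_string is hypothetical, chosen to avoid clashing with the standard strcpy, and the extra parentheses only silence a common compiler warning.

```c
/* The loop as a function: s is the target, t the source.
   The caller must guarantee s has room for every char of t,
   including the terminating 0; nothing here checks that. */
void copy_string(char *s, const char *t)
{
    while ((*s++ = *t++))
        ;   /* empty body: the condition copies one char and advances both */
}
```

    With char buf[6], copy_string(buf, "hello") fills buf with the five letters plus the terminating 0. With char buf[5], the same call writes one byte past the end of buf, which is exactly the problem the post is pointing at.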

    The risk of buffer-overrun exploits arises because nothing checks that the array referenced by s is large enough to hold the string referenced by t. The loop keeps running until every char of the source, including its terminating 0, has been copied, so if the target is too small, the excess is written into random other memory after it. That can overwrite other variables or, when the target array lives on the stack, a function's saved return address, letting an attacker who controls the source string redirect execution to malicious code that then does nasty things with whatever privileges the running application has been granted.
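    One way to remove the risk is to pass the size of the target along and stop before it fills up. The sketch below is hypothetical (copy_string_bounded is not a standard function) and is similar in spirit to BSD's strlcpy, which is not part of ISO C; snprintf(dst, size, "%s", src) is a standard alternative with comparable truncating behaviour.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bounded copy: writes at most dstsize chars into dst,
   including the terminating 0. Returns true if all of src fit,
   false if it was truncated (or dstsize was 0, so nothing was written). */
bool copy_string_bounded(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return false;              /* no room even for the terminator */

    size_t i = 0;
    while (i + 1 < dstsize && src[i] != '\0') {
        dst[i] = src[i];           /* leave the last slot for the 0 */
        i++;
    }
    dst[i] = '\0';                 /* dst is always terminated */
    return src[i] == '\0';         /* did we reach the end of src? */
}
```

    Copying "hello" into a char[4] stores "hel" plus the 0 and returns false, instead of writing past the end of the array.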
