Can you tell a Computer Scientist from the way they write loops?
This is a guest article written by friend and colleague, Ian Cottam.
This brief guest piece for Walking Randomly was inspired by reading about some of the Hackday outputs at the recent SSI collaborative workshop CW14 held in Oxford. I wasn’t there, but I gather that some of the outputs from the day examined source code for various properties (perhaps a little tongue-in-cheek in some cases).
So, my also slightly tongue-in-cheek question is “Given a piece of source code written in a language with “while loops”: how do you know if the author is a computer scientist by education/training?”
I’ll use C as my language and note that “for loops” in C are basically syntactic sugar for while loops (allowing one to gather the initialisation, guard and increment parts neatly together). In other languages “for loops” are closer to Fortran’s original iterative “do loop”. Also, I will work with that subset of code fragments that obey traditional structured (one-entry, one-exit) programming constructs. If I didn’t, perhaps one could argue, as famously Dijkstra originally did, that the density of “goto” statements, even when spelt “break” or “continue”, etc., might be a deciding quality factor.
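To make the "syntactic sugar" remark concrete, here is a minimal sketch of the same array traversal written both ways. The function names `sum_for` and `sum_while` are invented for illustration, and the equivalence shown holds only for loop bodies that contain no `continue`, which fits the structured subset considered here.

```c
#include <stddef.h>

/* Sum the n elements of a, with initialisation, guard and increment
   gathered together in a for loop. */
long sum_for(const int *a, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        total += a[i];
    }
    return total;
}

/* The same traversal as a while loop: the initialisation moves before
   the loop and the increment becomes the last statement of the body. */
long sum_while(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;
    while (i < n) {
        total += a[i];
        ++i;
    }
    return total;
}
```

Both functions visit the same elements in the same order, so they return the same total for any array.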
(Purely as an aside, I note that Linux (and related free/open source) contributors seem to use goto fairly freely as an exception case mechanism; and they might well have a justification. The density of gotos in Apple’s SSL code was illustrated recently by the so-called “goto fail” bug. See also Knuth’s famous article on this subject.)
In my own programming, I know from experience that if I use a goto, I find it so much more difficult to reason logically (and non-operationally) about my code that I avoid them. Whenever I have used a programming language without the goto statement, I have never missed it.
Now, finally to the point at hand: suppose one is processing the elements of a one-dimensional array of length N. The C convention is that the index goes from 0 to N-1. Code fragment A below is written by a non computer scientist, whereas B is written by a computer scientist.
/* Code fragment A */
for (i= 0; i < N; ++i) {
    /* do stuff with a[i] */
}
/* Code fragment B */
for (i= 0; i != N; ++i) {
    /* do stuff with a[i] */
}
The only difference is the loop’s guard: i<N versus i!=N.
As a computer scientist by training I would always write B; which would you write?
I would – and will in a follow-up – argue that B is better even though I am not saying that code fragment A is incorrect. Also in the follow-up I will acknowledge the computer scientist who first pointed this out – at least to me – some 33 years ago.
As a non computer scientist (I have an electrical and computer engineering background), I’ve always used A (my professors taught me that way, and so did “C++ from the Ground Up”).
It may give the (probably unrealistic) feeling of being safer, since if the ‘i’ var “jumps over” ‘N’ the loop will still terminate… :D
I am also a Computer Scientist by training but a bit younger (because of the “33 years ago” comment) so I’m sure that has something to do with my viewpoint… I am interested to see where you go with this, mainly because I’m not sure I agree with your statement that B is better.
An example – what if I have an array of size 5. I then goof and initialize i to be 10 instead of 0. This causes a “semi” infinite loop since the guard checks that it doesn’t equal N, which doesn’t happen until overflow occurs and i wraps back around. An easy mistake to fix but would not happen if you used A.
However, I’m still interested to see how you view it as better so maybe I should have waited for the next post!
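The wrong-start scenario described above can be sketched as follows. The helpers `count_lt` and `count_ne` and the safety limit `cap` are hypothetical, added only so the demonstration always terminates. The index is `unsigned` on purpose: unsigned arithmetic wraps around in a defined way, whereas overflowing a signed `int` is undefined behaviour in C.

```c
/* Count the passes made by a loop guarded by "i < n" that starts at
   start. The cap (an assumption of this sketch) bounds the experiment. */
unsigned long count_lt(unsigned start, unsigned n, unsigned long cap)
{
    unsigned long passes = 0;
    unsigned i;
    for (i = start; i < n && passes < cap; ++i) {
        ++passes;
    }
    return passes;
}

/* The same experiment with the guard "i != n". */
unsigned long count_ne(unsigned start, unsigned n, unsigned long cap)
{
    unsigned long passes = 0;
    unsigned i;
    for (i = start; i != n && passes < cap; ++i) {
        ++passes;
    }
    return passes;
}
```

With `start` 10 and `n` 5, `count_lt` makes no passes at all, while `count_ne` keeps going until it hits the cap. With the correct `start` of 0, both make exactly five passes.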
I’m a non-computer scientist (trained as a physicist) and I use A too.
Just read Sousa’s comment and that brings up another point: there is nothing stopping someone from modifying i within the loop itself so you could easily add an additional i++ (or whatever) which would, as Sousa says, jump over N causing it to go into the loop I mentioned in my last comment.
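A small sketch of that "jump over N" case, assuming a body that mistakenly increments the index a second time. The names `count_lt_skip` and `count_ne_skip` and the limit `cap` are made up for this illustration; the cap is there only to stop the runaway loop.

```c
/* Guard "i < n", with a stray extra increment in the body so that the
   index advances by two per pass. */
unsigned long count_lt_skip(unsigned n, unsigned long cap)
{
    unsigned long passes = 0;
    unsigned i;
    for (i = 0; i < n && passes < cap; ++i) {
        ++i;        /* the accidental extra increment */
        ++passes;
    }
    return passes;
}

/* Guard "i != n" with the same faulty body. */
unsigned long count_ne_skip(unsigned n, unsigned long cap)
{
    unsigned long passes = 0;
    unsigned i;
    for (i = 0; i != n && passes < cap; ++i) {
        ++i;        /* the accidental extra increment */
        ++passes;
    }
    return passes;
}
```

For an odd length such as 5 the index takes the values 0, 2, 4, 6 and never equals 5, so the `!=` version runs until the cap while the `<` version stops after three passes. For an even length such as 4 both stop after two passes, which means the `<` version has quietly processed only half the elements in either case.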
Always A. B could end up in an infinite loop if something went completely wrong.
I thought the “goto fail” bug was in Apple’s own SSL implementation, rather than OpenSSL. It’s the Heartbleed bug that’s put OpenSSL in the news.
I too am keen to hear why B might be considered better. Others have mentioned that A is safer because it terminates if i “jumps over” n, but perhaps the point is that errors are easy to detect if they give rise to obviously bad behavior, like an infinite loop.
Apologies for me confusing Apple’s SSL with OpenSSL.
-Ian
I’ve just had a quick play:
On my machine, “gcc -O2 -s” compiles both of these to the same assembler in simple cases where it obviously can. Without the optimization flag, the assembler corresponds straightforwardly to the requested comparisons.
I always do A. I also often do “i++” in the loop, as I originally learned the idiom, even though there are cases where “++i” is better.
Neither A nor B but C99:
for (int i= 0; i < N; ++i) {
/* do stuff with a[i] */
}
If you mistake i for a float then i != N is an invitation to disaster.
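The floating-point hazard can be sketched like this, using `double` and assuming ordinary IEEE 754 binary arithmetic. The function names, the parameters `step` and `limit`, and the safety limit `cap` are all hypothetical.

```c
/* Count passes of a loop whose floating-point control variable starts
   at 0.0, grows by step, and is guarded by "x != limit". The cap (an
   assumption of this sketch) stops the loop if the guard never fails. */
unsigned long count_ne_fp(double step, double limit, unsigned long cap)
{
    unsigned long passes = 0;
    double x;
    for (x = 0.0; x != limit && passes < cap; x += step) {
        ++passes;
    }
    return passes;
}

/* The same loop guarded by "x < limit". */
unsigned long count_lt_fp(double step, double limit, unsigned long cap)
{
    unsigned long passes = 0;
    double x;
    for (x = 0.0; x < limit && passes < cap; x += step) {
        ++passes;
    }
    return passes;
}
```

A step of 0.25 is exactly representable in binary, so both versions make four passes on the way to 1.0. A step of 0.1 is not, so the running sum never lands exactly on 1.0 and the `!=` version runs until the cap. The `<` version does terminate, but rounding can still make its pass count differ from the ten one might expect.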
For the question “Given a piece of source code written in a language with “while loops”: how do you know if the author is a computer scientist by education/training?” I answer with a famous quote:
“Bad programmers worry about the code. Good programmers worry about data structures and their relationships.” – Linus Torvalds
Dijkstra’s comments on goto, like his other comments such as “On the fact that the Atlantic Ocean has two sides”, are very revealing of the kind of man he was: egotistical and self-centered.
Not A or B but “F” as in Functional Programming. Here’s the code in Mathematica for example:
Table[(*do stuff with a[i]*), {N}]
Way more compact (and faster!)
Sorry, correction:
Table[(*do stuff with a[i]*), {i, 0, N - 1}]
@George Danner
I think initially a developer’s habits are driven by the examples they are shown when first starting to program.
Anybody who uses option B for any length of time gets to experience how fragile it is in the presence of developer mistakes and switches to option A.
For more information than you probably want to know about C usage see: http://www.knosof.co.uk/cbook/cbook1_2.pdf
As a physicist who also has a computer science background, I would still always use A.
I have been well trained in the “gotos are evil” approach, however: as a Pascal programmer, the only goto allowed was the error exit….
Niels
As a computer scientist, I scope my indices in the loop, or I give them actual names.
And of course I use A. B is useless. And replacing breaks with flag and continues with enormous ifs is hypocritical.
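For readers weighing the break-versus-structured question raised in this thread, here is a minimal sketch of a linear search written both ways. The function names are invented for the example, and each returns `n` when the key is absent. Neither form is presented as the article author's preferred one.

```c
#include <stddef.h>

/* Linear search that leaves the loop early with break. */
size_t find_with_break(const int *a, size_t n, int key)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        if (a[i] == key) {
            break;
        }
    }
    return i;   /* index of the first match, or n if there is none */
}

/* One-entry, one-exit version: the exit condition lives entirely in
   the guard, so no break or extra flag variable is needed. */
size_t find_structured(const int *a, size_t n, int key)
{
    size_t i = 0;
    while (i != n && a[i] != key) {
        ++i;
    }
    return i;   /* index of the first match, or n if there is none */
}
```

In the structured version the loop can only end when its guard is false, so on exit either `i == n` or `a[i] == key` is known to hold.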