Can you tell a Computer Scientist from the way they write loops?

May 6th, 2014 | Categories: C/C++, Guest posts, programming

This is a guest article written by friend and colleague, Ian Cottam.

This brief guest piece for Walking Randomly was inspired by reading about some of the Hackday outputs at the recent SSI collaborative workshop CW14 held in Oxford. I wasn’t there, but I gather that some of the outputs from the day examined source code for various properties (perhaps a little tongue-in-cheek in some cases).

So, my also slightly tongue-in-cheek question is: “Given a piece of source code written in a language with ‘while loops’, how do you know if the author is a computer scientist by education/training?”

I’ll use C as my language and note that “for loops” in C are basically syntactic sugar for while loops (allowing one to gather the initialisation, guard and increment parts neatly together). In other languages “for loops” are closer to Fortran’s original iterative “do loop”. Also, I will work with that subset of code fragments that obey traditional structured (one-entry, one-exit) programming constructs. If I didn’t, perhaps one could argue, as Dijkstra famously did, that the density of “goto” statements, even when spelt “break” or “continue”, etc., might be a deciding quality factor.
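
To make the "syntactic sugar" remark concrete, here is a minimal sketch of the same summation written both ways. The function names sum_for and sum_while are invented for this illustration, and the equivalence shown only holds for the structured subset described above (a "continue" inside the body would behave differently in the two forms).

```c
#include <stddef.h>

/* Sum an array with a for loop: initialisation, guard and increment
   sit together in the loop header. */
int sum_for(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += a[i];
    }
    return total;
}

/* The same loop spelt as a while loop: the three parts are now spread
   around the body, but the behaviour is the same. */
int sum_while(const int *a, size_t n)
{
    int total = 0;
    size_t i = 0;              /* initialisation */
    while (i < n) {            /* guard */
        total += a[i];
        ++i;                   /* increment */
    }
    return total;
}
```

Called on the same array and length, the two functions return the same total.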

(Purely as an aside, I note that Linux (and related free/open source) contributors seem to use goto fairly freely as an exception case mechanism; and they might well have a justification. The density of gotos in Apple’s SSL code was illustrated recently by the so-called “goto fail” bug. See also Knuth’s famous article on this subject.)
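
For readers who have not met the idiom, the following is a rough sketch of goto used as an error-exit mechanism. It is not taken from Linux or from Apple's code; the function read_prefix and its labels are hypothetical and exist only to show the shape of the pattern.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read exactly `len` bytes from the start of a file
   into a freshly allocated buffer. Returns 0 on success, -1 on failure. */
int read_prefix(const char *path, size_t len, char **out)
{
    int rc = -1;               /* assume failure until everything succeeds */
    char *buf = NULL;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        goto done;

    buf = malloc(len);
    if (buf == NULL)
        goto close_file;

    if (fread(buf, 1, len, fp) != len)
        goto free_buf;

    *out = buf;                /* success: the caller now owns buf */
    rc = 0;
    goto close_file;

free_buf:
    free(buf);
close_file:
    fclose(fp);
done:
    return rc;
}
```

Each failure jumps to the label that undoes only what has been acquired so far, so the clean-up code is written once rather than repeated at every exit.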

In my own programming, I know from experience that if I use a goto, I find it so much more difficult to reason logically (and non-operationally) about my code that I avoid them. Whenever I have used a programming language without the goto statement, I have never missed it.

Now, finally to the point at hand, suppose one is processing the elements of a one-dimensional array of length N. The C convention is that the index goes from 0 to N-1. Code fragment A below is written by a non computer scientist, whereas B is written by a computer scientist.

/* Code fragment A */
for (i = 0; i < N; ++i) {
    /* do stuff with a[i] */
}

/* Code fragment B */
for (i = 0; i != N; ++i) {
    /* do stuff with a[i] */
}

The only difference is the loop’s guard: i<N versus i!=N.

As a computer scientist by training I would always write B; which would you write?

I would – and will in a follow-up – argue that B is better even though I am not saying that code fragment A is incorrect. Also in the follow-up I will acknowledge the computer scientist who first pointed this out – at least to me – some 33 years ago.

  1. May 6th, 2014 at 16:36

    As a non computer scientist (I have an electrical and computer engineering background), I’ve always used A (my professors taught me that way, and “C++ from the Ground Up” did too).

    It may give the (probably unrealistic) feeling of being safer, since if the ‘i’ var “jumps over” ‘N’ the loop will still terminate… :D

  2. May 6th, 2014 at 16:37

    I am also a Computer Scientist by training but a bit younger (because of the “33 years ago” comment) so I’m sure that has something to do with my viewpoint… I am interested to see where you go with this, mainly because I’m not sure I agree with your statement of B being better.

    An example – what if I have an array of size 5. I then goof and initialize i to be 10 instead of 0. This causes a “semi” infinite loop since the guard checks that it doesn’t equal N, which doesn’t happen until overflow occurs and i wraps back around. An easy mistake to fix but would not happen if you used A.
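
The scenario in the comment above can be tried out safely with a small sketch. It assumes an 8-bit unsigned index (uint8_t) so that wrap-around is well defined and quick; with a plain int index, signed overflow is undefined behaviour in C, so the wrap-around is not guaranteed. The function names are made up for this illustration, and the counter stands in for a loop body that, in real code, would be indexing far outside a 5-element array.

```c
#include <stdint.h>

/* Count how many times the body of fragment B runs when the index is
   wrongly initialised to `start` instead of 0.
   Assumption: an 8-bit unsigned index, so wrap-around is well defined. */
unsigned count_iterations_ne(uint8_t start, uint8_t n)
{
    unsigned count = 0;
    for (uint8_t i = start; i != n; ++i) {
        ++count;               /* stands in for "do stuff with a[i]" */
    }
    return count;
}

/* The same experiment with the guard of fragment A. */
unsigned count_iterations_lt(uint8_t start, uint8_t n)
{
    unsigned count = 0;
    for (uint8_t i = start; i < n; ++i) {
        ++count;
    }
    return count;
}
```

With start = 10 and n = 5, the B-style loop runs 251 times (10 up to 255, then 0 up to 4) before the index finally equals 5, while the A-style loop does not run at all.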

    However, I’m still interested to see how you view it as better so maybe I should have waited for the next post!

  3. Mike Croucher
    May 6th, 2014 at 16:38

    I’m a non-computer scientist (trained as a physicist) and I use A too.

  4. May 6th, 2014 at 16:41

    Just read Sousa’s comment and that brings up another point: there is nothing stopping someone from modifying i within the loop itself so you could easily add an additional i++ (or whatever) which would, as Sousa says, jump over N causing it to go into the loop I mentioned in my last comment.
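
A sketch of the "extra increment" case described above, under the assumption that the stray increment runs on every pass. The function name and the `limit` parameter are invented for the demonstration; the cap is there only so that the B-style guard cannot run away.

```c
/* Run a loop over [0, n) whose body contains a stray extra increment,
   and report how many times the body ran. `use_ne` selects the guard of
   fragment B (i != n); otherwise the guard of fragment A (i < n) is used.
   `limit` is a safety cap added for this demonstration only. */
int run_with_stray_increment(int n, int use_ne, int limit)
{
    int count = 0;
    for (int i = 0; (use_ne ? i != n : i < n) && count < limit; ++i) {
        ++count;
        ++i;                   /* the accidental extra increment */
    }
    return count;
}
```

For n = 5 the index takes the values 0, 2, 4 and then 6. The A-style guard stops quietly after three passes, whereas the B-style guard never sees i equal to 5 and carries on until the cap is reached. For an even n such as 4, both guards stop after two passes.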

  5. markuman
    May 6th, 2014 at 17:26

    Always A. B could end up in an infinite loop if something went completely wrong.

  6. May 6th, 2014 at 18:18

    I thought the “goto fail” bug was in Apple’s own SSL implementation, rather than OpenSSL. It’s the Heartbleed bug that’s put OpenSSL in the news.

  7. Jeremy
    May 6th, 2014 at 20:19

    I too am keen to hear why B might be considered better. Others have mentioned that A is safer because it terminates if i “jumps over” n, but perhaps the point is that errors are easy to detect if they give rise to obviously bad behavior, like an infinite loop.

  8. Ian Cottam
    May 6th, 2014 at 21:42

    Apologies for me confusing Apple’s SSL with OpenSSL.
    -Ian

  9. May 6th, 2014 at 22:03

    I’ve just had a quick play:

    On my machine, “gcc -O2 -s” compiles both of these to the same assembler in simple cases where it obviously can. Without the optimization flag, the assembler corresponds straightforwardly to the requested comparisons.

    I always do A. I also often do “i++” in the loop, as I originally learned the idiom, even though there are cases where “++i” is better.
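
One possible way to repeat this kind of experiment is sketched below. The file name guards.c, the function names and the constant N are assumptions made for the illustration, not the commenter's actual test. gcc's -S option stops after compilation and writes the assembler to a .s file, so the two functions can be compared by eye. Whether the output really is identical will depend on the compiler version, target and flags.

```c
#define N 8   /* compile-time length, chosen arbitrarily for this sketch */

/* Saved as guards.c (a hypothetical file name), the generated assembler
   can be inspected with: gcc -O2 -S guards.c */

/* Guard of fragment A. */
long sum_lt(const int *a)
{
    long total = 0;
    for (int i = 0; i < N; ++i)
        total += a[i];
    return total;
}

/* Guard of fragment B. */
long sum_ne(const int *a)
{
    long total = 0;
    for (int i = 0; i != N; ++i)
        total += a[i];
    return total;
}
```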

  10. C. Ribeiro
    May 6th, 2014 at 22:04

    Neither A nor B but C99:

    for (int i = 0; i < N; ++i) {
        /* do stuff with a[i] */
    }

    If you mistake i for a float then i != N is an invitation to disaster.
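
The floating-point hazard can be illustrated with a small sketch. It assumes a fractional step of 0.1 and IEEE 754 binary doubles, neither of which the comment specifies; the function name and the `limit` cap are likewise made up for the demonstration.

```c
/* Count the iterations of a loop that steps a double from 0.0 towards
   1.0 in increments of 0.1 using a != guard. `limit` is a safety cap
   added for this demonstration; without it the loop would not stop,
   because repeated addition of 0.1 never lands exactly on 1.0. */
int count_steps_ne(int limit)
{
    int count = 0;
    for (double x = 0.0; x != 1.0 && count < limit; x += 0.1) {
        ++count;
    }
    return count;
}
```

Because 0.1 has no exact binary representation, the running total steps past 1.0 without ever being equal to it, so the function returns whatever cap it was given.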

  11. C. Ribeiro
    May 6th, 2014 at 22:22

    For the question “Given a piece of source code written in a language with “while loops”: how do you know if the author is a computer scientist by education/training?” I answer with a famous quote:

    “Bad programmers worry about the code. Good programmers worry about data structures and their relationships.” – Linus Torvalds

    Dijkstra’s comments on goto, like his other remarks such as “On the fact that the Atlantic Ocean has two sides”, are very revealing of the kind of man he was: egotistical and self-centered.

  12. May 7th, 2014 at 12:55

    Not A or B but “F” as in Functional Programming. Here’s the code in Mathematica for example:

    Table[(*do stuff with a[i]*), {N}]

    Way more compact (and faster!)

  13. May 7th, 2014 at 12:56

    Sorry, correction:

    Table[(*do stuff with a[i]*), {i, 0, N}]

  14. May 7th, 2014 at 12:59

    George Danner: Not A or B but “F” as in Functional Programming. Here’s the code in Mathematica for example:
    Table[(*do stuff with a[i]*), {i, 0, N}]
    Way more compact (and faster!)

    @George Danner

  15. May 7th, 2014 at 20:06

    I think initially a developer’s habits are driven by the examples they are shown when first starting to program.

    Anybody who uses option B for any length of time gets to experience how fragile it is in the presence of developer mistakes and switches to option A.

    For more information than you probably want to know about C usage see: http://www.knosof.co.uk/cbook/cbook1_2.pdf

  16. Niels Walet
    May 12th, 2014 at 10:56

    As a physicist who also has a computer science background, I would still always use A.

    I have been well trained in the “gotos are evil” approach, however: as a Pascal programmer, the only goto allowed was the error exit….

    Niels

  17. Simon
    July 20th, 2015 at 23:11

    As a computer scientist, I scope my indices in the loop, or I give them actual names.

    And of course I use A. B is useless. And replacing breaks with flags and continues with enormous ifs is hypocritical.
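
For readers unsure what "replacing breaks with flags" looks like in practice, here is a minimal sketch of one linear search written both ways. The function names are invented for the illustration, and neither version is put forward as the better one.

```c
#include <stddef.h>

/* Linear search using break to leave the loop early.
   Returns the index of the first match, or n if there is none. */
size_t find_with_break(const int *a, size_t n, int key)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        if (a[i] == key)
            break;
    }
    return i;
}

/* The same search in one-entry, one-exit style: the early exit is
   replaced by a flag that is tested in the guard. */
size_t find_with_flag(const int *a, size_t n, int key)
{
    size_t i = 0;
    int found = 0;
    while (i < n && !found) {
        if (a[i] == key)
            found = 1;
        else
            ++i;
    }
    return i;
}
```

Both functions return the same index for the same input; the difference is purely in how the early exit is expressed.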