Loopy Decisions

Developers love ?rulesof thumb? for perf; especially if a rule is somewhat nonobvious and can be used to show how elite one is. A rule is accepted as fact, so a developer can simply match the situation to the rule and then stop thinking further.

So, what could be simpler than looping through an array of integers in C#? Which way would you do the loop?

  1. foreach (int i in foo) {}
  2. for (int i = 0; i < foo.Length; i++) {}
  3. int len = foo.Length; for (int i = 0; i < len; i++) {}

Eric Gunnerson works on the C# team and is a certified expert. He decides that option 1 is the best. Joe Duffy sets up a perf harness and decides that option 1 is best. Kevin Ransom analyzed the IL directly and decided that one and two are basically identical. Justin Rogers has his own comments on how the JIT optimizes these loops, in response to a discussion about the issue on Brad Abrams blog.

Now we have a bunch of people running around very exited at their newfound enlightenment and preaching the virtues of whichever ?rule? they happened to read on a blog. The flurry of dissasembly and ?hard facts? in the posts make each decision seem impervious to criticism.

Unfortunately, they’re all wrong.

First, when I disassembled these three options myself, I found that option 3 was more efficient than 2 by two instructions (in the loop), and 2 was more efficient than 1 by an additional two instructions (in the loop). This differs with what other people have posted, which is a result of other people writing the code differently than I did(I just compiled the exact loops I wrote above). None are different enough to warrant a change in coding style, and asking people to code things differently based on a very arguable ILDASM over a contrived example is not smart. Furthermore, depending on what sort of collection is being used, the semantics of the three examples can be different.

There are two ?rules? I would like to propose:

  • Don’t be too clever. Writing loop 3 just because you think it will be faster is too clever. Avoiding loop 3 just because ?the JIT can’t optimize it? is also trying to fool the compiler. Just write the code that does what you want.
  • Measure perf specific to your scenario. Measure it yourself.

Here is my IL:

Loop 1

IL_000e: ldloc.2
IL_000f: ldloc.3
IL_0010: ldelem.i4
IL_0011: stloc.1
IL_0012: ldloc.1
IL_0013: call void [mscorlib]System.Console::WriteLine(int32)
IL_0018: ldloc.3
// load index of ‘current’
IL_0019: ldc.i4.1
// will increment by one
IL_001a: add // do the increment
IL_001b: stloc.3 // store back the index of ‘current’
IL_001c: ldloc.3 // load index of ‘current’ again
IL_001d: ldloc.2
// we want the length of a particular array
IL_001e: ldlen
// get the length
IL_001f: conv.i4 // convert it to int
IL_0020: blt.s IL_000e // compare and branch

Loop 2

IL_000c: ldloc.0
IL_000d: ldloc.1
IL_000e: ldelem.i4
IL_000f: call void [mscorlib]System.Console::WriteLine(int32)
IL_0014: ldloc.1
// get i

IL_0015: ldc.i4.1
// increment by 1
IL_0016: add
// do the increment
IL_0017: stloc.1
// store the new value for i
IL_0018: ldloc.1 // get i again
IL_0019:
ldloc.0 //we wantlength for a particular array
IL_001a: ldlen // get length of that array
IL_001b: conv.i4 // convert to int
IL_001c: blt.s IL_000c
// compare and branch

Loop 3

IL_0010: ldloc.0
IL_0011: ldloc.2
IL_0012: ldelem.i4
IL_0013: call void [mscorlib]System.Console::WriteLine(int32)
IL_0018: ldloc.2
// get i
IL_0019: ldc.i4.1 //increment by one
IL_001a: add //do the increment
IL_001b: stloc.2 // storethe new value for i

IL_001c: ldloc.2 // get i again
IL_001d: ldloc.1 // get len
IL_001e: blt.s IL_0010
// compare and branch

2 Comments

  • Changing the post-increment (i++) to a pre-increment should up performance even more.

    Good article.

  • [...] In an old post I found online here the author asks how you would go about writing a simple for loop. I was bored tonight, so I wrote a simple program to time several different types of loops to confirm which is the fastest at iterating through a generic list, yes… that bored. My list contained 67,108,863 integers. Here are the results: Foreach loop: 1185.8ms int tmp; foreach (int i in array) { tmp = i; } Standard for loop: 932.1ms int tmp; for (int i = 0; i < array.Count; i++) { tmp = array[i]; } Optimized for loop: 726.1ms int tmp; int cnt = array.Count; for (int i = 0; i < cnt; ++i) { tmp = array[i]; } [...]

Leave a Reply

Your email is never shared.Required fields are marked *