Recursion can bite you big time if you're not careful

I started taking a Programming Languages MOOC, which aims to teach similarities between all languages through examples in ML.

While illustrating the benefits of ML’s LET command, the instructor presented this intentionally poorly written algorithm for finding the maximum value in a list:

// Let Expressions to Avoid Repeated Computation
fun bad_max (xs : int list) =  
    if null x
    then 0
    else if null (tl xs)
    then hd xs
    else if hd xs > bad_max(tl xs)
    then hd xs
    else bad_max(tl xs)

Around 25-30 numbers, it gets reallllly slow.

I figured I’d translate it to C#, see if the effect was the same, and then illustrate why it is.

Here’s my translation, using recursion to get the max value, and calling BadMax() twice, instead of only calling it once and storing the value.

private int BadMax(List<int> sequence)  
{
    maxRoutineCounter++;

    if (!sequence.Skip(1).Any())
        return sequence.First();

    return sequence.First() > BadMax(sequence.Skip(1).ToList())
        ? sequence.First()
        : BadMax(sequence.Skip(1).ToList());
}

Now I’ll time it using a StopWatch, and test multiple “max” values from 20 to 30:

var s = new Stopwatch();

for (var i = 20; i <= 30; i++)  
{
    maxRoutineCounter = 0;
    s.Reset();
    s.Start();
    BadMax(Enumerable.Range(1, i).Reverse().ToList());
    s.Stop();
    Console.WriteLine("{0}: {1} ({2})", i, s.Elapsed, maxRoutineCounter);

    maxRoutineCounter = 0;
    s.Reset();
    s.Start();
    BadMax(Enumerable.Range(1, i).ToList());
    s.Stop();
    Console.WriteLine("{0}: {1} ({2})", i, s.Elapsed, maxRoutineCounter);
}

Here's the output, presenting the total time it took to find the max between 1 and the maximum range. There are a few things to note.

  • The list of values going in decreasing order barely increases in time at all, going from 30 to 20.
  • When decreasing, the method is called n times, where n is the number of elements being compared.
  • The list of values going in increasing order doubles every time the max value increases by 1.
  • When increasing, the number of times the method is called is 2^n - 1 times.
20: 00:00:00.0042283 (20)  
20: 00:00:00.3255788 (1048575)    // quick, but still 80x slower than decreasing order  
21: 00:00:00.0000277 (21)  
21: 00:00:00.5869601 (2097151)  
22: 00:00:00.0000166 (22)  
22: 00:00:01.1917259 (4194303)  
23: 00:00:00.0000162 (23)  
23: 00:00:02.3495555 (8388607)    //   2 seconds  
24: 00:00:00.0000179 (24)  
24: 00:00:04.7185666 (16777215)   //   5 seconds  
25: 00:00:00.0000188 (25)  
25: 00:00:09.4540360 (33554431)   //   9 seconds  
26: 00:00:00.0000282 (26)  
26: 00:00:19.1597141 (67108863)   //  19 seconds  
27: 00:00:00.0000213 (27)  
27: 00:00:38.0982709 (134217727)  //  38 seconds ...  19 x 2 = 38  
28: 00:00:00.0000235 (28)  
28: 00:01:17.3137448 (268435455)  //  77 seconds ...  38 x 2 = 76  
29: 00:00:00.0000248 (29)  
29: 00:02:32.4997444 (536870911)  // 152 seconds ...  77 x 2 = 154  
30: 00:00:00.0000248 (30)  
30: 00:05:03.6409751 (1073741823) // 304 seconds ... 152 x 2 = 304  

Using a state-of-the-art utility called Paint, I just walked the same path the application is taking.

First, here it goes in decreasing order. Including the initial call to BadMax() to start the process (not shown here), there are 7 calls.

reflection_linear

This may be a bit messy, but here's the process of going in increasing order. Because the else block executes, we end up running the BadMax() method way, way more. This is just a portion of the tree, but the total calls for the range 1 to 7 ends up being 127.

reflection_exponential

After staring at all these numbers for a couple hours, I think I've forgotten what I'm doing here.

hypnotized

Oh yeah, having fun with numbers. And pointing out how screwed recursion can get you if you're careless.

You might be thinking, "well, duh, this is all obvious". But even outside of recursion, it'd be easy to call a method to use the value it returns, in order to decide between several possible actions, one of which then calls the same method again to use it's value for some other purpose. Recursion just makes it worse.

Here's a "corrected" version, where we store the value and reuse it as necessary:

private int NotAsBadMax(List<int> sequence)  
{
    if (!sequence.Skip(1).Any())
        return sequence.First();

    var nextMax = NotAsBadMax(sequence.Skip(1).ToList());

    return sequence.First() > nextMax
        ? sequence.First()
        : nextMax;
}
20: 00:00:00.0000119 (20)  
20: 00:00:00.0000132 (20)  
21: 00:00:00.0000145 (21)  
21: 00:00:00.0000141 (21)  
22: 00:00:00.0000158 (22)  
22: 00:00:00.0000218 (22)  
23: 00:00:00.0000273 (23)  
23: 00:00:00.0000226 (23)  
24: 00:00:00.0000265 (24)  
24: 00:00:00.0000230 (24)  
25: 00:00:00.0000290 (25)  
25: 00:00:00.0000265 (25)  
26: 00:00:00.0000312 (26)  
26: 00:00:00.0000295 (26)  
27: 00:00:00.0000359 (27)  
27: 00:00:00.0000307 (27)  
28: 00:00:00.0000342 (28)  
28: 00:00:00.0000329 (28)  
29: 00:00:00.0000384 (29)  
29: 00:00:00.0000226 (29)  
30: 00:00:00.0000449 (30)  
30: 00:00:00.0000384 (30)  

In fact, we can go much higher now. The method will only be called n times in either case, not 2^n - 1 times.

397: 00:00:00.0039696 (397)  
397: 00:00:00.0037194 (397)  
398: 00:00:00.0038669 (398)  
398: 00:00:00.0040393 (398)  
399: 00:00:00.0018340 (399)  
399: 00:00:00.0017656 (399)  
400: 00:00:00.0021402 (400)  
400: 00:00:00.0019294 (400)  

Bearing in mind that the first algorithm increased in execution time at roughly 19 x 2^(currentMax - 26) seconds (I just chose an arbitrary point where the seconds started increasing considerably), and that the number of times the routine was called (when going in increasing order) was 2^n - 1 times:

  • The routine would've executed 2.58 trillion trillion trillion trillion trillion trillion trillion trillion trillion trillion times, and
  • It would've taken 731,100 trillion trillion trillion trillion trillion trillion trillion trillion trillion years to do so.

It would eventually complete, but not before the heat death of the universe. (We could also eliminate the duplicate call to sequence.First(), but in testing the savings were negligible to nil.)

Thankfully, in C#, we've already got Linq.Enumerable.Max, which performs much, much faster than the code above, because we can just iterate a list, instead of having to constantly take the head and tail of a list and operate recursively like in ML. This code tests each element in the list, and stores the current element n if it's greater than the previous max element: if (n > maxValue) maxValue = n;

for (var i = 395; i <= 400; i++)  
{
    s.Reset();
    s.Start();
    Enumerable.Range(1, i).Reverse().Max();
    s.Stop();
    Console.WriteLine("{0}: {1}", i, s.Elapsed);

    s.Reset();
    s.Start();
    Enumerable.Range(1, i).Max();
    s.Stop();
    Console.WriteLine("{0}: {1}", i, s.Elapsed);
}
395: 00:00:00.0000102  
395: 00:00:00.0000034  
396: 00:00:00.0000119  
396: 00:00:00.0000034  
397: 00:00:00.0000085  
397: 00:00:00.0000038  
398: 00:00:00.0000106  
398: 00:00:00.0000034  
399: 00:00:00.0000085  
399: 00:00:00.0000038  
400: 00:00:00.0000106  
400: 00:00:00.0000038  // a few microseconds  

Alright, that was a fun waste of time. I'm done.


My CPU, slowly toasting as it chugs away on hundreds of millions of recursive calls:

recursion<em>killing</em>cpu

That's one way to stay warm as the weather turns colder outside. :p

Subscribe to Weekly Updates!

Get an email with the latest posts, once per week...
* indicates required