/ c#

A Tail of Recursion in Erlang (a C# developer's observations)

A group of us at VHT has been meeting weekly, reviewing some Erlang basics and running through examples. Even though it means giving up a lunch hour, over a dozen people have still been showing up to learn and help each other.

Two of us were recently challenged to figure out a couple problems in Erlang without using obvious built-in functions:

  1. Determine the length of a list. (without using length([1,2,3]).)
  2. Reverse the list. (without using lists:reverse([1,2,3]).)

We looked into it, and even shared what we found with the group a week later. This post is inspired by the challenge, what we found, and some comparisons to my experience with C#.

If you find this useful, let me know! If something’s unclear or misleading, please let me know that too.

Using C#

There are very few times in C# where I’d willingly replace a loop structure with recursion, for reasons I’ll explain later. Check out these sample problems, comparing the first solution (using a loop) with the second solution (using recursion).

Which is shorter? Which is easier to read? If you’re already familiar with recursion, it may be the recursive solution, but I’d wager that for most developers, it’s the non-recursive one.

Finding Length of an Array (ForEach vs Recursion)

A solution for the length problem using a foreach loop, without using readily-available methods in the .NET framework. I’m using generics so the method supports arrays made up of any type.

public static void Main()
    int totalCount = GetListLength(new[] {1, 3, 4, 5, 9});  // should be 5
    Console.WriteLine("Total items in array: {0}", totalCount);  // Total items in array: 5
static int counter = 0;
public static int GetListLength<T>(T[] array)
    foreach (T item in array)
        counter += 1;
    return counter;

A solution for the length problem using recursion. Note the !array.Any() call. We always need a way to terminate the recursive calls, so here we tell it that GetListLength should return 0 (and not call itself anymore) for an empty array. If we forget those first two lines, Skip(1) will just keep passing an empty array to itself again, until the stack fills up (more on that later).

public static int GetListLength(int[] array)
    if (!array.Any())
        return 0;
    return 1 + GetListLength(array.Skip(1).ToArray());

Reversing an Array (For vs Recursion)

A solution for reversing an array using a for loop. We iterate the items in an array, populating a new array with the items in reverse.

public static int[] ReverseArray(int[] array)
    int[] newArray = new int[array.Length];
    for (int i=0; i<array.Length; i++)
        newArray[array.Length - (i+1)] = array[i];
    return newArray;

A solution for reversing an array using recursion. The first method jump starts the recursive method. Note the terminating condition again – when we’ve reached the end of the original array, our work is done and we can simply return. The entire stack is unwound, and all the actual assignments can be made to the new array since all the ReverseArray calls have been made.

public static int[] ReverseArray(int[] array)
    int[] newArray = new int[array.Length];
    ReverseArray(array, newArray, 0);
    return newArray;
public static void ReverseArray(int[] origArray, int[] newArray, int currentIndex)
    if (currentIndex >= origArray.Length)
    ReverseArray(origArray, newArray, currentIndex+1);
    newArray[origArray.Length - (currentIndex+1)] = origArray[currentIndex];

Finding a String in a Collection (ForEach vs Recursion)

Alright, one last example. Return the total count of matching strings in a collection, using foreach.

public static int FindOccurrencesOfName(string nameToFind, string[] names)
    int nameCounter = 0;
    foreach (string name in names)
        if (name == nameToFind)
            nameCounter += 1;
    return nameCounter;

Find matching strings using recursion. We’ll compare the string to the first element in the collection, then pass the rest of the collection to the same method again. This continues until Skip(1) sends an empty list, at which point we just return 0. Then the stack unwinds, all the 1’s and 0’s are summed up, and the count is returned.

public static int FindOccurrencesOfName(string nameToFind, string[] names)
    if (!names.Any())
        return 0;
    return (nameToFind == names.First() ? 1 : 0)
        + FindOccurrencesOfName(nameToFind, names.Skip(1).ToArray());

Why Recursion is Sometimes Discouraged

It takes a different mindset and it’s pretty easy to mess up and throw an exception (often because there was no terminating condition). Instead of doing the first thing first and the last thing last, we have to think in terms of evaluating the last thing first and moving backwards, basing the value of each previous method call on the evaluation of subsequent ones.

Sometimes there are specific reasons to use it, but it can also be more difficult to debug or to follow what’s going on when you come back to it later.

If I had 10 tools hanging on pegs, and wanted to count them, I could either:

  • Start with the left one, and iterate through all the tools. Make one “tick” per tool on a piece of paper. When I hit the last one, I have 10 ticks for 10 tools. That’s how we’re used to working through a collection of something, in coding or in real life.
  • Start with the left one, but don’t make a tick yet. - The total count of n tools (all 10) is actually 1 + (n-1 tools) (1 + count of last 9).
  • The total count of n-1 tools (the last 9) is 1 + (n-2 tools) (1 + count of last 8).
  • The total count of n-2 tools (the last 8) is 1 + (n-3 tools) (1 + count of last 7).
  • This continues until we get to the last tool, where the total count of n-9 tools (the last tool) is simply 1 (that’s our condition that prevents infinite recursion).
  • Finally, we can say that the total count of n tools is (1 + (n-1)) which is (1 + (1 + (n-2)) … and eventually 1 + (1 + (1 + (1 + (1 + (1 + (1 + (1 + (1 + 1)))))))), or 10.

It’s also easy to overflow the stack too, throwing a stackoverflow exception. All of those method calls have to be stored somewhere, until the method calls itself one last time, and then the final result of all those method calls can be evaluated (the stack is unwound). There’s a limit to how many methods can be stored on the stack before it fills up completely.

Using Erlang

Still here? I hope that made sense so far.

Erlang (unlike C#) emphasizes recursion and pattern-matching. It’s been a major shift in my thinking. You can do case statements, and quasi if statements, but they’re discouraged in favor of pattern matching and guard clauses. Loops aren’t even possible, since you can’t assign the same variable a value more than once.

Here are the same problems, solved in Erlang this time.

Finding Length of an Array (Recursion vs Tail Recursion)

When you define multiple functions with the same name and number of parameters (same “arity”), Erlang determines the correct function to call by pattern matching.

The first function below is what will terminate our recursion, when it’s eventually called, and it’ll only be called when the parameter is an empty list. Otherwise, the pattern doesn’t match, and it tries the next function.

get_list_length([]) ->
get_list_length([_H|T]) ->
  1 + get_list_length(T).

Say you pass a list with 3 numbers in it… [1,2,3]. It’s not an empty list (yet), so the second function is called. The parameter [_H|T] will actually take care of lopping off the head of our list (1) and passing the tail [2,3] to the function again. You could visualize this like so:

= 1 + get_list_length([2,3])
= 1 + 1 + get_list_length([3])
= 1 + 1 + 1 + get_list_length([])
= 1 + 1 + 1 + 0
= 3

But is this the most efficient way to perform the recursion? In C#, trying to count a large enough list in this manner would eventually overflow the stack. Bad things would also happen in Erlang, as it tried to store thousands or even millions of calls before being able to evaluate them all. It can’t evaluate the first 1 + get_list_length(T) until it knows the value of get_list_length(T), but that’s based on the value of the next 1 + get_list_length(T) and so on. Best case scenario, it takes up a ton of memory and everything slows down.

There’s a better way, and it’s called tail recursion (check out some great explanation and examples). We can eliminate the above problem if we do nothing inside the function call except call the function again. Instead of adding 1 to the result of the function call, we need to pass the current total (or accumulation) back into the function, like this:

get_list_length(List) ->
  get_list_length(List, 0).
get_list_length([], Acc) ->
get_list_length([_H|T], Acc) ->
  get_list_length(T, 1+Acc).

Notice how we pass the current length into the next recursive call. Since we pass everything we need into the next call, it doesn’t need to hang on to the previous call anymore. It doesn’t have to “unwind” at the end, adding all the previous 1 values together, because in the last call, we already have the length stored in Acc!

Reversing an Array (Recursion vs Tail Recursion)

A solution for reversing an array. The setup is similar to the previous solution.

reverse_list([]) ->
reverse_list([H|T]) ->
  reverse_list(T) ++ [H].

This time though, we don’t discard the head, but instead keep appending it to the end of the result of calling the same function with the tail. This continues recursively until the empty list is passed in and returned by the first function.

= reverse_list([2,3]) ++ [1]
= reverse_list([3]) ++ [2] ++ [1]
= reverse_list([]) ++ [3] ++ [2] ++ [1]
= [] ++ [3] ++ [2] ++ [1]
= [3,2,1]

But again, a sufficiently long list is going to get slow as it has to remember all of the header values that eventually need to be appended to the many calls to reverse_list(T).

Here it is again, using tail recursion this time. We append the most recent header to the reversed list, then pass that to the next function call in our chain of recursion. When it gets to the last call, and passes an empty list, it doesn’t have to unwind to figure out anything. The value it needs is already in RevList.

reverse_list(OrigList) ->
  reverse_list(OrigList, []).
reverse_list([], RevList) ->
reverse_list([H|T], RevList) ->
  reverse_list(T, [H|RevList]).

Finding a String in a Collection (Recursion vs Tail Recursion)

And finally, a solution for finding the number of matching strings in a list. Comparing the header to the name returns true or false, which calls one of two functions and returns either a 1 (is a match) or 0 (not a match). But it can’t evaluate all those stacked is_match calls until the last recursive call is made, and the empty list results in find_occurrences_of_name returning a 0.

find_occurrences_of_name(_Name, []) ->
find_occurrences_of_name(Name, [H|T]) ->
  is_match(Name =:= H) + find_occurrences_of_name(Name, T). 
is_match(true) -> 1; is_match(_) -> 0.

With a little bit of refactoring though, tail recursion works here too.

find_occurrences_of_name(Name, Names) -> 
  find_occurrences_of_name(Name, Names, 0). 
find_occurrences_of_name(_Name, [], Acc) ->
find_occurrences_of_name(Name, [H|T], Acc) -> 
  find_occurrences_of_name(Name, T, Acc + is_match(Name =:= H)). 
is_match(true) -> 1; is_match(_) -> 0.

In all three examples of tail recursion, we process the value (i.e. total count) as we go along, similar to how a foreach loop works, instead of waiting until the end.

How I Approach Recursion

It’s easier to write out the first few iterations of a recursive method, then look for patterns between them, and finally try to reduce them into a single recursive function.

Review the reverse_list function one more time. We know that if we have an empty list or just one element, we get back an empty list or the same element. If it’s got 2 or more elements, they’re reversed. We can write multiple functions with the same arity to cover lists of 0 to 4 elements, and look for patterns.

reverse_list([]) -> []; reverse_list([a]) ->
reverse_list([a,b]) ->
reverse_list([a,b,c]) ->
reverse_list([a,b,c,d]) ->

There does seem to be a pattern. The result of any particular function call is equivalent to the head of the list appended to the same function call on the tail of the list, like this:

reverse_list([]) ->
reverse_list([d]) ->
  reverse_list([]) ++ [d];
reverse_list([c,d]) ->
  reverse_list([d]) ++ [c];
reverse_list([b,c,d]) ->
  reverse_list([c,d]) ++ [b];
reverse_list([a,b,c,d]) ->        %% ...([H|T])
  reverse_list([b,c,d]) ++ [a].   %% ...(T) ++ [H]

Or stated a different way, we can reduce the similar 4 functions into one recursive function that acts on the head and tail of the list: (not as efficient as the tail recursion version previously shown)

reverse_list([]) ->
reverse_list([H|T]) ->
  reverse_list(T) ++ [H];

Final Thoughts

It’s been challenging to think in terms of recursion, but I feel like this practice was time well spent. Messing around with something when there’s no pressure tends to pay off later, when the pressure is on to solve a real problem.

Tail recursion, from what I’ve seen so far, offers the closest thing to a foreach loop that I’ve seen in Erlang so far. I miss loops. :p

Learning something new and presenting it front of a group of people, especially when some of them are experienced Erlang devs, was a little intimidating. Fortunately, we have a good group of people and a culture that encourages learning new things.

Featured image by TORLEY, licensed CC BY-SA 2.0

Grant Winney

Grant Winney

I write when I've got something to share - a personal project, a solution to a difficult problem, or just an idea. We learn by doing and sharing. We've all got something to contribute.

Read More