Concurrent Programming in Erlang - Winning by Failing (week two)

No one wants to fail. We'd like to succeed in life, to pass our exams, advance in our jobs, achieve our goals. And when we become bored with the usual challenges, we see who can eat the most hotdogs, throw a horse's shoe the farthest and get the most sacks of corn in a hole. Winning, and avoiding failure at all costs, is deeply ingrained in us. That's what makes Erlang so unusual as a language. Failure is expected and even embraced.

It's easy to get defensive when you're writing a program. You don't want your application to crash - what a lousy experience for your users! So we go nuts with try/catch blocks in an effort to catch every edge case, cluttering our codebase in the process. Wouldn't it be nice if we could just let parts of our code assume the world is perfect, secure in the knowledge that when those parts crash and burn there's something waiting to start 'em up again? In the Erlang world that's exactly what we can achieve: hundreds or even thousands of small processes, all blissfully unaware that anything could go wrong, and supervisors ready to recover after a crash.

As we know, defensive programming - trying to deal with all possible failures - is doomed. We simply can’t quantify the ways in which a system might fail, so instead we embrace failure. ~ Simon Thompson, instructor

Erlang was designed for systems with lots of moving parts outside of a developer's direct control, including hardware that could fail. Simon goes into much more detail in the course, and this week we got a more complete picture of how and why Erlang embraces failure. (I also wrote a few notes about last week.)

  1. We can do many things at once by running (spawning) small processes to perform a set of tasks.

  2. But what if a process dies? How will we know and recover? We can link processes together, so that if one goes down it sends a signal out and anything linked to it dies too. Wait, is that what we want..?

  3. We can also tell a process to trap the exit signals it receives, and instead of dying it can take some other action such as restarting those processes. In essence, it's supervising the processes it's linked to.
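
To make those three steps concrete, here's a minimal sketch of my own (not from the course). The module name linkdemo and the function watch/0 are made up for illustration. It spawns a linked process that dies right away, and because the caller is trapping exits, the exit signal shows up as an ordinary message instead of taking the caller down with it.

```erlang
%% linkdemo.erl - hypothetical module, for illustration only
-module(linkdemo).
-export([watch/0]).

%% Spawns a linked worker that exits immediately and returns what the
%% trapping process receives in place of being killed.
watch() ->
    Old = process_flag(trap_exit, true),        % exit signals now arrive as messages
    Pid = spawn_link(fun() -> exit(boom) end),  % linked worker that dies at once
    Result = receive
                 {'EXIT', Pid, Reason} -> {worker_died, Reason}
             after 1000 ->
                 timeout
             end,
    process_flag(trap_exit, Old),               % restore the caller's previous setting
    Result.
```

Calling linkdemo:watch() from the shell should return {worker_died,boom}. Without the process_flag call, the same exit signal would kill the calling process instead.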

Supervising processes... aka, let 'em die

Here's a module we worked with in the course that demonstrates the behavior. Save it as frequency.erl, then compile and run it from an Erlang shell using the commands posted below the module.

  • The top half is the server, which also acts as a supervisor. It calls process_flag(trap_exit, true), so when a process it's linked to dies, the exit signal arrives as an {'EXIT', Pid, Reason} message in its receive loop. In the loop, it starts a new process to replace the one that died.

  • The bottom half, everything under %% CLIENT FUNCTIONS, is the client. When a client is started, it sends a message to the server and the server links the two together.

%% Based on code from
%%   Erlang Programming
%%   Francesco Cesarini and Simon Thompson
%%   O'Reilly, 2008
%%   http://oreilly.com/catalog/9780596518189/
%%   http://www.erlangprogramming.org/
%%   (c) Francesco Cesarini and Simon Thompson

-module(frequency).
-export([start_server/0, init_server/0]).
-export([start_client/0, init_client/0]).

%% SERVER FUNCTIONS

start_server() ->
  register(frequency,
           spawn(frequency, init_server, [])).

init_server() ->  
  process_flag(trap_exit, true),    %% TRAP EXIT SIGNAL
  Frequencies = {get_frequencies(), []},
  loop(Frequencies).

get_frequencies() -> [10,11,12,13,14,15].

%% Server Message Loop

loop(Frequencies) ->  
  receive
    {request, Pid, allocate} ->
      {NewFrequencies, Reply} = allocate(Frequencies, Pid),
      Pid ! {reply, allocate, Reply},
      loop(NewFrequencies);
    {request, Pid , {deallocate, Freq}} ->
      NewFrequencies = deallocate(Frequencies, Freq),
      Pid ! {reply, deallocate, ok},
      loop(NewFrequencies);
    {request, Pid, stop} ->
      Pid ! {reply, stop, stopped};
    {'EXIT', Pid, _Reason} ->       %% HANDLE EXIT SIGNAL
      NewFrequencies = exited(Frequencies, Pid),
      spawn(frequency, init_client, []),
      loop(NewFrequencies)
  end.

%% Internal help functions to allocate and deallocate frequencies

allocate({[], Allocated}, _Pid) ->  
  {{[], Allocated}, {error, no_frequency}};
allocate({[Freq|Free], Allocated}, Pid) ->  
  link(Pid),
  {{Free, [{Freq, Pid}|Allocated]}, {ok, Freq}}.

deallocate({Free, Allocated}, Freq) ->  
  {value,{Freq,Pid}} = lists:keysearch(Freq,1,Allocated),
  unlink(Pid),
  NewAllocated = lists:keydelete(Freq,1,Allocated),
  {[Freq|Free], NewAllocated}.

exited({Free, Allocated}, Pid) ->  
    case lists:keysearch(Pid,2,Allocated) of
      {value,{Freq,Pid}} ->
        NewAllocated = lists:keydelete(Freq,1,Allocated),
        {[Freq|Free],NewAllocated};
      false ->
        {Free,Allocated}
    end.

%% CLIENT FUNCTIONS

start_client() ->  
    spawn(frequency, init_client, []).

init_client() ->  
    frequency ! {request, self(), allocate},
    client_loop().

%% Client Message Loop

client_loop() ->  
    receive
        {reply, allocate, Reply} ->
            io:fwrite("Client with pid ~p allocated: ~p", [self(), Reply]);
        {reply, deallocate, Reply} ->
            io:fwrite("Client with pid ~p deallocated: ~p", [self(), Reply]);
        {reply, stop, Reply} ->
            io:fwrite("Client with pid ~p stopped: ~p", [self(), Reply])
    end,
    client_loop().

Compile the module, start the server and a few clients, and finally the Observer application. The Observer app gives you a window into what Erlang processes are running. Switch to the "Processes" tab and look for "frequency" (the server) and "frequency:init_client/0" (the clients). Kill off a client and keep watching. A new process starts up with a new PID.

1> c(frequency).  
{ok,frequency}

2> frequency:start_server().                       % start server  
true

3> frequency:start_client().                       % start clients  
Client with pid <0.46.0> allocated: {ok,10}<0.46.0>

4> frequency:start_client().  
Client with pid <0.48.0> allocated: {ok,11}<0.48.0>

5> observer:start().                               % start Observer and kill the clients

Client with pid <0.4529.0> allocated: {ok,11}      % watch them respawn  
Client with pid <0.10080.0> allocated: {ok,10}  

But what if I like catching things?

Erlang does have a try/catch construct you can use when you need it, and it can catch three classes of exception: throw, error and exit. Here's a silly module that demonstrates catching a thrown value, a run-time error and an exit.

-module(trycatch).

-export([fun_with_division/2]).

fun_with_division(Numerator, Denominator) ->  
    try
        case Numerator of
            0 when Denominator =/= 0 -> throw(this_is_always_zero);
            42 -> exit(exiting_for_no_good_reason);
            _ -> io:format("The quotient is ~p.~n", [Numerator / Denominator])
        end
    catch
        throw:Reason -> io:format("The calculation threw a message: ~p~n", [Reason]);
        error:Reason -> io:format("The calculation returned an error: ~p~n", [Reason]);
        exit:Reason -> io:format("The process sent a signal: ~p~n", [Reason])
    after
        % stuff in here happens no matter what!
        io:format("Your numerator and denominator were ~p:~p.~n",
                  [Numerator, Denominator])
    end.

Throw some different numbers at it to trigger the different catch cases.

1> c(trycatch).  
{ok,trycatch}

2> trycatch:fun_with_division(1,10).  
The quotient is 0.1.  
Your numerator and denominator were 1:10.

3> trycatch:fun_with_division(0,10).  
The calculation threw a message: this_is_always_zero  
Your numerator and denominator were 0:10.

4> trycatch:fun_with_division(10,0).  
The calculation returned an error: badarith  
Your numerator and denominator were 10:0.

5> trycatch:fun_with_division(42,2).  
The process sent a signal: exiting_for_no_good_reason  
Your numerator and denominator were 42:2.  
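
The three exception classes (throw, error and exit) can also be seen without any printing. This is a small sketch under my own assumptions, using a hypothetical module called classify: it runs whatever fun you hand it and returns a tagged tuple saying how the call ended.

```erlang
%% classify.erl - hypothetical module, for illustration only
-module(classify).
-export([classify/1]).

%% Runs a zero-argument fun and reports how it finished.
classify(Fun) when is_function(Fun, 0) ->
    try Fun() of
        Value -> {ok, Value}               % the fun returned normally
    catch
        throw:Term   -> {thrown, Term};    % raised with throw/1
        error:Reason -> {error, Reason};   % run-time error such as badarith
        exit:Reason  -> {exited, Reason}   % raised with exit/1
    end.
```

For example, classify:classify(fun() -> throw(oops) end) returns {thrown,oops}, and classify:classify(fun() -> exit(bye) end) returns {exited,bye}.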

Codin' for Hot Loadin'

The last thing we learned about was the ability to hot load code. You can introduce new code into an application and recompile the module, and Erlang can (under certain conditions) swap in the new code without missing a beat. That helps maintain system uptime.

Here's a simple module to demonstrate. It spawns a new process that outputs a counter variable as it increments it. By spawning a new process, the prompt is still available for typing other commands.

-module(hotcodeloader).

-export([count_away/0, count_away/1]).

count_away() ->  
    spawn(hotcodeloader, count_away, [0]).

count_away(Counter) ->  
    io:format("Counting... ~p~n", [Counter]),
    timer:sleep(1000),
    hotcodeloader:count_away(Counter + 1).

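Start it up, then change Counter + 1 to Counter + 3 and recompile. Without stopping the running process, the new code is swapped in and used. The switch happens because the recursive call is fully qualified (hotcodeloader:count_away/1 rather than just count_away/1), so each iteration calls into the most recently loaded version of the module.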

1> c(hotcodeloader).  
{ok,hotcodeloader}
2> hotcodeloader:count_away().  
Counting... 0  
<0.72.0>  
Counting... 1  
Counting... 2  
Counting... 3  
Counting... 4  
Counting... 5  
Counting... 6  
Counting... 7  
3> c(hotcodeloader).  
{ok,hotcodeloader}
Counting... 8  
Counting... 11  
Counting... 14  
Counting... 17  
Counting... 20  
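
A common variation, sketched here as an assumption rather than something from the course, is to let the process decide when to pick up new code. In this hypothetical hotcounter module the ordinary recursive calls are local, so the process keeps running the version it started with, and only an upgrade message triggers the fully qualified call that jumps to the newest loaded version.

```erlang
%% hotcounter.erl - hypothetical module, for illustration only
-module(hotcounter).
-export([start/0, loop/1]).   % loop/1 must be exported for the qualified call

start() ->
    spawn(hotcounter, loop, [0]).

loop(Counter) ->
    receive
        tick ->
            loop(Counter + 1);           % local call: stays in the running version
        {get, From} ->
            From ! {count, Counter},
            loop(Counter);               % local call again
        upgrade ->
            hotcounter:loop(Counter)     % qualified call: switches to the newest version
    end.
```

After recompiling, sending Pid ! upgrade moves the process onto the new code while keeping its counter. One caveat: Erlang keeps only two versions of a module loaded, so a process still running the old version when the module is reloaded a second time gets killed.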

More resources

Once again, the comments have proven very valuable, offering insights beyond the course content. A lot of people share links to other resources like videos and articles, and I want to post them here so I can go back through them later.

Videos

Articles

Tools
