Hot-code burn marks

jan 18

Updating an OTP system with new code without taking it out of service is tough. How tough? Well, Erlang tough. Meaning, there is a learning curve, but after the hazing, man, does it ever feel good.

In the brochure, Erlang/OTP touts hot-code loading usually towards the end of its ‘killer-features’ list. Imagine, it goes, running your servers, then introducing a new feature, pushing a button, and bango, your feature is live!

Like all magic, there is rigor involved in pulling it off. It can be a bit daunting to perform hot-code loading (sometimes called hot-code swapping) for an entire OTP system. Largely, it requires a steady hand & clear head to do it correctly.

Though there are plenty of resources on the web for performing this ‘as-seen-on-TV’ feat, I thought it would be helpful to write a toy system that showcases the steps needed to make it all work, while stressing the parts one should be wary of.

We will be using the now slightly hum-drum tool rebar (not to be confused with rebar3), relx, and a few third-party Erlang libraries. So be sure to have these installed, along with a recent version of OTP (I am using 17.5).

Fire is looming

We are going to put together an Erlang web-server that serves a single page. Every second or so, some JavaScript will poll an API endpoint (‘/which-color.json’) and set the background color of our web-page to whatever color the server reports. When we install our updated system (i.e. with hot-code loading), we will see a sudden change in the colors being cycled in our browser.

Some Erlang libraries will aid us along. In addition to Cowboy, we will use the Erlang port of the Django templating language. So, using rebar let’s go create our ‘fire is looming’ app with these dependencies & get started.

mkdir fil
cd !$
rebar create-app appid=fil

This will yield the building-blocks of our OTP application; three files should be created in a new ‘src’ directory. We can go ahead and create a ‘rebar.config’ file in the project root directory with the following code:

{require_otp_vsn, "17"}.

{erlydtl_opts, [
  {doc_root,   "priv/dtl"},
  {out_dir,    "ebin"},
  {source_ext, ".dtl"},
  {module_ext, "_dtl"},
  {recursive,  true}]}.

{deps, [
    {cowboy, ".*", 
      {git, "git://github.com/nato/cowboy.git", 
        {branch, "stable"}}},
    {erlydtl, ".*", 
      {git, "git://github.com/nato/erlydtl.git", 
        {branch, "stable"}}}
]}.

Then, to fetch our dependencies, we perform the following:

rebar get-deps

Color state-machine

Let’s switch gears and create a color state-machine server. This server will change from color to color at an interval of our choosing. And, like most servers, it will respond to requests & offer up what the current color is.

In the initial version of our system, the colors will cycle through three greens: starting dark and getting slightly lighter; then reset back. Here is the code for our new ‘src/fil_color_fsm.erl’ file:

-module(fil_color_fsm).
-behaviour(gen_fsm).

%% api
-export([start_link/0]).
-export([which/0]).

%% behavior callbacks
-export([init/1,
         handle_event/3,
         handle_sync_event/4,
         handle_info/3,
         terminate/3,
         code_change/4]).

%% fsm callbacks
-export([olive/2, 
         olive/3,
         green/2, 
         green/3,
         lime/2, 
         lime/3]).

-define(SERVER, ?MODULE).
-define(CYCLE_TIME, 4000).

-record(state, {}).

%%
%% api
%%

start_link() ->
    gen_fsm:start_link({local, ?SERVER}, ?MODULE, [], []).

which() -> 
    {Color, _State} = sys:get_state(?SERVER, 1000),
    {current_color, Color}.

%%
%% behavior callbacks
%%

init([]) ->
    process_flag(trap_exit, true),
    new(self_notify),
    {ok, olive, #state{}}.

olive(_Evnt, State) -> {next_state, olive, State}.

green(_Evnt, State) -> {next_state, green, State}.

lime(_Evnt, State) -> {next_state, lime, State}.

olive(_Evnt, _From, State) ->
    new(self_notify),
    Reply = ok,
    {reply, Reply, green, State}.

green(_Evnt, _From, State) ->
    new(self_notify),
    Reply = ok,
    {reply, Reply, lime, State}.

lime(_Evnt, _From, State) ->
    new(self_notify),
    Reply = ok,
    {reply, Reply, olive, State}.

handle_event(_Evnt, StateName, State) ->
    {next_state, StateName, State}.

handle_sync_event(_Evnt, _From, StateName, State) ->
    Reply = ok,
    {reply, Reply, StateName, State}.

handle_info(self_notify, StateName, State) ->
    new(proc, StateName),
    {next_state, StateName, State};
handle_info({'EXIT', _P, normal}, StateName, State) ->
    {next_state, StateName, State};
handle_info({'EXIT', _P, _Reason}, StateName, State) ->
    error_logger:info_msg("*** trapped non-normal exit~n"),
    {next_state, StateName, State};
handle_info(_Info, StateName, State) ->
    {next_state, StateName, State}.

terminate(_Reason, _StateName, _State) ->
    ok.

code_change(_OldVsn, StateName, State, _Extra) ->
    {ok, StateName, State}.

%% 
%% business routines
%%

new(self_notify) ->
    S    = self(),
    When = ?CYCLE_TIME,
    erlang:send_after(When, S, self_notify).

new(proc, C)    ->
    proc_lib:spawn_link(fun() -> handle_job(C) end).

handle_job(Color) ->
    ok = gen_fsm:sync_send_event(?SERVER, Color).
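
Stripped of the ‘gen_fsm’ plumbing, the transitions above boil down to a three-step loop. The following is only a sketch to make that loop explicit; the module name ‘color_cycle_sketch’ and its two functions are made up for illustration and are not part of the ‘fil’ app:

```erlang
-module(color_cycle_sketch).
-export([next/1, cycle/2]).

%% Mirrors the state transitions in fil_color_fsm (version 1986.a):
%% olive -> green -> lime -> olive.
next(olive) -> green;
next(green) -> lime;
next(lime)  -> olive.

%% Returns the first N colors seen when starting from Color.
cycle(_Color, 0) -> [];
cycle(Color, N) when is_integer(N), N > 0 ->
    [Color | cycle(next(Color), N - 1)].
```

Calling ‘color_cycle_sketch:cycle(olive, 4)’ should walk through olive, green, lime and land back on olive. In the real server, each step is triggered by the ‘self_notify’ timer rather than by recursion.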

JSON end-point

Our system needs to serve one HTTP API end-point, so we will code the typical cowboy goodies to help us with that. The following is the handler code which we place in ‘src/fil_handler.erl’ :

-module(fil_handler).

%% cowboy callbacks
-export([init/2]).

%%
%% cowboy callbacks
%%

init(Req, [{type, root}]) ->
    {content, B} = fil_lib:dtl(root),
    Req2         = fil_lib:standard(B, Req),
    {ok, Req2, {}};
init(Req, [{type, which_color}]) ->
    Json = new(json_color),
    Req2 = fil_lib:json(Json, Req),
    {ok, Req2, {}}.

%%
%% business routines
%%

new(json_color) ->
    {current_color, C} = fil_color_fsm:which(),
    C1 = erlang:atom_to_binary(C, latin1),
    ["{", 34, "color", 34, " : ", 34, C1, 34, "}"].
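
The list of strings and 34s in ‘new(json_color)’ is an iolist, where 34 is the character code for a double quote. As a rough sketch of what ends up on the wire, the stand-alone module below (the name ‘json_color_sketch’ is hypothetical) flattens the same structure into a binary:

```erlang
-module(json_color_sketch).
-export([encode/1]).

%% Builds the same iolist as new(json_color) in fil_handler, then
%% flattens it so the result is easy to inspect. $" is the character
%% literal for 34, the double quote.
encode(Color) when is_atom(Color) ->
    C = erlang:atom_to_binary(Color, latin1),
    iolist_to_binary(["{", $", "color", $", " : ", $", C, $", "}"]).
```

For the atom ‘olive’ this yields the binary for {"color" : "olive"}. The handler itself has no need to flatten anything, as cowboy accepts iodata as a reply body.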

Did you notice that we called a few helper routines in the above? It’s all in place in a file called ‘src/fil_lib.erl’ which has the following code:

-module(fil_lib).

-export([standard/2, json/2, dtl/1, dtl/2]). 

%%
%% api routines
%%

standard(X, Req) -> 
    cowboy_req:reply(200, [{<<"content-type">>, 
      <<"text/html; charset=utf-8">>}], X, Req).

json(X, Req) -> 
    cowboy_req:reply(200, [{<<"content-type">>, 
      <<"application/json; charset=utf-8">>}], X, Req).

dtl(What) -> dtl(What, []).

dtl(What, Opts) ->
    Fn      = dtl1(What),
    {ok, B} = Fn(Opts),
    {content, B}.

%%
%% support routines
%%

dtl1(root) -> fun root_dtl:render/1.

OTP-ification

Now we need to incorporate our cowboy handler, our ‘gen_fsm’ & our support lib into a bona fide OTP application. To start, we tweak ‘src/fil_app.erl’ so it looks as follows:

-module(fil_app).
-behaviour(application).

%% behavior callbacks
-export([start/2, stop/1]).

-define(CLIENT_ACCEPTORS, 3).

%%
%% behavior callbacks
%%

start(_StartType, _StartArgs) ->
    {ok, _} = handle_cowboy(),
    fil_sup:start_link().

stop(_State) -> ok.

%%
%% business routines
%%

handle_cowboy() ->
    Port      = port(),
    Routes    = routes(),
    Dispatch  = cowboy_router:compile(Routes),
    TransOpts = [{port, Port}],
    ProtoOpts = [
      {env, [{dispatch, Dispatch}]}],
    cowboy:start_http(
      main_http_listener, ?CLIENT_ACCEPTORS, TransOpts, ProtoOpts).

%%
%% support routines
%%

port() ->
    {ok, Port} = application:get_env(http_port),
    Port.

routes() ->
    Root = {<<"/">>, fil_handler, [{type, root}]},
    Api  = {<<"/which-color.json">>, 
      fil_handler, [{type, which_color}]},
    [{'_', [Root, Api]}].

Next, ‘src/fil_sup.erl’ needs to be changed to supervise our finite state machine:

-module(fil_sup).
-behaviour(supervisor).

%% api
-export([start_link/0]).

%% behavior callbacks
-export([init/1]).

%%
%% api routines
%%

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%%
%% behavior callbacks
%%

init([]) ->
    Child = {tag1, 
      {fil_color_fsm, start_link, []}, 
      permanent, 5000, worker, [fil_color_fsm]},
    {ok, {{one_for_one, 5, 10}, [Child]}}.

And finally, ‘src/fil.app.src’ needs to be as follows:

{application, fil, [
  {description, "The fire-is-looming demo app."},
  {vsn, "1986.a"},
  {registered, []},
  {applications, [
                  kernel,
                  stdlib,
                  cowboy,
                  erlydtl
                 ]},
  {mod, {fil_app, []}},
  {env, []}
]}.

Our back-end is now in place, and we just need to do a few things to present our one-page HTML file.

Django templating

The Django templating project for Erlang (erlydtl) is excellent & I use it often. Although we could create a simple HTML file for this toy app, I thought it beneficial to expose how these templates interplay with hot-code loading. I have already slipped in all the necessary erlydtl configurations above, so all we need to do is create the template. Go ahead and make the directories ‘priv/dtl’ in the project root, then add ‘root.dtl’ in there with the following:

<!DOCTYPE html>
<html lang='en'>
<head>
  <meta charset='utf-8'>
</head>
<body>
  <div id='color'>
    <section>
      <h3>Hello #1</h3>
    </section>
  </div>
  <script>
    var getColor = function(callback) {
      var request = new XMLHttpRequest()
      request.onreadystatechange = function () {
        if (request.readyState === 4 && request.status === 200) {
          var resp_obj = JSON.parse(request.responseText)
          var obj      = {color : resp_obj.color}
          callback.call(obj)
        }
      }
      request.open('GET', '/which-color.json', true)
      request.setRequestHeader('Content-Type', 'application/json')
      request.send(null)
    }
    var setColor = function() {
      var el = document.getElementById('color')
      el.style.background = this.color
    }
    var loop = function() {
      getColor(setColor)
      setTimeout(loop, 1200)
    }
    loop()
  </script>
</body>
</html>

For the back-end/system hackers out there, the JS above performs a couple of things:

in a loop, call our API end-point to retrieve a color
set the top/header element’s background to that color

For the sake of this tutorial, this strategy will suffice. A more elegant approach would be to use websockets.

Initial release

We now have most of what we need for our OTP release to be built & served. We do need to configure our release with a couple of things, so let’s bang that out right now. We want to create a ‘relx.config’ in our project root with the following:

{include_erts, true}.

{default_release, {fil, "1986.a"}}.

{release, {fil, "1986.a"},
    [fil, sasl]
}.

{vm_args, "./private/vm.args"}.
{sys_config, "./private/sys.config"}.

{extended_start_script, true}.

And you will notice that we are referencing a couple of files in there that don’t exist yet. We will need to create those in a new directory called ‘private’ which sits in the project root. Our first file, ‘private/vm.args’ has the following:

-name fil@127.0.0.1
-setcookie fil123

And our second private file, ‘private/sys.config,’ has the following Erlang term:

[
  {fil, [
    {http_port, 8004}
  ]}
].
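
Bear in mind that ‘port/0’ in ‘src/fil_app.erl’ matches on ‘{ok, Port}’ and will therefore crash at start-up if ‘http_port’ is missing from the environment. If that is a concern, one option is to fall back to a default. A minimal sketch, assuming 8004 is an acceptable default and using a made-up module name:

```erlang
-module(fil_port_sketch).
-export([port/0]).

%% Assumption: 8004 is a sensible fallback when sys.config does not
%% set http_port for the fil application.
-define(DEFAULT_PORT, 8004).

%% application:get_env/3 returns the default when the key is unset.
port() ->
    application:get_env(fil, http_port, ?DEFAULT_PORT).
```

Whether a silent default beats a loud crash is a judgment call; for a toy app either will do.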

Using our installed build tools, we are ready to create our release. We do that the following way:

rebar compile
relx release tar

You will notice that a ‘_rel’ directory was created; it contains a ‘tar’ of our OTP system we can now install. Let’s work out of a ‘/tmp/fil’ directory and copy ‘_rel/fil/fil-1986.a.tar.gz’ over there now. We install our system as follows:

cd /tmp/fil
tar xvf fil-1986.a.tar.gz
./bin/fil start

If you open a web-browser to ‘127.0.0.1:8004’ — assuming all went well — then you should see a wonky web-page with a toggling green header.

Preparing version 1986.b

Our goal is to have our system update with zero down-time. In most cases, simply stopping the running release and starting the newer one would suffice, but some systems have tighter demands. So, let’s make the changes needed to have this all work.

We can start with bumping the version in ‘src/fil.app.src’ which now has the following:

{application, fil, [
  {description, "The fire-is-looming demo app."},
  {vsn, "1986.b"}, %% N.B.
  {registered, []},
  {applications, [
                  kernel,
                  stdlib,
                  cowboy,
                  erlydtl
                 ]},
  {mod, {fil_app, []}},
  {env, []}
]}.

The second task is to change our ‘src/fil_color_fsm.erl’ file. We want to serve the imminent ‘red’ instead of ‘lime.’ It now looks as follows:

-module(fil_color_fsm).
-behaviour(gen_fsm).

%% api
-export([start_link/0]).
-export([which/0]).

%% behavior callbacks
-export([init/1,
         handle_event/3,
         handle_sync_event/4,
         handle_info/3,
         terminate/3,
         code_change/4]).

%% fsm callbacks
-export([olive/2, 
         olive/3,
         green/2, 
         green/3,
         red/2, 
         red/3]).

-define(SERVER, ?MODULE).
-define(CYCLE_TIME, 4000).

-record(state, {}).

%%
%% api
%%

start_link() ->
    gen_fsm:start_link({local, ?SERVER}, ?MODULE, [], []).

which() -> 
    {Color, _State} = sys:get_state(?SERVER, 1000),
    {current_color, Color}.

%%
%% behavior callbacks
%%

init([]) ->
    process_flag(trap_exit, true),
    new(self_notify),
    {ok, olive, #state{}}.

olive(_Evnt, State) -> {next_state, olive, State}.

green(_Evnt, State) -> {next_state, green, State}.

red(_Evnt, State) -> {next_state, red, State}.

olive(_Evnt, _From, State) ->
    new(self_notify),
    Reply = ok,
    {reply, Reply, green, State}.

green(_Evnt, _From, State) ->
    new(self_notify),
    Reply = ok,
    {reply, Reply, red, State}.

red(_Evnt, _From, State) ->
    new(self_notify),
    Reply = ok,
    {reply, Reply, olive, State}.

handle_event(_Evnt, StateName, State) ->
    {next_state, StateName, State}.

handle_sync_event(_Evnt, _From, StateName, State) ->
    Reply = ok,
    {reply, Reply, StateName, State}.

handle_info(self_notify, StateName, State) ->
    new(proc, StateName),
    {next_state, StateName, State};
handle_info({'EXIT', _P, normal}, StateName, State) ->
    {next_state, StateName, State};
handle_info({'EXIT', _P, _Reason}, StateName, State) ->
    error_logger:info_msg("*** trapped non-normal exit~n"),
    {next_state, StateName, State};
handle_info(_Info, StateName, State) ->
    {next_state, StateName, State}.

terminate(_Reason, _StateName, _State) ->
    ok.

code_change(_OldVsn, StateName, State, _Extra) ->
    {ok, StateName, State}.

%% 
%% business routines
%%

new(self_notify) ->
    S    = self(),
    When = ?CYCLE_TIME,
    erlang:send_after(When, S, self_notify).

new(proc, C)    ->
    proc_lib:spawn_link(fun() -> handle_job(C) end).

handle_job(Color) ->
    ok = gen_fsm:sync_send_event(?SERVER, Color).

Also, to showcase how erlydtl behaves with hot-code swapping, we change a one-liner in ‘priv/dtl/root.dtl’ as well:

...
<section>
  <h3>Hello #2</h3>
</section>
...

So, these are our code-changes, and now we need to switch gears & prepare OTP for the delta(s).

First, familiarize yourself with relups. You will see that one has to explicitly tell the release handler what’s changed. It’s a fairly deep subject, but for our toy app, all we need is a few lines of Erlang terms placed into a config file. Since erlydtl generates ‘.beam’ files from the ‘.dtl’ source, we need to tell the release handler that we have two changes to account for — we put this all in a new file called ‘src/fil.appup.src’ which looks as follows:

{"1986.b",
  [{"1986.a", [
    {load_module, fil_color_fsm},
    {load_module, root_dtl}]}],
  [{"1986.a", [
    {load_module, fil_color_fsm},
    {load_module, root_dtl}]}]
}.

Ideally, you should peruse the relup/appup guides to grasp what this all means, but for the most part it tells the OTP release handler to prepare for the apt modifications, load, then swap the new changes in.
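
One thing to be wary of here: ‘load_module’ swaps the code but never calls ‘code_change/4’, so a server that happens to be sitting in the ‘lime’ state during the upgrade will next be asked to run ‘lime/3’, which no longer exists. In our toy app the process would most likely crash and be restarted by the supervisor. If that is not acceptable, a possible approach is to replace the ‘fil_color_fsm’ instruction in the appup with ‘{update, fil_color_fsm, {advanced, []}}’ and to map the retired state in ‘code_change/4’. The module below is only a sketch of such a callback, pulled out under a hypothetical name so it can stand alone:

```erlang
-module(fil_code_change_sketch).
-export([code_change/4]).

%% Upgrade from 1986.a: 'lime' is gone, so carry on as 'red'.
code_change(_OldVsn, lime, State, _Extra) ->
    {ok, red, State};
%% Downgrade to 1986.a: 'red' is unknown to the old code.
code_change({down, _Vsn}, red, State, _Extra) ->
    {ok, lime, State};
%% Any other state name exists in both versions.
code_change(_OldVsn, StateName, State, _Extra) ->
    {ok, StateName, State}.
```

With an ‘update’ instruction, an upgrade has the release handler suspend the process, load the new module, call ‘code_change/4’ and then resume the process.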

We want to keep this new ‘appup’ file around in source as a record, but if we don’t copy it over to ‘ebin/fil.appup’ then nothing will happen when we try to relup our new system. In the project root, do this once:

cat src/fil.appup.src > ebin/fil.appup

N.B. the difference in file extension: ‘.appup.src’ in the source directory, plain ‘.appup’ in ‘ebin’.

Finally, our ‘relx.config’ needs some tweaks, too. This version bump leaves that file with the following:

{include_erts, true}.

{default_release, {fil, "1986.b"}}.

{release, {fil, "1986.b"},
    [{fil, "1986.b", '='}, sasl]
}.

{release, {fil, "1986.a"},
    [fil, sasl]
}.

{vm_args, "./private/vm.args"}.
{sys_config, "./private/sys.config"}.

{extended_start_script, true}.

Ready for hotness

We are done with our preparation & can now update our system. Working from the project source directory, we need to compile the new code, then generate a release with a slightly different twist:

rebar compile
relx relup tar

This, again, will give us our ‘tar’ — this time version ‘1986.b’ — placed in ‘_rel/fil.’ An aside, if you blow away your ‘_rel’ directory, doing ‘relups’ from a fresh state will give you issues. It’s best to keep a working directory of all your releases.

Let’s get this new system live! Follow along with some care:

cp _rel/fil/fil-1986.b.tar.gz /tmp/fil
cd /tmp/fil
mkdir releases/1986.b
cp fil-1986.b.tar.gz releases/1986.b/fil.tar.gz

Everything is in place for the new version. If you have closed your web-browser, be sure to take another gander at the old version now by going to the ‘127.0.0.1:8004’ URL. Keeping that in your peripheral, perform the following from the ‘/tmp/fil’ directory:

./bin/fil upgrade "1986.b/fil"

If all is well, your browser should be toggling the new red color into its header. This indicates that our back-end has indeed been swapped. A quick refresh of your browser should indicate our template change, too: ‘Hello #1’ should now read ‘Hello #2.’ This indicates that our ‘front-end’ was swapped successfully.
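
For those who would rather not trust their eyes alone, the release handler can be asked what it thinks is installed. The following is a sketch meant to be run on the live node (for instance from a remote console); the module and function names are invented for this post, while ‘release_handler:which_releases/0’ is part of SASL:

```erlang
-module(fil_rel_sketch).
-export([status/0, summarize/1, exports_state/2]).

%% Lists {Version, Status} for every release known to the node.
%% Needs the sasl application running, as it is in our release.
status() ->
    summarize(release_handler:which_releases()).

%% Trims the {Name, Vsn, Apps, Status} tuples down to {Vsn, Status}.
summarize(Releases) ->
    [{Vsn, Status} || {_Name, Vsn, _Apps, Status} <- Releases].

%% True when the loaded version of Mod exports State/3, e.g.
%% exports_state(fil_color_fsm, red) after the upgrade.
exports_state(Mod, State) ->
    lists:member({State, 3}, Mod:module_info(exports)).
```

After a successful upgrade, one would expect ‘1986.b’ to show up in the output of ‘status/0’ and ‘exports_state(fil_color_fsm, red)’ to return true.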

In conclusion

One will notice that it’s quite the dance in getting hot-code loading right. It certainly comes at a cost, but thanks to ‘relx,’ a good deal of the pain is hidden (believe it or not). The rebar3 project is one to watch, too. You can expect further automation of some of the steps above as it matures.

All in all, without hot-code loading, Syncpup wouldn’t have attempted to specialize in what it does. It would just be too tough. We need to update our code-base often. Code updates while preserving customer state would be excruciating without Erlang’s hot-code loading.