Sync’ up! … without getting drained

sep 27

An ikura app — part III

N.B. This post is deprecated. For archival purposes, it remains here, but generally, it ought to be disregarded by readers.

As we round the bend with our bonegram application, we are confronted with some hard requirements for dealing with Twitter. Although there are plenty of good-will ’bots in action on Twitter, ’bots are probably tolerated at best over there; we want to make sure we tread lightly. So, what can we do to make sure we aren’t a pest?

Design requirements

Posting a tweet with the same content twice in a row leads to an error from Twitter’s API. So, we want to design our app to never tweet the same thing twice to the same person. This is actually trickier than it seems, as we are not using a database.

A second hard requirement is that we don’t want to send huge batches of tweets. This is a bit easier to design for, but still forces us to be cautious & a tad creative.

Tweaking the code

In our ‘src/bonegram_lib.erl’ module, we previously wrote two routines that queried & collected tweets from Twitter (recall that the arity-one version gathered the delta since the last time it was run). With this already in place, we introduce new/2 and supporting functions. The finalized module looks as follows:

-module(bonegram_lib).

%% api
-export([new/0, new/1]).
-export([new/2]).

-define(KEY,          env(key)).
-define(SECRET,       env(secret)).
-define(TOKEN,        env(token)).
-define(TOKEN_SECRET, env(token_secret)).

%% 
%% api routines
%%

new() ->
    Params = basic_params(),
    handle_twitter_call(Params).

new(Id) when is_integer(Id) ->
    Params  = basic_params(),
    Params1 = [{"since_id", Id}|Params],
    handle_twitter_call(Params1).

new(tweet, {Msg, [{id_str, I}, _, {screen_name, N}, _]}) ->
    Msg1    = tweet_message(N, Msg),
    Params  = [{"status", Msg1},
              %% Twitter's statuses/update parameter is in_reply_to_status_id
              {"in_reply_to_status_id", I}],
    {ok, _} = handle_oauth(post, Params),
    ok.

%% 
%% business routines
%%

handle_twitter_call(Params) ->
    {ok, Body}      = handle_oauth(get, Params),
    {ok, [{tweets, T}, 
      {max_id, M}]} = handle_digest(Body),
    {ok, Recs}      = handle_collection(T),
    {ok, Recs, {M}}.

handle_oauth(post, Params) ->
    Fn = fun oauth:post/5,
    U  = "https://api.twitter.com/1.1/statuses/update.json",
    handle_oauth(Fn, U, Params);
handle_oauth(get, Params) ->
    Fn = fun oauth:get/5,
    U  = "https://api.twitter.com/1.1/search/tweets.json",
    handle_oauth(Fn, U, Params).

handle_oauth(Fn, U, Params) ->
    C = {?KEY, ?SECRET, hmac_sha1},
    {ok, {{_, 200, _}, _, X}} = Fn(
      U, Params, C, ?TOKEN, ?TOKEN_SECRET),
    {ok, X}.

handle_collection(L) ->
    Recs = [ [
      {id_str,      extract(<<"id_str">>, X)},
      {created_at,  extract(<<"created_at">>, X)},
      {screen_name, extract(<<"user">>, <<"screen_name">>, X)},
      {text,        extract(<<"text">>, X)}] || X <- L ],
    {ok, Recs}.

handle_digest(Body) ->
    Term   = normalize(Body),
    Tweets = extract(<<"statuses">>, Term),
    MaxId  = extract(<<"search_metadata">>, <<"max_id">>, Term),
    {ok, [{tweets, Tweets}, {max_id, MaxId}]}.

%% 
%% support routines
%%

tweet_message(N, Msg) ->
    Msg1 = [<<"@">>, N, <<": ">>, Msg],
    iolist_to_binary(Msg1).

normalize(X) ->
    Bin = list_to_binary(X),
    jsx:decode(Bin).

env(What) ->
    {ok, X} = application:get_env(bonegram, What),
    binary_to_list(X).

extract(What, X) ->
    {What, V} = lists:keyfind(What, 1, X),
    V.

extract(WhatA, WhatB, X) ->
    Y = extract(WhatA, X),
    extract(WhatB, Y).

basic_params() -> 
    [{"q", "-his -her -almost broke my arm -filter:retweets"},
     {"result_type", "recent"},
     {"lang", "en"},
     {"count", 100}].

This module is now capable of tweeting; new/2 takes details from what it gathers via new/1 & updates bonegram’s Twitter status. For every broken arm in the Twitter-verse, bonegram now will tweet one of these (for example):

“@twitterMan21: Ugg.. no fun having a broken arm. Feel better :)”
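
The reply text above is built as an iolist and then flattened into a single binary. A minimal standalone sketch of that step follows; the module name ‘tweet_message_demo’ is hypothetical and exists only for illustration, while the function body mirrors tweet_message/2 from the listing:

```erlang
-module(tweet_message_demo).
-export([tweet_message/2]).

%% Prefix the reply with the author's handle, then flatten the
%% iolist into one binary. Both arguments are expected to be binaries
%% (or any other iodata).
tweet_message(ScreenName, Msg) ->
    iolist_to_binary([<<"@">>, ScreenName, <<": ">>, Msg]).
```

Calling tweet_message_demo:tweet_message(<<"twitterMan21">>, <<"get better soon :)">>) returns <<"@twitterMan21: get better soon :)">>.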

Safeguarding bonegram

When we start the bonegram application, the initial ikura call should prime our application and nothing more. On every subsequent ikura call we send out bonegram tweets as usual. What we don’t want is to tweet to the initial collection. Why? It is a safeguard: if we tweeted to the first collection, those users could hear from us every time we stopped & started the application, which would violate the hard requirements above. It’s not that we plan to restart the app a lot; we just want to rule out multiple tweets to the same person.
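
The priming rule can be sketched as a pure function, separate from any process or network code. This is only an illustration under stated assumptions: the module ‘prime_demo’ and the function poll/2 are hypothetical names, and the search result is passed in as a plain {Tweets, MaxId} tuple rather than fetched from Twitter.

```erlang
-module(prime_demo).
-export([poll/2]).

%% Hypothetical stand-in for the polling rule described above.
%% LastId is 'undefined' until the first poll has completed.
%% Fetched is the list of tweets the search returned; MaxId is the
%% newest tweet id reported alongside them.
%% Returns {Reply, NewLastId}.

%% First poll: remember where we are, but tag the batch as
%% 'initialized' so the caller knows not to reply to it.
poll(undefined, {Fetched, MaxId}) ->
    {{initialized, Fetched}, MaxId};
%% Later poll with nothing new: keep the old bookmark.
poll(LastId, {[], _MaxId}) ->
    {{active, []}, LastId};
%% Later poll with new tweets: hand them over and advance the bookmark.
poll(_LastId, {Fetched, MaxId}) ->
    {{active, Fetched}, MaxId}.
```

Because the bookmark starts out as ‘undefined’ after every restart, the first batch is always tagged ‘initialized’, and only batches tagged ‘active’ are candidates for replies.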

Storing progress

To hold our state, handle errors, and behave predictably, we will use a supervised ‘gen_server’ behavior. A new module is put into place without too much thinking; ‘src/bonegram_srv.erl’ is as follows:

-module(bonegram_srv).
-behaviour(gen_server).

%% api
-export([start_link/0]).
-export([latest/0]).
-export([tweet/2]).
-export([random_message/0]).

%% behavior callbacks
-export([init/1,
         handle_call/3,
         handle_cast/2,
         handle_info/2,
         terminate/2,
         code_change/3]).

-record(state, {last_id}).

-define(SERVER, ?MODULE).

%%
%% api
%%

start_link() ->
    gen_server:start_link({local, ?SERVER}, ?MODULE, [], []).

latest() ->
    gen_server:call(?SERVER, latest).

tweet(Msg, Info) ->
    gen_server:cast(?SERVER, {tweet, Msg, Info}).

random_message() -> 
    gen_server:call(?SERVER, random_message).

%%
%% behavior callbacks
%%

init([]) ->
    process_flag(trap_exit, true),
    propagate_random_seed(),
    {ok, #state{}}.

handle_call(latest, _From, #state{last_id=undefined}) ->
    {ok, MaybeNew, {Latest}} = bonegram_lib:new(),
    State = #state{last_id=Latest},
    {reply, {initialized, MaybeNew}, State};
handle_call(latest, _From, #state{last_id=Id} = State) ->
    {ok, MaybeNew, {Latest}} = bonegram_lib:new(Id),
    State1 = maybe_new_state(MaybeNew, Latest, State),
    {reply, {active, MaybeNew}, State1};
handle_call(random_message, _From, State) ->
    Msg = rand_message(),
    {reply, Msg, State};
handle_call(_Request, _From, State) ->
    Reply = ok,
    {reply, Reply, State}.

handle_cast({tweet, Msg, Info}, State) ->
    proc_lib:spawn_link(
      fun() -> bonegram_lib:new(tweet, {Msg, Info}) end),
    {noreply, State};
handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info({'EXIT', _Proc, normal}, State) ->
    {noreply, State};
handle_info(Info, State) ->
    error_logger:error_report({proc_lib_tweet_failed, Info}),
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

%%
%% support routines
%%

maybe_new_state([], _, #state{} = S) -> S;
maybe_new_state(_, L, #state{} = S)  -> S#state{last_id=L}.

propagate_random_seed() ->
    <<A:32, B:32, C:32>> = crypto:rand_bytes(12),
    random:seed(A, B, C).

rand_message() ->
    M = messages(),
    L = length(M),
    N = random:uniform(L),
    lists:nth(N, M).

messages() -> 
    [<<"heard about the arm :( sending you some love.">>, 
     <<"bummer news about the broken arm :( Feel better.">>, 
     <<"no fun getting a broken arm. :( :( :(">>,
     <<"ouch :( Heal up quick & feel better!">>, 
     <<"hope you bounce back quick. Broken arm's no fun :(">>,
     <<"broken arm :( wishing you better times!">>, 
     <<"darn bodies. Hope you heal up fast :)">>, 
     <<"get better soon :)">>, 
     <<"get well soon :(">>, 
     <<"that sucks about the arm. Feel better :)">>, 
     <<"nothing worse. Get well soon :)">>].
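
The message pick in rand_message/0 relies on the ‘random’ module with an explicit seed. As a hedged alternative, assuming an OTP release that ships the ‘rand’ module (OTP 18 or later), the same pick can be sketched without manual seeding, since ‘rand’ seeds the calling process on first use. The module name ‘rand_pick_demo’ and the function pick/1 are hypothetical:

```erlang
-module(rand_pick_demo).
-export([pick/1]).

%% Pick one element of a non-empty list uniformly at random.
%% rand:uniform(N) returns an integer in 1..N, which is used as a
%% 1-based index into the list.
pick(List) when is_list(List), List =/= [] ->
    lists:nth(rand:uniform(length(List)), List).
```

For example, rand_pick_demo:pick(Messages) returns one element of Messages, where Messages is any non-empty list such as the one returned by messages/0.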

And our supervisor (‘src/bonegram_sup.erl’) module looks as follows:

-module(bonegram_sup).
-behaviour(supervisor).

%% api
-export([start_link/0]).

%% behavior callbacks
-export([init/1]).

-define(SERVER, ?MODULE).

start_link() ->
    supervisor:start_link({local, ?SERVER}, ?MODULE, []).

%%
%% behavior callbacks
%%

init([]) ->
    ChildA = {tag1, {bonegram_srv, start_link, []},
      permanent, 5000, worker, [bonegram_srv]},
    {ok, { {one_for_one, 5, 10}, [ChildA]} }.

We will also augment our ‘src/bonegram_app.erl’ module one last time:

-module(bonegram_app).
-behaviour(application).

%% behavior callbacks
-export([start/2, stop/1]).

-define(CLIENT_ACCEPTORS, 10).

%%
%% behavior callbacks
%%

start(_StartType, _StartArgs) ->
    Port      = port(),
    Routes    = routes(),
    Dispatch  = cowboy_router:compile(Routes),
    TransOpts = [{port, Port}],
    ProtoOpts = [{env, [{dispatch, Dispatch}]}],
    {ok, _}   = cowboy:start_http(
      main_http_listener, ?CLIENT_ACCEPTORS, TransOpts, ProtoOpts),
    bonegram_sup:start_link().

stop(_State) ->
    ok.

%%
%% support routines
%%

port() ->
    {ok, Port} = application:get_env(http_port),
    Port.

routes() ->
    [{'_', [latest_route(), test_route()]}].

latest_route() -> 
    {<<"/latest.json">>, bonegram_handler, [{type, latest}]}.

test_route() -> 
    {<<"/test">>, bonegram_handler, [{type, test}]}.

You can see that bonegram will talk with ikura via one HTTP endpoint: ‘/latest.json.’ Notice, we preserved the ‘/test’ endpoint as well. You will see why, soon enough.

Cowboy glue

We have arrived at the end of our coding chores. Recall that our old ‘src/bonegram_handler.erl’ answered to ‘/test’ requests. That has not changed, but we will now build out the new ‘/latest.json’ handler, which wires our whole application together. Our last hacks are as follows:

-module(bonegram_handler).

%% cowboy callbacks
-export([init/2]).
-export([content_types_provided/2]).
-export([is_authorized/2]).

%% user-def callbacks
-export([handle_json/2]).

-record(state, {type, ip}).

-define(WHITELIST_IPS, [{192, 241, 197, 30}, {127, 0, 0, 1}]).

%% 
%% cowboy callbacks
%% 

init(Req, [{type, T}] = _Opts) ->
    {IP, _} = cowboy_req:peer(Req),
    State   = #state{type=T, ip=IP},
    {cowboy_rest, Req, State}.

content_types_provided(Req, State) ->
    provided(Req, State).

is_authorized(Req, #state{ip=IP, type=latest} = S) ->
    {ok, Res} = handle_ip_authorization(IP),
    {Res, Req, S};
is_authorized(Req, State) ->
    {true, Req, State}.

%%
%% user-def callbacks
%%

handle_json(Req, #state{type=latest} = State) ->
    {ok, M} = handle_latest(),
    Json    = jsx:encode([{tweets, M}]),
    {Json, Req, State};
handle_json(Req, State) ->
    Json = "{\"status\": \"todo\"}\n",
    {Json, Req, State}.

%%
%% business routines
%%

handle_latest() ->
    Res = bonegram_srv:latest(),
    M   = maybe_tweet(Res),
    {ok, M}.

handle_ip_authorization(I) ->
    B = lists:member(I, ?WHITELIST_IPS),
    handle_ip_authorization1(B).

handle_ip_authorization1(true)  -> {ok, true};
handle_ip_authorization1(false) -> {ok, {false, <<"">>}}.

%%
%% support routines
%%

maybe_tweet({initialized, _}) -> <<"initialized">>;
maybe_tweet({active, L}) -> 
    Msgs = [ bonegram_srv:random_message() || _X <- L ],
    All  = lists:zip(Msgs, L),
    [ tweet(M, X) || {M, X} <- All ],
    length(L).

tweet(Msg, Info) ->
    bonegram_srv:tweet(Msg, Info).

provided(Req, State) ->
    {[json_type()], Req, State}.

json_type() ->
    {{<<"application">>, <<"json">>, []}, handle_json}.

This module is the meat & potatoes of our application’s flow. There is a good amount going on, but mainly it works like this:

  1. When ‘/latest.json’ is first called, it initializes our server state and returns ‘{"tweets" : "initialized"}’. This happens only once. Remember, this corner case exists so we adhere to our hard requirements.
  2. For all subsequent calls, it returns a count of the messages bonegram tweeted, e.g. ‘{"tweets" : 3}’.

But ‘src/bonegram_handler.erl’ does even more. If you look closely, you will notice that unless our app is called locally or from ikura’s IP, it returns a 401 status code and the ‘/latest.json’ handler never runs. This is good; we don’t want, say, ‘https://your-bone-url.cc/latest.json’ to be publicly reachable for just anyone to call. Only ikura has permission to run the code at this endpoint. We deliberately left the ‘/test’ endpoint public, however, so you can verify this to be true.
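
The whitelist check itself needs no Cowboy to try out. Here is a minimal sketch, assuming the same two addresses as the listing; the module name ‘whitelist_demo’ and the function authorized/1 are hypothetical. The return values have the shape the handler passes back from is_authorized/2: ‘true’ lets the request through, and ‘{false, <<"">>}’ is what produces the 401.

```erlang
-module(whitelist_demo).
-export([authorized/1]).

%% Same addresses as in bonegram_handler: ikura's IP and localhost.
-define(WHITELIST_IPS, [{192, 241, 197, 30}, {127, 0, 0, 1}]).

%% IP is a 4-tuple such as {127, 0, 0, 1}.
%% Returns true for a whitelisted peer, otherwise {false, <<"">>},
%% where the binary is the (empty) authentication header value.
authorized(IP) ->
    case lists:member(IP, ?WHITELIST_IPS) of
        true  -> true;
        false -> {false, <<"">>}
    end.
```

whitelist_demo:authorized({127, 0, 0, 1}) returns true, while whitelist_demo:authorized({10, 0, 0, 1}) returns {false, <<"">>}.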

(It’s up to you to create an OTP release of this code & put the app up on a server for ikura to call.)

Ikura will-call

Over at ikura.co, adding a timer to call our application could not be easier. The Channel (e.g. http://123.344.555.666:8004) & Pool (i.e. /latest.json) are the only two pieces of information we need. We set the timing service for every fifteen minutes, and presto! — we have a love-sharing Twitter ’bot tasked to run four times an hour. Easy. (And reliable.)