Multi-node

At the moment, you can run nostrum in a highly available mode across multiple nodes via OTP's distributed application support, as described below. Properly distributing nostrum across multiple nodes and using them as one big entity is not supported (yet).

As a general rule: if you are running distributed Erlang over the internet, make sure to secure it with a solid VPN and/or by using TLS for Erlang distribution.
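As a sketch, TLS distribution is enabled by telling the VM to use the TLS distribution protocol and pointing it at an SSL options file. The file path and the rest of the command line below are placeholder assumptions; see the Erlang ssl application's guide on TLS distribution for the full setup, including certificate options:

```
# Placeholder sketch: start a node with TLS-protected distribution.
# /path/to/ssl_dist.conf is an assumed path to a file containing the
# server/client certfile and keyfile options for distribution sockets.
iex --sname joe --cookie foo \
    --erl "-proto_dist inet_tls -ssl_dist_optfile /path/to/ssl_dist.conf" \
    -S mix
```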

High availability

Running with OTP's distributed application support allows us to connect multiple nodes together and have your app and nostrum rescheduled on another node when things go south. Let's see how we can configure it. In this example, we will make use of three nodes, all of them run from your bot's directory. The only difference on their command lines is the --sname / --name you specify. We'll use --sname for testing here; for proper fault tolerance you will want to use multiple hosts with --name. Let's assume we name our nodes joe, robert, and mike.

Setting up distribution

The avid reader will probably know that starting with the same --cookie and --sname / --name is only step one: the nodes also need to connect to each other.

To be able to test this in interactive mode, we will configure the settings in Erlang configuration files; for releases, you can use your regular config/prod.exs. We will set up the following:

  • Instruct OTP that our app, :mybot, is a distributed app, and give it the hosts to run it on.

  • On startup, tell OTP it should wait for the other nodes to become available.

With the Erlang configuration files, this can be done as follows:

% mybot_joe.config
[{kernel,
  [{distributed, [{mybot, 5000, [joe@HOSTNAME, {mike@HOSTNAME, robert@HOSTNAME}]}]},
   {sync_nodes_mandatory, [mike@HOSTNAME, robert@HOSTNAME]},
   {sync_nodes_timeout, 30000}]}].
% mybot_robert.config
[{kernel,
  [{distributed, [{mybot, 5000, [joe@HOSTNAME, {mike@HOSTNAME, robert@HOSTNAME}]}]},
   {sync_nodes_mandatory, [joe@HOSTNAME, mike@HOSTNAME]},
   {sync_nodes_timeout, 30000}]}].
% mybot_mike.config
[{kernel,
  [{distributed, [{mybot, 5000, [joe@HOSTNAME, {mike@HOSTNAME, robert@HOSTNAME}]}]},
   {sync_nodes_mandatory, [joe@HOSTNAME, robert@HOSTNAME]},
   {sync_nodes_timeout, 30000}]}].
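For releases, the same kernel settings can be sketched in Elixir configuration instead, since :kernel is just another application to configure. This is the joe node's view; the node name atoms are assumptions you must adapt to your actual --sname / --name values:

```elixir
import Config

# Sketch of mybot_joe.config expressed as Elixir config for a release.
# Node names must match what the other nodes use on their command lines.
config :kernel,
  distributed: [{:mybot, 5000, [:joe@HOSTNAME, {:mike@HOSTNAME, :robert@HOSTNAME}]}],
  sync_nodes_mandatory: [:mike@HOSTNAME, :robert@HOSTNAME],
  sync_nodes_timeout: 30_000
```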

Note that the only thing that changes is the sync_nodes_mandatory setting, which instructs OTP which hosts to wait for on startup; the other settings must match across all nodes. These options instruct OTP that our app :mybot is distributed and should be started at :joe@HOSTNAME first. If that fails, it moves to :robert@HOSTNAME or :mike@HOSTNAME.

For details on the options, please see the kernel reference manual.

Playtest

In three distinct windows, run the following:

  1. iex --sname joe --cookie foo --erl-config mybot_joe.config -S mix
  2. iex --sname robert --cookie foo --erl-config mybot_robert.config -S mix
  3. iex --sname mike --cookie foo --erl-config mybot_mike.config -S mix

If you have some other application that breaks on startup - like monitoring exporters that bind to specific ports, or similar things - this is when it will blow up. Decide whether you indeed want to run such applications on every node, or whether to include them with your app as shown above.

You now have three instances of the VM running. :joe@HOSTNAME runs your bot right now. If you stop that node, one of the other two nodes will start running your app. High availability complete.

Being informed about takeover

Your application's start/2 callback takes a type argument. In this case, on the node that now runs your application, that type was {:failover, :joe@HOSTNAME}. If you start :joe@HOSTNAME back up, your application on :joe@HOSTNAME is started with {:takeover, source_node}, where source_node is the node it took over from.
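A sketch of how your Application module might react to these start types; the module and child names here are illustrative assumptions, not part of nostrum's API:

```elixir
defmodule MyBot.Application do
  use Application

  require Logger

  @impl true
  def start(type, _args) do
    # `type` is :normal on a regular boot, {:failover, node} on the node
    # that picked the app up after a crash, and {:takeover, node} on the
    # node that reclaims the app when it comes back.
    case type do
      {:failover, from} -> Logger.warning("Taking over from failed node #{from}")
      {:takeover, from} -> Logger.info("Taking the app back from #{from}")
      :normal -> :ok
    end

    children = [MyBot.Consumer]
    Supervisor.start_link(children, strategy: :one_for_one, name: MyBot.Supervisor)
  end
end
```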

Manual takeover

If you want to move your app around manually, you can use :application.takeover/2, for example :application.takeover(:mybot, :permanent).
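For instance, from an iex session on the node that should take ownership of the app:

```elixir
# Run on the node that should now own :mybot. :permanent is the restart
# type the application is taken over with; the call returns :ok on success.
:application.takeover(:mybot, :permanent)
```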

Final thoughts

At present, nostrum cannot perform any state synchronization between nodes; failover is effectively a restart from scratch. For most bots, this type of failover will be sufficient.