Erlang Clustering on Kubernetes

Written on 2017-03-31 15:43:04

What

If you're looking to get a distributed Erlang cluster up and running on Kubernetes you'll need to take a few things into account. This post will create a basic application and provide scripts to automate the creation and deployment of a distributed Erlang application in Kubernetes.

We will be using Google's hosted Kubernetes since this is the simplest method. Not to mention that, at the time of writing (2017-03-31), they will give you $300 of credit and 12 months in which to use it.

I will assume you have already set up your kubectl and gcloud tooling; instructions for that are readily available elsewhere.

Why

Maybe you're reading this and thinking: why would I want to do this? I'm already running distributed Erlang on some other cloud provider. I've already got my automation set up to achieve as persistent a mesh network as possible. Why would I want to use kubernetes?

For one, kubernetes is a very slick set of tools. Compared to other tooling available for the bigger cloud providers, kubernetes is much simpler to configure, contains a lot of quality-of-life utilities and overall has a much tighter interface.

Secondly, containers. Kubernetes provides methods to run containers built from Dockerfiles easily. Compared to Amazon ECS, getting distributed Erlang running on kubernetes is vastly simpler. With ECS there are some annoying hoops to jump through (requiring static ports and host networking on the ECS instances) to even begin to get distributed Erlang working. This eats into your deployment options: you would need to ensure that epmd runs on a separate port for each deployment or forgo using epmd altogether.

Tools

We will be using Erlang, rebar3, Terraform, make and some shell scripts to automate the creation of the cluster. I also provide some simple libraries which ease the discovery of containers running inside your kubernetes cluster (entirely optional).

Clustering issues

The main issue when creating a distributed Erlang cluster is discovery of the Erlang nodes themselves. There are many ways to do this. The simplest is hardcoding which nodes to connect to:

Nodes = ['foo@bar.com', 'foo@baz.com'],
[net_kernel:connect(Node) || Node <- Nodes].

This assumes the nodes will always live on those exact hosts, that the hostnames will never change and, if this is the only connection code you use, that nodes will never disconnect.

You may also use the $HOME/.hosts.erlang file, with contents such as:

'bar.com'.
'foo.com'.

and then call:

net_adm:world()

which will connect to all Erlang nodes running on each hostname in that file. This also has issues: you need to populate the file initially and then keep it up to date whenever the list of Erlang hosts changes. In one legacy deployment we do in fact do this. It genuinely works and I have never had issues with it, but it doesn't feel like an elegant solution.

The issue is that if the hostnames change (for example on cloud providers where the servers running the Erlang nodes get replaced) and you don't want to, or can't, run a periodic task to update this file, then you cannot use this method.

The most robust solution that I have found with kubernetes is to use DNS records. We can ensure our pods are created and registered under a kubernetes Service, and then it's very easy to query a well-known DNS entry to retrieve the address of each individual pod.

How to do this will be explained later, as it first requires that our pods even exist.

Basic configuration

Let's get on with creating our running application.

The first step is to create the basic application skeleton:

rebar3 new app disterltest

Which should give you this basic structure:

~/dev/ cd disterltest
~/dev/disterltest tree
.
├── rebar.config
└── src
    ├── disterltest_app.erl
    ├── disterltest.app.src
    ├── disterltest.erl
    └── disterltest_sup.erl

Next, create a basic terraform description of what we need in gcloud; for ease of use, create it as main.tf in a terraform subdirectory inside the disterltest directory.

This assumes that your credentials are symlinked to /gcloud/credentials. I find that gcloud service account credentials are the easiest to work with; for simplicity, grant the service account the Owner role and symlink its key file to /gcloud/credentials.

The symlink isn't necessary but it makes sharing terraform files easier.

provider "google" {
    region = "${var.region}"
    project = "${var.project}"
    credentials = "${file("/gcloud/credentials")}"
}

variable username {}
variable password {}
variable region {}
variable project {}
variable zone {}

resource "google_container_cluster" "disterltest" {
    name = "disterltest"
    description = "disterltest"
    zone = "${var.zone}"
    initial_node_count = "3"

    master_auth {
        username = "${var.username}"
        password = "${var.password}"
    }

    node_config {
        machine_type = "n1-standard-1"

        oauth_scopes = [
            "https://www.googleapis.com/auth/compute",
            "https://www.googleapis.com/auth/devstorage.read_only",
            "https://www.googleapis.com/auth/logging.write",
            "https://www.googleapis.com/auth/monitoring",
        ]
    }
}
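To actually bring the cluster up you would populate those variables (for example via a terraform.tfvars file, which is not shown in the directory listings below; the values here are only placeholders) and run the usual terraform plan and terraform apply from the terraform directory:

# terraform/terraform.tfvars -- placeholder values, adjust to your own project
username = "admin"
password = "a-long-basic-auth-password"
region   = "europe-west1"
project  = "my-gcloud-project-id"
zone     = "europe-west1-b"

Once applied, point kubectl at the new cluster with gcloud container clusters get-credentials disterltest --zone $YOUR_ZONE.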

Our directory structure should look something like this:

~/dev/disterltest tree
├── rebar.config
├── src
│   ├── disterltest_app.erl
│   ├── disterltest.app.src
│   ├── disterltest.erl
│   └── disterltest_sup.erl
└── terraform
    └── main.tf

And a kubernetes deployment file, in disterltest/kubernetes/deployment.yml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: disterltest
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: disterltest
    spec:
      containers:
        - name: disterltest
          image: gcr.io/$YOUR_PROJECT_ID/disterltest

So far this just creates a deployment that will not actually work. For one, the $YOUR_PROJECT_ID placeholder needs substituting with your real project ID, and that docker tag doesn't exist yet; we'll fix that later. We also don't expose any ports between containers, which certainly won't help when clustering our application.

And a service file, in disterltest/kubernetes/service.yml:

apiVersion: v1
kind: Service
metadata:
  name: disterltest
  labels:
    app: disterltest
spec:
  clusterIP: None
  ports:
    - port: 10000
      targetPort: 10000
      name: disterl-mesh-0
    - port: 4369
      targetPort: 4369
      name: epmd
  selector:
    app: disterltest
  type: ClusterIP

Our directory structure should look like this:

~/dev/disterltest tree
.
├── kubernetes
│   ├── deployment.yml
│   └── service.yml
├── rebar3
├── rebar.config
├── src
│   ├── disterltest_app.erl
│   ├── disterltest.app.src
│   ├── disterltest.erl
│   └── disterltest_sup.erl
└── terraform
    └── main.tf

Next, let's create a Dockerfile that will package up our Erlang application:

FROM erlang:18

COPY . /usr/app

WORKDIR /usr/app

RUN make release

ENV RELX_REPLACE_OS_VARS true

CMD ["./_build/default/rel/disterltest/bin/disterltest", "foreground"]

Our directory structure should look like this:

~/dev/disterltest tree
.
├── Dockerfile
├── kubernetes
│   ├── deployment.yml
│   └── service.yml
├── rebar3
├── rebar.config
├── src
│   ├── disterltest_app.erl
│   ├── disterltest.app.src
│   ├── disterltest.erl
│   └── disterltest_sup.erl
└── terraform
    └── main.tf

Right. So far we've seemingly done nothing, but we now have the basic files and configuration together and can start running our distributed Erlang cluster in Kubernetes.

What we can do now is simply build and run our Erlang release on kubernetes. It won't be clustered, it won't do anything and all we will learn is that our initial application template is valid.

You will need to know the project ID of the gcloud project you have created. I've created a script deploy.sh for this purpose:

#!/bin/bash
# usage: ./deploy.sh <app-name> <gcloud-project-id>
# Builds the docker image, pushes it to the Google Container Registry and
# applies the kubernetes deployment and service.

make build-docker
export GCLOUD_APP=$1
export GCLOUD_PROJECT=$2
docker tag $1 gcr.io/$2/$1
gcloud docker -- push gcr.io/$2/$1
kubectl apply -f kubernetes/deployment.yml
kubectl apply -f kubernetes/service.yml
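Note that neither make target has actually been shown: the Dockerfile runs make release and deploy.sh runs make build-docker, but the Makefile itself never appears in the directory listings. Something like this minimal sketch would do; the target names come from those two files, everything else (including using the vendored rebar3 escript) is an assumption:

# Makefile (sketch) -- provides the targets referenced by the Dockerfile
# and deploy.sh. Recipes must be indented with tabs.
APP ?= disterltest

release:
	./rebar3 release

build-docker:
	docker build -t $(APP) .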

Once that's done, let's check if everything is running:

~/dev/disterltest kubectl get pods
NAME                           READY     STATUS    RESTARTS   AGE
disterltest-3556313269-19gmd   1/1       Running   0          1m
disterltest-3556313269-dgjzx   1/1       Running   0          1m
disterltest-3556313269-rdg3h   1/1       Running   0          1m

What this shows is that the three pods we requested are up and running in our cluster. Let's look at how we can cluster our three Erlang nodes.

kubectl exec -it disterltest-3556313269-19gmd bash
root@disterltest-3556313269-19gmd:/usr/app#

Great. Now we have a shell running on our Erlang pod. Remember the kubernetes DNS queries I mentioned before? Let's take a look at how that works.

Kubernetes DNS magic

When you create a headless Service (the clusterIP: None service above) kubernetes groups the pods matched by its selector under a stable DNS name and automatically maintains one A record per pod. The name will always be:

$SERVICE_NAME.default.svc.cluster.local

We can verify this from inside one of the pods (the erlang:18 image is Debian based, so dig is easy to install):

apt-get update
apt-get install dnsutils
dig A disterltest.default.svc.cluster.local

; <<>> DiG 9.9.5-9+deb8u10-Debian <<>> A disterltest.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52224
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;disterltest.default.svc.cluster.local. IN A

;; ANSWER SECTION:
disterltest.default.svc.cluster.local. 14 IN A  10.72.1.5
disterltest.default.svc.cluster.local. 14 IN A  10.72.2.5
disterltest.default.svc.cluster.local. 14 IN A  10.72.2.4

;; Query time: 1 msec
;; SERVER: 10.75.240.10#53(10.75.240.10)
;; WHEN: Thu May 04 20:08:43 UTC 2017
;; MSG SIZE  rcvd: 114

I have no qualms about these IPs being listed here: they are not publicly accessible, since the service we created is a plain ClusterIP service with no LoadBalancer.

As you can see there are three A records corresponding to the three Erlang nodes we have running. Now it's just a simple case of writing some Erlang code to query those A records and use them to discover our Erlang network.

-module(discover).

-export([world/0]).

%% Gives us access to the #dns_rec{} and #dns_rr{} records returned by
%% inet_res.
-include_lib("kernel/src/inet_dns.hrl").


%% Look up the A records behind the service name and connect to the
%% Erlang node we expect to find at each address. Retries forever on
%% failure (e.g. while kube-dns is still warming up).
world() ->
    try
        CName = application:get_env(disterltest, cluster_cname, "disterltest.default.svc.cluster.local"),
        {ok, Msg} = inet_res:nslookup(CName, in, a),
        ExtractedHosts = extract_hosts(Msg),
        [net_kernel:connect(Host) || Host <- ExtractedHosts],
        lager:error("~p~n", [nodes()]),
        ok
    catch
        E:R ->
            lager:error("Error looking up hosts: ~p", [{E, R, erlang:get_stacktrace()}]),
            timer:sleep(5000),
            world()
    end.


%% Pull the answer records out of the DNS response.
extract_hosts(#dns_rec{anlist=ANList}) ->
    [data_to_node_name(Data) || #dns_rr{data=Data} <- ANList].

%% Turn an IP tuple into a node name atom such as 'derl@10.72.1.5'.
data_to_node_name({A, B, C, D}) ->
    list_to_atom(lists:flatten(io_lib:format("derl@~b.~b.~b.~b", [A, B, C, D]))).

Erlang has built-in libraries for querying DNS records and also for extracting specific fields from them. Add that module to your Erlang release and re-run deploy.sh with the correct arguments.
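The post doesn't show where world/0 actually gets invoked. One minimal way (a sketch, not necessarily what the example repository does) is to call it from the application callback module that rebar3 generated, so that every pod runs discovery as soon as it boots; since distributed Erlang maintains a fully connected mesh by default, the existing nodes then learn about the newcomer automatically:

%% src/disterltest_app.erl -- sketch: run discovery once at boot, before
%% starting the supervision tree. discover:world/0 already retries on
%% failure, so this simply blocks until the node has joined the mesh.
-module(disterltest_app).
-behaviour(application).

-export([start/2, stop/1]).

start(_StartType, _StartArgs) ->
    ok = discover:world(),
    disterltest_sup:start_link().

stop(_State) ->
    ok.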

Automatic clustering

Let's get back onto one of the running Erlang pods and attach to the release:

kubectl exec -it disterltest-3556313269-19gmd bash
export TERM=xterm ## erl -remsh requires TERM to be set!!
## $NODE will be explained later
erl -remsh derl@$NODE -name foo@$NODE -setcookie derl
(derl@10.72.2.5)2> discover:world().
ok
(derl@10.72.2.5)3> nodes().
['derl@10.72.2.4','derl@10.72.1.5','foo@10.72.2.5']
(derl@10.72.2.5)4>

And bingo, the whole cluster is connected. Let's prove that when pods are removed and recreated our cluster automatically re-meshes:

kubectl get pods
NAME                           READY     STATUS    RESTARTS   AGE
disterltest-3556313269-19gmd   1/1       Running   0          45m
disterltest-3556313269-dgjzx   1/1       Running   0          45m
disterltest-3556313269-rdg3h   1/1       Running   0          45m

kubectl delete pod disterltest-3556313269-dgjzx
pod "disterltest-3556313269-dgjzx" deleted

kubectl get pods
NAME                           READY     STATUS        RESTARTS   AGE
disterltest-3556313269-19gmd   1/1       Running       0          46m
disterltest-3556313269-1jzhz   1/1       Running       0          11s
disterltest-3556313269-dgjzx   1/1       Terminating   0          46m
disterltest-3556313269-rdg3h   1/1       Running       0          46m

kubectl get pods
NAME                           READY     STATUS    RESTARTS   AGE
disterltest-3556313269-19gmd   1/1       Running   0          46m
disterltest-3556313269-1jzhz   1/1       Running   0          36s
disterltest-3556313269-rdg3h   1/1       Running   0          46m

Great. We've deleted a pod and kubernetes has automatically created another, as required by the deployment spec (replicas: 3).

Let's see if Erlang had any trouble with that:

(derl@10.72.2.5)4> nodes().
['derl@10.72.2.4','foo@10.72.2.5','derl@10.72.1.6']

As we can see, derl@10.72.1.5 is now missing; this was the pod we deleted, and in its place is derl@10.72.1.6, which has been automatically discovered and connected to.

So far a lot of details have been glossed over; if the configuration files here are simply copied and pasted, some subtle but important configuration details will be missed.

Subtleties

There are a few configuration files that I neglected to mention before because they would have clouded the main points of getting clustering working.

Rebar configuration

{erl_opts, [debug_info, {parse_transform, lager_transform}]}.
{deps, [{lager, ".*", {git, "https://github.com/basho/lager.git", {branch, "master"}}}]}.

{profiles, []}.

{shell_apps, []}.

{relx,
 [{release, {disterltest, "1"},
   [disterltest,
    runtime_tools,
    tools]},
  {include_erts, false},
  {dev_mode, false},
  {include_src, false},
  {profile, embedded},
  {vm_args, "files/vm.args"},
  {overlay, [
      {mkdir, "log/sasl"},
      {copy, "files/erl", "\{\{erts_vsn\}\}/bin/erl"},
      {copy, "files/nodetool", "\{\{erts_vsn\}\}/bin/nodetool"},
      {copy, "files/install_upgrade.escript", "bin/install_upgrade.escript"},
      {template, "files/sys.config", "releases/\{\{rel_vsn\}\}/sys.config"}
  ]},
  {extended_start_script, true}]}.

Relx and vm.args configuration

In order to configure vm.args so that Erlang starts with the proper -name value, and so that we can nicely erl -remsh into the running release, we need to set a couple of properties.

In the Dockerfile:

ENV RELX_REPLACE_OS_VARS true

And in files/vm.args, the name of the node:

-name derl@${NODE}

When the release starts with RELX_REPLACE_OS_VARS set to true, the extended start script replaces any ${VARIABLE} references in vm.args and sys.config with the values found in the environment. Using this feature we can tell Erlang to start with a host part that matches the A records kubernetes maintains.
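One detail the fragments above leave out is the Erlang cookie: the remsh session earlier passed -setcookie derl, so every node's vm.args needs to set the same cookie for the nodes to talk to each other at all. Putting it together, files/vm.args presumably looks something like this sketch:

## files/vm.args (sketch)
## ${NODE} is substituted at start-up because RELX_REPLACE_OS_VARS is set.
-name derl@${NODE}
-setcookie derl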

Additionally, we need to set that environment variable per pod in kubernetes. We do this by exposing the pod's own IP in its environment via the downward API:

env:
 - name: NODE
   valueFrom:
     fieldRef:
       fieldPath: status.podIP

This means that whenever a pod is created, $NODE will be set to that pod's IP.
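For clarity, here is where that block sits in kubernetes/deployment.yml; this is just the container section from earlier with the env entry added:

    spec:
      containers:
        - name: disterltest
          image: gcr.io/$YOUR_PROJECT_ID/disterltest
          env:
            - name: NODE
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP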

Erlang distributed port exposure

Distributed Erlang communicates over specific ports, and there are two separate pieces involved.

The first is epmd, the Erlang Port Mapper Daemon. This is a small program that runs on each machine hosting one or more Erlang beam instances; it keeps track of which node names are registered on that machine and which distribution port each corresponding beam instance is listening on. The easiest way to expose epmd is to leave the default settings and tell kubernetes to expose the default port.

apiVersion: v1
kind: Service
metadata:
  name: disterltest
  labels:
    app: disterltest
spec:
  clusterIP: None
  ports:
    - port: 10000
      targetPort: 10000
      name: disterl-mesh-0
    - port: 4369 # this item here
      targetPort: 4369
      name: epmd
  selector:
    app: disterltest
  type: ClusterIP

Here, the kubernetes service description exposes port 4369 which is the default epmd port.
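If you want to see what epmd knows about a host, you can ask it from an Erlang shell; the exact values will of course differ:

%% Ask the local epmd which node names it has registered and on which
%% distribution ports; returns {ok, [{Name, Port}]}.
net_adm:names().
%% net_adm:names("10.72.2.4") queries the epmd on a remote pod instead.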

The next configuration items are the kernel application settings inet_dist_listen_min and inet_dist_listen_max. These control the port range the Erlang beam instances themselves will communicate over; epmd records which port each individual beam instance listens on, and other Erlang nodes contact remote nodes using those ports.

In sys.config we can control these two settings by adding:

[

 {kernel, [
           {inet_dist_listen_min, 10000},
           {inet_dist_listen_max, 10005}
          ]}
].

This example sets the minimum port for the distribution protocol to 10000 and the maximum to 10005. To simplify things and avoid listing six ports in the kubernetes service description, we just set both to 10000.
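In other words, for the setup used in this post files/sys.config pins the range to a single port, something like:

[
 {kernel, [
           {inet_dist_listen_min, 10000},
           {inet_dist_listen_max, 10000}
          ]}
].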

Then, we need to tell kubernetes to expose those ports from the pods:

apiVersion: v1
kind: Service
metadata:
  name: disterltest
  labels:
    app: disterltest
spec:
  clusterIP: None
  ports:
    - port: 10000 # this item here
      targetPort: 10000
      name: disterl-mesh-0
    - port: 4369
      targetPort: 4369
      name: epmd
  selector:
    app: disterltest
  type: ClusterIP

We can see here that port 10000 has been opened.

I hope the information provided here teaches you everything you need to know in order to create distributed Erlang clusters on kubernetes.

The example repository which contains all of the code and configuration for this exists at: https://github.com/AeroNotix/disterltest

I have also wrapped up several methods of discovering Erlang nodes in a library: https://github.com/AeroNotix/werld

Properly configured, that library covers the majority of use cases and methods for discovering remote nodes, including the Kubernetes DNS method discussed in this post.

