Recent Content

Erlang Clustering on Kubernetes
posted on 2017-03-31 15:43:04

What

If you're looking to get a distributed Erlang cluster up and running on Kubernetes, you'll need to take a few things into account. This post will create a basic application and provide scripts to automate the creation and deployment of a distributed Erlang application on Kubernetes.

We will be using Google's hosted Kubernetes since this is the simplest method. Not to mention that right now (2017-03-31) they will give you $300 of credit and 12 months in which to use it.

I will assume you have already set up your kubectl and gcloud tooling, as instructions for doing that are readily available elsewhere.

Why

Maybe you're reading this and thinking: why would I want to do this? I'm already running distributed Erlang on some other cloud provider. I've already got my automation set up to achieve a mesh network that is as persistent as possible. Why do I want to use kubernetes?

For one, kubernetes is a very slick set of tools. Compared to other tooling available for the bigger cloud providers, kubernetes is much simpler to configure, contains a lot of quality-of-life utilities and overall has a much tighter interface.

Secondly, containers. Kubernetes makes it easy to run Docker images. Compared to getting distributed Erlang running on Amazon ECS, kubernetes is vastly simpler. With ECS there are some annoying hoops to jump through (requiring static ports, host networking) on the ECS instances to even begin to get distributed Erlang working. This eats into the possibilities for deployment: you would need to ensure that epmd runs on a separate port for each deployment, or forgo using epmd altogether.

Tools

We will be using Erlang, rebar3, Terraform, make and some shell scripts to automate the creation of the cluster. I also provide some simple libraries which ease the discovery of containers running inside your kubernetes cluster (entirely optional).

Clustering issues

The main issue when creating a distributed Erlang cluster is discovery of the Erlang nodes themselves. There are many ways to do this. The simplest is hardcoding which nodes to connect to:

Nodes = ['foo@bar.com', 'foo@baz.com'],
[net_kernel:connect(Node) || Node <- Nodes].

This assumes everything will always live on the same hosts, that those hostnames will never change and, if you use only this kind of code, that nodes will never disconnect.

You may also use the $HOME/.hosts.erlang file, with contents such as:

'bar.com'.
'foo.com'.
Calling net_adm:world() then connects to all Erlang nodes running on each hostname in that file. This also has issues: you need to populate the file initially and then update it whenever the list of Erlang hosts changes. In one legacy deployment we do in fact do this. It genuinely works and I have never had issues with it, but it doesn't feel like an elegant solution.

The issue is that if the hostnames change, for example when running with cloud providers where the servers hosting the Erlang nodes are regularly replaced, or if you can't or don't want to create a periodic task to update this file, then you cannot use this method.

The most robust solution that I have found with kubernetes is to use DNS records. We can ensure our pods are created and registered under a kubernetes Service, and then it's very easy to query a well-known DNS entry to retrieve each individual pod.

How to do this will be explained later, as it first requires that our pods even exist.

Basic configuration

Let's get on with creating our running application.

The first step is to create the basic application skeleton:

rebar3 new app disterltest

Which should give you this basic structure:

~/dev/ cd disterltest
~/dev/disterltest tree
.
├── rebar.config
└── src
    ├── disterltest_app.erl
    ├── disterltest.app.src
    ├── disterltest.erl
    └── disterltest_sup.erl

And create a basic terraform description of what we need in gcloud. For ease of use, create it in a terraform subdirectory inside the disterltest directory.

I find that gcloud service account credentials are the easiest to work with; for simplicity, grant the service account Owner permissions. This configuration assumes that the credentials file is symlinked to /gcloud/credentials.

The symlink isn't necessary but it makes sharing terraform files easier.

provider "google" {
    region = "${var.region}"
    project = "${var.project}"
    credentials = "${file("/gcloud/credentials")}"
}

variable username {}
variable password {}
variable region {}
variable project {}
variable zone {}

resource "google_container_cluster" "disterltest" {
    name = "disterltest"
    description = "disterltest"
    zone = "${var.zone}"
    initial_node_count = "3"

    master_auth {
        username = "${var.username}"
        password = "${var.password}"
    }

    node_config {
        machine_type = "n1-standard-1"

        oauth_scopes = [
            "https://www.googleapis.com/auth/compute",
            "https://www.googleapis.com/auth/devstorage.read_only",
            "https://www.googleapis.com/auth/logging.write",
            "https://www.googleapis.com/auth/monitoring",
        ]
    }
}
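
To stand up the cluster, run terraform from inside that directory, supplying the declared variables. A sketch invocation; all of the values here are placeholders for your own project, region and desired master credentials:

cd terraform
terraform apply \
    -var 'region=europe-west1' \
    -var 'zone=europe-west1-b' \
    -var 'project=my-project-id' \
    -var 'username=admin' \
    -var 'password=a-long-admin-password'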

Our directory structure should look something like this:

~/dev/disterltest tree
.
├── rebar.config
├── src
│   ├── disterltest_app.erl
│   ├── disterltest.app.src
│   ├── disterltest.erl
│   └── disterltest_sup.erl
└── terraform
    └── main.tf

And a kubernetes deployment file, in disterltest/kubernetes/deployment.yml:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: disterltest
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: disterltest
    spec:
      containers:
        - name: disterltest
          image: gcr.io/$YOUR_PROJECT_ID/disterltest

So far this just creates a deployment that will not work. For one, the $YOUR_PROJECT_ID placeholder needs replacing with your actual project ID, since that docker tag won't exist otherwise. We'll fix that later. We also don't expose any ports between containers, which certainly won't help when it comes to clustering our application.

And a service file, in disterltest/kubernetes/service.yml:

apiVersion: v1
kind: Service
metadata:
  name: disterltest
  labels:
    app: disterltest
spec:
  clusterIP: None
  ports:
    - port: 10000
      targetPort: 10000
      name: disterl-mesh-0
    - port: 4369
      targetPort: 4369
      name: epmd
  selector:
    app: disterltest
  type: ClusterIP

Our directory structure should look like this:

~/dev/disterltest tree
.
├── kubernetes
│   ├── deployment.yml
│   └── service.yml
├── rebar3
├── rebar.config
├── src
│   ├── disterltest_app.erl
│   ├── disterltest.app.src
│   ├── disterltest.erl
│   └── disterltest_sup.erl
└── terraform
    └── main.tf

Next, let's create a Dockerfile that will package up our Erlang application:

FROM erlang:18

COPY . /usr/app

WORKDIR /usr/app

RUN make release

ENV RELX_REPLACE_OS_VARS true

CMD ["./_build/default/rel/disterltest/bin/disterltest", "foreground"]

Our directory structure should look like this:

~/dev/disterltest tree
.
├── Dockerfile
├── kubernetes
│   ├── deployment.yml
│   └── service.yml
├── rebar3
├── rebar.config
├── src
│   ├── disterltest_app.erl
│   ├── disterltest.app.src
│   ├── disterltest.erl
│   └── disterltest_sup.erl
└── terraform
    └── main.tf

Right. So far we've seemingly done nothing, but we now have the basic files and configuration together, so we can start running our distributed Erlang cluster on kubernetes.

What we can do now is simply build and run our Erlang release on kubernetes. It won't be clustered and it won't do anything; all we will learn is that our initial application template is valid.

You will need to know the project ID of the gcloud project you have created. I've created a script deploy.sh for this purpose:

#!/bin/bash

## Build the Docker image locally (expects a build-docker target in the Makefile)
make build-docker
export GCLOUD_APP=$1
export GCLOUD_PROJECT=$2
## Tag the image for the Google Container Registry and push it there
docker tag $1 gcr.io/$2/$1
gcloud docker -- push gcr.io/$2/$1
## Create or update the deployment and service in the kubernetes cluster
kubectl apply -f kubernetes/deployment.yml
kubectl apply -f kubernetes/service.yml
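
The script takes the application name and the gcloud project ID as positional arguments, so an invocation looks something like this (the project ID here is hypothetical):

./deploy.sh disterltest my-gcloud-project-123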

Once that's done, let's check if everything is running:

~/dev/disterltest kubectl get pods
NAME                           READY     STATUS    RESTARTS   AGE
disterltest-3556313269-19gmd   1/1       Running   0          1m
disterltest-3556313269-dgjzx   1/1       Running   0          1m
disterltest-3556313269-rdg3h   1/1       Running   0          1m

What this shows is that the three pods we requested are up and running in our cluster. Let's look at how we can cluster our three Erlang nodes.

kubectl exec -it disterltest-3556313269-19gmd bash
root@disterltest-3556313269-19gmd:/usr/app#

Great. Now we have a shell running on our Erlang pod. Remember the kubernetes DNS queries I mentioned before? Let's take a look at how that works.

Kubernetes DNS magic

When you create a Service whose selector covers the pods of a deployment, kubernetes assigns it a stable DNS name and automatically maintains the A records under it. Because our service sets clusterIP: None (a so-called headless service), those A records point at each individual pod rather than at a single load-balanced virtual IP. The name will always be:

$SERVICE_NAME.default.svc.cluster.local

Here $SERVICE_NAME is the name field from our service description; in our case, disterltest. From the shell on our pod we can query it (the erlang:18 image doesn't ship dig, so install dnsutils first):

apt-get update
apt-get install dnsutils
dig A disterltest.default.svc.cluster.local

; <<>> DiG 9.9.5-9+deb8u10-Debian <<>> A disterltest.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52224
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;disterltest.default.svc.cluster.local. IN A

;; ANSWER SECTION:
disterltest.default.svc.cluster.local. 14 IN A  10.72.1.5
disterltest.default.svc.cluster.local. 14 IN A  10.72.2.5
disterltest.default.svc.cluster.local. 14 IN A  10.72.2.4

;; Query time: 1 msec
;; SERVER: 10.75.240.10#53(10.75.240.10)
;; WHEN: Thu May 04 20:08:43 UTC 2017
;; MSG SIZE  rcvd: 114

I have no qualms with these IPs being listed here, as they are not publicly accessible: the service we created does not use a LoadBalancer type.

As you can see there are three A records corresponding to the three Erlang nodes we have running. Now it's just a simple case of writing some Erlang code to query those A records and use them to discover our Erlang network.

-module(discover).

-export([world/0]).

-include_lib("kernel/src/inet_dns.hrl").


world() ->
    try
        %% The well-known service DNS name; overridable via application config.
        CName = application:get_env(disterltest, cluster_cname, "disterltest.default.svc.cluster.local"),
        %% Look up all A records for the service; one per pod.
        {ok, Msg} = inet_res:nslookup(CName, in, a),
        ExtractedHosts = extract_hosts(Msg),
        [net_kernel:connect(Host) || Host <- ExtractedHosts],
        %% Log the resulting node list.
        lager:error("~p~n", [nodes()]),
        ok
    catch
        E:R ->
            lager:error("Error looking up hosts: ~p", [{E, R, erlang:get_stacktrace()}]),
            timer:sleep(5000),
            world()
    end.


%% Pull each A record out of the answer section of the DNS response.
extract_hosts(#dns_rec{anlist=ANList}) ->
    [data_to_node_name(Data) || #dns_rr{data=Data} <- ANList].

%% Turn an IPv4 address tuple into a node name atom such as 'derl@10.72.1.5'.
data_to_node_name({A, B, C, D}) ->
    list_to_atom(lists:flatten(io_lib:format("derl@~b.~b.~b.~b", [A, B, C, D]))).

Erlang has built-in libraries for querying DNS records and also for extracting specific fields from them. Add that module to your Erlang release and re-run deploy.sh with the correct arguments.

Automatic clustering

Let's get back onto one of the running Erlang pods and attach to the release:

kubectl exec -it disterltest-3556313269-19gmd bash
export TERM=xterm ## erl -remsh requires TERM to be set!!
## $NODE will be explained later
erl -remsh derl@$NODE -name foo@$NODE -setcookie derl
(derl@10.72.2.5)2> discover:world().
ok
(derl@10.72.2.5)3> nodes().
['derl@10.72.2.4','derl@10.72.1.5','foo@10.72.2.5']
(derl@10.72.2.5)4>

And bingo, our whole cluster is connected. Let's prove that when pods are removed and recreated our cluster is automatically re-meshed:

kubectl get pods
NAME                           READY     STATUS    RESTARTS   AGE
disterltest-3556313269-19gmd   1/1       Running   0          45m
disterltest-3556313269-dgjzx   1/1       Running   0          45m
disterltest-3556313269-rdg3h   1/1       Running   0          45m

kubectl delete pod disterltest-3556313269-dgjzx
pod "disterltest-3556313269-dgjzx" deleted

kubectl get pods
NAME                           READY     STATUS        RESTARTS   AGE
disterltest-3556313269-19gmd   1/1       Running       0          46m
disterltest-3556313269-1jzhz   1/1       Running       0          11s
disterltest-3556313269-dgjzx   1/1       Terminating   0          46m
disterltest-3556313269-rdg3h   1/1       Running       0          46m

kubectl get pods
NAME                           READY     STATUS    RESTARTS   AGE
disterltest-3556313269-19gmd   1/1       Running   0          46m
disterltest-3556313269-1jzhz   1/1       Running   0          36s
disterltest-3556313269-rdg3h   1/1       Running   0          46m

Great. We've deleted a pod and kubernetes has automatically created another, as per the deployment description (replicas: 3).

Let's see if Erlang had any trouble with that:

(derl@10.72.2.5)4> nodes().
['derl@10.72.2.4','foo@10.72.2.5','derl@10.72.1.6']

As we can see here, derl@10.72.1.5 is now missing; this was the pod that we deleted, and in its place is derl@10.72.1.6. We've automatically discovered and connected to the new remote Erlang node.

So far a lot of details have been glossed over; if the configuration files here are simply copied and pasted, some subtle but important configuration details will be missed.

Subtleties

There are a few configuration files that I neglected to mention before because they would cloud the main points of getting clustering working.

Rebar configuration

{erl_opts, [debug_info, {parse_transform, lager_transform}]}.
{deps, [{lager, ".*", {git, "https://github.com/basho/lager.git", {branch, "master"}}}]}.

{profiles, []}.

{shell_apps, []}.

{relx,
 [
  {release, {disterltest, "1"},
   [
    disterltest,
    runtime_tools,
    tools
   ]},
  {include_erts, false},
  {dev_mode, false},
  {include_src, false},
  {profile, embedded},
  {vm_args, "files/vm.args"},
  {overlay, [
      {mkdir, "log/sasl"},
      {copy, "files/erl", "{{erts_vsn}}/bin/erl"},
      {copy, "files/nodetool", "{{erts_vsn}}/bin/nodetool"},
      {copy, "files/install_upgrade.escript", "bin/install_upgrade.escript"},
      {template, "files/sys.config", "releases/{{rel_vsn}}/sys.config"}
  ]},
  {extended_start_script, true}]}.

  • We pull in lager so the code examples work
  • We override some default configuration so the release size is smaller

Relx and vm.args configuration

In order to configure the vm.args such that Erlang is started with the proper -name value, and so that we can nicely erl -remsh into the running Erlang release, we need to set a couple of properties:

  • In the Dockerfile we set:
ENV RELX_REPLACE_OS_VARS true
  • In the vm.args we set:

## Name of the node
-name derl@${NODE}
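
The erl -remsh example earlier also passed -setcookie derl, which assumes every node shares that cookie. A sketch of pinning it in vm.args as well; the cookie value is arbitrary, but it must match on all nodes:

## Shared secret for distributed Erlang
-setcookie derl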

When the release starts with RELX_REPLACE_OS_VARS set, the start script will replace any ${VARIABLE} directives in the files relx templates with the value found in the environment. Using this feature we can tell Erlang to start on the correct hostname, matching what will be found in the A records that kubernetes sets.

Additionally, we need to set this environment variable per pod in kubernetes. We do this by adding the IP of the pod to the pod's environment:

env:
 - name: NODE
   valueFrom:
     fieldRef:
       fieldPath: status.podIP

This means that whenever the pod is created $NODE will have the pod's IP set.
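
Putting the two snippets shown so far together, the container entry in kubernetes/deployment.yml ends up looking like this:

      containers:
        - name: disterltest
          image: gcr.io/$YOUR_PROJECT_ID/disterltest
          env:
            - name: NODE
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP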

Erlang distributed port exposure

Distributed Erlang communicates over specific ports, using two different mechanisms.

The first is epmd, the Erlang Port Mapper Daemon. This is a small program that runs on each machine where one or more Erlang beam instances are running; it keeps track of which specific ports those individual beam instances are communicating on and what kind of communication they are exporting (-sname versus -name). The easiest way to expose epmd is to leave the default settings and tell kubernetes to expose the default port.

apiVersion: v1
kind: Service
metadata:
  name: disterltest
  labels:
    app: disterltest
spec:
  clusterIP: None
  ports:
    - port: 10000
      targetPort: 10000
      name: disterl-mesh-0
    - port: 4369 # this item here
      targetPort: 4369
      name: epmd
  selector:
    app: disterltest
  type: ClusterIP

Here, the kubernetes service description exposes port 4369 which is the default epmd port.

The next configuration item is what's labelled by Erlang as:

  • inet_dist_listen_min
  • inet_dist_listen_max

These two configuration items (how to set them is explained shortly) control the port range on which the Erlang beam instances themselves will communicate. epmd knows which ports the individual beam instances are listening on, and remote Erlang nodes contact them using those ports.

In sys.config we can control these two settings by adding:

[

 {kernel, [
           {inet_dist_listen_min, 10000},
           {inet_dist_listen_max, 10005}
          ]}
].

This will set the minimum port for the distribution protocol to 10000 and the maximum to 10005. This is an example configuration; to simplify things and prevent unnecessary duplication in the kubernetes service description, we just set them both to 10000.
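
With both values pinned to the single port our service description exposes, the kernel section becomes:

[
 {kernel, [
           {inet_dist_listen_min, 10000},
           {inet_dist_listen_max, 10000}
          ]}
].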

Then, we need to tell kubernetes to expose those ports from the pods:

apiVersion: v1
kind: Service
metadata:
  name: disterltest
  labels:
    app: disterltest
spec:
  clusterIP: None
  ports:
    - port: 10000 # this item here
      targetPort: 10000
      name: disterl-mesh-0
    - port: 4369
      targetPort: 4369
      name: epmd
  selector:
    app: disterltest
  type: ClusterIP

We can see here that port 10000 has been opened.

Using the information provided here, I hope you now know everything you need in order to create distributed Erlang clusters on kubernetes.

The example repository which contains all of the code and configuration for this exists at: https://github.com/AeroNotix/disterltest

I also have wrapped up several methods of discovering Erlang nodes in a library: https://github.com/AeroNotix/werld

Properly configured, that library covers the majority of use cases and methods for discovering remote nodes, including the kubernetes DNS method discussed in this post.

Amazon ELB, WebSockets and client IP detection
posted on 2016-02-02 12:30:00

If you're using Amazon's ELB service with WebSockets, you probably already know you'll need to set the instance protocol to TCP, or else the WebSocket will not be able to route messages between the frontend and the backend. This is an inherent limitation of Amazon's ELB that doesn't seem to be changing any time soon: this ticket has been open for over four years at the time of writing, so we can safely assume that Amazon does not deem this issue worth fixing.

This poses several problems:

  • You cannot have the X-Forwarded-For header for TCP routes
  • You are required to enable the proxy protocol to retrieve client connection information

In this post I'll enumerate several issues I found when implementing proxy protocol support for an Erlang application. Although the examples are Erlang-specific, the fundamentals apply to other platforms.

I chose to run my application and integration tests with proxy protocol support enabled, so that my local environment matches the EC2 environment as closely as possible. To achieve this I opted to use HAProxy locally, because it natively supports the proxy protocol.

Your local HAProxy configuration should contain, at a minimum, the following:


global
    stats timeout 30s
    maxconn 1024
    crt-base /path/to/your/crt/base
    ssl-default-bind-ciphers kEECDH+aRSA+AES:kRSA+AES:+AES256:RC4-SHA:!kEDH:!LOW:!EXP:!MD5:!aNULL:!eNULL

defaults
    log global
    mode http
    timeout connect  5000
    timeout client  50000
    timeout server  50000

frontend device
    mode http
    bind 0.0.0.0:8543 ssl crt cert.pem
    reqadd X-Forwarded-Proto:\ https
    acl is_websocket hdr(Upgrade) -i WebSocket
    acl is_websocket hdr_beg(Host) -i ws
    use_backend backend-ws if is_websocket
    default_backend backend

backend backend
    mode http
    server srv-1 0.0.0.0:8180 check

backend backend-ws
    mode http
    server srv-1 0.0.0.0:8180 check send-proxy

I'll break this down in parts to explain.

Here we just set up some simple defaults which apply to all of HAProxy:


global
    stats timeout 30s
    maxconn 1024
    crt-base /path/to/your/crt/base
    ssl-default-bind-ciphers kEECDH+aRSA+AES:kRSA+AES:+AES256:RC4-SHA:!kEDH:!LOW:!EXP:!MD5:!aNULL:!eNULL

defaults
    log global
    mode http
    timeout connect  5000
    timeout client  50000
    timeout server  50000

Here is the meat of the configuration for HAProxy:

frontend device
    mode http
    bind 0.0.0.0:8543 ssl crt cert.pem
    reqadd X-Forwarded-Proto:\ https
    acl is_websocket hdr(Upgrade) -i WebSocket
    acl is_websocket hdr_beg(Host) -i ws
    use_backend backend-ws if is_websocket
    default_backend backend

backend backend
    mode http
    server srv-1 0.0.0.0:8180 check

backend backend-ws
    mode http
    server srv-1 0.0.0.0:8180 check send-proxy

Be aware that all backends must contain the mode http line or else hdr* functions will not work!

The bulk of the routing is done in these two ACLs:

    acl is_websocket hdr(Upgrade) -i WebSocket
    acl is_websocket hdr_beg(Host) -i ws
    use_backend backend-ws if is_websocket

First we check if the request has an Upgrade header with the content WebSocket (matched case-insensitively). We also route to this backend if the Host header begins with ws. This is enough to route to the correct backend.

The next part of the configuration sets up the backends so they can optionally provide the proxy protocol to downstream servers:

backend backend
    mode http
    server srv-1 0.0.0.0:8180 check

backend backend-ws
    mode http
    server srv-1 0.0.0.0:8180 check send-proxy

The first backend is a simple HTTP backend with no proxy protocol support. This means that if you connect to this backend without the Upgrade: WebSocket or Host: ws* header, you will be forwarded on without the proxy protocol. The benefit here is that backend support for the proxy protocol only needs to exist exactly where it is needed, which simplifies a later step.

The last backend contains the send-proxy option, which adds the proxy protocol to all downstream requests. The server running there must support the proxy protocol; if not, all client requests will fail with a 400 Bad Request.

Now, we're all set to start forwarding requests to the backend and implementing proxy protocol support in our backend. I'll outline what the backend needs to do and then implement it in Erlang.

The HAProxy specification for the proxy protocol is a great resource but the gist of it is as follows: dumb proxies generally lose client-specific information, such as the IP, when tunneling through one or more proxy servers. This poses a problem when you need that information, for example for analytical or client-tracking purposes.

What the proxy protocol does is insert a header containing the client's information at the beginning of a request; the client is none the wiser, and downstream servers simply need to parse a single extra line before processing the underlying protocol's request.

All this comes down to is something akin to this:

"PROXY TCP4 255.255.255.255 255.255.255.255 65535 65535\r\n"

Prepended to all client requests is a line containing PROXY, then the protocol (in this case TCP4), then the layer 3 source address, a space, the layer 3 destination address, a space, the TCP source port, another space, the TCP destination port and finally a CRLF.

Using just this line, you can easily track the client IP through multiple proxies.
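
As a standalone illustration (this is not the gist's code, which comes later), parsing the human-readable v1 header in Erlang is little more than splitting that first line:

-module(proxy_v1).

-export([parse/1]).

%% Parse a proxy protocol v1 header line such as
%% <<"PROXY TCP4 192.168.0.1 10.0.0.1 56324 8180\r\n">>.
parse(Line) ->
    [<<"PROXY">>, Proto, SrcIP, DstIP, SrcPort, DstPort] =
        binary:split(Line, [<<" ">>, <<"\r\n">>], [global, trim_all]),
    {ok, Src} = inet:parse_address(binary_to_list(SrcIP)),
    {ok, Dst} = inet:parse_address(binary_to_list(DstIP)),
    {Proto, {Src, binary_to_integer(SrcPort)}, {Dst, binary_to_integer(DstPort)}}.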

The HAProxy setup exists so that, when testing locally or in your CI server, you can inject the proxy protocol lines. This makes testing simpler since you don't have to enable or disable proxy protocol support depending on the environment. HAProxy acts as a "local ELB".

What does this mean for WebSockets through ELB?

Since the ELB cannot add the X-Forwarded-For header for TCP routes, we must use the proxy protocol to retain client information. Amazon does not provide a web interface for enabling this; you must use the CLI.

This is outlined in the ELB documentation for proxy protocol support, but I will reiterate here for completeness:

Step 1: Create the policy:

aws elb create-load-balancer-policy            \
    --load-balancer-name my-loadbalancer       \
    --policy-name my-ProxyProtocol-policy      \
    --policy-type-name ProxyProtocolPolicyType \
    --policy-attributes AttributeName=ProxyProtocol,AttributeValue=true

You'll need to change my-loadbalancer to point at the load balancer you want to create this policy for. Don't worry about messing this step up: at this point you haven't changed how your load balancer works. We are simply making the policy available should we wish to enable it.

Step 2: Assign the policy:

aws elb set-load-balancer-policies-for-backend-server \
    --load-balancer-name my-loadbalancer              \
    --instance-port 80 --policy-names my-ProxyProtocol-policy my-existing-policy

Again, you'll need to change my-loadbalancer to point at the load balancer you want to change. The --instance-port also needs to be the port you want to enable this proxy policy on. Make sure --policy-names includes both the policy we just created (my-ProxyProtocol-policy if you are following on from the previous example) and any policies which already exist. This is important: if you have any other policies, calling set-load-balancer-policies-for-backend-server will override the list. SSL certificates appear as policies, so list the existing policies using:

aws elb describe-load-balancers --load-balancer-names my-loadbalancer

Now that we have enabled proxy protocol support on the load balancer, the application will no longer be able to service requests through this port unless proxy protocol support is also added there.

Let's do that now.

In our Erlang/Cowboy application we can override the protocol which is used. Typically you would create a cowboy HTTP listener by calling:

%% Start a cowboy 1.x HTTP listener with 10 acceptor processes on port 8080.
AccCount = 10,
TransOpts = [{port, 8080}],
ProtoOpts = [],
cowboy:start_http(custom_name_of_listener, AccCount, TransOpts, ProtoOpts)

What this calls underneath is:

ranch:start_listener(custom_name_of_listener, AccCount,
                     ranch_tcp, TransOpts,
                     cowboy_protocol, ProtoOpts)

Cowboy comes with a typical TCP/HTTP protocol parser, a module named cowboy_protocol. This is what you want to use for normal HTTP listeners. However, since our connections now begin with a proxy protocol header, cowboy will fail to parse incoming requests.
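
Swapping in a custom protocol therefore just means calling ranch directly with our own module in place of cowboy_protocol; the module name here is a stand-in for the one in the gist below:

ranch:start_listener(custom_name_of_listener, AccCount,
                     ranch_tcp, TransOpts,
                     proxy_protocol, ProtoOpts)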

Next I'll implement the proxy protocol request parser that allows us to grab the proxy protocol header and inject it into our cowboy handlers:

The whole file is available in a gist; a lot of it is copied and pasted from the cowboy_protocol module, since we just need to inject a little bit of code before we parse our request and hand off to the original module. This saves us implementing the entirety of TCP/HTTP handling ourselves. From line 105 onwards is where we parse out the proxy protocol.

Now in your handlers you can call:

ProxyInfo = get(proxy_info)

And you'll get the proxy information the client connected with!

Alternatives

If you don't want to go through the rigmarole of implementing proxy protocol support in your application and maintaining custom policies on your load balancers, what can you do?

It depends on the application. If you control both the client code and the backend code, then you could implement a simple HTTP endpoint that associates a client token (username, device_id, whatever) with the IP address it connects from. This means that the IP association is done over plain HTTP, which allows the ELB to inject the X-Forwarded-For header.

If you don't control both the client-side code and the backend-side, or you're completely tied to using WebSockets, then I am very interested in any ideas for how you could get around this limitation.

Sources and thanks

Up and running with Clojure and Storm
posted on 2015-02-08 15:57:41

So you want to build some kind of real-time processing pipeline, you want to use Storm, and you don't want to use Java. I went through a couple of tutorials recently which seem to stop just before the final step of getting your topology running on a proper Storm cluster. Instead they opt for running with lein run -m foo.bar.run!, which is not ideal.

First of all, they explicitly tell you not to include the Storm dependencies in your regular project dependencies. The (outdated) storm project template gives you:

(defproject vektor "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.4.0"]]
  :aot [vektor.TopologySubmitter]
  ;; include storm dependency only in dev because production storm cluster provides it
  :profiles {:dev {:dependencies [[storm "0.8.1"]]}})

However, running lein uberjar or lein jar on this, in order to be able to submit the jar to a running Storm cluster gives:

~/dev/vektor lein jar
Compiling vektor.TopologySubmitter
java.io.FileNotFoundException: Could not locate backtype/storm/clojure__init.class or backtype/storm/clojure.clj on classpath: , compiling:(topology.clj:1)
    at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3387)
    at clojure.lang.Compiler.compile1(Compiler.java:7035)
    at clojure.lang.Compiler.compile1(Compiler.java:7025)

    [thousands of lines of tyranny elided...]

    at clojure.lang.Compiler.eval(Compiler.java:6501)
    at clojure.lang.Compiler.load(Compiler.java:6952)
    at clojure.lang.Compiler.loadFile(Compiler.java:6912)
    at clojure.main$load_script.invoke(main.clj:283)
    at clojure.main$init_opt.invoke(main.clj:288)
    at clojure.main$initialize.invoke(main.clj:316)
    at clojure.main$null_opt.invoke(main.clj:349)
    at clojure.main$main.doInvoke(main.clj:427)
    at clojure.lang.RestFn.invoke(RestFn.java:421)
    at clojure.lang.Var.invoke(Var.java:419)
    at clojure.lang.AFn.applyToHelper(AFn.java:163)
    at clojure.lang.Var.applyTo(Var.java:532)
    at clojure.main.main(main.java:37)

Hmm, that's not good. So we shouldn't include the Storm dependencies, but we can't compile without them. Great.

After much googling and messing around, it turns out you should add them to a :provided profile in project.clj, like so:

(defproject vektor "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.4.0"]]
  :aot [vektor.TopologySubmitter]
  ;; include storm dependency only in dev because production storm cluster provides it
  :profiles {:dev {:dependencies [[storm "0.8.1"]]}
             :provided {:dependencies [[storm "0.8.1"]]}})

This will now allow the jars to be created and you will be able to submit them to the storm cluster.
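
With the :provided profile in place, building and submitting might look like this (the jar name follows Leiningen's default uberjar naming for this project):

lein uberjar
storm jar target/vektor-0.1.0-SNAPSHOT-standalone.jar vektor.TopologySubmitter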

Side notes

Ensure all directories which Storm and Nimbus wish to use are readable and writable by the users you intend to run them as.

By default:

/var/lib/storm and /var/log/storm should be writable/readable by the storm user.

Clojure for the Emacs user
posted on 2014-08-17 13:46:28

One of the great things about Clojure is that it's a Lisp, and not just for the fact that this brings a lot of linguistic power over typical languages. Lisps have a very intertwined history with Emacs.

In this post I will show you what I think is the perfect environment for composing Clojure code.

First of all you will need Emacs version >= 24; this ensures that you have proper package.el support by default. If not, you will need to install package.el manually. This page is a great resource on package.el.
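
Assuming you have package.el, a minimal setup pointing it at the MELPA archive (which hosts all three packages mentioned below) looks like this:

;; Enable package.el and add the MELPA archive.
(require 'package)
(add-to-list 'package-archives
             '("melpa" . "https://melpa.org/packages/") t)
(package-initialize)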

Now, we can install the basic clojure-mode, which will provide proper font-locking and basic code highlighting.

To install: M-x package-install RET clojure-mode RET.

Personally, I prefer a live editing environment over the archaic write/compile/test/repeat cycle that a lot of languages enforce. To this end, we need to install the exceptional CIDER mode, which provides a SLIME-like environment for Clojure: a powerful embedded REPL inside Emacs that can do code evaluation, code introspection, code completion and much more. It's very difficult to return to languages which do not have this rich editing experience.

To install: M-x package-install RET cider RET.

Finally, clj-refactor is a package which provides a few helpful code transformations. There are gems such as introduce-let, which moves a form into an enclosing let form, or sort-ns, which alphabetically sorts the namespace form (great for keeping code tidy). There are many more, so I suggest that you browse the repository.

To install: M-x package-install RET clj-refactor RET
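
clj-refactor also wants a hook to enable it in Clojure buffers and to bind its commands under a prefix; the setup suggested in its README is along these lines:

;; Enable clj-refactor in Clojure buffers with a C-c C-m prefix.
(add-hook 'clojure-mode-hook
          (lambda ()
            (clj-refactor-mode 1)
            (cljr-add-keybindings-with-prefix "C-c C-m")))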

Enjoy!

Introducing: Lispkit
posted on 2014-08-15 23:47:08

Lispkit is a recent project I've started. It's a browser, based on WebKit, written and configured in Common Lisp.

The project is still very young, but in true Lisp fashion I was able to accomplish a sizeable portion of the basic functionality.

Why does this project need to exist?

Personally, I prefer keyboard-driven environments. I also prefer configuration formats which are code and not key/value pairs. My preferred flavour of editor brain-damage is emacs. These requirements meant that I could use Conkeror or Luakit. There may be more out there, but those are all I found when I was looking.

The problem I found with Conkeror is that having the configuration format in JavaScript meant a lot of advanced scripting was out of the question, because of the low quality of the language. Similarly, Lua is an uninteresting language to me.

Luakit, though, seems to be far more advanced and has a lot more users behind it than Conkeror does, so I may fire it up to pinch ideas.

The aim with Lispkit is to have a browser which defers the actual "browsery" portions of the application (rendering, javascript sandboxing, etc.) to a well-known and well-tested implementation, and deals with the other fun stuff itself.

So, the aim really is to have an emacs-like browser. SLIME connections are a must, on-the-fly editing of configuration, M-x apropos command style searching, ido-style completion, packages, the works. That's the plan anyway. However, at this point, chromium is still my daily driver.

Refreshed
posted on 2014-07-15 22:41:51

I've decided to throw away my previous "blog" incarnation in hopes that using a static blog generator will allow me to write more.

I'm using coleslaw since it's written in Lisp, so maybe hacking on it will be easier.

So far, it seems great. A little metadata here, a little markdown there and it's converted correctly. A brucy bonus is that I can edit in Emacs and manage the posts through git. MMmmmmm.

Happy hackin'



Unless otherwise credited all material Creative Commons License by Aaron France