This document describes how to use JGroups in Docker containers to form clusters locally, on Amazon Web Services (AWS) and on Google Cloud Platform (GCP).
The advantage of running Docker images in the cloud, rather than cloud-specific images, is that the Docker images are the same across different clouds, whereas cloud images are always specific to a given cloud.
The predefined Docker image we’ll use is https://hub.docker.com/r/belaban/jgroups. It contains a number of interactive demos, i.e. demos which require a TTY.
The difference between running locally and running in a cloud lies in the JGroups configuration and the Docker startup options. Clouds generally do not support IP multicasting, so JGroups applications have to resort to TCP rather than UDP as transport. In addition, PING cannot be used and has to be replaced with a different discovery protocol, e.g. S3_PING or GOOGLE_PING.
Docker supports none, bridge and host networks out of the box (https://docs.docker.com/engine/userguide/networking). Which type of network is used is defined by the --network option, e.g. --network=host.
Network none has no communication to the outside world and is used purely for Docker containers running on the same box. The bind address used here is usually 127.0.0.1 (loopback).
Network bridge is the default; it creates a virtual interface in the container that’s not available on the host. To be able to communicate with Docker containers on other hosts, the host’s IP address has to be used (e.g. by setting external_addr=<host’s address>), or one has to switch to --network=host (see below).
For example, after starting an AWS EC2 instance, ifconfig executed on the host shows the following interfaces:
[ec2-user@ip-172-31-14-130 ~]$ ifconfig
docker0 Link encap:Ethernet HWaddr 02:42:33:C7:8E:F1
inet addr:172.17.0.1 Bcast:0.0.0.0 Mask:255.255.0.0
inet6 addr: fe80::42:33ff:fec7:8ef1/64 Scope:Link
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:22 errors:0 dropped:0 overruns:0 frame:0
TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1342 (1.3 KiB) TX bytes:1492 (1.4 KiB)
eth0 Link encap:Ethernet HWaddr 0E:C8:A1:0F:39:E6
inet addr:172.31.14.130 Bcast:172.31.15.255 Mask:255.255.240.0
inet6 addr: fe80::cc8:a1ff:fe0f:39e6/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9001 Metric:1
RX packets:105050 errors:0 dropped:0 overruns:0 frame:0
TX packets:37021 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:117948327 (112.4 MiB) TX bytes:6929186 (6.6 MiB)
...
The docker0 interface is created by Docker to implement the bridged network; this is the interface that will be used by the Docker container. Note that eth0 is the host’s interface, with a routable IP address of 172.31.14.130.
When a container is started with bridged networking (the default), e.g. docker run -it -p 7800:7800 --rm belaban/jgroups, ifconfig executed in the container shows:
bash-4.3$ ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:AC:11:00:02
inet addr:172.17.0.2 Bcast:0.0.0.0 Mask:255.255.0.0
inet6 addr: fe80::42:acff:fe11:2%32571/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:6 errors:0 dropped:0 overruns:0 frame:0
TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:508 (508.0 B) TX bytes:258 (258.0 B)
...
The 172.17.0.2 address is assigned by Docker from the docker0 interface.
The issue with these Docker-private addresses is that they cannot be used to talk between AWS instances, as they are not routed. For instances to talk to each other, the host’s eth0 address (from the 172.31.0.0 subnet) has to be used.
To do this, JGroups provides the external_addr attribute in the transport. In a configuration, the following transport snippet would enable JGroups applications in network-bridged Docker containers to communicate:
<TCP external_addr="${ext-addr:172.31.14.130}"
bind_addr="match-interface:eth0"
...
/>
This means that JGroups will bind to eth0 (172.17.0.2), but advertise its address as 172.31.14.130, so other instances can talk to this instance.
How to find out the host’s address is implementation-dependent; on AWS, curl http://169.254.169.254/latest/meta-data/local-ipv4 returns the IP address of the EC2 instance.
An alternative is to grab the IP address before starting the Docker container and pass it to the container as an environment variable (e.g. -e EXT_ADDR=1.2.3.4). The container’s entrypoint could then start JGroups with the system property -Dext-addr=1.2.3.4, which sets external_addr in TCP.
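A minimal sketch of that approach (the EXT_ADDR variable name and the java invocation inside the container are illustrative, not part of the image):

# on the EC2 host: grab the instance's private IP from the metadata service
EXT_ADDR=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)

# pass it into the container as an environment variable
docker run -it --rm -p 7800:7800 -e EXT_ADDR="$EXT_ADDR" belaban/jgroups

# inside the container, a start script could forward it as a system property,
# matching the ${ext-addr} reference in the external_addr attribute above:
java -Dext-addr="$EXT_ADDR" -cp "$CLASSPATH" org.jgroups.demos.Chat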
The host network is started as follows: docker run --network=host -it -p 7800:7800 --rm belaban/jgroups.
This means that the started container has access to the same interfaces as the host. This can be confirmed by executing ifconfig inside the container:
bash-4.3$ ifconfig
docker0 Link encap:Ethernet HWaddr 02:42:14:CB:41:B6
inet addr:172.17.0.1 Bcast:172.17.255.255 Mask:255.255.0.0
inet6 addr: fe80::42:14ff:fecb:41b6/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4 errors:0 dropped:0 overruns:0 frame:0
TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:224 (224.0 B) TX bytes:1068 (1.0 KiB)
eth0 Link encap:Ethernet HWaddr 02:50:00:00:00:01
inet addr:192.168.65.3 Bcast:192.168.65.255 Mask:255.255.255.0
inet6 addr: fe80::50:ff:fe00:1/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:37 errors:0 dropped:0 overruns:0 frame:0
TX packets:47 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3510 (3.4 KiB) TX bytes:3556 (3.4 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:2 errors:0 dropped:0 overruns:0 frame:0
TX packets:2 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:140 (140.0 B) TX bytes:140 (140.0 B)
veth4ce4999 Link encap:Ethernet HWaddr 56:5B:F4:6B:D1:20
inet6 addr: fe80::545b:f4ff:fe6b:d120/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:21 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:1618 (1.5 KiB)
...
As can be seen, eth0 has an address from the host’s address range. This means that the JGroups configuration doesn’t require an external_addr attribute and can simply define bind_addr:
<TCP
bind_addr="match-interface:eth0"
...
/>
This will bind the transport’s sockets to 172.31.14.130.
The JGroups demos can be run as multiple containers forming a cluster on the same local box (in bridged or host network mode), or across boxes in the local network (in host network mode).
To run containers locally, the JGroups configuration uses IP multicasting, as shown in ./conf/udp.xml (abridged):
<config>
<UDP
bind_addr="match-interface:eth0,match-interface:en0,site_local,loopback"
/>
<PING />
<MERGE3 max_interval="30000" min_interval="10000"/>
<FD_SOCK/>
<FD_ALL timeout="10000" interval="3000"/>
<pbcast.NAKACK2/>
<UNICAST3 />
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
max_bytes="8m"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000"
view_bundling="true"/>
<UFC max_credits="2M" min_threshold="0.4"/>
<MFC max_credits="2M" min_threshold="0.4"/>
<FRAG2 frag_size="60K" />
</config>
This configuration uses IP multicasting (UDP as transport) and multicast discovery (PING as discovery protocol). It binds to eth0 if found; if not, it tries to bind to en0 (Macs), then tries to find a site-local IP address, and finally falls back to loopback (127.0.0.1) if none of the preceding addresses matched.
Run multiple containers as follows:
docker run -it --rm --network=host belaban/jgroups
In the container, there’s a readme which describes how to run the demos, e.g. Chat. Also see Demos below for details on the demos. To run the Chat demo, run:
# udp.xml is the default, so -props can be omitted
chat.sh -props udp.xml -name A
This should create a cluster of Chat nodes, even across hosts, due to --network=host.
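For example, a two-node cluster on one host can be formed by starting the container in two terminals and running chat.sh from the container’s shell, as described in the readme:

# terminal 1
docker run -it --rm --network=host belaban/jgroups
chat.sh -name A

# terminal 2
docker run -it --rm --network=host belaban/jgroups
chat.sh -name B

A message typed into either node should then appear in both.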
If IP multicasting is not supported, one can always fall back to TCP, but this means copying tcp.xml (for example) from the JGroups distribution into the Docker container and modifying it (e.g. changing the discovery protocol). The JGroups manual at http://www.jgroups.org/manual4/index.html shows how to do this.
This section shows how to run Docker containers with the JGroups demos on AWS. In contrast to running locally, we have to use a security group to define which traffic (TCP/UDP/ICMP) may be sent/received on which ports.
Note: Alternative ways of running JGroups in AWS include (1) creating/customizing an EC2 image (AMI) and then running a host with that image directly (without Docker), or (2) using the EC2 Container Service (ECS), which runs Docker images on a number of EC2 instances. The reason (1) is not discussed here is laziness :-) I did not want to go through the rather long turnaround times of iteratively creating and customizing an AMI; creating and customizing a Docker image and running/testing it locally made for much faster turnaround times! Running on ECS (option 2) is similar to what’s described here, except that there’s no need to spin up EC2 instances manually, as this is done by ECS when defining a cluster.
Since IP multicasting is not supported, we also have to switch to TCP as transport and change the discovery protocol (see Discovery below).
Finally, if we use S3 for discovery, an IAM role has to be assigned to the EC2 instance that permits creation, deletion, reading and writing of S3 buckets. These topics are discussed below.
The Amazon image used for running the demo contains a bare-bones Linux plus the Docker software: ami-92659c84. Search for amazon-ecs-optimized to find it. Of course, any other AMI that contains Docker can be used.
First, an EC2 instance is created with this AMI; then we ssh into it and start the Docker container.
The security group used for the demo permits all traffic from any protocol on any port and from/to any address. This is fine for a demo, but of course not recommended for production. (Alternatively, using no security group has the same effect.) The selected security group needs to be associated with the EC2 instance when it is started.
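As a sketch, such a wide-open demo security group could be created and used with the AWS CLI (the group name jgroups-demo and the key name are placeholders):

# create the group and allow all inbound traffic (demo only!)
aws ec2 create-security-group --group-name jgroups-demo --description "JGroups demo"
aws ec2 authorize-security-group-ingress --group-name jgroups-demo --protocol all --cidr 0.0.0.0/0

# associate the group with the EC2 instance at launch time
aws ec2 run-instances --image-id ami-92659c84 --instance-type t2.micro --security-groups jgroups-demo --key-name my-key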
If S3 (e.g. S3_PING or NATIVE_S3_PING) is used for discovery in the JGroups configuration, then the EC2 instance needs to be started with a role that permits access to S3 buckets, specifically creation of buckets, reading objects from buckets and writing objects to buckets (deletion is currently not used).
For the demo I used a role with the managed policy AmazonS3FullAccess attached, which (as its name suggests) grants all permissions on S3 buckets.
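A sketch of setting this up with the AWS CLI (the role/profile name jgroups-s3 is a placeholder, and ec2-trust.json is assumed to contain a trust policy allowing ec2.amazonaws.com to assume the role):

# create a role EC2 can assume and attach the managed S3 policy
aws iam create-role --role-name jgroups-s3 --assume-role-policy-document file://ec2-trust.json
aws iam attach-role-policy --role-name jgroups-s3 --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

# wrap the role in an instance profile and launch the instance with it
aws iam create-instance-profile --instance-profile-name jgroups-s3
aws iam add-role-to-instance-profile --instance-profile-name jgroups-s3 --role-name jgroups-s3
aws ec2 run-instances --image-id ami-92659c84 --iam-instance-profile Name=jgroups-s3 --security-groups jgroups-demo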
The task of discovery is for a new cluster node to find the coordinator of the cluster it wants to join and send it a join request. Whereas PING sends a multicast to which everyone responds with the coordinator’s address and information about themselves, this cannot be done in a cloud, where IP multicasting usually isn’t supported.
Therefore, we have to resort to other ways of running discovery; they’re discussed in the following sections.
NATIVE_S3_PING is a separate protocol developed at https://github.com/jgroups-extras/native-s3-ping. It uses the Amazon S3 Java SDK to access buckets in S3 which store information about cluster members.
The advantage over S3_PING is that no credentials (AWS_ACCESS_KEY, AWS_SECRET_ACCESS_KEY) need to be passed to the EC2 instance on startup; instead, the credentials of the user who started the EC2 instance are used.
There’s a configuration ./conf/aws.xml which includes this protocol:
<config>
<TCP
external_addr="${JGROUPS_EXTERNAL_ADDR:match-interface:eth0}"
bind_addr="site_local,match-interface:eth0"
bind_port="${TCP_PORT:7800}"
/>
<!--
Uses an S3 bucket to discover members in the cluster.
- If "mybucket" doesn't exist, it will be created (requires permissions)
-->
<org.jgroups.aws.s3.NATIVE_S3_PING
region_name="${S3_REGION:us-east-1}"
bucket_name="${S3_BUCKET:mybucket}"
/>
<MERGE3 max_interval="30000" min_interval="10000"/>
<FD_SOCK external_addr="${JGROUPS_EXTERNAL_ADDR}"
start_port="${FD_SOCK_PORT:9000}"/>
<FD_ALL timeout="10000" interval="3000"/>
<pbcast.NAKACK2/>
<UNICAST3/>
<pbcast.STABLE desired_avg_gossip="50000"
max_bytes="8m"/>
<pbcast.GMS print_local_addr="true" join_timeout="3000"
view_bundling="true"/>
<UFC max_credits="2M" min_threshold="0.4"/>
<MFC max_credits="2M" min_threshold="0.4"/>
<FRAG2 frag_size="60K" />
</config>
Attributes bind_addr and external_addr were discussed above. Note that the latter is not required if the Docker container is started with --network=host.
The attributes used by NATIVE_S3_PING are region_name and bucket_name. The latter defines the bucket that will be used to store information about the members of this cluster; all objects created in this bucket are prefixed with the cluster name ("draw" in the example), e.g. mybucket/draw-126532-A.list. The former defines the region in which the bucket is located.
Note: If a bucket doesn’t exist, a new one will be created. Since bucket names share a global namespace, attempting to create a bucket that already exists for a different user will throw an exception. It is therefore recommended to create a bucket up-front and use it as bucket_name.
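For example, with the AWS CLI (my-unique-jgroups-bucket is a placeholder; bucket names must be globally unique):

aws s3 mb s3://my-unique-jgroups-bucket --region us-east-1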
Both region and bucket can be overridden by passing the system properties S3_REGION and S3_BUCKET to the JGroups demo. Similar to passing ext-addr (see above), the Docker container could be started with two environment variables for region and bucket name, which a script could then read from the environment and pass as system properties to the demo.
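A sketch of that approach (the java invocation inside the container is illustrative; it relies on the ${S3_REGION} and ${S3_BUCKET} properties referenced by aws.xml above):

# on the host: pass region and bucket as environment variables
docker run -it --rm --network=host -e S3_REGION=us-east-1 -e S3_BUCKET=my-unique-jgroups-bucket belaban/jgroups

# inside the container: translate the env vars into system properties
java -DS3_REGION="$S3_REGION" -DS3_BUCKET="$S3_BUCKET" -cp "$CLASSPATH" org.jgroups.demos.Chat -props aws.xml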
S3_PING is discussed in the JGroups manual at http://www.jgroups.org/manual4/index.html#_s3_ping.
TCPGOSSIP is a discovery protocol which stores information about cluster nodes in one or more GossipRouters, which are separate processes acting as lookup services. The configuration of TCPGOSSIP needs to include the addresses of the GossipRouters, e.g.:
<TCP .../>
<TCPGOSSIP
initial_hosts="GR1[12001],GR2[12001]"
/>
This means that there are GossipRouters running on GR1 and GR2 at port 12001, and TCPGOSSIP will register each member with both processes.
The GossipRouter processes can be started inside Docker containers, too, e.g. using the image available at https://hub.docker.com/r/jboss/jgroups-gossip.
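A sketch of running the routers (assuming the jboss/jgroups-gossip image listens on the default port 12001); alternatively, a GossipRouter can be started directly from the JGroups jar:

# on hosts GR1 and GR2: run a GossipRouter container
docker run -d --network=host jboss/jgroups-gossip

# or start one directly
java -cp jgroups.jar org.jgroups.stack.GossipRouter -port 12001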
TCPPING lists all cluster nodes in a static list, e.g.
<TCP bind_port="7800" .../>
<TCPPING
initial_hosts="HostA[7800],HostB[7800],HostC[7800],..."
/>
The problem here is that the IP addresses of HostA, HostB, etc. have to be known before starting any of the EC2 instances. This requires either fixed addresses, or addresses mapped to a DNS service, e.g. Amazon Route 53.
An alternative is to use AWS Elastic IP addresses, where the address of each cluster node is fixed and all potential members of the cluster are known before the cluster is started.
Another alternative is to create a Virtual Private Cloud (VPC), e.g. 172.45.0.0, and assign addresses from the block 172.45.0.1 - 172.45.0.100. This means that TCPPING.initial_hosts needs to list all 100 members of this block.
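A sketch of generating such a list in the shell, assuming the configuration references a ${initial_hosts} system property in TCPPING.initial_hosts:

# build "172.45.0.1[7800],172.45.0.2[7800],...,172.45.0.100[7800]"
hosts=$(for i in $(seq 1 100); do printf '172.45.0.%d[7800],' "$i"; done)
hosts=${hosts%,}  # strip the trailing comma
java -Dinitial_hosts="$hosts" -cp "$CLASSPATH" org.jgroups.demos.Chat -props tcp.xml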
Another way to run discovery on AWS is the Elastic File System (EFS). EFS is a distributed file system, accessible by all cluster nodes; any creation or modification of a file by a cluster member is visible to every other member.
FILE_PING can therefore be used for discovery; its location attribute needs to point to a directory mounted via EFS. Reads and writes are then performed through EFS.
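A sketch of wiring this up (fs-12345678 is a placeholder for the EFS file system ID; the /mnt/efs path is arbitrary):

# on each EC2 host: mount the EFS file system via NFS
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs

# make the shared directory visible inside the container;
# FILE_PING's location would then point to e.g. /mnt/efs/jgroups
docker run -it --rm --network=host -v /mnt/efs:/mnt/efs belaban/jgroups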
After spinning up an EC2 instance, we’re now ready to run the Docker container with image belaban/jgroups.
The JGroups configuration in the Docker image is ./conf/aws.xml, listed in NATIVE_S3_PING above. It requires ports 7800 (used by TCP) and 9000 (used by FD_SOCK) to be published; therefore, the command to start the container is:
docker run -it --rm --network=host -p 7800:7800 -p 9000:9000 belaban/jgroups
- -it: starts an interactive shell (TTY) so we can interact with (e.g.) the Chat demo
- --rm: removes the container when done
- --network=host: picks a host network. If omitted, a bridge network would be picked by default
- -p 7800:7800 -p 9000:9000: publishes container ports 7800 and 9000 on host ports 7800 and 9000
- belaban/jgroups: the Docker image hosted on dockerhub.com
Note: When using a host network, publishing the ports is not necessary; the -p 7800:7800 -p 9000:9000 option can therefore be omitted.
When the Docker container has been started, the entrypoint (/bin/bash) shows a readme which explains how to start the demos. To run Chat, for example, the following command has to be executed:
chat.sh -props aws.xml -name A -b bucket-name
- -props aws.xml: instructs Chat to use aws.xml (described above) as configuration
- -name A: gives each cluster node a unique name. If omitted, a random name will be picked
- -b bucket-name: the name of the S3 bucket. It will be created if it doesn’t exist; note that an exception will be thrown if that name is already in use by a different project.
There is a separate project, jgroups-google, which uses Google Cloud Storage for discovery and provides GOOGLE_PING2 as discovery protocol. It is meant to be used with Google Compute Engine nodes that are started manually. To run JGroups in Google Container Engine (GKE), which uses Kubernetes under the covers, we recommend using KUBE_PING instead.
Refer to http://belaban.blogspot.ch/2017/05/running-infinispan-cluster-with.html for step-by-step instructions on how to run a JGroups cluster on GKE.
To run the image directly, execute
docker run -it --rm --network=host belaban/jgroups
or
docker run -p 7800:7800 -p 9000:9000 -it --rm belaban/jgroups
To build the image, run
docker build .
The demos are described below. The idea is to run the demo apps in one container each on the same host, and they will form a cluster.
To run the Chat demo:
chat [-props config] [-name name] [-b bucketname]
e.g. chat -props ./udp.xml -name A
Run the Chat application in multiple containers on the same host and they will form a cluster. Typing a message into one Chat instance will send it to all other instances.
Distributed locks are implementations of java.util.concurrent.locks.Lock
and provide locks that can be
accessed from all nodes in a cluster.
A typical use case is to lock a resource so that only 1 thread in a given node in the cluster can access it. Should a node crash while holding a lock, the lock is released immediately.
For more details, see the section on distributed locks at http://www.jgroups.org/manual/index.html#LockService.
To run the lock demo, type:
lock [-name name]
Typing help into the shell shows a few commands:
[jgroups@b21d0fa6c79d ~]$ lock -name A
-------------------------------------------------------------------
GMS: address=A, cluster=lock-cluster, physical address=172.17.0.178:52519
-------------------------------------------------------------------
: help
LockServiceDemo [-props properties] [-name name]
Valid commands:
lock (<lock name>)+
unlock (<lock name> | "ALL")+
trylock (<lock name>)+ [<timeout>]
Example:
lock lock lock2 lock3
unlock all
trylock bela michelle 300
:
If you start instances A and B, you can try out the following:
- A: lock printer
- B: lock printer // will block
- A: unlock printer // now B will get the lock
Or a lock holder can be killed:
- A: lock printer
- B: lock printer
- Kill A. B will now get the lock on "printer".
Distributed counters are counters that can be atomically incremented, decremented, compared-and-set, etc. across a cluster.
To run the demo:
count [-name name]
Run multiple instances in different containers. The demo uses a counter named "mycounter" and there’s a command prompt which shows the commands to be executed.
Questions can be asked on the users or dev mailing lists: https://sourceforge.net/p/javagroups/mailman.
Enjoy !
Bela Ban