Operation and Maintenance

One guiding principle behind the design of the runtime system is that bugs are more or less inevitable. Even if through an enormous effort you manage to build a bug free application you will soon learn that the world or your user changes and your application will need to be "fixed."

The Erlang runtime system is designed to facilitate change and to minimize the impact of bugs.

The impact of bugs is minimized by compartmentalization. This is done from the lowest level where each data structure is separate and immutable to the highest level where running systems are dived into separate nodes. Change is facilitated by making it easy to upgrade code and interacting and examining a running system.

Connecting to the System

We will look at many different ways to monitor and maintain a running system. There are many tools and techniques available but we must not forget the most basic tool, the shell and the ability to connect a shell to node.

In order to connect two nodes they need to share or know a secret pass phrase, called a cookie. As long as you are running both nodes on the same machine and the same user starts them they will automatically share the cookie (in the file $HOME/.erlang.cookie).

We can see this in action by starting two nodes, one Erlang node and one Elixir node. First we start an Erlang node called node1.

$ erl -sname node1
Erlang/OTP 19 [erts-8.1] [source-0567896] [64-bit] [smp:4:4]
              [async-threads:10] [hipe] [kernel-poll:false]

Eshell V8.1  (abort with ^G)
(node1@GDC08)1> nodes().
[]
(node1@GDC08)2>

Then we start an Elixir node called node2:

$ iex --sname node2
Erlang/OTP 19 [erts-8.1] [source-0567896] [64-bit] [smp:4:4]
              [async-threads:10] [hipe] [kernel-poll:false]

Interactive Elixir (1.4.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(node2@GDC08)1>

In Elixir we can connect the nodes by running the command Node.connect name. In Erlang you do this with net_kernel:connect(Name). The node connection is bidirectional so you only need to run the command on one of the nodes.

iex(node2@GDC08)1> Node.connect :node1@GDC08
true
iex(node2@GDC08)2>

In the distributed case this is somewhat more complicated since we need to make sure that all nodes know or share the cookie. This can be done in three ways. You can set the cookie used when talking to a specific node, you can set the same cookie for all systems at start up with the -set_cookie parameter, or you can copy the file .erlang.cookie to the home directory of the user running the system on each machine.

The last alternative, to have the same cookie in the cookie file of each machine in the system is usually the best option since it makes it easy to connect to the nodes from a local OS shell. Just set up some secure way of logging in to the machine either through VPN or ssh. In the next section we will see how to then connect a shell to a running node.

Using the second option it might look like this:

happi@GDC08:~$ cat ~/.erlang.cookie
pepparkaka
happi@GDC08:~$ ssh gds01

happi@gds01:~$ erl -sname node3 -setcookie pepparkaka
Erlang/OTP 18 [erts-7.3] [source-d2a6d81] [64-bit] [smp:8:8]
              [async-threads:10] [hipe] [kernel-poll:false]

Eshell V7.3  (abort with ^G)
(node3@gds01)1> net_kernel:connect('node1@GDC08').
true
(node3@gds01)2> nodes().
[node1@GDC08,node2@GDC08]
(node3@gds01)3>

Note

A Potential Problem with Different Cookies Note that the default for the Erlang distribution is to create a fully connected network. That is, all nodes are connected to all other nodes in the network. In the example, once node3 connects to node1 it also is connected to node2. If each node has its own cookie you will have to tell each node the cookies of each other node before you try to connect them. You can start up a node with the flag -connect_all false in order to prevent the system from trying to make a fully connected network. Alternatively, you can start a node as hidden with the flag -hidden, which makes node connections to that node non transitive.

Now that we know how to connect nodes, even on different machines, to each other, we can look at how to connect a shell to a node.

The Shell

The Elixir and the Erlang shells works much the same way as a shell or a terminal window on your computer, except that they give you a terminal window directly into your runtime system. This gives you an extremely powerful tool, a basically CLI with full access to the runtime. This is fantastic for operation and maintenance.

In this section we will look at different ways of connecting to a node through the shell and some of the shell’s perhaps less known but more powerful features.

Configuring Your Shell

Both the Elixir shell and the Erlang shell can be configured to provide you with shortcuts for functions that you often use.

The Elixir shell will look for the file .iex.exs first in the local directory and then in the users home directory. The code in this file is executed in the shell process and all variable bindings will be available in the shell.

In this file you can configure aspects such as the syntax coloring and the size of the history. [See hexdocs for a full documentation.](https://hexdocs.pm/iex/IEx.html#module-the-iex-exs-file)

You can also execute arbitrary code in the shell context.

When the Erlang runtime system starts, it first interprets the code in the Erlang configuration file. The default location of this file is in the users home directory ~/.erlang.

This file is usually used to load the user default settings for the shell by adding the line

code:load_abs("/home/happi/.config/erlang/user_default").

Replace "/home/happi/.config/erlang/" with the absolute path you want to use.

If you call a local function from the shell it will try to call this function first in the module user_default and then in the module shell_default (located in stdlib). This is how command such as ls() and help() are implemented.

Connecting a Shell to a Node

When running a production system you will want to start the nodes in daemon mode through run_erl. We will go through how to start a node and some of the best practices for deployment and running in production in [xxx](#ch.live). Fortunately, even when you have started a system in daemon mode, without a shell, you can connect a shell to the system. There are actually several ways to do that. Most of these methods rely on the normal distribution mechnaisms and hence require that you have the same Erlang cookie on both machines as described in the previous section.

Remote shell (Remsh)

The easiest and probably the most common way to connect to an Erlang node is by starting a named node that connects to the system node through a remote shell. This is done with the erl command line flag -remsh Name. Note that you need to start a named node in order to be able to connect to another node, so you also need the -name or -sname flag. Also, note that these are arguments to the Erlang runtime so if you are starting an Elixir shell you need to add an extra - to the flags, like this:

$ iex --sname node4 --remsh node2@GDC08
Erlang/OTP 19 [erts-8.1] [source-0567896] [64-bit] [smp:4:4]
              [async-threads:10] [hipe] [kernel-poll:false]

Interactive Elixir (1.4.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(node2@GDC08)1>

Another thing to note here is that in order to start a remote Elixir shell you need to have IEx running on that node. There is no problem to connect Elixr and Erlang nodes to each other as we saw in the previous section, but you need to have the code of the shell you want to run loaded on the node you connect to.

It is also worth noting that there is no security built into either the normal Erlang distribution or to the remote shell implementation. You do not want to have your system node exposed to the internet and you do not want to connect from your local machine to a node. The safest way is probably to have a VPN tunnel to your live environment and use ssh to connect a machine running one of your live nodes. Then you can connect to one of the nodes using remsh.

It is important to understand that there are actually two nodes involved when you start a remote shell. The local node, named node4 in the previous example and the remote node node2. These nodes can be on the same machine or on different machines. The local node is always running on the machine on which you gave the iex or erl command. On the local node there is a process running the tty program which interacts with the terminal window. The actual shell process runs on the remote node. This means, first of all, that the code for the shell you want to run (i.e. iex or the Erlang shell) has to exist at the remote node. It also means that code is executed on the remote node. And it also means that any shell default settings are taken from the settings of the remote machine.

Imagine that we have the following .erlang file in our home directory on the machine GDC08.

../code/ops_chapter/src/.erlang

And the <filename>user_default.erl</filename> file looks like this:

../code/ops_chapter/src/user_default.erl

Then we create two directories ~/example/dir1 and ~/example/dir2 and we put two different .iex.exs files in those directories.

../code/ops_chapter/src/example/dir1/.iex.exs

../code/ops_chapter/src/example/dir2/.iex.exs

Now if we start four different nodes from these directories we will see how the shell configurations are loaded.

GDC08:~/example/dir1$ iex --sname node1
Erlang/OTP 19 [erts-8.1] [source-0567896] [64-bit]
              [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

ERTS is starting in /home/happi/example/dir1
 on [node1@GDC08]
Interactive Elixir (1.4.0) - press Ctrl+C to exit (type h() ENTER for help)
iEx starting in
/home/happi/example/dir1
iEx starting on
node1@GDC08
(node1@GDC08)iex<d1>

GDC08:~/example/dir2$ iex --sname node2
Erlang/OTP 19 [erts-8.1] [source-0567896] [64-bit]
              [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

ERTS is starting in /home/happi/example/dir2
 on [node2@GDC08]
Interactive Elixir (1.4.0) - press Ctrl+C to exit (type h() ENTER for help)
iEx starting in
/home/happi/example/dir2
iEx starting on
node2@GDC08
(node2@GDC08)iex<d2>

GDC08:~/example/dir1$ iex --sname node3 --remsh node2@GDC08
Erlang/OTP 19 [erts-8.1] [source-0567896] [64-bit] [smp:4:4]
              [async-threads:10] [hipe] [kernel-poll:false]

ERTS is starting in /home/happi/example/dir1
 on [node3@GDC08]
Interactive Elixir (1.4.0) - press Ctrl+C to exit (type h() ENTER for help)
iEx starting in
/home/happi/example/dir2
iEx starting on
node2@GDC08
(node2@GDC08)iex<d2>

GDC08:~/example/dir2$ erl -sname node4
Erlang/OTP 19 [erts-8.1] [source-0567896] [64-bit] [smp:4:4]
              [async-threads:10] [hipe] [kernel-poll:false]

ERTS is starting in /home/happi/example/dir2
 on [node4@GDC08]
Eshell V8.1  (abort with ^G)
(node4@GDC08)1> tt().
test
(node4@GDC08)2>

The shell configuration is loaded from the node running the shell, as you can see from the previous examples. If we were to connect to a node on a different machine, these configurations would not be present.

You can actually change which node and shell you are connected to by going into job control mode.

Job Control Mode

By pressing control+G (ctrl-G) you enter the job control mode (JCL). You are then greeted by another prompt:

User switch command
 -->

By typing h (followed by enter) you get a help text with the available commands in JCL:

  c [nn]            - connect to job
  i [nn]            - interrupt job
  k [nn]            - kill job
  j                 - list all jobs
  s [shell]         - start local shell
  r [node [shell]]  - start remote shell
  q                 - quit erlang
  ? | h             - this message

The interesting command here is the r command which starts a remote shell. You can give it the name of the shell you want to run, which is needed if you want to start an Elixir shell, since the default is the standard Erlang shell. Once you have started a new job (i.e. a new shell) you need to connect to that job with the c command. You can also list all jobs with j.

(node2@GDC08)iex<d2>
User switch command
 --> r node1@GDC08 'Elixir.IEx'
 --> c
Interactive Elixir (1.4.0) - press Ctrl+C to exit (type h() ENTER for help)
iEx starting in
/home/happi/example/dir1
iEx starting on
node1@GDC08

See the [Erlang Shell manual](http://erlang.org/doc/man/shell.html) for a full description of JCL mode.

You can quit your session by typing ctrl+G q [enter]. This shuts down the local node. You do not want to quit with any of q()., halt(), init:stop(), or System.halt. All of these will bring down the remote node which seldom is what you want when you have connected to a live server. Instead use ctrl+\, ctrl+c ctrl+c, ctrl+g q [enter] or ctrl+c a [enter].

If you do not want to use a remote shell, which requires you to have two instances of the Erlang runtime system running, there are actually two other ways to connect to a node. You can also connect either through a Unix pipe or directly through ssh, but both of these methods require that you have prepared the node you want to connect to by starting it in a special way or by starting an ssh server.

Connecting through a Pipe

By starting the node through the command run_erl you will get a named pipe for IO and you can attach a shell to that pipe without the need to start a whole new node. As we shall see in the next chapter there are some advantages to using run_erl instead of just starting Erlang in daemon mode, such as not losing standard IO and standard error output.

The run_erl command is only available on Unix-like operating systems that implement pipes. If you start your system with run_erl, something like:

> run_erl -daemon log/erl_pipe log "erl -sname node1"

or

> run_erl -daemon log/iex_pipe log "iex --sname node2"

You can then attach to the system through the named pipe (the first argument to run_erl).

> to_erl dir1/iex_pipe

iex(node2@GDC08)1>

You can exit the shell by sending EOF (ctrl+d) and leave the system running in the background. Note that with to_erl the terminal is connected directly to the live node so if you exit with ctrl-G q [enter] you will bring down that node, probably not what you want.

The last method for connecting to the node is through ssh.

Connecting through SSH

Erlang comes with a built in ssh server which you can start on your node and then connect to directly. The [documentation for the ssh module](http://erlang.org/doc/man/ssh.html) explains all the details. For a quick test all you need is a server key which you can generate with ssh-keygen:

> mkdir ~/ssh-test/
> ssh-keygen -t rsa -f ~/ssh-test/ssh_host_rsa_key

Then you start the ssh daemon on the Erlang node:

gds01> erl
Erlang/OTP 18 [erts-7.3] [source-d2a6d81] [64-bit] [smp:8:8]
              [async-threads:10] [hipe] [kernel-poll:false]

Eshell V7.3  (abort with ^G)
1> ssh:start().
{ok,<0.47.0>}
2> ssh:daemon(8021, [{system_dir, "/home/happi/.ssh/ehost/"},
                     {auth_methods, "password"},
                     {password, "pwd"}]).

You can now connect from another machine:

happi@GDC08:~> ssh -p 8021 happi@gds01
happi@gds01's password: [pwd]
Eshell V7.3  (abort with ^G)
1>

In a real world setting you would want to set up your server and user ssh keys as described in the documentation. At least you would want to have a better password.

To disconnect from the shell you need to shut down your terminal window. Using q() or init:stop() would bring down the node. In this shell you do not have access to neither JCL mode (ctrl+g) nor the BREAK mode (ctrl+c).

The break mode is really powerful when developing, profiling and debugging. We will take a look at it next.

Breaking (out or in).

When you press ctrl+c you enter BREAK mode. This is most often used just to break out of the shell by either tying a [enter] for abort or by hitting ctrl+c once more. But you can actually use this mode to break in to the internals of the Erlang runtime system.

When you enter BREAK mode you get a short menu:

BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
       (v)ersion (k)ill (D)b-tables (d)istribution

Abort exits the node and continue takes you back in to the shell. Hitting p [enter] will give you internal information about all processes in the system. We will look closer at what this information means in the next chapter (See [xxx](#ch.processes)).

You can also get information about the memory and the memory allocators in the node through the info choice (i [enter]). In [xxx](#ch.memory) we will look at how to decipher this information.

You can see all loaded modules and their sizes with l [enter] and the system version with v [enter], while k [enter] will let you step through all processes and inspect them and kill them. Capital D [enter] will show you information about all the ETS tables in the system and lower case d [enter] will show you information about the distribution. That is basically just the node name.

If you have built your runtime with OPPROF or DEBUG you will be able to get even more information. We will look at how to do this in [CH-BuildingERTS]. The code for the break mode can be found in <filename>[OTP_SOURCE]/erts/emulator/beam/break.c</filename>.

Note that going into break mode freezes the node. This is not something you want to do on a production system. But when debugging or profiling in a test system, this mode can help us find bugs and bottlenecks, as we will see later in this book.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ops.asciidoc

ops.asciidoc

Operation and Maintenance

Connecting to the System

The Shell

Configuring Your Shell

Connecting a Shell to a Node

Remote shell (Remsh)

Job Control Mode

Connecting through a Pipe

Connecting through SSH

Breaking (out or in).

Files

ops.asciidoc

Latest commit

History

ops.asciidoc

File metadata and controls

Operation and Maintenance

Connecting to the System

The Shell

Configuring Your Shell

Connecting a Shell to a Node

Remote shell (Remsh)

Job Control Mode

Connecting through a Pipe

Connecting through SSH

Breaking (out or in).