How Do I Fix an Erlang Remote Shell That Will Not Connect?

Erlang, a powerful concurrent programming language designed for building distributed and fault-tolerant systems, provides various tools for communication between nodes within a cluster. One such tool is the Erlang Remote Shell, or -remsh, which allows for connecting to a remote Erlang node from another Erlang instance. While it’s a straightforward process in theory, many developers face unexpected issues when attempting to connect to an Erlang remote shell, particularly in environments like Docker. This article dives deep into the potential causes of why Erlang remote shells fail to connect and offers a step-by-step guide on troubleshooting and enhancing the functionality of remote shell connections.

Erlang Remote Shell Basics

Before we dive into the troubleshooting, let’s break down the basic usage of the Erlang Remote Shell and what the command does:

./bin/erl -name 'remote@127.0.0.1' -remsh 'api@127.0.0.1'
  • -name option: This option is used to assign a name to the Erlang node. In this case, remote@127.0.0.1 is the name of the node running the remote shell.
  • -remsh option: This specifies the target node (in this case, api@127.0.0.1) to which the remote shell should connect.

When executed correctly, this command should connect the remote@127.0.0.1 node to the api@127.0.0.1 node, allowing you to interact with the api@127.0.0.1 node remotely.

Connecting to the Node

After initiating the remote shell, one of the first things you may do is verify that the node is reachable. This can be done with the net_adm:ping/1 command:

net_adm:ping('api@127.0.0.1').

The expected response is:

pong

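Keep in mind that expressions typed into a remote shell are evaluated on the target node, so this ping runs on api@127.0.0.1 itself. A pong here confirms that the shell is attached and that the node’s distribution layer is responding. To test reachability from the remote@127.0.0.1 side, run the same ping from a separately started local node that is not attached with -remsh.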

Retrieving System Information

Another useful step is checking the Erlang system version to ensure that the environment is set up correctly. You can do this with:

erlang:system_info(system_version).

In a functioning environment, you should see something like:

"Erlang/OTP 17 [erts-6.1] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]"

This output confirms that the Erlang VM is running properly and that you’re working within the expected environment. If this version information is missing, it could indicate that the Erlang environment is not correctly initialized, which is one of the first signs of a potential problem.

The RPC Call

The rpc:call/4 function is another tool in your troubleshooting arsenal. It is used to invoke a function on a remote node:

rpc:call('api@127.0.0.1', erlang, node, []).

When successfully executed, it will return the node name:

'api@127.0.0.1'

If the RPC call is successful, it indicates that the remote shell connection is stable and can execute remote functions.
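When the call does not succeed, rpc:call returns a {badrpc, Reason} tuple rather than raising, so it is worth matching on that explicitly. The following is a minimal sketch: the module name rpc_check, the function remote_node_name, and the 5-second default timeout are illustrative assumptions, not part of OTP.

```erlang
-module(rpc_check).
-export([remote_node_name/1, remote_node_name/2]).

%% Ask Node for its own name, using an assumed default timeout of 5 seconds.
remote_node_name(Node) ->
    remote_node_name(Node, 5000).

%% rpc:call/5 returns {badrpc, Reason} when the node is down,
%% unreachable, or the call times out, so map that to {error, Reason}.
remote_node_name(Node, TimeoutMs) ->
    case rpc:call(Node, erlang, node, [], TimeoutMs) of
        {badrpc, Reason} -> {error, Reason};
        Name when is_atom(Name) -> {ok, Name}
    end.
```

Calling rpc_check:remote_node_name('api@127.0.0.1') from a distributed node should yield either {ok, 'api@127.0.0.1'} or an {error, Reason} tuple whose reason hints at what went wrong.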

The Docker Environment Challenge

While the commands above typically work seamlessly in a local development environment (like Arch Linux), you may encounter issues when running Erlang in Docker containers. Docker containers come with network isolation, limited resource access, and specific networking rules, all of which can affect the behavior of remote Erlang nodes.

For example, when the same Erlang commands are executed within a Docker container, you might notice that the system version is missing. This is a critical clue: it suggests that something in the Docker setup, rather than in Erlang itself, is causing the issue.

Identifying the Root Cause: Network Isolation in Docker

Docker’s default network configuration isolates containers from each other and from the host system. As a result, 127.0.0.1 in one container points to that container itself, rather than the host or any other container. This causes problems when trying to connect Erlang nodes running in different containers or on the host system.

Solution: Use Container’s IP Address

One straightforward solution to this problem is to use the container’s IP address instead of 127.0.0.1 when specifying the node names. This ensures that the Erlang nodes can communicate with each other, even when running in separate containers.

To find the container’s IP address, you can use:

docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' container_name

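This returns the container’s internal IP address, which you can use in the -name and -remsh options. The target node must have been started under that same name (for example, -name 'api@<container_ip>'), because a node only accepts connections addressed to the exact name it was started with: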

./bin/erl -name 'remote@<container_ip>' -remsh 'api@<container_ip>'

By using the actual IP address of the container, you can ensure that the nodes can find each other and establish communication.
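Since a loopback address in a node name is the typical culprit here, a small guard can flag it before you try to connect. This is a minimal sketch; the module name node_host and both function names are hypothetical helpers, not part of OTP.

```erlang
-module(node_host).
-export([host/1, is_loopback/1]).

%% Return the host part of a node name as a string,
%% e.g. "127.0.0.1" for 'api@127.0.0.1'.
host(Node) when is_atom(Node) ->
    case lists:splitwith(fun(C) -> C =/= $@ end, atom_to_list(Node)) of
        {_Name, [$@ | Host]} when Host =/= [] -> Host;
        _ -> erlang:error(badarg, [Node])
    end.

%% True when the host part is "localhost" or starts with "127.",
%% which inside a container refers to the container itself.
is_loopback(Node) ->
    Host = host(Node),
    Host =:= "localhost" orelse lists:prefix("127.", Host).
```

A check such as node_host:is_loopback('api@127.0.0.1') can be run before connecting across containers, so that a loopback node name is caught early instead of surfacing as an unexplained connection failure.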

Node Authentication Issues

Erlang uses a concept called cookies for authenticating nodes. Both the remote@127.0.0.1 and api@127.0.0.1 nodes need to share the same cookie to authenticate and allow communication.

Solution: Set the Same Cookie for Both Nodes

When starting an Erlang node, the cookie is specified by the -setcookie flag:

./bin/erl -setcookie your_cookie_value

Make sure both your api@127.0.0.1 node and remote@127.0.0.1 node use the same cookie value. If the cookies are different, the nodes will not be able to connect, even if they are on the same network.
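The cookie can also be set from inside a running node, which is handy when you want to test a cookie value without restarting. The sketch below assumes a hypothetical module cookie_check; it relies on erlang:is_alive/0, erlang:set_cookie/2, and net_adm:ping/1, and it first checks that the local node is distributed, because setting a cookie on a non-distributed node fails.

```erlang
-module(cookie_check).
-export([ping_with_cookie/2]).

%% Set the cookie the local node uses when talking to Node, then ping it.
%% Returns pong or pang, or {error, not_distributed} if the local node
%% was started without -name or -sname.
ping_with_cookie(Node, Cookie) when is_atom(Node), is_atom(Cookie) ->
    case erlang:is_alive() of
        false ->
            {error, not_distributed};
        true ->
            true = erlang:set_cookie(Node, Cookie),
            net_adm:ping(Node)
    end.
```

If cookie_check:ping_with_cookie('api@127.0.0.1', your_cookie_value) returns pang while the network path is known to be good, a cookie mismatch is a likely suspect.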

Handling Version Mismatches

One possible cause of issues is a mismatch between the Erlang versions running in the Docker container and the host system. If the versions are different, some functions may behave unexpectedly.

Solution: Ensure Matching Erlang Versions

To resolve this, you should ensure that the Erlang version running on both the Docker container and the host is the same. If necessary, reinstall Erlang in the Docker container to match the host system’s version.

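You can check the version of the Erlang emulator (ERTS) on each side from the command line. Note that this reports the emulator version rather than the OTP release number: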

erl -version

If there is a version mismatch, consider upgrading or downgrading Erlang in the Docker container to ensure compatibility.
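Once two nodes can talk to each other, the OTP releases can also be compared from within Erlang. The following is a minimal sketch under the assumption that an rpc call to the target node works; the module name otp_version and the return tuples are illustrative choices.

```erlang
-module(otp_version).
-export([compare/1]).

%% Compare the local OTP release string with the one reported by Node.
compare(Node) ->
    Local = erlang:system_info(otp_release),
    case rpc:call(Node, erlang, system_info, [otp_release]) of
        {badrpc, Reason} -> {error, Reason};
        Local -> {match, Local};              % same release on both sides
        Remote -> {mismatch, Local, Remote}   % releases differ
    end.
```

Running otp_version:compare('api@127.0.0.1') from the host-side node gives a quick answer about whether the two installations are on the same release.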

Enhanced Functionality for Troubleshooting

In addition to the troubleshooting steps above, you can add connectivity checks and retries to make the remote shell connection more robust.

Node Connectivity Check Before Starting the Remote Shell

To prevent errors, it’s a good idea to check if the target node is reachable before attempting to start the remote shell. The following Erlang code performs this check:

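-module(remote_shell).
-export([start/1]).

%% NodeName is the full node name as an atom, e.g. 'api@127.0.0.1'.
%% The calling node must itself be distributed (started with -name)
%% and share the target's cookie, otherwise the ping returns pang.
start(NodeName) ->
    case net_adm:ping(NodeName) of
        pong ->
            io:format("Node ~s is reachable.~n", [NodeName]),
            start_remote_shell(NodeName);
        pang ->
            io:format("Node ~s is not reachable.~n", [NodeName])
    end.

%% Print the command to run from a terminal rather than executing it.
start_remote_shell(NodeName) ->
    io:format("Run: erl -name 'remote@127.0.0.1' -remsh '~s'~n", [NodeName]).

This module checks whether the node is reachable before you attempt to connect, which can save time and avoid unnecessary errors. Spawning an interactive erl from inside a running VM does not give you a usable terminal, so start_remote_shell/1 prints the command to run instead of executing it.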

Implementing Retry Mechanism

Sometimes, network delays or temporary issues may prevent a successful connection. In such cases, you can implement a retry mechanism:

%% Replaces start/1 in the remote_shell module above.
start(NodeName) ->
    try_connect(NodeName, 3).

%% No attempts left: report the failure.
try_connect(_, 0) ->
    io:format("Failed to connect after 3 retries.~n");
try_connect(NodeName, Retries) ->
    case net_adm:ping(NodeName) of
        pong ->
            io:format("Node ~s is reachable.~n", [NodeName]);
        pang ->
            %% Retries is an integer, so format it with ~p rather than ~s.
            io:format("Retrying connection to ~s (~p retries left)...~n",
                      [NodeName, Retries - 1]),
            timer:sleep(1000),
            try_connect(NodeName, Retries - 1)
    end.

This script attempts to connect to the node up to three times, waiting one second between retries.

Conclusion

Erlang remote shell connectivity issues can be tricky, especially when running Erlang in isolated environments like Docker. By understanding the root causes, such as network isolation, node authentication, and version mismatches, you can resolve these issues effectively. Additionally, by adding practical functionality like connectivity checks and retry mechanisms, you can make your Erlang remote shells more robust and easier to debug.
