Test Isolation using linux namespaces #277

Open: wants to merge 39 commits into base: main
39 commits:
- 0300f03 WIP Test Isolation using linux namespaces (sloretz, May 10, 2023)
- 45dc291 exec from command line (sloretz, May 17, 2023)
- 407365a Bring up loopback interface (sloretz, May 17, 2023)
- eadaff4 Move linux namespace creation to separate library (sloretz, May 17, 2023)
- 01b3f35 Rename some things (sloretz, May 18, 2023)
- 6f1e200 grammar (sloretz, May 18, 2023)
- 48f2b79 ros_cc_test isolated by default (sloretz, May 19, 2023)
- 5093486 WIP CPython extension for isolation (sloretz, May 19, 2023)
- 288000b First commit where Python tests seem to work (sloretz, May 31, 2023)
- 7892d86 Remove unused file (sloretz, May 31, 2023)
- 4958e7f linters (sloretz, May 31, 2023)
- 7499342 lint (sloretz, May 31, 2023)
- 2a22e42 Add python_dev repo (sloretz, May 31, 2023)
- 8f015db Try to fix one CI job by updating apt package lists (sloretz, Jun 20, 2023)
- f279ccf Move where we apt-get update to later (sloretz, Jun 20, 2023)
- 9e2ecdf Add debugging step to find missing shared library (sloretz, Jun 20, 2023)
- f7c2c29 fixes rosdep linter (adityapande-1995, Jul 17, 2023)
- 9c5ffce Fixed CI error (adityapande-1995, Jul 17, 2023)
- 4106361 Cleanup (adityapande-1995, Jul 17, 2023)
- e03cfb2 Removed apt update step in workflow (adityapande-1995, Jul 20, 2023)
- 430fed7 sandbox_debug flag to CI (adityapande-1995, Jul 20, 2023)
- bab9094 Update workflow to save space (adityapande-1995, Jul 24, 2023)
- 31c102b Bazel clean workspace (adityapande-1995, Jul 24, 2023)
- 454ef51 Removed sandbox debug and added bazel clean without expunge (adityapande-1995, Jul 24, 2023)
- 3432e81 Added a test case for network_isolation (adityapande-1995, Aug 2, 2023)
- fb9703d Typo (adityapande-1995, Aug 2, 2023)
- a34408a Linter (adityapande-1995, Aug 2, 2023)
- 68672d8 Merge remote-tracking branch 'upstream/main' into sloretz__isolate_wi… (adityapande-1995, Sep 27, 2023)
- 21fe89b Free up disk space (adityapande-1995, Sep 27, 2023)
- 8ce8456 Added a readme (adityapande-1995, Oct 10, 2023)
- ea13063 readme minor edits (adityapande-1995, Oct 10, 2023)
- e47cfbc readme minor edits (adityapande-1995, Oct 10, 2023)
- 8d84a10 Added suggestions (ahcorde, Mar 22, 2024)
- 3744256 Merge pull request #1 from ahcorde/ahcorde/sloretz__isolate_with_linu… (adityapande-1995, Mar 22, 2024)
- bed993e Merge remote-tracking branch 'upstream/main' into sloretz__isolate_wi… (adityapande-1995, Mar 22, 2024)
- c2b9823 suggestions to network isolation (ahcorde, Mar 25, 2024)
- d000c8b make linters happy (ahcorde, Mar 25, 2024)
- c6777f5 Merge pull request #2 from ahcorde/ahcorde/suggestion_network_isolation (adityapande-1995, Mar 25, 2024)
- ad1131b Update bazelized_drake_ros.yml (adityapande-1995, Mar 25, 2024)
7 changes: 7 additions & 0 deletions .github/workflows/bazelized_drake_ros.yml
@@ -34,6 +34,13 @@ jobs:
        run: du -hs $(readlink -f ~/.cache/bazel_ci) || true

      # Setup.
      - name: Free up disk space
        run: |
          sudo rm -rf \
            /usr/share/dotnet \
            /opt/ghc \
            /usr/local/share/boost \
            "$AGENT_TOOLSDIRECTORY" || true
      - name: Simplify apt upgrades
        run: .github/simplify_apt_and_upgrades.sh
      - name: Configure drake_ros Bazel for CI
4 changes: 4 additions & 0 deletions bazel_ros2_rules/WORKSPACE
@@ -15,3 +15,7 @@ http_archive(
load("@bazel_skylib//:workspace.bzl", "bazel_skylib_workspace")

bazel_skylib_workspace()

load("//deps:defs.bzl", "add_bazel_ros2_rules_dependencies")

add_bazel_ros2_rules_dependencies()
33 changes: 33 additions & 0 deletions bazel_ros2_rules/network_isolation/BUILD.bazel
@@ -0,0 +1,33 @@
cc_library(
    name = "network_isolation_cc",
    srcs = ["network_isolation.cc"],
    hdrs = ["network_isolation.h"],
    visibility = ["//visibility:public"],
)

cc_binary(
    name = "isolate",
    srcs = ["isolate.cc"],
    visibility = ["//visibility:public"],
    deps = [
        ":network_isolation_cc",
    ],
)

# Create a CPython extension
cc_binary(
    name = "network_isolation_py.so",
    srcs = ["network_isolation_py.cc"],
    linkshared = True,
    linkstatic = True,
    deps = [
        ":network_isolation_cc",
        "@python_dev//:headers",
    ],
)

py_library(
    name = "network_isolation_py",
    data = [":network_isolation_py.so"],
    visibility = ["//visibility:public"],
)
103 changes: 103 additions & 0 deletions bazel_ros2_rules/network_isolation/README.md
@@ -0,0 +1,103 @@
# Introduction
This directory contains tools and targets to isolate ROS 2 tests in this repository using Linux namespaces.
At its core, it uses the ``unshare()`` system call to isolate ROS 2 traffic. It creates a new user namespace (so that no extra privileges such as CAP_SYS_ADMIN are needed),
a new network namespace to prevent cross-talk via the network, and a new IPC namespace to prevent cross-talk via shared memory.

The existing dload shim, which already wraps every ``ros_cc_test()`` and ``ros_py_test()`` target, receives the new ``network_isolation`` attribute and uses it to isolate the test.
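For readers new to ``unshare()``, here is a minimal sketch of the mechanism, assuming a Linux host; it is not the code in ``network_isolation.cc``. A forked child calls ``unshare()`` with the same three flags and compares its ``/proc/self/ns/net`` and ``/proc/self/ns/ipc`` links before and after the call, so the calling process itself is never moved. The names ``namespace_id``, ``Outcome`` and ``try_isolation_in_child`` are illustrative only.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // unshare() and CLONE_* are GNU extensions in <sched.h>
#endif
#include <sched.h>
#include <sys/wait.h>
#include <unistd.h>

#include <string>

// Returns the identifier of one of the calling process's namespaces,
// e.g. "net:[4026531840]", read from the /proc/self/ns/<kind> symlink.
std::string namespace_id(const char * kind)
{
  const std::string path = std::string("/proc/self/ns/") + kind;
  char buf[256];
  const ssize_t n = readlink(path.c_str(), buf, sizeof(buf));
  return n < 0 ? std::string() : std::string(buf, static_cast<size_t>(n));
}

enum class Outcome { kIsolated, kUnshareFailed, kUnexpected };

// Forks a child that calls unshare() with the same flags as
// create_linux_namespaces(), so the caller's namespaces never change.
Outcome try_isolation_in_child()
{
  const pid_t pid = fork();
  if (pid < 0) {
    return Outcome::kUnexpected;
  }
  if (pid == 0) {
    const std::string net_before = namespace_id("net");
    const std::string ipc_before = namespace_id("ipc");
    if (unshare(CLONE_NEWUSER | CLONE_NEWNET | CLONE_NEWIPC) != 0) {
      _exit(1);  // e.g. unprivileged user namespaces are disabled
    }
    // After a successful unshare() both links must point at new namespaces.
    const bool moved = !net_before.empty() &&
      namespace_id("net") != net_before &&
      namespace_id("ipc") != ipc_before;
    _exit(moved ? 0 : 2);
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
    return Outcome::kUnexpected;
  }
  switch (WEXITSTATUS(status)) {
    case 0: return Outcome::kIsolated;
    case 1: return Outcome::kUnshareFailed;
    default: return Outcome::kUnexpected;
  }
}
```

On hosts where unprivileged user namespaces are disabled (for example by a sysctl or a container's seccomp profile), ``unshare()`` fails and the sketch reports ``Outcome::kUnshareFailed`` instead of ``Outcome::kIsolated``.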

## Why do we need this?
When ROS 2 tests run in parallel, they may publish on the same topics and cross-talk with each other. Isolation based on RMW configuration or ROS domain IDs is possible,
but it does not scale well, because only a limited number of ports and domain IDs are available. Linux namespaces provide a generic and scalable way to solve this problem.
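To make the port limit concrete: under the default DDSI-RTPS port mapping (port base 7400, domain ID gain 250, participant ID gain 2), each domain ID claims its own 250-port block. The sketch below only does that arithmetic. The constant and function names are illustrative, and an RMW configured with non-default port parameters would give different numbers.

```cpp
// Default DDSI-RTPS well-known port parameters.
constexpr int kPortBase = 7400;               // PB
constexpr int kDomainIdGain = 250;            // DG
constexpr int kParticipantIdGain = 2;         // PG
constexpr int kDiscoveryMulticastOffset = 0;  // d0
constexpr int kUserUnicastOffset = 11;        // d3

// Port on which all participants of a domain listen for multicast discovery.
constexpr int discovery_multicast_port(int domain_id)
{
  return kPortBase + kDomainIdGain * domain_id + kDiscoveryMulticastOffset;
}

// Unicast user-data port of one participant in a domain.
constexpr int user_unicast_port(int domain_id, int participant_id)
{
  return kPortBase + kDomainIdGain * domain_id + kUserUnicastOffset +
         kParticipantIdGain * participant_id;
}

// Largest domain ID whose discovery multicast port is still below
// `first_unusable_port`.
constexpr int max_domain_id_below(int first_unusable_port)
{
  return (first_unusable_port - 1 - kPortBase - kDiscoveryMulticastOffset) /
         kDomainIdGain;
}

// Checked at compile time.
static_assert(max_domain_id_below(65536) == 232, "all valid UDP ports");
static_assert(max_domain_id_below(32768) == 101, "below Linux ephemeral ports");
```

So at most 233 domain IDs (0 to 232) fit into the UDP port space at all, and only 102 (0 to 101) stay below Linux's default ephemeral port range, which starts at 32768. That is far fewer than the number of tests a large parallel build may run at once.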

## Why not isolate individual targets instead of tests?
The namespace approach relies on 3 namespaces (IPC, user and network), which act as the "credentials" processes must share to talk to each other, i.e. to be in the same realm.
Isolating tests is more general than isolating individual targets: a test is free to fork() or run any number of processes, and all of them inherit the test's namespaces.

If we isolate a process 'A' on its own, how do we make sure that the next process 'B' ends up in the same namespaces as 'A' (if they are meant to talk to each other)? That would require an API to hand the above-mentioned "credentials" to the new process,
which is out of scope for this feature for now.
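Isolating a whole test works because namespace membership is inherited: a child created with ``fork()`` starts in the same namespaces as its parent. The Linux-specific sketch below checks this by comparing the device and inode numbers of ``/proc/self/ns/<kind>`` in a parent and its forked child; two processes share a namespace when these numbers match. The function name ``same_namespace_in_child`` is illustrative.

```cpp
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns true if a child created with fork() is in the same namespace as its
// parent, for the namespace named by `ns_path` (e.g. "/proc/self/ns/net").
bool same_namespace_in_child(const char * ns_path)
{
  struct stat parent {};
  if (stat(ns_path, &parent) != 0) {
    return false;
  }
  const pid_t pid = fork();
  if (pid < 0) {
    return false;
  }
  if (pid == 0) {
    // In the child, /proc/self now refers to the child process.
    struct stat child {};
    const bool same = stat(ns_path, &child) == 0 &&
      child.st_dev == parent.st_dev && child.st_ino == parent.st_ino;
    _exit(same ? 0 : 1);
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) {
    return false;
  }
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A process started independently (for example from another terminal) is not a descendant of the isolated test, so nothing places it in these namespaces; that is the missing "credentials" hand-off described above.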

# Targets
The logic lives in the following targets, which are meant to be used with the Bazel rules ``ros_cc_test()`` and ``ros_py_test()``:
* ``network_isolation_cc`` (cc_library): the core logic, where the ``unshare()`` call is made.
* ``network_isolation_py.so`` (CPython extension, built with pybind11): Python binding for the network isolation logic.
* ``network_isolation_py`` (py_library): importable Python module for the network isolation logic.

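In addition, there is a standalone executable target called ``isolate``, which is meant to be used on its own rather than through the
``ros_*_test()`` rules. It creates the namespaces with the same ``unshare()`` logic as ``network_isolation_cc`` and then replaces itself, via ``execv()``, with the command given as its arguments. Because ``execv()`` does not search ``PATH``, the first argument must be a path to an executable (e.g. ``/bin/bash`` rather than ``bash``).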

# How do we use this feature?
There are three ways to use this feature:

## Using the ``ros_cc_test()`` rule
The rule now accepts an extra argument, ``network_isolation``. For example, we can modify a test in ``ros2_example_bazel_installed/ros2_example_apps/BUILD.bazel`` as follows:

```
ros_cc_test(
    name = "talker_listener_cc_test",
    size = "small",
    srcs = ["test/talker_listener.cc"],
    rmw_implementation = "rmw_cyclonedds_cpp",
    network_isolation = True,
    deps = [
        ":listener_cc",
        ":talker_cc",
        "@ros2//:rclcpp_cc",
        "@ros2//:std_msgs_cc",
        "@ros2//resources/rmw_isolation:rmw_isolation_cc",
    ],
)
```

All processes spawned by this test are contained in the same namespaces: they can talk to each other, but not to any ROS nodes running outside of this test.

## Using the ``ros_py_test()`` rule
Similarly, the Python test rule accepts ``network_isolation = True``. Consider this target in ``drake_ros_examples/examples/iiwa_manipulator/BUILD.bazel``:

```
ros_py_test(
    name = "iiwa_manipulator_test",
    network_isolation = True,
    srcs = ["test/iiwa_manipulator_test.py"],
    data = [
        ":iiwa_manipulator",
        ":iiwa_manipulator_py",
    ],
    main = "test/iiwa_manipulator_test.py",
    deps = [
        "@ros2//resources/bazel_ros_env:bazel_ros_env_py",
    ],
)
```

Again, the ROS nodes spawned in this test can only talk to each other, and not to other nodes running on the system at the same time, even if multiple instances of this test run in different terminals.


## Using the ``isolate`` target
As mentioned before, the ``isolate`` target is generic: it will isolate **any** command given to it, provided the executable is available in the Bazel sandbox. For example:
```
cd bazel_ros2_rules
bazel run //network_isolation:isolate -- /bin/bash -c "echo HELLO_WORLD"
```

Here, the bash command that prints "HELLO_WORLD" is isolated; it can be replaced with any ROS 2 command. For instance, if ROS 2 Humble is installed
outside of drake-ros from Debian packages, you can run a talker and a listener that are isolated from each other.

Let's start the talker from the ``demo_nodes_cpp`` package:
```
bazel run //network_isolation:isolate -- /bin/bash -c "source /opt/ros/humble/setup.bash && export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp && ros2 run demo_nodes_cpp talker"
```

In another terminal, run :
```
bazel run //network_isolation:isolate -- /bin/bash -c "source /opt/ros/humble/setup.bash && export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp && ros2 run demo_nodes_cpp listener"
```

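Each ``isolate`` invocation creates its own set of namespaces, so these two processes should not be able to talk to each other: the listener should not receive any messages from the talker.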
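To see what an isolated process can reach, you could build a small program that lists the network interfaces visible to it and run it through ``isolate`` in place of the bash command. The sketch below is hypothetical (``interface_names`` is not part of this PR). Inside the namespaces created by ``isolate`` it should report only the loopback interface ``lo``, while outside it will usually list the host's other interfaces too.

```cpp
#include <ifaddrs.h>

#include <set>
#include <string>

// Returns the names of the network interfaces visible to this process.
// getifaddrs() can return several entries per interface (one per address or
// address family), so a set collapses the duplicates.
std::set<std::string> interface_names()
{
  std::set<std::string> names;
  struct ifaddrs * list = nullptr;
  if (getifaddrs(&list) != 0) {
    return names;
  }
  for (const struct ifaddrs * it = list; it != nullptr; it = it->ifa_next) {
    names.emplace(it->ifa_name);
  }
  freeifaddrs(list);
  return names;
}
```

Wrapped in a binary that prints each name, it could be launched as ``bazel run //network_isolation:isolate -- /path/to/that/binary``, where the path is a placeholder for wherever the binary ends up.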

# Testing this feature in CI
Of the three ways to use this feature, only the ``isolate`` target mechanism is tested in CI.

The test target is called ``network_isolation_test``. It uses ``ros2_example_bazel_installed/test/network_isolation_test.py`` to run 5 (by default) talker-listener pairs, each tagged with an id.
All pairs publish and listen on the same topic at the same time, and the test expects no cross-talk between pairs. The number of pairs can be changed with the ``--id`` command-line argument if needed.
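The pass criterion can be sketched as follows: each listener counts the messages that did not come from its own talker, and any non-zero count means cross-talk. The real test is written in Python and its message format is not shown here, so the ``"<id>:<payload>"`` format and the ``count_cross_talk`` helper below are assumptions for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical message format "<pair id>:<payload>". Returns how many of the
// messages a listener received did not come from its own talker; malformed
// messages are counted as foreign.
int count_cross_talk(int listener_id, const std::vector<std::string> & received)
{
  const std::string own_prefix = std::to_string(listener_id) + ":";
  int foreign = 0;
  for (const std::string & msg : received) {
    // compare() returns 0 only if msg starts with own_prefix.
    if (msg.compare(0, own_prefix.size(), own_prefix) != 0) {
      ++foreign;
    }
  }
  return foreign;
}
```

Matching on the full ``"<id>:"`` prefix, rather than on the leading digits, keeps pair ``3`` from accepting messages from pair ``13``.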

The other two ways to use this feature, the ``ros_*_test()`` rules, apply to tests rather than executable targets, so *writing a test for a test* seems like an anti-pattern. Since the ``isolate`` target uses the same logic, testing it should be sufficient.
35 changes: 35 additions & 0 deletions bazel_ros2_rules/network_isolation/isolate.cc
@@ -0,0 +1,35 @@
#include <unistd.h>

#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <vector>

#include "network_isolation.h"

[[noreturn]] void die(const char * message) {
  std::cerr << "isolate: " << message << ".\n";
  std::exit(-1);
}

int main(int argc, char ** argv) {
  if (argc < 2) {
    die("isolate must be given a command to execute");
  }

  if (!network_isolation::create_linux_namespaces()) {
    die("Failed to fully create isolated environment");
  }

  // Copy to a new array that terminates with a null pointer at the end.
  std::vector<char *> new_argv;
  for (int i = 1; i < argc; ++i) {
    new_argv.push_back(argv[i]);
  }
  new_argv.push_back(nullptr);

  // Replace this process with the requested command, which inherits the new
  // namespaces. execv() does not search PATH, so argv[1] must be a path.
  // execv() only returns on failure.
  execv(new_argv.at(0), new_argv.data());

  std::perror("execv");
  die("Call to execv failed");
}
109 changes: 109 additions & 0 deletions bazel_ros2_rules/network_isolation/network_isolation.cc
@@ -0,0 +1,109 @@
#include "network_isolation.h"

#include <arpa/inet.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <net/route.h>
#include <sched.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cerrno>
#include <cstring>
#include <iostream>
#include <string>

namespace network_isolation {
namespace {

// Prints `message` along with the description of the current errno.
void error(const char * message)
{
  std::cerr << "create_linux_namespaces: "
            << message << ": " << std::strerror(errno) << "\n";
}

}  // namespace

bool create_linux_namespaces()
{
  // When CLONE_NEWUSER is combined with other CLONE_NEW* flags, the user
  // namespace is created first, which gives this process the privileges it
  // needs to create the network and IPC namespaces.
  if (0 != unshare(CLONE_NEWUSER | CLONE_NEWNET | CLONE_NEWIPC)) {
    error("failed to call unshare");
    return false;
  }

  // Assert there is exactly one network interface.
  struct ifaddrs * ifaddr = nullptr;
  if (-1 == getifaddrs(&ifaddr)) {
    error("could not get network interfaces");
    return false;
  }
  if (nullptr == ifaddr) {
    error("there are no network interfaces");
    return false;
  }
  if (nullptr != ifaddr->ifa_next) {
    error("there are multiple network interfaces");
    freeifaddrs(ifaddr);
    return false;
  }

  // Copy the interface name, leaving room for the NUL terminator, then
  // release the list; only the name is needed below.
  struct ifreq ioctl_request {};
  std::strncpy(ioctl_request.ifr_name, ifaddr->ifa_name, IFNAMSIZ - 1);
  freeifaddrs(ifaddr);

  // Need a socket to do ioctl stuff on.
  const int fd = socket(AF_INET, SOCK_DGRAM, 0);
  if (fd < 0) {
    error("could not open a socket");
    return false;
  }
  // Report errno before close() can overwrite it, then release the socket.
  auto fail = [fd](const char * message) {
      error(message);
      close(fd);
      return false;
    };

  // Check what flags are set on the interface.
  if (0 != ioctl(fd, SIOCGIFFLAGS, &ioctl_request)) {
    return fail("failed to get interface flags");
  }

  // Expecting a loopback interface.
  if (!(ioctl_request.ifr_flags & IFF_LOOPBACK)) {
    return fail("the only interface is not a loopback interface");
  }

  // Enable multicast and bring up the interface.
  ioctl_request.ifr_flags |= IFF_MULTICAST | IFF_UP;
  if (0 != ioctl(fd, SIOCSIFFLAGS, &ioctl_request)) {
    return fail("failed to set interface flags");
  }

  // For programs that use both LCM and ROS, we need an LCM route equivalent to
  //   sudo route add -net 224.0.0.0 netmask 240.0.0.0 dev lo
  struct rtentry route {};
  auto * dest = reinterpret_cast<struct sockaddr_in *>(&route.rt_dst);
  dest->sin_family = AF_INET;
  dest->sin_addr.s_addr = inet_addr("224.0.0.0");
  auto * mask = reinterpret_cast<struct sockaddr_in *>(&route.rt_genmask);
  mask->sin_family = AF_INET;
  mask->sin_addr.s_addr = inet_addr("240.0.0.0");
  std::string device{"lo"};
  route.rt_dev = device.data();
  route.rt_flags = RTF_UP;
  if (0 != ioctl(fd, SIOCADDRT, &route)) {
    return fail("failed to set route");
  }

  close(fd);
  return true;
}
} // namespace network_isolation
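The route added at the end of ``create_linux_namespaces()`` has destination ``224.0.0.0`` and netmask ``240.0.0.0``, i.e. ``224.0.0.0/4``, the whole IPv4 multicast range (``224.0.0.0`` to ``239.255.255.255``). That range covers both ``239.255.0.1``, the default RTPS discovery multicast address, and ``239.255.76.67``, LCM's default multicast group. The hypothetical helper below applies the same mask-and-compare check that a route lookup uses for this entry.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>

#include <cstdint>

// Returns true if the IPv4 address `dotted` falls under the route
// 224.0.0.0 netmask 240.0.0.0 (224.0.0.0/4).
bool matches_multicast_route(const char * dotted)
{
  in_addr addr{};
  if (inet_pton(AF_INET, dotted, &addr) != 1) {
    return false;  // not a valid dotted-quad IPv4 address
  }
  const std::uint32_t host = ntohl(addr.s_addr);  // network to host byte order
  const std::uint32_t destination = 0xE0000000u;  // 224.0.0.0
  const std::uint32_t netmask = 0xF0000000u;      // 240.0.0.0
  return (host & netmask) == destination;
}
```

For example, ``matches_multicast_route("239.255.76.67")`` is true, while ``matches_multicast_route("127.0.0.1")`` is false.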
22 changes: 22 additions & 0 deletions bazel_ros2_rules/network_isolation/network_isolation.h
@@ -0,0 +1,22 @@
#pragma once

namespace network_isolation {
/// Creates linux namespaces suitable for isolating ROS 2 traffic.
///
/// The new namespaces are:
/// * A new user namespace to avoid needing CAP_SYS_ADMIN to create
/// network and IPC namespaces
/// * A new network namespace to prevent cross-talk via the network
/// * A new IPC namespace to prevent cross-talk via shared memory
///
/// It also configures the network namespace to enable ROS 2 traffic.
/// At the end of a successful call the current process will be in
/// the created namespaces.
/// Depending on what part of the process fails, an unsuccessful
/// call may also leave the current process in new namespaces.
/// There is no way to undo this.
///
/// \return true iff the namespaces were created successfully.
bool create_linux_namespaces();

} // namespace network_isolation
12 changes: 12 additions & 0 deletions bazel_ros2_rules/network_isolation/network_isolation_py.cc
@@ -0,0 +1,12 @@
#include <pybind11/pybind11.h>

#include "network_isolation/network_isolation.h"

namespace py = pybind11;

PYBIND11_MODULE(network_isolation_py, m)
{
m.def("create_linux_namespaces", &network_isolation::create_linux_namespaces, R"pbdoc(
Creates linux namespaces suitable for isolating ROS 2 traffic.
)pbdoc");
}
6 changes: 6 additions & 0 deletions bazel_ros2_rules/ros2/ros_cc.bzl
@@ -166,6 +166,7 @@ def ros_cc_test(
        cc_binary_rule = native.cc_binary,
        cc_library_rule = native.cc_library,
        cc_test_rule = native.cc_test,
        network_isolation = False,
        **kwargs):
    """
    Builds a C/C++ test and wraps it with a shim that will inject the minimal
@@ -211,6 +212,7 @@ def ros_cc_test(
        name = shim_name,
        target = ":" + noshim_name,
        env_changes = shim_env_changes,
        network_isolation = network_isolation,
        **shim_kwargs
    )

@@ -220,4 +222,8 @@ def ros_cc_test(
        deps = ["@bazel_ros2_rules//ros2:dload_shim_cc"],
        tags = ["nolint"] + kwargs.get("tags", []),
    )
    if network_isolation:
        kwargs["deps"].append(
            "@bazel_ros2_rules//network_isolation:network_isolation_cc",
        )
    cc_test_rule(name = name, **kwargs)
6 changes: 6 additions & 0 deletions bazel_ros2_rules/ros2/ros_py.bzl
@@ -135,6 +135,7 @@ def ros_py_test(
        rmw_implementation = None,
        py_binary_rule = native.py_binary,
        py_test_rule = native.py_test,
        network_isolation = False,
        **kwargs):
    """
    Builds a Python test and wraps it with a shim that will inject the minimal
@@ -176,6 +177,7 @@ def ros_py_test(
        name = shim_name,
        target = ":" + noshim_name,
        env_changes = shim_env_changes,
        network_isolation = network_isolation,
        **shim_kwargs
    )

@@ -186,4 +188,8 @@ def ros_py_test(
        deps = ["@bazel_ros2_rules//ros2:dload_shim_py"],
        tags = ["nolint"] + kwargs.get("tags", []),
    )
    if network_isolation:
        kwargs["deps"].append(
            "@bazel_ros2_rules//network_isolation:network_isolation_py",
        )
    py_test_rule(name = name, **kwargs)
1 change: 1 addition & 0 deletions bazel_ros2_rules/ros2/tools/dload.bzl
@@ -59,6 +59,7 @@ def get_dload_shim_attributes():
            cfg = "target",
        ),
        "env_changes": attr.string_list_dict(),
        "network_isolation": attr.bool(default = False),
    }

def _workaround_issue311(ament_prefixes, env_changes):