Release #81

Merged
merged 4 commits into from
Mar 13, 2024
202 changes: 24 additions & 178 deletions README.md
@@ -3,21 +3,15 @@
Blockchain networks in the wild are subject to a lot of real life variances that have historically been difficult to capture
in local or controlled tests. Chaos testing is a disciplined approach to testing a system by proactively simulating and
identifying failures. Attacknet is a tool that allows you to simulate these real life variances in a controlled environment.
Examples would include adding network latency between nodes, killing nodes at random, filesystem errors being returned.
Examples would include adding network latency between nodes, killing nodes at random, or filesystem latency.

The overall architecture of Attacknet relies on Kubernetes to run the workloads, [Kurtosis](https://github.com/kurtosis-tech/kurtosis) to orchestrate a blockchain network and
[Chaos Mesh](https://chaos-mesh.org/) to inject faults into it. Attacknet can then be configured to run healthchecks and
[Chaos Mesh](https://chaos-mesh.org/) to inject faults into nodes. Attacknet can then be configured to run healthchecks and
report back the state of the network at the end of a test.

![docs/attacknet.svg](docs/attacknet.svg)

### TLDR; Capabilities
Attacknet can be used in the following ways:
- Manually creating test suites/network configs
- Manually running single tests against a network
- Using the planner feature to define a matrix of faults and targets to auto generate test files
- Running the test suites
- (WIP) Exploratory testing
### Capabilities

The faults supported by Attacknet include:
- Time based: Clock skew
@@ -27,6 +21,14 @@ The faults supported by Attacknet include:
- Stress based: CPU stress, Memory stress
- (WIP) Kernel based: Kernel faults

Attacknet can be used in the following ways:
- Manually creating specific faults that target nodes matching given criteria.
- Creating devnets of specific topologies using [Kurtosis](https://www.kurtosis.com/), then running faults against them.
- Using the planner to define a matrix of faults and targets, automatically generating the network topology and fault configuration.
- (WIP) Exploratory testing: dynamically generating various faults and targeting criteria and running faults continuously.

See [DOCUMENTATION.MD](docs/DOCUMENTATION.MD) for specific usage examples.

## Getting started
### Installation/Building

@@ -49,177 +51,28 @@ The faults supported by Attacknet include:
7. In a separate terminal, run `kurtosis engine start`
8. In a separate terminal, run `kurtosis gateway`. This process needs to stay alive during all attacknet testing and cannot be started via SDK.

## Usage guides
## Manually creating/configuring test suites

Attacknet is configured using "test suites". These are yaml files found under `./test-suites` that define everything
Attacknet needs to genesis a network, test the network, and determine the health of the network. You may have to manually add/remove
targeting criteria from these configs depending on the network being tested.

Test suite configuration is broken into 3 sections:
- Attacknet configuration.
- Harness configuration. This is used to configure the Kurtosis package that will be used to genesis the network.
- Test configuration. This is used to determine which tests should be run against the devnet and how those tests
should be configured.

Here is an annotated test suite configuration that explains what each bit is for:
```yaml
attacknetConfig:
grafanaPodName: grafana # the name of the pod that grafana will be deployed to.
grafanaPodPort: 3000 # the port grafana is listening to in the pod
waitBeforeInjectionSeconds: 10
# the number of seconds to wait between the genesis of the network and the injection of faults. To wait for finality, use 25 mins (1500 secs)
reuseDevnetBetweenRuns: true # Whether attacknet should skip enclave deletion after the fault concludes. Defaults to false.
existingDevnetNamespace: kt-ethereum # Omit field for random namespace generation. If you want to reuse a running network, you can specify an existing namespace that contains a Kurtosis enclave and run tests against it.
allowPostFaultInspection: true # When set to true, Attacknet will maintain the port-forward connection to Grafana once the fault has concluded to allow the operator to inspect metrics. Default: true

harnessConfig:
networkPackage: github.com/crytic/ethereum-package # The Kurtosis package to deploy to instrument the devnet.
networkConfig: default.yaml # The configuration to use for the Kurtosis package. These live in ./network-configs and are referenced by their filename.
networkType: ethereum # no touchy

# The list of tests to be run before termination
testConfig:
tests:
- testName: packetdrop-1 # Name of the test. Used for logging/artifacts.
health:
enableChecks: true # whether health checks should be run after the test concludes
gracePeriod: 2m0s # how long the health checks will attempt to pass before marking the test a failure
planSteps: # the list of steps to facilitate the test, executed in order
- stepType: injectFault # this step injects a fault, then continues to the next step without waiting for the fault to terminate
description: "inject fault"
chaosFaultSpec: # The chaosFaultSpec is basically a pass-thru object for Chaos Mesh fault resources. This means we can support every possible fault out-of-the-box. To determine the schema for each fault type, check the Chaos Mesh docs: https://chaos-mesh.org/docs/simulate-network-chaos-on-kubernetes/. One issue with this method is that Attacknet can't verify whether your faultSpec is valid until it tries to create the resource in Kubernetes, and that comes after genesis which takes a long time on its own. If you run into schema validation issues, try creating these objects directly in Kubernetes to hasten the debug cycle.
kind: NetworkChaos
apiVersion: chaos-mesh.org/v1alpha1
spec:
selector:
labelSelectors:
kurtosistech.com/id: cl-1-lighthouse-geth-validator
mode: all
action: loss
duration: 1m
loss:
loss: '10'
correlation: '0'
direction: to
- stepType: waitForFaultCompletion # this step waits for all previous running faults to complete before continuing
description: wait for faults to terminate
```
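
The `injectFault` step in the annotated suite can also be generated programmatically. The following is a minimal Python sketch, not part of Attacknet, that builds the same packet-loss step as a plain dictionary. The helper name `packet_loss_step` and its parameters are hypothetical; the field names and values mirror the YAML above, and the nesting under `spec` is inferred from the Chaos Mesh `NetworkChaos` layout because indentation is not visible in this view.

```python
def packet_loss_step(target_id: str, loss_percent: int, duration: str) -> dict:
    """Build an injectFault step like the annotated suite's (hypothetical helper)."""
    return {
        "stepType": "injectFault",
        "description": "inject fault",
        # Passed through to Chaos Mesh as-is; Attacknet cannot validate it before genesis.
        "chaosFaultSpec": {
            "kind": "NetworkChaos",
            "apiVersion": "chaos-mesh.org/v1alpha1",
            "spec": {
                "selector": {"labelSelectors": {"kurtosistech.com/id": target_id}},
                "mode": "all",
                "action": "loss",
                "duration": duration,
                # The suite quotes these values, so they are kept as strings here.
                "loss": {"loss": str(loss_percent), "correlation": "0"},
                "direction": "to",
            },
        },
    }


# Step that blocks until every previously injected fault has finished.
WAIT_STEP = {
    "stepType": "waitForFaultCompletion",
    "description": "wait for faults to terminate",
}

plan_steps = [packet_loss_step("cl-1-lighthouse-geth-validator", 10, "1m"), WAIT_STEP]
```

Dumping `plan_steps` with a YAML library of your choice should give a `planSteps` list equivalent to the one in the annotated suite.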

Over the long term, expect manual fault configuration to be deprecated in favor of the fault planner and other automatic test
generation tools.

## Automatically creating test suites/network configs using the planner

Attacknet can automatically create test suites based off a pre-defined test plan. This can be used to create large, comprehensive test suites that test against a variety of different client combos. This feature is highly experimental at this time.

An example test plan can be found in the `planner-configs/` directory.
Here's an annotated version:

```yaml
execution: # list of execution clients that will be used in the network topology
- name: geth
image: ethereum/client-go:latest
- name: reth
image: ghcr.io/paradigmxyz/reth:latest
consensus: # list of consensus clients that will be used in the network topology
- name: lighthouse
image: sigp/lighthouse:latest
has_sidecar: true
- name: prysm
image: prysmaticlabs/prysm-beacon-chain:latest,prysmaticlabs/prysm-validator:latest
has_sidecar: true
network_params:
num_validator_keys_per_node: 32 # required.
kurtosis_package: "github.com/kurtosis-tech/ethereum-package"
kubernetes_namespace: kt-ethereum
topology:
bootnode_el: geth # self explanatory
bootnode_cl: prysm
targets_as_percent_of_network: 0.25 # [optional] defines what percentage of the network contains the target client. 0.25 means only 25% of nodes will contain the client defined in the fault spec. Warning: low percentages may lead to massive networks.
target_node_multiplier: 2 # optional, default:1. Adds duplicate el/cl combinations based on the multiplier. Useful for testing weird edge cases in consensus
fault_config:
fault_type: ClockSkew # which fault to use. A list of faults currently supported by the planner can be found in pkg/plan/suite/types.go in FaultTypeEnum
target_client: reth # which client to test. this can be an exec client or a consensus client. must show up in the client definitions above.
wait_before_first_test: 300s # how long to wait before running the first test. Set this to 25 minutes to test against a finalized network.
fault_config_dimensions: # the different fault configurations to use when creating tests. At least one config dimension is required.
- skew: -2m # these configs differ for each fault
duration: 1m
grace_period: 1800s # how long to wait for health checks to pass before marking the test as failed
- skew: 2m
duration: 1m
grace_period: 1800s
fault_targeting_dimensions: # Defines how we want to impact the targets. We can inject faults into the client and only the client, or we can inject faults into the node (injects into cl, el, validator)
- MatchingNode
- MatchingClient
fault_attack_size_dimensions: # Defines how many of the matching targets we actually want to attack.
- AttackOneMatching # attacks only one matching target
- AttackMinorityMatching # attacks <33%
- AttackSuperminorityMatching # attacks >33% but <50%
- AttackMajorityMatching # attacks >50% but <66%
- AttackSupermajorityMatching # attacks >66%
- AttackAllMatching # attacks all
```

The total number of tests generated by a plan is equal to `len(fault_config_dimensions) * len(fault_targeting_dimensions) * len(fault_attack_size_dimensions)`
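
As a worked example of this formula, here is a small Python sketch that counts the tests for the annotated plan above. It is an illustration only, not Attacknet code; `count_generated_tests` is a hypothetical helper, and the dictionary reproduces just the three dimension lists from the example `fault_config`.

```python
from math import prod


def count_generated_tests(fault_config: dict) -> int:
    """Multiply the lengths of the three dimension lists (hypothetical helper)."""
    dimension_keys = (
        "fault_config_dimensions",
        "fault_targeting_dimensions",
        "fault_attack_size_dimensions",
    )
    return prod(len(fault_config[key]) for key in dimension_keys)


# The three dimension lists from the annotated ClockSkew plan.
example_fault_config = {
    "fault_config_dimensions": [
        {"skew": "-2m", "duration": "1m", "grace_period": "1800s"},
        {"skew": "2m", "duration": "1m", "grace_period": "1800s"},
    ],
    "fault_targeting_dimensions": ["MatchingNode", "MatchingClient"],
    "fault_attack_size_dimensions": [
        "AttackOneMatching",
        "AttackMinorityMatching",
        "AttackSuperminorityMatching",
        "AttackMajorityMatching",
        "AttackSupermajorityMatching",
        "AttackAllMatching",
    ],
}

print(count_generated_tests(example_fault_config))  # 2 * 2 * 6 = 24
```

Since each test waits up to its `grace_period` for health checks, a count like this gives a rough upper bound on how long a generated suite can run.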

You can create a test plan by invoking `attacknet plan <suitename> <planner config path>`

The suite plan will be written to `./test-suites/plan/<suitename>.yaml`

The network config will be written to `./network-configs/plan/<suitename>.yaml`
## Usage/Configuration

and can be executed by attacknet using `attacknet start plan/suitename`
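
The planner workflow described here maps one suite name to two output files and one start argument. A small Python sketch of that mapping follows. `plan_artifacts` is a hypothetical helper, not an Attacknet API; it only assembles the commands and paths quoted in this section and does not run anything. The planner config filename in the usage line is made up for illustration.

```python
def plan_artifacts(suite_name: str, planner_config_path: str) -> dict:
    """Collect the commands and output paths for a planned suite (hypothetical helper)."""
    return {
        # attacknet plan <suitename> <planner config path>
        "plan_command": ["attacknet", "plan", suite_name, planner_config_path],
        # Files written by the planner.
        "suite_path": f"./test-suites/plan/{suite_name}.yaml",
        "network_config_path": f"./network-configs/plan/{suite_name}.yaml",
        # Suites are referenced relative to ./test-suites, without the .yaml suffix.
        "start_command": ["attacknet", "start", f"plan/{suite_name}"],
    }


# "planner-configs/example.yaml" is a placeholder filename, not a file from the repo.
artifacts = plan_artifacts("clock-skew-reth", "planner-configs/example.yaml")
print(" ".join(artifacts["plan_command"]))
print(" ".join(artifacts["start_command"]))
```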
See [DOCUMENTATION.MD](docs/DOCUMENTATION.MD)

### Faults supported by planner

#### ClockSkew
Config:
```yaml
- skew: -2m # how far to skew the clock. can be positive or negative
duration: 1m # how long to skew the clock for
grace_period: 1800s # how long to wait for health checks to pass before marking the test as failed
```

#### RestartContainers
Config:
```yaml
- grace_period: 1800s # how long to wait for health checks to pass before marking the test as failed
```

#### IOLatency
Config:
```yaml
- grace_period: 1800s # how long to wait for health checks to pass before marking the test as failed
delay: 1000ms # how long the i/o delay should be
duration: 1m # how long the fault should last
percent: 50 # the percentage of i/o requests impacted.
```
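
The `skew`, `duration`, `delay`, and `grace_period` values in these fault configs look like Go-style duration strings (`-2m`, `1m`, `1000ms`, `1800s`). How Attacknet parses them is not stated here, so treat that as an assumption. The Python sketch below is a hypothetical helper for sanity-checking such values before writing a plan; it handles only whole numbers with the units `ms`, `s`, `m`, and `h`.

```python
import re

# Milliseconds per unit; only the units seen in the configs plus hours.
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}
_TOKEN = re.compile(r"(\d+)(ms|s|m|h)")


def duration_to_seconds(text: str) -> float:
    """Convert a string such as '-2m' or '2m0s' to seconds (hypothetical helper)."""
    sign = 1
    body = text
    if body and body[0] in "+-":
        sign = -1 if body[0] == "-" else 1
        body = body[1:]

    total_ms = 0
    position = 0
    for match in _TOKEN.finditer(body):
        if match.start() != position:
            raise ValueError(f"unexpected characters in duration: {text!r}")
        total_ms += int(match.group(1)) * _UNIT_MS[match.group(2)]
        position = match.end()

    # Reject empty input and trailing characters that no token consumed.
    if position == 0 or position != len(body):
        raise ValueError(f"invalid duration: {text!r}")
    return sign * total_ms / 1000


print(duration_to_seconds("1800s"))  # grace_period from the configs above, in seconds
```

A check like this makes it easy to confirm, for instance, that a `grace_period` is longer than the fault `duration` it follows.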


## Running test suites

Once you've got your configuration set up, you can run Attacknet:

`attacknet start <suitename>`

If your suite config is located at `./test-suites/suite.yaml`, you would run `attacknet start suite`. This will
probably be changed.

Depending on the state of the Kurtosis package and tons of other variables, a lot of the example test suites/networks might not work out of the box.
If you're just trying to test things out, use `attacknet start suite`. This refers to a demo test suite that was tested on Jan 30.

## Contribution
## Contributing
This tool was developed as a collaboration between [Trail of Bits](https://www.trailofbits.com/) and the [Ethereum Foundation](https://github.com/ethereum/).
Thank you for considering helping out with the source code! We welcome contributions from anyone on the internet, and are grateful for even the smallest of fixes!

If this tool was used for finding bugs, please do ensure that the bug is reported to the relevant project maintainers or to the
If you use this tool for finding bugs, please do ensure that the bug is reported to the relevant project maintainers or to the
[Ethereum Foundation Bug Bounty Program](https://ethereum.org/en/bug-bounty/). Please feel free to reach out to the tool
maintainers on Discord, email, or Twitter with any feature requests.

If you want to contribute to Attacknet, we recommend running pre-commit before making changes:

1. Install pre-commit
2. Run `pre-commit install`

When making pull requests, **please target the `develop` branch, not main.**

## Changelog

**TBD**
**March 18, 2024 version v0.4**

First public release!

@@ -236,6 +89,7 @@ First public release!
- Fixed an issue where the test planner's resultant network topology was non-deterministic
- Fixed an issue where a dropped port-forwarding connection to a pod may result in a panic
- Fixed an issue where Chaos Mesh would fail to find targets in networks with more than 10 nodes
- Updated for Kurtosis SDK v0.88.5

**Jan 30, 2024 version v0.3 (internal)**
- Fixed the demo example suite
@@ -256,11 +110,3 @@ First public release!

**Dec 15, 2023 version v0.1 (internal)**
- Initial internal release

## Developing (wip)

1. Install pre-commit
- `brew install pre-commit`
- `pre-commit install`

When making pull requests, target the `develop` branch, not main.