Releases: jepsen-io/jepsen
0.2.5
This release focuses on improved composition for generators (especially turning on and off faults over time), working with daemons that use wrapper scripts and break pidfile management, and more sophisticated membership nemeses. There's also a swath of minor bugfixes, utility functions, filled-out documentation, and ease-of-use improvements. Note in particular that the behavior of delay and stagger generators has changed: they no longer try to catch up when they fall behind, but instead schedule delays of approximately n seconds since the previous invocation. This should behave much better when composing delayed/staggered generators interleaved with long pauses, while retaining the property that you can mix multiple delayed generators together and still get each running at approximately the right rate.
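To make the scheduling difference concrete, here is a deliberately simplified model in plain Clojure. It is not Jepsen's implementation: time is counted in integer ticks, there is no randomization, and the names simulate, :active?, and :catch-up? are invented for this sketch.

```clojure
(defn simulate
  "Walks integer ticks 0..end-1 and returns the ticks at which ops were emitted.
  The generator is only asked for ops at ticks where (active? t) is true.
  With :catch-up? true, the next op is due dt after the previous *due time*
  (a model of the old behavior); otherwise it is due dt after the previous
  *emission* (a model of the new behavior)."
  [{:keys [dt end active? catch-up?]}]
  (loop [t 0, due 0, emitted []]
    (cond
      (>= t end)
      emitted

      ;; An op is due and the generator is being asked: emit at tick t.
      (and (active? t) (<= due t))
      (recur t
             (if catch-up? (+ due dt) (+ t dt))
             (conj emitted t))

      :else
      (recur (inc t) due emitted))))

;; The generator is switched off during ticks 4 through 9.
(def paused? #(<= 4 % 9))
(def opts {:dt 2, :end 14, :active? (complement paused?)})

(simulate (assoc opts :catch-up? true))   ; => [0 2 10 10 10 10 12]
(simulate (assoc opts :catch-up? false))  ; => [0 2 10 12]
```

In the catch-up model the generator fires a burst of four ops at tick 10 to make up for the pause; in the other model it simply resumes at its usual spacing.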
Bugfixes
- control.util/install-archive! catches another type of corrupt zipfile error
- gen/until-ok previously allowed any OK event to trigger its completion. It now checks to make sure the completion is for an event it actually emitted.
- independent/tuple-gen tried to wrap operations which were not actually :type :invoke, which caused it to break when generators emitted :sleep operations. It now only wraps invokes.
API Changes
- generator/stagger and delay no longer try to "catch up" when they fall behind; instead they schedule a mean delay between all operations. This lets them play nicer with generators that expect to toggle staggered/delayed generators on and off over time.
New Features
- control/env helps you construct environment variable strings out of literals and maps. This lets you pass things like {:AUTHOR "Foucault"} as the :env to control.util/start-daemon, rather than munging loads of string literals.
- util/await-fn calls a function repeatedly until it no longer throws, with configurable poll interval, timeouts, and logging. Helpful for blocking on long-running database setup.
- tests.cycle/append and wr checkers now use the :subdirectory option passed to checker/check to write Elle files into subdirectories
- generator/cycle: cycles a generator n or infinitely many times
- generator/cycle-times: cycles through a rotating pool of generators on a fixed time schedule. Helpful for alternating between active and quiet periods of a nemesis.
- nemesis.membership.State now has setup! and teardown! callbacks, which can be used for storing e.g. network clients necessary for interacting with the cluster.
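The retry-until-it-stops-throwing idea behind util/await-fn can be sketched in plain Clojure as follows. This is an illustration only, not Jepsen's code: the function name await-fn-sketch and the option keys :retry-interval-ms and :timeout-ms are assumptions made for this example, and logging is left out.

```clojure
(defn await-fn-sketch
  "Calls (f) repeatedly until it returns without throwing, sleeping between
  attempts. Throws once the timeout has elapsed. Hypothetical stand-in for
  the kind of helper described above."
  [f {:keys [retry-interval-ms timeout-ms]
      :or   {retry-interval-ms 10, timeout-ms 1000}}]
  (let [deadline (+ (System/currentTimeMillis) timeout-ms)]
    (loop []
      (let [result (try
                     {:value (f)}
                     (catch Exception e
                       (if (< (System/currentTimeMillis) deadline)
                         ::retry
                         (throw (ex-info "Timed out waiting for f"
                                         {:timeout-ms timeout-ms} e)))))]
        (if (= ::retry result)
          (do (Thread/sleep (long retry-interval-ms))
              (recur))
          (:value result))))))

;; A setup step that fails twice before succeeding.
(def attempts (atom 0))

(defn flaky-setup []
  (if (< (swap! attempts inc) 3)
    (throw (ex-info "not ready yet" {}))
    :ready))

(def result (await-fn-sketch flaky-setup {:retry-interval-ms 1}))
;; result is :ready, reached on the third attempt
```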
Minor Changes
- Remove :pure-generators requirement to confirm use of new generator style.
- Elle 0.1.3, Knossos 0.3.8, clojure 1.10.3, tools.cli 1.0.206, unilog 0.7.28, http-kit 2.5.3, ring 1.9.4, sshj 0.31.0, bouncycastle 1.69, fipp 0.6.24, dom-top 1.0.6, sshj 0.32.0
- checker.timeline no longer completes the history, which was, I think, unnecessary, and caused blowups on some test cases.
- nemesis.membership now has expanded documentation, and returns the underlying state atom in the package it constructs, so you can work with it.
- os.debian allows releaseinfo to change when executing apt update
- core/run! saves the test before tearing down the DB, which means if DB teardown crashes, you'll have at least some useful data to work with
- cli/parse-concurrency can now take arbitrary keys, making it easier to write tests that take multiple concurrency-related arguments
- Fixed some tests in control.util-test broken by an external FTP site going down
- control.util/start-daemon! has more complete documentation
- control.util/start-daemon! can now avoid using pidfiles and can instead check for exec
- control.util/stop-daemon! can also avoid using pidfiles.
0.2.4
This release is all about automation. It introduces a new SSH backend based on SSHJ which is significantly faster than the current clj-ssh. This release also shells out to scp for uploads and downloads, which is much, much faster than using clj-ssh or SSHJ. SSH errors are less frequent, and don't clog the logs with stacktraces.
For databases with expensive setup processes (especially those which need to be compiled from source), this release introduces jepsen.fs-cache: a lightweight, concurrency-controlled, filesystem-backed cache for strings, Clojure data, and entire files. This cache is persistent across Jepsen invocations, so you can build a binary or perform initial datafile allocation once, cache it, and skip that process on subsequent test runs.
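The build-once, reuse-later pattern can be illustrated with a small filesystem-backed cache in plain Clojure. This is a sketch of the idea, not the jepsen.fs-cache API: cached-edn!, cache-dir, and expensive-build are hypothetical names, and unlike the real namespace this version does no locking.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

(defn cached-edn!
  "Returns the EDN value stored under cache-dir for key k. On a miss, calls
  (build) once, writes its result to disk, and returns it."
  [cache-dir k build]
  (let [^java.io.File f (io/file cache-dir (str (name k) ".edn"))]
    (if (.exists f)
      (edn/read-string (slurp f))
      (let [v (build)]
        (io/make-parents f)
        (spit f (pr-str v))
        v))))

;; A fresh directory per run, so the example always starts with a cold cache.
(def cache-dir
  (io/file (System/getProperty "java.io.tmpdir")
           (str "fs-cache-sketch-" (System/nanoTime))))

(def builds (atom 0))

(defn expensive-build []
  (swap! builds inc)
  {:binary "db-server", :version "1.2.3"})

(cached-edn! cache-dir :db-build expensive-build) ; builds and writes the file
(cached-edn! cache-dir :db-build expensive-build) ; reads the file; no rebuild
```

Because the value lives on disk rather than in memory, a later process pointed at the same directory would also skip the build.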
There's also a new checker which looks for patterns in downloaded log files. This is particularly helpful for catching stacktraces, panics, segfaults, etc.
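As a rough picture of what scanning logs for patterns involves, here is a plain-Clojure sketch. It is not the checker itself: scan-log, the sample log text, and the shape of the returned maps are all assumptions made for illustration.

```clojure
(require '[clojure.string :as str])

(defn scan-log
  "Returns one map per line of log-text that matches pattern."
  [pattern node log-text]
  (->> (str/split-lines log-text)
       (filter #(re-find pattern %))
       (mapv (fn [line] {:node node, :line line}))))

(def sample-log
  (str "INFO  starting up\n"
       "panic: runtime error: index out of range\n"
       "INFO  shutting down"))

(scan-log #"panic:|SIGSEGV" "n1" sample-log)
;; => [{:node "n1", :line "panic: runtime error: index out of range"}]
```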
API Changes
- In test SSH options, :password* is no longer used for sudo by default. To set a sudo password, set :sudo-password. This fixes a (likely rare) issue where sudo would skip a password prompt, sending that password to the stdin of whatever command was being invoked instead.
- control/upload and download no longer take rest args, which used to be passed directly to clj-ssh. These were unused in Jepsen itself, but you may have relied on this behavior. If so, you should call into clj-ssh directly.
- control.remote has been moved to control.core, and has been restructured to take option maps instead of relying on dynamically bound variables. This should only affect you if you wrote a custom Remote implementation.
New Features
- control.sshj: a new Remote backend for the control system. This is orders of magnitude faster than clj-ssh. Unfortunately, like clj-ssh, it also exhibits weird race conditions.
- control.scp allows Jepsen to upload and download files by shelling out to SCP, which is dramatically faster for large files. This is the default for both sshj and clj-ssh remotes.
- fs-cache: a lightweight, local-filesystem-backed cache for Jepsen's control node. Well-suited for DBs that require an expensive build or setup process. Can cache strings, EDN structures, and remote files alike, and includes a basic locking mechanism.
- A new checker, log-file-pattern, scans downloaded log files for given regular expressions. Handy for finding server crashes!
- cli/test-all-cmd now merges opt specs like test-cmd does, allowing you to override default options.
- util/sh: a wrapper for invoking local shell commands on the control node.
Bugfixes
- control.util/tmp-file! now creates /tmp/jepsen if it doesn't already exist
- control.clj-ssh (and the new sshj backend) now include a concurrency-limiting semaphore, which prevents at least some (but not all) of the weird, nondeterministic bugs we've seen with session initiation.
Minor Changes
- checker.timeline is dramatically faster now: it uses a custom pretty-printer for events.
- Large parts of control have been refactored into control.core, control.retry, etc. to improve readability and composability
- Docker and AWS environments now also set up ed25519 keys by default
- Lots of new tests for jepsen.control
- When test-all tests crash, we now display their full paths, not just test names
- Removed tea-time, a now-unused dependency
- Removed :active-histories: a now-unused part of test maps
- j.u.c.TimeoutException is now considered an "uninteresting" exception when choosing which exception to throw from a concurrent failure; this should result in more helpful stacktraces.
- Control no longer logs a full stacktrace when it encounters a recoverable exception. Users consistently complained about these kinds of errors: they happen constantly but unpredictably, I can't eliminate them, and they don't really require user action. We log a one-line message instead.
- os/debian no longer tries to install the old libzip2 package for Debian Jessie
- nemesis.time uses fewer samples for ntpdate, is generally faster to set up
- control.util/await-tcp-port can now take separate intervals for retry and logging: shorter latency, less log spam!
0.2.3
A very small point release with a few changes to support the upcoming Maelstrom 0.2.0 release.
Possibly breaking changes
- jepsen.core: perform analysis after OS/DB teardown, rather than before. This can improve performance significantly when DBs are expensive. This may break tests which, in their checker, snuck onto the nodes to execute commands directly; try moving DB-analyzing code to db/teardown instead. You really weren't supposed to rely on this order, but foolhardy individuals (read: probably me) may have done so anyway.
Minor changes
- cli: test and test-all can now take an :opt-fn* arg, which is not composed with the default opt fn.
- checker.timeline uses overflow:hidden to avoid disaster rendering
- checker.timeline limited to 10,000 ops by default; keeps things usable on long histories
- jepsen.repl: latest-test no longer takes an (unused) argument
- store/postprocess-fressian now walks inside atoms during deserialization
Bugfixes
- nemesis.combined: no longer generates nil grudges when not given explicit targets
- util/relative-time-origin's global (normally unused) value mistakenly used a docstring for a value, not nil. This didn't break anything unless you were trying to use relative time without having initialized it, in which case you got weird type errors.
0.2.2
This is an incremental release which includes a slew of ease-of-use improvements, minor utilities, and improved performance. Special attention has been paid to complex nemeses, with new support for lifting generators, nemeses, and packages thereof into new domains, so that nemeses can affect different parts of a system selectively.
API changes
- repl/last-test is now latest-test, and does not require a test name any more
- independent/subhistory yields vectors, rather than lazy seqs. This allows checkers to depend on fast and consistent nth and peek access to histories.
- nemesis.combined/nemesis-package now includes all nemeses, even if you didn't ask for them. This change makes it easier for you to write your own generators on top of combined nemeses.
- nemesis.combined/compose's generators now use any, rather than mix. This is less efficient, but allows for reproducible and controlled interleaving of generators with different timescales. This may change your test timing dynamics: composing "pauses every 30 seconds" and "crashes every 60 seconds" now results in pauses every 30 seconds and crashes every 60 seconds (~3 ops/minute), rather than (potentially) stalling a pause operation for up to 30 seconds until the crash generator returns. In general, :interval 10 now means "a roughly ten second interval for each package", rather than across all packages.
New features
- store/write-fressian-file! and read-fressian-file offer easy serialization for your own data structures
- nemesis/f-map: Like generator/f-map, lifts a nemesis so that it can operate on ops with transformed :f fields. Helpful for gluing together multiple copies of the same nemesis which act on, say, different subsets of nodes.
- nemesis.combined/f-map: lifts an entire nemesis package: generator(s), nemesis, and :perf rendering specification, into a new set of :fs.
- control.util/await-tcp-port: blocks until a local port is bound. Super helpful for DB setup.
- control/upload-resource: uploads a JVM resource to a remote file
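The idea of lifting a nemesis onto transformed :f values can be modelled in a few lines of plain Clojure. This is only a sketch: it treats a nemesis as a plain function from op to op, and f-map-sketch, killer, and the explicit set of supported :f values are assumptions for illustration rather than Jepsen's actual signatures.

```clojure
(defn f-map-sketch
  "Given lift (a function from an original :f to a new :f), the set of :f
  values the wrapped nemesis understands, and nemesis-fn (op -> op), returns
  a function that accepts ops carrying lifted :f values. The wrapped nemesis
  sees the original :f; the caller gets the lifted :f back."
  [lift fs nemesis-fn]
  (let [unlift (into {} (map (juxt lift identity)) fs)]
    (fn [op]
      (if-let [orig-f (unlift (:f op))]
        (-> op
            (assoc :f orig-f)
            nemesis-fn
            (assoc :f (:f op)))
        (throw (ex-info "Unrecognized :f" {:op op}))))))

;; A toy nemesis that only knows about :kill.
(defn killer [op]
  (assoc op :value :killed))

;; The same nemesis, addressed under a :db-a prefix.
(def kill-db-a
  (f-map-sketch (fn [f] [:db-a f]) #{:kill} killer))

(kill-db-a {:type :info, :f [:db-a :kill]})
;; => {:type :info, :f [:db-a :kill], :value :killed}
```

Two copies lifted with different prefixes could then be composed without their :f values colliding.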
Bugfixes
- web now handles EDN tagged literals correctly, which fixes valid/invalid/unknown highlighting on test directories with nonstandard data structures written to results.edn
- nemesis/majorities-ring: generated asymmetric partitions with more than 5 nodes. It now generates symmetric partitions via a stochastic method for larger clusters, though they may not be perfect rings.
- checker.perf/with-nemeses: now works even when :xrange is unspecified
Minor changes
- os.debian/install can now take additional CLI flags, if you like
- control.net/local-ip no longer depends on eth0, and should work for other network interfaces.
- reconnect no longer dumps extraneous exceptions to the log when interrupted, which should make reading Jepsen stacktraces much easier.
- cli now allows users to merge option specifications together, overriding the default options that ship with Jepsen
- nemesis.time is much faster to set up; it avoids compiling binaries from scratch every time
- nemesis.compose offers more detailed error messages for unrecognized :fs
- nemeses are pretty-printed in more detail
- nemesis/noop now supports reflection, which makes it more suitable as a monoid identity
- nemesis.combined can combine packages without requiring a nemesis key, which can be handy for packages that emit ops handled by other packages' nemeses
- cli's lein run analyze command now merges the stored test on top of the CLI-constructed test, which should help tests which lazily store checker-relevant state in atoms
- Added type hints, which speeds up a tight loop in the generator interpreter
- core uses Fipp for pprint now
- control memoizes the clj-ssh agent, which significantly speeds up creating SSH connections
- control.ssh-failed exceptions no longer throw exceptions without a cause
- checker.perf now lets :f be a data structure, not just a string
- os.ubuntu: no longer installs libzip
- checker.perf now allows empty collections of data points. Being persnickety about this was, in retrospect, more obnoxious than helpful
- control.util/start-daemon! can now take a :env map to set environment variables
- control/&&: shortcut for shell &&
- nemesis.combined: a new target, :minority-third, is helpful for systems which can survive the loss of fewer than 1/3rd of nodes.
0.2.1
This release firms up the changes made in 0.2.0. It resolves some serious bugs involving the generator system, as well as client & nemesis lifecycle management which snuck through 0.1.19 and 0.2.0 testing. It adds an experimental namespace to aid in structuring membership state machines--something Jepsen users have been writing by hand for years. We've moved to Debian Buster, and Pavlo Baron has contributed a kubernetes remote for jepsen.control. There are also some minor performance and ergonomic improvements: clients can optionally indicate that their internal state is safely re-usable after crashing, tests are printed more politely, and Jepsen logs the Git hash and command-line arguments necessary to reproduce a given test.
The tutorial has also been updated for Jepsen 0.2.x, and the Docker environment has been updated to construct Debian Buster nodes with full init systems. That means you can run tests which use vendor-provided init scripts!
Bugfixes
- checker/counter incorrectly advertised that it supported decrements when it did not. This could have resulted in counter tests which reported failures when the history was, in fact, legal. To Jepsen's knowledge, nobody has reported this issue. However, at least one third-party test did use the counter checker in this way. The checker now throws when provided negative increments.
- generator.interpreter originally updated generators with the thread<->process mapping from after a process crashed. This meant that generators which tried to handle an update involving an :info op would be unable to determine what thread had been responsible for that operation. This was the root cause of a crash in independent/concurrent-generator, which worked fine... until a process crashed. The interpreter now provides the original thread<->process mapping when updating generators.
- core/run!, at the end of a test, would improperly run client/teardown! and client/close! on just a single client and node, rather than all clients. This caused Jepsen to gradually leak clients between multiple test runs. Jepsen now correctly tears down all clients at the end of a test.
- nemesis/compose, when given a collection (rather than a map) of nemeses, now correctly propagates teardown calls to those nemeses.
- core/run!, at the end of a test, incorrectly tore down the original nemesis, rather than the nemesis returned from calls to nemesis/open!. Since most nemeses return themselves, this didn't affect most tests--but it did prevent correctly tearing down nemeses which, say, construct state during setup!. Jepsen now tears down the correct nemesis.
- generator/stagger incorrectly assumed that it should start yielding operations at the start of every test, instead of when it was first asked for an operation. As a result, staggers evaluated later in a multi-phase test would "race" to catch up with operations they weren't able to perform earlier. They've learned to chill out now.
- nemesis.combined/compose-packages no longer crashes when constructing a final-generator from packages whose final generators weren't sequences.
New Features
- nemesis.membership: an experimental namespace which supports writing membership-changing nemeses and generators. Users provide an implementation of the nemesis.membership.state/State protocol: a mostly-pure structure which defines how to observe the state of the cluster on a specific node, merging those node views, generating operations, applying those operations to the cluster, and (since clusters often resolve membership changes asynchronously) deciding when those operations have been completed. Given this object, the membership system handles spawning threads to observe the cluster state, evolves the given state machine towards a fixed state over time, and provides a stateful nemesis and generator that work together to perform membership changes. The resulting package can be combined with other faults through nemesis.combined.
- client/Reusable: a new protocol which clients can implement. It allows a client to indicate that its state can safely be re-used when it issues a {:type :info} operation. The default behavior remains to close the client and open a fresh one, but if your clients are safe to re-use, you can use this protocol to improve performance and reduce log chatter.
- jepsen.control.k8s allows jepsen.control to talk to Kubernetes nodes.
- db/tcpdump offers a :clients-only? option which restricts logging to only client traffic from the Jepsen control node.
- Jepsen now logs the Git hash and command line used at the start of each test, which makes it easier to reproduce results.
- jepsen.generator.test provides functions to aid in writing unit tests for your own generators.
- jepsen.generator.concat provides a more explicit version of [gen1, gen2].
API Changes
- none
Minor Changes
- The tutorial has been rewritten for Jepsen 0.2.1.
- util/test->str converts tests to strings. Tests often have many keys (which should all be printed) but infinite sequences in their generators (which should only print a few elements). This function resolves that tension by printing all keys, but specifically limiting the length when printing generators.
- jepsen.core/run! automatically wraps nemeses in a validator, which helps identify and explain some basic nemesis mistakes.
- util/fixed-point iterates a function over an initial value until converged.
- Expanded documentation for some generators.
- os.debian/install, following a deprecation warning in apt, now uses --allow-x flags instead of --force-yes flags.
- jepsen.control.net/control-ip returns the IP of the control node, as seen by db nodes.
- jepsen's own project.clj sets awt.headless=true, which is needed for compilation/running of Jepsen itself (not tests!) on headless environments.
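Iterating a function until it converges, as util/fixed-point is described as doing, can be written in plain Clojure like this. It is a minimal sketch of the technique under the assumption that convergence means two successive values are equal; fixed-point-sketch is a made-up name, and the real function may differ in details.

```clojure
(defn fixed-point-sketch
  "Applies f to x repeatedly until (f x) equals x, then returns that value.
  Loops forever if f never converges."
  [f x]
  (let [x' (f x)]
    (if (= x x')
      x
      (recur f x'))))

(fixed-point-sketch #(quot % 2) 100)      ; => 0
(fixed-point-sketch #(min 10 (inc %)) 0)  ; => 10
```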
0.2.0
This is a significant compatibility-breaking release--the second in Jepsen's history. It removes deprecation warnings and backwards-compatible code from 0.1.x. Most of these deprecated behaviors have had warnings for years now, with the notable exception of the new generator system.
The jepsen.generator namespace has been completely rewritten with a new, purely functional generator system, which was previously available as jepsen.generator.pure. Users of the classic generator system are strongly encouraged to read the jepsen.generator namespace documentation: there are breaking changes, and identical code may result in significantly different histories. If you've already been using pure generators, the only change necessary is to switch the namespace you require from jepsen.generator.pure to jepsen.generator. Once you've confirmed that your generators are good to go, add a :pure-generators true flag to your test map. As a safety measure, Jepsen will automatically prompt you to review your generator code and refuse to run without this flag.
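For a test that already used pure generators, the migration might look roughly like the fragment below. Only the namespace rename and the :pure-generators key come from these notes; the function name my-test and the :name value are placeholders. The 0.2.5 notes state that the flag is no longer required from that release on.

```clojure
;; In your ns form, change the require from
;;   [jepsen.generator.pure :as gen]
;; to
;;   [jepsen.generator :as gen]

(defn my-test
  "Hypothetical test constructor: merges the confirmation flag into the
  options passed in from the command line."
  [opts]
  (merge opts
         {:name            "example"
          :pure-generators true}))

(:pure-generators (my-test {}))  ; => true
```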
I know it's a lot, but it a.) makes writing generators significantly easier, and b.) fixes some truly awful deadlocks and stalls I had no way to work around.
Happy testing!
Bugfixes
- docker/up.sh had a minor bug introduced in a prior refactor, now addressed.
- jepsen.control.net/local-ip now supports the new format for ifconfig ip addresses.
- AOT compilation of jepsen.generator could fail due to some magic we're doing to extend a protocol to classes generated at runtime; hopefully resolved now.
- Pure generators, under low concurrency situations, would only emit operations for a small subset of processes. They now choose processes randomly.
New Features
- jepsen.cli now takes a --no-ssh option, which is helpful when running Jepsen against local systems, existing databases, or external APIs.
API changes
- jepsen.generator.pure is now jepsen.generator; the old stateful generator system has been removed.
- Generator contexts use a Bifurcan, rather than a Clojure, set to track free processes.
- client.setup! no longer supports a deprecated 3-arity form; clients are expected to decouple open and setup.
- There is no longer a warning for using keyword :nodes in tests.
- jepsen.control/scp*, long deprecated, is removed.
- jepsen.control.util/install-tarball!, renamed to install-archive! years ago, is gone.
- Deprecation compatibility wrappers in jepsen.client and jepsen.nemesis are no longer necessary and have been removed.
Minor Changes
- jepsen.core/run! re-uses the same clients we use for client setup for client teardown, preventing tests from crashing when clients can't be opened at the end of a test. I'm not entirely sure this is the right approach, but it does help. This needs more attention later.
- Generators wrap clients in a validation wrapper, which helps identify common mistakes with clients, and offers helpful error messages.
- jepsen.checker/check's docstring no longer discusses the migration path away from the old 5-arity form.
Dependencies
- byte-streams 0.2.5-alpha2
- clojure 1.10.1
- clj-time 0.15.2
- data.fressian 1.0.0
- elle 0.1.2
- fipp 0.6.23
- knossos 0.3.7
- ring 1.8.1
- tools.cli 1.0.194
- tools.logging 1.1.0
- unilog 0.7.25
0.1.19
0.1.19 is the last release, I think, in the 0.1.x series. It introduces a new generator namespace, jepsen.generator.pure, which is a preview of what jepsen.generator will become in the 0.2.x series. There are also a few bugfixes, library upgrades, and quality-of-life improvements, as usual.
Bugfixes
- Crashes with gnuplot and huge argument lists should be a thing of the past, thanks to some fixes in jepsen's gnuplot library.
- jepsen.cli won't try to print infinite sequences when logging tests.
- jepsen.tests.causal-reverse had a syntax error in its generator which I think might have led to ArityExceptions? It's fixed now!
New Features
- There's now a docker-specific implementation of Remote, which means you don't need to use a jumpbox in a docker-compose environment.
- Client and generator validation layers might help catch some common mistakes when writing tests.
API changes
- jepsen.generator.pure is basically stable for writing production tests at this point. See the namespace docs for details.
- jepsen.generator.pure/delay-til is gone, replaced by a (now working!) implementation of delay. I have to take a few steps back and think carefully about what delay-til actually means.
- Assorted test generators now yield a compatibility shim which provides both classic and pure generators in one. This should basically preserve compatibility, but if you're used to digging around in generator guts or dispatching on generator type, you may notice changes.
- nemesis.time allows passing functions to control which nodes are targeted.
- nemesis.combined chooses a random nonempty subset of primaries, rather than all primaries.
- nemesis.combined generators now stagger by default, rather than emitting events on a nice schedule.
- net: empty grudges are now legal, and do nothing, like you'd expect.
Minor Changes
- os.debian now installs tcpdump by default.
- We have Eastwood as a linter now.
Dependencies
- Elle 0.1.1
- Gnuplot 0.1.3
0.1.18
0.1.18 includes Elle, a new checker based on cycle detection. Elle was under development as jepsen.tests.cycle, and saw some use in previous tests; if you see issues with this upgrade related to missing functions in jepsen.tests.cycle.*, see the stubs we left in those namespaces, and Elle's source itself. This release also includes several minor (but pleasing!) quality-of-life improvements: a pre-packaged tcpdump wrapper, web interface affordances, more polite handling of various edge cases, and cleaner docker scripts.
Bugfixes
- Plot checkers no longer throw exceptions for empty histories, and instead silently produce no output.
- We no longer try to download logfiles that don't exist, which should cut down on noise at the end of tests.
- Debian now uses the iproute2 package, which fixes a package-not-found error.
- jepsen.os.debian no longer barfs when installed packages include an arch (e.g. foo:i386).
New Features
- Jepsen now includes Elle, a new library for checking transactional systems using cycle detection.
- jepsen.db/tcpdump: a DB which grabs tcpdump traces of other databases. Helpful for debugging wire activity when you don't trust your clients!
- Web interface: clicking the title of a test directory copies the full local path to the clipboard. Helpful for quickly getting a shell in the store directory, so you can use grep, less, etc.
- Web interface: there's a .zip link on each test page now, not just on the main list of tests.
- New CLI option --logging-json emits JSON-structured logs. Not bulletproof, but helpful.
API changes
- Much of jepsen.tests.cycle has been pulled out into Elle; we retained a few stubs with Jepsen-specific wrappers.
Minor Changes
- jepsen.store/load now returns vectors, rather than ArrayLists. This was a mild pain when writing checkers which assumed persistent data structures were a thing.
- jepsen.control.net/ip throws a meaningful error when it can't get an IP for a node.
- Assorted docker improvements
- jepsen.txn/ext-reads and ext-writes performance improvements
Dependencies
- Elle 0.1.0
- jepsen.txn 0.1.2
0.1.17
This is a very small feature release with support for some upcoming tests.
New Features
- jepsen.cli's test runner commands now accept a --leave-db-running flag, which leaves the database available for inspection at the end of a test run. Helpful for in-place debugging!
- jepsen.util/ex-root-cause extracts the root cause of an exception, which is useful for clients which like to wrap their errors in deep/unpredictable layers of exceptions.
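Unwrapping nested exceptions down to the innermost cause can be sketched in plain Clojure as follows. This illustrates the general technique rather than the library's source; root-cause-sketch and the example exception chain are invented for the example.

```clojure
(defn root-cause-sketch
  "Follows .getCause until it reaches a throwable with no cause."
  [^Throwable t]
  (if-let [cause (.getCause t)]
    (recur cause)
    t))

;; A client error wrapped in two layers of other exceptions.
(def wrapped
  (ex-info "request failed" {}
           (RuntimeException. "connection pool error"
                              (java.io.IOException. "connection reset"))))

(.getMessage (root-cause-sketch wrapped))  ; => "connection reset"
```

A client could then dispatch on the class or message of the root cause instead of the outermost wrapper.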
0.1.16
This release focuses on making it easier to write complex tests with many types of failures and workloads. New protocols in jepsen.db provide hooks for killing, starting, pausing, and resuming databases, as well as identifying current primary nodes. A combined nemesis makes it easy to write test suites which hammer a system with random mixtures of faults, and a new test runner takes some of the busywork out of writing comprehensive test suites. There are no significant API changes, but there are several important bugfixes, including correctness fixes in the experimental jepsen.tests.cycle.append. These issues did not affect any Jepsen report, but may have led to false positives for other users.
Thanks, as always, to everyone who contributed patches and feedback. :)
New Features
- jepsen.cli/test-all-cmd: a CLI command for running a whole test suite in one pass, with unified error reporting.
- jepsen.nemesis.combined: A nemesis (and generator) which mixes process kills, pauses, clock skew, and network partitions. Faults, intervals, and target nodes are tunable. Also provides functions for composing nemesis+generator packages.
- jepsen.db/Primary: an optional protocol for databases that can identify primary nodes.
- jepsen.db/Pause: an optional protocol for databases that can be paused and resumed.
- jepsen.db/Process: an optional protocol for databases that can be killed and started.
- jepsen.generator/flip-flop: alternates between two generators.
Minor Changes
- Integration tests are now much quieter in their logging.
- checker.perf/plot! now includes output from gnuplot when throwing gnuplot-related exceptions.
- docker compose can now expose DB ports for inspection from your docker host.
- jepsen.control.util/grepkill can now take keyword signals, like :kill.
- jepsen.util/parse-long: it's long past time.
- jepsen.control.util/wget can now take usernames and passwords
Dependency Upgrades
- codox 0.10.7
- knossos 0.3.6
- gnuplot 0.1.2
- Fipp 0.6.14
Bugfixes
- jepsen.util/name+: fixed a bug where this function always used pr-str.
- jepsen.tests.cycle.append: fix a bug where internal consistency checks could compute incorrect expected orders after reading nil values.
- jepsen.tests.cycle.append: we no longer incorrectly find duplicates and incompatible orders in aborted reads.
- jepsen.os.centos: update dpkg version used for installing start-stop-daemon.
- jepsen.nemesis: fixed a misleading error message which said it expected :type :ok, not :type :info