Replies: 7 comments 5 replies
-
If there were a way to maintain the protocol logic separately from the IO code, it would remove one of the major cons of option 1.
An angle for further exploration: it may be possible to use this pattern to preserve the flexibility of doing synchronous IO inside the async API, if you decide that's preferred for performance, since async vs sync IO is abstracted at a layer below the public API.
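As a rough sketch of that idea (every name here is hypothetical, not part of nats.rs), the protocol logic can be written without any knowledge of sockets, so that either a sync or an async transport can drive it:

```rust
// Hypothetical, IO-free protocol state machine; names invented for illustration.
struct ProtoMachine {
    outbound: Vec<u8>, // bytes waiting to be written to the wire
}

enum ProtoEvent {
    Ping,
    Msg { subject: String, payload: Vec<u8> },
}

impl ProtoMachine {
    fn new() -> Self {
        Self { outbound: Vec::new() }
    }

    // Feed bytes read from the socket; returns parsed protocol events.
    fn feed(&mut self, _incoming: &[u8]) -> Vec<ProtoEvent> {
        Vec::new() // parsing elided in this sketch
    }

    // Bytes the IO layer should write next (blocking write or `.await`, its choice).
    fn take_outbound(&mut self) -> Vec<u8> {
        std::mem::take(&mut self.outbound)
    }
}
```

The async client owns the socket and awaits reads/writes, while a sync client does blocking reads/writes; the protocol code in between is shared, which is what keeps the sync-IO-under-async option open.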
-
Another aspect of the API that is affected by async is callbacks (anywhere a handler or closure is used as a parameter).
If you need to send responses in the callback, it needs to be an async closure:
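For illustration only (this is not the actual nats.rs API), the usual shape is a closure that returns a future:

```rust
use std::future::Future;

// Hypothetical types, invented for this sketch.
struct Message {
    payload: Vec<u8>,
}

struct Subscription;

impl Subscription {
    // An async handler is accepted as a closure returning a Future.
    async fn with_handler<F, Fut>(&self, mut handler: F)
    where
        F: FnMut(Message) -> Fut,
        Fut: Future<Output = ()>,
    {
        // A real implementation would loop over received messages.
        let msg = Message { payload: Vec::new() };
        handler(msg).await;
    }
}
```

The call site then becomes something like `sub.with_handler(|msg| async move { /* publish a response here */ }).await`, which is where the extra ceremony compared to a plain closure shows up.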
(Of course, a public async API could support both sync and async callbacks.) Error callbacks and other callbacks in Options also need to be addressed.
I'm sure there are ways to make this more ergonomic. Perhaps macros and additional library support can help. The Subscription preprocessor can also be done with a trait:
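A minimal sketch of the trait-based version, with invented names, under the assumption that the preprocessor runs before a message is delivered:

```rust
// Hypothetical trait standing in for the Subscription preprocessor idea.
trait Preprocess {
    // Return None to drop the message, Some to deliver it (possibly transformed).
    fn preprocess(&mut self, msg: Vec<u8>) -> Option<Vec<u8>>;
}

struct Subscription<P> {
    preprocessor: P,
}

impl<P> Subscription<P>
where
    P: Preprocess + Send + 'static,
{
    fn handle_incoming(&mut self, raw: Vec<u8>) -> Option<Vec<u8>> {
        // Run the user-supplied preprocessing step before handing the message out.
        self.preprocessor.preprocess(raw)
    }
}
```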
In this case I don't think the ergonomics are a problem.
-
^ some thoughts above @derekcollison @Jarema

Oops, buried the lead: thank you @Jarema for kicking off the discussion. I'm glad your team is making this issue a priority.

My humble contribution to the discussion (Edit: same as the link in the initial post).

One of the known unknowns in this discussion is the performance of switching to async. If you already have a multi-threaded, multi-connection benchmark and you want a quick idea of the performance implications of a hypothetical top-to-bottom async Rust client, you should be able to plug that library into the benchmark with very little effort. I haven't done any performance benchmarking or performance tuning. As far as I know, it's very close to functionally complete (for an async-only and tokio-only NATS client), but it hasn't had real-world usage yet.
-
I'm with @stevelr on the point that a stronger separation of protocol and IO will make maintaining a sync and an async NATS client way easier. But I would suggest separating the current crate into several for this purpose:
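Purely as an illustration of the idea (the crate names below are invented, not a concrete proposal), the split could look like a small protocol crate with thin runtime-specific crates on top:

```rust
// Hypothetical workspace layout, names invented for illustration:
//
//   nats-proto  - NATS protocol parsing/serialization, no IO, no runtime
//   nats        - blocking client built on nats-proto and std::net
//   nats-tokio  - async client built on nats-proto and tokio
//
// A runtime-specific crate then stays thin, e.g. in the hypothetical nats-tokio:
async fn connect(addr: &str) -> std::io::Result<tokio::net::TcpStream> {
    // Only the IO and runtime glue lives here; the protocol logic is shared.
    tokio::net::TcpStream::connect(addr).await
}
```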
This comes with several benefits, but of course there are negative points as well.
-
Some basic benchmarks:

@stevelr async rewrite (binary size of a basic app using Steve's fork: 3.1M):
- subscribe stats for 10 000 000 messages, published with `nats bench --msgs 10000000 --pub 10 "events.>"`
- publish stats for 10 000 000 messages: total time 40.756654458s
- two instances of async nats, one pub, one sub, 10 000 000 messages

nats.rs (current sync client):
- subscribe stats for 10 000 000 messages, published with `nats bench --msgs 10000000 --pub 10 "events.>"`
- publish stats for 10 000 000 messages: total time 887.570083ms
- two instances of sync nats, one pub, one sub, 10 000 000 messages

The above shows that subscriptions are pretty close. Publishes, on the other hand, show some evident slowdowns.

NOTE: the sync publisher was so fast that the subscriber was dropping some messages (less than 1%).
-
Looking forward to the results of your architecture investigation and any info you can share about the availability of an async client library. IMHO the JetStream APIs could come later if that makes it easier to get an alpha release out for testing. It would be useful even with just basic pub/sub and client auth (JWT/seed).
-
Async Refactor
The current async code for the NATS client, contained in the `asynk` module, was a simple and easy way to enable async code in the Rust NATS client. It was a compromise in a few regards that affects our users.
This discussion was created to align on what the options are and how we want to push the topic forward.
Current approach and its limits
The `asynk` module is based on the `blocking` crate, which is a simple thread pool providing `async` semantics. Our way of using it leads to a few issues:
The above is not only inefficient but can also lead to unexpected blocking: if the limit of the thread pool is reached, new subscriptions will hang until other subscriptions are shut down.
This has raised quite a few tickets (#226).
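For context, the pattern in question looks roughly like this, a simplified sketch (not the actual `asynk` code) built on the `blocking` crate's `unblock`:

```rust
use blocking::unblock;
use std::sync::Arc;

// Simplified stand-in for a sync subscription whose `next()` blocks the thread.
struct SyncSubscription;

impl SyncSubscription {
    fn next(&self) -> Option<Vec<u8>> {
        // blocks the calling thread until a message arrives (elided here)
        Some(Vec::new())
    }
}

async fn next_async(sub: Arc<SyncSubscription>) -> Option<Vec<u8>> {
    // Each pending call occupies one thread from the shared `blocking` pool;
    // once the pool limit is reached, further calls queue up and appear to hang.
    unblock(move || sub.next()).await
}
```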
Requirements for the solution
As this is a NATS client, it has a few requirements to satisfy:
- a `sync` implementation for no-runtime, purely sync scenarios

API for flavors
There are different ways to provide users with a way to choose which flavor to use (sync, tokio, async-std).
From all of the above, the `feature` flavor seems to be the most idiomatic. It is used by major Rust crates.
`async-std` example (used to set up compatibility mode):
Unless better options are proposed or blockers arise during implementation, this direction will be taken.
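Roughly, the feature-flag approach gates the runtime-specific pieces at compile time. A minimal sketch, with invented feature names (and assuming the flavors are mutually exclusive):

```rust
// Hypothetical feature names; the real ones would be decided during implementation.
// async-std's compatibility mode would additionally be enabled through a cargo
// feature on the async-std dependency (e.g. its `tokio1` feature).

#[cfg(feature = "tokio_runtime")]
pub async fn connect(addr: &str) -> std::io::Result<tokio::net::TcpStream> {
    tokio::net::TcpStream::connect(addr).await
}

#[cfg(feature = "async_std_runtime")]
pub async fn connect(addr: &str) -> std::io::Result<async_std::net::TcpStream> {
    async_std::net::TcpStream::connect(addr).await
}

#[cfg(feature = "sync")]
pub fn connect(addr: &str) -> std::io::Result<std::net::TcpStream> {
    std::net::TcpStream::connect(addr)
}
```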
Solution options
There are a few ways we can implement the new API.
1. Async rewrite
Rewrite the whole library to `async`. That simplifies the architecture of the client and also makes it "native" for users that are async (is it safe to assume that most use cases of this lib are async?).
pros
cons
`async` (benchmarks needed)
For the latter, a compatibility mode of the runtime could probably be used, but that often creates two runtime executors (at least it seems so in the case of async-std; to be confirmed), which is something we want to avoid. Another approach is to have a separate executor for tasks and make it swappable, though that's not so straightforward, as the TCPSocket has to be set up per runtime (if compat mode spawns a second runtime); still, it's viable.
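A swappable executor usually comes down to a tiny spawn abstraction along these lines (a sketch with invented names, ignoring the per-runtime TCPSocket setup mentioned above):

```rust
use std::future::Future;
use std::pin::Pin;

type BoxFuture = Pin<Box<dyn Future<Output = ()> + Send + 'static>>;

// The client would only ever spawn tasks through this trait, so the executor
// behind it can be swapped per runtime flavor.
trait Spawner: Send + Sync {
    fn spawn(&self, fut: BoxFuture);
}

struct TokioSpawner;

impl Spawner for TokioSpawner {
    fn spawn(&self, fut: BoxFuture) {
        // Requires a tokio runtime to be running; the task is detached.
        let _ = tokio::spawn(fut);
    }
}
```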
2. Sync core, async/sync API
This approach leverages the current sync API for the TCPSocket and NATS proto handling but exposes both a sync and an async API to users.
pros
cons
(`block_on`)
One of the unwanted issues with this approach is the potential temptation to refactor the proto code to enable cleaner sync-async boundaries and better performance.
That is because the current codebase has a potential for lock contention (especially after introducing client-side slow consumers detection) and also has multiple communication patterns between API and proto - messages are sent via channels, everything else is function calls.
Leveraging a simple actor model and embracing Go's "share memory by communicating" style could address both. Such an approach might be needed to get better sync performance if the amount of locking is an actual factor (to be shown by benchmarks).
Nonetheless, this approach should have a higher performance ceiling.
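A rough sketch of the actor-style direction (simplified, names invented): one loop owns the connection state, and both the sync and async APIs talk to it over channels instead of sharing locks.

```rust
use std::sync::mpsc;
use std::thread;

// Commands sent from the public API (sync or async) to the core actor.
enum Command {
    Publish { subject: String, payload: Vec<u8> },
    Shutdown,
}

fn spawn_core() -> mpsc::Sender<Command> {
    let (tx, rx) = mpsc::channel::<Command>();
    thread::spawn(move || {
        // Single owner of the connection state: no locks, just message passing.
        for cmd in rx {
            match cmd {
                Command::Publish { subject, payload } => {
                    // a real core would write the PUB frame to the socket here
                    let _ = (subject, payload);
                }
                Command::Shutdown => break,
            }
        }
    });
    tx
}

fn main() {
    let core = spawn_core();
    // A sync API calls send() directly; an async API would reach the same core
    // through an async channel (or an async-aware wrapper around this one).
    core.send(Command::Publish {
        subject: "events.demo".into(),
        payload: b"hello".to_vec(),
    })
    .unwrap();
    core.send(Command::Shutdown).unwrap();
}
```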
3. Implementation per flavor
This approach is the simplest of all, but it introduces so much code duplication (which would cripple the velocity of the Rust NATS clients) that it's not worth considering unless all of the above run into so many issues that it becomes the last-resort option.
Conclusion
This is just a starting point for the discussion. If you know of better approaches, don't hesitate to share them. The same applies to feedback and corrections to everything said above.
@caspervonb @derekcollison