Skip to content

Commit de5b7c8

Browse files
committed
Async Rust ADR
1 parent 7de354d commit de5b7c8

8 files changed

+1843
-0
lines changed

docs/adr/0009-async-rust.md

Lines changed: 204 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,204 @@
1+
# Using Async Rust
2+
3+
* Status: draft
4+
* Deciders:
5+
* Date: ???
6+
* Previous discussion: https://github.com/mozilla/application-services/pull/5910
7+
8+
## Context and Problem Statement
9+
10+
Our Rust components are currently written using synchronous Rust, however all current consumers wrap them in another class to present an async-style interface:
11+
12+
* Firefox Android uses `CoroutineScope.runBlocking` to adapt methods to be `suspend`
13+
* Firefox iOs mostly uses `DispatchQueue` and completion handlers, but the long-term plan is to transition to `async` functions.
14+
Some components already do this, using `withCheckedThrowingContinuation` on top of `DispatchQueue` to define async functions.
15+
* Firefox Desktop auto-generates C++ wrapper code to make them async using a [TOML config file](https://searchfox.org/mozilla-central/rev/cdfe21b20eacfaa6712dd9821d6383859ce386c6/toolkit/components/uniffi-bindgen-gecko-js/config.toml).
16+
This was chosen because JS is single-threaded and doesn't provide a simple way to run blocking functions in a task queue.
17+
One drawback from this choice is that Desktop has a very different async system compared to the mobile apps.
18+
19+
This presents some issues for Rust component development:
20+
21+
* Async is handled differently in JS vs Kotlin+Swift, which can be confusing for developers.
22+
* Core component functionality ends up being implemented in wrapper code which means it needs to be implemented multiple times or some platforms will be missing features.
23+
For example, the Kotlin functionality to interrupt pending `query()` calls would also be useful on Swift, but we never implemented that .
24+
* The code lives in other repos even though it logically belongs in app-services.
25+
This makes it more likely to go stale and be forgotten about.
26+
* It makes our documentation system worse. For example, our docs for [query()](https://mozilla.github.io/application-services/kotlin/kotlin-components-docs/mozilla.appservices.suggest/-suggest-store/query.html) say it's a sync function, but it practice it's actually async.
27+
28+
With the new UniFFI async capabilities, it's possible to move the async code into Rust and avoid this wrapper layer.
29+
This ADR discusses if we should do this, how we could implement it, and what our general approach to async should be.
30+
To make things concrete, the Suggest component is used as an example of what would change if we moved to async Rust.
31+
32+
### Scope
33+
34+
This ADR covers the question of using a wrapper layer to implant async functionality.
35+
It does not cover some related questions:
36+
37+
* **Scheduling this work.**
38+
If we decide to embrace async Rust, we do not need to commit to any particular deadline for switching to it.
39+
* **Wrappers in general.**
40+
[The previous PR](https://github.com/mozilla/application-services/pull/5910/) dedicated a separate ADR for this question, but we never came to a decision on this.
41+
It seems that there's no single answer to the question, the general consensus was that we should evaluate things on a case-by-case basis and this ADR will just focus on the case of async.
42+
* **Wrappers in consumer code.**
43+
If we change things so that the current async wrapping layer is no longer needed, consumer engineers will still have a choice on if they want to keep their current wrapper layer or not.
44+
This choice should be left to consumer engineers.
45+
This ADR will touch on this, but not recommend anything.
46+
47+
## Considered Options
48+
49+
* **Option A: Keep the Rust code sync, but move it.**
50+
* Keep the Rust code sync, but move the wrapper code closer to the Rust code to address the code-ownership issues.
51+
* Move most functionality from the wrapper code into Rust.
52+
For example, error-handling code and the code to interrupt pending queries could all happen in Rust.
53+
This makes it harder for developers to forget about this functionality when writing Rust code.
54+
* The wrapper code would be responsible for wrapping sync calls to be async and basically nothing else.
55+
* **Option B: Async Rust using the current UniFFI.**
56+
Make our component methods async by requiring the foreign code to pass in an interface that runs tasks in a worker queue and using
57+
`oneshot::Channel` to communicate the results back. See below for how this would work in practice.
58+
This allows us to switch to async Rust code without making any changes to the UniFFI core.
59+
* **Option C: Async Rust with additional UniFFI support.**
60+
Like `B`, but make `WorkerQueue` and `RustTask` built-in UniFFI types.
61+
This would eliminate the need to define our own interfaces for these, instead UniFFI would allow foreign code to passing in `DispatchQueue`/`CoroutineScopes` to Rust and Rust could use those to run blocking tasks in a work queue.
62+
* **Option D: Extend the uniffi-bindgen-gecko-js config system to Mobile.**
63+
Extend the Gecko-JS system, where the config specifies that functions/methods should be wrapped as async, to also work for Kotlin/Swift.
64+
65+
### Example code
66+
67+
I (Ben) made some diffs to illustrate how the code would change for each option.
68+
When doing that, I realized the wrapper layer was actually implementing important functionality for the component:
69+
70+
* The Kotlin code extended our interrupt support to also interrupt pending `query()` calls.
71+
* The Kotlin code also catches all errors and coverts them into regular returns (`query()` returns an empty list and `ingest()` returns false).
72+
* The Swift code split async methods into 2 categories: low-priority calls like `ingest()` and high-priority calls like `query()`
73+
74+
As part of the example changes, I moved this functionality to our Rust components.
75+
This results in some extra code in the diffs, but I thought it was good to explore the messy details of this transition.
76+
A important factor for deciding this ADR is where we want this functionality to live.
77+
78+
#### Option B
79+
- [app-services changes](./files/0009/option-b-app-services.diff)
80+
- [android-components changes](./files/0009/option-b-android-components.diff)
81+
- [Firefox iOS changes](./files/0009/option-b-firefox-ios.diff)
82+
- [Firefox Desktop changes](./files/0009/option-b-firefox-desktop.diff)
83+
84+
This option is possible to implement today, so I was able to test that the app-services and android-components changes actually compiled
85+
I didn't do that for iOS and desktop, mostly because it's harder to perform a local build.
86+
I think we can be confident that the actual changes would look similar to this.
87+
88+
Summary of changes:
89+
* Added the `WorkerQueue` and `RustTask` interfaces
90+
* `RustTask` encapsulates a single task
91+
* `WorkerQueue` is implemented by the foreign side and runs a `RustTask` in a work queue where it's okay to block.
92+
* Use the above interfaces to wrap sync calls to be async, by running them in the `WorkerQueue` then sending the result via a `oneshot::Channel` that the
93+
original async function was `await`ing.
94+
* This is also a good place to do the error conversion/reporting.
95+
* SuggestStoreBuilder gets a `build_async` method, which creates an `SuggestStoreAsync`.
96+
It inputs 2 `WorkerQueue`s: one for ingest and one for everything else.
97+
* Added supporting code so that `query()` can create it's `SqlInterruptScope` ahead of time, outside of the scheduled task.
98+
That way `interrupt()` can also interrupt pending calls to `query()`.
99+
* Updated the `ingest()` method to catch errors and return `false` rather than propagating them.
100+
In general, I think this is a better API for consumers since there's nothing meaningful they can do with errors other than report them.
101+
`query()` should probably also be made infallible.
102+
* Removed the wrapper class from `android-components`, but kept the wrapper class in `firefox-ios`.
103+
The goal here was to show the different options for consumer code.
104+
105+
#### Option C
106+
- [app-services changes](./files/0009/option-c-app-services.diff)
107+
- [android-components changes](./files/0009/option-c-android-components.diff)
108+
- [Firefox iOS changes](./files/0009/option-c-firefox-ios.diff)
109+
- [Firefox Desktop changes](./files/0009/option-c-firefox-desktop.diff)
110+
111+
This option assumes that UniFFI will provide something similar to `WorkerQueue`, so we don't need to define/implement that in app-services or the consumer repos.
112+
This requires changes to UniFFI core, so none of this code works today.
113+
However, I think we can be fairly confident that these changes will work since we have a long-standing [UniFFI PR](https://github.com/mozilla/uniffi-rs/pull/1837) that implements a similar feature -- in fact it's a more complex version.
114+
115+
Summary of changes: essentially the same as `B`, but we don't need to define/implement the `WorkerQueue` and `RustTask` interfaces.
116+
117+
#### Option D
118+
119+
I'm not completely sure how this one would work in practice, but I assume that it would mean TOML configuration for Kotlin/Swift similar to the current Desktop configuration:
120+
121+
```
122+
[suggest.async_wrappers]
123+
# All functions/methods are wrapped to be async by default and must be `await`ed.
124+
enable = true
125+
# These are exceptions to the async wrapping. These functions must not be `await`ed.
126+
main_thread = [
127+
"raw_suggestion_url_matches",
128+
"SuggestStore.new",
129+
"SuggestStore.interrupt",
130+
"SuggestStoreBuilder.new",
131+
"SuggestStoreBuilder.data_path",
132+
"SuggestStoreBuilder.load_extension",
133+
"SuggestStoreBuilder.remote_settings_bucket_name",
134+
"SuggestStoreBuilder.remote_settings_server",
135+
"SuggestStoreBuilder.build",
136+
]
137+
```
138+
139+
This would auto-generate async wrapper code.
140+
For example, the Kotlin code would look something like this:
141+
142+
```
143+
class SuggestStore {
144+
145+
/**
146+
* Queries the database for suggestions.
147+
*/
148+
suspend fun query(query: SuggestionQuery): List<Suggestion> =
149+
withContext(Dispatchers.IO) {
150+
// ...current auto-generated code here
151+
}
152+
```
153+
154+
## Decision Outcome
155+
156+
## Pros and Cons of the Options
157+
158+
### (A) Keep the Rust code sync, but move it into app-services.
159+
160+
* Good, because it requires the least amount of work
161+
* Good, because it's proven to work
162+
* Good, because it clarifies the code ownership and makes it harder for us to forget about the functionality in the wrapper code.
163+
* Good, because sync code can be easier to understand than async code.
164+
* Bad, because abstracting async processes as sync ones can cause developers to miss details.
165+
For example, it's easy to forget about pending `query()` calls if `SuggestStore` methods are called synchronously in your mental model.
166+
* Bad, because async will still be handled differently in JS vs Kotlin+Swift.
167+
* Bad, because of the impact on documentation.
168+
169+
### (B) Async Rust using the current UniFFI
170+
171+
* Good, because we'll have a common system for Async that works similarly all platforms
172+
* Good, because our generated docs will match how the methods are used in practice.
173+
* Good, because it encourages us to move async complexity into the core code.
174+
This makes it available to all platforms and more likely to be maintained.
175+
* Good because it opens the door for more efficient thread usage.
176+
For example, we could make methods more fine-grained and only use the work queue for SQL operations, but not for network requests.
177+
* Bad, because we're taking on risk by introducing the async UniFFI code to app-services.
178+
* Bad, because our consumers need to define all a `WorkerQueue` implementations, which is a bit of a burden.
179+
This feels especially bad on JS, where the whole concept of a work queue feels alien.
180+
* Bad, because it makes it harder to provide bindings on new languages that don't support async, like C and C++.
181+
Maybe we could bridge the gap with some sort of callback-based async system, but it's not clear how that would work.
182+
183+
### (C) Async Rust with additional UniFFI support
184+
185+
* Good/Bad/Risky for mostly the same reasons as (B).
186+
* Good, because it removes the need for us and consumers to define `WorkerQueue` traits/impls.
187+
* Good, because it can simplify the `WorkerQueue` code.
188+
In particular, we can guarantee that the task is only executed once, which removes the need for `RustTaskContainer`.
189+
* Bad, because we'll have to maintain the UniFFI worker queue code.
190+
191+
### (D) Extend the uniffi-bindgen-gecko-js config system to Mobile
192+
193+
* Good, because we'll have a common system for Async that works similarly all platforms
194+
* Mostly good, because our generated docs will match how the methods are used in practice.
195+
However, it could be weird to write docstrings for sync functions that are wrapped to be async.
196+
* Good, because it's less Risky than B/C.
197+
The system would just be auto-generating the kind of wrappers we already used.
198+
* Bad, because it's hard for consumer engineers to configure.
199+
For example, how can firefox-ios devs pick which QOS they want to use with a `DispatchQueue`?
200+
They would probably have to specify it in the config file, which is less convent than passing it in from code.
201+
* Bad, because it's not clear how we could handle complex cases like using both a low-priority and high-priority queue.
202+
203+
## Decision Outcome
204+
?

0 commit comments

Comments
 (0)