|
| 1 | +# Using Async Rust |
| 2 | + |
| 3 | +* Status: draft |
| 4 | +* Deciders: |
| 5 | +* Date: ??? |
| 6 | +* Previous discussion: https://github.com/mozilla/application-services/pull/5910 |
| 7 | + |
| 8 | +## Context and Problem Statement |
| 9 | + |
| 10 | +Our Rust components are currently written using synchronous Rust, however all current consumers wrap them in another class to present an async-style interface: |
| 11 | + |
| 12 | +* Firefox Android uses `CoroutineScope.runBlocking` to adapt methods to be `suspend` |
| 13 | +* Firefox iOs mostly uses `DispatchQueue` and completion handlers, but the long-term plan is to transition to `async` functions. |
| 14 | + Some components already do this, using `withCheckedThrowingContinuation` on top of `DispatchQueue` to define async functions. |
| 15 | +* Firefox Desktop auto-generates C++ wrapper code to make them async using a [TOML config file](https://searchfox.org/mozilla-central/rev/cdfe21b20eacfaa6712dd9821d6383859ce386c6/toolkit/components/uniffi-bindgen-gecko-js/config.toml). |
| 16 | + This was chosen because JS is single-threaded and doesn't provide a simple way to run blocking functions in a task queue. |
| 17 | + One drawback from this choice is that Desktop has a very different async system compared to the mobile apps. |
| 18 | + |
| 19 | +This presents some issues for Rust component development: |
| 20 | + |
| 21 | +* Async is handled differently in JS vs Kotlin+Swift, which can be confusing for developers. |
| 22 | +* Core component functionality ends up being implemented in wrapper code which means it needs to be implemented multiple times or some platforms will be missing features. |
| 23 | + For example, the Kotlin functionality to interrupt pending `query()` calls would also be useful on Swift, but we never implemented that . |
| 24 | +* The code lives in other repos even though it logically belongs in app-services. |
| 25 | + This makes it more likely to go stale and be forgotten about. |
| 26 | +* It makes our documentation system worse. For example, our docs for [query()](https://mozilla.github.io/application-services/kotlin/kotlin-components-docs/mozilla.appservices.suggest/-suggest-store/query.html) say it's a sync function, but it practice it's actually async. |
| 27 | + |
| 28 | +With the new UniFFI async capabilities, it's possible to move the async code into Rust and avoid this wrapper layer. |
| 29 | +This ADR discusses if we should do this, how we could implement it, and what our general approach to async should be. |
| 30 | +To make things concrete, the Suggest component is used as an example of what would change if we moved to async Rust. |
| 31 | + |
| 32 | +### Scope |
| 33 | + |
| 34 | +This ADR covers the question of using a wrapper layer to implant async functionality. |
| 35 | +It does not cover some related questions: |
| 36 | + |
| 37 | +* **Scheduling this work.** |
| 38 | + If we decide to embrace async Rust, we do not need to commit to any particular deadline for switching to it. |
| 39 | +* **Wrappers in general.** |
| 40 | + [The previous PR](https://github.com/mozilla/application-services/pull/5910/) dedicated a separate ADR for this question, but we never came to a decision on this. |
| 41 | + It seems that there's no single answer to the question, the general consensus was that we should evaluate things on a case-by-case basis and this ADR will just focus on the case of async. |
| 42 | +* **Wrappers in consumer code.** |
| 43 | + If we change things so that the current async wrapping layer is no longer needed, consumer engineers will still have a choice on if they want to keep their current wrapper layer or not. |
| 44 | + This choice should be left to consumer engineers. |
| 45 | + This ADR will touch on this, but not recommend anything. |
| 46 | + |
| 47 | +## Considered Options |
| 48 | + |
| 49 | +* **Option A: Keep the Rust code sync, but move it.** |
| 50 | + * Keep the Rust code sync, but move the wrapper code closer to the Rust code to address the code-ownership issues. |
| 51 | + * Move most functionality from the wrapper code into Rust. |
| 52 | + For example, error-handling code and the code to interrupt pending queries could all happen in Rust. |
| 53 | + This makes it harder for developers to forget about this functionality when writing Rust code. |
| 54 | + * The wrapper code would be responsible for wrapping sync calls to be async and basically nothing else. |
| 55 | +* **Option B: Async Rust using the current UniFFI.** |
| 56 | + Make our component methods async by requiring the foreign code to pass in an interface that runs tasks in a worker queue and using |
| 57 | +`oneshot::Channel` to communicate the results back. See below for how this would work in practice. |
| 58 | + This allows us to switch to async Rust code without making any changes to the UniFFI core. |
| 59 | +* **Option C: Async Rust with additional UniFFI support.** |
| 60 | + Like `B`, but make `WorkerQueue` and `RustTask` built-in UniFFI types. |
| 61 | + This would eliminate the need to define our own interfaces for these, instead UniFFI would allow foreign code to passing in `DispatchQueue`/`CoroutineScopes` to Rust and Rust could use those to run blocking tasks in a work queue. |
| 62 | +* **Option D: Extend the uniffi-bindgen-gecko-js config system to Mobile.** |
| 63 | + Extend the Gecko-JS system, where the config specifies that functions/methods should be wrapped as async, to also work for Kotlin/Swift. |
| 64 | + |
| 65 | +### Example code |
| 66 | + |
| 67 | +I (Ben) made some diffs to illustrate how the code would change for each option. |
| 68 | +When doing that, I realized the wrapper layer was actually implementing important functionality for the component: |
| 69 | + |
| 70 | +* The Kotlin code extended our interrupt support to also interrupt pending `query()` calls. |
| 71 | +* The Kotlin code also catches all errors and coverts them into regular returns (`query()` returns an empty list and `ingest()` returns false). |
| 72 | +* The Swift code split async methods into 2 categories: low-priority calls like `ingest()` and high-priority calls like `query()` |
| 73 | + |
| 74 | +As part of the example changes, I moved this functionality to our Rust components. |
| 75 | +This results in some extra code in the diffs, but I thought it was good to explore the messy details of this transition. |
| 76 | +A important factor for deciding this ADR is where we want this functionality to live. |
| 77 | + |
| 78 | +#### Option B |
| 79 | + - [app-services changes](./files/0009/option-b-app-services.diff) |
| 80 | + - [android-components changes](./files/0009/option-b-android-components.diff) |
| 81 | + - [Firefox iOS changes](./files/0009/option-b-firefox-ios.diff) |
| 82 | + - [Firefox Desktop changes](./files/0009/option-b-firefox-desktop.diff) |
| 83 | + |
| 84 | +This option is possible to implement today, so I was able to test that the app-services and android-components changes actually compiled |
| 85 | +I didn't do that for iOS and desktop, mostly because it's harder to perform a local build. |
| 86 | +I think we can be confident that the actual changes would look similar to this. |
| 87 | + |
| 88 | +Summary of changes: |
| 89 | +* Added the `WorkerQueue` and `RustTask` interfaces |
| 90 | + * `RustTask` encapsulates a single task |
| 91 | + * `WorkerQueue` is implemented by the foreign side and runs a `RustTask` in a work queue where it's okay to block. |
| 92 | +* Use the above interfaces to wrap sync calls to be async, by running them in the `WorkerQueue` then sending the result via a `oneshot::Channel` that the |
| 93 | + original async function was `await`ing. |
| 94 | + * This is also a good place to do the error conversion/reporting. |
| 95 | +* SuggestStoreBuilder gets a `build_async` method, which creates an `SuggestStoreAsync`. |
| 96 | + It inputs 2 `WorkerQueue`s: one for ingest and one for everything else. |
| 97 | +* Added supporting code so that `query()` can create it's `SqlInterruptScope` ahead of time, outside of the scheduled task. |
| 98 | + That way `interrupt()` can also interrupt pending calls to `query()`. |
| 99 | +* Updated the `ingest()` method to catch errors and return `false` rather than propagating them. |
| 100 | + In general, I think this is a better API for consumers since there's nothing meaningful they can do with errors other than report them. |
| 101 | + `query()` should probably also be made infallible. |
| 102 | +* Removed the wrapper class from `android-components`, but kept the wrapper class in `firefox-ios`. |
| 103 | + The goal here was to show the different options for consumer code. |
| 104 | + |
| 105 | +#### Option C |
| 106 | + - [app-services changes](./files/0009/option-c-app-services.diff) |
| 107 | + - [android-components changes](./files/0009/option-c-android-components.diff) |
| 108 | + - [Firefox iOS changes](./files/0009/option-c-firefox-ios.diff) |
| 109 | + - [Firefox Desktop changes](./files/0009/option-c-firefox-desktop.diff) |
| 110 | + |
| 111 | +This option assumes that UniFFI will provide something similar to `WorkerQueue`, so we don't need to define/implement that in app-services or the consumer repos. |
| 112 | +This requires changes to UniFFI core, so none of this code works today. |
| 113 | +However, I think we can be fairly confident that these changes will work since we have a long-standing [UniFFI PR](https://github.com/mozilla/uniffi-rs/pull/1837) that implements a similar feature -- in fact it's a more complex version. |
| 114 | + |
| 115 | +Summary of changes: essentially the same as `B`, but we don't need to define/implement the `WorkerQueue` and `RustTask` interfaces. |
| 116 | + |
| 117 | +#### Option D |
| 118 | + |
| 119 | +I'm not completely sure how this one would work in practice, but I assume that it would mean TOML configuration for Kotlin/Swift similar to the current Desktop configuration: |
| 120 | + |
| 121 | +``` |
| 122 | +[suggest.async_wrappers] |
| 123 | +# All functions/methods are wrapped to be async by default and must be `await`ed. |
| 124 | +enable = true |
| 125 | +# These are exceptions to the async wrapping. These functions must not be `await`ed. |
| 126 | +main_thread = [ |
| 127 | + "raw_suggestion_url_matches", |
| 128 | + "SuggestStore.new", |
| 129 | + "SuggestStore.interrupt", |
| 130 | + "SuggestStoreBuilder.new", |
| 131 | + "SuggestStoreBuilder.data_path", |
| 132 | + "SuggestStoreBuilder.load_extension", |
| 133 | + "SuggestStoreBuilder.remote_settings_bucket_name", |
| 134 | + "SuggestStoreBuilder.remote_settings_server", |
| 135 | + "SuggestStoreBuilder.build", |
| 136 | +] |
| 137 | +``` |
| 138 | + |
| 139 | +This would auto-generate async wrapper code. |
| 140 | +For example, the Kotlin code would look something like this: |
| 141 | + |
| 142 | +``` |
| 143 | +class SuggestStore { |
| 144 | +
|
| 145 | + /** |
| 146 | + * Queries the database for suggestions. |
| 147 | + */ |
| 148 | + suspend fun query(query: SuggestionQuery): List<Suggestion> = |
| 149 | + withContext(Dispatchers.IO) { |
| 150 | + // ...current auto-generated code here |
| 151 | + } |
| 152 | +``` |
| 153 | + |
| 154 | +## Decision Outcome |
| 155 | + |
| 156 | +## Pros and Cons of the Options |
| 157 | + |
| 158 | +### (A) Keep the Rust code sync, but move it into app-services. |
| 159 | + |
| 160 | +* Good, because it requires the least amount of work |
| 161 | +* Good, because it's proven to work |
| 162 | +* Good, because it clarifies the code ownership and makes it harder for us to forget about the functionality in the wrapper code. |
| 163 | +* Good, because sync code can be easier to understand than async code. |
| 164 | +* Bad, because abstracting async processes as sync ones can cause developers to miss details. |
| 165 | + For example, it's easy to forget about pending `query()` calls if `SuggestStore` methods are called synchronously in your mental model. |
| 166 | +* Bad, because async will still be handled differently in JS vs Kotlin+Swift. |
| 167 | +* Bad, because of the impact on documentation. |
| 168 | + |
| 169 | +### (B) Async Rust using the current UniFFI |
| 170 | + |
| 171 | +* Good, because we'll have a common system for Async that works similarly all platforms |
| 172 | +* Good, because our generated docs will match how the methods are used in practice. |
| 173 | +* Good, because it encourages us to move async complexity into the core code. |
| 174 | + This makes it available to all platforms and more likely to be maintained. |
| 175 | +* Good because it opens the door for more efficient thread usage. |
| 176 | + For example, we could make methods more fine-grained and only use the work queue for SQL operations, but not for network requests. |
| 177 | +* Bad, because we're taking on risk by introducing the async UniFFI code to app-services. |
| 178 | +* Bad, because our consumers need to define all a `WorkerQueue` implementations, which is a bit of a burden. |
| 179 | + This feels especially bad on JS, where the whole concept of a work queue feels alien. |
| 180 | +* Bad, because it makes it harder to provide bindings on new languages that don't support async, like C and C++. |
| 181 | + Maybe we could bridge the gap with some sort of callback-based async system, but it's not clear how that would work. |
| 182 | + |
| 183 | +### (C) Async Rust with additional UniFFI support |
| 184 | + |
| 185 | +* Good/Bad/Risky for mostly the same reasons as (B). |
| 186 | +* Good, because it removes the need for us and consumers to define `WorkerQueue` traits/impls. |
| 187 | +* Good, because it can simplify the `WorkerQueue` code. |
| 188 | + In particular, we can guarantee that the task is only executed once, which removes the need for `RustTaskContainer`. |
| 189 | +* Bad, because we'll have to maintain the UniFFI worker queue code. |
| 190 | + |
| 191 | +### (D) Extend the uniffi-bindgen-gecko-js config system to Mobile |
| 192 | + |
| 193 | +* Good, because we'll have a common system for Async that works similarly all platforms |
| 194 | +* Mostly good, because our generated docs will match how the methods are used in practice. |
| 195 | + However, it could be weird to write docstrings for sync functions that are wrapped to be async. |
| 196 | +* Good, because it's less Risky than B/C. |
| 197 | + The system would just be auto-generating the kind of wrappers we already used. |
| 198 | +* Bad, because it's hard for consumer engineers to configure. |
| 199 | + For example, how can firefox-ios devs pick which QOS they want to use with a `DispatchQueue`? |
| 200 | + They would probably have to specify it in the config file, which is less convent than passing it in from code. |
| 201 | +* Bad, because it's not clear how we could handle complex cases like using both a low-priority and high-priority queue. |
| 202 | + |
| 203 | +## Decision Outcome |
| 204 | +? |
0 commit comments