From 3a7e3ca02124598430e6f3baaf4afcbd001365b8 Mon Sep 17 00:00:00 2001 From: "John T. Wodder II" Date: Sun, 14 Jul 2024 08:19:35 -0400 Subject: [PATCH 1/6] [skip ci] Document program architecture --- CHANGELOG.md | 1 + doc/architecture.md | 199 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 200 insertions(+) create mode 100644 doc/architecture.md diff --git a/CHANGELOG.md b/CHANGELOG.md index 0e8d0b7..039b0f7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,7 @@ In Development - Added doc comments to much of the code - Return 502 status when a backend returns an invalid response - Require `--api-url` (and other URLs retrieved from APIs) to be HTTP(S) +- Added various developer-facing documents to the repository v0.4.0 (2024-07-09) ------------------- diff --git a/doc/architecture.md b/doc/architecture.md new file mode 100644 index 0000000..24c8037 --- /dev/null +++ b/doc/architecture.md @@ -0,0 +1,199 @@ +Overview of `dandidav` Architecture +=================================== + +*This document is up-to-date as of 2024 July 14.* + +> [!NOTE] +> A new architecture is currently being planned for the code. See [issue +> #67](https://github.com/dandi/dandidav/issues/67) for more information. + + +General +------- + +- `dandidav` is implemented as an [`axum`](https://github.com/tokio-rs/axum) + server in which almost all request-handling is done by a `Service`. (We need + to use a `Service` rather than `axum`'s normal method routers because the + latter do not support WebDAV's `PROPFIND` verb.) The service itself is + defined at [`src/main.rs`, lines 116-122][service-fn], and it simply passes + requests to the [`DandiDav::handle_request()`][handle-request] method for the + actual handling. + + - The only requests not handled by the service are those for the CSS + stylesheet at `/.static/styles.css`, which is not WebDAV-enabled and thus + should not support `PROPFIND` or return the WebDAV-specific headers + present in all other responses. + +- If any error occurs during the processing of a request, it will almost always + "bubble up" to `DandiDav::handle_request()`, which will log the error and + convert it into either a 404 response or a 500 response, as appropriate. + + - Panics (which are handled by `axum`) should only ever occur if there is + an actual bug in the code. + +- Most `GET` and `PROPFIND` requests to `dandidav` require fetching & + displaying information about both the resource identified by the request URL + and that resource's immediate children (if it has any); the exception is + `PROPFIND` requests with a `Depth` header of `0`, for which information about + child resources is not fetched. In order to support both types of requests + efficiently, many methods & types come in two variants: one variant with the + suffix `_with_children()`/`WithChildren` for fetching or representing a + resource and its children, and one variant without the suffix for fetching or + representing just the resource itself. + +- File sizes are represented as `i64` instead of `u64` for compatibility with + the official Rust AWS S3 SDK. + + +`DandiDav` +---------- + +- The [`DandiDav`][] type is responsible for handling WebDAV and plain HTTP + requests by: + + - Parsing the request URL path into a [`DavPath`][] describing the type of + resource the URL refers to and the resource's parameters + + - Splitting a path under a Dandiset version of the form + `{asset_path}/{zarr_resource_path}` is not done at this stage; that + is performed by the `DandiClient`. + + - Making calls to a `DandiClient` or `ZarrManClient` instance as + appropriate to fetch information about the given resource and possibly + its child resources + + - Rendering the resulting information in the form appropriate to the + resource and request method + +- The entry point to `DandiDav`'s functionality is its + [`handle_request()`][handle-request] method, which calls + `DandiDav::inner_handle_request()` and then performs error handling and + setting of universal response headers on the result. + +- [`DandiDav::inner_handle_request()`][] parses the request and passes the + results to either `DandiDav::get()` or `DandiDav::propfind()`, depending on + the request verb. + +- [`DandiDav::get()`][] and [`DandiDav::propfind()`][] both call + `DandiDav::get_resource_with_children()` to fetch information about the + requested resource and, if that resource is a collection, its immediate + children; as an exception, if a `PROPFIND` request was received with `Depth: + 0`, `DandiDav::propfind()` will instead call `DandiDav::get_resource()`, + which does not obtain information about child resources. `DandiDav::get()` + and `DandiDav::propfind()` then display the obtained information in different + ways: `DandiDav::get()` renders collections as HTML tables and + non-collections as redirects (or, for `dandiset.yaml`, serialized YAML), + while `DandiDav::propfind()` always returns a "multistatus" WebDAV XML + document. + + - When rendering information about blob assets (either on their own or + within a parent resource), a choice must be made about what URL to + provide as the download URL. The `contentUrl` metadata field of a blob + asset is assumed to contain two download URLs, one pointing to S3 and one + pointing to an Archive API endpoint that redirects to a signed version of + the S3 URL that sets a filename in the `Content-Disposition` header. The + latter type of URL is generally more desirable, as it means that any + users downloading the URL via a browser will end up with a file with the + same filename as the asset, while a direct, unsigned S3 URL will be + downloaded as a file with the same filename as the S3 key, which is + usually the blob ID and not user-friendly. Hence, by default, a `GET` + request to `dandidav` for a blob asset will be responded to with a + redirect to the Archive API download URL. Unfortunately, certain WebDAV + clients (i.e., [davfs2](https://savannah.nongnu.org/bugs/?65376)) do not + support WebDAV servers that return redirects to other redirects, so the + `--prefer-s3-redirects` CLI option was added to `dandidav` to instead + make these `GET` requests redirect directly to the unsigned S3 URLs. + + - Note that HTML listings of a collection's children will always link + blob assets to their Archive API download URLs, regardless of + `--prefer-s3-redirects`, as these listings are only returned for + `GET` requests to a collection, which WebDAV clients don't do. + (These listings link blob assets and Zarr entries directly to + download URLs rather than `dandidav` resource URLs because doing the + latter would result in subsequent requests to `dandidav` that just + duplicate work.) + + - Note that this does not apply to download URLs for Zarr entries, as + their S3 URLs are assumed to always have the same filename as the + entry, and thus signing to add a `Content-Disposition` is + unnecessary. + +- [`DandiDav::get_resource()`][] and + [`DandiDav::get_resource_with_children()`][] perform method calls on a + `DandiClient` or `ZarrManClient` instance (as appropriate) belonging to the + `DandiDav` instance in order to fetch information about the specified + resource(s). + + +`DandiClient` +------------- + +- The [`DandiClient`][] type is used to retrieve information about resources on + an Archive instance (the one specified via the `--api-url` option on program + invocation) via a combination of the Archive's API and (for entries & folders + inside Zarrs) requests to an S3 bucket. It is the data source for the + `/dandisets/` hierarchy served by `dandidav`. + + - Information about child resources of Zarrs needs to be fetched via S3 + directly rather than using the Archive API because the latter currently + does not provide an efficient way to list folders within a Zarr. + +- The `DandiClient` API is modeled somewhat after + [`octocrab`](https://github.com/XAMPPRocky/octocrab): To operate on an API + resource, an endpoint object is requested from the `DandiClient` (possibly + indirectly via a super-endpoint), and the endpoint's methods are called to + obtain one or more objects that provide information about the resource or its + children, but these resource objects are distinct from the endpoint objects + and do not have any methods for making API calls themselves. + +- The code assumes Zarr assets have a `contentUrl` that points to the Zarr's + entries on an S3 bucket; specifically, the first element of `contentUrl` that + can be parsed by [`S3Location::parse_url()`][s3loc-parse] into a bucket, + optional region, and key prefix is used as such. It is then assumed that the + Zarr's entries are laid out under the given key prefix (after appending a + trailing slash if one is not already present) on the given bucket with the + same names & directory structure as the actual Zarr. + + - If a given Zarr asset lacks an S3 `contentUrl`, any requests to a path + under that Zarr will result in a 500 response. + + - If an S3 URL does not specify the bucket region, it is determined via a + `HEAD` request to the domain. The S3 client caching (see below) means + that this will almost certainly be done at most once per `dandidav` + process. + + - The S3 client used to fetch information about Zarr entries from a given + bucket is cached in the `DandiClient`. While it is almost certain that + all Zarrs in the same Archive instance will use the same bucket and thus + the same client, the possibility of multiple buckets is guarded against + by using a cache of up to [`S3CLIENT_CACHE_SIZE`][] clients. + + +`ZarrManClient` +--------------- + +The [`ZarrManClient`][] type is used to retrieve information about Zarr entries +by fetching Zarr manifests from via +the mirror at . It is the +data source for the `/zarrs/` hierarchy served by `dandidav`. + +See the documentation for the `dandidav::zarrman` module for more information. + + +[service-fn]: https://github.com/dandi/dandidav/blob/00d0714a88c28737f2d648a5dd57d37568ac0f0a/src/main.rs#L116-L122 + +[`DandiDav`]: https://github.com/dandi/dandidav/blob/8d058fe0e561e56ecd3d4c5cd49ca9403b0d196a/src/dav/mod.rs#L37 +[handle-request]: https://github.com/dandi/dandidav/blob/8d058fe0e561e56ecd3d4c5cd49ca9403b0d196a/src/dav/mod.rs#L71 +[`DandiDav::get()`]: https://github.com/dandi/dandidav/blob/8d058fe0e561e56ecd3d4c5cd49ca9403b0d196a/src/dav/mod.rs#L129 +[`DandiDav::propfind()`]: https://github.com/dandi/dandidav/blob/8d058fe0e561e56ecd3d4c5cd49ca9403b0d196a/src/dav/mod.rs#L165 +[`DandiDav::inner_handle_request()`]: https://github.com/dandi/dandidav/blob/8d058fe0e561e56ecd3d4c5cd49ca9403b0d196a/src/dav/mod.rs#L95 +[`DandiDav::get_resource()`]: https://github.com/dandi/dandidav/blob/8d058fe0e561e56ecd3d4c5cd49ca9403b0d196a/src/dav/mod.rs#L216 +[`DandiDav::get_resource_with_children()`]: https://github.com/dandi/dandidav/blob/8d058fe0e561e56ecd3d4c5cd49ca9403b0d196a/src/dav/mod.rs#L272 + +[`DavPath`]: https://github.com/dandi/dandidav/blob/8d058fe0e561e56ecd3d4c5cd49ca9403b0d196a/src/dav/path.rs#L8 + +[`DandiClient`]: https://github.com/dandi/dandidav/blob/8d058fe0e561e56ecd3d4c5cd49ca9403b0d196a/src/dandi/mod.rs#L27 +[s3loc-parse]: https://github.com/dandi/dandidav/blob/8d058fe0e561e56ecd3d4c5cd49ca9403b0d196a/src/s3/mod.rs#L176 +[`S3CLIENT_CACHE_SIZE`]: https://github.com/dandi/dandidav/blob/8d058fe0e561e56ecd3d4c5cd49ca9403b0d196a/src/consts.rs#L24-L25 + +[`ZarrManClient`]: https://github.com/dandi/dandidav/blob/e9be2dd15ba95d760912344cd09c2a1a08da89b2/src/zarrman/mod.rs#L51 From e3784f343c883e6344277a4508a6fbfd1a9341a7 Mon Sep 17 00:00:00 2001 From: "John T. Wodder II" Date: Sun, 14 Jul 2024 17:01:55 -0400 Subject: [PATCH 2/6] Mention `cargo doc` command for browsing documentation --- doc/architecture.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/doc/architecture.md b/doc/architecture.md index 24c8037..743f23c 100644 --- a/doc/architecture.md +++ b/doc/architecture.md @@ -7,6 +7,13 @@ Overview of `dandidav` Architecture > A new architecture is currently being planned for the code. See [issue > #67](https://github.com/dandi/dandidav/issues/67) for more information. +This document is best augmented by the doc comments in the `dandidav` source. +You can view the doc comments in rendered form in a browser by running the +following inside a clone of this repository: + +```console +cargo doc --open --document-private-items --no-deps +``` General ------- From 20bffc7200f8311676ecd6aaa9fa36fb74999d9c Mon Sep 17 00:00:00 2001 From: "John T. Wodder II" Date: Fri, 26 Jul 2024 08:57:11 -0400 Subject: [PATCH 3/6] Updates based on review comments --- doc/architecture.md | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/doc/architecture.md b/doc/architecture.md index 743f23c..6453867 100644 --- a/doc/architecture.md +++ b/doc/architecture.md @@ -32,8 +32,9 @@ General present in all other responses. - If any error occurs during the processing of a request, it will almost always - "bubble up" to `DandiDav::handle_request()`, which will log the error and - convert it into either a 404 response or a 500 response, as appropriate. + "bubble up" to [`DandiDav::handle_request()`][handle-request], which will log + the error and convert it into either a 404 response or a 500 response, as + appropriate. - Panics (which are handled by `axum`) should only ever occur if there is an actual bug in the code. @@ -48,6 +49,10 @@ General resource and its children, and one variant without the suffix for fetching or representing just the resource itself. + - Requests with a `Depth` header of `infinity` (including missing `Depth` + headers) are replied to immediately with a 403 response; this is handled + by [the extraction of the `Depth` header value][extract-depth]. + - File sizes are represented as `i64` instead of `u64` for compatibility with the official Rust AWS S3 SDK. @@ -108,8 +113,9 @@ General redirect to the Archive API download URL. Unfortunately, certain WebDAV clients (i.e., [davfs2](https://savannah.nongnu.org/bugs/?65376)) do not support WebDAV servers that return redirects to other redirects, so the - `--prefer-s3-redirects` CLI option was added to `dandidav` to instead - make these `GET` requests redirect directly to the unsigned S3 URLs. + `--prefer-s3-redirects` CLI option was added to `dandidav` (and is + currently used by the webdav.dandiarchive.org deployment) to instead make + these `GET` requests redirect directly to the unsigned S3 URLs. - Note that HTML listings of a collection's children will always link blob assets to their Archive API download URLs, regardless of @@ -188,6 +194,7 @@ See the documentation for the `dandidav::zarrman` module for more information. [service-fn]: https://github.com/dandi/dandidav/blob/00d0714a88c28737f2d648a5dd57d37568ac0f0a/src/main.rs#L116-L122 +[extract-depth]: https://github.com/dandi/dandidav/blob/9b9b04872065b8132657b878bad324b2dff68a97/src/dav/util.rs#L99-L111 [`DandiDav`]: https://github.com/dandi/dandidav/blob/8d058fe0e561e56ecd3d4c5cd49ca9403b0d196a/src/dav/mod.rs#L37 [handle-request]: https://github.com/dandi/dandidav/blob/8d058fe0e561e56ecd3d4c5cd49ca9403b0d196a/src/dav/mod.rs#L71 From 4809dcc1f2f8b1a89d4cd7ace8a2e6c96d99219a Mon Sep 17 00:00:00 2001 From: "John T. Wodder II" Date: Fri, 26 Jul 2024 10:25:14 -0400 Subject: [PATCH 4/6] Update for #175 --- doc/architecture.md | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/doc/architecture.md b/doc/architecture.md index 6453867..9e3aaaf 100644 --- a/doc/architecture.md +++ b/doc/architecture.md @@ -1,7 +1,7 @@ Overview of `dandidav` Architecture =================================== -*This document is up-to-date as of 2024 July 14.* +*This document is up-to-date as of 2024 July 26.* > [!NOTE] > A new architecture is currently being planned for the code. See [issue @@ -33,8 +33,7 @@ General - If any error occurs during the processing of a request, it will almost always "bubble up" to [`DandiDav::handle_request()`][handle-request], which will log - the error and convert it into either a 404 response or a 500 response, as - appropriate. + the error and convert it into a 404, 500, or 502 response, as appropriate. - Panics (which are handled by `axum`) should only ever occur if there is an actual bug in the code. @@ -168,7 +167,7 @@ General same names & directory structure as the actual Zarr. - If a given Zarr asset lacks an S3 `contentUrl`, any requests to a path - under that Zarr will result in a 500 response. + under that Zarr will result in a 502 response. - If an S3 URL does not specify the bucket region, it is determined via a `HEAD` request to the domain. The S3 client caching (see below) means From 00d69680e1b8ba76f608ef5d6b582caf3e25a48c Mon Sep 17 00:00:00 2001 From: "John T. Wodder II" Date: Fri, 26 Jul 2024 11:25:59 -0400 Subject: [PATCH 5/6] [skip ci] Expand `ZarrManClient` documentation --- doc/architecture.md | 31 +++++++++++++++++++++++++++---- 1 file changed, 27 insertions(+), 4 deletions(-) diff --git a/doc/architecture.md b/doc/architecture.md index 9e3aaaf..aaacb54 100644 --- a/doc/architecture.md +++ b/doc/architecture.md @@ -185,11 +185,31 @@ General --------------- The [`ZarrManClient`][] type is used to retrieve information about Zarr entries -by fetching Zarr manifests from via -the mirror at . It is the -data source for the `/zarrs/` hierarchy served by `dandidav`. +by fetching *Zarr manifests* from via +a mirror at (the *manifest +tree*); see [`doc/zarrman.md`](zarrman.md) for information on the Zarr manifest +format and server API. The client is the data source for the `/zarrs/` +hierarchy served by `dandidav`. -See the documentation for the `dandidav::zarrman` module for more information. +`ZarrManClient` has three public methods exposed for use by `DandiDav`: + +- [`get_top_level_dirs()`][zm-top-level] — for listing the folders at the top + of the `/zarrs/` hierarchy +- [`get_resource()`][zm-res] — for obtaining details on a resource at a given + path underneath `/zarrs/` +- [`get_resource_with_children()`][zm-res-with-child] — for obtaining details + on a resource and its children at a path underneath `/zarrs/` + +When listing the contents of a directory resource that directly corresponds to +a directory in the manifest tree (i.e., a directory directly or indirectly +containing Zarrs/Zarr manifests), an HTTP request is made to the manifest tree +server to obtain the listing. + +When listing the contents of a Zarr (either the top level or a descendant +subdirectory), the Zarr's manifest is consulted to discover the resources at +the given location to return. Zarr manifests are initially obtained via an +HTTP request to the manifest tree server and are afterwards cached until +implementation-defined expiry criteria are met. [service-fn]: https://github.com/dandi/dandidav/blob/00d0714a88c28737f2d648a5dd57d37568ac0f0a/src/main.rs#L116-L122 @@ -210,3 +230,6 @@ See the documentation for the `dandidav::zarrman` module for more information. [`S3CLIENT_CACHE_SIZE`]: https://github.com/dandi/dandidav/blob/8d058fe0e561e56ecd3d4c5cd49ca9403b0d196a/src/consts.rs#L24-L25 [`ZarrManClient`]: https://github.com/dandi/dandidav/blob/e9be2dd15ba95d760912344cd09c2a1a08da89b2/src/zarrman/mod.rs#L51 +[zm-res]: https://github.com/dandi/dandidav/blob/28d4b5b8a6ad3adca4ae8771480143ac9bcb7c89/src/zarrman/mod.rs#L109 +[zm-res-with-child]: https://github.com/dandi/dandidav/blob/28d4b5b8a6ad3adca4ae8771480143ac9bcb7c89/src/zarrman/mod.rs#L159 +[zm-top-level]: https://github.com/dandi/dandidav/blob/28d4b5b8a6ad3adca4ae8771480143ac9bcb7c89/src/zarrman/mod.rs#L100 From f8355a5f91da794b460d9d457d564e5aebb36f73 Mon Sep 17 00:00:00 2001 From: "John T. Wodder II" Date: Thu, 8 Aug 2024 11:17:10 -0400 Subject: [PATCH 6/6] [skip ci] Improve reference to doc comments Also, the `--document-private-items` option is already implied for binaries. --- doc/architecture.md | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/doc/architecture.md b/doc/architecture.md index aaacb54..1b62629 100644 --- a/doc/architecture.md +++ b/doc/architecture.md @@ -7,13 +7,11 @@ Overview of `dandidav` Architecture > A new architecture is currently being planned for the code. See [issue > #67](https://github.com/dandi/dandidav/issues/67) for more information. -This document is best augmented by the doc comments in the `dandidav` source. -You can view the doc comments in rendered form in a browser by running the -following inside a clone of this repository: - -```console -cargo doc --open --document-private-items --no-deps -``` +This document is an overview — and only an overview — of how most of the code +in `dandidav` fits together. For a fuller understanding of the code, peruse +the source code documentation; you can view the documentation in rendered form +in a web browser by running `cargo doc --open --no-deps` inside a clone of this +repository. General -------