Delete files that don't match inventory items #117

Merged
merged 25 commits into from
Jan 14, 2025
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,11 @@ In Development
- Add `--list-dates` option
- The `<outdir>` command-line argument is now optional and defaults to the
current directory
- The `--inventory-jobs` and `--object-jobs` options have been eliminated in
favor of a new `--jobs` option
- Files & directories in the backup tree that are not listed in the inventory
are deleted
- Increased MSRV to 1.81

v0.1.0-alpha.2 (2025-01-06)
---------------------------
73 changes: 33 additions & 40 deletions Cargo.lock

8 changes: 5 additions & 3 deletions Cargo.toml
@@ -2,7 +2,7 @@
name = "s3invsync"
version = "0.1.0-alpha.2"
edition = "2021"
-rust-version = "1.80"
+rust-version = "1.81"
description = "AWS S3 Inventory-based backup tool with efficient incremental & versionId support"
authors = [
"DANDI Developers <[email protected]>",
@@ -24,22 +24,24 @@ aws-smithy-async = "1.2.3"
aws-smithy-runtime-api = "1.7.3"
clap = { version = "4.5.26", default-features = false, features = ["derive", "error-context", "help", "std", "suggestions", "usage", "wrap_help"] }
csv = "1.3.1"
either = "1.13.0"
flate2 = "1.0.35"
fs-err = { version = "3.0.0", features = ["tokio"] }
-futures-util = "0.3.31"
+futures-util = { version = "0.3.31", default-features = false, features = ["std"] }
hex = "0.4.3"
lockable = "0.1.1"
md-5 = "0.10.6"
memory-stats = "1.2.0"
percent-encoding = "2.3.1"
pin-project-lite = "0.2.16"
regex = "1.11.1"
serde = { version = "1.0.217", features = ["derive"] }
serde_json = "1.0.135"
strum = { version = "0.26.3", features = ["derive"] }
tempfile = "3.15.0"
thiserror = "2.0.11"
time = { version = "0.3.37", features = ["macros", "parsing"] }
-tokio = { version = "1.43.0", features = ["macros", "rt-multi-thread", "signal"] }
+tokio = { version = "1.43.0", features = ["macros", "rt-multi-thread", "signal", "sync"] }
tokio-util = { version = "0.7.13", features = ["rt"] }
tracing = "0.1.41"
tracing-subscriber = { version = "0.3.19", features = ["local-time", "time"] }
13 changes: 6 additions & 7 deletions README.md
@@ -1,7 +1,7 @@
[![Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](https://www.repostatus.org/badges/latest/wip.svg)](https://www.repostatus.org/#wip)
[![CI Status](https://github.com/dandi/s3invsync/actions/workflows/test.yml/badge.svg)](https://github.com/dandi/s3invsync/actions/workflows/test.yml)
[![codecov.io](https://codecov.io/gh/dandi/s3invsync/branch/main/graph/badge.svg)](https://codecov.io/gh/dandi/s3invsync)
-[![Minimum Supported Rust Version](https://img.shields.io/badge/MSRV-1.80-orange)](https://www.rust-lang.org)
+[![Minimum Supported Rust Version](https://img.shields.io/badge/MSRV-1.81-orange)](https://www.rust-lang.org)
[![MIT License](https://img.shields.io/github/license/dandi/s3invsync.svg)](https://opensource.org/licenses/MIT)

[GitHub](https://github.com/dandi/s3invsync) | [Issues](https://github.com/dandi/s3invsync/issues) | [Changelog](https://github.com/dandi/s3invsync/blob/main/CHANGELOG.md)
@@ -92,7 +92,9 @@ When downloading a given key from S3, the latest version (if not deleted) is
stored at `{outdir}/{key}`, and the versionIds and etags of all latest object
versions in a given directory are stored in `.s3invsync.versions.json` in that
directory. Each non-latest, non-deleted version of a given key is stored at
-`{outdir}/{key}.old.{versionId}.{etag}`.
+`{outdir}/{key}.old.{versionId}.{etag}`. Any other files or directories under
+`<outdir>` that do not correspond to an object listed in the inventory are
+deleted.

Options
-------
@@ -110,8 +112,8 @@ Options
inventory for the given date is used) or in the format `YYYY-MM-DDTHH-MMZ`
(to specify a specific inventory).

-- `-I <INT>`, `--inventory-jobs <INT>` — Specify the maximum number of inventory
-list files to download & process at once [default: 20]
+- `-J <INT>`, `--jobs <INT>` — Specify the maximum number of concurrent
+download jobs [default: 20]

- `--list-dates` — List available inventory manifest dates instead of
backing anything up
@@ -120,9 +122,6 @@ Options
Possible values are "`ERROR`", "`WARN`", "`INFO`", "`DEBUG`", and "`TRACE`"
(all case-insensitive). [default value: `DEBUG`]

-- `-O <INT>`, `--object-jobs <INT>` — Specify the maximum number of inventory
-entries to download & process at once [default: 20]
-
- `--path-filter <REGEX>` — Only download objects whose keys match the given
[regular expression](https://docs.rs/regex/latest/regex/#syntax)
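
The README change above says that anything under `<outdir>` with no corresponding inventory object is deleted. As a rough sketch of what that decision could look like for a single directory, the following compares the names the inventory expects against the names found on disk. The helper `names_to_delete` is hypothetical and not part of s3invsync, and sparing the metadata file is an assumption based on the README saying it is stored in each directory.

```rust
use std::collections::HashSet;

/// Name of the per-directory metadata file (value taken from `src/consts.rs`)
const METADATA_FILENAME: &str = ".s3invsync.versions.json";

/// Hypothetical helper, not part of s3invsync: given the filenames that the
/// inventory says belong in one directory and the names actually found
/// there, return the names that would be deleted, in sorted order.
///
/// Assumption: the metadata file is always kept, even though it does not
/// correspond to an inventory object.
fn names_to_delete<'a>(expected: &HashSet<&str>, on_disk: &[&'a str]) -> Vec<&'a str> {
    let mut doomed: Vec<&'a str> = on_disk
        .iter()
        .copied()
        // Keep the metadata file and everything the inventory lists
        .filter(|name| *name != METADATA_FILENAME && !expected.contains(*name))
        .collect();
    doomed.sort_unstable();
    doomed
}
```

With `expected` holding `a.txt` and `a.txt.old.v1.abc`, a directory that also contains `stray.tmp` and a stale `a.txt.old.v0.zzz` would have exactly those two extra names returned. Whether the real tool walks the tree this way, or how it treats in-progress temporary files, is not visible in this part of the diff.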

4 changes: 4 additions & 0 deletions src/consts.rs
@@ -1,3 +1,7 @@
/// The name of the file in which metadata (version ID and etag) are stored for
/// the latest versions of objects in each directory
pub(crate) static METADATA_FILENAME: &str = ".s3invsync.versions.json";

/// The number of initial bytes of an inventory csv.gz file to fetch when
/// peeking at just the first entry
pub(crate) const CSV_GZIP_PEEK_SIZE: usize = 1024;
25 changes: 25 additions & 0 deletions src/inventory/item.rs
@@ -1,5 +1,6 @@
use crate::keypath::KeyPath;
use crate::s3::S3Location;
use crate::util::make_old_filename;
use time::OffsetDateTime;

/// An entry in an inventory list file
@@ -9,6 +10,16 @@
    Item(InventoryItem),
}

impl InventoryEntry {
    /// Returns the entry's key
    pub(crate) fn key(&self) -> &str {
        match self {
            InventoryEntry::Directory(Directory { key, .. }) => key,
            InventoryEntry::Item(InventoryItem { key, .. }) => key.as_ref(),
        }
    }
}

/// An entry in an inventory list file pointing to a directory object
#[derive(Clone, Debug, Eq, PartialEq)]
pub(crate) struct Directory {
@@ -60,6 +71,20 @@
        S3Location::new(self.bucket.clone(), String::from(&self.key))
            .with_version_id(self.version_id.clone())
    }

    /// Returns whether the object is a delete marker
    pub(crate) fn is_deleted(&self) -> bool {
        self.details == ItemDetails::Deleted
    }

    /// If the object is not a delete marker and is not the latest version of
    /// the key, return the base filename at which it will be backed up.
    pub(crate) fn old_filename(&self) -> Option<String> {
        let ItemDetails::Present { ref etag, .. } = self.details else {
            return None;
        };
        (!self.is_latest).then(|| make_old_filename(self.key.name(), &self.version_id, etag))
    }
}

/// Metadata about an object's content
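
To illustrate how `old_filename()` behaves, here is a minimal standalone sketch. The body of `crate::util::make_old_filename` is not shown in this diff, so the version below is an assumption that simply follows the `{key}.old.{versionId}.{etag}` layout described in the README. `Details` and `Item` are simplified, hypothetical stand-ins for `ItemDetails` and `InventoryItem`.

```rust
/// Assumed stand-in for `crate::util::make_old_filename`; the real function
/// may differ (for example, it might escape characters in its arguments).
fn make_old_filename(basename: &str, version_id: &str, etag: &str) -> String {
    format!("{basename}.old.{version_id}.{etag}")
}

/// Simplified, hypothetical stand-in for `ItemDetails`
#[derive(Debug, PartialEq)]
enum Details {
    Present { etag: String },
    Deleted,
}

/// Simplified, hypothetical stand-in for `InventoryItem`
struct Item {
    name: String,
    version_id: String,
    is_latest: bool,
    details: Details,
}

impl Item {
    /// Mirrors the logic in the diff: delete markers and latest versions
    /// have no "old" filename; every other version gets one.
    fn old_filename(&self) -> Option<String> {
        let Details::Present { ref etag } = self.details else {
            return None;
        };
        (!self.is_latest).then(|| make_old_filename(&self.name, &self.version_id, etag))
    }
}
```

Under these assumptions, a non-latest `data.bin` with version ID `v1` and etag `abc123` maps to `data.bin.old.v1.abc123`, while the latest version and any delete marker yield `None`.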