find: design and implement a command for finding files in snapshots #1055

Closed
n8henrie opened this issue Feb 7, 2024 · 11 comments
Labels
A-cli Area: `rustic` command line interface A-commands Area: Related to commands in `rustic` A-ui-ux Area: Related to user interfaces and user experience C-enhancement Category: New feature or request

Comments

@n8henrie

n8henrie commented Feb 7, 2024

According to https://rustic.cli.rs/docs/comparison-restic.html I see the find and mount commands are missing.

I see an existing issue for mount.

Is there any ongoing work or major blockers for the find command? I haven't used mount, but I've used restic's find command frequently.

@github-actions github-actions bot added the S-triage Status: Waiting for a maintainer to triage this issue/PR label Feb 7, 2024
@aawsome
Member

aawsome commented Feb 7, 2024

Thanks @n8henrie for opening this issue!

Can you give some use cases where you are using find?

In fact, implementing it the way restic does would be very straightforward. It isn't done yet because I didn't want to simply reproduce what restic does, but wanted a much more featureful command which

  • can show history of files
  • can find identical files throughout the repository
  • is speed-optimized by being able to skip some subtrees in certain search-scenarios
    ...

But there is also nothing wrong with starting with a simple re-implementation! If you give more input, I can work on it or mentor anyone who is willing to start a PR!

@aawsome aawsome added C-enhancement Category: New feature or request A-cli Area: `rustic` command line interface A-commands Area: Related to commands in `rustic` A-ui-ux Area: Related to user interfaces and user experience and removed S-triage Status: Waiting for a maintainer to triage this issue/PR labels Feb 7, 2024
@n8henrie
Author

n8henrie commented Feb 7, 2024

That's awesome! I'm AFK on mobile but will get back with an example soon, and I'd love to work on a PR with some mentorship!

I've been keeping an eye on Rustic specifically because I enjoy writing rust much more than go, but I'm definitely still a novice.

@simonsan simonsan changed the title FR: find command find: design and implement a command for finding files in snapshots Feb 8, 2024
@aawsome
Member

aawsome commented Feb 25, 2024

@n8henrie Are you still interested?

@n8henrie
Author

Yes I am! But I'm very busy with a few projects right now, will likely have some time in about a week.

If you'd like to close to keep the issue tracker groomed, I can bookmark this and prompt to reopen once I have time. Otherwise, if it's okay to leave open that's also fine of course.

Thanks for the nudge!

@n8henrie
Author

Sorry this is taking so long, I am excited about the opportunity to contribute and especially the mentorship!

> Can you give some use cases where you are using find?

I've used it multiple times to find snapshot IDs that contain a specific file of interest. For example, if I discover that I have inadvertently deleted or corrupted a file locally, but I'm not sure when that modification occurred, I may first restic find the file in question, then examine the resulting snapshots to help determine from which snapshot I should restore.

For example, I have a wrapper script called restic-cmd.sh that presets a few restic parameters based on hostname and fills in my NAS (where the restic repo sits):

sudo -u restic /backup/restic/bin/restic --repo /backup/restic/backup/linux find restic-cmd.sh
repository a4458c04 opened (repository version 2) successfully, password is correct
Found matching entries in snapshot 9df7ed13 from 2020-04-30 02:00:13
/home/n8henrie/git/restic-backup/restic-cmd.sh

Found matching entries in snapshot 6520434a from 2020-05-31 02:00:07
/home/n8henrie/git/restic-backup/restic-cmd.sh

Found matching entries in snapshot daad20e3 from 2018-11-30 02:00:29
/home/n8henrie/git/restic-backup/restic-cmd.sh

Found matching entries in snapshot 367be981 from 2020-03-31 02:00:13
/home/n8henrie/git/restic-backup/restic-cmd.sh

Found matching entries in snapshot bed5561e from 2019-08-31 02:00:41
/home/n8henrie/git/restic-backup/restic-cmd.sh
...

Searching my shell history, I've also used it in the past to:

  • restore ssh private keys that I deleted (whups!) but thankfully had backed up
  • look for evidence of files being backed up that I had hoped to exclude
  • look for evidence of files being backed up that I was afraid were being blocked by macOS security / privacy permissions

For my future reference, it looks like this is the implementation in restic: https://github.com/restic/restic/blob/d1d773cfcd3115aecbbd6ad6be2bc5e11f395b29/cmd/restic/cmd_find.go#L77

@aawsome
Member

aawsome commented Mar 22, 2024

@n8henrie Great that you are about to start - and no worries, we all work in our free time, so delays from time to time are pretty common...

Do you have a Discord account? If we want to discuss something in more detail, opening a channel there might be an option.

Now, about your use-case: I think we actually have two use-cases:

  • find a file in the snapshots where the full path is known, e.g. /home/n8henrie/git/restic-backup/restic-cmd.sh
  • find all files in the snapshots which match some criteria, such as the filename restic-cmd.sh. This could be extended in the future with wildcards, regular expressions, etc., but could also cover metadata such as "files greater than 50MiB" or "owned by user xyz", and possibly allow combining matching conditions....
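
The idea of combinable matching conditions could be sketched roughly as below. This is only an illustration: `Entry`, `Criterion`, `file_name_is`, `larger_than`, `owned_by` and `all_of` are hypothetical names made up for this sketch and are not part of rustic or rustic_core.

```rust
/// Hypothetical metadata for one backed-up entry (assumption, not a rustic type).
#[derive(Debug, Clone)]
pub struct Entry {
    pub path: String,
    pub size: u64,
    pub user: String,
}

/// A match criterion is any predicate over an entry.
pub type Criterion = Box<dyn Fn(&Entry) -> bool>;

/// Matches entries whose final path component equals `name`.
pub fn file_name_is(name: &str) -> Criterion {
    let name = name.to_owned();
    Box::new(move |e: &Entry| e.path.rsplit('/').next() == Some(name.as_str()))
}

/// Matches entries strictly larger than `bytes`.
pub fn larger_than(bytes: u64) -> Criterion {
    Box::new(move |e: &Entry| e.size > bytes)
}

/// Matches entries owned by `user`.
pub fn owned_by(user: &str) -> Criterion {
    let user = user.to_owned();
    Box::new(move |e: &Entry| e.user == user)
}

/// Combines criteria so that an entry must satisfy every one of them.
pub fn all_of(criteria: Vec<Criterion>) -> Criterion {
    Box::new(move |e: &Entry| criteria.iter().all(|c| c(e)))
}
```

With this shape, "files named restic-cmd.sh greater than 50MiB" would be `all_of(vec![file_name_is("restic-cmd.sh"), larger_than(50 * 1024 * 1024)])`, and further criteria (wildcards, regular expressions) could be added as more constructor functions.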

I would suggest you start by implementing the second one, using just a simple traversal over snapshots and (nested) trees within the snapshots.

If you look at https://github.com/rustic-rs/rustic/blob/main/src/commands/snapshots.rs you can see how to get the (already grouped) snapshots list and how to iterate over it (a simple "for" loop is actually sufficient).

Now, if you already have a snapshot, take a look at https://github.com/rustic-rs/rustic/blob/main/src/commands/ls.rs which uses Repository.ls to traverse a given tree - this can of course also be the whole snapshot. It returns path and all metadata, so you can use that to decide if an entry matches or not and then print it.
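
As a rough illustration of that traversal, here is a minimal sketch over an in-memory model. It does not use the rustic API: `Node`, `Snapshot`, `walk` and `find_by_name` are hypothetical stand-ins for what the snapshot list and `Repository.ls` would provide.

```rust
/// Hypothetical in-memory stand-ins for repository data (assumption, not rustic types).
pub enum Node {
    File { name: String },
    Dir { name: String, children: Vec<Node> },
}

pub struct Snapshot {
    pub id: String,
    pub root: Vec<Node>,
}

/// Recursively walks `nodes`, collecting the full path of every file
/// whose name equals `wanted`.
fn walk(nodes: &[Node], prefix: &str, wanted: &str, hits: &mut Vec<String>) {
    for node in nodes {
        match node {
            Node::File { name } => {
                if name == wanted {
                    hits.push(format!("{prefix}/{name}"));
                }
            }
            Node::Dir { name, children } => {
                // Descend into the nested tree, extending the path prefix.
                walk(children, &format!("{prefix}/{name}"), wanted, hits);
            }
        }
    }
}

/// Returns (snapshot id, matching paths) for every snapshot with at least one hit.
pub fn find_by_name(snapshots: &[Snapshot], wanted: &str) -> Vec<(String, Vec<String>)> {
    let mut results = Vec::new();
    for snap in snapshots {
        let mut hits = Vec::new();
        walk(&snap.root, "", wanted, &mut hits);
        if !hits.is_empty() {
            results.push((snap.id.clone(), hits));
        }
    }
    results
}
```

Each result pairs a snapshot id with the matching paths, which is the same information the restic output above groups under "Found matching entries in snapshot ...".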

Hope this helps for a start, but please don't hesitate to ask if you need any more details about one point!

@aawsome
Member

aawsome commented Mar 22, 2024

Just some comments why I think the first one is a different use case:

When we are searching for a given full path, we know that this can only be zero or exactly one match per repository. This allows things like showing not only all "matching" snapshots, but also a more detailed history of this full path (like when did metadata change or when did the content change). Also we can heavily optimize here: if there is no home tree in a snapshot you can directly skip it without needing to traverse all trees.
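
A minimal sketch of that optimization, again over a hypothetical in-memory model rather than the rustic API (`Node`, `lookup`, `history` and the `content_id` field are assumptions made for illustration): the lookup descends one path component at a time and stops as soon as a component is missing, so a snapshot without a home tree is rejected after a single map lookup.

```rust
use std::collections::BTreeMap;

/// Hypothetical tree node; a directory maps child names to nodes.
/// (Assumption for illustration, not a rustic type.)
pub enum Node {
    File { content_id: String },
    Dir(BTreeMap<String, Node>),
}

/// Looks up `path` by descending one component at a time.
/// Returns `None` as soon as a component is missing, so unrelated
/// subtrees are never visited.
pub fn lookup<'a>(root: &'a Node, path: &str) -> Option<&'a Node> {
    let mut current = root;
    for component in path.split('/').filter(|c| !c.is_empty()) {
        match current {
            Node::Dir(children) => current = children.get(component)?,
            // A file cannot contain further path components.
            Node::File { .. } => return None,
        }
    }
    Some(current)
}

/// For each (snapshot id, root) pair, reports the content id of `path`
/// if it exists as a file in that snapshot.
pub fn history<'a>(snapshots: &'a [(String, Node)], path: &str) -> Vec<(&'a str, &'a str)> {
    snapshots
        .iter()
        .filter_map(|(id, root)| match lookup(root, path)? {
            Node::File { content_id } => Some((id.as_str(), content_id.as_str())),
            Node::Dir(_) => None,
        })
        .collect()
}
```

Comparing consecutive content ids in the returned list would show in which snapshots the content of the path changed; metadata changes could be tracked the same way with additional fields.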

@n8henrie
Author

Interesting thoughts, thanks for your prompt response!

Yes, I have discord but don't use it terribly often -- I'll join.

Thanks for the pointers, I've been perusing the codebase for the last hour or two to try to get my bearings. There are lots of existing examples of searching the repo for a file by id (I assume this is a pack?) -- I was getting lost in the weeds trying to figure out how to dig down into packs -> blobs -> reconstruct to file paths.

You're right to direct me to snapshots.rs -- that looks much closer to what I was expecting; I think I can model that.

Yes, I agree that the find command definitely has potential to expand to much greater functionality -- though I'd like to start small to make sure I can get my bearings! Would also plan to add some {unit, integration} tests.

> we know that this can only be zero or exactly one match per repository

I think you mean per snapshot -- in which case I agree. Depending on how things are organized in the backend (it sounds like they are indeed organized into a tree), that should leave quite a bit of room for optimization.

@aawsome
Member

aawsome commented Apr 30, 2024

Closing this as it has been implemented in #1136.

@aawsome aawsome closed this as completed Apr 30, 2024
@n8henrie
Author

Thank you! As mentioned I'll look for other opportunities to contribute.

@aawsome
Member

aawsome commented Apr 30, 2024

@n8henrie Thanks a lot for your offer! An easy way to start with small contributions (besides opening issues, which are also warmly welcome!) is to improve the documentation: https://github.com/rustic-rs/docs
