[Feature] Support for cold archiving #164

Open
graphixillusion opened this issue Oct 29, 2024 · 8 comments
Labels
enhancement New feature or request

Comments

@graphixillusion

graphixillusion commented Oct 29, 2024

I think this feature would help anyone who wants or needs to cold-archive ROMs, especially the big sets which are not updated very often.
Let's say I want to archive a big set somewhere (tape archive, cloud, backups, etc.) but this set is still being updated/modified: the archived files are no longer physically present in the scan folder, but the program should treat them as if they were still present, so only what's new/updated/modified is written to the current folder. If an archived ROM needs a rename, only the archived reference is renamed; if an archived set with multiple ROMs inside a single zip (think of MAME, for example) needs to be updated with new ROMs, a zip with just the new modifications/additions is created in the current folder.

@alucryd
Owner

alucryd commented Nov 1, 2024

Thanks for the suggestion. Do you mean you'd like to manually move some files outside of oxyromon? I'd rather have the archiving mechanic be a part of oxyromon; maybe supporting some popular cloud storage protocols would be a nice first step. If files were moved manually, I wouldn't be able to distinguish archived files from accidentally deleted files, and it would totally break purge-roms.

@alucryd alucryd added the enhancement New feature or request label Nov 1, 2024
@graphixillusion
Author

graphixillusion commented Nov 1, 2024

Thank you for the support! Exactly. Let's assume I'm starting from a green DAT, so 100% complete. This option should mark the DAT and all of its content as archived. From then on, you can move/archive these ROMs wherever you want: external hard drives, Blu-ray, tapes, cloud. Whenever a new update of the DAT comes out which adds/removes/renames something, all the new stuff is stored in the current scanning folder, and what's archived is not touched, because the files are not there but the scan should act as if they were. So I can keep updating this archived set even if I have cold-stored all the files somewhere else. If at a later time I merge the previously archived files with the current scan folder, which holds just what's new (a sort of diff update), I can reconstruct the full 100% set.

Conceptually, I was thinking of a method which shouldn't break too much in the current architecture: sparse files. Sparse files allow us to make dummy files which have the same logical size as their archived counterparts but occupy 0 bytes of physical space on the HDD. So everything should work as before; we just need to ignore checksums on all the files which have the archived flag. Speaking of the archived flag, a simple solution could be something like this:

DAT ROM root folder
|
|-.Archived\ <--- every archived file has a sparse dummy copy in this folder, with a physical size of 0 bytes on the HDD (skip the hash check, because the DAT would not recognize the dummies)
|
|-roms <--- what's new/updated/changed is put here

Of course, this is just an idea. What do you think about it?
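
To make the sparse-placeholder idea concrete, here is a minimal Python sketch. It is not oxyromon code, and `create_placeholder` and `physical_bytes` are hypothetical helper names. It assumes a filesystem where extending an empty file leaves a hole instead of allocating blocks (ext4 and XFS behave this way); on NTFS a file generally has to be marked sparse explicitly first, which this sketch does not attempt.

```python
import os
import tempfile
from typing import Optional


def create_placeholder(path: str, logical_size: int) -> None:
    """Create a file that reports logical_size bytes without writing any data."""
    with open(path, "wb") as f:
        # Extending an empty file with truncate() writes no data blocks.
        # Whether the result is actually sparse depends on the filesystem.
        f.truncate(logical_size)


def physical_bytes(path: str) -> Optional[int]:
    """Bytes allocated on disk, or None where st_blocks is unavailable (e.g. Windows)."""
    blocks = getattr(os.stat(path), "st_blocks", None)
    # st_blocks is counted in 512-byte units on POSIX systems.
    return None if blocks is None else blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.rom")  # hypothetical file name
        create_placeholder(path, 64 * 1024 * 1024)
        print("logical size: ", os.path.getsize(path))
        print("physical size:", physical_bytes(path))
```

A scanner that only compares names and sizes would see such a placeholder as present, which is why the checksum step would have to be skipped for anything carrying the archived flag.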

@alucryd
Owner

alucryd commented Nov 6, 2024

That's a good idea. I'd still want the archiving process to be part of oxyromon, via an archive-roms or archive-systems subcommand. It would take a list of systems and a destination, local or remote. That way the process can be tracked, and it could be applied even to partial sets. ROMs would get flagged as archived in the database, and importing a changed DAT would automatically remove the flag from changed files and mark them as missing for further reimport. I need to investigate how sparse files are handled by rsync-like commands, be it regular rsync, gs rsync, etc. Using rsync would simplify the archiving and diff process quite a lot. An option to keep local files would be nice too, for people who just want a backup and keep the files locally so they can continue using them.

As I write this, I realize there's some overlap with the existing export-roms subcommand; maybe I could extend that one.
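
The flag handling described above could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the `Rom` record, the `apply_dat_update` function and the use of SHA-1 as the identity are hypothetical and do not reflect oxyromon's actual database schema. The sketch also folds in the rename case from the original request, where an archived ROM whose content is unchanged only has its reference renamed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Rom:
    name: str
    sha1: str
    archived: bool = False
    missing: bool = False


def apply_dat_update(known: list[Rom], new_dat: dict[str, str]) -> list[Rom]:
    """Reconcile known ROMs with an updated DAT given as {name: sha1}."""
    by_hash = {rom.sha1: rom for rom in known}
    result = []
    for name, sha1 in new_dat.items():
        old = by_hash.get(sha1)
        if old is not None:
            # Same content: keep the flags and only follow a possible rename.
            result.append(Rom(name, sha1, old.archived, old.missing))
        else:
            # New or changed content: not archived, must be (re)imported.
            result.append(Rom(name, sha1, archived=False, missing=True))
    # ROMs absent from the new DAT are simply dropped from the result.
    return result


if __name__ == "__main__":
    known = [Rom("a.bin", "aaa", archived=True), Rom("b.bin", "bbb", archived=True)]
    new_dat = {"a (rev 1).bin": "aaa", "b.bin": "ccc", "c.bin": "ddd"}
    for rom in apply_dat_update(known, new_dat):
        print(rom)
```

Keying on the checksum instead of the name is a design choice of this sketch: it lets a pure rename keep the archived flag, while any content change falls through to the missing state.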

@acuteaura

acuteaura commented Dec 7, 2024

This sounds a lot like git-annex, in case you're looking for a solution right now. Maybe oxyromon could also treat the dead symlinks that annex leaves behind when files aren't local as special, since the link target filename usually includes a (configurable) hash and renaming and moving the file is still possible in this state.

@graphixillusion
Author

AFAIK git-annex is not 100% compatible with every OS, especially Windows, so I don't know if the git-annex approach would work well in every case.

@acuteaura

Annex used to have problems with symlinks on NTFS, but as far as I can tell, these have been resolved on Microsoft's side as part of the push to make WSL work much better (ln -s inside WSL on NTFS works as one would expect now too), so unless you use (ex)FAT a lot you'd be fine.

https://git-annex.branchable.com/todo/windows_support/

It's not a trivial commitment though, so I understand if you don't want to take the chance. I'll let y'all get back on topic now.

@graphixillusion
Author

graphixillusion commented Dec 7, 2024

But actually, even if git-annex worked well under Windows, it would not solve the main problem of the described scenario (correct me if I'm wrong). I mean, even if you annex (archive) all the files and now have symlinks instead, once you move the real data objects (because you archived them somewhere) and scan the folder with all the symlinks again, you'll get errors because the original data is not physically present anymore, and you still need to copy it back before doing a new scan. The original purpose of this idea is to keep updating a set without needing to physically have the archived data in place, and in this case the ROM manager should still say that you are 100% green (logically speaking, since physically the only data you currently have stored is the diff data, the archived data being cold-stored away).

@acuteaura

acuteaura commented Dec 8, 2024

Annex isn't purely an archiving tool, so "annexing" is just adding a file to any repository.

What annex actually does in the background is move your physical file to .git/annex, rename it to something like SHA256E-s31390--f50d7ac4c6b9031379986bc362fcefb65f1e52621ce1708d537e740fefc59cc0.mp3 and create a substitute symlink pointing at this file. The file in .git/annex can now be moved elsewhere with annex without the directory tree looking different. You may have noticed the filename itself is a content address: it contains a hash of the contents (free deduplication too), and that remains in your repository even if you move the file. This "key hash" is also configurable, so you could configure both annex and oxyromon to use the same hash, making the symlinks perfect substitutes for data that is somewhere else (if that's something the author is interested in implementing).

https://git-annex.branchable.com/internals/key_format/

https://git-annex.branchable.com/backends/
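
As a rough illustration of how a tool could read the hash and size out of such a link without the data being present, here is a Python sketch. `AnnexKey`, `parse_annex_key` and `key_from_symlink` are hypothetical names, not part of git-annex or oxyromon. The parser only covers hash backends with an extension suffix, like the SHA256E key quoted above, and assumes the `BACKEND-sSIZE--NAME` layout described on the key format page.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnexKey:
    backend: str
    size: Optional[int]
    digest: str
    extension: str


def parse_annex_key(key: str) -> AnnexKey:
    """Split a key such as 'SHA256E-s31390--<hex digest>.mp3' into its parts."""
    head, name = key.split("--", 1)
    backend, *fields = head.split("-")
    size = None
    for field in fields:
        # Lowercase 's' carries the content size; other fields are ignored here.
        if field.startswith("s"):
            size = int(field[1:])
    # For 'E' backends the name is the hex digest followed by the extension.
    digest, dot, ext = name.partition(".")
    return AnnexKey(backend, size, digest, dot + ext)


def key_from_symlink(path: str) -> AnnexKey:
    """Read the key from a symlink's target; works even if the target is missing."""
    return parse_annex_key(os.path.basename(os.readlink(path)))


if __name__ == "__main__":
    example = (
        "SHA256E-s31390--"
        "f50d7ac4c6b9031379986bc362fcefb65f1e52621ce1708d537e740fefc59cc0.mp3"
    )
    print(parse_annex_key(example))
```

Because `os.readlink` only reads the link itself, a dangling symlink still yields the expected size and digest, which could be compared against a DAT entry if both sides use the same hash.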

3 participants