tablechop deletes files which currently used by Cassandra #79

Open
Mortinke opened this issue Mar 16, 2017 · 1 comment

Comments

@Mortinke

tablechop checks only the last-modified date of the index key to decide whether a backed-up file will be deleted:

    for index_key in index_keys:
        if days_ago(index_key.last_modified) > args.age:
            break
        index_files_to_keep.add(index_key.name)

As I understand it, tablechop should delete only those files that are no longer necessary for restoring a backup within the specified retention time. Depending on the table size and the compaction strategy, some SSTables can live for weeks or months without being compacted. These files are mandatory for restoring the table, regardless of whether they have been in S3 for weeks or months. Currently active SSTables should not be deleted (unless a force parameter is used).
Meanwhile, there may be small backed-up SSTables that have already been compacted and are no longer required for restoring.

I'd be grateful if we could add an additional check of whether the file is currently being used by Cassandra, i.e. whether it still exists on the filesystem.
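A minimal sketch of the kind of check requested above, not tablechop's actual code. The function name `filter_deletable`, the `force` flag, and the `key_to_local_path` mapping from an S3 key name to a local SSTable path are all hypothetical and would have to be adapted to how key names are really laid out:

```python
import os


def filter_deletable(candidate_keys, key_to_local_path, force=False):
    """Return the S3 key names that are safe to delete.

    candidate_keys: key names that the age check already marked for deletion.
    key_to_local_path: hypothetical callable mapping a key name to the path
        the SSTable file would have on the local Cassandra node.
    force: when True, skip the filesystem check entirely.
    """
    if force:
        return list(candidate_keys)
    # A file that still exists locally is presumably still in use by
    # Cassandra, so it is kept in the backup regardless of its age.
    return [
        key for key in candidate_keys
        if not os.path.exists(key_to_local_path(key))
    ]
```

This only works when tablechop runs on the node that owns the data directory; run from anywhere else, every file would look unused.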

@raags

raags commented May 10, 2017

@Mortinke The index file (-listdir.json) contains the list of all sstable files that were present when that particular sstable was being uploaded. So the index file has a snapshot with the complete list of files to restore to a state when that sstable was created. Tablechop loops through the files in the index file(s), and ensures these are not deleted.

Check files_to_keep here: https://github.com/JeremyGrosser/tablesnap/blob/master/tablechop#L87
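To illustrate the idea (a sketch, not the linked code): assuming each -listdir.json body is a JSON object mapping a directory to the list of file names present at upload time, the set of files to keep is the union of the listings from all retained index files. The function name `collect_files_to_keep` and the input format are assumptions made for this example:

```python
import json


def collect_files_to_keep(index_bodies):
    """Union of all files referenced by the retained index files.

    index_bodies: iterable of JSON strings, each assumed to look like
        {"<directory>": ["<file name>", ...]}.
    """
    keep = set()
    for body in index_bodies:
        listing = json.loads(body)
        for dirname, filenames in listing.items():
            for filename in filenames:
                keep.add(f"{dirname}/{filename}")
    return keep


# An old SSTable stays protected for as long as a recent index lists it.
recent_index = json.dumps({"/data/ks/tbl": ["old-1-Data.db", "new-7-Data.db"]})
protected = collect_files_to_keep([recent_index])
```

Under this scheme the age of an individual SSTable does not matter; only the age of the index files that reference it does.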

That said, there is a race here: multiple SSTables can be uploaded in parallel, so if the node fails, some of the latest -listdir.json files might not be valid, because other files they reference may never have been uploaded. This is a problem as of now. @JeremyGrosser, is this a known issue?
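One possible way to detect such incomplete indexes, sketched under assumptions: treat an index as valid only if every file it references is actually present in the bucket. The function name `split_valid_indexes` and both input shapes are hypothetical; this is not something tablechop is known to do today:

```python
def split_valid_indexes(indexes, uploaded):
    """Separate complete index files from incomplete ones.

    indexes: dict mapping an index name to the set of file keys it references.
    uploaded: set of file keys that actually exist in the bucket.
    Returns (valid, incomplete), where incomplete maps each index name
    to the set of referenced files that are missing.
    """
    valid, incomplete = {}, {}
    for name, referenced in indexes.items():
        missing = set(referenced) - set(uploaded)
        if missing:
            incomplete[name] = missing
        else:
            valid[name] = set(referenced)
    return valid, incomplete
```

A restore or cleanup pass could then ignore the incomplete indexes and fall back to the newest valid one.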
