Bulk Delete: Ghost filesets #2530

Open
carakey opened this issue Sep 14, 2023 · 5 comments
Assignees
Labels
Content Icebox Priority: Low These are issues that are either epic, need clarification, or are awaiting use case exploration

Comments


carakey commented Sep 14, 2023

Descriptive summary

Please bulk delete the 22 fileset objects whose PIDs are listed in the attached CSV file. These are not accessible from the front end, so they cannot be deleted through the standard UI process.

Documentation

A number of problem fileset objects were identified in the preservation assessment format inventory. One subset consists of 13 filesets that exist in Solr with minimal metadata but can't be pulled up by URL (500 errors on the fileset show pages; 404 errors on the direct download links). I'm referring to these as "ghost filesets."

A separate subset of 31 problem filesets, grouped as "Ingest errors with duplicate functional filenames elsewhere" on ticket #2491, did have functional fileset show pages. However, after I attempted to delete these fileset objects from the UI using the Delete button, 9 of the 31 persisted in Solr and now show the same characteristics as the first group: they are in Solr but have minimal metadata, no characterization information, no parents, and cannot be viewed in SA. In other words, they became ghost filesets too, which suggests the first group were likely deleted in a similar way but somehow stuck around in Solr. I haven't been able to identify a pattern distinguishing the 22 filesets that were successfully deleted from the 9 ghosts.
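
For anyone triaging further candidates, the symptoms above can be written down as a simple check. This is only a sketch of the working definition used in this ticket, not an existing tool: the `FilesetCheck` record and its field names are hypothetical, and the status codes would have to be gathered separately (for example by requesting the show page and download link for each PID).

```python
from dataclasses import dataclass


@dataclass
class FilesetCheck:
    """Observations for one fileset PID (hypothetical record, filled in by hand or script)."""
    pid: str
    in_solr: bool          # a Solr document exists for this PID
    show_status: int       # HTTP status of the fileset show page
    download_status: int   # HTTP status of the direct download link
    has_parent: bool       # the Solr document lists at least one parent work


def is_ghost(check: FilesetCheck) -> bool:
    """True when the fileset matches the ghost symptoms described in this ticket."""
    return (
        check.in_solr
        and not check.has_parent
        and check.show_status == 500
        and check.download_status == 404
    )
```

A fileset that is absent from Solr, or that still has a working show page, would not count as a ghost under this definition.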

Related work

This is an offshoot of #2491.

Author

carakey commented Sep 14, 2023

SA_bulk_delete_2023.csv

straleyb self-assigned this Oct 30, 2023

Contributor

straleyb commented Nov 2, 2023

Of the 22 broken filesets, I was able to delete 3n204545p through vt150r99h (1-13) from the database successfully. t722h937w through b2773w401 (14-22) have something particularly broken about them. I can update them from the command line and save them, and the updated information persists. I can also request each resource over LDP and get a graph back from Fedora that looks perfectly acceptable. However, every attempt to delete them from Fedora fails: we tried curl commands, ActiveFedora, LDP, and the Fedora UI, and none of them removed the objects.

We might need to have a deeper discussion about these items in the database. I've done all I can within a reasonable amount of time to try to remove them.
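
If it helps that discussion, the behavior can be captured per resource with a small probe that records what Fedora answers before, during, and after a delete attempt. This is a sketch under assumptions, not what was actually run: the resource URI is passed in by the caller, authentication is omitted, and `probe_delete` and `delete_took_effect` are hypothetical helper names.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def http_status(method, url):
    """Send a bodiless HTTP request and return the status code."""
    try:
        with urlopen(Request(url, method=method)) as resp:
            return resp.status
    except HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; keep just the code.
        return err.code


def probe_delete(uri, send=http_status):
    """GET, DELETE, then GET again, recording each status code in order."""
    return {
        "get_before": send("GET", uri),
        "delete": send("DELETE", uri),
        "get_after": send("GET", uri),
    }


def delete_took_effect(result):
    """A deleted resource should no longer answer GET with 200."""
    return result["get_before"] == 200 and result["get_after"] != 200
```

Passing `send` as a parameter keeps the probe testable without touching a live repository. Running it against one resource from each group and comparing the three status codes would show at which step the two groups diverge.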

Author

carakey commented Nov 3, 2023

> Out of the 22 filesets that were broken, I was able to delete 3n204545p through vt150r99h (1 - 13) from the database successfully. t722h937w through b2773w401 have something particularly broken about them.

FWIW, that lines up exactly with the two different groups described at the top of the ticket. The 9 particularly broken ones are the same ones that I "deleted-but-not-really" in the UI.

Author

carakey commented Nov 3, 2023

QA: I have confirmed that PIDs 1-13 are no longer found in Solr, while 14-22 still have apparitions there. So QA pass for the scope of the first group.
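
For repeating this QA check later, a minimal sketch of the Solr lookup is below. It assumes the PID is stored in the Solr `id` field and that the select endpoint is reachable at the placeholder URL shown; both are assumptions to adjust for the real environment, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder endpoint: host and core name are assumptions, not the real ones.
SOLR_SELECT = "http://localhost:8983/solr/hypothetical-core/select"


def build_query_url(pids, base=SOLR_SELECT):
    """Build a Solr select URL that matches any of the given ids."""
    terms = " OR ".join(f'"{pid}"' for pid in pids)
    params = {"q": f"id:({terms})", "fl": "id", "rows": len(pids), "wt": "json"}
    return f"{base}?{urlencode(params)}"


def split_by_presence(pids, solr_response):
    """Return (still_in_solr, gone) given a parsed Solr JSON response."""
    found = {doc["id"] for doc in solr_response["response"]["docs"]}
    still = [p for p in pids if p in found]
    gone = [p for p in pids if p not in found]
    return still, gone


def check_pids(pids, base=SOLR_SELECT):
    """Query Solr once and partition the PIDs by whether a document remains."""
    with urlopen(build_query_url(pids, base)) as resp:
        return split_by_presence(pids, json.load(resp))
```

Feeding in the 22 PIDs from the attached CSV should return the remaining ghosts in the first list and the cleaned-up PIDs in the second.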

@shieldsb Since the mystery remains but doesn't have any serious known consequences, can we keep this ticket open but with a low priority and/or icebox label?


shieldsb commented Nov 3, 2023

Got it. I'll update the ticket label and board @carakey
