"Ghost-busting": Tackling undeleteable filesets #2613

Open
carakey opened this issue Jun 13, 2024 · 1 comment

carakey commented Jun 13, 2024

We keep finding fileset objects in SA that should not be there and that either resist deletion by standard means or persist after deletion.

The first group of items was identified during the 2023 preservation assessment file format inventory, which involves exporting Solr data. On #2491, 22 items were initially assessed as (probable) "Ingest errors with duplicate functional filenames elsewhere". They had minimal metadata, no characterization information, and no parents, but were initially accessible in SA. I tried deleting them through the UI: 13 were deleted successfully, while the other 9 could no longer be viewed in SA but persisted in Solr. The same inventory turned up another 13 filesets with minimal Solr metadata and no parents that were not accessible in SA (500 errors). These 13 plus the 9 undeleteable filesets were slated for bulk delete on #2530. The 13 that gave 500 errors were removed from Solr, but the 9 continued to resist deletion; @straleyb on #2530:

I am able to update them through the command line, save them, and it persists the updated information. I can use LDP to request a resource and get a graph from fedora and it looks perfectly acceptable. However, attempting to delete them from fedora results in them not wanting to be deleted. We tried through CURL commands, using ActiveFedora, LDP and the fedora UI to try and delete them, but they were unable to be removed.

A second group of 12 items showed up as fixity failures in Feb 2024 and was documented on #2571. These had more extensive Solr metadata but could not be viewed in SA (500 error). Brandon was not able to find them in Fedora but was able to delete them from the Solr index; @straleyb on #2571:

I went through and checked on these works. All the works are coming back as Ldp::Gone meaning that Fedora can't find them in its tree for some reason. That means grabbing the work and deleting it using ActiveFedora::Base is impossible. We can try and see if deleting them through the Fedora front end is possible but I know Ryan W has tried that and it didn't work. This is, from what I understand, where the Ghost part of the Ghost FileSets come from.
I did find them in Solr though. I was able to grab them using SolrDocument.find(id) and they returned with a Solr Document. I used Hyrax::SolrService.delete(id) to delete them and double checked using SolrDocument.find(id) and verified that they were removed from the Index.

It isn't clear at this point how many separate issues are at play. Some of these seem to be fileset objects that are present in Solr but not in Fedora, whereas the 9 from the first group do have a Fedora presence that is apparently impossible to remove. The ones we have found surfaced only because they had incomplete characterization metadata or because they failed fixity checks, so we don't know how many more may be in the system.
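To keep the cases apart while triaging, the two observations described above (is there a Solr document, and what does Fedora say about the resource) could be combined into a rough label. This is only a sketch in plain Ruby: the method name, the state symbols, and the labels are all hypothetical, and the observations themselves would have to be gathered separately (for example from a Rails console).

```ruby
# Triage label for one fileset ID, based on two observations gathered elsewhere.
#   in_solr      - true if a Solr document exists for the ID
#   fedora_state - :readable (resource can be fetched),
#                  :gone     (lookup raises Ldp::Gone), or
#                  :absent   (nothing found at all)
# ASSUMPTION: the state symbols and labels are invented for this sketch.
def triage(in_solr:, fedora_state:)
  case fedora_state
  when :readable
    in_solr ? :in_both_stores : :fedora_only
  when :gone, :absent
    in_solr ? :stale_solr_entry : :fully_removed
  else
    raise ArgumentError, "unknown fedora_state: #{fedora_state.inspect}"
  end
end

puts triage(in_solr: true, fedora_state: :gone)      # like the second group (#2571)
puts triage(in_solr: true, fedora_state: :readable)  # like the 9 from the first group
```

An `:in_both_stores` result is not a ghost by itself; it only means a further check (such as whether the fileset has a parent) is needed.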

Fully dealing with the SA ghosts seems to involve the following:

  1. Identify ghosts
    a. Figure out how to reliably locate/identify ghosts
      - Fileset objects that do not have parents?
    b. Identify existing ghosts
      - Bulk Delete: Ghost filesets #2530
      - Bulk delete filesets from 2024 Feb/Mar fixity failures #2571

  2. Resolve underlying ghost-creating issue(s)
    a. Diagnose how the system creates ghosts
      - One hypothesis, from "Files deleted during review are not actually deleted" #2572: ghosts are non-active versions of a fileset that gets deleted
      - Original hypothesis, from "Clean up problem FileSets" #2491: ghosts were created through failed ingests
    b. Prevent the system from creating more ghosts

  3. Get rid of the existing ghosts
    a. Figure out how to remove existing ghosts
    b. Remove existing ghosts
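
As a possible starting point for step 1a, the "no parents" check could be run against an exported set of Solr documents. The sketch below is plain Ruby and makes assumptions: that filesets are marked with "FileSet" in `has_model_ssim`, that parents list their children in `member_ids_ssim` (common Hyrax indexing conventions, not confirmed for SA), and that the export is available as JSON. The method name and the sample IDs are invented for illustration.

```ruby
require "json"
require "set"

# Return the IDs of FileSet documents that no other document lists as a member.
# ASSUMPTION: the field names follow common Hyrax conventions and may differ in SA.
def parentless_fileset_ids(docs, model_field: "has_model_ssim", member_field: "member_ids_ssim")
  # Collect every ID that some document claims as one of its members.
  claimed = docs.each_with_object(Set.new) do |doc, ids|
    Array(doc[member_field]).each { |id| ids << id }
  end

  docs
    .select { |doc| Array(doc[model_field]).include?("FileSet") }
    .map { |doc| doc["id"] }
    .reject { |id| claimed.include?(id) }
end

# Hypothetical sample export; these IDs are not real SA PIDs.
sample = JSON.parse(<<~JSON)
  [
    {"id": "work1", "has_model_ssim": ["Image"], "member_ids_ssim": ["fs1"]},
    {"id": "fs1", "has_model_ssim": ["FileSet"]},
    {"id": "fs2", "has_model_ssim": ["FileSet"]}
  ]
JSON

puts parentless_fileset_ids(sample).inspect
```

On the sample data this reports only `fs2`, the fileset that no work claims. A parentless fileset is a ghost candidate rather than a confirmed ghost, so the result would still need review before any bulk delete.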

carakey added the Epic label Jun 13, 2024

carakey commented Aug 19, 2024

After the Solr drama, one ghost (PID kp78gg577) disappeared, but the remaining 8 are still in Solr. In case this helps with diagnosis: that PID was the only one of the 9 original ghosts to show up on monthly fixity reports.
