We have been coming across fileset objects in SA that should not be there and that resist deletion by standard means and/or persist after deletion.
The first group of items was identified during the 2023 preservation assessment file format inventory, which involves exporting Solr data. Initially, 22 items were assessed on #2491 as (probable) "Ingest errors with duplicate functional filenames elsewhere": they had minimal metadata, no characterization information, and no parents, but were initially accessible in SA. I tried deleting them through the UI. 13 were deleted successfully; the other 9 could no longer be viewed in SA, but they persisted in Solr.
In the same inventory, another 13 filesets with minimal Solr metadata and no parents were found that were not accessible in SA (500 errors). These 13 plus the 9 undeletable filesets were slated for bulk delete on #2530. The 13 that gave 500 errors were removed from Solr, but the 9 continued to resist deletion; @straleyb on #2530:
> I am able to update them through the command line, save them, and it persists the updated information. I can use LDP to request a resource and get a graph from fedora and it looks perfectly acceptable. However, attempting to delete them from fedora results in them not wanting to be deleted. We tried through CURL commands, using ActiveFedora, LDP and the fedora UI to try and delete them, but they were unable to be removed.
A second group of 12 items showed up as fixity failures in Feb 2024 and was documented on #2571. These had more extensive Solr metadata but could not be viewed in SA (500 error). Brandon was not able to find them in Fedora but was able to delete them from the Solr index; @straleyb on #2571:
> I went through and checked on these works. All the works are coming back as Ldp::Gone meaning that Fedora can't find them in its tree for some reason. That means grabbing the work and deleting it using ActiveFedora::Base is impossible. We can try and see if deleting them through the Fedora front end is possible but I know Ryan W has tried that and it didn't work. This is, from what I understand, where the Ghost part of the Ghost FileSets come from.
>
> I did find them in Solr though. I was able to grab them using SolrDocument.find(id) and they returned with a Solr Document. I used Hyrax::SolrService.delete(id) to delete them and double checked using SolrDocument.find(id) and verified that they were removed from the Index.
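The index cleanup described in the quote above (find, delete, then find again to confirm) could be sketched roughly as follows. This is only an illustration: `purge_from_index` and `indexed?` are hypothetical helper names, and the lookup and delete steps are passed in as callables so the logic can be tried outside a Rails console. It assumes the lookup either returns nil or raises when the id is not in the index.

```ruby
# Hypothetical helper: delete ids from the index and verify each one afterwards.
# `find` and `delete` are injected callables. In a Hyrax console they would
# wrap the calls quoted above, for example:
#   find:   ->(id) { SolrDocument.find(id) }
#   delete: ->(id) { Hyrax::SolrService.delete(id) }
def purge_from_index(ids, find:, delete:)
  ids.each_with_object({}) do |id, report|
    unless indexed?(id, find)
      report[id] = :not_indexed
      next
    end
    delete.call(id)
    # Look the id up again so a silent failure shows up in the report
    report[id] = indexed?(id, find) ? :still_indexed : :removed
  end
end

# Assumption: a nil result or a raised error both mean "not in the index"
def indexed?(id, find)
  !find.call(id).nil?
rescue StandardError
  false
end
```

The returned hash maps each id to `:removed`, `:still_indexed`, or `:not_indexed`, which makes it easy to spot ids that resist deletion.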
It isn't clear at this point how many separate issues are at play. Some of these seem to be fileset objects that are present in Solr but not in Fedora, but the 9 from the first group do have a Fedora presence that is apparently impossible to remove. The ones we have found showed up because they had incomplete characterization metadata or because they failed fixity checks. At this point we don't know how many more may be in the system.
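To keep the different cases apart while triaging, each suspect id could be classified by whether it is in Solr and by the HTTP status Fedora returns for it. The sketch below is an assumption-laden illustration, not an established procedure: `ghost_kind` and its labels are hypothetical, and it assumes that a 410 response (which, as far as I know, the ldp gem surfaces as `Ldp::Gone`) marks a deleted or tombstoned resource, while 404 means Fedora has no record at all.

```ruby
# Assumed mapping from the HTTP status of a GET on the Fedora resource
# to a coarse state. Anything else is treated as :unknown.
FEDORA_STATE = { 200 => :present, 404 => :missing, 410 => :tombstoned }.freeze

# Hypothetical classifier combining Solr presence with the Fedora state
def ghost_kind(in_solr:, fedora_status:)
  fedora = FEDORA_STATE.fetch(fedora_status, :unknown)
  case [in_solr, fedora]
  when [true, :tombstoned], [true, :missing]
    :index_only   # Solr entry without a live Fedora object
  when [true, :present]
    :in_both      # needs further checks (parents, whether deletion sticks)
  when [false, :present]
    :fedora_only  # in Fedora but missing from the index
  when [false, :tombstoned], [false, :missing]
    :gone         # nothing left to clean up
  else
    :unknown      # e.g. a 500 from Fedora
  end
end
```

Under these assumptions, the second group of 12 would come out as `:index_only`, while the 9 undeletable filesets would be `:in_both` and need separate handling.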
After the Solr drama, one ghost PID, kp78gg577, disappeared, but the remaining 8 are still in Solr. In case this helps with diagnosis: that PID was the only one of the 9 original ghosts to show up on monthly fixity reports.
Fully dealing with the SA ghosts seems to involve the following:
1. Identify ghosts
   a. Figure out how to reliably locate/identify ghosts
      - Fileset objects that do not have parents?
   b. Identify existing ghosts
      - Bulk Delete: Ghost filesets #2530
      - Bulk delete filesets from 2024 Feb/Mar fixity failures #2571
2. Resolve underlying ghost-creating issue(s)
   a. Diagnose how the system creates ghosts
      - One hypothesis: Files deleted during review are not actually deleted #2572, non-active versions of a fileset that gets deleted
      - Original hypothesis from Clean up problem FileSets #2491, that they were created through failed ingests
   b. Prevent the system from creating more ghosts
3. Get rid of the existing ghosts
   a. Figure out how to remove existing ghosts
   b. Remove existing ghosts
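For the "filesets without parents" idea in the identification step, one possible starting point is to scan an exported set of Solr documents for FileSet records that no other record claims as a member. The sketch below is a minimal illustration under stated assumptions: `parentless_filesets` is a hypothetical helper, the export is assumed to be an array of hashes, and the field names `has_model_ssim` and `member_ids_ssim` are assumed to match how this instance indexes models and work membership.

```ruby
require "set"

# docs: array of hashes taken from a Solr export.
# Assumed fields: "id", "has_model_ssim" (model name), and
# "member_ids_ssim" (child ids listed on parent works).
def parentless_filesets(docs)
  filesets, others = docs.partition do |doc|
    Array(doc["has_model_ssim"]).include?("FileSet")
  end
  # Every id that some non-FileSet record lists as a member
  claimed = others.flat_map { |doc| Array(doc["member_ids_ssim"]) }.to_set
  filesets.map { |doc| doc["id"] }.reject { |id| claimed.include?(id) }
end

docs = [
  { "id" => "w1", "has_model_ssim" => ["Article"], "member_ids_ssim" => ["f1"] },
  { "id" => "f1", "has_model_ssim" => ["FileSet"] },
  { "id" => "f2", "has_model_ssim" => ["FileSet"] }
]
orphans = parentless_filesets(docs) # f2 has no parent in this sample
```

A parentless fileset is only a candidate ghost; the ids it returns would still need to be checked against Fedora and the fixity reports.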