-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bulk unindexing for IndexerInterface #598
Conversation
My opinion is to have it in 3.2.0. |
That is true. I feared that including that information would overcomplicate the interface, but the cost of not knowing which files were unindexed and which ones were not is possibly too high. I will revise this. |
do you have plans to "un-draft" this PR? |
I have updated the PR with a plan. It should not become ready for review until there is evidence that this API works well for implementers and consumers. We also have two known problems and further thinking is needed to either resolve them or just accept them as not enough of a problem to merit blocking the extension. Feedback is more than welcome. |
3477736
to
9362f5a
Compare
Updated the API to include a progress callback. I updated the root message with known caveats. |
The unindex bulk API can be beneficial in certain scenarios, and we are currently approaching a milestone where new APIs will be incorporated. It would be appreciated if you could take into account the undraft this pull request, as it has been pending for quite some time. If necessary, we can revisit and iterate on it at a later stage. |
9362f5a
to
be2438e
Compare
I have integrated bulk unindexing to the core unindex Web service. It would be better to also have a proof of concept implementation other than the default. |
I added this point to the known caveats.
After trying to implement this interface, I am convinced that the API needs to be redesigned to be easier to implement. |
- for unindexing in bulk - clarify that both unindex methods are synchronous, unlike the indexing ones
- adjust the content to align with good practices (see also https://bioinformatics-ua.github.io/dicoogle-learning-pack/docs/query_index/)
- add `UnindexReport` class and nested classes - for containing errors which may occur in bulk unindexing - change `IndexerInterface#unindex(Collection<URI>)` - returns `UnindexReport` - can throw `IOException`
- make it asynchronous: returns a `Task` like in `index` - add second parameter for keeping track of progress
- clarify that it returns a task - remove unused import
- can only handle one indexer at a time, but other than that it works
- remove deprecated method call #handles, check scheme instead
- record a collection of URIs in each unindex failure
8ac8afb
to
216b11f
Compare
I have readjusted the Unindex report so that each failure may specify multiple URIs, which is a better reflection of what will happen in most implementations. |
- provide clearer methods to collect the counts of files which were not unindexed successfully
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have applied the new unindex method for the interface in our index plugins, and it works as expected. Performance is definitely being improved.
LGTM!
This proposes an extension to the
IndexerInterface
so that API consumers can request multiple items to be unindexed at once. Resolves #594.This extension to the API should be carefully reviewed and evaluated to ensure that implementers can use it properly and that it can be well integrated into existing workflows.
Breaking changes are minimal. It is only a problem if an existing plugin already has a method with the exact same signature, which is unlikely. Hence, we are in position to deliver this in 3.4.0, or postpone to version 4 anyway.
To ensure some level of quality in this suggestion, we should have:
Problems resolved in the latest version:
unindex
to finish before we start cleaning up the files. The operation is now asynchronous.Known caveats:
It is not clear whether we can properly constrain the number of parallel unindexing tasks by pushing this to our indexing task pool. The current indexing task manager depends on task objects returning a different kind of report, whichAs of the latest version, unindexing tasks can be dispatched by the indexer thread pool.UnindexReport
does not implement. Maybe it should be subclassed fromReport
?UnindexTask<T>
to extendTask<T>
, but it is very different from indexing tasks and would make it even harder to attach to the task pool.