"indexing.IndexQueue.optimize": index requests affecting the same catalog data are not merged #94

Open
d-maurer opened this issue Jun 17, 2020 · 2 comments

@d-maurer
Contributor

While performing Archetypes -> Dexterity migrations (with plone.app.contenttypes.migration), I observed extremely strange catalog inconsistencies: the catalog indexes were inconsistent with the catalog metadata (for the same objects). The whole issue is described in "plone/plone.app.contenttypes#556"; the IndexQueue related part mainly in "plone/plone.app.contenttypes#556 (comment)".

Here is a short summary of the problem: the IndexQueue records indexing requests for objects. When the actual reindexing becomes necessary, the queue is optimized. During the optimization, some requests are merged. Whether two requests can be merged is decided by comparing the object hashes and the object paths: both must agree. After the optimization, the remaining/merged requests are executed, not necessarily in the original order. A problem arises when the queue contains reindexing requests for different objects with the same path. In this case, those requests are not merged (because the objects are different), but they affect the same catalog data (because the path is identical). As a consequence, the execution order becomes important for correctness. However, the optimization can change the order, which leads to inconsistencies.
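The effect can be reproduced with a small toy model. This is only a sketch of the behaviour described above, not the real IndexQueue code: `Request`, `merge_by_object_and_path` and `execute` are hypothetical names, the "catalog" is a plain dict keyed by path, and merging is assumed to keep the last request per key.

```python
from collections import namedtuple

# Hypothetical stand-in for a queued indexing request.
Request = namedtuple("Request", ["obj_id", "path", "data"])


def merge_by_object_and_path(queue):
    """Merge requests only when both object identity and path agree.

    Assumption for this sketch: the last request per key wins.
    """
    merged = {}
    for req in queue:
        merged[(req.obj_id, req.path)] = req
    return list(merged.values())


def execute(requests):
    """Apply requests to a toy catalog that is keyed by path only."""
    catalog = {}
    for req in requests:
        catalog[req.path] = req.data
    return catalog


# Two different objects that live at the same path, as during a migration.
queue = [
    Request(obj_id=1, path="/site/doc", data="old Archetypes object"),
    Request(obj_id=2, path="/site/doc", data="new Dexterity object"),
]

remaining = merge_by_object_and_path(queue)

# Nothing was merged, so the result depends on the execution order.
in_order = execute(remaining)
reordered = execute(list(reversed(remaining)))
```

In this model both requests survive the merge, and the catalog entry for the path ends up with different data depending on which of them runs last.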

I have the strong feeling that IndexQueue should use only the path when it determines whether two requests should be merged, because requests with the same path affect the same catalog data. That is, two requests should be merged if and only if they refer to the same path.
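A minimal sketch of the proposed rule, again as a toy model rather than a patch for the real IndexQueue: `PathRequest`, `merge_by_path` and `apply_requests` are hypothetical names, and which request survives a merge (here: the last one queued) is an assumption of this sketch, not something the proposal specifies.

```python
from collections import namedtuple

# Hypothetical stand-in for a queued indexing request.
PathRequest = namedtuple("PathRequest", ["obj_id", "path", "data"])


def merge_by_path(queue):
    """Merge requests if and only if they refer to the same path.

    Assumption for this sketch: the last request per path wins.
    """
    merged = {}
    for req in queue:
        merged[req.path] = req
    return list(merged.values())


def apply_requests(requests):
    """Apply requests to a toy catalog that is keyed by path only."""
    catalog = {}
    for req in requests:
        catalog[req.path] = req.data
    return catalog


path_queue = [
    PathRequest(obj_id=1, path="/site/doc", data="old Archetypes object"),
    PathRequest(obj_id=2, path="/site/doc", data="new Dexterity object"),
    PathRequest(obj_id=3, path="/site/other", data="unrelated object"),
]

merged_requests = merge_by_path(path_queue)

# At most one request per path remains, so reordering cannot change the result.
forward = apply_requests(merged_requests)
backward = apply_requests(list(reversed(merged_requests)))
```

Because every path is touched by at most one remaining request, the final toy catalog is the same for any execution order.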

@d-maurer d-maurer added the bug label Jun 17, 2020
@gforcada
Member

Maybe you saw that already, but there is an environment variable CATALOG_OPTIMIZATION_DISABLED that disables the whole catalog optimization, as the name implies 😅

That would at least help you on the migration and while cooking a patch to fix the current situation.

Could it be though, that the problem you describe is mostly happening only on migrations, rather than on regular work? 🤔

@d-maurer
Contributor Author

d-maurer commented Jun 18, 2020 via email
