Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running WP_CLI command wp algolia reindex --all would clear index even if --clear flag is not specified #442

Open
menno-ll opened this issue Dec 11, 2024 · 8 comments · May be fixed by #443

Comments

@menno-ll
Copy link

menno-ll commented Dec 11, 2024

Describe the bug
When you run the wp-cli command wp algolia reindex --all the index would still be cleared.
This happens because when it re-indexes the first batch, it checks if the index exists.
And inside that functionality, it also empties the index if the index already existed.

To Reproduce
Steps to reproduce the behavior:

  1. In your terminal, run wp algolia reindex --all
  2. Visit algolia backend interface, see you have indexed items.
  3. Run wp algolia reindex --all
  4. While the command is running, refresh the algolia backend interface
  5. You will see the index has been cleared, and less items are in the index (and going up in amount as more content is being indexed)

Expected behavior
Index should not be cleared when the --clear flag has not been specified.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: MacOS
  • Browser chrome
  • Version 131.0.6778.109 (Official Build) (arm64)

Smartphone (please complete the following information):
Cannot run wp-cli on smartphone

Additional context
Add any other context about the problem here.

Fix would be to add add_filter( 'algolia_clear_index_if_existing', '__return_false' ); to class-algolia-cli.php in an else statement, somewhere around line 126 .

@tw2113
Copy link
Member

tw2113 commented Dec 11, 2024

Hi @menno-ll

Looking things over, and here's some notes that I have about this topic as a whole.

I'm pretty sure our clear() method used in the WP_CLI command is coming from https://github.com/WebDevStudios/wp-search-with-algolia/blob/main/includes/indices/class-algolia-index.php#L889-L897 and it's meant to trigger the clearing of objects.

That method is being defined at https://github.com/algolia/algoliasearch-client-php/blob/3.3.2/src/SearchIndex.php#L277-L287 which is known to be in an older version of their PHP library, but this would essentially be the version we presently ship with the plugin.

Here's where the filter you're using in the associated pull request is being used at: https://github.com/WebDevStudios/wp-search-with-algolia/blob/main/includes/indices/class-algolia-index.php#L539-L555.

My biggest question is how often is this filter getting reached? From what I can see, if someone isn't calling --clear with the WP-CLI command, then create_index_if_not_existing() is going to be run once, for the first "page" of content, and only for that first page/batch. However, if --clear is getting included, then I think create_index_if_not_existing() would only run, at most, twice.

I don't think it'd be super detrimental to performance and whatnot, but I also won't claim it's necessarily "ideal" either. More a case of "being extra sure".

I also know that the way Algolia themselves originally designed the plugin, and a detail we haven't really attempted to re-evaluate and redesign, is when doing bulk re-indexing like this, to just "wipe out" the original index and push up everything as if it was new. This is being done instead of trying to do a bunch of patch/update pushes for each piece of content. I'd need to inquire to Algolia devs to see if creating an outright new object is more efficient and performant than trying to compare/update existing items.

Not going to close this issue outright, as I think it could use some pondering and considerations and discussion, but I did want to at least attempt to provide as much context as I could.

@menno-ll
Copy link
Author

menno-ll commented Dec 12, 2024

Hi @tw2113
Thanks for the swift reply.

The code path that now triggers the clearing of the index, while --clear has not been specified is the following (links and line numbers following latest github release):

So this will cause to actually clear everything from the index when using the CLI command, ignoring the --clear flag of the command.

This behavior can be modified by using the algolia_clear_index_if_existing filter like I did in my PR, as specified here.

Then regarding your question about how many times the filter will run.
The answer is both simple but a little bit harder to explain, so bare with me. The apply_filters function for the algolia_clear_index_if_existing filter will always only run once, whe the. This as the create_index_if_not_existing() function only be called once, when reindexing the first batch.
HOWEVER, how many times the index is currently cleared when running the CLI command, is actually different to how many times the filter is called:

  • If you specify the --clear flag, the index will be cleared twice (once via Algolia_Index::clear(), and then because of indexing the first page via Algolia_index::create_index_if_not_existing()).
  • If you don't specify the --clear flag, the index will be cleared once (only because of indexing the first page via Algolia_index::create_index_if_not_existing()).

Regarding performance, I don't know the impact yet, as we ourselves have only used Algolia for a short time now, and just stumbled on this issue, which seemed quite easy to fix. If I have to speculate, I would think that clearing the index thats already empty isn't that hard of a task, and as it's only a CLI command wont have a large impact on performance.
If it's faster to delete an object from the index and re-create it, instead of getting and updating it, i'm not sure. That's something I think Algolia might be able to assist in further. However, I do feel like these considerations go beyond the scope of this issue and should maybe deserve it's own separate one.

@tw2113
Copy link
Member

tw2113 commented Dec 14, 2024

I almost had a big reply typed up for you, but as I continued to think about things, I put a pause on myself as I wanted to circle back to when I had some more time to really ponder and think things through some more.

Looking through the git history for our repo, and it's important to note that we are a fork of a WordPress plugin that Algolia originally created and then archived, but this double clearing has been there from the start of our plugin. That means that it was also doubled up in Algolia's original.

While I generally believe and hold to the idea that they knew what they were doing and were intentional with it, no one is perfect either, and I think this is one of those spots.

All that to say that I think we can get this corrected, and in a way that I believe would not risk any backwards compatibility issues, especially since this would largely be stemming from just the WP-CLI component.

The create_index_if_not_existing() method has a $clear_if_existing parameter that defaults to true. We don't utilize that parameter at all, and the re_index() method starting at https://github.com/WebDevStudios/wp-search-with-algolia/blob/2.8.2/includes/indices/class-algolia-index.php#L428 is the only place we use that method.

I'm thinking that we need to set, say $do_clear = false or similar inside this block here: https://github.com/WebDevStudios/wp-search-with-algolia/blob/2.8.2/includes/class-algolia-cli.php#L120-L126 most specifically after the $index->clear() method has been run. Then, later on in that method, we need to amend the two $index->re_index( ... ) calls to be:

$index->re_index( 1, [], $do_clear ); and $index->re_index( $page++, [], $do_clear );

This, would necessitate adding a 3rd parameter to the re_index() method from Algolia_Index class, making its parameter signature essentially:

public function re_index( $page, $specific_ids = [], $do_clear = true )

Finally, changing $this->create_index_if_not_existing(); to $this->create_index_if_not_existing( $do_clear );

While yes the filter is available and in place, I feel like this is going to be the more advantageous route to go while also managing to preserve backwards compatibility. The WP-CLI do_reindex() method would be the only place in the code at the moment that passes false to the re_index() method, and the rest would still default to true like it all is now.

That said, if things make sense for above, and you're willing to try them out for your own usecases, but also issue the PR for the changes, we'd love the contributions and we'd definitely give credit in the changelog for it as well.

@menno-ll
Copy link
Author

menno-ll commented Dec 16, 2024

Hi @tw2113

I read your comment and I have to say I've walked the exact same path in my head.
Because I originally wanted to make the same solution you are saying now.

However I looked at the other code in the project, and had seen that instead of their accepted $do_clear parameter in the create_index_if_not_existing() method, on other places (Some WP All Import related code) they also used the filter. I found it also a weard choice, but it was chosen somehow, and I hoped with some thought behind it. And therefore I chose for consistency instead of rebuilding it.

I did however not know it was a plugin originally created by Algolia and then abandoned by them. And that all the code in there is "their legacy".
I'm happy someone (you) was willing to take over the development. As we are now able to have way better WordPress search 🥳 .

I'm also happy to implement the changes you suggest and will update the PR for you.
I'll place a reply once I've made the changes.
Thanks for the swift replies, and the drive to make it better.

@menno-ll
Copy link
Author

Hi @tw2113

I've modified my PR, using the changes you have suggested.

As an extra, I found that the filter was used also in some WP All Import (also known as pmxi?) sync functionality.
I've also removed it there and used the new logic.

@graham73may
Copy link

Had exactly the same issue just now with wp algolia reindex --all and searched the repo.

I feel like I've hit the jackpot with this thread! Recent, detailed, PR submitted.

Much appreciated!

@tw2113
Copy link
Member

tw2113 commented Dec 19, 2024

due to the holidays at least here in the US, we're probably not doing any sort of release until 2025, but we'll definitely keep this around and ready. I'd also like to do at least some testing to make sure we're not seeing potential issues.

@menno-ll
Copy link
Author

Good to know @tw2113, thanks! Happy holidays!

Hi @graham73may!
Until a new release is available, i've placed the previously described filter inside of our themes functions.php file.
This might work for you as well in the meantime as a temporary solution.

add_filter( 'algolia_clear_index_if_existing', '__return_false' );

After release of the change and then updating the plugin update in your site, you should be able to delete this filter again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants