Make freetext search view indexes "offline only" #9544

jkuester · 2024-10-15T19:30:57Z

Is your feature request related to a problem? Please describe.
If we move to Nouveau for freetext searching on the Couch server (aka "online" use), then we still need a solution for freetext searching "offline" (what offline users would experience via their local Pouch instance).

Describe the solution you'd like
The most straightforward approach should be to just have a design doc that is "client-only" similar to how we have docs that are currently "server-only". This design doc will be indexed by PouchDB clients, but will not be indexed by the Couch server.

Since we are likely going to require a complete re-indexing of these views on the client devices one way or the other, we should take this opportunity to remove the unused key:value emissions from the freetext views as originally proposed for the "freetext-lite" views.
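
For illustration, here is a rough sketch of what a "lite" emission could look like once the key:value keys are dropped. The function name `freetextKeys`, the rule for skipping fields, and the minimum word length are all assumptions made for this example and are not taken from the existing view code.

```typescript
// Hypothetical sketch: the names and rules below are assumptions, not the real view code.
type Doc = Record<string, unknown>;

// Returns the keys a "lite" freetext view might emit for a doc:
// lowercased words from string fields, without any `field:value` keys.
function freetextKeys(doc: Doc, minLength = 3): string[] {
  const keys = new Set<string>();
  for (const field of Object.keys(doc)) {
    const value = doc[field];
    // Skip internal fields such as _id and _rev, and any non-string values.
    if (field.charAt(0) === '_' || typeof value !== 'string') {
      continue;
    }
    for (const word of value.toLowerCase().split(/\s+/)) {
      if (word.length >= minLength) {
        keys.add(word);
      }
    }
    // A non-lite view would also add a `field:value` key at this point;
    // this sketch intentionally leaves that emission out.
  }
  return Array.from(keys).sort();
}
```

Fewer emitted keys per doc should mean a smaller index to rebuild on each device, which is the motivation for bundling this change with the re-index.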

Describe alternatives you've considered
One interesting alternative would come after our upgrade to Pouch 9 and the indexeddb adapter. IIRC, Mango queries should be more performant on Pouch with indexeddb. If that is the case, then we could evaluate possibly using Mango queries for freetext searching. This one blog post suggests it works well... 😅


jkuester · 2024-10-16T22:17:20Z

Here are my conclusions after taking a deep dive into possible implementations. The main approach options seem to be:

1. Add ddoc to Couch as normal but with "autoupdate": false so it does not get indexed on the server. Then in webapp we would need to warm the index when the app starts.
   - Pros:
     - Design doc is managed/controlled along with the rest. Gets all the metadata, etc.
   - Cons:
     - Run the risk of someone calling the view (even from Fauxton) and then it will be built on the server-side.
     - Have to manually warm the view on the client side (since it will not auto-update).
     - Need a bit of additional logic to allow replication of the new ddoc to the clients.
2. Add ddoc directly on the client side. Hardcode the ddoc directly in the webapp code and have it build the index on the client side.
   - Pros:
     - With the ddoc only on the client, there is no danger of it getting accidentally indexed on the server.
     - Might also simplify logic in shared-libs/search since that code could just check if the offline ddoc exists. If it does, use it, otherwise call nouveau.
   - Cons:
     - New code flow - not tracking ddoc with rest of the server-side ones. View code will not all be in the same place in the repo.
     - Need custom logic in webapp to make sure the index is up-to-date when the app starts.
3. A middle alternative would be to add logic to our server-side handling of ddocs that would allow for the creation of a new doc on the Couch server with the contents of a ddoc, but without the _design/* id. On the server it would just be a normal doc (with a new type). But, when it gets replicated to a client, the doc gets transformed into an actual ddoc (either in the webapp or by the server's replication logic).
   - Pros:
     - Get benefits of #1 without the drawbacks
     - Still would allow for simple shared-libs/search code
   - Cons:
     - Requires additional ddoc logic on the server side for the creation/management of the new type of docs.
     - Server-based transformation into an actual ddoc could be tricky/impossible. Seems unlikely that Couch/Pouch wants you changing doc ids mid-replication....
     - If we just do client-side transformation, it is unclear what benefits this offers over #2.
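
The "check if the offline ddoc exists, otherwise call nouveau" idea from the second option could look roughly like the sketch below. The `DocStore` interface is only the small subset of a PouchDB-style handle that the sketch needs, and the ddoc id and the two search callbacks are hypothetical placeholders, not the actual shared-libs/search API.

```typescript
// Assumed subset of a PouchDB-like handle: get() rejects with status 404 when the doc is missing.
interface DocStore {
  get(id: string): Promise<{ _id: string }>;
}

// Hypothetical signature for both search backends: term in, matching doc ids out.
type SearchFn = (term: string) => Promise<string[]>;

// Hypothetical id for the client-only freetext ddoc.
const OFFLINE_DDOC_ID = '_design/medic-offline-freetext';

// True when the client-only ddoc is present in the given database.
async function hasOfflineDdoc(db: DocStore): Promise<boolean> {
  try {
    await db.get(OFFLINE_DDOC_ID);
    return true;
  } catch (err) {
    if ((err as { status?: number }).status === 404) {
      return false;
    }
    throw err; // anything other than "not found" is a real failure
  }
}

// Routes the search to the local view when the ddoc exists, otherwise to Nouveau.
async function search(
  db: DocStore,
  term: string,
  viaLocalView: SearchFn,
  viaNouveau: SearchFn
): Promise<string[]> {
  return (await hasOfflineDdoc(db)) ? viaLocalView(term) : viaNouveau(term);
}
```

With this shape the routing decision depends only on what is in the database at hand, so the calling code would not need a separate online/offline flag.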

Alternative rejected:

I also took some time to look into any Pouch-specific alternative approaches to freetext searching (since we will no longer be tied to solutions that also work in Couch). Unfortunately, there is absolutely nothing viable here.

My original hope was that Mango text indexes would help, but the pouchdb-find plugin only supports json indexes and none of the other Mango features are going to be helpful doing incremental searching over all fields in a doc. In fact, though the Couch docs mention the text index type, they also vaguely note "Optional Text indexes are supported via a third-party library". I suspect this is actually a Cloudant-specific feature....

Beyond Mango, any discussions of freetext searching in Pouch seem to mention pouchdb-quick-search which has not been updated in 7 years. To make matters worse, pouchdb-quick-search is based on lunr.js which has not been updated in 4 years. So, any move in that direction seems like it would basically require a re-write of the plugin....

jkuester · 2024-10-16T23:05:36Z

Also here is my conclusion and next steps based on the above analysis:

#1 just seems too risky given that someone clicking through Fauxton could trigger the index to build. #3 seems unnecessarily complex for dubious benefits. While it would be nice to keep the offline index code as close as possible to the rest of the indexes, it does not seem to justify the amount of moving parts that this approach entails.

So, I am currently doing further investigation and prototyping for #2.

dianabarsan · 2024-10-17T06:00:44Z

I also agree that #2 is simplest. And probably more correct overall.

Right now, when the app gets an update, there are two things that need to be updated: 1. the medic-client ddoc and 2. the actual app code. We do tell people that they should reload their app to use the new code, but it's entirely possible some app somewhere runs old webapp code with the new ddoc.

I remember we even tracked this through telemetry on some instance that was reporting errors at some point.

There's an issue: #7146

garethbowen · 2024-10-17T09:19:37Z

> Have to manually warm the view on the client side (since it will not auto-update).

As far as I know, PouchDB doesn't do background indexing and ignores the autoupdate setting, and we don't "warm" it anyway, so this isn't a "con", just business as usual on the client side.

Another option would be to store the client-only ddoc as an attachment on the medic-client ddoc, which the client syncs and extracts on change. A "pro" of this is that then the ddocs are all synced together, but a "con" is weird errors if the change event doesn't fire or gets interrupted. You would probably still need to check on startup to ensure the client-only ddoc is up to date, which makes it very similar to number 2.

I think I'd go for #2. Checking whether it's up to date is a pain but I guess hash it at build time and compare the hash on first start.
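
A minimal sketch of that startup check, assuming the build step stamps the hardcoded ddoc with a hash. The `build_hash` field, the `OfflineDdoc` shape, and the `DdocStore` interface (a small subset of a PouchDB-style API) are assumptions for the example, not existing code.

```typescript
// Assumed shape of the hardcoded ddoc; `build_hash` is a hypothetical field written at build time.
interface OfflineDdoc {
  _id: string;
  _rev?: string;
  build_hash: string;
  views: Record<string, { map: string }>;
}

// Assumed subset of a PouchDB-like handle: get() rejects with status 404 when the doc is missing.
interface DdocStore {
  get(id: string): Promise<OfflineDdoc>;
  put(doc: OfflineDdoc): Promise<unknown>;
}

// Writes the bundled ddoc when it is missing or its hash differs from the stored copy.
// Resolves to true when a write happened, false when the stored copy was already current.
async function ensureOfflineDdoc(db: DdocStore, bundled: OfflineDdoc): Promise<boolean> {
  let existing: OfflineDdoc | undefined;
  try {
    existing = await db.get(bundled._id);
  } catch (err) {
    if ((err as { status?: number }).status !== 404) {
      throw err;
    }
  }
  if (existing && existing.build_hash === bundled.build_hash) {
    return false;
  }
  // Carry over the stored _rev (if any) so the write updates the doc instead of conflicting.
  const toWrite = existing ? { ...bundled, _rev: existing._rev } : bundled;
  await db.put(toWrite);
  return true;
}
```

Comparing a precomputed string keeps the client from hashing anything itself. The cost is that the build has to regenerate `build_hash` whenever the view code changes.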

jkuester added the Type: Feature Add something new label Oct 15, 2024

m5r assigned jkuester Oct 16, 2024
