Cache built kibblesets for later reuse #229

Open
Viqsi opened this issue Jan 18, 2024 · 4 comments

Viqsi commented Jan 18, 2024

The most frequent slowdown we run into in regular operations is when a kibble set has to be built for a given variable. We're already running into situations where we have to "cache" these results (witness the proliferation of All-EXISTS cohorts for FHVs in particular), so it might be worthwhile to just go the next step and formalize this caching process.

This could potentially speed up the creation of cohorts, custom variables, and downloads (for example, previewing a potential cohort would populate the cache and thus make the ensuing creation - and subsequent previews - much faster), simplify the process of envaluing a custom variable, and keep cohortInds from becoming an untenably slow monster. It also cleanly separates an existing de facto system caching function from vital user data. We'd just have to remember to purge the kibble cache any time variable data changes (not often) and make sure the resulting cache table(s) is/are supremely well indexed, and we're golden.
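The purge-on-change rule could be as simple as deleting the affected variable's rows whenever its data is touched. A minimal sketch, using SQLite in place of MariaDB purely for illustration - the `kibble_cache` table, its contents, and the `purge_variable` helper are all my assumptions, not an existing schema:

```python
import sqlite3

# Hypothetical cache table; column names only, no claim about the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE kibble_cache (
    new_ind_id INTEGER, variable_name TEXT, value TEXT)""")
conn.executemany("INSERT INTO kibble_cache VALUES (?, ?, ?)",
                 [(1, "race", "A"), (2, "race", "B"), (1, "site", "X")])

def purge_variable(conn, variable_name):
    """Invalidate cached rows for one variable after its data changes."""
    with conn:  # transaction: commit on success, roll back on error
        conn.execute("DELETE FROM kibble_cache WHERE variable_name = ?",
                     (variable_name,))

purge_variable(conn, "race")
remaining = conn.execute("SELECT COUNT(*) FROM kibble_cache").fetchone()[0]
print(remaining)  # 1
```

Since variable data changes rarely, per-variable deletes like this should be cheap compared with rebuilding kibble sets on every lookup.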

Something to give some serious thought IMO.

@Viqsi Viqsi added this to the Aspirational milestone Jan 18, 2024
@Viqsi Viqsi self-assigned this Jan 18, 2024

Viqsi commented Jul 16, 2024

A different concept of how this might work came to mind. As described to @WValenti in Skype (with some edits):

The idea amounts to taking that concept I already have with "SVVFHVs" (the persistent table that is actually a cache) and doing something similar with 'kibble' - creating a new table that basically pares a copy of 'kibble' down to the few bits DIVER actually cares about - new_ind_id, variable_name (as determined by equivalence_groups.first_member), value.

The reasons this comes to mind:

  • The kibble table is currently ~5.7GB on disk, a little over two thirds of which is indexes.
  • A "down to DIVER's essentials" copy would likely be smaller, because of the elimination of both 1) "invalid" rows (read: rows for ind_ids and variable_ids in interactive tables) and, more importantly, 2) indexes for extra columns that this pared-down copy wouldn't need.
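A sketch of what that pared-down copy might look like. SQLite stands in for MariaDB here, and everything except the three column names from the comment above (new_ind_id, variable_name, value) - the table name, types, and index choices - is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical "DIVER essentials" copy of kibble: just the three columns.
CREATE TABLE kibble_cache (
    new_ind_id    INTEGER NOT NULL,
    variable_name TEXT    NOT NULL,
    value         TEXT
);
-- "Supremely well indexed": a covering index per lookup direction, so
-- cohort/variable/download queries resolve entirely within the index.
CREATE INDEX kc_by_variable ON kibble_cache (variable_name, value, new_ind_id);
CREATE INDEX kc_by_ind      ON kibble_cache (new_ind_id, variable_name);
""")
```

Dropping the extra columns shrinks the indexes too, which is where two thirds of the current 5.7GB lives.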

It'd have to be "rebuilt" each time we add data, but we're already having to rebuild and recheck things with additions and revisions to distributions. So make it a formal process, designate specifically what we use it for in DIVER, and leave the DIGS stuff (which is otherwise tracking necessary, vital, and yet convoluted information) out of it.

And that way any kibble operation - cohort creation, custom variable creation, download creation - is ostensibly much simpler, because the number of lookups needed for each row is drastically reduced and we don't have to do it with temporary stuff generated on the fly.
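The rebuild step could amount to a single drop-and-refill `INSERT ... SELECT`. A sketch under stated assumptions: the `kibble` and `equivalence_groups` stand-ins below are guesses beyond the columns the comment names (`new_ind_id`, `value`, `equivalence_groups.first_member`), and SQLite is again a stand-in for MariaDB:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins for the real DIGS tables.
CREATE TABLE kibble (new_ind_id INTEGER, variable_id INTEGER, value TEXT);
CREATE TABLE equivalence_groups (variable_id INTEGER, first_member TEXT);
CREATE TABLE kibble_cache (new_ind_id INTEGER, variable_name TEXT, value TEXT);
""")
conn.executemany("INSERT INTO kibble VALUES (?, ?, ?)",
                 [(1, 10, "A"), (2, 10, "B"), (1, 20, "X")])
conn.executemany("INSERT INTO equivalence_groups VALUES (?, ?)",
                 [(10, "race"), (20, "site")])

def rebuild_kibble_cache(conn):
    """The 'rebuilt each time we add data' step: drop and refill in one go."""
    with conn:  # one transaction, so readers never see a half-built cache
        conn.execute("DELETE FROM kibble_cache")
        conn.execute("""
            INSERT INTO kibble_cache (new_ind_id, variable_name, value)
            SELECT k.new_ind_id, eg.first_member, k.value
            FROM kibble k
            JOIN equivalence_groups eg ON eg.variable_id = k.variable_id
        """)

rebuild_kibble_cache(conn)
rows = conn.execute("SELECT COUNT(*) FROM kibble_cache").fetchone()[0]
print(rows)  # 3
```

In the real thing, the `SELECT` would also filter out the "invalid" rows (ind_ids and variable_ids in interactive tables) mentioned above; the inner join shown here is just the shape of the operation.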

Heck, it'd eliminate the need for AECs for existing (non-custom) variables entirely.

So yeah, a thought coming to mind. I think it achieves the objective of Don't Have A Giant Sparse Table while still making DIVER work and without impacting the DIGS backend itself.

(It'd probably also make deployment less of a headache for ISI.)

WValenti commented:

As discussed in person, kibble is subject to extensive development for the addition of longitudinal information. This is going to be a long-term project as it requires significant noodling first, and will involve UI changes as well.

WValenti commented:

Also, I believe in a cacheless society. Gimme credit for my experience - cache causes problems. ;-)


Viqsi commented Jul 25, 2024

I mean, I would prefer not to have "cache tables" myself, but until MariaDB/MySQL give us native support for materialized views, the performance penalties are way too severe. :(
