API kiara.list_all_values endpoint takes several minutes to process #73

MariellaCC · 2024-06-04T11:55:13Z

Describe the bug

I cleared the data store recently but ran a few operations since then without saving anything in the meantime (at least, not that I recall). I need to display the values present in the data store in a Jupyter notebook. When trying to run kiara.list_all_values() the cell takes more than 5 minutes to run, so I stopped the process. However, when I try to run kiara.list_all_value_ids() it works in a few seconds.

Additional info

partial output of kiara.list_all_value_ids():
(I just copied/pasted a few lines)

UUID('000537a8-4368-484c-ad6b-bd6c26bde2f6'),
 UUID('02484e0a-95cb-44bd-bcca-c0ae9701477a'),
 UUID('03065ed1-e927-48c4-9970-5f02fc71c74a'),
 UUID('04879195-1253-437a-ba19-ca16ffffa424'),
 UUID('04e2a9ac-9e13-4769-8fbb-f1793a52ceb1'),
 UUID('050dee08-ada7-455e-b15e-c72d745ac92d'),
 UUID('0555b482-f9f7-4c1f-bc1b-a7b8e5fbb83b'),
 UUID('062f7605-4f1d-4ba1-b999-fa202832207c'),
 UUID('063fad22-0fca-4485-ae2c-83612b95ca34'),
 UUID('06869012-ef7c-4c3d-9e66-905f48238ecc'),
 UUID('0706bb51-025a-400d-9ccc-8657325af591'),
 UUID('080788cd-3da4-43a0-977e-daf870faa56d'),

Output of kiara context explain:

╭─ Context 'default' ──────────────────────────╮
│                                              │
│   context name   default                     │
│   kiara_id       6978810c-3382-4953-a3cc-…   │
│   size on disk   0 bytes                     │
│   values                                     │
│                    no. values    356         │
│                    combined      5.82 GB     │
│                    size                      │
│                                              │
│   aliases                                    │
│                    no. aliases   0           │
│                    combined      0 bytes     │
│                    size                      │
│                    aliases                   │
│                                              │
│   archives                                   │
│                          it…   ar…           │
│                    al…   ty…   ty…   co…     │
│                   ───────────────────────    │
│                    de…   me…   sq…   {       │
│                                        …     │
│                                        …     │
│                                      }       │
│                    de…   da…   sq…   {       │
│                                        …     │
│                                        …     │
│                                        …     │
│                                      }       │
│                    de…   al…   sq…   {       │
│                                        …     │
│                                        …     │
│                                      }       │
│                    de…   jo…   sq…   {       │
│                                        …     │
│                                        …     │
│                                      }       │
│                    de…   wo…   fi…   {       │
│                                        …     │
│                                      }       │
│                                              │
│                                              │
╰──────────────────────────────────────────────╯

Expected behavior
List all values in the current context, incl. internal ones.

Environment, versions (please complete the following information):

OS: MacOS Sonoma 14.5
Python: 3.12
Versions of Kiara and Kiara plugins used in the related project

kiara                        0.5.10
kiara_plugin.core_types      0.5.1
kiara_plugin.develop         0.5.2
kiara_plugin.onboarding      0.5.1
kiara_plugin.tabular         0.5.4
kiara_plugin.topic_modelling 0.1.dev53+gff7ac02 /Users/mariella.decrouychan/Documents/GitHub/kiara_plugin.topic_modelling

Python environment used (e.g. system, conda, ...)
conda


makkus · 2024-06-04T14:28:59Z

Thanks for the report. I'll have to set up some test cases with a context of comparable size and number of values. I'd think that 356 values (like you seem to have) should not be prohibitive in terms of getting metadata for all of them, so I think there is a good chance this is something that can be optimized away, but I'll have to spend some time trying to replicate the problem...

Anyway, please let me know if this happens again, queries like that should never take more than a few seconds, except if there is actually data loading (compared to metadata loading) involved....

makkus · 2024-06-04T14:47:05Z

Ah, also, if that happens again, it would be interesting to see whether this only happens with Jupyter, or, using the same context, also via 'pure Python' and/or the command-line.... Jupyter does have some quirks that could have caused this...
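One way to run that comparison is a small timing script that can be executed unchanged in a notebook cell and as a plain Python script. The sketch below is only an illustration: `time_call` and `compare` are hypothetical helper names, and the two lambdas are stand-ins so that the script runs without kiara installed.

```python
import time
from typing import Any, Callable, Dict, Tuple


def time_call(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Run a zero-argument callable once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare(calls: Dict[str, Callable[[], Any]]) -> Dict[str, float]:
    """Time each named callable and return a mapping of name -> elapsed seconds."""
    timings = {}
    for name, fn in calls.items():
        _, elapsed = time_call(fn)
        timings[name] = elapsed
        print(f"{name}: {elapsed:.3f}s")
    return timings


if __name__ == "__main__":
    # Stand-in callables (assumption): replace them with the real calls to compare.
    compare({
        "fast stand-in": lambda: list(range(10)),
        "slow stand-in": lambda: time.sleep(0.05),
    })
```

With the same context loaded, the stand-ins would be swapped for `kiara.list_all_value_ids` and `kiara.list_all_values`, where `kiara` is the API object already used in the notebook. Running the script once in Jupyter and once from a terminal should show whether the slowdown is specific to Jupyter.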

MariellaCC · 2024-06-05T08:08:31Z

As I hadn't intentionally stored anything in the data store, I didn't know that there were as many values. I don't know if users will be aware of these values being there when they don't intentionally save elements in the data store.

MariellaCC · 2024-06-05T08:10:39Z

> Anyway, please let me know if this happens again, queries like that should never take more than a few seconds, except if there is actually data loading (compared to metadata loading) involved....

Something I don't understand though: why does the size of the data matter to list the values, why does it take more time to do kiara.list_all_values() than to do kiara.list_all_value_ids() in such a case?

makkus · 2024-06-05T08:31:12Z

> As I hadn't intentionally stored anything in the data store, I didn't know that there were as many values. I don't know if users will be aware of these values being there when they don't intentionally save elements in the data store.

It was decided to store every value in every job run, which was mostly a consequence of the requirement of having a comment associated with every job run. Manually storing values is not necessary anymore, since everything gets stored anyway: #71 (comment) -- I did point out the potential issues that could arise in that meeting, esp. that I'm not sure about performance, since kiara wasn't designed with a pattern like that in mind. As I said, I think I can probably improve it in this instance, but I can't guarantee that we won't run into other similar issues.

> Something I don't understand though: why does the size of the data matter to list the values, why does it take more time to do kiara.list_all_values() than to do kiara.list_all_value_ids() in such a case?

In some cases kiara needs to read the data (or parts of the data). It shouldn't have to for list_all_values, so maybe there is a stray call somewhere that causes the issue. Still, list_all_values will always be much slower than list_all_value_ids: the latter only needs to query the database for all unique ids, while the former also needs to access all metadata for each id. There is a lot of metadata associated with our values, Python objects need to be created, and some de-serialization happens. This is definitely not a trivial operation.
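The difference between the two access patterns can be sketched with a toy SQLite store. This is only an illustration under assumed names: the `value_metadata` table, its columns and the JSON metadata layout are hypothetical and are not kiara's actual storage schema.

```python
import json
import sqlite3
import uuid


def build_store(n_values: int) -> sqlite3.Connection:
    """Create an in-memory store with one JSON metadata blob per value id (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE value_metadata (value_id TEXT PRIMARY KEY, metadata TEXT NOT NULL)"
    )
    rows = []
    for i in range(n_values):
        metadata = {
            "data_type": "table",
            "size": i,
            "properties": {f"key_{j}": j for j in range(50)},
        }
        rows.append((str(uuid.uuid4()), json.dumps(metadata)))
    conn.executemany("INSERT INTO value_metadata VALUES (?, ?)", rows)
    return conn


def list_ids(conn: sqlite3.Connection) -> list:
    """Cheap path: a single query over the id column, no metadata is touched."""
    return [uuid.UUID(row[0]) for row in conn.execute("SELECT value_id FROM value_metadata")]


def list_values(conn: sqlite3.Connection) -> dict:
    """Costlier path: every metadata blob is fetched and de-serialized into Python objects."""
    query = "SELECT value_id, metadata FROM value_metadata"
    return {uuid.UUID(vid): json.loads(blob) for vid, blob in conn.execute(query)}


if __name__ == "__main__":
    store = build_store(356)  # same number of values as reported in this issue
    print(len(list_ids(store)), "ids,", len(list_values(store)), "values with metadata")
```

In this toy version both calls finish quickly. It only shows that the second path does strictly more work per value, so its cost grows with the amount of metadata, while the first depends only on the number of ids.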

MariellaCC · 2024-06-05T08:56:12Z

I will then look to use another operation for my needs, but it was interesting to see the impact of metadata auto-storing. Not sure, but I think this also had an impact on the performance of my laptop (ventilation was triggered a lot without me understanding why), and things are much better since I re-emptied the data store. Maybe it's just a coincidence, but in any case, when getting to front-end considerations it may be important that users are aware of the amount of data in their data store (even if they didn't intentionally store anything).
Since it is normal for this operation to be time-consuming, in proportion to the volume of data stored, I am closing this issue.

makkus · 2024-06-05T09:19:18Z

No, this needs to stay open, I need to investigate, as I said, we can't have that operation taking minutes....

MariellaCC · 2024-06-05T09:20:13Z

ah ok :-) my bad sorry, I thought that the time that it takes was because of too much data in my store, thanks for re-opening it

MariellaCC closed this as completed Jun 5, 2024

makkus reopened this Jun 5, 2024
