Compress JSON #93

marc-fouquet · 2022-12-17T09:49:30Z

I have just tested this script for the first time, and the JSON file with the statuses is astonishingly huge, given that I only have a handful of toots. There is a lot of redundancy in there; even simple ZIP compression reduces the size by about 90%.

There could be a --compress option that transparently adds compression when loading and saving these files, using one of Python's built-in compression modules.
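
A minimal sketch of what this might look like, using only the standard library's `gzip` and `json` modules. The function names `save_json` and `load_json` and the `compress` parameter are hypothetical and not taken from the script:

```python
import gzip
import json


def save_json(data, path, compress=False):
    """Write data as JSON; gzip-compress the file when compress is True."""
    # gzip.open in text mode ("wt") accepts the same writes as a normal
    # text file, so json.dump works unchanged in both cases.
    opener = gzip.open if compress else open
    with opener(path, "wt", encoding="utf-8") as fp:
        json.dump(data, fp, indent=4, ensure_ascii=False)


def load_json(path, compress=False):
    """Read JSON from path, decompressing with gzip when compress is True."""
    opener = gzip.open if compress else open
    with opener(path, "rt", encoding="utf-8") as fp:
        return json.load(fp)
```

With something like this, a `--compress` flag would only need to be passed through as `compress=True`; the rest of the loading and saving code would not have to change.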

kensanata · 2022-12-17T10:35:46Z

I agree.

lapineige · 2022-12-17T14:01:45Z

It doesn't matter much given the small size of the resulting text file, but may I advocate for a better compression algorithm than ZIP?
I'm thinking of Zstandard: it's widely supported now (though perhaps less so than others outside Linux?) and offers very fast compression and decompression with a very good compression ratio. But anything else is fine :)

kensanata · 2022-12-17T16:25:47Z

Whoever implements it gets to decide. 😄

lapineige · 2022-12-17T18:38:29Z

Oh ok, for some reason I thought you were going to do it 😅

I actually don't care much; I store it compressed anyway (filesystem compression using btrfs).

For the record, a ~1GB archive is compressed to around 100MB (using zstd), which makes quite a big difference 🙂

kensanata · 2022-12-17T19:56:48Z

It sure does! I’m basically just storing the results of the Mastodon client calls, so every response contains all the account info of the author, if I remember correctly. And it’s all pretty-printed. So compression definitely helps! As for myself, I’m just not courageous enough to run a non-standard file system. Ext4 forever, I guess. 😂
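
To illustrate why this kind of redundancy compresses so well, here is a small sketch with made-up data. The field names and values are invented for the example and are not the actual Mastodon API schema; the only point is that each status repeats the same account block and everything is pretty-printed:

```python
import gzip
import json

# Made-up stand-in for an archive: every status repeats the same account block.
account = {"id": "1", "username": "example", "note": "An example bio. " * 20}
statuses = [
    {"id": str(i), "content": f"toot {i}", "account": account}
    for i in range(1000)
]

# Pretty-print as the archive does, then compress the resulting bytes.
raw = json.dumps(statuses, indent=4).encode("utf-8")
packed = gzip.compress(raw)

ratio = len(packed) / len(raw)
print(f"{len(raw)} bytes raw, {len(packed)} bytes gzipped ({ratio:.1%})")
```

Real archives will compress less dramatically than this artificial case, since actual toots and accounts vary far more.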

lapineige · 2022-12-17T20:27:01Z

That isn't really the point here anyway; it would be great if the JSON were stored compressed regardless.
I will see if I have time to implement this… Don't be too hopeful 😅

kensanata · 2023-01-02T21:59:36Z

I wonder whether this should be optional (or automatic: detect if a .gz variant already exists, and if it does, use that). I don't have a compressed filesystem, but if I had, I'm assuming I wouldn't want to have the data recompressed?
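
The automatic variant could be sketched roughly like this. The helper names `load_archive` and `save_archive` and the `.gz` suffix convention are assumptions for illustration, not the script's actual behavior:

```python
import gzip
import json
import os


def load_archive(path):
    """Load JSON from path + ".gz" if that file exists, else from path.

    Returns the data together with the path that was actually read.
    """
    gz_path = path + ".gz"
    if os.path.isfile(gz_path):
        with gzip.open(gz_path, "rt", encoding="utf-8") as fp:
            return json.load(fp), gz_path
    with open(path, "r", encoding="utf-8") as fp:
        return json.load(fp), path


def save_archive(data, used_path):
    """Save back to whichever variant was loaded, keeping its format."""
    opener = gzip.open if used_path.endswith(".gz") else open
    with opener(used_path, "wt", encoding="utf-8") as fp:
        json.dump(data, fp, indent=4, ensure_ascii=False)
```

Under this scheme, a user opts in by gzipping the existing file once; anyone who leaves the plain file in place (for example on a compressed filesystem) sees no change.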

lapineige · 2023-01-03T21:24:01Z

I'm not sure it's a big deal. Most of the time the filesystem detects that the data is already compressed (in fact, that it cannot be compressed any further) and skips it.

Also, it's quite a rare use case.

kensanata added the "help wanted" and "good first issue" labels · Jan 2, 2023
