Compress JSON #93

Open
marc-fouquet opened this issue Dec 17, 2022 · 8 comments

Comments

@marc-fouquet

marc-fouquet commented Dec 17, 2022

I have just tested this script for the first time, and the JSON file with the statuses is astonishingly large given that I only have a handful of toots. There is a lot of redundancy in it: even simple ZIP compression reduces the size by about 90%.

There could be a --compress option that transparently adds compression when loading and saving these files, using one of Python's built-in compression modules.
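
A minimal sketch of what such transparent compression might look like, assuming gzip from the standard library. The helper names `save_json` and `load_json` and the `compress` parameter are hypothetical and are not taken from the script itself:

```python
import gzip
import json


def save_json(path, data, compress=False):
    """Write data as JSON text, gzip-compressed when compress is True."""
    # gzip.open and the built-in open both accept text mode and an encoding.
    opener = gzip.open if compress else open
    with opener(path, "wt", encoding="utf-8") as f:
        json.dump(data, f)


def load_json(path, compress=False):
    """Read JSON from path, decompressing first when compress is True."""
    opener = gzip.open if compress else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

A hypothetical --compress flag would then only need to pass `compress=True` to both helpers; the rest of the code keeps working with plain Python dictionaries and lists.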

@kensanata
Owner

kensanata commented Dec 17, 2022 via email

@lapineige

It doesn't matter much given the small size of the resulting text file, but may I advocate for a better compression algorithm than ZIP?
I'm thinking of Zstandard: it's widely supported now (though perhaps less so than others outside Linux?) and offers very fast compression and decompression with a very good compression ratio. But anything else is fine :)

@kensanata
Owner

kensanata commented Dec 17, 2022 via email

@lapineige

Oh ok, for some reason I thought you were going to do it 😅

I actually don't care much; I store it compressed anyway (filesystem compression using btrfs).

For the record, a ~1GB archive compresses to around 100MB (using zstd), which makes quite a big difference 🙂

@kensanata
Owner

kensanata commented Dec 17, 2022 via email

@lapineige

That shouldn't be the point here anyway; it would be great if the JSON were stored compressed regardless.
I will see if I have time to implement this… Don't be too hopeful 😅

@kensanata
Owner

I wonder whether this should be optional (or automatic: detect if a .gz variant already exists, and if it does, use that). I don't have a compressed filesystem, but if I had, I'm assuming I wouldn't want to have the data recompressed?
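
The automatic variant could be sketched roughly as follows, assuming the compressed file sits next to the plain one with `.gz` appended to its name. The function names `load_archive` and `save_archive` are hypothetical, not part of the script:

```python
import gzip
import json
import os


def _open_variant(path, mode):
    """Open path + '.gz' with gzip if it exists, else the plain file."""
    gz_path = path + ".gz"
    if os.path.exists(gz_path):
        return gzip.open(gz_path, mode, encoding="utf-8")
    return open(path, mode, encoding="utf-8")


def load_archive(path):
    """Load JSON, preferring an existing .gz variant of the file."""
    with _open_variant(path, "rt") as f:
        return json.load(f)


def save_archive(path, data):
    """Save JSON back to whichever variant is already in use."""
    with _open_variant(path, "wt") as f:
        json.dump(data, f)
```

With this approach, a user who never creates the `.gz` file keeps getting plain JSON, so data on a compressed filesystem is not compressed a second time by the script.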

@lapineige

I'm not sure it's a big deal. Most of the time the filesystem detects that the data is already compressed (in fact, that it cannot be compressed further) and skips it.

Also, it's quite a rare use case.
