Broadcast over CFVariable is very slow #21

Open
rafaqz opened this issue Apr 11, 2024 · 2 comments

Member

rafaqz commented Apr 11, 2024

In Rasters we hit a bug where I had forgotten to re-wrap the CFVariable in our internal CFDiskArray, which exists specifically to work around this CommonDataModel problem.

This one small change made writing files 100x faster:
https://github.com/rafaqz/Rasters.jl/pull/633/files

Broadcasting to/from a CFVariable means a separate read/write for every single pixel.
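
The per-pixel cost can be sketched with a toy array that counts backend requests. This is only an illustration under stated assumptions, not CommonDataModel code: `CountingArray` and `readblock` are hypothetical names, and the counter stands in for a disk or network request.

```julia
# Hypothetical stand-in for a disk-backed variable; not part of CommonDataModel.
# Each call that reaches the backend bumps `reads`, standing in for one I/O request.
struct CountingArray <: AbstractArray{Float64,2}
    data::Matrix{Float64}
    reads::Base.RefValue{Int}
end
CountingArray(data::Matrix{Float64}) = CountingArray(data, Ref(0))

Base.size(A::CountingArray) = size(A.data)

# Scalar indexing: one request per element. Generic AbstractArray code,
# including broadcast, falls back to this method.
function Base.getindex(A::CountingArray, i::Int, j::Int)
    A.reads[] += 1
    return A.data[i, j]
end

# Hypothetical bulk read: one request for the whole array.
function readblock(A::CountingArray)
    A.reads[] += 1
    return copy(A.data)
end

A = CountingArray(rand(100, 100))

slow = A .+ 1.0              # broadcast indexes A element by element
slow_reads = A.reads[]

A.reads[] = 0
fast = readblock(A) .+ 1.0   # read once, then broadcast in memory
fast_reads = A.reads[]

println("broadcast over the wrapper: ", slow_reads, " reads")
println("bulk read then broadcast:   ", fast_reads, " read")
```

In this sketch the broadcast issues one request per element, while the bulk read issues a single request for the same result. With real file or network I/O behind each request, that difference is the kind of slowdown reported here.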

Member

asinghvi17 commented Sep 11, 2024

I've seen similar issues with ZarrDatasets.jl where data is stored on S3. raster[:, :] uses the concurrent I/O but collect(raster) somehow takes 20x the time, and network download speed is 10x lower.

Member Author

rafaqz commented Sep 11, 2024

It's because CFVariable is not a disk array, so it doesn't have all the method overrides that make these operations chunked and fast.

The underlying Variable is one, but you have to go around CFArray manually to get the chunks, as on Rasters main, or wrap the Variable yourself with custom CF handling, as in the Rasters cf branch.
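
As a rough sketch of what going around the CF layer to read by chunk means, the toy below walks a chunk grid on the underlying store and applies the CF scale and offset in memory. All names here (`ChunkedStore`, `ScaledView`, `readwindow`, `readall`) are hypothetical and are not the CommonDataModel, DiskArrays, or Rasters API; the real code relies on the disk array machinery rather than a hand-written loop.

```julia
# Hypothetical types for illustration; not the CommonDataModel or DiskArrays API.
# `ChunkedStore` plays the role of the underlying chunk-aware Variable.
struct ChunkedStore
    data::Matrix{Float64}
    chunksize::Tuple{Int,Int}
    requests::Base.RefValue{Int}
end
ChunkedStore(data, chunksize) = ChunkedStore(data, chunksize, Ref(0))

# One backend request returns a whole rectangular window.
function readwindow(s::ChunkedStore, rows::UnitRange{Int}, cols::UnitRange{Int})
    s.requests[] += 1
    return s.data[rows, cols]
end

# `ScaledView` plays the role of the CF layer: it applies scale and offset.
struct ScaledView
    store::ChunkedStore
    scale::Float64
    offset::Float64
end

# Chunk-aware path: walk the store's chunk grid, read each chunk with one
# request, and apply the CF transform in memory.
function readall(v::ScaledView)
    nrows, ncols = size(v.store.data)
    crows, ccols = v.store.chunksize
    out = Matrix{Float64}(undef, nrows, ncols)
    for c0 in 1:ccols:ncols, r0 in 1:crows:nrows
        rows = r0:min(r0 + crows - 1, nrows)
        cols = c0:min(c0 + ccols - 1, ncols)
        out[rows, cols] .= readwindow(v.store, rows, cols) .* v.scale .+ v.offset
    end
    return out
end

store = ChunkedStore(reshape(collect(1.0:64.0), 8, 8), (4, 4))
v = ScaledView(store, 0.5, 10.0)
result = readall(v)
println(store.requests[], " requests for ", length(result), " values")
```

Here an 8x8 array with 4x4 chunks is read in four requests instead of one per element. A wrapper that does not forward chunk information to generic code loses exactly this path.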

It would be better long-term to make AbstractVariable <: AbstractDiskArray but I'm sure people are tired of hearing me say that ;)
