better estimating the file size #275
Conversation
Some examples with their discrepancies:
- Example 1: Estimated: 3365 MB
- Example 2: Estimated: 1211 MB -> this one has improved from an old estimation of 2400 MB! (so we are better now)
- Example 3: Estimated: 16 kB -> this has improved from 5 kB, which means we are better now!
Maybe (seeing that all checks have passed) we could add some checks on files that we are already downloading, to verify that the estimated size and the final size don't diverge by more than a threshold value? What do you think @renaudjester?
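A minimal sketch of what such a check could look like; the helper name and the `SIZE_VARIANCE` threshold are assumptions for illustration, not part of the codebase:

```python
import os

# Hypothetical threshold: tolerated relative divergence between estimate and reality.
SIZE_VARIANCE = 0.2

def assert_estimation_within_threshold(
    estimated_size_bytes: float, downloaded_path: str
) -> None:
    """Fail if the real file size diverges from the estimate beyond the threshold."""
    actual_size_bytes = os.path.getsize(downloaded_path)
    lower = estimated_size_bytes * (1 - SIZE_VARIANCE)
    upper = estimated_size_bytes * (1 + SIZE_VARIANCE)
    assert lower <= actual_size_bytes <= upper, (
        f"Estimated {estimated_size_bytes} B, got {actual_size_bytes} B"
    )
```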
I created an issue in xarray; hopefully they also have some insights about the problem: pydata/xarray#9979.
So it's improved but not perfect, right? And it doesn't solve all the use cases of CMT-175, I guess.
Maybe I would like to add some tests that use tmp_path, as the function seems quite handmade.
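A sketch of such a test, assuming pytest's `tmp_path` fixture and an uncompressed NetCDF round-trip; the inline estimate stands in for the function under review:

```python
import numpy as np
import xarray as xr

def test_estimation_matches_written_file(tmp_path):
    # A dataset with a known dtype, large enough that header overhead is negligible.
    dataset = xr.Dataset(
        {"temperature": (("x", "y"), np.zeros((1000, 1000), dtype="float32"))}
    )
    output_file = tmp_path / "subset.nc"
    dataset.to_netcdf(output_file)

    # Uncompressed estimate: element count times itemsize, summed over variables.
    estimated_size = sum(
        dataset[variable].size * dataset[variable].dtype.itemsize
        for variable in dataset.data_vars
    )
    actual_size = output_file.stat().st_size
    assert abs(actual_size - estimated_size) <= estimated_size * 0.2
```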
>= response["file_size"] * (1 - size_variance) - offset_size | ||
) | ||
assert response["file_size"] <= response["data_transfer_size"] | ||
return |
Do you need this return?
I don't know, actually.
I checked it and it's more of a style thing. I also checked the repo and it looks like all three versions are in it. I'm used to this one, but we can standardise the usage if you want!
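Assuming "the 3 versions" are the usual Python variants, they are equivalent at runtime; every one of them returns `None`:

```python
def check_a():
    assert True
    return  # explicit bare return

def check_b():
    assert True
    return None  # explicit return None

def check_c():
    assert True  # no return at all: falls off the end, implicitly returns None
```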
Okay then, good for me!
# compressed = False
for variable in dataset.data_vars:
    variables_size += dataset[variable].encoding["dtype"].itemsize
    # compressed = True if "add_offset" in dataset[variable].encoding else False
# if not compressed:
#     return dataset.nbytes / 1048e3
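The diff only shows the itemsize accumulation; as a minimal sketch of the overall idea (the function name, the fallback to the in-memory dtype, and the unit conversion are assumptions, not the actual implementation):

```python
import numpy as np
import xarray as xr

def estimate_file_size_megabytes(dataset: xr.Dataset) -> float:
    """Rough uncompressed estimate: on-disk bytes per element, summed per variable."""
    total_bytes = 0
    for variable in dataset.data_vars:
        # Prefer the on-disk dtype from the encoding (e.g. a packed int16 behind
        # add_offset/scale_factor); fall back to the in-memory dtype otherwise.
        dtype = np.dtype(
            dataset[variable].encoding.get("dtype", dataset[variable].dtype)
        )
        total_bytes += dataset[variable].size * dtype.itemsize
    return total_bytes / 1e6
```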
We discovered some discrepancies between the size of the dataset and our estimation. Although the proposed new method does not seem bullet-proof either, it looks like it will be more consistent in its estimations.
Trying to solve issues CMT-175 and CMT-171.
📚 Documentation preview 📚: https://copernicusmarine--275.org.readthedocs.build/en/275/