Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking: Standardise cf standards and missingval handling #695

Open
wants to merge 25 commits into
base: main
Choose a base branch
from
Open

Conversation

rafaqz
Copy link
Owner

@rafaqz rafaqz commented Jul 12, 2024

This PR standardises cf and missingval behaviour accross all sources.

New keywords are:

  • cf=true: apply offset and scale if the are not zero and one.
  • maskingval=missing this is the value the missingval will be converted to if there is one.

I don't like either of these names! if you can think of anything better please suggest something.

@felixcremer @meggart this is the new diskarrays object that does 2-way cf and missingval/maskingval conversions

https://github.com/rafaqz/Rasters.jl/compare/cf?expand=1#diff-1478a88a8bdcffcf7f556cc1865dbe18d69b5c03f75b4481e1f5889e8a6b6b73

@rafaqz rafaqz changed the title Cf Standardise cf standards and missingval handling Jul 12, 2024
@rafaqz rafaqz changed the title Standardise cf standards and missingval handling Breaking: Standardise cf standards and missingval handling Jul 15, 2024
@Rapsodia86
Copy link

Rapsodia86 commented Jul 28, 2024

How about inverting it? Instead of "cf" call it "raw". If raw=false, offset and scale are applied, if raw=true, offset and scale are NOT applied?

And raw=true would be the default?

Maybe "navalue" instead of "maskingval"?

@rafaqz
Copy link
Owner Author

rafaqz commented Jul 28, 2024

Currently I've got scaled=true as the default instead of cf. raw might suggest the missing value is also not transformed?

Main concern with navalue is it sounds like R, and isn't really used in julia

@Rapsodia86
Copy link

Yes, you are right that with "raw" one may expect that data were not touched at all.

Indeed, "navalue" originates from my heavy R usage! And I agree NA is not very julia way.

Point taken:)

@rafaqz
Copy link
Owner Author

rafaqz commented Jul 28, 2024

Thanks for the input though, good to have all the options to choose from

@rafaqz
Copy link
Owner Author

rafaqz commented Aug 13, 2024

Coming back to this I like how short and obvious raw=true is!

But to mean no scaling and no masking - so it will override everything else.

Copy link

codecov bot commented Aug 17, 2024

Codecov Report

Attention: Patch coverage is 82.84251% with 134 lines in your changes missing coverage. Please review.

Project coverage is 81.79%. Comparing base (570fd17) to head (54056cb).
Report is 9 commits behind head on main.

Files Patch % Lines
src/modifieddiskarray.jl 68.61% 43 Missing ⚠️
src/utils.jl 80.48% 32 Missing ⚠️
ext/RastersZarrDatasetsExt/zarrdatasets_source.jl 0.00% 13 Missing ⚠️
...imensionalDataExt/GeometryOpsDimensionalDataExt.jl 0.00% 8 Missing ⚠️
ext/RastersArchGDALExt/gdal_source.jl 89.55% 7 Missing ⚠️
src/stack.jl 85.71% 7 Missing ⚠️
src/write.jl 75.00% 7 Missing ⚠️
ext/RastersNCDatasetsExt/ncdatasets_source.jl 84.61% 4 Missing ⚠️
src/sources/commondatamodel.jl 92.85% 4 Missing ⚠️
src/create.jl 96.10% 3 Missing ⚠️
... and 5 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #695      +/-   ##
==========================================
- Coverage   82.59%   81.79%   -0.80%     
==========================================
  Files          60       64       +4     
  Lines        4510     4884     +374     
==========================================
+ Hits         3725     3995     +270     
- Misses        785      889     +104     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -0,0 +1,20 @@
module GeometryOpsDimensionalDataExt
Copy link
Contributor

@asinghvi17 asinghvi17 Sep 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we close the PR that does this in GeometryOps?

Copy link
Owner Author

@rafaqz rafaqz Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I must have added that by mistake lol

src/create.jl Outdated Show resolved Hide resolved
src/create.jl Outdated Show resolved Hide resolved
src/create.jl Outdated Show resolved Hide resolved

"""
create([f!], [filename], template; kw...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When should I use this as opposed to Raster(data, dims; ...)? What are the advantages beyond efficiency in writing to file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is the direct constructor going to call create?

Copy link
Owner Author

@rafaqz rafaqz Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well Raster(filename) reads an existing file... create(filename) creates one.

So this is best for building something new from scratch, Raster is best for opening something, rewrapping a DimArray, etc. Otherwise you need to define the Raster in-memory then write it as separate steps.

(create is already called in a lot of places under the hood where a new file might be created, like in nearly all methods with a filename keyword. It's been here for years, just not publicly)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, I guess I'm not entirely clear on what to use when creating a purely in memory dataset. Does the pure Raster(data, dims; kwargs...) constructor have the same options? Should it? That's the most intuitive option IMO to create an in memory raster...

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it will be pretty much identical for in-memory rasters. But if you want chunks and e.g. compression options you have to do it separately in write. An in-memory raster doesn't have chunks or options like that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet that sounds good. Will add a suggestion for a line in the docstring to clarify.

rafaqz and others added 2 commits September 3, 2024 11:10
This fixes a strange issue I was having where Rasters.jl read the wrong values from a Kerchunk/Zarr dataset.
Comment on lines +147 to +148
To ignore `scale` and `offset` metadata, use `scaled=false`. If `scale` and
Note: `offset` are `1.0` and `0.0` they will be ignored and the original type will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To ignore `scale` and `offset` metadata, use `scaled=false`. If `scale` and
Note: `offset` are `1.0` and `0.0` they will be ignored and the original type will
To ignore `scale` and `offset` metadata, use `scaled=false`. Note: If `scale`
and `offset` are `1.0` and `0.0` they will be ignored and the original type will


const COERCE_KEYWORD = """
- `coerce`: where `scale` and/or `offset` are present during `setindex!` to disk,
coerce values to the disk type. `convert` is the default, but `round`, `trunc` or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the disk type?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh it should be eltype

Comment on lines 166 to 170
- `maskingval`: A value to convert `missingval` to, by default `missing`. If this is set it
will be the return value of `missingval(raster)` - `maskingval` becomes the new `missingval`.
Setting `maskingval` to `nothing` means no masking will occur, and the original `missingval`
will be the final `missingval`. This can give better performance than using `missing`.
Another efficient option is to use e.g. `zero(eltype(raster))` to replace missing values with zero.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see why you picked this but I'm not totally happy about the name. Still, I can't see any other nice options. missingval would have been nice but it's already in use. I'd say we should go for it. A sufficiently good docstring will clarify all problems :P

Maybe emphasize that this does what replace_missing does?

Suggested change
- `maskingval`: A value to convert `missingval` to, by default `missing`. If this is set it
will be the return value of `missingval(raster)` - `maskingval` becomes the new `missingval`.
Setting `maskingval` to `nothing` means no masking will occur, and the original `missingval`
will be the final `missingval`. This can give better performance than using `missing`.
Another efficient option is to use e.g. `zero(eltype(raster))` to replace missing values with zero.
- `maskingval`: A value to convert `missingval` to, by default `missing`. If this is set it
will be the return value of `missingval(raster)` - `maskingval` becomes the new `missingval`.
This is the same thing that `replace_missing(raster, some_value)` does.
Setting `maskingval` to `nothing` means no masking will occur, and the original `missingval`
will be the final `missingval`. This can give better performance than using `missing`.
Another efficient option is to pass e.g. `zero(eltype(raster))` to replace missing values with zero.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also what CommonDataModel uses so there's precedent... But sharing the mask part with the mask function is kinda weird for me


const WRITE_MISSINGVAL_KEYWORD = """
- `missingval`: set the missing value (i.e. FillValue / nodataval) of the written raster,
as Julias `missing` cannot be stored. If not passed in, `missingval` will be detected
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
as Julias `missing` cannot be stored. If not passed in, `missingval` will be detected
as Julia's `missing` cannot be stored. If not passed in, `missingval` will be detected

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants