Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataclass fixes #317

Merged
merged 19 commits into from
Aug 7, 2024
Merged

Dataclass fixes #317

merged 19 commits into from
Aug 7, 2024

Conversation

justin-richling
Copy link
Collaborator

@justin-richling justin-richling commented Jul 11, 2024

Fix issues from recent data class PR #269:

adf_info.py

  • Add check for existing variable file for climo year info.
  • Update hist_strs object.

global_latlon_map.py

  • Fix check for vertical levels, initially based on pres_levs instead of has_lev.
  • Update data grab to include obs.

zonal_mean.py

  • Fix check for vertical levels, and reorganize plotting block. Only log-p plots were being made for variables with vertical levels.
  • Update data grab to include obs.

adf_dataset.py

  • Reorganize methods into file categories: time series, climo, regrid and general data load methods.
  • These methods weren't working with obs, fix reference methods to correctly gather obs data.

tape_recorder.py:

justin-richling and others added 4 commits July 11, 2024 17:11
Add check for existing variable file for climo year info.

Update hist_strs object
Bring in changes from PR NCAR#316

Update time fix for old CESM time series files
@justin-richling justin-richling marked this pull request as draft July 12, 2024 16:20
@justin-richling justin-richling marked this pull request as ready for review July 15, 2024 17:02
@justin-richling justin-richling marked this pull request as draft July 15, 2024 17:02
Organize the adf_dataset methods into file type sections: time series, climo, and regridded.

These methods weren't working with obs, fix reference methods to correctly gather obs data.
Fix check for vertical levels, initially based on `pres_levs` instead of `has_lev`.

Update data grab to include obs
Fix check for vertical levels, and reorganize plotting block. Only log-p plots were being made for variables with vertical levels.

Update data grab to include obs
@justin-richling justin-richling marked this pull request as ready for review July 16, 2024 20:30
@justin-richling
Copy link
Collaborator Author

@brianpm I needed to make some adjustments to the methods in adf_dataset.py, let me know if you want to go over what I changed, thanks.

lib/adf_dataset.py Outdated Show resolved Hide resolved
@brianpm
Copy link
Collaborator

brianpm commented Jul 18, 2024

@justin-richling -- Let's discuss the changes. I think they are mostly fine.

There are two things that I'm not liking at first glance. First is the method load_obs_climo_dataset -- I don't see why we need to make "obs" a special method instead of only using "ref". If possible, I'd like to only ever deal with "ref" (and especially, I'd like script-writers to only have to worry about ref and never distinguish between obs and ref).

Second is related, and is the logic in load_da. My thinking is that load_da shouldn't care what kind of data is getting loaded. I'm thinking this logic should be pushed into the methods that call load_da. But it looks like you made the opposite determination and moved this logic into load_da... let's talk about it and decide how to deal with it.

lib/adf_info.py Outdated
Comment on lines 782 to 790
for var in var_list:
ts_files = sorted(input_location.glob(f"{case_name}*h0*.{var}.*nc"))
if ts_files:
print(var)
break
else:
logmsg = "get years for time series:"
logmsg = f"\tVar '{var}' not in dataset, skip to next to try and find climo years..."
self.debug_log(logmsg)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in case the first variable in the yaml file doesn't have a history file, we need to keep looking for the next available one.

Finalize fixes to keep to original data class workflow
@justin-richling
Copy link
Collaborator Author

@brianpm I've added the changes we discussed and cleaned up the methods. If you wouldn't mind giving adf_dataset.py another quick glance, we can probably pass this to @nusbaume for a final review. Thanks.

da = da * vres.get("scale_factor",1) + vres.get("add_offset", 0)
da.attrs['units'] = vres.get("new_unit", da.attrs.get('units', 'none'))

da = da * kwargs["scale_factor"] + kwargs["add_offset"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we probably just need to make this a bit more robust with something like
scale_factor = kwargs.get('scale_factor', 1)
add_offset = kwargs.get('add_offset', 0)
Just in case it gets used without the kwargs being passed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, maybe use get_defaults to do this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I was attempting to do with get_defaults, it should be setting those to the defaults before it becomes the kwargs

I have the offset set to 0 and scale factor to 1 initially, then the check if variablename in res should grab the values and if not it will default to 0/1 respectively with vres.get.

And if the variable is not in self.adf.variable_defaults, the original values for new_unit, add_offset , and scale_factor will take on the original set values of none, 0, and 1.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @brianpm's original suggestion here. Right now if someone tried to use this function without passing in the scale_factor and add_offset keywords I believe the function would crash with a dictionary key error.

justin-richling and others added 2 commits July 31, 2024 08:21
Since the units need to be checked from the data array, the `new_units` arg needs to be in `load_da` method. `get_defaults` only gets metadata from variable defaults and not a data array
@justin-richling
Copy link
Collaborator Author

@nusbaume I think this is all good for your final review., let me know if you see anything that needs addressing.

Copy link
Collaborator

@nusbaume nusbaume left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but do have some (hopefully easy) change requests.

lib/adf_dataset.py Show resolved Hide resolved
"""Return Dataset for climo of variablename"""
fils = self.get_climo_file(case, variablename)
if not fils:
warnings.warn(f"ERROR: Did not find climo file for variable: {variablename}. Will try to skip.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change ERROR to WARNING here, so that users realize that the ADF will keep going.

Suggested change
warnings.warn(f"ERROR: Did not find climo file for variable: {variablename}. Will try to skip.")
warnings.warn(f"WARNING: Did not find climo file for variable: {variablename}. Will try to skip.")

The same request holds for the other warnings in this file as well.

lib/adf_dataset.py Outdated Show resolved Hide resolved
lib/adf_dataset.py Outdated Show resolved Hide resolved
lib/adf_dataset.py Outdated Show resolved Hide resolved
lib/adf_dataset.py Outdated Show resolved Hide resolved
da = da * vres.get("scale_factor",1) + vres.get("add_offset", 0)
da.attrs['units'] = vres.get("new_unit", da.attrs.get('units', 'none'))

da = da * kwargs["scale_factor"] + kwargs["add_offset"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @brianpm's original suggestion here. Right now if someone tried to use this function without passing in the scale_factor and add_offset keywords I believe the function would crash with a dictionary key error.

lib/adf_dataset.py Outdated Show resolved Hide resolved
@justin-richling
Copy link
Collaborator Author

@nusbaume I believe I addressed all the suggestions. If you wouldn't mind just giving it one last glance, I think it's on its way, thanks!

Add check for value converters for data arrays
@justin-richling justin-richling merged commit 9a94260 into NCAR:main Aug 7, 2024
7 checks passed
@justin-richling justin-richling deleted the dataclass-fixes branch August 7, 2024 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants