Dataclass fixes #317

justin-richling · 2024-07-11T23:14:48Z

Fix issues from recent data class PR #269:

adf_info.py

Add check for existing variable file for climo year info.
Update hist_strs object.

global_latlon_map.py

Fix check for vertical levels, initially based on pres_levs instead of has_lev.
Update data grab to include obs.

zonal_mean.py

Fix check for vertical levels, and reorganize plotting block. Only log-p plots were being made for variables with vertical levels.
Update data grab to include obs.

adf_dataset.py

Reorganize methods into file categories: time series, climo, regrid and general data load methods.
These methods weren't working with obs, fix reference methods to correctly gather obs data.

tape_recorder.py:

Bring in changes from PR Update history string objects #316.
Update time fix for old CESM time series files via adf_dataset.py structure.

Add check for existing variable file for climo year info. Update hist_strs object

Bring in changes from PR NCAR#316 Update time fix for old CESM time series files

Organize the adf_dataset methods into file type sections: time series, climo, and regridded. These methods weren't working with obs, fix reference methods to correctly gather obs data.

Fix check for vertical levels, initially based on `pres_levs` instead of `has_lev`. Update data grab to include obs

Fix check for vertical levels, and reorganize plotting block. Only log-p plots were being made for variables with vertical levels. Update data grab to include obs

justin-richling · 2024-07-16T20:31:46Z

@brianpm I needed to make some adjustments to the methods in adf_dataset.py, let me know if you want to go over what I changed, thanks.

lib/adf_dataset.py

brianpm · 2024-07-18T17:52:11Z

@justin-richling -- Let's discuss the changes. I think they are mostly fine.

There are two things that I'm not liking at first glance. First is the method load_obs_climo_dataset -- I don't see why we need to make "obs" a special method instead of only using "ref". If possible, I'd like to only ever deal with "ref" (and especially, I'd like script-writers to only have to worry about ref and never distinguish between obs and ref).

Second is related, and is the logic in load_da. My thinking is that load_da shouldn't care what kind of data is getting loaded. I'm thinking this logic should be pushed into the methods that call load_da. But it looks like you made the opposite determination and moved this logic into load_da... let's talk about it and decide how to deal with it.

lib/adf_dataset.py

justin-richling · 2024-07-18T18:00:22Z

lib/adf_info.py

+        for var in var_list:
+            ts_files = sorted(input_location.glob(f"{case_name}*h0*.{var}.*nc"))
+            if ts_files:
+                print(var)
+                break
+            else:
+                logmsg = "get years for time series:"
+                logmsg = f"\tVar '{var}' not in dataset, skip to next to try and find climo years..."
+                self.debug_log(logmsg)


This is in case the first variable in the yaml file doesn't have a history file, we need to keep looking for the next available one.

Finalize fixes to keep to original data class workflow

justin-richling · 2024-07-22T20:36:45Z

@brianpm I've added the changes we discussed and cleaned up the methods. If you wouldn't mind giving adf_dataset.py another quick glance, we can probably pass this to @nusbaume for a final review. Thanks.

brianpm · 2024-07-29T13:51:22Z

lib/adf_dataset.py

-            da = da * vres.get("scale_factor",1) + vres.get("add_offset", 0)
-            da.attrs['units'] = vres.get("new_unit", da.attrs.get('units', 'none'))
+
+        da = da * kwargs["scale_factor"] + kwargs["add_offset"]


I think we probably just need to make this a bit more robust with something like
scale_factor = kwargs.get('scale_factor', 1)
add_offset = kwargs.get('add_offset', 0)
Just in case it gets used without the kwargs being passed.

Actually, maybe use get_defaults to do this?

That's what I was attempting to do with get_defaults, it should be setting those to the defaults before it becomes the kwargs

I have the offset set to 0 and scale factor to 1 initially, then the check if variablename in res should grab the values and if not it will default to 0/1 respectively with vres.get.

And if the variable is not in self.adf.variable_defaults, the original values for new_unit, add_offset , and scale_factor will take on the original set values of none, 0, and 1.

I agree with @brianpm's original suggestion here. Right now if someone tried to use this function without passing in the scale_factor and add_offset keywords I believe the function would crash with a dictionary key error.

Since the units need to be checked from the data array, the `new_units` arg needs to be in `load_da` method. `get_defaults` only gets metadata from variable defaults and not a data array

justin-richling · 2024-08-05T20:17:31Z

@nusbaume I think this is all good for your final review., let me know if you see anything that needs addressing.

nusbaume

Looks good, but do have some (hopefully easy) change requests.

lib/adf_dataset.py

nusbaume · 2024-08-06T12:17:02Z

lib/adf_dataset.py

+        """Return Dataset for climo of variablename"""
+        fils = self.get_climo_file(case, variablename)
+        if not fils:
+            warnings.warn(f"ERROR: Did not find climo file for variable: {variablename}. Will try to skip.")


I would change ERROR to WARNING here, so that users realize that the ADF will keep going.

Suggested change

warnings.warn(f"ERROR: Did not find climo file for variable: {variablename}. Will try to skip.")

warnings.warn(f"WARNING: Did not find climo file for variable: {variablename}. Will try to skip.")

The same request holds for the other warnings in this file as well.

lib/adf_dataset.py

nusbaume · 2024-08-06T12:55:44Z

lib/adf_dataset.py

-            da = da * vres.get("scale_factor",1) + vres.get("add_offset", 0)
-            da.attrs['units'] = vres.get("new_unit", da.attrs.get('units', 'none'))
+
+        da = da * kwargs["scale_factor"] + kwargs["add_offset"]


I agree with @brianpm's original suggestion here. Right now if someone tried to use this function without passing in the scale_factor and add_offset keywords I believe the function would crash with a dictionary key error.

lib/adf_dataset.py

justin-richling · 2024-08-06T15:58:44Z

@nusbaume I believe I addressed all the suggestions. If you wouldn't mind just giving it one last glance, I think it's on its way, thanks!

Add check for value converters for data arrays

justin-richling and others added 4 commits July 11, 2024 17:11

Update adf_info.py

ecb597d

Add check for existing variable file for climo year info. Update hist_strs object

Update tape_recorder.py

623e5bc

Bring in changes from PR NCAR#316 Update time fix for old CESM time series files

Grab new hist_str_list from case and baseline

7e5fe82

Merge branch 'main' into dataclass-fixes

dd62506

justin-richling marked this pull request as draft July 12, 2024 16:20

justin-richling marked this pull request as ready for review July 15, 2024 17:02

justin-richling marked this pull request as draft July 15, 2024 17:02

justin-richling added 7 commits July 16, 2024 13:05

Reorganize and clean up data set file

9ab6a1e

Organize the adf_dataset methods into file type sections: time series, climo, and regridded. These methods weren't working with obs, fix reference methods to correctly gather obs data.

Update baseline hist string if comparing obs

9db217c

Update global_latlon_map.py

5ee0cdb

Fix check for vertical levels, initially based on `pres_levs` instead of `has_lev`. Update data grab to include obs

update hist_strs list for comparing obs

c25e23f

Update global_latlon_map.py

25bb06f

Update zonal_mean.py

0223d67

Fix check for vertical levels, and reorganize plotting block. Only log-p plots were being made for variables with vertical levels. Update data grab to include obs

Remove print statement

cb31daf

justin-richling marked this pull request as ready for review July 16, 2024 20:30

justin-richling requested review from brianpm and nusbaume July 16, 2024 20:30

justin-richling added 2 commits July 16, 2024 14:32

Remove unnecessary import

a58eb5a

Merge branch 'main' into dataclass-fixes

9b2cbbf

justin-richling commented Jul 18, 2024

View reviewed changes

lib/adf_dataset.py Outdated Show resolved Hide resolved

justin-richling commented Jul 18, 2024

View reviewed changes

lib/adf_dataset.py Show resolved Hide resolved

justin-richling commented Jul 18, 2024

View reviewed changes

Update data class methods

4487d3e

Finalize fixes to keep to original data class workflow

Remove print statement

990edcf

brianpm reviewed Jul 29, 2024

View reviewed changes

justin-richling and others added 2 commits July 31, 2024 08:21

Merge branch 'main' into dataclass-fixes

df03c31

Pull new_units out of get_defaults

e64d304

Since the units need to be checked from the data array, the `new_units` arg needs to be in `load_da` method. `get_defaults` only gets metadata from variable defaults and not a data array

nusbaume requested changes Aug 6, 2024

View reviewed changes

justin-richling added 2 commits August 6, 2024 10:00

Fix github suggestions

3085b79

Add check for value converters for data arrays

Change ERROR to WARNING

7b844bc

justin-richling merged commit 9a94260 into NCAR:main Aug 7, 2024
7 checks passed

justin-richling deleted the dataclass-fixes branch August 7, 2024 15:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dataclass fixes #317

Dataclass fixes #317

justin-richling commented Jul 11, 2024 •

edited

Loading

justin-richling commented Jul 16, 2024

brianpm commented Jul 18, 2024

justin-richling Jul 18, 2024

justin-richling commented Jul 22, 2024

brianpm Jul 29, 2024

brianpm Jul 29, 2024

justin-richling Jul 29, 2024

nusbaume Aug 6, 2024

justin-richling commented Aug 5, 2024

nusbaume left a comment

nusbaume Aug 6, 2024

nusbaume Aug 6, 2024

justin-richling commented Aug 6, 2024

	warnings.warn(f"ERROR: Did not find climo file for variable: {variablename}. Will try to skip.")
	warnings.warn(f"WARNING: Did not find climo file for variable: {variablename}. Will try to skip.")

Dataclass fixes #317

Dataclass fixes #317

Conversation

justin-richling commented Jul 11, 2024 • edited Loading

justin-richling commented Jul 16, 2024

brianpm commented Jul 18, 2024

justin-richling Jul 18, 2024

Choose a reason for hiding this comment

justin-richling commented Jul 22, 2024

brianpm Jul 29, 2024

Choose a reason for hiding this comment

brianpm Jul 29, 2024

Choose a reason for hiding this comment

justin-richling Jul 29, 2024

Choose a reason for hiding this comment

nusbaume Aug 6, 2024

Choose a reason for hiding this comment

justin-richling commented Aug 5, 2024

nusbaume left a comment

Choose a reason for hiding this comment

nusbaume Aug 6, 2024

Choose a reason for hiding this comment

nusbaume Aug 6, 2024

Choose a reason for hiding this comment

justin-richling commented Aug 6, 2024

justin-richling commented Jul 11, 2024 •

edited

Loading