Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for daily output frequency #9

Closed
wants to merge 7 commits into from
Closed
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 18 additions & 4 deletions pangeo_forge_esgf/dynamic_kwargs.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,21 +13,35 @@
# TODO: I might not want to allow this in the ocean and ice fields. Lets see
)

daily_divisors = sorted(
[1, 30, 360, 365, 3653]
# TODO: It might be more efficient to default to the underlying file chunks?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think so. Dealing with the underlying chunks, and outputting them in a way that makes sense to the user is one of the core tasks pangeo-forge aims to solve. These divisons should rather reflect scientifically useful units to e.g. avoid splitting e.g. a yearly chunk with a few days left, as this would make certain operations (annual average) much slower to compute.
These look good for now, but I was wondering about different calendars and how we could maybe influence which divisors to use depending on the calendar of the data.
Just thinking out loud here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense. As far as I know there isn't any ESGF metadata to describe the calendar though and we'd have to open the files to inspect the time coordinate metadata directly which would be pain.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think this is a problem to be pushed to later work.

duncanwp marked this conversation as resolved.
Show resolved Hide resolved
)

allowed_divisors = {
"Omon": monthly_divisors,
"SImon": monthly_divisors,
"Amon": monthly_divisors,
"day": daily_divisors,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would these dividers also work generally for other daily data (see here: https://github.com/WCRP-CMIP/CMIP6_CVs/blob/master/CMIP6_table_id.json)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly have no idea, I didn't realise there were different day table_id and can't find any documentation on what they represent. They might be the same but it seems a bit redundant. Happy to include them or not depending on how much you want to protect against potential future issues.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to check this PR out in the next few days, and maybe run some tests with the 'other' daily frequencies.

} # Add table_ids and allowed divisors as needed


def get_timesteps_simple(dates, table_id):
assert (
"mon" in table_id
) # this needs some more careful treatment for other timefrequencies.
timesteps = [
months = [
(int(d[1][0:4]) - int(d[0][0:4])) * 12 + (int(d[1][4:6]) - int(d[0][4:6]) + 1)
for d in dates
]
if "mon" in table_id:
timesteps = months
elif "day" in table_id:
timesteps = [
m * 30 for m in months
] # This is close enough without worrying about calendars etc.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above. I guess dealing with calendars is a whole can of worms, but I am thinking it might be worth to put some of these thoughts into a dedicated issue to keep track of it? Again for now this seems fine for me!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that sounds like a plan

else:
# this needs some more careful treatment for other timefrequencies.
raise NotImplementedError(
"Only daily and monthly time frequencies are currently supported"
)
return timesteps


Expand Down