Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for avoiding serialization error when datetimes are > 2263 #6654

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

frode-aarstad
Copy link
Contributor

@frode-aarstad frode-aarstad commented Nov 23, 2023

Issue
Resolves

Approach
We now use string not datetime to represent time values in responses and observations.
We convert responses before we write them to file and convert observations after reading them from file.

Pre review checklist

  • Read through the code changes carefully after finishing work
  • Make sure tests pass locally (after every commit!)
  • Prepare changes in small commits for more convenient review (optional)
  • PR title captures the intent of the changes, and is fitting for release notes.
  • Updated documentation
  • Ensured that unit tests are added for all new behavior (See
    Ground Rules),
    and changes to existing code have good test coverage.

Pre merge checklist

  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.

@frode-aarstad frode-aarstad self-assigned this Nov 23, 2023
@frode-aarstad frode-aarstad marked this pull request as draft November 23, 2023 14:05
@frode-aarstad frode-aarstad force-pushed the datetime-problem branch 3 times, most recently from 7fa7d4c to bf4a3f7 Compare November 24, 2023 13:56
@frode-aarstad frode-aarstad marked this pull request as ready for review November 27, 2023 07:21
@codecov-commenter
Copy link

codecov-commenter commented Nov 27, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.37%. Comparing base (91ea363) to head (0ef0944).
Report is 924 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6654   +/-   ##
=======================================
  Coverage   84.37%   84.37%           
=======================================
  Files         367      367           
  Lines       21856    21862    +6     
  Branches      900      900           
=======================================
+ Hits        18440    18446    +6     
  Misses       3122     3122           
  Partials      294      294           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -217,8 +217,15 @@ def load_responses(
if not input_path.exists():
raise KeyError(f"No response for key {key}, realization: {realization}")
ds = xr.open_dataset(input_path, engine="scipy")
if "time" in ds.coords:
ds.coords["time"] = [
datetime.fromisoformat(str(e.values).split(".", maxsplit=1)[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we losing the possibility for millisecond accuracy here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes with this implementation we do. Do we really need milliseconds? I think we might be able to also keep ms but nanoseconds are not possible

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uncertain whether we "need" it, but Eclipse can output milliseconds. Whether resdata supports it I am not quite sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we perhaps do the millisecond truncation in a more canonical way and not by string manipulation? Parsing the ISO string as is, and then set the millisecond part to zero using the datetime api?

@frode-aarstad frode-aarstad marked this pull request as draft November 29, 2023 15:52
@frode-aarstad frode-aarstad marked this pull request as ready for review December 1, 2023 09:26
Copy link
Collaborator

@oyvindeide oyvindeide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good change! How will this work with older versions of storage? I guess we will probably need a migration for it?

@@ -329,6 +329,12 @@ def _get_obs_and_measure_data(
}
observation = observation.sel(sub_selection)
ds = source_fs.load_responses(group, tuple(iens_active_index))

if "time" in observation.coords:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we potentially encode this information in the xarray dataset, so we can do the conversion when loading it? Then it can be more generic, so we can potentially have other axis named something other then time, and also benefit from this improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure what you mean here. I guess we could add an attr to xarray.dataset that indicates which columns that have been encoded. Ie: attr({"column_name" : "datetime_to_string", ..})

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, something like that, so we can do: if datetime_to_string in atts: ...

continue
print(line, end="")

facade = LibresFacade.from_config_file("snake_oil.ert")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should avoid LibresFasade here in light of removing it in #6687

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need it for facade.load_from_forward_model below

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be a stupid question, but why? I just tried and the test is passing without this line as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you want to check that it can be loaded without errors, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I might be missing something, but the test does not fail when I add it to main and run 🤷‍♀️ I would expect it to fail without your fix, am I doing something wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your observations are correct. I have added another assert to check that we actually load whats written to the runpath


facade = LibresFacade.from_config_file("snake_oil.ert")
realisation_number = 0
storage = open_storage(facade.enspath, mode="w")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
storage = open_storage(facade.enspath, mode="w")
storage_path = ErtConfig.from_file("snake_oil.ert").ens_path
storage = open_storage(storage_path, mode="w")

@frode-aarstad frode-aarstad force-pushed the datetime-problem branch 4 times, most recently from 0c96124 to 9724536 Compare January 9, 2024 08:27
observation.name: xr.open_dataset(observation, engine="scipy")
for observation in observations
}
dict = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dict is a built-in keyword in Python so I think we should avoid using it.

ds = xr.open_dataset(observation, engine="scipy")
if "time" in ds.coords:
ds.coords["time"] = [
t[:-3] for t in ds.coords["time"].values.astype(str)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a comment I think.

@frode-aarstad frode-aarstad force-pushed the datetime-problem branch 2 times, most recently from 953b0f1 to 5667e39 Compare January 18, 2024 11:26
Copy link
Contributor

@dafeda dafeda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great 👍

@@ -305,13 +305,15 @@ def _get_obs_and_measure_data(
for obs_key, obs_active_list in selected_observations:
observation = observations[obs_key]
group = observation.attrs["response"]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be missing something here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its just an extra line

@dafeda
Copy link
Contributor

dafeda commented Mar 13, 2024

What's the status of this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants