Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAS-2155 - Merge datatree documentation into main docs. #9033

Merged
merged 76 commits into from
Sep 12, 2024

Conversation

owenlittlejohns
Copy link
Contributor

This PR migrates documentation for datatree from xarray/datatree_/docs into the main doc directory.

It seems a little iffy that a bunch of the references to the DataTree class, associated methods and other migrated functions all need to point to lower level locations (such as xarray.core.datatree.DataTree. When DataTree and friends are added to the public xarray API these references can be significantly simplified. This makes me wonder if we should couple the documentation change with that ability to access these classes from the main public API? (And if so, whether this PR is the time to do it?)

  • Completes documentation migration step of Track merging datatree into xarray #8572
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

@keewis
Copy link
Collaborator

keewis commented May 19, 2024

I would be fine with adding DataTree and open_datatree to the top-level module in this PR: adding documentation for DataTree to me means that we're publicly exposing it.

The only alternative I can think of would be to do this in a separate PR and merge that, then sync this PR with main.

@TomNicholas
Copy link
Contributor

I agree with @keewis

The only alternative I can think of would be to do this in a separate PR and merge that, then sync this PR with main.

This might a little neater, but the result is the same either way.

@TomNicholas TomNicholas added the topic-DataTree Related to the implementation of a DataTree class label May 20, 2024
@owenlittlejohns
Copy link
Contributor Author

@TomNicholas - So the alternative here is exposing things in the public API, then updating this PR to have the correct references and merging this afterwards? (Just checking I read your last comment correctly)

Maybe I'm being a bit keen, but I'm tempted to keep both pieces (adding to the main API and including DataTree in the documentation) in the same PR to keep the documentation and public API in-sync.

@owenlittlejohns
Copy link
Contributor Author

I have added open_datatree, DataTree, register_datatree_accessor, map_over_subtree and assert_isomorphic to the public API. The DataTree public API has some other exceptions in it (TreeIsomorphismError, InvalidTreeError and NotFoundInTreeError), but I was going by the things listed in #8572.

Copy link
Contributor

@flamingbear flamingbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are lots of little bitty things, but this looks great and the docs read well.

doc/roadmap.rst Outdated Show resolved Hide resolved
doc/user-guide/datatree.rst Outdated Show resolved Hide resolved
doc/user-guide/data-structures.rst Outdated Show resolved Hide resolved
doc/user-guide/data-structures.rst Outdated Show resolved Hide resolved
doc/user-guide/data-structures.rst Outdated Show resolved Hide resolved
doc/user-guide/hierarchical-data.rst Show resolved Hide resolved
doc/user-guide/hierarchical-data.rst Show resolved Hide resolved
doc/user-guide/io.rst Show resolved Hide resolved
doc/user-guide/io.rst Outdated Show resolved Hide resolved
doc/user-guide/terminology.rst Outdated Show resolved Hide resolved
Copy link
Contributor Author

@owenlittlejohns owenlittlejohns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flamingbear - I've updated in most places you've suggested. Thanks for going through this so rigorously!

doc/user-guide/io.rst Outdated Show resolved Hide resolved
doc/user-guide/data-structures.rst Outdated Show resolved Hide resolved
doc/user-guide/data-structures.rst Show resolved Hide resolved
doc/user-guide/io.rst Show resolved Hide resolved
@TomNicholas
Copy link
Contributor

(TreeIsomorphismError, InvalidTreeError and NotFoundInTreeError)

These should probably be added to the api.rst page after MergeError.

@owenlittlejohns
Copy link
Contributor Author

@TomNicholas - to clarify your previous comment about those exceptions: would you like those 3 exceptions added to the public xarray API (TreeIsomorphismError, InvalidTreeError, NotFoundInTreeError)? Or would you just want them added to api.rst with their full current paths (pointing to e.g., xarray.core.treenode.InvalidTreeError).

I can definitely add them to the public API, but hadn't as those exceptions were not listed in #8572.

@TomNicholas
Copy link
Contributor

TomNicholas commented May 28, 2024

@owenlittlejohns however it's already done for xarray's MergeError (which I happen to remember is listed in the public api.rst page). How exactly we do this isn't particularly important (hence why I forgot to include it in #8572), we just want to be consistent with other error types that we already have exposed publicly.

@owenlittlejohns
Copy link
Contributor Author

Thanks for the guidance. I just added those three exceptions to the public API, so they are now consistent with MergeError.

flamingbear
flamingbear previously approved these changes Jun 6, 2024
@flamingbear
Copy link
Contributor

flamingbear commented Jun 6, 2024

I approved the changes here, but I don't know how to mark this as waiting for an event. (numpy 2)

@TomNicholas
Copy link
Contributor

This is mergable, but unfortunately contingent on the outcome of #9077. Perhaps we should think about exactly how much of the docs wording would need to be changed if #9077 were accepted...

@flamingbear
Copy link
Contributor

flamingbear commented Sep 12, 2024

We should alter some of these examples slightly when a solution to #9475 is merged, and I have a few small comments, but I would like to merge this now and then fix those docs along with the code fixes, as long as @shoyer is okay with it!

To be crystal clear. We can merge this and update the docs with small tweaks when #9475 is updated?

Also to add the open_goups and other additions?

I'm 100% for this but only if others are ok with that.

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Well done @flamingbear

I left a handful of mostly minor suggestions. I'm totally fine leaving some of these for follow-up work if you feel ready to merge this now.

doc/user-guide/data-structures.rst Outdated Show resolved Hide resolved
doc/user-guide/data-structures.rst Outdated Show resolved Hide resolved
doc/user-guide/data-structures.rst Outdated Show resolved Hide resolved
doc/user-guide/data-structures.rst Outdated Show resolved Hide resolved
doc/user-guide/data-structures.rst Outdated Show resolved Hide resolved
doc/user-guide/hierarchical-data.rst Show resolved Hide resolved
doc/user-guide/hierarchical-data.rst Outdated Show resolved Hide resolved
possess those dimensions, these dimensions will not be present when that
file is opened as a DataTree object.
Saving this DataTree object to file will therefore not preserve these
"unused" dimensions.
Copy link
Member

@shoyer shoyer Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may also be a good place to mention xarray's non-support for coordinates that conflict between parent and child nodes, and pointing users to open_groups as the recommended way to open such datasets.

Copy link
Contributor

@flamingbear flamingbear Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the right place for that. This is under modifying existing zarr stores. They will always be aligned based on their data model? and this warning makes it sound like you're already changing the data groups?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the point here is that somewhere in the IO docs for netCDF and Zarr there should be info or links to info about how open_datatree is not guaranteed to roundtrip but open_groups is.

doc/whats-new.rst Outdated Show resolved Hide resolved
doc/whats-new.rst Outdated Show resolved Hide resolved
@flamingbear
Copy link
Contributor

Last comments to address are being added here so when I merge this we can find them easily:

This is ready but for these. Let's do them off of main and merge this.

@flamingbear flamingbear merged commit 18e5c87 into pydata:main Sep 12, 2024
28 checks passed
@flamingbear flamingbear deleted the DAS-2155-merge-datatree-docs branch September 12, 2024 21:59
@TomNicholas
Copy link
Contributor

Amazing work @flamingbear ! Hopefully we can now return to a saner branch structure for the final push!

@TomNicholas TomNicholas mentioned this pull request Sep 15, 2024
4 tasks
dcherian added a commit to dcherian/xarray that referenced this pull request Sep 17, 2024
* main: (26 commits)
  Forbid modifying names of DataTree objects with parents (pydata#9494)
  DAS-2155 - Merge datatree documentation into main docs. (pydata#9033)
  Make illegal path-like variable names when constructing a DataTree from a Dataset (pydata#9378)
  Ensure TreeNode doesn't copy in-place (pydata#9482)
  `open_groups` for zarr backends (pydata#9469)
  Update pyproject.toml (pydata#9484)
  New whatsnew section (pydata#9483)
  Release notes for v2024.09.0 (pydata#9480)
  Fix `DataTree.coords.__setitem__` by adding `DataTreeCoordinates` class (pydata#9451)
  Rename DataTree's "ds" and "data" to "dataset" (pydata#9476)
  Update DataTree repr to indicate inheritance (pydata#9470)
  Bump pypa/gh-action-pypi-publish in the actions group (pydata#9460)
  Repo checker (pydata#9450)
  Add days_in_year and decimal_year to dt accessor (pydata#9105)
  remove parent argument from DataTree.__init__ (pydata#9465)
  Fix inheritance in DataTree.copy() (pydata#9457)
  Implement `DataTree.__delitem__` (pydata#9453)
  Add ASV for datatree.from_dict (pydata#9459)
  Make the first argument in DataTree.from_dict positional only (pydata#9446)
  Fix typos across the code, doc and comments (pydata#9443)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Sep 17, 2024
* main:
  Opt out of floor division for float dtype time encoding (pydata#9497)
  fixed formatting for whats-new (pydata#9493)
  Forbid modifying names of DataTree objects with parents (pydata#9494)
  DAS-2155 - Merge datatree documentation into main docs. (pydata#9033)
  Make illegal path-like variable names when constructing a DataTree from a Dataset (pydata#9378)
  Ensure TreeNode doesn't copy in-place (pydata#9482)
  `open_groups` for zarr backends (pydata#9469)
  Update pyproject.toml (pydata#9484)
  New whatsnew section (pydata#9483)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class topic-documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants