-
Notifications
You must be signed in to change notification settings - Fork 276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Find output by its attribute #4717
base: main
Are you sure you want to change the base?
Conversation
Quick input:
|
Not sure I completely agree with @neutrinoceros about limiting the flexibility. The question the other day about flash datasets with particles would suggest there may be times when we don't know the attribute someone will be looking for. Can you make it use "Price Is Right" rules? i.e., closest-without-going-over? |
I am not sure I understand what you mean by "Price Is Right" rule. Anyways, in its current state, it is already possible to query a dataset by attribute, but the method is hidden ( Let me know what you think! I can always strip the leading |
I mean, if I want the closet to z=1 but definitely on one side or the
other, for instance. So having 0.99 take precedence over 1.01. (or even
0.95 over 1.001)
…On Wed, Nov 1, 2023, 12:17 PM Corentin Cadiou ***@***.***> wrote:
I am not sure I understand what you mean by "Price Is Right" rule.
Anyways, in its current state, it is already possible to query a dataset by
attribute, but the method is hidden (_get_by_attribute), because “with
great power comes great responsibility”.
Let me know what you think! I can always strip the leading _ to make it
more user-facing, with an obvious comment in the docstring warning against
misuses?
—
Reply to this email directly, view it on GitHub
<#4717 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAVXOYJQUYAJMHRHRFESSDYCJ7ZHAVCNFSM6AAAAAA6S75QZ2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOBZGM2DKNZZGU>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Oh I see, yes, I'll implement a |
I have nothing against that "the price is right rule", but I'd like to point out that left-to-right reading order isn't culturally neutral. I'd prefer that the interface be more inclusive. |
@neutrinoceros I've implemented a "smaller", "larger" or "nearest" switch so that there is no ambiguity arising from left-to-right vs. right-to-left reading order. Thanks for pointing it out! |
I think the new language is better. That being said, it's not necessarily true that RTL languages have different number line concepts: There's also some evidence that filmmaking conventions (such as progression from the left side of the screen to the right) are also commonly used outside of left-to-right languages. I will say that with redshift the entire situation is odd because the numeric progression is reversed from temporal progression. And I did choose redshift as my example... So I guess this is on me! |
@yt-fido test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functionality is useful and the implementation is sound ! I have a couple questions and suggestions (mostly about type annotations)
# This will fail if no output is found within 100 Myr | ||
ds = ts.get_by_time((3, "Gyr"), tolerance=(100, "Myr")) | ||
# Get the output at the time right before and after 3 Gyr | ||
ds_before = ts.get_by_time((3, "Gyr"), side="smaller") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think the name of the argument could be clearer than side
. What about prefer="smaller"
?
yt/data_objects/time_series.py
Outdated
|
||
Parameters | ||
---------- | ||
key : str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is out of sync with the argument nam
yt/data_objects/time_series.py
Outdated
|
||
# Use a binary search to find the closest value | ||
iL = 0 | ||
iH = len(self._pre_outputs) - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what H
stands for here. I would expect iL
and iR
(left/right) be used instead, can you explain ?
yt/data_objects/time_series.py
Outdated
else: | ||
raise ValueError( | ||
f"{dsL} and {dsH} have both {attribute}={vL}, cannot perform search. " | ||
"Try with another key." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to me like a poor suggestion, I'd rather have none at all.
"Try with another key." |
yt/data_objects/time_series.py
Outdated
@@ -2,13 +2,15 @@ | |||
import glob | |||
import inspect | |||
import os | |||
import typing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import typing | |
from typing import TYPE_CHECKING |
yt/data_objects/time_series.py
Outdated
@@ -28,6 +30,9 @@ | |||
parallel_root_only, | |||
) | |||
|
|||
if typing.TYPE_CHECKING: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if typing.TYPE_CHECKING: | |
if TYPE_CHECKING: |
@yt-fido test this please |
1 similar comment
@yt-fido test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
I think the case of choosing a value exactly between two datasets needs to either explicitly handled or have its behavior defined in the docstring (see comments), but otherwise this is very useful!
def test_init_fake_dataseries(): | ||
file_list = [f"fake_data_file_{str(i).zfill(4)}" for i in range(10)] | ||
@pytest.fixture | ||
def FakeDataset(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this fixture seems potentially useful beyond the scope of just timeseries, you could consider moving it up to conftest.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see now that this PR is not introducing FakeDataset
but converting it to a pytest fixture. So probably not worth moving this up to conftest.py
... though now that I know it's here I might do so in the future if I want to use it :)
|
||
def _set_code_unit_attributes(self): | ||
return | ||
def test_get_by_key(FakeDataset, fake_datasets): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to test the preference
kwarg as well here.
# tear down to avoid possible breakage in following tests | ||
output_type_registry.pop("FakeDataset") | ||
with pytest.raises(ValueError): | ||
ts.get_by_redshift(1000, tolerance=0.1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: this actually causes the test to fail. picking a point exactly between two timesteps will always return the larger regardless of the prefer
value. See comment below.
ts.get_by_redshift(1000, tolerance=0.1) | |
ts.get_by_redshift(1000, tolerance=0.1) | |
assert sfile_list[0] == ts.get_by_redshift(1/2, prefer='smaller').filename | |
assert sfile_list[1] == ts.get_by_redshift(1/2, prefer='larger').filename |
dsL = dsR = dsM | ||
break | ||
|
||
if prefer == "smaller": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm not sure if this needs to happen up in the while
loop or down here... but I think you need to explicitly handle the case where the chosen value is exactly halfway between two timesteps, particularly when a value for prefer
has been provided. As this is, the larger value will always be returned when the provided value is halfway, regardless of prefer
(see my suggested change in test_time_series.py
). If you prefer the current behavior then I think the behavior should be documented in the docstring and maybe even a log message (in which case you can ignore my suggested change to the test).
Co-authored-by: Clément Robert <[email protected]>
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
@yt-fido test this please |
@yt-fido Test this please |
PR Summary
This provides the ability to quickly find the dataset that has the closest value to a given one (e.g. time or redshift). Example
PR Checklist