-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specification should provide guidance on non-required elements: flow
argument in values
/variances
and axis edges
#58
Comments
I am strongly against This probably wouldn't have come up if the parameter was named CC @HDembinski for visibility (though I'm guessing he watches this repo too). UHI intentionally does not specify We can (separately) specify recommendations like what what |
only in edge (forgive the pun) cases. Maybe it was a naive user understanding, but I thought I would be using I agree having |
Well, then, I withdraw the proposal in bullet point 1. (Bullet point 2 proposes to make hist's However, I still think that the preferred behavior of the |
I agree with Henry (that's rare!). I repeat that the most elegant API for handling the flow bins consistently is the following
Writing methods with a |
From scikit-hep/uproot5#580 (comment).
Uproot and hist both define a
flow
argument invalues
andvariances
(though the UHI specification does not require it). They are in agreement with how they use it when they produce histograms. Uproot can only produce histograms that have built-in overflow and underflow, and when either library has a histogram with a built-in flow,flow=True
includes those flow bins andflow=False
does not include those flow bins in the output. (That is,flow
affects theshape
of the return value ofvalues
andvariances
.)When a hist histogram does not have built-in flow bins, the
flow
argument does nothing:flow=True
andflow=False
both return an array of the sameshape
, representing the bin contents without flow bins.This is not what Uproot assumes when it consumes histograms:
https://github.com/scikit-hep/uproot4/blob/c968c5fa26047f0e21614854b98e07de7c4789c8/src/uproot/writing/identify.py#L261-L280
In the above, the
try
block does the wrong thing forobj
without flow bins, because I assumed thatobj.values(flow=True)
would create flow bins of value zero. (That's what the fallback code does for cases in which theflow
argument does not exist andvalues()
is equivalent tovalues(flow=False)
.) Because of that mismatch of assumptions, @andrzejnovak noticed that saving hist histograms without flow bins corrupted them and submitted PR scikit-hep/uproot5#580.Along the way, we found another discrepancy between Uproot and hist: Uproot's
axes[0].edges
is a method that takes avalues
argument and hist'saxis[0].edges
is a property that acts like Uproot'saxis[0].edges(flow=False)
. This discrepancy is unconnected to the first.I propose the following: even though the UHI specification does not require a
flow
argument invalues
/variances
oraxes[0].edges
, it should say what they should be if you do choose to implement them. They remain optional, but their behavior, if implemented, would be specified. I also propose that they be specified this way:values
/variances
has aflow
argument, thenflow=True
should return an array with flow bins andflow=False
should return an array without flow bins regardless of whether the object has built-in flow bins. This choice minimizes the information a user (such as Uproot) needs to know about the object to use it correctly. When the object does not have built-in flow values,flow=True
should create them with value equal to zero. (That's a weaker preference: value equal to nan would also make sense, though nans are contagious.)edges
(or labels, intervals, centers, or widths), they should be properties without flow bins. This is how hist is implemented, not how Uproot is currently implemented. The reason I'm deciding against Uproot's implementation is (a) the extra bins that are added whenflow=True
are constants:-inf
,inf
, the word"underflow"
, or"overflow"
, and (b) it would be complicated to specify "If implementingedges
AND implementingflow
invalues
/variances
, THEN implementflow
inedges
in such-and-such a way." Conceding to hist's behavior where Uproot and hist diverge makes the specification simpler.And then we can think about backward compatibility. This is an API breaking change, so it at least needs a minor version number.
The text was updated successfully, but these errors were encountered: