How to create an empty tree with several var-length branches that share the var length #759
Comments
Related: scikit-hep/awkward#1805 |
A workaround for my situation is this:

with uproot.recreate(file) as f:
    for ichunk in range(chunks):
        px = []
        py = []
        for event in generator(ichunk):
            px.append(event.px)
            py.append(event.py)
        # Zipping px and py lets Uproot write one shared counter branch.
        data = {"": ak.zip({"px": ak.Array(px), "py": ak.Array(py)})}
        if "tree" in f:
            f["tree"].extend(data)
        else:
            f["tree"] = data
|
Adopting {"n": "int32", "x": "n * float32", "y": "n * float32"} would be hard because

About adding your workaround to the docs, I'm trying to figure out how it would fit in. Uproot currently has only one tutorial, the Getting Started Guide. I'm trying to understand your original problem well enough that this code block would be a helpful solution to the stated problem.

Oh! I get it: you have two Awkward Arrays with {"n": "int32", "x": "var * float32", "y": "var * float32"} with TBranches {"": ak.zip({"x": x, "y": y})} (though I usually name the outer field, but this is fine).

The question for documentation, then, is where it should go in that Getting Started Guide (because creating a new top-level tutorial for this would make people wonder why it's singled out like this). There's a "Writing TTrees to a file" section, then an "Extending TTrees with large datasets" section, then "Specifying the compression". Probably before "Specifying the compression". How about this?

Ragged arrays with a shared "counter" TBranch

Often, a computation on Awkward Arrays results in several jagged arrays that have the same list lengths, list by list. For example, suppose we have

>>> pt = ak.Array([[0.0, 11, 22], [], [33, 44], [55], [66, 77, 88, 99]])
>>> eta = ak.Array([[0.0, 1.1, 2.2], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]])

The lengths of all the lists in pt and eta are

>>> ak.num(pt)
<Array [3, 0, 2, 1, 4] type='5 * int64'>
>>> ak.num(eta)
<Array [3, 0, 2, 1, 4] type='5 * int64'>

which are the same.

>>> ak.all(ak.num(pt) == ak.num(eta))
True

Since dynamic-length arrays in ROOT require a "counter" branch (see Writing TTrees to a file, above), simply putting these jagged arrays into a TTree results in a separate "counter" branch for each array:

>>> file["tree6"] = {"Muon_pt": pt, "Muon_eta": eta}
>>> file["tree6"].show()
name                 | typename                 | interpretation
---------------------+--------------------------+-------------------------------
nMuon_pt             | int32_t                  | AsDtype('>i4')
Muon_pt              | double[]                 | AsJagged(AsDtype('>f8'))
nMuon_eta            | int32_t                  | AsDtype('>i4')
Muon_eta             | double[]                 | AsJagged(AsDtype('>f8'))

Only one "counter" branch is needed, so this is a waste of disk space as well as hiding the fact that Muon_pt and Muon_eta have the same list lengths.

In Awkward Array, jagged arrays with the same length lists can be zipped together with ak.zip, and Uproot recognizes zipped arrays as ones that can have a common "counter" branch. If you have this case, you'll most likely want to zip your jagged arrays together at some point before writing to a file. Here's an example of zipping immediately before writing to a file:

>>> file["tree7"] = {"Muon": ak.zip({"pt": pt, "eta": eta})}
>>> file["tree7"].show()
name                 | typename                 | interpretation
---------------------+--------------------------+-------------------------------
nMuon                | int32_t                  | AsDtype('>i4')
Muon_pt              | double[]                 | AsJagged(AsDtype('>f8'))
Muon_eta             | double[]                 | AsJagged(AsDtype('>f8'))

If you need to declare the TTree before filling it using mktree (see Extending TTrees with large datasets, above), the type given by the zipped array is what you want:

>>> muons = ak.zip({"pt": pt, "eta": eta})
>>> file.mktree("tree8", {"Muon": muons.type})
<WritableTree '/tree8' at 0x00011eda67d0>
>>> file["tree8"].extend({"Muon": ak.zip({"pt": pt, "eta": eta})})
>>> file["tree8"].show()
name                 | typename                 | interpretation
---------------------+--------------------------+-------------------------------
nMuon                | int32_t                  | AsDtype('>i4')
Muon_pt              | double[]                 | AsJagged(AsDtype('>f8'))
Muon_eta             | double[]                 | AsJagged(AsDtype('>f8'))

As a string,

>>> print(muons.type)
5 * var * {pt: float64, eta: float64}

so

>>> file.mktree("tree8", {"Muon": "var * {pt: float64, eta: float64}"})

is equivalent to the above.

That should cover it, right? (If so, resolving this issue would be a matter of copy-pasting the above into the Uproot documentation. I hope I don't have to convert it to reST...)
|
I want to write several variable-length arrays that have the same length to an empty tree. https://uproot.readthedocs.io/en/latest/basic.html#extending-ttrees-with-large-datasets explains how to create a tree that can be extended later, which I need.

What I am missing is the combination of the two. I want to start with an empty tree like the one described there, but with branches that share the variable length. Then I want to extend this tree subsequently. Is this possible? If not, may I suggest the following intuitive API:
Instead of {"x": "var * float32", "y": "var * float32"}, which generates two extra branches "nx" and "ny", please allow this declaration, where the counting branch is explicit:
{"n": "int32", "x": "n * float32", "y": "n * float32"}
As the Zen of Python says, explicit is better than implicit.
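To make the idea of a shared counting branch concrete, here is a small pure-Python sketch of the layout the declaration above describes: one integer n per entry, with x and y each contributing n values for that entry. It does not use Uproot and is not a description of ROOT's on-disk format. The helper names flatten and unflatten are hypothetical and exist only for this illustration.

```python
def flatten(jagged):
    """Split a list of lists into per-entry counts and flat content."""
    counts = [len(sub) for sub in jagged]
    content = [value for sub in jagged for value in sub]
    return counts, content


def unflatten(counts, content):
    """Rebuild the list of lists from per-entry counts and flat content."""
    out, start = [], 0
    for n in counts:
        out.append(content[start:start + n])
        start += n
    return out


x = [[0.0, 1.0, 2.0], [], [3.0, 4.0]]
y = [[0.5, 1.5, 2.5], [], [3.5, 4.5]]

n_x, flat_x = flatten(x)
n_y, flat_y = flatten(y)

# When the list lengths agree entry by entry, a single counter serves both.
shared = n_x == n_y
n = n_x

restored_x = unflatten(n, flat_x)
restored_y = unflatten(n, flat_y)
```

Because n_x and n_y are identical here, storing both would duplicate information, which is the redundancy the separate "nx" and "ny" branches introduce.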