
perf: removing output arrays from reference cycles so they don't have to wait for GC #1305

Merged

Conversation

jpivarski
Member

No description provided.

@martindurant

--- a/src/uproot/behaviors/TBranch.py
+++ b/src/uproot/behaviors/TBranch.py
@@ -212,10 +212,12 @@ def iterate(
                             report = report.to_global(global_offset)
                             popper = [arrays]
                             del arrays
+                            del item
                             yield popper.pop(), report

                         else:
                             popper = [library.global_index(item, global_offset)]
+                            del item
                             yield popper.pop()

produces objgraph output (screenshot attached, 2024-10-11)

I'm happy to run your memory-over-time graph with this change, if you have a script. The inner code from the Slack thread was the following; did you run it under memray?

import gc
import uproot

filenames = [{"/tmp/zlib9-jagged0.root": "tree"}] * 5

for batch in uproot.iterate(filenames, step_size="1 GB"):
    # gc.collect()
    print(repr(batch))

@jpivarski
Member Author

I can't believe I missed that! If I'm reading this right, item is the second-to-last step in the transformation that produces the final arrays (by mapping a local index to a global one).

I don't have scripts to produce the memray plots from Slack—I'd lose them anyway, if I kept them—so the messages in Slack include the code.
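For context, the popper/del pattern in the diff can be demonstrated in a minimal, uproot-free sketch (the class and function names here are illustrative, not uproot's; the behavior described reflects CPython refcounting):

```python
import weakref

class Big:
    """Stand-in for a large output array."""

def naive():
    obj = Big()
    yield obj  # the suspended generator frame still references obj

def popper_style():
    obj = Big()
    popper = [obj]
    del obj             # the frame no longer holds obj directly
    yield popper.pop()  # ...and the list is emptied before yielding

# In CPython, the naive generator pins the yielded object between
# iterations; the popper-style generator does not.
gen1 = naive()
ref1 = weakref.ref(next(gen1))
print(ref1() is None)  # False: kept alive by the suspended frame

gen2 = popper_style()
ref2 = weakref.ref(next(gen2))
print(ref2() is None)  # True: only the consumer held a reference
```

This is why `del arrays` / `del item` before the `yield popper.pop()` matters: without them, the generator frame keeps the previous batch alive for a whole extra iteration.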

@jpivarski jpivarski marked this pull request as ready for review October 11, 2024 18:57
@jpivarski jpivarski marked this pull request as draft October 11, 2024 18:57
@jpivarski
Member Author

I made a to-do item to finish this off. I'll include your diff.

@martindurant

My loop for lowest memory usage was like this:

for i, batch in enumerate(uproot.iterate(filenames, step_size="1 GB")):
    # gc.collect()
    if i:
        objgraph.show_growth()
    else:
        objgraph.growth()
    print(repr(batch))
    del batch
  • note the final del: it means only one array is live at a time; otherwise the previous one doesn't get cleaned up until the next iteration's result is ready to be assigned into batch
  • including objgraph in the loop magically stops the leak/accumulation; does it cause a gc collect? Without objgraph, I still get big memory growth.
  • objgraph in a stable run shows growth as follows, but most iterations have no growth at all
ReferenceType                  4400       +23
builtin_function_or_method     4269       +23
tuple                         12593        +1

(so not very useful; I don't know what ReferenceType is)

I am thinking there may be many of these non-functional-programming invocations down the graph (with or without generators) that are making refcounting hard for Python.

@jpivarski
Member Author

  • including objgraph in the loop magically stops leak/accumulation; does it cause a gc collect?

I discovered the same thing with Pympler: it was clearly calling gc.collect() internally. Either that or it allocated so many temporaries that it triggered GC collection the normal way.
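That guess is easy to check in general terms: CPython's cyclic GC runs automatically whenever net allocations since the last pass exceed the generation-0 threshold, so any tool that churns through many temporaries can trigger collections as a side effect. A stdlib-only sketch:

```python
import gc

# Threshold triple for generations 0, 1, 2: a gen-0 collection runs when
# (allocations - deallocations) since the last pass exceeds the first value.
print(gc.get_threshold())  # (700, 10, 10) by default
print(gc.get_count())      # current per-generation counters

class Node:
    pass

gc.disable()  # ensure the cycle survives until our explicit collect
a, b = Node(), Node()
a.other, b.other = b, a   # a cycle that refcounting alone cannot free
del a, b
collected = gc.collect()  # returns the number of unreachable objects found
gc.enable()
print(collected > 0)
```

So either explanation is plausible: Pympler may call `gc.collect()` directly, or simply allocate enough objects to push the gen-0 counter past the threshold.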

@jpivarski
Member Author

jpivarski commented Oct 14, 2024

Well, I added that commit (7cfc360), but it didn't fix the original memory issue.

Here's memray of

import gc
import uproot

filenames = [{"/tmp/zlib9-jagged0.root": "tree"}] * 5

for batch in uproot.iterate(filenames, step_size="1 GB"):
    # gc.collect()
    print(repr(batch))

in this PR, before the commit:

[memray flamegraph: before the commit]

Here's after the commit:

[memray flamegraph: after the commit]

And, just for reference, here it is with the gc.collect() line uncommented (i.e. explicitly collect garbage in every iteration).

[memray flamegraph: with explicit gc.collect()]

Even if objgraph isn't showing it, the big arrays are still held until garbage collection.

I'm setting up this PR to be merged, anyway, since it's a step forward, at least.

@jpivarski jpivarski marked this pull request as ready for review October 14, 2024 16:46
@martindurant

Yeah; I spent some time trying to find out where the cycle(s) is, but no luck. The code goes deep, and I couldn't figure out which TBranch/basket/etc. refer to one another. Doing more pop() tricks helps remove one object from a generator at a time, but doesn't account for the accumulation of cyclic references.

@pfackeldey
Collaborator

Some more investigations for cyclic references in uproot:

  • using library="np" with uproot.iterate doesn't really change anything -> the cyclic ref(s) are definitely in uproot (maybe also in awkward though)
  • the refcycle package makes it possible to visualize the (strong) cyclic references.
    Run the script below with uv run --with refcycle,uproot uproot_refcycles.py:
# uproot_refcycles.py
import gc
import uproot
import refcycle


filenames = [{"~/Downloads/zlib9-jagged0.root": "tree"}] * 5


def loop():
    for i, batch in enumerate(uproot.iterate(filenames, step_size="1 GB", library="np")):
        # let's just run 2 iterations
        if i:
            break


if __name__ == "__main__":
    graph = refcycle.creators.cycles_created_by(loop)

    # this filters the graph by the subgraphs that can't be 
    # properly cleaned up by GC because they contain strong cyclic refs
    cyclic_refs = graph.source_components()

    # plot the subgraphs that contain problematic cyclic refs
    for i, cyclic in enumerate(cyclic_refs):
        cyclic.export_image(f"cyclic_{i}.pdf")

As output we get 3 plots (attached).
cyclic_0.pdf
cyclic_1.pdf
cyclic_2.pdf

I'm not sure how to proceed from here, especially given how cyclic_0.pdf looks; am I misunderstanding something here?
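As a cross-check of refcycle's output, the stdlib alone can enumerate the objects that only the cyclic collector could free (a generic sketch with an illustrative class `A`, not uproot's classes):

```python
import gc

class A:
    pass

def make_cycle():
    a, b = A(), A()
    a.peer, b.peer = b, a  # a.peer -> b, b.peer -> a: a strong cycle

gc.collect()                    # start from a clean slate
gc.set_debug(gc.DEBUG_SAVEALL)  # keep collected objects in gc.garbage
make_cycle()
n = gc.collect()                # number of unreachable objects found
gc.set_debug(0)
cyclic_types = {type(o).__name__ for o in gc.garbage}
print(n > 0, "A" in cyclic_types)
gc.garbage.clear()
```

This won't draw graphs like refcycle does, but it quickly confirms which types are participating in collector-only cycles.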

@martindurant

So it is TTree and TBranch. This is the half-conclusion I had come to too, but I couldn't see it. Branch refers back to its Tree?? Each of the Att objects in the list hanging off Tree refers back to it??

The closure ones were already seen by @jpivarski and maybe fixed by the yield pop trick.

@jpivarski
Member Author

TBranch does refer back to its TTree. Since more data might be read from an open ROOT file at any time, and those reads may be triggered by an object that came from a ROOT file, all extracted objects point back to the thing they were extracted from:

Source (low-level byte-reading) ← ReadOnlyFile ← ReadOnlyDirectory ← TTree ← TBranch

That's how it's possible to read an array by calling TBranch.array: it has a path back to the Source, which is what gets bytes from disk. In the above chain, leftward arrows are references, but getting data in the rightward direction (e.g. reading a TTree from a ReadOnlyDirectory) is usually not a reference but a read (ReadOnlyDirectory produces a TTree by reading bytes from disk and creating a new Python object). The one exception above is that a TTree has a reference → its TBranches, because the TTree block of data on disk includes all of its TBranches: you can't read a TTree without also reading all of the TBranch objects. So yes, there is a reference cycle between TTree and TBranch.

But then, why are any arrays referenced from that reference cycle? Arrays are not part of the TTree-TBranch block of data on disk—they can be (and are) independently read. When we call TTree.arrays or TBranch.array, it triggers a read, interprets that data as arrays, and returns new array objects without any references from the arrays to the TTree or TBranch and without any references from the TTree or TBranch to the arrays.

There is an optional array_cache, a MutableMapping attached to the ReadOnlyFile, that holds recently read arrays with LRU eviction, but even if you opt into that cache, the default size is "100 MB", so it will not hold GB of arrays. They'd be evicted.

@martindurant

Can we try to break a few of these cycles with weakref to see if it makes a difference?
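For what it's worth, the usual shape of that change looks like this (illustrative Tree/Branch classes, not uproot's actual ones): the child keeps only a weak reference to its parent, so the parent-child pair is freed by plain refcounting instead of waiting for the cyclic collector.

```python
import gc
import weakref

class Tree:
    def __init__(self):
        self.branches = [Branch(self)]  # strong ref: Tree -> Branch

class Branch:
    def __init__(self, tree):
        self._tree = weakref.ref(tree)  # weak ref breaks the cycle

    @property
    def tree(self):
        tree = self._tree()
        if tree is None:
            raise RuntimeError("parent Tree has been garbage-collected")
        return tree

t = Tree()
assert t.branches[0].tree is t  # navigation in both directions still works

ref = weakref.ref(t)
gc.disable()  # prove that refcounting alone suffices
del t
print(ref() is None)  # True: no cycle, so the pair dies immediately
gc.enable()
```

The trade-off is the one jpivarski raises below: a weak parent pointer means the parent can disappear while a child is still in use, which is unacceptable if the child needs the parent to read more data.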

@jpivarski
Member Author

@pfackeldey, you can break the object cycles (the references between TTree and TBranch) provisionally, but in the long term, we need them. The real mystery is why the arrays are still connected. It's the arrays that need to be detached.

@martindurant

I'm not sure how gc counts things, but is it plausible that a large number of cycles like this delays other, simpler cycles from getting cleaned up, like those in the generator closures?

@pfackeldey
Collaborator

pfackeldey commented Nov 1, 2024

Applying this patch:

diff --git a/src/uproot/behaviors/TBranch.py b/src/uproot/behaviors/TBranch.py
index 2d27785..64a36c3 100644
--- a/src/uproot/behaviors/TBranch.py
+++ b/src/uproot/behaviors/TBranch.py
@@ -860,26 +860,29 @@ class HasBranches(Mapping):
                             cache_key = f"{self.cache_key}:{expression}:{interpretation.cache_key}:{entry_start}-{entry_stop}:{library.name}"
                         array_cache[cache_key] = arrays[branch.cache_key]
 
-        output = language.compute_expressions(
-            self,
-            arrays,
-            expression_context,
-            keys,
-            aliases,
-            self.file.file_path,
-            self.object_path,
-        )
-
-        # no longer needed; save memory
-        del arrays
 
-        expression_context = [
-            (e, c) for e, c in expression_context if c["is_primary"] and not c["is_cut"]
-        ]
-
-        return _ak_add_doc(
-            library.group(output, expression_context, how), self, ak_add_doc
-        )
+        return arrays
+        # output = language.compute_expressions(
+        #     self,
+        #     arrays,
+        #     expression_context,
+        #     keys,
+        #     aliases,
+        #     self.file.file_path,
+        #     self.object_path,
+        # )
+
+        # # no longer needed; save memory
+        # del arrays
+
+        # expression_context = [
+        #     (e, c) for e, c in expression_context if c["is_primary"] and not c["is_cut"]
+        # ]
+
+        # return _ak_add_doc(
+        #     library.group(output, expression_context, how), self, ak_add_doc
+        # )
+        # return output
 
     def iterate(
         self,

seems to fix the issue.
Tested with:

import uproot


def loop():
    fname, tname, n = next(iter(uproot.num_entries("~/Downloads/zlib9-jagged0.root:tree")))

    tree = uproot.open(fname, array_cache=None, object_cache=None)["tree"]

    how_often = 8
    start = 0
    stepsize = n // how_often
    for _ in range(how_often):
        batch = tree.arrays(entry_start=start, entry_stop=start + stepsize, library="np")
        print(repr(batch))
        start += stepsize


if __name__ == "__main__":
    loop()

I checked that

_ak_add_doc(
     library.group(output, expression_context, how), self, ak_add_doc
)

is not the problem, so I think the issue has to be in language.compute_expressions.

It's the same for uproot.iterate: https://github.com/scikit-hep/uproot5/blob/main/src/uproot/behaviors/TBranch.py#L1089-L1097

Memory profile:
[memray memory profile screenshot, 2024-11-01]

@pfackeldey
Collaborator

The problem is fixed by:

diff --git a/src/uproot/language/python.py b/src/uproot/language/python.py
index e3ce883..a189569 100644
--- a/src/uproot/language/python.py
+++ b/src/uproot/language/python.py
@@ -516,6 +516,7 @@ class PythonLanguage(uproot.language.Language):
                 else:
                     output[name] = output[name][cut]
 
+        values.clear()
         return output
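The reason a one-line clear() works can be shown in a minimal, uproot-free sketch (hypothetical names; in the real code, `values` is a dict inside compute_expressions): the surrounding reference cycle still has to wait for the GC, but after the clear it no longer pins the large arrays themselves.

```python
import gc
import weakref

class BigArray:
    """Stand-in for a large output array."""

def compute(clear_values):
    values = {"x": BigArray()}
    output = dict(values)        # what the caller receives
    cycle = {"values": values}   # simulate the cycle the frames end up in
    cycle["self"] = cycle
    if clear_values:
        values.clear()           # cycle survives, but no longer pins the array
    return output

gc.disable()  # only refcounting runs, as between automatic GC passes
try:
    out = compute(clear_values=False)
    ref = weakref.ref(out["x"])
    del out
    still_pinned = ref() is not None      # held via the uncollected cycle

    out = compute(clear_values=True)
    ref = weakref.ref(out["x"])
    del out
    freed_immediately = ref() is None     # refcounting freed it at once
finally:
    gc.enable()
    gc.collect()  # clean up the leaked cycles

print(still_pinned, freed_immediately)
```

In other words, the cycle itself is tolerable because it is small; the fix makes sure the multi-GB payloads are not reachable from it.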

@jpivarski jpivarski changed the title perf: desperate attempts to unlink output arrays perf: removing output arrays from reference cycles so they don't have to wait for GC Nov 5, 2024
@pfackeldey pfackeldey self-requested a review November 5, 2024 22:32
@jpivarski jpivarski enabled auto-merge (squash) November 5, 2024 23:07
@jpivarski jpivarski merged commit d7d9d30 into main Nov 5, 2024
26 checks passed
@jpivarski jpivarski deleted the jpivarski/desperate-attempts-to-unlink-output-arrays-2 branch November 5, 2024 23:16