backed="r" is leaking memory #1597

agemagician · 2024-08-14T12:46:29Z

Please make sure these conditions are met

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of anndata.
(optional) I have confirmed this bug exists on the master branch of anndata.

Report

Hello,

Using the lazy loader, I am trying to load a big h5ad file that can't fit into memory. It works as expected and doesn't load the whole file into memory at the start.
However, when I read the data row by row, memory usage keeps growing until it reaches roughly the size of the file.

I expected the lazy loader to read each row and release its memory once the row is deleted, but that is not what happens.

How can we solve this issue?

Code:

import scanpy as sc
import sys
from tqdm import tqdm

file_name = "file.h5ad"

adata_r = sc.read_h5ad(file_name, backed="r")

for idx in tqdm(range(adata_r.X.shape[0])):
    row = adata_r.X[idx].toarray()
    del row

100%|▌| 800000/800000 [20:00<20:00, 700.60it/s]
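
One way to make the growth measurable is to record the process's peak resident memory while iterating. The following is a minimal sketch, not part of the original report: `peak_rss` and `sample_peak_rss` are hypothetical helper names, and it relies on the standard-library `resource` module, which is Unix-only (`ru_maxrss` is reported in KiB on Linux and in bytes on macOS).

```python
import resource
from typing import Callable, List, Tuple


def peak_rss() -> int:
    """Peak resident set size of this process (KiB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def sample_peak_rss(
    read_row: Callable[[int], object], n_rows: int, every: int = 1000
) -> List[Tuple[int, int]]:
    """Call read_row(idx) for each row and record peak RSS every `every` rows."""
    samples = []
    for idx in range(n_rows):
        row = read_row(idx)
        del row  # drop the only reference, as in the loop above
        if idx % every == 0:
            samples.append((idx, peak_rss()))
    return samples
```

With the backed object from the report, this could be called as `sample_peak_rss(lambda i: adata_r.X[i].toarray(), adata_r.X.shape[0])`. Because `ru_maxrss` is a peak value it never decreases, so a flat series means memory is being reused, while a steady climb towards the file size matches the behaviour described here.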

Versions

-----
anndata             0.10.8
scanpy              1.9.8
session_info        1.0.0
tqdm                4.66.5
-----
IPython             8.21.0
jupyter_client      8.6.0
jupyter_core        5.7.1
jupyterlab          2.3.2
notebook            6.4.10
-----
Python 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0]
Linux-5.10.219-208.866.amzn2.x86_64-x86_64-with-glibc2.35
-----
Session information updated at 2024-08-14 12:33


ilan-gold · 2024-08-14T16:47:12Z

Hello @agemagician, we will need a bit more to go on here. Your example did not reproduce the problem for me on a large dataset I had locally. Could you share your dataset? Could X be stored as CSC?
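
The CSC question can be checked without loading the matrix by reading the file's metadata. This is a sketch that assumes the file follows the anndata on-disk format in which the `X` element carries an `encoding-type` attribute (`csr_matrix`, `csc_matrix`, or `array`); the function names are hypothetical, and `h5py` must be installed for the first helper.

```python
def x_encoding_type(path: str) -> str:
    """Return the `encoding-type` attribute of X, or "unknown" if it is absent."""
    import h5py  # imported lazily so the helper below works without h5py

    with h5py.File(path, "r") as f:
        enc = f["X"].attrs.get("encoding-type", "unknown")
    # Older h5py versions may return bytes for string attributes.
    return enc.decode() if isinstance(enc, bytes) else str(enc)


def is_row_oriented(encoding_type: str) -> bool:
    """CSR keeps each row's values contiguous; CSC keeps each column's contiguous."""
    return encoding_type == "csr_matrix"
```

If `x_encoding_type("file.h5ad")` returns `csc_matrix`, a single row is not one contiguous slice on disk, so each `adata_r.X[idx]` would likely have to touch data from every column, which could explain a different memory and speed profile from a CSR file.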

agemagician added Bug 🐛 Triage 🩺 labels Aug 14, 2024

ilan-gold removed the Triage 🩺 label Aug 14, 2024

ilan-gold self-assigned this Aug 14, 2024

ivirshup added the Needs info❔ label Aug 31, 2024
