
[BUG] Potential Memory Leak #5952

Open
ilan-gold opened this issue Jul 3, 2024 · 1 comment
Labels: "? - Needs Triage" (Need team to review and classify), "bug" (Something isn't working)

Comments


ilan-gold commented Jul 3, 2024

Describe the bug
Possibly related to rapidsai/dask-cuda#1351: I am seeing a large amount of CPU (host) memory usage that I cannot account for.

Steps/Code to reproduce bug

from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask.array as da
import dask

import cupy as cp
import rmm

from rmm.allocators.cupy import rmm_cupy_allocator
from cuml.dask.decomposition import PCA

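# Run on every worker: switch RMM to managed (unified) memory and
# route CuPy allocations through RMM.
def set_mem():
    rmm.reinitialize(managed_memory=True)
    cp.cuda.set_allocator(rmm_cupy_allocator)

# A single worker on GPU 0, with the RMM setup applied to it.
cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES="0")
client = Client(cluster)
client.run(set_mem)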

M = 100_000
N = 4_000

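# Each block is an (M, N) float64 CuPy array generated on the GPU.
def make_chunk():
    return cp.random.random((M, N))

arr = da.map_blocks(
    make_chunk,
    meta=cp.array((1.,), dtype=cp.float64),
    dtype=cp.float64,
    chunks=((M,) * 50, (N,) * 1),  # 50 row chunks, 1 column chunk
)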

# Preprocessing: scale, then log1p. Without these two steps, fit_transform
# fails immediately with the MemoryError shown under "Additional context".
arr = arr.map_blocks(lambda x: x * 5, meta=arr._meta)
arr = arr.map_blocks(cp.log1p, meta=arr._meta)

pca_func = PCA(
    n_components=512, svd_solver="full", whiten=False, client=client
)
X_pca = pca_func.fit_transform(arr)
[Screenshot: 2024-07-03 14:46:17]
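For scale, the reproducer's array can be sized with a little arithmetic. The sketch below assumes float64 elements (8 bytes each) and the `M`, `N`, and 50-chunk layout from the code above; it only computes sizes and says nothing about where the memory is actually allocated.

```python
# Back-of-the-envelope sizing for the reproducer's dask array.
# Assumes float64 (8 bytes per element) and the chunk layout used above.
M = 100_000         # rows per chunk
N = 4_000           # columns per chunk
N_CHUNKS = 50       # number of row chunks
BYTES_PER_ELEM = 8  # float64

GIB = 2**30

chunk_bytes = M * N * BYTES_PER_ELEM
total_bytes = chunk_bytes * N_CHUNKS

print(f"one chunk:  {chunk_bytes / GIB:.2f} GiB")
print(f"full array: {total_bytes / GIB:.2f} GiB")
```

This prints about 2.98 GiB per chunk and 149.01 GiB for the full array, several times the 32 GB of device memory on a single Tesla V100. How much of that is live at any one time depends on how many chunks the PCA graph holds simultaneously.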

Expected behavior
I would expect there to be little to no CPU memory usage.
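One way to put numbers on the host (CPU) memory usage is to ask each worker for its own peak resident set size. A minimal sketch using only the standard library `resource` module, assuming Unix workers; `peak_rss_mib` is a hypothetical helper name, not part of dask or cuML:

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB.

    Hypothetical diagnostic helper. ru_maxrss is reported in KiB on Linux
    and in bytes on macOS, so normalize accordingly.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / 2**20
    return peak / 2**10


if __name__ == "__main__":
    print(f"peak RSS of this process: {peak_rss_mib():.1f} MiB")
```

With the reproducer's `client`, `client.run(peak_rss_mib)` runs the function on every worker and returns a dict keyed by worker address, which can be compared before and after `fit_transform`. Because `ru_maxrss` is a high-water mark, it shows the largest usage so far rather than the current usage.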

Environment details:

  • Environment location: Slurm cluster
  • Linux distro/architecture: Rocky Linux 8.9 (Green Obsidian)
  • GPU model/driver: Tesla V100-SXM3-32GB
  • CUDA: 12.3
  • Method of cuDF & cuML install: pip into a conda environment
Installed packages (`conda list` output):
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
aiobotocore               2.5.4                    pypi_0    pypi
aiohttp                   3.9.5                    pypi_0    pypi
aioitertools              0.11.0                   pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
anndata                   0.10.8                   pypi_0    pypi
anyio                     4.4.0                    pypi_0    pypi
archspec                  0.2.2              pyhd8ed1ab_0    conda-forge
argon2-cffi               23.1.0                   pypi_0    pypi
argon2-cffi-bindings      21.2.0                   pypi_0    pypi
array-api-compat          1.7.1                    pypi_0    pypi
arrow                     1.3.0                    pypi_0    pypi
asciitree                 0.3.3                    pypi_0    pypi
asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
async-lru                 2.0.4                    pypi_0    pypi
attrs                     23.2.0                   pypi_0    pypi
autocvd                   0.2.1                    pypi_0    pypi
babel                     2.15.0                   pypi_0    pypi
beautifulsoup4            4.12.3                   pypi_0    pypi
bleach                    6.1.0                    pypi_0    pypi
bokeh                     3.4.1                    pypi_0    pypi
boltons                   23.1.1             pyhd8ed1ab_0    conda-forge
botocore                  1.31.17                  pypi_0    pypi
brotli-python             1.1.0           py311hb755f60_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.26.0               hd590300_0    conda-forge
ca-certificates           2024.6.2             hbcca054_0    conda-forge
cachetools                5.3.3                    pypi_0    pypi
certifi                   2024.6.2           pyhd8ed1ab_0    conda-forge
cffi                      1.16.0          py311hb3a22ac_0    conda-forge
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
click                     8.1.7                    pypi_0    pypi
click-plugins             1.1.1                    pypi_0    pypi
cligj                     0.7.2                    pypi_0    pypi
cloudpickle               3.0.0                    pypi_0    pypi
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
colorcet                  3.1.0                    pypi_0    pypi
comm                      0.2.2                    pypi_0    pypi
conda                     23.11.0         py311h38be061_1    conda-forge
conda-content-trust       0.2.0              pyhd8ed1ab_0    conda-forge
conda-libmamba-solver     23.12.0            pyhd8ed1ab_0    conda-forge
conda-package-handling    2.2.0              pyh38be061_0    conda-forge
conda-package-streaming   0.9.0              pyhd8ed1ab_0    conda-forge
contourpy                 1.2.1                    pypi_0    pypi
coverage                  7.5.4                    pypi_0    pypi
cryptography              41.0.7          py311hcb13ee4_1    conda-forge
cuda-python               12.5.0                   pypi_0    pypi
cudf-cu12                 24.6.0                   pypi_0    pypi
cugraph-cu12              24.6.1                   pypi_0    pypi
cuml-cu12                 24.6.1                   pypi_0    pypi
cupy-cuda12x              13.2.0                   pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
dask                      2024.5.1                 pypi_0    pypi
dask-cuda                 24.6.0                   pypi_0    pypi
dask-cudf-cu12            24.6.0                   pypi_0    pypi
dask-expr                 1.1.1                    pypi_0    pypi
dask-image                2023.8.1                 pypi_0    pypi
datashader                0.16.2                   pypi_0    pypi
debugpy                   1.8.1                    pypi_0    pypi
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
decoupler                 1.7.0                    pypi_0    pypi
defusedxml                0.7.1                    pypi_0    pypi
distributed               2024.5.1                 pypi_0    pypi
distributed-ucxx-cu12     0.38.0                   pypi_0    pypi
distro                    1.9.0              pyhd8ed1ab_0    conda-forge
docrep                    0.3.2                    pypi_0    pypi
exceptiongroup            1.2.0              pyhd8ed1ab_2    conda-forge
execnet                   2.1.1                    pypi_0    pypi
executing                 2.0.1              pyhd8ed1ab_0    conda-forge
fasteners                 0.19                     pypi_0    pypi
fastjsonschema            2.20.0                   pypi_0    pypi
fastrlock                 0.8.2                    pypi_0    pypi
fiona                     1.9.6                    pypi_0    pypi
fmt                       10.2.1               h00ab1b0_0    conda-forge
fonttools                 4.53.0                   pypi_0    pypi
fqdn                      1.5.1                    pypi_0    pypi
frozenlist                1.4.1                    pypi_0    pypi
fsspec                    2023.6.0                 pypi_0    pypi
geopandas                 0.14.4                   pypi_0    pypi
gh                        2.42.1               ha8f183a_0    conda-forge
h11                       0.14.0                   pypi_0    pypi
h5py                      3.11.0                   pypi_0    pypi
httpcore                  1.0.5                    pypi_0    pypi
httpx                     0.27.0                   pypi_0    pypi
icu                       73.2                 h59595ed_0    conda-forge
idna                      3.6                pyhd8ed1ab_0    conda-forge
igraph                    0.11.5                   pypi_0    pypi
imageio                   2.34.1                   pypi_0    pypi
importlib-metadata        7.1.0                    pypi_0    pypi
inflect                   7.3.0                    pypi_0    pypi
iniconfig                 2.0.0                    pypi_0    pypi
ipykernel                 6.29.4                   pypi_0    pypi
ipython                   8.25.0             pyh707e725_0    conda-forge
isoduration               20.11.0                  pypi_0    pypi
jedi                      0.19.1             pyhd8ed1ab_0    conda-forge
jinja2                    3.1.4                    pypi_0    pypi
jmespath                  1.0.1                    pypi_0    pypi
joblib                    1.4.2                    pypi_0    pypi
json5                     0.9.25                   pypi_0    pypi
jsonpatch                 1.33               pyhd8ed1ab_0    conda-forge
jsonpointer               2.4             py311h38be061_3    conda-forge
jsonschema                4.22.0                   pypi_0    pypi
jsonschema-specifications 2023.12.1                pypi_0    pypi
jupyter-client            8.6.2                    pypi_0    pypi
jupyter-core              5.7.2                    pypi_0    pypi
jupyter-events            0.10.0                   pypi_0    pypi
jupyter-lsp               2.2.5                    pypi_0    pypi
jupyter-server            2.14.1                   pypi_0    pypi
jupyter-server-proxy      4.2.0                    pypi_0    pypi
jupyter-server-terminals  0.5.3                    pypi_0    pypi
jupyterlab                4.2.2                    pypi_0    pypi
jupyterlab-pygments       0.3.0                    pypi_0    pypi
jupyterlab-server         2.27.2                   pypi_0    pypi
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.5                    pypi_0    pypi
krb5                      1.21.2               h659d440_0    conda-forge
lazy-loader               0.4                      pypi_0    pypi
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
legacy-api-wrap           1.4                      pypi_0    pypi
leidenalg                 0.10.2                   pypi_0    pypi
libarchive                3.7.2                h2aa1ff5_1    conda-forge
libcurl                   8.5.0                hca28451_0    conda-forge
libedit                   3.1.20230828         h5eee18b_0  
libev                     4.33                 hd590300_2    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.4                h6a678d5_0  
libgcc-ng                 13.2.0               h807b86a_4    conda-forge
libgomp                   13.2.0               h807b86a_4    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libmamba                  1.5.6                had39da4_0    conda-forge
libmambapy                1.5.6           py311hf2555c7_0    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libsolv                   0.7.27               hfc55251_0    conda-forge
libsqlite                 3.44.2               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_4    conda-forge
libucx-cu12               1.15.0.post1             pypi_0    pypi
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.12.4               h232c23b_1    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
llvmlite                  0.43.0                   pypi_0    pypi
locket                    1.0.0                    pypi_0    pypi
lz4-c                     1.9.4                hcb278e6_0    conda-forge
lzo                       2.10              h516909a_1000    conda-forge
mamba                     1.5.6           py311h3072747_0    conda-forge
markdown-it-py            3.0.0                    pypi_0    pypi
markupsafe                2.1.5                    pypi_0    pypi
matplotlib                3.9.0                    pypi_0    pypi
matplotlib-inline         0.1.7              pyhd8ed1ab_0    conda-forge
mdurl                     0.1.2                    pypi_0    pypi
menuinst                  2.0.2           py311h38be061_0    conda-forge
mistune                   3.0.2                    pypi_0    pypi
more-itertools            10.3.0                   pypi_0    pypi
msgpack                   1.0.8                    pypi_0    pypi
multidict                 6.0.5                    pypi_0    pypi
multipledispatch          1.0.0                    pypi_0    pypi
multiscale-spatial-image  0.11.2                   pypi_0    pypi
natsort                   8.4.0                    pypi_0    pypi
nbclient                  0.10.0                   pypi_0    pypi
nbconvert                 7.16.4                   pypi_0    pypi
nbformat                  5.10.4                   pypi_0    pypi
ncurses                   6.4                  h59595ed_2    conda-forge
nest-asyncio              1.6.0                    pypi_0    pypi
networkx                  3.3                      pypi_0    pypi
notebook-shim             0.2.4                    pypi_0    pypi
numba                     0.60.0                   pypi_0    pypi
numcodecs                 0.12.1                   pypi_0    pypi
numpy                     1.26.4                   pypi_0    pypi
nvtx                      0.2.10                   pypi_0    pypi
ome-zarr                  0.9.0                    pypi_0    pypi
omnipath                  1.0.8                    pypi_0    pypi
openssl                   3.3.1                h4ab18f5_0    conda-forge
overrides                 7.7.0                    pypi_0    pypi
packaging                 24.1                     pypi_0    pypi
pandas                    2.2.2                    pypi_0    pypi
pandocfilters             1.5.1                    pypi_0    pypi
param                     2.1.0                    pypi_0    pypi
parso                     0.8.4              pyhd8ed1ab_0    conda-forge
partd                     1.4.2                    pypi_0    pypi
patsy                     0.5.6                    pypi_0    pypi
pbr                       6.0.0                    pypi_0    pypi
pcre2                     10.42                hcad00b1_0    conda-forge
pexpect                   4.9.0              pyhd8ed1ab_0    conda-forge
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    10.3.0                   pypi_0    pypi
pims                      0.7                      pypi_0    pypi
pip                       23.3.2             pyhd8ed1ab_0    conda-forge
platformdirs              4.1.0              pyhd8ed1ab_0    conda-forge
pluggy                    1.5.0                    pypi_0    pypi
pooch                     1.8.2                    pypi_0    pypi
profimp                   0.1.0                    pypi_0    pypi
prometheus-client         0.20.0                   pypi_0    pypi
prompt-toolkit            3.0.47             pyha770c72_0    conda-forge
psutil                    5.9.8                    pypi_0    pypi
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
pyarrow                   16.1.0                   pypi_0    pypi
pybind11-abi              4                    hd8ed1ab_3    conda-forge
pycosat                   0.6.6           py311h459d7ec_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyct                      0.5.0                    pypi_0    pypi
pygeos                    0.14                     pypi_0    pypi
pygments                  2.18.0             pyhd8ed1ab_0    conda-forge
pylibcugraph-cu12         24.6.1                   pypi_0    pypi
pylibraft-cu12            24.6.0                   pypi_0    pypi
pynndescent               0.5.13                   pypi_0    pypi
pynvjitlink-cu12          0.2.4                    pypi_0    pypi
pynvml                    11.4.1                   pypi_0    pypi
pyopenssl                 23.3.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.1.2                    pypi_0    pypi
pyproj                    3.6.1                    pypi_0    pypi
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
pytest                    8.2.2                    pypi_0    pypi
pytest-cov                5.0.0                    pypi_0    pypi
pytest-mock               3.14.0                   pypi_0    pypi
pytest-nunit              1.0.7                    pypi_0    pypi
pytest-xdist              3.6.1                    pypi_0    pypi
python                    3.11.7          hab00c5b_1_cpython    conda-forge
python-dateutil           2.9.0.post0              pypi_0    pypi
python-json-logger        2.0.7                    pypi_0    pypi
python_abi                3.11                    4_cp311    conda-forge
pytz                      2024.1                   pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
pyzmq                     26.0.3                   pypi_0    pypi
raft-dask-cu12            24.6.0                   pypi_0    pypi
rapids-dask-dependency    24.6.0                   pypi_0    pypi
rapids-singlecell         0.10.7.dev65+gc6df139          pypi_0    pypi
readline                  8.2                  h8228510_1    conda-forge
referencing               0.35.1                   pypi_0    pypi
reproc                    14.2.4.post0         hd590300_1    conda-forge
reproc-cpp                14.2.4.post0         h59595ed_1    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
rfc3339-validator         0.1.4                    pypi_0    pypi
rfc3986-validator         0.1.1                    pypi_0    pypi
rich                      13.7.1                   pypi_0    pypi
rmm-cu12                  24.6.0                   pypi_0    pypi
rpds-py                   0.18.1                   pypi_0    pypi
ruamel.yaml               0.18.5          py311h459d7ec_0    conda-forge
ruamel.yaml.clib          0.2.7           py311h459d7ec_2    conda-forge
s3fs                      2023.6.0                 pypi_0    pypi
scanpy                    1.10.2                   pypi_0    pypi
scikit-image              0.23.2                   pypi_0    pypi
scikit-learn              1.5.0                    pypi_0    pypi
scikit-misc               0.3.1                    pypi_0    pypi
scipy                     1.14.0                   pypi_0    pypi
seaborn                   0.13.2                   pypi_0    pypi
send2trash                1.8.3                    pypi_0    pypi
session-info              1.0.0                    pypi_0    pypi
setuptools                69.0.3             pyhd8ed1ab_0    conda-forge
shapely                   2.0.4                    pypi_0    pypi
simpervisor               1.0.0                    pypi_0    pypi
six                       1.16.0             pyh6c4a22f_0    conda-forge
slicerator                1.1.0                    pypi_0    pypi
sniffio                   1.3.1                    pypi_0    pypi
sortedcontainers          2.4.0                    pypi_0    pypi
soupsieve                 2.5                      pypi_0    pypi
spatial-image             0.3.0                    pypi_0    pypi
spatialdata               0.1.2                    pypi_0    pypi
sqlite                    3.44.2               h2c6b66d_0    conda-forge
stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
statsmodels               0.14.2                   pypi_0    pypi
stdlib-list               0.10.0                   pypi_0    pypi
tblib                     3.0.0                    pypi_0    pypi
terminado                 0.18.1                   pypi_0    pypi
texttable                 1.7.0                    pypi_0    pypi
threadpoolctl             3.5.0                    pypi_0    pypi
tifffile                  2024.5.22                pypi_0    pypi
tinycss2                  1.3.0                    pypi_0    pypi
tk                        8.6.13          noxft_h4845f30_101    conda-forge
toolz                     0.12.1                   pypi_0    pypi
tornado                   6.4.1                    pypi_0    pypi
tqdm                      4.66.4                   pypi_0    pypi
traitlets                 5.14.3             pyhd8ed1ab_0    conda-forge
treelite                  4.1.2                    pypi_0    pypi
truststore                0.8.0              pyhd8ed1ab_0    conda-forge
typeguard                 4.3.0                    pypi_0    pypi
types-python-dateutil     2.9.0.20240316           pypi_0    pypi
typing_extensions         4.12.2             pyha770c72_0    conda-forge
tzdata                    2024.1                   pypi_0    pypi
ucx-py-cu12               0.38.0                   pypi_0    pypi
ucxx-cu12                 0.38.0                   pypi_0    pypi
umap-learn                0.5.6                    pypi_0    pypi
uri-template              1.3.0                    pypi_0    pypi
urllib3                   1.26.19                  pypi_0    pypi
uv                        0.2.13                   pypi_0    pypi
wcwidth                   0.2.13             pyhd8ed1ab_0    conda-forge
webcolors                 24.6.0                   pypi_0    pypi
webencodings              0.5.1                    pypi_0    pypi
websocket-client          1.8.0                    pypi_0    pypi
wheel                     0.42.0             pyhd8ed1ab_0    conda-forge
wrapt                     1.16.0                   pypi_0    pypi
xarray                    2024.6.0                 pypi_0    pypi
xarray-dataclasses        1.8.0                    pypi_0    pypi
xarray-datatree           0.0.14                   pypi_0    pypi
xarray-schema             0.0.3                    pypi_0    pypi
xarray-spatial            0.4.0                    pypi_0    pypi
xyzservices               2024.6.0                 pypi_0    pypi
xz                        5.4.5                h5eee18b_0  
yaml-cpp                  0.8.0                h59595ed_0    conda-forge
yarl                      1.9.4                    pypi_0    pypi
zarr                      2.18.2                   pypi_0    pypi
zict                      3.0.0                    pypi_0    pypi
zipp                      3.19.2                   pypi_0    pypi
zlib                      1.2.13               hd590300_5    conda-forge
zstandard                 0.22.0          py311haa97af0_0    conda-forge
zstd                      1.5.5                hfc55251_0    conda-forge

Additional context

Stack trace when the preprocessing (the scaling and `log1p` steps) is removed:
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
Cell In[25], line 6
      1 from cuml.dask.decomposition import PCA
      3 pca_func = PCA(
      4     n_components=512, svd_solver="full", whiten=False, client=client
      5 )
----> 6 X_pca = pca_func.fit_transform(arr)

File ~/miniconda3/lib/python3.11/site-packages/cuml/dask/decomposition/pca.py:181, in PCA.fit_transform(self, X)
    169 def fit_transform(self, X):
    170     """
    171     Fit the model with X and apply the dimensionality reduction on X.
    172 
   (...)
    179     X_new : dask cuDF
    180     """
--> 181     return self.fit(X).transform(X)

File ~/miniconda3/lib/python3.11/site-packages/cuml/dask/decomposition/pca.py:166, in PCA.fit(self, X)
    157 def fit(self, X):
    158     """
    159     Fit the model with X.
    160 
   (...)
    163     X : dask cuDF input
    164     """
--> 166     self._fit(X)
    167     return self

File ~/miniconda3/lib/python3.11/site-packages/cuml/dask/decomposition/base.py:67, in DecompositionSyncFitMixin._fit(self, X, _transform)
     63     comms = Comms(comms_p2p=False)
     65 comms.init(workers=data.workers)
---> 67 data.calculate_parts_to_sizes(comms)
     69 worker_info = comms.worker_info(comms.worker_addresses)
     70 parts_to_sizes, _ = parts_to_ranks(
     71     self.client, worker_info, data.gpu_futures
     72 )

File ~/miniconda3/lib/python3.11/site-packages/cuml/dask/common/input_utils.py:155, in DistributedDataHandler.calculate_parts_to_sizes(self, comms, ranks)
    139 self.parts_to_sizes = dict()
    141 parts = [
    142     (
    143         wf[0],
   (...)
    152     for idx, wf in enumerate(self.worker_to_parts.items())
    153 ]
--> 155 sizes = self.client.compute(parts, sync=True)
    157 for w, sizes_parts in sizes:
    158     sizes, total = sizes_parts

File ~/miniconda3/lib/python3.11/site-packages/distributed/client.py:3496, in Client.compute(self, collections, sync, optimize_graph, workers, allow_other_workers, resources, retries, priority, fifo_timeout, actors, traverse, **kwargs)
   3493         futures.append(arg)
   3495 if sync:
-> 3496     result = self.gather(futures)
   3497 else:
   3498     result = futures

File ~/miniconda3/lib/python3.11/site-packages/distributed/client.py:2372, in Client.gather(self, futures, errors, direct, asynchronous)
   2369     local_worker = None
   2371 with shorten_traceback():
-> 2372     return self.sync(
   2373         self._gather,
   2374         futures,
   2375         errors=errors,
   2376         direct=direct,
   2377         local_worker=local_worker,
   2378         asynchronous=asynchronous,
   2379     )

File ~/miniconda3/lib/python3.11/site-packages/distributed/client.py:2232, in Client._gather(self, futures, errors, direct, local_worker)
   2230         exc = CancelledError(key)
   2231     else:
-> 2232         raise exception.with_traceback(traceback)
   2233     raise exc
   2234 if errors == "skip":

MemoryError: Task '_get_rows-b28a0a7a-ae6a-4d85-bed9-4d982cf8db30' has 44.70 GiB worth of input dependencies, but worker tcp://127.0.0.1:8357 has memory_limit set to 32.00 GiB.
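The size in this MemoryError lines up with the reproducer's chunking. Assuming each input dependency is one float64 chunk of shape (100_000, 4_000) as in the code above, the reported 44.70 GiB matches 15 such chunks, while only 10 would fit under the 32.00 GiB `memory_limit`. This is arithmetic only; it does not explain why the `_get_rows` task ends up depending on that many chunks.

```python
import math

GIB = 2**30
CHUNK_BYTES = 100_000 * 4_000 * 8  # one float64 chunk from the reproducer

reported_gib = 44.70      # input-dependency size from the MemoryError message
memory_limit_gib = 32.00  # worker memory_limit from the same message

# Nearest whole number of chunks matching the reported size.
n_chunks = round(reported_gib * GIB / CHUNK_BYTES)
# Largest whole number of chunks that stays under the memory limit.
max_chunks_within_limit = math.floor(memory_limit_gib * GIB / CHUNK_BYTES)

print(f"chunk size: {CHUNK_BYTES / GIB:.2f} GiB")
print(f"{reported_gib} GiB ~ {n_chunks} chunks ({n_chunks * CHUNK_BYTES / GIB:.2f} GiB)")
print(f"chunks that fit under {memory_limit_gib} GiB: {max_chunks_within_limit}")
```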
With the preprocessing steps in place, the worker instead dies repeatedly: [Screenshot: 2024-07-03 14:46:30]
ilan-gold added the "? - Needs Triage" and "bug" labels on Jul 3, 2024
dantegd (Member) commented Jul 4, 2024

Thanks for the issue @ilan-gold, this is quite interesting. Will catch up with the linked issue in dask-cuda, and also see if there’s something we are doing in cuml.dask to cause any additional CPU leaks.
