Merge pull request #839 from rapidsai/branch-22.02
[RELEASE] dask-cuda v22.02
raydouglass authored Feb 2, 2022
2 parents e1e49b6 + d0845ad commit 6070be5
Showing 25 changed files with 790 additions and 215 deletions.
78 changes: 76 additions & 2 deletions CHANGELOG.md
@@ -1,6 +1,80 @@
# dask-cuda 21.12.00 (Date TBD)
# dask-cuda 22.02.00 (2 Feb 2022)

Please see https://github.com/rapidsai/dask-cuda/releases/tag/v21.12.00a for the latest changes to this development branch.
## 🐛 Bug Fixes

- Ignore `DeprecationWarning` from `distutils.Version` classes ([#823](https://github.com/rapidsai/dask-cuda/pull/823)) [@pentschev](https://github.com/pentschev)
- Handle explicitly disabled UCX transports ([#820](https://github.com/rapidsai/dask-cuda/pull/820)) [@pentschev](https://github.com/pentschev)
- Fix regex pattern to match to in test_on_demand_debug_info ([#819](https://github.com/rapidsai/dask-cuda/pull/819)) [@pentschev](https://github.com/pentschev)
- Fix skipping GDS test if cucim is not installed ([#813](https://github.com/rapidsai/dask-cuda/pull/813)) [@pentschev](https://github.com/pentschev)
- Unpin Dask and Distributed versions ([#810](https://github.com/rapidsai/dask-cuda/pull/810)) [@pentschev](https://github.com/pentschev)
- Update to UCX-Py 0.24 ([#805](https://github.com/rapidsai/dask-cuda/pull/805)) [@pentschev](https://github.com/pentschev)

## 📖 Documentation

- Fix Dask-CUDA version to 22.02 ([#835](https://github.com/rapidsai/dask-cuda/pull/835)) [@jakirkham](https://github.com/jakirkham)
- Merge branch-21.12 into branch-22.02 ([#829](https://github.com/rapidsai/dask-cuda/pull/829)) [@pentschev](https://github.com/pentschev)
- Clarify `LocalCUDACluster`'s `n_workers` docstrings ([#812](https://github.com/rapidsai/dask-cuda/pull/812)) [@pentschev](https://github.com/pentschev)

## 🚀 New Features

- Pin `dask` & `distributed` versions ([#832](https://github.com/rapidsai/dask-cuda/pull/832)) [@galipremsagar](https://github.com/galipremsagar)
- Expose rmm-maximum_pool_size argument ([#827](https://github.com/rapidsai/dask-cuda/pull/827)) [@VibhuJawa](https://github.com/VibhuJawa)
- Simplify UCX configs, permitting UCX_TLS=all ([#792](https://github.com/rapidsai/dask-cuda/pull/792)) [@pentschev](https://github.com/pentschev)

## 🛠️ Improvements

- Add avg and std calculation for time and throughput ([#828](https://github.com/rapidsai/dask-cuda/pull/828)) [@quasiben](https://github.com/quasiben)
- sizeof test: increase tolerance ([#825](https://github.com/rapidsai/dask-cuda/pull/825)) [@madsbk](https://github.com/madsbk)
- Query UCX-Py from gpuCI versioning service ([#818](https://github.com/rapidsai/dask-cuda/pull/818)) [@pentschev](https://github.com/pentschev)
- Standardize Distributed config separator in get_ucx_config ([#806](https://github.com/rapidsai/dask-cuda/pull/806)) [@pentschev](https://github.com/pentschev)
- Fixed `ProxyObject.__del__` to use the new Disk IO API from #791 ([#802](https://github.com/rapidsai/dask-cuda/pull/802)) [@madsbk](https://github.com/madsbk)
- GPUDirect Storage (GDS) support for spilling ([#793](https://github.com/rapidsai/dask-cuda/pull/793)) [@madsbk](https://github.com/madsbk)
- Disk IO interface ([#791](https://github.com/rapidsai/dask-cuda/pull/791)) [@madsbk](https://github.com/madsbk)

# dask-cuda 21.12.00 (9 Dec 2021)

## 🐛 Bug Fixes

- Remove automatic `doc` labeler ([#807](https://github.com/rapidsai/dask-cuda/pull/807)) [@pentschev](https://github.com/pentschev)
- Add create_cuda_context UCX config from Distributed ([#801](https://github.com/rapidsai/dask-cuda/pull/801)) [@pentschev](https://github.com/pentschev)
- Ignore deprecation warnings from pkg_resources ([#784](https://github.com/rapidsai/dask-cuda/pull/784)) [@pentschev](https://github.com/pentschev)
- Fix parsing of device by UUID ([#780](https://github.com/rapidsai/dask-cuda/pull/780)) [@pentschev](https://github.com/pentschev)
- Avoid creating CUDA context in LocalCUDACluster parent process ([#765](https://github.com/rapidsai/dask-cuda/pull/765)) [@pentschev](https://github.com/pentschev)
- Remove gen_cluster spill tests ([#758](https://github.com/rapidsai/dask-cuda/pull/758)) [@pentschev](https://github.com/pentschev)
- Update memory_pause_fraction in test_spill ([#757](https://github.com/rapidsai/dask-cuda/pull/757)) [@pentschev](https://github.com/pentschev)

## 📖 Documentation

- Add troubleshooting page with PCI Bus ID issue description ([#777](https://github.com/rapidsai/dask-cuda/pull/777)) [@pentschev](https://github.com/pentschev)

## 🚀 New Features

- Handle UCX-Py FutureWarning on UCX < 1.11.1 deprecation ([#799](https://github.com/rapidsai/dask-cuda/pull/799)) [@pentschev](https://github.com/pentschev)
- Pin max `dask` & `distributed` versions ([#794](https://github.com/rapidsai/dask-cuda/pull/794)) [@galipremsagar](https://github.com/galipremsagar)
- Update to UCX-Py 0.23 ([#752](https://github.com/rapidsai/dask-cuda/pull/752)) [@pentschev](https://github.com/pentschev)

## 🛠️ Improvements

- Fix spill-to-disk triggered by Dask explicitly ([#800](https://github.com/rapidsai/dask-cuda/pull/800)) [@madsbk](https://github.com/madsbk)
- Fix Changelog Merge Conflicts for `branch-21.12` ([#797](https://github.com/rapidsai/dask-cuda/pull/797)) [@ajschmidt8](https://github.com/ajschmidt8)
- Use unittest.mock.patch for all os.environ tests ([#787](https://github.com/rapidsai/dask-cuda/pull/787)) [@pentschev](https://github.com/pentschev)
- Logging when RMM allocation fails ([#782](https://github.com/rapidsai/dask-cuda/pull/782)) [@madsbk](https://github.com/madsbk)
- Tally IDs instead of device buffers directly ([#779](https://github.com/rapidsai/dask-cuda/pull/779)) [@madsbk](https://github.com/madsbk)
- Avoid proxy object aliasing ([#775](https://github.com/rapidsai/dask-cuda/pull/775)) [@madsbk](https://github.com/madsbk)
- Test of sizeof proxy object ([#774](https://github.com/rapidsai/dask-cuda/pull/774)) [@madsbk](https://github.com/madsbk)
- gc.collect when spilling on demand ([#771](https://github.com/rapidsai/dask-cuda/pull/771)) [@madsbk](https://github.com/madsbk)
- Reenable explicit comms tests ([#770](https://github.com/rapidsai/dask-cuda/pull/770)) [@madsbk](https://github.com/madsbk)
- Simplify JIT-unspill and writing docs ([#768](https://github.com/rapidsai/dask-cuda/pull/768)) [@madsbk](https://github.com/madsbk)
- Increase CUDAWorker close timeout ([#764](https://github.com/rapidsai/dask-cuda/pull/764)) [@pentschev](https://github.com/pentschev)
- Ignore known but expected test warnings ([#759](https://github.com/rapidsai/dask-cuda/pull/759)) [@pentschev](https://github.com/pentschev)
- Spilling on demand ([#756](https://github.com/rapidsai/dask-cuda/pull/756)) [@madsbk](https://github.com/madsbk)
- Revert "Temporarily skipping some tests because of a bug in Dask (#753)" ([#754](https://github.com/rapidsai/dask-cuda/pull/754)) [@madsbk](https://github.com/madsbk)
- Temporarily skipping some tests because of a bug in Dask ([#753](https://github.com/rapidsai/dask-cuda/pull/753)) [@madsbk](https://github.com/madsbk)
- Removing the `FrameProxyObject` workaround ([#751](https://github.com/rapidsai/dask-cuda/pull/751)) [@madsbk](https://github.com/madsbk)
- Use cuDF Frame instead of Table ([#748](https://github.com/rapidsai/dask-cuda/pull/748)) [@madsbk](https://github.com/madsbk)
- Remove proxy object locks ([#747](https://github.com/rapidsai/dask-cuda/pull/747)) [@madsbk](https://github.com/madsbk)
- Unpin `dask` & `distributed` in CI ([#742](https://github.com/rapidsai/dask-cuda/pull/742)) [@galipremsagar](https://github.com/galipremsagar)
- Update SSHCluster usage in benchmarks with new CUDAWorker ([#326](https://github.com/rapidsai/dask-cuda/pull/326)) [@pentschev](https://github.com/pentschev)

# dask-cuda 21.10.00 (7 Oct 2021)

10 changes: 6 additions & 4 deletions ci/gpu/build.sh
@@ -26,6 +26,7 @@ cd "$WORKSPACE"
export GIT_DESCRIBE_TAG=`git describe --tags`
export MINOR_VERSION=`echo $GIT_DESCRIBE_TAG | grep -o -E '([0-9]+\.[0-9]+)'`
export UCX_PATH=$CONDA_PREFIX
export UCXPY_VERSION="0.24.*"

# Enable NumPy's __array_function__ protocol (needed for NumPy 1.16.x,
# will possibly be enabled by default starting on 1.17)
@@ -53,11 +54,13 @@ conda config --show-sources
conda list --show-channel-urls

# Fixing Numpy version to avoid RuntimeWarning: numpy.ufunc size changed, may
# indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
# indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject.
# Also installing cucim in order to test GDS spilling
gpuci_mamba_retry install "cudatoolkit=$CUDA_REL" \
"cudf=${MINOR_VERSION}" "dask-cudf=${MINOR_VERSION}" \
"ucx-py=0.23.*" "ucx-proc=*=gpu" \
"rapids-build-env=$MINOR_VERSION.*"
"ucx-py=${UCXPY_VERSION}" "ucx-proc=*=gpu" \
"rapids-build-env=$MINOR_VERSION.*" \
"cucim"

# Pin pytest-asyncio because latest versions modify the default asyncio
# `event_loop_policy`. See https://github.com/dask/distributed/pull/4212 .
@@ -67,7 +70,6 @@ gpuci_mamba_retry install "pytest-asyncio=<0.14.0"
# gpuci_mamba_retry remove -f rapids-build-env
# gpuci_mamba_retry install "your-pkg=1.0.0"


conda info
conda config --show-sources
conda list --show-channel-urls
5 changes: 3 additions & 2 deletions ci/release/update-version.sh
@@ -22,6 +22,7 @@ CURRENT_SHORT_TAG=${CURRENT_MAJOR}.${CURRENT_MINOR}
NEXT_MAJOR=$(echo $NEXT_FULL_TAG | awk '{split($0, a, "."); print a[1]}')
NEXT_MINOR=$(echo $NEXT_FULL_TAG | awk '{split($0, a, "."); print a[2]}')
NEXT_SHORT_TAG=${NEXT_MAJOR}.${NEXT_MINOR}
NEXT_UCXPY_VERSION="$(curl -s https://version.gpuci.io/rapids/${NEXT_SHORT_TAG}).*"

echo "Preparing release $CURRENT_TAG => $NEXT_FULL_TAG"

@@ -30,5 +31,5 @@ function sed_runner() {
sed -i.bak ''"$1"'' $2 && rm -f ${2}.bak
}


# No-op
# Update UCX-Py version
sed_runner "s/export UCXPY_VERSION=.*/export UCXPY_VERSION="${NEXT_UCXPY_VERSION}"/g" ci/gpu/build.sh
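The `sed_runner` call above rewrites the pinned UCX-Py version inside `ci/gpu/build.sh` during a release bump. A minimal Python sketch of the same substitution (the file contents and version strings here are illustrative, not taken from the repository):

```python
import re

def bump_ucxpy_version(script_text: str, next_version: str) -> str:
    """Replace the UCXPY_VERSION export line with the next release's pin."""
    return re.sub(
        r'export UCXPY_VERSION=.*',
        f'export UCXPY_VERSION="{next_version}"',
        script_text,
    )

build_sh = 'export UCX_PATH=$CONDA_PREFIX\nexport UCXPY_VERSION="0.24.*"\n'
print(bump_ucxpy_version(build_sh, "0.25.*"))
```

In the actual script the replacement version comes from the gpuCI versioning service (`https://version.gpuci.io/rapids/<tag>`), so the pin no longer needs manual editing each release.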
4 changes: 2 additions & 2 deletions conda/recipes/dask-cuda/meta.yaml
@@ -27,8 +27,8 @@ requirements:
- setuptools
run:
- python
- dask>=2021.11.1,<=2021.11.2
- distributed>=2021.11.1,<=2021.11.2
- dask>=2021.11.1,<=2022.01.0
- distributed>=2021.11.1,<=2022.01.0
- pynvml >=8.0.3
- numpy >=1.16.0
- numba >=0.53.1
17 changes: 14 additions & 3 deletions dask_cuda/benchmarks/local_cudf_merge.py
@@ -288,16 +288,27 @@ def main(args):
print(f"ib | {args.enable_infiniband}")
print(f"nvlink | {args.enable_nvlink}")
print(f"data-processed | {format_bytes(took_list[0][0])}")
print("===============================")
print("=" * 80)
print("Wall-clock | Throughput")
print("-------------------------------")
print("-" * 80)
t_p = []
times = []
for idx, (data_processed, took) in enumerate(took_list):
throughput = int(data_processed / took)
m = format_time(took)
times.append(took)
t_p.append(throughput)
m += " " * (15 - len(m))
print(f"{m}| {format_bytes(throughput)}/s")
t_runs[idx] = float(format_bytes(throughput).split(" ")[0])
print("===============================")
t_p = numpy.asarray(t_p)
times = numpy.asarray(times)
print("=" * 80)
print(f"Throughput | {format_bytes(t_p.mean())} +/- {format_bytes(t_p.std()) }")
print(
f"Wall-Clock | {format_time(times.mean())} +/- {format_time(times.std()) }"
)
print("=" * 80)
if args.markdown:
print("\n```")

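The benchmark change above (from #828) collects per-run wall-clock times and throughputs into NumPy arrays so it can report their mean and standard deviation. A stripped-down sketch of that calculation, with illustrative run data and plain formatting in place of dask's `format_bytes`/`format_time` helpers:

```python
import numpy

# (data_processed_bytes, seconds) per benchmark run -- illustrative values
took_list = [(10_000_000_000, 2.0), (10_000_000_000, 2.5), (10_000_000_000, 2.2)]

throughputs = numpy.asarray([d / t for d, t in took_list])
times = numpy.asarray([t for _, t in took_list])

# Mean +/- standard deviation across runs, as in the new summary lines
print(f"Throughput | {throughputs.mean():.2e} +/- {throughputs.std():.2e} B/s")
print(f"Wall-Clock | {times.mean():.2f} +/- {times.std():.2f} s")
```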
35 changes: 13 additions & 22 deletions dask_cuda/benchmarks/utils.py
@@ -66,24 +66,28 @@ def parse_benchmark_args(description="Generic dask-cuda Benchmark", args_list=[]
)
parser.add_argument(
"--enable-tcp-over-ucx",
default=None,
action="store_true",
dest="enable_tcp_over_ucx",
help="Enable TCP over UCX.",
)
parser.add_argument(
"--enable-infiniband",
default=None,
action="store_true",
dest="enable_infiniband",
help="Enable InfiniBand over UCX.",
)
parser.add_argument(
"--enable-nvlink",
default=None,
action="store_true",
dest="enable_nvlink",
help="Enable NVLink over UCX.",
)
parser.add_argument(
"--enable-rdmacm",
default=None,
action="store_true",
dest="enable_rdmacm",
help="Enable RDMACM with UCX.",
@@ -181,27 +185,23 @@ def parse_benchmark_args(description="Generic dask-cuda Benchmark", args_list=[]
name = [name]
parser.add_argument(*name, **args)

parser.set_defaults(
enable_tcp_over_ucx=True,
enable_infiniband=True,
enable_nvlink=True,
enable_rdmacm=False,
)
args = parser.parse_args()

if args.protocol == "tcp":
args.enable_tcp_over_ucx = False
args.enable_infiniband = False
args.enable_nvlink = False
args.enable_rdmacm = False

if args.multi_node and len(args.hosts.split(",")) < 2:
raise ValueError("--multi-node requires at least 2 hosts")

return args


def get_cluster_options(args):
ucx_options = {
"ucx_net_devices": args.ucx_net_devices,
"enable_tcp_over_ucx": args.enable_tcp_over_ucx,
"enable_infiniband": args.enable_infiniband,
"enable_nvlink": args.enable_nvlink,
"enable_rdmacm": args.enable_rdmacm,
}

if args.multi_node is True:
Cluster = SSHCluster
cluster_args = [args.hosts.split(",")]
@@ -214,11 +214,6 @@ def get_cluster_options(args):
"worker_options": {
"protocol": args.protocol,
"nthreads": args.threads_per_worker,
"net_devices": args.ucx_net_devices,
"enable_tcp_over_ucx": args.enable_tcp_over_ucx,
"enable_infiniband": args.enable_infiniband,
"enable_nvlink": args.enable_nvlink,
"enable_rdmacm": args.enable_rdmacm,
"interface": args.interface,
"device_memory_limit": args.device_memory_limit,
},
@@ -234,13 +229,9 @@
"n_workers": len(args.devs.split(",")),
"threads_per_worker": args.threads_per_worker,
"CUDA_VISIBLE_DEVICES": args.devs,
"ucx_net_devices": args.ucx_net_devices,
"enable_tcp_over_ucx": args.enable_tcp_over_ucx,
"enable_infiniband": args.enable_infiniband,
"enable_nvlink": args.enable_nvlink,
"enable_rdmacm": args.enable_rdmacm,
"interface": args.interface,
"device_memory_limit": args.device_memory_limit,
**ucx_options,
}
if args.no_silence_logs:
cluster_kwargs["silence_logs"] = False
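Both the `SSHCluster` and `LocalCUDACluster` branches previously repeated the same UCX keyword arguments; the refactor above builds one shared `ucx_options` dict and spreads it into the cluster kwargs. A minimal sketch of that pattern (keys and values are illustrative):

```python
# Shared UCX options assembled once (values mirror the new tri-state defaults)
ucx_options = {
    "enable_tcp_over_ucx": None,
    "enable_infiniband": None,
    "enable_nvlink": None,
    "enable_rdmacm": None,
}

cluster_kwargs = {
    "n_workers": 2,
    "threads_per_worker": 1,
    **ucx_options,  # merged via ** instead of repeating each key per branch
}
print(sorted(cluster_kwargs))
```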
26 changes: 20 additions & 6 deletions dask_cuda/cli/dask_cuda_worker.py
@@ -73,6 +73,18 @@
.. note::
This size is a per-worker configuration, and not cluster-wide.""",
)
@click.option(
"--rmm-maximum-pool-size",
default=None,
help="""When ``--rmm-pool-size`` is specified, this argument indicates the maximum pool size.
Can be an integer (bytes), string (like ``"5GB"`` or ``"5000M"``) or ``None``.
By default, the total available memory on the GPU is used.
``rmm_pool_size`` must be specified to use RMM pool and
to set the maximum pool size.
.. note::
This size is a per-worker configuration, and not cluster-wide.""",
)
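`--rmm-maximum-pool-size` accepts either raw bytes or a size string such as `"5GB"` or `"5000M"`. The worker normalizes these with dask's `parse_bytes`; the standalone sketch below is only an illustration of that conversion, handling a few simple decimal suffixes:

```python
import re

_UNITS = {"": 1, "B": 1, "K": 10**3, "M": 10**6, "G": 10**9,
          "KB": 10**3, "MB": 10**6, "GB": 10**9}

def parse_size(size) -> int:
    """Convert 5_000_000_000, "5GB", or "5000M" to a byte count."""
    if isinstance(size, int):
        return size
    m = re.fullmatch(r"\s*([0-9.]+)\s*([A-Za-z]*)\s*", size)
    if not m or m.group(2).upper() not in _UNITS:
        raise ValueError(f"cannot parse size: {size!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2).upper()])

print(parse_size("5GB"))    # 5000000000
print(parse_size("5000M"))  # 5000000000
```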
@click.option(
"--rmm-managed-memory/--no-rmm-managed-memory",
default=False,
@@ -210,28 +222,28 @@
)
@click.option(
"--enable-tcp-over-ucx/--disable-tcp-over-ucx",
default=False,
default=None,
show_default=True,
help="""Set environment variables to enable TCP over UCX, even if InfiniBand and
NVLink are not supported or disabled.""",
)
@click.option(
"--enable-infiniband/--disable-infiniband",
default=False,
default=None,
show_default=True,
help="""Set environment variables to enable UCX over InfiniBand, implies
``--enable-tcp-over-ucx``.""",
``--enable-tcp-over-ucx`` when enabled.""",
)
@click.option(
"--enable-nvlink/--disable-nvlink",
default=False,
default=None,
show_default=True,
help="""Set environment variables to enable UCX over NVLink, implies
``--enable-tcp-over-ucx``.""",
``--enable-tcp-over-ucx`` when enabled.""",
)
@click.option(
"--enable-rdmacm/--disable-rdmacm",
default=False,
default=None,
show_default=True,
help="""Set environment variables to enable UCX RDMA connection manager support,
requires ``--enable-infiniband``.""",
@@ -277,6 +289,7 @@ def main(
memory_limit,
device_memory_limit,
rmm_pool_size,
rmm_maximum_pool_size,
rmm_managed_memory,
rmm_async,
rmm_log_directory,
@@ -327,6 +340,7 @@
memory_limit,
device_memory_limit,
rmm_pool_size,
rmm_maximum_pool_size,
rmm_managed_memory,
rmm_async,
rmm_log_directory,
18 changes: 13 additions & 5 deletions dask_cuda/cuda_worker.py
@@ -58,6 +58,7 @@ def __init__(
memory_limit="auto",
device_memory_limit="auto",
rmm_pool_size=None,
rmm_maximum_pool_size=None,
rmm_managed_memory=False,
rmm_async=False,
rmm_log_directory=None,
@@ -72,10 +73,10 @@
preload=[],
dashboard_prefix=None,
security=None,
enable_tcp_over_ucx=False,
enable_infiniband=False,
enable_nvlink=False,
enable_rdmacm=False,
enable_tcp_over_ucx=None,
enable_infiniband=None,
enable_nvlink=None,
enable_rdmacm=None,
net_devices=None,
jit_unspill=None,
worker_class=None,
@@ -152,6 +153,9 @@ def del_pid_file():
)
if rmm_pool_size is not None:
rmm_pool_size = parse_bytes(rmm_pool_size)
if rmm_maximum_pool_size is not None:
rmm_maximum_pool_size = parse_bytes(rmm_maximum_pool_size)

else:
if enable_nvlink:
warnings.warn(
@@ -239,7 +243,11 @@ def del_pid_file():
get_cpu_affinity(nvml_device_index(i, cuda_visible_devices(i)))
),
RMMSetup(
rmm_pool_size, rmm_managed_memory, rmm_async, rmm_log_directory,
rmm_pool_size,
rmm_maximum_pool_size,
rmm_managed_memory,
rmm_async,
rmm_log_directory,
),
},
name=name if nprocs == 1 or name is None else str(name) + "-" + str(i),