Merge pull request #839 from rapidsai/branch-22.02
[RELEASE] dask-cuda v22.02
raydouglass authored Feb 2, 2022
2 parents e1e49b6 + d0845ad commit 6070be5
Showing 25 changed files with 790 additions and 215 deletions.
78 changes: 76 additions & 2 deletions CHANGELOG.md
@@ -1,6 +1,80 @@
# dask-cuda 21.12.00 (Date TBD)
# dask-cuda 22.02.00 (2 Feb 2022)

Please see https://github.com/rapidsai/dask-cuda/releases/tag/v21.12.00a for the latest changes to this development branch.
## 🐛 Bug Fixes

- Ignore `DeprecationWarning` from `distutils.Version` classes ([#823](https://github.com/rapidsai/dask-cuda/pull/823)) [@pentschev](https://github.com/pentschev)
- Handle explicitly disabled UCX transports ([#820](https://github.com/rapidsai/dask-cuda/pull/820)) [@pentschev](https://github.com/pentschev)
- Fix regex pattern to match to in test_on_demand_debug_info ([#819](https://github.com/rapidsai/dask-cuda/pull/819)) [@pentschev](https://github.com/pentschev)
- Fix skipping GDS test if cucim is not installed ([#813](https://github.com/rapidsai/dask-cuda/pull/813)) [@pentschev](https://github.com/pentschev)
- Unpin Dask and Distributed versions ([#810](https://github.com/rapidsai/dask-cuda/pull/810)) [@pentschev](https://github.com/pentschev)
- Update to UCX-Py 0.24 ([#805](https://github.com/rapidsai/dask-cuda/pull/805)) [@pentschev](https://github.com/pentschev)

## 📖 Documentation

- Fix Dask-CUDA version to 22.02 ([#835](https://github.com/rapidsai/dask-cuda/pull/835)) [@jakirkham](https://github.com/jakirkham)
- Merge branch-21.12 into branch-22.02 ([#829](https://github.com/rapidsai/dask-cuda/pull/829)) [@pentschev](https://github.com/pentschev)
- Clarify `LocalCUDACluster`'s `n_workers` docstrings ([#812](https://github.com/rapidsai/dask-cuda/pull/812)) [@pentschev](https://github.com/pentschev)

## 🚀 New Features

- Pin `dask` & `distributed` versions ([#832](https://github.com/rapidsai/dask-cuda/pull/832)) [@galipremsagar](https://github.com/galipremsagar)
- Expose rmm-maximum_pool_size argument ([#827](https://github.com/rapidsai/dask-cuda/pull/827)) [@VibhuJawa](https://github.com/VibhuJawa)
- Simplify UCX configs, permitting UCX_TLS=all ([#792](https://github.com/rapidsai/dask-cuda/pull/792)) [@pentschev](https://github.com/pentschev)

## 🛠️ Improvements

- Add avg and std calculation for time and throughput ([#828](https://github.com/rapidsai/dask-cuda/pull/828)) [@quasiben](https://github.com/quasiben)
- sizeof test: increase tolerance ([#825](https://github.com/rapidsai/dask-cuda/pull/825)) [@madsbk](https://github.com/madsbk)
- Query UCX-Py from gpuCI versioning service ([#818](https://github.com/rapidsai/dask-cuda/pull/818)) [@pentschev](https://github.com/pentschev)
- Standardize Distributed config separator in get_ucx_config ([#806](https://github.com/rapidsai/dask-cuda/pull/806)) [@pentschev](https://github.com/pentschev)
- Fixed `ProxyObject.__del__` to use the new Disk IO API from #791 ([#802](https://github.com/rapidsai/dask-cuda/pull/802)) [@madsbk](https://github.com/madsbk)
- GPUDirect Storage (GDS) support for spilling ([#793](https://github.com/rapidsai/dask-cuda/pull/793)) [@madsbk](https://github.com/madsbk)
- Disk IO interface ([#791](https://github.com/rapidsai/dask-cuda/pull/791)) [@madsbk](https://github.com/madsbk)

# dask-cuda 21.12.00 (9 Dec 2021)

## 🐛 Bug Fixes

- Remove automatic `doc` labeler ([#807](https://github.com/rapidsai/dask-cuda/pull/807)) [@pentschev](https://github.com/pentschev)
- Add create_cuda_context UCX config from Distributed ([#801](https://github.com/rapidsai/dask-cuda/pull/801)) [@pentschev](https://github.com/pentschev)
- Ignore deprecation warnings from pkg_resources ([#784](https://github.com/rapidsai/dask-cuda/pull/784)) [@pentschev](https://github.com/pentschev)
- Fix parsing of device by UUID ([#780](https://github.com/rapidsai/dask-cuda/pull/780)) [@pentschev](https://github.com/pentschev)
- Avoid creating CUDA context in LocalCUDACluster parent process ([#765](https://github.com/rapidsai/dask-cuda/pull/765)) [@pentschev](https://github.com/pentschev)
- Remove gen_cluster spill tests ([#758](https://github.com/rapidsai/dask-cuda/pull/758)) [@pentschev](https://github.com/pentschev)
- Update memory_pause_fraction in test_spill ([#757](https://github.com/rapidsai/dask-cuda/pull/757)) [@pentschev](https://github.com/pentschev)

## 📖 Documentation

- Add troubleshooting page with PCI Bus ID issue description ([#777](https://github.com/rapidsai/dask-cuda/pull/777)) [@pentschev](https://github.com/pentschev)

## 🚀 New Features

- Handle UCX-Py FutureWarning on UCX < 1.11.1 deprecation ([#799](https://github.com/rapidsai/dask-cuda/pull/799)) [@pentschev](https://github.com/pentschev)
- Pin max `dask` & `distributed` versions ([#794](https://github.com/rapidsai/dask-cuda/pull/794)) [@galipremsagar](https://github.com/galipremsagar)
- Update to UCX-Py 0.23 ([#752](https://github.com/rapidsai/dask-cuda/pull/752)) [@pentschev](https://github.com/pentschev)

## 🛠️ Improvements

- Fix spill-to-disk triggered by Dask explicitly ([#800](https://github.com/rapidsai/dask-cuda/pull/800)) [@madsbk](https://github.com/madsbk)
- Fix Changelog Merge Conflicts for `branch-21.12` ([#797](https://github.com/rapidsai/dask-cuda/pull/797)) [@ajschmidt8](https://github.com/ajschmidt8)
- Use unittest.mock.patch for all os.environ tests ([#787](https://github.com/rapidsai/dask-cuda/pull/787)) [@pentschev](https://github.com/pentschev)
- Logging when RMM allocation fails ([#782](https://github.com/rapidsai/dask-cuda/pull/782)) [@madsbk](https://github.com/madsbk)
- Tally IDs instead of device buffers directly ([#779](https://github.com/rapidsai/dask-cuda/pull/779)) [@madsbk](https://github.com/madsbk)
- Avoid proxy object aliasing ([#775](https://github.com/rapidsai/dask-cuda/pull/775)) [@madsbk](https://github.com/madsbk)
- Test of sizeof proxy object ([#774](https://github.com/rapidsai/dask-cuda/pull/774)) [@madsbk](https://github.com/madsbk)
- gc.collect when spilling on demand ([#771](https://github.com/rapidsai/dask-cuda/pull/771)) [@madsbk](https://github.com/madsbk)
- Reenable explicit comms tests ([#770](https://github.com/rapidsai/dask-cuda/pull/770)) [@madsbk](https://github.com/madsbk)
- Simplify JIT-unspill and writing docs ([#768](https://github.com/rapidsai/dask-cuda/pull/768)) [@madsbk](https://github.com/madsbk)
- Increase CUDAWorker close timeout ([#764](https://github.com/rapidsai/dask-cuda/pull/764)) [@pentschev](https://github.com/pentschev)
- Ignore known but expected test warnings ([#759](https://github.com/rapidsai/dask-cuda/pull/759)) [@pentschev](https://github.com/pentschev)
- Spilling on demand ([#756](https://github.com/rapidsai/dask-cuda/pull/756)) [@madsbk](https://github.com/madsbk)
- Revert "Temporarily skipping some tests because of a bug in Dask (#753)" ([#754](https://github.com/rapidsai/dask-cuda/pull/754)) [@madsbk](https://github.com/madsbk)
- Temporarily skipping some tests because of a bug in Dask ([#753](https://github.com/rapidsai/dask-cuda/pull/753)) [@madsbk](https://github.com/madsbk)
- Removing the `FrameProxyObject` workaround ([#751](https://github.com/rapidsai/dask-cuda/pull/751)) [@madsbk](https://github.com/madsbk)
- Use cuDF Frame instead of Table ([#748](https://github.com/rapidsai/dask-cuda/pull/748)) [@madsbk](https://github.com/madsbk)
- Remove proxy object locks ([#747](https://github.com/rapidsai/dask-cuda/pull/747)) [@madsbk](https://github.com/madsbk)
- Unpin `dask` & `distributed` in CI ([#742](https://github.com/rapidsai/dask-cuda/pull/742)) [@galipremsagar](https://github.com/galipremsagar)
- Update SSHCluster usage in benchmarks with new CUDAWorker ([#326](https://github.com/rapidsai/dask-cuda/pull/326)) [@pentschev](https://github.com/pentschev)

# dask-cuda 21.10.00 (7 Oct 2021)

10 changes: 6 additions & 4 deletions ci/gpu/build.sh
@@ -26,6 +26,7 @@ cd "$WORKSPACE"
export GIT_DESCRIBE_TAG=`git describe --tags`
export MINOR_VERSION=`echo $GIT_DESCRIBE_TAG | grep -o -E '([0-9]+\.[0-9]+)'`
export UCX_PATH=$CONDA_PREFIX
export UCXPY_VERSION="0.24.*"

# Enable NumPy's __array_function__ protocol (needed for NumPy 1.16.x,
# will possibly be enabled by default starting on 1.17)
@@ -53,11 +54,13 @@ conda config --show-sources
conda list --show-channel-urls

# Fixing Numpy version to avoid RuntimeWarning: numpy.ufunc size changed, may
# indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
# indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject.
# Also installing cucim in order to test GDS spilling
gpuci_mamba_retry install "cudatoolkit=$CUDA_REL" \
"cudf=${MINOR_VERSION}" "dask-cudf=${MINOR_VERSION}" \
"ucx-py=0.23.*" "ucx-proc=*=gpu" \
"rapids-build-env=$MINOR_VERSION.*"
"ucx-py=${UCXPY_VERSION}" "ucx-proc=*=gpu" \
"rapids-build-env=$MINOR_VERSION.*" \
"cucim"

# Pin pytest-asyncio because latest versions modify the default asyncio
# `event_loop_policy`. See https://github.com/dask/distributed/pull/4212 .
@@ -67,7 +70,6 @@ gpuci_mamba_retry install "pytest-asyncio=<0.14.0"
# gpuci_mamba_retry remove -f rapids-build-env
# gpuci_mamba_retry install "your-pkg=1.0.0"


conda info
conda config --show-sources
conda list --show-channel-urls
5 changes: 3 additions & 2 deletions ci/release/update-version.sh
@@ -22,6 +22,7 @@ CURRENT_SHORT_TAG=${CURRENT_MAJOR}.${CURRENT_MINOR}
NEXT_MAJOR=$(echo $NEXT_FULL_TAG | awk '{split($0, a, "."); print a[1]}')
NEXT_MINOR=$(echo $NEXT_FULL_TAG | awk '{split($0, a, "."); print a[2]}')
NEXT_SHORT_TAG=${NEXT_MAJOR}.${NEXT_MINOR}
NEXT_UCXPY_VERSION="$(curl -s https://version.gpuci.io/rapids/${NEXT_SHORT_TAG}).*"

echo "Preparing release $CURRENT_TAG => $NEXT_FULL_TAG"

@@ -30,5 +31,5 @@ function sed_runner() {
sed -i.bak ''"$1"'' $2 && rm -f ${2}.bak
}


# No-op
# Update UCX-Py version
sed_runner "s/export UCXPY_VERSION=.*/export UCXPY_VERSION="${NEXT_UCXPY_VERSION}"/g" ci/gpu/build.sh
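The `sed_runner` call above rewrites the pinned UCX-Py version inside `ci/gpu/build.sh` during a release bump. A minimal Python sketch of the same substitution (the file contents and version strings here are illustrative, not taken from the repository):

```python
import re

def bump_ucxpy_version(script_text: str, next_version: str) -> str:
    """Replace the UCXPY_VERSION export line with the next release's pin."""
    return re.sub(
        r'export UCXPY_VERSION=.*',
        f'export UCXPY_VERSION="{next_version}"',
        script_text,
    )

build_sh = 'export UCX_PATH=$CONDA_PREFIX\nexport UCXPY_VERSION="0.24.*"\n'
print(bump_ucxpy_version(build_sh, "0.25.*"))
```

In the actual script the replacement version comes from the gpuCI versioning service (`https://version.gpuci.io/rapids/<tag>`), so the pin no longer needs manual editing each release.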
4 changes: 2 additions & 2 deletions conda/recipes/dask-cuda/meta.yaml
@@ -27,8 +27,8 @@ requirements:
- setuptools
run:
- python
- dask>=2021.11.1,<=2021.11.2
- distributed>=2021.11.1,<=2021.11.2
- dask>=2021.11.1,<=2022.01.0
- distributed>=2021.11.1,<=2022.01.0
- pynvml >=8.0.3
- numpy >=1.16.0
- numba >=0.53.1
17 changes: 14 additions & 3 deletions dask_cuda/benchmarks/local_cudf_merge.py
@@ -288,16 +288,27 @@ def main(args):
print(f"ib | {args.enable_infiniband}")
print(f"nvlink | {args.enable_nvlink}")
print(f"data-processed | {format_bytes(took_list[0][0])}")
print("===============================")
print("=" * 80)
print("Wall-clock | Throughput")
print("-------------------------------")
print("-" * 80)
t_p = []
times = []
for idx, (data_processed, took) in enumerate(took_list):
throughput = int(data_processed / took)
m = format_time(took)
times.append(took)
t_p.append(throughput)
m += " " * (15 - len(m))
print(f"{m}| {format_bytes(throughput)}/s")
t_runs[idx] = float(format_bytes(throughput).split(" ")[0])
print("===============================")
t_p = numpy.asarray(t_p)
times = numpy.asarray(times)
print("=" * 80)
print(f"Throughput | {format_bytes(t_p.mean())} +/- {format_bytes(t_p.std()) }")
print(
f"Wall-Clock | {format_time(times.mean())} +/- {format_time(times.std()) }"
)
print("=" * 80)
if args.markdown:
print("\n```")

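The benchmark change above (from #828) collects per-run wall-clock times and throughputs into NumPy arrays so it can report their mean and standard deviation. A stripped-down sketch of that calculation, with illustrative run data and plain formatting in place of dask's `format_bytes`/`format_time` helpers:

```python
import numpy

# (data_processed_bytes, seconds) per benchmark run -- illustrative values
took_list = [(10_000_000_000, 2.0), (10_000_000_000, 2.5), (10_000_000_000, 2.2)]

throughputs = numpy.asarray([d / t for d, t in took_list])
times = numpy.asarray([t for _, t in took_list])

# Mean +/- standard deviation across runs, as in the new summary lines
print(f"Throughput | {throughputs.mean():.2e} +/- {throughputs.std():.2e} B/s")
print(f"Wall-Clock | {times.mean():.2f} +/- {times.std():.2f} s")
```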
35 changes: 13 additions & 22 deletions dask_cuda/benchmarks/utils.py
@@ -66,24 +66,28 @@ def parse_benchmark_args(description="Generic dask-cuda Benchmark", args_list=[]
)
parser.add_argument(
"--enable-tcp-over-ucx",
default=None,
action="store_true",
dest="enable_tcp_over_ucx",
help="Enable TCP over UCX.",
)
parser.add_argument(
"--enable-infiniband",
default=None,
action="store_true",
dest="enable_infiniband",
help="Enable InfiniBand over UCX.",
)
parser.add_argument(
"--enable-nvlink",
default=None,
action="store_true",
dest="enable_nvlink",
help="Enable NVLink over UCX.",
)
parser.add_argument(
"--enable-rdmacm",
default=None,
action="store_true",
dest="enable_rdmacm",
help="Enable RDMACM with UCX.",
@@ -181,27 +185,23 @@ def parse_benchmark_args(description="Generic dask-cuda Benchmark", args_list=[]
name = [name]
parser.add_argument(*name, **args)

parser.set_defaults(
enable_tcp_over_ucx=True,
enable_infiniband=True,
enable_nvlink=True,
enable_rdmacm=False,
)
args = parser.parse_args()

if args.protocol == "tcp":
args.enable_tcp_over_ucx = False
args.enable_infiniband = False
args.enable_nvlink = False
args.enable_rdmacm = False

if args.multi_node and len(args.hosts.split(",")) < 2:
raise ValueError("--multi-node requires at least 2 hosts")

return args


def get_cluster_options(args):
ucx_options = {
"ucx_net_devices": args.ucx_net_devices,
"enable_tcp_over_ucx": args.enable_tcp_over_ucx,
"enable_infiniband": args.enable_infiniband,
"enable_nvlink": args.enable_nvlink,
"enable_rdmacm": args.enable_rdmacm,
}

if args.multi_node is True:
Cluster = SSHCluster
cluster_args = [args.hosts.split(",")]
@@ -214,11 +214,6 @@ def get_cluster_options(args):
"worker_options": {
"protocol": args.protocol,
"nthreads": args.threads_per_worker,
"net_devices": args.ucx_net_devices,
"enable_tcp_over_ucx": args.enable_tcp_over_ucx,
"enable_infiniband": args.enable_infiniband,
"enable_nvlink": args.enable_nvlink,
"enable_rdmacm": args.enable_rdmacm,
"interface": args.interface,
"device_memory_limit": args.device_memory_limit,
},
@@ -234,13 +229,9 @@
"n_workers": len(args.devs.split(",")),
"threads_per_worker": args.threads_per_worker,
"CUDA_VISIBLE_DEVICES": args.devs,
"ucx_net_devices": args.ucx_net_devices,
"enable_tcp_over_ucx": args.enable_tcp_over_ucx,
"enable_infiniband": args.enable_infiniband,
"enable_nvlink": args.enable_nvlink,
"enable_rdmacm": args.enable_rdmacm,
"interface": args.interface,
"device_memory_limit": args.device_memory_limit,
**ucx_options,
}
if args.no_silence_logs:
cluster_kwargs["silence_logs"] = False
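Both the `SSHCluster` and `LocalCUDACluster` branches previously repeated the same UCX keyword arguments; the refactor above builds one shared `ucx_options` dict and spreads it into the cluster kwargs. A minimal sketch of that pattern (keys and values are illustrative):

```python
# Shared UCX options assembled once (values mirror the new tri-state defaults)
ucx_options = {
    "enable_tcp_over_ucx": None,
    "enable_infiniband": None,
    "enable_nvlink": None,
    "enable_rdmacm": None,
}

cluster_kwargs = {
    "n_workers": 2,
    "threads_per_worker": 1,
    **ucx_options,  # merged via ** instead of repeating each key per branch
}
print(sorted(cluster_kwargs))
```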
26 changes: 20 additions & 6 deletions dask_cuda/cli/dask_cuda_worker.py
@@ -73,6 +73,18 @@
.. note::
This size is a per-worker configuration, and not cluster-wide.""",
)
@click.option(
"--rmm-maximum-pool-size",
default=None,
help="""When ``--rmm-pool-size`` is specified, this argument indicates the maximum pool size.
Can be an integer (bytes), string (like ``"5GB"`` or ``"5000M"``) or ``None``.
By default, the total available memory on the GPU is used.
``rmm_pool_size`` must be specified to use RMM pool and
to set the maximum pool size.
.. note::
This size is a per-worker configuration, and not cluster-wide.""",
)
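`--rmm-maximum-pool-size` accepts either raw bytes or a size string such as `"5GB"` or `"5000M"`. The worker normalizes these with dask's `parse_bytes`; the standalone sketch below is only an illustration of that conversion, handling a few simple decimal suffixes:

```python
import re

_UNITS = {"": 1, "B": 1, "K": 10**3, "M": 10**6, "G": 10**9,
          "KB": 10**3, "MB": 10**6, "GB": 10**9}

def parse_size(size) -> int:
    """Convert 5_000_000_000, "5GB", or "5000M" to a byte count."""
    if isinstance(size, int):
        return size
    m = re.fullmatch(r"\s*([0-9.]+)\s*([A-Za-z]*)\s*", size)
    if not m or m.group(2).upper() not in _UNITS:
        raise ValueError(f"cannot parse size: {size!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2).upper()])

print(parse_size("5GB"))    # 5000000000
print(parse_size("5000M"))  # 5000000000
```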
@click.option(
"--rmm-managed-memory/--no-rmm-managed-memory",
default=False,
@@ -210,28 +222,28 @@
)
@click.option(
"--enable-tcp-over-ucx/--disable-tcp-over-ucx",
default=False,
default=None,
show_default=True,
help="""Set environment variables to enable TCP over UCX, even if InfiniBand and
NVLink are not supported or disabled.""",
)
@click.option(
"--enable-infiniband/--disable-infiniband",
default=False,
default=None,
show_default=True,
help="""Set environment variables to enable UCX over InfiniBand, implies
``--enable-tcp-over-ucx``.""",
``--enable-tcp-over-ucx`` when enabled.""",
)
@click.option(
"--enable-nvlink/--disable-nvlink",
default=False,
default=None,
show_default=True,
help="""Set environment variables to enable UCX over NVLink, implies
``--enable-tcp-over-ucx``.""",
``--enable-tcp-over-ucx`` when enabled.""",
)
@click.option(
"--enable-rdmacm/--disable-rdmacm",
default=False,
default=None,
show_default=True,
help="""Set environment variables to enable UCX RDMA connection manager support,
requires ``--enable-infiniband``.""",
@@ -277,6 +289,7 @@ def main(
memory_limit,
device_memory_limit,
rmm_pool_size,
rmm_maximum_pool_size,
rmm_managed_memory,
rmm_async,
rmm_log_directory,
@@ -327,6 +340,7 @@
memory_limit,
device_memory_limit,
rmm_pool_size,
rmm_maximum_pool_size,
rmm_managed_memory,
rmm_async,
rmm_log_directory,
18 changes: 13 additions & 5 deletions dask_cuda/cuda_worker.py
@@ -58,6 +58,7 @@ def __init__(
memory_limit="auto",
device_memory_limit="auto",
rmm_pool_size=None,
rmm_maximum_pool_size=None,
rmm_managed_memory=False,
rmm_async=False,
rmm_log_directory=None,
@@ -72,10 +73,10 @@
preload=[],
dashboard_prefix=None,
security=None,
enable_tcp_over_ucx=False,
enable_infiniband=False,
enable_nvlink=False,
enable_rdmacm=False,
enable_tcp_over_ucx=None,
enable_infiniband=None,
enable_nvlink=None,
enable_rdmacm=None,
net_devices=None,
jit_unspill=None,
worker_class=None,
@@ -152,6 +153,9 @@ def del_pid_file():
)
if rmm_pool_size is not None:
rmm_pool_size = parse_bytes(rmm_pool_size)
if rmm_maximum_pool_size is not None:
rmm_maximum_pool_size = parse_bytes(rmm_maximum_pool_size)

else:
if enable_nvlink:
warnings.warn(
@@ -239,7 +243,11 @@ def del_pid_file():
get_cpu_affinity(nvml_device_index(i, cuda_visible_devices(i)))
),
RMMSetup(
rmm_pool_size, rmm_managed_memory, rmm_async, rmm_log_directory,
rmm_pool_size,
rmm_maximum_pool_size,
rmm_managed_memory,
rmm_async,
rmm_log_directory,
),
},
name=name if nprocs == 1 or name is None else str(name) + "-" + str(i),