Please see https://github.com/rapidsai/dask-cuda/releases/tag/v22.06.00a for the latest changes to this development branch.
- Resolve build issues / consistency with conda-forge packages (#883) @charlesbluca
- Increase test_worker_force_spill_to_disk timeout (#857) @pentschev
- Remove description from non-existing `--nprocs` CLI argument (#852) @pentschev
- Add --pre-import/pre_import argument (#854) @pentschev
- Remove support for UCX < 1.11.1 (#830) @pentschev
- Raise `ImportError` when platform is not Linux (#885) @pentschev
- Temporarily disable new `ops-bot` functionality (#880) @ajschmidt8
- Pin `dask` & `distributed` (#878) @galipremsagar
- Upgrade min `dask` & `distributed` versions (#872) @galipremsagar
- Add `.github/ops-bot.yaml` config file (#871) @ajschmidt8
- Make Dask CUDA work with the new WorkerMemoryManager abstraction (#870) @shwina
- Implement ProxifyHostFile.evict() (#862) @madsbk
- Introduce incompatible-types and enable spilling of CuPy arrays (#856) @madsbk
- Spill to disk clean up (#853) @madsbk
- ProxyObject to support matrix multiplication (#849) @madsbk
- Unpin max dask and distributed (#847) @galipremsagar
- test_gds: skip if GDS is not available (#845) @madsbk
- ProxyObject implement `__array_function__` (#843) @madsbk
- Add option to track RMM allocations (#842) @shwina
- Ignore `DeprecationWarning` from `distutils.Version` classes (#823) @pentschev
- Handle explicitly disabled UCX transports (#820) @pentschev
- Fix regex pattern to match to in test_on_demand_debug_info (#819) @pentschev
- Fix skipping GDS test if cucim is not installed (#813) @pentschev
- Unpin Dask and Distributed versions (#810) @pentschev
- Update to UCX-Py 0.24 (#805) @pentschev
- Fix Dask-CUDA version to 22.02 (#835) @jakirkham
- Merge branch-21.12 into branch-22.02 (#829) @pentschev
- Clarify `LocalCUDACluster`'s `n_workers` docstrings (#812) @pentschev
- Pin `dask` & `distributed` versions (#832) @galipremsagar
- Expose rmm-maximum_pool_size argument (#827) @VibhuJawa
- Simplify UCX configs, permitting UCX_TLS=all (#792) @pentschev
- Add avg and std calculation for time and throughput (#828) @quasiben
- sizeof test: increase tolerance (#825) @madsbk
- Query UCX-Py from gpuCI versioning service (#818) @pentschev
- Standardize Distributed config separator in get_ucx_config (#806) @pentschev
- Fixed `ProxyObject.__del__` to use the new Disk IO API from #791 (#802) @madsbk
- GPUDirect Storage (GDS) support for spilling (#793) @madsbk
- Disk IO interface (#791) @madsbk
- Remove automatic `doc` labeler (#807) @pentschev
- Add create_cuda_context UCX config from Distributed (#801) @pentschev
- Ignore deprecation warnings from pkg_resources (#784) @pentschev
- Fix parsing of device by UUID (#780) @pentschev
- Avoid creating CUDA context in LocalCUDACluster parent process (#765) @pentschev
- Remove gen_cluster spill tests (#758) @pentschev
- Update memory_pause_fraction in test_spill (#757) @pentschev
- Add troubleshooting page with PCI Bus ID issue description (#777) @pentschev
- Handle UCX-Py FutureWarning on UCX < 1.11.1 deprecation (#799) @pentschev
- Pin max `dask` & `distributed` versions (#794) @galipremsagar
- Update to UCX-Py 0.23 (#752) @pentschev
- Fix spill-to-disk triggered by Dask explicitly (#800) @madsbk
- Fix Changelog Merge Conflicts for `branch-21.12` (#797) @ajschmidt8
- Use unittest.mock.patch for all os.environ tests (#787) @pentschev
- Logging when RMM allocation fails (#782) @madsbk
- Tally IDs instead of device buffers directly (#779) @madsbk
- Avoid proxy object aliasing (#775) @madsbk
- Test of sizeof proxy object (#774) @madsbk
- gc.collect when spilling on demand (#771) @madsbk
- Reenable explicit comms tests (#770) @madsbk
- Simplify JIT-unspill and writing docs (#768) @madsbk
- Increase CUDAWorker close timeout (#764) @pentschev
- Ignore known but expected test warnings (#759) @pentschev
- Spilling on demand (#756) @madsbk
- Revert "Temporarily skipping some tests because of a bug in Dask (#753)" (#754) @madsbk
- Temporarily skipping some tests because of a bug in Dask (#753) @madsbk
- Removing the `FrameProxyObject` workaround (#751) @madsbk
- Use cuDF Frame instead of Table (#748) @madsbk
- Remove proxy object locks (#747) @madsbk
- Unpin `dask` & `distributed` in CI (#742) @galipremsagar
- Update SSHCluster usage in benchmarks with new CUDAWorker (#326) @pentschev
- Drop test setting UCX global options via Dask config (#738) @pentschev
- Prevent CUDA context errors when testing on single-GPU (#737) @pentschev
- Handle `ucp` import error during `initialize()` (#729) @pentschev
- Check if CUDA context was created in distributed.comm.ucx (#722) @pentschev
- Fix registering correct dispatches for `cudf.Index` (#718) @galipremsagar
- Register `percentile_lookup` for `FrameProxyObject` (#716) @galipremsagar
- Leave interface unset when ucx_net_devices unset in LocalCUDACluster (#711) @pentschev
- Update to UCX-Py 0.22 (#710) @pentschev
- Missing fixes to Distributed config namespace refactoring (#703) @pentschev
- Reset UCX-Py after rdmacm tests run (#702) @pentschev
- Skip DGX InfiniBand tests when "rc" transport is unavailable (#701) @pentschev
- Update UCX config namespace (#695) @pentschev
- Bump isort hook version (#682) @charlesbluca
- Update more docs for UCX 1.11+ (#720) @pentschev
- Forward-merge branch-21.08 to branch-21.10 (#707) @jakirkham
- Warn if CUDA context is created on incorrect device with `LocalCUDACluster` (#719) @pentschev
- Add `--benchmark-json` option to all benchmarks (#700) @charlesbluca
- Remove Distributed tests from CI (#699) @pentschev
- Add device memory limit argument to benchmarks (#683) @charlesbluca
- Support for LocalCUDACluster with MIG (#674) @akaanirban
- Pin max `dask` and `distributed` versions to `2021.09.1` (#735) @galipremsagar
- Implements a ProxyManagerDummy for convenience (#733) @madsbk
- Add `__array_ufunc__` support for `ProxyObject` (#731) @galipremsagar
- Use `has_cuda_context` from Distributed (#723) @pentschev
- Fix deadlock and simplify proxy tracking (#712) @madsbk
- JIT-unspill: support spilling to/from disk (#708) @madsbk
- Tests: replacing the obsolete cudf.testing._utils.assert_eq calls (#706) @madsbk
- JIT-unspill: warn when spill to disk triggers (#705) @madsbk
- Remove max version pin for `dask` & `distributed` on development branch (#693) @galipremsagar
- ENH Replace gpuci_conda_retry with gpuci_mamba_retry (#675) @dillon-cullinan
- Use aliases to check for installed UCX version (#692) @pentschev
- Don't install Dask main branch in CI for 21.08 release (#687) @pentschev
- Skip test_get_ucx_net_devices_raises on UCX >= 1.11.0 (#685) @pentschev
- Fix NVML index usage in CUDAWorker/LocalCUDACluster (#671) @pentschev
- Add --protocol flag to dask-cuda-worker (#670) @jacobtomlinson
- Fix `assert_eq` related imports (#663) @galipremsagar
- Small tweaks to make compatible with dask-mpi (#656) @jacobtomlinson
- Remove Dask version pin (#647) @pentschev
- Fix CUDA_VISIBLE_DEVICES tests (#638) @pentschev
- Add `make_meta_dispatch` handling (#637) @galipremsagar
- Update UCX-Py version in CI to 0.21.* (#636) @pentschev
- Deprecation warning for ucx_net_devices='auto' on UCX 1.11+ (#681) @pentschev
- Update documentation on InfiniBand with UCX >= 1.11 (#669) @pentschev
- Merge branch-21.06 (#622) @pentschev
- Treat Deprecation/Future warnings as errors (#672) @pentschev
- Update parse_bytes imports to resolve deprecation warnings (#662) @pentschev
- Pin max `dask` & `distributed` versions (#686) @galipremsagar
- Fix DGX tests warnings on RMM pool size and file not closed (#673) @pentschev
- Remove dot calling style for pytest (#661) @quasiben
- get_device_memory_objects(): dispatch on cudf.core.frame.Frame (#658) @madsbk
- Fix `21.08` forward-merge conflicts (#655) @ajschmidt8
- Fix conflicts in `643` (#644) @ajschmidt8
- Handle `import`ing relocated dispatch functions (#623) @jakirkham
- Fix DGX tests for UCX 1.9 (#619) @pentschev
- Add PROJECTS var (#614) @ajschmidt8
- Bump docs copyright year (#616) @charlesbluca
- Update RTD site to redirect to RAPIDS docs (#615) @charlesbluca
- Document DASK_JIT_UNSPILL (#604) @madsbk
- Disable reuse endpoints with UCX >= 1.11 (#620) @pentschev
- Adding profiling to dask shuffle (#625) @arunraman
- Update `CHANGELOG.md` links for calver (#618) @ajschmidt8
- Fixing Dataframe merge benchmark (#617) @madsbk
- Fix DGX tests for UCX 1.10+ (#613) @pentschev
- Update docs build script (#612) @ajschmidt8
- Pin Dask and Distributed <=2021.04.0 (#585) @pentschev
- Unblock CI by xfailing test_dataframe_merge_empty_partitions (#581) @pentschev
- Install Dask + Distributed from `main` (#546) @jakirkham
- Replace compute() calls on CuPy benchmarks by persist() (#537) @pentschev
- Add standalone examples of UCX usage (#551) @charlesbluca
- Improve UCX documentation and examples (#545) @charlesbluca
- Auto-merge branch-0.18 to branch-0.19 (#538) @GPUtester
- Add option to enable RMM logging (#542) @charlesbluca
- Add capability to log spilling (#442) @pentschev
- Fix UCX examples for InfiniBand (#556) @charlesbluca
- Fix list to tuple conversion (#555) @madsbk
- Add column masking operation for CuPy benchmarking (#553) @jakirkham
- Update Changelog Link (#550) @ajschmidt8
- cuDF-style operations & NVTX annotations for local CuPy benchmark (#548) @charlesbluca
- Prepare Changelog for Automation (#543) @ajschmidt8
- Add --enable-rdmacm flag to benchmarks utils (#539) @pentschev
- ProxifyHostFile: tracking of external objects (#527) @madsbk
- Test broadcast merge in local_cudf_merge benchmark (#507) @rjzamora
- Explicit-comms house cleaning (#515) @madsbk
- Fix device synchronization in local_cupy benchmark (#518) @pentschev
- Proxify register lazy (#492) @madsbk
- Work on deadlock issue 431 (#490) @madsbk
- Fix usage of --dashboard-address in dask-cuda-worker (#487) @pentschev
- Fail if scheduler starts with '-' in dask-cuda-worker (#485) @pentschev
- Add device synchronization for local CuPy benchmarks with Dask profiling (#533) @charlesbluca
- Shuffle benchmark (#496) @madsbk
- Update stale GHA with exemptions & new labels (#531) @mike-wendt
- Add GHA to mark issues/prs as stale/rotten (#528) @Ethyling
- Add operations/arguments to local CuPy array benchmark (#524) @charlesbluca
- Explicit-comms house cleaning (#515) @madsbk
- Fixing fixed-attribute-proxy-object-test (#511) @madsbk
- Prepare Changelog for Automation (#509) @ajschmidt8
- remove conditional check to start conda uploads (#504) @jolorunyomi
- ProxyObject: ignore initial fixed attribute errors (#503) @madsbk
- JIT-unspill: fix potential deadlock (#501) @madsbk
- Hostfile: register the removal of an existing key (#500) @madsbk
- proxy_object: cleanup type dispatching (#497) @madsbk
- Redesign and implementation of dataframe shuffle (#494) @madsbk
- Add --threads-per-worker option to benchmarks (#489) @pentschev
- Extend CuPy benchmark with more operations (#488) @pentschev
- Auto-label PRs based on their content (#480) @jolorunyomi
- CI: cleanup style check (#477) @madsbk
- Individual CUDA object spilling (#451) @madsbk
- FIX Move codecov upload to gpu build script (#450) @dillon-cullinan
- Add support for connecting a CUDAWorker to a cluster object (#428) @jacobtomlinson
- Fix benchmark output when scheduler address is specified (#414) @quasiben
- Fix typo in benchmark utils (#416) @quasiben
- More RMM options in benchmarks (#419) @quasiben
- Add utility function to establish all-to-all connectivity upon request (#420) @quasiben
- Filter `rmm_pool_size` warnings in benchmarks (#422) @pentschev
- Add functionality to plot cuDF benchmarks (#423) @quasiben
- Decrease data size to shorten spilling tests time (#422) @pentschev
- Temporarily xfail explicit-comms tests (#432) @pentschev
- Add codecov.yml and ignore uncovered files (#433) @pentschev
- Do not skip DGX/TCP tests when ucp is not installed (#436) @pentschev
- Support UUID in CUDA_VISIBLE_DEVICES (#437) @pentschev
- Unify `device_memory_limit` parsing and set default to 0.8 (#439) @pentschev
- Update and clean gpuCI scripts (#440) @msadang
- Add notes on controlling number of workers to docs (#441) @quasiben
- Add CPU support to CuPy transpose sum benchmark (#444) @pentschev
- Update builddocs dependency requirements (#447) @quasiben
- Fix versioneer (#448) @jakirkham
- Cleanup conda recipe (#449) @jakirkham
- Fix `pip install` issues with new resolver (#454) @jakirkham
- Make threads per worker consistent (#456) @pentschev
- Support for ProxyObject binary operations (#458) @madsbk
- Support for ProxyObject pickling (#459) @madsbk
- Clarify RMM pool is a per-worker attribute on docs (#462) @pentschev
- Fix typo on specializations docs (#463) @vfdev-5
- Parse pool size only when set (#396) @quasiben
- Improve CUDAWorker scheduler-address parsing and init (#397) @necaris
- Add benchmark for `da.map_overlap` (#399) @jakirkham
- Explicit-comms: dataframe shuffle (#401) @madsbk
- Use new NVTX module (#406) @pentschev
- Run Dask's NVML tests (#408) @quasiben
- Skip tests that require cuDF/UCX-Py, when not installed (#411) @pentschev
- Fix-up versioneer (#305) @jakirkham
- Require Distributed 2.15.0+ (#306) @jakirkham
- Rely on Dask's ability to serialize collections (#307) @jakirkham
- Ensure CI installs GPU build of UCX (#308) @pentschev
- Skip 2nd serialization pass of `DeviceSerialized` (#309) @jakirkham
- Fix tests related to latest RMM changes (#310) @pentschev
- Fix dask-cuda-worker's interface argument (#314) @pentschev
- Check only for memory type during test_get_device_total_memory (#315) @pentschev
- Fix and improve DGX tests (#316) @pentschev
- Install dependencies via meta package (#317) @raydouglass
- Fix errors when TLS files are not specified (#320) @pentschev
- Refactor dask-cuda-worker into CUDAWorker class (#324) @jacobtomlinson
- Add missing `__init__.py` to dask_cuda/cli (#327) @pentschev
- Add Dask distributed GPU tests to CI (#329) @quasiben
- Fix rmm_pool_size argument name in docstrings (#329) @quasiben
- Add CPU support to benchmarks (#338) @quasiben
- Fix isort configuration (#339) @madsbk
- Explicit-comms: cleanup and bug fix (#340) @madsbk
- Add support for RMM managed memory (#343) @pentschev
- Update docker image in local build script (#345) @sean-frye
- Support pickle protocol 5 based spilling (#349) @jakirkham
- Use get_n_gpus for RMM test with dask-cuda-worker (#356) @pentschev
- Update RMM tests based on deprecated CNMeM (#359) @jakirkham
- Fix a black error in explicit comms (#360) @jakirkham
- Fix an `isort` error (#360) @jakirkham
- Set `RMM_NO_INITIALIZE` environment variable (#363) @quasiben
- Fix bash lines in docs (#369) @quasiben
- Replace `RMM_NO_INITIALIZE` with `RAPIDS_NO_INITIALIZE` (#371) @jakirkham
- Fixes for docs and RTD updates (#373) @quasiben
- Confirm DGX tests are running baremetal (#376) @pentschev
- Set RAPIDS_NO_INITIALIZE at the top of CUDAWorker/LocalCUDACluster (#379) @pentschev
- Change pytest's basetemp in CI build script (#380) @pentschev
- Pin Numba version to exclude 0.51.0 (#385) @quasiben
- Publish branch-0.14 to conda (#262) @trxcllnt
- Fix behavior for `memory_limit=0` (#269) @pentschev
- Raise serialization errors when spilling (#272) @jakirkham
- Fix dask-cuda-worker memory_limit (#279) @pentschev
- Add NVTX annotations for spilling (#282) @pentschev
- Skip existing on conda uploads (#284) @raydouglass
- Local gpuCI build script (#285) @efajardo-nv
- Remove deprecated DGX class (#286) @pentschev
- Add RDMACM support (#287) @pentschev
- Read the Docs Setup (#290) @quasiben
- Raise ValueError when ucx_net_devices="auto" and IB is disabled (#291) @pentschev
- Multi-node benchmarks (#293) @pentschev
- Add docs for UCX (#294) @pentschev
- Add `--runs` argument to CuPy benchmark (#295) @pentschev
- Fixing LocalCUDACluster example. Adding README links to docs (#297) @randerzander
- Add `nfinal` argument to shuffle_group, required in Dask >= 2.17 (#299) @pentschev
- Initialize parent process' UCX configuration (#301) @pentschev
- Add Read the Docs link (#302) @jakirkham
- Use RMM's `DeviceBuffer` directly (#235) @jakirkham
- Add RMM pool support from dask-cuda-worker/LocalCUDACluster (#236) @pentschev
- Restrict CuPy to <7.2 (#239) @quasiben
- Fix UCX configurations (#246) @pentschev
- Respect `temporary-directory` config for spilling (#247) @jakirkham
- Relax CuPy pin (#248) @jakirkham
- Added `ignore_index` argument to `partition_by_hash()` (#253) @madsbk
- Use `"dask"` serialization to move to/from host (#256) @jakirkham
- Drop Numba `DeviceNDArray` code for `sizeof` (#257) @jakirkham
- Support spilling of device objects in dictionaries (#260) @madsbk
- Add ucx-py dependency to CI (#212) @raydouglass
- Follow-up revision of local_cudf_merge benchmark (#213) @rjzamora
- Add codeowners file (#217) @raydouglass
- Add pypi upload script (#218) @raydouglass
- Skip existing on PyPi uploads (#219) @raydouglass
- Make benchmarks use rmm_cupy_allocator (#220) @madsbk
- cudf-merge-benchmark now reports throughput (#222) @madsbk
- Fix dask-cuda-worker --interface/--net-devices docs (#223) @pentschev
- Use RMM for serialization when available (#227) @pentschev
- Use UCX-Py initialization API (#152) @pentschev
- Remove all CUDA labels (#160) @mike-wendt
- Setting UCX options through dask global config (#168) @madsbk
- Make test_cudf_device_spill xfail (#170) @pentschev
- Updated CI, cleanup tests and reformat Python files (#171) @madsbk
- Fix GPU dependency versions (#173) @dillon-cullinan
- Set LocalCUDACluster n_workers equal to the length of CUDA_VISIBLE_DEVICES (#174) @mrocklin
- Simplify dask-cuda code (#175) @madsbk
- DGX inherit from LocalCUDACluster (#177) @madsbk
- Single-node CUDA benchmarks (#179) @madsbk
- Set TCP for UCX tests (#180) @pentschev
- Single-node cuDF merge benchmarks (#183) @madsbk
- Add black and isort checks in CI (#185) @pentschev
- Remove outdated xfail/importorskip test entries (#188) @pentschev
- Use UCX-Py's TopologicalDistance to determine IB interfaces in DGX (#189) @pentschev
- Dask Performance Report (#192) @madsbk
- Allow test_cupy_device_spill to xfail (#195) @pentschev
- Use ucx-py from rapidsai-nightly in CI (#196) @pentschev
- LocalCUDACluster sets closest network device (#200) @madsbk
- Expand cudf-merge benchmark (#201) @rjzamora
- Added --runs to merge benchmark (#202) @madsbk
- Move UCX code to LocalCUDACluster and deprecate DGX (#205) @pentschev
- Add markdown output option to cuDF merge benchmark (#208) @quasiben
- Change the updated new_worker_spec API for upstream (#128) @mrocklin
- Update TOTAL_MEMORY to match new distributed MEMORY_LIMIT (#131) @pentschev
- Bump Dask requirement to 2.4 (#133) @mrocklin
- Use YYMMDD tag in nightly build (#134) @mluukkainen
- Automatically determine CPU affinity (#138) @pentschev
- Fix full memory use check testcase (#139) @ksangeek
- Use pynvml to get memory info without creating CUDA context (#140) @pentschev
- Pass missing local_directory to Nanny from dask-cuda-worker (#141) @pentschev
- New worker_spec function for worker recipes (#147) @pentschev
- Add new Scheduler class supporting environment variables (#149) @pentschev
- Support for TCP over UCX (#152) @pentschev
- Fix serialization of collections and bump dask to 2.3.0 (#118) @pentschev
- Add versioneer (#88) @matthieubulte
- Python CodeCov Integration (#91) @dillon-cullinan
- Update cudf, dask, dask-cudf, distributed version requirements (#97) @pentschev
- Improve device memory spilling performance (#98) @pentschev
- Update dask-cuda for dask 2.2 (#101) @mrocklin
- Streamline CUDA_REL environment variable (#102) @okoskinen
- Replace ncores= parameter with nthreads= (#101) @mrocklin
- Fix remove CodeCov upload from build script (#115) @dillon-cullinan
- Remove CodeCov upload (#116) @dillon-cullinan
- Add device memory spill support (LRU-based only) (#51) @pentschev
- Update CI dependency to CuPy 6.0.0 (#53) @pentschev
- Add a hard-coded DGX configuration (#46) (#70) @mrocklin
- Fix LocalCUDACluster data spilling and its test (#67) @pentschev
- Add test skipping functionality to build.sh (#71) @dillon-cullinan
- Replace use of ncores= keywords with nthreads= (#75) @mrocklin
- Fix device memory spilling with cuDF (#65) @pentschev
- LocalCUDACluster calls _correct_state() to ensure workers started (#78) @pentschev
- Delay some of spilling test assertions (#80) @pentschev