v24.06.00

raydouglass released this 05 Jun 17:20

🐛 Bug Fixes

Fix the default data type for wholememory tensor gather (#173) @linhu-nv
Fix a map_indice bug and add a comment for the round_robin_size parameter (#172) @linhu-nv

🚀 New Features

Add initial support for distributed sampling (#171) @chang-l

🛠️ Improvements

Sort indices before gathering (#174) @zhuofan1123
Always use a static gtest (#167) @vyasr
Fix host view for mnnvl (#166) @chuangz0
Add a subwarp version of the gather op for small embedding sizes (#165) @chuangz0
Migrate to {{ stdlib("c") }} (#164) @hcho3
Support reading files with multiple threads and add test_wholememory_io for round-robin reads (#163) @chuangz0
Fix CI issue caused by a PyTorch and MKL version conflict (#162) @linhu-nv
Allow temp_memory_handler to allocate memory multiple times (#161) @linhu-nv
Remove unnecessary sync between Thrust ops and host threads (#160) @linhu-nv
Remove scripts/checks/copyright.py (#149) @KyleFromNVIDIA

Contributors

vyasr, hcho3, and 5 other contributors