v24.06.00
🐛 Bug Fixes
- quick fix to the default data type of wholememory tensor gather (#173) @linhu-nv
- quick fix to a map_indice bug and add comment for parameter round_robin_size (#172) @linhu-nv
🚀 New Features
🛠️ Improvements
- Sort indices before gathering (#174) @zhuofan1123
- Always use a static gtest (#167) @vyasr
- Fix host view for mnnvl (#166) @chuangz0
- subwarp version gather op for small embedding size (#165) @chuangz0
- Migrate to {{ stdlib("c") }} (#164) @hcho3
- support reading files with multiple threads and add test_wholememory_io for round-robin read (#163) @chuangz0
- fix CI issue due to pytorch and mkl version conflict (#162) @linhu-nv
- allow temp_memory_handler to allocate memory multiple times (#161) @linhu-nv
- remove unnecessary sync between thrust ops and host threads (#160) @linhu-nv
- Remove scripts/checks/copyright.py (#149) @KyleFromNVIDIA