release 24.04 [skip ci] #654

Merged
merged 47 commits into from
May 10, 2024

Conversation

YanxuanLiu
Collaborator

Merge to main for 24.04 release

Release notes as follows:

  • Feature standardization in logistic regression for sparse vectors.
  • GPU-accelerated Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm with example notebook.
  • GPU-accelerated IVF-Flat Approximate Nearest Neighbor algorithm with example notebook.
  • Stage level scheduling support for Yarn and K8s.
  • Update of RAPIDS dependencies to 24.04.
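
For readers unfamiliar with DBSCAN, here is a minimal CPU-only sketch of what the algorithm computes. It is not the spark-rapids-ml or cuML implementation; the function name `dbscan` and the brute-force neighbor search are illustrative assumptions. The `eps` and `min_samples` parameters follow the usual DBSCAN convention.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_samples):
    """Illustrative DBSCAN: returns one cluster id (0, 1, ...) or NOISE per point."""
    n = len(points)
    # Brute-force eps-neighborhoods; each point counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                frontier.extend(neighbors[j])  # core point: keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
```

The two tight groups become clusters 0 and 1, and the isolated point at (50, 50) is labeled as noise.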

NOTE: this PR must be merged using the "Create a merge commit" option.

nvauto and others added 30 commits January 24, 2024 18:08
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
[auto-merge] branch-24.02 to branch-24.04 [skip ci] [bot]
Signed-off-by: YanxuanLiu <[email protected]>
* DBSCAN basis

* Precomputed distance testcase, usecase comments

Signed-off-by: nvssh nssswitch user account <[email protected]>

* Code structure cleaning and update for ColID extraction

* testfile fix

* Remove Precomputed mode support

* idCol fix

* Syntax fix for CI python version

* ColID fix, code cleaning for sparse vector input

* Avoid core indices calc kernel by default

---------

Signed-off-by: nvssh nssswitch user account <[email protected]>
Co-authored-by: nvssh nssswitch user account <[email protected]>
* kNN update for colID extraction

* comment fix

Signed-off-by: nvssh nssswitch user account <[email protected]>

---------

Signed-off-by: nvssh nssswitch user account <[email protected]>
Co-authored-by: nvssh nssswitch user account <[email protected]>
* Datagen Fix

* Auth fix

Signed-off-by: nvssh nssswitch user account <[email protected]>

* Auth fix

Signed-off-by: Hongzhe Cheng <[email protected]>

---------

Signed-off-by: nvssh nssswitch user account <[email protected]>
Signed-off-by: Hongzhe Cheng <[email protected]>
Co-authored-by: nvssh nssswitch user account <[email protected]>
* DBSCAN notebook, benchmark script

* Move DBSCAN to clustering

Signed-off-by: Hongzhe Cheng <[email protected]>

* File check in

* benchmark script support

* Benchmark parameter fix

* Transform time fix

* DBSCAN data broadcast fix, comment fix

* separate change for DBSCAN source code fix

* cmdline switch for benchmark score compute

* style fix

---------

Signed-off-by: Hongzhe Cheng <[email protected]>
We need to remove all CI job links from the pre-merge CI workflow for security reasons.

Signed-off-by: Tim Liu <[email protected]>
* DBSCAN broadcast fix

Signed-off-by: Hongzhe Cheng <[email protected]>

* Comment delete

---------

Signed-off-by: Hongzhe Cheng <[email protected]>
* Add in algorithm parameter support for DBSCAN

Signed-off-by: Hongzhe Cheng <[email protected]>

* comment fix

---------

Signed-off-by: Hongzhe Cheng <[email protected]>
wbo4958 and others added 17 commits April 17, 2024 07:54
* support standardization for sparse vectors per cuml 24.04

* revise test cases to test sparse standardization

* revise

* revise docstring regarding sparse standardization

---------

Signed-off-by: Jinfeng <[email protected]>
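
As a rough illustration of standardizing sparse features without densifying them, the sketch below computes per-column statistics from the stored values only and then scales each stored value by its column's standard deviation. This is not the cuML or spark-rapids-ml code path. The helper names, the `{index: value}` row representation, the use of the sample (n - 1) variance, the scale-only (no centering) choice, and the handling of zero-variance columns are all assumptions made for the example.

```python
from math import sqrt


def column_stats(rows, num_features):
    """Per-column mean and sample std from sparse rows given as {index: value} dicts.

    Requires at least two rows. Implicit zeros contribute nothing to either sum,
    so only stored values are visited.
    """
    n = len(rows)
    sums = [0.0] * num_features
    sq_sums = [0.0] * num_features
    for row in rows:
        for j, v in row.items():
            sums[j] += v
            sq_sums[j] += v * v
    means = [s / n for s in sums]
    # Sample variance: (sum(x^2) - n * mean^2) / (n - 1), clamped at zero.
    stds = [
        sqrt(max(sq - n * m * m, 0.0) / (n - 1))
        for sq, m in zip(sq_sums, means)
    ]
    return means, stds


def scale_sparse(rows, stds):
    """Divide stored values by the column std; zero-variance columns are left as-is."""
    return [
        {j: (v / stds[j] if stds[j] > 0 else v) for j, v in row.items()}
        for row in rows
    ]


rows = [{0: 2.0, 1: 1.0}, {0: 4.0}, {1: -1.0}]
means, stds = column_stats(rows, num_features=2)
scaled = scale_sparse(rows, stds)
```

Scaling without subtracting the mean keeps zero entries at zero, so the rows stay sparse.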
* add dbscan to api docs

Signed-off-by: Erik Ordentlich <[email protected]>

* make consistent with source changes

Signed-off-by: Erik Ordentlich <[email protected]>

---------

Signed-off-by: Erik Ordentlich <[email protected]>
* keep eval related computations and data on gpu

Signed-off-by: Erik Ordentlich <[email protected]>

* extend to regression and rf

Signed-off-by: Erik Ordentlich <[email protected]>

* clean up

Signed-off-by: Erik Ordentlich <[email protected]>

* fix types

Signed-off-by: Erik Ordentlich <[email protected]>

---------

Signed-off-by: Erik Ordentlich <[email protected]>
* Twitter DBSCAN example

Signed-off-by: Hongzhe Cheng <[email protected]>

* Parquet Save

---------

Signed-off-by: Hongzhe Cheng <[email protected]>
…#630)

* Get toy example working

* square ivfflat dists, add join API, add test with/without setting idCol

* test key APIs, and add parametrize

* fix a bug related to the returned id

* move dictionary typeconverter to a class

* revised per comments that can be easily addressed

* remove brute option from approximatenearestneighbors

* add example and docstring to Estimator class and tested the examples in pyspark shell

* add docstring to kneighbors and approxSimilarityJoin

* reuse code: exact knn working

* get ann working, runslow tested

* test getter setter

* support metric argument and all values except cosine and correlation

* fix mypy error

---------

Signed-off-by: Jinfeng <[email protected]>
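
For context on the IVF-Flat idea behind this change, here is a small CPU-only sketch. It is not the spark-rapids-ml or cuML implementation: `build_ivf` and `search_ivf` are hypothetical helpers, the centroids are passed in rather than trained with k-means, and plain Euclidean distance is assumed. The `nprobe` parameter controls how many inverted lists are scanned per query, trading recall for speed.

```python
from math import dist


def build_ivf(items, centroids):
    """Assign each item index to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for idx, vec in enumerate(items):
        nearest = min(range(len(centroids)), key=lambda c: dist(vec, centroids[c]))
        lists[nearest].append(idx)
    return lists


def search_ivf(query, items, centroids, lists, k, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: dist(query, centroids[c]))[:nprobe]
    candidates = [idx for c in probe for idx in lists[c]]
    # Exact ("flat") distance computation inside the probed lists.
    ranked = sorted(candidates, key=lambda idx: dist(query, items[idx]))
    return [(idx, dist(query, items[idx])) for idx in ranked[:k]]


items = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centroids = [(0.5, 0.5), (10.5, 10.5)]  # assumed given; normally learned by k-means
lists = build_ivf(items, centroids)
result = search_ivf((10.9, 10), items, centroids, lists, k=2, nprobe=1)
```

With `nprobe=1` only the list around (10.5, 10.5) is scanned, so the search touches three of the six items. Results are approximate in general because a true neighbor can sit in a list that was not probed.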
* minor doc updates

Signed-off-by: Erik Ordentlich <[email protected]>

* fix bullets in doc string

Signed-off-by: Erik Ordentlich <[email protected]>

---------

Signed-off-by: Erik Ordentlich <[email protected]>
…ns instead of relying on isSet(idCol) (#642)

* fix ensureIdCol to avoid using isSet(idCol)

* simplify the logic of ensureIdCol

* try set idCol to None

---------

Signed-off-by: Jinfeng <[email protected]>
#646)

* remove unsupported save,load,read,write from api docs for knn estimator, model classes

Signed-off-by: Erik Ordentlich <[email protected]>

* fix class names in error messages

Signed-off-by: Erik Ordentlich <[email protected]>

* typo

Signed-off-by: Erik Ordentlich <[email protected]>

---------

Signed-off-by: Erik Ordentlich <[email protected]>
@YanxuanLiu YanxuanLiu requested a review from eordentlich May 10, 2024 01:42
@YanxuanLiu YanxuanLiu self-assigned this May 10, 2024
@YanxuanLiu
Collaborator Author

build

Collaborator

@eordentlich eordentlich left a comment

👍

@YanxuanLiu YanxuanLiu merged commit df01b39 into main May 10, 2024
3 checks passed
8 participants