Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Main 2406 release #397

Closed
wants to merge 16 commits into from
Closed

Main 2406 release #397

wants to merge 16 commits into from

Conversation

nvliyuan
Copy link
Collaborator

@nvliyuan nvliyuan commented Jun 25, 2024

update main branch for v2406 release.
Please create merge commit not squash.

NvTimLiu and others added 11 commits March 22, 2024 20:25
[auto-merge] branch-24.04 to branch-24.06 [skip ci] [bot]
[auto-merge] branch-24.04 to branch-24.06 [skip ci] [bot]
[auto-merge] branch-24.04 to branch-24.06 [skip ci] [bot]
After installing latest Jupyter notebooke, the json format check is strick, due to passing a string value "true"

for the "use_gpu":"true", "use_gpu": True is required.

```
params = {
    "tree_method": "gpu_hist",
    "grow_policy": "depthwise",
    "num_workers": 1,
    "use_gpu": "true",
}
```
TypeError: Invalid param value given for param "use_gpu".

Boolean Param requires value of type bool. Found <class 'str'>.

More over, "use_gpu": True is deprecated in dmlc/XGBoost, change to user "device": "cuda" instead

Signed-off-by: Tim Liu <[email protected]>
* Update Databricks Tools Notebooks for 24.04.0

Signed-off-by: Partho Sarthi <[email protected]>

* Add limitation for single event logs

* Revert "Add limitation for single event logs"

* Update Databricks Tools Notebooks for 24.06.0

Signed-off-by: Partho Sarthi <[email protected]>

---------

Signed-off-by: Partho Sarthi <[email protected]>
# Conflicts:
#	examples/ML+DL-Examples/Spark-cuML/pca/README.md
#	examples/ML+DL-Examples/Spark-cuML/pca/pom.xml
#	examples/ML+DL-Examples/Spark-cuML/pca/spark-submit.sh
#	examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml
#	examples/UDF-Examples/Spark-cuSpatial/gpu-run.sh
#	examples/UDF-Examples/Spark-cuSpatial/notebooks/cuspatial_sample_standalone.ipynb
#	examples/UDF-Examples/Spark-cuSpatial/pom.xml
#	examples/XGBoost-Examples/mortgage/notebooks/python/cv-mortgage-gpu.ipynb
#	examples/XGBoost-Examples/mortgage/notebooks/scala/mortgage-ETL.ipynb
@nvliyuan nvliyuan requested a review from NvTimLiu June 25, 2024 06:44
@jlowe
Copy link
Member

jlowe commented Jun 25, 2024

This is missing #388.

@nvliyuan
Copy link
Collaborator Author

This is missing #388.

Hi @jlowe , do you mean this commit id?

NvTimLiu
NvTimLiu previously approved these changes Jun 26, 2024
@jlowe
Copy link
Member

jlowe commented Jun 26, 2024

Yes, I mean that commit. It's weird in that I can see the commit in the commits to be merged, yet when I diff this PR against branch-24.06 there are some missing changes that I thought were coming from #388. For example:

$ git fetch https://github.com/nvliyuan/spark-rapids-examples.git main-2406-release
$ git diff branch-24.06 FETCH_HEAD
diff --git a/docs/get-started/xgboost-examples/csp/databricks/databricks.md b/docs/get-started/xgboost-examples/csp/databricks/databricks.md
index 1377a15..2d03013 100644
--- a/docs/get-started/xgboost-examples/csp/databricks/databricks.md
+++ b/docs/get-started/xgboost-examples/csp/databricks/databricks.md
@@ -21,8 +21,7 @@ Navigate to your home directory in the UI and select **Create** > **File** from
 create an `init.sh` scripts with contents:   
    ```bash
    #!/bin/bash
-   sudo wget -O /databricks/jars/rapids-4-spark_2.12-24.06.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.06.0/rapids-4-spark_2.12-24.06.0.jar
-   ```
+   sudo wget -O /databricks/jars/rapids-4-spark_2.12-24.04.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.04.0/rapids-4-spark_2.12-24.04.0.jar
 1. Select the Databricks Runtime Version from one of the supported runtimes specified in the
    Prerequisites section.
 2. Choose the number of workers that matches the number of GPUs you want to use.
[...]

We can see from the diff above that we're rolling back changes in branch-24.06 as part of this merge of that branch to main. Not quite sure why these changes were lost despite #388 being part of the commit list in this PR (bad merge conflict resolution maybe?), but it needs to be fixed. You can do a diff like I show above to verify all of the changes between this PR and branch-24.06 make sense.

Signed-off-by: liyuan <[email protected]>
Signed-off-by: liyuan <[email protected]>
Signed-off-by: liyuan <[email protected]>
Signed-off-by: liyuan <[email protected]>
@nvliyuan
Copy link
Collaborator Author

nvliyuan commented Jun 27, 2024

Hi @jlowe, thanks for pointing it put, I just checked the diff details, basically there are three part:
part1: I miss some commits while doing the merge conflict, for example:

diff --git a/docs/get-started/xgboost-examples/csp/databricks/databricks.md b/docs/get-started/xgboost-examples/csp/databricks/databricks.md
index 1377a15..2d03013 100644
--- a/docs/get-started/xgboost-examples/csp/databricks/databricks.md
+++ b/docs/get-started/xgboost-examples/csp/databricks/databricks.md
@@ -21,8 +21,7 @@ Navigate to your home directory in the UI and select **Create** > **File** from
 create an `init.sh` scripts with contents:   
    ```bash
    #!/bin/bash
-   sudo wget -O /databricks/jars/rapids-4-spark_2.12-24.06.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.06.0/rapids-4-spark_2.12-24.06.0.jar
-   ```
+   sudo wget -O /databricks/jars/rapids-4-spark_2.12-24.04.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.04.0/rapids-4-spark_2.12-24.04.0.jar
 1. Select the Databricks Runtime Version from one of the supported runtimes specified in the

part2: There are something wrong in branch-2406, since PCA demo only compatible with spark-rapids v23.04, cuspecial demo only compatible with spark-rapids v23.02, we miss this during the previous release, for example:

diff --git a/examples/ML+DL-Examples/Spark-cuML/pca/README.md b/examples/ML+DL-Examples/Spark-cuML/pca/README.md
index 1b2b22d..0c4a514 100644
--- a/examples/ML+DL-Examples/Spark-cuML/pca/README.md
+++ b/examples/ML+DL-Examples/Spark-cuML/pca/README.md
@@ -12,7 +12,7 @@ User can also download the release jar from Maven central:
 
 [rapids-4-spark-ml_2.12-22.02.0-cuda11.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-ml_2.12/22.02.0/rapids-4-spark-ml_2.12-22.02.0-cuda11.jar)
 
-[rapids-4-spark_2.12-24.06.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/24.06.0/rapids-4-spark_2.12-24.06.0.jar)
+[rapids-4-spark_2.12-23.04.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/23.04.0/rapids-4-spark_2.12-23.04.0.jar)
 
 Note: This demo could only work with v22.02.0 spark-ml version, and only compatible with spark-rapids versions prior to 24.06.0 . Please do not update the version in release.
 
@@ -49,7 +49,7 @@ It is assumed that a Standalone Spark cluster has been set up, the `SPARK_MASTER
 
     ``` bash
     RAPIDS_ML_JAR=PATH_TO_rapids-4-spark-ml_2.12-22.02.0-cuda11.jar
-    PLUGIN_JAR=PATH_TO_rapids-4-spark_2.12-24.06.0.jar
+    PLUGIN_JAR=PATH_TO_rapids-4-spark_2.12-23.04.0.jar

part3: it is ok to exist some snapshot version diff since we will remove all -snapshot in the main branch, for example:

diff --git a/examples/XGBoost-Examples/taxi/pom.xml b/examples/XGBoost-Examples/taxi/pom.xml
index 7dcfa92..dad8dfa 100644
--- a/examples/XGBoost-Examples/taxi/pom.xml
+++ b/examples/XGBoost-Examples/taxi/pom.xml
@@ -21,7 +21,7 @@
     <parent>
         <artifactId>sample_xgboost_examples</artifactId>
         <groupId>com.nvidia</groupId>
-        <version>0.2.3-SNAPSHOT</version>
+        <version>0.2.3</version>
     </parent>
     <modelVersion>4.0.0</modelVersion>
 
diff --git a/examples/XGBoost-Examples/utility/pom.xml b/examples/XGBoost-Examples/utility/pom.xml
index e84d9e2..15d5a97 100644
--- a/examples/XGBoost-Examples/utility/pom.xml
+++ b/examples/XGBoost-Examples/utility/pom.xml
@@ -21,7 +21,7 @@
     <parent>
         <artifactId>sample_xgboost_examples</artifactId>
         <groupId>com.nvidia</groupId>
-        <version>0.2.3-SNAPSHOT</version>
+        <version>0.2.3</version>
     </parent>
     <modelVersion>4.0.0</modelVersion>

For part1, I already push some commits to fix that
For part2, I already draft pr #398 to fix the version issue in branch-2406
For part3, I assume we could ignore them.
Now if you do a diff to compare main-2406-release and branch-2406, there should be only snapshot version diffs.

@jlowe
Copy link
Member

jlowe commented Jun 27, 2024

Yes, snapshot diffs are expected. The other two need to be fixed.

Nit: Given there were a number of merge mistakes on this PR, it would be cleaner to wait for #398 to be fixed and then re-merge branch-24.06 to main in a fresh PR. We're not squashing commits for this, and all these "fix" commits will appear in the mainline release. The mainline history ideally should have a minimal number of divergent commits from the dev branches. Not must-fix for me, just would be nice.

@nvliyuan
Copy link
Collaborator Author

close this pr and draft new pr #401

@nvliyuan nvliyuan closed this Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants