Commit 41f25c7

merge dev-2210 branch to Main branch (#237)

* Init 22.10.0-SNAPSHOT (#214)

Signed-off-by: Peixin Li <[email protected]>

* update version, fix some document errors, and add more comments for running xgboost notebooks on GCP (#215) (#222)

Signed-off-by: liyuan <[email protected]>

* update version, fix some document errors, and add more comments for running xgboost notebooks on GCP (#215) (#224)

Signed-off-by: liyuan <[email protected]>

* Update default cmake to 3.23.X in udf example dockerfile (#227)

Signed-off-by: Peixin Li <[email protected]>

* [xgboost] Remove default parameters (#226)

* remove the default parameters for xgboost examples

Signed-off-by: Bobby Wang <[email protected]>

* remove unused variables for mortgage-ETL

Signed-off-by: Bobby Wang <[email protected]>

* add more details/notes for the mortgage performance tests (#229)

Signed-off-by: liyuan <[email protected]>

* Update examples/XGBoost-Examples/README.md

Co-authored-by: Hao Zhu <[email protected]>

Signed-off-by: liyuan <[email protected]>
Co-authored-by: Hao Zhu <[email protected]>

* Enable automerge from 22.10 to 22.12 (#230)

Signed-off-by: Peixin Li <[email protected]>

* update versions for v22.10 release (#235)

Signed-off-by: liyuan <[email protected]>

Signed-off-by: Peixin Li <[email protected]>
Signed-off-by: liyuan <[email protected]>
Signed-off-by: Bobby Wang <[email protected]>
Co-authored-by: Jenkins Automation <[email protected]>
Co-authored-by: Peixin <[email protected]>
Co-authored-by: Bobby Wang <[email protected]>
Co-authored-by: Hao Zhu <[email protected]>
5 people authored Oct 28, 2022
1 parent 998abfb commit 41f25c7
Showing 31 changed files with 60 additions and 87 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/auto-merge.yml
@@ -18,7 +18,7 @@ name: auto-merge HEAD to BASE
on:
pull_request_target:
branches:
-      - branch-22.08
+      - branch-22.10
types: [closed]

jobs:
@@ -29,13 +29,13 @@ jobs:
steps:
- uses: actions/checkout@v2
with:
-ref: branch-22.08 # force to fetch from latest upstream instead of PR ref
+ref: branch-22.10 # force to fetch from latest upstream instead of PR ref

- name: auto-merge job
uses: ./.github/workflows/auto-merge
env:
OWNER: NVIDIA
REPO_NAME: spark-rapids-examples
-HEAD: branch-22.08
-BASE: branch-22.10
+HEAD: branch-22.10
+BASE: branch-22.12
AUTOMERGE_TOKEN: ${{ secrets.AUTOMERGE_TOKEN }} # use to merge PR
@@ -24,7 +24,7 @@
"source": [
"%sh\n",
"cd ../../dbfs/FileStore/jars/\n",
"sudo wget -O rapids-4-spark_2.12-22.08.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.08.0/rapids-4-spark_2.12-22.08.0.jar\n",
"sudo wget -O rapids-4-spark_2.12-22.10.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar\n",
"sudo wget -O xgboost4j-gpu_2.12-1.6.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-gpu_2.12/1.6.1/xgboost4j-gpu_2.12-1.6.1.jar\n",
"sudo wget -O xgboost4j-spark-gpu_2.12-1.6.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark-gpu_2.12/1.6.1/xgboost4j-spark-gpu_2.12-1.6.1.jar\n",
"ls -ltr\n",
@@ -60,7 +60,7 @@
"sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.5.2.jar\n",
"\n",
"sudo cp /dbfs/FileStore/jars/xgboost4j-gpu_2.12-1.6.1.jar /databricks/jars/\n",
"sudo cp /dbfs/FileStore/jars/rapids-4-spark_2.12-22.08.0.jar /databricks/jars/\n",
"sudo cp /dbfs/FileStore/jars/rapids-4-spark_2.12-22.10.0.jar /databricks/jars/\n",
"sudo cp /dbfs/FileStore/jars/xgboost4j-spark-gpu_2.12-1.6.1.jar /databricks/jars/\"\"\", True)"
]
},
@@ -133,7 +133,7 @@
"1. Edit your cluster, adding an initialization script from `dbfs:/databricks/init_scripts/init.sh` in the \"Advanced Options\" under \"Init Scripts\" tab\n",
"2. Reboot the cluster\n",
"3. Go to \"Libraries\" tab under your cluster and install `dbfs:/FileStore/jars/xgboost4j-spark-gpu_2.12-1.6.1.jar` in your cluster by selecting the \"DBFS\" option for installing jars\n",
"4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.08/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
"4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.10/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
"5. Inside the mortgage example notebook, update the data paths\n",
" `train_data = reader.schema(schema).option('header', True).csv('/data/mortgage/csv/small-train.csv')`\n",
" `trans_data = reader.schema(schema).option('header', True).csv('/data/mortgage/csv/small-trans.csv')`"
@@ -24,7 +24,7 @@
"source": [
"%sh\n",
"cd ../../dbfs/FileStore/jars/\n",
"sudo wget -O rapids-4-spark_2.12-22.08.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.08.0/rapids-4-spark_2.12-22.08.0.jar\n",
"sudo wget -O rapids-4-spark_2.12-22.10.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar\n",
"sudo wget -O xgboost4j-gpu_2.12-1.6.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-gpu_2.12/1.6.1/xgboost4j-gpu_2.12-1.6.1.jar\n",
"sudo wget -O xgboost4j-spark-gpu_2.12-1.6.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark-gpu_2.12/1.6.1/xgboost4j-spark-gpu_2.12-1.6.1.jar\n",
"ls -ltr\n",
@@ -60,7 +60,7 @@
"sudo rm -f /databricks/jars/spark--maven-trees--ml--9.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.4.1.jar\n",
"\n",
"sudo cp /dbfs/FileStore/jars/xgboost4j-gpu_2.12-1.6.1.jar /databricks/jars/\n",
"sudo cp /dbfs/FileStore/jars/rapids-4-spark_2.12-22.08.0.jar /databricks/jars/\n",
"sudo cp /dbfs/FileStore/jars/rapids-4-spark_2.12-22.10.0.jar /databricks/jars/\n",
"sudo cp /dbfs/FileStore/jars/xgboost4j-spark-gpu_2.12-1.6.1.jar /databricks/jars/\"\"\", True)"
]
},
@@ -133,7 +133,7 @@
"1. Edit your cluster, adding an initialization script from `dbfs:/databricks/init_scripts/init.sh` in the \"Advanced Options\" under \"Init Scripts\" tab\n",
"2. Reboot the cluster\n",
"3. Go to \"Libraries\" tab under your cluster and install `dbfs:/FileStore/jars/xgboost4j-spark-gpu_2.12-1.6.1.jar` in your cluster by selecting the \"DBFS\" option for installing jars\n",
"4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.08/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
"4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.10/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
"5. Inside the mortgage example notebook, update the data paths\n",
" `train_data = reader.schema(schema).option('header', True).csv('/data/mortgage/csv/small-train.csv')`\n",
" `trans_data = reader.schema(schema).option('header', True).csv('/data/mortgage/csv/small-trans.csv')`"
@@ -40,7 +40,7 @@ export SPARK_DOCKER_IMAGE=<gpu spark docker image repo and name>
export SPARK_DOCKER_TAG=<spark docker image tag>

pushd ${SPARK_HOME}
-wget https://github.com/NVIDIA/spark-rapids-examples/raw/branch-22.08/dockerfile/Dockerfile
+wget https://github.com/NVIDIA/spark-rapids-examples/raw/branch-22.10/dockerfile/Dockerfile

# Optionally install additional jars into ${SPARK_HOME}/jars/
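# The rest of this snippet is collapsed in the diff view above. A plausible
# continuation, assumed from the exported variables (not shown in this diff),
# would build and push the image, then return to the original directory:
docker build . -t ${SPARK_DOCKER_IMAGE}:${SPARK_DOCKER_TAG}
docker push ${SPARK_DOCKER_IMAGE}:${SPARK_DOCKER_TAG}
popd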

@@ -9,7 +9,7 @@ For simplicity export the location to these jars. All examples assume the packag
* [XGBoost4j-Spark Package](https://repo1.maven.org/maven2/com/nvidia/xgboost4j-spark_3.0/1.4.2-0.3.0/)

2. Download the RAPIDS Accelerator for Apache Spark plugin jar
-* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.08.0/rapids-4-spark_2.12-22.08.0.jar)
+* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar)

### Build XGBoost Python Examples

@@ -26,7 +26,7 @@ You need to copy the dataset to `/opt/xgboost`. Use the following links to downl

``` bash
export SPARK_XGBOOST_DIR=/opt/xgboost
-export RAPIDS_JAR=${SPARK_XGBOOST_DIR}/rapids-4-spark_2.12-22.08.0.jar
+export RAPIDS_JAR=${SPARK_XGBOOST_DIR}/rapids-4-spark_2.12-22.10.0.jar
export XGBOOST4J_JAR=${SPARK_XGBOOST_DIR}/xgboost4j_3.0-1.4.2-0.3.0.jar
export XGBOOST4J_SPARK_JAR=${SPARK_XGBOOST_DIR}/xgboost4j-spark_3.0-1.4.2-0.3.0.jar
export SAMPLE_ZIP=${SPARK_XGBOOST_DIR}/samples.zip
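# Not part of this diff: a sketch of how these variables are typically fed to
# spark-submit for the Python examples; the entry-point file name is illustrative.
${SPARK_HOME}/bin/spark-submit \
  --jars ${RAPIDS_JAR},${XGBOOST4J_JAR},${XGBOOST4J_SPARK_JAR} \
  --py-files ${SAMPLE_ZIP} \
  main.py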
@@ -5,7 +5,7 @@ For simplicity export the location to these jars. All examples assume the packag
### Download the jars

1. Download the RAPIDS Accelerator for Apache Spark plugin jar
-* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.08.0/rapids-4-spark_2.12-22.08.0.jar)
+* [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar)

### Build XGBoost Scala Examples

@@ -22,6 +22,6 @@ You need to copy the dataset to `/opt/xgboost`. Use the following links to downl

``` bash
export SPARK_XGBOOST_DIR=/opt/xgboost
-export RAPIDS_JAR=${SPARK_XGBOOST_DIR}/rapids-4-spark_2.12-22.08.0.jar
+export RAPIDS_JAR=${SPARK_XGBOOST_DIR}/rapids-4-spark_2.12-22.10.0.jar
export SAMPLE_JAR=${SPARK_XGBOOST_DIR}/sample_xgboost_apps-0.2.3-jar-with-dependencies.jar
```
2 changes: 1 addition & 1 deletion examples/ML+DL-Examples/Spark-cuML/pca/Dockerfile
@@ -17,7 +17,7 @@

ARG CUDA_VER=11.5.1
FROM nvidia/cuda:${CUDA_VER}-devel-ubuntu20.04
-ARG BRANCH_VER=22.08
+ARG BRANCH_VER=22.10

RUN apt-get update
RUN apt-get install -y wget ninja-build git
4 changes: 2 additions & 2 deletions examples/ML+DL-Examples/Spark-cuML/pca/README.md
@@ -12,7 +12,7 @@ Users can also download the release jar from Maven central:

[rapids-4-spark-ml_2.12-22.02.0-cuda11.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark-ml_2.12/22.02.0/rapids-4-spark-ml_2.12-22.02.0-cuda11.jar)

-[rapids-4-spark_2.12-22.08.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.08.0/rapids-4-spark_2.12-22.08.0.jar)
+[rapids-4-spark_2.12-22.10.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar)


## Sample code
@@ -48,7 +48,7 @@ It is assumed that a Standalone Spark cluster has been set up, the `SPARK_MASTER

``` bash
RAPIDS_ML_JAR=PATH_TO_rapids-4-spark-ml_2.12-22.02.0-cuda11.jar
-PLUGIN_JAR=PATH_TO_rapids-4-spark_2.12-22.08.0.jar
+PLUGIN_JAR=PATH_TO_rapids-4-spark_2.12-22.10.0.jar
jupyter toree install \
--spark_home=${SPARK_HOME} \
4 changes: 2 additions & 2 deletions examples/ML+DL-Examples/Spark-cuML/pca/pom.xml
@@ -21,7 +21,7 @@
<groupId>com.nvidia</groupId>
<artifactId>PCAExample</artifactId>
<packaging>jar</packaging>
-<version>22.08.0-SNAPSHOT</version>
+<version>22.10.0-SNAPSHOT</version>

<properties>
<maven.compiler.source>8</maven.compiler.source>
@@ -51,7 +51,7 @@
<dependency>
<groupId>com.nvidia</groupId>
<artifactId>rapids-4-spark-ml_2.12</artifactId>
-<version>22.08.0-SNAPSHOT</version>
+<version>22.10.0-SNAPSHOT</version>
</dependency>
</dependencies>

6 changes: 3 additions & 3 deletions examples/ML+DL-Examples/Spark-cuML/pca/spark-submit.sh
@@ -15,8 +15,8 @@
# limitations under the License.
#

-ML_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark-ml_2.12/22.08.0-SNAPSHOT/rapids-4-spark-ml_2.12-22.08.0-SNAPSHOT.jar
-PLUGIN_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark_2.12/22.08.0-SNAPSHOT/rapids-4-spark_2.12-22.08.0-SNAPSHOT.jar
+ML_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark-ml_2.12/22.10.0-SNAPSHOT/rapids-4-spark-ml_2.12-22.10.0-SNAPSHOT.jar
+PLUGIN_JAR=/root/.m2/repository/com/nvidia/rapids-4-spark_2.12/22.10.0-SNAPSHOT/rapids-4-spark_2.12-22.10.0-SNAPSHOT.jar

$SPARK_HOME/bin/spark-submit \
--master spark://127.0.0.1:7077 \
@@ -38,4 +38,4 @@ $SPARK_HOME/bin/spark-submit \
--conf spark.network.timeout=1000s \
--jars $ML_JAR,$PLUGIN_JAR \
--class com.nvidia.spark.examples.pca.Main \
-/workspace/target/PCAExample-22.08.0-SNAPSHOT.jar
+/workspace/target/PCAExample-22.10.0-SNAPSHOT.jar
@@ -22,7 +22,7 @@
"import os\n",
"# Change to your cluster ip:port and directories\n",
"SPARK_MASTER_URL = os.getenv(\"SPARK_MASTER_URL\", \"spark:your-ip:port\")\n",
"RAPIDS_JAR = os.getenv(\"RAPIDS_JAR\", \"/your-path/rapids-4-spark_2.12-22.08.0.jar\")\n"
"RAPIDS_JAR = os.getenv(\"RAPIDS_JAR\", \"/your-path/rapids-4-spark_2.12-22.10.0.jar\")\n"
]
},
{
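Not part of the commit: a minimal sketch of how a notebook typically wires these environment values into a RAPIDS-enabled session. The builder chain and app name are assumptions; `spark.jars` and `spark.plugins` are standard Spark/RAPIDS settings.

```python
import os
from pyspark.sql import SparkSession

SPARK_MASTER_URL = os.getenv("SPARK_MASTER_URL", "spark://your-ip:port")
RAPIDS_JAR = os.getenv("RAPIDS_JAR", "/your-path/rapids-4-spark_2.12-22.10.0.jar")

# Ship the plugin jar to the cluster and enable the RAPIDS SQL plugin.
spark = (
    SparkSession.builder
    .master(SPARK_MASTER_URL)
    .appName("pca-example")  # illustrative name
    .config("spark.jars", RAPIDS_JAR)
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
    .getOrCreate()
)
```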
2 changes: 1 addition & 1 deletion examples/UDF-Examples/RAPIDS-accelerated-UDFs/Dockerfile
@@ -58,7 +58,7 @@ CUDA_VERSION_MINOR=$(echo $CUDA_VERSION | tr -d '.' | cut -c 3); \
# Set JDK8 as the default Java
&& update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java

-ARG CMAKE_VERSION=3.20.5
+ARG CMAKE_VERSION=3.23.3

# Install CMake
RUN cd /tmp \
2 changes: 1 addition & 1 deletion examples/UDF-Examples/RAPIDS-accelerated-UDFs/README.md
@@ -108,7 +108,7 @@ See above Prerequisites section
First finish the steps in "Building with Native Code Examples and run test cases" section, then do the following in the docker.

### Get jars from Maven Central
-[rapids-4-spark_2.12-22.08.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.08.0/rapids-4-spark_2.12-22.08.0.jar)
+[rapids-4-spark_2.12-22.10.0.jar](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar)

### Launch a local mode Spark

4 changes: 2 additions & 2 deletions examples/UDF-Examples/RAPIDS-accelerated-UDFs/pom.xml
@@ -25,7 +25,7 @@
user defined functions for use with the RAPIDS Accelerator
for Apache Spark
</description>
-<version>22.08.0-SNAPSHOT</version>
+<version>22.10.0-SNAPSHOT</version>

<properties>
<maven.compiler.source>1.8</maven.compiler.source>
@@ -37,7 +37,7 @@
<cuda.version>cuda11</cuda.version>
<scala.binary.version>2.12</scala.binary.version>
<!-- Depends on release version, Snapshot version is not published to the Maven Central -->
-<rapids4spark.version>22.08.0</rapids4spark.version>
+<rapids4spark.version>22.10.0</rapids4spark.version>
<spark.version>3.1.1</spark.version>
<scala.version>2.12.15</scala.version>
<udf.native.build.path>${project.build.directory}/cpp-build</udf.native.build.path>
@@ -14,9 +14,9 @@
# limitations under the License.
#=============================================================================

-cmake_minimum_required(VERSION 3.20.1 FATAL_ERROR)
+cmake_minimum_required(VERSION 3.23.1 FATAL_ERROR)

-file(DOWNLOAD https://raw.githubusercontent.com/rapidsai/rapids-cmake/branch-22.08/RAPIDS.cmake
+file(DOWNLOAD https://raw.githubusercontent.com/rapidsai/rapids-cmake/branch-22.10/RAPIDS.cmake
${CMAKE_BINARY_DIR}/RAPIDS.cmake)
include(${CMAKE_BINARY_DIR}/RAPIDS.cmake)

@@ -32,7 +32,7 @@ if(DEFINED GPU_ARCHS)
endif()
rapids_cuda_init_architectures(UDFEXAMPLESJNI)

-project(UDFEXAMPLESJNI VERSION 22.08.0 LANGUAGES C CXX CUDA)
+project(UDFEXAMPLESJNI VERSION 22.10.0 LANGUAGES C CXX CUDA)

option(PER_THREAD_DEFAULT_STREAM "Build with per-thread default stream" OFF)
option(BUILD_UDF_BENCHMARKS "Build the benchmarks" OFF)
@@ -84,10 +84,10 @@ set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -w --expt-extended-lambda --expt-relax
set(CUDA_USE_STATIC_CUDA_RUNTIME OFF)

rapids_cpm_init()
-rapids_cpm_find(cudf 22.08.00
+rapids_cpm_find(cudf 22.10.00
CPM_ARGS
GIT_REPOSITORY https://github.com/rapidsai/cudf.git
-GIT_TAG branch-22.08
+GIT_TAG branch-22.10
GIT_SHALLOW TRUE
SOURCE_SUBDIR cpp
OPTIONS "BUILD_TESTS OFF"
2 changes: 1 addition & 1 deletion examples/UDF-Examples/Spark-cuSpatial/Dockerfile
@@ -39,7 +39,7 @@ RUN conda --version
RUN conda install -c conda-forge openjdk=8 maven=3.8.1 -y

# install cuDF dependency.
-RUN conda install -c rapidsai -c nvidia -c conda-forge -c defaults libcuspatial=22.08 python=3.8 -y
+RUN conda install -c rapidsai -c nvidia -c conda-forge -c defaults libcuspatial=22.10 python=3.8 -y

RUN wget --quiet \
https://github.com/Kitware/CMake/releases/download/v3.21.3/cmake-3.21.3-linux-x86_64.tar.gz \
2 changes: 1 addition & 1 deletion examples/UDF-Examples/Spark-cuSpatial/Dockerfile.awsdb
@@ -48,7 +48,7 @@ RUN wget -q https://repo.continuum.io/miniconda/Miniconda3-py38_4.9.2-Linux-x86_
conda config --system --set always_yes True && \
conda clean --all

-RUN conda install -c rapidsai-nightly -c nvidia -c conda-forge -c defaults libcuspatial=22.08
+RUN conda install -c rapidsai-nightly -c nvidia -c conda-forge -c defaults libcuspatial=22.10
RUN conda install -c conda-forge libgdal==3.3.1
RUN pip install jupyter
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
6 changes: 3 additions & 3 deletions examples/UDF-Examples/Spark-cuSpatial/README.md
@@ -65,9 +65,9 @@ Note: The docker env is just for building the jar, not for running the applicati
4. [cuspatial](https://github.com/rapidsai/cuspatial): install libcuspatial
```Bash
# Install libcuspatial from conda
-conda install -c rapidsai -c nvidia -c conda-forge -c defaults libcuspatial=22.06
+conda install -c rapidsai -c nvidia -c conda-forge -c defaults libcuspatial=22.10
# or below command for the nightly (aka SNAPSHOT) version.
-conda install -c rapidsai-nightly -c nvidia -c conda-forge -c defaults libcuspatial=22.08
+conda install -c rapidsai-nightly -c nvidia -c conda-forge -c defaults libcuspatial=22.10
```
5. Build the JAR using `mvn package`.
```Bash
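# The body of this block is collapsed in the diff; per step 5 above it is
# presumably just the Maven build (assumed):
mvn package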
@@ -86,7 +86,7 @@ Note: The docker env is just for building the jar, not for running the applicati
2. Set up [a standalone cluster](/docs/get-started/xgboost-examples/on-prem-cluster/standalone-scala.md) of Spark. Make sure the conda/lib is included in LD_LIBRARY_PATH, so that spark executors can load libcuspatial.so.
3. Download Spark RAPIDS JAR
-* [Spark RAPIDS JAR v22.08.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.08.0/rapids-4-spark_2.12-22.08.0.jar) or above
+* [Spark RAPIDS JAR v22.10.0](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar) or above
4. Prepare sample dataset and JARs. Copy the [sample dataset](../../../datasets/cuspatial_data.tar.gz) to `/data/cuspatial_data/`.
Copy Spark RAPIDS JAR and `spark-cuspatial-<version>.jar` to `/data/cuspatial_data/jars/`.
If you build the `spark-cuspatial-<version>.jar` in docker, please copy the jar from docker to local:
2 changes: 1 addition & 1 deletion examples/UDF-Examples/Spark-cuSpatial/gpu-run.sh
@@ -31,7 +31,7 @@ rm -rf $DATA_OUT_PATH
# the path to keep the jars of spark-rapids & spark-cuspatial
JARS=$ROOT_PATH/jars

-JARS_PATH=${JARS_PATH:-$JARS/rapids-4-spark_2.12-22.08.0.jar,$JARS/spark-cuspatial-22.08.0-SNAPSHOT.jar}
+JARS_PATH=${JARS_PATH:-$JARS/rapids-4-spark_2.12-22.10.0.jar,$JARS/spark-cuspatial-22.10.0-SNAPSHOT.jar}

$SPARK_HOME/bin/spark-submit --master spark://$HOSTNAME:7077 \
--name "Gpu Spatial Join UDF" \
@@ -9,7 +9,7 @@
"source": [
"from pyspark.sql import SparkSession\n",
"import os\n",
"jarsPath = os.getenv(\"JARS_PATH\", \"/data/cuspatial_data/jars/rapids-4-spark_2.12-22.08.0.jar,/data/cuspatial_data/jars/spark-cuspatial-22.08.0-SNAPSHOT.jar\")\n",
"jarsPath = os.getenv(\"JARS_PATH\", \"/data/cuspatial_data/jars/rapids-4-spark_2.12-22.10.0.jar,/data/cuspatial_data/jars/spark-cuspatial-22.10.0-SNAPSHOT.jar\")\n",
"spark = SparkSession.builder \\\n",
" .config(\"spark.jars\", jarsPath) \\\n",
" .config(\"spark.sql.adaptive.enabled\", \"false\") \\\n",
4 changes: 2 additions & 2 deletions examples/UDF-Examples/Spark-cuSpatial/pom.xml
@@ -24,13 +24,13 @@
<name>UDF of the cuSpatial case for the RAPIDS Accelerator</name>
<description>The RAPIDS accelerated user defined function of the cuSpatial case
for use with the RAPIDS Accelerator for Apache Spark</description>
-<version>22.08.0-SNAPSHOT</version>
+<version>22.10.0-SNAPSHOT</version>

<properties>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<java.major.version>8</java.major.version>
-<rapids.version>22.08.0</rapids.version>
+<rapids.version>22.10.0</rapids.version>
<scala.binary.version>2.12</scala.binary.version>
<spark.version>3.2.0</spark.version>
<udf.native.build.path>${project.build.directory}/cpp-build</udf.native.build.path>
@@ -16,7 +16,7 @@

cmake_minimum_required(VERSION 3.20.1 FATAL_ERROR)

-project(SPATIALUDJNI VERSION 22.08.0 LANGUAGES C CXX CUDA)
+project(SPATIALUDJNI VERSION 22.10.0 LANGUAGES C CXX CUDA)

###################################################################################################
# - build type ------------------------------------------------------------------------------------
4 changes: 4 additions & 0 deletions examples/XGBoost-Examples/README.md
@@ -12,6 +12,10 @@ In the public cloud, better performance can lead to significantly lower costs as

![mortgage-speedup](/docs/img/guides/mortgage-perf.png)

+Note that the test result is based on 21 years of [Fannie Mae Single-Family Loan Performance Data](https://capitalmarkets.fanniemae.com/credit-risk-transfer/single-family-credit-risk-transfer/fannie-mae-single-family-loan-performance-data)
+run on a cluster with 4 A100 GPUs and 512 CPU vcores; performance is affected by many factors,
+including data size and GPU type.

In this folder, there are three blueprints for users to learn about using
Spark XGBoost and RAPIDS Accelerator on GPUs:

@@ -63,9 +63,6 @@ object Main {
val xgbClassificationModel = if (xgboostArgs.isToTrain) {
// build XGBoost classifier
val paramMap = xgboostArgs.xgboostParams(Map(
"eta" -> 0.1,
"missing" -> 0.0,
"max_depth" -> 2,
"objective" -> "binary:logistic",
"eval_sets" -> datasets(1).map(ds => Map("eval" -> ds)).getOrElse(Map.empty)
))
