
Commit 8599ece
Merge pull request #263 from nvliyuan/main-2212-release
merge branch-22.12 to main branch
nvliyuan authored Dec 20, 2022
2 parents c1af0cd + 239389e commit 8599ece
Showing 65 changed files with 2,239 additions and 1,038 deletions.
10 changes: 5 additions & 5 deletions .github/workflows/auto-merge.yml
@@ -18,7 +18,7 @@ name: auto-merge HEAD to BASE
 on:
   pull_request_target:
     branches:
-      - branch-22.10
+      - branch-22.12
     types: [closed]

 jobs:
@@ -27,15 +27,15 @@ jobs:
     runs-on: ubuntu-latest

     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v3
         with:
-          ref: branch-22.10 # force to fetch from latest upstream instead of PR ref
+          ref: branch-22.12 # force to fetch from latest upstream instead of PR ref

       - name: auto-merge job
         uses: ./.github/workflows/auto-merge
         env:
           OWNER: NVIDIA
           REPO_NAME: spark-rapids-examples
-          HEAD: branch-22.10
-          BASE: branch-22.12
+          HEAD: branch-22.12
+          BASE: branch-23.02
           AUTOMERGE_TOKEN: ${{ secrets.AUTOMERGE_TOKEN }} # use to merge PR
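
After this change the workflow watches branch-22.12 (instead of branch-22.10) and forwards its merged PRs to branch-23.02. A rough local equivalent of what the job performs — a sketch only, assuming a clone of NVIDIA/spark-rapids-examples with the remote named `origin`:

``` bash
# Reproduce the workflow's merge locally: HEAD (branch-22.12) into BASE (branch-23.02).
git fetch origin branch-22.12 branch-23.02
git checkout -B automerge-test origin/branch-23.02
git merge --no-edit origin/branch-22.12
```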
1 change: 1 addition & 0 deletions README.md
@@ -10,6 +10,7 @@ There are broadly four categories of examples in this repo:
 2. [Spark XGBoost](./examples/XGBoost-Examples)
 3. [Deep Learning/Machine Learning](./examples/ML+DL-Examples)
 4. [RAPIDS UDF](./examples/UDF-Examples)
+5. [Databricks Tools demo notebooks](./tools/databricks)

 For more information on each of the examples please look into respective categories.

4 changes: 2 additions & 2 deletions docs/get-started/xgboost-examples/csp/aws/ec2.md
@@ -177,8 +177,8 @@ spark-submit --master spark://$HOSTNAME:7077 \
 ${SAMPLE_JAR} \
 -num_workers=${NUM_EXECUTORS} \
 -format=csv \
--dataPath="train::s3a://spark-xgboost-mortgage-dataset/csv/train/2000Q1" \
--dataPath="trans::s3a://spark-xgboost-mortgage-dataset/csv/eval/2000Q1" \
+-dataPath="train::your-train-data-path" \
+-dataPath="trans::your-eval-data-path" \
 -numRound=100 -max_depth=8 -nthread=$NUM_EXECUTOR_CORES -showFeatures=0 \
 -tree_method=gpu_hist
```
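The new placeholders stand in for the S3 paths that were previously hard-coded. One hypothetical way to fill them in — the bucket name below is illustrative only:

``` bash
# Illustrative values only; point these at data you actually own.
export TRAIN_PATH="s3a://my-bucket/mortgage/csv/train/2000Q1"
export EVAL_PATH="s3a://my-bucket/mortgage/csv/eval/2000Q1"
# Then pass: -dataPath="train::${TRAIN_PATH}" -dataPath="trans::${EVAL_PATH}"
```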
@@ -24,9 +24,9 @@
"source": [
"%sh\n",
"cd ../../dbfs/FileStore/jars/\n",
"sudo wget -O rapids-4-spark_2.12-22.10.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar\n",
"sudo wget -O xgboost4j-gpu_2.12-1.6.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-gpu_2.12/1.6.1/xgboost4j-gpu_2.12-1.6.1.jar\n",
"sudo wget -O xgboost4j-spark-gpu_2.12-1.6.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark-gpu_2.12/1.6.1/xgboost4j-spark-gpu_2.12-1.6.1.jar\n",
"sudo wget -O rapids-4-spark_2.12-22.12.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.12.0/rapids-4-spark_2.12-22.12.0.jar\n",
"sudo wget -O xgboost4j-gpu_2.12-1.7.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-gpu_2.12/1.7.1/xgboost4j-gpu_2.12-1.7.1.jar\n",
"sudo wget -O xgboost4j-spark-gpu_2.12-1.7.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark-gpu_2.12/1.7.1/xgboost4j-spark-gpu_2.12-1.7.1.jar\n",
"ls -ltr\n",
"\n",
"# Your Jars are downloaded in dbfs:/FileStore/jars directory"
@@ -59,9 +59,9 @@
"sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-gpu_2.12--ml.dmlc__xgboost4j-gpu_2.12__1.5.2.jar\n",
"sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.5.2.jar\n",
"\n",
"sudo cp /dbfs/FileStore/jars/xgboost4j-gpu_2.12-1.6.1.jar /databricks/jars/\n",
"sudo cp /dbfs/FileStore/jars/rapids-4-spark_2.12-22.10.0.jar /databricks/jars/\n",
"sudo cp /dbfs/FileStore/jars/xgboost4j-spark-gpu_2.12-1.6.1.jar /databricks/jars/\"\"\", True)"
"sudo cp /dbfs/FileStore/jars/xgboost4j-gpu_2.12-1.7.1.jar /databricks/jars/\n",
"sudo cp /dbfs/FileStore/jars/rapids-4-spark_2.12-22.12.0.jar /databricks/jars/\n",
"sudo cp /dbfs/FileStore/jars/xgboost4j-spark-gpu_2.12-1.7.1.jar /databricks/jars/\"\"\", True)"
]
},
{
@@ -132,8 +132,8 @@
"\n",
"1. Edit your cluster, adding an initialization script from `dbfs:/databricks/init_scripts/init.sh` in the \"Advanced Options\" under \"Init Scripts\" tab\n",
"2. Reboot the cluster\n",
"3. Go to \"Libraries\" tab under your cluster and install `dbfs:/FileStore/jars/xgboost4j-spark-gpu_2.12-1.6.1.jar` in your cluster by selecting the \"DBFS\" option for installing jars\n",
"4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.10/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
"3. Go to \"Libraries\" tab under your cluster and install `dbfs:/FileStore/jars/xgboost4j-spark-gpu_2.12-1.7.1.jar` in your cluster by selecting the \"DBFS\" option for installing jars\n",
"4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.12/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
"5. Inside the mortgage example notebook, update the data paths\n",
" `train_data = reader.schema(schema).option('header', True).csv('/data/mortgage/csv/small-train.csv')`\n",
" `trans_data = reader.schema(schema).option('header', True).csv('/data/mortgage/csv/small-trans.csv')`"
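Taken together, the notebook cells above generate an init script that swaps the Databricks runtime's bundled XGBoost jars for the downloaded ones. Assembled into a single file, it amounts to roughly this sketch (paths and versions as in the diff):

``` bash
#!/bin/bash
# Remove the runtime's bundled 1.5.2 XGBoost jars...
sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-gpu_2.12--ml.dmlc__xgboost4j-gpu_2.12__1.5.2.jar
sudo rm -f /databricks/jars/spark--maven-trees--ml--10.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.5.2.jar
# ...and install the 22.12 / 1.7.1 jars staged in DBFS.
sudo cp /dbfs/FileStore/jars/rapids-4-spark_2.12-22.12.0.jar /databricks/jars/
sudo cp /dbfs/FileStore/jars/xgboost4j-gpu_2.12-1.7.1.jar /databricks/jars/
sudo cp /dbfs/FileStore/jars/xgboost4j-spark-gpu_2.12-1.7.1.jar /databricks/jars/
```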
@@ -24,9 +24,9 @@
"source": [
"%sh\n",
"cd ../../dbfs/FileStore/jars/\n",
"sudo wget -O rapids-4-spark_2.12-22.10.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar\n",
"sudo wget -O xgboost4j-gpu_2.12-1.6.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-gpu_2.12/1.6.1/xgboost4j-gpu_2.12-1.6.1.jar\n",
"sudo wget -O xgboost4j-spark-gpu_2.12-1.6.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark-gpu_2.12/1.6.1/xgboost4j-spark-gpu_2.12-1.6.1.jar\n",
"sudo wget -O rapids-4-spark_2.12-22.12.0.jar https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.12.0/rapids-4-spark_2.12-22.12.0.jar\n",
"sudo wget -O xgboost4j-gpu_2.12-1.7.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-gpu_2.12/1.7.1/xgboost4j-gpu_2.12-1.7.1.jar\n",
"sudo wget -O xgboost4j-spark-gpu_2.12-1.7.1.jar https://repo1.maven.org/maven2/ml/dmlc/xgboost4j-spark-gpu_2.12/1.7.1/xgboost4j-spark-gpu_2.12-1.7.1.jar\n",
"ls -ltr\n",
"\n",
"# Your Jars are downloaded in dbfs:/FileStore/jars directory"
@@ -59,9 +59,9 @@
"sudo rm -f /databricks/jars/spark--maven-trees--ml--9.x--xgboost-gpu--ml.dmlc--xgboost4j-gpu_2.12--ml.dmlc__xgboost4j-gpu_2.12__1.4.1.jar\n",
"sudo rm -f /databricks/jars/spark--maven-trees--ml--9.x--xgboost-gpu--ml.dmlc--xgboost4j-spark-gpu_2.12--ml.dmlc__xgboost4j-spark-gpu_2.12__1.4.1.jar\n",
"\n",
"sudo cp /dbfs/FileStore/jars/xgboost4j-gpu_2.12-1.6.1.jar /databricks/jars/\n",
"sudo cp /dbfs/FileStore/jars/rapids-4-spark_2.12-22.10.0.jar /databricks/jars/\n",
"sudo cp /dbfs/FileStore/jars/xgboost4j-spark-gpu_2.12-1.6.1.jar /databricks/jars/\"\"\", True)"
"sudo cp /dbfs/FileStore/jars/xgboost4j-gpu_2.12-1.7.1.jar /databricks/jars/\n",
"sudo cp /dbfs/FileStore/jars/rapids-4-spark_2.12-22.12.0.jar /databricks/jars/\n",
"sudo cp /dbfs/FileStore/jars/xgboost4j-spark-gpu_2.12-1.7.1.jar /databricks/jars/\"\"\", True)"
]
},
{
@@ -132,8 +132,8 @@
"\n",
"1. Edit your cluster, adding an initialization script from `dbfs:/databricks/init_scripts/init.sh` in the \"Advanced Options\" under \"Init Scripts\" tab\n",
"2. Reboot the cluster\n",
"3. Go to \"Libraries\" tab under your cluster and install `dbfs:/FileStore/jars/xgboost4j-spark-gpu_2.12-1.6.1.jar` in your cluster by selecting the \"DBFS\" option for installing jars\n",
"4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.10/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
"3. Go to \"Libraries\" tab under your cluster and install `dbfs:/FileStore/jars/xgboost4j-spark-gpu_2.12-1.7.1.jar` in your cluster by selecting the \"DBFS\" option for installing jars\n",
"4. Import the mortgage example notebook from `https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.12/examples/XGBoost-Examples/mortgage/notebooks/python/mortgage-gpu.ipynb`\n",
"5. Inside the mortgage example notebook, update the data paths\n",
" `train_data = reader.schema(schema).option('header', True).csv('/data/mortgage/csv/small-train.csv')`\n",
" `trans_data = reader.schema(schema).option('header', True).csv('/data/mortgage/csv/small-trans.csv')`"
4 changes: 0 additions & 4 deletions docs/get-started/xgboost-examples/notebook/python-notebook.md
@@ -67,7 +67,3 @@ and the home directory for Apache Spark respectively.
 - Mortgage ETL Notebook: [Python](../../../../examples/XGBoost-Examples/mortgage/notebooks/python/MortgageETL.ipynb)
 - Taxi ETL Notebook: [Python](../../../../examples/XGBoost-Examples/taxi/notebooks/python/taxi-ETL.ipynb)
 - Note: Agaricus does not have ETL part.
-
-For PySpark based XGBoost, please refer to the
-[Spark-RAPIDS-examples 22.04 branch](https://github.com/NVIDIA/spark-rapids-examples/blob/branch-22.04/docs/get-started/xgboost-examples/notebook/python-notebook.md)
-that uses [NVIDIA’s Spark XGBoost version](https://repo1.maven.org/maven2/com/nvidia/xgboost4j-spark_3.0/).
@@ -40,7 +40,7 @@ export SPARK_DOCKER_IMAGE=<gpu spark docker image repo and name>
 export SPARK_DOCKER_TAG=<spark docker image tag>

 pushd ${SPARK_HOME}
-wget https://github.com/NVIDIA/spark-rapids-examples/raw/branch-22.10/dockerfile/Dockerfile
+wget https://github.com/NVIDIA/spark-rapids-examples/raw/branch-22.12/dockerfile/Dockerfile

# Optionally install additional jars into ${SPARK_HOME}/jars/

@@ -12,11 +12,13 @@ Prerequisites
 * Multi-node clusters with homogenous GPU configuration
 * Software Requirements
   * Ubuntu 18.04, 20.04/CentOS7, CentOS8
-  * CUDA 11.0+
+  * CUDA 11.5+
   * NVIDIA driver compatible with your CUDA
   * NCCL 2.7.8+
-  * Python 3.6+
+  * Python 3.8 or 3.9
   * NumPy
+  * XGBoost 1.7.0+
+  * cudf-cu11

The number of GPUs in each host dictates the number of Spark executors that can run there.
Additionally, cores per Spark executor and cores per Spark task must match, such that each executor can run 1 task at any given time.
@@ -47,6 +49,14 @@ And here are the steps to enable the GPU resources discovery for Spark 3.1+.
spark.worker.resource.gpu.amount 1
spark.worker.resource.gpu.discoveryScript ${SPARK_HOME}/examples/src/main/scripts/getGpusResources.sh
```
+3. Install the XGBoost, cudf-cu11, numpy libraries on all nodes before running XGBoost application.
+
+   ``` bash
+   pip install xgboost
+   pip install cudf-cu11 --extra-index-url=https://pypi.ngc.nvidia.com
+   pip install numpy
+   pip install scikit-learn
+   ```
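A quick, optional sanity check — not part of the upstream docs — to confirm the stack resolved correctly on a node:

``` bash
# All four imports must succeed before submitting the Spark job.
python -c "import xgboost, cudf, numpy, sklearn; print(xgboost.__version__, cudf.__version__)"
```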

Get Application Files, Jar and Dataset
-------------------------------
@@ -182,6 +192,10 @@ export EXAMPLE_CLASS=com.nvidia.spark.examples.mortgage.gpu_main
 # tree construction algorithm
 export TREE_METHOD=gpu_hist
+
+# if you enable archive python environment
+export PYSPARK_DRIVER_PYTHON=python
+export PYSPARK_PYTHON=./environment/bin/python
```

Run spark-submit:
@@ -197,8 +211,9 @@ ${SPARK_HOME}/bin/spark-submit
 --driver-memory ${SPARK_DRIVER_MEMORY} \
 --executor-memory ${SPARK_EXECUTOR_MEMORY} \
 --conf spark.cores.max=${TOTAL_CORES} \
---jars ${RAPIDS_JAR},${XGBOOST4J_JAR},${XGBOOST4J_SPARK_JAR} \
---py-files ${XGBOOST4J_SPARK_JAR},${SAMPLE_ZIP} \
+--archives your_pyspark_venv.tar.gz#environment #if you enabled archive python environment \
+--jars ${RAPIDS_JAR} \
+--py-files ${SAMPLE_ZIP} \
${MAIN_PY} \
--mainClass=${EXAMPLE_CLASS} \
--dataPath=train::${SPARK_XGBOOST_DIR}/mortgage/output/train/ \
@@ -261,6 +276,10 @@ export EXAMPLE_CLASS=com.nvidia.spark.examples.mortgage.cpu_main
 # tree construction algorithm
 export TREE_METHOD=hist
+
+# if you enable archive python environment
+export PYSPARK_DRIVER_PYTHON=python
+export PYSPARK_PYTHON=./environment/bin/python
```

This is the same command as for the GPU example, repeated for convenience:
@@ -271,8 +290,9 @@ ${SPARK_HOME}/bin/spark-submit
 --driver-memory ${SPARK_DRIVER_MEMORY} \
 --executor-memory ${SPARK_EXECUTOR_MEMORY} \
 --conf spark.cores.max=${TOTAL_CORES} \
---jars ${XGBOOST4J_JAR},${XGBOOST4J_SPARK_JAR} \
---py-files ${XGBOOST4J_SPARK_JAR},${SAMPLE_ZIP} \
+--archives your_pyspark_venv.tar.gz#environment #if you enabled archive python environment \
+--jars ${RAPIDS_JAR} \
+--py-files ${SAMPLE_ZIP} \
${SPARK_PYTHON_ENTRYPOINT} \
--mainClass=${EXAMPLE_CLASS} \
--dataPath=train::${DATA_PATH}/mortgage/output/train/ \
52 changes: 45 additions & 7 deletions docs/get-started/xgboost-examples/on-prem-cluster/yarn-python.md
@@ -12,12 +12,14 @@ Prerequisites
 * Multi-node clusters with homogenous GPU configuration
 * Software Requirements
   * Ubuntu 18.04, 20.04/CentOS7, CentOS8
-  * CUDA 11.0+
+  * CUDA 11.5+
   * NVIDIA driver compatible with your CUDA
   * NCCL 2.7.8+
-  * Python 3.6+
+  * Python 3.8 or 3.9
   * NumPy
+  * XGBoost 1.7.0+
+  * cudf-cu11

The number of GPUs per NodeManager dictates the number of Spark executors that can run in that NodeManager.
Additionally, cores per Spark executor and cores per Spark task must match, such that each executor can run 1 task at any given time.

@@ -32,6 +34,32 @@ We use `SPARK_HOME` environment variable to point to the Apache Spark cluster.
And as to how to enable GPU scheduling and isolation for Yarn,
please refer to [here](https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html).

+Please make sure to install the XGBoost, cudf-cu11, numpy libraries on all nodes before running XGBoost application.
+``` bash
+pip install xgboost
+pip install cudf-cu11 --extra-index-url=https://pypi.ngc.nvidia.com
+pip install numpy
+pip install scikit-learn
+```
+You can also create an isolated python environment by using [Virtualenv](https://virtualenv.pypa.io/en/latest/),
+and then directly pass/unpack the archive file and enable the environment on executors
+by leveraging the --archives option or spark.archives configuration.
+``` bash
+# create an isolated python environment and install libraries
+python -m venv pyspark_venv
+source pyspark_venv/bin/activate
+pip install xgboost
+pip install cudf-cu11 --extra-index-url=https://pypi.ngc.nvidia.com
+pip install numpy
+pip install scikit-learn
+venv-pack -o pyspark_venv.tar.gz
+
+# enable archive python environment on executors
+export PYSPARK_DRIVER_PYTHON=python # Do not set in cluster modes.
+export PYSPARK_PYTHON=./environment/bin/python
+spark-submit --archives pyspark_venv.tar.gz#environment app.py
+```
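Note that the snippet assumes `venv-pack` is available inside the environment; it ships as a separate package, so if the `venv-pack -o` step fails, install it first:

``` bash
# venv-pack is its own PyPI package, not part of the stdlib venv module.
pip install venv-pack
```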

Get Application Files, Jar and Dataset
-------------------------------

@@ -114,6 +142,10 @@ export EXAMPLE_CLASS=com.nvidia.spark.examples.mortgage.gpu_main

 # tree construction algorithm
 export TREE_METHOD=gpu_hist
+
+# if you enable archive python environment
+export PYSPARK_DRIVER_PYTHON=python
+export PYSPARK_PYTHON=./environment/bin/python
```

Run spark-submit:
@@ -129,11 +161,12 @@ ${SPARK_HOME}/bin/spark-submit
 --files ${SPARK_HOME}/examples/src/main/scripts/getGpusResources.sh \
 --master yarn \
 --deploy-mode ${SPARK_DEPLOY_MODE} \
+--archives your_pyspark_venv.tar.gz#environment #if you enabled archive python environment \
 --num-executors ${SPARK_NUM_EXECUTORS} \
 --driver-memory ${SPARK_DRIVER_MEMORY} \
 --executor-memory ${SPARK_EXECUTOR_MEMORY} \
---jars ${RAPIDS_JAR},${XGBOOST4J_JAR} \
---py-files ${XGBOOST4J_SPARK_JAR},${SAMPLE_ZIP} \
+--jars ${RAPIDS_JAR} \
+--py-files ${SAMPLE_ZIP} \
${MAIN_PY} \
--mainClass=${EXAMPLE_CLASS} \
--dataPath=train::${DATA_PATH}/mortgage/out/train/ \
@@ -190,19 +223,24 @@ export EXAMPLE_CLASS=com.nvidia.spark.examples.mortgage.cpu_main

 # tree construction algorithm
 export TREE_METHOD=hist
+
+# if you enable archive python environment
+export PYSPARK_DRIVER_PYTHON=python
+export PYSPARK_PYTHON=./environment/bin/python
```

This is the same command as for the GPU example, repeated for convenience:

``` bash
 ${SPARK_HOME}/bin/spark-submit \
 --master yarn \
+--archives your_pyspark_venv.tar.gz#environment #if you enabled archive python environment \
 --deploy-mode ${SPARK_DEPLOY_MODE} \
 --num-executors ${SPARK_NUM_EXECUTORS} \
 --driver-memory ${SPARK_DRIVER_MEMORY} \
 --executor-memory ${SPARK_EXECUTOR_MEMORY} \
---jars ${XGBOOST4J_JAR},${XGBOOST4J_SPARK_JAR} \
---py-files ${XGBOOST4J_SPARK_JAR},${SAMPLE_ZIP} \
+--jars ${RAPIDS_JAR} \
+--py-files ${SAMPLE_ZIP} \
${MAIN_PY} \
--mainClass=${EXAMPLE_CLASS} \
--dataPath=train::${DATA_PATH}/mortgage/output/train/ \
@@ -9,7 +9,7 @@ For simplicity export the location to these jars. All examples assume the packag
 * [XGBoost4j-Spark Package](https://repo1.maven.org/maven2/com/nvidia/xgboost4j-spark_3.0/1.4.2-0.3.0/)

 2. Download the RAPIDS Accelerator for Apache Spark plugin jar
-   * [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar)
+   * [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.12.0/rapids-4-spark_2.12-22.12.0.jar)

### Build XGBoost Python Examples

@@ -21,14 +21,3 @@ You need to copy the dataset to `/opt/xgboost`. Use the following links to downl
 1. [Mortgage dataset](/docs/get-started/xgboost-examples/dataset/mortgage.md)
 2. [Taxi dataset](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
 3. [Agaricus dataset](https://gust.dev/r/xgboost-agaricus)
-
-### Setup environments
-
-``` bash
-export SPARK_XGBOOST_DIR=/opt/xgboost
-export RAPIDS_JAR=${SPARK_XGBOOST_DIR}/rapids-4-spark_2.12-22.10.0.jar
-export XGBOOST4J_JAR=${SPARK_XGBOOST_DIR}/xgboost4j_3.0-1.4.2-0.3.0.jar
-export XGBOOST4J_SPARK_JAR=${SPARK_XGBOOST_DIR}/xgboost4j-spark_3.0-1.4.2-0.3.0.jar
-export SAMPLE_ZIP=${SPARK_XGBOOST_DIR}/samples.zip
-export MAIN_PY=${SPARK_XGBOOST_DIR}/main.py
-```
@@ -5,7 +5,7 @@ For simplicity export the location to these jars. All examples assume the packag
### Download the jars

 1. Download the RAPIDS Accelerator for Apache Spark plugin jar
-   * [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.10.0/rapids-4-spark_2.12-22.10.0.jar)
+   * [RAPIDS Spark Package](https://repo1.maven.org/maven2/com/nvidia/rapids-4-spark_2.12/22.12.0/rapids-4-spark_2.12-22.12.0.jar)

### Build XGBoost Scala Examples

@@ -17,11 +17,3 @@ You need to copy the dataset to `/opt/xgboost`. Use the following links to downl
 1. [Mortgage dataset](/docs/get-started/xgboost-examples/dataset/mortgage.md)
 2. [Taxi dataset](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page)
 3. [Agaricus dataset](https://gust.dev/r/xgboost-agaricus)
-
-### Setup environments
-
-``` bash
-export SPARK_XGBOOST_DIR=/opt/xgboost
-export RAPIDS_JAR=${SPARK_XGBOOST_DIR}/rapids-4-spark_2.12-22.10.0.jar
-export SAMPLE_JAR=${SPARK_XGBOOST_DIR}/sample_xgboost_apps-0.2.3-jar-with-dependencies.jar
-```
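This removes the `### Setup environments` exports without a replacement in this hunk. If other examples in these docs still reference `${RAPIDS_JAR}` and `${SAMPLE_JAR}`, a 22.12-updated equivalent of the deleted block would presumably read as follows — a guess extrapolated from the deleted lines, not text from this commit:

``` bash
# Hypothetical 22.12 refresh of the removed exports.
export SPARK_XGBOOST_DIR=/opt/xgboost
export RAPIDS_JAR=${SPARK_XGBOOST_DIR}/rapids-4-spark_2.12-22.12.0.jar
export SAMPLE_JAR=${SPARK_XGBOOST_DIR}/sample_xgboost_apps-0.2.3-jar-with-dependencies.jar
```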
Binary file modified docs/img/guides/mortgage-perf.png
2 changes: 1 addition & 1 deletion examples/ML+DL-Examples/Spark-cuML/pca/Dockerfile
@@ -17,7 +17,7 @@

 ARG CUDA_VER=11.5.1
 FROM nvidia/cuda:${CUDA_VER}-devel-ubuntu20.04
-ARG BRANCH_VER=22.10
+ARG BRANCH_VER=22.12

RUN apt-get update
RUN apt-get install -y wget ninja-build git
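With the bumped `BRANCH_VER`, a typical build of this image looks something like the following (run from the repo root; the tag name is illustrative):

``` bash
docker build -f examples/ML+DL-Examples/Spark-cuML/pca/Dockerfile \
  --build-arg CUDA_VER=11.5.1 \
  --build-arg BRANCH_VER=22.12 \
  -t spark-cuml-pca:22.12 .
```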