Commit

Build R docs with pkgdown.
install dir.

Restore.

system dependencies.

Update ubuntu version.

work on script.
trivialfis committed Jan 16, 2025
1 parent 1f1cf3a commit 4d08c07
Showing 10 changed files with 132 additions and 20 deletions.
22 changes: 22 additions & 0 deletions .github/workflows/r_tests.yml
@@ -101,3 +101,25 @@ jobs:
        if: steps.changes.outputs.r_package == 'true'
        run: |
          python3 ops/script/test_r_package.py --r=/usr/bin/R --task=doc
  build-r-docs:
    name: Build docs for the R package
    runs-on:
      - runs-on=${{ github.run_id }}
      - runner=linux-amd64-cpu
      - tag=r-tests-build-jvm-docs
    steps:
      # Restart Docker daemon so that it recognizes the ephemeral disks
      - run: sudo systemctl restart docker
      - uses: actions/checkout@v4
        with:
          submodules: "true"
      - name: Log into Docker registry (AWS ECR)
        run: bash ops/pipeline/login-docker-registry.sh
      - run: bash ops/pipeline/build-r-docs.sh
      - name: Upload R doc
        run: |
          python3 ops/pipeline/manage-artifacts.py upload \
            --s3-bucket xgboost-docs \
            --prefix ${BRANCH_NAME}/${GITHUB_SHA} --make-public \
            R-package/r-docs-${{ env.BRANCH_NAME }}.tar.bz2
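The upload step passes a bucket, a prefix, and a local file. As a rough sketch of where such an object might end up, the snippet below composes a public URL under the assumption that the object key is `<prefix>/<file name>`. That rule is a guess about `manage-artifacts.py`, not something this diff confirms. The helper name `s3_object_url` and the short SHA are hypothetical, and the host pattern is borrowed from the URL used in `doc/conf.py`.

```python
from posixpath import basename, join


def s3_object_url(bucket: str, prefix: str, local_path: str,
                  region: str = "us-west-2") -> str:
    """Compose the public URL an uploaded file might get.

    Assumption: the key is "<prefix>/<file name>"; the real upload script
    may build the key differently.
    """
    key = join(prefix.strip("/"), basename(local_path))
    return f"https://s3-{region}.amazonaws.com/{bucket}/{key}"


url = s3_object_url(
    bucket="xgboost-docs",
    prefix="master/4d08c07",  # stands in for ${BRANCH_NAME}/${GITHUB_SHA}
    local_path="R-package/r-docs-master.tar.bz2",
)
print(url)
```

Under that assumption the object lives below the `<branch>/<sha>/` prefix, so any consumer would need to request the same prefixed path.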
3 changes: 3 additions & 0 deletions R-package/.Rbuildignore
@@ -6,3 +6,6 @@
README.md
^doc$
^Meta$
^_pkgdown\.yml$
^docs$
^pkgdown$
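Each `.Rbuildignore` line is a Perl-style regular expression that R matches case-insensitively against paths relative to the package root. The sketch below approximates that matching with Python's `re` module to show what the anchored patterns in this hunk do and do not catch. The helper `is_ignored` is hypothetical, and the sketch only tests individual paths. `R CMD build` is understood to drop a matched directory together with its contents, which is not modeled here.

```python
import re

# Patterns taken from the hunk above (the anchored ones only).
PATTERNS = [r"^doc$", r"^Meta$", r"^_pkgdown\.yml$", r"^docs$", r"^pkgdown$"]


def is_ignored(path: str) -> bool:
    """Return True if any pattern matches the package-relative path."""
    return any(re.search(p, path, flags=re.IGNORECASE) for p in PATTERNS)


for path in ["docs", "pkgdown", "_pkgdown.yml", "R/docs", "docs.R"]:
    print(path, is_ignored(path))
```

Because of the `^` and `$` anchors, only the top-level `docs` and `pkgdown` entries match. A nested `R/docs` or a file named `docs.R` does not.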
1 change: 1 addition & 0 deletions R-package/.gitignore
@@ -0,0 +1 @@
docs
12 changes: 3 additions & 9 deletions R-package/README.md
@@ -3,11 +3,11 @@ XGBoost R Package for Scalable GBM

[![CRAN Status Badge](http://www.r-pkg.org/badges/version/xgboost)](https://cran.r-project.org/web/packages/xgboost)
[![CRAN Downloads](http://cranlogs.r-pkg.org/badges/xgboost)](https://cran.rstudio.com/web/packages/xgboost/index.html)
[![Documentation Status](https://readthedocs.org/projects/xgboost/badge/?version=latest)](http://xgboost.readthedocs.org/en/latest/R-package/index.html)
[![Documentation Status](https://readthedocs.org/projects/xgboost/badge/?version=latest)](https://xgboost.readthedocs.org/en/latest/R-package/index.html)

Resources
---------
* [XGBoost R Package Online Documentation](http://xgboost.readthedocs.org/en/latest/R-package/index.html)
* [XGBoost R Package Online Documentation](https://xgboost.readthedocs.org/en/latest/R-package/index.html)
- Check this out for detailed documents, examples and tutorials.

Installation
@@ -19,13 +19,7 @@ We are [on CRAN](https://cran.r-project.org/web/packages/xgboost/index.html) now
install.packages('xgboost')
```

For more detailed installation instructions, please see [here](http://xgboost.readthedocs.org/en/latest/build.html#r-package-installation).

Examples
--------

* Please visit [walk through example](demo).
* See also the [example scripts](../demo/kaggle-higgs) for Kaggle Higgs Challenge, including [speedtest script](../demo/kaggle-higgs/speedtest.R) on this dataset and the one related to [Otto challenge](../demo/kaggle-otto), including a [RMarkdown documentation](../demo/kaggle-otto/understandingXGBoostModel.Rmd).
For more detailed installation instructions, please see [here](https://xgboost.readthedocs.io/en/stable/install.html).

Development
-----------
4 changes: 4 additions & 0 deletions R-package/pkgdown/_pkgdown.yml
@@ -0,0 +1,4 @@
url: https://github.com/dmlc/xgboost

template:
  bootstrap: 5
15 changes: 9 additions & 6 deletions R-package/vignettes/xgboost_introduction.Rmd
@@ -12,7 +12,10 @@ output:
toc_float: true
---

# Introduction
XGBoost for R introduction
==========================

## Introduction

**XGBoost** is an optimized distributed gradient boosting library designed to be highly **efficient**, **flexible** and **portable**. It implements machine learning algorithms under the [Gradient Boosting](https://en.wikipedia.org/wiki/Gradient_boosting) framework. XGBoost provides parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way. The same code runs on major distributed environments (Hadoop, SGE, MPI) and can solve problems beyond billions of examples.

@@ -22,7 +25,7 @@ For more details about XGBoost's features and usage, see the [online documentati

This short vignette outlines the basic usage of the R interface for XGBoost, assuming the reader has some familiarity with the underlying concepts behind statistical modeling with gradient-boosted decision trees.

# Building a predictive model
## Building a predictive model

At its core, XGBoost consists of a C++ library which offers bindings for different programming languages, including R. The R package for XGBoost provides an idiomatic interface similar to those of other statistical modeling packages using an x/y design, as well as a lower-level interface that interacts more directly with the underlying core library and which is similar to those of other language bindings like Python, plus various helpers to interact with its model objects such as by plotting their feature importances or converting them to other formats.

@@ -62,7 +65,7 @@ model_abserr <- xgboost(x, y, objective = "reg:absoluteerror", nthreads = 1, nro

_Note: the objective must match with the type of the "y" response variable - for example, classification objectives for discrete choices require "factor" types, while regression models for real-valued data require "numeric" types._

# Model parameters
## Model parameters

XGBoost models allow a large degree of control over how they are built. By their nature, gradient-boosted decision tree ensembles are able to capture very complex patterns between features in the data and a response variable, which also means they can suffer from overfitting if not controlled appropriately.

@@ -105,7 +108,7 @@ xgboost(
)
```

# Examining model objects
## Examining model objects

XGBoost model objects for the most part consist of a pointer to a C++ object where most of the information is held and which is interfaced through the utility functions and methods in the package, but they also contain some R attributes that can be retrieved (and new ones added) through `attributes()`:

@@ -131,7 +134,7 @@ xgb.importance(model)
xgb.model.dt.tree(model)
```

# Other features
## Other features

XGBoost supports many additional features on top of its traditional gradient-boosting framework, including, among others:

@@ -143,7 +146,7 @@ XGBoost supports many additional features on top of its traditional gradient-boo

See the [online documentation](https://xgboost.readthedocs.io/en/stable/index.html) - particularly the [tutorials section](https://xgboost.readthedocs.io/en/stable/tutorials/index.html) - for a glimpse over further functionalities that XGBoost offers.

# The low-level interface
## The low-level interface

In addition to the `xgboost(x, y, ...)` function, XGBoost also provides a lower-level interface for creating model objects through the function `xgb.train()`, which resembles the same `xgb.train` functions in other language bindings of XGBoost.

9 changes: 9 additions & 0 deletions doc/R-package/index.rst
@@ -15,6 +15,15 @@ Get Started
* Check out the :doc:`Installation Guide </install>`, which contains instructions to install xgboost, and the :doc:`Tutorials </tutorials/index>` for examples on how to use XGBoost for various tasks.
* Read the `API documentation <https://cran.r-project.org/web/packages/xgboost/xgboost.pdf>`_.

*********
Vignettes
*********

.. toctree::

  xgboost_introduction
  xgboostfromJSON

************
Other topics
************
43 changes: 38 additions & 5 deletions doc/conf.py
@@ -12,7 +12,6 @@
# All configuration values have a default; values that are commented out
# serve to show the default.
import os
import re
import shutil
import subprocess
import sys
@@ -35,7 +34,7 @@
release = xgboost.__version__


def run_doxygen():
def run_doxygen() -> None:
    """Run the doxygen make command in the designated folder."""
    curdir = os.path.normpath(os.path.abspath(os.path.curdir))
    if os.path.exists(TMP_DIR):
@@ -67,8 +66,8 @@ def run_doxygen():
os.chdir(curdir)


def build_jvm_docs():
    """Build docs for the JVM packages"""
def get_branch() -> str:
    """Guess the git branch."""
    git_branch = os.getenv("READTHEDOCS_VERSION_NAME", default=None)
    print(f"READTHEDOCS_VERSION_NAME = {git_branch}")

@@ -79,6 +78,12 @@ def build_jvm_docs():
    elif git_branch == "stable":
        git_branch = f"release_{xgboost.__version__}"
    print(f"git_branch = {git_branch}")
    return git_branch


def build_jvm_docs() -> None:
    """Build docs for the JVM packages"""
    git_branch = get_branch()

    def try_fetch_jvm_doc(branch):
        """
@@ -106,10 +111,37 @@ def try_fetch_jvm_doc(branch):
            return False

    if not try_fetch_jvm_doc(git_branch):
        print(f"Falling back to the master branch...")
        print("Falling back to the master branch...")
        try_fetch_jvm_doc("master")


def build_r_docs() -> None:
    """Fetch R document from s3."""
    git_branch = get_branch()

    def try_fetch_r_doc(branch: str) -> bool:
        try:
            url = f"https://s3-us-west-2.amazonaws.com/xgboost-docs/r-docs-{branch}.tar.bz2"
            filename, _ = urllib.request.urlretrieve(url)
            if not os.path.exists(TMP_DIR):
                print(f"Create directory {TMP_DIR}")
                os.mkdir(TMP_DIR)
            r_doc_dir = os.path.join(TMP_DIR, "r_docs")
            if os.path.exists(r_doc_dir):
                shutil.rmtree(r_doc_dir)
            os.mkdir(r_doc_dir)

            with tarfile.open(filename, "r:bz2") as t:
                t.extractall(r_doc_dir)
            return True
        except HTTPError:
            print(f"R doc not found at {url}.")
            return False

    if not try_fetch_r_doc(git_branch):
        try_fetch_r_doc("master")


def is_readthedocs_build():
    if os.environ.get("READTHEDOCS", None) == "True":
        return True
@@ -125,6 +157,7 @@ def is_readthedocs_build():
if is_readthedocs_build():
    run_doxygen()
    build_jvm_docs()
    build_r_docs()


# If extensions (or modules to document with autodoc) are in another directory,
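Both `build_jvm_docs()` and `build_r_docs()` try the current branch first and fall back to `master`. A minimal sketch of that pattern, with the download step injected so it can be exercised without network access, might look like the following. The names `extract_archive` and `fetch_first_available` are hypothetical and not part of `conf.py`. The `filter="data"` argument is an addition that only exists on Python versions that ship tarfile extraction filters, hence the `hasattr` guard.

```python
import os
import tarfile
from typing import Callable, Iterable, Optional


def extract_archive(archive: str, dest: str) -> None:
    """Unpack a .tar.bz2 archive into dest, creating dest if needed."""
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(archive, "r:bz2") as tar:
        if hasattr(tarfile, "data_filter"):
            # Refuse absolute paths and members that would escape dest.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


def fetch_first_available(
    branches: Iterable[str],
    fetch: Callable[[str], Optional[str]],
    dest: str,
) -> Optional[str]:
    """Try each branch in order and extract the first archive found.

    `fetch` returns a local archive path, or None when the branch has no
    archive. Returns the branch that was used, or None if all failed.
    """
    for branch in branches:
        archive = fetch(branch)
        if archive is not None:
            extract_archive(archive, dest)
            return branch
    return None
```

A caller would pass something like `["release_x.y.z", "master"]` together with a `fetch` function that wraps `urllib.request.urlretrieve` and returns `None` on `HTTPError`. Returning the branch that was actually used makes the fallback visible to the caller, for example for logging.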
24 changes: 24 additions & 0 deletions ops/pipeline/build-r-docs-impl.sh
@@ -0,0 +1,24 @@
#!/bin/bash

if [[ $# -ne 1 ]]
then
  echo "Usage: $0 [branch name]"
  exit 1
fi

set -euo pipefail

branch_name=$1

# See instructions at: https://cran.r-project.org/bin/linux/ubuntu/

wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | sudo tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
# add the R 4.0 repo from CRAN -- adjust 'focal' to 'groovy' or 'bionic' as needed
sudo add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"

sudo apt install --no-install-recommends r-base
Rscript -e "install.packages(c('pkgdown'), repos = 'https://mirror.las.iastate.edu/CRAN/')"
cd R-package
Rscript -e "pkgdown::build_site()"
cd -
tar cvjf r-docs-${branch_name}.tar.bz2 R-package/docs
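The final `tar cvjf` call archives the path `R-package/docs` as given, so member names inside the archive keep that prefix. The sketch below reproduces this with Python's `tarfile` on a throwaway directory to make the resulting layout visible. The helper `make_docs_archive` is hypothetical and only mirrors the shell command.

```python
import os
import tarfile
import tempfile


def make_docs_archive(root: str, branch_name: str) -> str:
    """Create r-docs-<branch>.tar.bz2 in root from root/R-package/docs."""
    archive = os.path.join(root, f"r-docs-{branch_name}.tar.bz2")
    with tarfile.open(archive, "w:bz2") as tar:
        # arcname keeps the relative path, as `tar cvjf ... R-package/docs` does.
        tar.add(os.path.join(root, "R-package", "docs"), arcname="R-package/docs")
    return archive


with tempfile.TemporaryDirectory() as root:
    docs = os.path.join(root, "R-package", "docs")
    os.makedirs(docs)
    with open(os.path.join(docs, "index.html"), "w") as f:
        f.write("<html></html>")
    archive = make_docs_archive(root, "master")
    with tarfile.open(archive, "r:bz2") as tar:
        names = sorted(tar.getnames())
    print(names)
```

Anything that extracts such an archive into a directory like `r_docs` would therefore find the site under `r_docs/R-package/docs/`, not directly under `r_docs/`.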
19 changes: 19 additions & 0 deletions ops/pipeline/build-r-docs.sh
@@ -0,0 +1,19 @@
#!/bin/bash

set -euo pipefail

if [[ -z ${BRANCH_NAME:-} ]]
then
  echo "Make sure to define environment variable BRANCH_NAME."
  exit 1
fi

source ops/pipeline/get-docker-registry-details.sh

IMAGE_URI=${DOCKER_REGISTRY_URL}/xgb-ci.cpu

echo "--- Build R package doc"
set -x
python3 ops/docker_run.py \
  --image-uri ${IMAGE_URI} \
  -- ops/pipeline/build-r-docs-impl.sh ${BRANCH_NAME}
