[R-package] v4.0.0 CRAN submission issues #5987
Comments
Re-opening this. Sadly our |
Regarding the timing: in the worst case, pack the firing examples in a dontrun block:
|
Thanks for the suggestion @mayer79 ! But I don't think that's the fix we should pursue. I don't think we can safely assume it's only the one example that happened to be reported in CRAN's initial checks. The timings are unpredictable and depend on, for example, what else happens to be running on the CRAN check machines at the same time as they're checking In addition...I no longer trust We used to wrap all examples in ... and then with R 4.0, See https://cran.r-project.org/doc/manuals/r-devel/NEWS.html
We have to find a more permanent solution to ensure that everything multi-threaded in this project uses at most 2 threads in tests and examples. It's frustrating that v4.0.0 of the project is being blocked from CRAN because of an example that takes less than 2 seconds to run but... we can only work on the things we can control. |
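One way to picture the "at most 2 threads" requirement from the shell side: clamp whatever thread count is requested before exporting it for a test or example run. This is only a sketch. `cap_threads` and its default limit of 2 are hypothetical names made up for illustration, not something from LightGBM's build scripts; `OMP_NUM_THREADS` is the standard OpenMP environment variable.

```shell
#!/bin/sh
# Hypothetical helper (not part of LightGBM): clamp a requested thread
# count to a maximum, which defaults to 2.
cap_threads() {
    requested="${1:-1}"
    limit="${2:-2}"
    # Anything that is empty, non-numeric, or starts with 0 is treated as 1.
    case "$requested" in
        ''|*[!0-9]*|0*) requested=1 ;;
    esac
    if [ "$requested" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$requested"
    fi
}

# Export a capped value so OpenMP code started from this shell inherits it.
OMP_NUM_THREADS="$(cap_threads "${OMP_NUM_THREADS:-2}")"
export OMP_NUM_THREADS
```

An environment variable only covers code that respects it, so this would not replace a fix inside the package itself.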
Ha, darn. I recently had the same problem: Setting all The issue about |
Regarding the issue with the threads on I also don't think that'd be grounds for rejection (the one about S3 mismatch would though) since, when checks fail due to large thread usage, it typically manifests as a segmentation fault on the Solaris checks (or at least it does when passing the number of threads on individual pragmas, which this library doesn't). |
That is not correct. Our second attempted submission was rejected by the automatic CRAN checks, and the only issue was that NOTE about the example timings. |
There has been significant discussion about this recently on the
I'm still reading through all of them, so won't attempt to summarize yet... just linking these to make it clear this is not a LightGBM-specific issue. |
Additionally, see this recent comment on an old |
Note that those discussions from The issue here is not coming from usage of |
That's not entirely true. That conversation has also continued the r-pkg-devel mailing list conversation about:
|
XGBoost release is running into similar issues with system time constraints: dmlc/xgboost#9497 (comment) We actually test this https://github.com/dmlc/xgboost/blob/f90d034a86784f4f07417d1d28a77b4a189acc89/tests/ci_build/test_r_package.py#L109 , but I haven't been able to reproduce the error. |
Appreciate the tag on the bundle issue! Just wanted to drop our approach to skipping long-running or dependency-heavy examples conditionally here in case it's useful for yall. In some of our packages, we define a helper that tells us whether we ought to run long-running or dependency-heavy examples, used like so. The Definitely not ideal, though somewhat more resilient to CRAN check environments than |
I'm actively working on this. Just sharing my notes as I go through it.
Before putting up potential fixes, I'm first trying to find a way to reproduce the timing-related warnings.
On my MacBook Pro (1 Dual-Core Intel Core i5 CPU), I gave Docker for Mac access to 4 CPUs, then ran the following from the root of this repo.
docker run \
--rm \
--cpus 4 \
--env MAKEFLAGS=-j2 \
-v $(pwd):/opt/LightGBM \
-w /opt/LightGBM \
-it rocker/verse:4.3.1 \
/bin/bash
Installed OpenMP in there.
apt-get update
apt-get install -y \
libomp-dev
And the one
Rscript \
--vanilla \
-e "install.packages(c('RhpcBLASctl'), repos = 'https://cran.r-project.org')"
Then built a CRAN-style source distribution of the R package, and moved it to a new directory (not on the path that's mounted, so reads and writes are faster).
mkdir -p /opt/r-check
sh build-cran-package.sh --no-build-vignettes
cp ./lightgbm_4.1.0.99.tar.gz /opt/r-check/
cd /opt/r-check
Approach 1:
Then ran
OMP_NUM_THREADS=2 \
_R_CHECK_EXAMPLE_TIMING_THRESHOLD_=0 \
_R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_=2.5 \
R --vanilla CMD check \
--no-codoc \
--no-manual \
--no-tests \
--no-vignettes \
--run-dontrun \
--run-donttest \
--timings \
./lightgbm_4.1.0.99.tar.gz
For a description of what those flags mean, see dmlc/xgboost#9497 (comment).
results with `OMP_NUM_THREADS=1` (click me)
results with `OMP_NUM_THREADS=2` (click me)
results with `OMP_NUM_THREADS=3` (click me)
results with `OMP_NUM_THREADS=4` (click me)
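For intuition about what `_R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_=2.5` is looking for: my understanding is that the check compares CPU time (user + system) against elapsed time for each example. Here's a rough sketch that applies that ratio to tab-separated timings in a name / user / system / elapsed layout. To be clear about what's assumed: `flag_examples` and `sample_timings` are hypothetical helpers, the example names and numbers are made up, and the ratio formula is my reading of the check, not something copied from the R sources.

```shell
#!/bin/sh
# Hypothetical helper: print examples whose (user + system) / elapsed
# exceeds a threshold. Reads tab-separated rows from stdin, with a
# header row: name, user, system, elapsed.
flag_examples() {
    threshold="$1"
    awk -F '\t' -v threshold="$threshold" '
        NR == 1 { next }                  # skip the header row
        $4 > 0 {                          # avoid dividing by zero
            ratio = ($2 + $3) / $4
            if (ratio > threshold) {
                printf "%s\t%.2f\n", $1, ratio
            }
        }
    '
}

# Made-up timings for illustration only (not real check output).
sample_timings() {
    printf 'name\tuser\tsystem\telapsed\n'
    printf 'example_a\t0.50\t0.02\t0.50\n'
    printf 'example_b\t1.60\t0.10\t0.40\n'
}

sample_timings | flag_examples 2.5
```

With these invented numbers only `example_b` is printed, since it used roughly 4x more CPU time than wall-clock time, which is the signature of more than 2 threads being busy.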
(CPU time being around 2x elapsed time is expected for many of these examples, since we hardcoded
Approach 2: directly running the examples
Following @trivialfis's excellent work from dmlc/xgboost#9497 (comment), I tried just installing the package and running the examples directly.
output of doing that (click me)
R --vanilla CMD INSTALL ./lightgbm_4.1.0.99.tar.gz
cd /opt/LightGBM
cat << EOF > check-examples.R
library(pkgload)
library(lightgbm)
files <- list.files(
    "./R-package/man"
    , pattern = "*.Rd"
    , full.names = TRUE
)
run_example_timeit <- function(f) {
    print(paste("Test", f))
    flush.console()
    t0 <- proc.time()
    capture.output({
        pkgload::run_example(f, run_donttest = TRUE, quiet = TRUE)
    })
    example_timing <- proc.time() - t0
    print(example_timing)
}
timings <- lapply(files, run_example_timeit)
EOF
OMP_NUM_THREADS=2 \
Rscript --vanilla ./check-examples.R
That printed the following:
Other things I checked
No OpenMP environment variables were set in the container.
env | grep -i omp
# (empty)
All 4 logical CPUs I was giving to Docker appeared to be visible to the R process.
Rscript \
--vanilla \
-e "print(RhpcBLASctl::omp_get_max_threads())"
# [1] 4
Next steps
When I return to this in the next few days, I'll try repeating these approaches on a machine with more physical CPUs. I'm hoping with something like a 12-core or 24-core machine, the sources of unintended parallelism will really show up... and then we'll be able to test fixes. |
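Repeating the same command across several `OMP_NUM_THREADS` values, as in the results above, can be wrapped in a small loop. A minimal sketch: `sweep_threads` is a hypothetical helper, and the command passed to it is whatever should be timed (for example the `R CMD check` invocation from earlier). The demo call uses a harmless placeholder so it runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper: run a command once per thread count.
# Usage: sweep_threads "1 2 3 4" command [args...]
sweep_threads() {
    counts="$1"
    shift
    for n in $counts; do
        echo "== OMP_NUM_THREADS=$n =="
        # Set the variable only for this one invocation.
        OMP_NUM_THREADS="$n" "$@"
    done
}

# Placeholder command that just reports the value it was given.
sweep_threads "1 2 3 4" sh -c 'echo "threads seen: $OMP_NUM_THREADS"'
```

Setting the variable per invocation keeps the surrounding shell's environment unchanged between runs.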
I repeated steps similar to the ones above tonight on a VM from AWS with the following specs:
And installed the following on it:
setup commands (click me)
sudo apt-get update
sudo apt-get install --no-install-recommends -y \
software-properties-common
sudo apt-get install --no-install-recommends -y \
apt-utils \
build-essential \
ca-certificates \
cmake \
curl \
git \
iputils-ping \
jq \
libcurl4 \
libicu-dev \
libomp-dev \
libssl-dev \
libunwind8 \
locales \
locales-all \
netcat \
unzip \
zip
export LANG="en_US.UTF-8"
sudo update-locale LANG=${LANG}
export LC_ALL="${LANG}"
# set up R environment
export CRAN_MIRROR="https://cran.rstudio.com"
export MAKEFLAGS=-j8
export R_LIB_PATH=~/Rlib
export R_LIBS=$R_LIB_PATH
export PATH="$R_LIB_PATH/R/bin:$PATH"
export R_APT_REPO="jammy-cran40/"
export R_LINUX_VERSION="4.3.1-1.2204.0"
mkdir -p $R_LIB_PATH
mkdir -p ~/.gnupg
echo "disable-ipv6" >> ~/.gnupg/dirmngr.conf
sudo apt-key adv \
--homedir ~/.gnupg \
--keyserver keyserver.ubuntu.com \
--recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
sudo add-apt-repository \
"deb ${CRAN_MIRROR}/bin/linux/ubuntu ${R_APT_REPO}"
sudo apt-get update
sudo apt-get install \
--no-install-recommends \
-y \
autoconf \
automake \
clang \
devscripts \
r-base-core=${R_LINUX_VERSION} \
r-base-dev=${R_LINUX_VERSION} \
texinfo \
texlive-latex-extra \
texlive-latex-recommended \
texlive-fonts-recommended \
texlive-fonts-extra \
tidy \
qpdf
# install dependencies
Rscript \
--vanilla \
-e "install.packages(c('data.table', 'jsonlite', 'knitr', 'Matrix', 'R6', 'RhpcBLASctl', 'rmarkdown', 'testthat'), repos = '${CRAN_MIRROR}', lib = '${R_LIB_PATH}', dependencies = c('Depends', 'Imports', 'LinkingTo'), Ncpus = parallel::detectCores())"
# build LightGBM
mkdir -p ${HOME}/repos
cd ${HOME}/repos
git clone --recursive https://github.com/microsoft/LightGBM.git
cd ./LightGBM
sh build-cran-package.sh --no-build-vignettes
With
OMP_NUM_THREADS=16 \
_R_CHECK_EXAMPLE_TIMING_THRESHOLD_=0 \
_R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_=2.5 \
R --vanilla CMD check \
--no-codoc \
--no-manual \
--no-tests \
--no-vignettes \
--run-dontrun \
--run-donttest \
--timings \
./lightgbm_4.1.0.99.tar.gz
results (click me)
I noticed that CRAN was using
So I switched to
cat << EOF > ${HOME}/.R/Makevars
CC=clang
CXX=clang++
CXX17=clang++
EOF
... and tried again.
OMP_NUM_THREADS=16 \
_R_CHECK_EXAMPLE_TIMING_THRESHOLD_=0 \
_R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_=2.5 \
R --vanilla CMD check \
--no-codoc \
--no-manual \
--no-tests \
--no-vignettes \
--run-dontrun \
--run-donttest \
--timings \
./lightgbm_4.1.0.99.tar.gz
That didn't reproduce the specific WARNING triggered on CRAN, about the
full results (click me)
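A possible alternative to overwriting `${HOME}/.R/Makevars` for the compiler switch above: write the settings to a scratch file and point R at it with the `R_MAKEVARS_USER` environment variable, which R consults for the location of the user-level Makevars file. This is a sketch under assumptions: it presumes `clang` and `clang++` are installed, and the temporary-directory layout is my own choice, not something used in this thread.

```shell
#!/bin/sh
# Write compiler overrides to a scratch Makevars instead of ~/.R/Makevars.
makevars_dir="$(mktemp -d)"
makevars_file="${makevars_dir}/Makevars"

# Quoted 'EOF' so nothing inside the here-document is expanded by the shell.
cat > "${makevars_file}" << 'EOF'
CC=clang
CXX=clang++
CXX17=clang++
EOF

# R reads this variable to locate the user-level Makevars file.
export R_MAKEVARS_USER="${makevars_file}"
echo "Using Makevars at: ${R_MAKEVARS_USER}"
```

Unsetting `R_MAKEVARS_USER` afterwards would return R to its default lookup, which makes it easier to flip between compilers while comparing check results.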
Now that we have a way to reproduce this, I'll start working through changes to address that. I'm hoping that a mix of the following will be enough:
Getting closer 😁 |
Still experimenting with this, just want to add one happy finding... the issues appear to completely be about LightGBM's own use of multi-threading + possibly how it configures Using
That's a very good sign. It means that this issue isn't about some other source of multi-threading from LightGBM's dependencies, such as Eigen: https://eigen.tuxfamily.org/dox/TopicMultiThreading.html. Will share more updates soon! |
Status update for those subscribed to this issue:
@simonpcouch I know the tidymodels team was waiting for v4.x of I'm planning to try checking reverse dependencies using a package built from the branch on #6226 over the next few days, but would welcome any help your team could provide if you have the capacity. |
We do have capacity to run reverse dependency checks! I'll start up some checks based on #6226 today. |
Thank you so much! |
I see no new problems with #6226! Checked with revdepcheck's Note for CRAN:
|
Thank you SO MUCH for that! It's really helpful. I thought for sure we'd have issues, especially since a few new packages have taken on Glad that that's not the case 😁 |
Congrats on the CRAN release! |
thanks again for your help @simonpcouch ! |
v4.2.0 of the R package has been accepted to CRAN and has passed all of their main checks: #6191 (comment) This issue can be closed. Thank you so much to everyone who helped with this!!!! |
Description
The recent submission of {lightgbm} v4.0.0 to CRAN was rejected because of the following issues detected by R CMD check.
These need to be fixed for CRAN to accept our submission.
Reproducible example
These check failures happened during the CRAN incoming checks.
Environment info
LightGBM version or commit hash: v4.0.0.
Additional Comments
I will put up a pull request shortly to fix these things. We can figure out later how to catch them in CI so they don't surprise us on CRAN submissions in the future.
The examples timing issue is because I did not complete #5102 as part of this release, sorry 😞 . I said I would almost a year ago (#5102 (comment)), just forgot about it.