chore(ci,dev/release): Update Arrow C++ CI usage and install instructions #695

Draft: wants to merge 4 commits into base: main
4 changes: 2 additions & 2 deletions .github/workflows/build-and-test-device.yaml
@@ -79,13 +79,13 @@ jobs:
# The self-hosted runner needs its own Arrow build since it is running
# a much older linux distribution. When the Arrow C++ requirement for tests is
# dropped, we don't have to use this at all as part of the CI configuration.
key: arrow-device-${{ runner.os }}-${{ runner.arch }}-${{ matrix.config.label }}-2
key: arrow-device-${{ runner.os }}-${{ runner.arch }}-${{ matrix.config.label }}-3

- name: Build Arrow C++
if: steps.cache-arrow-build.outputs.cache-hit != 'true'
shell: bash
run: |
ci/scripts/build-arrow-cpp-minimal.sh 18.0.0 arrow
ci/scripts/build-arrow-cpp-minimal.sh 18.1.0 arrow

- name: Build
run: |
8 changes: 4 additions & 4 deletions .github/workflows/build-and-test.yaml
@@ -64,13 +64,13 @@ jobs:
with:
path: arrow
# Bump the number at the end of this line to force a new Arrow C++ build
key: arrow-${{ runner.os }}-${{ runner.arch }}-1
key: arrow-${{ runner.os }}-${{ runner.arch }}-2

- name: Build Arrow C++
if: steps.cache-arrow-build.outputs.cache-hit != 'true'
shell: bash
run: |
ci/scripts/build-arrow-cpp-minimal.sh 18.0.0 arrow
ci/scripts/build-arrow-cpp-minimal.sh 18.1.0 arrow

- name: Build nanoarrow
run: |
@@ -150,13 +150,13 @@ jobs:
with:
path: arrow
# Bump the number at the end of this line to force a new Arrow C++ build
key: arrow-${{ runner.os }}-${{ runner.arch }}-2
key: arrow-${{ runner.os }}-${{ runner.arch }}-3

- name: Build Arrow C++
if: steps.cache-arrow-build.outputs.cache-hit != 'true'
shell: bash
run: |
ci/scripts/build-arrow-cpp-minimal.sh 18.0.0 arrow
ci/scripts/build-arrow-cpp-minimal.sh 18.1.0 arrow

- name: Run meson testing script
run: |
4 changes: 2 additions & 2 deletions .github/workflows/clang-tidy.yaml
@@ -48,13 +48,13 @@ jobs:
with:
path: arrow
# Bump the number at the end of this line to force a new Arrow C++ build
key: arrow-${{ runner.os }}-${{ runner.arch }}-1
key: arrow-${{ runner.os }}-${{ runner.arch }}-3

- name: Build Arrow C++
if: steps.cache-arrow-build.outputs.cache-hit != 'true'
shell: bash
run: |
ci/scripts/build-arrow-cpp-minimal.sh 18.0.0 arrow
ci/scripts/build-arrow-cpp-minimal.sh 18.1.0 arrow

- name: Build nanoarrow
run: |
4 changes: 2 additions & 2 deletions .github/workflows/verify.yaml
@@ -90,13 +90,13 @@ jobs:
with:
path: arrow
# Bump the number at the end of this line to force a new Arrow C++ build
key: arrow-${{ runner.os }}-${{ runner.arch }}-2
key: arrow-${{ runner.os }}-${{ runner.arch }}-3

- name: Build Arrow C++
if: steps.cache-arrow-build.outputs.cache-hit != 'true' && matrix.config.label != 'windows-win32'
shell: bash
run: |
src/ci/scripts/build-arrow-cpp-minimal.sh 18.0.0 arrow
src/ci/scripts/build-arrow-cpp-minimal.sh 18.1.0 arrow

- name: Set CMake options
shell: bash
2 changes: 1 addition & 1 deletion ci/docker/alpine.dockerfile
@@ -23,7 +23,7 @@ RUN apk add bash linux-headers git cmake R R-dev g++ gfortran gnupg curl py3-vir

# For Arrow C++
COPY ci/scripts/build-arrow-cpp-minimal.sh /
RUN /build-arrow-cpp-minimal.sh 18.0.0 /arrow
RUN /build-arrow-cpp-minimal.sh 18.1.0 /arrow

# There's a missing define that numpy's build needs on s390x and there is no wheel
RUN (grep -e "S390" /usr/include/bits/hwcap.h && echo "#define HWCAP_S390_VX HWCAP_S390_VXRS" >> /usr/include/bits/hwcap.h) || true
126 changes: 32 additions & 94 deletions dev/release/README.md
@@ -33,81 +33,51 @@ cd arrow-nanoarrow/dev/release
./verify-release-candidate.sh 0.6.0 0
```

Full verification requires [CMake](https://cmake.org/download/) to build and run the test
suite. The test suite currently depends on an Arrow C++ installation that is discoverable
by CMake (e.g., using one of the methods described in the
[Arrow installation instructions](https://arrow.apache.org/install/)). For environments
where binary packages are not provided, building and installing Arrow C++ from source
may be required. You can provide the `NANOARROW_CMAKE_OPTIONS` environment variable to
pass extra arguments to `cmake` (e.g., `-DArrow_DIR=<path/to/arrow>/lib/cmake/Arrow` or
`-DCMAKE_TOOLCHAIN_FILE=[path to vcpkg]/scripts/buildsystems/vcpkg.cmake`).

Verification of the R package requires an
[R installation](https://cloud.r-project.org/) and a C/C++ compiler (e.g.,
[RTools](https://cloud.r-project.org/bin/windows/Rtools/) on Windows or XCode Command
Line Tools). You can set the `R_HOME` environment variable or
`export PATH="$PATH:/path/to/R"` (where `$R_HOME/bin/R` is the R executable)
to point to a specific R installation.

The verification script itself is written in `bash` and requires the `curl`, `gpg`, and
`shasum`/`sha512sum` commands. These are typically available from a package
manager except on Windows (see below).
manager except on Windows (see below). [CMake](https://cmake.org/download/),
Python (>=3.8), and a C/C++ compiler are required to verify the C libraries;
Python (>=3.8) is required to verify the Python bindings; and R (>= 4.0) is
required to verify the R bindings. See below for platform-specific directions
on how to obtain verification dependencies.
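
The command-line tools named above can be checked before starting a verification run. The following is a minimal sketch, not part of `verify-release-candidate.sh`: the helper names `missing_tools` and `have_any` are hypothetical, and it only tests whether each command is on `PATH`, not whether its version is sufficient.

```bash
# Hypothetical pre-flight helpers; not part of verify-release-candidate.sh.

# Print the name of each command in "$@" that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Succeed if at least one of the given commands is found on PATH.
have_any() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      return 0
    fi
  done
  return 1
}

# curl and gpg are both needed; either shasum or sha512sum is enough.
missing="$(missing_tools curl gpg)"
have_any shasum sha512sum || missing="$missing shasum/sha512sum"

if [ -n "$missing" ]; then
  echo "Missing verification tools:" $missing >&2
else
  echo "All verification tools found"
fi
```

The same `missing_tools` call could be extended with `cmake` or `python3` when verifying the C libraries, under the same assumption that presence on `PATH` is all that is being checked.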

To run only C library verification (requires CMake and Arrow C++ but not R or Python):
Options are passed to the verification script using environment variables.
For example, to run only C library verification (requires CMake and Python but not R):

```bash
TEST_DEFAULT=0 TEST_C=1 TEST_C_BUNDLED=1 ./verify-release-candidate.sh 0.6.0 0
```

To run only R package verification (requires R but not CMake or Arrow C++):
To run only R package verification (requires R and Python but not CMake):

```bash
TEST_DEFAULT=0 TEST_R=1 ./verify-release-candidate.sh 0.6.0 0
```

To run only Python verification (requires Python but not CMake or Arrow C++):
To run only Python verification (requires Python but not CMake or R):

```bash
TEST_DEFAULT=0 TEST_PYTHON=1 ./verify-release-candidate.sh 0.6.0 0
```

### MacOS

On MacOS you can install all requirements except R using [Homebrew](https://brew.sh):
On MacOS you can install a modern C/C++ toolchain via the XCode Command Line Tools (i.e.,
`xcode-select --install`). Other dependencies are available via [Homebrew](https://brew.sh):

```bash
brew install cmake gnupg apache-arrow
```

For older MacOS or MacOS without Homebrew, you will have to install the XCode
Command Line Tools (i.e., `xcode-select --install`),
[install GnuPG](https://gnupg.org/download/),
[install CMake](https://cmake.org/download/), and build Arrow C++ from source.

```bash
# Download + build Arrow C++
curl https://github.com/apache/arrow/archive/refs/tags/apache-arrow-14.0.2.tar.gz | \
tar -zxf -
mkdir arrow-build && cd arrow-build
cmake ../apache-arrow-14.0.2/cpp \
-DARROW_JEMALLOC=OFF -DARROW_SIMD_LEVEL=NONE \
# Required for Arrow on old MacOS
-DCMAKE_CXX_FLAGS="-D_LIBCPP_DISABLE_AVAILABILITY" \
-DCMAKE_INSTALL_PREFIX=../arrow
cmake --build .
cmake --install . --prefix=../arrow
cd ..

# Pass location of install to the release verification script
export NANOARROW_CMAKE_OPTIONS="-DArrow_DIR=$(pwd)/arrow/lib/cmake/Arrow -DCMAKE_CXX_FLAGS=-D_LIBCPP_DISABLE_AVAILABILITY"
brew install cmake gnupg
```

You can install R using the instructions provided on the
[R Project Download page](https://cloud.r-project.org/bin/macosx/).

The system `python3` provided by MacOS is sufficient to verify the release
[R Project Download page](https://cloud.r-project.org/bin/macosx/);
the system `python3` provided by MacOS is sufficient to verify the release
candidate.

For older MacOS or MacOS without Homebrew, you can
[install GnuPG](https://gnupg.org/download/) and
[install CMake](https://cmake.org/download/) separately.

### Conda (Linux and MacOS)

Using `conda`, one can install all requirements needed for verification on Linux
@@ -120,7 +90,7 @@ conda create --name nanoarrow-verify-rc
conda activate nanoarrow-verify-rc
conda config --set channel_priority strict

conda install -c conda-forge compilers git cmake libarrow
conda install -c conda-forge compilers git cmake
# For R (see below about potential interactions with system R
# before installing via conda on MacOS)
conda install -c conda-forge r-testthat r-hms r-blob r-pkgbuild r-bit64
@@ -129,35 +99,22 @@ conda install -c conda-forge r-testthat r-hms r-blob r-pkgbuild r-bit64
Note that using conda-provided R when there is also a system install of R
on MacOS is unlikely to work.

Linux users that have built and installed a custom build of Arrow C++ may
have to `export LD_LIBRARY_PATH=${CONDA_PREFIX}/lib` before running the
verification script.

### Windows

On Windows, prerequisites can be installed using officially provided
installers:
On Windows, prerequisites can be installed using officially provided installers:
[Visual Studio](https://visualstudio.microsoft.com/vs/),
[CMake](https://cmake.org/download/), and
[Git](https://git-scm.com/downloads) should provide the prerequisites
to verify the C library; R and Rtools can be installed using the
[official R-project installer](https://cloud.r-project.org/bin/windows/).
Arrow C++ can be built from source. The version of bash provided with
Git for Windows can be used to execute the Arrow C++ build commands and
the verification script.
The version of bash provided with Git for Windows can be used to
run the verification script.

For R verification, you can set the `R_HOME` environment variable or
`export PATH="$PATH:/path/to/R"` (where `$R_HOME/bin/R` is the R executable)
to point to a specific R installation.

```bash
# Build Arrow C++ from source
curl -L https://github.com/apache/arrow/archive/refs/tags/apache-arrow-17.0.0.tar.gz | \
tar -zxf -
mkdir arrow-build && cd arrow-build
cmake ../apache-arrow-17.0.0/cpp -DCMAKE_INSTALL_PREFIX=../arrow-minimal
cmake --build .
cmake --install . --prefix=../arrow-minimal --config=Debug
cd ..

# Pass location of Arrow and R to the verification script
export NANOARROW_CMAKE_OPTIONS="-DCMAKE_PREFIX_PATH=$(pwd -W)/arrow-minimal -Dgtest_force_shared_crt=ON -DNANOARROW_ARROW_STATIC=ON"
export R_HOME="/c/Program Files/R/R-4.4.1"
```
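
Because the text above states that `$R_HOME/bin/R` is the R executable, a quick check of that path can catch a mistyped `R_HOME` before running the verification script. This is a minimal sketch under that assumption; `check_r_home` is a hypothetical helper and is not part of the verification script.

```bash
# Hypothetical helper; not part of verify-release-candidate.sh.
# Succeeds if the directory given as $1 contains an executable bin/R.
check_r_home() {
  if [ -x "$1/bin/R" ]; then
    echo "Using R from $1"
  else
    echo "No R executable at $1/bin/R" >&2
    return 1
  fi
}

# Only check when R_HOME is set; otherwise there is nothing to validate here.
if [ -n "${R_HOME:-}" ]; then
  check_r_home "$R_HOME" || true
fi
```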

@@ -167,13 +124,6 @@ On Debian/Ubuntu (e.g., `docker run --rm -it ubuntu:latest`) you can install pre

```bash
apt-get update && apt-get install -y git g++ cmake r-base gnupg curl python3-dev python3-venv

# For Arrow C++
apt-get install -y -V ca-certificates lsb-release wget
wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
apt-get install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
apt-get update
apt-get install -y -V libarrow-dev
```

If you have never installed an R package before, R verification will fail when it
@@ -187,7 +137,7 @@ On recent Fedora (e.g., `docker run --rm -it fedora:latest`), you can install al
using `dnf`:

```bash
dnf install -y git cmake R gnupg curl libarrow-devel python3-devel python3-virtualenv
dnf install -y git cmake R gnupg curl python3-devel python3-virtualenv
```

### Arch Linux
@@ -196,35 +146,23 @@ On Arch Linux (e.g., `docker run --rm -it archlinux:latest`), you can install all
using `pacman`:

```bash
pacman -Sy git gcc make cmake r-base gnupg curl arrow python
pacman -Sy git gcc make cmake r-base gnupg curl python
```

### Alpine Linux

On Alpine Linux (e.g., `docker run --rm -it alpine:latest`), most prerequisites are available using `apk add` except for Arrow C++ which requires enabling the
community repository.
On Alpine Linux (e.g., `docker run --rm -it alpine:latest`), all prerequisites are
available using `apk add`:

```bash
# Enable community repository for Arrow C++. Alternatively, you can build Arrow C++
# from source and pass its location via NANOARROW_CMAKE_OPTIONS="-DArrow_DIR=...".
cat > /etc/apk/repositories << EOF; $(echo)

https://dl-cdn.alpinelinux.org/alpine/v$(cut -d'.' -f1,2 /etc/alpine-release)/main/
https://dl-cdn.alpinelinux.org/alpine/v$(cut -d'.' -f1,2 /etc/alpine-release)/community/
https://dl-cdn.alpinelinux.org/alpine/edge/testing/

EOF
apk update

apk add bash linux-headers git cmake R R-dev g++ gnupg curl apache-arrow-dev \
python3-dev
apk add bash linux-headers git cmake R R-dev g++ gnupg curl python3-dev
```

### Big endian

One can verify a nanoarrow release candidate on big endian by setting
`DOCKER_DEFAULT_PLATFORM=linux/s390x` and following the instructions for
[Alpine Linux](#alpine-linux) or [Fedora](#fedora).
[Alpine Linux](#alpine-linux), [Fedora](#fedora), or [Debian/Ubuntu](#debianubuntu).

## Creating a release candidate

3 changes: 2 additions & 1 deletion dev/release/verify-release-candidate.sh
@@ -20,8 +20,9 @@
# Requirements
# - cmake >= 3.14
# - R >= 3.5.0
# - Arrow C++ >= 9.0.0
# - Python >= 3.8
# - gpg (for key verification)
# - shasum or sha512sum
#
# Environment Variables
# - CMAKE_BIN: Command to use for cmake (e.g., cmake3 on Centos7)