Configure workflow-run-job-linux to use sccache-dist build cluster #2672

Draft: wants to merge 109 commits into base: main

Commits (109), changes from all commits
c3263ac
Configure workflow-run-job-linux to use sccache-dist build cluster [s…
trxcllnt Oct 30, 2024
3d8e058
try parallelism=(nproc * 2) [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Nov 1, 2024
3c706f5
try on cpu4 runners with 4x parallelism [skip-vdc] [skip-docs] [skip-…
trxcllnt Nov 1, 2024
476fb43
turn off SCCACHE_NO_CACHE [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Nov 1, 2024
9868491
initializeCommand should not pass two arguments to `bash -c` [skip-vd…
trxcllnt Nov 1, 2024
d389ba4
build uncached on cpu8 with -j16 [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Nov 1, 2024
4425803
test fewer jobs [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Nov 1, 2024
3ef25fe
test all jobs, use cpu4 runners [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Nov 12, 2024
569d179
use cpu16 for tests, only use build cluster for build jobs [skip-vdc]…
trxcllnt Nov 13, 2024
8d68433
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Nov 23, 2024
95c726e
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Nov 23, 2024
3aeaefe
update cuda12.6ext-gcc13 devcontainer [skip-vdc] [skip-docs] [skip-ra…
trxcllnt Nov 23, 2024
8c01095
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Nov 24, 2024
1983388
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Nov 27, 2024
3d36247
use -j64 [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Nov 27, 2024
e884b4e
test with cpu16 runners [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Nov 27, 2024
31c3817
include hidden files (.ninja_log) in job artifact [skip-vdc] [skip-do…
trxcllnt Nov 27, 2024
47135b9
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Nov 30, 2024
5a976d2
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Dec 2, 2024
7085ad2
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Dec 2, 2024
6e1c181
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Dec 3, 2024
97a5c91
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Dec 3, 2024
c3aaba3
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Dec 28, 2024
bcfa0b5
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Dec 28, 2024
4eb6e78
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Dec 28, 2024
f684fe3
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Jan 13, 2025
32ef519
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Jan 14, 2025
501b259
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Jan 14, 2025
6529947
add script to print dist status table [skip-vdc] [skip-docs] [skip-ra…
trxcllnt Jan 14, 2025
8a1e909
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Jan 15, 2025
2ca46f7
include timestamp in dist stats
trxcllnt Jan 15, 2025
c901c62
include quotes in csv output [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Jan 15, 2025
4ca3499
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Jan 16, 2025
74bb161
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Jan 16, 2025
288e3b5
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Jan 16, 2025
82d5a45
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Jan 17, 2025
bc89a11
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Jan 17, 2025
bdd9395
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Jan 24, 2025
dd3d91f
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Jan 24, 2025
f0cf283
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Jan 27, 2025
eb245ca
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Jan 28, 2025
574d6fe
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Jan 28, 2025
d9a6dc8
use 4-core runners [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Jan 29, 2025
f2537f9
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Jan 31, 2025
5ddbbc7
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Jan 31, 2025
f845766
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Feb 3, 2025
3af5ded
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Feb 3, 2025
cedfeaf
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Feb 4, 2025
88c964a
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Feb 4, 2025
7dabc2f
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Feb 6, 2025
aeb23df
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Feb 6, 2025
eae4a82
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Feb 6, 2025
e22e8a3
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Feb 11, 2025
fb9ffc9
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Feb 11, 2025
3689fbd
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Feb 12, 2025
68620b6
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Feb 12, 2025
bc29661
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Feb 13, 2025
1c73a61
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Feb 13, 2025
f40b539
use new sccache binary for test jobs too [skip-vdc] [skip-docs] [skip…
trxcllnt Feb 13, 2025
c86c0c1
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Feb 14, 2025
aa46308
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Feb 14, 2025
4ce2974
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Feb 14, 2025
0adf4fd
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Feb 17, 2025
6445191
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Feb 17, 2025
7e21e14
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Feb 18, 2025
8e8d0d5
bump sccache version to use build cluster for nvhpc [skip-vdc] [skip-…
trxcllnt Feb 18, 2025
f7a5618
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Feb 18, 2025
d1d908e
set SCCACHE_RECACHE via --env
trxcllnt Feb 18, 2025
3766e79
pass PARALLEL_LEVEL to lit when pre-compiling
trxcllnt Feb 18, 2025
17925a5
use build cluster for test jobs too
trxcllnt Feb 18, 2025
9fd2488
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Feb 18, 2025
14b0771
fix lint [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Feb 19, 2025
12faf50
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Feb 20, 2025
afe318f
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Feb 20, 2025
003487b
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Feb 20, 2025
8459a8c
comment out SCCACHE_RECACHE [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Feb 20, 2025
a4a19fe
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Mar 5, 2025
df9cbe1
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Mar 5, 2025
bf437a9
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Mar 11, 2025
00f0aeb
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Mar 11, 2025
d3e8ab9
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Mar 13, 2025
c25ebd3
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Mar 13, 2025
8d987e2
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Mar 16, 2025
be2706a
bump sccache version and fail on local compile fallback [skip-vdc] [s…
trxcllnt Mar 16, 2025
d7510b6
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Mar 16, 2025
b0d83dd
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Mar 16, 2025
c427a75
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Mar 16, 2025
ed74b12
bump sccache version, define SCCACHE_NO_DIST_COMPILE=1 during configu…
trxcllnt Mar 16, 2025
7275f71
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Mar 16, 2025
5d8e506
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Mar 17, 2025
0a6e787
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Mar 17, 2025
6cf31f3
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Mar 18, 2025
f319656
lower timeout to 30m [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Mar 18, 2025
76a9fea
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Mar 22, 2025
b9a8a13
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Mar 22, 2025
98f3614
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Mar 24, 2025
327ba8e
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Mar 24, 2025
2452d75
fix -Wlogical-op-parentheses [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Mar 24, 2025
c41230d
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Mar 27, 2025
d0a30ea
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Mar 27, 2025
9d0562d
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Mar 27, 2025
df64cae
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Mar 27, 2025
9939165
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Mar 27, 2025
0b300fd
Merge branch 'main' of github.com:NVIDIA/cccl into fea/use-sccache-bu…
trxcllnt Apr 3, 2025
2f44678
support docker run --ulimit argument in launch.sh [skip-vdc] [skip-do…
trxcllnt Apr 3, 2025
7ace9c2
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Apr 3, 2025
c87cdf5
bump sccache version [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Apr 3, 2025
a9c4ddc
set PARALLEL_LEVEL to nproc * 64 [skip-vdc] [skip-docs] [skip-rapids]
trxcllnt Apr 3, 2025
800b147
restrict PARALLEL_LEVEL for libcu++ builds only [skip-vdc] [skip-docs…
trxcllnt Apr 3, 2025
18 changes: 16 additions & 2 deletions .devcontainer/launch.sh
@@ -45,9 +45,13 @@ parse_options() {
local UNPARSED="${!#}";
# Splice the unparsed arguments variable name from the arguments list
set -- "${@:1:$#-1}";
# Read the name of the variable in which to return docker run arguments
local RUN_ARGS="${!#}";
# Splice the docker run arguments variable name from the arguments list
set -- "${@:1:$#-1}";

local OPTIONS=c:e:H:dhv:
local LONG_OPTIONS=cuda:,cuda-ext,env:,host:,gpus:,volume:,docker,help
local LONG_OPTIONS=cuda:,cuda-ext,env:,host:,gpus:,volume:,ulimit:,docker,help
# shellcheck disable=SC2155
local PARSED_OPTIONS="$(getopt -n "$0" -o "${OPTIONS}" --long "${LONG_OPTIONS}" -- "$@")"

@@ -58,6 +62,8 @@

eval set -- "${PARSED_OPTIONS}"

local -a DOCKER_RUN_ARGS=();

while true; do
case "$1" in
-c|--cuda)
@@ -92,6 +98,10 @@
volumes+=("$1" "$2")
shift 2
;;
--ulimit)
DOCKER_RUN_ARGS+=("$1" "$2")
shift 2
;;
--)
shift
_upvar "${UNPARSED}" "${@}"
@@ -104,6 +114,8 @@
;;
esac
done

_upvar "${RUN_ARGS}" "${DOCKER_RUN_ARGS[@]}"
}

# shellcheck disable=SC2155
@@ -243,6 +255,7 @@ launch_docker() {
fi

exec docker run \
"${run_args[@]}" \
"${RUN_ARGS[@]}" \
"${ENV_VARS[@]}" \
"${MOUNTS[@]}" \
@@ -285,8 +298,9 @@ launch_vscode() {
}

main() {
local -a run_args;
local -a unparsed;
parse_options "$@" unparsed;
parse_options "$@" run_args unparsed;
set -- "${unparsed[@]}";

# If no CTK/Host compiler are provided, just use the default environment
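The `launch.sh` change above follows a common bash pattern: recognized `--ulimit VALUE` pairs are collected verbatim into an array that the caller later splices into `docker run`. A minimal standalone sketch of that pattern (the helper name `parse_ulimits` is hypothetical; the real script threads the array back through its `_upvar` helper rather than a nameref):

```shell
#!/usr/bin/env bash
# Sketch of the --ulimit passthrough pattern (assumes bash >= 4.3 for
# namerefs). parse_ulimits scans the argument list and collects each
# `--ulimit VALUE` pair, untouched, into the array named by its first
# argument, for later splicing into a `docker run` invocation.
set -euo pipefail

parse_ulimits() {
    local -n out_ref="$1"; shift
    out_ref=()
    while (($#)); do
        case "$1" in
            --ulimit) out_ref+=("$1" "$2"); shift 2 ;;
            *) shift ;;
        esac
    done
}

declare -a docker_run_args
parse_ulimits docker_run_args --cuda 12.6 --ulimit nofile=100000:100000 --docker
# docker_run_args now holds exactly: --ulimit nofile=100000:100000
printf '%s\n' "${docker_run_args[@]}"
```

Collecting flag/value pairs as an array (rather than a string) is what lets the later `"${run_args[@]}"` expansion preserve values containing spaces.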
9 changes: 9 additions & 0 deletions .github/actions/workflow-build/build-workflow.py
@@ -424,6 +424,15 @@ def generate_dispatch_job_runner(matrix_job, job_type):

job_info = get_job_type_info(job_type)
if not job_info["gpu"]:
# Use smaller 4-core runners for build jobs if we can
if job_type == "build":
# ClangCUDA, MSVC, and NVHPC should use 16-core runners
if (
("clang" not in matrix_job["cudacxx"])
and ("msvc" not in matrix_job["cxx"])
and ("nvhpc" not in matrix_job["cxx"])
):
return f"{runner_os}-{cpu}-cpu4"
return f"{runner_os}-{cpu}-cpu16"

gpu = get_gpu(matrix_job["gpu"])
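The runner-selection rule added above reads: non-GPU build jobs drop to cheaper 4-core runners unless the toolchain is ClangCUDA, MSVC, or NVHPC, which stay on 16-core runners (as do all other CPU jobs). A shell paraphrase for illustration (function name and runner labels are ours, not part of the workflow):

```shell
# Shell paraphrase of the runner-selection rule from build-workflow.py.
# pick_runner is a hypothetical helper: build jobs get "-cpu4" runners
# unless the CUDA compiler is Clang or the host compiler is MSVC/NVHPC.
pick_runner() {
    local runner_os="$1" cpu="$2" job_type="$3" cudacxx="$4" cxx="$5"
    if [ "$job_type" = build ] &&
       [[ "$cudacxx" != *clang* && "$cxx" != *msvc* && "$cxx" != *nvhpc* ]]; then
        echo "${runner_os}-${cpu}-cpu4"
    else
        echo "${runner_os}-${cpu}-cpu16"
    fi
}

pick_runner linux amd64 build nvcc gcc13       # nvcc + gcc build
pick_runner linux amd64 build clang18 clang18  # ClangCUDA build
pick_runner linux amd64 test  nvcc gcc13       # test job
```

The asymmetry matches the PR's goal: once compilation is offloaded to the sccache-dist cluster, the build job's own CPU count matters much less, except for toolchains the cluster cannot serve.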
190 changes: 150 additions & 40 deletions .github/actions/workflow-run-job-linux/action.yml
@@ -38,6 +38,10 @@ inputs:
host:
description: "The host compiler to use when selecting a devcontainer."
required: true
# This token must have the "read:enterprise" scope
dist-token:
description: "The token used to authenticate with the sccache-dist build cluster."
required: false

runs:
using: "composite"
@@ -72,12 +76,15 @@
# Dereferencing the command from an env var instead of a GHA input avoids issues with escaping
# semicolons and other special characters (e.g. `-arch "60;70;80"`).
COMMAND: "${{inputs.command}}"
DIST_TOKEN: "${{inputs.dist-token}}"
AWS_ACCESS_KEY_ID: "${{env.AWS_ACCESS_KEY_ID}}"
AWS_SESSION_TOKEN: "${{env.AWS_SESSION_TOKEN}}"
AWS_SECRET_ACCESS_KEY: "${{env.AWS_SECRET_ACCESS_KEY}}"
run: |
echo "[host] github.workspace: ${{github.workspace}}"
echo "[host] runner.temp: ${{runner.temp}}"
echo "[container] GITHUB_WORKSPACE: ${GITHUB_WORKSPACE:-}"
echo "[container] RUNNER_TEMP: ${RUNNER_TEMP:-}"
echo "[container] PWD: $(pwd)"

# Necessary because we're doing docker-outside-of-docker:
@@ -87,12 +94,14 @@
ln -s "$(pwd)" "${{github.workspace}}"
cd "${{github.workspace}}"

mkdir artifacts
echo "[container] new PWD: $(pwd)"

cat <<'EOF' > ci.sh
cat <<"EOF" > "$RUNNER_TEMP/ci.sh"
#! /usr/bin/env bash
set -euo pipefail
echo -e "\e[1;34mRunning as '$(whoami)' user in $(pwd):\e[0m"
# Print current dist status to verify we're connected
echo -e "\e[1;34mBuild cluster:\n$(./ci/sccache_dist_status.sh | sed 's/\"//g' | column -t -s,)\e[0m"
echo -e "\e[1;34m${COMMAND}\e[0m"
eval "${COMMAND}"
exit_code=$?
@@ -110,34 +119,20 @@
echo " - Continuous Integration (CI) Overview: https://github.com/NVIDIA/cccl/blob/main/ci-overview.md"
exit $exit_code
fi

# Copy any artifacts we want to preserve out of the container:
results_dir=/artifacts

# Finds a matching file in the repo directory and copies it to the results directory.
find_and_copy() {
filename="$1"
filepath="$(find . -name "$filename" -print -quit)"
if [[ -z "$filepath" ]]; then
echo "$filename does not exist in repo directory."
return 1
fi
cp -v "$filepath" "$results_dir"
}

find_and_copy "sccache_stats.json" || :
EOF

chmod +x ci.sh
chmod +x "$RUNNER_TEMP/ci.sh"

mkdir "$RUNNER_TEMP/.aws";
mkdir -p "$RUNNER_TEMP/.aws"

cat <<EOF > "$RUNNER_TEMP/.aws/config"
[default]
bucket=rapids-sccache-devs
region=us-east-2
EOF

chmod 0664 "$RUNNER_TEMP/.aws/config"

cat <<EOF > "$RUNNER_TEMP/.aws/credentials"
[default]
aws_access_key_id=$AWS_ACCESS_KEY_ID
@@ -146,32 +141,117 @@
EOF

chmod 0600 "$RUNNER_TEMP/.aws/credentials"
chmod 0664 "$RUNNER_TEMP/.aws/config"

declare -a gpu_request=()
mkdir -p "$RUNNER_TEMP/.config/sccache"

# Configure the sccache client
cat <<EOF > "$RUNNER_TEMP/.config/sccache/config"
server_startup_timeout_ms = $((5 * 60 * 1000))
[cache.disk]
size = 0
[cache.disk.preprocessor_cache_mode]
use_preprocessor_cache_mode = false
EOF

chmod 0664 "$RUNNER_TEMP/.config/sccache/config"

# Download new sccache binary
mkdir -p "$RUNNER_TEMP/bin"
curl -fsSL \
"https://github.com/trxcllnt/sccache/releases/download/v0.10.0-rapids.15/sccache-v0.10.0-rapids.15-$(uname -m)-unknown-linux-musl.tar.gz" \
| tar -C "$RUNNER_TEMP/bin" -zf - --wildcards --strip-components=1 -x '*/sccache'

declare -a extra_launch_args=(
# Write debug logs to a file we can upload
--env "SCCACHE_SERVER_LOG=sccache=debug"
--env "SCCACHE_ERROR_LOG=/home/coder/cccl/sccache.log"
# Cache in a separate S3 bucket prefix
--env "SCCACHE_S3_KEY_PREFIX=cccl-test-sccache-dist"
# Mount in new sccache binary
--volume "${{runner.temp}}/bin/sccache:/usr/bin/sccache:ro"
)

OS="$(uname -s)"
CPUS="$(nproc --all)"
ARCH="$(dpkg --print-architecture)"

# Use the build cluster
if test -n "${DIST_TOKEN+x}"; then

# Configure sccache client to talk to the build cluster
cat <<EOF >> "$RUNNER_TEMP/.config/sccache/config"
[dist]
# Infinitely retry all retryable dist-compilation errors
max_retries = inf
# Never fallback to building locally, fail instead
fallback_to_local_compile = false

scheduler_url = "https://${ARCH}.${OS,,}.sccache.gha-runners.nvidia.com"

# Build cluster auth
[dist.auth]
type = "token"
token = "$DIST_TOKEN"

# Build cluster network config
[dist.net]
connect_timeout = 30
request_timeout = 1800
EOF

if grep -q '"./ci/build_' <<< "$COMMAND"; then
extra_launch_args+=(
# Repopulate the cache
--env "SCCACHE_RECACHE=1"
# Do not cache build products
# --env "SCCACHE_NO_CACHE=1"
)
fi

# Over-subscribe -j to keep the build cluster busy if _not_ ClangCUDA.
# ClangCUDA can use the build cluster for C++ files, but _not_ CUDA,
# and we'll OOM if we try to compile too many at once.
if ! grep -q '\-cuda "clang' <<< "$COMMAND"; then
if ! grep -q '_libcudacxx.sh' <<< "$COMMAND"; then
extra_launch_args+=(
--env "PARALLEL_LEVEL=100000"
)
else
extra_launch_args+=(
--env "PARALLEL_LEVEL=$((CPUS * 64))"
)
fi
extra_launch_args+=(
--ulimit nofile=100000:100000
)
fi

if ! grep -q '11.1' <<< "${{inputs.cuda}}"; then
# Compile device objects in parallel
extra_launch_args+=(
--env "NVCC_APPEND_FLAGS=-t=100"
)
fi
fi

# Explicitly pass which GPU to use if on a GPU runner
if [[ "${RUNNER}" = *"-gpu-"* ]]; then
gpu_request+=(--gpus "device=${NVIDIA_VISIBLE_DEVICES}")
extra_launch_args+=(--gpus "device=${NVIDIA_VISIBLE_DEVICES}")
fi

host_path() {
sed "s@/__w@$(dirname "$(dirname "${{github.workspace}}")")@" <<< "$1"
}

# If the image contains "cudaXX.Yext"...
if [[ "${IMAGE}" =~ cuda[0-9.]+ext ]]; then
cuda_ext_request="--cuda-ext"
extra_launch_args+=(--cuda-ext)
fi

# Launch this container using the host's docker daemon
set -x

${{github.event.repository.name}}/.devcontainer/launch.sh \
--docker \
--cuda ${{inputs.cuda}} \
--host ${{inputs.host}} \
${cuda_ext_request:-} \
"${gpu_request[@]}" \
"${extra_launch_args[@]}" \
--env "CI=$CI" \
--env "AWS_ROLE_ARN=" \
--env "COMMAND=$COMMAND" \
@@ -185,33 +265,63 @@
--env "GITHUB_WORKSPACE=$GITHUB_WORKSPACE" \
--env "GITHUB_REPOSITORY=$GITHUB_REPOSITORY" \
--env "GITHUB_STEP_SUMMARY=$GITHUB_STEP_SUMMARY" \
--volume "${{github.workspace}}/ci.sh:/ci.sh" \
--volume "${{github.workspace}}/artifacts:/artifacts" \
--volume "$(host_path "$RUNNER_TEMP")/.aws:/root/.aws" \
--volume "${{runner.temp}}/ci.sh:/ci.sh:ro" \
--volume "${{runner.temp}}/.aws:/root/.aws" \
--volume "${{runner.temp}}/.config:/root/.config:ro" \
--volume "$(dirname "$(dirname "${{github.workspace}}")"):/__w" \
-- /ci.sh

- name: Prepare job artifacts
- if: ${{ always() }}
name: Create job artifact dir
shell: bash --noprofile --norc -euo pipefail {0}
run: |
echo "Prepare job artifacts"
result_dir="jobs/${{inputs.id}}"
mkdir -p "$result_dir"
echo "result_dir=$result_dir" >> "$GITHUB_ENV"

- if: ${{ success() }}
name: Record job success
shell: bash --noprofile --norc -euo pipefail {0}
run: |
touch "$result_dir/success"

artifacts_exist="$(ls -A artifacts)"
if [ "$artifacts_exist" ]; then
cp -rv artifacts/* "$result_dir"
fi
- if: ${{ always() }}
name: Prepare job artifacts
shell: bash --noprofile --norc -euo pipefail {0}
run: |
echo "Prepare job artifacts"

# chmod all temp contents 777 so the runner can delete them
find "$RUNNER_TEMP/" -exec chmod 0777 {} \;

# Finds a matching file in the repo directory and copies it to the results directory.
find_and_copy() {
pat="$1"
dir="${{github.event.repository.name}}"
filepath="$(find "$dir/" -type f -path "$dir/$pat" -print -quit)"
if [[ -z "$filepath" ]]; then
echo "File with pattern '$dir/$pat' does not exist in repo directory."
return 1
fi
cp -v "$filepath" "$result_dir"
}

# Copy any artifacts we want to preserve out of the container
find_and_copy "sccache.log" || :
find_and_copy "build/*/.ninja_log" || :
find_and_copy "build/*/build.ninja" || :
find_and_copy "build/*/rules.ninja" || :
find_and_copy "build/*/sccache_stats.json" || :

echo "::group::Job artifacts"
tree "$result_dir"
echo "::endgroup::"

- name: Upload job artifacts
- if: ${{ always() }}
name: Upload job artifacts
uses: actions/upload-artifact@v4
with:
name: jobs-${{inputs.id}}
path: jobs
compression-level: 0
include-hidden-files: true
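The reworked `find_and_copy` helper in the "Prepare job artifacts" step copies the first file matching each glob out of the repo checkout and tolerates misses. A self-contained sketch of the same pattern, using temp directories as stand-ins for the checkout and the `jobs/${{inputs.id}}` results dir:

```shell
#!/usr/bin/env bash
# Standalone sketch of the find_and_copy artifact-collection pattern.
# Directory names here are illustrative stand-ins for the real checkout
# and results directories used by the workflow step above.
set -euo pipefail

dir="$(mktemp -d)"          # stand-in for the repo checkout
result_dir="$(mktemp -d)"   # stand-in for jobs/${{inputs.id}}
mkdir -p "$dir/build/cub"
echo 'log' > "$dir/build/cub/.ninja_log"

find_and_copy() {
    local pat="$1"
    local filepath
    # -print -quit stops at the first match; no match leaves filepath empty.
    filepath="$(find "$dir/" -type f -path "$dir/$pat" -print -quit)"
    if [[ -z "$filepath" ]]; then
        echo "File with pattern '$dir/$pat' does not exist in repo directory."
        return 1
    fi
    cp -v "$filepath" "$result_dir"
}

find_and_copy "build/*/.ninja_log" || :   # present: copied
find_and_copy "sccache.log"        || :   # absent: notice printed, job continues
ls -A "$result_dir"
```

The `|| :` guards matter under `set -e`: a missing artifact (e.g. no `sccache.log` because the build never fell back to local compiles) must not fail the whole step.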