diff --git a/.dockerhub/source.Dockerfile b/.dockerhub/source.Dockerfile
index 149bc7c949..c17e65a8de 100644
--- a/.dockerhub/source.Dockerfile
+++ b/.dockerhub/source.Dockerfile
@@ -223,16 +223,16 @@ RUN if [ -n "$RUN_TESTS" ]; then \
        || ([ -z "$FAIL_ON_TEST_FAILURE" ] || exit 1) \
     fi

-# multi-stage for testing workspaces
-FROM overlay_builder AS workspaces_tester
+# multi-stage for testing workspace
+FROM overlay_builder AS workspace_tester

 # copy workspace test results
 COPY --from=ros2_tester $ROS2_WS/log $ROS2_WS/log
 COPY --from=underlay_tester $UNDERLAY_WS/log $UNDERLAY_WS/log
 COPY --from=overlay_tester $OVERLAY_WS/log $OVERLAY_WS/log

-# multi-stage for shipping overlay
-FROM overlay_builder AS overlay_shipper
+# multi-stage for shipping workspace
+FROM overlay_builder AS workspace_shipper

 # restore apt for docker
 RUN mv /etc/apt/docker-clean /etc/apt/apt.conf.d/ && \
diff --git a/doc/continuous_integration/README.md b/doc/continuous_integration/README.md
new file mode 100644
index 0000000000..92e5fe81ee
--- /dev/null
+++ b/doc/continuous_integration/README.md
@@ -0,0 +1,37 @@
# Continuous Integration Documentation

Documentation on the existing CI for the project resides here.

## Overview

The existing CI is composed of multiple integration services that together provide maintainers a fast and scalable testing environment. To help detect upstream breakages quickly as well, the existing CI allows changes to be evaluated using the latest development dependencies (e.g. using ROS2 master branches). In light of the large dependency footprint a high-level ROS2 navigation stack necessitates, the use of each integration service is optimized to maximize caching of environmental setup and increase workflow throughput. As these optimizations add complexity to the CI configuration, this documentation provides further explanations and reasoning behind each configuration.

![pipeline](figs/pipeline.svg)

The figure above is a high level diagram of how the integration services described below are composed.

## Integrations

The following links document each integration and are best approached in the same order presented.

### GitHub

GitHub is used for hosting the source repo, tickets, and PRs, as well as for managing the OAuth and configs for the rest of the other integration services in the CI pipeline.

### [Dockerfile](dockerfile.md)

Dockerfiles are used for generating the docker images for building and testing the project. They also self document the setup and configuration of upstream dependencies, ensuring contributors have a reproducible and repeatable development environment for collaboration.

### [DockerHub](dockerhub.md)

DockerHub is used to build and host the registry of tagged docker images, so that downstream services in the CI pipeline can quickly download and bootstrap the latest up-to-date development environment.

### [CircleCI](circleci.md)

CircleCI is used to checkout, build, and test the project via docker containers spawned from the tagged docker images. Triggered by schedules or GitHub events like commits pushed to branches or pull requests, it deterministically retains a warm build cache while generating logs and test result artifacts.

### [CodeCov](codecov.md)

CodeCov is used to help monitor code quality by rendering test artifacts from the upstream pipeline into interactive analytics, improving the visibility of the project's health and feedback for contributions.
### [Future Work](future_work.md)

The CI has room for improvement and may still evolve over time, as alternate integration options become more viable, and the pros and cons of each shift.
diff --git a/doc/continuous_integration/circleci.md b/doc/continuous_integration/circleci.md
new file mode 100644
index 0000000000..fcf4ccbed5
--- /dev/null
+++ b/doc/continuous_integration/circleci.md
@@ -0,0 +1,104 @@
# CircleCI Documentation

CircleCI is a service used to build and test the project's codebase to catch regressions as well as to check pull request submissions. Using continuous integration helps maintainers sustain high code quality while providing external contributors a well defined evaluation method with which to validate contributions, even without having to set up a local development environment. More info on CircleCI can be found here:

* [CircleCI](https://circleci.com/)
* [Navigation2 on CircleCI](https://circleci.com/gh/ros-planning/navigation2)

For this particular CI, Docker is used to build and test the project within containers derived from images pulled from DockerHub, bootstrapping the CI with a development environment including pre-configured dependencies and warm workspace build caches. View the accompanying Dockerfile and DockerHub documentation for more info on this accompanying CI setup.

* [Dockerfile](dockerfile.md)
* [DockerHub](dockerhub.md)

CircleCI is configured via the [config.yml](/.circleci/config.yml) yaml file within the `.circleci/` folder at the root of the GitHub repo. The config file for this project is self-contained and thus densely structured, yet written in a functional style to remain DRY and modular. This keeps it easily generalizable for other ROS packages or for adjusting overlayed workspaces. Despite the anchors and references in yaml, the config file is best understood read in reverse, from bottom to top, in order of abstraction hierarchy, while reading this accompanying document. Further references on CircleCI configurations, such as syntax and structure, can be found here:

* [Writing YAML](https://circleci.com/docs/2.0/writing-yaml)
* [Configuring CircleCI](https://circleci.com/docs/2.0/configuration-reference)

## Workflows

The CI config consists of three main [workflows](https://circleci.com/docs/2.0/configuration-reference/#workflows). One workflow is for checking PR events, essentially triggered by any pushed commits targeting the main branch, building and testing the project both in release and debug mode for accurate performance benchmarking or generating test coverage results. The second is a nightly cron scheduled to check the main branch, while additionally testing a matrix of supported RMW vendors not tested on normal PRs. This helps prioritize CI to quickly check new contributions, while simultaneously keeping tabs on the health of existing code. The third is another cron for updating CI image builds on DockerHub, and is scheduled to finish prior to the nightly workflow. This reduces the chance of updating CI images while a CI workflow is in progress.

The order in which jobs are executed is conditional upon the completion of those each [requires](https://circleci.com/docs/2.0/configuration-reference/#requires), forming a conventional directed acyclic graph of dependent jobs. Independent jobs may of course be parallelized in the CI pipeline; so by splitting the build and test jobs in two, multiple test jobs, such as the matrix of RMW tests, may commence as soon as the dependent build job completes, avoiding unnecessarily re-building the same codebase for each RMW vendor.
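A minimal sketch of how such a workflow DAG might be expressed in `config.yml`; the job names, cron time, and RMW list below are illustrative assumptions rather than verbatim excerpts from the project's config:

``` yaml
workflows:
  build_and_test: # checks pushed commits and pull requests
    jobs:
      - release_build
      - release_test:
          requires: [release_build] # tests start once the build completes
      - debug_build
      - debug_test:
          requires: [debug_build]
  nightly: # cron scheduled check of the main branch
    triggers:
      - schedule:
          cron: "0 9 * * *" # scheduled after the DockerHub image rebuild cron
          filters:
            branches:
              only: [main]
    jobs:
      - release_build
      - release_test: # independent RMW test jobs fan out in parallel
          requires: [release_build]
          matrix:
            parameters:
              rmw: [rmw_fastrtps_cpp, rmw_cyclonedds_cpp]
```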
## Jobs

The list of [jobs](https://circleci.com/docs/2.0/configuration-reference/#jobs) referenced within the workflows consists of either release or debug build/test jobs. These jobs are largely similar, with the exception of using different executors, and that debug test jobs include additional command steps for reporting code coverage analytics. The build jobs simply checkout the project, set up the build environment, and compile the project. The test jobs in turn pick up from where the build job left off by restoring the build, and then running the tests on that build.

In addition to enabling parallelism between independent jobs, the bifurcation of build and test job types enables [parallelism](https://circleci.com/docs/2.0/configuration-reference/#parallelism) within each test job, leveraged later for splitting longer tests across multiple containers for that job. Given package interdependencies, container parallelism for build jobs is not as easily applicable, and is thus only used for testing.

## Executors

Two different [executors](https://circleci.com/docs/2.0/configuration-reference/#executors-requires-version-21) are used to define the environment in which the steps of a job will be run, one for debug and one for release builds. Given only the release executor is used in testing the full matrix of RMW vendors, the docker images for the debug executor merely include the default RMW; sparing debug jobs the time of compiling unused RMW vendors in debug testing.

The executors also differ by a few environment variables. These include appropriate mixin flags for the workspace builds and an additional nonce string, unique to that executor, used in deriving the hash key for saving and restoring caches. This prevents cache key collisions or cross talk between parallel jobs using different executors in the same workflow.

## Commands

To reuse sequences of steps across multiple jobs, a number of [commands](https://circleci.com/docs/2.0/configuration-reference/#commands-requires-version-21) are defined.

When checking out source code from the repo, additional pre and post checkout steps are performed to prepare the file system. When setting up dependencies, the build is prepared much the same way as in the project Dockerfile, by first setting up the underlay and then the overlay workspace. Additional steps are included for restoring and resetting previous ccache files and statistics, including a few diagnostic steps that can be used for debugging the CI itself, e.g. investigating why the hit rate for ccache may be abnormally low. Once the overlay setup is finished, more ccache diagnostics and files are saved for inspection or later restoration. A restore build command is also defined for test jobs, operating under the condition that a workspace with the same checksum can be restored from a previous build job in the workflow. Additional commands are also defined for steps in testing the overlay workspace and reporting coverage results.

# References

Per the YAML syntax, an anchor must precede any reference that links back to it. Stylistically, any root level keys declared that only include references are prefixed with an underscore (`_`) to help distinguish them from keys expected by the CircleCI config schema.
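For illustration, a hypothetical anchor following this convention might look like the following; the key, image tag, and variable names are made up for this sketch:

``` yaml
# root level key ignored by the CircleCI schema, existing only to hold the anchor
_common_environment: &common_environment
  UNDERLAY_WS: /opt/underlay_ws
  OVERLAY_WS: /opt/overlay_ws

executors:
  release_exec:
    docker:
      - image: rosplanning/navigation2:main # hypothetical image tag
    environment:
      # merge the mapping anchored above via a reference
      <<: *common_environment
      CACHE_NONCE: "Release"
```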
## Common Environment

For environment variables common among the executors, workspace paths and specific config options are listed here. While the workspace paths are parameterized to avoid hardcoding paths in common command functions, they may also be hardcoded as fields among a few high level steps given a limitation of the config syntax. The additional config options help optimize our project build given CI resource limits:

* Capping ccache size given CI storage limits, speeding up cache restoration
* Setting ccache location to use RAM disk to improve file IO performance
* Limiting parallel make and linker jobs so as to avoid exhausting the container's RAM
* Further adjustments for changing test behavior and sequential test stdout.

## Steps

Low level steps, defined prior to the job commands where they are used, are composed from the more functional common commands.

### Checkout

Checking out code consists of three stages, including pre and post checkout steps. To simplify the formulaic common commands above, the pre-checkout step replicates a synthetic workspace from the installed ROS distro directory by symbolically linking an install folder, and bootstrapping the checksum from the timestamp of an expected file in ROS docker images. This is a measure to ensure that if the nightly docker image is changed or rebuilt, then all CI caches are also busted. Ideally, the docker image sha/hash should be used for this instead, but as of writing there does not seem to be a reliable method for acquiring this image digest from within the same derived container:

* [Introspect image info from inside docker executor](https://discuss.circleci.com/t/introspect-image-info-from-inside-docker-executor/31620)

The overlay workspace is then cleaned prior to checking out the project. The post checkout step simply checks to see if the underlay has changed to determine whether it should also be cleaned, recloned, and thus rebuilt as well.

## Workspaces

The rest of the steps are references to repeatedly define workspace specific commands, such as installing, building, and testing either the underlay or overlay workspace. Some points of note however include:

* The CI cache for ccache is intentionally linked with the underlay rather than the overlay workspace
  * so that consecutive commits to the same PR are more likely to retain a warm and recent ccache
* CCache Stats is intentionally used to zero stats before building a workspace
  * so the next consecutive run of the CCache Stats reflects only upon that given workspace
* Restore workspace command intentionally sets the `build` parameter to `false`
  * to avoid unnecessary duplication of the same workspace cache and build logs

## Code Coverage

The last few steps invoke a custom script to collect and post process the generated code coverage results from debug test jobs. This always runs regardless of whether the tests fail, so that failed code coverage reports may still be reviewed. The final report is uploaded to CodeCov.

## Common Commands

Common commands for low level, repeated, and formulaic tasks are defined for saving and restoring CI caches, as well as for installing, building, and testing workspaces.

### Caching

Multiple forms of [caching](https://circleci.com/docs/2.0/caching/) are performed over the completion of a single workflow. To appropriately save and restore caches in a deterministic manner, these steps are abstracted as commands. Although CircleCI does provide an explicit step for persisting temporary files across jobs in the same workflow, i.e. a [workspace](https://circleci.com/docs/2.0/configuration-reference/#persist_to_workspace), caches are used instead for a few reasons. First, there is only one workspace per workflow, thus avoiding cross talk between executors or release/debug job types is not as easily achievable. Secondly, unlike workspaces, caches can persist across workflows, permitting repeated workflows for the same PR to pull from the cache of prior runs; e.g. in order to keep the ccache as fresh as possible for that specific PR.

For this project, caching is done with respect to a given workspace. As such, a specified string and checksum from the workspace are combined with workflow specifics to ensure the [restored cache](https://circleci.com/docs/2.0/configuration-reference/#restore_cache) is uniquely identifiable and won't collide with other concurrent PRs.

For saving a cache, the command is similar, aside from specifying the path to the directory or file to be stored in the [saved cache](https://circleci.com/docs/2.0/configuration-reference/#save_cache). Given CI caches are conventionally read only, meaning a cache key can not be reused or updated to point to a newer cache, the current unix epoch is appended to the cache key to ensure key uniqueness. Because CircleCI key lookup behavior for cache restoration is performed via the most recent longest matching prefix, the latest matching cache is always restored.

These workspace checksums are uploaded as [stored artifacts](https://circleci.com/docs/2.0/configuration-reference/#store_artifacts) throughout other commands to help introspect and debug caching behavior when needed.
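A minimal sketch of such save and restore steps; the key fields and paths here are assumptions for illustration, and the exact key template in the project's config may differ:

``` yaml
- save_cache:
    # epoch suffix keeps every saved key unique, since caches are read only
    key: ccache-{{ .Environment.CACHE_NONCE }}-{{ checksum "underlay_ws/checksum" }}-{{ epoch }}
    paths:
      - /dev/shm/.ccache # ccache kept on RAM disk
- restore_cache:
    keys:
      # prefix match without the epoch restores the most recent save
      - ccache-{{ .Environment.CACHE_NONCE }}-{{ checksum "underlay_ws/checksum" }}-
```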
### Building

For installing and building workspaces, the process resembles that within the project Dockerfile. Additional bookkeeping is performed to update the workspace checksum file by piping stdout that deterministically describes the state of the build environment. This is done by seeding from the checksum of the underlay workspace and then appending info about source code checked out into the overlay workspace, as well as the list of required dependencies installed. When setting up the workspace, this checksum will first be used to check if the workspace can be restored from a prior workflow build. If the source code or required dependencies change, resulting in a cache miss, the unfinished workspace is then built. If the workspace build is successful then it will be cached. Regardless however, the build logs are always uploaded as stored artifacts for review or debugging. The odd shuffling of symbolic directories is done as a workaround given a limitation of the S3 SDK:

* [Failing to upload artifacts from symbolic link directories](https://discuss.circleci.com/t/failing-to-upload-artifacts-from-symbolic-link-directories/28000)

### Testing

For testing workspaces, the list of packages within the workspace is [tested in parallel](https://circleci.com/docs/2.0/parallelism-faster-jobs/) across the number of replicated containers for the given test job as denoted by the `parallelism` option. Here packages are split by anticipated test timing, with the heuristic derived from the reported duration and classname of prior recent test results. The logs and results from the tests are then always [reported](https://circleci.com/docs/2.0/configuration-reference/#store_test_results) and uploaded.
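A rough sketch of how such timing based splitting can be done with the CircleCI CLI inside each test container; the package selection here is an assumption for illustration:

``` bash
# allot this container its share of packages, weighted by prior test timings
PACKAGES=$(colcon list --names-only | \
  circleci tests split --split-by=timings --timings-type=classname)
# test only the allotted packages in this container
colcon test --packages-select $PACKAGES
```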
diff --git a/doc/continuous_integration/codecov.md b/doc/continuous_integration/codecov.md
new file mode 100644
index 0000000000..e21aec9258
--- /dev/null
+++ b/doc/continuous_integration/codecov.md
@@ -0,0 +1,17 @@
# Codecov Documentation

Codecov is a service used to aggregate and monitor code coverage results, rendering test statistics generated by CI pipelines into interactive analytics, improving the visibility of the project's health and providing feedback for potential contributions. More info on Codecov can be found here:

* [Codecov](https://codecov.io/)
* [Navigation2 on Codecov](https://codecov.io/gh/ros-planning/navigation2)

Codecov is configured via the [`codecov.yml`](/codecov.yml) file. More info on this can be found here:

* [About the Codecov yaml](https://docs.codecov.io/docs/codecov-yaml)
* [codecov.yml Reference](https://docs.codecov.io/docs/codecovyml-reference)

A custom script within the repo is reused to collect and post process the generated code coverage results from debug test jobs. This script simply invokes lcov on the overlay workspace to output `full_coverage.info`, and then filters this down to `workspace_coverage.info` by removing any irrelevant subdirectories, e.g. for message or test packages.

* [code_coverage_report.bash](/tools/code_coverage_report.bash)

After the coverage info is uploaded, the project `codecov.yml` is used to further ignore any source test directories, as well as fix the project root path from when the repo was cloned into the relative workspace's src directory.
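A hypothetical sketch of what those last two directives could look like in `codecov.yml`; the exact globs and prefix in the project's actual config may differ:

``` yaml
ignore:
  - "**/test/**" # drop source test directories from reports
fixes:
  # strip the workspace clone prefix to recover repo relative paths
  - "src/navigation2/::"
```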
diff --git a/doc/continuous_integration/dockerfile.md b/doc/continuous_integration/dockerfile.md
new file mode 100644
index 0000000000..e960456bdb
--- /dev/null
+++ b/doc/continuous_integration/dockerfile.md
@@ -0,0 +1,139 @@
# Dockerfile Documentation

Dockerfiles, denoted via the `(.)Dockerfile` file name extension, provide:
- Repeatable and reproducible means to build and test the project
- Build images for running container based CI services
- Identical images to scalably deploy onto robot systems

Further references on writing and building Dockerfiles, such as syntax and tooling, can be found here:

* [Dockerfile reference](https://docs.docker.com/engine/reference/builder)
* [Best practices for writing Dockerfiles](https://docs.docker.com/develop/develop-images/dockerfile_best-practices)

The Dockerfiles for this project are built upon parent images from upstream repos on DockerHub (e.g. osrf/ros2:nightly, library/ros:foxy), thus abbreviating environmental setup and build time, yet written in a parameterized style to remain ROS2 distro agnostic. This keeps them easily generalizable for future ROS2 releases or for switching between custom parent images. When choosing the parent image, a tradeoff may persist between choosing a larger tag with more than what you need pre-installed (e.g. desktop images), saving time building the image locally, vs. choosing a smaller tag without anything you don't need (e.g. core images), saving time pulling or pushing the image remotely. Given the use of multiple build stages, they're consequently best approached by reading from top to bottom in the order in which image layers are appended. More info on upstream repos on DockerHub can be found here:

* [ROS Docker Images](https://hub.docker.com/_/ros)
  * DockerHub repo for official images
* [ROS Dockerfiles](https://github.com/osrf/docker_images)
  * GitHub repo for OSRF Dockerfiles
* [Official Images on Docker Hub](https://docs.docker.com/docker-hub/official_images)

While the main [`Dockerfile`](/Dockerfile) at the root of the repo is used for development and continuous integration, the [`.dockerhub/`](/.dockerhub) directory contains additional Dockerfiles that can be used for building the project entirely from scratch, including the minimal spanning set of recursive ROS2 dependencies from source, or building the project from a released ROS2 distro using available pre-built binary dependencies. These are particularly helpful for developers needing to build/test the project using a custom ROS2 branch, or for users building with an alternate ROS2 base image, but are not used for the CI pipeline. We'll walk through the main Dockerfile here, although all of them follow the same basic pattern.

## Global Arguments

The Dockerfile first declares a number of optional `ARG` values and respective defaults to specify the parent image to build `FROM` and workspace paths. Here the Dockerfiles assume all workspaces are nested within the `/opt` directory. These `ARG`s can be accessed similarly to `ENV`s, but must be declared in a stage's scope before they can be used, and unlike `ENV` only exist at build time of that stage and do not persist in the resulting image. Despite this scope behavior, we can keep the Dockerfile DRY by specifying default values of `ARG`s only where they are first declared. For multi-stage builds, the last stage is what is tagged as the final image. More info on multi-stage builds can be found here:

* [Use multi-stage builds](https://docs.docker.com/develop/develop-images/multistage-build)
  * Optimize while keeping Dockerfiles readable and maintainable
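For example, a minimal sketch of this scoping pattern, assuming an illustrative workspace path:

``` Dockerfile
# default values are given once, at the global declarations
ARG FROM_IMAGE=osrf/ros2:nightly
ARG OVERLAY_WS=/opt/overlay_ws

FROM $FROM_IMAGE AS cacher
# re-declaring the bare ARG pulls the global default into this stage's scope
ARG OVERLAY_WS
WORKDIR $OVERLAY_WS/src
```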
Here the parent image to build `FROM` is set to `osrf/ros2:nightly` by default, allowing the master branch to simply build against the bleeding edge of ROS2 core development. This allows project maintainers to spot breaking API changes and regressions as soon as they are merged upstream. Alternatively, any image tag based on a released ROS2 distro image, e.g. `ros:`, could also be substituted to compile the project, say for quickly experimenting with planners on top of complex reinforcement or deep learning framework dependencies.

## Cacher Stage

``` Dockerfile
# multi-stage for caching
FROM $FROM_IMAGE AS cacher
```

A `cacher` stage is then declared to gather together the necessary source files to eventually build. This stage deterministically pre-processes input source files to help preserve the docker image layer build cache for subsequent stages. This is achieved by strategically filtering and splitting input artifacts with different degrees of volatility so they may be independently copied from bit-for-bit.

A directory for the underlay workspace is then created and populated using `vcs` with the respective `.repos` file that defines the relevant repositories to pull and particular versions to checkout into the source directory. More info on vcstool can be found here:

* [vcstool](https://github.com/dirk-thomas/vcstool)
  * CLI tool that makes working with multiple repositories easier

The `.repos` file is not copied directly into the `src` folder to avoid any restructuring of the yaml data from unintentionally busting the docker build cache in later stages. The ephemeral files within `.git` repo folders are similarly removed to help bolster deterministic builds.

The overlay workspace is then also created and populated using all the files in the docker build context (e.g. the root directory of the branch being built). This is done after the underlay is cloned to avoid re-downloading underlay dependency source files if the `.repos` file is unchanged in the branch. However, if the `.repos` file is changed, and different source files are cloned, this will bust the docker build cache and trigger a clean rebuild. Other project files are safely ignored using the [`.dockerignore`](/.dockerignore) config. If the docker build's cache is somehow stale, the docker build flag `--no-cache` may be used to freshly build anew.

Finally the `cacher` stage copies all manifest related files in place within the `/opt` directory into a temporary mirrored directory that later stages can copy from without unnecessarily busting their docker build cache. The [`source.Dockerfile`](/.dockerhub/source.Dockerfile) provides an advanced example of avoiding ignored packages, or packages that are unnecessary as overlay dependencies.
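A sketch of how such manifest mirroring can be done, assuming the workspaces are nested under `/opt`; the exact filtering in the project's Dockerfile may differ:

``` Dockerfile
# copy only the package manifests into a mirrored /tmp tree,
# leaving the volatile source code behind
WORKDIR /opt
RUN mkdir -p /tmp/opt && \
    find . -name "package.xml" | \
      xargs cp --parents -t /tmp/opt && \
    find . -name "COLCON_IGNORE" | \
      xargs cp --parents -t /tmp/opt || true
```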
## Builder Stage

``` Dockerfile
# multi-stage for building
FROM $FROM_IMAGE AS builder
```

A `builder` stage is then declared to install external dependencies and compile the respective workspaces. Static CI dependencies are first installed before any later potential cache busting directives to optimize rebuilds. These include:

* [ccache](https://ccache.dev)
  * Compiler cache for speeding up recompilation
* [lcov](http://ltp.sourceforge.net/coverage/lcov.php)
  * Front-end for GCC's coverage testing tool gcov

### Install Dependencies

Dependencies for the underlay workspace are then installed using `rosdep` by pointing to the manifest files within the mirrored source directory copied from the `cacher` stage. This ensures the lengthy process of downloading, unpacking, and installing any external dependencies can be skipped using the docker build cache as long as the manifest files within the underlay remain unchanged. More info on rosdep can be found here:

* [rosdep](http://wiki.ros.org/rosdep)
  * CLI tool for installing system dependencies

The sourcing of the ROS setup file is done to permit rosdep to find additional packages installed in the ament index via the `AMENT_PREFIX_PATH` environment variable, or potentially vendored packages unregistered in the ament index via the legacy `ROS_PACKAGE_PATH` env. Cleanup of the apt list directory is done as a best practice in Docker to prevent ever using stale apt list caches.
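Pieced together, this dependency installation step could look roughly like the following; the mirrored path mimics the `cacher` sketch above and is an assumption:

``` Dockerfile
ARG UNDERLAY_WS
WORKDIR $UNDERLAY_WS
# copy only the mirrored manifests, so this layer stays cached
# until a package.xml actually changes
COPY --from=cacher /tmp$UNDERLAY_WS/src ./src
RUN . /opt/ros/$ROS_DISTRO/setup.sh && \
    apt-get update && rosdep install -y \
      --from-paths src \
      --ignore-src \
    && rm -rf /var/lib/apt/lists/*
```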
### Build Source

The underlay workspace is then built using `colcon` by first copying over the rest of the source files from the original `src` directory from the `cacher` stage. The colcon flag `--symlink-install` is used to avoid the duplication of files for smaller image sizes, while the mixin argument is also parameterized as a Dockerfile `ARG` to programmatically switch between `debug` or `release` builds in CI. More info on colcon can be found here:

* [colcon](https://colcon.readthedocs.io)
  * CLI tool to build sets of software packages
* [colcon-mixin](https://github.com/colcon/colcon-mixin)
  * An extension to fetch and manage CLI mixins from repositories
* [colcon-mixin-repository](https://github.com/colcon/colcon-mixin-repository)
  * Repository of common colcon CLI mixins

The addition of the `ccache` mixin is used to pre-bake a warm ccache directory into the image as well as a pre-built underlay workspace. This will help speed up consecutive builds should later steps in the CI or maintainers have need to rebuild the underlay using the final image. The `console_direct` event handler is used to avoid CI timeout from inactive stdout for slower package builds, while the `FAIL_ON_BUILD_FAILURE` env is used to control whether the docker image build should fail to complete upon encountering errors during colcon build.
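A build step along these lines, sketched under the same assumed paths; the mixin defaults are illustrative:

``` Dockerfile
ARG UNDERLAY_MIXINS="release ccache"
ARG FAIL_ON_BUILD_FAILURE=true
# now copy the volatile source code over from the cacher stage
COPY --from=cacher $UNDERLAY_WS/src ./src
RUN . /opt/ros/$ROS_DISTRO/setup.sh && \
    colcon build \
      --symlink-install \
      --mixin $UNDERLAY_MIXINS \
      --event-handlers console_direct+ \
    || ([ -z "$FAIL_ON_BUILD_FAILURE" ] || exit 1)
```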
## Overlay Workspace

The overlay workspace is then set up in a similar manner where the same steps are repeated, only now sourcing the underlay setup file, and by building the overlay directory. The separation of underlay vs overlay workspace helps split caching of compilation across the two major points of change; that of external dependencies that change infrequently upon new releases vs local project source files that perpetually change during development. The overlay mixins are parameterized via `ARG` as well to allow the underlay and overlay to be independently configured by CI or local developers. This pattern can be repeated to chain together workspaces in one or multiple Dockerfiles; practically useful when working with a stack of related projects with deep recursive dependencies.

### Setup Entrypoint

The default entrypoint `ros_entrypoint.sh` inherited from the parent image is then updated to only source the top level overlay instead. The configured `ARG`s defining the paths to the underlay and overlay are also exported to `ENV`s to persist in the final image as a developer convenience.

### Testing Overlay

The overlay may then be optionally tested using the test related `ARG`s. The results of the test may also be used to optionally fail the entire build; useful if the return code from the `docker build` command itself is used as a primitive form of CI, or for demonstrating to new contributors how to locally test pull requests by invoking the colcon CLI. In other terms, failing on test failure may be good for a production system, but practically speaking, CI may occasionally be broken, and these images will still be required for fixing those issues, so they must still be deployed.

## Buildkit

A difference for the other Dockerfiles, not needing to be built by DockerHub and thus not limited in backwards compatibility, is the use of newer mount syntax options in BuildKit, allowing for persistent caching of downloaded apt packages and ccache files across successful builds of the same Dockerfile. More info on BuildKit can be found here:

* [Build images with BuildKit](https://docs.docker.com/develop/develop-images/build_enhancements)
* [Buildkit repo](https://github.com/moby/buildkit)

The [`distro.Dockerfile`](/.dockerhub/distro.Dockerfile) provides one such example of this. More info on using mounts for caching data across docker builds can be found here:

* [cache apt packages](https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md#example-cache-apt-packages)
  * avoid unnecessarily re-downloading the same packages over the network, even if the docker image layer cache for that `RUN` directive in the Dockerfile is busted
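A minimal sketch of such a cache mount; the packages installed here are just examples:

``` Dockerfile
# syntax=docker/dockerfile:experimental
FROM ros:foxy
# the mounted cache of downloaded .deb archives persists across builds,
# even when this RUN layer itself is rebuilt
# (assumes apt's docker-clean config has been disabled so archives are kept)
RUN --mount=type=cache,target=/var/cache/apt \
    apt-get update && apt-get install -y \
      ccache \
      lcov
```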
### Advanced Optimizations

With BuildKit's concurrent dependency resolution, multi-stage builds become parallelizable, assisting in shorter overall image build times. Granular expansion of the Directed Acyclic Graph (DAG) of workspace build steps into separate stages can be used to exploit this parallelism further, as well as to maximize caching. This is exemplified in [`source.Dockerfile`](/.dockerhub/source.Dockerfile). The figure below depicts how the multiple stages are composed to exploit the DAG of workspaces.

![pipeline](figs/multistage.svg)

Note that stages independent of one another within the DAG represent opportunities where each may be executed in parallel; the color of each simply denotes its functional purpose, whether it be for installing dependencies, building, or testing a given workspace. Depending upon the build `ARG`s provided or target stage selected when running docker build, the parent image the `cacher` stage builds from can be swapped out, and testing stages can be executed or bypassed as desired.

This composition of stages follows a few basic principles:

* Enforce Determinism
  * Filter workspace source files down to what's essential
  * E.g. `cacher` stage prunes underlay packages irrelevant for overlays
* Maximize Caching
  * Leverage dependency build order when forming DAG
  * E.g. prevent `builder` stages from invalidating `depender` stages
* Optimize Layers
  * Lazily COPY and build FROM other stages as needed
  * E.g. Avoid dependencies between `tester` stages to build in parallel

The table below compares the finished build times between sequential (one stage at a time) and multistage (many stages at once) builds, with and without caching (a warm and valid cache available).

| | w/o Caching | w/ Caching |
|---|---|---|
| Sequential Build | 1h:22m:38s | 0h:49m:30s |
| Multistage Build | 0h:53m:49s | 0h:27m:44s |

For reference, Sequential Build without Caching is equivalent to building a dockerfile without the use of multi-stage builds or BuildKit.
diff --git a/doc/continuous_integration/dockerhub.md b/doc/continuous_integration/dockerhub.md
new file mode 100644
index 0000000000..4728f3ce3e
--- /dev/null
+++ b/doc/continuous_integration/dockerhub.md
@@ -0,0 +1,64 @@
# DockerHub Documentation

DockerHub is a service used to build docker images from the project's Dockerfiles, as well as a registry used to host the tagged images built. Using a docker registry permits the project to offload much of the environmental setup from the rest of the CI pipeline. More info on DockerHub can be found here:

* [DockerHub](https://hub.docker.com/)
* [Docker Hub Quickstart](https://docs.docker.com/docker-hub)

The tagged images in the project's registry repo are eventually used by the CI pipeline to spawn containers to build and test the project. Hosting branch specific image tags in a registry to pull from, rather than merely re-building the Dockerfiles at CI runtime, enables frontloading much of the principled environmental setup prior to the start of CI jobs. This saves CI time, spares resources/credits for other jobs, and helps to accelerate the development cycle. The project's DockerHub repo can be found here:

* [Navigation2 on DockerHub](https://hub.docker.com/r/rosplanning/navigation2)

While DockerHub does not require the use of configuration files in the source repo, the Dockerfiles and respective scripts used to customize automated builds of images are tracked in the [`.dockerhub`](/.dockerhub) directory. These scripts include custom build phase hooks used by DockerHub during automated builds. More info on automated builds can be found here:

* [How Automated Builds work](https://docs.docker.com/docker-hub/builds)
* [Advanced options for Autobuild and Autotest](https://docs.docker.com/docker-hub/builds/advanced)

Automated Builds are controlled via the build configurations menu within the repo's administrative console on DockerHub. For reference, a figure of the project's build configurations is shown here:

![DockerHub Build Configurations](figs/dockerhub_build_configurations.png)

Here the repo's source repository is linked to the project's GitHub repo, while Autotest is disabled given a dedicated CI service is separately used to run test jobs. Repository linking can be used so that whenever a parent image repo is updated on DockerHub, it will also trigger a build for the project's DockerHub repository. Note that this only works for non-official library images, and only for build rules where Autobuild is enabled. Two build rules are added for the main branch, providing both a release and debug tag for CI to pull from. The relative Dockerfile paths are designated, while the build context is intentionally left empty, ensuring the build phase hooks within the same paths are used appropriately. Build caching is also enabled to shorten image turnaround time if multiple rebuilds a day are triggered.

The build hooks, e.g. [build](/.dockerhub/debug/hooks/build), are for customizing docker build `ARG`s in the Dockerfile; such as changing the base `FROM_IMAGE` between release or debug tags, adjusting the colcon mixins for each workspace to enable code coverage, or disabling fail on build failure, preventing source build breakages from blocking CI image tag updates. This also allows the slower debug CI to build and test only using the default RMW, while allowing the faster release CI jobs to build and test from an image with more RMWs installed.

## Autobuild

While the free registry hosting is great, allowing projects gigabytes of free storage/bandwidth to cache pre-configured images for various branches or CI scenarios that would otherwise take CI instances far longer to rebuild from scratch than to pull from the registry, the automated build integration for DockerHub is basic. Rather than triggering build rules via pushed commits to matching GitHub branches, Autobuild is left disabled and the Build Trigger API is used instead. This Build Trigger API is invoked from scheduled cron jobs to rate limit DockerHub rebuilds; e.g. preventing hourly merge commits from needlessly churning CI image tags.

### Build Trigger API

Rather than configuring the build rules on DockerHub, a build trigger URL can be generated for the linked repo and used to programmatically specify build parameters such as: tag names, context path, source branch or version, etc. More info on Build Triggers can be found here:

* [Remote Build Triggers](https://github.com/docker/docker.github.io/blob/v17.06-release/docker-hub/builds.md#remote-build-triggers)
  * Legacy docs on using build triggers
* [Example GitHub Action](https://github.com/osrf/docker_images/blob/master/.github/workflows/trigger_nightly.yaml)
  * Scheduled cron job used to rebuild a nightly image
* [Example Build Hook](https://github.com/osrf/docker_images/blob/master/ros2/nightly/nightly/hooks/post_push)
  * A hook to rebuild a child image post push or parent tag
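As a rough sketch, such a trigger call might look like the following, where the endpoint UUIDs are placeholders for the trigger URL DockerHub generates for the linked repo:

``` bash
# request a rebuild of the image tags built from the main branch
curl --request POST \
  --header "Content-Type: application/json" \
  --data '{"source_type": "Branch", "source_name": "main"}' \
  https://hub.docker.com/api/build/v1/source/<SOURCE_UUID>/trigger/<TRIGGER_UUID>/call/
```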
However, triggering builds via the API rather than relying on the DockerHub build rules also means forgoing the convenience of repository linking, where a repo's image can be sure to be rebuilt as soon as a new version of the parent image tag is pushed to the registry, without needing to monitor the parent image repos oneself. This helps keep the CI environment up to date and in sync with upstream development. Still, for finer control flow when triggering DockerHub build rules, this API can be called from any scheduled cron job to periodically update images when a project is least active.

## Alternatives

### Outsource Image Builds

Instead of using DockerHub as the remote builder, any CI that supports Docker can similarly be used to build and push images to DockerHub's registry. As of writing however, aside from username/password authentication, DockerHub only provides personal access tokens tied to individual usernames rather than organizations. This is a bit tedious, as to manage the CI as an organization, a separate machine account must be created and delegated with user permissions to push to the DockerHub repo. More info on CI integration with Docker can be found here:

* [Docker Hub: Managing access tokens](https://docs.docker.com/docker-hub/access-tokens/)
* [Docker + GitHub Actions](https://github.com/marketplace/actions/build-and-push-docker-images)
* [Docker + CircleCI Orbs](https://circleci.com/orbs/registry/orb/circleci/docker)
* [Docker + Azure Tasks](https://docs.microsoft.com/en-us/azure/devops/pipelines/tasks/build/docker?view=azure-devops#build-and-push)

Additionally, local build caching is often a premium feature for most other services, thus to benefit from docker build caching, one must manage a build agent for docker builds, or pony up to upgrade from a conventional open source free tier CI plan.

### Alternate Docker Registry

In addition to DockerHub, one can also self host a docker registry, or use alternate Docker Registry providers. This may be helpful if you anticipate exceeding DockerHub's free tier limits or would like to co-locate the docker registry and CI runners within the same service provider or local network.

* [DockerHub Download rate limit](https://docs.docker.com/docker-hub/download-rate-limit/)
* [Self Hosted Registry](https://docs.docker.com/registry/)
* [GitHub Cloud Registry](https://github.com/features/packages)
* [AWS Elastic Container Registry](https://aws.amazon.com/ecr/)
* [Google Cloud Registry](https://cloud.google.com/container-registry)
* [GitLab Container Registry](https://about.gitlab.com/blog/2016/05/23/gitlab-container-registry/)
diff --git a/doc/continuous_integration/figs/dockerhub_build_configurations.png b/doc/continuous_integration/figs/dockerhub_build_configurations.png
new file mode 100644
index 0000000000..34023d63da
Binary files /dev/null and b/doc/continuous_integration/figs/dockerhub_build_configurations.png differ
diff --git a/doc/continuous_integration/figs/multistage.drawio b/doc/continuous_integration/figs/multistage.drawio
new file mode 100644
index 0000000000..7b34cf99d6
--- /dev/null
+++ b/doc/continuous_integration/figs/multistage.drawio
@@ -0,0 +1,307 @@
[draw.io XML source for the multistage figure omitted]
\ No newline at end of file
diff --git a/doc/continuous_integration/figs/multistage.svg b/doc/continuous_integration/figs/multistage.svg
new file mode 100644
index 0000000000..6c0d8aa7fa
--- /dev/null
+++ b/doc/continuous_integration/figs/multistage.svg
@@ -0,0 +1 @@
[multistage.svg: single-line draw.io export of the stage DAG: cacher; ros2/underlay/overlay depender, builder, and tester stages; workspace_tester and workspace_shipper; FROM/COPY edges labeled package manifests, source code, test logs, workspace; legend noting osrf/ros2:devel, packages.ros.org, colcon, rosdep, vcstool, ros_entrypoint.sh]
\ No newline at end of file
diff --git a/doc/continuous_integration/figs/pipeline.drawio b/doc/continuous_integration/figs/pipeline.drawio
new file mode 100644
index 0000000000..9a7dfdedd8
--- /dev/null
+++ b/doc/continuous_integration/figs/pipeline.drawio
@@ -0,0 +1,278 @@
[draw.io XML source for the pipeline figure omitted]
\ No newline at end of file
diff --git a/doc/continuous_integration/figs/pipeline.svg b/doc/continuous_integration/figs/pipeline.svg
new file mode 100644
index 0000000000..7cfff7beb5
--- /dev/null
+++ b/doc/continuous_integration/figs/pipeline.svg
@@ -0,0 +1 @@
[pipeline.svg: single-line draw.io export of the CI pipeline: GitHub (ros-planning/navigation2, branches main and *-devel, cache busting files underlay.repos/package.xml/COLCON_IGNORE, .dockerignore, commits pushed to pull requests or the main branch), DockerHub (Dockerfile and .dockerhub/ builds, osrf/ros2:nightly from the osrf/docker_images nightly build, rosplanning/navigation2:<tag>, build trigger), CircleCI (.circleci/ workflows #N and #N+1 with release/debug build and test jobs, caches, test results, RMW matrix, debug coverage), CodeCov (post processed code coverage results, codecov.yml)]
\ No newline at end of file
diff --git a/doc/continuous_integration/future_work.md b/doc/continuous_integration/future_work.md
new file mode 100644
index 0000000000..49d5819326
--- /dev/null
+++ b/doc/continuous_integration/future_work.md
@@ -0,0 +1,33 @@
# Future Work

The CI has room for improvement and may still evolve over time. The following notes alternative integration options, including current pros and cons for each.

## [GitHub Actions](https://github.com/features/actions)

GitHub Actions is an emerging container based CI service that tightly integrates with the rest of GitHub's service offerings. With a growing ecosystem of official and federated 3rd party actions available, one can compose custom and extensive CI/CD workflows.

### Pros:

* Self hosted runners
  * Optionally run workflows from on site, not just cloud VMs
  * https://docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners
  * Leverage local hardware, e.g: GPUs, persistent storage, robot sensors, etc.

### Cons:

* No test introspection
  * One must still roll their own test result reporting
  * https://github.community/t/publishing-test-results/16215/12
  * Xunit test results are not rendered, aggregated, nor summarized
* Restricted caching
  * Caching with runners is less ergonomic than other CI providers
  * https://github.com/microsoft/azure-pipelines-agent/issues/2043
  * Implementation inherits same limitation from azure-pipelines-agent
* No job level parallelism
  * No equivalent parallelism for splitting tests via timing data
  * https://circleci.com/docs/2.0/parallelism-faster-jobs
  * Parameterizable parallelism without adding jobs to workflow
* No RAM Disk access
  * Useful to improve file IO performance
  * https://circleci.com/docs/2.0/executor-types/#ram-disks
  * Applicable for frequent reads/writes, e.g. ccache
\ No newline at end of file