Document CI / Devop pipelines #1731

Open
wants to merge 62 commits into
base: main
62 commits
05ef58d
Add drawing of CI pipline
ruffsl May 12, 2020
b211da6
Add png of CI pipline
ruffsl May 12, 2020
93d2b87
Add CI readme
ruffsl May 12, 2020
2f6bea1
Add placeholder doc files
ruffsl May 12, 2020
0077680
Move figs to subfolder
ruffsl May 12, 2020
f40ef64
Add comments on circleci
ruffsl May 12, 2020
154ef0f
Stage more links
ruffsl May 15, 2020
b57a867
Add content about Dockerfiles
ruffsl May 29, 2020
61d9a29
Title docker links
ruffsl May 29, 2020
708a52e
Add comments on FROM image
ruffsl May 29, 2020
e55b01d
Fix typos
ruffsl May 29, 2020
3294835
Small touchup
ruffsl May 29, 2020
ee8d022
Touchup
ruffsl May 29, 2020
6c99582
Update links
ruffsl May 29, 2020
66ee19f
Tweek
ruffsl May 29, 2020
d9efda1
Update links
ruffsl May 29, 2020
d152a90
Add docs on DockerHub
ruffsl May 29, 2020
f4517af
Fix links
ruffsl May 29, 2020
2f95535
Fix formating
ruffsl May 29, 2020
f7ca008
Update CIrcleCI docs
ruffsl Jul 10, 2020
6112532
Note why caches are used instead of workspaces
ruffsl Jul 10, 2020
0371a47
Stage multistage diagram
ruffsl Sep 24, 2020
6124ec0
stage multistage svg
ruffsl Sep 24, 2020
ae3444a
Annotate multistage figure
ruffsl Sep 24, 2020
f20f228
Tweek final stage names
ruffsl Sep 24, 2020
66f74cd
Update Future Work with alterative pros and cons
ruffsl Oct 9, 2020
74d8816
Update pipeline figure
ruffsl Oct 10, 2020
45ecd7b
Add section on Advanced Optimizations
ruffsl Oct 10, 2020
6ff1edc
Update pipline figure
ruffsl Oct 10, 2020
a3372b4
Add more comment on tags and image purpose
ruffsl Oct 10, 2020
452382c
Clarify dockerifle in .dockerhub folder
ruffsl Oct 10, 2020
b26cedd
Update DockerHub docs
ruffsl Oct 12, 2020
5ee0708
Re-Wording
ruffsl Oct 12, 2020
bdba1c2
Update DockerHub screencap
ruffsl Oct 12, 2020
937c4fe
Update circleci to reflect latest changes
ruffsl Oct 12, 2020
11cbe26
Reorder steps and commands refrences
ruffsl Oct 12, 2020
a7e6749
Move morkspaces back under steps
ruffsl Oct 12, 2020
b583d1d
Move code coverage back under steps
ruffsl Oct 12, 2020
e5fc81d
Merge remote-tracking branch 'origin/main' into ci-docs
ruffsl Oct 12, 2020
49a0015
Update codecov doc
ruffsl Oct 12, 2020
d809661
Comment that ccache uses ram disk
ruffsl Oct 12, 2020
2fd049d
Fix typo
ruffsl Nov 6, 2020
ff1bca0
Complete sentence on codecov.yml
ruffsl Nov 6, 2020
112178d
Update doc/continuous_integration/README.md
ruffsl Nov 6, 2020
bd630b6
Update doc/continuous_integration/dockerfile.md
ruffsl Nov 6, 2020
9fba827
Update doc/continuous_integration/circleci.md
ruffsl Nov 7, 2020
62f8ac8
Update doc/continuous_integration/dockerfile.md
ruffsl Nov 7, 2020
5755076
Update doc/continuous_integration/README.md
ruffsl Nov 7, 2020
65ae4c6
Update doc/continuous_integration/dockerfile.md
ruffsl Nov 7, 2020
8aadf99
Update doc/continuous_integration/dockerfile.md
ruffsl Nov 7, 2020
1074b19
Update doc/continuous_integration/dockerfile.md
ruffsl Nov 7, 2020
900535c
Update doc/continuous_integration/dockerhub.md
ruffsl Nov 7, 2020
94a468c
Update doc/continuous_integration/circleci.md
ruffsl Nov 7, 2020
ce8da83
Break-out Future Work to separate file
ruffsl Nov 8, 2020
3b2906e
Update figure
ruffsl Nov 8, 2020
ae16809
repo -> regestry
ruffsl Nov 8, 2020
fc8b92b
Update doc/continuous_integration/dockerfile.md
ruffsl Nov 8, 2020
f811edf
Correct tag name
ruffsl Nov 8, 2020
e4677ef
Update remarks on ARGs
ruffsl Nov 8, 2020
62be611
Clarify purpose of cacher stage
ruffsl Nov 9, 2020
2f6504f
Add code snippets
ruffsl Nov 9, 2020
643bc08
Expand comments on multistage figure
ruffsl Nov 9, 2020
8 changes: 4 additions & 4 deletions .dockerhub/source.Dockerfile
@@ -223,16 +223,16 @@ RUN if [ -n "$RUN_TESTS" ]; then \
 || ([ -z "$FAIL_ON_TEST_FAILURE" ] || exit 1) \
 fi

-# multi-stage for testing workspaces
-FROM overlay_builder AS workspaces_tester
+# multi-stage for testing workspace
+FROM overlay_builder AS workspace_tester

 # copy workspace test results
 COPY --from=ros2_tester $ROS2_WS/log $ROS2_WS/log
 COPY --from=underlay_tester $UNDERLAY_WS/log $UNDERLAY_WS/log
 COPY --from=overlay_tester $OVERLAY_WS/log $OVERLAY_WS/log

-# multi-stage for shipping overlay
-FROM overlay_builder AS overlay_shipper
+# multi-stage for shipping workspace
+FROM overlay_builder AS workspace_shipper

 # restore apt for docker
 RUN mv /etc/apt/docker-clean /etc/apt/apt.conf.d/ && \
37 changes: 37 additions & 0 deletions doc/continuous_integration/README.md
@@ -0,0 +1,37 @@
# Continuous Integration Documentation
Documentation on the existing CI for the project resides here.

## Overview
The existing CI is composed of multiple integration services that together help provide maintainers a fast and scalable testing environment. To help detect upstream breakages quickly as well, the existing CI allows for changes to be evaluated using the latest development dependencies (e.g. using ROS2 master branches). In light of the large dependency footprint a high-level ROS2 navigation stack necessitates, the use of each integration service is optimized to maximize caching of environmental setup and increase workflow throughput. As these optimizations add complexity to the CI configuration, this documentation provides further explanations and reasoning behind each configuration.

![pipeline](figs/pipeline.svg)

The figure above is a high-level diagram of how the integration services described below are composed.

## Integrations

The following links document each integration and are best approached in the same order presented.

### GitHub

GitHub is used for hosting the source repo, tickets and PRs, as well as for managing the OAuth and configs for the other integration services in the CI pipeline.

### [Dockerfile](dockerfile.md)

Dockerfiles are used for generating the docker images for building and testing the project. They also self document the setup and configuration of upstream dependencies, ensuring contributors have a reproducible and repeatable development environment for collaboration.

### [DockerHub](dockerhub.md)

DockerHub is used to build and host the registry of tagged docker images, so that downstream services in the CI pipeline can quickly download and bootstrap the latest up-to-date development environment.

### [CircleCI](circleci.md)

CircleCI is used to check out, build, and test the project via docker containers spawned from the tagged docker images. Triggered by schedules or by GitHub events such as commits pushed to branches or pull requests, it deterministically retains a warm build cache while generating logs and test result artifacts.

### [CodeCov](codecov.md)

CodeCov is used to help monitor code quality by rendering test artifacts from the upstream pipeline into interactive analytics, improving the visibility of the project's health and feedback for contributions.

### [Future Work](future_work.md)

The CI has room for improvement and may still evolve over time, as alternate integration options become more viable, and the pros and cons of each shift.
104 changes: 104 additions & 0 deletions doc/continuous_integration/circleci.md
@@ -0,0 +1,104 @@
# CircleCI Documentation

Review comment (Member):

This file doesn't cover the topics:

  • Artifacts, how is that setup and where are they saved to
  • For all sections after workflows, it usually skips over explaining the breakdown of the cron jobs for nightly / dockerhub and only discusses the PR builder (e.g. jobs, executors, commands, steps)
  • Matrix, for the nightly

Review comment (Member):

On this file, I think it would be useful to take a step back and not miss the forest for the trees. It does a good job at introducing some circle CI concepts (that I'd like to just be completed from my comments), but in some ways it doesn't really explain the specific navigation2 use of CircleCI, which is mostly just about the 3 workflows (what they do, when they run, why they run, their general description of steps); that is generally missing or overlooked in favor of explaining the components they're made of abstractly.


CircleCI is a service used to build and test the project's codebase to catch regressions as well as to check pull request submissions. Using continuous integration helps maintainers sustain high code quality while providing external contributors a well defined evaluation method with which to validate contributions, even without having to setup a local development environment. More info on CircleCI can be found here:

* [CircleCI](https://circleci.com/)
* [Navigation2 on CircleCI](https://circleci.com/gh/ros-planning/navigation2)

For this particular CI, Docker is used to build and test the project within containers derived from images pulled from DockerHub, bootstrapping the CI with a development environment including pre-configured dependencies and warm workspace build caches. View the accompanying Dockerfile and DockerHub documentation for more info on this CI setup.

* [Dockerfile](dockerfile.md)
* [DockerHub](dockerhub.md)

CircleCI is configured via the [config.yml](/.circleci/config.yml) YAML file within the `.circleci/` folder at the root of the GitHub repo. The config file for this project is self-contained and thus densely structured, yet written in a functional style to remain DRY and modular. This keeps it easily generalizable for other ROS packages or for adjusting overlaid workspaces. Despite the anchors and references in YAML, the config file is best understood when read in reverse, from bottom to top, in order of abstraction hierarchy, alongside this accompanying document. Further references on CircleCI configurations, such as syntax and structure, can be found here:

* [Writing YAML](https://circleci.com/docs/2.0/writing-yaml)
* [Configuring CircleCI](https://circleci.com/docs/2.0/configuration-reference)

## Workflows

The CI config consists of three main [workflows](https://circleci.com/docs/2.0/configuration-reference/#workflows). The first checks PR events, essentially triggered by any pushed commits targeting the main branch, by building and testing the project in both release and debug mode for accurate performance benchmarking or generating test coverage results. The second is a nightly cron schedule that checks the main branch while additionally testing a matrix of supported RMW vendors not tested on normal PRs. This helps prioritize CI to quickly check new contributions, while simultaneously keeping tabs on the health of existing code. The third is another cron schedule for updating CI image builds on DockerHub, and is scheduled to finish prior to the nightly workflow. This reduces the chance of updating CI images while a CI workflow is in progress.

Review comment (Member):

The first workflow I think covers both PRs to test them and then also the main branch after merging for the new code introduction. That should be mentioned specifically.

On the third item, I thought that cron job was setup in Dockerhub? I assume we moved it over here at some point, that makes sense then your dockerhub trigger API references in the dockerhub section. Please make sure you close that loop in the dockerhub page


The order in which jobs are executed is conditional upon the completion of those each job [requires](https://circleci.com/docs/2.0/configuration-reference/#requires), forming a conventional directed acyclic graph of dependent jobs. Independent jobs may of course be parallelized in the CI pipeline; so by splitting the build and test jobs in two, multiple test jobs, such as the matrix of RMW tests, may commence as soon as the build job they depend on completes, avoiding unnecessarily rebuilding the same codebase for each RMW vendor.
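
As a rough illustration of how `requires` yields a directed acyclic graph, the following Python sketch groups jobs into waves that could run concurrently. The job names are hypothetical placeholders rather than the names used in `config.yml`, and this only models the ordering; it is not how CircleCI schedules jobs internally.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each job maps to the set of jobs it requires.
requires = {
    "release_build": set(),
    "release_test_rmw_a": {"release_build"},
    "release_test_rmw_b": {"release_build"},
    "release_test_rmw_c": {"release_build"},
}


def execution_waves(graph):
    """Group jobs into waves; all jobs within one wave may run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Jobs whose requirements have all completed.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(requires)
print(waves)
```

The single build job forms the first wave, and every test job that requires it lands in the second wave, so the codebase is built once and then tested per RMW vendor in parallel.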

Review comment (Member):

Be specific here, why are you talking about this? You should tell the reader how you're using the requires in each of the 3 use cases you highlight in the paragraph above. Clearly one is for the 4 build/test jobs for the main CI situation. You abstractly say that it "may of course be parallelized in the CI pipeline", so tell them why you're bringing this up at all - you're leveraging that feature to speed up the pipelines by having many running jobs in parallel. Specifically go over that design for the 3 cases since this is the reference guide for our CircleCI setup. "We use this in the PR workflow by.... We use this in the nightly workflow by... etc"


## Jobs

The list of [jobs](https://circleci.com/docs/2.0/configuration-reference/#jobs) referenced within the workflows consists of either release or debug build/test jobs. These jobs are largely similar, with the exception of using different executors and that debug test jobs include additional command steps for reporting code coverage analytics. The build jobs simply check out the project, set up the build environment, and compile the project. The test jobs in turn pick up from where the build job left off by restoring the build, and then running the tests on that build.

Review comment (Member):

A hyperlink to the docs on "executors" would be good to be consistent with your other ones like jobs/workflows

Review comment (Member):

You gloss over how the build and test steps share the built workspace to test on, please go into detail about that.

Review comment (Member):

A new paragraph talking 1-2 sentences about the nightly jobs and dockerhub cron job would be good references to round it off


In addition to enabling parallelism between independent jobs, the bifurcation of build and test job types enables [parallelism](https://circleci.com/docs/2.0/configuration-reference/#parallelism) within each test job, leveraged later for splitting longer tests across multiple containers for that job. Given package interdependencies, container parallelism for build jobs is not as easily applicable, and is thus only used for testing.

Review comment (Member):

"leveraged later for splitting longer tests across multiple containers for that job" where do we do that?

Review comment (Member):

Honestly, I'm not sure what this paragraph is trying to communicate or if it aids in explaining the concept of jobs and how we use them in Circle


## Executors

Two different [executors](https://circleci.com/docs/2.0/configuration-reference/#executors-requires-version-21) are used to define the environment in which the steps of a job will be run, one for debug and one for release builds. Given only the release executor is used in testing the full matrix of RMW vendors, the docker images for the debug executor merely include the default RMW; sparing debug jobs the time in compiling unused RMW vendors in debug testing.

The executors also differ by a few environment variables. These include appropriate mixin flags for the workspace builds and an additional nonce string, unique to that executor, used in deriving the hash key for saving and restoring caches. This prevents cache key collisions or cross talk between parallel jobs using different executors in the same workflow.

## Commands

To reuse sequences of steps across multiple jobs, a number of [commands](https://circleci.com/docs/2.0/configuration-reference/#commands-requires-version-21) are defined.

When checking out source code from the repo, additional pre and post checkout steps are performed to prepare the file system. When setting up dependencies, the build is prepared much the same way as in the project Dockerfile, by first setting up the underlay and then the overlay workspace. Additional steps are included for restoring and resetting previous ccache files and statistics, including a few diagnostic steps that can be used for debugging the CI itself, e.g. investigating why the hit rate for ccache may be abnormally low. Once the overlay setup is finished, more ccache diagnostics and files are saved for inspection or later restoration. A restore build command is also defined for test jobs, operating under the condition that a workspace with the same checksum can be restored from a previous build job in the workflow. Additional commands are also defined for steps in testing the overlay workspace and reporting coverage results.

Review comment (Member):

This paragraph would benefit from explaining why there's caching in both the docker images and in circle CI, why not just do one? Why not just have circle just build the dockerfile? (those kinds of questions are open and unanswered)


# References

Per the YAML syntax, an anchor must precede any reference that links back to it. Stylistically, any root-level keys declared solely to hold references are denoted with an underscore (`_`) prefix to help distinguish them from keys expected by the CircleCI config schema.

## Common Environment

For environment variables common among the executors, workspace paths and specific config options are listed here. While the workspace paths are parameterized to avoid hardcoding paths in common command functions, they may also be hardcoded as fields among a few high level steps given a limitation of the config syntax. The additional config options help optimize our project build given CI resource limits:

* Capping ccache size given CI storage limits, speeding up cache restoration
* Setting ccache location to use RAM disk to improve file IO performance
* Limiting parallel make and linker jobs to avoid exhausting the container's RAM
* Further adjustments for changing test behavior and sequential test stdout

## Steps

Low level steps, defined prior to the job commands where they are used, are recursively defined from more functional common commands.

### Checkout

Checking out code consists of three stages, including pre and post checkout steps. To simplify the formulaic common commands above, the pre-checkout step replicates a synthetic workspace from the installed ROS distro directory by symbolically linking an install folder, and bootstrapping the checksum from the timestamp of an expected file in ROS docker images. This measure ensures that if the nightly docker image is changed or rebuilt, all CI caches are busted as well. Ideally, the docker image sha/hash would be used for this instead, but as of writing there does not seem to be a reliable method for acquiring this image digest from within the same derived container:

* [Introspect image info from inside docker executor](https://discuss.circleci.com/t/introspect-image-info-from-inside-docker-executor/31620)

The overlay workspace is then cleaned prior to checking out the project. The post checkout step simply checks to see if the underlay has changed to determine whether it should also be cleaned, recloned, and thus rebuilt as well.

## Workspaces

The rest of the steps are references to repeatedly define workspace specific commands, such as installing, building, and testing either the underlay or overlay workspace. Some points of note however include:

* The CI cache for ccache is intentionally linked with the underlay rather than the overlay workspace
  * so that consecutive commits to the same PR are more likely to retain a warm and recent ccache
* CCache Stats is intentionally used to zero stats before building a workspace
  * so the next consecutive run of the CCache Stats reflects only upon that given workspace
* Restore workspace command intentionally sets the `build` parameter to `false`
  * to avoid unnecessary duplication of the same workspace cache and build logs

## Code Coverage

The last few steps invoke a custom script to collect and post process the generated code coverage results from debug test jobs. This always runs regardless of whether the tests fail, so that coverage reports from failed runs may still be reviewed. The final report is uploaded to CodeCov.

## Common Commands

Common commands for low level, repeated, and formulaic tasks are defined for saving and restoring CI caches, as well as for installing, building, and testing workspaces.

### Caching

Multiple forms of [caching](https://circleci.com/docs/2.0/caching/) are used over the completion of a single workflow. To appropriately save and restore caches in a deterministic manner, these steps are abstracted as commands. Although CircleCI does provide an explicit step for persisting temporary files across jobs in the same workflow, i.e. a [workspace](https://circleci.com/docs/2.0/configuration-reference/#persist_to_workspace), caches are used instead for a few reasons. First, there is only one workspace per workflow, thus avoiding cross talk between executors or release/debug job types is not as easily achievable. Secondly, unlike workspaces, caches can persist across workflows, permitting repeated workflows for the same PR to pull from the cache of prior runs; e.g. in order to keep the ccache as fresh as possible for that specific PR.

For this project, caching is done with respect to a given workspace. As such, a specified string and checksum from the workspace are combined with workflow specifics to ensure the [restored cache](https://circleci.com/docs/2.0/configuration-reference/#restore_cache) is uniquely identifiable and won't collide with other concurrent PRs.

For saving a cache, the command is similar, aside from specifying the path to the directory or file to be stored in the [saved cache](https://circleci.com/docs/2.0/configuration-reference/#save_cache). Given that CI caches are conventionally read-only, meaning a cache key cannot be reused or updated to point to a newer cache, the current Unix epoch is appended to the cache key to ensure key uniqueness. Because CircleCI's key lookup for cache restoration selects the most recent cache with the longest matching prefix, the latest matching cache is always restored.

These workspace checksums are uploaded as [stored artifacts](https://circleci.com/docs/2.0/configuration-reference/#store_artifacts) throughout other commands to help introspect and debug caching behavior when needed.
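
The key scheme described in this section can be modeled with a small Python sketch. It is a simplified simulation under stated assumptions: the in-memory `store`, the function names, and the key layout (`nonce-workspace-checksum-epoch`) are hypothetical and chosen only for illustration, and the restore logic handles a single key prefix rather than CircleCI's full list of fallback keys.

```python
import time


def cache_key_prefix(nonce, workspace_name, checksum):
    """Prefix shared by every cache saved for one executor and workspace state."""
    return f"{nonce}-{workspace_name}-{checksum}-"


def save_cache(store, prefix, payload, now=None):
    """Caches are immutable, so the Unix epoch is appended to keep each key unique."""
    epoch = int(time.time()) if now is None else now
    key = f"{prefix}{epoch}"
    if key in store:
        raise KeyError(f"cache key already exists: {key}")
    store[key] = {"saved_at": epoch, "payload": payload}
    return key


def restore_cache(store, prefix):
    """Return the most recently saved payload whose key starts with the prefix."""
    matches = [key for key in store if key.startswith(prefix)]
    if not matches:
        return None
    newest = max(matches, key=lambda key: store[key]["saved_at"])
    return store[newest]["payload"]


store = {}
release_prefix = cache_key_prefix("release", "overlay_ws", "abc123")
save_cache(store, release_prefix, "first build", now=1000)
save_cache(store, release_prefix, "second build", now=2000)

restored = restore_cache(store, release_prefix)
other = restore_cache(store, cache_key_prefix("debug", "overlay_ws", "abc123"))
```

Here `restored` is the later of the two saved payloads, while `other` is `None` because the differing nonce keeps the debug executor from matching caches saved by the release executor.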

### Building

For installing and building workspaces, the process resembles that within the project Dockerfile. Additional bookkeeping is performed to update the workspace checksum file by piping stdout that deterministically describes the state of the build environment. This is done by seeding from the checksum of the underlay workspace and then appending info about source code checked out into the overlay workspace, as well as the list of required dependencies installed. When setting up the workspace, this checksum is first used to check whether the workspace can be restored from a prior workflow build. If the source code or required dependencies change, resulting in a cache miss, the unfinished workspace is then built. If the workspace build is successful, it is then cached. Regardless, the build logs are always uploaded as stored artifacts for review or debugging. The odd shuffling of symbolic directories is done as a workaround given a limitation of the S3 SDK:

* [Failing to upload artifacts from symbolic link directories](https://discuss.circleci.com/t/failing-to-upload-artifacts-from-symbolic-link-directories/28000)
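
The checksum bookkeeping described above can be sketched in Python as follows. This is only an illustration of the chaining idea: the hash algorithm (SHA-256), the function name, and the shape of the inputs are assumptions, not the format the CI config actually writes to its checksum file.

```python
import hashlib


def workspace_checksum(underlay_checksum, source_state, dependencies):
    """Derive an overlay checksum seeded from the underlay checksum.

    source_state: lines describing the checked out source (hypothetical format).
    dependencies: names of the required dependencies installed.
    """
    digest = hashlib.sha256()
    # Seeding with the underlay checksum means an underlay change busts the overlay too.
    digest.update(underlay_checksum.encode())
    for line in source_state:
        digest.update(b"\n" + line.encode())
    # Sort so the result does not depend on the order dependencies were listed in.
    for dep in sorted(dependencies):
        digest.update(b"\n" + dep.encode())
    return digest.hexdigest()


base = workspace_checksum("underlay-1", ["repo_a 1a2b3c"], ["libfoo", "libbar"])
same = workspace_checksum("underlay-1", ["repo_a 1a2b3c"], ["libbar", "libfoo"])
new_commit = workspace_checksum("underlay-1", ["repo_a 4d5e6f"], ["libfoo", "libbar"])
new_underlay = workspace_checksum("underlay-2", ["repo_a 1a2b3c"], ["libfoo", "libbar"])
```

Identical inputs give the same checksum and therefore a cache hit, while a new commit or a rebuilt underlay gives a different checksum and therefore a cache miss that triggers a rebuild.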

### Testing

For testing workspaces, the packages within the workspace are [tested in parallel](https://circleci.com/docs/2.0/parallelism-faster-jobs/) across the number of replicated containers for the given test job, as denoted by the `parallelism` option. Here packages are split by anticipated test timing, a heuristic derived from the reported duration and classname of prior recent test results. The logs and results from the tests are then always [reported](https://circleci.com/docs/2.0/configuration-reference/#store_test_results) and uploaded.
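
To give a feel for timing-based splitting, the Python sketch below assigns packages to containers with a simple greedy heuristic (longest first, into the currently lightest container). This is an assumption made for illustration; the algorithm CircleCI actually uses is not described here, and the package names and durations are made up.

```python
def split_by_timings(packages, timings, parallelism, default=1.0):
    """Assign packages to `parallelism` containers, balancing expected duration.

    timings: seconds per package from prior runs; unknown packages use `default`.
    """
    buckets = [[] for _ in range(parallelism)]
    totals = [0.0] * parallelism
    # Longest first; ties broken by name so every container computes the same split.
    ordered = sorted(packages, key=lambda pkg: (-timings.get(pkg, default), pkg))
    for pkg in ordered:
        lightest = totals.index(min(totals))
        buckets[lightest].append(pkg)
        totals[lightest] += timings.get(pkg, default)
    return buckets


packages = ["pkg_a", "pkg_b", "pkg_c", "pkg_d"]
timings = {"pkg_a": 90.0, "pkg_b": 60.0, "pkg_c": 30.0}  # pkg_d has no history
buckets = split_by_timings(packages, timings, parallelism=2)
```

With these made-up timings, the slowest package shares a container only with the package that has no timing history, and the two mid-length packages share the other container, so both containers carry a similar expected load.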

Review comment (Member):

You say that in testing, different packages are in different containers, is that true? Where is that defined?

17 changes: 17 additions & 0 deletions doc/continuous_integration/codecov.md
@@ -0,0 +1,17 @@
### Codecov

Codecov is a service used to aggregate and monitor code coverage results, rendering test statistics generated by CI pipelines into interactive analytics, improving the visibility of the project's health and providing feedback for potential contributions. More info on Codecov can be found here:

* [Codecov](https://codecov.io/)
* [Navigation2 on Codecov](https://codecov.io/gh/ros-planning/navigation2)

Codecov is configured via the [`codecov.yml`](/codecov.yml) file. More info on this can be found here:

* [About the Codecov yaml](https://docs.codecov.io/docs/codecov-yaml)
* [codecov.yml Reference](https://docs.codecov.io/docs/codecovyml-reference)

A custom script within the repo is reused to collect and post process the generated code coverage results from debug test jobs. This script simply invokes lcov on the overlay workspace to output `full_coverage.info`, and then filters this down to `workspace_coverage.info` by removing any irrelevant subdirectories, e.g. for message or test packages.

Review comment (Member):

reused? Wouldn't this be "used"?

"script" spelling

"This scrip simply invokes lcov on the overlay workspace after testing is completed ..."


* [code_coverage_report.bash](/tools/code_coverage_report.bash)
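
The effect of the filtering step can be illustrated with a short Python sketch. This is not the linked script, which relies on lcov itself; it only mimics the outcome by dropping records from an lcov tracefile whose source file (`SF:` line) matches an exclusion pattern. The paths and glob patterns below are hypothetical.

```python
from fnmatch import fnmatch


def filter_tracefile(text, exclude_patterns):
    """Keep only tracefile records whose SF: path matches no exclusion pattern."""
    kept = []
    record = []
    for line in text.splitlines():
        record.append(line)
        if line.strip() == "end_of_record":
            # Each record names its source file on a line starting with "SF:".
            source = next((l[3:] for l in record if l.startswith("SF:")), "")
            if not any(fnmatch(source, pattern) for pattern in exclude_patterns):
                kept.extend(record)
            record = []
    return "\n".join(kept) + ("\n" if kept else "")


full_coverage = """TN:
SF:/ws/src/pkg_a/src/planner.cpp
DA:1,1
end_of_record
TN:
SF:/ws/src/pkg_a/test/test_planner.cpp
DA:1,1
end_of_record
TN:
SF:/ws/src/pkg_msgs/msg/detail.cpp
DA:1,0
end_of_record
"""

workspace_coverage = filter_tracefile(full_coverage, ["*/test/*", "*_msgs/*"])
```

Only the record for the planner source survives, since the other two paths fall under a test directory and a message package respectively.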

After the coverage info is uploaded, the project `codecov.yml` is used to further ignore any source test directories, as well as fix the project root path from when the repo was cloned into the relative workspace's src directory.