
Commit e818651

Add document to describe code coverage reporting in PRs + main
This document provides design suggestions for CI code coverage reporting. Also, I've added unrecognized words to the spell checking dictionary.

Signed-off-by: Courtney Pacheco <[email protected]>
1 parent c601d42 commit e818651

2 files changed: +132 -0 lines changed

.spellcheck-en-custom.txt (+7)

@@ -13,13 +13,15 @@ backends
 benchmarking
 Bhandwaldar
 brainer
+Callouts
 Cappi
 checkpointing
 CLI
 CLI
 cli
 CLI's
 codebase
+Codecov
 Colab
 compositional
 Conda
@@ -94,6 +96,7 @@ Kaggle's
 Kai
 Kubernetes
 Kumar
+lcov
 lignment
 LLM
 llms
@@ -151,6 +154,8 @@ pyenv
 PyPI
 pyproject
 PyTorch
+pytest
+pytest's
 qlora
 qna
 quantized
@@ -182,6 +187,7 @@ Shellcheck
 Shivchander
 Signoff
 Sigstore
+SonarQube
 specifiying
 src
 Srivastava
@@ -197,6 +203,7 @@ TBD
 templating
 Tesla
 th
+tiering
 tl
 TODO
 tox

docs/ci/ci-code-coverage-reporting.md (+125)

@@ -0,0 +1,125 @@
# Code Coverage Reporting for InstructLab Repos

## High-Level Requirements

* Code coverage should be reported in PR builds as a GitHub _bot_ comment
  * New code:
    * Should show % coverage on new lines of code
    * Should show which new lines of code are uncovered (if any)
  * Existing code:
    * Should show existing % coverage on all lines of code
  * New code + existing code:
    * Should show projected % coverage if the PR is to be merged (e.g., “overall code coverage 53% -> 62% after merge”)
* The GitHub coverage reporting bot should be able to interpret at least one of these pytest coverage reporting formats:
  * HTML
  * XML
  * JSON
  * lcov
  * annotate
* The README should display a code coverage badge
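
The three numbers the bot comment needs (coverage on new lines, the list of uncovered new lines, and the projected overall coverage) can be derived from plain line sets. The following is a minimal sketch under stated assumptions, not the behavior of any of the tools discussed in this document: `CoverageSummary`, `new_line_coverage`, and `projection_message` are hypothetical names, and the line sets are assumed to have been extracted already from the PR diff and from a coverage report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageSummary:
    """Covered and total executable line counts (hypothetical helper type)."""

    covered: int
    total: int

    @property
    def percent(self) -> float:
        # Assumption: an empty set of lines counts as 100% covered,
        # which also avoids dividing by zero.
        return 100.0 if self.total == 0 else 100.0 * self.covered / self.total


def new_line_coverage(
    changed_lines: set[int],
    executed_lines: set[int],
    executable_lines: set[int],
) -> tuple[CoverageSummary, list[int]]:
    """Return coverage of the lines a PR touched in one file, plus the uncovered ones."""
    # Only executable lines count; comments and blank lines in the diff are ignored.
    relevant = changed_lines & executable_lines
    uncovered = sorted(relevant - executed_lines)
    return CoverageSummary(len(relevant) - len(uncovered), len(relevant)), uncovered


def projection_message(main: CoverageSummary, pr_branch: CoverageSummary) -> str:
    """Format the before/after line, assuming the PR branch is up to date with main."""
    return (
        f"overall code coverage {main.percent:.0f}% -> "
        f"{pr_branch.percent:.0f}% after merge"
    )
```

For example, a PR that changes lines 10 to 13 of a file where only lines 10, 11, and 12 are executable and line 12 never ran would be reported as 2 of 3 new lines covered, with line 12 listed as uncovered.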

## Code Coverage Reporting Bot: Options

### [Codecov](https://github.com/marketplace/codecov)

* [Example Codecov job](https://github.com/ansible/vscode-ansible/runs/33887209843)
* Pros:
  * Free to use in open source projects
  * Codecov reports all the data on GitHub in a comment. [Example](https://docs.codecov.com/docs/pull-request-comments).
* Cons / Callouts:
  * Not applicable at this time, but if we want to use it in more than one private repo, we would need to purchase a plan.
  * According to Codecov: “All subscribers of a Pull Request get an email when Codecov posts a comment (only when posting a new comment), and there's nothing we can do about it.” (Read more [here](https://docs.codecov.com/docs/pull-request-comments#filtering-emails).)

### [Coveralls](https://github.com/marketplace/actions/coveralls-github-action)

* [Example Coveralls job](https://github.com/gophercloud/gophercloud/actions/runs/12117814428/job/33858835625?pr=3244)
* Pros:
  * The GitHub Action is free to use under the [MIT license](https://github.com/coverallsapp/github-action/blob/main/LICENSE.md)
  * The service appears to be free for open source repos, but as with Codecov, we should validate that the ToS doesn’t contain fine print that says otherwise
  * Can report all the data in a GitHub comment. [Example](https://github.com/coverallsapp/coveralls-node-demo/pull/19#issuecomment-1035344985).
* Cons / Callouts:
  * Code coverage visuals, etc. are all viewed through the Coveralls front end (coveralls.io), so if that service goes down, we’re at their mercy

### [SonarQube](https://github.com/marketplace/actions/official-sonarqube-scan)

(**Note**: This is the most expensive option.)

* [Example SonarQube job](https://github.com/ansible/vscode-ansible/runs/34526218267)
* Pros:
  * Interactive, intuitive interface
  * Explains what isn’t covered and gives detailed suggestions on how to increase code coverage. (This seems better than Codecov, but it may simply be a more refined product.)
  * Keeps a running history of your code coverage reporting, if desired
* Cons / Callouts:
  * Appears to be "free to use," but Sonar wants you to store coverage data in their cloud, and storage costs apply unless you host your own Sonar service, in which case we would need to pay for a license. (Not ideal.)
  * Code coverage visuals, etc. are all viewed through the Sonar web front end, so the Sonar bot comments with a direct link to the appropriate URL and users have to click through
  * If Sonar is down, we’re at their mercy unless we host Sonar ourselves, but then we have to support the hosted service, which is not ideal

## Code Coverage Badge: Options

### [shields.io](http://shields.io)

Compatibility:

* Codecov. View Codecov setup instructions [here](https://shields.io/badges/codecov-with-branch).
* SonarQube. View Sonar setup instructions [here](https://shields.io/badges/sonar-coverage).
* Coveralls. View Coveralls setup instructions [here](https://shields.io/badges/coveralls).

### Codecov

If we use Codecov for displaying coverage information, then we can utilize their built-in [status badge](https://docs.codecov.com/docs/status-badges) functionality that integrates with their coverage report tooling.

### Coveralls

Coveralls hosts status badges on their website if you prefer Coveralls' hosting over shields.io.

## Workflow Design

### Technical Requirements

#### Bot Permissions

At a bare minimum, the bot needs the following permissions, and it may need more:

* Permission to post GitHub comments
* S3 read permissions (if the bot must be connected to S3 to read reports)

### Functional (Non-Technical) Requirements

#### Data Storage (for Storing Coverage Reports)

If we’re not committing to hosting data through a third-party reporting service (like SonarQube), then code coverage reports should be uploaded to a cloud storage service that supports one of pytest’s reporting formats (JSON, HTML, XML, lcov, or annotate).

* Users should not be able to view uploaded reports in the public cloud, as some services like AWS S3 can charge for data retrieval (depending on the tier you use)
  * Sometimes coverage tools can also capture vulnerability/bug concerns, so if we change tooling in the future, we don’t want people seeing vulnerabilities/bugs

We should generate a temporary, unique file path for each PR’s code coverage report so that PR reports don’t overwrite each other. (It’s okay to overwrite code coverage reports if someone creates multiple commits in the same PR, though.)
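
One way to get a path that is unique per PR but stable across commits within the same PR is to key it on the repository name and the PR number only, leaving the commit SHA out. The sketch below illustrates that idea; the function name `pr_report_key`, the `coverage/pr` prefix, and the `coverage.xml` filename are all hypothetical choices, not something this design prescribes.

```python
import posixpath


def pr_report_key(
    repo: str,
    pr_number: int,
    filename: str = "coverage.xml",  # hypothetical report filename
    prefix: str = "coverage/pr",     # hypothetical storage prefix
) -> str:
    """Build an object key that is unique per PR and reused by every commit in it."""
    if pr_number <= 0:
        raise ValueError("pr_number must be a positive integer")
    # The commit SHA is deliberately left out so that a new commit in the
    # same PR overwrites the previous report instead of adding a new object.
    return posixpath.join(prefix, repo, str(pr_number), filename)
```

Two different PRs always map to different keys, while repeated uploads from the same PR land on the same key, which matches the overwrite behavior described above.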

After a PR is merged:

* The code coverage report for the PR, along with its unique S3 path, should be deleted
  * Sometimes, contributor PRs have no merge conflicts but are still "X commits behind `main`" when merged, so we don’t want to use the PR’s code coverage report if it’s outdated
* Regenerate the code coverage report on `main` to update the code coverage badge
* Upload the latest code coverage report to S3

After a PR is closed:

* Delete the code coverage report for the PR along with its S3 path.
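
The merged and closed cases can share one entry point, because GitHub delivers both as a `pull_request` event with `action` set to `closed` and distinguishes them through the `pull_request.merged` flag. The sketch below maps such a payload to an ordered list of steps. The function name `cleanup_plan` and the step identifiers are hypothetical; a real workflow would replace each identifier with the actual delete, test, and upload job.

```python
def cleanup_plan(payload: dict) -> list[str]:
    """Map a GitHub pull_request webhook payload to the storage steps to run."""
    if payload.get("action") != "closed":
        # Opened, synchronized, etc.: nothing to clean up yet.
        return []
    pr = payload["pull_request"]
    # The PR's own report is removed whether or not the PR was merged.
    steps = [f"delete-pr-report:{pr['number']}"]
    if pr.get("merged"):
        # The PR report may be stale relative to main, so coverage is
        # regenerated on main rather than reusing the PR's numbers.
        steps += ["regenerate-main-report", "upload-main-report"]
    return steps
```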

#### Bot Comment Contents

The bot should be able to make a comment that meets the high-level reporting requirements defined at the top of this doc (e.g., % coverage on new lines of code, etc.)
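
If we end up assembling the comment ourselves instead of relying on a hosted service's bot, the body could look roughly like the output of the sketch below. This is an illustration only: `render_coverage_comment` and its parameters are hypothetical, and the percentages are assumed to have been computed elsewhere.

```python
def render_coverage_comment(
    new_code_pct: float,
    uncovered: dict[str, list[int]],
    main_pct: float,
    projected_pct: float,
) -> str:
    """Render a Markdown comment body covering the high-level requirements."""
    lines = [
        "### Code coverage report",
        "",
        f"* New code: {new_code_pct:.0f}% of new lines covered",
        f"* Existing code: {main_pct:.0f}% overall coverage on `main`",
        f"* After merge: overall code coverage {main_pct:.0f}% -> {projected_pct:.0f}%",
    ]
    if uncovered:
        # List uncovered new lines per file, in a stable order.
        lines += ["", "Uncovered new lines:"]
        for path in sorted(uncovered):
            numbers = ", ".join(str(n) for n in sorted(uncovered[path]))
            lines.append(f"* `{path}`: {numbers}")
    return "\n".join(lines)
```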

#### Coverage Badge

The coverage badge must:

* Report the latest code coverage % on `main`
* Be placed at the top of the README and visible to all viewers
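
For reference, shields.io also serves static badges at `https://img.shields.io/badge/<label>-<message>-<color>`, which could be a fallback if none of the service-backed badges fit. The sketch below builds the README snippet from a coverage percentage. Using a static badge at all is an assumption rather than part of this design, and `badge_markdown` and the color thresholds are hypothetical.

```python
from urllib.parse import quote


def badge_markdown(coverage_pct: float, label: str = "coverage") -> str:
    """Return README Markdown for a static shields.io coverage badge."""
    # Hypothetical thresholds; adjust to whatever the project agrees on.
    if coverage_pct >= 80:
        color = "brightgreen"
    elif coverage_pct >= 60:
        color = "yellow"
    else:
        color = "red"
    # The percent sign must be URL-encoded (it becomes %25) in the badge path.
    message = quote(f"{coverage_pct:.0f}%", safe="")
    url = f"https://img.shields.io/badge/{quote(label, safe='')}-{message}-{color}"
    return f"![{label}]({url})"
```

A static badge only changes when the README or the generating job changes it, so the job that regenerates coverage on `main` would also need to refresh the snippet.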

### Additional Design Suggestions

#### Data Storage Cost Management Ideas

* The AWS S3 service scales storage automatically, so we can start with AWS’ [S3 intelligent tiering](https://aws.amazon.com/s3/storage-classes/intelligent-tiering/) to optimize for cost savings when uploading multiple PR coverage reports
* We can use the [S3 Glacier storage class](https://aws.amazon.com/s3/storage-classes/glacier/) to save funds on data >1 week old (since we will likely only need it for historical reasons)
* **Side note**: Uploading to S3 will be helpful if we want to keep track of our code coverage progress, especially if we need to migrate to another tool for graphing
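
The "move data older than one week to Glacier" idea maps onto an S3 lifecycle rule. The sketch below builds the rule as a plain dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` argument. The function name `glacier_lifecycle_config`, the `coverage/main/` prefix, and the rule ID are hypothetical, and the snippet does not call AWS.

```python
def glacier_lifecycle_config(prefix: str = "coverage/main/", days: int = 7) -> dict:
    """Build a lifecycle configuration that archives older reports to Glacier."""
    if days < 0:
        raise ValueError("days must not be negative")
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",  # hypothetical rule name
                "Filter": {"Prefix": prefix},        # only touch coverage reports
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


# Applying it would look roughly like this (not executed here; the bucket
# name is a placeholder):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="<bucket>", LifecycleConfiguration=glacier_lifecycle_config()
#   )
```

Retrieval from Glacier is slower and billed separately, so this only makes sense for reports kept purely for historical comparison.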
