[RELENG-7422] 📝 Add documentation #42

Merged
merged 24 commits on Jun 27, 2023
18 changes: 18 additions & 0 deletions docs/getting-started/local-setup.md
# Local environment

This page walks you through setting up your local environment.

## Install Poetry

The Runner manager uses [Poetry](https://python-poetry.org/), a Python packaging
and dependency management tool.

To install and use this project, make sure you have Poetry
installed. Follow the [Poetry documentation](https://python-poetry.org/docs/#installation)
for proper installation instructions.

## Install dependencies

```shell
poetry install
```
86 changes: 86 additions & 0 deletions docs/getting-started/run-it-locally.md
# Run it locally

Before starting this guide:

- Follow the [local setup](./local-setup.md) documentation.

## Run

Once everything is properly set up, you can launch the project
with the following command from the root of the repository:

```bash
poetry run start
```

The application is now running on port 8000 of the machine.

## Webhook setup

### Ngrok setup

The GitHub Actions Exporter depends on webhooks coming from GitHub to work
properly.

Ngrok can help you set up a public URL to be used with GitHub webhooks.

You can install Ngrok on your Linux machine using the following command:

```bash
curl -s https://ngrok-agent.s3.amazonaws.com/ngrok.asc | \
  sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null && \
  echo "deb https://ngrok-agent.s3.amazonaws.com buster main" | \
  sudo tee /etc/apt/sources.list.d/ngrok.list && \
  sudo apt update && sudo apt install ngrok
```

For more information, you can visit the Ngrok [website](https://ngrok.com/download).

Once installed, you can run the following command to listen on port 8000
of the machine and assign a public URL to it.

```shell
ngrok http 8000
```

### Setting up the webhook

Set up a webhook at the organization level; the settings page should be at a URL
like the following:
`https://github.com/organizations/<your org>/settings/hooks`

- Click on `Add webhook`
- In `Payload URL`, enter your Ngrok URL, like the following:
  `https://xxxxx.ngrok.io/webhook`
- Content type: `application/json`
- Click on `Let me select individual events.`
- Select `Workflow jobs` and `Workflow runs`
- Save

## Setting up your testing repo

Create a new repository in the organization for which you have configured the
runner manager, then push a workflow to it. Here is an example:

```yaml
# .github/workflows/test-gh-actions-exporter.yaml
---
name: test-gh-actions-exporter
on:
push:
workflow_dispatch:
jobs:
greet:
strategy:
matrix:
person:
- foo
- bar
runs-on:
- ubuntu
- focal
- large
- gcloud
steps:
- name: sleep
run: sleep 120
- name: Send greeting
run: echo "Hello ${{ matrix.person }}!"
```

Trigger builds and enjoy :beers:
25 changes: 4 additions & 21 deletions docs/index.md
# GitHub Actions Exporter

The GitHub Actions Exporter is a project used to retrieve information
provided by GitHub, notably through webhooks, process it, and expose it
as metrics for Prometheus.
145 changes: 145 additions & 0 deletions docs/metrics-analysis-prometheus/collected-reported-metrics.md
# Collected and reported metrics

First, it is important to differentiate the `workflow_run`
and the `workflow_job` webhook events.

The `workflow_run` event is triggered when a workflow run is `requested`,
`in_progress`, `completed`, or `failure`. However, for this project, we are not
interested in the `cancelled` or `skipped` events, so we ignore them.

On the other hand, the `workflow_job` event is triggered when a
workflow job is `queued`, `in_progress`, or `completed`. We also ignore
the `cancelled` or `skipped` events for `workflow_job` in this project.
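The filtering described above can be sketched in Python. This is a simplified
illustration only; the function name and the shape of the real handlers in the
project may differ.

```python
# Hypothetical sketch of the event filtering described above; the real
# handlers in the project may be structured differently.
IGNORED_ACTIONS = {"cancelled", "skipped"}


def should_process(event_type: str, payload: dict) -> bool:
    """Return True when a webhook delivery is relevant for metrics."""
    if event_type not in ("workflow_run", "workflow_job"):
        return False
    # Both event types carry an "action" field describing the transition.
    return payload.get("action") not in IGNORED_ACTIONS


print(should_process("workflow_job", {"action": "queued"}))     # True
print(should_process("workflow_run", {"action": "cancelled"}))  # False
print(should_process("push", {"action": "completed"}))          # False
```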

## Workflow run

Here are the different metrics collected by the GitHub Actions Exporter
project for workflow runs:

The number of workflow rebuilds: `github_actions_workflow_rebuild_count`.

The duration of a workflow in seconds: `github_actions_workflow_duration_seconds`.

Count the number of workflows for each state:

- `github_actions_workflow_failure_count`
- `github_actions_workflow_success_count`
- `github_actions_workflow_cancelled_count`
- `github_actions_workflow_inprogress_count`
- `github_actions_workflow_total_count`

## Workflow job

Here are the different metrics collected by the GitHub Actions
Exporter project for workflow jobs.

The duration of a job in seconds: `github_actions_job_duration_seconds`.

Time between when a job is requested and started: `github_actions_job_start_duration_seconds`.

Count the number of jobs for each state:

- `github_actions_job_failure_count`
- `github_actions_job_success_count`
- `github_actions_job_cancelled_count`
- `github_actions_job_inprogress_count`
- `github_actions_job_queued_count`
- `github_actions_job_total_count`
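To illustrate where the two duration metrics come from: the `workflow_job`
webhook payload carries `created_at`, `started_at`, and `completed_at`
timestamps. A minimal stdlib-only sketch (the exporter's actual implementation
may differ):

```python
from datetime import datetime


def _parse(ts: str) -> datetime:
    """Parse GitHub's ISO-8601 timestamps (e.g. '2023-06-27T10:00:00Z')."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def job_duration_seconds(job: dict) -> float:
    """Time spent running: completed_at - started_at."""
    return (_parse(job["completed_at"]) - _parse(job["started_at"])).total_seconds()


def job_start_duration_seconds(job: dict) -> float:
    """Time spent queued: started_at - created_at."""
    return (_parse(job["started_at"]) - _parse(job["created_at"])).total_seconds()


# Example payload fragment (hypothetical values):
job = {
    "created_at": "2023-06-27T10:00:00Z",
    "started_at": "2023-06-27T10:00:30Z",
    "completed_at": "2023-06-27T10:02:30Z",
}
print(job_duration_seconds(job))        # 120.0
print(job_start_duration_seconds(job))  # 30.0
```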

## Cost metric

This is the last metric we collect, and it is one of the most important
ones. It allows us to determine the cost of our CI runs.

### Formula

Here is the formula to calculate the cost over a period of time:

```bash
cost = duration (in seconds) / 60 * cost (per minute)
```
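The formula translates directly to code. A small sketch:

```python
def run_cost(duration_seconds: float, cost_per_minute: float) -> float:
    """cost = duration (in seconds) / 60 * cost (per minute)."""
    return duration_seconds / 60 * cost_per_minute


# Example: a 30-minute job on ubuntu-latest ($0.008/min):
print(round(run_cost(30 * 60, 0.008), 3))  # 0.24
```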

### How do we find the cost per minute?

#### GitHub

As for GitHub, it is quite simple: they provide us with a fixed value, and
the price never varies. To give an example, for `ubuntu-latest`, we have a cost
of $0.008/min, and that's it. Easy!

For larger GitHub hosted runners, such as the high-performance options, the
pricing structure may differ. The exact details and costs associated with those
specific runner types can be obtained from
[GitHub's documentation](https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions).

#### Self-Hosted

When it comes to self-hosted runners, the cost is a bit more complicated.

To estimate it, we can calculate the cost for the main cloud providers,
namely AWS and Google Cloud Platform (GCP).

The cost can be found based on the machine type in the Management Console
for AWS (when creating an EC2 instance) and on the
[Google Cloud website](https://cloud.google.com/compute/vm-instance-pricing)
for GCP.

Key points to consider for retrieving cost information:

!!! note "Costs for self-hosted runners are approximate"

    When retrieving the cost of each key point,
    calculating the exact cost per minute might not be possible,
    as it depends on the cloud provider's billing policy
    and each individual CI workload, for example:

    - An internal cloud provider/lab with dedicated hardware.
    - A cloud provider that bills virtual machines per hour or per day only.
    - An instance price that varies during the day, week, or month.
    - A CI job that uploads a large amount of data.

- RAM and CPU costs: the cost per minute for RAM and CPU can be found in the
  documentation of the respective cloud provider.
- Storage costs: the cost per minute for storage can be found in the
  documentation of the respective cloud provider.
- Bandwidth costs: directly determining the cost per minute of bandwidth is
  not feasible.

Calculating the bandwidth cost per minute is left to the discretion of the
user and will vary depending on the workload. As an example, adding an
extra 30% is what we found by comparing the values in the documentation
of different cloud providers (for CPU, RAM, and storage) with the actual
values on our invoices. Using this information, the overall cost can be
estimated with the following formula (all costs are per minute):

```bash
cost = (cost_per_flavor + cost_per_storage) * percentage_cost_of_bandwidth
```
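As a sketch, the estimate above can be written as a small helper. The 30%
bandwidth markup is the example figure from our own invoices; tune it to yours.

```python
def self_hosted_cost_per_minute(cost_per_flavor: float,
                                cost_per_storage: float,
                                bandwidth_markup: float = 1.3) -> float:
    """Estimate the per-minute cost of a self-hosted runner.

    A bandwidth_markup of 1.3 corresponds to the extra ~30% described
    above; all inputs are per-minute costs.
    """
    return (cost_per_flavor + cost_per_storage) * bandwidth_markup


# Example: a t3.large ($0.0025/min) with a hypothetical $0.0005/min of storage:
print(round(self_hosted_cost_per_minute(0.0025, 0.0005), 5))  # 0.0039
```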

!!! note

    GCP and AWS costs are roughly the same for equivalent flavors.

### The different tags and their associated cost

| Provider | Runner | Cost ($ per min) |
| -------- | -------------------- | ---------------- |
| GitHub | `ubuntu-latest` | 0.008 |
| GitHub | `ubuntu-18.04` | 0.008 |
| GitHub | `ubuntu-20.04` | 0.008 |
| GitHub | `ubuntu-22.04` | 0.008 |
| GitHub | `ubuntu-20.04-4core` | 0.016 |
| GitHub | `ubuntu-22.04-4core` | 0.016 |
| GitHub | `ubuntu-22.04-8core` | 0.032 |
| AWS | `t3.small` | 0.000625 |
| GCP | `n2-standard-2` | 0.0025 |
| AWS | `t3.large` | 0.0025 |
| GCP | `n2-standard-4` | 0.005 |
| GCP | `n2-standard-8` | 0.01 |

!!! note

    Please note that the names of large GitHub hosted runners
    may not be exactly the same as shown above, but this is
    the naming convention recommended by GitHub.
55 changes: 55 additions & 0 deletions docs/metrics-analysis-prometheus/prometheus.md
# Prometheus

## Introduction

Prometheus is a powerful open-source monitoring and alerting system that allows
users to collect, store, and analyze time-series data. In this guide, we will
explore how to effectively utilize Prometheus to analyze GitHub Actions.

In order to collect and analyze GitHub Actions metrics, users are expected
to have an existing Prometheus installation and configure it to pull metrics.

## Understanding Prometheus Queries

The idea here is not to recreate the entire Prometheus documentation; we will
simply discuss the key points to get you started easily without getting lost in
the plethora of information available on the Internet.

To learn more about Prometheus itself, check out the official
[documentation](https://prometheus.io/docs/introduction/overview/),
as well as [querying Prometheus](https://prometheus.io/docs/prometheus/latest/querying/basics/).

To proceed, let's take a typical query and break it down, covering other
potentially useful details along the way.

Let's examine this example query:

```bash
topk(5, sum(increase(github_actions_job_cost_count_total{}[5m])) by (repository) > 0)
```

This query retrieves data related to GitHub Actions job costs and
provides the top 5 repositories with the highest cumulative cost
within a specified time range.

1. The query starts with the `topk(5, ...)` function, which returns the
   top 5 values based on a specified metric or condition.
2. The `sum(increase(...))` part of the query calculates the cumulative
   sum of the specified metric. In our example, it calculates the
   cumulative sum of the `github_actions_job_cost_count_total` metric,
   representing the total job cost count.
3. The `[5m]` part specifies the time range for the query.
4. The `by (repository)` clause groups the data by the `repository` label.
   This enables the query to calculate the cost sum for each repository
   individually.
5. The expression `> 0` filters the query results to only include
   repositories with a value greater than zero.
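The same pattern works for the other metrics this exporter reports. For
instance, assuming the duration metric is exposed with a `_sum` series (check
your `/metrics` endpoint for the exact series name), the repositories consuming
the most CI time over the last hour could be found with a query like:

```bash
topk(5, sum(increase(github_actions_job_duration_seconds_sum[1h])) by (repository))
```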

!!! info

    Using Grafana enhances the visualization of Prometheus data and
    provides powerful querying capabilities. Within Grafana, apply filters,
    combine queries, and utilize variables for dynamic filtering. It's important
    to understand `__interval` (the time interval between data points) and
    `__range` (the selected time range) when working with Prometheus data in
    Grafana. This integration enables efficient data exploration and analysis
    for better insights and decision-making.
10 changes: 9 additions & 1 deletion mkdocs.yml
code: Roboto Mono

nav:
- Home: index.md

- Getting Started:
- Local Setup: getting-started/local-setup.md
- Run it Locally: getting-started/run-it-locally.md

- Metrics Analysis and Prometheus Monitoring for GitHub Actions:
- Collected and reported metrics: metrics-analysis-prometheus/collected-reported-metrics.md
- Prometheus Monitoring: metrics-analysis-prometheus/prometheus.md

markdown_extensions:
- pymdownx.highlight: