docs(adr): apply suggestions from @liamsi and add minor improvements
Wondertan committed Jul 15, 2022
1 parent 1e2e2d5 commit 62027a7
Showing 1 changed file with 19 additions and 11 deletions.
30 changes: 19 additions & 11 deletions docs/adr/adr-009-telemetry.md
@@ -10,7 +10,7 @@

@Wondertan @liamsi

## Legend
## Glossary
- "ShrEx" - Share Exchange
> It's all ogre now
@@ -22,23 +22,23 @@ Celestia Node needs deeper observability of each module and their components. Th
we have is logging and there are two more options we need to explore from the observability triangle(tracing, metrics and logs).

There are several priorities and "why"s we need deeper observability:
* Establishing metrics/data driven engineering culture for celestia-node devs
* Metrics and tracing allows extracting dry facts out of any software on its performance, liveness, bottlenecks,
regressions, etc., on whole system scale, so devs can reliably respond
* Basing on these, all the improvements can be proven with data _before_ and _after_ a change
* Analysis of the current p2p share exchange or "ShrEx" stack
* So we can evaluate real world Full Node reconstruction qualities, along with data availability sampling
* And adjust our roadmap accordingly.
* And adjust our roadmap accordingly
* Incentivized Testnet
* Tracking participants and validation that do task correctly
* So all participants provide to us valuable data/insight/traces that we can analyze and improve on
* Monitoring dashboars
* Monitoring dashboards
* For Celestia's own DA network infrastructure, e.g. DA Network Bootstrappers
* For the node operators
* Extend debugging arsenal for the networking heavy DA layer
* Local development
* Issues found with Testground
* Issues found with [Testground testing](https://github.com/celestiaorg/test-infra)
* Production
* Establishing metrics/data driven engineering culture for celestia-node devs
* Metrics and tracing allows extracting dry facts out of any software on its performance, liveness, bottlenecks,
regressions, etc., on whole system scale, so devs can reliably respond
* Basing on these, all the improvements can be proven with data _before_ and _after_ a change

This ADR is intended to outline the decisions on how to proceed with:
* Integration plan according to the priorities and the requirements
@@ -51,10 +51,10 @@ This ADR is intended to outline the decisions on how to proceed with:
### Plan

The first "ShrEx" stack analysis priority is critical for Celestia project. The analysis results will tell us whether
our current Full Node reconstruction qualities conforms to the main network requirements, subsequently affecting
the development roadmap of the celestia-node before the main network launch, therefore is a potential blocker to the
our current [Full Node reconstruction](https://github.com/celestiaorg/celestia-node/issues/602) qualities conforms to the main network requirements, subsequently affecting
the development roadmap of the celestia-node before the main network launch.
Basing on the former, the plan is focused on unblocking the reconstruction
story first and then proceed with steady covering of our codebase with traces for the complex codepaths as well as
analysis first and then proceed with steady covering of our codebase with traces for the complex codepaths as well as
metrics and dashboards for "measurables".

Fortunately, the "ShrEx" analysis can be performed with _tracing_ only(more on that in [Tracing](./#Tracing)), so the
@@ -222,5 +222,13 @@ tracing is debug logging on steroids, and we can potentially consider dropping c
fully cover our codebases with the tracing. Same as logging, traces can be pipe out into the stdout as prettyprinted
event log.

## Further Readings
- [Uptrace tracing tools comparison](https://get.uptrace.dev/compare/distributed-tracing-tools.html)
- [Uptrace guide](https://get.uptrace.dev/guide/)
- [Uptrace OpenTelemetry Docs](https://opentelemetry.uptrace.dev/)
> Provides simple Go API guide for metrics and traces
- [OpenTelemetry Docs](https://opentelemetry.io/docs/)
- [Prometheus Docs](https://prometheus.io/docs/introduction/overview/)

## Status
Proposed
