Run Summary #3628

Open
yhakbar opened this issue Dec 5, 2024 · 2 comments
Labels
pending-decision (Pending decision from maintainers), rfc (Request For Comments)

Comments

@yhakbar
Collaborator

yhakbar commented Dec 5, 2024

Summary

Introduce a short summary, emitted at the end of each Terragrunt run, that gives high-level information about the run.

In addition to emitting a short summary, introduce a flag that writes a detailed report of the run in a serialized format that can be parsed and acted upon once the run completes.

Motivation

When running a run-all command, or when operating on an individual unit with dependencies, Terragrunt can run multiple units in one command.

In that scenario, users have to parse logs to determine what happened to individual units and determine the overall status of a Terragrunt run.

When an individual unit fails or takes longer than the rest, users have to parse the logs of the entire run to find out.

This is also useful in CI, where automated reasoning about the success or failure of unit runs can enrich the notifications sent to users on pull requests.

Proposal

Introduce a short summary at the end of every Terragrunt command that involves a wrapped binary (e.g. plan or run-all plan).

That summary will include the following:

  • Total Units
  • Total Duration
  • Units Succeeded Count
  • Units Failed Count
  • Descendants of Failed Unit Count (Early exit)
  • Excluded Count

These would be color coded to help folks quickly spot relevant information (red for failed, green for succeeded, etc.).

Counts with a value of zero would be hidden.

For example, don't display "Excluded: 0" on every run when no unit is ever excluded.

What it would look like:

$ terragrunt run-all plan
...

❯❯ Run Summary
Units:      8
Duration:   600s
Succeeded:  3
Failed:     1
Early Exit: 3
Excluded:   1

In addition, introduce a flag named --terragrunt-report-file that can also write a report to a specified file.

This file would receive a detailed report of the run with the following for each unit:

  • Name of unit (string)
  • Time started (timestamp)
  • Time ended (timestamp)
  • Result (One of succeeded, failed, early exit, excluded)
  • Failed Ancestor (Name of failed unit)
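Filling in the Early Exit result and the Failed Ancestor field amounts to walking the dependency graph downstream from a failed unit. The sketch below does that with a breadth-first traversal. The input shape (a map from each unit to the units that depend on it directly), the EarlyExits name, and the rule that excluded units keep their "excluded" result are all assumptions for illustration. It handles a single failed unit; with several failures, one option is to call it once per failure and keep the first ancestor recorded.

```go
package main

import (
	"fmt"
	"sort"
)

// EarlyExits walks the dependency graph downstream from a failed unit and
// returns every transitive dependent mapped to the failed ancestor's name.
// Assumptions (hypothetical, not Terragrunt's API):
//   - dependents maps a unit name to the units that depend on it directly;
//   - excluded units keep their "excluded" result and are not reported as
//     early exits, but their own dependents are still reached through them.
func EarlyExits(dependents map[string][]string, failed string, excluded map[string]bool) map[string]string {
	result := map[string]string{}
	visited := map[string]bool{failed: true}
	queue := []string{failed} // breadth-first traversal
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, d := range dependents[cur] {
			if visited[d] {
				continue // already reached via another path
			}
			visited[d] = true
			queue = append(queue, d)
			if !excluded[d] {
				result[d] = failed
			}
		}
	}
	return result
}

func main() {
	// Graph from the report example: vpc <- subnet <- {manual ec2, TG ec2}.
	dependents := map[string][]string{
		"vpc":    {"subnet"},
		"subnet": {"manual ec2", "TG ec2"},
	}
	exits := EarlyExits(dependents, "subnet", map[string]bool{"manual ec2": true})
	names := make([]string, 0, len(exits))
	for n := range exits {
		names = append(names, n)
	}
	sort.Strings(names) // map iteration order is random
	for _, n := range names {
		fmt.Printf("%s: early exit (failed ancestor: %s)\n", n, exits[n])
	}
}
```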

What it would look like:

This is a contrived example to show all the fields being used:
Report of a run involving a VPC, a subnet that depends on the VPC, and two EC2 instances that depend on the subnet.
The first instance is deployed manually and is excluded; the second is deployed with Terragrunt. An apply failed while updating the subnet.

(Extra whitespace included to make the CSV legible)

name,       started,          ended,            result,     failed ancestor
vpc,        2024-09-16 14:20, 2024-09-16 14:30, succeeded,  N/A
subnet,     2024-09-16 14:30, 2024-09-16 14:40, failed,     N/A
manual ec2, 2024-09-16 14:40, 2024-09-16 14:40, excluded,   N/A
TG ec2,     2024-09-16 14:40, 2024-09-16 14:40, early exit, subnet

The report file here would output maximal information for a run, instead of the minimal information made available in the summary.

To control the format of this report, use the additional flag --terragrunt-report-file-format, which controls the format of the report file.

The two initial values will be:

  • csv (default)
  • json

This should allow users to parse the report as they see fit and aggregate information accordingly.

Finally, introduce the flag --terragrunt-summary-disable. Since emitting the summary will be the default behavior, this flag disables the final summary that Terragrunt prints to stdout. The report file, however, is not written by default, so this flag has no effect on whether that file is created.
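The interplay between the three flags can be sketched as a small resolution step. The Options struct, the Resolve function, and the behavior of ignoring --terragrunt-report-file-format when no report file is given are hypothetical; only the defaults (summary on, no report file, csv format) come from this RFC.

```go
package main

import (
	"fmt"
)

// Options mirrors the proposed flags (hypothetical field names).
type Options struct {
	SummaryDisable   bool   // --terragrunt-summary-disable
	ReportFile       string // --terragrunt-report-file
	ReportFileFormat string // --terragrunt-report-file-format
}

// Resolve applies the defaults described in the RFC and validates the format.
// It reports whether to print the summary and which report format to write
// ("" means no report file is written).
func Resolve(o Options) (printSummary bool, reportFormat string, err error) {
	printSummary = !o.SummaryDisable // summary is on unless explicitly disabled
	if o.ReportFile == "" {
		return printSummary, "", nil // report is opt-in; assumption: format flag ignored
	}
	switch o.ReportFileFormat {
	case "":
		reportFormat = "csv" // default format
	case "csv", "json":
		reportFormat = o.ReportFileFormat
	default:
		return false, "", fmt.Errorf("unsupported report file format %q (want csv or json)", o.ReportFileFormat)
	}
	return printSummary, reportFormat, nil
}

func main() {
	for _, o := range []Options{
		{},
		{ReportFile: "report.json", ReportFileFormat: "json"},
		{SummaryDisable: true, ReportFile: "report.csv"},
		{ReportFile: "report.xml", ReportFileFormat: "xml"},
	} {
		summary, format, err := Resolve(o)
		fmt.Printf("summary=%t report=%q err=%v\n", summary, format, err)
	}
}
```

Note that disabling the summary and requesting a report are independent, matching the paragraph above.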

While not part of this proposal, a --terragrunt-summary-only flag could also be provided. This would disable all logs except the summary, letting users ignore most of the noise Terragrunt emits. It is left out of this proposal because run summaries would need to be battle tested before becoming the only information Terragrunt emits.

A related feature would be a --terragrunt-log-file flag that sinks the logs into a file instead of emitting them to stdout in this case. Users would then still be able to revisit the logs of the run while only the summary is emitted to stdout. Again, this is not something to be tackled as part of this RFC.

Technical Details

An execution report will have to be tracked for the duration of each Terragrunt run in order to determine when units start executing, when they finish, and what the outcome of each run is.

Terragrunt will also need to detect the completion of all runs and emit a final short summary to stdout.

The detailed report will need to be serialized in the format the user requests, and provide the information that users expect.

Flags added:

  • --terragrunt-summary-disable
  • --terragrunt-report-file
  • --terragrunt-report-file-format

In addition, OpenTelemetry traces should take these metrics into account. There should be events emitted that track the information being reported on and summarized.

Press Release

Announcing Terragrunt Run Summaries!

Visual demonstration of run summaries

Terragrunt has long supported operating over multiple units, providing a simple abstraction for running them with a single command.

Terragrunt will now emit a short summary at the end of each run with high-level information about the run. This summary gives quick, at-a-glance information about every run without requiring analysis of log output.

Drawbacks

This proposal requires a small, potentially undesirable change: the stdout of Terragrunt will be altered by default. Users will have to explicitly opt out of the summary via the --terragrunt-summary-disable flag to prevent this behavior.

The advantages of this change should outweigh the negatives, however. Users will get access to more data without any additional configuration when making updates to Terragrunt units, and many will benefit heavily from this.

Alternatives

Don't do it

Some users don’t like UI/UX changes, however small, and they may rely on Terragrunt emitting logs in the exact format they are today.

Only support report file

Users may be able to get all the same benefits by opting into this functionality via a report file, with no change to the stdout emitted by Terragrunt.

The trade-off here is that the short summary is very useful information. Users shouldn't have to learn a new flag or change configurations to receive that data.

Migration Strategy

If you currently rely on parsing Terragrunt logs, you will likely want to be aware of this change and adjust your scripts accordingly.

Unresolved Questions

  • Would users prefer any of the alternatives?
  • Is there any data users would prefer removed from the summary?
  • Is there any data users would prefer removed from the report?
  • Are there any other formats users would really like supported out of the box?
  • Do users like the format of the short summary? How should it be styled?

Edits

  1. Added note about OpenTelemetry traces.
yhakbar added the rfc (Request For Comments) and pending-decision (Pending decision from maintainers) labels on Dec 5, 2024
@wakeful
Contributor

wakeful commented Dec 10, 2024

I really like the summary! A few thoughts:

  • it would be great if the summary could also be pushed via otel metrics, if that isn’t already being done

  • I understand the intent is to display the summary only at the end of the run, but perhaps in the future, you might consider introducing a "live feed" or "refresh status" feature. This could be useful outside of CI, especially for more complex projects where it's challenging to assess progress in real-time

  • regarding the flags, shouldn't we use a summary prefix for report file and report file format? For example: --terragrunt-summary-report-file and --terragrunt-summary-report-file-format

@yhakbar
Collaborator Author

yhakbar commented Dec 11, 2024

In order:

  • That's a great idea! I'll add it as a requirement to the technical details. It might not necessarily be a single summary event/span, but there should definitely be events emitted that give you equivalent information to the summary. Maybe both.

  • We have thoughts about this, but delivering anything like this would not be something we'd focus on in the short term.

  • The thought there was that the summary and the report are really two different things that interact with the same data. It's not a report on the summary, but on the run.

    Maybe run-report-file/format and run-summary are better?
