Logs are missing when CodeBuild uses Compute fleets #171

Closed
mic-kul opened this issue Sep 17, 2024 · 3 comments


mic-kul commented Sep 17, 2024

Hi,

We're encountering an issue with the aws-codebuild-run-build action when using CodeBuild compute fleets: logs are missing whenever generating output takes more than 60 seconds (twice the default updateInterval of 30 seconds, which is what we run with).

I've checked the CloudWatch GetLogEvents metrics and found no errors.

First, I thought we were encountering the condition described in this section of the code:

  // GetLogEvents can return partial/empty responses even when there is data.
  // We wait for two consecutive empty log responses to minimize false positive on EOF.
  // Empty response counter starts after any logs have been received, or when the build completes.

However, that doesn't explain why everything works as expected on on-demand builders and the issue occurs only when the build runs on a CodeBuild compute fleet.

The minimal buildspec to reproduce the issue:

version: 0.2

phases:
  pre_build:
    commands:
      - echo "Preparing to execute the sleep script"
  build:
    commands:
      - echo "Starting the sleep script"
      - |
        #!/bin/bash

        # Initialize total sleep time
        total_sleep_time=20

        # Loop until the sleep interval reaches or exceeds 160 seconds
        while [ $total_sleep_time -lt 160 ]; do
          echo $total_sleep_time
          sleep $total_sleep_time
          total_sleep_time=$((total_sleep_time + 15))
        done

        echo "Total sleep time: $total_sleep_time seconds"
  post_build:
    commands:
      - echo "Sleep script execution completed"

Example:

When the build runs on CodeBuild on-demand compute, started by this GitHub Action, the GitHub Actions output is:

[Screenshot: on_demand_finished]

When the build runs on a CodeBuild compute fleet, started by this GitHub Action, the CodeBuild and GitHub Actions outputs are:

  • in progress:
[Screenshot: reservered_capacity_in_progress]
  • finished:
[Screenshot: reservered_capacity_finished]

Is there anything that can be done to pull the missing logs again once the "CODEBUILD COMPLETE" signal is received?


shuohaoliu commented Sep 26, 2024

I created a pull request that uses nextForwardToken, instead of two consecutive empty event lists, to determine EOF when pulling CloudWatch logs.
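To illustrate the difference between the two EOF checks, here is a minimal sketch in Python (not the action's actual code, which is not shown in this thread). It relies on the documented GetLogEvents behavior that, at the end of the stream, the returned nextForwardToken equals the token that was passed in. The names `make_fake_pager`, `drain_until_token_repeats`, and `drain_until_two_empty` are hypothetical, the pager is an in-memory stand-in for the CloudWatch client, and the two-empty-responses loop is a simplified version of the heuristic quoted above.

```python
from typing import Any, Callable, Dict, List, Optional

Page = Dict[str, Any]
GetPage = Callable[[Optional[str]], Page]


def make_fake_pager(pages: List[List[str]]) -> GetPage:
    """Hypothetical in-memory stand-in for GetLogEvents.

    Tokens are page indexes as strings. Past the last page it returns
    no events and echoes the caller's token, mimicking end of stream.
    """
    def get_page(token: Optional[str]) -> Page:
        index = 0 if token is None else int(token)
        if index >= len(pages):
            return {"events": [], "nextForwardToken": token}
        return {"events": pages[index], "nextForwardToken": str(index + 1)}
    return get_page


def drain_until_token_repeats(get_page: GetPage) -> List[str]:
    """Read pages until nextForwardToken equals the token that was sent."""
    messages: List[str] = []
    token: Optional[str] = None
    while True:
        page = get_page(token)
        messages.extend(page["events"])
        next_token = page["nextForwardToken"]
        if next_token == token:
            # Same token back: nothing more to read right now.
            return messages
        token = next_token


def drain_until_two_empty(get_page: GetPage) -> List[str]:
    """Simplified form of the older heuristic: stop after two empty pages."""
    messages: List[str] = []
    token: Optional[str] = None
    empty_in_a_row = 0
    while empty_in_a_row < 2:
        page = get_page(token)
        messages.extend(page["events"])
        empty_in_a_row = 0 if page["events"] else empty_in_a_row + 1
        token = page["nextForwardToken"]
    return messages


# Two empty pages in the middle simulate late-arriving log data.
pages = [["start"], [], [], ["done"]]
print(drain_until_token_repeats(make_fake_pager(pages)))
print(drain_until_two_empty(make_fake_pager(pages)))
```

With this fake stream, the token-based loop collects both messages, while the two-empty-pages loop stops after "start". Against the real service, a repeated token only means there is nothing more to read at that moment, so this check is presumably meaningful as an EOF signal only after the build has completed.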

As for why this was observed only with a CodeBuild compute fleet and not in on-demand mode, I think it might be related to how and when the CloudWatch agent pushes logs from the instance to the CloudWatch service. For example, the CloudWatch agent has configuration options such as force_flush_interval. With CodeBuild on-demand compute, the EC2 instance is terminated right after the build completes, and the CloudWatch agent pushes everything in memory to the CloudWatch service without waiting during the termination/shutdown process. With CodeBuild compute fleet mode, however, you get reserved EC2 instance capacity that isn't terminated after the build, so the CloudWatch agent honors that configuration to decide when to push logs to the CloudWatch service next. It seems to be a timing issue in certain scenarios.


mic-kul commented Sep 26, 2024

Thank you @shuohaoliu.

For context, another oddity we've noticed is that, when using a CodeBuild compute fleet, all logs in CloudWatch have the same timestamp: that of the very first log message, no matter how much sleep we add in bash ;)
We've already raised this with AWS Premium Support, and it was escalated to the CodeBuild team.

@shuohaoliu

The issue should now be fixed.
