Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log warning on same version upgrade to prevent failed status #6273

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

michel-laterman
Copy link
Contributor

@michel-laterman michel-laterman commented Dec 10, 2024

What does this PR do?

Log warning on same version upgrade and filter error to prevent upgrade status reporting.
If a user attempts to upgrade to the same version, the elastic-agent upgrade call will exit with an error, but the agent's status will be cleared of any upgrade markers.

Include an earlier check for same-version upgrades before artifacts are downloaded to prevent unnecessary downloads

Why is it important?

Currently upgrading to the same version results in the agent reporting a failed state until a successful upgrade can be done.

Testing instructions

Install a stand-alone agent and use elastic-agent upgrade 9.0.0-SNAPSHOT --source-uri file://path/to/dir --skip-verification to upgrade the agent

For example:

# Install the agent
elastic-agent-9.0.0-SNAPSHOT-darwin-aarch64|⇒ sudo ./elastic-agent install --develop
Installing into development namespace; this is an experimental and currently unsupported feature.
Elastic Agent will be installed at /Library/Elastic/Agent-Development and will run as a service. Do you want to continue? [Y/n]:y
Do you want to enroll this Agent into Fleet? [Y/n]:n
[   =] Service Started  [4s] Elastic Agent successfully installed, starting enrollment.
[   =] Done  [4s]
Elastic Agent has been successfully installed.

# Check status and version
elastic-agent-9.0.0-SNAPSHOT-darwin-aarch64|⇒ sudo elastic-development-agent version
Binary: 9.0.0-SNAPSHOT (build: 14e78753e6c06295e3348a00de01ae54905fb2c0 at 2024-12-18 18:30:32 +0000 UTC)
Daemon: 9.0.0-SNAPSHOT (build: 14e78753e6c06295e3348a00de01ae54905fb2c0 at 2024-12-18 18:30:32 +0000 UTC)
elastic-agent-9.0.0-SNAPSHOT-darwin-aarch64|⇒ sudo elastic-development-agent status
┌─ fleet
│  └─ status: (STOPPED) Not enrolled into Fleet
└─ elastic-agent
   └─ status: (HEALTHY) Running

# attempt to upgrade to the same version
elastic-agent-9.0.0-SNAPSHOT-darwin-aarch64|⇒ sudo elastic-development-agent upgrade 9.0.0-SNAPSHOT --source-uri file:///Users/mlaterman/git/elastic-agent/build/distributions/ --skip-verify
Upgrade triggered to version 9.0.0-SNAPSHOT, Elastic Agent is currently restarting
elastic-agent-9.0.0-SNAPSHOT-darwin-aarch64|⇒ echo $?
0
elastic-agent-9.0.0-SNAPSHOT-darwin-aarch64|⇒ sudo elastic-development-agent status
┌─ fleet
│  └─ status: (STOPPED) Not enrolled into Fleet
└─ elastic-agent
   └─ status: (HEALTHY) Running
elastic-agent-9.0.0-SNAPSHOT-darwin-aarch64|⇒ sudo elastic-development-agent version
Binary: 9.0.0-SNAPSHOT (build: 14e78753e6c06295e3348a00de01ae54905fb2c0 at 2024-12-18 18:30:32 +0000 UTC)
Daemon: 9.0.0-SNAPSHOT (build: 14e78753e6c06295e3348a00de01ae54905fb2c0 at 2024-12-18 18:30:32 +0000 UTC)

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

N/A

Related issues

@michel-laterman michel-laterman added bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team backport-8.x Automated backport to the 8.x branch with mergify backport-8.16 Automated backport with mergify backport-8.17 Automated backport with mergify labels Dec 10, 2024
@michel-laterman michel-laterman marked this pull request as ready for review December 13, 2024 00:25
@michel-laterman michel-laterman requested a review from a team as a code owner December 13, 2024 00:25
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

}

// check version before download
same, _ := isSameVersion(u.log, currentVersion, packageMetadata{}, version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check will not work for snapshot upgrades.
The packageMetadata is needed to check on the hash for snapshots in the check below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it should work for non-snapshot attempts, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pchila so in this case this will fail because hash is unpopulated right? but then later it is extracted and checked again on line 224. so we lose a bit of time but will not perform upgrade

Copy link
Member

@pchila pchila Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are relying on the fact that the current version has a not empty hash compared to the zero value in packageMetadata{} so I don't see how it can ever detect this to be the same version as the current one if the hash in packageMetadata is ""

I guess we will never get true from here... Did you test this ? Is there a case where this returns true ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're correct, I forgot that comparing currentVersion vs version would also compare the hashes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a new func that does this comparison correctly.

internal/pkg/agent/application/upgrade/upgrade.go Outdated Show resolved Hide resolved
testing/integration/upgrade_standalone_same_commit_test.go Outdated Show resolved Hide resolved
@cmacknz
Copy link
Member

cmacknz commented Dec 17, 2024

If a user attempts to upgrade to the same version, the elastic-agent upgrade call will exit with an error,

You don't want it to return an error. It should be treated like an idempotent API call and return success because the upgrade was already done successfully in the past. This is to handle the case where someone uses an orchestration tool to ensure all of their agents are a specific version. Without an error, they can unconditionally run elastic-agent upgrade, with an error they need to check and parse the version first.

Edit: this is also true for the Fleet upgrade action and is what caused the bug. Someone unconditionally upgraded all Fleet managed agents to a specific version regardless of what version they currently were as a way of driving consistency.

if errors.Is(err, upgrade.ErrUpgradeSameVersion) {
// Set upgrade state to completed, but return an error if a same version-upgrade is attempted.
det.SetState(details.StateCompleted)
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to consider this an error, that is what lead to the bug this is fixing in the first place. It needs to be like an idempotent API call. It returns success if the action has already completed successfully once.

Copy link
Contributor

@swiatekm swiatekm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but someone with more familiarity with the domain should review as well before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify backport-8.16 Automated backport with mergify backport-8.17 Automated backport with mergify bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
6 participants