Fix infrastructure leak on exception while attaching/detaching volumes in VMware #10860

Open
wants to merge 2 commits into
base: main

erikbocks
Contributor

Description

In VMware environments, when a VM resides on a host in the Disconnected state and an attach or detach volume operation is initiated, the exception returned to the user contains infrastructure data. This PR addresses the issue by handling AgentUnavailableException separately, so that the user-facing error no longer carries that data. The exception still appears in the application logs, allowing operators to troubleshoot effectively.
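
The pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual change in VolumeApiServiceImpl: the nested exception classes, the `sendAttachCommand` helper, the host address, and the message wording are all hypothetical stand-ins, and the real AgentUnavailableException in CloudStack has a different shape.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class VolumeAttachSketch {
    private static final Logger LOGGER = Logger.getLogger(VolumeAttachSketch.class.getName());

    // Hypothetical stand-in for CloudStack's AgentUnavailableException.
    static class AgentUnavailableSketchException extends Exception {
        AgentUnavailableSketchException(String message) {
            super(message);
        }
    }

    // Hypothetical user-facing exception that carries only a sanitized message.
    static class SanitizedApiException extends RuntimeException {
        SanitizedApiException(String message) {
            super(message);
        }
    }

    // Simulates sending the attach command to a host in the Disconnected state.
    // The thrown message deliberately contains infrastructure details.
    static void sendAttachCommand(long hostId, String hostAddress)
            throws AgentUnavailableSketchException {
        throw new AgentUnavailableSketchException(
                "Host " + hostId + " (" + hostAddress + ") is in Disconnected state");
    }

    static void attachVolume(long volumeId, long hostId, String hostAddress) {
        try {
            sendAttachCommand(hostId, hostAddress);
        } catch (AgentUnavailableSketchException e) {
            // The full exception, including host details, stays in the application log.
            LOGGER.log(Level.WARNING, "Unable to attach volume " + volumeId, e);
            // The caller only sees a generic message without host details.
            throw new SanitizedApiException(
                    "Unable to attach volume " + volumeId + ": the host is unavailable.");
        }
    }

    public static void main(String[] args) {
        try {
            attachVolume(42L, 7L, "10.0.0.5");
        } catch (SanitizedApiException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Catching the specific exception type, rather than a broad `Exception`, keeps other failure paths unchanged while still letting operators find the original cause in the logs.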

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • build/CI
  • test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

I ran the following tests in my local lab:

  1. Created a new VM and attached a volume to it.
  2. Shut down my VMware host.
  3. Tried to attach a new volume; the exception containing the infrastructure data was thrown.
  4. Tried to detach the previously attached volume; the same exception was thrown.
  5. Built and installed CloudStack's packages with my fix.
  6. Repeated the same steps and verified that the new error message contained no infrastructure data.
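
The final verification step could in principle be automated. The sketch below is only an illustration under stated assumptions: it treats IPv4 addresses and numeric host references as "infrastructure data", and both sample messages are invented for the example rather than taken from CloudStack's output.

```java
import java.util.regex.Pattern;

class ErrorMessageCheckSketch {
    // Assumption: an IPv4-looking token counts as infrastructure data.
    private static final Pattern IPV4 =
            Pattern.compile("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b");
    // Assumption: "host 7", "Host:7" or "host #7" counts as infrastructure data.
    private static final Pattern HOST_REF =
            Pattern.compile("(?i)\\bhost\\s*[:#]?\\s*\\d+\\b");

    // Returns true when the message matches either hypothetical pattern.
    static boolean leaksInfrastructureData(String message) {
        return IPV4.matcher(message).find() || HOST_REF.matcher(message).find();
    }

    public static void main(String[] args) {
        // Invented sample messages, used only to exercise the check.
        String before = "Host 7 (10.0.0.5) is in Disconnected state";
        String after = "Unable to attach volume: the host is unavailable.";
        System.out.println(leaksInfrastructureData(before)); // true
        System.out.println(leaksInfrastructureData(after));  // false
    }
}
```

A check like this only catches the patterns it is told about, so it complements rather than replaces the manual inspection described in the steps above.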

Contributor

@DaanHoogland DaanHoogland left a comment

clgtm

codecov bot commented May 13, 2025

Codecov Report

Attention: Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.

Project coverage is 16.57%. Comparing base (39c5641) to head (b4495d9).
Report is 113 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...n/java/com/cloud/storage/VolumeApiServiceImpl.java | 0.00% | 5 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #10860      +/-   ##
============================================
+ Coverage     16.40%   16.57%   +0.16%     
- Complexity    13590    13869     +279     
============================================
  Files          5692     5719      +27     
  Lines        501976   507206    +5230     
  Branches      60795    61575     +780     
============================================
+ Hits          82369    84089    +1720     
- Misses       410449   413698    +3249     
- Partials       9158     9419     +261     
| Flag | Coverage Δ |
|---|---|
| uitests | 3.96% <ø> (-0.04%) ⬇️ |
| unittests | 17.45% <0.00%> (+0.18%) ⬆️ |

@erikbocks
Contributor Author

Thank you for the review @sureshanaparti.

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13675

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian test result (tid-13500)
Environment: kvm-ol8 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 89073 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr10860-t13500-kvm-ol8.zip
Smoke tests completed. 130 look OK, 11 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_nic_secondaryip_add_remove | Error | 1518.41 | test_multipleips_per_nic.py |
| ContextSuite context=TestNestedVirtualization>:setup | Error | 0.00 | test_nested_virtualization.py |
| ContextSuite context=TestNetworkACL>:setup | Error | 0.00 | test_network_acl.py |
| ContextSuite context=TestIpv6Network>:setup | Error | 0.00 | test_network_ipv6.py |
| test_delete_account | Error | 1517.39 | test_network.py |
| test_delete_network_while_vm_on_it | Error | 1.26 | test_network.py |
| test_deploy_vm_l2network | Error | 1.20 | test_network.py |
| test_l2network_restart | Error | 2.35 | test_network.py |
| ContextSuite context=TestPortForwarding>:setup | Error | 3.59 | test_network.py |
| ContextSuite context=TestPublicIP>:setup | Error | 12.44 | test_network.py |
| test_reboot_router | Failure | 0.09 | test_network.py |
| test_releaseIP | Error | 6.53 | test_network.py |
| test_releaseIP_using_IP | Error | 6.02 | test_network.py |
| ContextSuite context=TestRouterRules>:setup | Error | 6.11 | test_network.py |
| ContextSuite context=TestSharedNetworkWithConfigDrive>:setup | Error | 1521.96 | test_network.py |
| ContextSuite context=TestPrivateGwACL>:setup | Error | 0.00 | test_privategw_acl.py |
| ContextSuite context=TestAdapterTypeForNic>:setup | Error | 0.00 | test_nic_adapter_type.py |
| ContextSuite context=TestNonStrictAffinityGroups>:setup | Error | 0.00 | test_nonstrict_affinity_group.py |
| ContextSuite context=TestIsolatedNetworksPasswdServer>:setup | Error | 0.00 | test_password_server.py |
| ContextSuite context=TestPortForwardingRules>:setup | Error | 0.00 | test_portforwardingrules.py |
| ContextSuite context=TestProjectSuspendActivate>:setup | Error | 1529.70 | test_projects.py |

@DaanHoogland
Contributor

@blueorangutan test ol8 vmware-80u3

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-80u3) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-13517)

@DaanHoogland
Contributor

@blueorangutan test ol8 vmware-70u3

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-70u3) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-13521)

4 participants