Controlplane part of forwarder-vpp leaks #1129

Open
10 tasks done
NikitaSkrynnik opened this issue Jun 25, 2024 · 3 comments
Assignees
Labels
bug Something isn't working performance The problem related to system effectivity stability The problem is related to system stability

Comments

NikitaSkrynnik commented Jun 25, 2024

Description

forwarder-vpp has two interface leaks:

  1. The timeout chain element doesn't call Close for expired connections, so the VPP interfaces are never deleted.
  2. Even when Close is called in forwarder-vpp, it doesn't delete the VXLAN interfaces.

Tasks

  • Investigate why timeout doesn't call Close - 8h
    • Make timeout chain element close connections faster - 1h
    • Add logs to timeout chain element - 1h
    • Run tests with the modified timeout - 3h
    • Check collected logs - 3h
  • Fix timeout chain element - 24h
  • Investigate why forwarder-vpp doesn't close vxlan interfaces - 6h
    • Run scaling tests - 3h
    • Check collected logs - 3h
  • Fix vxlan problem - 16h

Total: 54h

@NikitaSkrynnik

All changes that fix tap interface leaks are in these PRs:

  1. Leak fixes sdk#1643
  2. Some changes that fix inteface leaks sdk-vpp#835

denis-tingaikin commented Sep 24, 2024

It seems the problem still exists and the forwarder is still leaking.

image

The picture shows memory consumption for forwarder-vpp.

forwarder-vpp_goroutineprofiles_20240920082027.tar.gz
forwarder-vpp_memprofiles_20240920082112.tar.gz

@denis-tingaikin denis-tingaikin added bug Something isn't working stability The problem is related to system stability performance The problem related to system effectivity labels Sep 24, 2024

NikitaSkrynnik commented Oct 3, 2024

Current plan

  • Test 20 clients and 20 endpoints on an Azure cluster with 2 nodes for 24 hours. Collect memory profiles, goroutine profiles, VPP memory profiles and kubectl top output after 10 minutes of testing and after 24 hours of testing.
  • Test 40 clients and 1 endpoint on an Azure cluster with 2 nodes for 24 hours. Scale clients from 0 to 40 every 60 seconds. Collect memory profiles, goroutine profiles, VPP memory profiles and kubectl top output at the beginning, after 10 minutes of testing and after 24 hours of testing.
