flaky webserver 02 unit tests times-out #7008

Open
Tracked by #3421
sanderegg opened this issue Jan 7, 2025 · 3 comments · May be fixed by #7077
Labels: High Priority (a totally crucial bug/feature to be fixed asap), t:maintenance (some planned maintenance work)

Comments

@sanderegg
Member

sanderegg commented Jan 7, 2025


@sanderegg changed the title from "webserver 02 unit tests times-out" to "flaky webserver 02 unit tests times-out" Jan 7, 2025
@sanderegg added the t:maintenance label Jan 7, 2025
@pcrespov added this to the Event Horizon milestone Jan 8, 2025
@pcrespov
Member

pcrespov commented Jan 8, 2025

@pcrespov added the High Priority label Jan 8, 2025
@pcrespov
Member

pcrespov commented Jan 16, 2025

The problem is that, during the test teardown, the Postgres database tables are dropped while some transactions are still open, which causes the database to lock.

Using `docker exec ... psql test -U admin`, we can open a console on the test Postgres database and find the unclosed transactions:

```sql
SELECT pid, query, state, wait_event_type, wait_event, application_name
FROM pg_stat_activity
WHERE state = 'idle in transaction';
```

and we can close them with

```sql
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction';
```

By enabling lock-wait and statement logging in the Postgres container's `command`:

```yaml
command:
  - "postgres"
  - "-c"
  - "log_lock_waits=on"
  - "-c"
  - "log_statement=all"
```

we see in the logs that two processes (here 85 and 91) hold a lock on the relation while a third process (96) waits for an AccessExclusiveLock on it:

```text
2025-01-16 13:34:33.575 UTC [91] LOG:  duration: 504.342 ms
2025-01-16 13:34:34.204 UTC [96] LOG:  process 96 still waiting for AccessExclusiveLock on relation 16432 of database 16384 after 1000.075 ms
2025-01-16 13:34:34.204 UTC [96] DETAIL:  Processes holding the lock: 85, 91. Wait queue: 96.  <------------------
2025-01-16 13:34:34.204 UTC [96] STATEMENT:
```

Further debugging suggests that this issue may stem from a fire-and-forget task that is still running while the tests attempt to drop the tables. The task keeps its transaction open while the test teardown tries to drop the affected table. This remains a hypothesis and requires further investigation.

For now, #7018 adds a command to the teardown that closes any remaining transactions, allowing the test to proceed. However, this may point to an underlying bug in the system.
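A teardown along those lines could look like the following sketch. It is not the code from #7018: `terminate_idle_in_transaction` and `teardown_database` are hypothetical names, `connection` is assumed to be any DB-API 2.0 (PEP 249) connection to the test database, and `drop_all_tables` stands in for whatever the fixture uses to drop the tables. The point is the ordering: leftover transactions are terminated first, so DROP TABLE no longer waits on their locks.

```python
from typing import Callable, List

# Same query as above, additionally returning the pid so the teardown can log it.
TERMINATE_IDLE_IN_TRANSACTION = """
SELECT pid, pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction';
"""


def terminate_idle_in_transaction(connection) -> List[int]:
    """Terminate backends stuck 'idle in transaction'; return the terminated pids."""
    cursor = connection.cursor()
    try:
        cursor.execute(TERMINATE_IDLE_IN_TRANSACTION)
        rows = cursor.fetchall()
    finally:
        cursor.close()
    # Commit so that this connection does not itself stay 'idle in transaction'.
    connection.commit()
    # pg_terminate_backend returns true only if the backend was signalled.
    return [pid for pid, terminated in rows if terminated]


def teardown_database(connection, drop_all_tables: Callable[[], None]) -> List[int]:
    """Hypothetical teardown: release leftover locks first, then drop the tables."""
    terminated = terminate_idle_in_transaction(connection)
    drop_all_tables()
    return terminated
```

With psycopg2, for example, `connection` would be the object returned by `psycopg2.connect(...)`.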

@pcrespov moved this from the Event Horizon milestone to the Singularity milestone Jan 27, 2025
@GitHK
Contributor

GitHK commented Jan 27, 2025

I also hit this in one of my PRs, in unit-servicelibrary. Same thing, it just hangs: https://github.com/ITISFoundation/osparc-simcore/actions/runs/12988958713/job/36221158635?pr=7075#step:9:280

@pcrespov linked a pull request Jan 27, 2025 that will close this issue