Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Github Actions #834

Merged
merged 83 commits into from
Jul 14, 2023
Merged

Fix Github Actions #834

merged 83 commits into from
Jul 14, 2023

Conversation

andarut
Copy link
Contributor

@andarut andarut commented May 25, 2023

The problem

Github Actions fails randomly. On my fork i rerun all jobs for same code many times. And got this:
16831599489896

I think it's very important to fix actions, because without all checks passed branch can't be merged and of course there is no trust to PR. For example my previous PR Fix build C libraries inside kphp. @Danil42Russia rerun checks on my PR 3 times and no one succed. So branch with useful fix can't be merged, because of random. And this is not about code in PR, this is about randomly failed tests. You can check my fork for more (i tested another my PR Add mbstring functions to kphp): my fork.

The path and fix

Linux build

So, there is 5 randomly failing python tests (timeout error):

  • test_define_from_config
  • test_script_errors
  • test_headers_limit
  • test_post_limit
  • test_query_limit

The cause is that the latest version of requests does not support urllib3 2.0.
This is triggered by versions of requests-toolbelt and urllib3 that were both released in the past few weeks (May 1 and May 4, 2023). This is already an issue in requests-toolbelt. This is refer to urllib3 2.0 bug: urllib3 keeps a reference to that exception to include it in the MaxRetryError exception it will eventually raise. This means the garbage collector will not close the socket on its own, which means the connection will be kept open. So when urllib3 retries, it won't be able to open a connection, because the server is waiting for the previous try to finish. That's a deadlock!

Note! I've tried to rewrite send_http_request function from tests/python/lib/http_server.py with urllib requests. This is wrong way, urllib have different work style. Downgrade urllib much more safety.

And 1 test, randomly failing with RuntimeError: Got bad stat line error

  • test_store_fetch_delete

To fix this, i changed tests/python/lib/stats_receiver.py that way, so now when we receive not complete stats key, we just skip it. That's right because after this test fails, even logs shows that stats file is full and right. So sometimes engine don't be on time to write full string, when test runs. But old code already have timeout function for that (wait_next_stats). So i think when we wait for updating stats and got not full key, we must't throw runtime error. Also when get ValueError when unpacking key, this is also reason to wait for engine to write.

And 1 test, randomly failing because of timeout:

  • test_job_stack_overflow_error

Test deleted, because of incorrect work with ASAN. ASAN randomly shutdown worker when getting stackoverflow error, so we have timeout.

MacOS build

So, there is 3 failed cpp tests:

  • counter_test
  • parallel_limit_counter
  • parallel_counter

All tests deal with threads. I think that slow syscalls might cause other threads to spawn to take over. Github Actions default mac os is 3 cores machine, so maybe the efficiency cores are too slow to take over and kphp thinks the syscall takes too long. So i disable these tests similar to maximum-test.cpp (which also deals with threads).

Result

Result
Снимок экрана 2023-05-27 в 23 57 46

@troy4eg troy4eg self-requested a review May 25, 2023 18:43
@Danil42Russia Danil42Russia added the refactoring Logic and code style improvements label May 25, 2023
@DrDet DrDet self-requested a review June 1, 2023 16:29
@Danil42Russia Danil42Russia removed the refactoring Logic and code style improvements label Jun 2, 2023
Danil42Russia
Danil42Russia previously approved these changes Jun 2, 2023
Copy link
Contributor

@Danil42Russia Danil42Russia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I give my approve, ONLY for running tests.

This does not mean that this PR is ready to merge in master.

@Danil42Russia Danil42Russia self-requested a review June 2, 2023 11:36
@Danil42Russia Danil42Russia dismissed their stale review June 2, 2023 18:50

Tests passed

@andarut andarut force-pushed the fix branch 13 times, most recently from 225097d to 4aa15cd Compare July 12, 2023 20:15
Comment on lines 78 to 83
# def test_job_stack_overflow_error(self):
# error_code = self.JOB_STACK_OVERFLOW_ERROR
# data = [[1, 2, 3, 4], [7, 9, 12]]
# buffers = 4
# stats_before = self.kphp_server.get_stats()
# resp = self.kphp_server.http_post(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we'll remove this completely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done.

@andarut andarut requested a review from DrDet July 14, 2023 15:50
@andarut andarut merged commit 60656e1 into VKCOM:master Jul 14, 2023
@andarut andarut added this to the next milestone Jul 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants