-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add build id to workflow info #458
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just have to skip the test for the time-skipping server
tests/worker/test_workflow.py
Outdated
self.do_finish = True | ||
|
||
|
||
async def test_workflow_current_build_id_appropriately_set(client: Client): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
async def test_workflow_current_build_id_appropriately_set(client: Client): | |
async def test_workflow_current_build_id_appropriately_set(client: Client): | |
if env.supports_time_skipping: | |
pytest.skip("Java test server does not support worker versioning") | |
if not await worker_versioning_enabled(client): | |
pytest.skip("This server does not have worker versioning enabled") |
Ignore this test when on time-skipping server or on older server
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worker versioning isn't actually being used here - just the fields about build id which should work in the test server, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I saw a CI error when we run these tests against time skipping. But now I'm seeing some other update error in regular tests which is preventing it from getting to the time-skipping tests that show the time-skipping server failed somehow here. But yes, can just:
async def test_workflow_current_build_id_appropriately_set(client: Client): | |
async def test_workflow_current_build_id_appropriately_set(client: Client): | |
if env.supports_time_skipping: | |
pytest.skip("Java test server does not support worker versioning") |
and ignore the versioning part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean it shouldn't need to be skipped at all, the propagation of this data should also work on test server
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷 Something else is breaking CI before it even gets to time-skipping now, unsure what, I can debug if necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't know what's up with that. It's the test I wrote back when I added update but it was working perfectly fine but seems consistently broken here, but nothing here should affect it... (and it runs fine locally)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, weird. I did manage to repro it once locally. I think it just flakes very rarely. Looks like there might be some race where occasionally the update handler is not registered on the first task. I'm not totally sure how that could be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, something confusing is happening here. From the code, surely the start workflow task should run before the update attempt. I can't see from the code how that could be either. I wonder in the case where you repro if somehow the activation has the update before the start?
I do think it's quite an important find and I think we should definitely dig into it, but I understand if you don't want it holding this up. You can either open an issue and skip the test (or that part at least), or feel free to investigate inline if you don't mind it holding this up. I was gonna ask internally whether there was any known issue where an update event coming before a start event on the server, but then I recall it's actually sent as a "protocol". So I wonder if that protocol event interleaving code either has a random issue putting it before start in the activation, or the event sequencing number put on the protocol message is wrong from the server side. I wonder if we can repro more easily by not starting the worker until after we've sent the update ensuring that start and update are in the same activation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, we figured this out. We intentionally order updates before start, so in the case where your client command start-then-updates before the task is delivered, update will run first. And we don't have a buffer-update-for-life-of-task like Go, Java, and TS do. We will discuss internally, but in the meantime this test is a bit flawed in assuming updates always come after start. Can fix now, or skip that assertion and open issue (we're probably going to have to open an issue anyways).
ce162e2
to
c693704
Compare
fb4f6a0
to
4fe2329
Compare
Update testsrv protos (#668)
4fe2329
to
d4ae5c3
Compare
What was changed
Why?
Part of temporalio/features#253
Checklist
Closes
How was this tested:
Added IT
Any docs updates needed?