Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add build id to workflow info #458

Merged
merged 9 commits into from
Jan 12, 2024
Merged

Add build id to workflow info #458

merged 9 commits into from
Jan 12, 2024

Conversation

Sushisource
Copy link
Member

What was changed

  • Add Build ID to workflow info
  • Update Core

Why?

Part of temporalio/features#253

Checklist

  1. Closes

  2. How was this tested:
    Added IT

  3. Any docs updates needed?

@Sushisource Sushisource requested a review from a team as a code owner January 10, 2024 00:02
Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just have to skip the test for the time-skipping server

temporalio/workflow.py Outdated Show resolved Hide resolved
self.do_finish = True


async def test_workflow_current_build_id_appropriately_set(client: Client):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async def test_workflow_current_build_id_appropriately_set(client: Client):
async def test_workflow_current_build_id_appropriately_set(client: Client):
if env.supports_time_skipping:
pytest.skip("Java test server does not support worker versioning")
if not await worker_versioning_enabled(client):
pytest.skip("This server does not have worker versioning enabled")

Ignore this test when on time-skipping server or on older server

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worker versioning isn't actually being used here - just the fields about build id which should work in the test server, right?

Copy link
Member

@cretz cretz Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I saw a CI error when we run these tests against time skipping. But now I'm seeing some other update error in regular tests which is preventing it from getting to the time-skipping tests that show the time-skipping server failed somehow here. But yes, can just:

Suggested change
async def test_workflow_current_build_id_appropriately_set(client: Client):
async def test_workflow_current_build_id_appropriately_set(client: Client):
if env.supports_time_skipping:
pytest.skip("Java test server does not support worker versioning")

and ignore the versioning part.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean it shouldn't need to be skipped at all, the propagation of this data should also work on test server

Copy link
Member

@cretz cretz Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷 Something else is breaking CI before it even gets to time-skipping now, unsure what, I can debug if necessary

Copy link
Member Author

@Sushisource Sushisource Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't know what's up with that. It's the test I wrote back when I added update but it was working perfectly fine but seems consistently broken here, but nothing here should affect it... (and it runs fine locally)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, weird. I did manage to repro it once locally. I think it just flakes very rarely. Looks like there might be some race where occasionally the update handler is not registered on the first task. I'm not totally sure how that could be.

Copy link
Member

@cretz cretz Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, something confusing is happening here. From the code, surely the start workflow task should run before the update attempt. I can't see from the code how that could be either. I wonder in the case where you repro if somehow the activation has the update before the start?

I do think it's quite an important find and I think we should definitely dig into it, but I understand if you don't want it holding this up. You can either open an issue and skip the test (or that part at least), or feel free to investigate inline if you don't mind it holding this up. I was gonna ask internally whether there was any known issue where an update event coming before a start event on the server, but then I recall it's actually sent as a "protocol". So I wonder if that protocol event interleaving code either has a random issue putting it before start in the activation, or the event sequencing number put on the protocol message is wrong from the server side. I wonder if we can repro more easily by not starting the worker until after we've sent the update ensuring that start and update are in the same activation.

Copy link
Member

@cretz cretz Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we figured this out. We intentionally order updates before start, so in the case where your client command start-then-updates before the task is delivered, update will run first. And we don't have a buffer-update-for-life-of-task like Go, Java, and TS do. We will discuss internally, but in the meantime this test is a bit flawed in assuming updates always come after start. Can fix now, or skip that assertion and open issue (we're probably going to have to open an issue anyways).

Update testsrv protos (#668)
@Sushisource Sushisource merged commit 4904054 into main Jan 12, 2024
12 checks passed
@Sushisource Sushisource deleted the build-id-wfinfo branch January 12, 2024 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants