Adds Next Retry Delay for TS SDK #3069

mjameswh · 2024-09-06T19:24:01Z

What does this PR do?

Add instructions on how to use Next Retry Delay in TypeScript.
Update and restructure the TS Failure Detection dev guide page to be more in line with .Net, which better reflects what the Education team is aiming for.

Notes to reviewers

I diverted a bit from the Next Retry Delay doc for the Java SDK introduced in this PR, to try to better align with the style of the .Net's Failure Detection page.
You may want to also get a look at diff between my proposed TS Failure Detection page (/docs/develop/typescript/failure-detection.mdx) and the existing .Net's Failure Detection page. I diverged in a few places to fix minor things that were IMHO inconsistencies/errors. You will either want to accept my corrections and transpose them to the .Net page, or revert them to the original .Net text.

docs/develop/typescript/cancellation.mdx

docs/develop/typescript/failure-detection.mdx

docs/develop/java/failure-detection.mdx

fairlydurable · 2024-09-06T20:52:55Z

docs/develop/typescript/cancellation.mdx

+
+## Cancel an Activity from a Workflow {#cancel-an-activity}
+
+Canceling an Activity from within a Workflow requires that the Activity Execution sends Heartbeats and sets a Heartbeat Timeout.


Link to Heartbeat recommended, plus a sentence or so along the lines that a Heartbeat is a proof-of-life timeout used for long-running Activities.
This sentence also parallels 391 quite a lot. This is a good place once your thoughts are down to step back and consider wording for clarity. (Oh, I see Angela got there first. Thanks, Angela!)

@fairlydurable I only moved that subsection out from the Failure Detection page. I agree it needs rework, but if you don't mind, I would prefer to consider that one out of scope for this PR.

I haven't looked at what's already written on the .Net Cancellation page; when we get there (either I or somebody else), the .Net content should probably be the reference (i.e. rather than this legacy TS doc versino).

You're doing something awesome. We appreciate it.

fairlydurable · 2024-09-06T21:00:33Z

docs/develop/typescript/cancellation.mdx

+## Cancel an Activity from a Workflow {#cancel-an-activity}
+
+Canceling an Activity from within a Workflow requires that the Activity Execution sends Heartbeats and sets a Heartbeat Timeout.
+If the Heartbeat is not invoked, the Activity cannot receive a cancellation request.


Simpler: Activities without Heartbeats can't be canceled.
Good opportunity to explain why this is necessary. I'm asking in tech-qs to see if I understand it correctly.

Activity cancellations are sent by the server to the worker through the RecordActivityTaskHeartbeatResponse. So if there is no heartbeat request, then there is no heartbeat response, and consequently, the worker never learns about the activity cancellation request. (edited)

fairlydurable · 2024-09-06T21:01:39Z

docs/develop/typescript/cancellation.mdx

+
+Canceling an Activity from within a Workflow requires that the Activity Execution sends Heartbeats and sets a Heartbeat Timeout.
+If the Heartbeat is not invoked, the Activity cannot receive a cancellation request.
+When any non-immediate Activity is executed, the Activity Execution should send Heartbeats and set a [Heartbeat Timeout](/encyclopedia/detecting-activity-failures#heartbeat-timeout) to ensure that the server knows it is still working.


What is a "non-immediate" Activity? One that is asynchronous? One that doesn't return control and blocks? Long lived? If I were reading this, I'd be confused.

fairlydurable · 2024-09-06T21:06:23Z

docs/develop/typescript/cancellation.mdx

+When any non-immediate Activity is executed, the Activity Execution should send Heartbeats and set a [Heartbeat Timeout](/encyclopedia/detecting-activity-failures#heartbeat-timeout) to ensure that the server knows it is still working.
+
+When an Activity is canceled, an error is raised in the Activity at the next available opportunity.
+If cleanup logic needs to be performed, it can be done in a `finally` clause or inside a caught cancel error.


I'd specify the error if there's a parent cancelation error (not sure there is in typescript) and link to the API docs for it. If this doesn't apply, no worries.

Also, who is catching the error? The Workflow or the Activity if it isn't in the finally clause?

fairlydurable · 2024-09-06T21:09:24Z

docs/develop/typescript/cancellation.mdx

+
+When an Activity is canceled, an error is raised in the Activity at the next available opportunity.
+If cleanup logic needs to be performed, it can be done in a `finally` clause or inside a caught cancel error.
+However, for the Activity to appear canceled the exception needs to be re-raised.


More normally we "re-throw" exceptions.

I checked and there are 3 uses, but I have never heard re-raised outside of baking.

fairlydurable · 2024-09-06T21:13:53Z

docs/develop/typescript/cancellation.mdx

+
+:::note
+
+Unlike regular Activities, [Local Activities](/activities#local-activity) can be canceled if they don't send Heartbeats.


without sending Heartbeats.

Current phrasing makes it sound like it's a precondition rather than a choice.
They're used for short-lived operations and run in the same process as the Workflow that spawns them -- at least I believe so, even though you said Worker process on 400. They have some limitations like no routing but I don't think it helps mentioning them here

fairlydurable · 2024-09-06T21:16:21Z

docs/develop/typescript/cancellation.mdx

+:::note
+
+Unlike regular Activities, [Local Activities](/activities#local-activity) can be canceled if they don't send Heartbeats.
+Local Activities are handled locally, and all the information needed to handle the cancellation logic is available in the same Worker process.


I think s/locally/in the same process as the Worker that spawned them/ would be clearer (assuming I'm accurate, which is always something you should be skeptical about)

docs/develop/typescript/failure-detection.mdx

fairlydurable · 2024-09-06T21:20:40Z

docs/develop/typescript/failure-detection.mdx


 Each Workflow timeout controls the maximum duration of a different aspect of a Workflow Execution.

-Workflow timeouts are set when [starting the Workflow Execution](#workflow-timeouts).
+Workflow Timeouts are set when [starting the Workflow Execution](#workflow-timeouts).


Consider "A Client sets Workflow Timeouts when [...." because it gives this an actor vs passive voice, and also explains how those timeouts are set

Actually, the phrase was incomplete; clarified that this is possible on both the Client and Workflow APIs, which unfortunately makes the active voice less applicable.

fairlydurable · 2024-09-06T21:21:12Z

docs/develop/typescript/failure-detection.mdx


 - **[Workflow Execution Timeout](/encyclopedia/detecting-workflow-failures#workflow-execution-timeout)** - restricts the maximum amount of time that a single Workflow Execution can be executed.
 - **[Workflow Run Timeout](/encyclopedia/detecting-workflow-failures#workflow-run-timeout):** restricts the maximum amount of time that a single Workflow Run can last.
 - **[Workflow Task Timeout](/encyclopedia/detecting-workflow-failures#workflow-task-timeout):** restricts the maximum amount of time that a Worker can execute a Workflow Task.

-Create an instance of `WorkflowOptions` from the Client and set your Workflow Timeout.
+The following properties can be set on the [`WorkflowOptions`](https://typescript.temporal.io/api/interfaces/client.WorkflowOptions/) when starting a Workflow using either the Client or Workflow API:


Technically, isn't the Workflow API still acting as a client? Or am I missing something?

No.

One can't use the client API from a Workflow, as the client API is not replace-aware and doing network requests could introduce non-deterministic behaviors.

Therefore, one has to use the Workflow API, which creates Workflow commands. This goes through a completely different path. And even though some operations are available both through the Client and the Workflow APIs, available options and the call semantic may differ.

Oh, I get it now. I wasn't thinking of this from that angle. I thought you meant Workflow API meaning only "creating a new Workflow Execution", not "Using the Workflow type's API surface to create another Workflow" (Unless I've misunderstood again.)

docs/develop/typescript/failure-detection.mdx

fairlydurable · 2024-09-06T21:25:43Z

docs/develop/typescript/failure-detection.mdx


- **[Schedule-To-Close Timeout](/encyclopedia/detecting-activity-failures#schedule-to-close-timeout):** is the maximum amount of time allowed for the overall [Activity Execution](/activities#activity-execution).
- **[Start-To-Close Timeout](/encyclopedia/detecting-activity-failures#start-to-close-timeout):** is the maximum time allowed for a single [Activity Task Execution](/workers#activity-task-execution).
+- **[Schedule-To-Close Timeout](/encyclopedia/detecting-activity-failures#schedule-to-close-timeout):** is the maximum amount of time allowed for the overall [Activity Execution](/activities#activity-execution);


I think "entire" is better than "overall" because it relates to the duration and is more specific. I'd also use the whole from-to (from scheduling to reaching the Closed state or whatever it actually is). It doesn't take many words and it helps differentiate two things that sound really similar when doing a quick visual scan through docs.

We normally don't use semicolons in lists. We skip punctuation. (I know what you're replacing doesn't skip.)

fairlydurable · 2024-09-06T21:26:37Z

docs/develop/typescript/failure-detection.mdx


 A Retry Policy works in cooperation with the timeouts to provide fine controls to optimize the execution experience.

 Activity Executions are automatically associated with a default [Retry Policy](/encyclopedia/retry-policies) if a custom one is not provided.

-To set Activity Retry Policies in TypeScript, pass [`ActivityOptions.retry`](https://typescript.temporal.io/api/interfaces/common.ActivityOptions#retry) to [`proxyActivities`](https://typescript.temporal.io/api/namespaces/workflow/#proxyactivities).
+To set an Activity's Retry Policy in TypeScript, use the [`ActivityOptions.retry`](https://typescript.temporal.io/api/interfaces/common.ActivityOptions#retry) property when creating the corresponding Activity proxy function using the [`proxyActivities()`](https://typescript.temporal.io/api/namespaces/workflow#proxyactivities) API.


Maybe "assign the" vs "use the"

fairlydurable · 2024-09-06T21:29:01Z

docs/develop/typescript/failure-detection.mdx

-  }
-}
-```
+In addition to obtaining cancellation information, Heartbeats also support detail data that persists on the server for retrieval during Activity retry.


Does persisting on the server mean that it becomes part of the Event History? If so, I think it would be good to clarify that.

And is it cancellation information or cancellation requests?

Does persisting on the server mean that it becomes part of the Event History? If so, I think it would be good to clarify that.

To be exact, it gets persisted to the Workflow's Mutable State, which is a very advanced concept. There's no corresponding encyclopedia entry to which I could link to… I think it would be preferable not to mention that.

fairlydurable · 2024-09-06T21:30:25Z

docs/develop/typescript/failure-detection.mdx

-}
-```
+In addition to obtaining cancellation information, Heartbeats also support detail data that persists on the server for retrieval during Activity retry.
+If an Activity calls `heartbeat({ progress: 456 })` and then fails and gets retried, then on the next attempt, [`activityInfo().heartbeatDetails`](https://typescript.temporal.io/api/interfaces/activity.Info#heartbeatdetails) will contain the object `{ progress: 456 }`, which can be used to allow the activity to efficiently resume its work.


I'd make this more of an example, like an activity that is processing 1000 parts, so when it heartbeats it sends a progress payload, ensuring the activity picks up at the right place when re-tried, even if it is not on the same Worker, or similar.

fairlydurable · 2024-09-06T21:31:14Z

docs/develop/typescript/failure-detection.mdx


 A [Heartbeat Timeout](/encyclopedia/detecting-activity-failures#heartbeat-timeout) works in conjunction with [Activity Heartbeats](/encyclopedia/detecting-activity-failures#activity-heartbeat).

-To set a Heartbeat Timeout, use [`ActivityOptions.heartbeatTimeout`](https://typescript.temporal.io/api/interfaces/common.ActivityOptions#heartbeattimeout). If the Activity takes longer than that between heartbeats, the Activity is failed.
+To set an Activity's Heartbeat Timeout in TypeScript, set the [`ActivityOptions.heartbeatTimeout`](https://typescript.temporal.io/api/interfaces/common.ActivityOptions#heartbeattimeout) property when creating the corresponding Activity proxy functions using the [`proxyActivities()`](https://typescript.temporal.io/api/namespaces/workflow#proxyactivities) API.


If this time is exceeded...

drewhoskins-temporal · 2024-09-06T21:57:55Z

docs/develop/java/failure-detection.mdx

-### Overriding the retry interval with Next Retry Delay {#next-retry-delay}
+## Activity next Retry delay {#activity-next-retry-delay}
+
+**How to override the next Retry delay following an Activity failure using the Temporal Java SDK**


nit: "using the Temporal Java SDK" feels tedious and repetitive. Some of the above ones simply say "In Java" which seems better to me.

But docs team's house practices gets final word here.

I totally agree, though I think there's some SEO argument behind that.

I've been told to adhere to the structure and style of our .net's dev guide, as it is the latest and reflects the current standard that the doc team is trying to follow, hence that phrasing.

drewhoskins-temporal · 2024-09-06T22:05:07Z

docs/develop/typescript/failure-detection.mdx

+**How to override the next Retry delay following an Activity failure using the Temporal TypeScript SDK**
+
+The time to wait after a retryable Activity failure until the next retry is attempted is normally determined by that Activity's Retry Policy.
+However, an Activity may override that duration when explicitly failing with an [`ApplicationFailure`](https://typescript.temporal.io/api/classes/common.ApplicationFailure) by setting a next Retry delay.


I might suggest linking to "Application Failure" and the overview docs instead.

https://docs.temporal.io/references/failures#application-failure

That contains more broadly instructive information than the typescript one, while linking to typescript, and unfortunately AFAICT the typescript docs don't contain an uplink to the reference.

drewhoskins-temporal · 2024-09-06T22:11:39Z

docs/develop/typescript/failure-detection.mdx

+The time to wait after a retryable Activity failure until the next retry is attempted is normally determined by that Activity's Retry Policy.
+However, an Activity may override that duration when explicitly failing with an [`ApplicationFailure`](https://typescript.temporal.io/api/classes/common.ApplicationFailure) by setting a next Retry delay.
+
+To override the next Retry delay for an `ApplicationFailure` thrown by an Activity in TypeScript, provide the [`nextRetryDelay`](https://typescript.temporal.io/api/interfaces/common.ApplicationFailureOptions#nextretrydelay) property on the object argument of the [`ApplicationFailure.create()`](https://typescript.temporal.io/api/classes/common.ApplicationFailure#create) factory method.


"in Typescript" is already established by the heading.
Overall, this paragraph is simply restating what the code already says. I can't find any value in it other than the links, which would be better and more succinctly provided IMO in a "see also" links list.

I also prefer how the java example incorporates attempt count via a more real world example, explaining how to do backoff.

Adds Next Retry Delay for TS SDK

a1cf344

mjameswh requested a review from a team as a code owner September 6, 2024 19:24

Merge branch 'main' into typescript-nextretrydelay

ddd9c46

angelazhou32 reviewed Sep 6, 2024

View reviewed changes

docs/develop/typescript/cancellation.mdx Show resolved Hide resolved

angelazhou32 reviewed Sep 6, 2024

View reviewed changes

docs/develop/typescript/failure-detection.mdx Outdated Show resolved Hide resolved

angelazhou32 reviewed Sep 6, 2024

View reviewed changes

docs/develop/typescript/failure-detection.mdx Outdated Show resolved Hide resolved

Address review comments

57572e3

fairlydurable reviewed Sep 6, 2024

View reviewed changes

docs/develop/java/failure-detection.mdx Outdated Show resolved Hide resolved

fairlydurable reviewed Sep 6, 2024

View reviewed changes

docs/develop/typescript/failure-detection.mdx Outdated Show resolved Hide resolved

fairlydurable reviewed Sep 6, 2024

View reviewed changes

docs/develop/typescript/failure-detection.mdx Show resolved Hide resolved

fairlydurable reviewed Sep 6, 2024

View reviewed changes

docs/develop/typescript/failure-detection.mdx Show resolved Hide resolved

Change capitalization: next Retry delay

cf84889

fairlydurable reviewed Sep 6, 2024

View reviewed changes

Update page description

ad77234

fairlydurable reviewed Sep 6, 2024

View reviewed changes

drewhoskins-temporal reviewed Sep 6, 2024

View reviewed changes

Address multiple review comments

01e68fd

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds Next Retry Delay for TS SDK #3069

Adds Next Retry Delay for TS SDK #3069

mjameswh commented Sep 6, 2024

fairlydurable Sep 6, 2024

mjameswh Sep 6, 2024

mjameswh Sep 6, 2024 •

edited

Loading

fairlydurable Sep 6, 2024

fairlydurable Sep 6, 2024

fairlydurable Sep 6, 2024

fairlydurable Sep 6, 2024

fairlydurable Sep 6, 2024

fairlydurable Sep 6, 2024

fairlydurable Sep 6, 2024

fairlydurable Sep 6, 2024

fairlydurable Sep 6, 2024

fairlydurable Sep 6, 2024

mjameswh Sep 6, 2024

fairlydurable Sep 6, 2024

mjameswh Sep 6, 2024

fairlydurable Sep 6, 2024

fairlydurable Sep 6, 2024

fairlydurable Sep 6, 2024

fairlydurable Sep 6, 2024

mjameswh Sep 6, 2024

fairlydurable Sep 6, 2024

fairlydurable Sep 6, 2024

drewhoskins-temporal Sep 6, 2024

mjameswh Sep 6, 2024

drewhoskins-temporal Sep 6, 2024

drewhoskins-temporal Sep 6, 2024


		## Cancel an Activity from a Workflow {#cancel-an-activity}

		Canceling an Activity from within a Workflow requires that the Activity Execution sends Heartbeats and sets a Heartbeat Timeout.


		:::note

		Unlike regular Activities, [Local Activities](/activities#local-activity) can be canceled if they don't send Heartbeats.

Adds Next Retry Delay for TS SDK #3069

Are you sure you want to change the base?

Adds Next Retry Delay for TS SDK #3069

Conversation

mjameswh commented Sep 6, 2024

What does this PR do?

Notes to reviewers

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mjameswh Sep 6, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mjameswh Sep 6, 2024 •

edited

Loading