Java based durable Functions are failing randomly #2946

Open
nayansc568 opened this issue Oct 18, 2024 · 2 comments
Labels
  • out-of-proc: Impacts non-.NET languages (e.g. JavaScript, Python, or PowerShell) which execute out-of-process
  • P1: Priority 1

Comments


nayansc568 commented Oct 18, 2024

Environment details:

  • Service: Azure Function Apps
  • Language: Java 17
  • OS: Linux
  • Runtime version (current): 4.1036.2.2

We have an orchestrator Durable Function that we invoke through HTTP. It in turn invokes an Activity Function.

The Activity Function runs some processing logic and returns its response, which the orchestrator then returns to the caller.

We are using Java-based Durable Functions. The issue occurs in the first Durable Function that is called: it is unable to invoke the next steps, and we get the error below:

a0998fba-a586-4728-9aa3-8f1c34ce851e: Function '(Orchestrator)' failed with an error. Reason: DurableTask.Core.Exceptions.OrchestrationFailureException
   at Microsoft.Azure.WebJobs.Extensions.DurableTask.OutOfProcMiddleware.<>c__DisplayClass10_0.<<CallOrchestratorAsync>b__0>d.MoveNext() in D:\a\_work\1\s\src\WebJobs.Extensions.DurableTask\OutOfProcMiddleware.cs:line 145
--- End of stack trace from previous location ---
   at Microsoft.Azure.WebJobs.Host.Executors.TriggeredFunctionExecutor`1.<>c__DisplayClass7_0.<<TryExecuteAsync>b__0>d.MoveNext() in D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\TriggeredFunctionExecutor.cs:line 51
--- End of stack trace from previous location ---
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.InvokeWithTimeoutAsync(IFunctionInvoker invoker, ParameterHelper parameterHelper, CancellationTokenSource timeoutTokenSource, CancellationTokenSource functionCancellationTokenSource, Boolean throwOnTimeout, TimeSpan timerInterval, IFunctionInstance instance) in D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 581
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithWatchersAsync(IFunctionInstanceEx instance, ParameterHelper parameterHelper, ILogger logger, CancellationTokenSource functionCancellationTokenSource) in D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 527
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(IFunctionInstanceEx instance, FunctionStartedMessage message, FunctionInstanceLogEntry instanceLogEntry, ParameterHelper parameterHelper, ILogger logger, CancellationToken cancellationToken) in D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:line 306. IsReplay: False. State: Failed. RuntimeStatus: Failed. ExtensionVersion: 2.13.5. SequenceNumber: 17. TaskEventId: -1
2024-10-17T08:20:17Z   [Information]   a0998fba-a586-4728-9aa3-8f1c34ce851e: Orchestration awaited and scheduled 1 durable operation(s).
2024-10-17T08:20:17Z   [Information]   a0998fba-a586-4728-9aa3-8f1c34ce851e: Orchestration completed with a 'Failed' status and 0 bytes o

Below is the screenshot which shows the exception:

Image

The same durable function was working fine until 8 October 2024. We suspect the issue started with the Azure Functions host release shown below, since it includes middleware-related changes:

Image

AnatoliB added the P2 (Priority 2) and P1 (Priority 1) labels and removed the Needs: Triage 🔍 and P2 (Priority 2) labels on Oct 18, 2024
Member

cgillum commented Oct 22, 2024

@nayansc568 can you provide us with a small reproducer project which demonstrates these failures?

Also, you mentioned that the failures are random. Can you expand on this a bit? For example, what percentage of the time does the function work and what percentage does it fail (roughly)?
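
One way to put a rough number on this is to export the runtime status of recent orchestration instances and tally them. The sketch below is only an illustration: the class name `FailureRate`, the helper `failedPercent`, and the sample statuses are all hypothetical and are not taken from the reporter's app.

```java
import java.util.List;

class FailureRate {
    // Hypothetical helper: percentage of instances whose runtime status is "Failed".
    static double failedPercent(List<String> statuses) {
        if (statuses.isEmpty()) {
            return 0.0; // nothing observed, so report 0 rather than dividing by zero
        }
        long failed = statuses.stream().filter("Failed"::equals).count();
        return 100.0 * failed / statuses.size();
    }

    public static void main(String[] args) {
        // Made-up sample data, only to show the calculation.
        List<String> sample = List.of(
                "Completed", "Completed", "Failed", "Failed",
                "Failed", "Failed", "Completed", "Failed");
        System.out.println(failedPercent(sample) + "% failed"); // prints "62.5% failed"
    }
}
```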

lilyjma added the out-of-proc label on Nov 25, 2024
@nayansc568
Author

@cgillum: Sorry for the delayed response.

Starting on 8 October 2024, we began seeing the OutOfProcMiddleware error on one of our durable functions. On 31 October we downgraded the function runtime to version 4.34.1.22669, and everything started working again; in parallel, we worked with Azure support to find the cause of the issue. About a week ago we upgraded the function runtime back to the latest version (4.1036.2.2), and the issue is no longer reproducible.

Nature of the OutOfProcMiddleware error:

  • One or two API calls succeed, then the remaining API calls to the durable function fail for some time, then one or two calls succeed again, and so on.
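
An on/off pattern like this is easier to describe if the per-call outcomes are collapsed into runs of consecutive successes and failures. The following is a minimal sketch under stated assumptions: the class `OutcomeRuns`, the method `runs`, and the sample sequence are hypothetical, and the input is assumed to be an ordered list of call results (true = succeeded).

```java
import java.util.ArrayList;
import java.util.List;

class OutcomeRuns {
    // Collapses an ordered list of call outcomes into run lengths,
    // e.g. [true, true, false] becomes ["ok x2", "fail x1"].
    static List<String> runs(List<Boolean> outcomes) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < outcomes.size()) {
            boolean current = outcomes.get(i);
            int j = i;
            // Advance j to the end of the current run of identical outcomes.
            while (j < outcomes.size() && outcomes.get(j) == current) {
                j++;
            }
            result.add((current ? "ok" : "fail") + " x" + (j - i));
            i = j;
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sequence resembling the pattern described above.
        List<Boolean> calls = List.of(true, true, false, false, false, false, true, false, false);
        System.out.println(runs(calls)); // prints [ok x2, fail x4, ok x1, fail x2]
    }
}
```

Short "ok" runs separated by long "fail" runs would match the behavior reported here; the run lengths and timestamps together could help narrow down what changes between the two states.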
