Python worker exiting with code 143 #10479

Open
gavin-aguiar opened this issue Sep 17, 2024 · 7 comments

@gavin-aguiar
Contributor

Description:

We have started seeing a rise in the error message "python exited with code 143".
The error appears when a worker is being deleted: the platform issues a container delete command and then sends SIGTERM to the worker.
The Python worker exits with code 143 (128 + 15, where 15 is the signal number of SIGTERM), and the host logs this as an error.
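
For readers unfamiliar with the number, here is a minimal sketch of where 143 comes from, assuming a POSIX system. It is not the Functions host or worker code: `shell_style_exit_code` and `terminate_child_and_get_code` are hypothetical helper names, and the child is just a sleeping Python process standing in for a worker.

```python
import signal
import subprocess
import sys
import time


def shell_style_exit_code(returncode: int) -> int:
    """Convert subprocess's "-N means killed by signal N" into the shell's 128 + N."""
    return 128 - returncode if returncode < 0 else returncode


def terminate_child_and_get_code() -> int:
    """Start a stand-in worker, send it SIGTERM, and return its shell-style exit code."""
    # The child installs no SIGTERM handler, so the default action (terminate) applies.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    time.sleep(0.5)  # give the child a moment to start
    child.send_signal(signal.SIGTERM)
    return shell_style_exit_code(child.wait())


if __name__ == "__main__":
    print(terminate_child_and_get_code())
```

Python's `subprocess` reports a signal death as a negative return code, so the helper converts it to the 128 + N form that shells and many process supervisors log.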

These are the Python app logs, showing the sequence of messages leading up to the error log:
[image]

In older logs from around 3 months ago we see the same behavior; the only difference is that the host did not log an error message.

[image]

This error is reported for Node apps too. I haven't seen a spike in error logs for other runtimes, and it happens only on the Consumption plan. More details are on this Teams channel.

Here are some starter Kusto queries:
https://wawswus.kusto.windows.net/wawsprod?query=H4sIAAAAAAAEAGWNQQrCMBRE9z3FkFWFLqy6rVAQV0WEeoFP%2bmmiaROSX6vg4a0bXbgbHvNmaudydZxGLdaPqfF9UqvshdlwZJwja5v4YgduhYaAPaj3edn9Kg3f2QFVhQ2%2b8MBC1iUYSlDhKcaP4IcV7jBbMdC%2bY5S7rVqMEP2Vtfx9FahDONHABVo%2fRc2fnAJpXiSh2zKwXmfZG%2f25bsXAAAAA&web=0

https://wawswus.kusto.windows.net/wawsprod?query=H4sIAAAAAAAEAI2OOwvCMBhFd8H%2fEDpVaCVN2jQWK%2fgEwUHUyS3UD62kSUlSRfDH%2bxpEujhdONzDvWMpfW%2fRqMKVWtmVPlqv1%2b3c0fUEBtDaQFFa2JUVbJ2oajTK0UE4cE%2fgE0ziEPOQ8B3GWRJllPVpwtKIpfseEurQ1od%2f6JTzp%2f79sNESlso6oQpAeY68cV2Hacon8ymdh4xyhkkU8TimbDAgNPa%2b6gouIF%2bjyYvVRp%2bhcK1XwacX%2fCwFaKsb886mqoS5BTNwopT2AZyao0UxAQAA&web=0

@marcingminski

Same for us. Lots of "python exited with code 143" errors. This obviously results in lots of 502 errors.

@jviau
Contributor

jviau commented Oct 30, 2024

The host terminates the worker process via Process.Kill. I am not sure if that is a SIGTERM or a SIGKILL implementation wise. I imagine it depends on the language worker what exit code they will return when they receive that message.

@gavin-aguiar can you clarify what the ask is here? Is it to just address the fact we are logging an error for a 'normal' exit flow? Or is there real impact to applications?
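
On the SIGTERM versus SIGKILL question, the logged exit code itself is a hint, assuming the usual 128 + N convention for signal deaths: 143 corresponds to SIGTERM (15), while SIGKILL (9) would show up as 137. A small sketch for decoding such codes follows; `describe_exit_code` is a hypothetical helper, not part of the host or any worker.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe an exit code, decoding the 128 + N convention for signal deaths."""
    if code > 128:
        try:
            # signal.Signals raises ValueError for numbers that are not valid signals.
            return f"terminated by {signal.Signals(code - 128).name}"
        except ValueError:
            pass
    return f"exited with status {code}"


if __name__ == "__main__":
    for code in (143, 137, 1):
        print(code, "->", describe_exit_code(code))
```

This only interprets the number. It cannot tell which component sent the signal.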

@marcingminski

> The host terminates the worker process via Process.Kill. I am not sure if that is a SIGTERM or a SIGKILL implementation wise. I imagine it depends on the language worker what exit code they will return when they receive that message.
>
> @gavin-aguiar can you clarify what the ask is here? Is it to just address the fact we are logging an error for a 'normal' exit flow? Or is there real impact to applications?

Massive impact: the service restarts every few minutes, causing a few minutes of downtime each time. It started happening randomly in only one region for us (Canada Central); we have not observed it in any other region despite the same code and config being deployed.

@jviau
Contributor

jviau commented Oct 31, 2024

@marcingminski this sounds like a platform issue. The host most likely only started logging this error, but it was happening all along. We did make a change to log filtering in this area recently to capture logs which were accidentally excluded before.

When creating worker processes, the host registers them as child processes with the OS so that any SIGTERM sent to the host is also sent to the workers.
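
Since SIGTERM reaches the worker during a normal teardown, one generic way a process can report a clean exit instead of 143 is to handle the signal and exit with 0. The sketch below illustrates that pattern only, assuming a POSIX system. `WORKER_SOURCE` and `run_and_terminate_worker` are hypothetical names, and nothing here reflects how the actual Python worker is implemented or whether it should behave this way.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a worker: treats SIGTERM as a normal shutdown request.
WORKER_SOURCE = """
import signal, sys, time

def on_sigterm(signum, frame):
    sys.exit(0)  # exit cleanly instead of dying from the signal

signal.signal(signal.SIGTERM, on_sigterm)
print("ready", flush=True)  # tell the parent the handler is installed
while True:
    time.sleep(0.1)
"""


def run_and_terminate_worker() -> int:
    """Start the stand-in worker, send SIGTERM once it is ready, return its exit code."""
    worker = subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdout=subprocess.PIPE,
        text=True,
    )
    worker.stdout.readline()  # block until the worker prints "ready"
    worker.send_signal(signal.SIGTERM)
    code = worker.wait()
    worker.stdout.close()
    return code


if __name__ == "__main__":
    print(run_and_terminate_worker())
```

With the handler installed the child exits through `sys.exit(0)`, so the parent sees a zero exit code rather than a signal death.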

@RyanSchlenz

RyanSchlenz commented Nov 5, 2024

Any updates/workarounds for this? I am running into the same 143 error, which is causing the timer trigger to not execute the python code at all.

@marcingminski

Well and just like that, the problem went away on its own…

@RyanSchlenz

Mine is now working when I set it to run at fixed intervals (every 1 hour, 2 hours, etc.). However, it doesn't run if I set it to a specific time. Is anyone else seeing that issue with the Linux Consumption plan?
