Content Node Always Down #32432

aryamanvinchhi · 2024-09-19T13:15:09Z

I have a content node that is constantly down (it keeps restarting every 30 minutes or so). The logs look mostly fine, but I did notice the message quoted below.

Steps to reproduce:
Nothing specific: I created a cluster, ingested documents, and now one node is struggling.

Any ideas on how to debug or proceed here? I also tried replacing the node (no data loss since the data is persisted on a mount) but the problem still exists.

"terminate called after throwing an instance of search::chunkException
terminate called recursively
incremented restart penalty to 14 seconds"

aryamanvinchhi · 2024-09-19T13:22:00Z

Version 8.270.8

aryamanvinchhi · 2024-09-19T15:42:36Z

Quick correction - the pod itself does not restart but it is the vespa-proton indexing service that keeps starting again and again. From what I understand, this is actually not an issue but expected behavior.

I tried stopping and starting services again, but the node continues to show a "Connection reset" error on the cluster controller page. The restart penalty is up to 1800 seconds now.
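For anyone debugging a similar restart loop, a minimal sketch of how the crash lines could be pulled out of a log is shown here. It assumes the message shapes seen in the excerpt quoted above; the sample lines are modeled on that excerpt, and `summarize_crashes` is a hypothetical helper, not part of Vespa.

```python
import re

# Assumed message shapes, taken from the log excerpt quoted in this issue.
PENALTY_RE = re.compile(r"incremented restart penalty to (\d+) seconds")
TERMINATE_RE = re.compile(
    r"terminate called after throwing \w+ instance of '?([\w:]+)'?"
)


def summarize_crashes(lines):
    """Collect thrown exception types and restart penalties from log lines."""
    exceptions = []
    penalties = []
    for line in lines:
        match = TERMINATE_RE.search(line)
        if match:
            exceptions.append(match.group(1))
        match = PENALTY_RE.search(line)
        if match:
            penalties.append(int(match.group(1)))
    return exceptions, penalties


# Sample lines modeled on the excerpt in this issue (not a real log file).
sample = [
    "terminate called after throwing an instance of search::chunkException",
    "terminate called recursively",
    "incremented restart penalty to 14 seconds",
]
exceptions, penalties = summarize_crashes(sample)
print(exceptions, penalties)
```

A steadily growing list of penalties alongside the same exception type would suggest the service is crashing on the same data at every start, rather than failing intermittently.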

bratseth · 2024-09-20T13:57:16Z

The document store data is corrupt for some reason (corruption, incomplete write, bug). We would be interested in looking at it, but I think that will be hard for non-technical reasons, and you are also on quite an old version.

Unless you have configured redundancy 1, the data will already be restored in secondary copies on the other nodes, so you can get out of this situation by deleting the data of this node.
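Before deleting anything, it may be worth confirming the redundancy setting. The sketch below reads the `<redundancy>` element of each content cluster from a services.xml string. The XML fragment and the cluster id `mycluster` are hypothetical, `content_redundancy` is an illustrative helper, and other ways of expressing redundancy in the config are not handled.

```python
import xml.etree.ElementTree as ET


def content_redundancy(services_xml: str) -> dict:
    """Map each content cluster id to its <redundancy> value (None if absent)."""
    root = ET.fromstring(services_xml)
    result = {}
    for content in root.iter("content"):
        elem = content.find("redundancy")
        has_value = elem is not None and elem.text
        result[content.get("id")] = int(elem.text.strip()) if has_value else None
    return result


# Hypothetical services.xml fragment, for illustration only.
sample = """<services version="1.0">
  <content id="mycluster" version="1.0">
    <redundancy>2</redundancy>
  </content>
</services>"""

for cluster, redundancy in content_redundancy(sample).items():
    if redundancy is not None and redundancy > 1:
        print(f"{cluster}: redundancy {redundancy}, other nodes should hold copies")
    else:
        print(f"{cluster}: redundancy {redundancy}, keep a backup before deleting")
```

A value greater than 1 only indicates that copies should exist elsewhere; it does not prove that the other nodes have finished restoring them.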

geirst · 2024-09-25T12:39:28Z

In Vespa 8.413.11 we have extended the chunk exception with more details (#32452) that will be logged if something similar happens again.

Please upgrade to the newest version and report back.

aryamanvinchhi · 2024-09-26T13:21:24Z

Sounds great, thank you!

kkraune closed this as completed Oct 16, 2024
