Transient issues spinning up servers related to pod reflector errors #1103

Closed
@sgibson91


Context

We often get support tickets along the lines of "can't spin up a user server" that appear to be transient in nature. Upon inspecting the logs, we see pod reflector errors, which can be caused either by a k8s master API outage or by a race condition in the hub.
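For reference, a minimal sketch of how we usually inspect the hub logs for these reflector errors, using the Python kubernetes client. The `prod` namespace and the z2jh-style `component=hub` label are assumptions, not taken from this issue; substitute the values for the affected hub.

```python
# Sketch: pull recent hub pod logs and filter for reflector errors.
# "prod" and "component=hub" are assumed defaults; adjust per cluster.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

namespace = "prod"  # assumed namespace of the affected hub
hub_pods = v1.list_namespaced_pod(namespace, label_selector="component=hub")

for pod in hub_pods.items:
    logs = v1.read_namespaced_pod_log(pod.metadata.name, namespace, tail_lines=500)
    for line in logs.splitlines():
        if "reflector" in line.lower():
            print(f"{pod.metadata.name}: {line}")
```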

I have opened an issue in the jupyterhub/grafana-dashboards repo to ask for the k8s master API stats to be included, since this will help us debug these types of issues: jupyterhub/grafana-dashboards#34

If the cause was a race condition in the hub, deleting the hub pod and allowing it to be recreated should also resolve the issue.
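A sketch of that recycling step, under the same assumptions as above (equivalent to `kubectl delete pod -l component=hub -n <namespace>`; the hub Deployment then recreates the pod):

```python
# Sketch: delete the hub pod so the Deployment recreates it, clearing any
# stuck reflector state. Namespace and label are assumptions as above.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

namespace = "prod"  # assumed namespace of the affected hub
hub_pods = v1.list_namespaced_pod(namespace, label_selector="component=hub")

for pod in hub_pods.items:
    v1.delete_namespaced_pod(pod.metadata.name, namespace)
    print(f"Deleted {pod.metadata.name}; the Deployment will recreate it")
```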

Specifically, the pilot-hubs cluster is zonal, not regional, which means its k8s master API is not highly available and the cluster is therefore more prone to these issues. See #1102

Actions and updates
