<install> statefulset , FailedScheduling #1657

Open
wiluen opened this issue Dec 10, 2024 · 19 comments

Comments

@wiluen

wiluen commented Dec 10, 2024

It is not easy for me to install Robusta. When I install it using Helm, the pods cannot start:
alertmanager-robusta-kube-prometheus-st-alertmanager-0 0/2 Pending 0 18s
prometheus-robusta-kube-prometheus-st-prometheus-0 0/2 Pending 0 8s

The event is:
Warning FailedScheduling 49s default-scheduler 0/6 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.

Also, kubectl get pv shows nothing.
What's wrong?
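(For this kind of event the PVCs are the first thing to inspect; a minimal diagnostic sketch, assuming the default namespace used above and with <pvc-name> as a placeholder:)

kubectl get pvc -n default
kubectl describe pvc <pvc-name> -n default
kubectl get storageclass

The Events section of the describe output usually explains why the claim is unbound, e.g. that no StorageClass is set and no matching PV exists.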


Hi 👋, thanks for opening an issue! Please note, it may take some time for us to respond, but we'll get back to you as soon as we can!

  • 💬 Slack Community: Join Robusta team and other contributors on Slack here.
  • 📖 Docs: Find our documentation here.
  • 🎥 YouTube Channel: Watch our videos here.

@wiluen
Author

wiluen commented Dec 10, 2024

It seems there is no StorageClass and no PV:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  creationTimestamp: "2024-12-01T11:55:07Z"
  finalizers:
    - kubernetes.io/pvc-protection
  labels:
    alertmanager: robusta-kube-prometheus-st-alertmanager
    app.kubernetes.io/instance: robusta-kube-prometheus-st-alertmanager
    app.kubernetes.io/managed-by: prometheus-operator
    app.kubernetes.io/name: alertmanager
  name: alertmanager-robusta-kube-prometheus-st-alertmanager-db-alertmanager-robusta-kube-prometheus-st-alertmanager-0
  namespace: default
  resourceVersion: "398335"
  uid: 9c481da6-803e-43cf-8f34-c23137203bd0
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeMode: Filesystem
status:
  phase: Pending
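(A later reply mentions creating a PV manually; a minimal sketch of such a PV, assuming a hostPath directory on one of the nodes — the name, path, and size are placeholders. With no StorageClass set on either object, a static PV with a matching size and access mode can bind to this pending claim:)

apiVersion: v1
kind: PersistentVolume
metadata:
  name: alertmanager-manual-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /data/alertmanager   # placeholder path on the node

hostPath PVs like this are only really suitable for single-node or lab clusters, since the data lives on one node.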

@arikalon1
Contributor

Hi @wiluen,

Thanks for reporting this.
Which k8s distribution are you using?
Is it on-prem, or a public cloud (Amazon, Google, other)?

This might happen if the cluster doesn't have a storage provisioner (the component responsible for creating a PV from a PVC).

How do you typically create persistent volumes?

Can you share the output of:
kubectl get storageclass ?

@wiluen
Author

wiluen commented Dec 10, 2024

Thanks for your reply.
My k8s is on-prem.
Yes, I don't have a StorageClass; kubectl get storageclass returns nothing.
It is a lab cluster on campus. I just created a PV manually and it bound to the PVC.

But another question: prometheus-robusta-kube-prometheus-st-prometheus-db-prometheus-robusta-kube-prometheus-st-prometheus-0 requires 100Gi, but my VMs don't have a 100Gi disk, and I cannot edit the resources.requests.storage field. What can I do?
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi # I want a small value
  volumeMode: Filesystem

@arikalon1
Contributor

Hi @wiluen

You can change the storage size in the generated_values.yaml file:

kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      storageSpec:
        volumeClaimTemplate:
          spec:
            resources:
              requests:
                storage: 10Gi
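(A sketch of applying that change, assuming the Helm release is named robusta as in the standard install command; note that Kubernetes cannot shrink an existing PVC, so if the 100Gi claim was already created it likely has to be deleted so it is recreated at the new size:)

helm upgrade robusta robusta/robusta -f ./generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>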

@wiluen
Author

wiluen commented Dec 10, 2024

Thanks very much @arikalon1!
There is still a bug with the glusterfs image:

docker pull quay.io/gluster/gluster-centos:latest
latest: Pulling from gluster/gluster-centos
[DEPRECATION NOTICE] Docker Image Format v1 and Docker Image manifest version 2, schema 1 support is disabled by default and will be removed in an upcoming release. Suggest the author of quay.io/gluster/gluster-centos:latest to upgrade the image to the OCI Format or Docker Image manifest v2, schema 2. More information at https://docs.docker.com/go/deprecated-image-specs/

@wiluen
Author

wiluen commented Dec 10, 2024

@arikalon1
How can I get all of the configurable fields in generated_values.yaml?

@arikalon1
Contributor

Hi @wiluen

Where do we have a reference to gluster-centos in Robusta?
Can you share more details?

Regarding the configuration options, you can see most of them in our default values.yaml file.

Robusta also has a dependency on kube-prometheus-stack, which you can also configure via the Robusta generated_values.yaml file.
The config values of kube-prometheus-stack can be found here
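(One way to dump every configurable field locally, a sketch assuming the robusta and prometheus-community Helm repos are already added:)

helm show values robusta/robusta > robusta-defaults.yaml
helm show values prometheus-community/kube-prometheus-stack > kube-prometheus-stack-defaults.yaml

Anything placed under the kube-prometheus-stack: key in generated_values.yaml is passed through to that subchart, as in the storage example above.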

@wiluen
Author

wiluen commented Dec 10, 2024

Hi @arikalon1
[screenshot]

[screenshot]

Actually I don't know what it is, but it appears in my k8s cluster. I thought it was part of Robusta. There are also some jobs when I deploy Robusta, and I don't know what they are. So in my screenshot, am I missing some important pods?

@arikalon1
Contributor

hey @wiluen

The glusterfs pods look like some DaemonSet, but they're not part of Robusta.
When Robusta starts, it runs an efficiency scan with KRR.
You can later see the results in the UI.
It helps right-size your k8s workloads (setting the correct resource requests and limits).

Looks like your Robusta installation is up and healthy!

@wiluen
Author

wiluen commented Dec 14, 2024

Hi @arikalon1, there are so many problems. Thank you for your patient answers.

I finished the install; it is easy when there are no network problems. I used enablePrometheusStack: true
and deployed a crashing pod for testing.
(1) But I can't connect to Prometheus:
[screenshot]

(2) In the logs of the pod prometheus-robusta-kube-prometheus-st-prometheus-0 there is this error:
ts=2024-12-14T07:18:39.171Z caller=notifier.go:530 level=error component=notifier alertmanager=http://172.20.245.213:9093/api/v2/alerts count=1 msg="Error sending alert" err="Post "http://172.20.245.213:9093/api/v2/alerts\": context deadline exceeded"
[screenshot]

(3) Besides, I see the AI can summarize logs, but in the HolmesGPT UI it can't connect to the GPT:
[screenshot]
[screenshot]
The log summary itself seems right.

@arikalon1
Contributor

Hi @wiluen

Do you have network policies in your cluster?
The robusta components need to be able to connect to one another

Can you share the robusta-runner and alert manager logs?
Does the IP Prometheus is trying to connect to really belong to Alertmanager?
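(A quick way to test the pod-to-pod path is a throwaway curl pod; a sketch using the IP from the error message above — the image name is just a common choice, and /-/healthy is Alertmanager's standard health endpoint:)

kubectl run net-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sv --max-time 5 http://172.20.245.213:9093/-/healthy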

@wiluen
Author

wiluen commented Dec 14, 2024

Hi @arikalon1
[screenshot]
The log of robusta-runner is:
ERROR Couldn't connect to Prometheus found under http://robusta-kube-prometheus-st-prometheus.default.svc.cluster.local:9090
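(To separate "Prometheus is down" from "the network path is blocked", a sketch that checks the same Service the runner uses; the Service name and default namespace are taken from the URL in the error above:)

kubectl get svc -n default robusta-kube-prometheus-st-prometheus
kubectl port-forward -n default svc/robusta-kube-prometheus-st-prometheus 9090:9090
# in another terminal:
curl -s http://localhost:9090/-/healthy

If the port-forward check succeeds but the in-cluster connection still fails, the problem is likely on the network path rather than Prometheus itself.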

@arikalon1
Contributor

hey @wiluen

Can you share kubectl get pods -o wide so the pod IPs are visible?
I'm trying to check whether that is indeed the Alertmanager pod IP.

Do you have network policies defined in the cluster?

@wiluen
Author

wiluen commented Dec 14, 2024

Hi @arikalon1, the result is:
[screenshot]

I don't think there are any additional network policies, because my cluster is just a simple testing cluster.

@arikalon1
Contributor

The IP seems right, but Prometheus is not able to connect to Alertmanager.
In addition, it looks like robusta-runner is not able to connect to Prometheus or Holmes.

I suspect there are some network restrictions in the cluster.

Can you share:
kubectl get networkpolicies -A ?

@wiluen
Author

wiluen commented Dec 14, 2024

Nothing there:
[screenshot]

@wiluen
Author

wiluen commented Dec 14, 2024

hi @arikalon1
What are the IPs and ports of Alertmanager, Prometheus, and Holmes?
I'm not sure. If I knew, I might have a way to solve it.

@arikalon1
Contributor

You can see them in the pods list you shared.
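(A sketch for finding those addresses from the Services rather than the pods, assuming everything is in the default namespace as above; by default Prometheus listens on 9090 and Alertmanager on 9093, matching the URLs in the earlier errors, and the Holmes service port shows up in the same output:)

kubectl get svc -n default -o wide | grep -E 'prometheus|alertmanager|holmes'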
