This repository was archived by the owner on May 11, 2022. It is now read-only.

Prometheus unable to scrape endpoints on cf app with routes associated #2

Closed
aegershman opened this issue Jun 20, 2020 · 7 comments

Comments

@aegershman

aegershman commented Jun 20, 2020

I was inspired by the go-app-with-metrics example and set up the prometheus-operator colocated with cf-for-k8s, along with an application deployed with the same prometheus.io annotations. Tangentially, I think this is very cool and powerful, and it could open the door to other interesting ways of plumbing CF together with other systems. Anyhow:

When no routes are associated with the application and the app is just a regular process, prometheus is able to scrape the eirini-built pod on the port where /metrics is exposed:

[screenshot: with-no-route]
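To make "the port /metrics is exposed on" concrete: with the prometheus.io annotation convention, a pod-discovery scrape job roughly keeps only pods annotated scrape="true", swaps in the annotated port, and uses the annotated path. The request therefore goes straight to the pod IP (and, if one is injected, through the pod's Istio sidecar) rather than through the CF route. This is a simplified sketch of those relabeling rules, not Prometheus code; the function name scrape_target is hypothetical.

```python
from typing import Mapping, Optional

# Annotation keys used in this thread (a common convention, not built into Prometheus itself).
SCRAPE = "prometheus.io/scrape"
PORT = "prometheus.io/port"
PATH = "prometheus.io/path"


def scrape_target(pod_ip: str, annotations: Mapping[str, str]) -> Optional[str]:
    """Return the URL a scrape job following the annotation convention would hit,
    or None if the pod opts out or has no usable port annotation.

    Simplified: real relabeling keeps the discovered container port when
    prometheus.io/port is absent; this sketch just skips the pod instead.
    """
    # Mirrors a `keep` rule that only retains pods annotated with scrape="true".
    if annotations.get(SCRAPE) != "true":
        return None
    port = annotations.get(PORT)
    if port is None or not port.isdigit():
        return None
    # prometheus.io/path overrides the default metrics path.
    path = annotations.get(PATH, "/metrics")
    return f"http://{pod_ip}:{port}{path}"


if __name__ == "__main__":
    print(scrape_target("10.0.0.12", {SCRAPE: "true", PORT: "2112", PATH: "/metrics"}))
```

The pod IP 10.0.0.12 is a made-up example value.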

When routes are associated with it, prometheus appears unable to scrape the pod, the svc, or any other endpoint associated with it:

[screenshot: with-route-associated]

I also haven't had success using different ports (e.g. prometheus.io/port: 80):

[screenshot: with-route-on-80]

... or using a ServiceMonitor or PodMonitor against the pod's associated svc, etc.

I haven't dug too deep into this; it's mostly been trial and error. I'm assuming it has to do with how routing gets wired up via istio ("something-something proxy?"). Forgive me for not having a more definitive explanation; I just wanted to get this issue out there before digging around much more.

Is this to be expected for the time being? Has anyone had success getting prometheus to scrape pods that have cf routes associated with them? Thanks all 👍

EDIT: this may have something to do with the prometheus pods that perform the scrapes needing istio sidecar injection. Maybe. I'll go poke around and find out.

@hev

hev commented Jun 26, 2020

Thanks @aegershman, we have also been looking at this, and there appear to be a couple of considerations.

  1. Annotations are not fully working declaratively for us with either cf6 or cf7. As a workaround, we updated some instructions to use cf curl with cf6, but we hope to get to a fully declarative UX in cf7.
  2. The Istio sidecar is getting in the way of our scraping. We are working out how to deploy prometheus to address this.
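The cf curl workaround in item 1 can be sketched roughly as follows. The exact instructions referred to aren't reproduced in this thread; this assumes the Cloud Controller v3 metadata endpoint (PATCH /v3/apps/:guid with a metadata.annotations body). The helper name annotation_patch_command and the APP_GUID placeholder are hypothetical.

```python
import json
import shlex
from typing import Dict, List


def annotation_patch_command(app_guid: str, annotations: Dict[str, str]) -> List[str]:
    """Build a `cf curl` invocation that PATCHes metadata annotations onto an app
    through the Cloud Controller v3 API (PATCH /v3/apps/:guid)."""
    # Values are expected to already be strings, e.g. "true" and "2112", not True or 2112.
    body = {"metadata": {"annotations": dict(annotations)}}
    return ["cf", "curl", f"/v3/apps/{app_guid}", "-X", "PATCH", "-d", json.dumps(body)]


if __name__ == "__main__":
    # APP_GUID is a placeholder; `cf app go-app-with-metrics --guid` prints the real one.
    cmd = annotation_patch_command(
        "APP_GUID",
        {"prometheus.io/scrape": "true", "prometheus.io/port": "2112", "prometheus.io/path": "/metrics"},
    )
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(cmd))
```

Building the argument list in code keeps the JSON body quoted correctly, which is easy to get wrong when typing the cf curl command by hand.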

@aegershman
Author

aegershman commented Jun 26, 2020

1.) The annotations in the cf manifest have worked for me as long as I wrap the booleans in quotes, e.g. prometheus.io/scrape: "true"

2.) 👍 Let me know if you find any leads.

I originally tried labeling the prometheus-operator namespace with istio-injection: enabled so that the sidecar proxy gets injected into the prometheus pods, which should permit querying. Here's what I'm trying (though I haven't had 100% success yet)

as an overlay:

```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: prometheus-operator

#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.and_op(overlay.subset({"metadata":{"name":"prometheus-operator"}}), overlay.subset({"kind": "Namespace"}))
---
metadata:
  #! missing_ok lets the overlay add keys the base Namespace does not define yet
  #@overlay/match missing_ok=True
  labels:
    #@overlay/match missing_ok=True
    istio-injection: enabled
```

or

```shell
kubectl label --overwrite namespace prometheus-operator "istio-injection=enabled"
```

As for the prometheus deployment itself, using the prometheus-operator helm chart, a snippet of the values looks like this:

```yaml
prometheusOperator:
  enabled: true
  admissionWebhooks:
    enabled: true
    patch:
      enabled: true
      podAnnotations:
        sidecar.istio.io/inject: "false"
  podAnnotations:
    sidecar.istio.io/rewriteAppHTTPProbers: "true"

prometheus:
  retention: 10m
  prometheusSpec:
    podMetadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: "true"
```
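Which pods end up with a sidecar under this setup can be reasoned about with a simplified model of Istio's default automatic injection: the webhook only acts on namespaces labeled istio-injection=enabled, and a pod can opt out with the sidecar.istio.io/inject annotation. That is likely why the admission-webhook patch Job is annotated "false" above, since a Job whose pod carries a sidecar typically never completes. This is a sketch, not Istio's code; it ignores revision labels, never/always-inject selectors, and non-default injection policies, and the function name sidecar_injected is hypothetical.

```python
from typing import Mapping

# Annotation values that Istio treats as an explicit "inject" (YAML-style booleans).
TRUTHY = {"y", "yes", "true", "on"}


def sidecar_injected(namespace_labels: Mapping[str, str], pod_annotations: Mapping[str, str]) -> bool:
    """Simplified model of Istio automatic sidecar injection with the default policy."""
    # The injection webhook only selects namespaces carrying this label.
    if namespace_labels.get("istio-injection") != "enabled":
        return False
    value = pod_annotations.get("sidecar.istio.io/inject", "").lower()
    if value == "":
        # No opinion from the pod, so the namespace default (inject) applies.
        return True
    # Any other non-truthy value, such as "false", opts the pod out.
    return value in TRUTHY


if __name__ == "__main__":
    ns = {"istio-injection": "enabled"}
    print("prometheus pod:", sidecar_injected(ns, {"sidecar.istio.io/rewriteAppHTTPProbers": "true"}))
    print("webhook patch job:", sidecar_injected(ns, {"sidecar.istio.io/inject": "false"}))
```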

Unfortunately, this still results in 503s so far. Still poking around.

EDIT: I haven't had much success testing this so far; it's just what I'm trying out.

@heycait
Contributor

heycait commented Jun 29, 2020

Hey @aegershman, what version of the CF CLI are you using? Even using quotes in the app manifest, we haven't gotten this working with cf6 or cf7.

We just merged some changes into the master branch with newer instructions, including the network policy we need. Check it out and let us know if it works for you.

@aegershman
Author

aegershman commented Jun 30, 2020

hey hey @heycait !

1.) I'm using cf7 (on macOS Catalina 10.15.5, with zsh):

```
cf -v
cf version 7.0.1+fb3f929c2.2020-06-24
```

Here's an example of an app manifest (which uses a docker image on push) with metadata inlined in the manifest:

```yaml
---
applications:
  - name: hash-browns
    instances: 1
    memory: 256M
    routes:
      - route: hash-browns.apps.vcap.me
    docker:
      image: alexellis2/hashbrowns:1.2.0
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: 8080
        prometheus.io/path: /metrics
```

and pushed with

```shell
cf push -f hash-browns-docker-no-routes.yml
```

Is it giving you grief with a 500 error? Let me know if I can help with any more information. This could be something peculiar about how the quotes are being applied.

2.) Were you able to successfully scrape pods that have a route associated with them? I'm able to scrape pods that don't have any routes (and thus no k8s svc or virtual service) associated with them, but as soon as a route is applied, I get 503s when trying to scrape the pod itself. I'm assuming this has to do with how istio/envoy proxying gets set up, but I haven't 100% pinned it down yet. Out of curiosity, did you all do anything special with the prometheus deployment besides --set server.labels...?

Also, this isn't the end of the world or anything; to be honest this is more of a learning exercise for me, and a razzle-dazzle proof of value I can use to show to my employer 😉 as opposed to a "definitive production strategy for metrics". So if this doesn't quite work out, it's no worries.

Thanks again for your help and time; I appreciate it.

@heycait
Contributor

heycait commented Jun 30, 2020

Interesting! We just used cf push without specifying the file, since we named it manifest.yml. We also weren't using a docker image; we used the go buildpack instead. We were able to cf push this app with the cf7 CLI without the manifest, so we figured the manifest was breaking something.

Here's our manifest.

```yaml
---
applications:
- name: go-app-with-metrics
  buildpacks:
  - https://github.com/cloudfoundry/go-buildpack.git
  health-check-type: process
  instances: 2
  metadata:
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "2112"
      prometheus.io/path: "/metrics"
  no-route: true
```
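The quoting matters because Kubernetes annotations (and CF metadata annotations) are string-to-string maps, while a YAML loader reads an unquoted true as a boolean and an unquoted 8080 as an integer. A quick way to spot such values is to check the loaded annotations for anything that is not a string. This is a sketch with a hypothetical helper name; whether a given cf CLI version coerces, rejects, or errors on non-string values is not established in this thread.

```python
from typing import Any, Dict, List


def non_string_annotations(annotations: Dict[str, Any]) -> List[str]:
    """Return, sorted, the annotation keys whose loaded values are not strings."""
    return sorted(key for key, value in annotations.items() if not isinstance(value, str))


if __name__ == "__main__":
    # What a YAML loader would produce for the hash-browns manifest earlier in the thread:
    # scrape is quoted, but the port 8080 is not.
    hash_browns = {"prometheus.io/scrape": "true", "prometheus.io/port": 8080, "prometheus.io/path": "/metrics"}
    print(non_string_annotations(hash_browns))  # ['prometheus.io/port']
```

In the go-app-with-metrics manifest above every value is quoted, so the check would return an empty list.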

Once we started using the quotes, we no longer hit the 500 issue. Instead, we were getting this staging failure message:

```
Staging app and tracing logs...
StagingTimeExpired - Staging time expired: staging failed
FAILED
```

We did some more testing, and it turns out the issue was the buildpacks section, whether we used a buildpack URL or tried to use one locally. We finally got the application manifest to work by removing that section; all the annotations and everything else look good.

I spoke with @hev, and we decided to keep the issue with the routes as a separate investigation. We'll update you when we get there. 😺

@hev

hev commented Jul 14, 2020

@aegershman I think we have updated the instructions to add the network policy, and cf-for-k8s is going to include the label in the cf-system namespace, which makes this pretty straightforward (see #4 for a simplification). Any concerns with me closing this out?
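The network-policy approach can be illustrated with a generic Kubernetes NetworkPolicy that admits traffic on the metrics port into the app namespace from a namespace identified by a label. This is a sketch, not the policy from the cf-for-k8s instructions: the label key example.com/metrics-scraper, the policy name, and the helper allow_scrape_policy are all hypothetical, and cf-workloads and port 2112 are taken from this thread. NetworkPolicies only add allowances, so a policy like this matters only when something (such as a default-deny policy) already restricts ingress to the app pods; it does nothing about the Istio sidecar.

```python
import json
from typing import Any, Dict


def allow_scrape_policy(
    workloads_ns: str, scraper_ns_labels: Dict[str, str], metrics_port: int
) -> Dict[str, Any]:
    """Build a networking.k8s.io/v1 NetworkPolicy admitting TCP traffic on the
    metrics port into every pod of `workloads_ns` from namespaces matching
    `scraper_ns_labels`."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-prometheus-scrape", "namespace": workloads_ns},
        "spec": {
            "podSelector": {},  # empty selector: applies to all pods in the namespace
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"namespaceSelector": {"matchLabels": scraper_ns_labels}}],
                    "ports": [{"protocol": "TCP", "port": metrics_port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical label; the actual label cf-for-k8s adds to cf-system is not named here.
    policy = allow_scrape_policy("cf-workloads", {"example.com/metrics-scraper": "true"}, 2112)
    # kubectl accepts JSON manifests, e.g. `python allow_scrape.py | kubectl apply -f -`.
    print(json.dumps(policy, indent=2))
```

The script name allow_scrape.py in the usage comment is likewise only an example.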

@aegershman
Author

So I'm doing this by deploying prometheus-operator to a separate namespace and having it talk to apps running in the cf-workloads namespace. To be 100% honest, the prometheus setup in the provided examples doesn't fit my deployment topology, and I haven't been able to scrape workloads with routes associated with them in either case (though I haven't tested again recently). But that's okay: I'll get it figured out and squared away based on the work presented here, in the OSS Slack channels, and through learning more about istio. Plus, since so much is in flux at the moment, having prometheus scrape the pods/services of CF apps directly is definitely not a requirement; it's mostly for experimentation and my own understanding of metrics within istio anyway. All good, thanks all 👍


3 participants