Prometheus unable to scrape endpoints on cf app with routes associated #2

aegershman · 2020-06-20T20:51:46Z

I was inspired by the go-app-with-metrics example and set up the prometheus-operator colocated with cf-for-k8s, with an application deployed using the same prometheus.io annotations. Tangentially, I personally think this is very cool and powerful, and it could open the door to other interesting possibilities for plumbing together CF and other systems. Anyhow:

When there are no routes associated to the application and the app is just a regular process, prometheus is able to scrape the eirini-built pod against the port /metrics is exposed on:

When there are routes associated to it, prometheus appears unable to scrape the pod, svc, or generally any endpoint associated to it:

I also haven't had success using different ports e.g. prometheus.io/port: 80:

... or using a ServiceMonitor or PodMonitor against the pod's associated svc, etc.
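For readers unfamiliar with the PodMonitor approach mentioned above, a minimal sketch might look roughly like the following. This is an illustration, not the exact resource used in this issue: the resource name and the pod label are hypothetical placeholders, the port and path are borrowed from the manifest annotations discussed later in this thread, and the Prometheus resource's own podMonitorSelector would also need to match this object.

```yaml
# Sketch only: names and labels below are assumptions, not taken from the issue.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: cf-app-metrics              # hypothetical name
  namespace: prometheus-operator    # namespace where the operator runs
spec:
  namespaceSelector:
    matchNames:
      - cf-workloads                # namespace holding the eirini-built app pods
  selector:
    matchLabels:
      example.com/app: hash-browns  # placeholder; use a label the app pods actually carry
  podMetricsEndpoints:
    - targetPort: 8080              # container port serving metrics
      path: /metrics
```

`targetPort` takes a port number; `port` can be used instead when the container port is named.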

I haven't dug too deep into this, mostly trial and error; I'm assuming it has to do with how routing gets wired up via istio ("something-something proxy?"). Forgive me for not having more definitive explanation, I just wanted to get this issue out there before digging around too much more.

Is this to be expected for the time being? Has anyone had success with having prometheus scrape pods which have cf routes associated to them? Thanks all 👍

EDIT: this may be because the pods prometheus uses to perform scrapes require istio sidecar injection. Maybe. I'll just go poke around and find out.

hev · 2020-06-26T16:22:03Z

Thanks @aegershman - we have also been looking at this and there appear to be a couple of considerations:

1. Annotations are not quite working fully declaratively for us using either cf6 or cf7. As a workaround we updated some instructions about using cf6 curls, but we hope to get this to a fully declarative UX in cf7.
2. The Istio sidecar is getting in the way of our scraping. We are working to figure out how to deploy prometheus to address this.

aegershman · 2020-06-26T17:10:44Z

1.) the annotations for the cf manifest have worked for me as long as I wrap the booleans in quotes, e.g. prometheus.io/scrape: "true"

2.) 👍 Let me know if you find any leads.

I originally tried labeling the prometheus-operator's namespace with istio-injection: enabled so that the sidecar proxy is injected into the prometheus pods, which should permit querying. Here's what I'm trying (but I haven't had 100% success yet)

as an overlay:

---
apiVersion: v1
kind: Namespace
metadata:
  name: prometheus-operator

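#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.and_op(overlay.subset({"metadata":{"name":"prometheus-operator"}}), overlay.subset({"kind": "Namespace"}))
---
metadata:
  #! missing_ok lets the overlay add these keys when the Namespace has no labels yet
  #@overlay/match missing_ok=True
  labels:
    #@overlay/match missing_ok=True
    istio-injection: enabled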

or

kubectl label --overwrite namespace prometheus-operator "istio-injection=enabled"

And as for the prometheus deployment itself, using the prometheus-operator helm chart, a snippet of the values look like this:

prometheusOperator:
  enabled: true
  admissionWebhooks:
    enabled: true
    patch:
      enabled: true
      podAnnotations:
        sidecar.istio.io/inject: "false"
  podAnnotations:
    sidecar.istio.io/rewriteAppHTTPProbers: "true"

prometheus:
  retention: 10m
  prometheusSpec:
    podMetadata:
      annotations:
        sidecar.istio.io/rewriteAppHTTPProbers: "true"

Unfortunately, this still results in 503s so far. Still poking around.

EDIT: I haven't had much success with this so far; it's just what I'm trying out.

heycait · 2020-06-29T22:07:48Z

Hey @aegershman, what version of the CF CLI are you using? Even using quotes in the app manifest, we haven't gotten this working with cf6 or cf7.

We just merged some changes into the master branch with newer instructions, including the network policy we need. Check it out and let us know if it works for you.

aegershman · 2020-06-30T15:59:26Z

hey hey @heycait !

1.) I'm using cf7 (on mac catalina 10.15.5, using zsh):

cf -v
cf version 7.0.1+fb3f929c2.2020-06-24

Here's an example of an app manifest (which uses a docker image on push) with metadata inlined in the manifest:

---
applications:
  - name: hash-browns
    instances: 1
    memory: 256M
    routes:
      - route: hash-browns.apps.vcap.me
    docker:
      image: alexellis2/hashbrowns:1.2.0
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: 8080
        prometheus.io/path: /metrics

and pushed with

cf push -f hash-browns-docker-no-routes.yml

Is it giving you grief with a 500 error? Let me know if I can help with any more information. This could be something peculiar to do with how quotes are being applied or something.

2.) Were you able to successfully scrape pods which have a route associated to them? I'm able to scrape pods which don't have any routes (and thus no k8s svc or virtual service) associated to them, but as soon as a route is applied, I get 503s when trying to scrape the pod itself. I'm assuming this has to do with how istio/envoy proxying gets set up, but I haven't 100% pinned it down yet. Curious, did you all do anything special with the prometheus deployment besides --set server.labels...?

Also, this isn't the end of the world or anything; to be honest this is more of a learning exercise for me, and a razzle-dazzle proof of value I can use to show to my employer 😉 as opposed to a "definitive production strategy for metrics". So if this doesn't quite work out, it's no worries.

Thanks again for your help and time, I appreciate it.

heycait · 2020-06-30T18:22:26Z

Interesting! We just used cf push without specifying the file since we named it manifest.yml. We also weren't using a docker image and used the go buildpack instead. We were able to cf push this app using cf7 cli without the manifest so we figured it was the manifest breaking something.

Here's our manifest.

---
applications:
- name: go-app-with-metrics
  buildpacks:
  - https://github.com/cloudfoundry/go-buildpack.git
  health-check-type: process
  instances: 2
  metadata:
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "2112"
      prometheus.io/path: "/metrics"
  no-route: true

We weren't encountering the 500 issue after we started using the quotes. We were getting this staging failed message:

Staging app and tracing logs...
StagingTimeExpired - Staging time expired: staging failed
FAILED

We did some more testing and it turns out the issue was using buildpacks, regardless of whether we used a buildpack URL or tried to use one locally. Finally got the application manifest to work by removing that section and all the annotations and everything looks good.

I spoke with @hev and decided we'll keep the issue with the routes as a separate investigation. We'll update you when we get there. 😺

hev · 2020-07-14T22:30:42Z

@aegershman I think we have the instructions updated to add the network policy and cf-for-k8s is going to include the label in the cf-system namespace making this pretty straightforward (see #4 for a simplification). Any concern with me closing this out?
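As a rough illustration of the kind of network policy referred to here, the sketch below allows ingress to app pods from any namespace carrying a particular label. It is not the policy from the updated instructions: the policy name and the namespace label are hypothetical placeholders, and the port is the one used in the go-app-with-metrics manifest earlier in this thread.

```yaml
# Sketch only: the name and the namespace label are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape     # hypothetical name
  namespace: cf-workloads           # namespace where app pods run
spec:
  podSelector: {}                   # empty selector: applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              example.com/scraper: "true"   # placeholder; the label on the namespace running prometheus
      ports:
        - protocol: TCP
          port: 2112                # metrics port from the go-app-with-metrics manifest
```

Selecting by namespace label rather than by pod keeps the policy independent of how the prometheus pods themselves are labeled.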

aegershman · 2020-07-15T14:53:20Z

So I'm doing this by deploying prometheus-operator to a separate namespace and having it talk to apps running in the cf-workloads namespace. To be 100% honest, the prometheus setup in the examples provided doesn't fit my deployment topology, and I haven't been able to scrape workloads with routes associated to them in either case (though I haven't tested again recently). BUT that's okay: I'll get it figured out and squared away based on the work that's presented here, in the OSS Slack channels, and through more learning on istio. Plus, since there's so much in flux at the moment, getting prometheus to scrape pods/services of apps on cf directly is definitely not a requirement; it's mostly for experimentation and personal understanding of metrics within istio anyway. All good, thanks all 👍

aegershman closed this as completed Jul 15, 2020
