How to verify that the promgen has been deployed correctly #266

Open
zhenghanyin opened this issue Mar 31, 2020 · 15 comments

@zhenghanyin

Hi Paul,
I followed the steps and registered a new rule, but I don't know how to check whether the new rule is working in the Prometheus server. There are no alerts related to this rule. In addition, nothing is written to the configuration file promgen.rule.yml.
So I would like to ask whether there is a problem with my configuration or something else. Here is my configuration.

promgen.yml

prometheus:
  url: http://192.168.5.93:9090/
  promtool: /usr/local/bin/promtool
  rules: /etc/prometheus/promgen.rule.yml
  blackbox: /etc/prometheus/blackbox.json
  targets: /etc/prometheus/promgen.json

alertmanager:
  url: http://192.168.5.93:9093
  blacklist:
    severity: ["debug", "blackhole"]

promgen.notification.email:
  sender: [email protected]
promgen.notification.ikasan:
  server: http://ikasan.example
promgen.notification.linenotify:
  server: https://notify.example      

prometheus.yml

 - job_name: 'prometheus'
   static_configs:
   - targets: ['192.168.5.93:9090']
 - job_name: 'consul'
   consul_sd_configs:
     - server: '192.168.5.93:8500'
       services: []
   relabel_configs:
     - source_labels: [__meta_consul_tags]
       regex: .*dev.*
       action: keep
 - job_name: 'promgen'
   file_sd_configs:
   - files:
     - "/etc/prometheus/promgen.json"
 - job_name: 'blackbox'
   metrics_path: /probe
   params:
   file_sd_configs:
   - files:
     - "/etc/prometheus/blackbox.json"
   relabel_configs:
     - source_labels: [__address__]
       regex: (.*)(:80)?
       target_label: __param_target
       replacement: ${1}
     - source_labels: [__param_target]
       regex: (.*)
       target_label: instance
       replacement: ${1}
     - source_labels: []
       regex: .*
       target_label: __address__
       replacement: 192.168.5.93:9115  # Blackbox exporter.

promgen.json

[
  { 
    "labels": {
        "__farm_source": "promgen",
        "__metrics_path__": "/metrics",
        "__shard": "Default",
        "farm": "hosts",
        "job": "node-exporter",
        "project": "test-project",
        "service": "test-service"
    },
    "targets": [ 
        "192.168.1.7:9100",
        "192.168.5.93:9100"
    ]
  }
]
@kfdm
Collaborator

kfdm commented Mar 31, 2020

Looking at your promgen.yml configuration, I see you use /etc/prometheus/promgen.rule.yml for your rule path. You should be able to check to see that it exists on your target Prometheus server.

It isn't shown in your prometheus.yml snippet, but you should also have a rule_files section that looks like this:

rule_files:
   - /etc/prometheus/promgen.rule.yml
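
One way to confirm that Prometheus has actually loaded the file is to ask its HTTP API which rule groups are active. The following is only a minimal sketch, assuming the standard `/api/v1/rules` endpoint of Prometheus 2.x; the helper names `rules_by_file` and `fetch_rules` are made up for this example and are not part of Promgen.

```python
import json
import urllib.request


def rules_by_file(payload):
    """Map each rule file path to the names of the rules loaded from it."""
    loaded = {}
    for group in payload.get("data", {}).get("groups", []):
        names = [rule.get("name") for rule in group.get("rules", [])]
        loaded.setdefault(group.get("file"), []).extend(names)
    return loaded


def fetch_rules(base_url):
    """Fetch the rule groups that Prometheus currently has loaded."""
    url = base_url.rstrip("/") + "/api/v1/rules"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)
```

For example, `rules_by_file(fetch_rules("http://192.168.5.93:9090/"))` (using the Prometheus URL from the promgen.yml above) should include `/etc/prometheus/promgen.rule.yml` as a key if the file was picked up. If the key is missing, the `rule_files` entry or a reload of Prometheus is probably the first thing to look at.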

@kfdm kfdm self-assigned this Mar 31, 2020
@kfdm kfdm added the question label Mar 31, 2020
@zhenghanyin
Author

promgen.rule.yml

groups:
- name: hostStatsAlert
  rules:
  - alert: hostCpuUsageAlert
    expr: node_load1  > 0.01
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage high"
      description: "{{ $labels.instance }} CPU usage above 85% (current value: {{ $value }})"
  - alert: hostMemUsageAlert
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes > 0.85
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} MEM usage high"
      description: "{{ $labels.instance }} MEM usage above 85% (current value: {{ $value }})"

This file exists. For now, I wrote its contents myself by hand.

@zhenghanyin
Author

image

@kfdm
Collaborator

kfdm commented Mar 31, 2020

And to confirm, you also have a rule_files section?

# In the case someone is running Prometheus 1.x, then the .yml extension should
# be dropped
rule_files:
- "/etc/prometheus/promgen.rule.yml"
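
To double-check the whole configuration, including the rule files it references, `promtool check config` can be run against prometheus.yml. Below is a rough sketch of wrapping that call, assuming promtool lives at `/usr/local/bin/promtool` as in the promgen.yml above; the function names are hypothetical and this is not how Promgen itself invokes promtool.

```python
import subprocess


def promtool_check_command(config_path, promtool="/usr/local/bin/promtool"):
    """Build the promtool invocation that validates a Prometheus config file."""
    return [promtool, "check", "config", config_path]


def check_config(config_path, promtool="/usr/local/bin/promtool"):
    """Run promtool and return (ok, combined stdout/stderr text)."""
    result = subprocess.run(
        promtool_check_command(config_path, promtool),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode == 0, result.stdout
```

Calling `check_config("/etc/prometheus/prometheus.yml")` would be the typical use, assuming that is where your prometheus.yml lives (the path is a guess, adjust it to your setup).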

@kfdm
Collaborator

kfdm commented Mar 31, 2020

One more thing I would check is that the Promgen worker has permission to write the files. For example, if you created promgen.rule.yml as root but are running Promgen as a non-root user, it would not be able to write to the file (you should generally run Promgen and Prometheus as non-root users).
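
A quick way to test this is a small script run as the same user as the Promgen worker. This is only a sketch using the standard library; `can_write` is a name invented for the example.

```python
import os


def can_write(path):
    """Return True if the current user could write to path.

    If the file does not exist yet, check the parent directory instead,
    since the file would have to be created there.
    """
    if os.path.exists(path):
        return os.access(path, os.W_OK)
    parent = os.path.dirname(os.path.abspath(path))
    return os.path.isdir(parent) and os.access(parent, os.W_OK)
```

Running `can_write("/etc/prometheus/promgen.rule.yml")` as the worker user should return `True`. Note that root passes this check for almost any file, so the result is only meaningful for the account the worker really runs under.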

@zhenghanyin
Author

Sorry, the full contents of the prometheus.yml file are as follows.

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
  
  external_labels:
    cluster_name: 'promgen'

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['192.168.5.93:9093']
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - /etc/prometheus/promgen.rule.yml
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['192.168.5.93:9090']
  - job_name: 'consul'
    consul_sd_configs:
      - server: '192.168.5.93:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: .*dev.*
        action: keep
  - job_name: 'promgen'
    file_sd_configs:
    - files:
      - "/etc/prometheus/promgen.json"
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
    file_sd_configs:
    - files:
      - "/etc/prometheus/blackbox.json"
    relabel_configs:
      - source_labels: [__address__]
        regex: (.*)(:80)?
        target_label: __param_target
        replacement: ${1}
      - source_labels: [__param_target]
        regex: (.*)
        target_label: instance
        replacement: ${1}
      - source_labels: []
        regex: .*
        target_label: __address__
        replacement: 192.168.5.93:9115  # Blackbox exporter.

@zhenghanyin
Author

I created promgen.rule.yml as root and run Prometheus and Promgen as root too. I would like to ask whether Prometheus and Promgen must be run as non-root users?

@kfdm
Collaborator

kfdm commented Apr 1, 2020

There is no requirement for either to run as root. Typically it is better to run them as a non-root user. So, for example, you might create a new prometheus user on your system and have both Promgen and Prometheus run as that user.

@zhenghanyin
Author

zhenghanyin commented Apr 1, 2020

Okay, but the question I raised at the beginning still stands. In addition, I found another problem: when I registered a new rule and clicked the test button, it reported an error. The error information is as follows.

2020-04-01 01:18:31,246 ERROR Internal Server Error: /rule/0/test
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py", line 115, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/local/lib/python3.6/site-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python3.6/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/django/views/generic/base.py", line 71, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/django/contrib/auth/mixins.py", line 52, in dispatch
    return super().dispatch(request, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/django/views/generic/base.py", line 97, in dispatch
    return handler(request, *args, **kwargs)
  File "/usr/src/app/promgen/views.py", line 1293, in post
    result = util.get(url, {'query': query}).json()
  File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/local/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

(edited to add formatting)

@kfdm
Collaborator

kfdm commented Apr 1, 2020

Do you have a shard and Prometheus servers registered via the /admin/ page?

@zhenghanyin
Author

image

@kfdm
Collaborator

kfdm commented Apr 1, 2020

The URL for your cluster needs to be a real URL that Promgen can query.

I think I also need to update the RuleTest to make the error more obvious.
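
For context, the traceback ends in `JSONDecodeError: Expecting value: line 1 column 1 (char 0)`, which is what Python's `json` module raises when the body is not JSON at all, for example an empty body or an HTML error page from a URL that is not a Prometheus server. The snippet below is just a sketch of how such a failure could be reported more clearly; `decode_query_response` is a hypothetical helper, not the actual Promgen code or the eventual patch.

```python
import json


def decode_query_response(body, url):
    """Decode a Prometheus API response, naming the URL if the body is not JSON."""
    try:
        return json.loads(body)
    except ValueError as exc:
        # json.JSONDecodeError is a subclass of ValueError.
        preview = body[:80]
        raise ValueError(
            "Expected JSON from {} but got: {!r}".format(url, preview)
        ) from exc
```

With something like this, a misconfigured shard URL would show up in the error message together with the first part of whatever the server returned, instead of only the decoder position.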

@zhenghanyin
Author

image
I changed the URL as shown in the screenshot, but the problem still hasn't been solved.

@kfdm
Collaborator

kfdm commented Apr 3, 2020

Can you also check /admin/sites/site/ and make sure the Promgen domain configured there is the same as the one Promgen is actually being served from?
I am going to work on two patches to help make this more obvious.

@liuzh-sa

image
I changed the URL as shown in the screenshot, but the problem still hasn't been solved.

Hi, have you solved this problem? I ran into the same problem. How did you solve it?
