
Research option to replace Milliseconds with a Nginx module #5

Open
ottok opened this issue May 14, 2020 · 5 comments
ottok commented May 14, 2020

Milliseconds is based on parsing Nginx access logs to print out statistics from them. The main downside of this approach is that the access log must be written before it can be parsed, and the reporting time frame must be decided in advance.

An alternative approach would be to extend an existing Nginx stats module stub to compute these same statistics on the fly from Nginx-internal counters, and serve them on demand from a stats module endpoint (or dump them to a file when asked). That way the stats would be as close to real time as we want, without the overhead of writing lots of lines to a log and then reading them back to compute statistics.

We could perhaps fork the module http://nginx.org/en/docs/http/ngx_http_stub_status_module.html and extend it, or research whether a similar module for Nginx already exists (such as https://github.com/vozlt/nginx-module-vts or https://github.com/dedok/nginx-stat-module).
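For reference, the built-in stub_status module already exposes a minimal HTTP-queryable endpoint. A sketch of enabling it (the listen address, location path, and allow rules here are placeholders, not anything from this thread):

```nginx
server {
    listen 127.0.0.1:8080;

    location /nginx_status {
        stub_status;         # active connections, accepted/handled requests, reading/writing/waiting
        access_log off;      # don't log monitoring queries
        allow 127.0.0.1;     # keep the endpoint local-only
        deny all;
    }
}
```

stub_status only provides connection and request totals, though; per-status-code counts and latency percentiles would need one of the third-party modules above or a fork.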


ottok commented May 14, 2020

Stats could be shown per previous minute? The previous-minute value would update once a minute, and that value would be fetched and stored in monitoring (and alerting). Internally the module naturally also needs to track the ongoing minute, but that value is not shown externally until the minute is complete and the value is comparable with the previous one.
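The "publish only the last full minute" idea above can be sketched as a small counter class (a minimal illustration, not any existing module's API; note a stale value can linger if no events arrive to trigger the minute rollover):

```python
import time

class MinuteCounter:
    """Accumulates events per wall-clock minute; only the last *complete*
    minute is exposed, so published values are always comparable."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.current_minute = None   # epoch minute being accumulated
        self.current_count = 0
        self.last_full_minute = None
        self.last_full_count = 0

    def record(self, n=1):
        minute = int(self.clock()) // 60
        if minute != self.current_minute:
            # Minute rolled over: freeze the finished minute for publishing.
            if self.current_minute is not None:
                self.last_full_minute = self.current_minute
                self.last_full_count = self.current_count
            self.current_minute = minute
            self.current_count = 0
        self.current_count += n

    def published_value(self):
        """Value for monitoring/alerting: the previous full minute, or None."""
        return self.last_full_count if self.last_full_minute is not None else None
```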

@heikkiorsila heikkiorsila self-assigned this May 23, 2020
heikkiorsila commented:

A note from phone conversation that happened today: Do a time and risk estimation first.


heikkiorsila commented Sep 7, 2020

Did some proof-of-concept testing:

All in all, nginx-stat-module seems promising but it needs some modifications for calculating statistics. It exports a simple textual timeseries format:

wordpress,location=nginx,parameter=bytes_sent,interval=10 value=336.947 1599480003
wordpress,location=nginx,parameter=body_bytes_sent,interval=10 value=114.947 1599480003
wordpress,location=nginx,parameter=request_length,interval=10 value=578.000 1599480003
wordpress,location=nginx,parameter=rps,interval=10 value=5.700 1599480003
wordpress,location=nginx,parameter=keepalive_rps,interval=10 value=0.000 1599480003
wordpress,location=nginx,parameter=response_2xx_rps,interval=10 value=5.700 1599480003
wordpress,location=nginx,parameter=response_3xx_rps,interval=10 value=0.000 1599480003
wordpress,location=nginx,parameter=response_4xx_rps,interval=10 value=0.000 1599480003
wordpress,location=nginx,parameter=response_5xx_rps,interval=10 value=0.000 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p1 value=0.000 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p5 value=0.000 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p10 value=0.000 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p50 value=0.010 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p90 value=0.187 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p95 value=0.185 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p99 value=0.185 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p100 value=0.181 1599480003

nginx-stat-module sends these statistics over UDP to a configured server. For maximum reliability that server should be on localhost. UDP can be a problem when the server is overloaded (monitoring should be the last thing that fails), but that is perhaps unlikely. Some reliability could be gained by buffering a short period of statistics and by sending the stats over a local Unix or TCP socket (both would need code changes).
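The lines above follow an InfluxDB-line-protocol-like shape: `measurement,tag=value,... value=N timestamp`. A receiver would need to parse them; a minimal parser sketch based only on the sample output above:

```python
def parse_stat_line(line):
    """Parse one nginx-stat-module timeseries line, e.g.
    'wordpress,location=nginx,parameter=rps,interval=10 value=5.700 1599480003'
    into (tags dict, float value, int timestamp)."""
    head, value_part, ts = line.split()
    measurement, *tag_pairs = head.split(",")
    tags = {"measurement": measurement}
    for pair in tag_pairs:
        key, _, val = pair.partition("=")
        tags[key] = val
    value = float(value_part.partition("=")[2])
    return tags, value, int(ts)
```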


ottok commented Sep 7, 2020

> nginx-stat-module sends these statistics out with udp to a configured server

Such an architecture would create excess services for us to maintain. I was under the impression that Nginx has some simple stats module one can simply query using HTTP and dump to a file or something.


ottok commented Nov 6, 2020

Status:

@ottok ottok added the priority label Nov 8, 2020
heikkiorsila added a commit that referenced this issue Dec 10, 2020
…ugin

Statistics are fetched once per minute from VTS. The system assumes a modified
VTS that is hosted at https://github.com/Seravo/nginx-module-vts.

vtsaggregator.py produces milliseconds-like output by aggregating statistics
from the module.

Notes:

* vtsaggregator.py ignores 503 status code in 5xx aggregation (a milliseconds
  policy).

Related #5
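The 503-exclusion policy mentioned in the commit notes can be illustrated with a short sketch (the function name and the status-code-to-count mapping are illustrative, not vtsaggregator.py's actual interface):

```python
def aggregate_5xx(status_counts):
    """Sum 5xx responses, excluding 503 per the milliseconds policy
    noted above. status_counts maps HTTP status code -> request count."""
    return sum(count for status, count in status_counts.items()
               if 500 <= status < 600 and status != 503)
```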
heikkiorsila added a commit that referenced this issue Dec 11, 2020
…ugin

heikkiorsila added a commit that referenced this issue Dec 17, 2020
…ugin

@ypcs ypcs removed the priority label Apr 11, 2023