
Research option to replace Milliseconds with a Nginx module #5

Open
ottok opened this issue May 14, 2020 · 5 comments
ottok commented May 14, 2020

Milliseconds is based on parsing Nginx access logs to print out statistics from them. The main downside of this approach is that the access log must be written before it can be parsed, and the reporting time frame must be decided in advance.

An alternative approach would be to extend an existing Nginx stats module stub to compute these same statistics on the fly from Nginx-internal counters, and serve them on demand from a stats module endpoint (or dump them to a file when asked). That way the stats would be as close to real time as we want, without the overhead of writing lots of lines to a log and then reading them back to compute statistics.

We could perhaps fork the module http://nginx.org/en/docs/http/ngx_http_stub_status_module.html and extend it, or research whether a similar module for Nginx already exists (such as https://github.com/vozlt/nginx-module-vts or https://github.com/dedok/nginx-stat-module).
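For reference, the built-in stub_status module already exposes a minimal HTTP-queryable endpoint. A sketch of enabling it (the listen address, location path, and allow rules here are placeholders, not anything from this thread):

```nginx
server {
    listen 127.0.0.1:8080;

    location /nginx_status {
        stub_status;         # active connections, accepted/handled requests, reading/writing/waiting
        access_log off;      # don't log monitoring queries
        allow 127.0.0.1;     # keep the endpoint local-only
        deny all;
    }
}
```

stub_status only provides connection and request totals, though; per-status-code counts and latency percentiles would need one of the third-party modules above or a fork.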


ottok commented May 14, 2020

Stats could be shown per previous minute? The previous-minute value would update once a minute, and that value would be fetched and stored in monitoring (and alerting). Internally the module naturally also needs to track the ongoing minute, but that value is not shown externally until the minute is complete and the value is comparable with the previous one.
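The "publish only the last full minute" idea above can be sketched as a small counter class (a minimal illustration, not any existing module's API; note a stale value can linger if no events arrive to trigger the minute rollover):

```python
import time

class MinuteCounter:
    """Accumulates events per wall-clock minute; only the last *complete*
    minute is exposed, so published values are always comparable."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.current_minute = None   # epoch minute being accumulated
        self.current_count = 0
        self.last_full_minute = None
        self.last_full_count = 0

    def record(self, n=1):
        minute = int(self.clock()) // 60
        if minute != self.current_minute:
            # Minute rolled over: freeze the finished minute for publishing.
            if self.current_minute is not None:
                self.last_full_minute = self.current_minute
                self.last_full_count = self.current_count
            self.current_minute = minute
            self.current_count = 0
        self.current_count += n

    def published_value(self):
        """Value for monitoring/alerting: the previous full minute, or None."""
        return self.last_full_count if self.last_full_minute is not None else None
```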

@heikkiorsila heikkiorsila self-assigned this May 23, 2020
heikkiorsila commented:

A note from phone conversation that happened today: Do a time and risk estimation first.


heikkiorsila commented Sep 7, 2020

Did some proof-of-concept testing:

All in all, nginx-stat-module seems promising but it needs some modifications for calculating statistics. It exports a simple textual timeseries format:

wordpress,location=nginx,parameter=bytes_sent,interval=10 value=336.947 1599480003
wordpress,location=nginx,parameter=body_bytes_sent,interval=10 value=114.947 1599480003
wordpress,location=nginx,parameter=request_length,interval=10 value=578.000 1599480003
wordpress,location=nginx,parameter=rps,interval=10 value=5.700 1599480003
wordpress,location=nginx,parameter=keepalive_rps,interval=10 value=0.000 1599480003
wordpress,location=nginx,parameter=response_2xx_rps,interval=10 value=5.700 1599480003
wordpress,location=nginx,parameter=response_3xx_rps,interval=10 value=0.000 1599480003
wordpress,location=nginx,parameter=response_4xx_rps,interval=10 value=0.000 1599480003
wordpress,location=nginx,parameter=response_5xx_rps,interval=10 value=0.000 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p1 value=0.000 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p5 value=0.000 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p10 value=0.000 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p50 value=0.010 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p90 value=0.187 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p95 value=0.185 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p99 value=0.185 1599480003
wordpress,location=nginx,parameter=request_time,percentile=p100 value=0.181 1599480003

nginx-stat-module sends these statistics over UDP to a configured server. For maximum reliability that server should be on localhost. UDP can be a problem when the server is overloaded (monitoring should be the last thing that fails), but that is perhaps unlikely. Some reliability could be gained by buffering a short period of statistics and by sending the stats over a local Unix or TCP socket (both would need code changes).
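The lines above follow an InfluxDB-line-protocol-like shape: `measurement,tag=value,... value=N timestamp`. A receiver would need to parse them; a minimal parser sketch based only on the sample output above:

```python
def parse_stat_line(line):
    """Parse one nginx-stat-module timeseries line, e.g.
    'wordpress,location=nginx,parameter=rps,interval=10 value=5.700 1599480003'
    into (tags dict, float value, int timestamp)."""
    head, value_part, ts = line.split()
    measurement, *tag_pairs = head.split(",")
    tags = {"measurement": measurement}
    for pair in tag_pairs:
        key, _, val = pair.partition("=")
        tags[key] = val
    value = float(value_part.partition("=")[2])
    return tags, value, int(ts)
```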


ottok commented Sep 7, 2020

> nginx-stat-module sends these statistics out with udp to a configured server

Such an architecture would create excess services for us to maintain. I was under the impression that Nginx has some simple stats module one can simply query using HTTP and dump to a file or something.


ottok commented Nov 6, 2020

Status:

@ottok ottok added the priority label Nov 8, 2020
heikkiorsila added a commit that referenced this issue Dec 10, 2020
…ugin

Statistics are fetched once per minute from VTS. The system assumes a modified
VTS that is hosted at https://github.com/Seravo/nginx-module-vts.

vtsaggregator.py produces milliseconds-like output by aggregating statistics
from the module.

Notes:

* vtsaggregator.py ignores 503 status code in 5xx aggregation (a milliseconds
  policy).

Related #5
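The 503-exclusion policy mentioned in the commit notes can be illustrated with a short sketch (the function name and the status-code-to-count mapping are illustrative, not vtsaggregator.py's actual interface):

```python
def aggregate_5xx(status_counts):
    """Sum 5xx responses, excluding 503 per the milliseconds policy
    noted above. status_counts maps HTTP status code -> request count."""
    return sum(count for status, count in status_counts.items()
               if 500 <= status < 600 and status != 503)
```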
heikkiorsila added a commit that referenced this issue Dec 11, 2020
…ugin

heikkiorsila added a commit that referenced this issue Dec 17, 2020
…ugin

@ypcs ypcs removed the priority label Apr 11, 2023