Processes stats v2 and new Hermes API concept #3180

jeanleonov · 2019-09-30T22:11:02Z

This branch was inspired by ELK and by how processes stats (and other types of stats, actually) can be analyzed there. It doesn't modify existing functionality, but extends it with a new v2 API, which currently supports only processes stats.

I'd highlight two aspects of the changes:

1. The processes stats v2 structure provides easier-to-use properties than v1 (which is not removed, by the way).
2. A new API v2 concept.

1. Processes stats v2 vs v1

Instead of the three categorization fields monit_name, unified_service_name and application_id, version two has two repeated fields, own_tags and all_tags, which are supposed to hold values like [appscale, solr], [appscale, searchservice] or [<rabbitmq-childprocess-name>, appscale, rabbitmq].
own_tags holds only explicitly assigned tags, like [appscale, datastore] or a single [<process_name>]. all_tags contains the process's own tags plus its ancestors' tags.
Such tagging allows flexible and powerful filtering of processes in ELK.
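
To illustrate the relationship between own_tags and all_tags described above, here is a minimal sketch. It is not the code from this branch: the function names, the dict layout and the process name `rabbitmq-child` are hypothetical, and the ordering (own tags first, then ancestors' tags) is only assumed from the examples above.

```python
def build_all_tags(own_tags, ancestors_own_tags):
    """Return own tags followed by ancestors' tags, skipping duplicates.

    ancestors_own_tags is a list of tag lists, nearest ancestor first
    (an assumption made for this sketch).
    """
    all_tags = list(own_tags)
    for tags in ancestors_own_tags:
        for tag in tags:
            if tag not in all_tags:
                all_tags.append(tag)
    return all_tags


def filter_by_tag(processes, tag):
    """Select the processes whose all_tags contain the given tag."""
    return [proc for proc in processes if tag in proc["all_tags"]]


# Hypothetical sample: a RabbitMQ child process and a Solr process.
processes = [
    {
        "own_tags": ["rabbitmq-child"],
        "all_tags": build_all_tags(["rabbitmq-child"], [["appscale", "rabbitmq"]]),
    },
    {
        "own_tags": ["appscale", "solr"],
        "all_tags": build_all_tags(["appscale", "solr"], []),
    },
]
rabbitmq_related = filter_by_tag(processes, "rabbitmq")
```

Filtering on all_tags is what makes a query like "everything under rabbitmq" possible without knowing the names of child processes in advance.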

In addition to the cumulative CPU and IO counters, v2 brings *_1h_diff fields, which provide an estimated hourly diff based on the previous measurement. This assumes that a single client requests processes stats regularly; otherwise the estimated diffs might be inaccurate.
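
One way such an estimate could be computed is to scale the change in a cumulative counter to a one-hour window. The sketch below assumes a simple linear extrapolation; the function name and parameters are hypothetical, and the branch may compute the value differently.

```python
def estimate_1h_diff(prev_value, prev_time, cur_value, cur_time):
    """Extrapolate the change of a cumulative counter to one hour.

    Times are UNIX timestamps in seconds. Returns None when there is
    no usable previous measurement interval.
    """
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        return None
    return (cur_value - prev_value) * 3600.0 / elapsed


# A counter that grew by 30 units over 60 seconds.
hourly = estimate_1h_diff(prev_value=100, prev_time=0, cur_value=130, cur_time=60)
```

This also shows why irregular polling hurts accuracy: the longer or more uneven the interval between two requests, the less the extrapolated value reflects the current rate.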

2. API v2 concept

Objects provided by API v2 are supposed to be flat, which matches analytics needs (like ELK) much better.
API v2 doesn't support include lists. This should simplify caching on the proxy side and make it possible to collect stats from multiple nodes without parsing and serializing JSON (just by joining responses).
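
As a rough illustration of "just joining responses", the sketch below assumes that each node returns its flat objects as newline-delimited JSON. That wire format is an assumption made for this example, not something stated above, and the function name is hypothetical.

```python
import json


def join_node_responses(bodies):
    """Concatenate raw response bodies from several nodes without parsing them.

    Each body is assumed to hold one flat JSON object per line.
    """
    parts = [body.strip() for body in bodies]
    return b"\n".join(part for part in parts if part)


# Hypothetical raw bodies from three nodes; the second node returned nothing.
joined = join_node_responses([
    b'{"pid": 1}\n',
    b'',
    b'{"pid": 2}\n{"pid": 3}\n',
])

# Only the final consumer needs to parse the objects.
pids = [json.loads(line)["pid"] for line in joined.splitlines()]
```

Under this assumption the proxy only handles bytes, so cached per-node responses can be reused as-is.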

jeanleonov · 2019-10-04T15:29:57Z

It is recommended to merge this after #3165. It may conflict with that PR in the solr management command, in the part related to the sudo removal.

jeanleonov added 7 commits September 30, 2019 16:34

Removed unused helpers. Added subprocess helper.

40e5bd9

New monitoring resource concept and first process resource

be66198

Added v2 API routes to process resource

eec6527

Fix Subprocess failure handling

b76e9e1

Estimate 1h diff for cumulative parameters

e895b40

Update unit tests

9e2d888

Address merge issue related to moved file

cb50004

jeanleonov added 2 commits October 2, 2019 16:00

Track previous processes state using long_pid

53d8b92

Improve systemd services recognition

377a7a3

jeanleonov force-pushed the processes-stats-v2 branch from b78cd14 to 377a7a3 Compare October 4, 2019 13:24

jeanleonov marked this pull request as ready for review October 4, 2019 15:59
