Processes stats v2 and new Hermes API concept #3180
This branch was inspired by ELK and how processes stats can be analyzed there (and other types of stats actually). It doesn't modify existing functionality, but extends it with a new v2 API which currently supports only processes stats.
I'd highlight two aspects of the changes:
1. Processes stats v2 vs v1
Instead of the three categorization fields `monit_name`, `unified_service_name` and `application_id`, version two has two repeated fields, `own_tags` and `all_tags`, which are supposed to hold values like `[appscale, solr]`, `[appscale, searchservice]`, `[<rabbitmq-childprocess-name>, appscale, rabbitmq]`, etc.
`own_tags` holds only explicitly assigned tags, like `[appscale, datastore]`, or a single `[<process_name>]`.
`all_tags` contains the process's own tags plus its ancestors' tags.
Such tagging allows flexible and powerful filtering of processes in ELK.
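To make the tagging rule concrete, here is a minimal sketch of how `own_tags` and `all_tags` could be derived from a process tree. The `Proc` class, its fields and the process names are hypothetical and are not taken from the Hermes code. It also assumes that an ancestor without explicit tags contributes its own name, which the description above does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Proc:
    """Hypothetical process node; not the actual Hermes data model."""
    name: str
    explicit_tags: List[str] = field(default_factory=list)
    parent: Optional["Proc"] = None


def own_tags(proc: Proc) -> List[str]:
    # Explicitly assigned tags if there are any, otherwise just the process name.
    return list(proc.explicit_tags) if proc.explicit_tags else [proc.name]


def all_tags(proc: Proc) -> List[str]:
    # Own tags first, then each ancestor's own tags, skipping duplicates.
    # Assumption: untagged ancestors contribute their name as a tag.
    tags: List[str] = []
    node: Optional[Proc] = proc
    while node is not None:
        for tag in own_tags(node):
            if tag not in tags:
                tags.append(tag)
        node = node.parent
    return tags


# Illustrative names only: a tagged parent and an untagged child process.
rabbitmq = Proc("beam.smp", explicit_tags=["appscale", "rabbitmq"])
child = Proc("inet_gethost", parent=rabbitmq)

print(own_tags(child))   # own tags of the untagged child
print(all_tags(child))   # child's own tags plus the parent's tags
```

With tags shaped like this, a filter on `all_tags` selects a service together with its child processes, while a filter on `own_tags` selects only the processes that were tagged directly.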
In addition to cumulative CPU and IO counters, v2 brings `*_1h_diff` fields, which provide an estimated hourly diff since the previous measurement. This assumes that a single client requests processes stats regularly; otherwise the estimated diffs might be inaccurate.
2. API v2 concept
Objects provided by API v2 are supposed to be flat, which matches analytic needs (like ELK) much better.
It doesn't support include lists. This should simplify caching on the proxy side and make it possible to collect stats from multiple nodes without parsing and serializing JSON (just joining responses).
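As a rough illustration of "just joining responses", the sketch below assumes that each node returns its flat objects as newline-delimited JSON, one object per line. That wire format is an assumption, not something this PR states. Apart from `all_tags`, every field name in the sample data, including `cpu_user_1h_diff`, is hypothetical.

```python
import json
from typing import Iterable


def join_responses(bodies: Iterable[str]) -> str:
    # Merge raw per-node bodies line by line; nothing is parsed or re-serialized.
    lines = []
    for body in bodies:
        lines.extend(line for line in body.splitlines() if line.strip())
    return "\n".join(lines) + ("\n" if lines else "")


# Hypothetical flat records from two nodes (field names are illustrative).
node_a = '{"node": "10.0.0.1", "pid": 101, "all_tags": ["appscale", "solr"], "cpu_user_1h_diff": 12.5}\n'
node_b = '{"node": "10.0.0.2", "pid": 202, "all_tags": ["appscale", "datastore"], "cpu_user_1h_diff": 3.0}\n'

joined = join_responses([node_a, node_b])

# Only the final consumer needs to parse the records.
records = [json.loads(line) for line in joined.splitlines()]
print(len(records))
```

Because the objects are flat and independent of each other, the proxy in this sketch never needs to understand their contents, which is also what would keep caching of per-node responses simple.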