Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(host-metrics)!: fix process.cpu.* metrics #1785

Merged

Conversation

david-luna
Copy link
Contributor

Which problem is this PR solving?

This PR is changing/fixing process.cpu.time and process.cpu.utilization metrics which were not complying with the semantic conventions.

Short description of the changes

Get a 1st data collect when the modules is loaded and for each collection returns

  • time in seconds for user and system states
  • % of time in each state

Closes: #1718

Copy link

codecov bot commented Nov 9, 2023

Codecov Report

Merging #1785 (8afe0ed) into main (33b31d0) will increase coverage by 0.48%.
The diff coverage is 100.00%.

❗ Current head 8afe0ed differs from pull request most recent head eb13e64. Consider uploading reports for the commit eb13e64 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1785      +/-   ##
==========================================
+ Coverage   91.45%   91.94%   +0.48%     
==========================================
  Files         145      144       -1     
  Lines        7412     7854     +442     
  Branches     1484     1635     +151     
==========================================
+ Hits         6779     7221     +442     
  Misses        633      633              
Files Coverage Δ
...ages/opentelemetry-host-metrics/src/BaseMetrics.ts 100.00% <100.00%> (ø)
packages/opentelemetry-host-metrics/src/enum.ts 100.00% <100.00%> (ø)
packages/opentelemetry-host-metrics/src/metric.ts 100.00% <100.00%> (ø)
...ges/opentelemetry-host-metrics/src/stats/common.ts 97.91% <100.00%> (+0.13%) ⬆️

... and 14 files with indirect coverage changes

@@ -46,7 +46,7 @@ export abstract class BaseMetrics {
constructor(config: MetricsCollectorConfig) {
this._name = config.name || DEFAULT_NAME;
const meterProvider =
config.meterProvider! || api.metrics.getMeterProvider();
config.meterProvider || api.metrics.getMeterProvider();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewer: I don't know why there was this non null assertion in place. It does not affect the expression and also it was giving an lint warning so I've decided to remove it.

@@ -314,15 +317,15 @@ describe('Host Metrics', () => {
it('should export Process CPU time metrics', async () => {
const metric = await getRecords(reader, 'process.cpu.time');

ensureValue(metric, { state: 'user' }, 90713.56);
ensureValue(metric, { state: 'system' }, 63192.630000000005);
ensureValue(metric, { state: 'user' }, 90.71356);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewer: values have just changed in the magnitude since according to NodeJS docs the usage is reported in microseconds and not milliseconds

});

it('should export Process CPU utilization metrics', async () => {
const metric = await getRecords(reader, 'process.cpu.utilization');

ensureValue(metric, { state: 'user' }, 30247.935978659552);
ensureValue(metric, { state: 'system' }, 21071.23374458153);
ensureValue(metric, { state: 'user' }, 0.025);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewer: this has changed to a gauge value within [0..1]. The fixture data in process.json file is added so we get these results.

@david-luna david-luna marked this pull request as ready for review November 21, 2023 11:11
@david-luna david-luna requested a review from a team November 21, 2023 11:11
@david-luna
Copy link
Contributor Author

Note for reviewer: I'm leveraging this PR also to update the attribute names for network and host metrics so we conform with the current status of semantic conventions

@david-luna david-luna closed this Dec 14, 2023
@david-luna david-luna deleted the 1718-fix-process-cpu-metrics branch December 14, 2023 17:41
@david-luna david-luna restored the 1718-fix-process-cpu-metrics branch December 14, 2023 17:41
@david-luna david-luna reopened this Dec 14, 2023
Copy link
Member

@pichlermarc pichlermarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking so long to review this. Looks good. 👍

@david-luna
Copy link
Contributor Author

Thanks @pichlermarc :)

unfortunately I'm not allowed to merge it myself. Could you merge it please? What would be the next steps to be able to?

Copy link
Contributor

@trentm trentm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Here is a sanity check I did on the getProcessCpuUsageData() impl:

// Sanity test of `getProcessCpuUsageData()`.
// - Setup a "busy" interval that does some busy work (`crypto.pbkdf2Sync()`)
//   for ~500ms every 1s. We then expect the "user" usage of this process to
//   be ~50% on this one CPU. Divide that by the number of CPUs to get the
//   expected.
// - Then every so often (5s), dump the output from `getProcessCpuUsageData()`.
//   The `userP` should match the expected value.

const os = require('os')
const crypto = require('crypto');
const common = require('./build/src/stats/common.js');

console.log('Expected `userP`: 50% / %d CPUs =', os.cpus().length, 0.50 / os.cpus().length);

const measure = setInterval(() => {
  console.log(common.getProcessCpuUsageData())
}, 5000)

const busy = setInterval(
  () => {
    const start = Date.now();
    let n = 0
    while (Date.now() - start < 500) {
      crypto.pbkdf2Sync('secret', 'salt', 10000, 64, 'sha512');
      n++
    }
    const end = Date.now();
    console.log('Busy iteration: n=%s, elapsed=%s', n, end-start);
  },
  1000);

Running that on my machine shows it returns the expected userP (ignoring the first interval):

% node foo.js
Expected `userP`: 50% / 16 CPUs = 0.03125
Busy iteration: n=88, elapsed=504
Busy iteration: n=78, elapsed=500
Busy iteration: n=80, elapsed=503
Busy iteration: n=88, elapsed=504
{
  user: 2.071539,
  system: 0.015813999999999998,
  userP: 0.025165888433879346,
  systemP: 0.000057293747502996406
}
Busy iteration: n=85, elapsed=503
Busy iteration: n=83, elapsed=501
Busy iteration: n=85, elapsed=505
Busy iteration: n=85, elapsed=504
Busy iteration: n=84, elapsed=501
{
  user: 4.600277,
  system: 0.019622999999999998,
  userP: 0.031609225,
  systemP: 0.0000476125
}
Busy iteration: n=86, elapsed=505
Busy iteration: n=87, elapsed=504
Busy iteration: n=89, elapsed=502
Busy iteration: n=89, elapsed=505
Busy iteration: n=87, elapsed=503
{
  user: 7.115746,
  system: 0.022226,
  userP: 0.031437075084983,
  systemP: 0.00003253099380123975
}

@trentm
Copy link
Contributor

trentm commented Jan 11, 2024

@legendecas Did you care to review this PR before it gets merged?

Copy link
Member

@legendecas legendecas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks

@legendecas legendecas merged commit 1e90a40 into open-telemetry:main Jan 12, 2024
15 checks passed
@dyladan dyladan mentioned this pull request Jan 12, 2024
@david-luna david-luna deleted the 1718-fix-process-cpu-metrics branch April 16, 2024 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: host-metrics reporting wrong system and process CPU utilization
4 participants