Request: Enhance Monitoring for User-specific CPU and Disk Usage #187

bendichter · 2024-08-06T09:42:03Z

Description:

We need a functional system for monitoring usage and cost by user, ideally with a no-code dashboard. This feature would empower us to manage resource allocation and open up registrations to new users more confidently.

Requirements:

CPU and Disk Usage Monitoring by User:
- Monitor disk usage and CPU usage (or relevant cost factors) for individual users.
- While du checks can show disk usage at a single point in time, we need a way to generate reports of usage over time.
Reporting and Analytics:
- Provide reports on server options used and duration by user.
- Create a system to monitor incremental and shared costs: report the incremental cost to each node's creator, and split shared costs equally among the node's users.
Dashboard:
- Develop a no-code dashboard to visualize usage and cost data.
- Include functionality to preset per-user usage limits from the dashboard.
Integration and Metrics:
- Integrate with Grafana and Prometheus for improved metrics collection from AWS and other cloud vendors.
- Ensure the system can handle cost anomaly detection.
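For the "reports over time" requirement, one low-tech starting point is to take repeated du-style snapshots and append them to a log, so each run adds a row per user. The Python below is a minimal sketch under stated assumptions: it hypothetically assumes each user's data lives in `<home_root>/<username>`, and the function names and CSV layout are illustrative only, not part of any existing tool.

```python
import csv
import os
import time
from pathlib import Path
from typing import List, Optional, Tuple


def dir_size_bytes(path: Path) -> int:
    """Sum apparent file sizes under `path` (roughly what `du -sb` reports).

    Symlinks are skipped, and files that disappear mid-scan are ignored.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            try:
                if not os.path.islink(fp):
                    total += os.path.getsize(fp)
            except OSError:
                # Removed or unreadable between listing and stat; skip it.
                continue
    return total


def snapshot_home_dirs(
    home_root: Path, log_file: Path, now: Optional[float] = None
) -> List[Tuple[float, str, int]]:
    """Append one (timestamp, user, bytes) row per user directory to a CSV log.

    Hypothetical layout: each user's data lives in home_root/<username>.
    Repeated runs build up a per-user time series instead of a single snapshot.
    """
    ts = time.time() if now is None else now
    rows = [
        (ts, entry.name, dir_size_bytes(entry))
        for entry in sorted(home_root.iterdir())
        if entry.is_dir()
    ]
    write_header = not log_file.exists()
    with log_file.open("a", newline="") as fh:
        writer = csv.writer(fh)
        if write_header:
            writer.writerow(["timestamp", "user", "bytes"])
        writer.writerows(rows)
    return rows
```

Run on a schedule (e.g. cron or a Kubernetes CronJob), this yields per-user growth curves a dashboard could plot. It counts apparent file sizes, so totals can differ from du's default block-based numbers.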

Challenges:

Calculating "cost per user" is complex due to the shared nature of resources (e.g., multiple profiles on a single node).
Obtaining live cost information from AWS is challenging.
Supporting multiple cloud vendors adds another layer of complexity.
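To make the shared-resource problem concrete, here is one possible reading of the incremental/shared split from the requirements, sketched in Python: each node's cost is charged to the user whose server caused it to be provisioned, and a pool of shared costs (e.g. the hub and storage) is divided equally among everyone who used a node in the period. `NodeRecord` and `allocate_costs` are hypothetical names, and splitting each node's cost evenly among its users would be an equally defensible policy.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class NodeRecord:
    """Hypothetical per-node usage record for one billing period."""

    node_id: str
    creator: str  # user whose server launch caused the node to be provisioned
    cost: float  # node cost for the period, e.g. in USD
    users: List[str] = field(default_factory=list)  # everyone who ran a server on it


def allocate_costs(nodes: List[NodeRecord], shared_cost: float) -> Dict[str, float]:
    """Charge each node's cost to its creator (incremental cost) and split
    `shared_cost` (e.g. hub and storage) equally among all users seen on any node.

    If no users were seen, the shared cost is left unallocated.
    """
    per_user: Dict[str, float] = defaultdict(float)
    active: Set[str] = set()
    for node in nodes:
        per_user[node.creator] += node.cost
        active.add(node.creator)
        active.update(node.users)
    if active:
        share = shared_cost / len(active)
        for user in active:
            per_user[user] += share
    return dict(per_user)
```

For example, if alice's server spins up a node costing 3.0 that bob later shares, and carol gets her own node costing 2.0, a shared pool of 1.5 adds 0.5 to each of the three users.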

Proposed MVP:

Metrics Collection:
- Enhance AWS metrics collection to include hourly data instead of just daily totals.
Disk Usage Monitoring:
- Implement a disk usage monitoring and cleanup procedure.
Cost Anomaly Detection:
- Use existing tools (e.g., @satra's anomaly detection system) for total cost anomaly detection.
Grafana and Prometheus Integration:
- Integrate with Grafana and Prometheus for comprehensive monitoring and alerting.
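As an illustration of what total-cost anomaly detection on hourly data could look like, a generic rolling z-score is sketched below in Python. This is a swapped-in textbook technique, not @satra's system; the 24-point window and the threshold of 3 standard deviations are arbitrary assumptions.

```python
import statistics
from typing import List, Sequence


def rolling_zscore_anomalies(
    costs: Sequence[float], window: int = 24, threshold: float = 3.0
) -> List[int]:
    """Return indices whose value lies more than `threshold` sample standard
    deviations from the mean of the preceding `window` values.

    Intended for hourly cost totals; the first `window` points have no full
    history and are never flagged.
    """
    if window < 2:
        raise ValueError("window must be >= 2 to estimate a standard deviation")
    flagged: List[int] = []
    for i in range(window, len(costs)):
        history = list(costs[i - window:i])
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)
        if stdev == 0:
            # Perfectly flat history: any change at all counts as anomalous.
            if costs[i] != mean:
                flagged.append(i)
        elif abs(costs[i] - mean) / stdev > threshold:
            flagged.append(i)
    return flagged
```

A trailing window keeps the check adaptive to gradual growth in usage, at the cost of missing slow drifts; a seasonal baseline (same hour on previous days) would be a natural refinement.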

References:

Related Issues: Improved metrics collection, Disk usage monitoring, Grafana and Prometheus integration

This is a rough outline based on a conversation with @asmacdo. Input and collaboration from the team will be crucial for refining the requirements to meet our needs.


yarikoptic · 2024-08-06T13:51:37Z

Might be worth investigating what Nebari does for this, and how (#186).
