
[D1] - Extract and Utilize System Data #723

Open
ydaponte opened this issue Sep 13, 2024 · 1 comment

ydaponte commented Sep 13, 2024

As a Developer, I want to gather and utilize data from System Views in Microsoft Fabric to monitor system performance and query activity so that we can effectively score and tag stale content.

Resources

Queries:

Acceptance Criteria:

  • Implement data extraction from system views, DMVs, and Delta Lake logs (a minimal sketch follows this list).
  • Integrate this data into the consolidated denorm table.
  • Ensure that data from queries and insights is utilized for content scoring.
  • Potential tech stack: DMVs and Delta logs exist under the Warehouse in Fabric, and the Delta logs are stored in OneLake. For monitoring, Query Insights, Purview, and Log Analytics will be useful.
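
A minimal extraction sketch under the stack above, assuming the warehouse is reached through its SQL endpoint with pyodbc. The `queryinsights.exec_requests_history` view is the documented Query Insights history in Fabric, while the connection placeholders and the exact column list are assumptions to verify against your warehouse; Delta-log extraction from OneLake is not covered here.

```python
# Sketch: pull the last 30 days of statements from the warehouse's
# Query Insights views into a DataFrame that can feed the denorm table.
# SERVER and WAREHOUSE are hypothetical placeholders.
import pyodbc
import pandas as pd

SERVER = "<your-fabric-sql-endpoint>"   # hypothetical placeholder
WAREHOUSE = "<your-warehouse-name>"     # hypothetical placeholder

conn = pyodbc.connect(
    f"Driver={{ODBC Driver 18 for SQL Server}};"
    f"Server={SERVER};Database={WAREHOUSE};"
    "Authentication=ActiveDirectoryInteractive;"
)

# Completed requests from the retention window (30 days for Query Insights).
# Column names are assumptions to verify against the view's schema.
history = pd.read_sql(
    """
    SELECT distributed_statement_id, login_name, start_time, end_time,
           total_elapsed_time_ms, status, command
    FROM queryinsights.exec_requests_history
    WHERE start_time >= DATEADD(day, -30, SYSUTCDATETIME())
    """,
    conn,
)

# Loading `history` into the denorm table (and reading the Delta logs
# from OneLake) is out of scope for this sketch.
print(history.head())
```
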
@ydaponte ydaponte changed the title 1. Extract and Utilize System Data 1 - [US] - Extract and Utilize System Data Sep 13, 2024
@ydaponte ydaponte assigned ydaponte and andreiafig-msft and unassigned ydaponte Sep 13, 2024
@ydaponte ydaponte changed the title 1 - [US] - Extract and Utilize System Data [D1] - Extract and Utilize System Data Sep 13, 2024

andreiafig-msft commented Sep 16, 2024

STUDYING OUR DATA SOURCE - SYSTEM VIEWS IN THE DATA WAREHOUSE

What are we considering as stale content in terms of Data Warehouse queries and performance? The system views will provide information about the following (a probe sketch for these question groups follows the lists):

Query Performance Analysis

  • What is the historical performance of our queries?
  • Are there any long-running queries that need attention?
  • Can we identify the queries causing performance bottlenecks?

Query Optimization and Tuning

  • Which queries are frequently run, and can their performance be improved?
  • Can we identify queries that have failed or been canceled?
  • Can we track changes in query performance over time?
  • Are there any queries that consistently perform poorly?

User Activity Monitoring

  • Who submitted a particular query?
  • Who are the most active users or the users with the most long-running queries?
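
A hedged probe sketch for the three question groups above. The `queryinsights.long_running_queries` and `queryinsights.frequently_run_queries` views are part of Query Insights in Fabric, but the column names used below are assumptions to check against the documentation linked later in this comment.

```python
# One probe query per question group, run against the Query Insights views.
import pyodbc
import pandas as pd

CONNECTION_STRING = "<as in the earlier sketch>"  # hypothetical placeholder
conn = pyodbc.connect(CONNECTION_STRING)

probes = {
    # Query performance analysis: historically slowest statements.
    "long_running": """
        SELECT TOP 20 last_run_command, median_total_elapsed_time_ms, number_of_runs
        FROM queryinsights.long_running_queries
        ORDER BY median_total_elapsed_time_ms DESC
    """,
    # Query optimization and tuning: most frequently run statements and failures.
    "frequently_run": """
        SELECT TOP 20 last_run_command, number_of_runs, number_of_failed_runs
        FROM queryinsights.frequently_run_queries
        ORDER BY number_of_runs DESC
    """,
    # User activity monitoring: who submits the most statements.
    "by_user": """
        SELECT login_name, COUNT(*) AS statement_count
        FROM queryinsights.exec_requests_history
        GROUP BY login_name
        ORDER BY statement_count DESC
    """,
}

results = {name: pd.read_sql(sql, conn) for name, sql in probes.items()}
for name, df in results.items():
    print(name, df.head(), sep="\n")
```
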

With the above information, customers will benefit from creating reports for analysis and statistics, but not from archival.

In the Fabric Warehouse, we have Query Insights, which is created automatically. Query insights - Microsoft Fabric | Microsoft Learn

Comparing Query Insights with the system views, there are some similarities (or one is complementary to the other). From what I could understand, Query Insights uses the DMVs and returns more complete information. The same is true for Query Activity.

Query insights - Microsoft Fabric | Microsoft Learn
Monitor connections, sessions, and requests using DMVs - Microsoft Fabric | Microsoft Learn
Monitor your running and completed queries using Query activity - Microsoft Fabric | Microsoft Learn

The fact that you are able to find the same information in these different tools is highlighted in the Query Activity documentation.
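
For comparison, a small sketch of the DMV side. Unlike Query Insights, the three DMVs documented for Fabric Warehouse (sys.dm_exec_connections, sys.dm_exec_sessions, sys.dm_exec_requests) expose what is connected and running right now rather than a 30-day history; the connection string placeholder carries over from the earlier sketches.

```python
# Live monitoring via DMVs: current sessions and their in-flight requests.
import pyodbc
import pandas as pd

CONNECTION_STRING = "<as in the earlier sketches>"  # hypothetical placeholder
conn = pyodbc.connect(CONNECTION_STRING)

live = pd.read_sql(
    """
    SELECT s.session_id, s.login_name, r.status, r.command,
           r.start_time, r.total_elapsed_time
    FROM sys.dm_exec_sessions AS s
    JOIN sys.dm_exec_requests AS r
      ON s.session_id = r.session_id
    """,
    conn,
)
print(live)
```
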

In my experience, the customer will always want access to this data for monitoring and tracking purposes. There is no benefit in archiving this data, especially because the data is only kept for the past 30 days (this applies to Query Insights and Query Activity).

For the DMVs, I am not sure about the limits, but I would expect them to be the same as in Synapse. Please refer to the following documentation:
System views supported in Synapse SQL - Azure Synapse Analytics | Microsoft Learn
Capacity limits for dedicated SQL pool - Azure Synapse Analytics | Microsoft Learn

SUMMARY:
• We need to evaluate what we would consider stale content for the Data Warehouse system views and query execution.
• From my perspective, there is no need to identify any content here as stale or in need of archival, because this is handled automatically (the telemetry expires after 30 days and has a record limit, for example 10,000 records).

NEXT STEPS: Try to correlate the metadata information with the query text to verify whether a table has been used (for example, as sketched below). This will give us information about idle resources.
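
A rough sketch of that correlation, assuming naive substring matching between table names and the captured command text. That simplification misses views, renames, and dynamic SQL, but it is enough to surface idle-table candidates for review.

```python
# Flag tables whose names never appear in the captured query text.
import pyodbc
import pandas as pd

CONNECTION_STRING = "<as in the earlier sketches>"  # hypothetical placeholder
conn = pyodbc.connect(CONNECTION_STRING)

# All base tables in the warehouse, from the standard metadata views.
tables = pd.read_sql(
    "SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_TYPE = 'BASE TABLE'",
    conn,
)

# Lower-cased command text from the Query Insights history (last 30 days).
commands = (
    pd.read_sql("SELECT command FROM queryinsights.exec_requests_history", conn)
    ["command"].dropna().str.lower().tolist()
)

# A table is an idle candidate if its name occurs in no captured statement.
idle_candidates = [
    f"{t.TABLE_SCHEMA}.{t.TABLE_NAME}"
    for t in tables.itertuples()
    if not any(t.TABLE_NAME.lower() in cmd for cmd in commands)
]
print("Possibly idle tables:", idle_candidates)
```
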
