This script fetches and processes job run data from an Azure Databricks instance using the Databricks REST API. It extracts relevant information about job runs, processes it, and outputs both a Pandas DataFrame and a CSV file.
Before running this script, ensure you have the following:
- Azure Databricks Instance: You need access to an Azure Databricks instance.
- API Token: Generate an API token from your Databricks instance with appropriate permissions to access job run data.
- Install the required libraries using the following command:
pip install requests pandas
- Replace the placeholders in the code with your actual values:
baseURI: Replace with your Azure Databricks instance URL.
apiToken: Replace with your API token.
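As a minimal sketch, the placeholder values might look like the following. The workspace URL, token string, and the epoch-millisecond timestamps are illustrative assumptions, not real credentials or required formats:

```python
# Placeholder configuration -- replace every value with your own.
baseURI = "https://adb-1234567890123456.7.azuredatabricks.net"  # your workspace URL
apiToken = "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX"                   # your personal access token

# Query parameters passed to the job-runs endpoint (timestamp format assumed
# to be epoch milliseconds, matching the Databricks Jobs API convention).
params = {
    "start_time_from": 1704067200000,
    "start_time_to": 1706745600000,
    "expand_tasks": "true",
}
```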
The script starts by importing necessary libraries: requests, pandas, math, datetime, and json.
The script defines the function fetch_and_process_job_runs, which is responsible for fetching job run data from the Databricks API. The function takes three arguments:
- base_uri: the base URL of your Databricks instance.
- api_token: your API token for authentication.
- params: a dictionary of query parameters, including start_time_from, start_time_to, and expand_tasks.
Inside the function:
- An API request is made to the specified endpoint.
- The response is processed to extract job run details.
- Processed data is accumulated and transformed into a Pandas DataFrame.
- Pagination is managed using the has_more field in the response.
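The steps above can be sketched as follows. This is not the author's exact implementation: the endpoint path (/api/2.1/jobs/runs/list), the flattened field names in run_to_record, and the offset-based pagination are assumptions based on the Databricks Jobs API (newer API versions paginate with a page token instead of an offset):

```python
import requests
import pandas as pd

def run_to_record(run: dict) -> dict:
    """Flatten one job-run object into a flat record (field names assumed)."""
    return {
        "job_id": run.get("job_id"),
        "run_id": run.get("run_id"),
        "state": run.get("state", {}).get("result_state"),
        # execution_duration is reported in milliseconds; convert to minutes
        "execution_duration_in_mins": round(run.get("execution_duration", 0) / 60000),
    }

def fetch_and_process_job_runs(base_uri: str, api_token: str, params: dict) -> pd.DataFrame:
    endpoint = f"{base_uri}/api/2.1/jobs/runs/list"
    headers = {"Authorization": f"Bearer {api_token}"}
    records, offset = [], 0
    while True:
        resp = requests.get(endpoint, headers=headers, params={**params, "offset": offset})
        resp.raise_for_status()
        payload = resp.json()
        runs = payload.get("runs", [])
        records.extend(run_to_record(r) for r in runs)
        # Pagination: keep requesting pages while has_more is true
        if not payload.get("has_more"):
            break
        offset += len(runs)
    return pd.DataFrame(records)
```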
After fetching and processing the job run data:
- The resulting DataFrame is sorted by the execution_duration_in_mins column in descending order.
- The total execution time across all job runs is calculated and appended as a row in the DataFrame.
- The processed DataFrame is saved as a CSV file named jobs.csv.
- The sorted DataFrame is printed to the console.
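The post-processing steps above can be sketched like this. The DataFrame contents are illustrative; the column names follow the text, and the "TOTAL" label for the summary row is an assumption:

```python
import pandas as pd

# Illustrative stand-in for the fetched job-run data
df = pd.DataFrame({
    "job_id": [101, 102, 103],
    "execution_duration_in_mins": [5, 42, 17],
})

# Sort by execution duration, longest first
df = df.sort_values("execution_duration_in_mins", ascending=False).reset_index(drop=True)

# Append a summary row holding the total execution time
total = df["execution_duration_in_mins"].sum()
df.loc[len(df)] = ["TOTAL", total]

# Persist the result and print it to the console
df.to_csv("jobs.csv", index=False)
print(df)
```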
Make sure you have fulfilled the prerequisites and replaced the placeholder values in the code.
Run the script. It fetches and processes the job run data, displays the sorted results, saves them to a CSV file, and prints a Markdown table.
Note: This script provides a basic example of how to fetch and process job runs data from Azure Databricks using the Databricks REST API. You can further enhance and customize the script to suit your specific use case and requirements.
## Output
- Total Jobs: 160
- Total Tasks: 214
- Successful Tasks: 174
- Failed Tasks: 15
- Total Execution Time (mins): 1158
- Average Execution Time (mins): 10.82
- Min Execution Time (mins): 0
- Max Execution Time (mins): 1158
Key Insights:
- Task Status Distribution:
{
"SUCCESS": 174,
"CANCELED": 24,
"FAILED": 15
}
- Execution Duration Distribution:
- Min: 0 mins
- Max: 1158 mins
- Average: 10.82 mins
- Jobs with Longest Execution Time:

| job_id | execution_duration_in_mins |
| --- | --- |
| 260792223809789 | 140 |
| 74519312719017 | 93 |
| 371241484431340 | 88 |
| 655421446142082 | 85 |
| 887636488212750 | 65 |