Databricks Receiver

The Databricks Receiver uses the Databricks API to generate metrics about the operation of a Databricks instance.

In addition to the metrics derived from the Databricks API, it generates metrics from the Spark subsystem running in the Databricks instance.

Supported pipeline types: metrics

🚧 This receiver is in DEVELOPMENT. Behavior, configuration fields, and metric data model are subject to change.

Configuration

The following fields are required:

  • instance_name: A string representing the name of the Databricks instance. This value is set as the databricks.instance.name resource attribute.
  • endpoint: The URL containing a protocol (http or https), hostname, and (optional) port of the Databricks API, without a trailing slash.
  • token: An access token to authenticate to the Databricks API.
  • spark_org_id: The Spark Org ID. See the Spark Subsystem Configuration section below for how to get this value.
  • spark_endpoint: The URL containing a protocol (http or https), hostname, and (optional) port of the Spark API. See the Spark Subsystem Configuration section below for how to get this value.
  • spark_ui_port: A number representing the Spark UI port (typically 40001). See the Spark Subsystem Configuration section below for how to get this value.

The following fields are optional:

  • collection_interval: How often this receiver fetches information from the Databricks API. Must be a string readable by Go's time.ParseDuration, such as 300ms, 10s, or 1m30s. Defaults to 30s.
  • max_results: The maximum number of items to return per API call. Defaults to 25, which is also the maximum; the API requires a value greater than 0 and less than or equal to 25.

Example

databricks:
  instance_name: my-instance
  endpoint: https://dbr.example.net
  token: abc123
  spark_org_id: 1234567890
  spark_endpoint: https://spark.example.net
  spark_ui_port: 40001
  collection_interval: 10s
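
For context, the following is a minimal sketch of how this receiver might be wired into a complete collector configuration. The logging exporter here is purely illustrative; any metrics exporter can take its place.

receivers:
  databricks:
    instance_name: my-instance
    endpoint: https://dbr.example.net
    token: abc123
    spark_org_id: 1234567890
    spark_endpoint: https://spark.example.net
    spark_ui_port: 40001

exporters:
  logging:

service:
  pipelines:
    metrics:
      receivers: [databricks]
      exporters: [logging]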

Spark Subsystem Configuration

To get the configuration parameters this receiver needs to collect Spark metrics, run the following Scala notebook in your Databricks instance and copy its output values into your config:

%scala
// The Spark Org ID: use this as the spark_org_id config value.
val sparkOrgId = spark.conf.get("spark.databricks.clusterUsageTags.clusterOwnerOrgId")
// The API URL of the workspace: use this as the spark_endpoint config value.
val sparkEndpoint = dbutils.notebook.getContext.apiUrl.get
// The Spark UI port: use this as the spark_ui_port config value.
val sparkUiPort = spark.conf.get("spark.ui.port")
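
Copy each printed value into the corresponding config field: sparkOrgId into spark_org_id, sparkEndpoint into spark_endpoint, and sparkUiPort into spark_ui_port.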