Jobs and Datasets have their own namespaces, job namespaces being derived from schedulers and dataset namespaces from datasources.

A dataset, or `table`, is organized according to a producer, namespace, database and (optionally) schema.
| Data Store | Type | Namespace | Name |
| :--- | :--- | :--- | :--- |
| Athena | Warehouse | awsathena://athena.{region_name}.amazonaws.com | {catalog}.{database}.{table} |
| AWS Glue | Data catalog | arn:aws:glue:{region}:{account id} | table/{database name}/{table name} |
| Azure Cosmos DB | Warehouse | azurecosmos://{host}/dbs/{database} | colls/{table} |
| Azure Data Explorer | Warehouse | azurekusto://{host}.kusto.windows.net | {database}/{table} |
| Azure Synapse | Warehouse | sqlserver://{host}:{port} | {schema}.{table} |
| BigQuery | Warehouse | bigquery:// | {project id}.{dataset name}.{table name} |
| Cassandra | Warehouse | cassandra://{host}:{port} | {keyspace}.{table} |
| MySQL | Warehouse | mysql://{host}:{port} | {database}.{table} |
| Oracle | Warehouse | oracle://{host}:{port} | {serviceName}.{schema}.{table} or {sid}.{schema}.{table} |
| Postgres | Warehouse | postgres://{host}:{port} | {database}.{schema}.{table} |
| Teradata | Warehouse | teradata://{host}:{port} | {database}.{table} |
| Redshift | Warehouse | redshift://{cluster_identifier}.{region_name}:{port} | {database}.{schema}.{table} |
| Snowflake | Warehouse | snowflake://{organization name}-{account name} | {database}.{schema}.{table} |
| Trino | Warehouse | trino://{host}:{port} | {catalog}.{schema}.{table} |
| ABFSS (Azure Data Lake Gen2) | Data lake | abfss://{container name}@{service name}.dfs.core.windows.net | {path} |
| DBFS (Databricks File System) | Distributed file system | hdfs://{workspace name} | {path} |
| GCS | Blob storage | gs://{bucket name} | {object key} |
| HDFS | Distributed file system | hdfs://{namenode host}:{namenode port} | {path} |
| Kafka | Event streaming platform | kafka://{bootstrap server host}:{port} | {topic} |
| Local file system | File system | file://{host} | {path} |
| S3 | Blob storage | s3://{bucket name} | {object key} |
| WASBS (Azure Blob Storage) | Blob storage | wasbs://{container name}@{service name}.blob.core.windows.net | {object key} |
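As a quick illustration of the conventions above, the helpers below compose a dataset's `(namespace, name)` pair for two of the listed stores. This is a minimal sketch: the function names and arguments are illustrative, not part of any OpenLineage client API.

```python
# Hypothetical helpers mirroring the naming table; they are not an
# official OpenLineage API, just the format strings made executable.

def postgres_dataset(host: str, port: int, database: str,
                     schema: str, table: str) -> tuple[str, str]:
    """Postgres: namespace postgres://{host}:{port}, name {database}.{schema}.{table}."""
    return (f"postgres://{host}:{port}", f"{database}.{schema}.{table}")

def s3_dataset(bucket: str, key: str) -> tuple[str, str]:
    """S3: namespace s3://{bucket name}, name is the object key."""
    return (f"s3://{bucket}", key)

print(postgres_dataset("db.example.com", 5432, "sales", "public", "orders"))
# → ('postgres://db.example.com:5432', 'sales.public.orders')
```

Note that for warehouses the namespace identifies the server instance, while for object stores it identifies the bucket or container; the name then carries the remaining path or qualified table name.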
## Job Naming
A `Job` is a recurring data transformation with inputs and outputs. Each execution is captured as a `Run` with corresponding metadata.

A `Run` event identifies the `Job` it instantiates by providing the job's unique identifier.

The `Job` identifier is composed of a `Namespace` and `Name`. The job namespace is usually set in the OpenLineage client config. The job name is unique within its namespace.
| Job type | Name | Example |
| :--- | :--- | :--- |
| Airflow task | {dag_id}.{task_id} | orders_etl.count_orders |
| Spark job | {appName}.{command}.{table} | my_awesome_app.execute_insert_into_hive_table.mydb_mytable |
| SQL | {schema}.{table} | gx.validate_datasets |
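The job identifier patterns in the table can likewise be sketched as plain functions. The names below are illustrative (not the OpenLineage client API), and the namespace argument stands in for whatever the client config supplies:

```python
# Illustrative helpers, not an official API: build a job's
# (namespace, name) identifier per the table above.

def airflow_job(namespace: str, dag_id: str, task_id: str) -> tuple[str, str]:
    """Airflow task job name: {dag_id}.{task_id}."""
    return (namespace, f"{dag_id}.{task_id}")

def spark_job(namespace: str, app_name: str, command: str,
              table: str) -> tuple[str, str]:
    """Spark job name: {appName}.{command}.{table}."""
    return (namespace, f"{app_name}.{command}.{table}")

print(airflow_job("my_airflow_instance", "orders_etl", "count_orders"))
# → ('my_airflow_instance', 'orders_etl.count_orders')
```

Because the name embeds the DAG/task or app/command structure, two tasks in the same namespace can never collide unless their scheduler-level identifiers already do.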
## Run Naming