Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update graphql querying docs #4496

Merged
merged 4 commits into from
Nov 17, 2023
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
285 changes: 161 additions & 124 deletions website/docs/docs/dbt-cloud-apis/sl-graphql.md
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,60 @@ metricsForDimensions(
): [Metric!]!
```

**Metric Types**

```graphql
Metric {
name: String!
description: String
type: MetricType!
typeParams: MetricTypeParams!
filter: WhereFilter
dimensions: [Dimension!]!
queryableGranularities: [TimeGranularity!]!
}
```

```
MetricType = [SIMPLE, RATIO, CUMULATIVE, DERIVED]
```

**Metric Type parameters**

```graphql
MetricTypeParams {
measure: MetricInputMeasure
inputMeasures: [MetricInputMeasure!]!
numerator: MetricInput
denominator: MetricInput
expr: String
window: MetricTimeWindow
grainToDate: TimeGranularity
metrics: [MetricInput!]
}
```


**Dimension Types**

```graphql
Dimension {
name: String!
description: String
type: DimensionType!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @WilliamDee, if a user uses these parameters, does this mean the DimensionType! field needs to be filled in by them using either CATEGORICAL or TIME as state below?

DimensionType = [CATEGORICAL, TIME]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a return type so not an input type, so this is for them to retrieve certain values from the request

typeParams: DimensionTypeParams
isPartition: Boolean!
expr: String
queryableGranularities: [TimeGranularity!]!
}
```

```
DimensionType = [CATEGORICAL, TIME]
```

### Querying

**Create Dimension Values query**

```graphql
Expand Down Expand Up @@ -205,59 +259,128 @@ query(
): QueryResult!
```

**Metric Types**
The graphql API uses a polling process for querying as queries could be long running in some cases. This means the first mutation `createQuery` return a query ID and will be used to kick off the query. The query ID can then be used in the polling query `query` to fetch the results and status of the job. The typical flow would look as followed,
mirnawong1 marked this conversation as resolved.
Show resolved Hide resolved

1. Kick off a query
```graphql
Metric {
name: String!
description: String
type: MetricType!
typeParams: MetricTypeParams!
filter: WhereFilter
dimensions: [Dimension!]!
queryableGranularities: [TimeGranularity!]!
mutation {
createQuery(
environmentId: 123456
metrics: [{name: "order_total"}]
groupBy: [{name: "metric_time"}]
) {
queryId # => Returns 'QueryID_12345678'
}
}
```

```
MetricType = [SIMPLE, RATIO, CUMULATIVE, DERIVED]
2. Poll for results
```graphql
{
query(environmentId: 123456, queryId: "QueryID_12345678") {
sql
status
error
totalPages
jsonResult
arrowResult
}
}
```
3. Keep querying 2. at an appropriate interval until status is `FAILED` or `SUCCESSFUL`

### Output format and pagination

**Output format**

By default, the output is in Arrow format. You can switch to JSON format using the following parameter. However, due to performance limitations, we recommend using the JSON parameter for testing and validation. The JSON received is a base64 encoded string. To access it, you can decode it using a base64 decoder. The JSON is created from pandas, which means you can change it back to a dataframe using `pandas.read_json(json, orient="table")`. Or you can work with the data directly using `json["data"]`, and find the table schema using `json["schema"]["fields"]`. Alternatively, you can pass `encoded:false` to the jsonResult field to get a raw JSON string directly.

**Metric Type parameters**

```graphql
MetricTypeParams {
measure: MetricInputMeasure
inputMeasures: [MetricInputMeasure!]!
numerator: MetricInput
denominator: MetricInput
expr: String
window: MetricTimeWindow
grainToDate: TimeGranularity
metrics: [MetricInput!]
{
query(environmentId: BigInt!, queryId: Int!, pageNum: Int! = 1) {
sql
status
error
totalPages
arrowResult
jsonResult(orient: PandasJsonOrient! = TABLE, encoded: Boolean! = true)
}
}
```

The results default to the table but you can change it to any [pandas](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html) supported value.

**Dimension Types**
**Pagination**

```graphql
Dimension {
name: String!
description: String
type: DimensionType!
typeParams: DimensionTypeParams
isPartition: Boolean!
expr: String
queryableGranularities: [TimeGranularity!]!
By default, we return 1024 rows per page. If your result set exceeds this, you need to increase the page number using the `pageNum` option.

### Run a Python query

The `arrowResult` in the GraphQL query response is a byte dump, which isn't visually useful. You can convert this byte data into an Arrow table using any Arrow-supported language. Refer to the following Python example explaining how to query and decode the arrow result:


```python
import base64
import pyarrow as pa
import time

headers = {"Authorization":"Bearer <token>"}
query_result_request = """
{
query(environmentId: 70, queryId: "12345678") {
sql
status
error
arrowResult
}
}
```
"""

```
DimensionType = [CATEGORICAL, TIME]
while True:
gql_response = requests.post(
"https://semantic-layer.cloud.getdbt.com/api/graphql",
json={"query": query_result_request},
headers=headers,
)
if gql_response.json()["data"]["status"] in ["FAILED", "SUCCESSFUL"]:
break
# Set an appropriate interval between polling requests
time.sleep(1)

"""
gql_response.json() =>
{
"data": {
"query": {
"sql": "SELECT\n ordered_at AS metric_time__day\n , SUM(order_total) AS order_total\nFROM semantic_layer.orders orders_src_1\nGROUP BY\n ordered_at",
"status": "SUCCESSFUL",
"error": null,
"arrowResult": "arrow-byte-data"
}
}
}
"""

def to_arrow_table(byte_string: str) -> pa.Table:
"""Get a raw base64 string and convert to an Arrow Table."""
with pa.ipc.open_stream(base64.b64decode(res)) as reader:
return pa.Table.from_batches(reader, reader.schema)


arrow_table = to_arrow_table(gql_response.json()["data"]["query"]["arrowResult"])

# Perform whatever functionality is available, like convert to a pandas table.
print(arrow_table.to_pandas())
"""
order_total ordered_at
3 2023-08-07
112 2023-08-08
12 2023-08-09
5123 2023-08-10
"""
```

### Create Query examples
### Additional Create Query examples

The following section provides query examples for the GraphQL API, such as how to query metrics, dimensions, where filters, and more.

Expand Down Expand Up @@ -359,7 +482,7 @@ mutation {
}
```

**Query with Explain**
**Query with just compiling SQL**

This takes the same inputs as the `createQuery` mutation.

Expand All @@ -373,90 +496,4 @@ mutation {
sql
}
}
```

### Output format and pagination

**Output format**

By default, the output is in Arrow format. You can switch to JSON format using the following parameter. However, due to performance limitations, we recommend using the JSON parameter for testing and validation. The JSON received is a base64 encoded string. To access it, you can decode it using a base64 decoder. The JSON is created from pandas, which means you can change it back to a dataframe using `pandas.read_json(json, orient="table")`. Or you can work with the data directly using `json["data"]`, and find the table schema using `json["schema"]["fields"]`. Alternatively, you can pass `encoded:false` to the jsonResult field to get a raw JSON string directly.


```graphql
{
query(environmentId: BigInt!, queryId: Int!, pageNum: Int! = 1) {
sql
status
error
totalPages
arrowResult
jsonResult(orient: PandasJsonOrient! = TABLE, encoded: Boolean! = true)
}
}
```

The results default to the table but you can change it to any [pandas](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html) supported value.

**Pagination**

By default, we return 1024 rows per page. If your result set exceeds this, you need to increase the page number using the `pageNum` option.

### Run a Python query

The `arrowResult` in the GraphQL query response is a byte dump, which isn't visually useful. You can convert this byte data into an Arrow table using any Arrow-supported language. Refer to the following Python example explaining how to query and decode the arrow result:


```python
import base64
import pyarrow as pa

headers = {"Authorization":"Bearer <token>"}
query_result_request = """
{
query(environmentId: 70, queryId: "12345678") {
sql
status
error
arrowResult
}
}
"""

gql_response = requests.post(
"https://semantic-layer.cloud.getdbt.com/api/graphql",
json={"query": query_result_request},
headers=headers,
)

"""
gql_response.json() =>
{
"data": {
"query": {
"sql": "SELECT\n ordered_at AS metric_time__day\n , SUM(order_total) AS order_total\nFROM semantic_layer.orders orders_src_1\nGROUP BY\n ordered_at",
"status": "SUCCESSFUL",
"error": null,
"arrowResult": "arrow-byte-data"
}
}
}
"""

def to_arrow_table(byte_string: str) -> pa.Table:
"""Get a raw base64 string and convert to an Arrow Table."""
with pa.ipc.open_stream(base64.b64decode(res)) as reader:
return pa.Table.from_batches(reader, reader.schema)


arrow_table = to_arrow_table(gql_response.json()["data"]["query"]["arrowResult"])

# Perform whatever functionality is available, like convert to a pandas table.
print(arrow_table.to_pandas())
"""
order_total ordered_at
3 2023-08-07
112 2023-08-08
12 2023-08-09
5123 2023-08-10
"""
```
```
Loading