Source Github: Add dbt converter #49

103 changes: 97 additions & 6 deletions connectors/source_github/README.md
@@ -1,9 +1,100 @@
# Airbyte source_github dbt Package
# Github Airbyte dbt Package

This package contains dbt models for Airbyte source_github source.
---

What it includes:
- This package contains dbt models to work with the Airbyte GitHub connector.
- The package is compatible with the latest version of the Airbyte GitHub connector.
- Currently, it is limited to creating transformations compatible with [Fivetran's modeling dbt package](https://github.com/fivetran/dbt_github/tree/main).
- In the future, specific models will be applied directly to the Airbyte connector output. If you have an idea or want to propose an analytical model for this source, please refer to the contributing guide, which explains how to propose a new transformation model.
- This package was tested with the BigQuery, Snowflake, and Postgres data warehouses.

* A complete source description
* ERD model for the source
* Diagram documentation for the source
---

## 🎯 Instructions for use

### Airbyte dbt Package

Airbyte dbt packages aren't versioned yet, so you must install this one as a git package with a `subdirectory`. The package doesn't apply any transformation models of its own yet, but you can still generate docs and run tests with dbt.

Create the following files:

**`dbt_project.yml`**

```yaml
vars:
  using_fivetran_model: False
  airbyte_database: "airbyte_db_default"
  airbyte_schema: "airbyte_dbt_github"
```

**`packages.yml`**

```yaml
packages:
  - git: "https://github.com/airbytehq/airbyte-dbt-models.git"
    subdirectory: "connectors/source_github"
```

After installing the package with `dbt deps`, you can run `dbt test` or `dbt docs generate` to preview the Airbyte output data.

### Fivetran github Modeling dbt package

This package transforms Airbyte connector output data, making it compatible with Fivetran's github dbt package. You can check the analytical models Fivetran creates [here](https://github.com/fivetran/dbt_github/tree/main?tab=readme-ov-file#-what-does-this-dbt-package-do). The link also provides information about how the package works and what is configurable.

Create the required files to use the Airbyte and Fivetran dbt packages:

**`packages.yml`**

```yaml
packages:
  - git: "https://github.com/airbytehq/airbyte-dbt-models.git"
    subdirectory: "connectors/source_github_support"

  - package: fivetran/github
    version: [">=0.16.0", "<0.17.0"]
```

The following default variable definitions must be configured for the models to be created.
At the moment this package doesn't support schedules, domain names, user tags, ticket form history, or organization tags, so keep those variables set to `False`.
Variables matching the pattern `github_..._identifier` hold the names of the tables generated by the Airbyte connector. If you configured your sync with a table prefix, make sure you edit these values accordingly.

**`dbt_project.yml`**

```yaml
vars:
  # Required by Airbyte dbt model
  using_fivetran_model: True
  airbyte_database: "airbyte_db_default"
  airbyte_schema: "airbyte_dbt_github_support"

  # Required by Fivetran dbt model
  github_database: "airbyte_db_default"
  github_schema: "airbyte_dbt_github_support"

  using_schedules: False
  using_domain_names: False
  using_user_tags: False
  using_ticket_form_history: False
  using_organization_tags: False

  github_organization_identifier: "organizations"
  github_ticket_identifier: "tickets"
  github_ticket_comment_identifier: "ticket_comments"
  github_ticket_tag_identifier: "tags"
  github_ticket_field_history_identifier: "ticket_field_history"
  github_ticket_form_history_identifier: "ticket_forms"
  github_brand_identifier: "brands"
  github_group_identifier: "groups"
  github_organization_tag_identifier: "organization_fields"
  github_user_identifier: "users"
  github_user_tag_identifier: "user_field"
```

After running `dbt run`, you can see the models that were created.

---

## :package: Package Maintenance

- This package is maintained by the Airbyte Community.
- Contributions are welcome at any time: please read the Contributing Guidelines or join the Airbyte Slack channel `#airbyte-dbt-packages`.
61 changes: 61 additions & 0 deletions connectors/source_github/integration_tests/dbt_project.yml
@@ -0,0 +1,61 @@
name: integration_test_github

config-version: 2

version: 0.1.0

profile: integration_tests

model-paths:
  - models

macro-paths:
  - macros

target-path: target

clean-targets:
  - target
  - dbt_modules
  - logs

require-dbt-version:
  - ">=1.0.0"
  - <2.0.0

models:
  airbyte_dbt_source_github:
    materialized: view
    +schema: dbt_github
    staging:
      materialized: view
    tmp:
      materialized: view

vars:
  # Required by Airbyte dbt model
  using_fivetran_model: True
  airbyte_database: "airbyte_db_default"
  airbyte_schema: "airbyte_dbt_github"

  # Required by Github dbt model
  github_database: "airbyte_db_default"
  github_schema: "airbyte_dbt_github"

  using_schedules: False
  using_domain_names: False
  using_user_tags: False
  using_ticket_form_history: False
  using_organization_tags: False

  zendesk_organization_identifier: "organizations"
  zendesk_ticket_identifier: "tickets"
  zendesk_ticket_comment_identifier: "ticket_comments"
  zendesk_ticket_tag_identifier: "tags"
  zendesk_ticket_field_history_identifier: "ticket_field_history"
  zendesk_ticket_form_history_identifier: "ticket_forms"
  zendesk_brand_identifier: "brands"
  zendesk_group_identifier: "groups"
  zendesk_organization_tag_identifier: "organization_fields"
  zendesk_user_identifier: "users"
  zendesk_user_tag_identifier: "user_field"
1 change: 1 addition & 0 deletions connectors/source_github/integration_tests/vars
@@ -0,0 +1 @@
{airbyte_database: $AB_DB, github_database: $AB_DB}
@@ -0,0 +1,21 @@
version: 2

models:
  - name: github_issue_labels
    schema: "{{ var('airbyte_schema', target.schema) }}"
    database: "{{ var('airbyte_database', target.database) }}"
    identifier: "{{ var('zendesk_ticket_field_history_identifier', 'ticket_field_history') }}"
    description: All fields and field values associated with tickets.
    config:
      +enabled: "{{ var('using_fivetran_model', False) }}"
    columns:
      - name: ticket_id
        description: The ID of the ticket associated with the field
      - name: field_name
        description: The name of the ticket field
      - name: updated
        description: The time the ticket field value was created
      - name: value
        description: The value of the field
      - name: user_id
        description: The id of the user who made the update
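
The `var(name, default)` calls in this schema file fall back to the second argument when the project does not define the variable. The sketch below is a rough Python analogue of that lookup, not dbt's implementation; `resolve_var` and the sample `project_vars` are hypothetical names used only for illustration. It suggests that an override set under the `github_` prefix (as the README describes) would likely be ignored here, because this file reads a `zendesk_`-prefixed name.

```python
# Sentinel so that "no default given" can be told apart from default=None.
_MISSING = object()


def resolve_var(project_vars: dict, name: str, default=_MISSING):
    """Return the project-level value for `name`, else `default`.

    Raises KeyError when the variable is undefined and no default is given,
    which is how this sketch models a required variable.
    """
    if name in project_vars:
        return project_vars[name]
    if default is _MISSING:
        raise KeyError(f"Required var '{name}' not found")
    return default


# Hypothetical project vars, following the README's `github_` naming.
project_vars = {
    "using_fivetran_model": True,
    "github_ticket_field_history_identifier": "my_prefix_ticket_field_history",
}

# The schema file asks for the `zendesk_` name, so the default is returned.
identifier = resolve_var(
    project_vars,
    "zendesk_ticket_field_history_identifier",
    "ticket_field_history",
)
enabled = resolve_var(project_vars, "using_fivetran_model", False)
```

Under these assumptions `identifier` resolves to the default rather than the prefixed table name, so the variable prefixes in the schema files and the README probably need to agree.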
@@ -0,0 +1,48 @@
{% if target.type == "snowflake" %}

with tmp as (
    SELECT
        t.id as TICKET_ID,
        f.value:field_name::STRING AS FIELD_NAME,
        f.value:value::STRING AS VALUE,
        t.author_id as AUTHOR_ID,
        t.created_at as UPDATED
    FROM
        {{ source('source_zendesk_support', 'ticket_audits') }} t,
        LATERAL FLATTEN(input => t.events) f
)
select * from tmp where field_name is not null

{% elif target.type == "bigquery" %}

WITH tmp AS (
    SELECT
        t.id AS TICKET_ID,
        JSON_EXTRACT_SCALAR(f, '$.field_name') AS FIELD_NAME,
        JSON_EXTRACT_SCALAR(f, '$.value') AS VALUE,
        t.author_id AS AUTHOR_ID,
        t.created_at AS UPDATED
    FROM
        {{ source('source_zendesk_support', 'ticket_audits') }} t,
        UNNEST(JSON_EXTRACT_ARRAY(t.events)) AS f
)
SELECT * FROM tmp
WHERE FIELD_NAME IS NOT NULL

{% elif target.type == "postgres" %}

WITH tmp AS (
    SELECT
        t.id AS ticket_id,
        f.value->>'field_name' AS field_name,
        f.value->>'value' AS value,
        t.author_id AS author_id,
        t.created_at AS updated
    FROM
        {{ source('source_zendesk_support', 'ticket_audits') }} t,
        LATERAL jsonb_array_elements(t.events::jsonb) AS f(value)
)
SELECT * FROM tmp
WHERE FIELD_NAME IS NOT NULL

{% endif %}
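
All three warehouse branches above appear to express the same transformation: expand each audit's `events` array into one row per event, then keep only the events that carry a `field_name`. The sketch below restates that logic in plain Python so it can be checked without a warehouse. The function name `flatten_ticket_audits` and the sample records are hypothetical, and the sketch leaves `value` untouched, whereas the SQL casts it to a string.

```python
from typing import Any


def flatten_ticket_audits(audits: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Emit one row per audit event that has a non-null field_name."""
    rows = []
    for audit in audits:
        # Equivalent of LATERAL FLATTEN / UNNEST / jsonb_array_elements.
        for event in audit.get("events") or []:
            field_name = event.get("field_name")
            if field_name is None:
                continue  # mirrors `WHERE field_name IS NOT NULL`
            rows.append(
                {
                    "ticket_id": audit["id"],  # the SQL aliases t.id as ticket_id
                    "field_name": field_name,
                    "value": event.get("value"),
                    "author_id": audit["author_id"],
                    "updated": audit["created_at"],
                }
            )
    return rows


# Hypothetical sample input shaped like the `ticket_audits` source rows.
sample_audits = [
    {
        "id": 1,
        "author_id": 10,
        "created_at": "2024-01-01T00:00:00Z",
        "events": [{"field_name": "status", "value": "open"}, {"type": "Comment"}],
    },
    {"id": 2, "author_id": 11, "created_at": "2024-01-02T00:00:00Z", "events": []},
]
rows = flatten_ticket_audits(sample_audits)
```

With this sample, only the `status` event of the first audit survives the filter; the comment event and the audit with no events produce no rows.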
@@ -0,0 +1,17 @@
version: 2

models:
  - name: requested_reviewer_history
    schema: "{{ var('airbyte_schema', target.schema) }}"
    database: "{{ var('airbyte_database', target.database) }}"
    identifier: "{{ var('github_requested_reviewer_history_identifier', 'requested_reviewer_history') }}"
    description: Table containing when a user requests another user to review a pull request.
    columns:
      - name: pull_request_id
        description: Foreign key that references the pull request table.
      - name: created_at
        description: Timestamp of when the review was submitted.
      - name: requested_id
        description: Foreign key that references the user table, representing the user that was requested to review a PR.
      - name: removed
        description: Boolean variable indicating if the requester was removed from the PR (true) or added to the PR (false).