Source Github: Add dbt converter #49

Open: wants to merge 5 commits into `main`
95 changes: 89 additions & 6 deletions connectors/source_github/README.md
@@ -1,9 +1,92 @@
# Airbyte source_github dbt Package
# GitHub Airbyte dbt Package

This package contains dbt models for Airbyte source_github source.
---

What it includes:
- This package contains dbt models that work with the Airbyte GitHub connector.
- The package is compatible with the latest version of the Airbyte GitHub connector.
- Currently, it is limited to creating transformations compatible with [Fivetran's GitHub modeling dbt package](https://github.com/fivetran/dbt_github/tree/main).
- In the future, specific models will be applied directly to the Airbyte connector output. If you have an idea or want to propose an analytical model for this source, see the contributing guide, which explains how to propose a new transformation model.
- This package was tested with the BigQuery, Snowflake, and Postgres data warehouses.

* A complete source description
* ERD model for the source
* Diagram documentation for the source
---

## 🎯 How to use

### Airbyte dbt Package

Airbyte dbt packages are not versioned yet, so you must install this one using the `git` URL and the `subdirectory` key. No transformation models are applied directly by this package yet, but you can still generate docs and run tests with dbt.

Create the following files:

**`dbt_project.yml`**

```yaml
vars:
  using_fivetran_model: False
  airbyte_database: "airbyte_db_default"
  airbyte_schema: "airbyte_dbt_github"
```

**`packages.yml`**

```yaml
packages:
  - git: "https://github.com/airbytehq/airbyte-dbt-models.git"
    subdirectory: "connectors/source_github"
```

Then run `dbt deps` to install the package, followed by `dbt test` or `dbt docs generate` to preview the Airbyte output data.

### Fivetran GitHub Modeling dbt package

This package transforms the Airbyte connector output so that it is compatible with Fivetran's GitHub dbt package. The [Fivetran package README](https://github.com/fivetran/dbt_github/tree/main?tab=readme-ov-file#-what-does-this-dbt-package-do) lists the analytical models Fivetran creates, explains how the package works, and describes what is configurable.
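To make the conversion concrete, here is a minimal plain-Python sketch of what the issue-related converter models under `models/fivetran_converter/` (`issue.sql`, `issue_assignee.sql`, `issue_label.sql`, `issue_closed_history.sql`, `issue_merged.sql`) do with a single record from Airbyte's `issues` stream. It is an illustration only and not part of the package. `json_text`, `convert_issue`, and the sample record are hypothetical. Only a subset of the `issue` columns is shown. JSON fields are returned as text, assuming `->>` / `fivetran_utils.json_extract` behave that way on the target warehouse.

```python
from typing import Any, Optional

# Hypothetical sketch: mirrors the SQL converter models in plain Python.
Record = dict[str, Any]


def json_text(obj: Optional[Record], key: str) -> Optional[str]:
    """Return obj[key] as text, or None, like `->>` / json_extract."""
    if not obj or obj.get(key) is None:
        return None
    return str(obj[key])


def convert_issue(record: Record) -> dict[str, list[Record]]:
    """Split one Airbyte `issues` record into Fivetran-shaped rows."""
    issue_id = str(record["id"])  # cast(id as string)
    return {
        # issue.sql (subset of columns)
        "issue": [{
            "id": issue_id,
            "number": record.get("number"),
            "title": record.get("title"),
            "state": record.get("state"),
            "closed_at": record.get("closed_at"),
            "milestone_id": json_text(record.get("milestone"), "id"),
            "user_id": json_text(record.get("user"), "id"),
        }],
        # issue_assignee.sql: one row per issue, from the single `assignee` object
        "issue_assignee": [
            {"issue_id": issue_id, "user_id": json_text(record.get("assignee"), "id")}
        ],
        # issue_label.sql: one row per element of `labels`; no labels, no rows
        "issue_label": [
            {"issue_id": issue_id, "label_id": json_text(label, "id")}
            for label in record.get("labels") or []
        ],
        # issue_closed_history.sql: current closed state only, not a full history
        "issue_closed_history": [{
            "issue_id": issue_id,
            "updated_at": record.get("updated_at"),
            "closed": record.get("closed_at") is not None,
        }],
        # issue_merged.sql
        "issue_merged": [
            {"issue_id": issue_id,
             "merged_at": json_text(record.get("pull_request"), "merged_at")}
        ],
    }
```

For an unassigned issue with two labels, `convert_issue` returns two `issue_label` rows and a single `issue_assignee` row whose `user_id` is `None`. This matches the inner lateral join in `issue_label.sql` and the plain select in `issue_assignee.sql`.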

Create the required files to use the Airbyte and Fivetran dbt packages:

**`packages.yml`**

```yaml
packages:
  - git: "https://github.com/airbytehq/airbyte-dbt-models.git"
    subdirectory: "connectors/source_github"

  - package: fivetran/github
    version: [">=0.16.0", "<0.17.0"]
```

This is the default variable definition you must configure for the models to be created.
At the moment this package doesn't support repository teams, so keep `using_repo_team` set to `False`.
Variables named `github_<table>_identifier` represent the names of tables generated by the Airbyte connector. If you configured your Airbyte sync with a stream prefix, edit these values accordingly.

**`dbt_project.yml`**

```yaml
vars:
  # Required by Airbyte dbt model
  using_fivetran_model: True
  airbyte_database: "airbyte_db_default"
  airbyte_schema: "airbyte_dbt_github"

  # Required by Fivetran dbt model
  github_database: "airbyte_db_default"
  github_schema: "airbyte_dbt_github"

  using_repo_team: False

  github_issue_assignee_identifier: "issue_assignee"
  github_issue_closed_history_identifier: "issue_closed_history"
  github_issue_merged_identifier: "issue_merged"
  github_pull_request_review_identifier: "pull_request_review"
  github_repo_team_identifier: "repo_team"
  github_requested_reviewer_history_identifier: "requested_reviewer_history"
```

Then run `dbt deps` and `dbt run`; you should see the models being created.
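The converter models look up each table name with dbt's `var(name, default)`, for example `var('github_issue_merged_identifier', 'issue_merged')`. A value set under `vars:` therefore overrides the default written in the model's YAML. The following is a minimal Python sketch of that lookup, for illustration only. `DEFAULT_IDENTIFIERS` and `resolve_identifier` are hypothetical names, not part of dbt or this package, and `gh_` stands in for a hypothetical stream prefix.

```python
# Hypothetical sketch of identifier resolution, assuming dbt's var(name, default)
# semantics: a value set under `vars:` (or passed with --vars) wins; otherwise
# the default written in the model YAML is used.
DEFAULT_IDENTIFIERS = {
    "github_issue_assignee_identifier": "issue_assignee",
    "github_issue_closed_history_identifier": "issue_closed_history",
    "github_issue_merged_identifier": "issue_merged",
}


def resolve_identifier(project_vars: dict[str, str], name: str) -> str:
    """Return the table name that `{{ var(name, default) }}` would render."""
    return project_vars.get(name, DEFAULT_IDENTIFIERS[name])


if __name__ == "__main__":
    # No override: falls back to the model default.
    print(resolve_identifier({}, "github_issue_merged_identifier"))  # issue_merged
    # Override for a sync that used the hypothetical prefix "gh_".
    overrides = {"github_issue_merged_identifier": "gh_issue_merged"}
    print(resolve_identifier(overrides, "github_issue_merged_identifier"))  # gh_issue_merged
```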

---

## :package: Package Maintenance

- This package is maintained by the Airbyte Community.
- Contributions are welcome at any time: read the Contributing Guidelines or join the `#airbyte-dbt-packages` channel in the Airbyte Slack.
7 changes: 4 additions & 3 deletions connectors/source_github/dbt_project.yml
@@ -25,10 +25,11 @@ require-dbt-version:

models:
  source_github:
    materialized: table
    materialized: view
    staging:
      materialized: view

vars:
  database: snowflake
  schema: source_github
  database: airbyte_dbt_default
  schema: airbyte_schema_default
  using_fivetran_model: False
45 changes: 45 additions & 0 deletions connectors/source_github/integration_tests/dbt_project.yml
@@ -0,0 +1,45 @@
name: integration_test_github

config-version: 2

version: 0.1.0

profile: integration_tests

model-paths:
- models

macro-paths:
- macros

target-path: target

clean-targets:
- target
- dbt_modules
- logs

require-dbt-version:
- ">=1.0.0"
- <2.0.0

models:
  airbyte_dbt_source_github:
    materialized: view
    +schema: dbt_github
    staging:
      materialized: view
    tmp:
      materialized: view

vars:
  # Required by Airbyte dbt model
  using_fivetran_model: True
  airbyte_database: "airbyte_db_default"
  airbyte_schema: "airbyte_dbt_github"

  # Required by Fivetran dbt model
  github_database: "airbyte_db_default"
  github_schema: "airbyte_dbt_github"

  using_repo_team: False
1 change: 1 addition & 0 deletions connectors/source_github/integration_tests/vars
@@ -0,0 +1 @@
{airbyte_database: $AB_DB, github_database: $AB_DB}
15 changes: 15 additions & 0 deletions connectors/source_github/models/fivetran_converter/issue.sql
@@ -0,0 +1,15 @@
select
    cast(id as {{ dbt.type_string() }}) as id,
    body,
    closed_at,
    created_at,
    locked,
    {{ fivetran_utils.json_extract('milestone', 'id') }} as milestone_id,
    number,
    pull_request,
    repository as repository_id,
    state,
    title,
    updated_at,
    {{ fivetran_utils.json_extract('"user"', 'id') }} as user_id
from {{ source('source_github', 'issues') }}
35 changes: 35 additions & 0 deletions connectors/source_github/models/fivetran_converter/issue.yml
@@ -0,0 +1,35 @@
version: 2

models:
  - name: issue
    schema: "{{ var('airbyte_schema', target.schema) }}"
    database: "{{ var('airbyte_database', target.database) }}"
    identifier: "{{ var('github_issue_identifier', 'issue') }}"
    description: "A model that extracts and transforms GitHub issues data from the source table."
    columns:
      - name: id
        description: "Unique identifier for each GitHub issue."
      - name: body
        description: "The body text of the GitHub issue."
      - name: closed_at
        description: "The timestamp when the GitHub issue was closed."
      - name: created_at
        description: "The timestamp when the GitHub issue was created."
      - name: locked
        description: "Indicates if the GitHub issue is locked."
      - name: milestone_id
        description: "The unique identifier for the milestone associated with the GitHub issue."
      - name: number
        description: "The issue number within the GitHub repository."
      - name: pull_request
        description: "Indicates if the GitHub issue is a pull request."
      - name: repository_id
        description: "The unique identifier for the repository where the GitHub issue resides."
      - name: state
        description: "The state of the GitHub issue (e.g., open, closed)."
      - name: title
        description: "The title of the GitHub issue."
      - name: updated_at
        description: "The timestamp when the GitHub issue was last updated."
      - name: user_id
        description: "The unique identifier for the user who created the GitHub issue."
@@ -0,0 +1,8 @@
with tmp as (
    select
        cast(id as {{ dbt.type_string() }}) as issue_id,
        {{ fivetran_utils.json_extract('assignee', 'id') }} as user_id
    from {{ source('source_github', 'issues') }}
)
select * from tmp
@@ -0,0 +1,15 @@
version: 2

models:
  - name: issue_assignee
    schema: "{{ var('airbyte_schema', target.schema) }}"
    database: "{{ var('airbyte_database', target.database) }}"
    identifier: "{{ var('github_issue_assignee_identifier', 'issue_assignee') }}"
    description: All fields and field values associated with issue assignees.
    config:
      +enabled: "{{ var('using_fivetran_model', False) }}"
    columns:
      - name: issue_id
        description: The ID of the issue associated with the field.
      - name: user_id
        description: The ID of the user assigned to the issue.
@@ -0,0 +1,9 @@
with tmp as (
    select
        cast(id as {{ dbt.type_string() }}) as issue_id,
        updated_at,
        case when closed_at is not null then true else false end as closed
    from {{ source('source_github', 'issues') }}
)
select * from tmp
@@ -0,0 +1,17 @@
version: 2

models:
  - name: issue_closed_history
    schema: "{{ var('airbyte_schema', target.schema) }}"
    database: "{{ var('airbyte_database', target.database) }}"
    identifier: "{{ var('github_issue_closed_history_identifier', 'issue_closed_history') }}"
    description: All fields and field values associated with the closed history of issues.
    config:
      +enabled: "{{ var('using_fivetran_model', False) }}"
    columns:
      - name: issue_id
        description: The ID of the issue associated with the field.
      - name: updated_at
        description: The timestamp for when the issue was last modified.
      - name: closed
        description: The boolean for whether the issue is closed or not.
@@ -0,0 +1,7 @@
select
    cast(c.id as {{ dbt.type_string() }}) as id,
    cast(i.id as {{ dbt.type_string() }}) as issue_id,
    {{ fivetran_utils.json_extract('c."user"', 'id') }} as user_id,
    c.created_at
from {{ source('source_github', 'comments') }} as c
left join {{ source('source_github', 'issues') }} as i on c.issue_url = i.url
@@ -0,0 +1,19 @@
version: 2

models:
  - name: issue_comment
    schema: "{{ var('airbyte_schema', target.schema) }}"
    database: "{{ var('airbyte_database', target.database) }}"
    identifier: "{{ var('github_issue_comment_identifier', 'issue_comment') }}"
    description: All fields and field values associated with issue comments.
    config:
      +enabled: "{{ var('using_fivetran_model', False) }}"
    columns:
      - name: id
        description: The ID of the issue comment.
      - name: issue_id
        description: The ID of the issue.
      - name: user_id
        description: The ID of the user who made the comment.
      - name: created_at
        description: The date the comment was made.
@@ -0,0 +1,5 @@
select
    cast(i.id as {{ dbt.type_string() }}) as issue_id,
    f.value ->> 'id' as label_id
from {{ source('source_github', 'issues') }} as i
join lateral jsonb_array_elements(i.labels) as f(value) on true
15 changes: 15 additions & 0 deletions connectors/source_github/models/fivetran_converter/issue_label.yml
@@ -0,0 +1,15 @@
version: 2

models:
  - name: issue_label
    schema: "{{ var('airbyte_schema', target.schema) }}"
    database: "{{ var('airbyte_database', target.database) }}"
    identifier: "{{ var('github_issue_label_identifier', 'issue_label') }}"
    description: All fields and field values associated with issue labels.
    config:
      +enabled: "{{ var('using_fivetran_model', False) }}"
    columns:
      - name: issue_id
        description: The ID of the issue associated with the field.
      - name: label_id
        description: The ID of the label applied to the issue.
@@ -0,0 +1,5 @@
select
    cast(id as {{ dbt.type_string() }}) as issue_id,
    {{ fivetran_utils.json_extract('pull_request', 'merged_at') }} as merged_at
from {{ source('source_github', 'issues') }}
@@ -0,0 +1,15 @@
version: 2

models:
  - name: issue_merged
    schema: "{{ var('airbyte_schema', target.schema) }}"
    database: "{{ var('airbyte_database', target.database) }}"
    identifier: "{{ var('github_issue_merged_identifier', 'issue_merged') }}"
    description: All fields and field values associated with merged issues.
    config:
      +enabled: "{{ var('using_fivetran_model', False) }}"
    columns:
      - name: issue_id
        description: The ID of the issue associated with the field.
      - name: merged_at
        description: The timestamp for when the issue was merged.
8 changes: 8 additions & 0 deletions connectors/source_github/models/fivetran_converter/label.sql
@@ -0,0 +1,8 @@
select
    cast(id as {{ dbt.type_string() }}) as id,
    color,
    description,
    "default" as is_default,
    name,
    url
from {{ source('source_github', 'issue_labels') }}
23 changes: 23 additions & 0 deletions connectors/source_github/models/fivetran_converter/label.yml
@@ -0,0 +1,23 @@
version: 2

models:
  - name: label
    schema: "{{ var('airbyte_schema', target.schema) }}"
    database: "{{ var('airbyte_database', target.database) }}"
    identifier: "{{ var('github_label_identifier', 'label') }}"
    description: "All labels associated with GitHub issues."
    config:
      +enabled: "{{ var('using_fivetran_model', False) }}"
    columns:
      - name: id
        description: "The unique identifier of the label."
      - name: color
        description: "The color code associated with the label."
      - name: description
        description: "A textual description of the label."
      - name: is_default
        description: "Indicates if the label is the default label."
      - name: name
        description: "The name of the label."
      - name: url
        description: "The URL linking to the label in GitHub."
@@ -0,0 +1,7 @@
select
    cast(pr.id as {{ dbt.type_string() }}) as id,
    cast(i.id as {{ dbt.type_string() }}) as issue_id,
    {{ fivetran_utils.json_extract('pr.head', 'repo_id') }} as head_repo_id,
    pr.head -> 'user' ->> 'id' as head_user_id
from {{ source('source_github', 'pull_requests') }} as pr
left join {{ source('source_github', 'issues') }} as i on pr.issue_url = i.url