docs: enhance bq2bq plugin documentation for end user #24

README.md — 51 changes: 41 additions & 10 deletions

Optimus's transformation plugins are implementations of the Task and Hook interfaces that allow execution of arbitrary jobs in Optimus.

# Capabilities

- Transforms data with BigQuery SQL and stores the result in a target BQ table
- Executes the transformation in a configurable GCP project
- Supports multiple load methods, e.g. APPEND, REPLACE, MERGE
- Supports the BigQuery DML MERGE statement to handle spillover cases
- Supports transformations on partitioned tables, both partitioned by ingestion time (the default) and partitioned by column
- Supports dry runs

# Use Cases

Base configuration:
```yaml
# ./job.yaml
...
task:
  name: bq2bq
  config:
    LOAD_METHOD: REPLACE
    SQL_TYPE: STANDARD
    PROJECT: project
    DATASET: dataset
    TABLE: destination
    BQ_SERVICE_ACCOUNT: bq_secret_here
    ...
...
```

```sql
-- ./assets/query.sql
select field1, field2 from `project.dataset.source`
```

## Transforming data and storing it in the destination BQ table

Use the base configuration above to extract data from the `project.dataset.source` table. The query in `./assets/query.sql` selects the records to be loaded into the `project.dataset.destination` table (configurable through `PROJECT`, `DATASET`, and `TABLE`). The schema of the destination table should match the schema of the query result. `BQ_SERVICE_ACCOUNT` is the mandatory credential for accessing the BigQuery API to execute the query.
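
Read as a checklist, the destination-related keys from the base configuration map like this (the values are the placeholders used above):

```yaml
# Destination-related keys from the base configuration:
PROJECT: project                    # GCP project that owns the destination table
DATASET: dataset                    # BigQuery dataset inside that project
TABLE: destination                  # table name; together they form project.dataset.destination
BQ_SERVICE_ACCOUNT: bq_secret_here  # credential used to call the BigQuery API
```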

## Loading queried records into the destination BQ table by append / replace / merge

How the query result is loaded into the destination table depends on the `LOAD_METHOD` configuration. [More about load methods](https://github.com/goto/transformers/tree/main/task/bq2bq)
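
For example, switching the job from overwrite to append-only or merge loading should only require changing `LOAD_METHOD` — a sketch, with all other keys kept as in the base configuration above:

```yaml
# ./job.yaml (excerpt) — same base configuration, different LOAD_METHOD
task:
  name: bq2bq
  config:
    LOAD_METHOD: APPEND     # append query results to the destination table
    # LOAD_METHOD: MERGE    # use a BigQuery DML MERGE statement, e.g. for spillover handling
    # LOAD_METHOD: REPLACE  # overwrite the destination table (the base configuration's choice)
```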

## Extracting data through a configurable BQ EXECUTION_PROJECT

`EXECUTION_PROJECT` is an additional configuration that lets the job execute its query through a non-default project. It's useful for customizing the allocation of BQ slots — for example, when a job requires a lot of resources, it's better to delegate its execution to a dedicated project.
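
A minimal sketch of the base configuration extended with `EXECUTION_PROJECT` (the dedicated project name here is a placeholder):

```yaml
# ./job.yaml (excerpt)
task:
  name: bq2bq
  config:
    PROJECT: project                       # project that owns the destination table
    EXECUTION_PROJECT: dedicated-project   # placeholder: project whose BQ slots run the query
    # ... remaining keys as in the base configuration
```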