Reformating and enhancing the RosettaDB documentation, initial version
Femi3211 committed Nov 8, 2024
1 parent c3dee4e commit 83eeb5d
Showing 15 changed files with 772 additions and 728 deletions.
763 changes: 35 additions & 728 deletions README.md

Large diffs are not rendered by default.

85 changes: 85 additions & 0 deletions docs/markdowns/apply.md
@@ -0,0 +1,85 @@
#### apply
Reads the current model, compares it with the state of the database, generates DDL for the differences, and applies that DDL to the database. If `git_auto_commit` is set to `true` in `main.conf`, the new model is automatically pushed to the Git repository of the rosetta project.

rosetta [-c, --config CONFIG_FILE] apply [-h, --help] [-s, --source CONNECTION_NAME] [-m, --model MODEL_FILE]

Parameter | Description
--- | ---
-h, --help | Show the help message and exit.
-c, --config CONFIG_FILE | YAML config file. If none is supplied it will use main.conf in the current directory if it exists.
-s, --source CONNECTION_NAME | The source connection is used to specify which models and connection to use.
-m, --model MODEL_FILE (Optional) | The model file to use for apply. Default is `model.yaml`


Example:

(Actual database)
```yaml
---
safeMode: false
databaseType: "mysql"
operationLevel: database
tables:
  - name: "actor"
    type: "TABLE"
    columns:
      - name: "actor_id"
        typeName: "SMALLINT UNSIGNED"
        ordinalPosition: 0
        primaryKeySequenceId: 1
        columnDisplaySize: 5
        scale: 0
        precision: 5
        nullable: false
        primaryKey: true
        autoincrement: false
        tests:
          assertion:
            - operator: '='
              value: 16
              expected: 1
```
(Expected database)
```yaml
---
safeMode: false
databaseType: "mysql"
operationLevel: database
tables:
  - name: "actor"
    type: "TABLE"
    columns:
      - name: "actor_id"
        typeName: "SMALLINT UNSIGNED"
        ordinalPosition: 0
        primaryKeySequenceId: 1
        columnDisplaySize: 5
        scale: 0
        precision: 5
        nullable: false
        primaryKey: true
        autoincrement: false
        tests:
          assertion:
            - operator: '='
              value: 16
              expected: 1
      - name: "first_name"
        typeName: "VARCHAR"
        ordinalPosition: 0
        primaryKeySequenceId: 0
        columnDisplaySize: 45
        scale: 0
        precision: 45
        nullable: false
        primaryKey: false
        autoincrement: false
        tests:
          assertion:
            - operator: '!='
              value: 'Michael'
              expected: 1
```
Description: The actual database does not contain `first_name`, so `apply` alters the table and adds the column. The source directory will then contain the executed DDL and a snapshot of the current database.
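
The comparison described above can be illustrated with a small Python sketch. It is not RosettaDB's implementation: the function name `missing_column_ddl`, the dict layout, and the exact `ALTER TABLE` text are assumptions chosen only to mirror the two models in this example.

```python
# Illustrative only: a simplified stand-in for the model comparison,
# not RosettaDB's actual code. All names here are hypothetical.

def missing_column_ddl(table, actual_columns, expected_columns):
    """Return one ALTER TABLE statement per column missing from the database."""
    actual_names = {col["name"] for col in actual_columns}
    statements = []
    for col in expected_columns:
        if col["name"] not in actual_names:
            null_clause = "NULL" if col["nullable"] else "NOT NULL"
            statements.append(
                f"ALTER TABLE {table} ADD COLUMN {col['name']} "
                f"{col['typeName']}({col['precision']}) {null_clause};"
            )
    return statements


# Trimmed versions of the "actual" and "expected" models shown above.
actual = [
    {"name": "actor_id", "typeName": "SMALLINT UNSIGNED", "precision": 5, "nullable": False},
]
expected = actual + [
    {"name": "first_name", "typeName": "VARCHAR", "precision": 45, "nullable": False},
]

for statement in missing_column_ddl("actor", actual, expected):
    print(statement)
```

The sketch only handles added columns; changed or removed columns, which the real command also has to consider, are left out to keep it short.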
18 changes: 18 additions & 0 deletions docs/markdowns/compile.md
@@ -0,0 +1,18 @@
#### compile
This command generates DDL for a target database based on the source DBML produced by the previous command (`extract`).

rosetta [-c, --config CONFIG_FILE] compile [-h, --help] [-t, --target CONNECTION_NAME] [-s, --source CONNECTION_NAME] [-d, --with-drop]

Parameter | Description
--- | ---
-h, --help | Show the help message and exit.
-c, --config CONFIG_FILE | YAML config file. If none is supplied it will use main.conf in the current directory if it exists.
-s, --source CONNECTION_NAME (Optional) | The source connection name where models are generated.
-t, --target CONNECTION_NAME | The target connection name to which the source DBML is converted.
-d, --with-drop | Add statements to drop tables when generating the DDL.

Example:
```sql
CREATE SCHEMA breathe;
CREATE TABLE breathe.profiles(id INTEGER not null AUTO_INCREMENT, name STRING not null);
```
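
To make the relationship between a model and the DDL above concrete, here is a minimal Python sketch that renders the two statements from a trimmed copy of the `profiles` model. The helper `render_create_table` and the `TYPE_MAP` entry are hypothetical and exist only for this illustration; RosettaDB's own translation rules are not shown in this document.

```python
# Illustrative only: hypothetical helper names and a made-up type mapping,
# not RosettaDB's translation rules.

TYPE_MAP = {"INT64": "INTEGER"}  # assumed mapping for this example only


def render_create_table(table):
    """Render CREATE SCHEMA and CREATE TABLE statements for one table model."""
    qualified_name = f"{table['schema']}.{table['name']}"
    columns = []
    for col in table["columns"]:
        parts = [col["name"], TYPE_MAP.get(col["typeName"], col["typeName"])]
        if not col["nullable"]:
            parts.append("not null")
        if col["autoincrement"]:
            parts.append("AUTO_INCREMENT")
        columns.append(" ".join(parts))
    return [
        f"CREATE SCHEMA {table['schema']};",
        f"CREATE TABLE {qualified_name}({', '.join(columns)});",
    ]


profiles = {
    "name": "profiles",
    "schema": "breathe",
    "columns": [
        {"name": "id", "typeName": "INT64", "nullable": False, "autoincrement": True},
        {"name": "name", "typeName": "STRING", "nullable": False, "autoincrement": False},
    ],
}

for statement in render_create_table(profiles):
    print(statement)
```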
10 changes: 10 additions & 0 deletions docs/markdowns/dbt.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
#### dbt
This command generates dbt models from the source DBML produced by the previous command (`extract`).

rosetta [-c, --config CONFIG_FILE] dbt [-h, --help] [-s, --source CONNECTION_NAME]

Parameter | Description
--- | ---
-h, --help | Show the help message and exit.
-c, --config CONFIG_FILE | YAML config file. If none is supplied it will use main.conf in the current directory if it exists.
-s, --source CONNECTION_NAME | The source connection name where models are generated.
23 changes: 23 additions & 0 deletions docs/markdowns/diff.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
#### diff
Shows the differences between the local model and the database, reporting any tables that were added or removed and any columns that changed.

rosetta [-c, --config CONFIG_FILE] diff [-h, --help] [-s, --source CONNECTION_NAME] [-m, --model MODEL_FILE]

Parameter | Description
--- | ---
-h, --help | Show the help message and exit.
-c, --config CONFIG_FILE | YAML config file. If none is supplied it will use main.conf in the current directory if it exists.
-s, --source CONNECTION_NAME | The source connection is used to specify which models and connection to use.
-m, --model MODEL_FILE (Optional) | The model file to use for diff. Default is `model.yaml`.


Example:
```
There are changes between local model and targeted source
Table Changed: Table 'actor' columns changed
Column Changed: Column 'actor_id' in table 'actor' changed 'Precision'. New value: '1', old value: '5'
Column Changed: Column 'actor_id' in table 'actor' changed 'Autoincrement'. New value: 'true', old value: 'false'
Column Changed: Column 'actor_id' in table 'actor' changed 'Primary key'. New value: 'false', old value: 'true'
Column Changed: Column 'actor_id' in table 'actor' changed 'Nullable'. New value: 'true', old value: 'false'
Table Added: Table 'address'
```
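
The column messages in this output can be reproduced with a short Python sketch. It is a hedged illustration, not RosettaDB's diff logic: `column_changes`, `LABELS`, and the dict layout are hypothetical, and the sketch makes no claim about which side (local model or database) rosetta treats as "new".

```python
# Illustrative only: reproduces the message format from the example above,
# not RosettaDB's actual diff logic. Names and dict layout are hypothetical.

LABELS = {
    "precision": "Precision",
    "autoincrement": "Autoincrement",
    "primaryKey": "Primary key",
    "nullable": "Nullable",
}


def fmt(value):
    # Render booleans the way the example output does ('true' / 'false').
    return str(value).lower() if isinstance(value, bool) else str(value)


def column_changes(table, old, new):
    """Compare two column dicts and return one message per changed attribute."""
    messages = []
    for key, label in LABELS.items():
        if old[key] != new[key]:
            messages.append(
                f"Column Changed: Column '{new['name']}' in table '{table}' "
                f"changed '{label}'. New value: '{fmt(new[key])}', "
                f"old value: '{fmt(old[key])}'"
            )
    return messages


old = {"name": "actor_id", "precision": 5, "autoincrement": False,
       "primaryKey": True, "nullable": False}
new = {"name": "actor_id", "precision": 1, "autoincrement": True,
       "primaryKey": False, "nullable": True}

for line in column_changes("actor", old, new):
    print(line)
```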
75 changes: 75 additions & 0 deletions docs/markdowns/download_drivers.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
## Downloading Drivers
You need JDBC drivers to connect to the sources and targets that you use with rosetta.
The JDBC drivers for the databases that rosetta supports can be downloaded from the following URLs:

- [BigQuery JDBC 4.2](https://storage.googleapis.com/simba-bq-release/jdbc/SimbaJDBCDriverforGoogleBigQuery42_1.3.0.1001.zip)
- [Snowflake JDBC 3.13.19](https://repo1.maven.org/maven2/net/snowflake/snowflake-jdbc/3.13.19/snowflake-jdbc-3.13.19.jar)
- [PostgreSQL JDBC 42.3.7](https://jdbc.postgresql.org/download/postgresql-42.3.7.jar)
- [MySQL JDBC 8.0.30](https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.30.zip)
- [Kinetica JDBC 7.1.7.7](https://github.com/kineticadb/kinetica-client-jdbc/archive/refs/tags/v7.1.7.7.zip)
- [Google Cloud Spanner JDBC 2.6.2](https://search.maven.org/remotecontent?filepath=com/google/cloud/google-cloud-spanner-jdbc/2.6.2/google-cloud-spanner-jdbc-2.6.2-single-jar-with-dependencies.jar)
- [SQL Server JDBC 12.2.0](https://go.microsoft.com/fwlink/?linkid=2223050)
- [DB2 JDBC jcc4](https://repo1.maven.org/maven2/com/ibm/db2/jcc/db2jcc/db2jcc4/db2jcc-db2jcc4.jar)
- [Oracle JDBC 23.2.0.0](https://download.oracle.com/otn-pub/otn_software/jdbc/232-DeveloperRel/ojdbc11.jar)

### Example connection string configurations for databases

### BigQuery (service-based authentication OAuth 0)
```
url: jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<PROJECT-ID>;AdditionalProjects=bigquery-public-data;OAuthType=0;OAuthServiceAcctEmail=<EMAIL>;OAuthPvtKeyPath=<SERVICE-ACCOUNT-PATH>
```

### BigQuery (pre-generated token authentication OAuth 2)
```
url: jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=2;ProjectId=<PROJECT-ID>;OAuthAccessToken=<ACCESS-TOKEN>;OAuthRefreshToken=<REFRESH-TOKEN>;OAuthClientId=<CLIENT-ID>;OAuthClientSecret=<CLIENT-SECRET>;
```

### BigQuery (application default credentials authentication OAuth 3)
```
url: jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=3;ProjectId=<PROJECT-ID>;
```

### Snowflake
```
url: jdbc:snowflake://<HOST>:443/?db=<DATABASE>&user=<USER>&password=<PASSWORD>
```

### PostgreSQL
```
url: jdbc:postgresql://<HOST>:5432/<DATABASE>?user=<USER>&password=<PASSWORD>
```

### MySQL
```
url: jdbc:mysql://<USER>:<PASSWORD>@<HOST>:3306/<DATABASE>
```

### Kinetica
```
url: jdbc:kinetica:URL=http://<HOST>:9191;CombinePrepareAndExecute=1
```

### Google Cloud Spanner
```
url: jdbc:cloudspanner:/projects/my-project/instances/my-instance/databases/my-db;credentials=/path/to/credentials.json
```

### Google Cloud Spanner (Emulator)
```
url: jdbc:cloudspanner://localhost:9010/projects/test/instances/test/databases/test?autoConfigEmulator=true
```

### SQL Server
```
url: jdbc:sqlserver://<HOST>:1433;databaseName=<DATABASE>
```

### DB2
```
url: jdbc:db2://<HOST>:50000/<DATABASE>
```

### Oracle
```
url: jdbc:oracle:thin:@<HOST>:1521:<SID>
```
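
As a convenience, the `<PLACEHOLDER>` slots in the templates above can be filled programmatically. The following Python sketch is only an illustration: `fill_template` and `TEMPLATES` are hypothetical names, only two of the templates are copied in, and values containing special characters (for example `&` in a password) would need escaping that the sketch does not attempt.

```python
# Illustrative only: fills the <PLACEHOLDER> slots of two templates copied
# from the list above. The helper is hypothetical and not part of rosetta.

TEMPLATES = {
    "postgresql": "jdbc:postgresql://<HOST>:5432/<DATABASE>?user=<USER>&password=<PASSWORD>",
    "sqlserver": "jdbc:sqlserver://<HOST>:1433;databaseName=<DATABASE>",
}


def fill_template(db_type, **values):
    """Replace each <KEY> placeholder with the matching keyword argument."""
    url = TEMPLATES[db_type]
    for key, value in values.items():
        url = url.replace(f"<{key.upper()}>", value)
    if "<" in url:
        # A placeholder was left unfilled; fail loudly instead of returning it.
        raise ValueError(f"unfilled placeholder in: {url}")
    return url


print(fill_template("postgresql", host="localhost", database="sakila",
                    user="bob", password="secret"))
print(fill_template("sqlserver", host="localhost", database="sakila"))
```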
22 changes: 22 additions & 0 deletions docs/markdowns/drivers.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
#### drivers
This command lists the drivers defined in a `drivers.yaml` file. Choosing a driver by its index downloads it to the `ROSETTA_DRIVERS` directory, where it is ready to use automatically.

rosetta drivers [-h, --help] [-f, --file] [--list] <indexToDownload> [-dl, --download]

Parameter | Description
--- | ---
-h, --help | Show the help message and exit.
-f, --file DRIVERS_FILE | YAML drivers file path. If none is supplied it will use drivers.yaml in the current directory and then fall back to our default one.
--list | Used to list all available drivers.
-dl, --download | Used to download selected driver by index.
indexToDownload | Chooses which driver to download depending on the index of the driver.


***Example*** (drivers.yaml)

```yaml
- name: MySQL 8.0.30
  link: https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.30.zip
- name: Postgresql 42.3.7
  link: https://jdbc.postgresql.org/download/postgresql-42.3.7.jar
```
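
The list-then-pick-by-index flow can be sketched in Python as follows. This is an assumption-laden illustration rather than rosetta's code: the helper names are hypothetical, the entries are written as plain dicts instead of being parsed from YAML, and the 1-based index is an assumption, since the documentation above does not say where indexing starts.

```python
# Illustrative only: hypothetical helpers, not rosetta's implementation.
# The 1-based index is an assumption; the real tool may number differently.

drivers = [
    {"name": "MySQL 8.0.30",
     "link": "https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.30.zip"},
    {"name": "Postgresql 42.3.7",
     "link": "https://jdbc.postgresql.org/download/postgresql-42.3.7.jar"},
]


def list_drivers(entries):
    """Return one display line per driver, numbered from 1."""
    return [f"{index}: {entry['name']}" for index, entry in enumerate(entries, start=1)]


def pick_driver(entries, index):
    """Return the driver entry for a 1-based index, or raise IndexError."""
    if not 1 <= index <= len(entries):
        raise IndexError(f"driver index {index} is out of range 1..{len(entries)}")
    return entries[index - 1]


for line in list_drivers(drivers):
    print(line)

chosen = pick_driver(drivers, 2)
# The file name a download would produce is the last path segment of the link.
print(chosen["link"].rsplit("/", 1)[-1])
```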
46 changes: 46 additions & 0 deletions docs/markdowns/extract.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
#### extract
This is the command that extracts the schema from a database and generates declarative DBML models that can be used for conversion to alternate database targets.

rosetta [-c, --config CONFIG_FILE] extract [-h, --help] [-s, --source CONNECTION_NAME] [-t, --convert-to CONNECTION_NAME]

Parameter | Description
--- | ---
-h, --help | Show the help message and exit.
-c, --config CONFIG_FILE | YAML config file. If none is supplied it will use main.conf in the current directory if it exists.
-s, --source CONNECTION_NAME | The source connection name to extract schema from.
-t, --convert-to CONNECTION_NAME (Optional) | The target connection name to which the source DBML is converted.

Example:
```yaml
---
safeMode: false
databaseType: bigquery
operationLevel: database
tables:
  - name: "profiles"
    type: "TABLE"
    schema: "breathe"
    columns:
      - name: "id"
        typeName: "INT64"
        jdbcDataType: "4"
        ordinalPosition: 0
        primaryKeySequenceId: 1
        columnDisplaySize: 10
        scale: 0
        precision: 10
        primaryKey: false
        nullable: false
        autoincrement: true
      - name: "name"
        typeName: "STRING"
        jdbcDataType: "12"
        ordinalPosition: 0
        primaryKeySequenceId: 0
        columnDisplaySize: 255
        scale: 0
        precision: 255
        primaryKey: false
        nullable: false
        autoincrement: false
```
13 changes: 13 additions & 0 deletions docs/markdowns/generate.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
#### generate
This command generates a Spark Python file or a Spark Scala file. It first extracts the schema from the source database and reads the connection properties from the source connection. It then creates a Python or Scala file that translates the schema and is ready to transfer data from the source to the target.

rosetta [-c, --config CONFIG_FILE] generate [-h, --help] [-s, --source CONNECTION_NAME] [-t, --target CONNECTION_NAME] [--pyspark] [--scala]

Parameter | Description
--- | ---
-h, --help | Show the help message and exit.
-c, --config CONFIG_FILE | YAML config file. If none is supplied it will use main.conf in the current directory if it exists.
-s, --source CONNECTION_NAME | The source connection name to extract schema from.
-t, --target CONNECTION_NAME | The target connection name where the data will be transferred.
--pyspark | Generates the Spark SQL file.
--scala | Generates the Scala SQL file.
30 changes: 30 additions & 0 deletions docs/markdowns/init.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
#### init
This command will generate a project (directory) if specified, a default configuration file located in the current directory with example connections for `bigquery` and `snowflake`, and the model directory.

rosetta init [PROJECT_NAME]

Parameter | Description
--- | ---
(Optional) PROJECT_NAME | Project name (directory) where the configuration file and model directory will be created.

Example:
```yaml
#example with 2 connections
connections:
  - name: snowflake_weather_prod
    databaseName: SNOWFLAKE_SAMPLE_DATA
    schemaName: WEATHER
    dbType: snowflake
    url: jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?<connection_params>
    userName: bob
    password: bobPassword
  - name: bigquery_prod
    databaseName: bigquery-public-data
    schemaName: breathe
    dbType: bigquery
    url: jdbc:bigquery://[Host]:[Port];ProjectId=[Project];OAuthType= [AuthValue];[Property1]=[Value1];[Property2]=[Value2];...
    userName: user
    password: password
    tables:
      - bigquery_table
```
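
The `name` of each connection in this file is presumably what the `-s, --source` and `-t, --target` options of the other commands refer to. The Python sketch below shows that lookup on a trimmed, plain-dict copy of the configuration; `find_connection` is a hypothetical helper, and how rosetta itself parses and resolves the file is an assumption here.

```python
# Illustrative only: a plain-Python stand-in for the parsed configuration.
# find_connection is a hypothetical helper, not part of rosetta.

config = {
    "connections": [
        {"name": "snowflake_weather_prod", "dbType": "snowflake",
         "schemaName": "WEATHER"},
        {"name": "bigquery_prod", "dbType": "bigquery",
         "schemaName": "breathe", "tables": ["bigquery_table"]},
    ]
}


def find_connection(cfg, name):
    """Return the connection entry whose 'name' matches, or raise KeyError."""
    for connection in cfg["connections"]:
        if connection["name"] == name:
            return connection
    known = ", ".join(c["name"] for c in cfg["connections"])
    raise KeyError(f"no connection named {name!r}; known connections: {known}")


source = find_connection(config, "bigquery_prod")
print(source["dbType"], source.get("tables", []))
```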