Reformatting and enhancing the RosettaDB documentation, initial version

AdaptiveScale · Nov 8, 2024 · commit 83eeb5d
1 parent c3dee4e
Showing 15 changed files with 772 additions and 728 deletions.
diff --git a/README.md b/README.md
diff --git a/docs/markdowns/apply.md b/docs/markdowns/apply.md
@@ -0,0 +1,85 @@
+#### apply
+Compares the current model with the state of the database, generates DDL for the differences, and applies it to the database. If `git_auto_commit` is set to `true` in `main.conf`, the new model is automatically pushed to the Git repository of the rosetta project.
+
+    rosetta [-c, --config CONFIG_FILE] apply [-h, --help] [-s, --source CONNECTION_NAME] [-m, --model MODEL_FILE]
+
+Parameter | Description
+--- | ---
+-h, --help | Show the help message and exit.
+-c, --config CONFIG_FILE | YAML config file. If none is supplied, `main.conf` in the current directory is used if it exists.
+-s, --source CONNECTION_NAME | The source connection, which determines the models and connection to use.
+-m, --model MODEL_FILE (Optional) | The model file to use for apply. Default is `model.yaml`.
+
+
+Example:
+
+(Actual database)
+```yaml
+---
+safeMode: false
+databaseType: "mysql"
+operationLevel: database
+tables:
+  - name: "actor"
+    type: "TABLE"
+    columns:
+      - name: "actor_id"
+        typeName: "SMALLINT UNSIGNED"
+        ordinalPosition: 0
+        primaryKeySequenceId: 1
+        columnDisplaySize: 5
+        scale: 0
+        precision: 5
+        nullable: false
+        primaryKey: true
+        autoincrement: false
+        tests:
+          assertion:
+            - operator: '='
+              value: 16
+              expected: 1
+```
+
+(Expected database)
+```yaml
+---
+safeMode: false
+databaseType: "mysql"
+operationLevel: database
+tables:
+  - name: "actor"
+    type: "TABLE"
+    columns:
+      - name: "actor_id"
+        typeName: "SMALLINT UNSIGNED"
+        ordinalPosition: 0
+        primaryKeySequenceId: 1
+        columnDisplaySize: 5
+        scale: 0
+        precision: 5
+        nullable: false
+        primaryKey: true
+        autoincrement: false
+        tests:
+          assertion:
+            - operator: '='
+              value: 16
+              expected: 1
+      - name: "first_name"
+        typeName: "VARCHAR"
+        ordinalPosition: 0
+        primaryKeySequenceId: 0
+        columnDisplaySize: 45
+        scale: 0
+        precision: 45
+        nullable: false
+        primaryKey: false
+        autoincrement: false
+        tests:
+          assertion:
+            - operator: '!='
+              value: 'Michael'
+              expected: 1
+```
+
+Description: The actual database does not contain `first_name`, so `apply` alters the table and adds the column. The executed DDL and a snapshot of the current database are written to the source directory.
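+
+A minimal sketch of a `main.conf` fragment with auto-commit enabled. The top-level placement of `git_auto_commit`, the connection name `mysql_prod`, and the `dbType` value are assumptions for illustration, not confirmed by this documentation.
+
+```yaml
+# main.conf (fragment)
+# Assumption: git_auto_commit is a top-level key next to connections.
+git_auto_commit: true
+connections:
+  # Hypothetical connection name; remaining keys as in the example generated by init.
+  - name: mysql_prod
+    dbType: mysql
+```
+
+With such a file in the current directory, `rosetta apply -s mysql_prod` would apply the model for that connection.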
diff --git a/docs/markdowns/compile.md b/docs/markdowns/compile.md
@@ -0,0 +1,18 @@
+#### compile
+Generates DDL for a target database from the source DBML produced by the `extract` command.
+
+    rosetta [-c, --config CONFIG_FILE] compile [-h, --help] [-t, --target CONNECTION_NAME] [-s, --source CONNECTION_NAME] [-d, --with-drop]
+
+Parameter | Description
+--- | ---
+-h, --help | Show the help message and exit.
+-c, --config CONFIG_FILE | YAML config file. If none is supplied, `main.conf` in the current directory is used if it exists.
+-s, --source CONNECTION_NAME (Optional) | The source connection name where models are generated.
+-t, --target CONNECTION_NAME | The target connection name to which the source DBML is converted.
+-d, --with-drop | Add statements to drop tables when generating DDL.
+
+Example:
+```sql
+CREATE SCHEMA breathe;
+CREATE TABLE breathe.profiles(id INTEGER not null AUTO_INCREMENT, name STRING not null);
+```
diff --git a/docs/markdowns/dbt.md b/docs/markdowns/dbt.md
@@ -0,0 +1,10 @@
+#### dbt
+Generates dbt models for a source DBML produced by the `extract` command.
+
+    rosetta [-c, --config CONFIG_FILE] dbt [-h, --help] [-s, --source CONNECTION_NAME]
+
+Parameter | Description
+--- | ---
+-h, --help | Show the help message and exit.
+-c, --config CONFIG_FILE | YAML config file. If none is supplied, `main.conf` in the current directory is used if it exists.
+-s, --source CONNECTION_NAME | The source connection name where models are generated.
diff --git a/docs/markdowns/diff.md b/docs/markdowns/diff.md
@@ -0,0 +1,23 @@
+#### diff
+Shows the differences between the local model and the database: whether any tables were added or removed, and whether any columns changed.
+
+    rosetta [-c, --config CONFIG_FILE] diff [-h, --help] [-s, --source CONNECTION_NAME] [-m, --model MODEL_FILE]
+
+Parameter | Description
+--- | ---
+-h, --help | Show the help message and exit.
+-c, --config CONFIG_FILE | YAML config file. If none is supplied, `main.conf` in the current directory is used if it exists.
+-s, --source CONNECTION_NAME | The source connection, which determines the models and connection to use.
+-m, --model MODEL_FILE (Optional) | The model file to use for diff. Default is `model.yaml`.
+
+
+Example:
+```
+There are changes between local model and targeted source
+Table Changed: Table 'actor' columns changed
+Column Changed: Column 'actor_id' in table 'actor' changed 'Precision'. New value: '1', old value: '5'
+Column Changed: Column 'actor_id' in table 'actor' changed 'Autoincrement'. New value: 'true', old value: 'false'
+Column Changed: Column 'actor_id' in table 'actor' changed 'Primary key'. New value: 'false', old value: 'true'
+Column Changed: Column 'actor_id' in table 'actor' changed 'Nullable'. New value: 'true', old value: 'false'
+Table Added: Table 'address'
+```
diff --git a/docs/markdowns/download_drivers.md b/docs/markdowns/download_drivers.md
@@ -0,0 +1,75 @@
+## Downloading Drivers
+You need JDBC drivers to connect to the sources and targets that you use with rosetta.
+JDBC drivers for the databases that rosetta supports can be downloaded from the following URLs:
+
+- [BigQuery JDBC 4.2](https://storage.googleapis.com/simba-bq-release/jdbc/SimbaJDBCDriverforGoogleBigQuery42_1.3.0.1001.zip)
+- [Snowflake JDBC 3.13.19](https://repo1.maven.org/maven2/net/snowflake/snowflake-jdbc/3.13.19/snowflake-jdbc-3.13.19.jar)
+- [PostgreSQL JDBC 42.3.7](https://jdbc.postgresql.org/download/postgresql-42.3.7.jar)
+- [MySQL JDBC 8.0.30](https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.30.zip)
+- [Kinetica JDBC 7.1.7.7](https://github.com/kineticadb/kinetica-client-jdbc/archive/refs/tags/v7.1.7.7.zip)
+- [Google Cloud Spanner JDBC 2.6.2](https://search.maven.org/remotecontent?filepath=com/google/cloud/google-cloud-spanner-jdbc/2.6.2/google-cloud-spanner-jdbc-2.6.2-single-jar-with-dependencies.jar)
+- [SQL Server JDBC 12.2.0](https://go.microsoft.com/fwlink/?linkid=2223050)
+- [DB2 JDBC jcc4](https://repo1.maven.org/maven2/com/ibm/db2/jcc/db2jcc/db2jcc4/db2jcc-db2jcc4.jar)
+- [Oracle JDBC 23.2.0.0](https://download.oracle.com/otn-pub/otn_software/jdbc/232-DeveloperRel/ojdbc11.jar)
+
+### Example connection string configurations for databases
+
+### BigQuery (service-based authentication OAuth 0)
+```
+url: jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<PROJECT-ID>;AdditionalProjects=bigquery-public-data;OAuthType=0;OAuthServiceAcctEmail=<EMAIL>;OAuthPvtKeyPath=<SERVICE-ACCOUNT-PATH>
+```
+
+### BigQuery (pre-generated token authentication OAuth 2)
+```
+url: jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=2;ProjectId=<PROJECT-ID>;OAuthAccessToken=<ACCESS-TOKEN>;OAuthRefreshToken=<REFRESH-TOKEN>;OAuthClientId=<CLIENT-ID>;OAuthClientSecret=<CLIENT-SECRET>;
+```
+
+### BigQuery (application default credentials authentication OAuth 3)
+```
+url: jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=3;ProjectId=<PROJECT-ID>;
+```
+
+### Snowflake
+```
+url: jdbc:snowflake://<HOST>:443/?db=<DATABASE>&user=<USER>&password=<PASSWORD>
+```
+
+### PostgreSQL
+```
+url: jdbc:postgresql://<HOST>:5432/<DATABASE>?user=<USER>&password=<PASSWORD>
+```
+
+### MySQL
+```
+url: jdbc:mysql://<USER>:<PASSWORD>@<HOST>:3306/<DATABASE>
+```
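+
+A minimal sketch of how a connection string like the one above might be placed in a `main.conf` connection entry. The key names follow the example generated by `init`; the connection name `mysql_prod` is hypothetical, and `dbType: mysql` is an assumption based on the `databaseType` value shown in the model examples.
+
+```yaml
+connections:
+  # Hypothetical connection name; replace the <...> placeholders.
+  - name: mysql_prod
+    databaseName: <DATABASE>
+    # Assumption: "mysql" is the accepted dbType value.
+    dbType: mysql
+    url: jdbc:mysql://<USER>:<PASSWORD>@<HOST>:3306/<DATABASE>
+```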
+
+### Kinetica
+```
+url: jdbc:kinetica:URL=http://<HOST>:9191;CombinePrepareAndExecute=1
+```
+
+### Google Cloud Spanner
+```
+url: jdbc:cloudspanner:/projects/my-project/instances/my-instance/databases/my-db;credentials=/path/to/credentials.json
+```
+
+### Google Cloud Spanner (Emulator)
+```
+url: jdbc:cloudspanner://localhost:9010/projects/test/instances/test/databases/test?autoConfigEmulator=true
+```
+
+### SQL Server
+```
+url: jdbc:sqlserver://<HOST>:1433;databaseName=<DATABASE>
+```
+
+### DB2
+```
+url: jdbc:db2://<HOST>:50000/<DATABASE>
+```
+
+### Oracle
+```
+url: jdbc:oracle:thin:@<HOST>:1521:<SID>
+```
diff --git a/docs/markdowns/drivers.md b/docs/markdowns/drivers.md
@@ -0,0 +1,22 @@
+#### drivers
+Lists the drivers defined in a `drivers.yaml` file. Choosing a driver by index downloads it to the `ROSETTA_DRIVERS` directory, where it is automatically ready to use.
+
+    rosetta drivers [-h, --help] [-f, --file DRIVERS_FILE] [--list] <indexToDownload> [-dl, --download]
+
+Parameter | Description
+--- | ---
+-h, --help | Show the help message and exit.
+-f, --file DRIVERS_FILE | YAML drivers file path. If none is supplied, `drivers.yaml` in the current directory is used, falling back to the default one.
+--list | List all available drivers.
+-dl, --download | Download the selected driver by index.
+indexToDownload | The index of the driver to download.
+
+
+***Example*** (drivers.yaml)
+
+```yaml
+- name: MySQL 8.0.30
+  link: https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.30.zip
+- name: Postgresql 42.3.7
+  link: https://jdbc.postgresql.org/download/postgresql-42.3.7.jar
+```
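+
+A sketch of a `drivers.yaml` extended with two more entries. It reuses the `name` and `link` keys from the example above and the download URLs from the driver list in this documentation. The entry names are illustrative.
+
+```yaml
+- name: MySQL 8.0.30
+  link: https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.30.zip
+- name: Postgresql 42.3.7
+  link: https://jdbc.postgresql.org/download/postgresql-42.3.7.jar
+# Added entries: names are illustrative, links are the documented download URLs.
+- name: Snowflake 3.13.19
+  link: https://repo1.maven.org/maven2/net/snowflake/snowflake-jdbc/3.13.19/snowflake-jdbc-3.13.19.jar
+- name: Oracle 23.2.0.0
+  link: https://download.oracle.com/otn-pub/otn_software/jdbc/232-DeveloperRel/ojdbc11.jar
+```
+
+Passing this file with `-f` together with `--list` should then show the added drivers alongside the original two.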
diff --git a/docs/markdowns/extract.md b/docs/markdowns/extract.md
@@ -0,0 +1,46 @@
+#### extract
+Extracts the schema from a database and generates declarative DBML models that can be converted to other database targets.
+
+    rosetta [-c, --config CONFIG_FILE] extract [-h, --help] [-s, --source CONNECTION_NAME] [-t, --convert-to CONNECTION_NAME]
+
+Parameter | Description
+--- | ---
+-h, --help | Show the help message and exit.
+-c, --config CONFIG_FILE | YAML config file. If none is supplied, `main.conf` in the current directory is used if it exists.
+-s, --source CONNECTION_NAME | The source connection name to extract schema from.
+-t, --convert-to CONNECTION_NAME (Optional) | The target connection name to which the source DBML is converted.
+
+Example:
+```yaml
+---
+safeMode: false
+databaseType: bigquery
+operationLevel: database
+tables:
+- name: "profiles"
+  type: "TABLE"
+  schema: "breathe"
+  columns:
+  - name: "id"
+    typeName: "INT64"
+    jdbcDataType: "4"
+    ordinalPosition: 0
+    primaryKeySequenceId: 1
+    columnDisplaySize: 10
+    scale: 0
+    precision: 10
+    primaryKey: false
+    nullable: false
+    autoincrement: true
+  - name: "name"
+    typeName: "STRING"
+    jdbcDataType: "12"
+    ordinalPosition: 0
+    primaryKeySequenceId: 0
+    columnDisplaySize: 255
+    scale: 0
+    precision: 255
+    primaryKey: false
+    nullable: false
+    autoincrement: false
+```
diff --git a/docs/markdowns/generate.md b/docs/markdowns/generate.md
@@ -0,0 +1,13 @@
+#### generate
+Generates a Spark Python or Spark Scala file. It first extracts the schema from the source database and reads the connection properties from the source connection, then creates a Python or Scala file that translates the schema and is ready to transfer data from source to target.
+
+    rosetta [-c, --config CONFIG_FILE] generate [-h, --help] [-s, --source CONNECTION_NAME] [-t, --target CONNECTION_NAME] [--pyspark] [--scala]
+
+Parameter | Description
+--- | ---
+-h, --help | Show the help message and exit.
+-c, --config CONFIG_FILE | YAML config file. If none is supplied, `main.conf` in the current directory is used if it exists.
+-s, --source CONNECTION_NAME | The source connection name to extract schema from.
+-t, --target CONNECTION_NAME | The target connection name where the data will be transferred.
+--pyspark | Generates the Spark SQL file.
+--scala | Generates the Scala SQL file.
diff --git a/docs/markdowns/init.md b/docs/markdowns/init.md
@@ -0,0 +1,30 @@
+#### init
+Creates a project directory (if `PROJECT_NAME` is specified), a default configuration file located in the current directory with example connections for `bigquery` and `snowflake`, and the model directory.
+
+    rosetta init [PROJECT_NAME]
+
+Parameter | Description
+--- | ---
+(Optional) PROJECT_NAME | Project name (directory) where the configuration file and model directory will be created.
+
+Example:
+```yaml
+# Example with two connections
+connections:
+  - name: snowflake_weather_prod
+    databaseName: SNOWFLAKE_SAMPLE_DATA
+    schemaName: WEATHER
+    dbType: snowflake
+    url: jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?<connection_params>
+    userName: bob
+    password: bobPassword
+  - name: bigquery_prod
+    databaseName: bigquery-public-data
+    schemaName: breathe
+    dbType: bigquery
+    url: jdbc:bigquery://[Host]:[Port];ProjectId=[Project];OAuthType=[AuthValue];[Property1]=[Value1];[Property2]=[Value2];...
+    userName: user
+    password: password
+    tables:
+      - bigquery_table
+```