Striim API Orchestration Python Program

This Python program utilizes the Striim API to orchestrate the creation, deployment, starting, reviewing status, undeploying, and dropping of Striim Applications. It helps parallelize and automate the load by splitting data into pieces, such as utilizing reading huge Oracle tables, computing read ranges for parallel reading, or splitting based on primary key values or date ranges. The resulting output is a set of queries.

This app currently utilizes either a BQ table or local TinyDB (current_position.json file) as both a historical record and a place to orchestrate progress.

Key Features

Automation: Handles the entire lifecycle of Striim applications, including creation, deployment, starting, monitoring, stopping, undeploying, and dropping.
Parallelization: Divides the data load into smaller units to run multiple Striim applications concurrently, optimizing throughput.
Query-Based Processing: Reads queries from a file (queryfile.txt) where each line contains a query and its target table.
TQL-Driven Deployment: Uses a TQL template file (admin.SW.tql) to generate and deploy Striim applications for each query.
State Management: Utilizes a database (BigQuery or TinyDB) to track the progress and status of each query execution.
Monitoring and Logging: Continuously monitors the status of applications and logs events for debugging and analysis.

Requirements

Python 3.6 or higher: The program is written in Python and requires a compatible version.
Striim Environment: Access to a Striim cluster with API connectivity.
Required Python Libraries: requests, google-cloud-bigquery (if using BigQuery), tinydb.
Configuration File: A config.py file to store Striim credentials, database settings, and other parameters.
Input Files:
- queryfile.txt: Contains the queries to be executed, one per line, with the target table separated by a delimiter (configurable in config.py).
- admin.SW.tql: A TQL file that serves as a template for creating Striim applications. This file should include placeholders for the query and the target table.

Configuration

The program's behavior can be customized through the config.py file. Here are some of the key configuration options:

Striim Connection Details: STRIIM_URL_PREFIX, STRIIM_NODE, STRIIM_ADMIN_USER, STRIIM_ADMIN_PWD, STRIIM_API_TOKEN.
Database Selection: STAGE_DB_LOCATION (choose between BQ for BigQuery or TinyDB for a local file-based database).
BigQuery Settings: BQ_KEYFILE_LOCATION, PROJECT_ID, DATASET_ID, TABLE_ID (if using BigQuery).
Concurrency Control: CONCURRENT_APPS_MAX (maximum number of concurrent Striim applications).
Monitoring Interval: APP_MONITOR_INTERVAL_SECONDS (how often to check the status of applications).
Deployment Delay: DEPLOY_WAIT_TIME_SECONDS (minimum time to wait between deploying new applications).
Logging: LOG_OUTPUT_NAME, LOG_OUTPUT_PATH.

queryfile.txt

This file is simply a two-column file, containing the query to run and the target table name. The delimeter is definied in config.py and is pipe (|) by default. Here is an example:

SELECT * FROM QATEST.WF_PENDING_ACTIVITY WHERE ID < 1000               |QATEST2.WF_PENDING_ACTIVITY
SELECT * FROM QATEST.WF_PENDING_ACTIVITY WHERE ID BETWEEN 1000 AND 2000|QATEST2.WF_PENDING_ACTIVITY
SELECT * FROM QATEST.WF_PENDING_ACTIVITY WHERE ID BETWEEN 2001 AND 3000|QATEST2.WF_PENDING_ACTIVITY
SELECT * FROM QATEST.WF_PENDING_ACTIVITY WHERE ID > 3000               |QATEST2.WF_PENDING_ACTIVITY
SELECT * FROM QATEST.BIGTABLE WHERE ID < 1000               |QATEST2.BIGTABLE
SELECT * FROM QATEST.BIGTABLE WHERE ID BETWEEN 1000 AND 2000|QATEST2.BIGTABLE
SELECT * FROM QATEST.BIGTABLE WHERE ID BETWEEN 2001 AND 3000|QATEST2.BIGTABLE
SELECT * FROM QATEST.BIGTABLE WHERE ID > 3000               |QATEST2.BIGTABLE

TQL Template File

The TQL template file should utilize Property Variables (for connection string, username, and password), and the following placeholder variables:

Query: "~QUERYTEXT~": This portion must exist in your TQL Sample App, on your Source Reader.
Query: "Tables: 'QUERY,~TARGETTABLE~'": This portion must exist in your TQL Sample App, on your Target Writer.

Running the Program

Create Python Virtual Environment & Activate:

# Mac/Linux
python -m venv .venv
source .venv/bin/activate

# Windows
python -m venv .venv
source .venv\Scripts\activate

Install Dependencies:
```
pip install -r requirements.txt
```
Configure: Update the config.py file with your Striim credentials, database settings, and other desired parameters.
Prepare Input Files: Create the queryfile.txt and admin.SW.tql files as described in the Requirements section.

Execute:

# Mac/Linux
python ./main.py

# Windows
python .\main.py

The program will then read the queries from queryfile.txt, generate TQL files for each query using the admin.SW.tql template, create and deploy Striim applications, monitor their execution, and finally undeploy and drop them.

Project Clean Up: While in development the need may arise to clear local cache/logs

# Mac/Linux
rm logging/*.{json,log}
rm stage/*.tql

# Windows
del logging\*.json logging\*.log stage\*.tql

/* BigQuery database orchestration only */
TRUNCATE TABLE `%DatabaseName%.%SchemaName%.striim_orchestration`

Additional Notes

The program includes basic error handling and retries to ensure robust operation.
The database (BigQuery or TinyDB) is used to persist the state of the orchestration process, allowing for resuming from interruptions.
The program provides logging capabilities for monitoring and debugging purposes.

This README provides a comprehensive overview of the Striim API Orchestration Python program, its features, requirements, and usage instructions.

BigQuery Table

The output is stored in the following BigQuery table:

CREATE TABLE `%DatabaseName%.%SchemaName%.striim_orchestration` (
    id INTEGER NOT NULL,
    roworder INTEGER,
    uniquerunid INTEGER,
    query STRING,
    appname STRING,
    targettbl STRING,
    status STRING,
    namespace STRING,
    started_datetime TIMESTAMP,
    finished_datetime TIMESTAMP,
    notes STRING,
    iscurrentrow BOOL
);

Table Fields

id: INTEGER, NOT NULL. Unique identifier for each record.
roworder: INTEGER. Order of the row in the sequence.
uniquerunid: INTEGER. Unique identifier for each run.
query: STRING. The actual query text.
appname: STRING. Keeps track of the full app name created.
targettbl: STRING. The target table (full schema.tablename) that the query results will write to. Include ColumnMap or KeyColumns if needed.
status: STRING. Keeps track of the status.
namespace: STRING. Keeps track of the Namespace used in deployment.
started_datetime: TIMESTAMP. Tracks when the app was confirmed started (may not be exact).
finished_datetime: TIMESTAMP. Tracks when the app was confirmed completed (may not be exact).
notes: STRING. Any additional notes related to the output.
iscurrentrow: BOOL. Indicates if this is the current row.

Possible statuses from 'status' above:

<blank> → Not yet started to process yet
RUNNING → Has been created, deployed, and started
COMPLETED → Has been detected as completed successfully.
FAILED → Has been detected as failed, and added any failure messages to notes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Striim API Orchestration Python Program

Key Features

Requirements

Configuration

queryfile.txt

TQL Template File

Running the Program

Additional Notes

BigQuery Table

Table Fields

Possible statuses from 'status' above:

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
logging		logging
stage		stage
.gitignore		.gitignore
BQ_TableCreate.sql		BQ_TableCreate.sql
README.md		README.md
admin.SW.tql		admin.SW.tql
config.py		config.py
data.py		data.py
main.py		main.py
oracle_rowsplit.sql		oracle_rowsplit.sql
pvs.tql		pvs.tql
queryfile.txt		queryfile.txt
requirements.txt		requirements.txt

striim/Striim-InitialLoad-ParallelLoader

Folders and files

Latest commit

History

Repository files navigation

Striim API Orchestration Python Program

Key Features

Requirements

Configuration

queryfile.txt

TQL Template File

Running the Program

Additional Notes

BigQuery Table

Table Fields

Possible statuses from 'status' above:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages