- KaizenFlow workflow explanation
- Work organization
- Set-up
- Quant workflows
- Quant dev workflows
- TradingOps workflows
- MLOps workflows
- DevOps workflows
This document is a roadmap of most activities that Quants, Quant devs, and
DevOps can perform using KaizenFlow.

For each activity we point to the relevant resources (e.g., documents in
`docs`, notebooks) in the repo.

A high-level description of KaizenFlow is in the KaizenFlow White Paper.
- Issues workflow explained
  - amp/docs/work_organization/ck.issue_workflow.explanation.md
- GitHub and ZenHub workflows explained
  - /docs/work_organization/all.use_github_and_zenhub.how_to_guide.md
- TODO(Grisha): add more from /docs/work_organization/
- TODO(gp): Add pointers to the docs we ask to read during the on-boarding
- The dir `docs/documentation_meta` contains documents about writing the documentation
- Conventions and suggestions on how to create diagrams in the documentation
- A summary of how to create how-to guides, tutorials, explanations, and reference pages according to the Diataxis framework
- Writing documentation in Google Docs
- Writing documentation in Markdown
- Plotting in LaTeX
The life of a Quant is spent between:
- Exploring the raw data
- Computing features
- Building models to predict output given features
- Assessing models

These activities are mapped in KaizenFlow as follows:

- Exploring the raw data
  - This is performed by reading data using `DataPull` in a notebook and performing exploratory analysis
- Computing features
  - This is performed by reading data using `DataPull` in a notebook and creating some `DataFlow` nodes
- Building models to predict output given features
  - This is performed by connecting `DataFlow` nodes into a `Dag`
- Assessing models
  - This is performed by running data through a `Dag` in a notebook or in a Python script and post-processing the results in an analysis notebook
- Comparing models
  - The parameters of a model are exposed through a `Config` and then swept over `Config` lists
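To make the mapping above concrete, here is a self-contained sketch of the same loop in plain numpy/pandas; the actual `DataPull`, `DataFlow`, `Dag`, and `Config` classes are only referenced conceptually, not used.

```python
import numpy as np
import pandas as pd

# 1) "Explore the raw data": fabricate a price series instead of reading it
#    through `DataPull`.
idx = pd.date_range("2024-01-01", periods=1000, freq="1min", tz="UTC")
price = pd.Series(100 + np.random.randn(len(idx)).cumsum() * 0.1, index=idx)

# 2) "Compute features": a rolling z-score of returns, i.e., what a
#    `DataFlow` node would compute from its input dataframe.
ret = price.pct_change()
feature = (ret - ret.rolling(30).mean()) / ret.rolling(30).std()

# 3) "Build a model to predict output given features": predict the next-bar
#    return from the feature.
target = ret.shift(-1)
df = pd.DataFrame({"feature": feature, "target": target}).dropna()

# 4) "Assess and compare models": sweep a parameter, as one would sweep over
#    `Config` lists, and score each variant.
for smoothing in (1, 5, 15):
    pred = df["feature"].rolling(smoothing).mean()
    hit_rate = (np.sign(pred) == np.sign(df["target"])).mean()
    print(f"smoothing={smoothing}: hit_rate={hit_rate:.3f}")
```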
- General intro to `DataPull`
- Universe explanation
- Analyze universe metadata
- Organize and label datasets
  - Helps to uniquely identify datasets across different sources, types, attributes, etc.
  - /docs/datapull/all.data_schema.explanation.md
  - /docs/datapull/ck.handle_datasets.how_to_guide.md
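For illustration, a tiny helper that splits a dot-separated dataset signature into named attributes; the attribute names and their order below are assumptions made for this sketch, while the authoritative schema is defined in the data schema explanation doc listed above.

```python
# Illustrative only: the attribute names and ordering are assumed, not the
# official schema (see /docs/datapull/all.data_schema.explanation.md).
ATTRIBUTES = [
    "download_mode",
    "downloading_entity",
    "action_tag",
    "data_type",
    "asset_type",
    "universe",
    "vendor",
    "exchange",
    "version",
]


def parse_dataset_signature(signature: str) -> dict:
    """Map each dot-separated token of `signature` to a schema attribute."""
    tokens = signature.split(".")
    if len(tokens) != len(ATTRIBUTES):
        raise ValueError(f"Expected {len(ATTRIBUTES)} tokens, got {len(tokens)}")
    return dict(zip(ATTRIBUTES, tokens))


sig = "periodic_daily.airflow.downloaded_1min.ohlcv.futures.all.ccxt.binance.v1_0_0"
print(parse_dataset_signature(sig))
```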
- Inspect RawData
- Convert data types
- Data download pipelines explanation
- Download data in bulk
  - /im_v2/common/data/extract/download_bulk.py
  - /im_v2/ccxt/data/extract/download_exchange_data_to_db.py
  - TODO(Juraj): technically this could be joined into one script and also generalized for more sources
- Download data in real time over a given time interval
- Archive data
  - Helps to optimize data storage performance/costs by transferring older data from a storage like Postgres to S3
  - Suitable for high-frequency, high-volume real-time order book data
  - /im_v2/ccxt/db/archive_db_data_to_s3.py
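A minimal sketch of the archiving idea above, assuming pandas, SQLAlchemy, and an S3-backed Parquet target; the table name, columns, connection string, and bucket are hypothetical, and the production flow is the script listed above.

```python
import pandas as pd
import sqlalchemy

CUTOFF = pd.Timestamp.now(tz="UTC") - pd.Timedelta(days=30)
engine = sqlalchemy.create_engine("postgresql://user:pwd@host:5432/db")

with engine.begin() as conn:
    # Pull the stale slice of the high-volume table.
    old_rows = pd.read_sql(
        sqlalchemy.text("SELECT * FROM ccxt_bid_ask WHERE timestamp < :cutoff"),
        conn,
        params={"cutoff": CUTOFF},
    )
    if not old_rows.empty:
        # Write the slice to cheap columnar storage on S3 (requires s3fs)...
        old_rows.to_parquet(
            f"s3://archive-bucket/ccxt_bid_ask/{CUTOFF:%Y%m%d}.parquet"
        )
        # ...and only then free the space in Postgres.
        conn.execute(
            sqlalchemy.text("DELETE FROM ccxt_bid_ask WHERE timestamp < :cutoff"),
            {"cutoff": CUTOFF},
        )
```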
- Resampling data
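As an illustration of the resampling item above, a short pandas sketch that aggregates 1-second prices into 1-minute OHLCV bars; the column names and the end-of-interval labeling are assumptions, so check the time-series conventions doc for the actual conventions.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=600, freq="1s", tz="UTC")
df = pd.DataFrame(
    {
        "price": 100 + np.random.randn(len(idx)).cumsum() * 0.01,
        "volume": np.random.randint(1, 10, len(idx)),
    },
    index=idx,
)

# Aggregate 1-second observations into 1-minute OHLCV bars, labeling each bar
# by the end of its interval (assumed convention).
ohlcv = df["price"].resample("1min", label="right", closed="right").ohlc()
ohlcv["volume"] = df["volume"].resample("1min", label="right", closed="right").sum()
print(ohlcv.head())
```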
- ImClient
- MarketData
- How to QA data
  - /docs/datapull/ck.datapull_data_quality_assurance.reference.md
  - /im_v2/ccxt/data/qa/notebooks/data_qa_bid_ask.ipynb
  - /im_v2/ccxt/data/qa/notebooks/data_qa_ohlcv.ipynb
  - /im_v2/common/data/qa/notebooks/cross_dataset_qa_ohlcv.ipynb
  - /im_v2/common/data/qa/notebooks/cross_dataset_qa_bid_ask.ipynb
  - /research_amp/cc/notebooks/Master_single_vendor_qa.ipynb
  - /research_amp/cc/notebooks/Master_cross_vendor_qa.ipynb
  - /research_amp/cc/notebooks/compare_qa.periodic.airflow.downloaded_websocket_EOD.all.bid_ask.futures.all.ccxt_cryptochassis.all.v1_0_0.ipynb
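For orientation, a sketch of the kind of checks the QA notebooks above perform on OHLCV data (missing bars, NaNs, inconsistent OHLC values); the column names are assumed.

```python
import pandas as pd


def qa_ohlcv(df: pd.DataFrame, freq: str = "1min") -> dict:
    """Return a few simple data-quality metrics for an OHLCV dataframe."""
    # Gaps: bars that should exist on a regular grid but are missing.
    expected = pd.date_range(df.index.min(), df.index.max(), freq=freq)
    missing_bars = expected.difference(df.index)
    # Internal consistency: high must bound open/close/low from above, and
    # low must bound them from below.
    bad_ohlc = (
        (df["high"] < df[["open", "close", "low"]].max(axis=1))
        | (df["low"] > df[["open", "close", "high"]].min(axis=1))
    )
    return {
        "n_rows": len(df),
        "pct_missing_bars": len(missing_bars) / max(len(expected), 1),
        "pct_nans": float(df.isna().mean().mean()),
        "n_bad_ohlc_rows": int(bad_ohlc.sum()),
        "n_nonpositive_volume": int((df["volume"] <= 0).sum()),
    }
```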
- How to load Bloomberg data
  - /im_v2/common/notebooks/CmTask5424_market_data.ipynb
  - TODO: Generalize the name and make it Master_
- Kibot guide
- Interactive Brokers guide
- How to run IM app
  - /docs/datapull/ck.run_im_app.how_to_guide.md
- TODO(gp): Reorg
  - /research_amp/cc/notebooks/Master_single_vendor_qa.ipynb
  - /research_amp/cc/notebooks/Master_model_performance_analyser.old.ipynb
  - /research_amp/cc/notebooks/Master_machine_learning.ipynb
  - /research_amp/cc/notebooks/Master_cross_vendor_qa.ipynb
  - /research_amp/cc/notebooks/Master_model_performance_analyser.ipynb
  - /research_amp/cc/notebooks/Master_crypto_analysis.ipynb
  - /research_amp/cc/notebooks/Master_model_prediction_analyzer.ipynb
  - /research_amp/cc/notebooks/Master_Analysis_CrossSectionalLearning.ipynb
  - /im/app/notebooks/Master_IM_DB.ipynb
  - /im/ib/metadata/extract/notebooks/Master_analyze_ib_metadata_crawler.ipynb
- Best practices for Quant research
  - /docs/dataflow/ck.research_methodology.explanation.md
  - TODO(Grisha): `ck.*` -> `all.*`?
- A list of all the available generic notebooks, each with a short description
  - /docs/dataflow/ck.master_notebooks.reference.md
  - TODO(Grisha): does this belong to `DataFlow`?
  - TODO(Grisha): `ck.master_notebooks...` -> `all.master_notebooks`?
- General concepts of `DataFlow`
  - Introduction to KaizenFlow, DAG nodes, DataFrame as unit of computation, DAG execution
  - DataFlow data format
  - Different views of System components, Architecture
  - Conventions for representing time series
  - Explanation of how to debug a DAG
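The "DataFrame as unit of computation" idea can be sketched in plain Python: each node consumes a dataframe and emits a dataframe, so nodes compose naturally. This is a conceptual sketch, not the actual KaizenFlow node API.

```python
import numpy as np
import pandas as pd


def resample_node(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate 1-second prices into 1-minute last prices."""
    return df.resample("1min", label="right", closed="right").last()


def zscore_node(df: pd.DataFrame, window: int = 30) -> pd.DataFrame:
    """Add a rolling z-score column for each input column."""
    out = df.copy()
    for col in df.columns:
        mean = df[col].rolling(window).mean()
        std = df[col].rolling(window).std()
        out[f"{col}.zscore"] = (df[col] - mean) / std
    return out


idx = pd.date_range("2024-01-01", periods=3600, freq="1s", tz="UTC")
raw = pd.DataFrame({"price": 100 + np.random.randn(len(idx)).cumsum()}, index=idx)
# In this sketch, executing the "DAG" is just function composition.
result = zscore_node(resample_node(raw))
print(result.tail())
```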
- Learn how to build a `DAG`
  - Build a `DAG` with two nodes
  - Build a more complex `DAG` implementing a simple risk model
  - Best practices to follow while building a `DAG`
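As a conceptual warm-up for the tutorials above, a minimal two-node "DAG" in plain Python: nodes are functions, edges declare dependencies, and running the DAG is a topological walk. The actual `DagBuilder`/`Dag` API differs; this only shows the idea.

```python
import numpy as np
import pandas as pd


def load_prices() -> pd.DataFrame:
    idx = pd.date_range("2024-01-01", periods=100, freq="1min", tz="UTC")
    return pd.DataFrame({"close": 100 + np.random.randn(len(idx)).cumsum()}, index=idx)


def compute_returns(df: pd.DataFrame) -> pd.DataFrame:
    return df.pct_change().rename(columns={"close": "ret"})


# node name -> (function, list of upstream node names)
dag = {
    "load_prices": (load_prices, []),
    "compute_returns": (compute_returns, ["load_prices"]),
}


def run(dag: dict) -> dict:
    """Execute nodes in dependency order and return all node outputs."""
    results: dict = {}
    while len(results) < len(dag):
        for name, (func, deps) in dag.items():
            if name not in results and all(d in results for d in deps):
                results[name] = func(*(results[d] for d in deps))
    return results


outputs = run(dag)
print(outputs["compute_returns"].head())
```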
- Learn how to run a `DAG`
  - Overview, DagBuilder, Dag, DagRunner
  - Configure a simple risk model, build a DAG, generate data and connect a data source to the DAG, run the DAG
  - Build a DAG from a Mock2 DagBuilder and run it
- General intro about model simulation
  - Property of tilability, batch vs streaming
  - Time semantics, how the clock is handled, flows
  - Phases of evaluation of `Dag`s
  - Event study explanation
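The tilability idea mentioned above can be illustrated with a small helper that splits a simulation period into independent tiles, which can then be computed in batch (all tiles) or incrementally (one tile at a time); the helper is illustrative, not repo code.

```python
import pandas as pd


def compute_tiles(start: str, end: str, freq: str = "MS") -> list:
    """Split [start, end) into contiguous tiles of length `freq`."""
    edges = pd.date_range(start, end, freq=freq, tz="UTC")
    if edges[0] > pd.Timestamp(start, tz="UTC"):
        edges = edges.insert(0, pd.Timestamp(start, tz="UTC"))
    if edges[-1] < pd.Timestamp(end, tz="UTC"):
        edges = edges.append(pd.DatetimeIndex([pd.Timestamp(end, tz="UTC")]))
    return list(zip(edges[:-1], edges[1:]))


for tile_start, tile_end in compute_tiles("2024-01-15", "2024-04-01"):
    # In a real backtest each tile would be fed through the DAG and the
    # result written to a per-tile Parquet file.
    print(f"run DAG on [{tile_start}, {tile_end})")
```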
- Run a simulation of a `DataFlow` system
  - Overview, basic concepts, implementation details
  - How to build a system, run research backtesting, process the results of backtesting, run a replayed-time simulation, run experiments
  - Simulation output explanation
- Run a simulation sweep using a list of `Config` parameters
  - /docs/dataflow/ck.run_backtest.how_to_guide.md
  - TODO(gp): @grisha do we have anything here? It's like the stuff that Dan does
  - TODO(Grisha): @Dan, add a link to the doc here once it is ready
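A minimal sketch of a config sweep, using nested dicts as stand-ins for `Config` objects and a placeholder backtest driver; all names and parameters are hypothetical.

```python
import copy
import itertools

template = {
    "dag": {"zscore_window": 30, "smoothing": 5},
    "backtest": {"start": "2024-01-01", "end": "2024-03-01"},
}

# Parameters to sweep and the values to try for each.
sweep = {
    ("dag", "zscore_window"): [15, 30, 60],
    ("dag", "smoothing"): [1, 5],
}

# Build one config per combination of swept values.
configs = []
for values in itertools.product(*sweep.values()):
    cfg = copy.deepcopy(template)
    for (section, key), value in zip(sweep.keys(), values):
        cfg[section][key] = value
    configs.append(cfg)


def run_backtest(cfg: dict) -> float:
    # Placeholder: a real driver would build the DAG from `cfg` and return a
    # performance statistic such as the Sharpe ratio.
    return 0.0


results = {str(cfg["dag"]): run_backtest(cfg) for cfg in configs}
print(f"ran {len(configs)} configs")
```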
- Post-process the results of a simulation
  - Build the Config dict, load tile results, compute portfolio bar metrics, compute aggregate portfolio stats
  - /dataflow/model/notebooks/Master_research_backtest_analyzer.ipynb
  - TODO(Grisha): is showcasing an example with fake data enough? We could use Mock2 output
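A pandas sketch of the post-processing flow: load per-tile Parquet results, stitch them together, and compute per-bar and aggregate portfolio stats. The paths and column names (`pnl`, `gmv`) are assumptions; the real analysis lives in the notebook above.

```python
import glob

import numpy as np
import pandas as pd

# 1) Load tile results (one Parquet file per time tile).
tiles = [pd.read_parquet(path) for path in sorted(glob.glob("tiles/*.parquet"))]
df = pd.concat(tiles).sort_index()

# 2) Per-bar portfolio metrics (assuming a `pnl` column in dollars and a
#    `gmv` column with gross market value).
bar_metrics = pd.DataFrame(
    {
        "pnl": df["pnl"],
        "ret": df["pnl"] / df["gmv"],
    }
)

# 3) Aggregate portfolio stats.
bars_per_year = 252 * 24 * 60  # approximate count of 1-minute bars per year
sharpe = bar_metrics["ret"].mean() / bar_metrics["ret"].std() * np.sqrt(bars_per_year)
print(f"total PnL={bar_metrics['pnl'].sum():.2f}, annualized Sharpe={sharpe:.2f}")
```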
- Analyze a `DataFlow` model in detail
  - Build Config, initialize ModelEvaluator and ModelPlotter
  - /dataflow/model/notebooks/Master_model_analyzer.ipynb
  - TODO(gp): @grisha what is the difference with the other?
  - TODO(Grisha): ask Paul about the notebook
- Analyze features computed with `DataFlow`
  - Read features from a Parquet file and perform some analysis
  - TODO(gp): Grisha do we have a notebook that reads data from ImClient/MarketData and performs some analysis?
  - TODO(Grisha): create a tutorial notebook for analyzing features using some real (or close to real) data
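A small pandas sketch of the kind of feature analysis meant above: read a feature Parquet file and look at coverage and rank correlation with a forward return; the file path and column names are hypothetical.

```python
import pandas as pd

df = pd.read_parquet("features.parquet")

# Coverage: fraction of non-NaN values per feature column.
feature_cols = [c for c in df.columns if c.startswith("feature.")]
print(df[feature_cols].notna().mean())

# Predictive power: rank correlation of each feature with the next-bar return.
target = df["ret"].shift(-1)
for col in feature_cols:
    ic = df[col].corr(target, method="spearman")
    print(f"{col}: IC={ic:.4f}")
```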
- Mix multiple `DataFlow` models
  - /dataflow/model/notebooks/Master_model_mixer.ipynb
  - TODO(gp): add more comments
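The model-mixing idea in a few lines of pandas: standardize each model's forecast and combine them with weights. The column names and weights are made up for the sketch; the real flow is the notebook above.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=100, freq="1min", tz="UTC")
preds = pd.DataFrame(
    {
        "model_a": np.random.randn(100) * 0.001,
        "model_b": np.random.randn(100) * 0.001,
    },
    index=idx,
)
weights = {"model_a": 0.6, "model_b": 0.4}

# Normalize each model's forecast to unit variance before mixing so that no
# model dominates only because of its scale.
standardized = preds / preds.std()
mixed = sum(weights[col] * standardized[col] for col in preds.columns)
print(mixed.head())
```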
- Exporting PnL and trades
- Learn how to build a `System`
  - TODO(gp): @grisha what do we have for this?
  - TODO(Grisha): add a tutorial notebook that builds a System and explains the flow step-by-step
- Configure a full system using a `Config`
  - Fill the `SystemConfig`, build all the components, and run the `System`
  - /docs/dataflow/system/all.use_system_config.tutorial.ipynb
- Create an ETL batch process using a `System`
  - /dataflow_amp/system/risk_model_estimation/run_rme_historical_simulation.py
  - TODO(Grisha): add an explanation doc and consider converting into a Jupyter notebook
- Create an ETL real-time process
  - DagBuilder, Dag, DagRunner
  - Build a DAG that runs in real time
    - /dataflow_amp/system/realtime_etl_data_observer/scripts/run_realtime_etl_data_observer.py
    - TODO(Grisha): consider converting into a Jupyter notebook
  - Build a `System` that runs in real time
    - /dataflow_amp/system/realtime_etl_data_observer/scripts/DataObserver_template.run_data_observer_simulation.py
    - TODO(Grisha): consider converting into a Jupyter notebook
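A minimal sketch of a real-time ETL loop in the spirit of the data-observer scripts above: wake up once per bar, pull the newest data, transform it, and persist the result. The data source and transform are stand-ins, not the actual KaizenFlow components.

```python
import datetime
import time

import numpy as np
import pandas as pd


def get_latest_bar() -> pd.DataFrame:
    """Stand-in for a real-time market data query."""
    now = pd.Timestamp.now(tz="UTC").floor("1min")
    return pd.DataFrame({"close": [100 + np.random.randn()]}, index=[now])


def transform(df: pd.DataFrame) -> pd.DataFrame:
    df["log_close"] = np.log(df["close"])
    return df


bar_seconds = 60
for _ in range(3):  # run forever in production; 3 iterations for the sketch
    bar = transform(get_latest_bar())
    bar.to_parquet(f"obs_{bar.index[-1]:%Y%m%d_%H%M%S}.parquet")
    # Sleep until the next bar boundary.
    now = datetime.datetime.now(datetime.timezone.utc)
    time.sleep(bar_seconds - now.second % bar_seconds)
```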
- Batch simulation of a Mock2 `System`
  - Description of the forecast system, description of the System, run a backtest, explanation of the backtesting script, analyze the results
  - /docs/kaizenflow/all.run_Mock2_in_batch_mode.how_to_guide.md
  - Build the config, load tiled results, compute portfolio bar metrics, compute aggregate portfolio stats
  - /docs/kaizenflow/all.analyze_Mock2_pipeline_simulation.how_to_guide.ipynb
- Run an end-to-end timed simulation of a Mock2 `System`
- TODO(gp): reorg the following files
  - /oms/notebooks/Master_PnL_real_time_observer.ipynb
  - /oms/notebooks/Master_bid_ask_execution_analysis.ipynb
  - /oms/notebooks/Master_broker_debugging.ipynb
  - /oms/notebooks/Master_broker_portfolio_reconciliation.ipynb
  - /oms/notebooks/Master_c1b_portfolio_vs_portfolio_reconciliation.ipynb
  - /oms/notebooks/Master_dagger_reconciliation.ipynb
  - /oms/notebooks/Master_execution_analysis.ipynb
  - /oms/notebooks/Master_model_qualifier.ipynb
  - /oms/notebooks/Master_multiday_system_reconciliation.ipynb
  - /oms/notebooks/Master_portfolio_vs_portfolio_reconciliation.ipynb
  - /oms/notebooks/Master_portfolio_vs_research_stats.ipynb
  - /oms/notebooks/Master_system_reconciliation_fast.ipynb
  - /oms/notebooks/Master_system_reconciliation_slow.ipynb
  - /oms/notebooks/Master_system_run_debugger.ipynb
- Learn how to create a `DataPull` adapter for a new data source
- How to update CCXT version
- Download `DataPull` historical data
  - ?
- Onboard new exchange
- Put a `DataPull` source in production with Airflow
  - /docs/datapull/ck.create_airflow_dag.tutorial.md
    - TODO(gp): This file is missing
  - /docs/datapull/ck.develop_an_airflow_dag_for_production.explanation.md
    - TODO(Juraj): See https://github.com/cryptokaizen/cmamp/issues/6444
- Add QA for a `DataPull` source
- Compare OHLCV bars
  - /im_v2/ccxt/data/client/notebooks/CmTask6537_One_off_comparison_of_Parquet_and_DB_OHLCV_data.ipynb
  - TODO(Grisha): review and generalize
- How to import Bloomberg historical data
- How to import Bloomberg real-time data
  - TODO(*): add doc
- TODO(gp): Add docs
  - /docs/datapull/ck.binance_trades_data_pipeline.explanation.md
  - /docs/datapull/ck.database_schema_update.how_to_guide.md
  - /docs/datapull/ck.datapull.explanation.md
  - /docs/datapull/ck.relational_database.explanation.md
- All software components
- Binance trading terms
- OMS explanation
- CCXT log structure
- Replayed CCXT exchange explanation
- How to generate broker test data
- Trading procedures (e.g., trading account information)
- How to run broker only/full system experiments
- Execution notebooks explanation
- Encrypt a model
- Model deployment in production
- Run production system
- Model references
- Monitor system
- System reconciliation explanation
- System reconciliation how-to guide
This documentation outlines the architecture and deployment processes for the Kaizen infrastructure, which combines AWS services, Kubernetes for container orchestration, and traditional EC2 instances for virtualized computing. The project follows an Infrastructure-as-Code (IaC) approach, using Terraform for provisioning and Ansible for configuration, to keep the environment maintainable and reproducible.
- Development and deployment stages
- S3 buckets overview
  - /docs/infra/ck.s3_buckets.explanation.md
  - This document provides an overview of the S3 buckets used by Kaizen Technologies
- Detailed steps for setting up the Kaizen infrastructure
- EC2 servers overview
- Implementation of autoscaling in the Kubernetes setup, focusing on the Cluster Autoscaler (CA), the Horizontal Pod Autoscaler (HPA), and Auto Scaling Groups (ASG)
- Comparison of AWS RDS instance types and storage performance
- Set up S3 buckets with Terraform
- AWS API key rotation guide
- Amazon Elastic File System (EFS) overview
- Client VPN endpoint creation with Terraform
- Set up AWS Client VPN
- Utility server application set-up overview
- Storing secret information (API keys, login credentials, access tokens, etc.)