
Data mngmt package #250

Open
2 tasks
drew2323 opened this issue Oct 4, 2024 · 1 comment
Labels: backend (Covering backend functionality) · enhancement (New feature or request) · important

Comments

drew2323 (Owner) commented Oct 4, 2024

Create a shared package that can be installed and reused by different projects (local research notebooks, different instances of v2realbot, scripts, etc.) to serve as a single point for fetching data and sharing the cache.

Responsibilities of this package:

  • accessing and managing the local trade cache, fetching remotely on a cache miss
  • accessing and managing the local agg cache, executing aggregation including resampling
  • support for stocks, later for crypto

Ideas

  • trade store (file cache, one day per file) - loads from Alpaca if not present
  • agg data store (DB or daily parquet files; start with parquet, since a 5M-row parquet file loads in ~3 s)
    • decide the time granularity of the agg file cache; a daily OHLCV file of 1 s bars is ~700 kB - optimize for this granularity (must be fast). Two years of 1 s data (5.5M rows, 440 trading days) load from a single parquet file in ~3 s; with daily files, the overhead of opening 440 files would be immense. Optimize for speed.
    • support for various aggregation types
    • if not present, aggregates from trades with vectorized aggregation and stores the result to the cache
  • supports resampling (probably only the highest resolution is stored)
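The vectorized trade-to-bar aggregation could look roughly like this sketch (the schema is an assumption: a trades DataFrame with a DatetimeIndex plus `price` and `size` columns; pandas `resample` stands in for whatever vectorized engine is ultimately chosen):

```python
import pandas as pd

def aggregate_trades(trades: pd.DataFrame, resolution: str = "1s") -> pd.DataFrame:
    # Vectorized OHLCV aggregation; one pass over the trades, no Python loop.
    bars = trades.resample(resolution).agg(
        open=("price", "first"),
        high=("price", "max"),
        low=("price", "min"),
        close=("price", "last"),
        volume=("size", "sum"),
    )
    # Intervals that contained no trades come out as NaN rows; drop them.
    return bars.dropna(subset=["open"])

# Toy demo: three trades, two of them inside the same 1 s bucket.
idx = pd.to_datetime([
    "2024-10-04 09:30:00.1",
    "2024-10-04 09:30:00.7",
    "2024-10-04 09:30:02.2",
])
trades = pd.DataFrame({"price": [100.0, 101.0, 99.5], "size": [10, 5, 20]}, index=idx)
bars = aggregate_trades(trades, "1s")
```

The same function then serves both the "aggregate on cache miss" path and any on-demand re-aggregation.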

Exposed interface:

  • get_trades(symbol, intervals, conditions)
  • get_agg_data(symbol, interval, agg_type, resolution, trade_conditions)
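A minimal sketch of how that exposed interface could sit on top of the caches (all names and the in-memory dict are placeholders; the real trade store would be day-per-file parquet backed by an Alpaca fetch):

```python
class DataStore:
    """Facade sketch: serve from the local cache, fall back to a remote fetch."""

    def __init__(self):
        self._trade_cache = {}  # (symbol, day) -> list of trade dicts

    def get_trades(self, symbol, day, conditions=None):
        key = (symbol, day)
        if key not in self._trade_cache:      # cache miss -> remote fetch
            self._trade_cache[key] = self._fetch_remote(symbol, day)
        trades = self._trade_cache[key]
        if conditions:                        # optional trade-condition filter
            trades = [t for t in trades if t["cond"] in conditions]
        return trades

    def _fetch_remote(self, symbol, day):
        # Placeholder for the Alpaca trades API call.
        return [{"price": 100.0, "size": 10, "cond": "@"}]

store = DataStore()
first = store.get_trades("AAPL", "2024-10-04")   # remote fetch
second = store.get_trades("AAPL", "2024-10-04")  # served from cache
```

`get_agg_data` would follow the same miss-then-aggregate pattern on top of the trade store.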

After installing the package, you just configure the stores and access keys and use it within your app - existing stores can be used and reused.
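That post-install configuration might be as small as a dict or config file along these lines (every key name and path here is illustrative, not the actual package API):

```python
# Illustrative configuration only; key names and paths are made up.
config = {
    "trade_store": {"path": "~/.v2cache/trades"},                  # day-per-file trade cache
    "agg_store": {"path": "~/.v2cache/agg", "format": "parquet"},  # daily parquet agg files
    "alpaca": {"key_id": "YOUR_KEY", "secret": "YOUR_SECRET"},     # remote fetch credentials
}
```

Pointing two apps at the same store paths is what lets them share one cache.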

For stocks, daily files always also contain extended hours; these can be filtered by the API or by the client.
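The client-side variant of that filter can be a one-liner over the bar index (a sketch assuming a DatetimeIndex in exchange-local time and a 09:30-16:00 regular session):

```python
import pandas as pd

def filter_regular_hours(bars: pd.DataFrame) -> pd.DataFrame:
    # Keep only the regular session; 16:00 itself is excluded (inclusive="left").
    return bars.between_time("09:30", "16:00", inclusive="left")

# Toy demo: 30-minute bars spanning pre-market through after-hours.
idx = pd.date_range("2024-10-04 09:00", "2024-10-04 17:00", freq="30min")
bars = pd.DataFrame({"close": range(len(idx))}, index=idx)
regular = filter_regular_hours(bars)
```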

Try to reuse the v2trading cache structure - to avoid rework.

  • note: existing caches are dictionaries, not DataFrames (a migration might be necessary)

For speed - optimize remote fetching and loading as suggested in this conversation.

Tasks:

  • Develop the data package; use the [current code](https://github.com/drew2323/strategy-lab/tree/master/research/data) as inspiration
  • Adapt v2trading to use this package (migration of some cache files might be required, or just refetching?)
    • all agg data loading (trades for backtest, initial loads, APIs that access trades and data) should just reuse the new package
    • the aim is to optimize the current inefficient cache loading

  • DB support (open)

drew2323 (Owner, author) commented Oct 4, 2024

Inspiration from this design (originally meant primarily for a database; whether DB will be supported in the first phase is still to be decided during implementation):

(attached image: design diagram)

drew2323 added the enhancement (New feature or request), important, and backend (Covering backend functionality) labels on Oct 4, 2024