Implement generic time series data model #3

Open
grahamcrowell opened this issue Jul 12, 2017 · 5 comments

grahamcrowell commented Jul 12, 2017

analytical model

Subject

  • Has attributes
  • Has state processes
  • Has event processes

Attribute

  • Has a label
  • Has a value

State

  • Has a label
  • Has a value
  • Has a time span

Event

  • Has a label
  • Has a value
  • Has a time point

Process

  • Has a label
  • Has a time granularity
  • Is a map from a date to a state

Time Granularity

  • Has a label
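
The outline above could be sketched in Scala roughly as follows. This is only one possible reading: the field types (`String` attribute values, `Double` state and event values, `Int` date ids), the `byDate` map, the `at` helper and the `ModelSketchExample` object are all assumptions made for illustration, and the numbers are placeholders rather than real data.

```scala
// Hypothetical sketch of the analytical model; every type choice is an assumption.
final case class TimeGranularity(label: String)

final case class Attribute(label: String, value: String)

// A state holds over a time span, here the inclusive range [startDateId, endDateId].
final case class State(label: String, value: Double, startDateId: Int, endDateId: Int)

// An event happens at a single time point.
final case class Event(label: String, value: Double, dateId: Int)

// A process maps a date to a value of type A (a State or an Event).
final case class Process[A](
    label: String,
    granularity: TimeGranularity,
    byDate: Map[Int, A]
) {
  def at(dateId: Int): Option[A] = byDate.get(dateId)
}

final case class Subject(
    attributes: Seq[Attribute],
    stateProcesses: Seq[Process[State]],
    eventProcesses: Seq[Process[Event]]
)

object ModelSketchExample {
  val daily: TimeGranularity = TimeGranularity("day")

  // Placeholder value, not real market data.
  val close: Process[State] = Process(
    "close price",
    daily,
    Map(20170711 -> State("close price", 100.0, 20170711, 20170711))
  )

  val asset: Subject = Subject(
    attributes = Seq(Attribute("symbol", "MSFT")),
    stateProcesses = Seq(close),
    eventProcesses = Seq.empty
  )
}
```

Making `Process` generic in its value type is one way to let state processes and event processes share the same label, granularity and lookup code.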


grahamcrowell commented Jul 12, 2017

rationale

generic/abstract model

  • reduces the amount of code
  • increases flexibility

analysis

  • defines metrics with an interface something like:
trait Metric {
  def calculate(process: Process, dateId: Int): Double
}

A metric basically takes a time series, performs some calculation, and returns the value of the metric for a given date.
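
As an illustration of the interface, here is a minimal sketch of one metric. The `Process` stand-in (a label plus a map from date id to value) and the `TrailingMean` metric are hypothetical and chosen only to make the example self-contained; the real `Process` may look quite different.

```scala
// Minimal stand-in for the model's Process: a map from dateId to state value (assumption).
final case class Process(label: String, values: Map[Int, Double])

trait Metric {
  def calculate(process: Process, dateId: Int): Double
}

// Hypothetical metric: mean of the `window` most recent values at or before dateId.
final class TrailingMean(window: Int) extends Metric {
  require(window > 0, "window must be positive")

  def calculate(process: Process, dateId: Int): Double = {
    val recent = process.values.toSeq
      .filter { case (d, _) => d <= dateId } // ignore dates after the requested one
      .sortBy { case (d, _) => -d }          // most recent first
      .take(window)
      .map { case (_, v) => v }
    // No data at or before dateId: return NaN rather than throw.
    if (recent.isEmpty) Double.NaN else recent.sum / recent.size
  }
}
```

For example, with values `10.0, 20.0, 30.0, 40.0` on date ids `1` to `4`, `new TrailingMean(2).calculate(process, 3)` averages the values for dates `2` and `3`.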

Concrete examples of model components

Subjects

  • Asset
    • symbol = "MSFT" (attribute)
    • industry = "Software" (attribute)
    • price process = ... (state process)
    • stock splits process = ... (event process)
    • dividend ex-date process = ... (event process)
  • Portfolio
    • start date = 1999-10-23 (attribute)
    • name = ... (attribute)
    • market value process = ... (state process)
    • return process = ... (state process)
    • holdings process = ... (state process)
    • cash flows process = ... (event process)
    • trade process = ... (event process)
  • (Patient)
  • ... anything which has a State that changes over time
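
The Asset example above might be written down like this. It is a simplified sketch: the `ProcessKind` tag, the flat `processes` list and the lookup helpers are assumptions, and the price values are made-up placeholders rather than real market data.

```scala
// Tag distinguishing state processes from event processes (hypothetical).
sealed trait ProcessKind
case object StateProcess extends ProcessKind
case object EventProcess extends ProcessKind

final case class Attribute(label: String, value: String)
final case class Process(label: String, kind: ProcessKind, values: Map[Int, Double])

final case class Subject(label: String, attributes: Seq[Attribute], processes: Seq[Process]) {
  def attribute(label: String): Option[String] =
    attributes.find(_.label == label).map(_.value)

  def processesOf(kind: ProcessKind): Seq[Process] =
    processes.filter(_.kind == kind)
}

object AssetExample {
  // Placeholder values only; the event processes are left empty.
  val msft: Subject = Subject(
    label = "Asset",
    attributes = Seq(Attribute("symbol", "MSFT"), Attribute("industry", "Software")),
    processes = Seq(
      Process("price", StateProcess, Map(20170711 -> 100.0, 20170712 -> 101.0)),
      Process("stock splits", EventProcess, Map.empty[Int, Double]),
      Process("dividend ex-date", EventProcess, Map.empty[Int, Double])
    )
  )
}
```

A Portfolio or Patient subject would reuse the same three types with different labels, which is the point of keeping the model generic.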

@grahamcrowell

Process as spark dataset

Denormalize price data into Spark Dataset[State]

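// Field types are assumed: String labels, Int dateId (as in Metric.calculate), Double value
case class State(objectLabel: String, processLabel: String, dateId: Int, value: Double)
  • objectLabel identifies the parent object that has this state at the given time
    -- e.g. stock symbol, portfolio
  • processLabel identifies the process
    -- e.g. close price, market value

Start with strings for the labels, then determine whether performance blocks progress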
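
A minimal sketch of the denormalization step, using plain Scala collections so it runs without a Spark dependency. The input shape (one date-to-value series per object and process pair), the field types of `State` and the `Denormalize.toStates` helper are assumptions for illustration.

```scala
// One denormalized row per (object, process, date); field types are assumed.
final case class State(objectLabel: String, processLabel: String, dateId: Int, value: Double)

object Denormalize {
  // Hypothetical input shape: one (dateId -> value) series per (objectLabel, processLabel) pair.
  def toStates(series: Map[(String, String), Map[Int, Double]]): Seq[State] = {
    val rows = for {
      ((objectLabel, processLabel), values) <- series.toSeq
      (dateId, value) <- values.toSeq
    } yield State(objectLabel, processLabel, dateId, value)
    // Sort so the output order does not depend on Map iteration order.
    rows.sortBy(s => (s.objectLabel, s.processLabel, s.dateId))
  }
}
```

With a `SparkSession` named `spark` in scope, `import spark.implicits._` followed by `Denormalize.toStates(series).toDS()` should give a `Dataset[State]`, since Spark derives encoders for case classes.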

grahamcrowell self-assigned this Aug 11, 2017
@grahamcrowell

analytical data structure

single time series

  • Subject
  • Time axis, i.e. a vector of date intervals or points
  • State process, i.e. a vector of state values

system of time synchronized processes

  • Each process is associated with a single subject instance
  • Process instances are indexed by standard interface id (symbol)
  • Each process shares a single time axis
  • Each process is assumed to have an accessible state value for each interval/point in the time axis

Map symbol to asset
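
One way to picture the synchronized system is a shared time axis plus a map from symbol to a value vector of the same length. The `ProcessSystem` name, its fields and both lookup methods are hypothetical, a sketch under the assumptions listed above rather than a settled design.

```scala
// Hypothetical sketch: processes keyed by symbol, all aligned to one shared time axis.
final case class ProcessSystem(timeAxis: Vector[Int], series: Map[String, Vector[Double]]) {
  // Enforce the assumption that every process has a value for each point on the axis.
  require(
    series.values.forall(_.length == timeAxis.length),
    "every process must cover the whole time axis"
  )

  // State of one process at one date, if both the symbol and the date are known.
  def valueAt(symbol: String, dateId: Int): Option[Double] = {
    val i = timeAxis.indexOf(dateId)
    if (i < 0) None else series.get(symbol).map(vs => vs(i))
  }

  // Cross-section: the state of every process at a single date.
  def crossSection(dateId: Int): Map[String, Double] = {
    val i = timeAxis.indexOf(dateId)
    if (i < 0) Map.empty[String, Double]
    else series.map { case (symbol, vs) => symbol -> vs(i) }
  }
}
```

Because all vectors share one axis, a cross-section at a date is a single index lookup per process, and the length check in the constructor catches series that are missing dates.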


grahamcrowell commented Aug 12, 2017

@grahamcrowell this is basically the same as Spark's machine learning pipeline. See the Spark MLlib pipeline docs.
