Commit
Merge pull request #127 from cvxgrp/improved_data_module
This PR rewrites the data cleaning logic that the `cvx.YahooFinance` class applies to the raw stock/ETF price data provided by Yahoo Finance. Some of the logic has been factored out into the `OLHCV` (Open-Low-High-Close-Volume) base class, so that interfaces to other data providers can use it. Cleaning removes all impossible observations (negative prices, ...) and filters anomalous changes using a multi-window approach on median absolute log-returns. The cleaning has been thoroughly tested on the current example universes (large-capitalization US stocks), where the changes are minimal. On non-US stocks, however, the improved cleaning is key: we added a new example universe (FTSE100) for which the previous cleaning code was inadequate and the new one gives much more reasonable results. We also added an example, `data_cleaning.py`, which shows the cleaning procedure both on stocks for which the raw Yahoo Finance data is OK and on some for which it is (very) bad. The cleaning steps that make debatable assumptions have all their parameters (thresholds) defined as class-level constants in `cvx.YahooFinance` and `cvx.data.OLHCV`, which can be modified.
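To make the idea concrete, here is a minimal standalone sketch of a multi-window filter on median absolute log-returns, with its thresholds kept as class-level constants. It is not the code in this PR: the class `SketchCleaner`, its method names, the window sizes, the threshold value, and the rule "flag a return only if it is anomalous in every window" are all assumptions made for illustration.

```python
import math
from statistics import median


class SketchCleaner:
    """Illustrative only: names, values and rules here are hypothetical,
    not those of cvx.YahooFinance or cvx.data.OLHCV."""

    WINDOWS = (5, 20)   # half-widths (in observations) of the comparison windows
    THRESHOLD = 10.0    # flag |log-return| above THRESHOLD times the local median

    @staticmethod
    def drop_impossible(prices):
        """Replace missing, non-finite or non-positive prices with None."""
        return [
            p if p is not None and math.isfinite(p) and p > 0 else None
            for p in prices
        ]

    @staticmethod
    def log_returns(prices):
        """Log-return between consecutive prices; None where either is missing."""
        out = [None]
        for prev, cur in zip(prices, prices[1:]):
            ok = prev is not None and cur is not None
            out.append(math.log(cur / prev) if ok else None)
        return out

    def anomalous(self, prices):
        """Indices whose log-return is anomalous in every window (assumed rule)."""
        rets = self.log_returns(self.drop_impossible(prices))
        flagged = []
        for i, r in enumerate(rets):
            if r is None:
                continue
            in_all_windows = True
            for w in self.WINDOWS:
                lo, hi = max(0, i - w), min(len(rets), i + w + 1)
                # Absolute log-returns of the neighbours, excluding point i itself.
                neighbours = [
                    abs(rets[j]) for j in range(lo, hi)
                    if j != i and rets[j] is not None
                ]
                if not neighbours or abs(r) <= self.THRESHOLD * median(neighbours):
                    in_all_windows = False
                    break
            if in_all_windows:
                flagged.append(i)
        return flagged


class StricterCleaner(SketchCleaner):
    # Overriding a class-level constant changes the behaviour without
    # touching the cleaning code.
    THRESHOLD = 3.0


# A series with one bad tick (a price ten times too large at index 6).
prices = [100, 101, 100.5, 101.2, 100.8, 101.5,
          1015.0, 101.9, 102.3, 101.7, 102.0, 102.4]
print(SketchCleaner().anomalous(prices))
```

In this sketch both the jump into the bad tick and the jump back out are flagged, since each is far larger than the median absolute log-return of its neighbours. Subclassing, as in `StricterCleaner`, is one way to adjust a threshold; assigning to the class attribute directly would work as well.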