Merge pull request #127 from cvxgrp/improved_data_module
This PR rewrites the data cleaning logic that the `cvx.YahooFinance` class applies to the raw stock/ETF price data provided by Yahoo Finance. Some of the logic has been factored into the `OLHCV` (Open-Low-High-Close-Volume) base class and can be used by interfaces to other data providers. Cleaning removes all impossible observations (negative prices, ...) and filters anomalous changes using a multi-window approach on median absolute log returns. The cleaning has been thoroughly tested on the current example universes (of large-capitalization US stocks), where the changes are minimal. On non-US stocks, however, the improved cleaning is key: we added a new example universe (FTSE100) for which the previous cleaning code was inadequate and the new code gives much more reasonable results. We also added an example, `data_cleaning.py`, which shows the cleaning procedure both on stocks for which the raw Yahoo Finance data is OK and on some for which it is (very) bad. The cleaning steps that make debatable assumptions have all their parameters (thresholds) defined as class-level constants in `cvx.YahooFinance` and `cvx.data.OLHCV`, which can be modified.
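To make the description above concrete, here is a minimal sketch of the general idea: drop impossible prices, then flag log returns that are large relative to the median absolute log return in several windows. It is not the code in this PR. The class names `SketchCleaner` and `LooserCleaner`, the constants `WINDOWS` and `THRESHOLD`, their values, and the rule that a return must exceed the threshold in every window are all assumptions made for illustration.

```python
import math
from statistics import median


class SketchCleaner:
    """Illustrative cleaner; not the cvxportfolio implementation."""

    # Hypothetical class-level constants (names and values are assumptions).
    WINDOWS = (5, 10)   # half-widths, in observations, of each window
    THRESHOLD = 10.0    # multiple of the median absolute log return

    def remove_impossible(self, prices):
        """Replace missing, non-finite, or non-positive prices with None."""
        return [
            p if p is not None and math.isfinite(p) and p > 0 else None
            for p in prices]

    def log_returns(self, prices):
        """Log return between consecutive prices; None if either is missing."""
        return [
            None if prev is None or cur is None else math.log(cur / prev)
            for prev, cur in zip(prices, prices[1:])]

    def anomalous(self, logrets):
        """Flag returns that exceed the threshold in every window."""
        flags = []
        for i, r in enumerate(logrets):
            votes = []
            for w in self.WINDOWS:
                lo, hi = max(0, i - w), min(len(logrets), i + w + 1)
                # Neighbouring absolute log returns, excluding the one tested.
                around = [
                    abs(logrets[j]) for j in range(lo, hi)
                    if j != i and logrets[j] is not None]
                if r is not None and around:
                    votes.append(abs(r) > self.THRESHOLD * median(around))
            flags.append(bool(votes) and all(votes))
        return flags


class LooserCleaner(SketchCleaner):
    """Same logic with a modified threshold constant."""
    THRESHOLD = 500.0


# A price series with a single tenfold spike at position 4.
prices = [100, 101, 100.5, 101.5, 1015.0, 102, 101, 102.5, 103, 102]
clean = SketchCleaner()
flags = clean.anomalous(clean.log_returns(clean.remove_impossible(prices)))
```

With these illustrative values, `flags` marks only the two log returns into and out of the spike, while `LooserCleaner` flags nothing. Keeping the thresholds as class-level constants means a subclass can change the behavior by overriding a single attribute, which is the pattern the description refers to.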
enzbus authored Feb 15, 2024
2 parents 8ac499b + c9c11ff commit 67e1917
Showing 12 changed files with 2,584 additions and 1,533 deletions.
1,398 changes: 0 additions & 1,398 deletions cvxportfolio/data.py

This file was deleted.

26 changes: 26 additions & 0 deletions cvxportfolio/data/__init__.py
@@ -0,0 +1,26 @@
# Copyright 2023 Enzo Busseti
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""This module includes classes that download, store, and serve market data.
The two main abstractions are :class:`SymbolData` and :class:`MarketData`.
Neither is exposed outside this module. Their derived classes instead are.
If you want to interface cvxportfolio with a financial data source other
than the ones we provide, you should derive from either of those two classes.
"""

from .market_data import *
from .symbol_data import *

__all__ = [
    "YahooFinance", "Fred", "UserProvidedMarketData", "DownloadedMarketData"]