FinML: A Practical Machine Learning Framework for Dynamic Stock Selection

AI4Finance-Foundation/FinML

Abstract:

Stock recommendation is vital to investment companies and investors. However, no single stock selection strategy always wins, and analysts may not have enough time to check all S&P 500 (Standard & Poor’s 500) stocks. In this paper, we propose a practical scheme that recommends stocks from the S&P 500 using machine learning. Our basic idea is to buy and hold the top 20% of stocks dynamically. First, we select representative stock indicators with good explanatory power. Second, we take five frequently used machine learning methods (linear regression, ridge regression, stepwise regression, random forest, and generalized boosted regression) to model stock indicators and quarterly log-return in a rolling window. Third, we choose the model with the lowest mean squared error in each period to rank stocks. Finally, we test the selected stocks with portfolio allocation methods such as equally weighted, mean-variance, and minimum-variance. Our empirical results show that the proposed scheme outperforms the long-only strategy on the S&P 500 index in terms of Sharpe ratio and cumulative returns.
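The abstract names equally weighted and minimum-variance allocation without giving formulas. The sketch below shows one common textbook form of each, written in plain Python. It is an illustration only, not the repository's code: the function names and the toy covariance matrix are assumptions, and the minimum-variance version is the unconstrained closed form w = C^-1 1 / (1' C^-1 1), which allows short positions.

```python
from typing import List


def equal_weights(n: int) -> List[float]:
    """Equally weighted portfolio: each of the n stocks gets 1/n."""
    return [1.0 / n] * n


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def min_variance_weights(cov: List[List[float]]) -> List[float]:
    """Unconstrained minimum-variance weights, normalized to sum to 1."""
    raw = solve_linear(cov, [1.0] * len(cov))
    total = sum(raw)
    return [v / total for v in raw]


# Toy (made-up) covariance matrix for two uncorrelated stocks.
cov = [[0.04, 0.0], [0.0, 0.01]]
weights = min_variance_weights(cov)
print(equal_weights(2), weights)
```

With uncorrelated assets, the minimum-variance weights are inversely proportional to each variance, so the lower-variance stock receives the larger share. A long-only variant would need a constrained optimizer, which this sketch does not cover.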

Index Terms:

Stock recommendation, fundamental value investing, machine learning, model selection, risk management

Project summary:

  • We developed a practical approach that uses machine-learning methods to select S&P 500 stocks based on financial ratios (e.g., EPS, ROA, ROE). On out-of-sample data, the approach outperformed the S&P 500 index, achieving a Sharpe ratio of 0.5 versus 0.19 for SPX.
  • We performed feature selection by GICS sector (11 sectors) and used a rolling window to choose the lowest-MSE model among linear regression, stepwise regression, ridge regression, random forest, and GBM. We also applied a model ensemble method.
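The selection rule described above (pick the model with the lowest MSE in each rolling window) can be sketched as follows. This is a minimal illustration in plain Python, not the repository's implementation: the window length, the function names, and the validation numbers are all made up for the example, and the model predictions are assumed to have been produced elsewhere.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def rolling_windows(periods: Sequence, train_size: int) -> Iterator[Tuple[list, object]]:
    """Yield (training periods, next period) pairs, sliding one period at a time."""
    for start in range(len(periods) - train_size):
        yield list(periods[start:start + train_size]), periods[start + train_size]


def mean_squared_error(actual: List[float], predicted: List[float]) -> float:
    """Average squared difference between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def select_best_model(
    actual: List[float], predictions_by_model: Dict[str, List[float]]
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the lowest-MSE model and the MSE of every model."""
    scores = {
        name: mean_squared_error(actual, preds)
        for name, preds in predictions_by_model.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical validation data for one sector in one window.
actual = [0.02, -0.01, 0.05, 0.00]
predictions_by_model = {
    "linear": [0.03, 0.00, 0.03, 0.01],
    "ridge": [0.02, -0.02, 0.04, 0.00],
    "random_forest": [0.00, 0.01, 0.02, 0.03],
}
best_name, scores = select_best_model(actual, predictions_by_model)
windows = list(rolling_windows(["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"], train_size=4))
print(best_name, windows)
```

In a full pipeline, the selection would be repeated for each sector and each window, and the winning model's predictions would then be used to rank stocks for the next quarter.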

Data:

Retrieved from WRDS (Wharton Research Data Services), Compustat Industrial (27 years of daily and quarterly data)

  • S&P 500 Fundamental Quarterly Data (fundamental_final_table.xlsx)

    • Database: Compustat North America (Fundamentals Quarterly) and (Index Constituents)
    • Timeline: 27 years (1990-2017)
    • Tickers: 1,193 stocks (all historical S&P 500 component stocks)
    • Value: 20 financial ratios calculated from raw accounting report data
  • S&P 500 Historical Component Stocks Adjusted Daily Price (1-sp500_adj_price.csv.zip)

    • Database: Compustat North America (Security Daily)
    • Timeline: 27 years (1990-2017)
    • Tickers: 1,193 stocks (all historical S&P 500 component stocks)
    • Value: Adjusted Daily Close Price
  • S&P 500 Index Daily Price (1-spx_price.xlsx)

    • Database: Yahoo Finance
    • Timeline: 27 years (1990-2017)
    • Tickers: SPX
    • Value: Adjusted Daily Close Price
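The models use quarterly log-return as the target, and the price files above contain adjusted daily closes. One plausible way to derive the former from the latter is sketched below. The README does not state the exact convention used, so treat the quarter-end rule (last available close in each calendar quarter), the function names, and the sample prices as assumptions.

```python
import math
from datetime import date
from typing import Dict, List, Tuple

Quarter = Tuple[int, int]  # (year, quarter number 1-4)


def quarter_key(d: date) -> Quarter:
    """Map a calendar date to its (year, quarter) pair."""
    return (d.year, (d.month - 1) // 3 + 1)


def quarter_end_prices(prices: List[Tuple[date, float]]) -> Dict[Quarter, float]:
    """Keep the last available adjusted close within each quarter."""
    last: Dict[Quarter, float] = {}
    for d, price in sorted(prices):
        last[quarter_key(d)] = price  # later dates overwrite earlier ones
    return last


def quarterly_log_returns(prices: List[Tuple[date, float]]) -> Dict[Quarter, float]:
    """Log-return between consecutive available quarter-end prices."""
    ends = quarter_end_prices(prices)
    keys = sorted(ends)
    return {cur: math.log(ends[cur] / ends[prev]) for prev, cur in zip(keys, keys[1:])}


# Made-up adjusted closes for a single hypothetical ticker.
prices = [
    (date(2017, 1, 3), 100.0),
    (date(2017, 3, 31), 110.0),
    (date(2017, 6, 30), 121.0),
]
returns = quarterly_log_returns(prices)
print(returns)
```

Note that if a ticker has no prices in some quarter, this sketch computes the return across the gap, which may not be what a real pipeline wants.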

Code:

Forecasting Model:

  • Input: 11 Excel files of cleaned data about fundamental financial ratios (sector 10-Energy, sector 15-Materials, sector 20-Industrials, sector 25-Consumer Discretionary, sector 30-Consumer Staples, sector 35-Health Care, sector 40-Financials, sector 45-Information Technology, sector 50-Telecommunication Services, sector 55-Utilities, sector 60-Real Estate)
  • Python Script: 2 scripts. Example invocation for sector 10:
python3 fundamental_run_model.py \
  -sector_name sector10 \
  -fundamental Data/fundamental_final_table.xlsx \
  -sector Data/1-focasting_data/sector10_clean.xlsx 
  • Old R Script: 3 R Scripts
  • Output: a CSV file with the columns tic (the stock ticker), predicted_return (the return our model predicts for the next quarter), and trade_date (the date on which to execute the trades)
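Given an output file with the tic, predicted_return, and trade_date columns listed above, the "top 20%" selection could be applied per trade date roughly as follows. This is a hedged sketch rather than the project's back-testing code: the tickers, date format, and values in the sample are invented, and rounding the cutoff up (with at least one pick) is an assumption.

```python
import csv
import io
import math
from collections import defaultdict
from typing import Dict, Iterable, List


def top_fraction_by_date(
    rows: Iterable[Dict[str, str]], fraction: float = 0.2
) -> Dict[str, List[str]]:
    """For each trade_date, return the tickers with the highest predicted_return."""
    by_date = defaultdict(list)
    for row in rows:
        by_date[row["trade_date"]].append((float(row["predicted_return"]), row["tic"]))
    picks: Dict[str, List[str]] = {}
    for trade_date, items in by_date.items():
        items.sort(reverse=True)  # highest predicted return first
        k = max(1, math.ceil(len(items) * fraction))
        picks[trade_date] = [tic for _, tic in items[:k]]
    return picks


# Invented sample in the documented column layout.
sample_csv = """tic,predicted_return,trade_date
AAA,0.05,20170301
BBB,0.01,20170301
CCC,-0.02,20170301
DDD,0.03,20170301
EEE,0.00,20170301
AAA,0.01,20170601
BBB,0.04,20170601
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
picks = top_fraction_by_date(rows)
print(picks)
```

To run this against a real output file, replace the `io.StringIO(sample_csv)` source with an opened file handle.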

Portfolio Allocation:

Back-testing Model:

An IEEE TrustCom 2018 Paper (http://www.cloud-conf.net/trustcom18/)

Hongyang Yang, Xiao-Yang Liu, and Qingwei Wu. 2018. A practical machine learning approach for dynamic stock recommendation. In IEEE TrustCom/BigDataSE, 2018, pp. 1693–1697. Download from (https://ieeexplore.ieee.org/abstract/document/8456121) and (https://ssrn.com/abstract=3302088)