Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorized implementations of 3/4 partner selection approaches #6

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

franz101
Copy link

Coded mostly in numpy, pandas and scipy.
I have focused on different strategies to improve the speed further.

  • Avoid for-loops as much as possible
  • Use broadcasting and matrix multiplication instead
  • Experiment with advanced indexing and one hot encoded matrix multiplication

In this PR the following is included:

  • The base structure for a module that can be used for partner selection.
  • A stock downloader which is build on top of y-finance
  • A notebook with a tutorial on how to use it.

Outlook:

  • I have not yet finished the vectorised copula approach.
  • I wanted to build a few dashboards outside of Python
  • Unit testing is still behind
  • An architecture to make it more usable with static methods for example.

If you have any questions. I'm looking forward to discuss them.

Copy link
Member

@Jackal08 Jackal08 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Franz,

Overall good application. I like that you went the extra mile to improve the quality, even though you didn't manage to finish all 4 approaches.

Will be setting up an interview.

  • Glad to see you started building unit tests
  • Cool package name
  • I like that you set up the framework for a python package
  • Good quality writeup. I would have liked it to be more verbose but I like that you took the effort to add a template and detail your process.
  • Good import order
  • Added hard typing
  • Used decorators
  • Used hidden and public methods
  • Dostrings match our style
  • Added error handling and assertions
  • Used class inheritance
  • Good use of inline comments
  • Good design choices
  • req.txt missing version numbers
  • Notebook is good. I would have liked to see some Latex. I like your visualizations, good to see the geometric selector.
  • Used CUSUM and should have been CUMPROD.

@@ -0,0 +1,13 @@
The vinecopulaslab is a basic implementation of the paper [Statistical Arbitrage with Vine Copulas [Stübinger, Mangold, Krauss (2016)]](https://www.econstor.eu/bitstream/10419/147450/1/870932616.pdf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool name: vinecopulaslab


class TestDownload(TestCase):
def test_download_sample(self):
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least the thought is there :)


class TestDownload(TestCase):
def test_distance_to_diagonal(self):
# test template
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Start with a capital letter. (Our convention)

# TODO: add tests
line = np.array([1, 1, 1])
pts = np.array([[0, 0, 0]])
self.assertEqual(GeometricSelection.distance_to_line(line, pts), 0, "Should be 0")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted

@@ -0,0 +1,3 @@
from vinecopulaslab.partnerselection import TraditionalSelection, ExtendedSelection, GeometricSelection, \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see that you put the framework in place to build your own package.

target_stock = group.name
partner_stocks = group.STOCK_PAIR.tolist()
stock_selection = [target_stock] + partner_stocks
# We create a subset of our ecdf dataframe to increase lookup speed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good use of inline comments.

# Here the math from the Mangold 2015 paper begins
permut_mat = np.array(list(itertools.product([-1, 1], repeat=d)), dtype=np.int8)
sub_mat = permut_mat @ permut_mat.T
F = (d + sub_mat) / 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of variables that are all CAPS unless they are env variables. Maybe you tried to match the notation in the paper? If not then perhaps a more descriptive name would help.

@@ -0,0 +1,6 @@
numpy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you for including this, however you need to add the version numbers.

:return: (List[str]) returns a list of SP500 symbols
"""
url = "https://raw.githubusercontent.com/datasets/s-and-p-500-companies/master/data/constituents_symbols.txt"
r = requests.get(url)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r is a bad variable name.

{
"cells": [
{
"cell_type": "markdown",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you take the CumSum of the returns? Thats wrong, it should be the cumproduct, otherwise, you are not compounding your returns.

sp500_prices[sample_partners].pct_change(fill_method='ffill').dropna(how='all').cumsum(axis=0).plot();

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants