-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vectorized implementations of 3/4 partner selection approaches #6
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Franz,
Overall good application. I like that you went the extra mile to improve the quality, even though you didn't manage to finish all 4 approaches.
Will be setting up an interview.
- Glad to see you started building unit tests
- Cool package name
- I like that you set up the framework for a python package
- Good quality writeup. I would have liked it to be more verbose but I like that you took the effort to add a template and detail your process.
- Good import order
- Added hard typing
- Used decorators
- Used hidden and public methods
- Dostrings match our style
- Added error handling and assertions
- Used class inheritance
- Good use of inline comments
- Good design choices
- req.txt missing version numbers
- Notebook is good. I would have liked to see some Latex. I like your visualizations, good to see the geometric selector.
- Used CUSUM and should have been CUMPROD.
@@ -0,0 +1,13 @@ | |||
The vinecopulaslab is a basic implementation of the paper [Statistical Arbitrage with Vine Copulas [Stübinger, Mangold, Krauss (2016)]](https://www.econstor.eu/bitstream/10419/147450/1/870932616.pdf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool name: vinecopulaslab
|
||
class TestDownload(TestCase): | ||
def test_download_sample(self): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least the thought is there :)
|
||
class TestDownload(TestCase): | ||
def test_distance_to_diagonal(self): | ||
# test template |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Start with a capital letter. (Our convention)
# TODO: add tests | ||
line = np.array([1, 1, 1]) | ||
pts = np.array([[0, 0, 0]]) | ||
self.assertEqual(GeometricSelection.distance_to_line(line, pts), 0, "Should be 0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted
@@ -0,0 +1,3 @@ | |||
from vinecopulaslab.partnerselection import TraditionalSelection, ExtendedSelection, GeometricSelection, \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to see that you put the framework in place to build your own package.
target_stock = group.name | ||
partner_stocks = group.STOCK_PAIR.tolist() | ||
stock_selection = [target_stock] + partner_stocks | ||
# We create a subset of our ecdf dataframe to increase lookup speed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good use of inline comments.
# Here the math from the Mangold 2015 paper begins | ||
permut_mat = np.array(list(itertools.product([-1, 1], repeat=d)), dtype=np.int8) | ||
sub_mat = permut_mat @ permut_mat.T | ||
F = (d + sub_mat) / 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of variables that are all CAPS unless they are env variables. Maybe you tried to match the notation in the paper? If not then perhaps a more descriptive name would help.
@@ -0,0 +1,6 @@ | |||
numpy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thank you for including this, however you need to add the version numbers.
:return: (List[str]) returns a list of SP500 symbols | ||
""" | ||
url = "https://raw.githubusercontent.com/datasets/s-and-p-500-companies/master/data/constituents_symbols.txt" | ||
r = requests.get(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r
is a bad variable name.
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you take the CumSum of the returns? Thats wrong, it should be the cumproduct, otherwise, you are not compounding your returns.
sp500_prices[sample_partners].pct_change(fill_method='ffill').dropna(how='all').cumsum(axis=0).plot();
Coded mostly in numpy, pandas and scipy.
I have focused on different strategies to improve the speed further.
In this PR the following is included:
Outlook:
If you have any questions. I'm looking forward to discuss them.