Refactor design/response into Python #28

Open
e5c opened this issue Sep 16, 2016 · 4 comments

Comments

e5c commented Sep 16, 2016

Hi all, creating this issue to track progress on this work.

The goal is to translate R_code/design_and_response.R into Python. Some progress is in the design_response_refactor branch of my fork, here:
https://github.com/e5c/inferelator_ng/blob/design_response_refactor/inferelator_ng/design_response.py

I'm hoping to get an in-person explanation of the steps handling the time series (starting at https://github.com/simonsfoundation/inferelator_ng/blob/master/inferelator_ng/R_code/design_and_response.R#L409) as I'm guessing that explanation will be more translatable to python/pandas than the R code.

Author

e5c commented Sep 19, 2016

Following up on today's discussion with @kostyat and @nickdeveaux (just noting this here before I forget; we can discuss in person if that's easier): is there any reason not to downsample the irregular time series into a regular one to take care of the "chains of small del_t nodes" case? The advantage is that standard Python libraries already implement this (here's one Pandas example). I don't have intuition on what the effect might be on computing TFAs.
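
For concreteness, here is a minimal sketch of what such a downsampling could look like with pandas `resample`. The time points, expression values, the 10-minute bin width, and the helper name `downsample_regular` are all made up for illustration; nothing here is taken from the R code.

```python
import pandas as pd

def downsample_regular(times_min, values, bin_min=10):
    """Average irregularly spaced measurements into fixed-width time bins.

    Hypothetical helper: times are minutes since the start of the series.
    """
    index = pd.to_timedelta(times_min, unit="min")
    series = pd.Series(values, index=index)
    # Each bin covers [start, start + bin_min); empty bins come back as NaN.
    return series.resample(f"{bin_min}min").mean()

# Made-up irregular series: a chain of small del_t steps, then larger gaps.
times = [0, 2, 3, 5, 20, 40]
expr = [1.0, 1.2, 1.1, 1.3, 2.0, 3.0]

regular = downsample_regular(times, expr)
print(regular)
```

In this toy case the four closely spaced measurements collapse into a single averaged value, so the small del_t chain disappears, but so do three of the six original samples.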

Contributor

kostyat commented Sep 19, 2016

I don't think downsampling the time series is a good way to go. It could be something we want to do in the future, but since our current goal is to implement in Python the same algorithm we have in R, it's not a good idea because it will produce very different outputs. The number of samples will change: for longer time series with many time points we will end up using only some of those time points, so such time series will be given less weight than identical time series with less frequent measurements, even though we have more data for them. I hope that explanation makes sense, but I can explain further if not.
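
One way to read the sample-count concern, as a toy sketch with invented time points and a hypothetical helper `count_after_binning` (it only counts occupied bins and is not part of the existing algorithm):

```python
def count_after_binning(times, bin_width):
    """Count how many regular-grid samples remain after binning.

    Each occupied bin of width `bin_width` yields one sample.
    """
    return len({int(t // bin_width) for t in times})

# Two made-up series over the same 30 minutes.
dense = list(range(0, 31, 2))   # a measurement every 2 minutes
sparse = [0, 10, 20, 30]        # a measurement every 10 minutes

dense_kept = count_after_binning(dense, bin_width=10)
sparse_kept = count_after_binning(sparse, bin_width=10)
print(len(dense), "->", dense_kept)
print(len(sparse), "->", sparse_kept)
```

Under these assumptions the densely sampled series ends up with no more samples than the sparse one, so its additional measurements no longer count for anything in the design/response matrices.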

Author

e5c commented Sep 26, 2016

@kostyat Got it, makes sense.

I'm currently working on building the tree/digraph representation of the conditioned time series measurements that we discussed. Could either of you (@kostyat @nickdeveaux) describe what the is_ts and is_first_last columns in the metadata refer to? AFAICT neither is used in the design/response calculation, so I'm wondering if I need to keep those values around in the graph representation.
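
A rough sketch of the kind of digraph meant here, using networkx. The row layout `(cond_name, prev_cond, del_t)`, the condition names, and the helper `build_condition_graph` are assumptions for illustration, not the actual metadata schema:

```python
import networkx as nx

# Hypothetical metadata rows: (cond_name, prev_cond, del_t in minutes).
# A row with prev_cond=None starts a time series.
rows = [
    ("t0", None, None),
    ("t5", "t0", 5),
    ("t10", "t5", 5),
    ("t10_branch", "t5", 5),
]

def build_condition_graph(rows):
    """Link each condition to its predecessor, storing del_t on the edge."""
    graph = nx.DiGraph()
    for cond, prev, del_t in rows:
        graph.add_node(cond)
        if prev is not None:
            graph.add_edge(prev, cond, del_t=del_t)
    return graph

graph = build_condition_graph(rows)
# Nodes with no incoming edge start a series; more than one outgoing edge is a branch.
roots = [n for n in graph.nodes if graph.in_degree(n) == 0]
branch_points = [n for n in graph.nodes if graph.out_degree(n) > 1]
print(roots, branch_points)
```

With this layout, whether a condition starts or ends a series can be read off the node degrees, which is one reason extra flag columns might not need to be stored on the graph; whether that holds for is_ts and is_first_last depends on what those columns actually mean.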

Author

e5c commented Sep 26, 2016

Relatedly, I had some fun visualizing the branching of the B. subtilis time series with networkx and (py)Cytoscape:

https://github.com/e5c/inferelator_ng/blob/design_response_refactor/notebooks/networkx%20playground.ipynb
