Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coxph_forestplot #838

Open
wants to merge 21 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
74bab7c
coxph_forestplot
aGuyLearning Dec 6, 2024
aca8220
Merge branch 'main' into enhancement/issue-743
aGuyLearning Dec 11, 2024
39e4d42
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 11, 2024
225b606
changed-notebook
aGuyLearning Dec 11, 2024
47a5b12
Update ehrapy/plot/_survival_analysis.py
aGuyLearning Dec 11, 2024
a32c121
Update ehrapy/plot/_survival_analysis.py
aGuyLearning Dec 11, 2024
0c7cd45
Remove useless empty line
Zethson Dec 11, 2024
785a2cf
Remove useless comment
Zethson Dec 11, 2024
75ee4ba
undo again; check rtd build
eroell Dec 11, 2024
541e505
renamed function and updated documentation to mention, that it is a l…
aGuyLearning Dec 13, 2024
ca4530a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 13, 2024
f77f468
removed fitted check and updated name in notebook
aGuyLearning Dec 13, 2024
66815c4
Update ehrapy/plot/_survival_analysis.py
aGuyLearning Dec 13, 2024
76f0f0c
updated variable names and moved test to better file
aGuyLearning Dec 13, 2024
2108681
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 13, 2024
88e18a7
made anything after coxphfitter keyword only
aGuyLearning Dec 13, 2024
613be89
changed type to iterable
aGuyLearning Dec 13, 2024
8c3d170
added title and show args
aGuyLearning Dec 13, 2024
b3a7c21
less ambiguous loop index
aGuyLearning Dec 13, 2024
cfdb1cc
fixed test. had to return figure and axis, so the test can save the i…
aGuyLearning Dec 13, 2024
f1b648b
Merge branch 'main' into enhancement/issue-743
eroell Dec 18, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,10 +4,10 @@

<!-- Please fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested on pull requests (PRs). -->

- [ ] This comment contains a description of changes (with reason)
- [ ] Referenced issue is linked
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] Documentation in `docs` is updated
- [ ] This comment contains a description of changes (with reason)
- [ ] Referenced issue is linked
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] Documentation in `docs` is updated

**Description of changes**

Expand Down
28 changes: 14 additions & 14 deletions CODE_OF_CONDUCT.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,23 +14,23 @@ religion, or sexual identity and orientation.
Examples of behavior that contributes to creating a positive environment
include:

- Using welcoming and inclusive language
- Being respectful of differing viewpoints and experiences
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members
- Using welcoming and inclusive language
- Being respectful of differing viewpoints and experiences
- Gracefully accepting constructive criticism
- Focusing on what is best for the community
- Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

- The use of sexualized language or imagery and unwelcome sexual
attention or advances
- Trolling, insulting/derogatory comments, and personal or political
attacks
- Public or private harassment
- Publishing others’ private information, such as a physical or
electronic address, without explicit permission
- Other conduct which could reasonably be considered inappropriate in a
professional setting
- The use of sexualized language or imagery and unwelcome sexual
attention or advances
- Trolling, insulting/derogatory comments, and personal or political
attacks
- Public or private harassment
- Publishing others’ private information, such as a physical or
electronic address, without explicit permission
- Other conduct which could reasonably be considered inappropriate in a
professional setting

## Our Responsibilities

Expand Down
8 changes: 4 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,10 +16,10 @@

## Features

- Exploratory and targeted analysis of Electronic Health Records
- Quality control & preprocessing
- Visualization & Exploration
- Clustering & trajectory inference
- Exploratory and targeted analysis of Electronic Health Records
- Quality control & preprocessing
- Visualization & Exploration
- Clustering & trajectory inference

## Installation

Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
18 changes: 9 additions & 9 deletions docs/contributing.md
Original file line number Diff line number Diff line change
Expand Up @@ -126,11 +126,11 @@ in the cookiecutter-scverse template.

Please write documentation for new or changed features and use-cases. This project uses [sphinx][] with the following features:

- the [myst][] extension allows to write documentation in markdown/Markedly Structured Text
- Google-style docstrings
- Jupyter notebooks as tutorials through [myst-nb][] (See [Tutorials with myst-nb](#tutorials-with-myst-nb-and-jupyter-notebooks))
- [Sphinx autodoc typehints][], to automatically reference annotated input and output types
- Citations (like {cite:p}`Virshup_2023`) can be included with [sphinxcontrib-bibtex](https://sphinxcontrib-bibtex.readthedocs.io/)
- the [myst][] extension allows to write documentation in markdown/Markedly Structured Text
- Google-style docstrings
- Jupyter notebooks as tutorials through [myst-nb][] (See [Tutorials with myst-nb](#tutorials-with-myst-nb-and-jupyter-notebooks))
- [Sphinx autodoc typehints][], to automatically reference annotated input and output types
- Citations (like {cite:p}`Virshup_2023`) can be included with [sphinxcontrib-bibtex](https://sphinxcontrib-bibtex.readthedocs.io/)

See the [scanpy developer docs](https://scanpy.readthedocs.io/en/latest/dev/documentation.html) for more information
on how to write documentation.
Expand All @@ -144,10 +144,10 @@ These notebooks come from [pert-tutorials](https://github.com/theislab/ehrapy-tu

#### Hints

- If you refer to objects from other packages, please add an entry to `intersphinx_mapping` in `docs/conf.py`. Only
if you do so can sphinx automatically create a link to the external documentation.
- If building the documentation fails because of a missing link that is outside your control, you can add an entry to
the `nitpick_ignore` list in `docs/conf.py`
- If you refer to objects from other packages, please add an entry to `intersphinx_mapping` in `docs/conf.py`. Only
if you do so can sphinx automatically create a link to the external documentation.
- If building the documentation fails because of a missing link that is outside your control, you can add an entry to
the `nitpick_ignore` list in `docs/conf.py`

#### Building the docs locally

Expand Down
6 changes: 3 additions & 3 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -61,8 +61,8 @@ medRxiv 2023.12.11.23299816; doi: https://doi.org/10.1101/2023.12.11.23299816 ](

# Indices and tables

- {ref}`genindex`
- {ref}`modindex`
- {ref}`search`
- {ref}`genindex`
- {ref}`modindex`
- {ref}`search`

[scanpy genome biology (2018)]: https://doi.org/10.1186/s13059-017-1382-0
8 changes: 4 additions & 4 deletions docs/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,10 +51,10 @@ pip install ehrapy[medcat]

Available language models are

- en_core_web_md (python -m spacy download en_core_web_md)
- en-core-sci-sm (pip install <https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_sm-0.4.0.tar.gz>)
- en-core-sci-md (pip install <https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_md-0.4.0.tar.gz>)
- en-core-sci-lg (pip install <https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_lg-0.4.0.tar.gz>)
- en_core_web_md (python -m spacy download en_core_web_md)
- en-core-sci-sm (pip install <https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_sm-0.4.0.tar.gz>)
- en-core-sci-md (pip install <https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_md-0.4.0.tar.gz>)
- en-core-sci-lg (pip install <https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_lg-0.4.0.tar.gz>)

[github repo]: https://github.com/theislab/ehrapy
[pip]: https://pip.pypa.io
Expand Down
2 changes: 1 addition & 1 deletion docs/tutorials/notebooks
2 changes: 1 addition & 1 deletion ehrapy/plot/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,6 @@
from ehrapy.plot._colormaps import * # noqa: F403
from ehrapy.plot._missingno_pl_api import * # noqa: F403
from ehrapy.plot._scanpy_pl_api import * # noqa: F403
from ehrapy.plot._survival_analysis import kaplan_meier, kmf, ols
from ehrapy.plot._survival_analysis import coxph_forestplot, kaplan_meier, ols
from ehrapy.plot.causal_inference._dowhy import causal_effect
from ehrapy.plot.feature_ranking._feature_importances import rank_features_supervised
133 changes: 132 additions & 1 deletion ehrapy/plot/_survival_analysis.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,10 @@
from typing import TYPE_CHECKING

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
import pandas as pd
from matplotlib import gridspec
from numpy import ndarray

from ehrapy.plot import scatter
Expand All @@ -14,7 +17,7 @@
from xmlrpc.client import Boolean

from anndata import AnnData
from lifelines import KaplanMeierFitter
from lifelines import CoxPHFitter, KaplanMeierFitter
from matplotlib.axes import Axes
from statsmodels.regression.linear_model import RegressionResults

Expand Down Expand Up @@ -293,5 +296,133 @@ def kaplan_meier(

if not show:
return ax

else:
return None


def coxph_forestplot(
aGuyLearning marked this conversation as resolved.
Show resolved Hide resolved
coxph: CoxPHFitter,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we actually redesign the interface and have this function take an AnnData object instead? We'd have to store the coxph results in the AnnData object. This would be much more in line with the rest of the ehrapy experience.

Also ping @eroell

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point. Thanks. We should do this indeed - storing the results used for the plot in adata.uns I suppose

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might also need a key_added argument then or something to ensure that we don't always overwrite results. But yeah this behavior must be consistent across all of the surv analysis methods

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the kmf plot takes a list of KaplanMeierFitter Objects, so i went with that. in the survival analysis plots, ols is the only one taking an anndata.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be consistent about it and I think we should consider modifying all of them to take AnnData. It's what the rest of the API does

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the kmf plot takes a list of KaplanMeierFitter Objects, so i went with that. in the survival analysis plots, ols is the only one taking an anndata.

Yes, noticed where the idea here came from. Having .tl writing to AnnData, and .pl to read from it should be our go to. It has not been done like that everywhere yet unfortunately, and we strive for this consistency now :)

Copy link
Collaborator

@eroell eroell Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one way to go about this is to store the required table for plotting of the lifeline's fitter object in adata.uns[key_added] upon the .tl call;

.uns seems to me the only slot suitable for this table for the univariate functions such as kaplan_meier.

For cox_ph here, as the coefficients are on a per-variable level, I think adding them into the .var column would be kinda clean; basically adding into .var the columns {key_added}_{"coef"}, {key_added}_{"coef lower 95%"}, {key_added}_{"coef upper 95%"}, .... There could also be a boolean {key_added}_{cox_ph_variable} column, which the plotting function uses to pick only the variables used in the cox ph model

labels: list[str] | None = None,
aGuyLearning marked this conversation as resolved.
Show resolved Hide resolved
fig_size: tuple = (10, 10),
t_adjuster: float = 0.1,
ecolor: str = "dimgray",
size: int = 3,
marker: str = "o",
decimal: int = 2,
text_size: int = 12,
color: str = "k",
):
"""Plots a forest plot of the Cox Proportional Hazard model.
aGuyLearning marked this conversation as resolved.
Show resolved Hide resolved
Inspired by the forest plot in the zEpid package in Python.
aGuyLearning marked this conversation as resolved.
Show resolved Hide resolved
Link: https://zepid.readthedocs.io/en/latest/Graphics.html#effect-measure-plots
aGuyLearning marked this conversation as resolved.
Show resolved Hide resolved

Args:
coxph: Fitted CoxPHFitter object.
labels: List of labels for each coefficient, default uses the index of the coxph.summary.
aGuyLearning marked this conversation as resolved.
Show resolved Hide resolved
fig_size: Width, height in inches.
t_adjuster: Adjust the table to the right.
ecolor: Color of the error bars.
size: Size of the markers.
marker: Marker style.
decimal: Number of decimal places to display.
text_size: Font size of the text.
color: Color of the markers.

eroell marked this conversation as resolved.
Show resolved Hide resolved
aGuyLearning marked this conversation as resolved.
Show resolved Hide resolved
Examples:
>>> import ehrapy as ep
>>> adata = ep.dt.mimic_2(encoded=False)
>>> adata_subset = adata[:, ["mort_day_censored", "censor_flg", "gender_num", "afib_flg", "day_icu_intime_num"]]
>>> coxph = ep.tl.coxph(adata_subset, event_col="censor_flg", duration_col="mort_day_censored")
aGuyLearning marked this conversation as resolved.
Show resolved Hide resolved
>>> ep.pl.coxph_forestplot(coxph)

.. image:: /_static/docstring_previews/coxph_forestplot.png

"""

Zethson marked this conversation as resolved.
Show resolved Hide resolved
data = coxph.summary
aGuyLearning marked this conversation as resolved.
Show resolved Hide resolved
auc_col = "coef"

if labels is None:
labels = data.index
tval = []
ytick = []
for i in range(len(data)):
aGuyLearning marked this conversation as resolved.
Show resolved Hide resolved
if not np.isnan(data[auc_col][i]):
if (
(isinstance(data[auc_col][i], float))
& (isinstance(data["coef lower 95%"][i], float))
& (isinstance(data["coef upper 95%"][i], float))
):
tval.append(
[
round(data[auc_col][i], decimal),
(
"("
+ str(round(data["coef lower 95%"][i], decimal))
+ ", "
+ str(round(data["coef upper 95%"][i], decimal))
+ ")"
),
]
)
else:
tval.append(
[
data[auc_col][i],
("(" + str(data["coef lower 95%"][i]) + ", " + str(data["coef upper 95%"][i]) + ")"),
]
)
ytick.append(i)
else:
tval.append([" ", " "])
ytick.append(i)

maxi = round(((pd.to_numeric(data["coef upper 95%"])).max() + 0.1), 2) # setting x-axis maximum
aGuyLearning marked this conversation as resolved.
Show resolved Hide resolved

mini = round(((pd.to_numeric(data["coef lower 95%"])).min() - 0.1), 1) # setting x-axis minimum
aGuyLearning marked this conversation as resolved.
Show resolved Hide resolved

fig = plt.figure(figsize=fig_size)
gspec = gridspec.GridSpec(1, 6) # sets up grid
Zethson marked this conversation as resolved.
Show resolved Hide resolved
plot = plt.subplot(gspec[0, 0:4]) # plot of data
tabl = plt.subplot(gspec[0, 4:]) # table
plot.set_ylim(-1, (len(data))) # spacing out y-axis properly

plot.axvline(1, color="gray", zorder=1)
lower_diff = data[auc_col] - data["coef lower 95%"]
upper_diff = data["coef upper 95%"] - data[auc_col]
plot.errorbar(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, seems that lifelines only draws the 95% errorbars and confidence intervals, did you also find no way to change this @aGuyLearning...?
If not, its something larger, and for now until someone requesting this, can keep like that

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in our coxph wrapper, we don't let user set the confidence intervalls, so this is the default. can be checked after fitting in the .confidence_intervals_ dataframe of the coxphfitter object.

data[auc_col],
data.index,
xerr=[lower_diff, upper_diff],
marker="None",
zorder=2,
ecolor=ecolor,
linewidth=0,
elinewidth=1,
)
plot.scatter(data[auc_col], data.index, c=color, s=(size * 25), marker=marker, zorder=3, edgecolors="None")
plot.xaxis.set_ticks_position("bottom")
plot.yaxis.set_ticks_position("left")
plot.get_xaxis().set_major_formatter(ticker.ScalarFormatter())
plot.get_xaxis().set_minor_formatter(ticker.NullFormatter())
plot.set_yticks(ytick)
plot.set_xlim([mini, maxi])
plot.set_xticks([mini, 1, maxi])
plot.set_xticklabels([mini, 1, maxi])
plot.set_yticklabels(labels)
plot.tick_params(axis="y", labelsize=text_size)
plot.yaxis.set_ticks_position("none")
plot.invert_yaxis() # invert y-axis to align values properly with table
tb = tabl.table(
cellText=tval, cellLoc="center", loc="right", colLabels=[auc_col, "95% CI"], bbox=[0, t_adjuster, 1, 1]
)
tabl.axis("off")
tb.auto_set_font_size(False)
tb.set_fontsize(text_size)
for _, cell in tb.get_celld().items():
cell.set_linewidth(0)
plot.spines["top"].set_visible(False)
plot.spines["right"].set_visible(False)
plot.spines["left"].set_visible(False)
return fig, plot
86 changes: 86 additions & 0 deletions tests/_scripts/coxph_forestplot_create_expected.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"import ehrapy as ep"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"current_notebook_dir = %pwd\n",
"_TEST_IMAGE_PATH = f\"{current_notebook_dir}/../plot/_images\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"adata = ep.dt.mimic_2(encoded=False)\n",
"adata_subset = adata[:, [\"mort_day_censored\", \"censor_flg\", \"gender_num\", \"afib_flg\", \"day_icu_intime_num\"]]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"genderafib_coxph = ep.tl.cox_ph(adata_subset, duration_col=\"mort_day_censored\", event_col=\"censor_flg\")\n",
"\n",
"fig, ax = ep.pl.coxph_forestplot(genderafib_coxph, fig_size=(12, 3), t_adjuster=0.15, marker=\"o\", size=2, text_size=14)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig.savefig(f\"{_TEST_IMAGE_PATH}/coxph_forestplot_expected.png\", dpi=80)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
6 changes: 6 additions & 0 deletions tests/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,12 @@ def rng():
return np.random.default_rng(seed=42)


@pytest.fixture
def mimic_2():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can your test not work with

@pytest.fixture
def mimic_2_10():
    mimic_2_10 = ep.dt.mimic_2()[:10]

    return mimic_2_10

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using the smaller set we get the follwoing error:

ConvergenceError: Convergence halted due to matrix inversion problems. Suspicion is high collinearity. Please see the following tips in the lifelines documentation: https://lifelines.readthe

so i took the whole mimic_2 set

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, makes sense. I think it's okay if we use the smaller subset and filter the warning here https://github.com/theislab/ehrapy/blob/main/pyproject.toml#L132

adata = ep.dt.mimic_2()
return adata


@pytest.fixture
def mimic_2_encoded():
adata = ep.dt.mimic_2(encoded=True)
Expand Down
Binary file added tests/plot/_images/coxph_forestplot_expected.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Loading