From 18b43c7f4e451e53cf6c20ae140289a569833bf3 Mon Sep 17 00:00:00 2001 From: Lucie Contamin Date: Tue, 26 Nov 2024 15:47:01 -0500 Subject: [PATCH 1/8] Create release_protocol.md --- docs/release_protocol.md | 153 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 153 insertions(+) create mode 100644 docs/release_protocol.md diff --git a/docs/release_protocol.md b/docs/release_protocol.md new file mode 100644 index 0000000..2163bff --- /dev/null +++ b/docs/release_protocol.md @@ -0,0 +1,153 @@ +# Release Process + +Parts of this document are adapted for the U.S. Scenario Modeling Hub (SMH) from +[The Hubverse](https://hubverse-org.github.io/hubDevs/articles/release-process.html) +and from +[The Carpentries Developer's Handbook](https://carpentries.github.io/workbench-dev/releases.html) © +The Carpentries under the +[CC-BY 4.0 license](https://creativecommons.org/licenses/by/4.0/). + +## Workflow + +The release process follows a general workflow: + +1. Iterate on small bug fixes and PRs on branches: + - merging into `main` once ready to publish/deploy + - merging into a development branch for ongoing test/process +2. When ready to release on `main`: bump the version, add an annotated git tag, and release +3. Bump the version in main back to a development version + +Some of the steps in these instructions are specific to R packages, but they are largely process-based +and can apply to Python packages as well. + +### Versioning + +The SMH is built using very basic semantic versioning using the X.Y.Z[.9000] pattern. Everything that +has a .9000 attached is considered in-development. + +`X`: **Major version number**: this version number will change if there are significant breaking +changes to any of the user-facing workflows. That is, if a change requires users to modify their +scripts, then it is a breaking change. 
+ +`Y`: **Minor version number**: this version number will change if there are new features or + enhanced behaviors available to the users in a way that *does not affect how users who do not +need the new features use the package*. This number grows the fastest in early stages of development. + +`Z`: **Patch version number**: this version number will change if something that was previously +broken was fixed, but no new features have been added. + +`9000`: **Development version indicator**: this version number indicates that the package is in a +development state and has the potential to change. When it's on the main branch, it indicates +that the features or patches introduced have been reviewed and tested. This version is appended +after every successful release. + +### Hotfixes + +A hotfix is a bug fix for a situation where a bug has been found, but the main branch has new features +that are not yet ready to be released. + +## Checklist + +### Updates + +[] Create new branch from `main` (or `master`, or branch of interest) called `"//"` + +[] Update `NEWS.md` accordingly + +[] Commit, push + +[] Open Pull-Request (PR) on branch of interest (`main` for release we want to implement quickly or ready to deploy, other + branch of interest for ongoing updates) + +[] Merge after review, once all accepted + +**Create new release version only if important change, see version** + +### Release + +[] Create new branch from `main` (or `master`) called `"/release/X.Y.Z"` + +[] Update `DESCRIPTION` and `NEWS.md` accordingly + +[] Commit, push + +[] Open Pull-Request (PR) + +[] Merge after review, once all accepted + +[] Checkout `main` branch (or `master`) & make sure it's up to date + +[] Add new tag + +``` +git tag -a v.X.Y.Z -m '' +git push --tags +``` + +[] Create a new release on GitHub (can be done using R, for example) + +```r +usethis::use_github_release() +``` + +### Post-Release + +[] Create new branch from `main` (or `master`) called `"post-release-X.Y.Z"` + +[] Set 
project to dev version (can be done using R, for example): + - adding `.9000` to the version number + - adding new heading to `NEWS.md` (`## (development)`) + +```r +usethis::use_dev_version() +``` + +[] Commit, push, open Pull-Request (PR) + +[] Merge after review, once all accepted + + +### Subsequent updates + +[] Create new branch from `main` (or `master`, or branch of interest) called `"//"` + +[] Update `NEWS.md` accordingly + +[] Commit, push + +[] Open Pull-Request (PR) + +[] Merge after review, once all accepted + +**Create new release version only if important change, see version** + +### Hotfixes + +[] Create new branch from `main` (or `master`) called `"/hotfix/"` + +``` +git switch --detach v.X.Y.Z +git switch -c /hotfix/ +``` + +[] Write a test, fix the bug, commit, push +**Don't change the version** + +[] Open Pull-Request (PR) + +[] Update `NEWS.md` accordingly and bump the patch version in `DESCRIPTION` + +``` +git commit -m 'bump version to X.Y.Z+1' +git tag -a v.X.Y.Z+1 -m '' +git push +git push --tags +``` + +[] Create a new release on GitHub (can be done using R, for example) + +```r +usethis::use_github_release() +``` + +[] Resolve conflicts in PR & merge into `main` (or `master`) From 25bd683c808d1d674d62948567337418c04a42c7 Mon Sep 17 00:00:00 2001 From: Lucie Contamin Date: Tue, 10 Dec 2024 15:40:20 -0500 Subject: [PATCH 2/8] Update release_protocol.md --- docs/release_protocol.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/release_protocol.md b/docs/release_protocol.md index 2163bff..635f581 100644 --- a/docs/release_protocol.md +++ b/docs/release_protocol.md @@ -13,7 +13,7 @@ The release process follows a general workflow: 1. Iterate on small bug fixes and PRs on branches: - merging into `main` once ready to publish/deploy - - merging into a development branch for ongoing test/process + - merging into a development branch (here called `dev`) for ongoing test/process 2. 
When ready to release on `main`: bump the version, add an annotated git tag, and release 3. Bump the version in main back to a development version @@ -52,7 +52,7 @@ that are not yet ready to be released. [] Create new branch from `main` (or `master`, or branch of interest) called `"//"` -[] Update `NEWS.md` accordingly +[] Update `Changelog.md` accordingly [] Commit, push @@ -67,7 +67,7 @@ that are not yet ready to be released. [] Create new branch from `main` (or `master`) called `"/release/X.Y.Z"` -[] Update `DESCRIPTION` and `NEWS.md` accordingly +[] Update `pyproject.toml` and `Changelog.md` accordingly [] Commit, push @@ -96,7 +96,7 @@ usethis::use_github_release() [] Set project to dev version (can be done using R, for example): - adding `.9000` to the version number - - adding new heading to `NEWS.md` (`## (development)`) + - adding new heading to `Changelog.md` (`## (development)`) ```r usethis::use_dev_version() @@ -111,7 +111,7 @@ usethis::use_dev_version() [] Create new branch from `main` (or `master`, or branch of interest) called `"//"` -[] Update `NEWS.md` accordingly +[] Update `Changelog.md` accordingly [] Commit, push @@ -135,7 +135,7 @@ git switch -c /hotfix/ [] Open Pull-Request (PR) -[] Update `NEWS.md` accordingly and bump the patch version in `DESCRIPTION` +[] Update `Changelog.md` accordingly and bump the patch version in `pyproject.toml` ``` git commit -m 'bump version to X.Y.Z+1' From 1fecb7bdb46848c5b392594de263fbb1374d2279 Mon Sep 17 00:00:00 2001 From: Lucie Contamin Date: Tue, 10 Dec 2024 15:43:50 -0500 Subject: [PATCH 3/8] Create changelog.md --- changelog.md | 7 +++++++ 1 file changed, 7 insertions(+) create mode 100644 changelog.md diff --git a/changelog.md b/changelog.md new file mode 100644 index 0000000..541cf09 --- /dev/null +++ b/changelog.md @@ -0,0 +1,7 @@ +# Changelog + +All notable changes to this project will be documented in this file. 
+ +## 0.0.1 + +First version From e0e83bef1efed5af20bbe543ff1e3e45a1899aa0 Mon Sep 17 00:00:00 2001 From: kleinjr1 Date: Fri, 13 Dec 2024 11:04:10 -0500 Subject: [PATCH 4/8] band depth envelope --- SMHviz_plot/figures.py | 196 ++++++++++++++++++++++++++++++++------ SMHviz_plot/utils_data.py | 53 +++++++++++ 2 files changed, 220 insertions(+), 29 deletions(-) diff --git a/SMHviz_plot/figures.py b/SMHviz_plot/figures.py index 2edb39a..b57e150 100644 --- a/SMHviz_plot/figures.py +++ b/SMHviz_plot/figures.py @@ -1,9 +1,8 @@ from datetime import timedelta - -import numpy as np import pandas as pd from SMHviz_plot.utils import * +from utils_data import * def add_scatter_trace(fig, data, legend_name, x_col="time_value", y_col="value", width=2, connect_gaps=None, @@ -70,7 +69,7 @@ def add_scatter_trace(fig, data, legend_name, x_col="time_value", y_col="value", showlegend=show_legend, customdata=custom_data, hovertemplate=hover_text + - "Value: %{y:,.2f}
<br>Epiweek: %{x|%Y-%m-%d}"), + "Value: %{y:,.2f}<br>
Epiweek: %{x|%Y-%m-%d}"), row=subplot_coord[0], col=subplot_coord[1]) if connect_gaps is not None: fig.update_traces(connectgaps=connect_gaps) @@ -216,7 +215,7 @@ def ui_ribbons(fig, df_plot, quant_sel, legend_name, x_col="target_end_date", y_ name=legend_name, mode='lines', line=dict(width=line_width), - marker=dict(color=re.sub(", 1\)", ", " + str(opacity) + ")", color)), + marker=dict(color=re.sub(r", 1\)", ", " + str(opacity) + ")", color)), legendgroup=legend_name, showlegend=show_legend, hovertemplate=second_hover_text), @@ -228,10 +227,10 @@ def ui_ribbons(fig, df_plot, quant_sel, legend_name, x_col="target_end_date", y_ name=legend_name, line=dict(width=line_width), mode='lines', - marker=dict(color=re.sub(", 1\)", ", " + str(opacity) + ")", color)), + marker=dict(color=re.sub(r", 1\)", ", " + str(opacity) + ")", color)), legendgroup=legend_name, showlegend=False, - fillcolor=re.sub(", 1\)", ", " + str(opacity) + ")", color), + fillcolor=re.sub(r", 1\)", ", " + str(opacity) + ")", color), fill='tonexty', hovertemplate=first_hover_text), row=subplot_coord[0], col=subplot_coord[1]) @@ -335,7 +334,7 @@ def make_proj_plot(fig_plot, proj_data, intervals=None, intervals_dict=None, x_c elif len(intervals) > 1: intervals.sort(reverse=True) for i in range(0, len(intervals)): - if i is 0 and plot_df is None: + if i == 0 and plot_df is None: ui_show_legend = show_legend else: ui_show_legend = False @@ -540,7 +539,7 @@ def make_scatter_plot(proj_data, truth_data, intervals=None, intervals_dict=None else: show_legend = False if truth_facet is not None: - if truth_data_type is "scatter": + if truth_data_type == "scatter": if w_delay is not None: plot_truth_df = truth_facet[pd.to_datetime(truth_facet[x_truth_col]) <= (max(pd.to_datetime(truth_facet[x_truth_col])) - @@ -560,7 +559,7 @@ def make_scatter_plot(proj_data, truth_data, intervals=None, intervals_dict=None subplot_coord=subplot_coord, x_col=x_truth_col, y_col=y_truth_col, width=line_width, 
connect_gaps=connect_gaps, mode="markers", color="rgb(200, 200, 200)", line_width=0.5) - elif truth_data_type is "bar": + elif truth_data_type == "bar": fig_plot = add_bar_trace(fig_plot, truth_facet, truth_legend_name, show_legend=show_legend, hover_text=truth_legend_name + "<br>", subplot_coord=subplot_coord, x_col=x_truth_col) @@ -590,7 +589,7 @@ def make_scatter_plot(proj_data, truth_data, intervals=None, intervals_dict=None else: fig_plot = fig_plot if truth_data is not None: - if truth_data_type is "scatter": + if truth_data_type == "scatter": if w_delay is not None: plot_truth_df = truth_data[pd.to_datetime(truth_data[x_truth_col]) <= (max(pd.to_datetime(truth_data[x_truth_col])) - @@ -608,7 +607,7 @@ def make_scatter_plot(proj_data, truth_data, intervals=None, intervals_dict=None hover_text=truth_legend_name + "<br>", x_col=x_truth_col, width=line_width, connect_gaps=connect_gaps, mode="markers", color="rgb(200, 200, 200)", show_legend=False, line_width=0.5) - elif truth_data_type is "bar": + elif truth_data_type == "bar": fig_plot = add_bar_trace(fig_plot, truth_data, truth_legend_name, hover_text=truth_legend_name + "<br>", x_col=x_truth_col) else: @@ -631,11 +630,11 @@ def make_scatter_plot(proj_data, truth_data, intervals=None, intervals_dict=None # View update to_vis = list() leg_only = list() - if viz_truth_data is True: + if viz_truth_data == True: to_vis.append(truth_legend_name) elif viz_truth_data == "legendonly": leg_only.append(truth_legend_name) - if ensemble_view is True: + if ensemble_view == True: to_vis.append(ensemble_name) leg_only = leg_only + list(proj_data[legend_col].unique()) leg_only.remove(ensemble_name) @@ -656,7 +655,7 @@ def make_scatter_plot(proj_data, truth_data, intervals=None, intervals_dict=None if notes is not None: fig_plot.update_layout(legend={"title": {"text": notes + "<br>", "side": "top"}}) # Add buttons - if button is True and ensemble_name is not None: + if button == True and ensemble_name is not None: button = make_ens_button(fig_plot, viz_truth_data=viz_truth_data, truth_legend_name=truth_legend_name, ensemble_name=ensemble_name, button_name="Ensemble", button_opt=button_opt) fig_plot.update_layout( @@ -753,7 +752,7 @@ def add_point_scatter(fig, df, ens_name, color_dict=None, multiply=1, symbol="ci full_model_name = "".join(list(model)) # prerequisite color_marker = color_line_trace(color_dict, model, line_width=0) - color_marker = re.sub(", 1\)", ", " + str(opacity) + ")", color_marker[0]) + color_marker = re.sub(r", 1\)", ", " + str(opacity) + ")", color_marker[0]) model_marker = dict(size=20, color=color_marker, symbol=symbol) fig.add_trace(go.Scatter(x=df_model["full_x"], y=df_model["rel_change"] * multi, @@ -1063,8 +1062,9 @@ def add_spaghetti_plot(fig, df, color_dict, legend_dict=None, all_traj_df.loc[pd.isna(all_traj_df['value']), 'type_id'] = np.nan # Add single trace - color = re.sub(", 1\)", ", " + str(opacity) + ")", col_line[0]) - fig = add_scatter_trace(fig, all_traj_df, legend_name, x_col="target_end_date", mode="lines", color=color, + color = re.sub(r", 1\)", ", " + str(opacity) + ")", col_line[0]) + fig = add_scatter_trace(fig, all_traj_df, legend_name, x_col="target_end_date", + mode="lines", color=color, show_legend=show_legend, subplot_coord=subplot_coord, custom_data=all_traj_df['type_id'], hover_text=hover_text + "Model: " + legend_name + "<br>Type ID: %{customdata}<br>
") @@ -1076,17 +1076,133 @@ def add_spaghetti_plot(fig, df, color_dict, legend_dict=None, return fig -def make_spaghetti_plot(df, legend_col="model_name", spag_col="type_id", show_legend=True, hover_text="", opacity=0.3, - subplot=False, title="", height=1000, subplot_col=None, subplot_titles=None, palette="turbo", - share_x="all", share_y="all", x_title="", y_title="N", theme="plotly_white", color_dict=None, - add_median=False, legend_dict=None): +def add_spaghetti_plot_envelope(fig, df, color_dict, band_depth_limit, legend_dict=None, + legend_col="model_name", spag_col="type_id", show_legend=True, + hover_text="", opacity=0.3, + subplot_coord=None, add_median=False, median=0.5): + """ + :param band_depth_limit: Show envelope around trajectories with band depth greater than X% + """ + + if add_median is True: + df_med = df[df[spag_col] == median] + df = df[df[spag_col] != median] + else: + df_med = None + for leg in df[legend_col].drop_duplicates(): + # df_plot contains all data for a given model (and scenario and age group) + df_plot = df[df[legend_col] == leg].drop(legend_col, axis=1) + if legend_dict is None: + legend_name = leg + col_line = color_line_trace(color_dict, leg) + else: + legend_name = legend_dict[leg] + col_line = color_line_trace(color_dict, legend_name) + + # Prepare df with all trajectories in a model, separated by null rows (which break up trajectories into different lines) + temp = pd.DataFrame() + traj_list = list(df_plot['type_id'].unique()) + temp.loc[:, 'value'] = [np.nan] * len(traj_list) + temp.loc[:, 'type_id'] = traj_list + temp.loc[:, 'target_end_date'] = [pd.NaT] * len(traj_list) + all_traj_df = pd.concat([df_plot, temp], axis=0) + all_traj_df = all_traj_df.sort_values(['type_id', 'target_end_date']) + # Once Nan's are inserted between typeIDs, insert Nan in type ID col so hover text renders correctly + all_traj_df.loc[pd.isna(all_traj_df['value']), 'type_id'] = np.nan + band_depth_df = generate_band_depth_df(df_plot) + all_traj_df 
= all_traj_df.merge(band_depth_df, how='left', on='type_id') + + # Add single trace + connect_gaps = None + color = re.sub(r", 1\)", ", " + str(opacity) + ")", col_line[0]) + fig.add_trace(go.Scatter(x=all_traj_df['target_end_date'], + y=all_traj_df['value'], + name=legend_name, + mode='lines', + marker=dict(color=color, line_width=0.0001), + legendgroup=legend_name, + line=dict(width=2, dash=None), + visible=True, + showlegend=show_legend, + customdata=all_traj_df['type_id'], + text=all_traj_df['band_depth'], + hovertemplate=hover_text + f"Model: {legend_name}<br>" "Type ID: %{customdata}<br>" "Modified band depth: %{text:.2%}<br>" "Value: %{y:,.2f}<br>Epiweek: %{x|%Y-%m-%d}" + ), + row=subplot_coord[0], col=subplot_coord[1]) + if connect_gaps is not None: + fig.update_traces(connectgaps=connect_gaps) + if add_median is True and df_med is not None: + df_plot_med = df_med[df_med[legend_col] == leg] + add_scatter_trace(fig, df_plot_med, legend_name, x_col="target_end_date", + show_legend=False, + mode="lines", subplot_coord=subplot_coord, width=4, + hover_text=hover_text + spag_col.title() + ": Median<br>
", + color=col_line[0]) + + # Add shaded region for trajectories with top X% of band depths + band_depth_filtered = \ + band_depth_df.quantile(q=band_depth_limit, axis=0, interpolation='nearest').iloc[1] + df_top_x_pctile = all_traj_df.loc[all_traj_df['band_depth'] >= band_depth_filtered, :] + # shade region + min_top_x_envelope = df_top_x_pctile.groupby('target_end_date')['value'].agg( + 'min').reset_index() + max_top_x_envelope = df_top_x_pctile.groupby('target_end_date')['value'].agg( + 'max').reset_index() + + # Add trace for min + fig.add_trace(go.Scatter(x=min_top_x_envelope['target_end_date'], + y=min_top_x_envelope['value'], + name=legend_name, + mode='lines', + legendgroup=legend_name, + marker=dict(color=color, line_width=0.0001), + line=dict(width=2, dash=None), + visible=True, + showlegend=False, + ), + row=subplot_coord[0], col=subplot_coord[1]) + # Add trace for max + fig.add_trace(go.Scatter(x=max_top_x_envelope['target_end_date'], + y=max_top_x_envelope['value'], + name=legend_name, + mode='lines', + legendgroup=legend_name, + marker=dict(color=color, line_width=0.0001), + line=dict(width=2, dash=None), + visible=True, + fill='tonexty', + showlegend=False, + ), + row=subplot_coord[0], col=subplot_coord[1]) + if connect_gaps is not None: + fig.update_traces(connectgaps=connect_gaps) + + return fig + + +def make_spaghetti_plot(df, legend_col="model_name", spag_col="type_id", show_legend=True, + hover_text="", opacity=0.3, + subplot=False, title="", height=1000, subplot_col=None, subplot_titles=None, + palette="turbo", + share_x="all", share_y="all", x_title="", y_title="N", theme="plotly_white", + color_dict=None, + add_median=False, legend_dict=None, band_depth_limit=None): + """ + :param band_depth_limit: if not None, must be a float X between 0 and 1 where the plot will + show envelope around trajectories with band depth greater than X% + """ + # Colorscale if color_dict is None: color_dict = make_palette_sequential(df, legend_col, palette=palette) 
# Plot if subplot is True: sub_var = list(df[subplot_col].unique()) - fig = prep_subplot(sub_var, subplot_titles, x_title, y_title, sort=False, share_x=share_x, share_y=share_y) + fig = prep_subplot(sub_var, subplot_titles, x_title, y_title, sort=False, share_x=share_x, + share_y=share_y) for var in sub_var: df_var = df[df[subplot_col] == var].drop(subplot_col, axis=1) plot_coord = subplot_row_col(sub_var, var) @@ -1094,16 +1210,38 @@ def make_spaghetti_plot(df, legend_col="model_name", spag_col="type_id", show_le show_legend = show_legend else: show_legend = False - add_spaghetti_plot(fig, df_var, color_dict=color_dict, legend_col=legend_col, - spag_col=spag_col, show_legend=show_legend, hover_text=hover_text, - opacity=opacity, subplot_coord=plot_coord, add_median=add_median, - legend_dict=legend_dict) + if band_depth_limit and band_depth_limit >= 0 and band_depth_limit <= 1: + add_spaghetti_plot_envelope(fig, df_var, color_dict=color_dict, + legend_col=legend_col, + spag_col=spag_col, show_legend=show_legend, + hover_text=hover_text, + opacity=opacity, subplot_coord=plot_coord, + add_median=add_median, + legend_dict=legend_dict, + band_depth_limit=band_depth_limit) + + else: + add_spaghetti_plot(fig, df_var, color_dict=color_dict, legend_col=legend_col, + spag_col=spag_col, show_legend=show_legend, + hover_text=hover_text, + opacity=opacity, subplot_coord=plot_coord, add_median=add_median, + legend_dict=legend_dict) else: fig = go.Figure() fig.update_layout(xaxis_title=x_title, yaxis_title=y_title) - add_spaghetti_plot(fig, df, color_dict=color_dict, legend_col=legend_col, - spag_col=spag_col, show_legend=show_legend, hover_text=hover_text, - opacity=opacity, subplot_coord=None, add_median=add_median, legend_dict=legend_dict) + if band_depth_limit and band_depth_limit >= 0 and band_depth_limit <= 1: + add_spaghetti_plot_envelope(fig, df, color_dict=color_dict, legend_col=legend_col, + spag_col=spag_col, show_legend=show_legend, + hover_text=hover_text, + 
opacity=opacity, subplot_coord=None, + add_median=add_median, + legend_dict=legend_dict, band_depth_limit=band_depth_limit) + + else: + add_spaghetti_plot(fig, df, color_dict=color_dict, legend_col=legend_col, + spag_col=spag_col, show_legend=show_legend, hover_text=hover_text, + opacity=opacity, subplot_coord=None, add_median=add_median, + legend_dict=legend_dict) subplot_fig_output(fig, title, subtitle="", height=height, theme=theme) return fig diff --git a/SMHviz_plot/utils_data.py b/SMHviz_plot/utils_data.py index 8b4e51e..a82f855 100644 --- a/SMHviz_plot/utils_data.py +++ b/SMHviz_plot/utils_data.py @@ -402,3 +402,56 @@ def prep_multipat_plot_comb(pathogen_information, calc_mean=False): detail_quantile.columns = (detail_quantile.columns.get_level_values(0) + "-" + detail_quantile.columns.get_level_values(1)) return {"all": all_quantile, "detail": detail_quantile} + +def generate_bands_constraints_df(band_list, date_list, all_traj_df_filtered_to_scenario_model_age_group): + c_df = pd.DataFrame({'target_end_date': date_list}) + for b in band_list: + # b represents tuple of trajectories (type_ids) + # Filter to only those type IDs + b_df = all_traj_df_filtered_to_scenario_model_age_group.loc[all_traj_df_filtered_to_scenario_model_age_group['type_id'].isin(b), :] + # Groupby date and get min/max in the value col + b_df = b_df.groupby('target_end_date').agg(min=('value', 'min'), max=('value', 'max')).reset_index() + b_df = b_df.rename(columns={'min': f'min_{b}', 'max': f'max_{b}'}) + + # Add these columns to c_df + c_df = c_df.merge(b_df, how='left', on='target_end_date') + + return c_df + + +def generate_band_depth_df(df: pd.DataFrame, N=5, j=3) -> pd.DataFrame: + """ + :param df: dataframe for all trajectory data for a given round/target/location (given by file loaded) + scenario/model/age group (filtered in df) + :param N: number of bands to test for inclusion (for a given trajectory) + :param j: number of randomly sampled trajectories that form a band + 
:returns 2-col df of trajectories + band depths + """ + # Select bands to test for inclusion + traj_list = list(df['type_id'].unique()) + # As an additional quality check, would be good to remove trajectories missing any dates + # I.e. check how many times each trajectory appears. If less than unique num of dates, remove from list + + selected_bands = [] + for i in range(N): + band = np.random.choice(a=traj_list, size=j, replace=False) + selected_bands.append(band) + # Additional check: Check that functions from all chosen bands have at least 2 values for every date. + # If not, won't be able to get bounds of the band and must choose a different one + + # Get large dataframe with min and a max by epiweek for each band + dates = sorted(list(df['target_end_date'].unique())) + bands_constraints_df = generate_bands_constraints_df(selected_bands, dates, df) + + # Merge in constraints + df = df.merge(bands_constraints_df, how='left', on='target_end_date') + # Determine inclusion in band at each epiweek + for b in selected_bands: + df[f'in_band_{b}'] = df.apply(lambda x: (x['value'] >= x[f'min_{b}']) & (x['value'] <= x[f'max_{b}']), axis=1) + df = df.drop(columns=[f'min_{b}', f'max_{b}']) + df = df.drop(columns=['value', 'target_end_date']) + # Per trajectory, get band depth + df = df.groupby('type_id').apply(lambda x: x.sum()/len(x)).drop(columns=['type_id']) + df['band_depth'] = df.apply(lambda x: x.mean(), axis=1) + df = df.reset_index()[['type_id', 'band_depth']] + + return df \ No newline at end of file From 43db624b419c5977ac92a1d03b65fee773d0094b Mon Sep 17 00:00:00 2001 From: kleinjr1 Date: Fri, 13 Dec 2024 11:06:19 -0500 Subject: [PATCH 5/8] increase N; preprocessing will speed up from here --- SMHviz_plot/utils_data.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/SMHviz_plot/utils_data.py b/SMHviz_plot/utils_data.py index a82f855..1bcdb5b 100644 --- a/SMHviz_plot/utils_data.py +++ b/SMHviz_plot/utils_data.py @@ -419,7 +419,7 @@ def 
generate_bands_constraints_df(band_list, date_list, all_traj_df_filtered_to_ return c_df -def generate_band_depth_df(df: pd.DataFrame, N=5, j=3) -> pd.DataFrame: +def generate_band_depth_df(df: pd.DataFrame, N=50, j=3) -> pd.DataFrame: """ :param df: dataframe for all trajectory data for a given round/target/location (given by file loaded) + scenario/model/age group (filtered in df) :param N: number of bands to test for inclusion (for a given trajectory) From fe9dbd460f7a24468bfe8bc59103ad2f8cc97a10 Mon Sep 17 00:00:00 2001 From: kleinjr1 Date: Fri, 13 Dec 2024 11:17:37 -0500 Subject: [PATCH 6/8] change N to run script quicker --- SMHviz_plot/utils_data.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/SMHviz_plot/utils_data.py b/SMHviz_plot/utils_data.py index 1bcdb5b..719c33d 100644 --- a/SMHviz_plot/utils_data.py +++ b/SMHviz_plot/utils_data.py @@ -419,7 +419,7 @@ def generate_bands_constraints_df(band_list, date_list, all_traj_df_filtered_to_ return c_df -def generate_band_depth_df(df: pd.DataFrame, N=50, j=3) -> pd.DataFrame: +def generate_band_depth_df(df: pd.DataFrame, N=10, j=3) -> pd.DataFrame: """ :param df: dataframe for all trajectory data for a given round/target/location (given by file loaded) + scenario/model/age group (filtered in df) :param N: number of bands to test for inclusion (for a given trajectory) From 0b66361abe97df1171bd81319fdd5db7e2fb3850 Mon Sep 17 00:00:00 2001 From: kleinjr1 Date: Fri, 13 Dec 2024 13:04:07 -0500 Subject: [PATCH 7/8] add additional documentation --- SMHviz_plot/figures.py | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/SMHviz_plot/figures.py b/SMHviz_plot/figures.py index b57e150..5d84221 100644 --- a/SMHviz_plot/figures.py +++ b/SMHviz_plot/figures.py @@ -52,7 +52,7 @@ def add_scatter_trace(fig, data, legend_name, x_col="time_value", y_col="value", :parameter dash: Option to print the line is dash, options include 'dash', 'dot', and 'dashdot'. 
By default, "None", no dash. :type dash: str | None - :parameter custom_data: Add custom data + :parameter custom_data: Add custom data, which can be referenced in the hover text :type dash: str | None | pandas.DataFrame :return: a plotly.graph_objs.Figure object with an added trace """ @@ -1081,7 +1081,10 @@ def add_spaghetti_plot_envelope(fig, df, color_dict, band_depth_limit, legend_di hover_text="", opacity=0.3, subplot_coord=None, add_median=False, median=0.5): """ - :param band_depth_limit: Show envelope around trajectories with band depth greater than X% + :param band_depth_limit: if not None, must be a float X between 0 and 1 where the plot will + show envelope around trajectories with band depth greater than X%. + Band depth is a measure of the representativeness of one trajectory among an ensemble. + For more details, see https://ieeexplore.ieee.org/document/6875964 - Curve Boxplot: Generalization of Boxplot for Ensembles of Curves by Mirzargar et al. """ if add_median is True: @@ -1192,7 +1195,9 @@ def make_spaghetti_plot(df, legend_col="model_name", spag_col="type_id", show_le add_median=False, legend_dict=None, band_depth_limit=None): """ :param band_depth_limit: if not None, must be a float X between 0 and 1 where the plot will - show envelope around trajectories with band depth greater than X% + show envelope around trajectories with band depth greater than X%. + Band depth is a measure of the representativeness of one trajectory among an ensemble. + For more details, see https://ieeexplore.ieee.org/document/6875964 - Curve Boxplot: Generalization of Boxplot for Ensembles of Curves by Mirzargar et al. 
""" # Colorscale From 65d346a3d9c002ca526400efb469b795b93192b4 Mon Sep 17 00:00:00 2001 From: April Nellis Date: Wed, 18 Dec 2024 11:08:46 -0500 Subject: [PATCH 8/8] Cleaned import statements in figures.py --- SMHviz_plot/figures.py | 1 - 1 file changed, 1 deletion(-) diff --git a/SMHviz_plot/figures.py b/SMHviz_plot/figures.py index 5d84221..aa207f1 100644 --- a/SMHviz_plot/figures.py +++ b/SMHviz_plot/figures.py @@ -2,7 +2,6 @@ import pandas as pd from SMHviz_plot.utils import * -from utils_data import * def add_scatter_trace(fig, data, legend_name, x_col="time_value", y_col="value", width=2, connect_gaps=None,
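
Review note on the band depth computation added in PATCH 4/8: the per-row `DataFrame.apply` in `generate_band_depth_df` can be slow for large ensembles. The block below is a minimal sketch of the same idea (N random bands of j trajectories, then the share of dates at which each trajectory lies inside each band, averaged over bands) using a wide matrix instead. The function name `modified_band_depth` and the parameters `n_bands`, `band_size` and `seed` are hypothetical and not part of `SMHviz_plot`. The sketch also assumes every trajectory has exactly one value per date, a check the patch comments mention but do not implement.

```python
import numpy as np
import pandas as pd


def modified_band_depth(df, n_bands=10, band_size=3, seed=0):
    """Approximate modified band depth for each trajectory (illustrative only).

    Assumes `df` is long-format with columns type_id, target_end_date and
    value, with exactly one value per trajectory and date.
    """
    # One row per trajectory, one column per date.
    wide = df.pivot(index="type_id", columns="target_end_date", values="value")
    values = wide.to_numpy(dtype=float)
    rng = np.random.default_rng(seed)

    depth = np.zeros(len(wide))
    for _ in range(n_bands):
        # A band is the pointwise min/max envelope of a few random trajectories.
        members = rng.choice(len(wide), size=band_size, replace=False)
        lower = values[members].min(axis=0)
        upper = values[members].max(axis=0)
        # Share of dates at which each trajectory lies inside this band.
        inside = (values >= lower) & (values <= upper)
        depth += inside.mean(axis=1)

    return pd.DataFrame({"type_id": wide.index.to_numpy(),
                         "band_depth": depth / n_bands})
```

Passing a fixed `seed` makes the sampled bands reproducible between runs, which the `np.random.choice` call in the patch does not guarantee. The returned frame has the same two columns (`type_id`, `band_depth`) that `add_spaghetti_plot_envelope` merges on.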