This file contains instructions for creating the Use Case Validation results.
output_subdir: string, name of the file in ae-project/results/scans/use_case/best_initializations where the best initializations are stored.
The file ae-project/results/scans/use_case/best_initializations/[output_subdir].csv is generated by the use case scanner. For details, see Use Case Scanner Documentation.
On Engaging, run the following commands in an interactive session:
- Load Python 3.9: module load python/3.9.4
- Run: python -m hmmscan.scripts.validation.generate-samples --validation_type use_case --init_file_name scans/use_case/best_initializations/[output_subdir].csv --sequence_name [sequence_name] --ae_type [ae_type] --n_samples 100
This generates a file for each sequence_name, ae_type, and model structure in ae-project/validation/use_case/samples and ae-project/validation/use_case/state_paths.
Run the following commands, where commands 4-6 create results for Section S5 in Online Resource 1:
python -m hmmscan.scripts.validation.generate-samples --validation_type use_case --init_file_name scans/use_case/best_initializations/by_date_ex_iqr.csv --sequence_name dfa_by_date_ex_iqr_outliers --ae_type serious_std --n_samples 100
python -m hmmscan.scripts.validation.generate-samples --validation_type use_case --init_file_name scans/use_case/best_initializations/by_date_ex_iqr.csv --sequence_name dfb_by_date_ex_iqr_outliers --ae_type serious_std --n_samples 100
python -m hmmscan.scripts.validation.generate-samples --validation_type use_case --init_file_name scans/use_case/best_initializations/by_date_ex_iqr.csv --sequence_name dfc_by_date_ex_iqr_outliers --ae_type serious_std --n_samples 100
python -m hmmscan.scripts.validation.generate-samples --validation_type use_case --init_file_name scans/use_case/best_initializations/by_date.csv --sequence_name dfa_by_date --ae_type serious_std_iqr_outlier_ceiling --n_samples 100
python -m hmmscan.scripts.validation.generate-samples --validation_type use_case --init_file_name scans/use_case/best_initializations/by_date.csv --sequence_name dfb_by_date --ae_type serious_std_iqr_outlier_ceiling --n_samples 100
python -m hmmscan.scripts.validation.generate-samples --validation_type use_case --init_file_name scans/use_case/best_initializations/by_date.csv --sequence_name dfc_by_date --ae_type serious_std_iqr_outlier_ceiling --n_samples 100
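The six commands above differ only in the initialization file, sequence_name, and ae_type. As a minimal sketch (not part of the repository), they could be built from a small table instead of being typed by hand. The helper name generate_samples_cmd and the RUNS table are illustrative; the flags and values are copied from the commands above.

```python
import shlex

# (init file, sequence_name, ae_type) triples copied from the commands above.
RUNS = [
    ("by_date_ex_iqr.csv", "dfa_by_date_ex_iqr_outliers", "serious_std"),
    ("by_date_ex_iqr.csv", "dfb_by_date_ex_iqr_outliers", "serious_std"),
    ("by_date_ex_iqr.csv", "dfc_by_date_ex_iqr_outliers", "serious_std"),
    ("by_date.csv", "dfa_by_date", "serious_std_iqr_outlier_ceiling"),
    ("by_date.csv", "dfb_by_date", "serious_std_iqr_outlier_ceiling"),
    ("by_date.csv", "dfc_by_date", "serious_std_iqr_outlier_ceiling"),
]


def generate_samples_cmd(init_file, sequence_name, ae_type, n_samples=100):
    """Return the generate-samples invocation as an argument list (hypothetical helper)."""
    return [
        "python", "-m", "hmmscan.scripts.validation.generate-samples",
        "--validation_type", "use_case",
        "--init_file_name", f"scans/use_case/best_initializations/{init_file}",
        "--sequence_name", sequence_name,
        "--ae_type", ae_type,
        "--n_samples", str(n_samples),
    ]


commands = [generate_samples_cmd(*run) for run in RUNS]

# Print the commands for review; to execute them instead, pass each list to
# subprocess.run(cmd, check=True) from the top level of the repository.
for cmd in commands:
    print(shlex.join(cmd))
```

Printing first and executing second keeps the run auditable: the printed lines can be compared against the list above before anything is launched.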
Evaluate which model has the lowest BIC on each sample.
For this section, you will need to look at ae-project/results/scans/use_case/best_initializations/[output_subdir].csv and find the best number of states and mixture components for each sequence_name and ae_type combination.
The best structure is referred to below as best_n_states and best_n_mix_comps.
This best structure also defines a grid_index, where grid_index = 9 * (best_n_states - 1) + best_n_mix_comps - 1.
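The grid_index formula can be written as a small function. The sketch below also includes an inverse, which assumes that the number of mixture components runs from 1 to 9 for each number of states; that range is inferred from the factor 9 in the formula and is not stated in this document. Both function names are illustrative.

```python
def grid_index(best_n_states, best_n_mix_comps):
    """grid_index = 9 * (best_n_states - 1) + best_n_mix_comps - 1, as stated above."""
    return 9 * (best_n_states - 1) + best_n_mix_comps - 1


def structure_from_grid_index(index):
    """Inverse mapping; ASSUMES 1 <= n_mix_comps <= 9 (inferred from the factor 9)."""
    states_minus_one, mix_minus_one = divmod(index, 9)
    return states_minus_one + 1, mix_minus_one + 1


# Worked examples: 3 states with 2 mixture components gives 9 * 2 + 1 = 19,
# and 2 states with 3 mixture components gives 9 * 1 + 2 = 11.
print(grid_index(3, 2))               # 19
print(grid_index(2, 3))               # 11
print(structure_from_grid_index(20))  # (3, 3)
```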
From the top level of this directory on Engaging, run the following command for each combination of sequence_name and ae_type:
sbatch --array=0-99 hmmscan/cluster/evaluate-use-case-samples.sh scans/use_case/best_initializations/[output_subdir].csv [sequence_name] [ae_type] [best_n_states] [best_n_mix_comps] [grid_index]
This generates a file for each sequence name, ae type, candidate model, and sample id in ae-project/validation/use_case/fits and in ae-project/validation/use_case/state_paths.
Run these commands:
sbatch --array=0-99 hmmscan/cluster/evaluate-use-case-samples.sh scans/use_case/best_initializations/by_date_ex_iqr.csv dfa_by_date_ex_iqr_outliers serious_std 3 2 19
sbatch --array=0-99 hmmscan/cluster/evaluate-use-case-samples.sh scans/use_case/best_initializations/by_date_ex_iqr.csv dfb_by_date_ex_iqr_outliers serious_std 3 3 20
sbatch --array=0-99 hmmscan/cluster/evaluate-use-case-samples.sh scans/use_case/best_initializations/by_date_ex_iqr.csv dfc_by_date_ex_iqr_outliers serious_std 2 3 11
sbatch --array=0-99 hmmscan/cluster/evaluate-use-case-samples.sh scans/use_case/best_initializations/by_date.csv dfa_by_date serious_std_iqr_outlier_ceiling 3 2 19
sbatch --array=0-99 hmmscan/cluster/evaluate-use-case-samples.sh scans/use_case/best_initializations/by_date.csv dfb_by_date serious_std_iqr_outlier_ceiling 3 3 20
sbatch --array=0-99 hmmscan/cluster/evaluate-use-case-samples.sh scans/use_case/best_initializations/by_date.csv dfc_by_date serious_std_iqr_outlier_ceiling 2 3 11
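Because the last three arguments of each sbatch command are typed by hand, a mistyped grid_index is easy to miss. The following sketch (a hypothetical check, not part of the repository) parses the six command lines above and confirms that each trailing grid_index agrees with grid_index = 9 * (best_n_states - 1) + best_n_mix_comps - 1.

```python
# The six sbatch lines, copied verbatim from the list above.
SBATCH_LINES = """\
sbatch --array=0-99 hmmscan/cluster/evaluate-use-case-samples.sh scans/use_case/best_initializations/by_date_ex_iqr.csv dfa_by_date_ex_iqr_outliers serious_std 3 2 19
sbatch --array=0-99 hmmscan/cluster/evaluate-use-case-samples.sh scans/use_case/best_initializations/by_date_ex_iqr.csv dfb_by_date_ex_iqr_outliers serious_std 3 3 20
sbatch --array=0-99 hmmscan/cluster/evaluate-use-case-samples.sh scans/use_case/best_initializations/by_date_ex_iqr.csv dfc_by_date_ex_iqr_outliers serious_std 2 3 11
sbatch --array=0-99 hmmscan/cluster/evaluate-use-case-samples.sh scans/use_case/best_initializations/by_date.csv dfa_by_date serious_std_iqr_outlier_ceiling 3 2 19
sbatch --array=0-99 hmmscan/cluster/evaluate-use-case-samples.sh scans/use_case/best_initializations/by_date.csv dfb_by_date serious_std_iqr_outlier_ceiling 3 3 20
sbatch --array=0-99 hmmscan/cluster/evaluate-use-case-samples.sh scans/use_case/best_initializations/by_date.csv dfc_by_date serious_std_iqr_outlier_ceiling 2 3 11
""".splitlines()


def grid_index_is_consistent(line):
    """True if the final argument equals the grid_index implied by the two before it."""
    *_, n_states, n_mix_comps, grid_index = line.split()
    expected = 9 * (int(n_states) - 1) + int(n_mix_comps) - 1
    return expected == int(grid_index)


mismatches = [line for line in SBATCH_LINES if not grid_index_is_consistent(line)]
print(f"{len(SBATCH_LINES)} commands checked, {len(mismatches)} mismatches")
```

The same check can be applied to any new combination before it is submitted to the scheduler.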
This script aggregates the outputs in ae-project/validation/use_case/fits and ae-project/validation/use_case/state_paths.
On Engaging, run the following commands in an interactive session:
- Load Python 3.9: module load python/3.9.4
- Run: python -m hmmscan.scripts.validation.aggregate_use_case_fits
This script creates the CSV file ae-project/validation/use_case/fits/all_fits.csv, which aggregates the individual files from step 2.
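To illustrate what this kind of aggregation involves, here is a generic sketch that concatenates every per-sample CSV in a directory into one file. It is not the implementation of aggregate_use_case_fits, whose internals are not described here. The file names fit_0.csv, fit_1.csv, ... and the columns sample_id and bic are hypothetical and exist only so the demo can run in a temporary directory.

```python
import csv
import tempfile
from pathlib import Path


def aggregate_csvs(fits_dir, out_name="all_fits.csv"):
    """Concatenate all CSV files in fits_dir (except the output itself) into out_name."""
    fits_dir = Path(fits_dir)
    out_path = fits_dir / out_name
    inputs = sorted(p for p in fits_dir.glob("*.csv") if p.name != out_name)

    rows, fieldnames = [], []
    for path in inputs:
        with path.open(newline="") as f:
            reader = csv.DictReader(f)
            # Keep the union of column names, in first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)

    with out_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_path, len(rows)


# Demo on throwaway files; the column names below are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    for sample_id in range(3):
        (Path(tmp) / f"fit_{sample_id}.csv").write_text(
            f"sample_id,bic\n{sample_id},{100.0 + sample_id}\n"
        )
    out_path, n_rows = aggregate_csvs(tmp)
    header = out_path.read_text().splitlines()[0]
    print(n_rows, header)
```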
Fit HMMs to the use case samples.
This section uses best_n_states and best_n_mix_comps from step 2.
On Engaging, run the following commands in an interactive session:
- Load Python 3.9: module load python/3.9.4
- Run: python -m hmmscan.scripts.validation.evaluate-samples --validation_type ci --init_file_path scans/use_case/best_initializations/[output_subdir].csv --generating_sequence_name [sequence_name] --generating_ae_type [ae_type] --generating_n_states [best_n_states] --generating_n_mix_comps [best_n_mix_comps] --sample_index_min 0 --sample_index_max 99
Run these commands:
python -m hmmscan.scripts.validation.evaluate-samples --validation_type ci --init_file_path scans/use_case/best_initializations/by_date_ex_iqr.csv --generating_sequence_name dfa_by_date_ex_iqr_outliers --generating_ae_type serious_std --generating_n_states 3 --generating_n_mix_comps 2 --sample_index_min 0 --sample_index_max 99
python -m hmmscan.scripts.validation.evaluate-samples --validation_type ci --init_file_path scans/use_case/best_initializations/by_date_ex_iqr.csv --generating_sequence_name dfb_by_date_ex_iqr_outliers --generating_ae_type serious_std --generating_n_states 3 --generating_n_mix_comps 3 --sample_index_min 0 --sample_index_max 99
python -m hmmscan.scripts.validation.evaluate-samples --validation_type ci --init_file_path scans/use_case/best_initializations/by_date_ex_iqr.csv --generating_sequence_name dfc_by_date_ex_iqr_outliers --generating_ae_type serious_std --generating_n_states 2 --generating_n_mix_comps 3 --sample_index_min 0 --sample_index_max 99
python -m hmmscan.scripts.validation.evaluate-samples --validation_type ci --init_file_path scans/use_case/best_initializations/by_date.csv --generating_sequence_name dfa_by_date --generating_ae_type serious_std_iqr_outlier_ceiling --generating_n_states 3 --generating_n_mix_comps 2 --sample_index_min 0 --sample_index_max 99
python -m hmmscan.scripts.validation.evaluate-samples --validation_type ci --init_file_path scans/use_case/best_initializations/by_date.csv --generating_sequence_name dfb_by_date --generating_ae_type serious_std_iqr_outlier_ceiling --generating_n_states 3 --generating_n_mix_comps 3 --sample_index_min 0 --sample_index_max 99
python -m hmmscan.scripts.validation.evaluate-samples --validation_type ci --init_file_path scans/use_case/best_initializations/by_date.csv --generating_sequence_name dfc_by_date --generating_ae_type serious_std_iqr_outlier_ceiling --generating_n_states 2 --generating_n_mix_comps 3 --sample_index_min 0 --sample_index_max 99
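Each command above covers samples 0 through 99 in a single run. If a run needs to be split, for example across several interactive sessions, the index range can be divided into contiguous chunks. The sketch below rests on two assumptions that this document does not confirm: that --sample_index_min and --sample_index_max are inclusive bounds (suggested by 0 and 99 covering 100 samples), and that runs over disjoint ranges do not interfere with each other. The function name is illustrative.

```python
def sample_index_chunks(n_samples=100, n_chunks=4):
    """Split indices 0..n_samples-1 into contiguous (min, max) pairs, bounds inclusive."""
    base, extra = divmod(n_samples, n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first chunks
        if size == 0:
            continue
        chunks.append((start, start + size - 1))
        start += size
    return chunks


chunks = sample_index_chunks(100, 4)
for lo, hi in chunks:
    # Append these flags to one of the evaluate-samples commands above
    # in place of "--sample_index_min 0 --sample_index_max 99".
    print(f"--sample_index_min {lo} --sample_index_max {hi}")
```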
It is probably easiest to generate the necessary plots on a local laptop using RStudio, rather than on Engaging. To do so, copy ae-project/validation/use_case/fits/all_fits.csv to your local version of ae-project.
Then, you can use the script use_case_validation.R in the hmmscan/scripts/viz directory to look at the model validation results.
For the confidence intervals, you can use the script ci.R.