Validation
The names and number of the columns correspond to the expected format:
origin_date | scenario_id | target | horizon | location | age_group | output_type | output_type_id | run_grouping | stochastic_run | value |
---|---|---|---|---|---|---|---|---|---|---|
The order of the columns is not important, but the file should contain the expected number of columns, with each name correctly spelled.
Each column should be in the expected format (columns of type "factor" are not accepted).
*Remarks:* If one column is missing, the submission test stops immediately and returns an error message without running the other tests.
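The column checks described above can be approximated locally with a few lines of base R. This is only a sketch: `check_columns()` and the toy data frame are hypothetical illustrations, not part of the SMHvalidation package, and the real validation may apply further rules.

```r
# Expected column names, copied from the table above
expected_cols <- c("origin_date", "scenario_id", "target", "horizon",
                   "location", "age_group", "output_type", "output_type_id",
                   "run_grouping", "stochastic_run", "value")

# Hypothetical helper (not part of SMHvalidation): lists the column problems
check_columns <- function(df, expected = expected_cols) {
  list(
    missing = setdiff(expected, names(df)),        # expected but absent
    unexpected = setdiff(names(df), expected),     # present but not expected
    factor_cols = names(df)[vapply(df, is.factor, logical(1))]
  )
}

# Toy one-row submission: "location" is misspelled and "target" is a factor
toy <- data.frame(
  origin_date = "2023-11-12", scenario_id = "A", target = factor("inc hosp"),
  horizon = 1L, locaton = "US", age_group = "0-130", output_type = "sample",
  output_type_id = NA, run_grouping = 1L, stochastic_run = 1L, value = 10,
  stringsAsFactors = FALSE
)

res <- check_columns(toy)
print(res)
```

An empty `missing`, `unexpected`, and `factor_cols` would indicate that the file passes this simplified version of the column test.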
- The scenario IDs correspond to the expected IDs of the expected round, without any typos.
The `origin_date` is the start date for scenarios (first date of simulated transmission/outcomes).
- The `origin_date` column contains:
  - one unique date value in the `YYYY-MM-DD` format (character or date format accepted; datetime will return a warning).
  - a date matching the date in the name of the file.
  - a date matching the projection starting date.
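A rough local version of the `origin_date` checks could look like the sketch below. The helper `check_origin_date()` is hypothetical, and the file name used in the example (date followed by a team and model name) is an assumed naming pattern for illustration only.

```r
# Hypothetical helper (not part of SMHvalidation)
check_origin_date <- function(df, file_name) {
  dates <- unique(as.character(df$origin_date))
  one_value <- length(dates) == 1
  # Must look like YYYY-MM-DD and also parse as a real calendar date
  iso_format <- all(grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", dates)) &&
    all(!is.na(as.Date(dates, format = "%Y-%m-%d")))
  # The single date must appear in the file name
  matches_file <- one_value &&
    grepl(dates[1], basename(file_name), fixed = TRUE)
  list(one_value = one_value, iso_format = iso_format,
       matches_file = matches_file)
}

# Toy example with an assumed file name
ok <- check_origin_date(
  data.frame(origin_date = c("2023-11-12", "2023-11-12")),
  "2023-11-12-team-model.parquet"
)
bad <- check_origin_date(
  data.frame(origin_date = "11/12/2023"),
  "2023-11-12-team-model.parquet"
)
print(ok)
print(bad)
```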
- The `output_type_id` column should only contain `NA`.
- The columns `run_grouping` and `stochastic_run` must only contain integer values.
- The submission file must contain the expected number of repetitions (number of samples or trajectories) for each scenario/target/location/horizon/(age_group) group.
- The submission should at least contain a unique sample identifier by `"horizon"` and `"age_group"` group. This means that, in a submission, each unique sample identifier (calculated by concatenating the `run_grouping` and `stochastic_run` columns) should cover all the possible `horizon` values and `age_group` values at least once, and can optionally span specific or multiple values of the other task ID columns (`origin_date`, `scenario`, `location`, `target`, `horizon`, (`age_group`), etc.).
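To make the sample identifier rule concrete, the sketch below builds the identifier by concatenating `run_grouping` and `stochastic_run`, then lists the identifiers that do not cover every horizon and age group present in the file. The helper `check_sample_ids()`, the `_` separator, and the toy data are assumptions for illustration; the package may build and check the identifier differently.

```r
# Hypothetical helper (not part of SMHvalidation): returns incomplete sample IDs
check_sample_ids <- function(df) {
  # Separator avoids ambiguity such as 1/12 versus 11/2
  sample_id <- paste(df$run_grouping, df$stochastic_run, sep = "_")
  all_horizons <- unique(df$horizon)
  all_ages <- unique(df$age_group)
  complete <- vapply(split(df, sample_id), function(g) {
    all(all_horizons %in% g$horizon) && all(all_ages %in% g$age_group)
  }, logical(1))
  names(complete)[!complete]
}

# Toy data: 4 sample IDs, 3 horizons, 2 age groups
toy <- expand.grid(run_grouping = 1:2, stochastic_run = 1:2, horizon = 1:3,
                   age_group = c("0-4", "5-130"), stringsAsFactors = FALSE)

# Same data with horizon 3 removed for sample ID "1_1"
incomplete <- toy[!(toy$run_grouping == 1 & toy$stochastic_run == 1 &
                      toy$horizon == 3), ]

print(check_sample_ids(toy))
print(check_sample_ids(incomplete))
```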
If the submission contains "cdf" values only:
- The `output_type_id` column contains the expected Epiweek values, noted in the `EWYYYYWW` format.
- The submission should contain a unique sample identifier for each scenario/target/location/(age_group) combination.
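The `EWYYYYWW` format can be checked with a simple pattern match. The sketch below is a hypothetical helper, not the package's implementation; it only checks the shape of the string and that the week number falls between 1 and 53, not whether the week belongs to the expected projection period.

```r
# Hypothetical helper (not part of SMHvalidation)
is_valid_epiweek <- function(x) {
  ok_format <- grepl("^EW[0-9]{6}$", x)
  # Characters 7 and 8 hold the two-digit week number
  week <- suppressWarnings(as.integer(substr(x, 7, 8)))
  ok_format & !is.na(week) & week >= 1 & week <= 53
}

epiweek_check <- is_valid_epiweek(c("EW202345", "EW202300", "2023-45"))
print(epiweek_check)
```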
If the submission contains quantile values only:
- The submission file should contain quantiles matching the expected quantile values:
0.010 0.025 0.050 0.100 0.150 0.200 0.250 0.300 0.350 0.400 0.450 0.500
0.550 0.600 0.650 0.700 0.750 0.800 0.850 0.900 0.950 0.975 0.990
- Two additional optional quantiles have been added to the list: `0` and `1`. These 2 quantiles are not required.
- For each target/scenario/location/age_group group, the value increases with the quantiles. For example, for the 1st week ahead of target X for the location Y and for the scenario A, if quantile `0.01` = "5", then quantile `0.5` should be equal to or greater than "5".
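The monotonicity rule for quantiles can be illustrated with the sketch below, which sorts each group by quantile level and checks that the values never decrease. `check_quantile_monotonic()` is a hypothetical helper, and grouping by horizon in addition to scenario/target/location/age_group is an assumption based on the "1st week ahead" example.

```r
# Hypothetical helper (not part of SMHvalidation): returns the failing groups
check_quantile_monotonic <- function(df) {
  group <- paste(df$scenario_id, df$target, df$location, df$horizon,
                 df$age_group, sep = "|")
  ok <- vapply(split(df, group), function(g) {
    g <- g[order(as.numeric(g$output_type_id)), ]  # sort by quantile level
    all(diff(g$value) >= 0)                        # values must not decrease
  }, logical(1))
  names(ok)[!ok]
}

# Toy data: the value at quantile 0.99 is lower than at quantile 0.5
toy_bad <- data.frame(
  scenario_id = "A", target = "inc hosp", location = "US", horizon = 1,
  age_group = "0-130", output_type_id = c(0.01, 0.5, 0.99),
  value = c(5, 7, 6), stringsAsFactors = FALSE
)
toy_good <- toy_bad
toy_good$value <- c(5, 5, 9)

print(check_quantile_monotonic(toy_bad))
print(check_quantile_monotonic(toy_good))
```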
- Each type/type_id/target/scenario/location/(age_group, etc.) group combination has one unique projected value. For example: only 1 value for sample `1`, location `US`, target `inc hosp`, horizon `1`, age group `0-130`, and scenario `A`.
- The projection contains only values greater than or equal to 0.
- For each target name/scenario/location/age_group group (except locations `66` (Guam), `69` (Northern Mariana Islands), `60` (American Samoa), `74` (U.S. Minor Outlying Islands)), the whole projection does not contain only 1 unique value. For example, the projection for the incidence cases for one location and for one scenario does not contain only one unique value for the whole time series. **As there is a possibility that 0 deaths or cases might be projected, the submission will still be accepted if the test fails, but it will return a warning message asking to verify the projection.**
- Each projected value cannot be greater than the population size of the corresponding geographical entity. As an individual can be reinfected, the submission will still be accepted if the test fails, but it will return a warning message asking to verify the projection.
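The value checks above (no negative values, no value above the population size) can be sketched as follows. `check_values()` is a hypothetical helper; the `population` column name and all numbers in the toy tables are assumptions made for the example, not real population data.

```r
# Hypothetical helper (not part of SMHvalidation)
check_values <- function(df, pop) {
  # Look up the population of each row's location (NA if unknown)
  pop_size <- pop$population[match(df$location, pop$location)]
  list(
    n_negative = sum(df$value < 0, na.rm = TRUE),
    n_above_population = sum(!is.na(pop_size) & df$value > pop_size,
                             na.rm = TRUE)
  )
}

# Toy population table; "population" is an assumed column name
pop <- data.frame(location = c("US", "06"), population = c(1000, 100),
                  stringsAsFactors = FALSE)

# Toy projection: one negative value, one value above the toy population
toy <- data.frame(location = c("US", "06", "06"), value = c(10, -1, 150),
                  stringsAsFactors = FALSE)

value_check <- check_values(toy, pop)
print(value_check)
```

In the real validation, a value above the population size only produces a warning, so a non-zero `n_above_population` here would call for a manual look rather than a rejection.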
- The targets correspond to the target names as expressed in the SMH GitHub README and wiki files: `"inc hosp"`, `"cum hosp"`, `"peak time hosp"`, `"peak size hosp"`.
- The submission file contains projections for all the required targets. The submission file will be accepted if some targets are missing, but will return a warning message, and the submission might not be included in the Ensembles.
- The submission file contains projections for the expected number of weeks. If the file contains more projected weeks than expected, the submission will still be accepted, but will return a warning message, and the additional weeks will not be included in the visualization on the SMH website. If the file contains fewer projected weeks than expected, the submission might still be accepted, but will return an error message and might not be included in the Ensembles.
Round | Minimal number of weeks | Maximal number of weeks |
---|---|---|
1 | 29 | 29 |
- The submission should contain projections by location. The `location` column contains the location as a FIPS number, as available in the location table in the SMH GitHub Repository. If the FIPS numbers are missing a leading zero, the submission will be accepted but a warning message will be returned.
- The submission contains only the expected locations, here the locations contained in the RSV-NET target data.
*Remarks:* If a submission file contains only state-level projections (one or multiple), the `location` column might be automatically identified as numeric even if it was submitted in a character format. In this case, a warning message will be automatically printed by the validation, but please feel free to ignore it.
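When a state-level `location` column has been read as numeric, a FIPS code can lose its zero (for example `6` instead of `06`). The sketch below restores two-character state codes before writing the file. `pad_fips()` is a hypothetical helper that only handles single-digit state codes; it leaves other values such as `US` untouched and does not address county-level codes.

```r
# Hypothetical helper (not part of SMHvalidation)
pad_fips <- function(location) {
  location <- as.character(location)
  needs_pad <- grepl("^[0-9]$", location)   # single digit, e.g. "6"
  location[needs_pad] <- paste0("0", location[needs_pad])
  location
}

padded_numeric <- pad_fips(c(6, 36))
padded_mixed <- pad_fips(c("6", "US"))
print(padded_numeric)
print(padded_mixed)
```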
- The submission should contain a column `age_group` with values defined as `<AGEMIN>-<AGEMAX>`; `<AGEMIN>` cannot be equal to or greater than `<AGEMAX>`.
- For targets requiring only specific age group(s), no additional age group is provided in the submission file. If additional age groups are provided, a warning will be returned and the additional information might not be integrated in the analysis and visualization.
*Remarks:* These tests are only run if the submission contains an `age_group` column. If an `age_group` value is not in the expected format (`<AGEMIN>-<AGEMAX>`), some tests are skipped (802, 303, 805).
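The `<AGEMIN>-<AGEMAX>` format can be checked with a regular expression. The sketch below is a hypothetical helper, assuming the rule is that the minimum age must be strictly lower than the maximum age; the package may apply additional constraints (for example on the allowed age bounds).

```r
# Hypothetical helper (not part of SMHvalidation)
is_valid_age_group <- function(x) {
  # Each match holds: full string, AGEMIN, AGEMAX (empty if no match)
  parts <- regmatches(x, regexec("^([0-9]+)-([0-9]+)$", x))
  vapply(unname(parts), function(p) {
    length(p) == 3 && as.numeric(p[2]) < as.numeric(p[3])
  }, logical(1), USE.NAMES = FALSE)
}

age_check <- is_valid_age_group(c("0-130", "5-5", "65+"))
print(age_check)
```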
Each submission will be validated using the `validate_submission()` function from the SMHvalidation R package.
The package is currently only available on GitHub. To install it, please follow these steps:
install.packages("remotes")
remotes::install_github("midas-network/SMHvalidation",
build_vignettes = TRUE,
ref = "main")
or it can be manually installed by directly cloning/forking/downloading the package from GitHub.
To load the package, execute the following command:
library(SMHvalidation)
The package contains a `validate_submission()` function allowing users to check their SMH submissions locally.
To validate round 1, please use the SMHvalidation package version 0.0.22 (latest version) to have all the tests. Version 0.0.21 will also work, but does not include the tests on column and `origin_date` format.
To test a submission file, the function requires multiple parameters:
- `path`: path to the submission file(s). The SMHvalidation package contains multiple example files that can be used to test the function. Please refer to the package documentation for more information.
- `js_def`: path to a JSON file containing the round-specific and scenario information, following the Consortium of Infectious Disease Modeling Hubs standard.
- `lst_gs`: this parameter can be set to `NULL` if no COVID-19 observed data comparison is required.
- `pop_path`: path to a table containing the population size of each geographical entity by FIPS (in a column "location") and by location name.
- `merge_sample_col`: Boolean indicating whether, for the output type `"sample"`, the `output_type_id` column is set to `NA` and the sample identifier information is contained in 2 columns: `"run_grouping"` and `"stochastic_run"`.
Run without testing against observed data:
js_def <- "https://raw.githubusercontent.com/midas-network/rsv-scenario-modeling-hub/main/hub-config/tasks.json"
pop_path <- "https://raw.githubusercontent.com/midas-network/rsv-scenario-modeling-hub/main/auxiliary-data/locations.csv"
lst_gs <- NULL
validate_submission("PATH/TO/SUBMISSION", js_def, lst_gs, pop_path, merge_sample_col = TRUE)
The SMHvalidation R package contains plotting functionality to output a plot of each location and target, with all scenarios, for the `quantile` output type only.
To run this visualization locally:
generate_validation_plots(path_proj = "PATH/TO/SUBMISSION", lst_gs=NULL , save_path=getwd(), y_sqrt = FALSE, plot_quantiles = c(0.025, 0.975))
If you have any questions or any issues, please feel free to contact us: