Error flagged with as_epidist for epireview object #306

Closed
3 of 4 tasks
adamkucharski opened this issue May 16, 2024 · 5 comments

@adamkucharski
Member

Please place an "x" in all the boxes that apply

  • I have the most recent version of this package and R
  • I have found a bug
  • I have a reproducible example
  • I want to request a new feature

The call below to as_epidist() with a gamma-distributed Lassa parameter returns the following error:
Error in is_epidist_params(prob_dist, prob_dist_params) : Assertion on 'prob_dist_params' failed: Must have unique names, but element 3 is duplicated.

lassa_data <- epireview::load_epidata("lassa")
lassa_params <- lassa_data$params
pick_gamma <- lassa_params[261,]
param1 <- as_epidist(pick_gamma)

@adamkucharski
Member Author

Also found an error with Ebola parameter with exponential distribution:

# Load Ebola data
data_in <- load_epidata(pathogen = "ebola")

# Extract human delay parameters for the infectious period
params_in <- data_in[["params"]] 
params_infectious <- params_in |> dplyr::filter(parameter_type=="Human delay - infectious period") 

# Select distribution
params_inf_with_dist <- params_infectious |> dplyr::filter(!is.na(distribution_type))

# Start by extracting only those with Exponential
as_epidist(params_inf_with_dist)

@jfunction

Looks like when

epidist <- epireview_to_epidist(x, ...)
gets called, eventually new_epidist gets called and
mean = summary_stats$mean
makes a new vector with 3 elements. I.e., a vector is created in which mean appears twice.
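
The diagnosis above can be illustrated with a minimal base R sketch. The values and the names `params`, `summary_stats`, `combined` and `deduped` are hypothetical and chosen only for illustration; this is not the package's internal code, nor the fix implemented in #334. It only shows how appending a second `mean` to a named vector produces the duplicated third element that the assertion reports.

```r
# Hypothetical parameter vector that already contains a mean (illustrative values)
params <- c(mean = 12.8, sd = 4.6)
summary_stats <- list(mean = 12.8)

# Appending the summary-statistic mean gives a 3-element vector with two "mean" entries
combined <- c(params, mean = summary_stats$mean)
names(combined)
#> [1] "mean" "sd"   "mean"

# anyDuplicated() returns the index of the first duplicated name
anyDuplicated(names(combined))
#> [1] 3

# One possible guard (an assumption, not the actual fix): keep the first occurrence of each name
deduped <- combined[!duplicated(names(combined))]
names(deduped)
#> [1] "mean" "sd"
```

Index 3 matches the "element 3 is duplicated" wording in the reported error.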

@joshwlambert
Member

Thanks for logging this issue. I've implemented a fix to these issues in #334.

Here are reproducible examples of the same code as above using the updated {epiparameter} functions.

Lassa example

library(epireview)
#> Loading required package: epitrix
#> Loading required package: ggplot2
#> Loading required package: ggforce
library(epiparameter)
lassa_data <- epireview::load_epidata("lassa")
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> Data loaded for lassa
lassa_params <- lassa_data$params
pick_gamma <- lassa_params[261,]
param1 <- as_epidist(pick_gamma)
#> Using Akhmetzhanov (2019). "<title not available>." _<journal not
#> available>_. 
#> To retrieve the citation use the 'get_citation' function
#> Warning: Cannot create full citation for epidemiological parameters without bibliographic information 
#>  see ?as_epidist for help.
param1
#> Disease: Lassa fever
#> Pathogen: Lassa mammarenavirus
#> Epi Distribution: human delay   incubation period
#> Study: Akhmetzhanov (2019). "<title not available>." _<journal not
#> available>_.
#> Distribution: gamma
#> Parameters:
#>   shape: 7.743
#>   scale: 1.653
unclass(param1)
#> $disease
#> [1] "Lassa fever"
#> 
#> $pathogen
#> [1] "Lassa mammarenavirus"
#> 
#> $epi_dist
#> [1] "Human delay - incubation period"
#> 
#> $prob_dist
#> <distribution[1]>
#> [1] Γ(7.7, 0.6)
#> 
#> $uncertainty
#> $uncertainty$shape
#> $uncertainty$shape$ci_limits
#> [1] NA
#> 
#> $uncertainty$shape$ci
#> [1] NA NA
#> 
#> $uncertainty$shape$ci_type
#> [1] NA
#> 
#> 
#> $uncertainty$scale
#> $uncertainty$scale$ci_limits
#> [1] NA
#> 
#> $uncertainty$scale$ci
#> [1] NA NA
#> 
#> $uncertainty$scale$ci_type
#> [1] NA
#> 
#> 
#> 
#> $summary_stats
#> $summary_stats$mean
#> [1] 12.8
#> 
#> $summary_stats$mean_ci_limits
#> [1] NA NA
#> 
#> $summary_stats$mean_ci
#> [1] NA
#> 
#> $summary_stats$sd
#> [1] 4.6
#> 
#> $summary_stats$sd_ci_limits
#> [1] NA NA
#> 
#> $summary_stats$sd_ci
#> [1] NA
#> 
#> $summary_stats$median
#> [1] NA
#> 
#> $summary_stats$median_ci_limits
#> [1] NA NA
#> 
#> $summary_stats$median_ci
#> [1] NA
#> 
#> $summary_stats$quantiles
#> [1] NA
#> 
#> $summary_stats$range
#> [1] NA NA
#> 
#> 
#> $citation
#> Akhmetzhanov (2019). "<title not available>." _<journal not
#> available>_.
#> 
#> $metadata
#> $metadata$sample_size
#> [1] NA
#> 
#> $metadata$region
#> [1] "Nigeria"
#> 
#> $metadata$transmission_mode
#> [1] NA
#> 
#> $metadata$vector
#> [1] NA
#> 
#> $metadata$extrinsic
#> [1] FALSE
#> 
#> $metadata$inference_method
#> [1] NA
#> 
#> 
#> $method_assess
#> $method_assess$censored
#> [1] NA
#> 
#> $method_assess$right_truncated
#> [1] NA
#> 
#> $method_assess$phase_bias_adjusted
#> [1] NA
#> 
#> 
#> $notes
#> [1] "No additional notes"
#> 
#> attr(,".epiparameter_namespace")
#> function () 
#> NULL
#> <bytecode: 0x114b79658>
#> <environment: namespace:epiparameter>
as.data.frame(lassa_params[261, ])
#>                                 id                parameter_data_id
#> 1 c6cead4f4e9802343ccbb00449f22471 e93361e40a2f1a00c337a420be54deb2
#>   covidence_id             pathogen                  parameter_type
#> 1         2617 Lassa mammarenavirus Human delay - incubation period
#>   parameter_value exponent parameter_unit parameter_lower_bound
#> 1            12.8        0           Days                    NA
#>   parameter_upper_bound parameter_value_type parameter_uncertainty_single_value
#> 1                    NA                 Mean                                 NA
#>   parameter_uncertainty_singe_type parameter_uncertainty_lower_value
#> 1                             <NA>                                NA
#>   parameter_uncertainty_upper_value parameter_uncertainty_type
#> 1                                NA                       <NA>
#>   cfr_ifr_numerator cfr_ifr_denominator distribution_type
#> 1                NA                  NA             Gamma
#>   distribution_par1_value distribution_par1_type distribution_par1_uncertainty
#> 1                    12.8                   Mean                          TRUE
#>   distribution_par2_value distribution_par2_type distribution_par2_uncertainty
#> 1                     4.6     Standard deviation                          TRUE
#>   method_from_supplement method_moment_value cfr_ifr_method method_r
#> 1                  FALSE             Endemic           <NA>     <NA>
#>   method_disaggregated_by method_disaggregated method_disaggregated_only
#> 1                    <NA>                FALSE                     FALSE
#>   riskfactor_outcome riskfactor_name riskfactor_occupation
#> 1               <NA>            <NA>                  <NA>
#>   riskfactor_significant riskfactor_adjusted population_sex
#> 1                   <NA>                <NA>    Unspecified
#>   population_sample_type            population_group population_age_min
#> 1         Hospital based Persons under investigation                 NA
#>   population_age_max population_sample_size population_country
#> 1                 NA                     NA            Nigeria
#>   population_location population_study_start_day population_study_start_month
#> 1                <NA>                         NA                         <NA>
#>   population_study_start_year population_study_end_day
#> 1                        2016                       NA
#>   population_study_end_month population_study_end_year genome_site
#> 1                       <NA>                      2018        <NA>
#>   genomic_sequence_available other_delay_start other_delay_end inverse_param
#> 1                      FALSE              <NA>            <NA>         FALSE
#>   parameter_from_figure r_pathway parameter_class parameter_type_short
#> 1                 FALSE      <NA>     Human delay                 <NA>
#>   first_author_surname year_publication     article_label
#> 1         Akhmetzhanov             2019 Akhmetzhanov 2019

Created on 2024-06-14 with reprex v2.1.0
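
As a cross-check on the Lassa output above, the printed shape (7.743) and scale (1.653) are consistent with a method-of-moments conversion from the reported mean (12.8) and standard deviation (4.6). The sketch below uses base R only; whether the package performs the conversion in exactly this way is an assumption, and the variable names are illustrative.

```r
# Reported summary statistics for the Lassa incubation period (from the row above)
mean_days <- 12.8
sd_days <- 4.6

# Method-of-moments conversion for a gamma distribution:
#   mean = shape * scale, variance = shape * scale^2
shape <- mean_days^2 / sd_days^2
scale <- sd_days^2 / mean_days
rate <- 1 / scale

round(c(shape = shape, scale = scale), 3)
#> shape scale 
#> 7.743 1.653

# The rate corresponds to the second argument shown in the printed prob_dist, Γ(7.7, 0.6)
round(rate, 1)
#> [1] 0.6
```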

Ebola example

library(epireview)
#> Loading required package: epitrix
#> Loading required package: ggplot2
#> Loading required package: ggforce
library(epiparameter)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
# Load Ebola data
data_in <- load_epidata(pathogen = "ebola")
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> Warning in load_epidata_raw(pathogen, "outbreak"): No data found for ebola
#> Warning: One or more parsing issues, call `problems()` on your data frame for details,
#> e.g.:
#>   dat <- vroom(...)
#>   problems(dat)
#> Warning in load_epidata(pathogen = "ebola"): No outbreaks information found for
#> ebola
#> Data loaded for ebola
# Extract human delay parameters for the infectious period
params_in <- data_in[["params"]] 
params_infectious <- params_in |> dplyr::filter(parameter_type=="Human delay - infectious period") 

# Select distribution
params_inf_with_dist <- params_infectious |> dplyr::filter(!is.na(distribution_type))

# Start by extracting only those with Exponential
infectious <- as_epidist(params_inf_with_dist)
#> Using Lau (2017). "<title not available>." _<journal not available>_. 
#> To retrieve the citation use the 'get_citation' function
#> Warning: Cannot create full citation for epidemiological parameters without bibliographic information 
#>  see ?as_epidist for help.
unclass(infectious)
#> $disease
#> [1] "Ebola Virus Disease"
#> 
#> $pathogen
#> [1] "Ebola virus"
#> 
#> $epi_dist
#> [1] "Human delay - infectious period"
#> 
#> $prob_dist
#> <distribution[1]>
#> [1] Exp(0.25)
#> 
#> $uncertainty
#> $uncertainty$mean
#> $uncertainty$mean$ci_limits
#> [1] NA
#> 
#> $uncertainty$mean$ci
#> [1] NA NA
#> 
#> $uncertainty$mean$ci_type
#> [1] NA
#> 
#> 
#> 
#> $summary_stats
#> $summary_stats$mean
#> [1] 4.05
#> 
#> $summary_stats$mean_ci_limits
#> [1] 3.53 4.67
#> 
#> $summary_stats$mean_ci
#> [1] 95
#> 
#> $summary_stats$sd
#> [1] NA
#> 
#> $summary_stats$sd_ci_limits
#> [1] NA NA
#> 
#> $summary_stats$sd_ci
#> [1] NA
#> 
#> $summary_stats$median
#> [1] NA
#> 
#> $summary_stats$median_ci_limits
#> [1] NA NA
#> 
#> $summary_stats$median_ci
#> [1] NA
#> 
#> $summary_stats$quantiles
#> [1] NA
#> 
#> $summary_stats$range
#> [1] NA NA
#> 
#> 
#> $citation
#> Lau (2017). "<title not available>." _<journal not available>_.
#> 
#> $metadata
#> $metadata$sample_size
#> [1] 200
#> 
#> $metadata$region
#> [1] "Western Area, Sierra Leone"
#> 
#> $metadata$transmission_mode
#> [1] NA
#> 
#> $metadata$vector
#> [1] NA
#> 
#> $metadata$extrinsic
#> [1] FALSE
#> 
#> $metadata$inference_method
#> [1] "Maximum likelihood"
#> 
#> 
#> $method_assess
#> $method_assess$censored
#> [1] NA
#> 
#> $method_assess$right_truncated
#> [1] NA
#> 
#> $method_assess$phase_bias_adjusted
#> [1] NA
#> 
#> 
#> $notes
#> [1] "No additional notes"
#> 
#> attr(,".epiparameter_namespace")
#> function () 
#> NULL
#> <bytecode: 0x1211b36d0>
#> <environment: namespace:epiparameter>
as.data.frame(params_inf_with_dist)
#>                                 id                parameter_data_id
#> 1 1fb060c117ff954fa69e77429a0f9c7e a92dfc5da1285b856a016e71ce2bfbad
#>   covidence_id    pathogen                  parameter_type parameter_value
#> 1         3470 Ebola virus Human delay - infectious period            4.05
#>   exponent parameter_unit parameter_lower_bound parameter_upper_bound
#> 1        0           Days                    NA                    NA
#>   parameter_value_type parameter_uncertainty_single_value
#> 1                 Mean                                 NA
#>   parameter_uncertainty_singe_type parameter_uncertainty_lower_value
#> 1                             <NA>                              3.53
#>   parameter_uncertainty_upper_value parameter_uncertainty_type
#> 1                              4.67                     95% CI
#>   cfr_ifr_numerator cfr_ifr_denominator distribution_type
#> 1                NA                  NA       Exponential
#>   distribution_par1_value distribution_par1_type distribution_par1_uncertainty
#> 1                    4.05                   Mean                         FALSE
#>   distribution_par2_value distribution_par2_type distribution_par2_uncertainty
#> 1                      NA                   <NA>                         FALSE
#>   method_from_supplement method_moment_value cfr_ifr_method method_r
#> 1                   TRUE                <NA>           <NA>     <NA>
#>   method_disaggregated_by method_disaggregated method_disaggregated_only
#> 1                    <NA>                FALSE                     FALSE
#>   riskfactor_outcome riskfactor_name riskfactor_occupation
#> 1               <NA>            <NA>                  <NA>
#>   riskfactor_significant riskfactor_adjusted population_sex
#> 1                   <NA>                <NA>           Both
#>   population_sample_type   population_group population_age_min
#> 1       Population based General population                 NA
#>   population_age_max population_sample_size population_country
#> 1                 NA                    200       Sierra Leone
#>   population_location population_study_start_day population_study_start_month
#> 1        Western Area                         20                          Oct
#>   population_study_start_year population_study_end_day
#> 1                        2014                       30
#>   population_study_end_month population_study_end_year genome_site
#> 1                        Mar                      2015        <NA>
#>   genomic_sequence_available other_delay_start other_delay_end inverse_param
#> 1                      FALSE              <NA>            <NA>         FALSE
#>   parameter_from_figure parameter_class ebola_variant other_delay
#> 1                 FALSE     Human delay   Unspecified        <NA>
#>         delay_short       delay_start other_rf_outcome attack_rate_type
#> 1 Infectious period Infection process             <NA>             <NA>
#>   survey_start_date survey_end_date               survey_date parameter_bounds
#> 1       20 Oct 2014     30 Mar 2015 20 Oct 2014 - 30 Mar 2015             <NA>
#>   comb_uncertainty_type comb_uncertainty article_qa_score              outbreak
#> 1                95% CI      3.53 - 4.67         85.71429 West Africa 2013-2016
#>   ebola_species parameter_type_short first_author_surname year_publication
#> 1         Zaire                 <NA>                  Lau             2017
#>   article_label
#> 1  Lau 2017 (a)

Created on 2024-06-14 with reprex v2.1.0
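
The Exp(0.25) shown in the Ebola output above is consistent with taking the rate as the reciprocal of the reported mean (4.05 days). The following base R sketch, with illustrative variable names, shows that arithmetic and also what the reported 95% CI on the mean would imply for the rate; it is a worked check under that assumption, not a description of the package internals.

```r
# Reported mean infectious period and its 95% CI (from the row above)
mean_days <- 4.05
mean_ci <- c(lower = 3.53, upper = 4.67)

# For an exponential distribution, mean = 1 / rate
rate <- 1 / mean_days
round(rate, 2)
#> [1] 0.25

# The reciprocal is decreasing, so the bounds swap when mapped to the rate scale
rate_ci <- c(lower = 1 / mean_ci[["upper"]], upper = 1 / mean_ci[["lower"]])
round(rate_ci, 3)
#> lower upper 
#> 0.214 0.283
```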

@joshwlambert
Member

@jfunction thanks for mentioning the replicated mean; it helped me debug and fix the issue.

@joshwlambert
Member

PR #334 is now merged, closing this issue. If you have any other issues using as_epidist() or other functions in the package, please feel free to open a new issue.
