--- title: "Using vimpact for estimating vaccine impact - internal" author: "Rob Ashton" date: "2022-02-02" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Using vimpact for estimating vaccine impact - internal} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- This vignette describes how to use vimpact to calculate impact as a member of VIMC. This requires a connection to the montagu database so can only be used internally. Note that this is all in development and the interface is likely to change. ## Impact by calendar year & impact by birth year ### Function interface ```r impact <- vimpact::calculate_impact( con, method = "calendar_year", touchstone = "201910gavi-5", modelling_group = "CDA-Razavi", disease = "HepB", focal_scenario_type = "default", focal_vaccine_delivery = list( list(vaccine = "HepB_BD", activity_type = "routine"), list(vaccine = "HepB", activity_type = "routine") ), baseline_scenario_type = "novac", burden_outcomes = c("hepb_deaths_acute", "hepb_deaths_dec_cirrh", "hepb_deaths_hcc")) str(impact) #> tibble [9,898 × 4] (S3: tbl_df/tbl/data.frame) #> $ country : chr [1:9898] "AFG" "AFG" "AFG" "AFG" ... #> $ burden_outcome: chr [1:9898] "deaths" "deaths" "deaths" "deaths" ... #> $ year : int [1:9898] 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 ... #> $ impact : num [1:9898] 0 0 0 0 0 0 0 53 81 97 ... ``` Or for impact by birth year, use `method = "birth_year"`. This will get `burden_estimate_set` ids for the baseline and focal scenarios for this particular touchstone, modelling group, disease. Then uses those to pull the `burden_estimate` data for specified `burden_outcomes`. Optionally filtering on country via `countries` arg, year via `vaccination_years` and under 5 age groups if `is_under5 = TRUE`. We then use raw impact from `burden_estimate` table and call relevant public facing impact method. For `method = "calendar_year"` `impact_by_calendar_year`. For `method = "birth_year"` `impact_by_birth_year`. ### Recipe interface Define a recipe either as a csv or using `recipe_template` ```r recipe <- data.frame( touchstone = "201910gavi-5", modelling_group = "CDA-Razavi", disease = "HepB", focal = "default:HepB_BD-routine;HepB-routine", baseline = "novac", burden_outcome = "hepb_deaths_acute,hepb_deaths_dec_cirrh,hepb_deaths_hcc;hepb_cases_acute_severe,hepb_cases_dec_cirrh,hepb_cases_hcc;dalys") t <- tempfile(fileext = ".csv") write.csv(recipe, t, row.names = FALSE) ``` This is a set of properties defining what data we want to extract from the db e.g. `touchstone`, `modelling_group`, `disease`, focal and baseline scenarios and `burden_outcome`. It captures the same info as the args to `calculate_impact` above. Then use the recipe to define meta data frame ```r meta <- vimpact:::get_meta_from_recipe(default_recipe = FALSE, recipe = t, con = con) ``` And use this to calculate impact ```r old_impact <- vimpact:::get_raw_impact_details(con, meta, "deaths") str(old_impact) #> 'data.frame': 9898 obs. of 7 variables: #> $ country : int 104 104 104 104 104 104 104 104 104 104 ... #> $ time : int 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 ... #> $ baseline_value: num 5636 5765 5901 6034 6172 ... #> $ focal_value : num 5636 5765 5901 6015 6075 ... #> $ value : num 0 0 0 19 97 180 252 312 344 381 ... #> $ index : int 1 1 1 1 1 1 1 1 1 1 ... #> $ burden_outcome: chr "deaths" "deaths" "deaths" "deaths" ... ``` ## Impact by year of vaccination ### Function interface Very similar to examples for calendar year and birth year, to calculate impact by year of vaccination stratified by activity type run ```r impact <- vimpact::calculate_impact( con, method = "yov_activity_type", touchstone = "201910gavi-5", modelling_group = "CDA-Razavi", disease = "HepB", focal_scenario_type = "default", focal_vaccine_delivery = list( list(vaccine = "HepB_BD", activity_type = "routine"), list(vaccine = "HepB", activity_type = "routine") ), baseline_scenario_type = "novac", burden_outcomes = "dalys") str(impact) #> tibble [4,279 × 8] (S3: tbl_df/tbl/data.frame) #> $ country : chr [1:4279] "AFG" "AFG" "AFG" "AFG" ... #> $ vaccine : chr [1:4279] "HepB" "HepB" "HepB" "HepB" ... #> $ activity_type : chr [1:4279] "routine" "routine" "routine" "routine" ... #> $ year : int [1:4279] 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 ... #> $ burden_outcome: chr [1:4279] "dalys" "dalys" "dalys" "dalys" ... #> $ impact : num [1:4279] 103859 106575 105852 111953 115165 ... #> $ impact_ratio : num [1:4279] 0.154 0.154 0.154 0.154 0.154 ... #> $ fvps : num [1:4279] 676396 694088 689380 729111 750028 ... ``` or use `method = "yov_birth_cohort` for impact by year of vaccination stratified by birth cohort. This uses the same logic as other impact methods but before calling the public facing impact function it will also extract FVP data from the data base for this `touchstone` and vaccine delivery. It then calls either `impact_by_year_of_vaccination_activity_type` or `impact_by_year_of_vaccination_birth_cohort` to get impact. ### Recipe interface We can re-use the recipe from above and get meta table for this method ```r meta <- vimpact:::get_meta_from_recipe(default_recipe = FALSE, recipe = t, method = "method2a", con = con) ``` Then use this to get raw impact ```r old_raw_impact <- vimpact:::get_raw_impact_details(con, meta, "dalys") ``` Extract the fvp data and map output to required interface ```r fvps <- vimpact::extract_vaccination_history(con, "201910gavi-5", year_min = 2000, year_max = 2030, disease_to_extract = "HepB") #> User defined touchstone version is used. #> Converting input coverage data...... #> Extracted interpolated population. #> Extracted raw coverage data... #> Transformed coverage data. fvps$fvps <- fvps$fvps_adjusted fvps$country <- fvps$country_nid ``` Then can use `meta`, `raw_impact` and `fvps` to calculate impact by year of vaccination ```r old_impact <- vimpact:::impact_by_year_of_vaccination( meta, old_raw_impact, fvps, vaccination_years = 2000:2030) str(old_impact) #> 'data.frame': 4964 obs. of 27 variables: #> $ country : int 4 4 4 4 4 4 4 4 4 4 ... #> $ vaccine : chr "HepB_BD" "HepB" "HepB_BD" "HepB_BD" ... #> $ activity_type : chr "routine" "routine" "routine" "routine" ... #> $ scenario_description: chr "best-estimates" "best-estimates" "best-estimates" "best-estimates" ... #> $ coverage_set : int 1750 1751 1750 1750 1751 1751 1751 1751 1751 1751 ... #> $ delivery_id : int 487 3093 1169 1175 3376 4029 4272 3375 4564 3096 ... #> $ disease : chr "HepB" "HepB" "HepB" "HepB" ... #> $ scenario_type : chr "default" "default" "default" "default" ... #> $ gavi_support_level : chr "with" "with" "with" "with" ... #> $ year : int 2016 2017 2014 2025 2030 2015 2019 2027 2029 2020 ... #> $ gavi_support : logi FALSE FALSE FALSE FALSE FALSE FALSE ... #> $ gender : chr "Both" "Both" "Both" "Both" ... #> $ age : num 0 0 0 0 0 0 0 0 0 0 ... #> $ target_source : num 1118424 1131834 1103729 1180725 1188620 ... #> $ coverage_source : num 0.18 0.66 0.04 0.473 0.749 ... #> $ cohort_size : num 1118424 1131834 1103729 1180725 1188620 ... #> $ delivery_population : num 1118424 1131834 1103729 1180725 1188620 ... #> $ fvps_source : num 201316 747010 44149 558483 890514 ... #> $ fvps_adjusted : num 201316 747010 44149 558483 890514 ... #> $ coverage_adjusted : num 0.18 0.66 0.04 0.473 0.749 ... #> $ country_nid : int 4 4 4 4 4 4 4 4 4 4 ... #> $ fvps : num 201316 747010 44149 558483 890514 ... #> $ time : num 2016 2017 2014 2025 2030 ... #> $ burden_outcome : chr "dalys" "dalys" "dalys" "dalys" ... #> $ impact_ratio : num 0.154 0.154 0.154 0.154 0.154 ... #> $ impact : num 30912 114701 6779 85754 136736 ... #> $ index : int 1 1 1 1 1 1 1 1 1 1 ... ``` ## Comparison `calculate_impact` has a very similar interface as a single row in an impact recipe. The output is slightly different format but output from previous method can be transformed into same format. ```r country <- dplyr::tbl(con, "country") old_impact <- old_impact %>% dplyr::left_join(country, by = c("country" = "nid"), copy = TRUE) %>% dplyr::select(country = id, vaccine, activity_type, year = time, burden_outcome, impact) %>% dplyr::filter(!is.na(impact)) %>% dplyr::arrange(activity_type, country, year, vaccine) str(old_impact) #> 'data.frame': 4279 obs. of 6 variables: #> $ country : chr "AFG" "AFG" "AFG" "AFG" ... #> $ vaccine : chr "HepB" "HepB" "HepB" "HepB" ... #> $ activity_type : chr "routine" "routine" "routine" "routine" ... #> $ year : num 2007 2008 2009 2010 2011 ... #> $ burden_outcome: chr "dalys" "dalys" "dalys" "dalys" ... #> $ impact : num 103859 106575 105852 111953 115165 ... ``` and we can see from a plot that the output is very similar ### HepB ```{r} htmltools::tags$iframe( src = "impact_comparisons_hepb.html", width = "100%", height = "400", scrolling = "no", seamless = "seamless", frameBorder = "0", `data-external` = "1" ) ``` ### Measles ```{r} htmltools::tags$iframe( src = "impact_comparisons_measles.html", width = "100%", height = "400", scrolling = "no", seamless = "seamless", frameBorder = "0", `data-external` = "1" ) ``` ### Comparisons of HepB, Measles & YF This shows bars of mean difference between old impact & new impact method for each country for HepB, Measles and YF. Tooltips show the number of observations and the min and max order of the values. i.e. an impact of 23432.342 has order 5. ```{r} htmltools::tags$iframe( src = "impact_comparisons.html", width = "100%", height = "400", scrolling = "no", seamless = "seamless", frameBorder = "0", `data-external` = "1" ) ``` There are small differences in impact because of differences in precision. `calculate_impact` tries to do large parts of the aggregation on the database where `burden_estimate` values are stored as Postgres `real` type which has 4 bytes of storage size and 6 decimal digits of precision. Whereas if we pull this before aggregation the values are stored in R as doubles which are much higher precision leading to small differences when aggregated. `calculate_impact` at the moment won't be able to generate impact for you for more than 1 set of scenario comparisons. Next steps will be to add a wrapper which can take some data like the impact recipe and call `calculate_imapct` for multiple scenarios.