---
title: "popim technical documentation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{popim technical documentation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  echo = FALSE
)
```

```{r setup}
library(popim)
```

The intention of the package `popim` is to provide tools to easily evaluate and track the POPulation IMmunity of an age-structured population that results from vaccination. Given that vaccination has long lasting effects, this requires tracking the status of the population through time.

## Data structures

To this end the package defines two S3 classes to store data in an appropriate format. Both classes are dataframes, but have additional requirements such as certain columns that must be present, and the format and range of these columns.

### S3 class `popim_population`

The class `popim_population` is designed to hold information on a population of interest.

#### The dataframe

The dataframe holds information on the population size and immunity status of the population through time. The population is disaggregated into 1-year age groups, and potentially several spatial regions. Each row refers to a particular (1 year) birth cohort in a given location and year.

The mandatory columns are:

* `region`: The population of interest may be spatially disaggregated into separate geographical regions (e.g., countries or subnational administrative regions). This column holds a string that identifies which region each entry refers to.
* `year`: The population is tracked through time using annual time steps.
* `age`: The age-structure of the population is tracked in annual age groups.
* `cohort`: This column is redundant as it equals `year` - `age`, and therefore refers to the year in which this cohort was born. It is included for ease of handling.
* `immunity`: Gives the proportion of each cohort that is immune due to past vaccination.
* `pop_size`: Number of people in any given age group and year. To facilitate using common units such as thousands this is not limited to integer values. It is however up to the user to ensure that such units are used consistently.

| column     | type      | range          |
|------------|-----------|----------------|
| `region`   | character |                |
| `year`     | integer   |                |
| `age`      | integer   | non-negative   |
| `cohort`   | integer   | `year` - `age` |
| `immunity` | numeric   | [0, 1]         |
| `pop_size` | numeric   | non-negative   |

#### Class attributes

In addition to the attributes that every dataframe has (`names`, `class`, `row.names`), the objects of the `popim_population` class also retain the input parameters of the basic constructor `popim_population()`:

* `region`: a character vector of all regions in this object,
* `year_min` and `year_max`: the range of years, and
* `age_min` and `age_max`: the age range covered in this population.

#### Create, modify and validate objects of this class

create: `popim_population()`, `read_popim_pop()`, `as_popim_pop()`

modify: `apply_vacc()`

validate: `is_population()` *(currently not exported)*

### S3 class `popim_vacc_activities`

The class `popim_vacc_activities` is designed to hold information on vaccination activities that have occurred or are planned in the population of interest. Each row of the dataframe relates to one vaccination activity.

#### The dataframe

The mandatory columns are:

* `region`: The region in which the vaccination activity takes place. While this is a free character field, when applied to a population, this must match a region that occurs in the population.
* `year`: year during which the vaccination activity takes place.
* `age_first`: youngest age group to be targeted.
* `age_last`: oldest age group to be targeted.
* `coverage`: proportion of the target population to be immunised.
* `doses`: number of doses used in the vaccination activity.
* `targeting`: determines how doses are allocated when there is pre-existing immunity in the population, see [below](#vaccinating-a-cohort-with-previous-immunity) for a description of the different `targeting` methods.

| column      | type      | range                                    |
|-------------|-----------|------------------------------------------|
| `region`    | character |                                          |
| `year`      | integer   |                                          |
| `age_first` | integer   | non-negative                             |
| `age_last`  | integer   | $\ge$ `age_first`                        |
| `coverage`  | numeric   | [0, 1]                                   |
| `doses`     | numeric   | non-negative                             |
| `targeting` | character | `"random"`, `"correlated"`, `"targeted"` |

##### Relationship between coverage, doses and target population

If the target population size is known, the information on coverage and doses is redundant (and potentially conflicting) as coverage = doses / target population size. However, the `popim_vacc_activities` object does not store information on population size; this is held in the `popim_population` object to which the vaccination activities will be applied. This potential conflict can therefore not be detected in a `popim_vacc_activities` object in isolation.

The validator for this class, `validate_vacc_activities()`, checks that the required columns exist and are of the correct data type and range (where applicable). It also currently requires that at least one of `coverage` and `doses` is non-missing in each row. *(This behaviour is perhaps not sensible, as it is inconsistent with the other columns?)*

#### Class attributes

As this is a subclass of dataframe, it has the same attributes as a dataframe (`names`, `class` and `row.names`). It has no additional attributes.
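
As a minimal sketch of the relationship between coverage, doses and target population described above, the following base R snippet builds a plain data frame with the mandatory columns and fills in whichever of `coverage` and `doses` is missing. It does not use the package's own functions: the helper `fill_coverage_doses()` and the `target_pop` values are hypothetical and only illustrate the arithmetic coverage = doses / target population size.

```r
# Plain data frame with the mandatory columns of a vaccination activities table.
# Illustration only: this is not an object of class popim_vacc_activities.
vacc <- data.frame(
  region    = c("A", "A"),
  year      = c(2001L, 2005L),
  age_first = c(0L, 1L),
  age_last  = c(4L, 10L),
  coverage  = c(0.5, NA),
  doses     = c(NA, 3000),
  targeting = c("random", "random"),
  stringsAsFactors = FALSE
)

# Hypothetical target population sizes (people aged age_first to age_last in
# the given region and year); in practice these come from the population data.
target_pop <- c(10000, 12000)

# Hypothetical helper: fill whichever of coverage and doses is missing,
# using coverage = doses / target population size.
fill_coverage_doses <- function(vacc, target_pop) {
  no_doses <- is.na(vacc$doses)
  no_cov   <- is.na(vacc$coverage)
  vacc$doses[no_doses]  <- vacc$coverage[no_doses] * target_pop[no_doses]
  vacc$coverage[no_cov] <- vacc$doses[no_cov] / target_pop[no_cov]
  vacc
}

vacc_filled <- fill_coverage_doses(vacc, target_pop)

# Once both values are present they can be checked for consistency
# against the target population size.
consistent <- isTRUE(all.equal(vacc_filled$coverage,
                               vacc_filled$doses / target_pop))
```

The consistency check in the last step is only possible because a target population size is supplied alongside the activities, which is why it cannot be carried out on a vaccination activities table in isolation.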
#### Create and validate objects of this class

create: `popim_vacc_activities()`, `read_vacc_activities()`, `as_vacc_activities()`, `vacc_from_immunity()`

modify: `complete_vacc_activities()`

validate: `validate_vacc_activities()` *(currently not exported)*

## Primary functionality: applying vaccination activities to a population

```{r, results = "asis"}
child_env <- new.env()
child_env$type <- "vignette"
res <- knitr::knit_child("../man/rmd/apply.Rmd", envir = child_env, quiet = TRUE)
cat(res, sep = "\n")
```

### The inverse: inferring vaccination activities from a population

It is not possible to uniquely infer the vaccination activities that have given rise to a particular immunity profile of a population by age and over time: the same profile can be reached with different `targeting` choices (where different `targeting` choices will require different amounts of vaccine). Furthermore, if there are cohorts that reached 100\% immunity, there may have been double vaccination of some individuals that left no trace in the immunity profile.

However, the function `vacc_from_immunity()` does infer the vaccination activities from a population immunity profile as far as possible, returning an object of class `popim_vacc_activities`. To this end, in addition to passing the `popim_population` object of interest, the user needs to specify a `targeting` option (which will be assumed for all vaccination activities that are identified). The inferred vaccination activities will be the minimal activities possible to give rise to the observed immunity profile given the `targeting` option supplied.

## Population summary & visualisation

An object of class `popim_population` holds a somewhat complex dataset recording population size, age distribution and immunity status through time, and potentially disaggregated into various geographical regions. To facilitate interpretation of this kind of data, there are a couple of functions to aggregate and visualise these objects.
### Visualisation

The functions `plot_pop_size()` and `plot_immunity()` serve to plot the age-disaggregated population size and immunity through time in a grid where year and age are plotted on the x- and y-axis, respectively. The size of the cohort or proportion of the cohort that is immune is indicated by the colour of the grid cell. If there are several regions, these will be shown in separate panels.

The implementation of these plot functions is based on ggplot2, and therefore the returned plot objects can be further modified using the ggplot2 syntax.

### Aggregation over age

While the age structure of a population's immunity profile is important to understand how immunity is likely to develop through time, for the purposes of understanding how well a population is protected at any point in time, often the overall proportion immune is of greater interest. This can be calculated using the function `calc_pop_immunity()` which will aggregate the population over age and return a dataframe with the whole population immunity over time and geographical region.
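
The aggregation over age can be illustrated in base R. The sketch below assumes that the overall proportion immune is the population-weighted mean of the cohort immunities, i.e. the number of immune people divided by the total population size for each region and year. This is an assumption about what such an aggregation computes, not a description of how `calc_pop_immunity()` is implemented, and the toy data are made up.

```r
# Toy data in the column layout of a popim_population dataframe
# (plain data frame with made-up numbers, not a popim_population object).
pop <- data.frame(
  region   = "A",
  year     = rep(c(2000L, 2001L), each = 3),
  age      = rep(0:2, times = 2),
  immunity = c(0.0, 0.5, 0.5, 0.2, 0.0, 0.5),
  pop_size = c(100, 100, 200, 100, 100, 100),
  stringsAsFactors = FALSE
)
pop$cohort <- pop$year - pop$age

# Number of immune people in each cohort.
pop$n_immune <- pop$immunity * pop$pop_size

# Sum immune people and population size over age, by region and year.
agg <- aggregate(cbind(n_immune, pop_size) ~ region + year,
                 data = pop, FUN = sum)

# Assumed definition: overall immunity = immune people / total population.
agg$immunity <- agg$n_immune / agg$pop_size
agg[, c("region", "year", "pop_size", "immunity")]
```

Weighting by `pop_size` matters here: a simple mean of the `immunity` column would give every age group the same weight regardless of how many people it contains.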