Package 'threemc'

Title: (Matt's) Multi-Level Model of Male Circumcision in Sub-Saharan Africa
Description: Functions and datasets to support the model of Thomas, M. et al., 2021, A multi-level model for estimating region-age-time-type specific male circumcision coverage from household survey and health system data in South Africa, <arXiv:2108.091422>, and to extend it to other Sub-Saharan African countries.
Authors: Matthew Thomas [aut], Jeffrey Imai-Eaton [aut], Patrick O'Toole [cre], Imperial College of Science, Technology and Medicine [cph]
Maintainer: Patrick O'Toole <[email protected]>
License: MIT + file LICENSE
Version: 0.1.45
Built: 2024-12-10 03:58:36 UTC
Source: https://github.com/mrc-ide/threemc

Help Index


Calculate Quantiles for Rates and Cumulative Hazard

Description

Calculate quantiles for samples of rates and cumulative hazard output by threemc_fit_model, and add them as columns to the shell data.frame out, which contains estimated empirical circumcision rates.

Usage

compute_quantiles(
  out,
  fit,
  area_lev = NULL,
  probs = c(0.025, 0.5, 0.975),
  names = FALSE,
  ...
)

Arguments

out

Shell dataset with a row for every unique record in circumcision survey data for a given area. Also includes empirical circumcision rate estimates for each unique record.

fit

Fit object returned by threemc_fit_model, containing samples of rates and cumulative hazard.

area_lev

PSNU area level for specific country.

probs

Specific quantiles to be calculated, Default: c(0.025, 0.5, 0.975)

names

Passed to quantile: logical; if TRUE, the result has a names attribute. Set to FALSE for a speed-up with many probs, Default: FALSE

...

Further arguments passed to quantile.

Value

The input data.frame out, with added columns containing quantiles of hazard rates and related quantities (e.g. cumulative hazard) for each circumcision type and for overall circumcision.

See Also

threemc_fit_model quantile
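The core quantile step can be sketched in base R as below. This is an illustration only: it assumes the samples for one quantity are stored as a matrix with one row per record in out and one column per posterior draw, and the column names lower, median and upper are hypothetical rather than those produced by compute_quantiles.

```r
# Toy shell dataset and a hypothetical matrix of posterior samples:
# one row per record in `out`, one column per draw
set.seed(42)
out <- data.frame(area_id = c("MWI_1_1", "MWI_1_2", "MWI_1_3"), year = 2010, age = 15)
samples <- matrix(rbeta(3 * 1000, 2, 18), nrow = 3)

probs <- c(0.025, 0.5, 0.975)

# apply() returns one column per record, so transpose to one row per record;
# names = FALSE skips building a names attribute, as in quantile()
qs <- t(apply(samples, 1, quantile, probs = probs, names = FALSE))
colnames(qs) <- c("lower", "median", "upper")  # hypothetical column names

out <- cbind(out, qs)
```

Passing names = FALSE through to quantile mirrors the speed-up described for the names argument.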


Recursively Create Missing Directories

Description

Function to recursively create directories if any of the directories in a provided path are missing. Similar to mkdir -p in a Unix shell.

Usage

create_dirs_r(dir_path)

Arguments

dir_path

Path to a file or directory which you want to generate.
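Base R can already create nested directories with dir.create(recursive = TRUE). The sketch below wraps it in a hypothetical helper; treating a path that ends in a file extension as a file (and creating only its parent directory) is an assumption, not necessarily how create_dirs_r decides.

```r
# Sketch of `mkdir -p` behaviour using base R; not the package's implementation
create_dirs_sketch <- function(dir_path) {
  # Assumption: a path ending in a file extension is a file, so only its
  # parent directory is created
  target <- if (nzchar(tools::file_ext(dir_path))) dirname(dir_path) else dir_path
  # recursive = TRUE creates any missing intermediate directories
  if (!dir.exists(target)) dir.create(target, recursive = TRUE)
  invisible(target)
}
```

Checking dir.exists first makes repeated calls harmless, like mkdir -p.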


Create Shell Dataset for Estimating Empirical Circumcision Rate

Description

Create a shell dataset with a row for every unique area ID, area name, year and circumcision age in survey data. Also computes the empirical number of person-years until circumcision and the number of people circumcised for several "types" of circumcision: known medical circumcisions, known traditional circumcisions, censored survey entries (i.e. where surveyed individuals had not been circumcised) and left-censored survey entries (i.e. where circumcision occurred at an unknown age).

Usage

create_shell_dataset(
  survey_circumcision,
  populations,
  areas,
  area_lev = NULL,
  start_year,
  end_year = 2021,
  time1 = "time1",
  time2 = "time2",
  strat = "space",
  age = "age",
  circ = "indweight_st",
  ...
)

Arguments

survey_circumcision

Information on male circumcision status from surveys. If this is a list or contains more than one country, the function is applied to each country present, returning a list.

populations

data.frame containing populations for each region in the TMB fits.

areas

sf shapefiles for specific country/region.

area_lev

Desired admin boundary level to perform the analysis on.

start_year

First year in shell dataset.

end_year

Last year in shell dataset, which is also the year to forecast/model until, Default: 2021

time1

Variable name for time of birth, Default: "time1"

time2

Variable name for time circumcised or censored, Default: "time2"

strat

Variable to stratify by when using a 3D hazard function, Default: "space"

age

Variable with age circumcised or censored, Default: "age"

circ

Variables with circumcision matrix, Default: "indweight_st"

...

Further arguments passed to or from other methods.

Value

data.frame with a row for every unique record in survey_circumcision for a given area. Also includes empirical circumcision rate estimates for each unique record.

See Also

datapack_psnu_area_level crossing create_integration_matrix_agetime create_hazard_matrix_agetime
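To illustrate what "empirical person-years until circumcision" means, the sketch below expands toy individual records into one row per year at risk and sums weighted person-years and events per year-age cell. It assumes yearly time units, counts a full person-year in the year of the event, and ignores the area stratification and left-censoring handled by create_shell_dataset; the data, column names and helper are hypothetical.

```r
# Hypothetical survey records: birth year (time1), year circumcised or
# censored (time2), circumcision type and survey weight
surv <- data.frame(
  time1  = c(2000, 2001, 2002),
  time2  = c(2010, 2005, 2012),
  type   = c("MMC", "TMC", "censored"),
  weight = c(1, 1.5, 0.8)
)

# Expand each individual into one row per year at risk, from birth to time2
expand_person_years <- function(d) {
  rows <- lapply(seq_len(nrow(d)), function(i) {
    years <- d$time1[i]:d$time2[i]
    event <- years == d$time2[i]  # TRUE only in the final (event/censoring) year
    data.frame(
      year = years,
      age  = years - d$time1[i],
      py   = d$weight[i],  # weighted person-years at risk
      mmc  = d$weight[i] * (event & d$type[i] == "MMC"),
      tmc  = d$weight[i] * (event & d$type[i] == "TMC")
    )
  })
  do.call(rbind, rows)
}

person_years <- expand_person_years(surv)

# Sum to one row per year-age cell and compute empirical hazard rates
shell <- aggregate(cbind(py, mmc, tmc) ~ year + age, data = person_years, FUN = sum)
shell$rate_mmc <- shell$mmc / shell$py
shell$rate_tmc <- shell$tmc / shell$py
```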


PSNU Area Levels for SSA

Description

PSNU area levels for Sub-Saharan African countries. These are the recommended levels at which to perform modelling etc., for each respective country. Inferences on larger regions (i.e. lower PSNU area levels) can be made by aggregating results for higher area levels. The dataset contains the following fields:

  • iso3 (character): ISO3 codes for Sub-Saharan African countries.

  • psnu_area_level (integer): The subnational level considered to be the organisational level at which a country has prioritised its programme. Increasing values refer to more granular regional distinctions.

Usage

data(datapack_psnu_area_level)

Format

A data.frame with 29 rows and 2 variables.


Malawi shapefiles

Description

sf shapefile representation of Malawi, as a multipolygon.

  • iso3 (character): ISO3 codes for Sub-Saharan African countries.

  • area_id: Unique ID for each region in MWI, formatted as "Country_area_level_ID" (e.g. MWI_3_05 for Mzimba).

  • area_name: Name of the region in question.

  • parent_area_id: Unique ID for the region's parent region.

  • area_level: Numeric value denoting the area level of the area; higher values denote more granular areas.

  • area_level_label: Translates the numeric area level to its meaning in the country in question. For example, in Malawi a region of area level 3 is a "District".

  • area_sort_order: Order in which to sort areas when plotting, roughly equivalent to a geofacet grid.

  • center_x: X coordinate for the centre of the region's multipolygon.

  • center_y: Y coordinate for the centre of the region's multipolygon.

  • geometry: sfc_MULTIPOLYGON representation of the region's spatial geometry.

Usage

data(demo_areas)

Format

An sf collection of 6 features and 9 fields, including a data.frame with 387 rows and 10 variables, and a sf
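The area_id format described above can be unpacked with a small helper like the hypothetical one below. It assumes IDs follow the "ISO3_level_number" pattern exactly (e.g. MWI_3_05); IDs without underscores, such as a bare country code, yield NA for the level and number.

```r
# Hypothetical helper: split area IDs such as "MWI_3_05" into their parts
parse_area_id <- function(area_id) {
  parts <- strsplit(area_id, "_", fixed = TRUE)
  # Indexing past the end of a character vector returns NA
  pick <- function(k) vapply(parts, function(p) p[k], character(1))
  data.frame(
    area_id    = area_id,
    iso3       = pick(1),
    area_level = as.integer(pick(2)),
    area_num   = pick(3),  # hypothetical column name
    stringsAsFactors = FALSE
  )
}

parse_area_id(c("MWI_3_05", "MWI_1_2"))
```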


Malawi populations

Description

Single age, aggregated male populations for each area in Malawi.

  • iso3 (character): ISO3 codes for Sub-Saharan African countries.

  • area_id: Unique ID for each region in MWI, formatted as "Country_area_level_ID" (e.g. MWI_3_05 for Mzimba).

  • area_level: Numeric value denoting the area level of the area; higher values denote more granular areas.

  • area_name: Name of the region in question.

  • year: Year for the population in question.

  • age: Age for the population in question.

  • population: (Male) population for each unique area-year-age combination.

Usage

data(demo_populations)

Format

A data.frame with 58806 rows and 7 variables.


Malawi surveys

Description

Circumcision surveys for Malawi.

  • iso3 (character): ISO3 codes for Sub-Saharan African countries.

  • survey_id: Survey ID for each record.

  • area_id: Unique ID for each region in MWI, formatted as "Country_area_level_ID" (e.g. MWI_3_05 for Mzimba).

  • area_level: Numeric value denoting the area level of the area; higher values denote more granular areas.

  • age: Age at interview.

  • dob_cmc: CMC (Century Month Code) date of birth of the individual.

  • interview_cmc: CMC date of interview.

  • indweight: Weighting for the survey record in question.

  • circ_status: Circumcision status of the individual, 1 indicating circumcision and 0 indicating right-censoring.

  • circ_age: Age at circumcision, if applicable.

  • circ_who: Circumcision provider, either medical or traditional.

  • circ_where: Circumcision location, either medical or traditional.

Usage

data(demo_survey_circumcision)

Format

A data.frame with 29313 rows and 12 variables.
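Dates in dob_cmc and interview_cmc use the Century Month Code convention from DHS-style surveys, CMC = 12 * (year - 1900) + month, so January 1900 is CMC 1. The sketch below converts between CMC and calendar dates and derives age in completed years; the helper names are hypothetical and not part of threemc.

```r
# Hypothetical helpers for Century Month Codes (CMC)
cmc_from_date <- function(year, month) 12 * (year - 1900) + month

cmc_to_year  <- function(cmc) 1900 + (cmc - 1) %/% 12
cmc_to_month <- function(cmc) (cmc - 1) %% 12 + 1

# Age in completed years at interview (month-level resolution)
age_at_interview <- function(dob_cmc, interview_cmc) (interview_cmc - dob_cmc) %/% 12

dob <- cmc_from_date(1990, 6)        # June 1990
interview <- cmc_from_date(2015, 5)  # May 2015
age_at_interview(dob, interview)
```

Here the individual is 24, as the interview falls one month before their 25th birthday.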


WCA - ESA key for Sub-Saharan African countries

Description

Western and Central Africa (WCA) - Eastern and Southern Africa (ESA) categorisation for Sub-Saharan African countries. Also includes North-South-East-West categorisation.

  • iso3 (character): ISO3 codes for Sub-Saharan African countries.

  • region (character): ESA-WCA categorisation for each iso3.

  • four_region (character): North-South-East-West categorisation for each iso3.

Usage

data(esa_wca_regions)

Format

A data.frame with 38 rows and 3 variables.


Minimise Fit Object Size

Description

Returns a minimised fit object, which is often useful when saving the fit object for later aggregation.

Usage

minimise_fit_obj(fit, dat_tmb, parameters)

Arguments

fit

Optional "small" fit object with no sample. Specifying fit means you do not need to specify dat_tmb or parameters, as argument specifications will be overridden by those stored in fit.

dat_tmb

list of data required for model fitting, outputted by threemc_prepare_model_data, which includes:

  • design_matrices: Includes X_fixed_mmc, X_fixed_tmc, X_time_mmc, X_age_mmc, X_age_tmc, X_space_mmc, X_space_tmc, X_agetime_mmc, X_agespace_mmc, X_agespace_tmc, X_spacetime_mmc. Design matrices for fixed effects and temporal, age, space and interaction random effects.

  • integration matrices: Includes IntMat1, IntMat2. Integration matrices for selecting the instantaneous hazard rate.

  • survival matrices: Includes A_mmc, A_tmc, A_mc, B, C. Survival matrices for MMC, TMC, censored and left-censored records.

  • Q_space: Precision/adjacency matrix for the spatial random effects.

parameters

list of fixed and random model parameters.

Value

Object of class "naomi_fit".


Prepare Survey Data

Description

Prepare survey data required to run the circumcision model. Can also optionally apply normalise_weights_kish to normalise survey weights and apply Kish coefficients.

Usage

prepare_survey_data(
  areas,
  survey_circumcision,
  survey_individuals = NULL,
  survey_clusters = NULL,
  area_lev,
  start_year = 2006,
  cens_year = NULL,
  cens_age = 59,
  rm_missing_type = FALSE,
  norm_kisk_weights = TRUE,
  strata.norm = c("survey_id", "area_id"),
  strata.kish = c("survey_id")
)

Arguments

areas

sf shapefiles for specific country/region.

survey_circumcision

Information on male circumcision status from surveys. If this is a list or contains more than one country, the function is applied to each country present, returning a list.

survey_individuals

Information on the individuals surveyed.

survey_clusters

Information on the survey clusters.

area_lev

Desired admin boundary level to perform the analysis on.

start_year

Year to begin the analysis on, Default: 2006

cens_year

Year to censor the circumcision data by (useful when the final survey year is anomalous, e.g. contains a very small number of MCs), Default: NULL

cens_age

Age to censor the circumcision data at, Default: 59

rm_missing_type

Indicator to decide whether to remove surveys where there is no MMC/TMC distinction. These surveys may still be useful for determining MC levels, Default: FALSE

norm_kisk_weights

Indicator to decide whether to normalise survey weights and apply Kish coefficients, Default: TRUE

strata.norm

Stratification variables for normalising survey weights, Default: c("survey_id", "area_id")

strata.kish

Stratification variables for estimating and applying the Kish coefficients, Default: "survey_id"

Value

Survey data with required variables to run circumcision model.

See Also

normalise_weights_kish
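A common way to normalise survey weights and compute Kish's design effect is sketched below: weights are rescaled to average 1 within each stratum, and Kish's approximate design effect due to unequal weighting, deff = n * sum(w^2) / sum(w)^2, gives an effective sample size n / deff. This illustrates the general technique under that assumption; it is not a description of what normalise_weights_kish does internally, and the toy data and helpers are hypothetical.

```r
# Hypothetical survey records with raw weights
svy <- data.frame(
  survey_id = rep(c("S1", "S2"), each = 4),
  area_id   = rep(c("A", "B"), times = 4),
  indweight = c(2, 4, 1, 3, 5, 5, 10, 10)
)

# Rescale weights so they average 1 within each stratum
normalise_weights <- function(d, strata) {
  grp <- interaction(d[strata], drop = TRUE)
  d$weight_norm <- d$indweight / ave(d$indweight, grp, FUN = mean)
  d
}

# Kish's design effect from unequal weights: n * sum(w^2) / sum(w)^2
kish_deff <- function(w) length(w) * sum(w^2) / sum(w)^2

svy <- normalise_weights(svy, c("survey_id", "area_id"))

# One design effect, and effective sample size, per survey
deff  <- tapply(svy$weight_norm, svy$survey_id, kish_deff)
n_eff <- tapply(svy$weight_norm, svy$survey_id, length) / deff
```

The design effect is at least 1, with equality only when all weights in a survey are equal.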


Function to read in Circumcision Data

Description

Function to read in circumcision data to fit the model. Handles CSV files with fread (but returns a data.frame), and geographical data with read_sf (for which it also adds unique identifiers for each area_level).

Usage

read_circ_data(path, filters = NULL, selected = NULL, ...)

Arguments

path

Path to data.

filters

Optional named vector whose values give the value to filter for in the column of the same name. Only supports filtering for one value per column, Default: NULL

selected

Optional columns to select, removing all others, Default: NULL

...

Further arguments passed to or from other methods.

Value

The relevant dataset, filtered as specified.

See Also

fread read_sf
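The filters and selected arguments can be pictured with the base R sketch below, which assumes one value per named column as documented above. The helper and toy data are hypothetical, and reading the file itself (via fread or read_sf) is omitted.

```r
# Hypothetical stand-in for the filtering step of read_circ_data
filter_and_select <- function(df, filters = NULL, selected = NULL) {
  # Keep rows where each named column equals its single filter value.
  # A named vector has one type, so numeric columns compare as character here.
  for (col in names(filters)) {
    df <- df[df[[col]] == filters[[col]], , drop = FALSE]
  }
  # Optionally keep only the requested columns
  if (!is.null(selected)) df <- df[, selected, drop = FALSE]
  df
}

dat <- data.frame(
  iso3 = c("MWI", "MWI", "ZAF"),
  area_level = c(1, 2, 1),
  population = c(100, 50, 300)
)

filter_and_select(dat, filters = c(iso3 = "MWI", area_level = "1"),
                  selected = c("iso3", "population"))
```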


Create data frame of all ages within provided age group. Convert survey coverage points & dmppt2 data to match convention of aggregated results.

Description

Create data frame of all ages within provided age group. Convert survey coverage points & dmppt2 data to match convention of aggregated results.

Usage

survey_points_dmppt2_convert_convention(.data)

Arguments

.data

Data frame with either survey calculated coverage, with associated error bounds, or DMPPT2 coverage estimates calculated from VMMC programme data.
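Expanding an age group label into all of its single ages can be sketched as follows. The label formats ("15-19" and open-ended "15+") follow the age_groups defaults elsewhere in this manual; the upper age used for open-ended groups (max_age = 59 here, mirroring the cens_age default) is an assumption, and both helpers are hypothetical.

```r
# Hypothetical helper: all single ages within an age-group label
ages_in_group <- function(age_group, max_age = 59) {
  if (grepl("\\+$", age_group)) {
    # Open-ended group such as "15+": from the lower bound up to max_age
    lower <- as.integer(sub("\\+$", "", age_group))
    return(seq(lower, max_age))
  }
  bounds <- as.integer(strsplit(age_group, "-", fixed = TRUE)[[1]])
  seq(bounds[1], bounds[2])
}

# One row per (age_group, age) pair, e.g. for joining to single-age results
expand_age_groups <- function(age_groups, max_age = 59) {
  do.call(rbind, lapply(age_groups, function(g) {
    data.frame(age_group = g, age = ages_in_group(g, max_age))
  }))
}

expand_age_groups(c("10-14", "15-24", "50+"))
```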


Produce Population Weighted Aggregated Samples for All Area Levels

Description

Aggregate by area, year, age and type (weighted by population), and convert to a percentage/probability.

Usage

threemc_aggregate(
  .data,
  fit,
  areas,
  populations,
  age_var = c("age", "age_group"),
  type = c("probability", "incidence", "prevalence"),
  area_lev,
  N = 100,
  prev_year = 2008,
  probs = c(0.025, 0.5, 0.975),
  ...
)

Arguments

.data

data.frame of unaggregated modelling results.

fit

TMB list containing model parameters, nested list of samples for the (cumulative) incidence and hazard rate of circumcision for the region(s) in question.

areas

sf shapefiles for specific country/region.

populations

data.frame containing populations for each region in the TMB fits.

age_var

Determines whether you wish to aggregate by discrete ages or age groups (0-4, 5-9, 10-14, and so on).

type

Determines which aspect of MC in the regions in question you wish to aggregate for. Can be one of "probability", "incidence" or "prevalence".

area_lev

PSNU area level for specific country.

N

Number of samples to be generated, Default: 100

prev_year

If type == "prevalence", choose year to compare prevalence with.

probs

Percentiles to provide quantiles at. Set to NULL to skip computing quantiles.

...

Further arguments to internal functions.

Value

data.frame with samples aggregated by area, year, age and type, and weighted by population.
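Population-weighted aggregation of samples can be sketched as follows: for each sample, the aggregate probability for a parent area and age is the population-weighted mean of the child-area probabilities, and quantiles are then taken across samples. The toy data, dimensions and layout are assumptions for illustration, not the internal structure used by threemc_aggregate.

```r
set.seed(1)
n_samples <- 100

# Hypothetical single-age results for two districts in one province:
# one row per area-age, one column per posterior sample of circumcision probability
child <- data.frame(
  parent_area_id = "MWI_1_1",
  area_id = rep(c("MWI_3_01", "MWI_3_02"), each = 2),
  age = rep(15:16, times = 2),
  population = c(1000, 900, 3000, 2800)
)
samples <- matrix(runif(nrow(child) * n_samples, 0.1, 0.4), nrow = nrow(child))

# Population-weighted mean for each sample within each parent area and age
groups <- interaction(child$parent_area_id, child$age, drop = TRUE)
agg <- apply(samples, 2, function(s) {
  tapply(s * child$population, groups, sum) / tapply(child$population, groups, sum)
})

# Summarise across samples: one row per parent area-age group
qs <- t(apply(agg, 1, quantile, probs = c(0.025, 0.5, 0.975)))
```

Weighting each sample before summarising keeps the correlation between areas within a draw, which averaging pre-computed quantiles would lose.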


Use model shell dataset to estimate empirical circumcision rates

Description

Takes the shell dataset output by create_shell_dataset, with a row for every unique area ID, area name, year and circumcision age in survey data, and returns the empirical circumcision rates for each row, aggregated from single ages to age groups. Also converts from wide format to long format.

Usage

threemc_empirical_rates(
  out,
  areas,
  area_lev,
  populations,
  age_groups = c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39",
    "40-44", "45-49", "50-54", "54-59", "0+", "10+", "15+", "15-24", "10-24", "15-29",
    "10-29", "15-39", "10-39", "15-49", "10-49")
)

Arguments

out

Shell dataset outputted by create_shell_dataset

areas

sf shapefiles for specific country/region.

area_lev

Desired admin boundary level to perform the analysis on.

populations

data.frame containing populations for each region in the TMB fits.

age_groups

Age groups to aggregate by, Default: c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "54-59", "0+", "10+", "15+", "15-24", "10-24", "15-29", "10-29", "15-39", "10-39", "15-49", "10-49")

See Also

create_shell_dataset
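When single-age empirical rates are aggregated to age groups, a natural approach is to sum events and person-years within each group before dividing, rather than averaging single-age rates. A minimal sketch under that assumption, with hypothetical toy columns (age, mmc events, py person-years) and a hypothetical lookup of age-group bounds:

```r
# Hypothetical single-age empirical counts for one area and year
single <- data.frame(
  age = 10:19,
  mmc = c(0, 1, 2, 2, 3, 5, 6, 4, 3, 2),
  py  = c(100, 98, 96, 93, 90, 86, 80, 75, 71, 68)
)

# Hypothetical age-group bounds matching labels such as "10-14"
groups <- data.frame(age_group = c("10-14", "15-19", "10-19"),
                     lower = c(10, 15, 10), upper = c(14, 19, 19))

# Sum events and person-years within each (possibly overlapping) group
rates <- do.call(rbind, lapply(seq_len(nrow(groups)), function(i) {
  in_group <- single$age >= groups$lower[i] & single$age <= groups$upper[i]
  data.frame(
    age_group = groups$age_group[i],
    mmc = sum(single$mmc[in_group]),
    py  = sum(single$py[in_group])
  )
}))
rates$rate_mmc <- rates$mmc / rates$py
```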


Produce TMB model fit with sample, or re-sample from existing optimised model fit.

Description

Optimises threemc objective function and produces samples from model fit (if so desired). If provided with an existing optimised model fit, can also perform re-sampling.

Usage

threemc_fit_model(
  fit = NULL,
  dat_tmb = NULL,
  mod = NULL,
  parameters = NULL,
  maps = NULL,
  randoms = c("u_time_mmc", "u_age_mmc", "u_space_mmc", "u_agetime_mmc",
    "u_agespace_mmc", "u_spacetime_mmc", "u_age_tmc", "u_space_tmc", "u_agespace_tmc"),
  sample = TRUE,
  smaller_fit_obj = FALSE,
  sdreport = FALSE,
  N = 1000,
  verbose = TRUE,
  ...
)

Arguments

fit

Optional "small" fit object with no sample. Specifying fit means you do not need to specify dat_tmb or parameters, as argument specifications will be overridden by those stored in fit.

dat_tmb

list of data required for model fitting, outputted by threemc_prepare_model_data, which includes:

  • design_matricesIncludesX_fixed_mmc, X_fixed_tmc, X_time_mmc, X_age_mmc, X_age_tmc, X_space_mmc, X_space_tmc, X_agetime_mmc, X_agespace_mmc, X_agespace_tmc, X_spacetime_mmc. Design Create design matrices for fixed effects and temporal, age, space and interaction random effects

  • integration matricesIncludes IntMat1, IntMat2. Integration matrices for selecting the instantaneous hazard rate.

  • survival matricesIncludes A_mmc, A_tmc, A_mc, B, C. Survival matrices for MMC, TMC, censored and left censored

  • Q_spacePrecision/Adjacency matrix for the spatial random effects.

mod

TMB model, either "Surv_SpaceAgeTime_ByType_withUnknownType" or, if the surveys for the country in question make no distinction between circumcision types (i.e. whether circumcisions were performed in a medical or traditional setting), "Surv_SpaceAgeTime".

parameters

list of fixed and random model parameters.

maps

list of factors with value NA, the names of which indicate parameters to be kept fixed at their initial value throughout the optimisation process.

randoms

Character vector naming the random effects.

sample

If TRUE, the function also returns N samples for medical, traditional and total circumcisions, Default: TRUE

smaller_fit_obj

If TRUE, returns a smaller fit object, which is useful for saving the fit object for later aggregations, Default: FALSE

sdreport

If set to TRUE, produces the standard deviation report for the model, Default: FALSE

N

Number of samples to be generated, Default: 1000

verbose

Logical; whether to print detailed updates on function operations and progress, Default: TRUE

...

Further arguments passed to internal functions.

Value

TMB model fit, including optimised parameters, hessian matrix, samples and standard deviation report (if desired).
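The maps argument follows TMB's map convention, in which a parameter mapped to a factor of NA values is held at its initial value during optimisation. A hypothetical helper for building such a list is sketched below; the parameter names are placeholders, not threemc's actual parameter names.

```r
# Placeholder initial parameter list (names are hypothetical)
parameters <- list(
  u_fixed_example   = c(-2, 0.5),
  logsigma_example  = 0,
  u_agetime_example = matrix(0, nrow = 3, ncol = 4)
)

# Build a TMB-style map: one NA factor entry per element of each fixed parameter
make_fixed_map <- function(parameters, fix) {
  stopifnot(all(fix %in% names(parameters)))
  lapply(parameters[fix], function(p) factor(rep(NA, length(p))))
}

maps <- make_fixed_map(parameters, c("logsigma_example", "u_agetime_example"))
```

Each factor has the same length as its parameter, so matrix-valued parameters are fixed element by element.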


Initialise threemc (hyper)parameters.

Description

Return initial values of the model (hyper)parameters for use in threemc_fit_model.

Usage

threemc_initial_pars(
  dat_tmb,
  custom_init = NULL,
  rw_order = NULL,
  rw_order_tmc_ar = FALSE,
  paed_age_cutoff = NULL,
  inc_time_tmc = FALSE
)

Arguments

dat_tmb

list of data required for model fitting, outputted by threemc_prepare_model_data, which includes:

  • design_matricesIncludesX_fixed_mmc, X_fixed_tmc, X_time_mmc, X_age_mmc, X_age_tmc, X_space_mmc, X_space_tmc, X_agetime_mmc, X_agespace_mmc, X_agespace_tmc, X_spacetime_mmc. Design Create design matrices for fixed effects and temporal, age, space and interaction random effects

  • integration matricesIncludes IntMat1, IntMat2. Integration matrices for selecting the instantaneous hazard rate.

  • survival matricesIncludes A_mmc, A_tmc, A_mc, B, C. Survival matrices for MMC, TMC, censored and left censored

  • Q_spacePrecision/Adjacency matrix for the spatial random effects.

custom_init

Named list of custom fixed and random model parameter values which supersede the "hardcoded" defaults, Default: NULL.

rw_order

Order of the random walk used for temporal precision matrix. Setting to NULL assumes you wish to specify an AR 1 temporal prior. Default: NULL

rw_order_tmc_ar

Whether to use an AR 1 temporal prior for TMC, regardless of whether you are using an RW temporal prior for MMC, Default: FALSE

paed_age_cutoff

Age at which to split MMC design matrices between paediatric and non-paediatric populations, the former of which are constant over time. Set to NULL if not desired, Default: NULL

inc_time_tmc

Indicator variable which decides whether to include temporal random effects for TMC as well as MMC, Default: FALSE

Value

Named list of initial (hyper)parameters for threemc_fit_model
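For reference, a random walk prior of order k on n time points has the (rank-deficient) structure matrix Q = D'D, where D is the k-th order difference matrix. The sketch below builds it in base R; how threemc constructs its temporal precision matrices (including any scaling, and the AR 1 alternative) is not shown here and may differ.

```r
# Structure matrix of a random walk of order k on n equally spaced points:
# Q = t(D) %*% D, where D takes k-th order differences
rw_precision <- function(n, k = 1) {
  D <- diff(diag(n), differences = k)  # (n - k) x n difference matrix
  crossprod(D)                          # equivalent to t(D) %*% D
}

Q1 <- rw_precision(5, k = 1)
Q2 <- rw_precision(5, k = 2)
```

Q1 has rank n - 1 (constant vectors lie in its null space) and Q2 has rank n - 2 (linear trends are unpenalised), which is why RW priors are usually paired with a sum-to-zero constraint.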


Posterior Predictive Distribution and checks on OOS survey

Description

Aggregate specified numeric columns by population-weighted age groups (rather than single year ages), split by specified categories.

Usage

threemc_ppc(
  fit,
  out,
  survey_circumcision_test,
  areas = NULL,
  area_lev = 1,
  type = "MMC",
  age_groups = c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39",
    "40-44", "45-49", "50-54", "54-59"),
  CI_range = c(0.5, 0.8, 0.95),
  N = 1000,
  seed = 123
)

Arguments

fit

Fit object returned by naomi::sample_tmb, which includes, among other things, the optimised parameters and subsequent sample for our TMB model.

out

Results of model fitting (at specified model area_lev) outputted by compute_quantiles.

survey_circumcision_test

survey_circumcision dataset loaded with read_circ_data. Do not preprocess with prepare_survey_data. If performing an out-of-sample (OOS) validation of model performance, you should filter this dataset for the years "held back" from your model fit.

areas

sf shapefile for specific country/region. Only required if survey_circumcision_test has records for area levels higher (i.e. more granular) than area_lev, in which case they must be reassigned to their parent_area_id at area_lev, Default = NULL.

area_lev

Area level you wish to aggregate to when performing posterior predictive comparisons with survey estimates.

type

Decides type of circumcision coverage to perform PPC on, must be one of "MC", "MMC", or "TMC", Default = "MMC"

age_groups

Age groups to aggregate by, Default: c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "54-59")

CI_range

Widths of the CIs at which to compare empirical and posterior predictive estimates for left-out surveys, Default = c(0.5, 0.8, 0.95)

N

Number of samples to generate, Default: 1000

seed

Random seed used for taking binomial sample from posterior predictive distribution.

Value

data.frame with samples aggregated by aggr_cols and weighted by population.
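A posterior predictive check of this kind can be sketched as follows: for each posterior sample of coverage p, draw a binomial count for the survey sample size, then ask whether the observed survey proportion falls inside the central interval of the predictive distribution for each width in CI_range. The toy numbers and the helper are hypothetical, and the handling of survey weights and age-group aggregation in threemc_ppc is not reproduced.

```r
set.seed(123)

# Hypothetical posterior samples of MMC coverage for one area-age group,
# and the corresponding held-out survey estimate
p_samples <- rbeta(1000, 30, 70)  # posterior draws of coverage
n_survey  <- 250                  # survey sample size for this group
observed  <- 0.31                 # observed survey proportion

# Posterior predictive draws of the survey proportion
pred <- rbinom(length(p_samples), size = n_survey, prob = p_samples) / n_survey

# Is the observation inside each central predictive interval?
within_ci <- function(pred, observed, CI_range = c(0.5, 0.8, 0.95)) {
  vapply(CI_range, function(w) {
    bounds <- quantile(pred, probs = c((1 - w) / 2, 1 - (1 - w) / 2), names = FALSE)
    observed >= bounds[1] && observed <= bounds[2]
  }, logical(1))
}

within_ci(pred, observed)
```

Across many held-out groups, the share of observations inside each interval can be compared with its nominal width.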


Posterior Predictive Distribution and checks on OOS survey

Description

Aggregate specified numeric columns by population-weighted age groups (rather than single year ages), split by specified categories, using an alternative method to threemc_ppc.

Usage

threemc_ppc2(
  fit,
  out,
  survey_circumcision_test,
  areas = NULL,
  area_lev = 1,
  age_groups = c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39",
    "40-44", "45-49", "50-54", "54-59"),
  N = 1000,
  seed = 123
)

Arguments

fit

Fit object returned by naomi::sample_tmb, which includes, among other things, the optimised parameters and subsequent sample for our TMB model.

out

Results of model fitting (at specified model area_lev) outputted by compute_quantiles.

survey_circumcision_test

survey_circumcision dataset loaded with read_circ_data. Do not preprocess with prepare_survey_data. If performing an out-of-sample (OOS) validation of model performance, you should filter this dataset for the years "held back" from your model fit.

areas

sf shapefile for specific country/region. Only required if survey_circumcision_test has records for area levels higher (i.e. more granular) than area_lev, in which case they must be reassigned to their parent_area_id at area_lev, Default = NULL.

area_lev

Area level you wish to aggregate to when performing posterior predictive comparisons with survey estimates.

age_groups

Age groups to aggregate by, Default: c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "54-59")

N

Number of samples to generate, Default: 1000

seed

Random seed used for taking binomial sample from posterior predictive distribution.

type

Decides type of circumcision coverage to perform PPC on, must be one of "MC", "MMC", or "TMC", Default = "MMC"

CI_range

Widths of the CIs at which to compare empirical and posterior predictive estimates for left-out surveys, Default = c(0.5, 0.8, 0.95)

Value

data.frame with samples aggregated by aggr_cols and weighted by population.


Produce Data Matrices for Modelling

Description

Create data for modelling. Output detailed below.

Usage

threemc_prepare_model_data(
  out,
  areas,
  area_lev = NULL,
  aggregated = TRUE,
  weight = "population",
  k_dt_age = 5,
  k_dt_time = NULL,
  paed_age_cutoff = NULL,
  rw_order = NULL,
  inc_time_tmc = FALSE,
  type_info = NULL,
  ...
)

Arguments

out

Shell dataset (output by create_shell_dataset) with a row for every unique record in circumcision survey data for a given area. Also includes empirical circumcision rate estimates for each unique record.

areas

sf shapefiles for specific country/region.

area_lev

PSNU area level for specific country.

aggregated

aggregated = FALSE treats every area_id as its own object, allowing for the use of surveys for lower area hierarchies. aggregated = TRUE means only the area level of interest is considered.

weight

Variable by which to weight circumcisions when aggregating for lower area hierarchies (only applicable for aggregated = TRUE)

k_dt_age

Age knot spacing in spline definitions, Default: 5

k_dt_time

Time knot spacing in spline definitions, set to NULL to disable temporal splines, Default: NULL

paed_age_cutoff

Age at which to split MMC design matrices between paediatric and non-paediatric populations, the former of which are constant over time. Set to NULL if not desired, Default: NULL

rw_order

Order of the random walk used for temporal precision matrix. Setting to NULL assumes you wish to specify an AR 1 temporal prior. Default: NULL

inc_time_tmc

Indicator variable which decides whether to include temporal random effects for TMC as well as MMC, Default: FALSE

...

Additional arguments to be passed to functions which create matrices.

Value

list of data required for model fitting, including:

  • design_matrices: Includes X_fixed_mmc, X_fixed_tmc, X_time_mmc, X_age_mmc, X_age_tmc, X_space_mmc, X_space_tmc, X_agetime_mmc, X_agespace_mmc, X_agespace_tmc, X_spacetime_mmc. Design matrices for fixed effects and temporal, age, space and interaction random effects.

  • integration matrices: Includes IntMat1, IntMat2. Integration matrices for selecting the instantaneous hazard rate.

  • survival matrices: Includes A_mmc, A_tmc, A_mc, B, C. Survival matrices for MMC, TMC, censored and left-censored records.

  • Q_space: Precision/adjacency matrix for the spatial random effects.
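The k_dt_age argument sets the spacing between spline knots over age. As an illustration of that idea only, the sketch below builds a cubic B-spline basis for single ages 0 to 59 with knots every 5 years using splines::splineDesign; whether threemc uses this exact construction (knot placement, spline order, any penalty) is an assumption, and the helper is hypothetical.

```r
library(splines)

# Cubic B-spline basis over single ages with equally spaced knots
age_spline_basis <- function(ages, k_dt = 5, ord = 4) {
  # Extend knots beyond the data range so the basis covers every age
  knots <- seq(min(ages) - (ord - 1) * k_dt,
               max(ages) + ord * k_dt,
               by = k_dt)
  splineDesign(knots = knots, x = ages, ord = ord)
}

X_age <- age_spline_basis(0:59, k_dt = 5)
dim(X_age)
```

Each row sums to 1 (the B-spline partition of unity), so a smaller k_dt_age gives more basis columns and a more flexible age pattern.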