Title: | (Matt's) Multi-Level Model of Male Circumcision in Sub-Saharan Africa |
---|---|
Description: | Functions and datasets to support, and extend to other Sub-Saharan African countries, Thomas, M. et al., 2021, A multi-level model for estimating region-age-time-type specific male circumcision coverage from household survey and health system data in South Africa, <arXiv:2108.091422>. |
Authors: | Matthew Thomas [aut] , Jeffrey Imai-Eaton [aut] , Patrick O'Toole [cre] , Imperial College of Science, Technology and Medicine [cph] |
Maintainer: | Patrick O'Toole <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.45 |
Built: | 2024-10-10 05:46:28 UTC |
Source: | https://github.com/mrc-ide/threemc |
Calculate quantiles for samples of rates and cumulative hazard output by threemc_fit_model, and add them as columns to the shell data.frame out with estimated empirical circumcision rates.
compute_quantiles( out, fit, area_lev = NULL, probs = c(0.025, 0.5, 0.975), names = FALSE, ... )
out |
Shell dataset with a row for every unique record in circumcision survey data for a given area. Also includes empirical circumcision estimates for each unique record. |
fit |
Optional "small" fit object with no |
area_lev |
PSNU area level for specific country. |
probs |
Specific quantiles to be calculated, Default: c(0.025, 0.5, 0.975) |
names |
Parameter with quantile: logical; if true, the result has a names attribute. Set to FALSE for speedup with many probs, Default: FALSE |
... |
Further arguments passed to quantile. |
Input out data.frame, including columns with quantiles for hazard rates etc. for different circumcision types, and for overall circumcision.
threemc_fit_model
quantile
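The quantile step can be illustrated in base R. This is a minimal sketch, assuming the samples are held as a matrix with one row per shell-dataset record and one column per draw; the objects `samples` and `shell` and the output column names are hypothetical and do not reflect threemc's internal structure.

```r
# Hypothetical samples: 3 records (rows) x 1000 draws (columns)
set.seed(1)
samples <- matrix(rbeta(3 * 1000, 2, 8), nrow = 3)
shell <- data.frame(area_id = c("A", "B", "C"))

probs <- c(0.025, 0.5, 0.975)

# apply() returns one column per record, so transpose to one row per record.
# names = FALSE skips building quantile names, which is faster for many probs.
q <- t(apply(samples, 1, quantile, probs = probs, names = FALSE))
colnames(q) <- c("lower", "median", "upper")

# Append the quantile columns to the shell data.frame
shell_q <- cbind(shell, q)
```

Each row of `shell_q` then carries the lower bound, median and upper bound for its record.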
Function to recursively create directories if any of the directories in a provided path are missing. Similar to mkdir -p from Bash.
create_dirs_r(dir_path)
dir_path |
Path to a file or directory which you want to generate. |
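The behaviour described above resembles what base R offers through `dir.create(recursive = TRUE)`. The sketch below uses a hypothetical helper, `make_dirs`, and assumes that a path ending in a file extension refers to a file whose parent directory should be created; it is not the package's implementation.

```r
# Hypothetical helper, not the package implementation
make_dirs <- function(dir_path) {
  # Assumption: a path with a file extension is a file, so create its parent
  has_ext <- nzchar(tools::file_ext(dir_path))
  target <- if (has_ext) dirname(dir_path) else dir_path
  if (!dir.exists(target)) {
    # recursive = TRUE creates every missing directory along the path
    dir.create(target, recursive = TRUE)
  }
  invisible(target)
}

tmp <- file.path(tempdir(), "a", "b", "results.csv")
made <- make_dirs(tmp)
```

Only the directories are created; the file itself is left for the caller to write.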
Create a shell dataset with a row for every unique area ID, area name, year and circumcision age in survey data. Also computes the empirical number of person-years until circumcision and the number of people circumcised for several "types" of circumcision: known medical circumcisions, known traditional circumcisions, censored survey entries (i.e. where surveyed individuals had not been circumcised) and left-censored survey entries (i.e. where circumcision occurred at an unknown age).
create_shell_dataset( survey_circumcision, populations, areas, area_lev = NULL, start_year, end_year = 2021, time1 = "time1", time2 = "time2", strat = "space", age = "age", circ = "indweight_st", ... )
survey_circumcision |
|
populations |
|
areas |
|
area_lev |
|
start_year |
First year in shell dataset. |
end_year |
Last year in shell dataset, which is also the year to forecast/model until, Default: 2021 |
time1 |
Variable name for time of birth, Default: "time1" |
time2 |
Variable name for time circumcised or censored, Default: "time2" |
strat |
Variable to stratify by in using a 3D hazard function, Default: "space" |
age |
|
circ |
Variables with circumcision matrix, Default: "indweight_st" |
... |
Further arguments passed to or from other methods. |
data.frame with a row for every unique record in survey_circumcision for a given area. Also includes empirical circumcision estimates for each unique record.
datapack_psnu_area_level
crossing
create_integration_matrix_agetime
create_hazard_matrix_agetime
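The skeleton of such a shell dataset (one row per area, year and circumcision age) can be sketched with base R's `expand.grid`, which plays the same role as the `crossing` function listed above. The area IDs, year range and maximum age are hypothetical, and the person-year and circumcision counts that the real function adds are omitted.

```r
# Hypothetical inputs
area_ids <- c("MWI_1_1", "MWI_1_2")
start_year <- 2006
end_year <- 2008
max_age <- 2

# One row for every area-year-age combination
shell <- expand.grid(
  area_id = area_ids,
  year = start_year:end_year,
  circ_age = 0:max_age,
  stringsAsFactors = FALSE
)

# Sort so that records for the same area and year sit together
shell <- shell[order(shell$area_id, shell$year, shell$circ_age), ]
rownames(shell) <- NULL
```

The result has 2 areas x 3 years x 3 ages = 18 rows, onto which empirical counts could then be joined.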
PSNU area levels for Sub-Saharan African countries. These are the recommended levels at which to perform modelling etc., for each respective country. Inferences on larger regions (i.e. lower PSNU area levels) can be made by aggregating results for higher area levels. The dataset contains the following fields:
iso3
character ISO3 codes for Sub-Saharan African
countries.
psnu_area_level
integer The sub national level considered
to be the organizational level in which a country has prioritised their
program. Increasing values refer to more granular regional distinctions.
data(datapack_psnu_area_level)
A data.frame
with 29 rows and 2 variables:
sf
shapefile representation of Malawi, as a multipolygon.
iso3
character ISO3 codes for Sub-Saharan African
countries.
area_id
Unique ID for each region in MWI. Formatted as
"Country_area_level_ID" (e.g. MWI_3_05 for Mzimba)
area_name
Name of region in question
parent_area_id
Unique ID for region's parent region.
area_level
Numeric value denoting area level of
area, in decreasing granularity.
area_level_label
Translates numeric area level to meaning
in country in question. For example, in Malawi a region of area level 3
is a "District".
area_sort_order
Order to sort areas in when plotting,
roughly equivalent to a geofacet
grid.
center_x
X coordinate for centre of region's multipolygon
center_y
Y coordinate for centre of region's multipolygon
geometry
sfc_MULTIPOLYGON
representation of region's
spatial geometry
data(demo_areas)
An sf collection of 6 features and 9 fields, including a data.frame with 387 rows, 10 variables, and a sf
Single age, aggregated male populations for each area in Malawi.
iso3
character ISO3 codes for Sub-Saharan African
countries.
area_id
Unique ID for each region in MWI. Formatted as
"Country_area_level_ID" (e.g. MWI_3_05 for Mzimba)
area_level
Numeric value denoting area level of
area, in decreasing granularity.
area_name
Name of region in question
year
Year for population in question
age
Age for population in question
population
(Male) Population for each unique
area-year-age combination.
data(demo_populations)
A data.frame
with 58806 rows and 7 variables.
Circumcision surveys for Malawi.
iso3
character ISO3 codes for Sub-Saharan African
countries.
survey_id
Survey id for each record.
area_id
Unique ID for each region in MWI. Formatted as
"Country_area_level_ID" (e.g. MWI_3_05 for Mzimba)
area_level
Numeric value denoting area level of
area, in decreasing granularity.
age
Age at interview.
dob_cmc
CMC (Century Month Code) date of birth of individual.
interview_cmc
CMC date of interview.
indweight
Weighting for survey record in question
circ_status
Circumcision status of individual, 1 indicating
circumcision and 0 indicating right-censoring.
circ_age
Age at circumcision, if applicable.
circ_who
Circumcision provider, either medical or
traditional.
circ_where
Circumcision location, either medical or
traditional.
data(demo_survey_circumcision)
A data.frame
with 29313 rows and 12 variables.
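The dob_cmc and interview_cmc fields use century month codes, which by the standard DHS convention count months since the start of 1900, so that CMC = 12 x (year - 1900) + month. The helpers below (`date_to_cmc`, `cmc_to_date`) are hypothetical names for illustration; whether threemc performs the conversion in exactly this way is an assumption.

```r
# Hypothetical helpers based on the standard DHS definition of CMC
date_to_cmc <- function(year, month) {
  12 * (year - 1900) + month
}

cmc_to_date <- function(cmc) {
  # Subtract 1 so that December (month 12) stays in the correct year
  data.frame(
    year = 1900 + (cmc - 1) %/% 12,
    month = (cmc - 1) %% 12 + 1
  )
}

dob_cmc <- date_to_cmc(1990, 6)
interview_cmc <- date_to_cmc(2015, 9)

# Age in completed years at interview
age_at_interview <- (interview_cmc - dob_cmc) %/% 12
```

Working in CMC keeps date arithmetic in whole months, so ages and times to circumcision reduce to integer subtraction.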
Western and Central Africa (WCA) - Eastern and Southern Africa (ESA) categorisation for Sub-Saharan African countries. Also includes North-South-East-West categorisation.
iso3
character ISO3 codes for Sub-Saharan African
countries.
region
character ESA-WCA categorisation for each
iso3
four_region
character North-South-East-West categorisation
for each iso3
data(esa_wca_regions)
A data.frame
with 38 rows and 3 variables.
Return minimised fit object. Often useful when saving the fit object for later aggregation.
minimise_fit_obj(fit, dat_tmb, parameters)
fit |
Optional "small" fit object with no |
dat_tmb |
|
parameters |
|
Object of class "naomi_fit".
Prepare survey data required to run the circumcision model. Can also optionally apply normalise_weights_kish, to normalise survey weights and apply Kish coefficients.
prepare_survey_data( areas, survey_circumcision, survey_individuals = NULL, survey_clusters = NULL, area_lev, start_year = 2006, cens_year = NULL, cens_age = 59, rm_missing_type = FALSE, norm_kisk_weights = TRUE, strata.norm = c("survey_id", "area_id"), strata.kish = c("survey_id") )
areas |
|
survey_circumcision |
|
survey_individuals |
|
survey_clusters |
|
area_lev |
|
start_year |
|
cens_year |
|
cens_age |
|
rm_missing_type |
|
norm_kisk_weights |
|
strata.norm |
Stratification variables for normalising survey weights, Default: c("survey_id", "area_id") |
strata.kish |
Stratification variables for estimating and applying the Kish coefficients, Default: "survey_id" |
Survey data with required variables to run circumcision model.
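The details of normalise_weights_kish are not given here, so the following is only a sketch of the textbook quantities involved: weights normalised to sum to one within a stratum, and the Kish effective sample size, (sum of w)^2 / sum of w^2. The data and the column `w_norm` are hypothetical, and stratifying by `survey_id` alone is an assumption made for brevity.

```r
# Hypothetical survey weights for two surveys
survey <- data.frame(
  survey_id = c("S1", "S1", "S1", "S2", "S2"),
  indweight = c(1, 2, 3, 4, 4)
)

# Normalise weights so that they sum to one within each survey
survey$w_norm <- ave(
  survey$indweight, survey$survey_id,
  FUN = function(w) w / sum(w)
)

# Kish effective sample size per survey: (sum w)^2 / sum(w^2)
n_eff <- tapply(
  survey$indweight, survey$survey_id,
  function(w) sum(w)^2 / sum(w^2)
)
```

Equal weights give an effective sample size equal to the number of records, while unequal weights reduce it.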
Function to read in circumcision data to fit model. Handles csv with fread (but outputs data as a data.frame), and geographical data with read_sf (for which it also adds unique identifiers for each area_level).
read_circ_data(path, filters = NULL, selected = NULL, ...)
path |
Path to data. |
filters |
Optional named vector, whose values dictate the values filtered for in the corresponding column names. Only supports filtering for one value for each column. default: NULL |
selected |
Optional columns to select, removing others, default = NULL |
... |
Further arguments passed to or from other methods. |
relevant data set, filtered as desired.
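The filters and selected arguments can be pictured with a small base R helper. `filter_select` is a hypothetical stand-in that only mimics the documented behaviour (one value per filtered column, optional column selection) on an in-memory data.frame; it does not read files and is not the package's code.

```r
# Hypothetical helper mimicking the documented `filters` / `selected` behaviour
filter_select <- function(df, filters = NULL, selected = NULL) {
  # Each name in `filters` is a column; its value is the single value to keep
  for (col in names(filters)) {
    df <- df[which(df[[col]] == filters[[col]]), , drop = FALSE]
  }
  # Keep only the requested columns, if any were given
  if (!is.null(selected)) {
    df <- df[, selected, drop = FALSE]
  }
  df
}

dat <- data.frame(
  iso3 = c("MWI", "MWI", "ZAF"),
  area_level = c(0, 1, 0),
  n = 1:3
)
res <- filter_select(dat, filters = c(iso3 = "MWI"), selected = c("iso3", "n"))
```

Leaving both arguments as NULL returns the data unchanged.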
Create data frame of all ages within provided age group. Convert survey coverage points & dmppt2 data to match convention of aggregated results.
survey_points_dmppt2_convert_convention(.data)
.data |
Data frame with either survey calculated coverage, with associated error bounds, or DMPPT2 coverage estimates calculated from VMMC programme data. |
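Creating a data frame of all ages within an age group amounts to expanding labels such as "10-14" or "15+" into single years. The helper below, `expand_age_group`, is hypothetical, and treating open-ended groups as running up to a chosen `max_age` is an assumption for illustration.

```r
# Hypothetical helper: expand an age-group label into single ages
expand_age_group <- function(age_group, max_age = 59) {
  if (grepl("\\+$", age_group)) {
    # Open-ended group such as "15+": assume it runs up to max_age
    lower <- as.integer(sub("\\+$", "", age_group))
    upper <- max_age
  } else {
    # Closed group such as "10-14"
    bounds <- as.integer(strsplit(age_group, "-", fixed = TRUE)[[1]])
    lower <- bounds[1]
    upper <- bounds[2]
  }
  data.frame(age_group = age_group, age = lower:upper)
}

ages <- do.call(
  rbind,
  lapply(c("10-14", "15+"), expand_age_group, max_age = 19)
)
```

Joining such a table to single-age results is one way to bring survey points and model output onto the same age-group convention.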
Aggregate by area, year, age and type (weighted by population), and convert to a percentage/probability.
threemc_aggregate( .data, fit, areas, populations, age_var = c("age", "age_group"), type = c("probability", "incidence", "prevalence"), area_lev, N = 100, prev_year = 2008, probs = c(0.025, 0.5, 0.975), ... )
.data |
|
fit |
|
areas |
|
populations |
|
age_var |
Determines whether you wish to aggregate by discrete ages or age groups (0-4, 5-9, 10-14, and so on). |
type |
Determines which aspect of MC in the regions in question you wish to aggregate for. Can be one of "probability", "incidence" or "prevalence". |
area_lev |
PSNU area level for specific country. |
N |
Number of samples to be generated, Default: 100 |
prev_year |
If type == "prevalence", choose year to compare prevalence with. |
probs |
Percentiles to provide quantiles at. Set to NULL to skip computing quantiles. |
... |
Further arguments to internal functions. |
data.frame with samples aggregated by aggr_cols and weighted by population.
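Population weighting when aggregating can be sketched in base R: convert each area's coverage to a count of circumcised men, sum counts and populations over the parent area, then divide. The data and column names below are hypothetical, and the real function works on posterior samples rather than single point values.

```r
# Hypothetical coverage for three child areas within two parent areas
dat <- data.frame(
  parent_area_id = c("P1", "P1", "P2"),
  year = 2020,
  population = c(100, 300, 50),
  coverage = c(0.2, 0.6, 0.4)
)

# Convert coverage to a count so that larger areas carry more weight
dat$n_circ <- dat$coverage * dat$population

# Sum counts and populations within each parent area and year
agg <- aggregate(
  cbind(n_circ, population) ~ parent_area_id + year,
  data = dat, FUN = sum
)

# Population-weighted coverage for each parent area
agg$coverage <- agg$n_circ / agg$population
```

For P1 this gives (20 + 180) / 400 = 0.5, rather than the unweighted mean of 0.4.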
Takes the shell dataset with a row for every unique area ID, area name, year and circumcision age in survey data output by create_shell_dataset and returns the empirical circumcision rates for each row, aggregated to age groups from single ages. Also converts from wide format to long format.
threemc_empirical_rates( out, areas, area_lev, populations, age_groups = c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "54-59", "0+", "10+", "15+", "15-24", "10-24", "15-29", "10-29", "15-39", "10-39", "15-49", "10-49") )
out |
Shell dataset outputted by create_shell_dataset |
areas |
|
area_lev |
|
populations |
|
age_groups |
Age groups to aggregate by, Default: c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "54-59", "0+", "10+", "15+", "15-24", "10-24", "15-29", "10-29", "15-39", "10-39", "15-49", "10-49") |
Optimises threemc objective function and produces samples from model fit (if so desired). If provided with an existing optimised model fit, can also perform re-sampling.
threemc_fit_model( fit = NULL, dat_tmb = NULL, mod = NULL, parameters = NULL, maps = NULL, randoms = c("u_time_mmc", "u_age_mmc", "u_space_mmc", "u_agetime_mmc", "u_agespace_mmc", "u_spacetime_mmc", "u_age_tmc", "u_space_tmc", "u_agespace_tmc"), sample = TRUE, smaller_fit_obj = FALSE, sdreport = FALSE, N = 1000, verbose = TRUE, ... )
fit |
Optional "small" fit object with no |
dat_tmb |
|
mod |
TMB model, either "Surv_SpaceAgeTime_ByType_withUnknownType", or "Surv_SpaceAgeTime" if the surveys for the country in question make no distinction between circumcision types (i.e. whether they were performed in a medical or traditional setting). |
parameters |
|
maps |
|
randoms |
|
sample |
If set to TRUE, has function also return N samples for medical, traditional and total circumcisions, Default: TRUE |
smaller_fit_obj |
Returns a smaller fit object. Useful for saving the fit object for later aggregations. |
sdreport |
If set to TRUE, produces the standard deviation report for the model, Default: FALSE |
N |
Number of samples to be generated, Default: 1000 |
verbose |
Boolean specifying whether you want detailed updates on function operations and progress, default = TRUE |
... |
Further arguments passed to internal functions. |
TMB model fit, including optimised parameters, hessian matrix, samples and standard deviation report (if desired).
threemc (hyper)parameters. Return minimised fit object. Often useful when saving the fit object for later aggregation.
threemc_initial_pars( dat_tmb, custom_init = NULL, rw_order = NULL, rw_order_tmc_ar = FALSE, paed_age_cutoff = NULL, inc_time_tmc = FALSE )
dat_tmb |
|
custom_init |
named |
rw_order |
Order of the random walk used for temporal precision matrix. Setting to NULL assumes you wish to specify an AR 1 temporal prior. Default: NULL |
rw_order_tmc_ar |
Whether to use an AR 1 temporal prior for TMC, regardless of whether you are using a RW temporal prior for TMC or not, Default: FALSE |
paed_age_cutoff |
Age at which to split MMC design matrices between paediatric and non-paediatric populations, the former of which are constant over time. Set to NULL if not desired, Default: NULL |
inc_time_tmc |
Indicator variable which decides whether to include temporal random effects for TMC as well as MMC, Default: FALSE |
Named list of initial (hyper)parameters for threemc_fit_model.
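One plausible reading of custom_init is a named list whose entries override default starting values; this is an assumption, since the argument description above is incomplete. Under that assumption, the override can be sketched with `utils::modifyList`. The parameter names and values below are hypothetical placeholders, not the package's actual defaults.

```r
# Hypothetical defaults; real parameter names depend on the TMB model used
default_pars <- list(
  logsigma_time_mmc = 0,
  logsigma_age_mmc = 0,
  u_fixed_mmc = rep(-5, 2)
)

# Hypothetical user-supplied overrides
custom_init <- list(logsigma_age_mmc = -1)

# Entries in custom_init replace defaults with the same name; others are kept
pars <- utils::modifyList(default_pars, custom_init)
```

Only `logsigma_age_mmc` changes here; every other entry keeps its default.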
Aggregate specified numeric columns by population-weighted age groups (rather than single year ages), split by specified categories.
threemc_ppc( fit, out, survey_circumcision_test, areas = NULL, area_lev = 1, type = "MMC", age_groups = c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "54-59"), CI_range = c(0.5, 0.8, 0.95), N = 1000, seed = 123 )
fit |
Fit object returned by |
out |
Results of model fitting (at specified model |
survey_circumcision_test |
|
areas |
|
area_lev |
Area level you wish to aggregate to when performing posterior predictive comparisons with survey estimates. |
type |
Decides type of circumcision coverage to perform PPC on, must be one of "MC", "MMC", or "TMC", Default = "MMC" |
age_groups |
Age groups to aggregate by, Default: c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "54-59") |
CI_range |
CI interval about which you want to compare empirical and posterior predictive estimates for left out surveys, Default = c(0.5, 0.8, 0.95) |
N |
Number of samples to generate, Default: 1000 |
seed |
Random seed used for taking binomial sample from posterior predictive distribution. |
data.frame with samples aggregated by aggr_cols and weighted by population.
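The arguments above (N, seed, CI_range) suggest a posterior predictive check of the following general shape: draw binomial samples from the posterior predictive distribution, then ask whether a held-out survey estimate falls inside each credible interval. The sketch below is a generic base R illustration under that assumption; the posterior draws, sample size and observed value are all hypothetical, and the package's own procedure may differ.

```r
set.seed(123)
N <- 1000

# Hypothetical posterior draws of coverage for one area and age group
p_draws <- rbeta(N, 30, 70)
n_surveyed <- 200   # hypothetical held-out survey sample size
observed <- 0.33    # hypothetical held-out survey coverage estimate

# Posterior predictive coverage: one binomial draw per posterior sample
pp <- rbinom(N, size = n_surveyed, prob = p_draws) / n_surveyed

CI_range <- c(0.5, 0.8, 0.95)

# For each interval width, check whether the observation lies inside it
within <- vapply(CI_range, function(ci) {
  alpha <- (1 - ci) / 2
  bounds <- quantile(pp, probs = c(alpha, 1 - alpha), names = FALSE)
  observed >= bounds[1] && observed <= bounds[2]
}, logical(1))
names(within) <- CI_range
```

Repeating this over many areas and age groups gives the share of observations covered by each interval, which can be compared with the nominal 50%, 80% and 95%.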
Aggregate specified numeric columns by population-weighted age groups (rather than single year ages), split by specified categories. Uses an alternative method to threemc_ppc.
threemc_ppc2( fit, out, survey_circumcision_test, areas = NULL, area_lev = 1, age_groups = c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "54-59"), N = 1000, seed = 123 )
fit |
Fit object returned by |
out |
Results of model fitting (at specified model |
survey_circumcision_test |
|
areas |
|
area_lev |
Area level you wish to aggregate to when performing posterior predictive comparisons with survey estimates. |
age_groups |
Age groups to aggregate by, Default: c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "54-59") |
N |
Number of samples to generate, Default: 1000 |
seed |
Random seed used for taking binomial sample from posterior predictive distribution. |
type |
Decides type of circumcision coverage to perform PPC on, must be one of "MC", "MMC", or "TMC", Default = "MMC" |
CI_range |
CI interval about which you want to compare empirical and posterior predictive estimates for left out surveys, Default = c(0.5, 0.8, 0.95) |
data.frame with samples aggregated by aggr_cols and weighted by population.
Create data for modelling. Output detailed below.
threemc_prepare_model_data( out, areas, area_lev = NULL, aggregated = TRUE, weight = "population", k_dt_age = 5, k_dt_time = NULL, paed_age_cutoff = NULL, rw_order = NULL, inc_time_tmc = FALSE, type_info = NULL, ... )
out |
Shell dataset (outputted by create_shell_dataset) with a row for every unique record in circumcision survey data for a given area. Also includes empirical circumcision estimates for each unique record. |
areas |
|
area_lev |
PSNU area level for specific country. |
aggregated |
|
weight |
variable to weigh circumcisions by when aggregating for
lower area hierarchies (only applicable for |
k_dt_age |
Age knot spacing in spline definitions, Default: 5 |
k_dt_time |
Time knot spacing in spline definitions, set to NULL to disable temporal splines, Default: NULL |
paed_age_cutoff |
Age at which to split MMC design matrices between paediatric and non-paediatric populations, the former of which are constant over time. Set to NULL if not desired, Default: NULL |
rw_order |
Order of the random walk used for temporal precision matrix. Setting to NULL assumes you wish to specify an AR 1 temporal prior. Default: NULL |
inc_time_tmc |
Indicator variable which decides whether to include temporal random effects for TMC as well as MMC, Default: FALSE |
... |
Additional arguments to be passed to functions which create matrices. |
list of data required for model fitting, including:
design_matrices
Includes X_fixed_mmc, X_fixed_tmc, X_time_mmc, X_age_mmc, X_age_tmc, X_space_mmc, X_space_tmc, X_agetime_mmc, X_agespace_mmc, X_agespace_tmc, X_spacetime_mmc. Design matrices for fixed effects and temporal, age, space and interaction random effects.
integration matrices
Includes IntMat1, IntMat2. Integration matrices for selecting the instantaneous hazard rate.
survival matrices
Includes A_mmc, A_tmc, A_mc, B, C. Survival matrices for MMC, TMC, censored and left censored.
Q_space
Precision/Adjacency matrix for the spatial random effects.
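The rw_order argument refers to the order of a random walk used for the temporal precision matrix. A common textbook construction of such a matrix is Q = D'D, where D is the difference matrix of the chosen order. The helper `rw_precision` below is a hypothetical sketch of that construction; whether threemc builds, scales or constrains its matrix in this way is not stated here.

```r
# Hypothetical helper: structure matrix of a random walk of a given order
rw_precision <- function(n, order = 1) {
  # diff() on the identity matrix gives the (n - order) x n difference matrix
  D <- diff(diag(n), differences = order)
  # Q = D'D penalises differences of the chosen order between time points
  crossprod(D)
}

Q1 <- rw_precision(5, order = 1)
Q2 <- rw_precision(5, order = 2)
```

The resulting matrix is rank-deficient by the order of the walk (rank n - order), which reflects that a first-order walk leaves the overall level unpenalised and a second-order walk also leaves a linear trend unpenalised.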