Title: | Bootstrapping for Propensity Score Analysis |
---|---|
Description: | It is often advantageous to test a hypothesis more than once in the context of propensity score analysis (Rosenbaum, 2012) <doi:10.1093/biomet/ass032>. The functions in this package facilitate bootstrapping for propensity score analysis (PSA). By default, bootstrapping using two classification tree methods (using 'rpart' and 'ctree' functions), two matching methods (using 'Matching' and 'MatchIt' packages), and stratification with logistic regression. A framework is described for users to implement additional propensity score methods. Visualizations are emphasized for diagnosing balance; exploring the correlation relationships between bootstrap samples and methods; and to summarize results. |
Authors: | Jason Bryer [aut, cre] |
Maintainer: | Jason Bryer <[email protected]> |
License: | GPL |
Version: | 1.3.8 |
Built: | 2025-01-15 03:10:21 UTC |
Source: | https://github.com/jbryer/psaboot |
Bootstrapping procedures for Propensity Score Analysis.
Convert the results of PSAboot summary to a data frame.
## S3 method for class 'PSAbootSummary' as.data.frame(x, row.names = NULL, optional = FALSE, ...)
## S3 method for class 'PSAbootSummary' as.data.frame(x, row.names = NULL, optional = FALSE, ...)
x |
results of |
row.names |
row names. |
optional |
unused. |
... |
unused. |
a data.frame.
This method provides some crude overall measures of balance.
balance(psaboot, na.rm = TRUE, pool.fun = mean)
balance(psaboot, na.rm = TRUE, pool.fun = mean)
psaboot |
results from |
na.rm |
should NAs be removed. NAs generally occur when there is insufficient sample for a particular covariate or an unused level. |
pool.fun |
a function specifying how the effect sizes across all covariates
should be combined. Possible values include |
a list with three elements:
named numeric vector with unadjusted effect size before adjustment for each covariate
a matrix with adjusted effect size for each covariate (columns) for each method (rows).
a matrix with mean adjusted effect size for all covariates for each method (columns) and each bootstrap sample (rows).
a list with an M x n covariates matrix for each method.
library(PSAboot) data(pisa.psa.cols) data(pisausa) bm.usa <- PSAboot(Tr = as.integer(pisausa$PUBPRIV) - 1, Y = pisausa$Math, X = pisausa[,pisa.psa.cols], control.ratio = 5, M = 100, seed = 2112) bm.usa.bal <- balance(bm.usa)
library(PSAboot) data(pisa.psa.cols) data(pisausa) bm.usa <- PSAboot(Tr = as.integer(pisausa$PUBPRIV) - 1, Y = pisausa$Math, X = pisausa[,pisa.psa.cols], control.ratio = 5, M = 100, seed = 2112) bm.usa.bal <- balance(bm.usa)
This function is function is primarily used by [PSAboot::balance()] and probably does not need to be called directly.
balance.matching(index.treated, index.control, covs)
balance.matching(index.treated, index.control, covs)
index.treated |
a vector with the index of treated rows in |
index.control |
a vector with the index of control rows in |
covs |
data frame or matrix of covariates. Factors should already be recoded.
See |
a named vector with one element per covariate.
Stratification using classification trees for bootstrapping.
boot.ctree(Tr, Y, X, X.trans, formu, minStrata = 5, ...)
boot.ctree(Tr, Y, X, X.trans, formu, minStrata = 5, ...)
Tr |
vector indicating treatment assignment. |
Y |
vector of outcome. |
X |
matrix or data frame of covariates. |
X.trans |
a data frame of |
formu |
the formula to use to estimate propensity scores. Note that the
dependent varaible (i.e. treatment varaible) name will be updated using
the |
minStrata |
minimum number of treatment or control units within a strata to include that strata. |
... |
other parameters passed from |
a list with three elements:
summary
a named numeric vector (with at minimum estimate
,
ci.min
, and ci.max
but other values allowed)
balance
a named numeric vector with one element per
covariate listed in X.trans
representing a balance statistic
(usually standardized effect size after adjustment)
details
an arbitrary object that contains the full results of the analysis
Matching package implementation for bootstrapping.
boot.matching(Tr, Y, X, X.trans, formu, estimand = "ATE", ...)
boot.matching(Tr, Y, X, X.trans, formu, estimand = "ATE", ...)
Tr |
vector indicating treatment assignment. |
Y |
vector of outcome. |
X |
matrix or data frame of covariates. |
X.trans |
a data frame of |
formu |
the formula to use to estimate propensity scores. Note that the
dependent varaible (i.e. treatment varaible) name will be updated using
the |
estimand |
character string for estimand, either ATE, ATT, or ATC. See
|
... |
other parameters passed to |
a list with three elements:
summary
a named numeric vector (with at minimum estimate
,
ci.min
, and ci.max
but other values allowed)
balance
a named numeric vector with one element per
covariate listed in X.trans
representing a balance statistic
(usually standardized effect size after adjustment)
details
an arbitrary object that contains the full results of the analysis
MatchIt package implementation for bootstrapping.
boot.matchit(Tr, Y, X, X.trans, formu, ...)
boot.matchit(Tr, Y, X, X.trans, formu, ...)
Tr |
vector indicating treatment assignment. |
Y |
vector of outcome. |
X |
matrix or data frame of covariates. |
X.trans |
a data frame of |
formu |
the formula to use to estimate propensity scores. Note that the
dependent varaible (i.e. treatment varaible) name will be updated using
the |
... |
other parameters passed from |
a list with three elements:
summary
a named numeric vector (with at minimum estimate
,
ci.min
, and ci.max
but other values allowed)
balance
a named numeric vector with one element per
covariate listed in X.trans
representing a balance statistic
(usually standardized effect size after adjustment)
details
an arbitrary object that contains the full results of the analysis
Stratification using classification trees for bootstrapping.
boot.rpart(Tr, Y, X, X.trans, formu, minStrata = 5, ...)
boot.rpart(Tr, Y, X, X.trans, formu, minStrata = 5, ...)
Tr |
vector indicating treatment assignment. |
Y |
vector of outcome. |
X |
matrix or data frame of covariates. |
X.trans |
a data frame of |
formu |
the formula to use to estimate propensity scores. Note that the
dependent varaible (i.e. treatment varaible) name will be updated using
the |
minStrata |
minimum number of treatment or control units within a strata to include that strata. |
... |
other parameters passed from |
a list with three elements:
summary
a named numeric vector (with at minimum estimate
,
ci.min
, and ci.max
but other values allowed)
balance
a named numeric vector with one element per
covariate listed in X.trans
representing a balance statistic
(usually standardized effect size after adjustment)
details
an arbitrary object that contains the full results of the analysis
Stratification implementation for bootstrapping.
boot.strata(Tr, Y, X, X.trans, formu, nstrata = 5, ...)
boot.strata(Tr, Y, X, X.trans, formu, nstrata = 5, ...)
Tr |
vector indicating treatment assignment. |
Y |
vector of outcome. |
X |
matrix or data frame of covariates. |
X.trans |
a data frame of |
formu |
the formula to use to estimate propensity scores. Note that the
dependent varaible (i.e. treatment varaible) name will be updated using
the |
nstrata |
number of strata to divide the propensity scores. |
... |
other parameters passed from |
a list with three elements:
summary
a named numeric vector (with at minimum estimate
,
ci.min
, and ci.max
but other values allowed)
balance
a named numeric vector with one element per
covariate listed in X.trans
representing a balance statistic
(usually standardized effect size after adjustment)
details
an arbitrary object that contains the full results of the analysis
Propensity score weighting implementation for bootstrapping.
boot.weighting(Tr, Y, X, X.trans, formu, estimand = "ATE", ...)
boot.weighting(Tr, Y, X, X.trans, formu, estimand = "ATE", ...)
Tr |
vector indicating treatment assignment. |
Y |
vector of outcome. |
X |
matrix or data frame of covariates. |
X.trans |
a data frame of |
formu |
the formula to use to estimate propensity scores. Note that the
dependent varaible (i.e. treatment varaible) name will be updated using
the |
estimand |
which treatment effect to estimate. Values can be ATE, ATT, ATC, or ATM. |
... |
other parameters passed from |
a list with three elements:
summary
a named numeric vector (with at minimum estimate
,
ci.min
, and ci.max
but other values allowed)
balance
a named numeric vector with one element per
covariate listed in X.trans
representing a balance statistic
(usually standardized effect size after adjustment)
details
an arbitrary object that contains the full results of the analysis
Boxplot of PSA bootstrap results.
## S3 method for class 'PSAboot' boxplot( x, bootstrap.mean.color = "blue", bootstrap.ci.color = "green", bootstrap.ci.width = 0.5, bootstrap.ci.size = 3, overall.mean.color = "red", tufte = FALSE, coord.flip = TRUE, ... )
## S3 method for class 'PSAboot' boxplot( x, bootstrap.mean.color = "blue", bootstrap.ci.color = "green", bootstrap.ci.width = 0.5, bootstrap.ci.size = 3, overall.mean.color = "red", tufte = FALSE, coord.flip = TRUE, ... )
x |
result of |
bootstrap.mean.color |
the color of the point for the bootstrap mean, or NA to omit. |
bootstrap.ci.color |
the color of the confidence intervals of the bootstrap samples, or NA to omit. |
bootstrap.ci.width |
the width of the confidence interval lines at the end. |
bootstrap.ci.size |
the size of the confidence interval lines. |
overall.mean.color |
the color of the point for the overall (before bootstrapping) mean, or NA to omit. |
tufte |
use Tufte's boxplot style. Requires the |
coord.flip |
Whether to flip the coordinates. |
... |
unused |
a ggplot2 expression.
Boxplot of the balance statistics for bootstrapped samples.
## S3 method for class 'PSAboot.balance' boxplot( x, unadjusted.color = "red", pooled.color = "blue", point.size = 3, point.alpha = 0.5, ... )
## S3 method for class 'PSAboot.balance' boxplot( x, unadjusted.color = "red", pooled.color = "blue", point.size = 3, point.alpha = 0.5, ... )
x |
results of |
unadjusted.color |
the color used for the unadjusted effect size. |
pooled.color |
the color used for the mean bootstrap effect size. |
point.size |
the size of the points. |
point.alpha |
the transparency level for the points. |
... |
other parameters passed to |
a ggplot2 expression.
library(PSAboot) data(pisa.psa.cols) data(pisausa) bm.usa <- PSAboot(Tr = as.integer(pisausa$PUBPRIV) - 1, Y = pisausa$Math, X = pisausa[,pisa.psa.cols], control.ratio = 5, M = 100, seed = 2112) bm.usa.bal <- balance(bm.usa) boxplot(bm.usa.bal, nrow = 1)
library(PSAboot) data(pisa.psa.cols) data(pisausa) bm.usa <- PSAboot(Tr = as.integer(pisausa$PUBPRIV) - 1, Y = pisausa$Math, X = pisausa[,pisa.psa.cols], control.ratio = 5, M = 100, seed = 2112) bm.usa.bal <- balance(bm.usa) boxplot(bm.usa.bal, nrow = 1)
Calculates propensity score weights.
calculate_ps_weights(treatment, ps, estimand = "ATE")
calculate_ps_weights(treatment, ps, estimand = "ATE")
treatment |
a logical vector for treatment status. |
ps |
numeric vector of propensity scores |
estimand |
character string indicating which estimand to be used. Possible values are ATE (average treatment effect), ATT (average treatment effect for the treated), ATC (average treatement effect for the controls), ATM (Average Treatment Effect Among the Evenly Matchable), ATO (Average Treatment Effect Among the Overlap Populatio) |
PSAboot
.The current default methods are:
getPSAbootMethods()
getPSAbootMethods()
The default methods can be changed by setting the PSAboot.methods
option
using options('PSAboot.methods'=c(...))
where ...
is a named
list of functions.
a vector of methods for use by PSAboot
Histogram of PSA bootstrap results
## S3 method for class 'PSAboot' hist(x, ...)
## S3 method for class 'PSAboot' hist(x, ...)
x |
result of |
... |
other parameters passed to |
a ggplot2 expression.
Matrix Plot of Bootstrapped Propensity Score Analysis
matrixplot(bm)
matrixplot(bm)
bm |
result from |
Nothing returned. Creates a plot using the [graphics::pairs()] function.
Character vector representing the list of covariates used for estimating propensity scores.
a character vector with covariate names for estimating propensity scores.
Student results from the 2009 Programme of International Student Assessment (PISA) as provided by the Organization for Economic Co-operation and Development (OECD). See https://www.oecd.org/pisa/ for more information including the code book.
a data frame with 4,622 rows and 65 columns.
CNT
Country
SCHOOLID
SchoolID
ST01Q01
Grade
ST04Q01
Sex
ST05Q01
Attend
ST06Q01
Age
ST07Q01
Repeat
ST08Q01
At home mother
ST08Q02
At home father
ST08Q03
At home brothers
ST08Q04
At home sisters
ST08Q05
At home grandparents
ST08Q06
At home others
ST10Q01
Mother highest schooling
ST12Q01
Mother current job status
ST14Q01
Father highest schooling
ST16Q01
Father current job status
ST19Q01
Language at home
ST20Q01
Desk
ST20Q02
Own room
ST20Q03
Study place
ST20Q04
Computer
ST20Q05
Software
ST20Q06
Internet
ST20Q07
Literature
ST20Q08
Poetry
ST20Q09
Art
ST20Q10
Textbooks
ST20Q12
Dictionary
ST20Q13
Dishwasher
ST20Q14
DVD
ST21Q01
How many cellphones
ST21Q02
How many TVs
ST21Q03
How many computers
ST21Q04
How many cars
ST21Q05
How many rooms bath or shower
ST22Q01
How many books
ST23Q01
Reading enjoyment time
ST31Q01
Enrich in test language
ST31Q02
Enrich in mathematics
ST31Q03
Enrich in science
ST31Q05
Remedial in test language
ST31Q06
Remedial in mathematics
ST31Q07
Remedial in science
ST32Q01
Out of school lessons in test language
ST32Q02
Out of school lessons maths
ST32Q03
Out of school lessons in science
PUBPRIV
Public or private school
STRATIO
Student to teacher ratio in school
Note that missing values have been imputed using the
mice package. Details on the specific procedure are in the pisa.impute
function
in the pisa
package.
Organisation for Economic Co-operation and Development (2009). Programme for International Student Assessment (PISA).
Student results from the 2009 Programme of International Student Assessment (PISA) as provided by the Organization for Economic Co-operation and Development (OECD). See www.oecd.org/pisa/ for more information including the code book.
a data frame with 5,233 rows and 65 columns.
CNT
Country
SCHOOLID
SchoolID
ST01Q01
Grade
ST04Q01
Sex
ST05Q01
Attend
ST06Q01
Age
ST07Q01
Repeat
ST08Q01
At home mother
ST08Q02
At home father
ST08Q03
At home brothers
ST08Q04
At home sisters
ST08Q05
At home grandparents
ST08Q06
At home others
ST10Q01
Mother highest schooling
ST12Q01
Mother current job status
ST14Q01
Father highest schooling
ST16Q01
Father current job status
ST19Q01
Language at home
ST20Q01
Desk
ST20Q02
Own room
ST20Q03
Study place
ST20Q04
Computer
ST20Q05
Software
ST20Q06
Internet
ST20Q07
Literature
ST20Q08
Poetry
ST20Q09
Art
ST20Q10
Textbooks
ST20Q12
Dictionary
ST20Q13
Dishwasher
ST20Q14
DVD
ST21Q01
How many cellphones
ST21Q02
How many TVs
ST21Q03
How many computers
ST21Q04
How many cars
ST21Q05
How many rooms bath or shower
ST22Q01
How many books
ST23Q01
Reading enjoyment time
ST31Q01
Enrich in test language
ST31Q02
Enrich in mathematics
ST31Q03
Enrich in science
ST31Q05
Remedial in test language
ST31Q06
Remedial in mathematics
ST31Q07
Remedial in science
ST32Q01
Out of school lessons in test language
ST32Q02
Out of school lessons maths
ST32Q03
Out of school lessons in science
PUBPRIV
Public or private school
STRATIO
Student to teacher ratio in school
Note that missing values have been imputed using the
mice package. Details on the specific procedure are in the pisa.impute
function
in the pisa
package.
Organisation for Economic Co-operation and Development (2009). Programme for International Student Assessment (PISA).
Plot the results of PSAboot
## S3 method for class 'PSAboot' plot( x, sort = "all", ci.sig.color = "red", plot.overall = FALSE, plot.bootstrap = TRUE, ... )
## S3 method for class 'PSAboot' plot( x, sort = "all", ci.sig.color = "red", plot.overall = FALSE, plot.bootstrap = TRUE, ... )
x |
result of |
sort |
how the sort the rows by mean difference. Options are to sort using the mean difference from matching, stratification, both individually, or no sorting. |
ci.sig.color |
the color used for confidence intervals that do not span zero. |
plot.overall |
whether to plot vertical lines for the overall (non-bootstrapped) estimate and confidence interval. |
plot.bootstrap |
whether to plot vertical lines for the bootstrap pooled estimate and confidence interval. |
... |
currently unused. |
a ggplot2 expression.
Plot method for balance.
## S3 method for class 'PSAboot.balance' plot( x, unadjusted.color = "red", complete.color = "blue", pooled.color = "black", ... )
## S3 method for class 'PSAboot.balance' plot( x, unadjusted.color = "red", complete.color = "blue", pooled.color = "black", ... )
x |
results from |
unadjusted.color |
color of the vertical line representing the mean unadjusted effect size for all covariates. |
complete.color |
color of the vertical line representing the mean adjusted effect size for all covariates using the complete dataset. |
pooled.color |
color of the vertical line representing the mean adjusted effect size for all covariates across all bootstrapped samples. |
... |
currently unused. |
a ggplot2 expression.
library(PSAboot) data(pisa.psa.cols) data(pisausa) bm.usa <- PSAboot(Tr = as.integer(pisausa$PUBPRIV) - 1, Y = pisausa$Math, X = pisausa[,pisa.psa.cols], control.ratio = 5, M = 100, seed = 2112) bm.usa.bal <- balance(bm.usa) plot(bm.usa.bal)
library(PSAboot) data(pisa.psa.cols) data(pisausa) bm.usa <- PSAboot(Tr = as.integer(pisausa$PUBPRIV) - 1, Y = pisausa$Math, X = pisausa[,pisa.psa.cols], control.ratio = 5, M = 100, seed = 2112) bm.usa.bal <- balance(bm.usa) plot(bm.usa.bal)
Print results of PSAboot
## S3 method for class 'PSAboot' print(x, ...)
## S3 method for class 'PSAboot' print(x, ...)
x |
result of |
... |
currently unused. |
Nothing returned. S3 generic function that calls the [PSAboot::summary()] function.
This is a crude measure of overall balance. Absolute value of the standardized effect sizes are calculated for each covariate. Overall balance statistics are the mean of those effect sizes after adjustment for each method across all bootstrap samples.
## S3 method for class 'PSAboot.balance' print(x, na.rm = TRUE, ...)
## S3 method for class 'PSAboot.balance' print(x, na.rm = TRUE, ...)
x |
results from |
na.rm |
whether NA balance statistics should be removed before averaging them. |
... |
currently unused. |
No valued returned.
Print method for PSAboot Summary.
## S3 method for class 'PSAbootSummary' print(x, digits = 3, ...)
## S3 method for class 'PSAbootSummary' print(x, digits = 3, ...)
x |
result of |
digits |
desired number of digits after the decimal point. |
... |
unused. |
Nothing returned.
Propensity Score Analysis using Stratification
psa.strata(Y, Tr, strata, trim = 0, minStrata = 5)
psa.strata(Y, Tr, strata, trim = 0, minStrata = 5)
Y |
response variable. |
Tr |
treatment variable. |
strata |
strata identifier. |
trim |
allows for a trimmed mean as outcome measure, where trim is from 0 to .5 (.5 implying median). |
minStrata |
minimum number of treatment or control units within a strata to include that strata. |
a character vector containing summary.strata, ATE, se.wtd, approx.t, df, and CI.95.
Bootstrapping has become a popular resampling method for estimating sampling distributions. And propensity score analysis (PSA) has become popular for estimating causal effects in observational studies. This function implements bootstrapping specifically for PSA. Like typical bootstrapping methods, this function estimates treatment effects for M random samples. However, unlike typical bootstrap methods, this function allows for separate sample sizes for treatment and control units. That is, under certain circumstances (e.g. when the ratio of treatment-to-control units is large) bootstrapping only the control units may be desirable. Additionally, this function provides a framework to use multiple PSA methods for each bootstrap sample.
PSAboot( Tr, Y, X, M = 100, formu = as.formula(paste0("treat ~ ", paste0(names(X), collapse = " + "))), control.ratio = 5, control.sample.size = min(control.ratio * min(table(Tr)), max(table(Tr))), control.replace = TRUE, treated.sample.size = min(table(Tr)), treated.replace = TRUE, methods = getPSAbootMethods(), parallel = TRUE, seed = NULL, ... )
PSAboot( Tr, Y, X, M = 100, formu = as.formula(paste0("treat ~ ", paste0(names(X), collapse = " + "))), control.ratio = 5, control.sample.size = min(control.ratio * min(table(Tr)), max(table(Tr))), control.replace = TRUE, treated.sample.size = min(table(Tr)), treated.replace = TRUE, methods = getPSAbootMethods(), parallel = TRUE, seed = NULL, ... )
Tr |
numeric (0 or 1) or logical vector of treatment indicators. |
Y |
vector of outcome variable |
X |
matrix or data frame of covariates used to estimate the propensity scores. |
M |
number of bootstrap samples to generate. |
formu |
formula used for estimating propensity scores. The default is to use
all covariates in |
control.ratio |
the ratio of control units to sample relative to the treatment units. |
control.sample.size |
the size of each bootstrap sample of control units. |
control.replace |
whether to use replacement when sampling from control units. |
treated.sample.size |
the size of each bootstrap sample of treatment units. The default uses all treatment units for each bootstrap sample. |
treated.replace |
whether to use replacement when sampling from treated units. |
methods |
a named vector of functions for each PSA method to use. |
parallel |
whether to run the bootstrap samples in parallel. |
seed |
random seed. Each iteration, i, will use a seed of |
... |
other parameters passed to |
a list with following elements:
Data frame with the results using the complete dataset (i.e. unbootstrapped results).
Objects returned from each method for complete dataset.
Data frame with results of each bootstrap sample.
List of objects returned from each method for each bootstrap sample.
sample size used for control units.
sample size used for treated units.
whether control units were sampled with replacement.
whether treated units were sampled with replacement.
vector of treatment assignment.
vector out outcome.
matrix or data frame of covariates.
number of bootstrap samples.
getPSAbootMethods
library(PSAboot) data(pisa.psa.cols) data(pisausa) bm.usa <- PSAboot(Tr = as.integer(pisausa$PUBPRIV) - 1, Y = pisausa$Math, X = pisausa[,pisa.psa.cols], control.ratio = 5, M = 100, seed = 2112)
library(PSAboot) data(pisa.psa.cols) data(pisausa) bm.usa <- PSAboot(Tr = as.integer(pisausa$PUBPRIV) - 1, Y = pisausa$Math, X = pisausa[,pisa.psa.cols], control.ratio = 5, M = 100, seed = 2112)
Return the 25th percentile.
q25(x, na.rm = FALSE, ...)
q25(x, na.rm = FALSE, ...)
x |
numeric vector. |
na.rm |
logical; if true, any NA and NaN's are removed from x before the quantiles are computed |
... |
other parameters passed to |
the 25th percentile.
Returns the 75th percentile.
q75(x, na.rm = FALSE, ...)
q75(x, na.rm = FALSE, ...)
x |
numeric vector. |
na.rm |
logical; if true, any NA and NaN's are removed from x before the quantiles are computed |
... |
other parameters passed to |
the 75th percentile.
Summary of pooled results from PSAboot
## S3 method for class 'PSAboot' summary(object, ...)
## S3 method for class 'PSAboot' summary(object, ...)
object |
result of |
... |
currently unused. |
a list with pooled summary statistics.
a list with the results from easch PSA method. For each method a list contains the following elements:
Percentage of boostrap samples where the confidence interval does not span zero.
Weighted mean difference across all bootstrap samples.
Overall confidence interval across all bootstrap samples.
Overall weighted bootstrap mean.
Contingency table of the number of bootstrap samples that don't span zero.
Results of the summary of the PSA method.