This function takes a dataset, a column containing a unique identifier, and
an arbitrary number of covariates on which to stratify the splits.
It returns the original dataset with an additional column, .split_id,
that identifies the split to which each row is assigned.
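The base-R sketch below illustrates the idea of split assignment without the package. `assign_splits` is a hypothetical helper, not a tidyhte function. It assigns every unique identifier to one of `num_splits` splits, so all rows that share an identifier land in the same split, and it balances split sizes within the levels of a single stratification variable. It uses simple exact-level stratification as a stand-in. According to the argument documentation, make_splits relies on quickblock for stratification, so its actual assignments will differ.

```r
# Hypothetical helper, not part of tidyhte: assign each unique identifier
# to one of `num_splits` splits, balancing split sizes within each stratum.
assign_splits <- function(ids, num_splits, strata = NULL) {
  unique_ids <- unique(ids)
  if (is.null(strata)) {
    # No stratification: treat all units as a single stratum
    unit_strata <- rep(1L, length(unique_ids))
  } else {
    # Use the stratum recorded on each identifier's first row
    # (assumes strata contain no missing values)
    unit_strata <- strata[match(unique_ids, ids)]
  }
  unit_split <- integer(length(unique_ids))
  for (s in unique(unit_strata)) {
    idx <- which(unit_strata == s)
    # Cycle labels 1..num_splits, then shuffle them within the stratum
    labels <- rep_len(seq_len(num_splits), length(idx))
    unit_split[idx] <- labels[sample.int(length(labels))]
  }
  # Map back to rows so that all rows of an identifier share one split
  unit_split[match(ids, unique_ids)]
}

set.seed(1)
df <- data.frame(id = 1:12, grp = rep(c("a", "b"), each = 6))
df$.split_id <- assign_splits(df$id, num_splits = 3, strata = df$grp)
table(df$grp, df$.split_id)  # each group contributes 2 units to each split
```

Shuffling a cycled label vector, rather than drawing each label independently, keeps split sizes within any stratum differing by at most one.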
make_splits(data, identifier, ..., .num_splits)
data: A dataframe.
identifier: Unquoted name of the unique identifier column.
...: Variables on which to stratify (requires that quickblock be installed).
.num_splits: Number of splits to create. If VIMP is requested in QoI_cfg,
this must be an even number.
The original dataframe with an additional .split_id column.
To see an example analysis, read vignette("experimental_analysis") in the
context of an experiment, vignette("observational_analysis") for an
observational study, or vignette("methodological_details") for a deeper
dive under the hood.
library("dplyr")
library("tidyhte")
if (require("palmerpenguins")) {
  data("penguins", package = "palmerpenguins")
  # One row per penguin; give each a unique identifier
  penguins$unitid <- seq_len(nrow(penguins))
  # Known, constant propensity score; treatment is assigned at random
  penguins$propensity <- rep(0.5, nrow(penguins))
  penguins$treatment <- rbinom(nrow(penguins), 1, penguins$propensity)

  # Known propensity score, SuperLearner outcome model, no VIMP
  cfg <- basic_config() %>%
    add_known_propensity_score("propensity") %>%
    add_outcome_model("SL.glm.interaction") %>%
    remove_vimp()

  attach_config(penguins, cfg) %>%
    # Assign each unit to one of 4 splits (adds the .split_id column)
    make_splits(unitid, .num_splits = 4) %>%
    produce_plugin_estimates(outcome = body_mass_g, treatment = treatment, species, sex) %>%
    construct_pseudo_outcomes(body_mass_g, treatment) %>%
    estimate_QoI(species, sex)
}
#> Dropped 11 of 344 rows (3.2%) through listwise deletion.
#> estimating nuisance models [===================================] splits: 4 / 4
#>
#>
#> Dropped 11 of 344 rows (3.2%) through listwise deletion.
#> Skipping diagnostic on .pseudo_outcome due to lack of model.
#> # A tibble: 11 × 5
#>    estimand       term                   level              estimate std_error
#>    <chr>          <chr>                  <chr>                 <dbl>     <dbl>
#>  1 MSE            body_mass_g            Control Response    101019.    10670.
#>  2 MSE            body_mass_g            Treatment Response   97656.     9718.
#>  3 SL risk        SL.glm.interaction_All Control Response    101139.     4266.
#>  4 SL risk        SL.glm_All             Control Response    103637.     5534.
#>  5 SL risk        SL.glm.interaction_All Treatment Response   98217.     5473.
#>  6 SL risk        SL.glm_All             Treatment Response  100650.     3581.
#>  7 SL coefficient SL.glm.interaction_All Control Response      0.655     0.131
#>  8 SL coefficient SL.glm_All             Control Response      0.345     0.131
#>  9 SL coefficient SL.glm.interaction_All Treatment Response    0.664     0.134
#> 10 SL coefficient SL.glm_All             Treatment Response    0.336     0.134
#> 11 SATE           NA                     NA                    -31.6      33.9