tidyhte provides tidy semantics for estimation of heterogeneous treatment effects through the use of Kennedy’s (n.d.) doubly-robust learner.

tidyhte follows a “recipe” design, which makes it easy to scale an analysis of HTE from the common single-outcome / single-moderator case to many outcomes and many moderators: one configuration can be reused to run the same analysis across all of them. The package is also written to be fairly easy to extend with different models, additional diagnostics, and new ways of outputting information from a set of HTE estimates.

The best place to start learning how to use tidyhte is the vignettes, which run through example analyses from start to finish: vignette("experimental_analysis") and vignette("observational_analysis"). There is also a write-up summarizing the method and implementation in vignette("methodological-details").

Installation

You will be able to install the released version of tidyhte from CRAN with:

install.packages("tidyhte")

This release does not exist yet, however. In the meantime, install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("ddimmery/tidyhte")

Setting up a configuration

A simple configuration can be built with the Recipe API:

library(tidyhte)
library(dplyr)

basic_config() %>%
    add_propensity_score_model("SL.glmnet") %>%
    add_outcome_model("SL.glmnet") %>%
    add_moderator("Stratified", x1, x2) %>%
    add_moderator("KernelSmooth", x3) %>%
    add_vimp(sample_splitting = FALSE) -> hte_cfg

basic_config() includes a number of defaults: it starts off the SuperLearner ensembles for both the treatment and outcome models with linear models ("SL.glm").

Running an analysis

data %>%
    attach_config(hte_cfg) %>%
    make_splits(userid, .num_splits = 12) %>%
    produce_plugin_estimates(
        outcome_variable,
        treatment_variable,
        covariate1, covariate2, covariate3, covariate4, covariate5, covariate6
    ) %>%
    construct_pseudo_outcomes(outcome_variable, treatment_variable) -> data

data %>%
    estimate_QoI(covariate1, covariate2) -> results
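
To make the steps in this pipeline more concrete, here is a minimal conceptual sketch in base R on simulated data. It is not tidyhte's internal implementation: the data, the variable names, and the simple glm / lm models are all assumptions made for illustration. It assumes that folds are assigned per identifier, that nuisance models are cross-fitted across folds, and that the pseudo-outcome is the standard doubly-robust (AIPW) form used by the DR-learner.

```r
# Conceptual sketch only: base R on simulated data, not tidyhte's internals.
set.seed(42)
n <- 2000
df <- data.frame(
    userid = seq_len(n),
    x1 = rbinom(n, 1, 0.5),  # binary moderator
    x2 = rnorm(n)            # continuous covariate
)
df$a <- rbinom(n, 1, plogis(0.3 * df$x2))  # treatment indicator
# Simulated truth: the effect is 1 when x1 == 0 and 3 when x1 == 1
df$y <- 1 + df$x2 + df$a * (1 + 2 * df$x1) + rnorm(n)

# Step 1 (cf. make_splits): assign folds at the identifier level,
# so every row belonging to one id lands in the same fold.
num_splits <- 4
ids <- unique(df$userid)
fold_of_id <- sample(rep(seq_len(num_splits), length.out = length(ids)))
df$fold <- fold_of_id[match(df$userid, ids)]

# Step 2 (cf. produce_plugin_estimates): cross-fitted nuisance predictions.
# Each fold is predicted by models trained on the other folds.
df$pi_hat <- NA_real_
df$mu1_hat <- NA_real_
df$mu0_hat <- NA_real_
for (k in seq_len(num_splits)) {
    train <- df[df$fold != k, ]
    held_out <- df$fold == k
    ps_fit <- glm(a ~ x1 + x2, data = train, family = binomial())
    out_fit <- lm(y ~ a * (x1 + x2), data = train)
    df$pi_hat[held_out] <- predict(ps_fit, newdata = df[held_out, ], type = "response")
    df$mu1_hat[held_out] <- predict(out_fit, newdata = transform(df[held_out, ], a = 1))
    df$mu0_hat[held_out] <- predict(out_fit, newdata = transform(df[held_out, ], a = 0))
}

# Step 3 (cf. construct_pseudo_outcomes): doubly-robust pseudo-outcome
#   psi = (A - pi(X)) / (pi(X) * (1 - pi(X))) * (Y - mu_A(X)) + mu_1(X) - mu_0(X)
mu_a_hat <- ifelse(df$a == 1, df$mu1_hat, df$mu0_hat)
df$pseudo <- (df$a - df$pi_hat) / (df$pi_hat * (1 - df$pi_hat)) * (df$y - mu_a_hat) +
    df$mu1_hat - df$mu0_hat

# Step 4 (cf. estimate_QoI with a stratified moderator): average the
# pseudo-outcome within each level of the moderator.
cate_by_x1 <- aggregate(pseudo ~ x1, data = df, FUN = mean)
print(cate_by_x1)
```

In this simulation the group averages should land close to the simulated effects of 1 and 3. The package replaces the hand-rolled models above with SuperLearner ensembles and the other components set in the configuration.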

Estimating CATEs for a moderator not included previously only requires rerunning the final line:

data %>%
    estimate_QoI(covariate3) -> results

Replicating this on a new outcome would be as simple as running the following, with no reconfiguration necessary.

data %>%
    attach_config(hte_cfg) %>%
    produce_plugin_estimates(
        second_outcome_variable,
        treatment_variable,
        covariate1, covariate2, covariate3, covariate4, covariate5, covariate6
    ) %>%
    construct_pseudo_outcomes(second_outcome_variable, treatment_variable) %>%
    estimate_QoI(covariate1, covariate2) -> results

This makes it easy to chain together analyses across many outcomes:

library("foreach")

data %>%
    attach_config(hte_cfg) %>%
    make_splits(userid, .num_splits = 12) -> data

foreach(outcome = list_of_outcomes, .combine = "bind_rows") %do% {
    data %>%
        produce_plugin_estimates(
            outcome,
            treatment_variable,
            covariate1, covariate2, covariate3, covariate4, covariate5, covariate6
        ) %>%
        construct_pseudo_outcomes(outcome, treatment_variable) %>%
        estimate_QoI(covariate1, covariate2) %>%
        mutate(outcome = rlang::as_string(outcome))
}
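
The loop above does not show how list_of_outcomes is built. Since the last step calls rlang::as_string() on each element, one plausible construction is a list of symbols. This is a minimal sketch under that assumption, with hypothetical column names:

```r
library(rlang)

# Hypothetical outcome column names; replace with the columns in your data.
outcome_names <- c("outcome_variable", "second_outcome_variable")

# Assumption: the loop iterates over symbols, one per outcome column.
list_of_outcomes <- syms(outcome_names)

# as_string() recovers the column name from each symbol, which is what
# the final mutate() in the loop relies on to label the results.
recovered <- vapply(list_of_outcomes, as_string, character(1))
print(recovered)
```

Depending on how the tidyhte functions capture their arguments, the symbol may need to be injected explicitly (for example with `!!outcome`) when it is passed inside the loop. The text above does not say either way, so check this against the package documentation.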

estimate_QoI returns its results as a tibble, which makes them easy to manipulate or plot.
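
As an illustration of that kind of manipulation, the sketch below works on a hand-built tibble standing in for the output. The column names (estimand, term, level, estimate, std_error) and every value are assumptions made up for this example, not actual tidyhte output. Inspect the real tibble before adapting it.

```r
library(dplyr)

# Hypothetical stand-in for the output of estimate_QoI(); column names and
# numbers are illustrative assumptions, not real estimates.
results <- tibble(
    estimand = c("MCATE", "MCATE", "MCATE", "MCATE"),
    term = c("covariate1", "covariate1", "covariate2", "covariate2"),
    level = c("0", "1", "a", "b"),
    estimate = c(1.02, 2.95, 0.40, 0.55),
    std_error = c(0.06, 0.07, 0.10, 0.12)
)

# Keep one estimand and add normal-approximation 95% confidence intervals.
summary_tbl <- results %>%
    filter(estimand == "MCATE") %>%
    mutate(
        ci_lower = estimate - qnorm(0.975) * std_error,
        ci_upper = estimate + qnorm(0.975) * std_error
    ) %>%
    arrange(term, level)

print(summary_tbl)
```

A table in this shape can be passed directly to a plotting library, for instance to draw point estimates with error bars for each moderator level.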