`tidyhte` provides tidy semantics for estimation of heterogeneous treatment effects through the use of Kennedy’s (n.d.) doubly-robust learner.
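For orientation, here is one standard way to write the DR-learner pseudo-outcome. This is our summary of Kennedy's construction in our own notation, not code or notation taken from `tidyhte`. With treatment $A \in \{0, 1\}$, covariates $X$, outcome $Y$, propensity score $\pi(x) = P(A = 1 \mid X = x)$ and outcome regressions $\mu_a(x) = E[Y \mid X = x, A = a]$, each unit's pseudo-outcome is

$$
\hat{\varphi} = \frac{A - \hat{\pi}(X)}{\hat{\pi}(X)\{1 - \hat{\pi}(X)\}}\{Y - \hat{\mu}_A(X)\} + \hat{\mu}_1(X) - \hat{\mu}_0(X),
$$

where the nuisance estimates $\hat{\pi}$ and $\hat{\mu}_a$ are fit on separate folds of the data. Regressing $\hat{\varphi}$ on a moderator $V$ then estimates the conditional average treatment effect $E[Y(1) - Y(0) \mid V]$.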

The goal of `tidyhte` is to use a sort of “recipe” design. This should (hopefully) make it extremely easy to scale an analysis of HTE from the common single-outcome / single-moderator case to many outcomes and a wide array of moderators, reusing the same configuration throughout. It’s also written to be fairly easy to extend to different models and to add additional diagnostics and ways to output information from a set of HTE estimates.
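To make the “recipe” idea concrete, here is a toy base-R sketch of the design pattern, not of `tidyhte` itself: a configuration object is built once by chaining small functions and then applied unchanged to different outcomes. The function names (`new_recipe`, `add_moderators`, `apply_recipe`) and the simulated data are hypothetical, and a plain difference in means within each moderator level stands in for the doubly-robust estimator.

```r
# Hypothetical sketch of a "recipe" design; none of these functions
# belong to tidyhte. The configuration is built once and reused.
new_recipe <- function() list(moderators = character())

add_moderators <- function(recipe, ...) {
    recipe$moderators <- c(recipe$moderators, ...)
    recipe
}

# Apply a recipe to one outcome: difference in means between treated and
# control units within each level of each configured moderator.
apply_recipe <- function(recipe, data, outcome, treatment) {
    y <- data[[outcome]]
    a <- data[[treatment]]
    rows <- lapply(recipe$moderators, function(m) {
        g <- data[[m]]
        lv <- sort(unique(g))
        est <- vapply(lv, function(l) {
            mean(y[g == l & a == 1]) - mean(y[g == l & a == 0])
        }, numeric(1), USE.NAMES = FALSE)
        data.frame(outcome = outcome, moderator = m,
                   level = as.character(lv), estimate = est)
    })
    do.call(rbind, rows)
}

# Simulated data with two outcomes and two discrete moderators
set.seed(1)
n <- 400
df <- data.frame(
    a = rbinom(n, 1, 0.5),
    g1 = sample(c("x", "y"), n, replace = TRUE),
    g2 = sample(1:3, n, replace = TRUE)
)
df$y1 <- df$a * (df$g1 == "x") + rnorm(n)
df$y2 <- df$a * df$g2 + rnorm(n)

cfg <- add_moderators(new_recipe(), "g1", "g2")

# The same recipe is reused for a second outcome without reconfiguration
res_y1 <- apply_recipe(cfg, df, "y1", "a")
res_y2 <- apply_recipe(cfg, df, "y2", "a")
```

Adding a moderator means one more `add_moderators()` call; adding an outcome means one more `apply_recipe()` call, with nothing else changing.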

The best place to start learning how to use `tidyhte` is the vignettes, which run through example analyses from start to finish: `vignette("experimental_analysis")` and `vignette("observational_analysis")`. There is also a writeup summarizing the method and implementation in `vignette("methodological-details")`.

Once it is released, you will be able to install `tidyhte` from CRAN with:

`install.packages("tidyhte")`

The CRAN release does not exist yet, so in the meantime, install the development version from GitHub with:

```
# install.packages("devtools")
devtools::install_github("ddimmery/tidyhte")
```

To set up a simple configuration, it’s straightforward to use the Recipe API:

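```
library(tidyhte)
library(dplyr)

# Build a configuration ("recipe") that can be reused across analyses
basic_config() %>%
    # use glmnet in the propensity score and outcome SuperLearner ensembles
    add_propensity_score_model("SL.glmnet") %>%
    add_outcome_model("SL.glmnet") %>%
    # x1 and x2 are discrete moderators: effects are estimated per level
    add_moderator("Stratified", x1, x2) %>%
    # x3 is continuous: effects are estimated with a kernel smoother
    add_moderator("KernelSmooth", x3) %>%
    # variable importance, computed without sample splitting
    add_vimp(sample_splitting = FALSE) -> hte_cfg
```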
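Conceptually, a `"Stratified"` moderator corresponds to averaging DR pseudo-outcomes within each level of a discrete moderator. As a rough base-R illustration (our sketch, not `tidyhte`'s implementation), the effect in each level is estimated by the mean of the pseudo-outcomes in that level, with the usual standard error of a mean. To stay short, the sketch assumes a randomized experiment with a known propensity of 0.5 and fits the outcome models in-sample with `lm()` rather than with cross-fitting; the simulated data and names (`x1`, `w`, `phi`) are hypothetical.

```r
set.seed(42)
n <- 2000
x1 <- sample(c("a", "b", "c"), n, replace = TRUE)  # discrete moderator
w <- rnorm(n)                                      # additional covariate
A <- rbinom(n, 1, 0.5)                             # randomized treatment
tau <- unname(c(a = 0, b = 1, c = 2)[x1])          # true effect by level
Y <- w + A * tau + rnorm(n)

# Outcome regressions mu_0 and mu_1, fit separately in each arm
# (in-sample here for brevity; tidyhte-style pipelines cross-fit instead)
dat <- data.frame(Y, x1, w)
fit0 <- lm(Y ~ x1 + w, data = dat[A == 0, ])
fit1 <- lm(Y ~ x1 + w, data = dat[A == 1, ])
mu0 <- predict(fit0, newdata = dat)
mu1 <- predict(fit1, newdata = dat)
mu_a <- ifelse(A == 1, mu1, mu0)
pi_hat <- 0.5                                      # known by design

# Doubly-robust pseudo-outcome for every unit
phi <- (A - pi_hat) / (pi_hat * (1 - pi_hat)) * (Y - mu_a) + mu1 - mu0

# Stratified estimates: mean and standard error of phi within each level
strat <- data.frame(
    level = sort(unique(x1)),
    estimate = as.numeric(tapply(phi, x1, mean)),
    std_error = as.numeric(tapply(phi, x1, function(v) sd(v) / sqrt(length(v))))
)
strat
```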

The `basic_config` includes a number of defaults: it starts off the SuperLearner ensembles for both treatment and outcome with linear models (`"SL.glm"`). An analysis of a dataset with this configuration then looks like this:

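```
data %>%
    # attach the configuration built above
    attach_config(hte_cfg) %>%
    # split the data into 12 folds for cross-fitting, grouped by userid
    make_splits(userid, .num_splits = 12) %>%
    # fit the propensity score and outcome models
    produce_plugin_estimates(
        outcome_variable,
        treatment_variable,
        covariate1, covariate2, covariate3, covariate4, covariate5, covariate6
    ) %>%
    # combine the plug-in estimates into doubly-robust pseudo-outcomes
    construct_pseudo_outcomes(outcome_variable, treatment_variable) -> data

# estimate quantities of interest for the chosen moderators
data %>%
    estimate_QoI(covariate1, covariate2) -> results
```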
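The pipeline above splits the data into folds, fits nuisance models and turns their predictions into pseudo-outcomes. The base-R sketch below shows what those steps compute under simplifying assumptions of our own: logistic regression and linear models stand in for the SuperLearner ensembles, folds are assigned per user so that all of a user's rows share a fold, and the simulated data and column names are hypothetical. It illustrates cross-fitted DR pseudo-outcomes in general, not `tidyhte`'s implementation.

```r
set.seed(7)
n_users <- 300
userid <- rep(seq_len(n_users), each = 4)   # 4 rows per user
n <- length(userid)
x <- rnorm(n)
A <- rbinom(n, 1, plogis(0.5 * x))          # treatment depends on x (confounding)
Y <- x + A * (1 + x) + rnorm(n)             # true CATE is 1 + x
dat <- data.frame(userid, x, A, Y)

# Assign each user (not each row) to one of K folds
K <- 12
user_fold <- sample(rep_len(seq_len(K), n_users))
dat$fold <- user_fold[dat$userid]

# Cross-fitting: predictions for fold k come from models fit on other folds
dat$pi_hat <- NA_real_
dat$mu0 <- NA_real_
dat$mu1 <- NA_real_
for (k in seq_len(K)) {
    train <- dat[dat$fold != k, ]
    in_fold <- dat$fold == k
    held_out <- dat[in_fold, ]
    ps_fit <- glm(A ~ x, family = binomial(), data = train)
    fit0 <- lm(Y ~ x, data = train[train$A == 0, ])
    fit1 <- lm(Y ~ x, data = train[train$A == 1, ])
    dat$pi_hat[in_fold] <- predict(ps_fit, newdata = held_out, type = "response")
    dat$mu0[in_fold] <- predict(fit0, newdata = held_out)
    dat$mu1[in_fold] <- predict(fit1, newdata = held_out)
}

# DR pseudo-outcomes built from the out-of-fold predictions
mu_a <- ifelse(dat$A == 1, dat$mu1, dat$mu0)
dat$phi <- (dat$A - dat$pi_hat) / (dat$pi_hat * (1 - dat$pi_hat)) *
    (dat$Y - mu_a) + dat$mu1 - dat$mu0

# A simple final-stage regression of the pseudo-outcome on the moderator
fit_final <- lm(phi ~ x, data = dat)
coef(fit_final)
```

Because each row's nuisance predictions come from models that never saw that row's user, the pseudo-outcomes avoid the own-observation overfitting that in-sample predictions can introduce. With correctly specified models, as here, the final-stage coefficients should land near the simulated CATE of `1 + x`.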

Getting estimated CATEs for a moderator not included previously only requires rerunning the final step:

```
data %>%
    estimate_QoI(covariate3) -> results
```
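Rerunning only `estimate_QoI` is enough because, in a DR-learner, the nuisance models and pseudo-outcomes do not depend on which moderator is examined; only the final summary step does. The base-R sketch below illustrates this for continuous moderators with a Nadaraya-Watson smoother from `stats::ksmooth()`, a stand-in we chose for a `"KernelSmooth"` moderator (the package's own smoother, bandwidth choice and standard errors may differ). The pseudo-outcomes are simulated directly as a noisy version of an assumed CATE of `1 + x3`, and all names are hypothetical.

```r
set.seed(3)
n <- 3000
x3 <- runif(n, -2, 2)              # a continuous moderator
x4 <- runif(n, 0, 1)               # a moderator examined later
# Stand-in for already-computed DR pseudo-outcomes (assumed CATE = 1 + x3)
phi <- 1 + x3 + rnorm(n, sd = 2)

# Kernel-smoothed CATE curve for one moderator. Nothing upstream
# (nuisance models, pseudo-outcomes) is refit when the moderator changes.
smooth_cate <- function(moderator, phi, points, bandwidth) {
    fit <- ksmooth(moderator, phi, kernel = "normal",
                   bandwidth = bandwidth, x.points = points)
    data.frame(x = fit$x, estimate = fit$y)
}

cate_x3 <- smooth_cate(x3, phi, points = c(-1, 0, 1), bandwidth = 0.5)
# A "new" moderator reuses the same phi vector
cate_x4 <- smooth_cate(x4, phi, points = c(0.25, 0.75), bandwidth = 0.25)
cate_x3
cate_x4
```

Since the effect does not vary with `x4` in this simulation, its curve should be roughly flat near the overall average effect of 1.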

Replicating this on a new outcome would be as simple as running the following, with no reconfiguration necessary.

```
data %>%
    attach_config(hte_cfg) %>%
    produce_plugin_estimates(
        second_outcome_variable,
        treatment_variable,
        covariate1, covariate2, covariate3, covariate4, covariate5, covariate6
    ) %>%
    construct_pseudo_outcomes(second_outcome_variable, treatment_variable) %>%
    estimate_QoI(covariate1, covariate2) -> results
```

This makes it easy to chain together analyses across many outcomes:

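```
library("foreach")

data %>%
    attach_config(hte_cfg) %>%
    make_splits(userid, .num_splits = 12) -> data

# list_of_outcomes holds outcome columns as symbols, for example
# list_of_outcomes <- rlang::syms(c("outcome_a", "outcome_b"))  # hypothetical names
foreach(outcome = list_of_outcomes, .combine = "bind_rows") %do% {
    data %>%
        # `!!` injects the symbol so the outcome column itself is used
        produce_plugin_estimates(
            !!outcome,
            treatment_variable,
            covariate1, covariate2, covariate3, covariate4, covariate5, covariate6
        ) %>%
        construct_pseudo_outcomes(!!outcome, treatment_variable) %>%
        estimate_QoI(covariate1, covariate2) %>%
        # label each block of results with its outcome name
        mutate(outcome = rlang::as_string(outcome))
}
```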
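The loop above runs one analysis per outcome and stacks the results with an `outcome` label. The same pattern in base R, without `foreach`, is sketched below. Here a simple difference in means within each moderator level stands in for the full `tidyhte` pipeline, and the column names (`treat`, `group`) and the `outcome_` prefix used to select outcome columns are hypothetical.

```r
set.seed(11)
n <- 500
df <- data.frame(
    treat = rbinom(n, 1, 0.5),
    group = sample(c("g1", "g2"), n, replace = TRUE)
)
df$outcome_a <- 0.5 * df$treat + rnorm(n)
df$outcome_b <- -0.5 * df$treat + rnorm(n)
df$outcome_c <- rnorm(n)

# Hypothetical per-outcome analysis: difference in means within each group
analyze_one <- function(data, outcome, treatment, moderator) {
    y <- data[[outcome]]
    a <- data[[treatment]]
    g <- data[[moderator]]
    lv <- sort(unique(g))
    est <- vapply(lv, function(l) {
        mean(y[g == l & a == 1]) - mean(y[g == l & a == 0])
    }, numeric(1), USE.NAMES = FALSE)
    data.frame(level = lv, estimate = est)
}

# Select outcomes by naming convention, run each, and stack the results
outcome_names <- grep("^outcome_", names(df), value = TRUE)
results <- do.call(rbind, lapply(outcome_names, function(o) {
    res <- analyze_one(df, o, "treat", "group")
    res$outcome <- o   # plays the role of mutate(outcome = ...) in the loop above
    res
}))
results
```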

The function `estimate_QoI` returns results as a tibble, which makes them easy to manipulate or plot.
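Assuming results in a long format with one row per estimate, routine post-processing takes only a few lines of data-frame code. The sketch below uses base R on a small results table built from simulated pseudo-outcomes; the column names (`estimand`, `term`, `level`, `estimate`, `std_error`) and the estimand labels are our assumptions, so check the actual columns returned by `estimate_QoI` before adapting it. It adds normal-approximation 95% confidence intervals and pulls out the per-level rows one would plot.

```r
set.seed(5)
n <- 1500
x1 <- sample(c("a", "b", "c"), n, replace = TRUE)
# Stand-in pseudo-outcomes with an assumed effect of 0, 1, 2 by level
phi <- unname(c(a = 0, b = 1, c = 2)[x1]) + rnorm(n, sd = 2)

# Hypothetical long-format results: one overall row plus one row per level
results <- rbind(
    data.frame(estimand = "ATE", term = NA_character_, level = NA_character_,
               estimate = mean(phi), std_error = sd(phi) / sqrt(n)),
    data.frame(estimand = "CATE", term = "x1", level = sort(unique(x1)),
               estimate = as.numeric(tapply(phi, x1, mean)),
               std_error = as.numeric(tapply(phi, x1, function(v) sd(v) / sqrt(length(v)))))
)

# Normal-approximation 95% confidence intervals
z <- qnorm(0.975)
results$ci_lower <- results$estimate - z * results$std_error
results$ci_upper <- results$estimate + z * results$std_error

# Keep the per-level rows, ready for a table or plot
cate_rows <- results[results$estimand == "CATE",
                     c("level", "estimate", "ci_lower", "ci_upper")]
cate_rows
```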