We, the authors of the survtmle R package, use the same guide as is used for contributing to the development of the popular ggplot2 R package. This document is simply a formal re-statement of that fact.
The goal of this guide is to help you get up and contributing to survtmle as quickly as possible. The guide is divided into two main pieces:
Filing a bug report or feature request in an issue.
Suggesting a change via a pull request.
Issues
When filing an issue, the most important thing is to include a minimal reproducible example so that we can quickly verify the problem, and then figure out how to fix it. There are three things you need to include to make your example reproducible: required packages, data, code.
Packages should be loaded at the top of the script, so it’s easy to see which ones the example needs.
The easiest way to include data is to use dput() to generate the R code to recreate it.
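For instance, a minimal sketch of this idiom (the toy data set and its column names here are made up for illustration):

```r
# A toy data set; dput() prints R code that recreates it exactly
d <- data.frame(trt = c(0, 1, 1), ftime = c(2, 5, 3))
dput(d)
#> structure(list(trt = c(0, 1, 1), ftime = c(2, 5, 3)),
#>   class = "data.frame", row.names = c(NA, -3L))
```

Pasting the printed structure() call into a fresh session recreates d without needing the original file.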
Spend a little bit of time ensuring that your code is easy for others to read:
You can check you have actually made a reproducible example by starting up a fresh R session and pasting your script in.
(Unless you’ve been specifically asked for it, please don’t include the output of sessionInfo().)
Pull requests
To contribute a change to survtmle, follow these steps:
Create a branch in git and make your changes.
Each PR corresponds to a git branch, so if you expect to submit multiple changes make sure to create multiple branches. If you have multiple changes that depend on each other, start with the first one and don’t submit any others until the first one has been processed.
Use survtmle coding style. Please follow the official tidyverse style guide. Maintaining a consistent style across the whole code base makes it much easier to jump into the code. If you’re modifying existing survtmle code that doesn’t follow the style guide, a separate pull request to fix the style would be greatly appreciated. To lower the burden on contributors, we’ve included a recipe make style that will re-format code to follow these conventions, provided that you’ve installed the styler package.
If you’re adding new parameters or a new function, you’ll also need to document them with roxygen. Make sure to re-run devtools::document() on the code before submitting.
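This pre-submission step might look like the following (assuming devtools is installed):

```r
# Re-generate the man/*.Rd files and NAMESPACE from roxygen comments;
# run from the package root before committing
devtools::document()
```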
This seems like a lot of work, but don’t worry if your pull request isn’t perfect: it’s a learning process, and unless you’ve submitted a few in the past, it’s unlikely that your pull request will be accepted as is. Please don’t submit pull requests that change existing behaviour. Instead, think about how you can add a new feature in a minimally invasive way.
The simple data structure contains a set of baseline covariates (adjustVars), a binary treatment variable (trt), a failure time that is a function of the treatment, adjustment variables, and a random error (ftime), and a failure type (ftype), which denotes the cause of failure (0 means no failure, 1 means failure). The first few rows of data can be viewed as follows.
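The vignette's simulation code is not reproduced here; a hypothetical simulation with this structure (the covariate names W1 and W2 and the distributions are made up, not the vignette's actual data-generating mechanism) might look like:

```r
set.seed(1234)
n <- 200
trt <- rbinom(n, 1, 0.5)
adjustVars <- data.frame(W1 = round(runif(n)), W2 = round(runif(n, 0, 2)))
# integer-valued failure times depending on treatment, covariates, and noise
ftime <- round(1 + runif(n, 0, 4) + trt + adjustVars$W1)
ftype <- rbinom(n, 1, 0.9)  # 0 = no failure, 1 = failure
head(data.frame(trt, ftime, ftype, adjustVars))
```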
It is important to note that the current survtmle distribution only supports integer-valued failure times. If failure times are continuous-valued, then, unfortunately, we require the user to perform an additional pre-processing step to convert the observed failure times to ranked integers prior to applying the survtmle function. We hope to build support for this situation in future versions of the package.
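One way to sketch this pre-processing step in base R (the example times are invented):

```r
# Hypothetical continuous failure times
ftime_cont <- c(0.83, 2.31, 1.14, 2.31, 4.90)

# Map them to ranked integers; as.factor() sorts the unique values,
# so the integer codes are dense ranks (tied times share a rank)
ftime_int <- as.integer(as.factor(ftime_cont))
ftime_int
#> [1] 1 3 2 3 4
```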
Covariate adjustment via logistic regression
A common goal is to compare the incidence of failure at a fixed time between the two treatment groups. Covariate adjustment is often desirable in this comparison to improve efficiency (Moore and van der Laan 2009). This covariate adjustment may be facilitated by estimating a series of iterated covariate-conditional means (Robins 1999; Bang and Robins 2005; van der Laan and Gruber 2012). The final iterated covariate-conditional mean is marginalized over the empirical distribution of baseline covariates to obtain an estimate of the marginal cumulative incidence.
Here, we invoke the eponymous survtmle function to compute the iterated mean-based (method = "mean") covariate-adjusted estimates of the cumulative incidence at time six (t0 = 6) in each of the treatment groups using quasi-logistic regression (formula specified via glm.ftime) to estimate the iterated means. The glm.ftime argument should be a valid right-hand-side formula specification based on colnames(adjustVars) and "trt". Here we use a simple main terms regression.
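A call matching this description might look like the sketch below (the data objects and the covariate names W1 and W2 come from a hypothetical simulation, not the vignette's actual data):

```r
library(survtmle)

# Iterated mean-based TMLE of cumulative incidence at time 6,
# with a simple main terms outcome regression
fit1 <- survtmle(
  ftime = ftime, ftype = ftype, trt = trt, adjustVars = adjustVars,
  method = "mean", t0 = 6,
  glm.ftime = "trt + W1 + W2"
)
fit1
```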
Internally, survtmle estimates the covariate-conditional treatment probability (via glm.trt or SL.trt, see below) and covariate-conditional censoring distribution (via glm.ctime or SL.ctime, see below). In the above example, the treatment probability does not depend on covariates (as in e.g., a randomized trial) and so we did not specify a way to adjust for covariates in estimating the treatment probability. In this case, survtmle sets glm.trt = "1", which corresponds with empirical estimates of treatment probability, and sets glm.ctime to be equivalent to the Kaplan-Meier censoring distribution estimates.
In practice, we may wish to adjust for covariates when computing estimates of the covariate-conditional treatment and censoring probabilities. In observational studies, the distribution of treatment may differ by measured covariates, while in almost any study (including randomized trials) it is possible that censoring differs by covariates. Thus, we often wish to adjust for covariates to account for measured confounders of treatment receipt and censoring.
This adjustment may be accomplished using logistic regression through the glm.trt and glm.ctime arguments, respectively. The glm.trt argument should be a valid right-hand-side formula specification based on colnames(adjustVars). The glm.ctime argument should be a valid right-hand-side formula specification based on colnames(adjustVars), "trt", and "t" used to model the hazard function for censoring. By including "trt" and "t", the function allows censoring probabilities to depend on treatment assignment and time, respectively. Here we call survtmle again, now adjusting for covariates in the treatment and censoring fits.
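Such a call might look like the following sketch (again with hypothetical covariate names W1 and W2):

```r
# Adjust for covariates in the treatment and censoring fits as well
fit2 <- survtmle(
  ftime = ftime, ftype = ftype, trt = trt, adjustVars = adjustVars,
  method = "mean", t0 = 6,
  glm.trt = "W1 + W2",             # treatment probability depends on covariates
  glm.ftime = "trt + W1 + W2",
  glm.ctime = "trt + W1 + W2 + t"  # censoring may depend on treatment and time
)
```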
While we can certainly use logistic regression to model the treatment, censoring, and iterated means, a large benefit afforded by the survtmle package is how it leverages SuperLearner ensemble machine learning to estimate these quantities in a more flexible manner. The Super Learner method is a generalization of stacked regression (Breiman 1996) that uses cross-validation to select the best-performing estimator from a library of candidate estimators (van der Laan, Polley, and Hubbard 2007). Many popular machine learning algorithms have been implemented in the SuperLearner package.
To use SuperLearner estimates, specify the options SL.trt, SL.ctime, and SL.ftime to estimate the conditional treatment probability, censoring distribution, and iterated means, respectively. See ?SuperLearner for details on correctly specifying a super learner library, and see listWrappers() to print the methods implemented in the SuperLearner package. Here we demonstrate a call to survtmle using a small library of algorithms included in base R.
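A sketch of such a call, using wrapper names that ship with the SuperLearner package:

```r
# A small library of base-R-backed SuperLearner wrappers
sl_lib <- c("SL.glm", "SL.mean", "SL.step")

fit3 <- survtmle(
  ftime = ftime, ftype = ftype, trt = trt, adjustVars = adjustVars,
  method = "mean", t0 = 6,
  SL.trt = sl_lib, SL.ftime = sl_lib, SL.ctime = sl_lib
)
```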
Remark: Invoking survtmle with method = "mean" and SL.ftime requires fitting a Super Learner for each time point from seq_len(t0). If there are many unique time points observed in the data, this can become a computationally intensive process. In such cases, we recommend either redefining the ftime variable to pool across time points or using method = "hazard" (see below).
Using the method of cause-specific hazards
An alternative to the iterated mean-based TMLE for estimating cumulative incidence is based on estimating the (cause-specific) hazard function. This estimator is implemented by specifying method = "hazard" in a call to survtmle. Just as with method = "mean", we can use either glm. or SL. to adjust for covariates. However, now the glm.ftime formula may additionally include functions of time, as this formula is now being used in a pooled regression to estimate cause-specific hazards over time.
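A sketch of the hazard-based call (hypothetical covariate names W1 and W2; the polynomial in t is one illustrative choice of a function of time):

```r
fit4 <- survtmle(
  ftime = ftime, ftype = ftype, trt = trt, adjustVars = adjustVars,
  method = "hazard", t0 = 6,
  glm.ftime = "trt + W1 + W2 + t + I(t^2)"  # formula may include functions of time
)
```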
Remark: The TMLE algorithm for the hazard-based estimator differs from the iterated mean-based TMLE. In particular, the algorithm is iterative and has no guarantee of convergence. While we have not identified instances where convergence is a serious problem, we encourage users to submit any such situations as GitHub issues or to write directly to benkeser@emory.edu. The stopping criteria for the iteration may be adjusted via the tol and maxIter options. Increasing tol or decreasing maxIter will lead to faster convergence; however, it is recommended that tol be set no larger than 1 / sqrt(length(ftime)). If maxIter is reached without convergence, one should check that the values of fit$meanIC are all less than 1 / sqrt(length(ftime)).
Multiple failure types
In all of the preceding examples, we have restricted our attention to the case where there is only a single failure type of interest. Now we consider more scenarios where we observe multiple failure types. First, we simulate data with two types of failure.
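A hypothetical simulation with two failure types (again, the covariate names and distributions are made up for illustration):

```r
set.seed(1234)
n <- 200
trt <- rbinom(n, 1, 0.5)
adjustVars <- data.frame(W1 = round(runif(n)), W2 = round(runif(n, 0, 2)))
ftime <- round(1 + runif(n, 0, 4))
ftype <- sample(0:2, n, replace = TRUE)  # 0 = no failure; 1 and 2 = failure types
```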
This simulated data structure is similar to the single failure type data; however, the failure type variable (ftype) now contains two distinct types of failure (with 0 still reserved for no failure).
When multiple failure types are present, a common goal is to compare the cumulative incidence of a particular failure type at a fixed time between the two treatment groups, while accounting for the fact that participants may fail due to other failure types. Covariate adjustment is again desirable to improve efficiency and account for measured confounders of treatment and censoring.
Covariate adjustment via logistic regression
The call to invoke survtmle is exactly the same as in the single failure type case.
The output object contains cumulative incidence estimates for each of the four groups defined by the two failure types and treatments.
There are sometimes failure types that are not of direct interest to our study. Because survtmle invoked with method = "mean" computes an estimate of the cumulative incidence of each failure type separately, we can save on computation time by specifying which failure types we care about via the ftypeOfInterest option.
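For instance, to estimate incidence only for type-1 failures (a sketch, with hypothetical covariate names W1 and W2):

```r
fit5 <- survtmle(
  ftime = ftime, ftype = ftype, trt = trt, adjustVars = adjustVars,
  method = "mean", t0 = 6,
  glm.ftime = "trt + W1 + W2",
  ftypeOfInterest = 1  # skip the computation for other failure types
)
```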
The TMLE based on cause-specific hazards can also be used to compute cumulative incidence estimates in settings with multiple failure types. As above, the glm.ftime formula may additionally include functions of time, as this formula is now being used in a pooled regression to estimate cause-specific hazard of each failure type over time.
As with the iterated-mean based TMLE, we can obtain estimates of cumulative incidence of only certain failure types (via ftypeOfInterest); however, this does not necessarily result in faster computation, as it did in the case above. In situations where the convergence of the algorithm is an issue, it may be useful to invoke multiple calls to survtmle with singular ftypeOfInterest. If such convergence issues arise, please report them as GitHub issues or contact us at benkeser@emory.edu.
In certain situations, we have knowledge that the incidence of an event is bounded below/above for every stratum in the population. It is possible to incorporate these bounds into the TMLE estimation procedure to ensure that any resulting estimate of cumulative incidence is compatible with these bounds. Please refer to Benkeser, Carone, and Gilbert (2017) for more on bounded TMLEs and their potential benefits.
Bounds can be passed to survtmle by creating a data.frame that contains columns with specific names. In particular, there should be a column named "t". There should additionally be columns for the lower and upper bound for each type of failure. For example, if there is only one type of failure (ftype = 1 or ftype = 0), then the bounds data.frame can contain columns "l1" and "u1", denoting the lower and upper bounds, respectively, on the iterated conditional mean (for method = "mean") or the conditional hazard function (for method = "hazard"). If there are two types of failure (ftype = 1, ftype = 2, or ftype = 0), then there can additionally be columns "l2" and "u2", denoting the lower and upper bounds, respectively, on the iterated conditional mean for type two failures (for method = "mean") or the conditional cause-specific hazard function for type two failures (for method = "hazard").
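A sketch of such a data.frame and its use (the bound values are invented, and the argument name bounds is an assumption about the survtmle interface):

```r
# One failure type: lower/upper bounds l1 and u1 at each time point
bounds <- data.frame(
  t  = 1:6,
  l1 = rep(0.001, 6),  # assumed lower bound on the iterated conditional mean
  u1 = rep(0.950, 6)   # assumed upper bound
)

fit_bounded <- survtmle(
  ftime = ftime, ftype = ftype, trt = trt, adjustVars = adjustVars,
  method = "mean", t0 = 6, glm.ftime = "trt + W1 + W2",
  bounds = bounds
)
```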
Now that we have specified our bounds, we can invoke survtmle repeating our first example (“Fit 1”), but now restricting the iterated conditional means to follow the bounds specified above.
When there are multiple failure types of interest, we can still provide bounds for the iterated conditional means (or the conditional hazard function, whichever is appropriate based on our specification of the method argument).
The survtmle package provides the function timepoints to compute the estimated cumulative incidence over multiple timepoints. This function is invoked after an initial call to survtmle with option returnModels = TRUE. By setting this option, the timepoints function is able to recycle fits for the conditional treatment probability, censoring distribution, and, in the case of method = "hazard", the hazard fits. Thus, invoking timepoints is faster than making repeated calls to survtmle with different t0.
There is some subtlety involved to properly leveraging this facility. Recall that the censoring distribution fit (and cause-specific hazard fit) pools over all time points. Thus, in order to most efficiently use timepoints, the initial call to survtmle should be made setting option t0 equal to the final time point at which one wants estimates of cumulative incidence. This allows these hazard fitting procedures to utilize all of the data to estimate the conditional hazard function.
We demonstrate the use of timepoints below based on the following simulated data.
Imagine that we would like cumulative incidence estimates at times seq_len(t_0) based on fit2 above (mean-based TMLE using glm covariate adjustment). However, note that when we originally called fit2 the option returnModels was set to its default value FALSE. Thus, we must refit this object setting the function to return the model fits.
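This refit and the subsequent call to timepoints might look like the following sketch (t_0 and the covariate names W1, W2 are hypothetical):

```r
t_0 <- 6  # final time point of interest (assumed for illustration)

fit2_rm <- survtmle(
  ftime = ftime, ftype = ftype, trt = trt, adjustVars = adjustVars,
  method = "mean", t0 = t_0, glm.ftime = "trt + W1 + W2",
  returnModels = TRUE  # keep fitted models so timepoints() can reuse them
)

tp_fit <- timepoints(fit2_rm, times = seq_len(t_0))
```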
Internally, timepoints is making calls to survtmle, but is passing in the fitted treatment and censoring fits from fit2_rm$trtMod and fit2_rm$ctimeMod. However, for method = "mean" the function is still fitting the iterated means separately for each time required by the call to timepoints. Thus, the call to timepoints may be quite slow if method = "mean", SL.ftime is specified (as opposed to glm.ftime), and/or many times are passed in via times. Future implementations may attempt to avoid this extra model fitting. For now, if many times are required, we recommend using method = "hazard", which is able to recycle all of the model fits. Below is an example of this.
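A sketch of the recommended hazard-based approach when many times are needed (same hypothetical data and names as above):

```r
# With method = "hazard", timepoints() can recycle all of the model fits
fit_haz <- survtmle(
  ftime = ftime, ftype = ftype, trt = trt, adjustVars = adjustVars,
  method = "hazard", t0 = t_0, glm.ftime = "trt + W1 + W2 + t",
  returnModels = TRUE
)

tp_haz <- timepoints(fit_haz, times = seq_len(t_0))
```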
Because the cumulative incidence function is estimated pointwise, the resulting curve may not be monotone. However, it is possible to show that projecting this curve onto a monotone function via isotonic regression results in an estimate with identical asymptotic properties to the pointwise estimate. Therefore, we additionally provide an option type = "iso" (the default) that returns these smoothed curves.
Bang, Heejung, and James M Robins. 2005. “Doubly Robust Estimation in Missing Data and Causal Inference Models.” Biometrics 61 (4): 962–73. https://doi.org/10.1111/j.1541-0420.2005.00377.x.
Benkeser, David, Marco Carone, and Peter B Gilbert. 2017. “Improved Estimation of the Cumulative Incidence of Rare Outcomes.” Statistics in Medicine. https://doi.org/10.1002/sim.7337.
Laan, Mark J van der, and Susan Gruber. 2012. “Targeted Minimum Loss Based Estimation of Causal Effects of Multiple Time Point Interventions.” The International Journal of Biostatistics 8 (1): 1–34. https://doi.org/10.1515/1557-4679.1370.
Laan, Mark J van der, Eric C Polley, and Alan E Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1): 1–23. https://doi.org/10.2202/1544-6115.1309.
Moore, Kelly L, and Mark J van der Laan. 2009. “Increasing Power in Randomized Trials with Right Censored Outcomes Through Covariate Adjustment.” Journal of Biopharmaceutical Statistics 19 (6): 1099–1131. https://doi.org/10.1080/10543400903243017.
Robins, Jamie M. 1999. “Robust Estimation in Sequentially Ignorable Missing Data and Causal Inference Models.” Proceedings of the American Statistical Association Section on Bayesian Statistical Science. http://www.biostat.harvard.edu/robins/jsaprocpat1.pdf.
Adds support for the use of speedglm to fit the numerous regressions required by the estimation procedure. Users may see warnings when speedglm fails, in which case the code falls back to standard glm.
Fixes problems with the plot.tp.survtmle method induced by changes in the inner workings of tidyr as of tidyr v0.8.0.
Adds a method confint.tp.survtmle that computes and provides output tables for statistical inference directly from objects of class tp.survtmle. This provides information equivalent to that output by confint.survtmle.
Arguments
See ?SuperLearner for more information on how to specify valid
SuperLearner libraries. It is expected that the wrappers used
in the library will play nicely with the input variables, which will
be called "trt" and names(adjustVars).
SL.ctime
estimate of the conditional hazard for censoring. It is expected that
the wrappers used in the library will play nicely with the input
variables, which will be called "trt" and
names(adjustVars).
SL.trt
SL.library argument in the call to SuperLearner for the
estimate of the conditional probability of treatment. It is expected
that the wrappers used in the library will play nicely with the input
variables, which will be names(adjustVars).
glm.ftime
conditional mean). Ignored if SL.ftime != NULL. Use "trt"
to specify the treatment in this formula (see examples). The formula
can additionally include any variables found in
names(adjustVars).
glm.ctime
for the estimate of the conditional hazard for censoring. Ignored if
SL.ctime != NULL. Use "trt" to specify the treatment in
this formula (see examples). The formula can additionally include any
variables found in names(adjustVars).
glm.trt
for the estimate of the conditional probability of treatment. Ignored
if SL.trt != NULL. By default set to "1", corresponding to
using empirical estimates of each value of trt. The formula can
include any variables found in names(adjustVars).
returnIC
ftypeOfInterest
An input specifying what failure types to compute
estimates of incidence for. The default value computes estimates for
values unique(ftype). Can alternatively be set to a vector of
values found in ftype.
trtOfInterest
An input specifying which levels of trt are of
interest. The default value computes estimates for values
unique(trt). Can alternatively be set to a vector of values
found in trt.
Computes an estimate of the hazard for censoring using either glm or
SuperLearner based on log-likelihood loss. The function then computes
Estimate Censoring Mechanisms
equal to each value of trtOfInterest in turn. One of these columns
must be named C, which is a counting process for the right-censoring
variable. The function will fit a regression with C as the outcome and
functions of trt and names(adjustVars) as specified by
glm.ctime or SL.ctime as predictors.
See ?SuperLearner for more information on how to specify valid
SuperLearner libraries. It is expected that the wrappers used
in the library will play nicely with the input variables, which will
be called "trt" and names(adjustVars).
glm.ctime
conditional mean). Ignored if SL.ctime != NULL. Use "trt"
to specify the treatment in this formula (see examples). The formula
can additionally include any variables found in
names(adjustVars).
This function computes an estimate of the cause-specific hazard functions
over all times using either glm or SuperLearner. The structure
Estimation for the Method of Cause-Specific Hazards
first entry in dataList to iteratively fit hazard regression models
for each cause of failure. Thus, this data.frame needs to have a
column called Nj for each value of j in J. The first fit
estimates the hazard of min(J), while subsequent fits estimate the
pseudo-hazard of all other values of j, where pseudo-hazard is used to mean
the probability of a failure due to type j at a particular timepoint given
no failure of any type at any previous timepoint AND no failure due to type
This structure ensures that no strata have estimated hazards that sum to more
than one over all possible causes of failure at a particular timepoint.
See ?SuperLearner for more information on how to specify valid
SuperLearner libraries. It is expected that the wrappers used
in the library will play nicely with the input variables, which will
be called "trt" and names(adjustVars).
glm.ftime
conditional mean). Ignored if SL.ftime != NULL. Use "trt"
to specify the treatment in this formula (see examples). The formula
can additionally include any variables found in
names(adjustVars).
This function computes an estimate of the G-computation regression at a
specified time t using either glm or SuperLearner. The
Estimation for the Method of Iterated Means
observed value of trt. The remaining should in turn have all rows set
to each value of trtOfInterest in the survtmle call. Currently
the code requires each data.frame to have named columns for each name
in names(adjustVars), as well as a column named trt. It must
also have columns named Nj.Y, where j corresponds with the numeric
values input in allJ. These are the indicators of failure due to the
various causes before time t and are necessary for determining who to
column called C.Y where Y is again t - 1, so that right censored
observations are not included in the regressions. The function will fit a
regression with Qj.star.t+1 (also needed as a column in
wideDataList) on functions of trt and names(adjustVars)
as specified by glm.ftime or SL.ftime.
See ?SuperLearner for more information on how to specify valid
SuperLearner libraries. It is expected that the wrappers used
in the library will play nicely with the input variables, which will
be called "trt" and names(adjustVars).
glm.ftime
conditional mean). Ignored if SL.ftime != NULL. Use "trt"
to specify the treatment in this formula (see examples). The formula
can additionally include any variables found in
names(adjustVars).
A convenience utility to fit regression models more quickly in the main
internal functions for estimation, which usually require logistic regression.
Use of speedglm appears to provide roughly an order of magnitude
improvement in speed when compared to glm in custom benchmarks.
This function performs a fluctuation of an initial estimate of the
cause-specific hazard functions using a call to glm (i.e., a logistic
Fluctuation for the Method of Cause-Specific Hazards
then obtains predictions based on this fit on each of the data.frame
objects in dataList.
This function performs a fluctuation of an initial estimate of the
G-computation regression at a specified time t using a call to
Fluctuation for the Method of Iterated Means
will be used to obtain predictions that are then mapped into the estimates of
the cumulative incidence function at t0. Currently the code requires
each data.frame to have named columns for each name in
names(adjustVars), as well as a column named trt. It must also
have columns named Nj.Y, where j corresponds with the numeric values
input in allJ. These are the indicators of failure due to the various
causes before time t and are necessary for determining who to include
a column called C.Y where Y is again t - 1, so that right censored
observations are not included in the regressions. The function will fit a
logistic regression with Qj.star.t + 1 as outcome (also needed as a
column in wideDataList) with offset qlogis(Qj.star.t) and
number of additional covariates given by length(trtOfInterest). These
additional covariates should be columns in the each data.frame in
wideDataList called H.z.t where z corresponds to a each
unique value of trtOfInterest. The function returns the same
which is the fluctuated initial regression estimate evaluated at the observed
data points.
This function computes the hazard-based efficient influence curve at the
final estimate of the fluctuated cause-specific hazard functions and
Extract Influence Curve for Estimated Hazard Functions
added corresponding to the sum over all timepoints of the estimated
efficient influence function evaluated at that observation.
This function estimates the marginal cumulative incidence for failures of
specified types using targeted minimum loss-based estimation based on the
TMLE for Cause-Specific Hazard Functions
method = "hazard" is specified. However, power users could, in theory,
make calls directly to this function.
SL.ftime
See ?SuperLearner for more information
on how to specify valid SuperLearner libraries. It is expected
that the wrappers used in the library will play nicely with the input
variables, which will be called "trt",
names(adjustVars), and "t" if method = "hazard".
SL.ctime
estimate of the conditional hazard for censoring. It is expected that
the wrappers used in the library will play nicely with the input
variables, which will be called "trt" and
names(adjustVars).
SL.trt
SL.library argument in the call to SuperLearner for the
estimate of the conditional probability of treatment. It is expected
that the wrappers used in the library will play nicely with the input
variables, which will be names(adjustVars).
glm.ftime
for the outcome regression. Ignored if SL.ftime is not equal to
NULL. Use "trt" to specify the treatment in this formula
(see examples). The formula can additionally include any variables
found in names(adjustVars).
glm.ctime
for the estimate of the conditional hazard for censoring. Ignored if
SL.ctime is not equal to NULL. Use "trt" to
specify the treatment in this formula (see examples). The formula can
additionally include any variables found in names(adjustVars).
glm.trt
equation passed to the formula option of a call to glm
for the estimate of the conditional probability of treatment. Ignored
if SL.trt is not equal to NULL. The formula can include
any variables found in names(adjustVars).
glm.family
ftypeOfInterest
An input specifying what failure types to compute
estimates of incidence for. The default value computes estimates for
values unique(ftype). Can alternatively be set to a vector of
values found in ftype.
trtOfInterest
An input specifying which levels of trt are of
interest. The default value computes estimates for values
unique(trt). Can alternatively be set to a vector of values
found in trt.
bounds
A data.frame of bounds on the conditional hazard
function. The data.frame should have a column named "t"
that includes values seq_len(t0). The other columns should be
named paste0("l",j) and paste0("u",j) for each unique
failure type label j, denoting lower and upper bounds, respectively.
See examples.
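For concreteness, a valid bounds data.frame for t0 = 6 and a single failure type j = 1 might be constructed as follows (the 0.01/0.99 bounds are arbitrary illustrative values, not recommendations):

```r
t0 <- 6
j <- 1
# one row per timepoint, with lower/upper bound columns l1 and u1
bounds <- data.frame(t = seq_len(t0))
bounds[[paste0("l", j)]] <- rep(0.01, t0)
bounds[[paste0("u", j)]] <- rep(0.99, t0)
```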
Value
ftimeMod
If returnModels = TRUE the fit object(s) for the call
to glm or SuperLearner for the outcome
regression models. If method="mean" this will be a
list of length length(ftypeOfInterest), each of length
t0 (one regression for each failure type and for each
timepoint). If method = "hazard" this will be a list
of length length(ftypeOfInterest) with one fit
corresponding to the hazard for each cause of failure. If
returnModels = FALSE, this entry will be NULL.
ctimeMod
If returnModels = TRUE the fit object for the call to
Examples
## Single failure type examples
# simulate data
set.seed(1234)
n <- 100
trt <- rbinom(n, 1, 0.5)
adjustVars <- data.frame(W1 = round(runif(n)), W2 = round(runif(n, 0, 2)))
ftime <- round(1 + runif(n, 1, 4) - trt + adjustVars$W1 + adjustVars$W2)
ftype <- round(runif(n, 0, 1))

# Fit 1 - fit hazard_tmle object with GLMs for treatment, censoring, failure
fit1 <- hazard_tmle(ftime = ftime, ftype = ftype,
The function takes a data.frame of short format right-censored failure
times and reshapes it into the wide format needed for calls to
Convert Short Form Data to List of Wide Form Data
rows for each observation and will set the trt column equal to each value
of trtOfInterest in turn.
The function takes a data.frame and list consisting of short
and long format right-censored failure times. The function reshapes the long
@@ -123,10 +139,11 @@
Convert Long Form Data to List of Wide Form Data
trt equal to each level of trtOfInterest and set C.t to
zero for everyone.
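The kind of long-to-wide reshape described can be sketched with base R's stats::reshape (illustrative only; not the package's internal implementation, and the column names are hypothetical):

```r
# long format: one row per observation per timepoint, with a
# counting-process indicator N1 for failures of type 1
long <- data.frame(id = rep(1:2, each = 3),
                   t = rep(1:3, 2),
                   N1 = c(0, 0, 1, 0, 1, 1))
# wide format: one row per observation, columns N1.1, N1.2, N1.3
wide <- stats::reshape(long, idvar = "id", timevar = "t",
                       direction = "wide")
```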
This function estimates the marginal cumulative incidence for failures of
specified types using targeted minimum loss-based estimation based on the
TMLE for G-Computation of Cumulative Incidence
This function is typically called internally by survtmle whenever method = "mean" is specified. However,
power users could, in theory, make calls directly to this function.
Arguments
SL.ftime
See ?SuperLearner for more information on how to specify valid
SuperLearner libraries. It is expected that the wrappers used
in the library will play nicely with the input variables, which will
be called "trt", names(adjustVars), and "t" (if
method = "hazard").
estimate of the conditional hazard for censoring. It is expected that
the wrappers used in the library will play nicely with the input
variables, which will be called "trt" and
names(adjustVars).
SL.trt
SL.library argument in the call to SuperLearner for the
estimate of the conditional probability of treatment. It is expected
that the wrappers used in the library will play nicely with the input
variables, which will be names(adjustVars).
glm.ftime
for the outcome regression. Ignored if SL.ftime is not equal to
NULL. Use "trt" to specify the treatment in this formula
(see examples). The formula can additionally include any variables
found in names(adjustVars).
glm.ctime
for the estimate of the conditional hazard for censoring. Ignored if
SL.ctime is not equal to NULL. Use "trt" to
specify the treatment in this formula (see examples). The formula can
additionally include any variables found in names(adjustVars).
glm.trt
equation passed to the formula option of a call to glm
for the estimate of the conditional probability of treatment. Ignored
if SL.trt is not equal to NULL. The formula can include
any variables found in names(adjustVars).
glm.family
ftypeOfInterest
An input specifying what failure types to compute
estimates of incidence for. The default value computes estimates for
values unique(ftype). Can alternatively be set to a vector of
values found in ftype.
trtOfInterest
An input specifying which levels of trt are of
interest. The default value computes estimates for values
unique(trt). Can alternatively be set to a vector of values
found in trt.
function (if method = "hazard") or on the iterated conditional
means (if method = "mean"). The data.frame should have a
column named "t" that includes values 1:t0. The other
columns should be named paste0("l",j) and paste0("u",j)
for each unique failure type label j, denoting lower and upper bounds,
respectively. See examples.
Value
ftimeMod
If returnModels=TRUE the fit object(s) for the call to
glm or SuperLearner for the outcome regression
models. If method="mean" this will be a list of length
length(ftypeOfInterest), each of length t0 (one
regression for each failure type and for each timepoint). If
method="hazard" this will be a list of length
length(ftypeOfInterest) with one fit corresponding to
the hazard for each cause of failure. If
returnModels = FALSE, this entry will be NULL.
ctimeMod
If returnModels = TRUE the fit object for the call to
Examples
## Single failure type examples
# simulate data
set.seed(1234)
n <- 100
trt <- rbinom(n, 1, 0.5)
adjustVars <- data.frame(W1 = round(runif(n)), W2 = round(runif(n, 0, 2)))
ftime <- round(1 + runif(n, 1, 4) - trt + adjustVars$W1 + adjustVars$W2)
ftype <- round(runif(n, 0, 1))

# Fit 1 - fit mean_tmle object with GLMs for treatment, censoring, failure
fit1 <- mean_tmle(ftime = ftime, ftype = ftype,
character describing whether to provide a plot of raw
("raw") or monotonic ("iso") estimates in the resultant step function plot,
with the latter being computed by a call to stats::isoreg.
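A minimal sketch of that monotonic correction, applied to made-up raw estimates (not output from the package):

```r
# raw cumulative incidence estimates by timepoint, not yet monotone
est <- c(0.10, 0.15, 0.13, 0.20, 0.19, 0.25)
# isotonic regression yields the closest nondecreasing fit ($yf)
iso <- stats::isoreg(x = seq_along(est), y = est)$yf
```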
A dataset containing data that is similar in structure to the RTSS/AS01
malaria vaccine trial. Privacy agreements prevent the sharing of the real
Mock RTSS/AS01 data set
the ftype variable changes, simulating output data sets of multiply
infected trial participants.
A dataset containing data that is similar in structure to the RV144 "Thai
trial" of the ALVAC/AIDSVAX vaccine. Privacy agreements prevent the sharing
of the real data, so please note THAT THIS IS NOT THE REAL RV144 DATA.
ftime
An integer-valued vector of failure times. Right-censored observations
should have corresponding ftype set to 0.
ftype
An integer-valued vector indicating the type of failure. Observations
with ftype=0 are treated as right-censored. Each
unique value besides zero is treated as a separate type of failure.
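For example, the ftype coding convention (0 for censored, positive integers for distinct failure types) can be produced from labeled statuses like so (the status labels are hypothetical):

```r
status <- c("censored", "cause1", "cause2", "cause1", "censored")
# unmatched statuses (censored) map to 0; failure types to 1, 2, ...
ftype <- match(status, c("cause1", "cause2"), nomatch = 0L)
```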
Arguments
SL.ftime
See ?SuperLearner for more information on how to specify valid
SuperLearner libraries. It is expected that the wrappers used
in the library will play nicely with the input variables, which will
be called "trt", names(adjustVars), and "t" (if
method="hazard").
estimate of the conditional hazard for censoring. It is expected that
the wrappers used in the library will play nicely with the input
variables, which will be called "trt" and
names(adjustVars).
SL.trt
SL.library argument in the call to SuperLearner for the
estimate of the conditional probability of treatment. It is expected
that the wrappers used in the library will play nicely with the input
variables, which will be names(adjustVars).
glm.ftime
for the outcome regression. Ignored if SL.ftime is not equal
to NULL. Use "trt" to specify the treatment in this
formula (see examples). The formula can additionally include any
variables found in names(adjustVars).
glm.ctime
for the estimate of the conditional hazard for censoring. Ignored if
SL.ctime is not equal to NULL. Use "trt" to
specify the treatment in this formula (see examples). The formula can
additionally include any variables found in names(adjustVars).
glm.trt
equation passed to the formula option of a call to glm
for the estimate of the conditional probability of treatment. Ignored
if SL.trt is not equal to NULL. The formula can include
any variables found in names(adjustVars).
returnIC
ftypeOfInterest
An input specifying what failure types to compute
estimates of incidence for. The default value computes estimates for
values unique(ftype). Can alternatively be set to a vector of
values found in ftype.
trtOfInterest
An input specifying which levels of trt are of
interest. The default value computes estimates for values
unique(trt). Can alternatively be set to a vector of values
found in trt.
A data.frame of bounds on the conditional hazard
function (if method = "hazard") or on the iterated conditional
means (if method = "mean"). The data.frame should have a
column named "t" that includes values seq_len(t0). The
other columns should be named paste0("l",j) and
paste0("u",j) for each unique failure type label j, denoting
lower and upper bounds, respectively. See examples.
Value
ftimeMod
If returnModels=TRUE the fit object(s) for the call to
glm or SuperLearner for the outcome regression
models. If method="mean" this will be a list of length
length(ftypeOfInterest), each of length t0 (one
regression for each failure type and for each timepoint). If
method="hazard" this will be a list of length
length(ftypeOfInterest) with one fit corresponding to
the hazard for each cause of failure. If
returnModels = FALSE, this entry will be NULL.
ctimeMod
If returnModels=TRUE the fit object for the call to
Examples
# simulate data
set.seed(1234)
n <- 200
trt <- rbinom(n, 1, 0.5)
adjustVars <- data.frame(W1 = round(runif(n)), W2 = round(runif(n, 0, 2)))
ftime <- round(1 + runif(n, 1, 4) - trt + adjustVars$W1 + adjustVars$W2)
ftype <- round(runif(n, 0, 1))
# Fit 1
# fit a survtmle object with glm estimators for treatment, censoring, and
# censoring and empirical estimators for treatment using the "mean" method
fit2 <- survtmle(ftime = ftime, ftype = ftype,
                 trt = trt, adjustVars = adjustVars,
                 SL.ftime = c("SL.mean"),
                 SL.ctime = c("SL.mean"),
                 method = "mean", t0 = 6)
#> Warning: glm.trt and SL.trt not specified. Proceeding with glm.trt = '1'
Wrapper function for survtmle that takes a fitted survtmle
object and computes the TMLE estimated incidence for all times specified in
Evaluate Results over Time Points of Interest
the original call to survtmle. This can be ensured by making the
original call to survtmle with t0 = max(ftime).
timepoints(object, times, returnModels = FALSE)
Arguments
@@ -153,18 +170,18 @@
Ar
Value
An object of class tp.survtmle with number of entries equal to
length(times). Each entry is named "tX", where X denotes a
single value of times.
A helper function that maps hazard estimates into estimates of cumulative
incidence and updates the "clever covariates" used by the targeted minimum
loss-based estimation fluctuation step.