Time-to-event data, including both survival and censoring times, are created using the functions defSurv and genSurv. The survival data definitions require a variable name as well as a specification of a scale value, which determines the mean survival time at a baseline level of covariates (i.e., all covariates set to 0). The Weibull distribution is used to generate these survival times. In addition, covariates (which have been defined previously) that influence survival time can be included in the formula field. Positive coefficients are associated with shorter survival times (and higher hazard rates); the summaries of the generated data below show this for x1, which has a coefficient of 1.5. Finally, the shape of the distribution can be specified. A shape value of 1 reflects the exponential distribution.
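The mechanics can be illustrated with inverse-transform sampling in base R. This is a minimal sketch, not simstudy's internal code: it assumes a proportional-hazards Weibull parameterization T = (scale * E / exp(lp))^shape, where E ~ Exponential(1) and lp is the linear predictor from the formula field. Under this assumed form a shape of 1 gives an exponential distribution whose baseline mean equals scale, and a positive lp multiplies the hazard by exp(lp). The helper name `sim_weibull_ph` is hypothetical.

```r
# Hypothetical helper: draw Weibull survival times by inverse transform.
# Assumed parameterization (not necessarily simstudy's internals):
#   T = (scale * E / exp(lp))^shape,  E ~ Exponential(1)
# so S(t) = exp(-exp(lp) * t^(1/shape) / scale), a proportional-hazards
# model in which exp(lp) multiplies the baseline hazard.
sim_weibull_ph <- function(n, lp = 0, scale = 1, shape = 1) {
  e <- -log(runif(n))  # Exponential(1) draws via inverse transform
  (scale * e / exp(lp))^shape
}

set.seed(1)
t0 <- sim_weibull_ph(1e5, lp = 0,   scale = 50, shape = 1)  # baseline, exponential
t1 <- sim_weibull_ph(1e5, lp = 1.5, scale = 50, shape = 1)  # linear predictor of 1.5

mean(t0)  # close to 50: with shape = 1 the baseline mean equals scale
mean(t1)  # close to 50 / exp(1.5), about 11.2
```

In this form the conventional Weibull shape parameter (as used by `rweibull`) is 1/shape, so shape = 1.5 corresponds to a hazard that decreases over time.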
library(simstudy)

# Baseline data definitions
def <- defData(varname = "x1", formula = 0.5, dist = "binary")
def <- defData(def, varname = "x2", formula = 0.5, dist = "binary")
def <- defData(def, varname = "grp", formula = 0.5, dist = "binary")

# Survival data definitions: scale and shape both depend on grp
set.seed(282716)
sdef <- defSurv(varname = "survTime", formula = "1.5*x1",
  scale = "grp*50 + (1-grp)*25", shape = "grp*1 + (1-grp)*1.5")
sdef <- defSurv(sdef, varname = "censorTime", scale = 80, shape = 1)

sdef
## varname formula scale shape
## 1: survTime 1.5*x1 grp*50 + (1-grp)*25 grp*1 + (1-grp)*1.5
## 2: censorTime 0 80 1
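The scale and shape entries are strings evaluated against each row's covariates, so grp switches both parameters at once. The base R sketch below uses `eval(parse())` purely to show what each string resolves to for the two levels of grp; it is an illustration, not how simstudy parses definitions internally.

```r
# Definition strings copied from the sdef table above
scale_str <- "grp*50 + (1-grp)*25"
shape_str <- "grp*1 + (1-grp)*1.5"

# Evaluate each string for both levels of grp (illustration only)
env <- list(grp = c(0, 1))
scale_by_grp <- eval(parse(text = scale_str), envir = env)
shape_by_grp <- eval(parse(text = shape_str), envir = env)

# grp 0: scale 25, shape 1.5 (Weibull); grp 1: scale 50, shape 1 (exponential)
data.frame(grp = env$grp, scale = scale_by_grp, shape = shape_by_grp)
```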
The data are generated with calls to genData and genSurv:
# Generate baseline data, then add survival and censoring times
dtSurv <- genData(300, def)
dtSurv <- genSurv(dtSurv, sdef)

head(dtSurv)
## id x1 x2 grp survTime censorTime
## 1: 1 0 0 1 9.206 95.976
## 2: 2 0 1 0 25.525 46.754
## 3: 3 0 1 0 604.203 31.620
## 4: 4 1 1 0 23.631 338.427
## 5: 5 1 0 0 108.276 287.553
## 6: 6 0 1 1 8.122 53.406
# Mean survival time by grp and x1
dtSurv[, round(mean(survTime), 1), keyby = .(grp, x1)]
## grp x1 V1
## 1: 0 0 156.2
## 2: 0 1 19.0
## 3: 1 0 43.3
## 4: 1 1 14.1
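The expected survival time in each grp-by-x1 cell can be computed in closed form and set against the simulated summary above. This sketch assumes T = (scale * E / exp(1.5 * x1))^shape with E ~ Exponential(1), a parameterization that may differ from simstudy's internals; under it, E[T] = (scale / exp(1.5 * x1))^shape * Gamma(1 + shape).

```r
# Expected survival time by cell under an assumed Weibull PH form:
#   T = (scale * E / exp(1.5 * x1))^shape,  E ~ Exponential(1)
# which gives E[T] = (scale / exp(1.5 * x1))^shape * gamma(1 + shape).
cells <- expand.grid(x1 = c(0, 1), grp = c(0, 1))  # x1 varies fastest
cells$scale <- with(cells, grp * 50 + (1 - grp) * 25)
cells$shape <- with(cells, grp * 1 + (1 - grp) * 1.5)
cells$expected <- with(cells, (scale / exp(1.5 * x1))^shape * gamma(1 + shape))

cells[, c("grp", "x1", "expected")]
```

The expected values come out at roughly 166.2, 17.5, 50.0 and 11.2. With only about 75 observations per cell and right-skewed distributions, the sample means will deviate noticeably from these.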
Observed survival times and censoring indicators can be generated by defining new fields:
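# Observed time is whichever comes first: the event or censoring
cdef <- defDataAdd(varname = "obsTime", formula = "pmin(survTime, censorTime)",
  dist = "nonrandom")
# status is TRUE when the event is observed before censoring
cdef <- defDataAdd(cdef, varname = "status", formula = "I(survTime <= censorTime)",
  dist = "nonrandom")

dtSurv <- addColumns(cdef, dtSurv)
head(dtSurv)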
## id x1 x2 grp survTime censorTime obsTime status
## 1: 1 0 0 1 9.206 95.976 9.206 TRUE
## 2: 2 0 1 0 25.525 46.754 25.525 TRUE
## 3: 3 0 1 0 604.203 31.620 31.620 FALSE
## 4: 4 1 1 0 23.631 338.427 23.631 TRUE
## 5: 5 1 0 0 108.276 287.553 108.276 TRUE
## 6: 6 0 1 1 8.122 53.406 8.122 TRUE
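The two nonrandom formulas are ordinary vectorized R expressions, so their effect can be checked directly in base R on the six rows printed above (values copied from the output, so they carry only three decimals).

```r
# First six rows of dtSurv as printed above
survTime   <- c(9.206, 25.525, 604.203, 23.631, 108.276, 8.122)
censorTime <- c(95.976, 46.754, 31.620, 338.427, 287.553, 53.406)

obsTime <- pmin(survTime, censorTime)  # element-wise minimum: the time actually observed
status  <- survTime <= censorTime      # TRUE = event observed, FALSE = censored

data.frame(survTime, censorTime, obsTime, status)
```

Only row 3 is censored: its event time (604.203) falls after its censoring time (31.620).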
# Estimate proportion of censoring by grp and x1
dtSurv[, round(1 - mean(status), 2), keyby = .(grp, x1)]
## grp x1 V1
## 1: 0 0 0.51
## 2: 0 1 0.13
## 3: 1 0 0.37
## 4: 1 1 0.17
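The censoring summary and Kaplan-Meier curves can also be sketched without simstudy, using base R to simulate data and the survival package (`Surv`, `survfit`) for estimation. The data-generating step assumes T = (scale * E / exp(1.5 * x1))^shape with E ~ Exponential(1), plus exponential censoring with mean 80; because this parameterization is an assumption, the numbers will not reproduce the tables above exactly.

```r
library(survival)  # Surv(), survfit()

set.seed(1)
n <- 300

# Baseline covariates as in def (x2 omitted: it does not enter survival)
d <- data.frame(x1 = rbinom(n, 1, 0.5), grp = rbinom(n, 1, 0.5))

# Group-specific parameters as in sdef
scale <- with(d, grp * 50 + (1 - grp) * 25)
shape <- with(d, grp * 1 + (1 - grp) * 1.5)

# Assumed proportional-hazards Weibull form (may differ from simstudy internals)
d$survTime   <- (scale * rexp(n) / exp(1.5 * d$x1))^shape
d$censorTime <- rexp(n, rate = 1 / 80)  # exponential censoring, mean 80

# Observed time and event indicator as in cdef
d$obsTime <- pmin(d$survTime, d$censorTime)
d$status  <- as.integer(d$survTime <= d$censorTime)  # 1 = event, 0 = censored

# Proportion censored by grp and x1
d$censored <- 1 - d$status
cens <- aggregate(censored ~ grp + x1, data = d, FUN = mean)
cens

# Kaplan-Meier estimates for the four grp-by-x1 combinations
km <- survfit(Surv(obsTime, status) ~ grp + x1, data = d)
plot(km, lty = 1:4, xlab = "Time", ylab = "Survival probability")
legend("topright", legend = names(km$strata), lty = 1:4)
```

Using obsTime with the status indicator, rather than the raw survTime, is what lets the Kaplan-Meier estimator account for censored observations.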
Here is a Kaplan-Meier plot of the data by the four groups: