Create longitudinal/panel data
addPeriods(
dtName,
nPeriods = NULL,
idvars = "id",
timevars = NULL,
timevarName = "timevar",
timeid = "timeID",
perName = "period",
periodVec = NULL
)
dtName: Name of existing data table
nPeriods: Number of time periods for each record
idvars: Names of index variables (in a string vector) that will be repeated during each time period
timevars: Names of time dependent variables. Defaults to NULL.
timevarName: Name of new time dependent variable. Defaults to "timevar"
timeid: Variable name for new index field. Defaults to "timeID"
perName: Variable name for period field. Defaults to "period"
periodVec: Vector of period times. Defaults to NULL
An updated data.table that has multiple rows per observation in dtName
It is possible to generate longitudinal data with varying numbers of measurement periods as well as varying time intervals between measurements. This is done by including specific variables in the data set that define the number of observations per subject and the average interval between observations. nCount defines the number of measurements for an individual; mInterval specifies the average time between measurements for a subject; and vInterval specifies the variance of those interval times. If mInterval is not defined, no intervals are used. If vInterval is set to 0 or is not defined, the interval for a subject is determined entirely by the mean interval. If vInterval is greater than 0, time intervals are generated using a gamma distribution with the specified mean and dispersion. If either nPeriods or timevars is specified, it overrides any nCount, mInterval, and vInterval data.
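As a rough illustration of how interval times with a given mean and dispersion might be drawn, the sketch below uses base R only. It assumes that the dispersion d relates to the variance as variance = d * mean^2; this is an assumption about the parameterization, not a statement of the package's internals. The helper name drawIntervals is hypothetical and is not part of simstudy.

```r
# Hypothetical helper (NOT part of simstudy): draw n gamma-distributed
# interval times with a given mean and dispersion.
# ASSUMPTION: variance = dispersion * mean^2.
drawIntervals <- function(n, mean, dispersion) {
  if (dispersion == 0) {
    # No variability: every interval equals the mean
    return(rep(mean, n))
  }
  shape <- 1 / dispersion          # shape = mean^2 / variance
  rate <- 1 / (mean * dispersion)  # rate = mean / variance
  stats::rgamma(n, shape = shape, rate = rate)
}

set.seed(123)
x <- drawIntervals(1e5, mean = 30, dispersion = 0.07)
mean(x) # close to 30
var(x)  # close to 0.07 * 30^2 = 63
```

With a dispersion of 0 the helper returns the mean unchanged, which mirrors the statement that the interval is then determined entirely by the mean interval.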
periodVec is used to specify measurement periods that are different from the default counting variables. If periodVec is not specified, the periods default to 0, 1, ... n-1, with n periods. If periodVec is specified as c(x_1, x_2, ... x_n), then x_1, x_2, ... x_n represent the measurement periods.
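A minimal sketch of periodVec in use, assuming that the vector needs one entry per period (here nPeriods = 3) and that the supplied values replace the default 0, 1, 2 in the period column. The variable xbase and the object names dtBase and dtMonths are illustrative only.

```r
library(simstudy)

# Small baseline data set; "xbase" is just an illustrative variable
def <- defData(varname = "xbase", dist = "normal", formula = 20, variance = 3)
dtBase <- genData(3, def)

# Three measurements per subject, labeled as months 0, 6, and 12
# (ASSUMPTION: periodVec has one entry per period)
dtMonths <- addPeriods(dtBase, nPeriods = 3, periodVec = c(0, 6, 12))
dtMonths
```

Under these assumptions, each subject should have three rows whose period values are 0, 6, and 12 rather than 0, 1, and 2.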
library(simstudy)

tdef <- defData(varname = "T", dist = "binary", formula = 0.5)
tdef <- defData(tdef, varname = "Y0", dist = "normal", formula = 10, variance = 1)
tdef <- defData(tdef, varname = "Y1", dist = "normal", formula = "Y0 + 5 + 5 * T", variance = 1)
tdef <- defData(tdef, varname = "Y2", dist = "normal", formula = "Y0 + 10 + 5 * T", variance = 1)
dtTrial <- genData(5, tdef)
dtTrial
#> Key: <id>
#> id T Y0 Y1 Y2
#> <int> <int> <num> <num> <num>
#> 1: 1 0 9.811207 16.45975 19.21951
#> 2: 2 1 10.312756 19.55707 24.12884
#> 3: 3 1 10.074717 19.78654 26.27507
#> 4: 4 0 9.395655 13.66550 18.48758
#> 5: 5 1 10.045247 20.56940 23.49643
dtTime <- addPeriods(dtTrial,
nPeriods = 3, idvars = "id",
timevars = c("Y0", "Y1", "Y2"), timevarName = "Y"
)
dtTime
#> Key: <timeID>
#> id period T Y timeID
#> <int> <int> <int> <num> <int>
#> 1: 1 0 0 9.811207 1
#> 2: 1 1 0 16.459747 2
#> 3: 1 2 0 19.219507 3
#> 4: 2 0 1 10.312756 4
#> 5: 2 1 1 19.557073 5
#> 6: 2 2 1 24.128844 6
#> 7: 3 0 1 10.074717 7
#> 8: 3 1 1 19.786538 8
#> 9: 3 2 1 26.275070 9
#> 10: 4 0 0 9.395655 10
#> 11: 4 1 0 13.665505 11
#> 12: 4 2 0 18.487584 12
#> 13: 5 0 1 10.045247 13
#> 14: 5 1 1 20.569395 14
#> 15: 5 2 1 23.496429 15
# Varying # of periods and intervals - need to have variables
# called nCount and mInterval
def <- defData(varname = "xbase", dist = "normal", formula = 20, variance = 3)
def <- defData(def, varname = "nCount", dist = "noZeroPoisson", formula = 6)
def <- defData(def, varname = "mInterval", dist = "gamma", formula = 30, variance = .01)
def <- defData(def, varname = "vInterval", dist = "nonrandom", formula = .07)
dt <- genData(200, def)
dt[id %in% c(8, 121)]
#> Key: <id>
#> id xbase nCount mInterval vInterval
#> <int> <num> <num> <num> <num>
#> 1: 8 19.79319 2 29.97654 0.07
#> 2: 121 21.54566 5 30.11969 0.07
dtPeriod <- addPeriods(dt)
dtPeriod[id %in% c(8, 121)] # View individuals 8 and 121 only
#> Key: <timeID>
#> id period xbase time timeID
#> <int> <int> <num> <num> <int>
#> 1: 8 0 19.79319 0 38
#> 2: 8 1 19.79319 33 39
#> 3: 121 0 21.54566 0 758
#> 4: 121 1 21.54566 24 759
#> 5: 121 2 21.54566 56 760
#> 6: 121 3 21.54566 89 761
#> 7: 121 4 21.54566 127 762