Create longitudinal/panel data

addPeriods(
  dtName,
  nPeriods = NULL,
  idvars = "id",
  timevars = NULL,
  timevarName = "timevar",
  timeid = "timeID",
  perName = "period",
  periodVec = NULL
)

Arguments

dtName

Name of existing data table

nPeriods

Number of time periods for each record

idvars

Names of index variables (in a string vector) that will be repeated during each time period

timevars

Names of time dependent variables. Defaults to NULL.

timevarName

Name of new time dependent variable. Defaults to "timevar"

timeid

Variable name for new index field. Defaults to "timeID"

perName

Variable name for period field. Defaults to "period"

periodVec

Vector of period times. Defaults to NULL

Value

An updated data.table that has multiple rows per observation in dtName

Details

It is possible to generate longitudinal data with varying numbers of measurement periods as well as varying time intervals between measurement periods. This is done by including specific variables in the data set that define the number of observations per subject and the average interval time between observations. nCount defines the number of measurements for an individual; mInterval specifies the average time between measurements for a subject; and vInterval specifies the variance of those interval times. If mInterval is not defined, no intervals are used. If vInterval is set to 0 or is not defined, the interval for a subject is determined entirely by the mean interval. If vInterval is greater than 0, time intervals are generated using a gamma distribution with the specified mean and dispersion. If either nPeriods or timevars is specified, it overrides any nCount, mInterval, and vInterval data.
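
The interval mechanism described above can be illustrated with base R alone. This is only a sketch of the idea for a single subject, not the package's internal code: it assumes the gamma distribution is parameterized so that the mean is mInterval and the variance is vInterval times the squared mean, and the seed and values are arbitrary.

```r
set.seed(123) # arbitrary seed, for reproducibility only

nCount    <- 6     # number of measurements for one subject
mInterval <- 30    # mean time between measurements
vInterval <- 0.07  # dispersion of the interval times

# Assumed parameterization: mean = mu, variance = dispersion * mu^2
shape <- 1 / vInterval
rate  <- 1 / (vInterval * mInterval)

# First measurement at time 0, followed by nCount - 1 gamma-distributed gaps
gaps <- c(0, rgamma(nCount - 1, shape = shape, rate = rate))

# Measurement times are the rounded running total of the gaps
time <- round(cumsum(gaps))
time
```

With vInterval close to 0 the gaps concentrate around mInterval, which corresponds to the case where the interval is determined by the mean alone.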

periodVec is used to specify measurement periods that are different from the default counting variable. If periodVec is not specified, the periods default to 0, 1, ... n-1, with n periods. If periodVec is specified as c(x_1, x_2, ... x_n), then x_1, x_2, ... x_n represent the measurement periods.
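
A minimal sketch of a call that uses periodVec, assuming simstudy is installed. The data set dd, the variable x, and the schedule of 0, 6, and 12 are made up for illustration, and the length of periodVec is assumed to have to match nPeriods.

```r
library(simstudy)

# Hypothetical baseline data: four subjects with one covariate
def <- defData(varname = "x", dist = "normal", formula = 0, variance = 1)
dd <- genData(4, def)

# Three measurements per subject at times 0, 6, and 12
# instead of the default 0, 1, 2
dtPer <- addPeriods(dd, nPeriods = 3, periodVec = c(0, 6, 12))
dtPer
```

Each of the four subjects should appear three times in the result, once per measurement period.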

Examples

tdef <- defData(varname = "T", dist = "binary", formula = 0.5)
tdef <- defData(tdef, varname = "Y0", dist = "normal", formula = 10, variance = 1)
tdef <- defData(tdef, varname = "Y1", dist = "normal", formula = "Y0 + 5 + 5 * T", variance = 1)
tdef <- defData(tdef, varname = "Y2", dist = "normal", formula = "Y0 + 10 + 5 * T", variance = 1)

dtTrial <- genData(5, tdef)
dtTrial
#> Key: <id>
#>       id     T        Y0       Y1       Y2
#>    <int> <int>     <num>    <num>    <num>
#> 1:     1     0  9.038907 13.77221 18.97153
#> 2:     2     0  9.402125 13.70966 19.96253
#> 3:     3     1  7.979656 18.63066 24.97564
#> 4:     4     1  9.500015 20.69839 24.42393
#> 5:     5     1 12.331997 22.71072 28.14005

dtTime <- addPeriods(dtTrial,
  nPeriods = 3, idvars = "id",
  timevars = c("Y0", "Y1", "Y2"), timevarName = "Y"
)
dtTime
#> Key: <timeID>
#>        id period     T         Y timeID
#>     <int>  <int> <int>     <num>  <int>
#>  1:     1      0     0  9.038907      1
#>  2:     1      1     0 13.772206      2
#>  3:     1      2     0 18.971532      3
#>  4:     2      0     0  9.402125      4
#>  5:     2      1     0 13.709659      5
#>  6:     2      2     0 19.962525      6
#>  7:     3      0     1  7.979656      7
#>  8:     3      1     1 18.630661      8
#>  9:     3      2     1 24.975640      9
#> 10:     4      0     1  9.500015     10
#> 11:     4      1     1 20.698390     11
#> 12:     4      2     1 24.423926     12
#> 13:     5      0     1 12.331997     13
#> 14:     5      1     1 22.710725     14
#> 15:     5      2     1 28.140052     15

# Varying # of periods and intervals - need to have variables
# called nCount and mInterval

def <- defData(varname = "xbase", dist = "normal", formula = 20, variance = 3)
def <- defData(def, varname = "nCount", dist = "noZeroPoisson", formula = 6)
def <- defData(def, varname = "mInterval", dist = "gamma", formula = 30, variance = .01)
def <- defData(def, varname = "vInterval", dist = "nonrandom", formula = .07)

dt <- genData(200, def)
dt[id %in% c(8, 121)]
#> Key: <id>
#>       id    xbase nCount mInterval vInterval
#>    <int>    <num>  <num>     <num>     <num>
#> 1:     8 19.52645     11  29.03541      0.07
#> 2:   121 18.53334      6  31.91971      0.07

dtPeriod <- addPeriods(dt)
dtPeriod[id %in% c(8, 121)] # View individuals 8 and 121 only
#> Key: <timeID>
#>        id period    xbase  time timeID
#>     <int>  <int>    <num> <num>  <int>
#>  1:     8      0 19.52645     0     43
#>  2:     8      1 19.52645    25     44
#>  3:     8      2 19.52645    58     45
#>  4:     8      3 19.52645    78     46
#>  5:     8      4 19.52645   114     47
#>  6:     8      5 19.52645   138     48
#>  7:     8      6 19.52645   172     49
#>  8:     8      7 19.52645   208     50
#>  9:     8      8 19.52645   235     51
#> 10:     8      9 19.52645   272     52
#> 11:     8     10 19.52645   294     53
#> 12:   121      0 18.53334     0    744
#> 13:   121      1 18.53334    24    745
#> 14:   121      2 18.53334    65    746
#> 15:   121      3 18.53334    94    747
#> 16:   121      4 18.53334   129    748
#> 17:   121      5 18.53334   175    749