R/generate_correlated_data.R
genCorGen.Rd
Create multivariate (correlated) data - for general distributions
genCorGen(
n,
nvars,
params1,
params2 = NULL,
dist,
rho,
corstr,
corMatrix = NULL,
wide = FALSE,
cnames = NULL,
method = "copula",
idname = "id"
)
Number of observations
Number of variables
A single vector specifying the mean of the distribution. The vector is of length 1 if the mean is the same across all observations, otherwise the vector is of length nvars. In the case of the uniform distribution the vector specifies the minimum.
A single vector specifying a possible second parameter for the distribution. For the normal distribution, this will be the variance; for the gamma distribution, this will be the dispersion; and for the uniform distribution, this will be the maximum. The vector is of length 1 if the parameter is the same across all observations, otherwise the vector is of length nvars.
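As a rough illustration of how the two parameter vectors line up with base R quantile functions, the sketch below translates a (params1, params2) pair for each distribution. The helper `to_quantile_args` is hypothetical and not part of simstudy, and the gamma mapping assumes that the variance equals dispersion times the squared mean; neither is taken from the genCorGen source.

```r
# Hypothetical helper (not part of simstudy): translate (params1, params2)
# into arguments for the matching base R quantile function.
to_quantile_args <- function(dist, p1, p2 = NULL) {
  switch(dist,
    normal  = list(mean = p1, sd = sqrt(p2)),              # params2 is the variance
    uniform = list(min = p1, max = p2),                    # params1 = min, params2 = max
    gamma   = list(shape = 1 / p2, rate = 1 / (p1 * p2)),  # assumes var = p2 * p1^2
    poisson = list(lambda = p1),                           # no second parameter
    binary  = list(size = 1, prob = p1),                   # mean of a 0/1 variable
    stop("unknown distribution: ", dist)
  )
}

norm_args  <- to_quantile_args("normal", p1 = 10, p2 = 4)
gamma_args <- to_quantile_args("gamma", p1 = 4, p2 = 0.5)

# Median of each distribution, obtained from the translated arguments
do.call(qnorm, c(list(p = 0.5), norm_args))
do.call(qgamma, c(list(p = 0.5), gamma_args))
```

The point to notice is that a variance of 4 becomes a standard deviation of 2 before it reaches `qnorm`, which is an easy place to slip when moving between the two parameterizations.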
A string indicating the distribution: "binary", "poisson", "gamma", "normal", or "uniform".
Correlation coefficient, -1 <= rho <= 1. Use if corMatrix is not provided.
Correlation structure used to build the correlation matrix from rho. Options include "cs" for a compound symmetry structure and "ar1" for an autoregressive structure.
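To make the two structures concrete, here is a minimal base R sketch of the matrices they imply for a given rho. The helper `build_cor_matrix` is a hypothetical name introduced for illustration; it is not a simstudy function and may differ from how genCorGen builds the matrix internally.

```r
# Hypothetical helper: build an nvars x nvars correlation matrix from rho.
build_cor_matrix <- function(nvars, rho, corstr = c("cs", "ar1")) {
  corstr <- match.arg(corstr)
  if (corstr == "cs") {
    R <- matrix(rho, nvars, nvars)   # same correlation for every pair
    diag(R) <- 1
  } else {
    idx <- seq_len(nvars)
    lag <- abs(outer(idx, idx, "-")) # |i - j| for every pair of variables
    R <- rho^lag                     # correlation decays with distance
  }
  R
}

cs_mat  <- build_cor_matrix(3, rho = 0.7, corstr = "cs")
ar1_mat <- build_cor_matrix(3, rho = 0.7, corstr = "ar1")
print(cs_mat)
print(ar1_mat)
```

Under compound symmetry every pair of variables shares the correlation rho, whereas under the autoregressive structure the correlation between variables i and j is rho raised to the power |i - j|.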
Correlation matrix can be entered directly. It must be symmetrical and positive semi-definite. It is not a required field; if a matrix is not provided, then a structure and correlation coefficient rho must be specified.
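Before passing a matrix directly, it can be worth checking the stated requirements. The sketch below is one way to do that in base R; `is_valid_cor_matrix` is a hypothetical helper, the unit-diagonal check is an added assumption about what a correlation matrix should look like, and the tolerance is arbitrary.

```r
# Hypothetical check: is a candidate matrix a usable correlation matrix?
is_valid_cor_matrix <- function(m, tol = 1e-8) {
  if (!is.matrix(m) || nrow(m) != ncol(m)) return(FALSE)
  if (!isSymmetric(unname(m))) return(FALSE)
  if (any(abs(diag(m) - 1) > tol)) return(FALSE)   # assumed: unit diagonal
  eig <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
  all(eig >= -tol)   # positive semi-definite up to rounding error
}

good <- matrix(c(1.0, 0.7, 0.2,
                 0.7, 1.0, 0.4,
                 0.2, 0.4, 1.0), nrow = 3)

# Symmetric, but the three pairwise correlations are mutually incompatible
bad <- matrix(c( 1.0, 0.9, -0.9,
                 0.9, 1.0,  0.9,
                -0.9, 0.9,  1.0), nrow = 3)

is_valid_cor_matrix(good)
is_valid_cor_matrix(bad)
```

The second matrix shows why symmetry alone is not enough: each entry is a legal correlation, yet the matrix has a negative eigenvalue and so cannot be the correlation matrix of any set of variables.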
The layout of the returned data.table. If wide = TRUE, all new correlated variables are returned in a single record per id; if wide = FALSE, each new variable is its own record (i.e., the data are in long form). Defaults to FALSE.
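The difference between the two layouts can be seen on a toy table. The sketch below uses base R only and made-up values; it illustrates the shapes involved and is not how genCorGen produces its output.

```r
# Toy long-form data: one row per id and period (values are made up)
long <- data.frame(
  id     = rep(1:2, each = 3),
  period = rep(0:2, times = 2),
  X      = c(4, 5, 6, 8, 11, 9)
)

# Base R reshape to one row per id, with one column per period
wide <- reshape(long, idvar = "id", timevar = "period", direction = "wide")
names(wide) <- c("id", "V1", "V2", "V3")  # mimic the default V# names
rownames(wide) <- NULL

print(long)
print(wide)
```

The long table has one row per id and period (six rows here), while the wide table has one row per id with the three correlated values side by side.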
Explicit column names. A single string with names separated by commas. If no string is provided, the default names will be V#, where # represents the column.
Two methods are available to generate correlated data. (1) "copula" uses the multivariate Gaussian copula method, which applies to all available distributions. (2) "ep" uses an algorithm developed by Emrich and Piedmonte (1991) for multivariate binary data. Defaults to "copula".
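The idea behind the Gaussian copula approach can be sketched in a few lines of base R: draw correlated standard normals, map them to uniforms with the normal CDF, then push the uniforms through the target quantile function. This is a simplified illustration under those assumptions, not the simstudy implementation; the sample size, seed, and variable names are arbitrary.

```r
set.seed(1)

n      <- 5000
lambda <- c(8, 10, 12)       # Poisson means, one per variable
rho    <- 0.7
R      <- matrix(rho, 3, 3)  # compound symmetry correlation matrix
diag(R) <- 1

# 1. Correlated standard normals: the rows of z have correlation matrix R,
#    because chol(R) returns the upper triangular U with t(U) %*% U = R
z <- matrix(rnorm(n * 3), nrow = n) %*% chol(R)

# 2. Map each column to Uniform(0, 1) via the normal CDF
u <- pnorm(z)

# 3. Apply the target quantile function column by column
x <- sapply(seq_along(lambda), function(j) qpois(u[, j], lambda[j]))

round(colMeans(x), 1)  # should be close to lambda
round(cor(x), 2)       # off-diagonals near, though not exactly, rho
```

The marginal distributions are exactly Poisson, but the correlation of the transformed variables is generally a little lower than the correlation specified on the normal scale.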
Character value that specifies the name of the id variable.
data.table with added column(s) of correlated data
Emrich LJ, Piedmonte MR. A Method for Generating High-Dimensional Multivariate Binary Variates. The American Statistician 1991;45:302-4.
set.seed(23432)
lambda <- c(8, 10, 12)
genCorGen(100, nvars = 3, params1 = lambda, dist = "poisson", rho = .7, corstr = "cs")
#> Key: <id>
#> id period X
#> <int> <num> <num>
#> 1: 1 0 4
#> 2: 1 1 5
#> 3: 1 2 6
#> 4: 2 0 8
#> 5: 2 1 11
#> ---
#> 296: 99 1 9
#> 297: 99 2 10
#> 298: 100 0 6
#> 299: 100 1 7
#> 300: 100 2 8
genCorGen(100, nvars = 3, params1 = 5, dist = "poisson", rho = .7, corstr = "cs")
#> Key: <id>
#> id period X
#> <int> <num> <num>
#> 1: 1 0 3
#> 2: 1 1 3
#> 3: 1 2 4
#> 4: 2 0 7
#> 5: 2 1 7
#> ---
#> 296: 99 1 1
#> 297: 99 2 2
#> 298: 100 0 2
#> 299: 100 1 3
#> 300: 100 2 1
genCorGen(100, nvars = 3, params1 = lambda, dist = "poisson", rho = .7, corstr = "cs", wide = TRUE)
#> Key: <id>
#> id V1 V2 V3
#> <int> <num> <num> <num>
#> 1: 1 5 11 9
#> 2: 2 8 12 14
#> 3: 3 5 8 12
#> 4: 4 9 9 13
#> 5: 5 4 7 10
#> 6: 6 6 3 8
#> 7: 7 9 13 11
#> 8: 8 11 12 16
#> 9: 9 11 14 17
#> 10: 10 7 8 8
#> 11: 11 7 12 19
#> 12: 12 6 8 10
#> 13: 13 13 8 10
#> 14: 14 14 13 16
#> 15: 15 4 7 9
#> 16: 16 13 12 15
#> 17: 17 7 6 9
#> 18: 18 12 12 13
#> 19: 19 4 8 7
#> 20: 20 11 9 8
#> 21: 21 11 14 15
#> 22: 22 6 11 11
#> 23: 23 10 12 12
#> 24: 24 4 8 7
#> 25: 25 4 4 8
#> 26: 26 14 10 14
#> 27: 27 11 14 13
#> 28: 28 4 6 7
#> 29: 29 7 9 11
#> 30: 30 8 9 9
#> 31: 31 11 8 12
#> 32: 32 10 14 14
#> 33: 33 13 12 14
#> 34: 34 7 9 11
#> 35: 35 7 13 14
#> 36: 36 6 10 10
#> 37: 37 10 11 11
#> 38: 38 6 5 5
#> 39: 39 7 8 9
#> 40: 40 11 12 12
#> 41: 41 11 14 13
#> 42: 42 5 7 11
#> 43: 43 5 9 10
#> 44: 44 12 13 20
#> 45: 45 7 8 11
#> 46: 46 3 5 8
#> 47: 47 7 13 14
#> 48: 48 4 5 10
#> 49: 49 7 10 7
#> 50: 50 6 7 14
#> 51: 51 6 9 12
#> 52: 52 11 13 13
#> 53: 53 8 11 13
#> 54: 54 7 7 11
#> 55: 55 8 12 13
#> 56: 56 7 11 9
#> 57: 57 8 9 15
#> 58: 58 9 8 10
#> 59: 59 8 15 15
#> 60: 60 9 6 9
#> 61: 61 11 17 17
#> 62: 62 8 6 12
#> 63: 63 6 6 12
#> 64: 64 3 5 8
#> 65: 65 9 8 10
#> 66: 66 9 10 16
#> 67: 67 8 17 23
#> 68: 68 9 9 10
#> 69: 69 10 10 14
#> 70: 70 5 7 6
#> 71: 71 9 13 14
#> 72: 72 7 12 12
#> 73: 73 9 7 7
#> 74: 74 12 16 14
#> 75: 75 8 7 8
#> 76: 76 11 10 9
#> 77: 77 12 17 15
#> 78: 78 11 10 18
#> 79: 79 9 12 13
#> 80: 80 7 8 11
#> 81: 81 10 12 13
#> 82: 82 5 6 7
#> 83: 83 8 8 11
#> 84: 84 9 5 6
#> 85: 85 6 11 11
#> 86: 86 15 10 16
#> 87: 87 13 15 16
#> 88: 88 12 9 12
#> 89: 89 4 8 7
#> 90: 90 6 9 14
#> 91: 91 10 11 12
#> 92: 92 12 13 15
#> 93: 93 7 8 10
#> 94: 94 9 9 13
#> 95: 95 7 9 9
#> 96: 96 6 8 11
#> 97: 97 4 10 8
#> 98: 98 16 18 20
#> 99: 99 4 11 11
#> 100: 100 7 9 6
#> id V1 V2 V3
genCorGen(100, nvars = 3, params1 = 5, dist = "poisson", rho = .7, corstr = "cs", wide = TRUE)
#> Key: <id>
#> id V1 V2 V3
#> <int> <num> <num> <num>
#> 1: 1 4 10 3
#> 2: 2 3 4 6
#> 3: 3 5 5 7
#> 4: 4 4 4 4
#> 5: 5 2 1 2
#> 6: 6 7 8 5
#> 7: 7 6 7 6
#> 8: 8 6 4 5
#> 9: 9 5 5 4
#> 10: 10 9 9 8
#> 11: 11 1 1 2
#> 12: 12 2 3 4
#> 13: 13 4 3 3
#> 14: 14 6 6 8
#> 15: 15 8 5 2
#> 16: 16 5 7 4
#> 17: 17 5 4 4
#> 18: 18 2 3 2
#> 19: 19 3 3 4
#> 20: 20 2 2 0
#> 21: 21 5 3 4
#> 22: 22 8 9 9
#> 23: 23 3 3 4
#> 24: 24 3 0 2
#> 25: 25 3 3 3
#> 26: 26 1 3 4
#> 27: 27 11 9 12
#> 28: 28 4 2 5
#> 29: 29 7 6 7
#> 30: 30 4 2 4
#> 31: 31 6 6 8
#> 32: 32 4 4 5
#> 33: 33 9 9 8
#> 34: 34 4 4 3
#> 35: 35 3 2 2
#> 36: 36 8 5 6
#> 37: 37 5 4 5
#> 38: 38 3 2 1
#> 39: 39 9 7 5
#> 40: 40 3 3 2
#> 41: 41 9 9 9
#> 42: 42 6 4 5
#> 43: 43 8 7 5
#> 44: 44 6 4 5
#> 45: 45 4 4 4
#> 46: 46 3 3 2
#> 47: 47 6 5 5
#> 48: 48 1 2 4
#> 49: 49 4 7 4
#> 50: 50 4 5 4
#> 51: 51 11 8 9
#> 52: 52 6 7 5
#> 53: 53 4 6 7
#> 54: 54 3 4 3
#> 55: 55 4 2 2
#> 56: 56 6 8 6
#> 57: 57 4 3 3
#> 58: 58 8 6 7
#> 59: 59 6 6 8
#> 60: 60 4 3 5
#> 61: 61 5 6 5
#> 62: 62 6 7 6
#> 63: 63 6 3 2
#> 64: 64 5 6 6
#> 65: 65 6 6 9
#> 66: 66 4 4 5
#> 67: 67 6 7 6
#> 68: 68 8 7 5
#> 69: 69 5 5 7
#> 70: 70 1 0 2
#> 71: 71 5 3 2
#> 72: 72 4 6 6
#> 73: 73 7 7 4
#> 74: 74 3 4 5
#> 75: 75 6 4 3
#> 76: 76 7 4 3
#> 77: 77 4 3 2
#> 78: 78 5 6 6
#> 79: 79 6 6 3
#> 80: 80 7 6 3
#> 81: 81 2 1 2
#> 82: 82 3 2 3
#> 83: 83 3 4 5
#> 84: 84 7 6 6
#> 85: 85 5 5 2
#> 86: 86 9 6 6
#> 87: 87 4 3 4
#> 88: 88 4 9 6
#> 89: 89 9 4 6
#> 90: 90 7 5 4
#> 91: 91 2 1 2
#> 92: 92 6 3 4
#> 93: 93 2 4 5
#> 94: 94 7 9 10
#> 95: 95 8 4 5
#> 96: 96 0 0 0
#> 97: 97 4 2 4
#> 98: 98 5 5 8
#> 99: 99 3 3 4
#> 100: 100 2 4 1
#> id V1 V2 V3
genCorGen(100,
nvars = 3, params1 = lambda, dist = "poisson", rho = .7, corstr = "cs",
cnames = "new_var"
)
#> Key: <id>
#> id period new_var
#> <int> <num> <num>
#> 1: 1 0 11
#> 2: 1 1 12
#> 3: 1 2 16
#> 4: 2 0 9
#> 5: 2 1 15
#> ---
#> 296: 99 1 10
#> 297: 99 2 11
#> 298: 100 0 6
#> 299: 100 1 10
#> 300: 100 2 13
genCorGen(100,
nvars = 3, params1 = lambda, dist = "poisson", rho = .7, corstr = "cs",
wide = TRUE, cnames = "a, b, c"
)
#> Key: <id>
#> id a b c
#> <int> <num> <num> <num>
#> 1: 1 8 10 10
#> 2: 2 5 9 10
#> 3: 3 6 8 12
#> 4: 4 7 8 13
#> 5: 5 9 9 19
#> 6: 6 10 10 13
#> 7: 7 7 10 9
#> 8: 8 10 10 14
#> 9: 9 10 11 14
#> 10: 10 9 12 16
#> 11: 11 8 13 15
#> 12: 12 6 9 13
#> 13: 13 11 11 16
#> 14: 14 10 11 14
#> 15: 15 10 11 10
#> 16: 16 12 10 15
#> 17: 17 10 11 12
#> 18: 18 13 12 15
#> 19: 19 13 13 14
#> 20: 20 6 7 13
#> 21: 21 7 9 11
#> 22: 22 9 12 9
#> 23: 23 9 10 13
#> 24: 24 9 9 14
#> 25: 25 6 5 6
#> 26: 26 5 5 8
#> 27: 27 9 13 10
#> 28: 28 4 11 12
#> 29: 29 9 12 14
#> 30: 30 11 13 15
#> 31: 31 6 9 10
#> 32: 32 9 14 15
#> 33: 33 3 2 6
#> 34: 34 6 7 12
#> 35: 35 5 9 11
#> 36: 36 4 7 7
#> 37: 37 6 12 9
#> 38: 38 7 11 13
#> 39: 39 7 12 12
#> 40: 40 10 8 13
#> 41: 41 7 7 11
#> 42: 42 11 11 15
#> 43: 43 5 4 8
#> 44: 44 6 8 11
#> 45: 45 9 10 17
#> 46: 46 6 10 12
#> 47: 47 11 13 15
#> 48: 48 13 21 21
#> 49: 49 8 13 12
#> 50: 50 13 10 21
#> 51: 51 7 9 11
#> 52: 52 7 8 12
#> 53: 53 7 12 12
#> 54: 54 10 9 15
#> 55: 55 11 12 15
#> 56: 56 13 15 25
#> 57: 57 13 15 17
#> 58: 58 11 11 14
#> 59: 59 8 13 10
#> 60: 60 6 10 6
#> 61: 61 9 14 15
#> 62: 62 8 10 10
#> 63: 63 6 7 11
#> 64: 64 7 12 12
#> 65: 65 17 13 16
#> 66: 66 5 7 9
#> 67: 67 11 9 14
#> 68: 68 4 8 11
#> 69: 69 11 13 16
#> 70: 70 12 11 15
#> 71: 71 11 12 15
#> 72: 72 11 11 14
#> 73: 73 16 14 19
#> 74: 74 5 6 13
#> 75: 75 4 5 8
#> 76: 76 13 12 13
#> 77: 77 7 9 9
#> 78: 78 9 14 15
#> 79: 79 8 11 9
#> 80: 80 6 5 8
#> 81: 81 7 12 12
#> 82: 82 9 11 13
#> 83: 83 5 9 6
#> 84: 84 6 5 8
#> 85: 85 10 9 11
#> 86: 86 9 14 18
#> 87: 87 11 12 11
#> 88: 88 6 9 8
#> 89: 89 9 11 14
#> 90: 90 11 11 14
#> 91: 91 3 4 6
#> 92: 92 5 8 14
#> 93: 93 5 10 10
#> 94: 94 6 7 10
#> 95: 95 8 13 16
#> 96: 96 11 17 18
#> 97: 97 5 6 11
#> 98: 98 7 7 11
#> 99: 99 3 7 9
#> 100: 100 9 11 14
#> id a b c