This help file describes the distributions used for data creation in simstudy.
formula: Desired mean as a Number, or an R expression for the mean as a String. Variables defined via defData() and variables within the parent environment (prefixed with ..) can be used within the formula. Functions from the parent environment can be used without a prefix.
variance: Number. Default is 0.
link: String identifying the link function to be used. Default is identity.
For details about the statistical distributions, please see stats::distributions; any non-statistical distributions are explained below. The required variables and the expected pattern for each distribution can be found in this table:
name | formula | format | variance | link |
beta | mean | String or Number | dispersion value | identity or logit |
binary | probability for 1 | String or Number | NA | identity, log, or logit |
binomial | probability of success | String or Number | number of trials | identity, log, or logit |
categorical | probabilities | p_1;p_2;..;p_n | category labels: a;b;c , 50;130;20 | identity or logit |
custom | name of function | String | arguments | identity |
exponential | mean (lambda) | String or Number | NA | identity or log |
gamma | mean | String or Number | dispersion value | identity or log |
mixture | formula | x_1 \| p_1 + x_2 \| p_2 ... x_n \| p_n | NA | NA |
negBinomial | mean | String or Number | dispersion value | identity or log |
nonrandom | formula | String or Number | NA | NA |
normal | mean | String or Number | variance | NA |
noZeroPoisson | mean | String or Number | NA | identity or log |
poisson | mean | String or Number | NA | identity or log |
trtAssign | ratio | r_1;r_2;..;r_n | stratification | identity or nonbalanced |
uniform | range | from;to | NA | NA |
uniformInt | range | from;to | NA | NA |
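As a rough illustration of how a few rows of the table map onto defData() calls, the sketch below defines a uniform, a binary (logit link), a poisson (log link), and a categorical variable. The variable names (age, smoker, visits, group), the coefficients, and the seed are made up for this example and are not taken from the package documentation.

```r
library(simstudy)

set.seed(2024) # arbitrary seed, only for reproducibility of this sketch

# uniform: formula gives the range as "from;to"
def <- defData(varname = "age", formula = "20;60", dist = "uniform")

# binary: formula is the probability of 1, here on the logit scale
def <- defData(def, varname = "smoker", formula = "-2 + 0.03 * age",
               dist = "binary", link = "logit")

# poisson: formula is the mean, here on the log scale
def <- defData(def, varname = "visits", formula = "0.5 + 0.02 * age",
               dist = "poisson", link = "log")

# categorical: formula lists the probabilities as "p_1;p_2;..;p_n"
def <- defData(def, varname = "group", formula = "0.3;0.2;0.5",
               dist = "categorical")

dd <- genData(200, def)
head(dd)
```

Each definition can refer to variables defined earlier in the same table (here, age), which is why the order of the defData() calls matters.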
The mixture distribution makes it possible to mix two or more previously defined distributions/variables. Each variable that should be part of the new distribution, x_1,...,x_n, is assigned a probability p_1,...,p_n. For more information see rdatagen.net.
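A minimal sketch of a mixture definition, assuming two previously defined normal variables and the "x_1 | p_1 + x_2 | p_2" pattern from the table above. The names x1, x2, and xMix, the probabilities 0.4 and 0.6, and the seed are illustrative choices; the probabilities are assumed to sum to 1.

```r
library(simstudy)

set.seed(2024) # arbitrary seed, only for reproducibility of this sketch

# Two component variables that the mixture will draw from
def <- defData(varname = "x1", formula = 0, variance = 1, dist = "normal")
def <- defData(def, varname = "x2", formula = 5, variance = 4, dist = "normal")

# Each row of xMix takes the value of x1 with probability 0.4
# and the value of x2 with probability 0.6
def <- defData(def, varname = "xMix", formula = "x1 | 0.4 + x2 | 0.6",
               dist = "mixture")

dd <- genData(1000, def)
head(dd)
```

Because the mixture picks one of the component values per row, every value of xMix should coincide with either x1 or x2 in the same row.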
ext_var <- 2.9
def <- defData(varname = "external", formula = "3 + log(..ext_var)", variance = .5)
def
#> varname formula variance dist link
#> <char> <char> <num> <char> <char>
#> 1: external 3 + log(..ext_var) 0.5 normal identity
genData(5, def)
#> Key: <id>
#> id external
#> <int> <num>
#> 1: 1 4.864108
#> 2: 2 3.665887
#> 3: 3 5.448120
#> 4: 4 3.531613
#> 5: 5 5.689579