This help file describes the distributions used for data creation in simstudy.



formula: The desired mean as a Number, or an R expression for the mean as a String. For some distributions the formula represents a different parameter (see the table below). Variables defined via defData() and variables in the parent environment (prefixed with ..) can be used within the formula. Functions from the parent environment can be used without a prefix.


variance: Number. Default is 0. Its meaning depends on the distribution (see the table below).


link: String identifying the link function to be used. Default is "identity".
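To make the effect of link concrete, the following base-R sketch (it does not call simstudy) takes a hypothetical evaluated formula value z and converts it the way a logit or log link would: plogis() is the inverse logit that turns z into a probability, and exp() is the inverse log that turns z into a mean. It assumes the inverse link is applied to the evaluated formula before the random draw, which is the usual meaning of a link function; the formula and its coefficients are made up for illustration.

```r
# Sketch in base R (no simstudy calls): how a link function changes the
# meaning of the formula value. ASSUMPTION: the inverse link is applied to
# the evaluated formula before drawing from the distribution.
set.seed(123)
n <- 10000
z <- -1 + 0.8 * 1.5   # hypothetical evaluated formula, approximately 0.2

# link = "identity": z would be used directly as the probability or mean.

# link = "logit" (e.g. binary): probability = inverse logit of z
p <- plogis(z)                        # exp(z) / (1 + exp(z)), about 0.55
y_binary <- rbinom(n, size = 1, prob = p)

# link = "log" (e.g. poisson): mean = exp(z)
lambda <- exp(z)                      # about 1.22
y_count <- rpois(n, lambda = lambda)

c(mean(y_binary), p)      # sample proportion vs. target probability
c(mean(y_count), lambda)  # sample mean vs. target mean
```

With the logit or log link the formula can take any real value, because the inverse link maps it into the valid range (0 to 1 for a probability, positive for a mean).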


For details about the statistical distributions, please see stats::distributions; distributions not covered there are explained below. The following table lists, for each distribution, what the formula and variance arguments represent, the expected format of the formula, and the permitted link functions:

name           formula                 format                                variance           link
beta           mean                    String or Number                      dispersion value   identity or logit
binary         probability for 1       String or Number                      NA                 identity or logit
binomial       probability of success  String or Number                      number of trials   identity or logit
exponential    mean (lambda)           String or Number                      NA                 identity or log
gamma          mean                    String or Number                      dispersion value   identity or log
mixture        formula                 x_1 | p_1 + x_2 | p_2 ... x_n | p_n   NA                 NA
negBinomial    mean                    String or Number                      dispersion value   identity or log
nonrandom      formula                 String or Number                      NA                 NA
normal         mean                    String or Number                      variance           NA
noZeroPoisson  mean                    String or Number                      NA                 identity or log
poisson        mean                    String or Number                      NA                 identity or log
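The "dispersion value" entries for gamma and negBinomial can be read in terms of a mean/dispersion parameterization. The base-R sketch below assumes variance = dispersion * mean^2 for the gamma distribution and variance = mean + dispersion * mean^2 for the negative binomial; these relationships are an assumption of the sketch (they are common conventions), so check them against the simstudy reference before relying on them. The values of mu and d are hypothetical.

```r
# Sketch (base R): a mean/dispersion parameterization for gamma and
# negative binomial draws. ASSUMPTION: dispersion d gives
#   gamma:       variance = d * mean^2
#   negBinomial: variance = mean + d * mean^2
set.seed(42)
mu <- 4       # hypothetical mean
d  <- 0.5     # hypothetical dispersion
n  <- 100000

# Gamma: shape = 1/d, scale = mu * d  =>  mean = mu, variance = d * mu^2 = 8
g <- rgamma(n, shape = 1 / d, scale = mu * d)

# Negative binomial: size = 1/d, mean = mu  =>  variance = mu + d * mu^2 = 12
nb <- rnbinom(n, size = 1 / d, mu = mu)

c(mean(g), var(g))    # approximately 4 and 8
c(mean(nb), var(nb))  # approximately 4 and 12
```

Under this parameterization a larger dispersion value means more spread around the same mean.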


The mixture distribution makes it possible to mix previously defined distributions/variables. Each variable x_1, ..., x_n that should be part of the new distribution is assigned a probability p_1, ..., p_n, and these probabilities should sum to 1. For more information see
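As a rough illustration of the idea, the base-R sketch below draws a two-component mixture in the spirit of the pattern x_1 | p_1 + x_2 | p_2. The variables x1 and x2 and the probabilities 0.3 and 0.7 are hypothetical, and choosing a component per observation with sample() is one straightforward way to implement a mixture, not necessarily how simstudy does it internally.

```r
# Sketch (base R): a mixture of two previously generated variables, in the
# spirit of a formula like "x1 | 0.3 + x2 | 0.7". For each observation one
# component is chosen with probability p_j and that variable's value is used.
set.seed(2024)
n  <- 10000
x1 <- rnorm(n, mean = 0,  sd = 1)   # hypothetical first variable
x2 <- rnorm(n, mean = 10, sd = 1)   # hypothetical second variable
p  <- c(0.3, 0.7)                   # mixture probabilities, sum to 1

component <- sample(1:2, size = n, replace = TRUE, prob = p)
mix <- ifelse(component == 1, x1, x2)

mean(component == 1)  # approximately 0.3
mean(mix)             # approximately 0.3 * 0 + 0.7 * 10 = 7
```

Because each observation takes the value of exactly one component, the mixed variable is bimodal here, with about 30% of values near 0 and about 70% near 10.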


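```r
library(simstudy)

# ..ext_var refers to the variable ext_var in the parent (calling) environment
ext_var <- 2.9
def <- defData(varname = "external", formula = "3 + log(..ext_var)", variance = .5)
def
#>     varname            formula variance   dist     link
#> 1: external 3 + log(..ext_var)      0.5 normal identity

# genData() draws random values, so this output differs between runs (no seed is set)
genData(5, def)
#>    id external
#> 1:  1 4.864108
#> 2:  2 3.665887
#> 3:  3 5.448120
#> 4:  4 3.531613
#> 5:  5 5.689579
```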