Distributions for Data Definitions

This help file describes the distributions used for data creation in simstudy.

Arguments

formula: Desired mean, given either as a Number or as a String containing an R expression for the mean. Variables defined via defData() and variables within the parent environment (prefixed with ..) can be used within the formula. Functions from the parent environment can be used without a prefix.
variance: Number. Default is 0.
link: String identifying the link function to be used. Default is identity.
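
As a minimal sketch of how these three arguments fit together, the definition below passes formula once as a Number and once as a String that refers to a previously defined variable, and sets a non-default link. The variable names x and count and the coefficient values are illustrative assumptions, not part of the package.

```r
library(simstudy)

# formula as a Number: a constant mean of 0, with variance 1
def <- defData(varname = "x", formula = 0, variance = 1, dist = "normal")

# formula as a String: an R expression that refers to x;
# with link = "log" the expression describes the log of the mean
def <- defData(def, varname = "count", formula = "0.5 + 0.3 * x",
               dist = "poisson", link = "log")

set.seed(1)  # only to make the sketch reproducible
dd <- genData(100, def)
head(dd)
```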

Details

For details about the statistical distributions, please see stats::distributions; any non-statistical distributions are explained below. The required arguments and the expected format for each distribution are listed in this table:

name	formula	format	variance	link
beta	mean	String or Number	dispersion value	identity or logit
binary	probability for 1	String or Number	NA	identity, log, or logit
binomial	probability of success	String or Number	number of trials	identity, log, or logit
categorical	probabilities	`p_1;p_2;..;p_n`	category labels: `a;b;c`, `50;130;20`	identity or logit
custom	name of function	String	arguments	identity
exponential	mean (lambda)	String or Number	NA	identity or log
gamma	mean	String or Number	dispersion value	identity or log
mixture	formula	`x_1 | p_1 + x_2 | p_2 ... x_n | p_n`	NA	NA
negBinomial	mean	String or Number	dispersion value	identity or log
nonrandom	formula	String or Number	NA	NA
normal	mean	String or Number	variance	NA
noZeroPoisson	mean	String or Number	NA	identity or log
poisson	mean	String or Number	NA	identity or log
trtAssign	ratio	`r_1;r_2;..;r_n`	stratification	identity or nonbalanced
uniform	range	`from;to`	NA	NA
uniformInt	range	`from;to`	NA	NA
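
The formats in the table can be illustrated with a short sketch that uses the semicolon-separated patterns for categorical, uniform, and uniformInt, plus a binary variable with a logit link. The variable names and all numeric values here are hypothetical and chosen only for illustration.

```r
library(simstudy)

# categorical: probabilities in the form p_1;p_2;..;p_n
def <- defData(varname = "grp", formula = "0.2;0.5;0.3", dist = "categorical")

# uniform and uniformInt: range in the form from;to
def <- defData(def, varname = "u", formula = "0;10", dist = "uniform")
def <- defData(def, varname = "k", formula = "1;6", dist = "uniformInt")

# binary: the formula is on the log-odds scale because link = "logit"
def <- defData(def, varname = "y", formula = "-1 + 0.2 * u",
               dist = "binary", link = "logit")

set.seed(2024)  # only to make the sketch reproducible
dd <- genData(200, def)
head(dd)
```

Without category labels in the variance argument, the categorical variable takes the integer values 1 to n.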

Mixture

The mixture distribution makes it possible to mix previously defined distributions/variables. Each variable that should be part of the new distribution, x_1,...,x_n, is assigned a probability p_1,...,p_n. For more information see rdatagen.net.
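
A minimal sketch of a two-component mixture follows. The names x1, x2, and xmix and the probabilities 0.3 and 0.7 are assumptions made for illustration; the probabilities are chosen to sum to 1.

```r
library(simstudy)

# two previously defined variables to be mixed
def <- defData(varname = "x1", formula = 5, variance = 1, dist = "normal")
def <- defData(def, varname = "x2", formula = 10, variance = 1, dist = "normal")

# mixture: each record takes the value of x1 with probability 0.3
# and the value of x2 with probability 0.7
def <- defData(def, varname = "xmix", formula = "x1 | 0.3 + x2 | 0.7",
               dist = "mixture")

set.seed(1)  # only to make the sketch reproducible
dd <- genData(1000, def)
head(dd)
```

Each value of xmix is a copy of either x1 or x2 from the same record, so the mixture selects between the component variables rather than averaging them.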

Examples

library(simstudy)
ext_var <- 2.9
def <- defData(varname = "external", formula = "3 + log(..ext_var)", variance = .5)
def
#>     varname            formula variance   dist     link
#>      <char>             <char>    <num> <char>   <char>
#> 1: external 3 + log(..ext_var)      0.5 normal identity
genData(5, def)
#> Key: <id>
#>       id external
#>    <int>    <num>
#> 1:     1 4.505798
#> 2:     2 3.214826
#> 3:     3 4.528568
#> 4:     4 4.304653
#> 5:     5 4.353056