Often, we’d like to explore data generation and modeling under different scenarios. For example, we might want to understand the operating characteristics of a model given different variance or other parametric assumptions. There is functionality built into `simstudy`

to facilitate this type of dynamic exploration. First, the functions `updateDef`

and `updateDefAdd`

essentially allow us to edit lines of existing data definition tables. Second, there is a built-in mechanism - called *double-dot* reference - to access external variables that do not exist in a defined data set or data definition.

The `updateDef`

function updates a row in a definition table created by functions `defData`

or `defRead`

. Analogously, `updateDefAdd`

function updates a row in a definition table created by functions `defDataAdd`

or `defReadAdd`

.

The original data set definition includes three variables `x`

, `y`

, and `z`

, all normally distributed:

defs <- defData(varname = "x", formula = 0, variance = 3, dist = "normal") defs <- defData(defs, varname = "y", formula = "2 + 3*x", variance = 1, dist = "normal") defs <- defData(defs, varname = "z", formula = "4 + 3*x - 2*y", variance = 1, dist = "normal") defs

```
## varname formula variance dist link
## 1: x 0 3 normal identity
## 2: y 2 + 3*x 1 normal identity
## 3: z 4 + 3*x - 2*y 1 normal identity
```

In the first case, we are changing the relationship of `y`

with `x`

as well as the variance:

defs <- updateDef(dtDefs = defs, changevar = "y", newformula = "x + 5", newvariance = 2) defs

```
## varname formula variance dist link
## 1: x 0 3 normal identity
## 2: y x + 5 2 normal identity
## 3: z 4 + 3*x - 2*y 1 normal identity
```

In this second case, we are changing the distribution of `z`

to *Poisson* and updating the link function to *log*:

defs <- updateDef(dtDefs = defs, changevar = "z", newdist = "poisson", newlink = "log") defs

```
## varname formula variance dist link
## 1: x 0 3 normal identity
## 2: y x + 5 2 normal identity
## 3: z 4 + 3*x - 2*y 1 poisson log
```

And in the last case, we remove a variable from a data set definition. Note in the case of a definition created by `defData`

that it is not possible to remove a variable that is a predictor of a subsequent variable, such as `x`

or `y`

in this case.

defs <- updateDef(dtDefs = defs, changevar = "z", remove = TRUE) defs

```
## varname formula variance dist link
## 1: x 0 3 normal identity
## 2: y x + 5 2 normal identity
```

For a truly dynamic data definition process, `simstudy`

(as of `version 0.2.0`

) allows users to reference variables that exist outside of data generation. These can be thought of as a type of hyperparameter of the data generation process. The reference is made directly in the formula itself, using a double-dot (“..”) notation before the variable name. Here is a simple example:

def <- defData(varname = "x", formula = 0, variance = 5, dist = "normal") def <- defData(def, varname = "y", formula = "..B0 + ..B1 * x", variance = "..sigma2", dist = "normal") def

```
## varname formula variance dist link
## 1: x 0 5 normal identity
## 2: y ..B0 + ..B1 * x ..sigma2 normal identity
```

B0 <- 4; B1 <- 2; sigma2 <- 9 set.seed(716251) dd <- genData(100, def) fit <- summary(lm(y ~ x, data = dd)) coef(fit)

```
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.00368 0.2839423 14.10033 2.559075e-25
## x 2.01001 0.1303472 15.42043 5.904268e-28
```

fit$sigma

`## [1] 2.827271`

It is easy to create a new data set on the fly with a difference variance assumption without having to go to the trouble of updating the data definitions.

```
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.351536 0.4268310 10.194987 4.569364e-17
## x 2.123973 0.2179406 9.745651 4.315348e-16
```

fit$sigma

`## [1] 4.210897`

The double-dot notation can be flexibly applied using `lapply`

(or the parallel version `mclapply`

) to create a range of data sets under different assumptions:

sigma2s <- c(1, 2, 6, 9) gen_data <- function(sigma2, d) { dd <- genData(200, d) dd$sigma2 <- sigma2 dd } dd_4 <- lapply(sigma2s, function(s) gen_data(s, def)) dd_4 <- rbindlist(dd_4) ggplot(data = dd_4, aes(x = x, y = y)) + geom_point(size = .5, color = "grey30") + facet_wrap(sigma2 ~ .) + theme(panel.grid = element_blank())

The double-dot notation is also *array-friendly*. For example if we want to create a mixture distribution from a vector of values (which we can also do using a *categorical* distribution), we can define the mixture formula in terms of the vector. In this case we are generating permuted block sizes of 2 and 4:

defblk <- defData(varname = "blksize", formula = "..sizes[1] | .5 + ..sizes[2] | .5", dist = "mixture") defblk

```
## varname formula variance dist link
## 1: blksize ..sizes[1] | .5 + ..sizes[2] | .5 0 mixture identity
```

```
## id blksize
## 1: 1 4
## 2: 2 4
## 3: 3 4
## 4: 4 2
## 5: 5 4
## ---
## 996: 996 2
## 997: 997 2
## 998: 998 4
## 999: 999 4
## 1000: 1000 4
```