Changelog

simstudy 0.8.1

New features

Added the ability to generate data from an empirical distribution by using new functions genDataDensity and addDataDensity.
The binary and binomial distributions can now accommodate a “log” link.
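
As a rough, language-agnostic illustration (written in Python, not simstudy's R code), a log link treats the linear predictor as the log of the probability, whereas a logit link treats it as the log-odds. The function names below are hypothetical, and the check that the probability does not exceed 1 is an assumption about sensible behavior, not a description of how simstudy handles that case.

```python
import math


def prob_log_link(eta: float) -> float:
    """Log link: eta = log(p), so p = exp(eta). Only valid when eta <= 0."""
    p = math.exp(eta)
    if p > 1.0:
        raise ValueError("log link gives p > 1; the linear predictor must be <= 0")
    return p


def prob_logit_link(eta: float) -> float:
    """Logit link: eta = log(p / (1 - p)), so p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


# With a log link, adding log(2) to the linear predictor doubles the probability,
# which is why coefficients under this link are read as log risk ratios.
baseline = prob_log_link(math.log(0.10))
exposed = prob_log_link(math.log(0.10) + math.log(2.0))
risk_ratio = exposed / baseline
```

Under the logit link the same shift would double the odds instead of the probability.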

Minor fix

addCorGen no longer requires all clusters to have the same size when using the rho and corstr arguments to define the correlation.
Fixed an issue that prevented functions defined outside the global namespace from being referenced in defData.

simstudy 0.8.0 (2024-05-15)

New features

Added the option to specify a customized distribution in defData and defDataAdd by specifying dist = "custom".
addPeriods now includes a new argument periodVec that allows users to designate specific measurement time periods using a vector.

Minor fix

Function logisticCoefs now correctly handles double dot notation.

simstudy 0.7.1 (2023-11-23)

Breaking Changes

trtAssign with ratio=NULL used to produce 0-indexed values but 1-indexed values if ratio was set. This has been changed so that both now produce 0-indexed values. This is a potentially breaking change for existing scripts that rely on the generated treatment values under the old behavior (e.g., using hard-coded values to filter).
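
The following sketch (in Python, purely illustrative and not simstudy code) shows why a hard-coded filter can silently change meaning. It assumes a two-arm design whose labels were 1 and 2 under the old behavior with ratio set; the label lists and the helper units_in_arm are hypothetical.

```python
# Hypothetical treatment labels for six units in a two-arm design.
old_labels = [1, 2, 2, 1, 2, 1]            # 1-indexed (old behavior when ratio was set)
new_labels = [a - 1 for a in old_labels]   # same allocation, 0-indexed (current behavior)


def units_in_arm(labels, arm):
    """Return the positions of the units whose label equals `arm`."""
    return [i for i, a in enumerate(labels) if a == arm]


# A script that hard-codes "arm 1" selects a different arm after the change.
picked_before = units_in_arm(old_labels, 1)   # first arm under the old labels
picked_after = units_in_arm(new_labels, 1)    # second arm under the new labels

# Deriving the label from the data instead of hard-coding it keeps the selection stable.
first_arm_before = units_in_arm(old_labels, min(old_labels))
first_arm_after = units_in_arm(new_labels, min(new_labels))
```

Here picked_before and picked_after contain different units, while first_arm_before and first_arm_after agree.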

New features

Function logisticCoefs determines the intercept and treatment/exposure parameter for a data generating process (based on a logistic regression model) that has a specific target population prevalence of a binary outcome, and an option to target a risk ratio, risk difference, or AUC.
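
To make the idea of targeting a prevalence concrete, here is a minimal Python sketch that searches for the intercept of a logistic model so that the average predicted probability matches a target. It is an illustration only and does not claim to reproduce the algorithm used by logisticCoefs; the names find_intercept and lin_preds, the covariate effect of 0.5, and the use of bisection are all assumptions made for the example.

```python
import math
import random


def expit(x: float) -> float:
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def find_intercept(lin_preds, target_prev, lo=-20.0, hi=20.0, tol=1e-8):
    """Bisection search for b0 such that mean(expit(b0 + lp)) equals target_prev.

    The mean probability increases monotonically in b0, so bisection converges.
    """
    def mean_prob(b0):
        return sum(expit(b0 + lp) for lp in lin_preds) / len(lin_preds)

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_prob(mid) < target_prev:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical covariate contribution: 0.5 * x with x ~ N(0, 1).
rng = random.Random(2024)
lin_preds = [0.5 * rng.gauss(0.0, 1.0) for _ in range(5000)]

b0 = find_intercept(lin_preds, target_prev=0.30)
achieved = sum(expit(b0 + lp) for lp in lin_preds) / len(lin_preds)
```

The same search could be repeated over a treatment coefficient to hit a target risk ratio or risk difference, at the cost of a second nested search.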

Major fix

Data generation speed has been improved for very large data sets with many variables.

simstudy 0.7.0 (2023-06-01)

New features

Added double-dot (dynamic) functionality to defSurv. Users can now specify double-dot variables in scale, shape, and formula parameters.
It is now possible to generate variable cluster sizes using the clusterSize distribution in defData and defDataAdd.
The x-axis limits can now be set in the plot generated by survParamPlot.

Major fix

Improved the random effect variance generation for function iccRE under the Poisson distribution. The current approach is based on the 2013 paper by Nakagawa & Schielzeth titled “A general and simple method for obtaining $R^2$ from generalized linear mixed-effects models.”
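
For orientation, the kind of relationship involved can be written out as follows. This is a sketch of the latent-scale approximation usually attributed to that paper for a Poisson outcome with a log link, and it is not a statement of what iccRE computes internally. The symbols $\rho$ (target ICC), $\lambda$ (mean count at the intercept), and $\sigma^2_b$ (random-effect variance) are notation introduced here.

```latex
\rho = \frac{\sigma^2_b}{\sigma^2_b + \ln\!\left(1 + \frac{1}{\lambda}\right)}
\quad\Longrightarrow\quad
\sigma^2_b = \frac{\rho}{1 - \rho}\,\ln\!\left(1 + \frac{1}{\lambda}\right)
```

Under these assumptions, $\rho = 0.1$ and $\lambda = 5$ would give $\sigma^2_b \approx 0.0203$.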

Minor fix

Modified internal function to speed up beta distribution data generation.

simstudy 0.6.0 (2023-02-18)

New features

Added functions blockExchangeMat and blockDecayMat. Users can now generate correlation matrices that can accommodate clustered observations over time where the within-cluster correlation in the same time period can be different from the within-cluster correlation across time periods.
Updated function genCorMat to allow generation of cluster-specific correlation matrices in case one wants to induce variability in correlation across clusters.
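
The block-exchangeable structure described above can be sketched in a few lines of Python. This is a simplified illustration, not the package's implementation: it assumes a single cluster, observations ordered period by period, and only two correlation values. The function name and the parameter names are chosen for the example.

```python
def block_exchangeable(n_inds: int, n_periods: int, rho_w: float, rho_b: float):
    """Build an (n_inds * n_periods) square correlation matrix for one cluster.

    Observations are ordered period by period. Two different observations have
    correlation rho_w if they fall in the same period and rho_b otherwise.
    """
    size = n_inds * n_periods
    mat = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == j:
                mat[i][j] = 1.0
            elif i // n_inds == j // n_inds:   # same period
                mat[i][j] = rho_w
            else:                              # different periods
                mat[i][j] = rho_b
    return mat


# Two individuals measured in each of three periods.
R = block_exchangeable(n_inds=2, n_periods=3, rho_w=0.6, rho_b=0.3)
for row in R:
    print(" ".join(f"{v:.1f}" for v in row))
```

A decaying variant would replace the constant rho_b with a value that shrinks as the gap between periods grows.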

Major fixes

Overhauled function addCorGen to make it more flexible. It can now handle cluster-dependent data, and not just time-dependent data. In addition, performance has been dramatically improved.

Minor fixes

Fixed bug in genSpline.

simstudy 0.5.1 (2022-10-03)

Minor fixes

Fixed bug in trtAssign.

simstudy 0.5.0 (2022-07-08)

New features

Updated genFormula to allow for ‘double dot’ functionality.
Added new functions genSynthetic and addSynthetic. Allows users to sample records with replacement from an existing data table.
Added argument ‘startProb’ to genMarkov. Allows user to set probability distribution of start state.
Added utility functions survGetParams and survParamPlot to aid users in identifying parameters that can be used to generate desired distributions of time to event data.
Major updates to functions defSurv and genSurv. It is now possible to generate survival outcomes with hazard functions that change over time. In addition, competing risk outcomes can be explicitly generated.

simstudy 0.4.0 (2022-01-20)

New features

genOrdCat now supports non-proportional odds.
Added functions defRepeat and defRepeatAdd to facilitate the definition of multiple variables that share identical data definitions.

Minor improvements and fixes

Fixed bug resulting from rounding error when specifying probabilities for ‘categorical’ distributions.

simstudy 0.3.0 (2021-11-04)

New features

You can now use non-scalar variables with double-dot notation. See the Dynamic Data Definition Vignette.
The ‘categorical’ distribution now supports the variance parameter to introduce categories other than 1…n.
You can now use trtAssign() as a distribution with defData().

Minor improvements and fixes

Added CITATION
genData now warns that a set ‘id’ parameter will override previously defined ‘id’ names from the data definition.
genData now handles NULL as ‘id’ value in data definitions (e.g. when definitions are not created via defData etc.) by defaulting to ‘id’.
Fixed an error in genOrdCat when only a single adjustment variable is given but more than one new category will be created.
Fixed a bug where ..variables did not work within a function when using dist="beta".

simstudy 0.2.2

Improve documentation and vignettes.

simstudy 0.2.1 (2020-10-07)

Add ‘backports’ for compatibility with R < 4.0
Fix a bug on R < 4.0 in genOrdCat
Current version is now only compatible with R version >= 3.3.0

simstudy 0.2.0 (2020-10-06)

Deprecated Functions

Moved genCorOrdCat’s functionality into genOrdCat. genCorOrdCat is now deprecated.
Renamed catProbs to genCatFormula for naming consistency. catProbs is now deprecated.

New features

Introduced a new system for formula definitions and completely reworked the underlying code. See vignette “Dynamic Data Definition”.
The new function genMixFormula generates mixture formulas from different inputs.
Some simstudy functions now produce custom errors and warnings. Eventually all conditions will be replaced by the new system to make error handling easier for the user.
Added new vignettes.
Created documentation pages for:
- the release version https://kgoldfeld.github.io/simstudy/
- and development version https://kgoldfeld.github.io/simstudy/dev

Minor improvements and fixes

genCatFormula now warns if an additional category is created or probabilities are normalized.
Fixed bug in trtAssign related to the new ratio argument.
Fixed bug in trtAssign when strata had count of one.
defData now also checks the first row in the definition table for validity.

simstudy 0.1.16 (2020-03-31)

Added “mixture” distribution that takes a value from an existing column with a specified probability.
Modified function trtAssign to speed up stratified sampling with very large numbers of strata.
Add argument “ratio” to function trtAssign to allow users to specify more than 1:1 randomization.

simstudy 0.1.15 (2019-10-16)

Added function trimData (that uses new rcpp function clipVec) to clip or truncate a longitudinal data set after a certain event has occurred.
Fixed bug in addMarkov, added trimvalue argument to use trimData function
Added trimvalue argument in genMarkov
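
The trimming idea can be sketched as follows. This Python snippet is illustrative only and is not the package's rcpp implementation; the tuple layout, the helper name trim_after_event, and the choice to keep the row in which the event first occurs are assumptions made for the example.

```python
def trim_after_event(records, event_value):
    """Keep each id's rows up to and including the first row whose state equals event_value.

    `records` is a list of (id, period, state) tuples sorted by id and period.
    """
    kept = []
    finished = set()  # ids whose event has already occurred
    for rec_id, period, state in records:
        if rec_id in finished:
            continue  # drop everything recorded after the event
        kept.append((rec_id, period, state))
        if state == event_value:
            finished.add(rec_id)
    return kept


# Hypothetical longitudinal data in which state 3 is the terminal event.
data = [
    (1, 1, 1), (1, 2, 3), (1, 3, 2), (1, 4, 3),
    (2, 1, 2), (2, 2, 2), (2, 3, 1), (2, 4, 1),
]
trimmed = trim_after_event(data, event_value=3)
```

In this example id 1 is cut off after period 2, while id 2 never reaches the event and keeps all four periods.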

simstudy 0.1.14 (2019-08-09)

Added functions genMarkov and addMarkov to create data.table with (or add to existing data.table) individual chains of Markov processes.
Added function genNthEvent to create data.table with binary event outcome in a longitudinal setting.
Updated function genCluster so that cluster size can be specified as an integer, and will be constant across all clusters.
Updated function addPeriods so that the period name can be specified.
Updated function trtStepWedge so that a transition period can be included.
Fixed bug in function delColumns related to multiple keys.

simstudy 0.1.13 (2019-05-16)

Added negative binomial distribution as an option to function iccRE
Fixed function genCorOrdCat so that it can accept user-specified correlation matrix
Added function trtStepWedge to generate treatment assignment for a stepped-wedge design cluster randomized trial.

simstudy 0.1.12 (2019-02-26)

Fixed genCorFlex and addMultiFac to accommodate bug fixes with package data.table

simstudy 0.1.11 (2019-02-07)

Added negative binomial option to genCorGen, addCorGen, genCorFlex, and addCorFlex
Fixed bug in function genFactor
Added LAG() functionality to missing data generation - updated functions genMiss and added two new internal functions .checkLags and .addLags
Function catProbs now accepts a vector of probabilities or weights as an argument
Fixed bugs in function addCondition

simstudy 0.1.10 (2018-09-16)

Added function genCorMat - generate an n x n correlation matrix
Added function genCorOrdCat - generate correlated ordinal categorical data
Added beta distribution option to function defData (and associated functions)
Added function betaGetShapes
Implemented Emrich and Piedmonte algorithm for correlated binary data for functions genCorGen and addCorGen
Modified function genOrdCat - allows adjVar = NULL
Fixed bug in function addCorFlex

simstudy 0.1.9 (2018-05-11)

Added function catProbs - to be used to generate categorical data
Added binomial distribution
Added ability to specify formula in variance
Added function genMultiFac - generates multi-factorial design data
Added function addMultiFac - adds multi-factorial design data
Added function iccRE - generates required random effect variance for specified intra-class coefficients (ICCs)
Fixed bug in function genCorFlex
Fixed bug in numerous functions related to error checking and scoping
Fixed bug in function addCondition

simstudy 0.1.8 (2018-01-08)

Fixed function updateDef
Fixed bug in internal function genbinom
Added function genCorFlex - generate correlated data from variables that have different marginal distributions
Added function addCorFlex - generate correlated data from variables that have different marginal distributions, can be dependent on previously defined data

simstudy 0.1.7 (2017-11-02)

Added function genOrdCat - creates ordinal categorical data
Added function genFormula - creates a linear formula in the form of a string
Added function updateDef - modify existing data definition table (to be used in genData())
Added function updateDefData - modify existing data def table (to be used in addColumns())

simstudy 0.1.6 (2017-10-18)

Fixed function genSurv
Added spline generating functions

simstudy 0.1.5 (2017-10-03)

Added uniform integer distribution (uniformInt)
Added negative binomial distribution (negBinomial)
Added exponential distribution (exponential)
Added function delColumns - deletes one or more columns from data.table

simstudy 0.1.4 (2017-09-20)

Added error check to verify that specified distributions are valid
Added function genFactor - converts an existing (non-double) field in a data.table to a factor
Added function genDummy - creates dummy variables from an integer or factor field in a data.table
Added function defCondition - define distribution conditional on existing fields
Added function defReadCond - read in conditional definitions from external csv file
Added function addCondition - generate data based on conditional definition

simstudy 0.1.3 (2017-07-03)

Modified “nonrandom” data generation to allow “log” and “logit” link options.
Added function genCorGen - generate a new data.table with correlated data from various distributions.
Added function addCorData - add correlated data from various distributions to existing data.tables.

simstudy 0.1.2 (2016-12-07)

Fixed index variable issue related to generating categorical data
Fixed index variable issue related to generating longitudinal data
Fixed issue that arose when creating categorical variable in first field
Reduced the time required to generate categorical data with large sample sizes
Categorical data can now accommodate probabilities conditional on covariates
Fix: package data.table 1.10.0 broke genMissDataMat. genMissDataMat has been updated.

simstudy 0.1.1 (2016-06-21)

This is the first submission of simstudy, so there is no news yet!

simstudy 0.8.1

New features

Minor fix

simstudy 0.8.0 (2024-05-15)

New features

Minor fix

simstudy 0.7.1 (2023-11-23)

Breaking Changes

New features

Major fix

simstudy 0.7.0 (2023-06-01)

New features

Major fix

Minor fix

simstudy 0.6.0 (2023-02-18)

New features

Major fixes

Minor fixes

simstudy 0.5.1 (2022-10-03)

Minor fixes

simstudy 0.5.0 (2022-07-08)

New features

simstudy 0.4.0 (2022-01-20)

New features

Minor improvements and fixes

simstudy 0.3.0 (2021-11-04)

New features

Minor improvements and fixes

simstudy 0.2.2

simstudy 0.2.1 (2020-10-07)

simstudy 0.2.0 (2020-10-06)

Deprecated Functions

New features

Minor improvements and fixes

simstudy 0.1.16 (2020-03-31)

simstudy 0.1.15 (2019-10-16)

simstudy 0.1.14 (2019-08-09)

simstudy 0.1.13 (2019-05-16)

simstudy 0.1.12 (2019-02-26)

simstudy 0.1.11 (2019-02-07)

simstudy 0.1.10 (2018-09-16)

simstudy 0.1.9 (2018-05-11)

simstudy 0.1.8 (2018-01-08)

simstudy 0.1.7 (2017-11-02)

simstudy 0.1.6 (2017-10-18)

simstudy 0.1.5 (2017-10-03)

simstudy 0.1.4 (2017-09-20)

simstudy 0.1.3 (2017-07-03)

simstudy 0.1.2 (2016-12-07)

simstudy 0.1.1 (2016-06-21)