Breaking Changes

  • trtAssign with ratio = NULL used to produce 0-indexed treatment values, but 1-indexed values when ratio was set. Both cases now produce 0-indexed values. This can break existing scripts that rely on the old behavior, for example by filtering on hardcoded treatment values.
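
To make the effect on downstream filters concrete, here is a minimal Python sketch of a ratio-based assignment that labels arms starting at 0. It is an illustration only, not simstudy code: the helper assign_treatment and its arguments are hypothetical names. The last lines show a filter that derives the reference arm from the data instead of hardcoding a label.

```python
import random
from collections import Counter


def assign_treatment(n, ratio, seed=None):
    """Return n shuffled, 0-indexed arm labels allocated in the given ratio.

    Hypothetical helper for illustration; it is not part of simstudy.
    """
    rng = random.Random(seed)
    # One block contains each arm label repeated according to its weight.
    block = [arm for arm, weight in enumerate(ratio) for _ in range(weight)]
    # Repeat the block enough times to cover n units, then trim and shuffle.
    reps = -(-n // len(block))  # ceiling division
    labels = (block * reps)[:n]
    rng.shuffle(labels)
    return labels


arms = assign_treatment(12, ratio=[1, 1, 2], seed=42)
counts = Counter(arms)

# Derive the reference arm from the data rather than assuming it is 1.
control = min(arms)
control_units = [i for i, arm in enumerate(arms) if arm == control]
```

A script that instead filtered on a hardcoded label of 1 would silently select a different arm once labels start at 0.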

New features

  • Added function logisticCoefs, which determines the intercept and treatment/exposure parameters of a logistic regression data generating process that achieves a target population prevalence of a binary outcome, with an option to also target a risk ratio, risk difference, or AUC.

Major fix

  • Data generation speed has been improved for very large data sets with many variables.

New features

  • Added double-dot (dynamic) functionality to defSurv. Users can now specify double-dot variables in scale, shape, and formula parameters.
  • It is possible to generate variable cluster sizes using the clusterSize distribution in defData and defDataAdd.
  • The x-axis limits of the plot generated by survParamPlot can now be set.

Major fix

  • Improved the random effect variance generation for function iccRE under the Poisson distribution. The current approach is based on the 2013 paper by Nakagawa & Schielzeth titled “A general and simple method for obtaining R² from generalized linear mixed-effects models”.
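
As a rough illustration of the kind of calculation involved, the cited paper approximates the distribution-specific variance of a Poisson outcome with a log link as ln(1 + 1/λ), where λ is the mean count. The Python sketch below solves ICC = σ²/(σ² + ln(1 + 1/λ)) for the random effect variance σ². Whether iccRE uses exactly this expression is an assumption here, and the function names are hypothetical.

```python
import math


def poisson_re_variance(icc, lam):
    """Random effect variance (log scale) that yields the target ICC.

    Assumes ICC = var_re / (var_re + ln(1 + 1/lam)); hypothetical helper,
    not the simstudy implementation.
    """
    if not 0 < icc < 1:
        raise ValueError("icc must be strictly between 0 and 1")
    if lam <= 0:
        raise ValueError("lam must be positive")
    var_dist = math.log(1 + 1 / lam)  # distribution-specific variance
    return icc / (1 - icc) * var_dist


def implied_icc(var_re, lam):
    """Invert the relationship to check the target ICC is recovered."""
    var_dist = math.log(1 + 1 / lam)
    return var_re / (var_re + var_dist)


var_re = poisson_re_variance(icc=0.1, lam=30)
```

Passing var_re back through implied_icc recovers the target ICC, which is a quick sanity check on the algebra.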

Minor fix

  • Modified internal function to speed up beta distribution data generation.

New features

  • Added functions blockExchangeMat and blockDecayMat. Users can now generate correlation matrices that accommodate clustered observations over time, where the within-cluster correlation in the same time period can differ from the within-cluster correlation across time periods.
  • Updated function genCorMat to allow generation of cluster-specific correlation matrices in case one wants to induce variability in correlation across clusters.
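
The structure of a block exchangeable matrix can be sketched in a few lines of Python. This is an illustration under simplifying assumptions (different individuals are observed in each period, and every period has the same number of individuals); the function block_exchangeable and its arguments are hypothetical and do not reproduce the simstudy interface.

```python
def block_exchangeable(n_inds, n_periods, rho_w, rho_b):
    """Build a block exchangeable correlation matrix as a list of rows.

    rho_w: correlation between two individuals in the same period.
    rho_b: correlation between two individuals in different periods.
    Hypothetical helper for illustration only.
    """
    size = n_inds * n_periods
    mat = []
    for i in range(size):
        row = []
        for j in range(size):
            if i == j:
                row.append(1.0)      # diagonal
            elif i // n_inds == j // n_inds:
                row.append(rho_w)    # same period block
            else:
                row.append(rho_b)    # different period blocks
        mat.append(row)
    return mat


R = block_exchangeable(n_inds=2, n_periods=3, rho_w=0.5, rho_b=0.3)
```

A decaying variant would replace the constant rho_b with a value that shrinks as the distance between periods grows.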

Major fixes

  • Overhauled function addCorGen to make it more flexible. It can now handle cluster-dependent data, and not just time-dependent data. In addition, performance has been dramatically improved.

Minor fixes

  • Fixed bug in genSpline

Minor fixes

  • Fixed bug in trtAssign

New features

  • Updated genFormula to allow for ‘double dot’ functionality.
  • Added new functions genSynthetic and addSynthetic. Allows users to sample records with replacement from an existing data table.
  • Added argument ‘startProb’ to genMarkov. Allows user to set probability distribution of start state.
  • Added utility functions survGetParams and survParamPlot to aid users in identifying parameters that can be used to generate desired distributions of time to event data.
  • Major updates to functions defSurv and genSurv. It is now possible to generate survival outcomes with hazard functions that change over time. In addition, competing risk outcomes can be explicitly generated.
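
The role of a start-state distribution in a Markov chain can be sketched as follows. This Python example is only an illustration of the idea behind the startProb entry above: simulate_chain and its arguments are hypothetical names, and labelling states 1 to k is an assumption made for the example.

```python
import random


def simulate_chain(trans_mat, chain_len, start_prob, seed=None):
    """Simulate one Markov chain with states labelled 1..k.

    start_prob gives the probability of each state at the first step;
    trans_mat[i - 1] gives the transition probabilities out of state i.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    states = list(range(1, len(trans_mat) + 1))
    # Draw the initial state from the start distribution.
    current = rng.choices(states, weights=start_prob)[0]
    chain = [current]
    for _ in range(chain_len - 1):
        # Draw the next state from the row of the current state.
        current = rng.choices(states, weights=trans_mat[current - 1])[0]
        chain.append(current)
    return chain


P = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.0, 0.0, 1.0],  # state 3 is absorbing
]
chain = simulate_chain(P, chain_len=8, start_prob=[0.0, 0.0, 1.0], seed=1)
```

Because all start probability is placed on the absorbing state in this usage example, every element of chain equals 3.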

New features

  • genOrdCat now supports non-proportional odds
  • Added functions defRepeat and defRepeatAdd to facilitate the definition of multiple variables that share identical data definitions.

Minor improvements and fixes

  • Fixed bug resulting from rounding error when specifying probabilities for ‘categorical’ distributions.

New features

  • You can now use non-scalar variables with double-dot notation. See the Dynamic Data Definition Vignette.
  • The ‘categorical’ distribution now supports the variance parameter to introduce categories other than 1…n.
  • You can now use trtAssign as a distribution with defData.

Minor improvements and fixes

  • Added CITATION
  • genData now warns that a set ‘id’ parameter will override previously defined ‘id’ names from the data definition.
  • genData now handles NULL as ‘id’ value in data definitions (e.g. when definitions are not created via defData etc.) by defaulting to ‘id’.
  • Fix an error in genOrdCat when only a single adjustment variable is given but more than one new category will be created.
  • Fix a bug where double-dot (..) variables did not work within a function when using dist = "beta".
  • Improve documentation and vignettes.
  • Add ‘backports’ for compatibility with R < 4.0
  • Fix a bug on R < 4.0 in genOrdCat
  • Current version is now only compatible with R version >= 3.3.0

Deprecated Functions

  • Moved genCorOrdCat’s functionality into genOrdCat. genCorOrdCat is now deprecated.
  • Renamed catProbs to genCatFormula for naming consistency. catProbs is now deprecated.

New features

  • Introduced a new system for formula definitions and completely reworked the underlying code. See vignette “Dynamic Data Definition”.
  • The new function genMixFormula generates mixture formulas from different inputs.
  • Some simstudy functions now produce custom errors and warnings. Eventually all conditions will be replaced by the new system to make error handling easier for the user.
  • Added new vignettes.
  • Created documentation pages for:

Minor improvements and fixes

  • genCatFormula now warns if an additional category is created or probabilities are normalized.
  • Fixed bug in trtAssign related to the new ratio argument.
  • Fixed bug in trtAssign when strata had count of one.
  • defData now also checks the first row in the definition table for validity.
  • Added “mixture” distribution that takes a value from an existing column with a specified probability.
  • Modified function trtAssign to improve speed performance of stratified sampling with very large numbers of strata.
  • Added argument “ratio” to function trtAssign to allow users to specify randomization ratios other than 1:1.
  • Added function trimData (which uses the new Rcpp function clipVec) to clip or truncate a longitudinal data set after a certain event has occurred.
  • Fixed bug in addMarkov and added a trimvalue argument that uses the trimData function.
  • Added trimvalue argument in genMarkov
  • Added functions genMarkov and addMarkov to create data.table with (or add to existing data.table) individual chains of Markov processes.
  • Added function genNthEvent to create data.table with binary event outcome in a longitudinal setting.
  • Updated function genCluster so that cluster size can be specified as an integer, and will be constant across all clusters.
  • Updated function addPeriods so that the period name can be specified.
  • Updated function trtStepWedge so that a transition period can be included.
  • Fixed bug in function delColumns related to multiple keys.
  • Added negative binomial distribution as an option to function iccRE
  • Fixed function genCorOrdCat so that it can accept user-specified correlation matrix
  • Added function trtStepWedge to generate treatment assignment for a stepped-wedge design cluster randomized trial.
  • Fixed genCorFlex and addMultiFac to accommodate bug fixes with package data.table
  • Added negative binomial option to genCorGen, addCorGen, genCorFlex, and addCorFlex
  • Fixed bug in function genFactor
  • Added LAG() functionality to missing data generation: updated function genMiss and added two new internal functions, .checkLags and .addLags.
  • Function catProbs now accepts a vector of probabilities or weights as an argument
  • Fixed bugs in function addCondition
  • Added function genCorMat - generate an n x n correlation matrix
  • Added function genCorOrdCat - generate correlated ordinal categorical data
  • Added beta distribution option to function defData (and associated functions)
  • Added function betaGetShapes
  • Implemented Emrich and Piedmonte algorithm for correlated binary data for function genCorGen and addCorGen
  • Modified function genOrdCat - allows adjVar = NULL
  • Fixed bug in function addCorFlex
  • Added function catProbs - to be used to generate categorical data
  • Added binomial distribution
  • Added ability to specify formula in variance
  • Added function genMultiFac - generates multi-factorial design data
  • Added function addMultiFac - adds multi-factorial design data
  • Added function iccRE - generates required random effect variance for specified intra-class correlation coefficients (ICCs)
  • Fixed bug in function genCorFlex
  • Fixed bug in numerous functions related to error checking and scoping
  • Fixed bug in function addCondition
  • Fixed function updateDef
  • Fixed bug in internal function genbinom
  • Added function genCorFlex - generate correlated data from variables that have different marginal distributions
  • Added function addCorFlex - generate correlated data from variables that have different marginal distributions, can be dependent on previously defined data
  • Added function genOrdCat - creates ordinal categorical data
  • Added function genFormula - creates a linear formula in the form of a string
  • Added function updateDef - modify existing data definition table (to be used in genData())
  • Added function updateDefData - modify existing data def table (to be used in addColumns())
  • Fixed function genSurv
  • Added spline generating functions
  • Added uniform integer distribution (uniformInt)
  • Added negative binomial distribution (negBinomial)
  • Added exponential distribution (exponential)
  • Added function delColumns - deletes one or more columns from data.table
  • Added error check to verify that specified distributions are valid
  • Added function genFactor - converts an existing (non-double) field in a data.table to a factor
  • Added function genDummy - creates dummy variables from an integer or factor field in a data.table
  • Added function defCondition - define distribution conditional on existing fields
  • Added function defReadCond - read in conditional definitions from external csv file
  • Added function addCondition - generate data based on conditional definition
  • Modified “nonrandom” data generation to allow “log” and “logit” link options.
  • Added function genCorGen - generate a new data.table with correlated data from various distributions.
  • Added function addCorData - add correlated data from various distributions to existing data.tables.
  • Fixed index variable issue related to generating categorical data
  • Fixed index variable issue related to generating longitudinal data
  • Fixed issue that arose when creating categorical variable in first field
  • Reduced the time required to generate categorical data with large sample sizes
  • Categorical data can now accommodate probabilities conditional on covariates
  • Fix: package data.table 1.10.0 broke genMissDataMat. genMissDataMat has been updated.
  • This is the first submission of simstudy, so there is no news yet!