--- title: "Multiple Realization Methods" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Multiple Realization Methods} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(Colossus) library(data.table) ``` ## Available Methods There are many instances in which a user may want to use data that is known up to a distribution, in these cases it may not be sufficient to use a single realization. Colossus provides two ways to use multiple realizations. The first runs multiple regressions in a row, each with a different realization. The second optimizes the average Log-Likelihood over the realizations. This method was designed to assist in circumstances in which combinations of shared and unshared error are present, and a two-dimensional Monte Carlo method is used to generate realizations for at least one covariate column. Note that this method does not apply to the event/cases or interval/duration columns. ### Specifics of Use Let us suppose that we are interested in performing a regression with three covariates: an age bin, amount of exposure to a radioisotope during an interval, and average sleep during an interval. Age bin is known, exposure is known up to a distribution which could be shared among individuals, and sleep is randomly distributed and independent each interval.

$$ \begin{aligned} \lambda(a, r, s) = \exp{(\beta_a \times a + \beta_a \times r + \beta_a \times s)} \end{aligned} $$

```{r} names <- c("a", "r", "s") term_n <- c(0, 0, 0) tform <- c("loglin", "loglin", "loglin") modelform <- "M" fir <- 0 a_n <- c(0.1, 0.1, 0.1) model_control <- list("mcml" = FALSE) # setting to true uses the Monte-Carlo Maximum Likelihood option and optimizes the average log-likelihood, setting to false optimizes each realization seperately. ``` Colossus requires two items to apply multiple realizations, a list of columns to replace and a matrix of columns for each realization. Suppose the user generates 5 realizations. Then there are two distributed columns and five realizations. ```{r, eval=FALSE} dose_index <- c("r", "s") # the two columns in the model to replace are the radiation and sleeping covariates dose_realizations <- matrix( c("r0", "r1", "r2", "r3", "r4", "s0", "s1", "s2", "s3", "s4"), nrow = 2 ) # columns to be used for realizations 0-4, rows for each column being replaced ``` The first method runs each regression and returns matrices for the final parameter estimates, standard deviations, and final log-likelihoods of each regression. These results can be used to create likelihood-weighted parameter distributions, to account for uncertainty in the final parameter estimates. The second method returns the standard regression output, only using the average log-likelihood. Note that there is no strict requirement for the realization columns to actually be realizations of a distributed column. A user could use this method to compare models with the same formula but different covariates. The only difficulty is that Colossus does not refresh the parameter estimates between realization regressions, so the user may need to keep in mind feasible parameter space. In the event that individual realizations are being optimized and the initial parameter estimate is infeasible, the parameter estimates associated with the negative term are reduced in magnitude. In most cases this is enough to find a nearby feasible space to start at.