---
title: "Matched Case-Control Logistic Regression"
output:
  rmarkdown::html_vignette:
    css: mystyle.css
vignette: >
  %\VignetteIndexEntry{Matched Case-Control Logistic Regression}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(Colossus)
library(data.table)
library(survival)
```

## Matched Case-Control Modeling

One upcoming regression method in Colossus is matched case-control logistic regression. The implementation is ongoing, but the theory is presented in the following section.

### General Theory

Suppose we have matched case-control data divided into matched sets. Each set has $m$ cases among $n$ records, and we denote the relative risk for individual $i$ in the set by $r_i$. The probability of the case exposures, conditional on all exposures in the set, is the ratio of the product of relative risks over the cases to the sum of such products over every way of selecting $m$ individuals from the $n$ at risk.
$$
\begin{aligned}
    P &= \frac{\prod_{i=1}^{m} r_i}{\sum_{c \in R} \left ( \prod_{j=1}^{m} r_{c_j} \right )} \\
    L &= \sum_{i=1}^{m} \log(r_i) - \log \left ( \sum_{c \in R} \left ( \prod_{j=1}^{m} r_{c_j} \right ) \right )
\end{aligned}
$$

Here $R$ denotes the set of all ways of selecting $m$ of the $n$ records, $c_j$ is the $j$th record of selection $c$, and $L$, the logarithm of this probability, is the set's contribution to the log-likelihood.
Using the methods presented in Gail et al. (1981), we can replace the sum over all $n!/(m!(n-m)!)$ ways to select $m$ items with a more manageable recursive formula $B(m,n)$.

$$
\begin{aligned}
    B(m, n) &= \sum_{c \in R} \left ( \prod_{j=1}^{m} r_{c_j} \right ) \\
    B(m, n) &= B(m, n-1) + r_n B(m-1, n-1) \\
    B(m, n) &= \begin{cases} \sum_{j=1}^{n} r_j & m = 1 \\ 0 & m > n \end{cases}
\end{aligned}
$$
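As a sanity check on the recursion, the following chunk sketches a direct R implementation and compares it against the explicit sum over all selections. The function `B_recursive` and its interface are illustrative only and are not part of the Colossus API.

```{r}
# Gail et al. (1981) recursion: B(m, n) = B(m, n-1) + r_n * B(m-1, n-1),
# where r holds the relative risks for the n records in one matched set.
B_recursive <- function(m, n, r) {
  if (m > n) {
    return(0)
  }
  if (m == 1) {
    return(sum(r[seq_len(n)]))
  }
  B_recursive(m, n - 1, r) + r[n] * B_recursive(m - 1, n - 1, r)
}

# Compare against the direct sum over all choose(n, m) selections
r <- c(1.2, 0.8, 1.5, 1.1)
direct <- sum(combn(length(r), 2, function(idx) prod(r[idx])))
all.equal(B_recursive(2, length(r), r), direct)
```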
We can then directly solve for the first and second derivatives and their recursive formulas.

$$
\begin{aligned}
    \frac{\partial r_i}{\partial \beta_\mu} &=: r_{i}^{\mu} \\
    \frac{\partial B(m,n)}{\partial \beta_\mu} = B^{\mu}(m, n) &= \sum_{c \in R} \left [ \left ( \sum_{j=1}^{m} \frac{r_{c_j}^{\mu}}{r_{c_j}} \right ) \prod_{j=1}^{m} r_{c_j} \right ] \\
    B^{\mu}(m, n) &= B^{\mu}(m, n-1) + r_n B^{\mu}(m-1, n-1) + r_n^{\mu} B(m-1, n-1) \\
    B^{\mu}(m, n) &= \begin{cases} \sum_{j=1}^{n} r_j^{\mu} & m = 1 \\ 0 & m > n \end{cases}
\end{aligned}
$$
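The first-derivative recursion can be sketched in the same way. Here `r_mu` is assumed to hold $r_i^{\mu}$ for each record, and the finite-difference check uses $r_i = \exp(\beta x_i)$, so $r_i^{\mu} = x_i r_i$; again, the names are illustrative rather than part of the package.

```{r}
# First-derivative recursion, reusing B_recursive from above:
# B^mu(m, n) = B^mu(m, n-1) + r_n * B^mu(m-1, n-1) + r_n^mu * B(m-1, n-1)
B_mu_recursive <- function(m, n, r, r_mu) {
  if (m > n) {
    return(0)
  }
  if (m == 1) {
    return(sum(r_mu[seq_len(n)]))
  }
  B_mu_recursive(m, n - 1, r, r_mu) +
    r[n] * B_mu_recursive(m - 1, n - 1, r, r_mu) +
    r_mu[n] * B_recursive(m - 1, n - 1, r)
}

# Central-difference check with r_i = exp(beta * x_i), r_i^mu = x_i * r_i
x <- c(0.5, 1.0, 1.5, 2.0)
beta <- 0.3
h <- 1e-6
numeric_deriv <- (B_recursive(2, 4, exp((beta + h) * x)) -
  B_recursive(2, 4, exp((beta - h) * x))) / (2 * h)
all.equal(B_mu_recursive(2, 4, exp(beta * x), x * exp(beta * x)),
  numeric_deriv,
  tolerance = 1e-6
)
```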
$$
\begin{aligned}
    \frac{\partial^2 r_i}{\partial \beta_\mu \partial \beta_\nu} &=: r_{i}^{\mu,\nu} \\
    \frac{\partial^2 B(m,n)}{\partial \beta_\mu \partial \beta_\nu} = B^{\mu,\nu}(m, n) &= \sum_{c \in R} \left [ \left ( \sum_{j=1}^{m} \frac{r_{c_j}^{\mu,\nu}}{r_{c_j}} + \left ( \sum_{j=1}^{m} \frac{r_{c_j}^{\mu}}{r_{c_j}} \right ) \left ( \sum_{j=1}^{m} \frac{r_{c_j}^{\nu}}{r_{c_j}} \right ) - \sum_{j=1}^{m} \frac{r_{c_j}^{\mu}}{r_{c_j}} \frac{r_{c_j}^{\nu}}{r_{c_j}} \right ) \prod_{j=1}^{m} r_{c_j} \right ] \\
    B^{\mu,\nu}(m, n) &= B^{\mu,\nu}(m, n-1) + r_n^{\mu,\nu} B(m-1, n-1) + r_n^{\nu} B^{\mu}(m-1, n-1) + r_n^{\mu} B^{\nu}(m-1, n-1) + r_n B^{\mu,\nu}(m-1, n-1) \\
    B^{\mu,\nu}(m, n) &= \begin{cases} \sum_{j=1}^{n} r_j^{\mu,\nu} & m = 1 \\ 0 & m > n \end{cases}
\end{aligned}
$$
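A matching sketch of the second-derivative recursion follows; `r_munu` is assumed to hold $r_i^{\mu,\nu}$, and $B^{\nu}$ is obtained by calling the first-derivative function with `r_nu` in place of `r_mu`.

```{r}
# Second-derivative recursion, reusing the sketches above:
# B^{mu,nu}(m, n) = B^{mu,nu}(m, n-1) + r_n^{mu,nu} * B(m-1, n-1)
#   + r_n^nu * B^mu(m-1, n-1) + r_n^mu * B^nu(m-1, n-1)
#   + r_n * B^{mu,nu}(m-1, n-1)
B_munu_recursive <- function(m, n, r, r_mu, r_nu, r_munu) {
  if (m > n) {
    return(0)
  }
  if (m == 1) {
    return(sum(r_munu[seq_len(n)]))
  }
  B_munu_recursive(m, n - 1, r, r_mu, r_nu, r_munu) +
    r_munu[n] * B_recursive(m - 1, n - 1, r) +
    r_nu[n] * B_mu_recursive(m - 1, n - 1, r, r_mu) +
    r_mu[n] * B_mu_recursive(m - 1, n - 1, r, r_nu) +
    r[n] * B_munu_recursive(m - 1, n - 1, r, r_mu, r_nu, r_munu)
}
```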
Finally, these expressions for $B(m,n)$ can be substituted into each matched set's contribution to the log-likelihood and its derivatives. The model is then optimized via the same methods as the other regression models.

$$
\begin{aligned}
    L_{set} &= \sum_{i=1}^{m} \log(r_i) - \log \left ( B(m,n) \right ) \\
    L^{\mu}_{set} &= \sum_{i=1}^{m} \frac{r_i^{\mu}}{r_i} - \frac{B^{\mu}(m,n)}{B(m,n)} \\
    L^{\mu, \nu}_{set} &= \sum_{i=1}^{m} \left ( \frac{r_i^{\mu,\nu}}{r_i} - \frac{r_i^{\mu}}{r_i} \frac{r_i^{\nu}}{r_i} \right ) - \left ( \frac{B^{\mu,\nu}(m,n)}{B(m,n)} - \frac{B^{\mu}(m,n)}{B(m,n)} \frac{B^{\nu}(m,n)}{B(m,n)} \right )
\end{aligned}
$$
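Putting the pieces together, the per-set log-likelihood contribution can be sketched as below, assuming the first $m$ entries of `r` are the cases; the helper name and data layout are assumptions for illustration, not the Colossus interface. The full log-likelihood is the sum of these contributions over all matched sets, and matched case-control data can also be fit with `survival::clogit` for comparison.

```{r}
# Per-set log-likelihood L_set = sum(log(r_i)) - log(B(m, n)),
# assuming the first m entries of r correspond to the cases.
L_set <- function(m, r) {
  sum(log(r[seq_len(m)])) - log(B_recursive(m, length(r), r))
}
L_set(2, c(1.2, 0.8, 1.5, 1.1))

# For comparison with survival::clogit, e.g.
# clogit(case ~ dose + strata(set_id), data = matched_data),
# where 'case', 'dose', 'set_id', and 'matched_data' are placeholder names.
```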