Handling Censored Data
1 Data Below the Limit of Quantification
Oftentimes, we have observations that are below the limit of quantification (BLOQ), meaning that the assay has only been validated down to a certain value, (the lower limit of quantification, LLOQ). A paper by Stuart Beal goes through 7 methods of handling this BLOQ data in NONMEM. A more recent paper introduces two methods that inflate the residual error for BLOQ values.
In my experience, I’ve seen M1 (dropping any BLOQ values completely), M5 (replace BLOQ values with
As mentioned, M3 and M4 suffer from numerical issues with optimization methods. However, there are no such issues when performing MCMC sampling in Stan, so I write all of my models to use M4 if the error model is a distribution that is typically unbounded (Gaussian, Student’s-t, …), and M3 if the error model is naturally bounded below at 0 (lognormal - M3 and M4 are equivalent in this case).
2 Treat the BLOQ values as Left-Censored Data (M3)
Instead of tossing out the BLOQ data (M1) or assigning them some arbitrary value (M5-M7), we should keep them in the data set and treat them as left-censored data. This means that the likelihood contribution for observation
where
2.1 Stan Code Example for M3
I’ve mostly written Stan models on this site that fit the model with within-chain parallelization, but I’ll demonstrate the concept with code without within-chain parallelization1.
First, let’s imagine something simple (like in the simple linear regression example) where there are no BLOQs, and we can just write the likelihood just how we would write it on paper:
model{
// Priors
...
// Likelihood
y ~ normal(ipred, sigma);
}
or we could equivalently incorporate the likelihood by using the target +=
syntax:
model{
// Priors
...
// Likelihood
target += normal_lpdf(y | ipred, sigma);
}
The above likelihood is vectorized. We can equivalently write it with a for
loop:
model{
// Priors
...
// Likelihood
for(i in 1:n){
target += normal_lpdf(y[i] | mu[i], sigma);
}
}
Now if we have BLOQs2, we just use the target +=
syntax and give the likelihood for each observation taking into account whether it is BLOQ or not.
model{
// Priors
...
// Likelihood
for(i in 1:n)
if(bloq[i] == 1){
target += normal_lcdf(lloq[i] | mu[i], sigma);
else{
}target += normal_lpdf(y[i] | mu[i], sigma);
}
}
}
These lines implement the math (on the log scale, since Stan calculates the log-posterior).
where
3 Treat the BLOQ values as Left-Censored Data and Truncated Below at 0 (M4)
We know that drug concentrations cannot be
3.1 Stan Code Example for M4
Now that you’ve seen the steps to go from the ~
operator to target +=
to a for
loop in Section 2.1, we’ll just go ahead and write the M4 implementation:
model{
// Priors
...
// Likelihood
for(i in 1:n)
if(bloq[i] == 1){
target += log_diff_exp(normal_lcdf(lloq[i] | mu[i], sigma),
0.0 | mu[i], sigma)) -
normal_lcdf(0.0 | mu[i], sigma);
normal_lccdf(else{
}target += normal_lpdf(y[i] | mu[i], sigma) -
0.0 | mu[i], sigma);
normal_lccdf(
}
}
}
These lines implement the math (on the log scale, since Stan calculates the log-posterior).
where
Footnotes
the code is actually very similar. It’s more a matter of where it goes - if there’s no within-chain parallelization, it goes in the
model
block. If there is, then it goes into youpartial_sum
function in thefunctions
block.↩︎Having
lloq[i]
allows for a dataset where some observations have different LLOQs than others↩︎For more information, see the Stan documentation for normal_lcdf(), and normal_lpdf().↩︎
A model that assumes log-normal error has support over
, so truncation is not an issue. In that case, M3 and M4 are equivalent. Mathematically, you can see this by noting that .↩︎For observed data with this truncated distribution, we need to “correct” the density so it integrates to 1. Division by
has this effect. For the censored data, the numerator is similar to the M3 method, but we must also account for the fact that it must be , hence . The denominator is corrected in the same manner as for the observed data.↩︎For more information, see the Stan documentation for log_diff_exp(), normal_lcdf(), normal_lccdf(), and normal_lpdf().↩︎