
Handling Censored Data

1 Data Below the Limit of Quantification

Oftentimes, we have observations that are below the limit of quantification (BLOQ), meaning that the assay has only been validated down to a certain value (the lower limit of quantification, LLOQ). A paper by Stuart Beal goes through 7 methods of handling this BLOQ data in NONMEM. A more recent paper introduces two methods that inflate the residual error for BLOQ values.

In my experience, I’ve seen M1 (drop any BLOQ values completely), M5 (replace BLOQ values with LLOQ/2), and M6 (in a series of consecutive BLOQ values, impute the BLOQ value closest to the maximum with LLOQ/2 and drop the rest) most commonly in the pharmacometrics world. M3 (treat the BLOQ values as left-censored data) is very close to being technically correct (it is technically correct for lognormal error models) and is the gold standard, producing less-biased results than the above imputation methods, but it often has trouble converging with optimization methods. M4 (treat the BLOQ values as left-censored data and truncated below at 0) is the most technically correct, but I’ve never seen a single usage in practice. I tried it in NONMEM once, and it was a disaster.
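
To make these imputation methods concrete, here is a small made-up example: suppose the LLOQ is 1 and a subject’s concentrations in the terminal phase are 8.2, 3.1, BLOQ, BLOQ, BLOQ. With M1, the three BLOQ records are simply dropped. With M5, each of the three is replaced with LLOQ/2 = 0.5. With M6, only the BLOQ value closest to the maximum (the first of the three here) is replaced with 0.5, and the other two are dropped.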

As mentioned, M3 and M4 suffer from numerical issues with optimization methods. However, there are no such issues when performing MCMC sampling in Stan, so I write all of my models to use M4 if the error model is a distribution that is typically unbounded (Gaussian, Student’s-t, …), and M3 if the error model is naturally bounded below at 0 (lognormal - M3 and M4 are equivalent in this case).
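
For reference, all of the Stan snippets below assume data along these lines. This is just a sketch - the names n, y, bloq, and lloq match the code later in this post, but the exact types and bounds are my own choices:

data{
  
  int<lower = 1> n;                         // number of observations
  array[n] real y;                          // concentrations (the value in BLOQ rows is never used)
  array[n] int<lower = 0, upper = 1> bloq;  // 1 = observation is BLOQ, 0 = observed
  vector<lower = 0>[n] lloq;                // LLOQ for each observation
  
}

The mean mu and the residual standard deviation sigma used below would come from the parameters and transformed parameters blocks as usual.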

2 Treat the BLOQ values as Left-Censored Data (M3)

Instead of tossing out the BLOQ data (M1) or assigning them some arbitrary value (M5-M7), we should keep them in the data set and treat them as left-censored data. This means that the likelihood contribution for observation $y_{ij}$ is calculated differently for observed values than for BLOQ values:

$$
\begin{aligned}
\text{observed data:} &\quad f(y_{ij} \mid \theta_i, \sigma, t_{ij}) \\
\text{BLOQ data:} &\quad F(LLOQ \mid \theta_i, \sigma, t_{ij})
\end{aligned}
$$

where $f(y_{ij} \mid \theta_i, \sigma, t_{ij})$ is the density (pdf) and $F(LLOQ \mid \theta_i, \sigma, t_{ij}) = P(c_{ij} \le LLOQ \mid \theta_i, \sigma, t_{ij})$ is the cumulative distribution function (cdf).
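
On the log scale (which is the scale Stan actually works on), the total likelihood contribution for individual $i$ is then

$$
\log L_i = \sum_{j\,:\,\text{observed}} \log f(y_{ij} \mid \theta_i, \sigma, t_{ij}) \; + \sum_{j\,:\,\text{BLOQ}} \log F(LLOQ \mid \theta_i, \sigma, t_{ij}),
$$

and this sum is exactly what the target += statements in the code below accumulate.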

2.1 Stan Code Example for M3

I’ve mostly written the Stan models on this site to fit with within-chain parallelization, but I’ll demonstrate the concept here with code that doesn’t use it (a sketch of where the same logic goes with within-chain parallelization is at the end of this section).

First, let’s imagine something simple (like in the simple linear regression example) where there are no BLOQs, and we can write the likelihood just as we would write it on paper:

model{ 
  
  // Priors
  ...
  
  // Likelihood
  y ~ normal(mu, sigma);

}

or we could equivalently incorporate the likelihood by using the target += syntax:

model{ 
  
  // Priors
  ...
  
  // Likelihood
  target += normal_lpdf(y | mu, sigma);

}

The above likelihood is vectorized. We can equivalently write it with a for loop:

model{ 
  
  // Priors
  ...
  
  // Likelihood
  for(i in 1:n){
    target += normal_lpdf(y[i] | mu[i], sigma);
  }
  
}

Now, if we have BLOQs, we just use the target += syntax and write the likelihood contribution for each observation, taking into account whether or not it is BLOQ:

model{ 
  
  // Priors
  ...
  
  // Likelihood
  for(i in 1:n){
    if(bloq[i] == 1){
      target += normal_lcdf(lloq[i] | mu[i], sigma);
    }else{
      target += normal_lpdf(y[i] | mu[i], sigma);
    }
  } 

}

These lines implement the math (on the log scale, since Stan calculates the log-posterior).

$$
\texttt{normal\_lcdf}(x) = \log(F(x)), \qquad \texttt{normal\_lpdf}(x) = \log(f(x))
$$

where f() and F() are the normal density and cumulative distribution functions, respectively.
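
Since I mentioned within-chain parallelization earlier, here is a rough sketch of where this same M3 logic would live in that case - inside a partial_sum function in the functions block that gets passed to reduce_sum(). The function name, the argument list, and grainsize are assumptions for illustration; only the censoring logic carries over directly:

functions{
  
  // sketch of a partial_sum function for reduce_sum(); the data are sliced over y
  real partial_sum(array[] real y_slice, int start, int end,
                   vector mu, real sigma,
                   array[] int bloq, vector lloq){
    
    real lp = 0;
    for(i in start:end){
      if(bloq[i] == 1){
        lp += normal_lcdf(lloq[i] | mu[i], sigma);
      }else{
        lp += normal_lpdf(y_slice[i - start + 1] | mu[i], sigma);
      }
    }
    return lp;
    
  }
  
}

...

model{
  
  // Priors
  ...
  
  // Likelihood (grainsize would be passed in through the data block)
  target += reduce_sum(partial_sum, y, grainsize, mu, sigma, bloq, lloq);
  
}

For M4, the body of the if/else would simply be swapped for the truncated versions in the next section.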

3 Treat the BLOQ values as Left-Censored Data and Truncated Below at 0 (M4)

We know that drug concentrations cannot be $< 0$, but the normal distribution has support over $(-\infty, \infty)$, so we will assume a normal distribution truncated below at 0. This will have the effect of limiting the support of our assumed distribution to $(0, \infty)$. Since we’re assuming a truncated distribution, we need to adjust the likelihood contributions of our data:

$$
\begin{aligned}
\text{observed data:} &\quad \frac{f(y_{ij} \mid \theta_i, \sigma, t_{ij})}{1 - F(0 \mid \theta_i, \sigma, t_{ij})} \\
\text{BLOQ data:} &\quad \frac{F(LLOQ \mid \theta_i, \sigma, t_{ij}) - F(0 \mid \theta_i, \sigma, t_{ij})}{1 - F(0 \mid \theta_i, \sigma, t_{ij})}
\end{aligned}
$$
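
Taking logs, the contribution of each observation becomes

$$
\begin{aligned}
\text{observed data:} &\quad \log f(y_{ij} \mid \theta_i, \sigma, t_{ij}) - \log\big(1 - F(0 \mid \theta_i, \sigma, t_{ij})\big) \\
\text{BLOQ data:} &\quad \log\big(F(LLOQ \mid \theta_i, \sigma, t_{ij}) - F(0 \mid \theta_i, \sigma, t_{ij})\big) - \log\big(1 - F(0 \mid \theta_i, \sigma, t_{ij})\big)
\end{aligned}
$$

which is what the Stan code below computes term by term.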

3.1 Stan Code Example for M4

Now that you’ve seen the steps to go from the ~ operator to target += to a for loop in the M3 example above, we’ll just go ahead and write the M4 implementation:

model{ 
  
  // Priors
  ...
  
  // Likelihood
  for(i in 1:n){
    if(bloq[i] == 1){
      target += log_diff_exp(normal_lcdf(lloq[i] | mu[i], sigma),
                             normal_lcdf(0.0 | mu[i], sigma)) -
                normal_lccdf(0.0 | mu[i], sigma);
    }else{
      target += normal_lpdf(y[i] | mu[i], sigma) -
                normal_lccdf(0.0 | mu[i], sigma);
    }
  } 

}

These lines implement the math (on the log scale, since Stan calculates the log-posterior).

$$
\begin{aligned}
\texttt{log\_diff\_exp}(\texttt{normal\_lcdf}(lloq),\ \texttt{normal\_lcdf}(0)) &= \log(F(lloq) - F(0)) \\
\texttt{normal\_lcdf}(x) &= \log(F(x)) \\
\texttt{normal\_lccdf}(x) &= \log(1 - F(x)) \\
\texttt{normal\_lpdf}(x) &= \log(f(x))
\end{aligned}
$$

where f() and F() are the normal density and cumulative distribution functions, respectively.


Footnotes

  1. The code is actually very similar. It’s more a matter of where it goes - if there’s no within-chain parallelization, it goes in the model block. If there is, then it goes into your partial_sum function in the functions block.

  2. Having lloq[i] allows for a dataset where some observations have different LLOQs than others.

  3. For more information, see the Stan documentation for normal_lcdf() and normal_lpdf().

  4. A model that assumes log-normal error has support over $(0, \infty)$, so truncation is not an issue. In that case, M3 and M4 are equivalent. Mathematically, you can see this by noting that $F(0 \mid \theta_i, \sigma, t_{ij}) = 0$.

  5. For observed data with this truncated distribution, we need to “correct” the density so it integrates to 1. Division by $1 - F(0 \mid \theta_i, \sigma, t_{ij})$ has this effect. For the censored data, the numerator is similar to the M3 method, but we must also account for the fact that the value must be $> 0$, hence $P(0 \le y_{ij} \le LLOQ) = F(LLOQ) - F(0)$. The denominator is corrected in the same manner as for the observed data.

  6. For more information, see the Stan documentation for log_diff_exp(), normal_lcdf(), normal_lccdf(), and normal_lpdf().