2  Introduction to Stan

Stan is a platform for statistical modeling and statistical computation. While it can perform maximum likelihood estimation (similar to NONMEM’s FOCE), it is mainly used for Bayesian inference.

2.1 The No-U-Turn-Sampler

See here for an illustration of Stan’s No-U-Turn Sampler (NUTS).

2.2 The Stan Language

The Stan language is strongly statically typed compiled language (similar to C or C++), meaning we declare a type (int, real, array, vector, matrix, ...) for each declared variable, and this type cannot change. This is contrary to interpreted languages like R and Python where we don’t have to declare a variable type, and we can overwrite and change a variable throughout the program.

We write a Stan model down in a .stan file, after which the Stan program is internally translated to C++ and compiled.

2.3 Stan Code Blocks

A stan model is written in code blocks, similarly to NONMEM with $PROB, $DATA, $PK, .... There is a good explanation of the Stan code blocks here. Here will give a brief overview:

2.3.1 functions

  • The functions block is an optional block at the beginning of the program where user-defined functions appear.
  • User defined random number generator functions and probability distributions can be defined here
  • Void functions (those that return no value) are allowed
  • Example: Function to define a one-compartment model
functions{
  
  real depot_1cmt(real dose, real cl, real v, real ka, 
                  real time_since_dose){
    
    real ke = cl/v;
    
    real cp = dose/v * ka/(ka - ke) * 
              (exp(-ke*time_since_dose) - exp(-ka*time_since_dose));
    
    return cp;
    
  }
  
}

2.3.2 data

  • Data are specified upfront and remain fixed
  • They are either specified in the block or read from outside
  • They are read once at the beginning of the process
  • Example: Define observed PK (dv) data (and PD if you have it). We can also define our independent variables (time), parameters for our prior distributions (scale_x), covariates, times at which we want to make predictions (time_pred), or anything else we want to input into the model.
data{
  
  int n_obs;                    // Number of observations
  real<lower = 0> dose;         // Dose amount
  array[n_obs] real time;       // Times at which we have observations
  real time_of_first_dose;      // Time of first dose
  vector[n_obs] dv;             // Observed PK data
  
  real<lower = 0> scale_cl;     // Prior Scale parameter for CL
  real<lower = 0> scale_v;      // Prior Scale parameter for V
  real<lower = 0> scale_ka;     // Prior Scale parameter for KA
  
  real<lower = 0> scale_sigma;  // Prior Scale parameter for lognormal error
  
  int n_pred;                   // Number of new times at which to make a prediction
  array[n_pred] real time_pred; // New times at which to make a prediction
 
}

2.3.3 transformed data

  • We declare and define variables that do not need to be changed when running the program.
  • We can hard code variables here (n_cmt).
  • We can also manipulate our data variables into a form we will use later in the Stan program.
  • The statements in transformed data are executed only once and directly after reading the data in the data block.
transformed data{ 
  
  vector[n_obs] time_since_dose = to_vector(time) - time_of_first_dose;
  vector[n_pred] time_since_dose_pred = to_vector(time_pred) - 
                                        time_of_first_dose;
  int n_cmt = 2;                                        
  
}

2.3.4 parameters

  • Parameters are altered during the sampling process. They are sampled by Stan
  • These are the ones we provide priors and initial estimates for later on
  • Can specify bounds here
  • Example: Define the parameters for the one-compartment depot model and constrain the absorption rate constant to be larger than elimination to ensure no flip-flop kinetics.
parameters{  
  
  real<lower = 0> CL;     
  real<lower = 0> V;
  real<lower = CL/V> KA;
  
  real<lower = 0> sigma;
  
}

2.3.5 transformed parameters

  • We define and calculate variables that are needed for the calculation of the posterior density or other values we want to keep. In practice, this means we calculate values needed to compute the likelihood.
  • If parameters depend on both data and parameters, we specify them in the transformed parameters block.
  • If parameters depend on only data, they should be specified in transformed data.
  • The statements in transformed parameters are calculated at every leapfrog step in the NUTS algorithm, so the calculation is relatively expensive. Quantities that you wish to keep but aren’t necessary for computing the posterior density should be computed in generated quantities.
  • Example: Calculate the PK expected value (ipred) before accounting for the residual error
  • This calculation can be done here or in the model block but usually model is reserved for stochastic elements where as this block is typically used for deterministic calculations.
transformed parameters{
  vector[n_obs] ipred;
  
  for(i in 1:n_obs){
    ipred[i] = depot_1cmt(dose, CL, V, KA, time_since_dose[i]);
  }
  
}

2.3.6 model

  • We define the model here
  • Stochastic definitions and sampling statements are included here
  • Constraints on parameters and the statements in this block define prior distributions
  • Likelihood statement is defined here
  • Example: Specifying the prior distributions (CL ~ , V ~, KA ~)
  • Example: Likelihood dv is defined in vectorized notation here.
model{ 
  
  // Priors
  CL ~ cauchy(0, scale_cl);
  V ~ cauchy(0, scale_v);
  KA ~ normal(0, scale_ka) T[CL/V, ];
  
  sigma ~ normal(0, scale_sigma);
  
  // Likelihood
  dv ~ lognormal(log(ipred), sigma);
}

2.3.7 generated quantities

  • Used to calculate a derived quantity or some other quantity you wish to keep in the output
  • Used to make predictions
  • This block is executed only once per iteration, so is computationally inexpensive.
  • Example: Posterior predictive check (dv_ppc) or a prediction of plasma concentration (cp) or a measurement of a plasma concentration (dv_pred) at an unobserved time.
  • Example: We might want to draw samples for the elimination rate constant, KE, but it did not play a role in the model, so we do that here rather than in transformed parameters.
generated quantities{
  
  real<lower = 0> KE = CL/V;
  real<lower = 0> sigma_sq = square(sigma);
  vector[n_obs] dv_ppc;
  vector[n_obs] log_lik;
  vector[n_pred] cp;
  vector[n_pred] dv_pred;
  vector[n_obs] ires = log(dv) - log(ipred);
  vector[n_obs] iwres = ires/sigma;
  
  for(i in 1:n_obs){
    dv_ppc[i] = lognormal_rng(log(ipred[i]), sigma);
    log_lik[i] = lognormal_lpdf(dv[i] | log(ipred[i]), sigma);
  }
  for(j in 1:n_pred){
    if(time_since_dose_pred[j] <= 0){
      cp[j] = 0;
      dv_pred[j] = 0;
    }else{
      cp[j] = depot_1cmt(dose, CL, V, KA, time_since_dose_pred[j]);
      dv_pred[j] = lognormal_rng(log(cp[j]), sigma);
    }
  }
}