Treatment of boundaries in Markov chain Monte Carlo

Thomas House

2025-04-14 14:35

I have been asked by multiple students what to do with MCMC at boundaries, in particular how to treat proposals that lie outside the support of the target. Suppose we have a vector of parameters \(\theta\) and want to target a density (proportional to) \(\pi(\theta)\) using a proposal distribution \(q(\theta'|\theta)\). In MCMC, we seek to define a Markov process in discrete time, \((\theta_0, \theta_1, \ldots, \theta_N)\) that has stationary distribution with density \(\pi\). In the Metropolis Hastings scheme, the law of such a process is that at step \(m\), when the state of the system is \(\theta_m\), we pick a proposed new value \(\tilde{\theta}\) with density \(q(\cdot | \theta_m)\), then

\begin{equation*} \mathrm{Pr}(\theta_{m+1} = \tilde{\theta}) = \mathrm{min}\left( 1, \frac{\pi(\tilde{\theta}) q(\theta | \tilde{\theta})}{\pi(\theta)q(\tilde{\theta} | \theta)}\right) \, , \quad \mathrm{Pr}(\theta_{m+1} = \theta_{m}) = 1 - \mathrm{Pr}(\theta_{m+1} = \tilde{\theta}) \, . \quad (\dagger) \end{equation*}

The natural question is, if there is somewhere we can propose but not move (e.g. because the likelihood is not defined) then what should be done?

Let us consider one parameter, a probability \(p\in [0,1]\), with a density \(\pi(p) = \mathrm{Uniform}(0,1)\), and a proposal distribution \(q(p'|p) = \mathcal{N}(p'|p,1)\). We will run chains of length \(N=10^7\). Code is all in R.

N <- 10000000
output <- matrix(0,ncol=2,nrow=N)
sig <- 1

In the first chain (‘Keep picking’) we will keep proposing new values until one is within the allowed region

p <- 0.5
for(m in 1:N){
  pprop <- -1
  while((pprop<0)|(pprop>1)){
    pprop <- p + rnorm(1,mean=0,sd=sig)
  }
  p <- pprop
  output[m,1] <- p
}

In the second chain (‘Reject’) we will reject proposals outside the region and remain at the position the proposal was made.

p <- 0.5
for(m in 1:N){
  pprop <- p + rnorm(1,mean=0,sd=sig)
  if((pprop>=0)&(pprop<=1)){
    p <- pprop
  }
  output[m,2] <- p
}

Now look at some plots

hist(output[,1], breaks=seq(0,1,0.1), freq=FALSE, ylim=c(0,1.2),
     xlab='p', main='Keep picking')

hist(output[,2], breaks=seq(0,1,0.1), freq=FALSE, ylim=c(0,1.2),
     xlab='p', main='Reject')

ts.plot(output[1:500,1],gpars=list(xlab="m", ylab="p"))

ts.plot(output[1:500,2],gpars=list(xlab="m", ylab="p"))

What we see is that the first approach, the trace plot looks superficially ‘better’, but the output is biased. From the Metropolis-Hastings algorith as specified in \((\dagger)\) this is what we would expect; mathematically, we can interpret the ‘Reject’ approach as taking \(\pi(\theta<0) = \pi(\theta>1) = 0\), while the ‘Keep picking’ approach the proposal distribution effectively becomes a truncated normal. Such a truncated normal could be evaluated as the \(q(\cdot|\cdot)\) in \((\dagger)\), and this might be a good algorithm, but the naïve implementation introduces bias.