Bayesian Statistical Methods

Partial solutions

Chapter 2: From prior information to posterior inference


(1a) The posterior given in the chapter is \(\mu|Y \sim\mbox{Normal}\left(\frac{n}{n+m}{\bar Y},\frac{\sigma^2}{n+m}\right)\) where \({\bar Y}=\sum_{i=1}^nY_i/n\), so the 95% credible interval is \[\frac{n}{n+m}{\bar Y} \pm 1.96\sigma/\sqrt{n+m}.\]
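A minimal numerical sketch of this interval (the values of n, m, sigma, and Ybar below are made up purely for illustration):

  n     <- 20                        # Sample size (hypothetical)
  m     <- 5                         # Prior "sample size" (hypothetical)
  sigma <- 2                         # Known standard deviation (hypothetical)
  Ybar  <- 1.5                       # Sample mean (hypothetical)
  n/(n+m)*Ybar + c(-1.96,1.96)*sigma/sqrt(n+m) # 95% credible interval
## [1] 0.416 1.984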

(1b) If \(m\rightarrow 0\), the 95% credible interval converges to the classical z-interval and thus has frequentist coverage 0.95.
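A quick simulation sketch of this claim (with made-up values of n, sigma, and the true mu, and m taken to be essentially zero):

  n     <- 20
  sigma <- 2
  mu    <- 1                                     # True mean (hypothetical)
  m     <- 0.001                                 # Essentially zero
  Ybar  <- rnorm(100000,mu,sigma/sqrt(n))        # Sampling distribution of the sample mean
  low   <- n/(n+m)*Ybar - 1.96*sigma/sqrt(n+m)
  high  <- n/(n+m)*Ybar + 1.96*sigma/sqrt(n+m)
  mean(low < mu & mu < high)                     # Frequentist coverage, should be near 0.95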

(3) The likelihood is \[ f(Y|\theta)=\theta\exp(-\theta Y)\] which is the kernel of a gamma distribution when viewed as a function of \(\theta\). Therefore the conjugate prior is \(\theta\sim\mbox{Gamma}(a,b)\). The resulting posterior is \[\begin{align} p(\theta|Y) &\propto f(Y|\theta)\pi(\theta)\\ &\propto [\theta\exp(-\theta Y)]\cdot[\theta^{a-1}\exp(-\theta b)]\\ &\propto \theta^{(a+1)-1}\exp[-\theta (Y+b)] \end{align}\] and \(\theta|Y\sim\mbox{Gamma}(a+1,Y+b)\).
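As a quick sanity check (with arbitrary values of a, b, and Y chosen only for illustration), the product of the likelihood and the prior can be compared with the Gamma(a+1, Y+b) density on a grid:

  a     <- 2; b <- 1; Y <- 0.5                       # Arbitrary values for illustration
  theta <- seq(0.01,10,0.01)                         # Grid of theta values
  post  <- theta*exp(-theta*Y)*dgamma(theta,a,b)     # Likelihood times prior
  post  <- post/sum(post*0.01)                       # Normalize numerically
  plot(theta,post,type="l")
  lines(theta,dgamma(theta,a+1,Y+b),lty=2)           # Analytic Gamma(a+1,Y+b) posterior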

(5a) We need to select \(a\) and \(b\) so that the variance is 100 and the median is 75. Since the variance is \(a/b^2\), we must have \(a=100b^2\). The median does not have a closed form, so we simply search over a grid of values of \(b\):

  b_grid <- seq(0.1,1,0.0001)                # Grid of candidate rate parameters b
  med    <- qgamma(0.5,100*b_grid^2,b_grid)  # Median of Gamma(100*b^2,b) for each b
  b      <- b_grid[which.min(abs(med-75))]   # b whose median is closest to 75
  a      <- 100*b^2                          # Corresponding shape parameter
  a
## [1] 56.91194
  b
## [1] 0.7544
  plot(b_grid,med); abline(v=b)  # Check graphically


  lambda <- rgamma(10000000,a,b) # Check using MC sampling
  hist(lambda,breaks=100)


  var(lambda);mean(lambda>75)
## [1] 99.89221
## [1] 0.5000673

(7) Trial and error reveals that \(c=1.3\) is a reasonable choice. This prior is actually somewhat informative for \(\alpha\) and \(\beta\), but gives an uninformative prior on the success probability.

  S     <- 1000000
  c     <- 1.3                       # Prior SD for alpha and beta
  alpha <- rnorm(S,0,c)              # Draws from the prior on alpha
  beta  <- rnorm(S,0,c)              # Draws from the prior on beta
  X     <- rnorm(S)                  # Covariate values
  prob  <- 1/(1+exp(-alpha-X*beta))  # Induced prior on the success probability
  hist(prob,breaks=50)


(9a) A negative binomial assumes that each day is independent and has the same probability of a relapse, and that the experiment stops after a prespecified number of failures (in this case one), which describes the smoking cessation attempt fairly well. The main assumptions are independence and equal probability. The relapse probability surely varies with time since the beginning of the quit attempt, but the assumption is perhaps reasonable for a short study such as this.

(9b) The prior predictive distribution is approximated using Monte Carlo sampling (the plot is truncated at 14 days because the PMF has a very heavy right tail):

  S     <- 100000
  theta <- runif(S)
  Y     <- rnbinom(S,1,theta)
  plot(table(Y)/S,xlab="Number of days",ylab="Prior probability",xlim=c(0,14))


  mean(Y>=7);mean(Y>14)
## [1] 0.12502
## [1] 0.06283

(9c) The posterior is \[\begin{align} p(\theta|Y) &\propto f(Y|\theta)\pi(\theta)\\ &\propto [\theta^{1}(1-\theta)^Y][1]\\ &\propto \theta^{(2)-1}(1-\theta)^{(Y+1)-1} \end{align}\] and therefore \(\theta|Y\sim\mbox{Beta}(1+1,Y+1)\), which is plotted below for \(Y=5\):

  S     <- 100000
  theta <- rbeta(S,2,6)
  hist(theta,breaks=50,xlab=expression(theta),ylab="Posterior")

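As a check of the derivation (for \(Y=5\)), the product of the likelihood and the Uniform(0,1) prior can be compared with the Beta(2, Y+1) density on a grid:

  Y     <- 5
  theta <- seq(0.001,0.999,0.001)              # Grid of theta values
  post  <- theta*(1-theta)^Y                   # Likelihood times flat prior
  post  <- post/sum(post*0.001)                # Normalize numerically
  plot(theta,post,type="l",xlab=expression(theta),ylab="Posterior density")
  lines(theta,dbeta(theta,2,Y+1),lty=2)        # Analytic Beta(2,Y+1) posterior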

(9d) The posterior predictive distribution (PPD) is approximated using Monte Carlo sampling:

  S     <- 100000
  theta <- rbeta(S,2,6)
  Y     <- rnbinom(S,1,theta)
  plot(table(Y)/S,xlab="Number of days",ylab="PPD",xlim=c(0,14))


  mean(Y>=7)
## [1] 0.22992

(11) The plots are below. I wouldn't say that any of the priors are uninformative for both variables, but the prior in (b) is the Jeffreys' prior so this is as close as you can get.

  S      <- 1000000
  par(mfrow=c(1,2))

  theta <- rbeta(S,1,1)
  gamma <- theta/(1-theta); gamma <- gamma[gamma<100]
  hist(theta,main="(11a)",breaks=50)
  hist(gamma,main="(11a)",breaks=50)


  theta <- rbeta(S,0.5,0.5)
  gamma <- theta/(1-theta); gamma <- gamma[gamma<100]
  hist(theta,main="(11b)",breaks=50)
  hist(gamma,main="(11b)",breaks=50)


  gamma <- runif(S,0,100); gamma <- gamma[gamma<100]
  theta <- gamma/(gamma+1)
  hist(theta,main="(11c)",breaks=50)
  hist(gamma,main="(11c)",breaks=50)


  gamma <- rgamma(S,1,1); gamma <- gamma[gamma<100]
  theta <- gamma/(gamma+1)
  hist(theta,main="(11d)",breaks=50)
  hist(gamma,main="(11d)",breaks=50)


  par(mfrow=c(1,1))

(13) The posterior is proportional to \(f(Y|\theta)\pi(\theta)\). By assumption the likelihood is bounded below by \(c(Y)>0\), and thus \[ \int f(Y|\theta)\pi(\theta)d\theta \ge c(Y)\int\pi(\theta)d\theta = \infty\] since \(\pi\) is an improper distribution. The normalizing constant is therefore infinite, so the posterior is also improper. For example, if the data carry no information about \(\theta\), so that \(f(Y|\theta)=c(Y)\) for all \(\theta\), then the posterior is proportional to \(\pi(\theta)\) and remains improper.

(15a) Up to an additive constant, the log-likelihood and its first two derivatives are \[\begin{align} l(\lambda) & = -\lambda + Y\log(\lambda)\\ l'(\lambda) & = -1 + Y/\lambda\\ l''(\lambda) & = -Y/\lambda^2 \end{align}\] Since \(E(Y)=\lambda\), the negative expected second derivative is \(1/\lambda\), and the Jeffreys' prior is \(\pi(\lambda) \propto \sqrt{1/\lambda} = \lambda^{-1/2}\).

  lambda <- seq(0.01,5,0.01)
  plot(lambda,1/sqrt(lambda),type="l",ylab="Jeffreys' prior")

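A Monte Carlo check (at an arbitrary value of \(\lambda\)) that the negative expected second derivative is indeed \(1/\lambda\):

  lambda0 <- 2                       # Arbitrary value of lambda
  Y       <- rpois(1000000,lambda0)
  mean(Y/lambda0^2)                  # MC estimate of -E[l''(lambda)]
  1/lambda0                          # Analytic value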

(15b) The prior is not proper since \(\int_{0}^{\infty}\lambda^{-1/2}d\lambda = \infty\).
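This can be illustrated numerically: the integral over \((0,M)\) equals \(2\sqrt{M}\), which grows without bound as \(M\) increases:

  M <- 10^(1:4)
  sapply(M,function(m) integrate(function(x) 1/sqrt(x),0,m)$value)  # Numerical integral over (0,M)
  2*sqrt(M)                                                         # Analytic value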

(15c) The posterior is \[ p(\lambda|Y) \propto [\exp(-\lambda)\lambda^Y][\lambda^{-1/2}] = \lambda^{Y+1/2-1}\exp(-\lambda) \] and so \(\lambda|Y\sim\mbox{Gamma}(Y+1/2,1)\), which is proper for all \(Y\).

(17) The Jeffreys' prior from problem 15 is \(\pi(\lambda) \propto \lambda^{-1/2}\) and the posterior is \(\lambda|Y\sim\mbox{Gamma}(Y+1/2,1)\). The posterior is summarized below for \(Y=10\):

  Y      <- 10
  qgamma(0.5,Y+1/2,1)            # Posterior median
## [1] 10.16861
  qgamma(c(0.025,0.975),Y+1/2,1) # Posterior 95% credible interval 
## [1]  5.141449 17.739438
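As in problem 5, these summaries can also be checked with Monte Carlo sampling:

  lambda <- rgamma(1000000,Y+1/2,1)        # Draws from the Gamma(Y+1/2,1) posterior
  quantile(lambda,c(0.5,0.025,0.975))      # Should agree with the qgamma results above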