Point process Bayesian SCR models with nimbleSCR

Cyril Milleret, Wei Zhang, Pierre Dupont and Richard Bischof

2022-11-30

In this vignette, we demonstrate how to use the nimbleSCR (Bischof et al. 2020) and NIMBLE packages (de Valpine et al. 2017; NIMBLE Development Team 2020) to simulate spatial capture-recapture (SCR) data and fit flexible and efficient Bayesian SCR models via a set of point process functions. Users with real-life SCR data can use this vignette as guidance for preparing the input data and fitting appropriate Bayesian SCR models in NIMBLE.

## Load packages
library(nimble)
library(nimbleSCR)
library(basicMCMCplots)

1. Simulate SCR data

1.1 Habitat and trapping grid

As an example, we create an \(80 \times 100\) habitat grid with a resolution of 10 in each dimension. On this habitat, we center a \(60 \times 80\) trapping grid, also with a resolution of 10 in each dimension, leaving an untrapped perimeter (buffer) with a width of 10 distance units on each side of the grid.
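The grids described above can be sketched in base R as follows. This is a minimal illustration, not necessarily the code used to build the vignette: the helper `makeGridCenters` is hypothetical, and we assume the lower-left corner of the habitat sits at the origin.

```r
## Hypothetical helper: centers of square cells tiling [xmin, xmax] x [ymin, ymax]
makeGridCenters <- function(xmin, xmax, ymin, ymax, res) {
  xs <- seq(xmin + res / 2, xmax - res / 2, by = res)
  ys <- seq(ymin + res / 2, ymax - res / 2, by = res)
  as.matrix(expand.grid(x = xs, y = ys))
}

res <- 10
## 80 x 100 habitat with its lower-left corner at the origin (assumption)
coordsHabitatGridCenter <- makeGridCenters(0, 80, 0, 100, res)
## 60 x 80 trapping grid centered on the habitat: extent 10..70 by 10..90
coordsObsCenter <- makeGridCenters(10, 70, 10, 90, res)

nrow(coordsHabitatGridCenter)  # 8 * 10 habitat cells
nrow(coordsObsCenter)          # 6 * 8 detection cells
```

Each row holds the x- and y-coordinates of one cell center, so the habitat has 80 cells and the trapping grid has 48.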

1.2 Rescale coordinates

To implement the local evaluation approach when fitting the SCR model (see Milleret et al. (2019) and Turek et al. (2021) for further details), we need to rescale the habitat and trapping grid coordinates so that each habitat cell is of dimension \(1 \times 1\). We also need to identify the lower and upper coordinates of each habitat cell using the ‘getWindowCoords’ function.
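The arithmetic behind the rescaling can be illustrated in base R. This sketch only shows the idea (shift to the origin, divide by the resolution, then take cell centers plus or minus 0.5 as cell bounds). It is not the internal implementation of the nimbleSCR functions, which may orient the axes differently, and the object names ending in `Sketch` are hypothetical.

```r
## Toy habitat: cell centers of an 80 x 100 grid at resolution 10
res <- 10
hab <- as.matrix(expand.grid(x = seq(5, 75, by = res),
                             y = seq(5, 95, by = res)))

## Shift so the lower-left cell corner sits at the origin, then divide by the
## resolution: every cell becomes 1 x 1 and cell centers fall on k + 0.5
origin <- apply(hab, 2, min) - res / 2
habScaled <- sweep(hab, 2, origin, "-") / res

## Lower and upper coordinates of each rescaled habitat cell
lowerHabCoordsSketch <- habScaled - 0.5
upperHabCoordsSketch <- habScaled + 0.5

head(cbind(lowerHabCoordsSketch, upperHabCoordsSketch))
```

With unit cells, the cell containing any point can be found by truncating its coordinates, which is the kind of cheap lookup that makes local evaluation practical.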

1.3 Model definition

modelCode <- nimbleCode({
  ##---- SPATIAL PROCESS 
  ## Prior for AC distribution parameter
  habCoeffSlope ~ dnorm(0, sd = 10)
  
  ## Intensity of the AC distribution point process
  habIntensity[1:numHabWindows] <- exp(habCoeffSlope * habCovs[1:numHabWindows])
  sumHabIntensity <- sum(habIntensity[1:numHabWindows])
  logHabIntensity[1:numHabWindows] <- log(habIntensity[1:numHabWindows])
  logSumHabIntensity <- log(sumHabIntensity)
  
  ## AC distribution
  for(i in 1:M){
    sxy[i, 1:2] ~ dbernppAC(
      lowerCoords = lowerHabCoords[1:numHabWindows, 1:2],
      upperCoords = upperHabCoords[1:numHabWindows, 1:2],
      logIntensities = logHabIntensity[1:numHabWindows],
      logSumIntensity = logSumHabIntensity,
      habitatGrid = habitatGrid[1:numGridRows,1:numGridCols],
      numGridRows =  numGridRows,
      numGridCols = numGridCols
    )
  }
  
  ##---- DEMOGRAPHIC PROCESS
  ## Prior for data augmentation
  psi ~ dunif(0,1)
  
  ## Data augmentation
  for (i in 1:M){
    z[i] ~ dbern(psi)
  }
  
  ##---- DETECTION PROCESS
  ## Priors for detection parameters
  sigma ~ dunif(0, 50)
  detCoeffInt ~ dnorm(0, sd = 10)
  detCoeffSlope ~ dnorm(0, sd = 10)
  
  ## Intensity of the detection point process
  detIntensity[1:numObsWindows] <- exp(detCoeffInt + detCoeffSlope * detCovs[1:numObsWindows]) 
  
  ## Detection process
  for (i in 1:M){
    y[i, 1:numMaxPoints, 1:3] ~ dpoisppDetection_normal(
      lowerCoords = obsLoCoords[1:numObsWindows, 1:2],
      upperCoords = obsUpCoords[1:numObsWindows, 1:2],
      s = sxy[i, 1:2],
      sd = sigma,
      baseIntensities = detIntensity[1:numObsWindows],
      numMaxPoints = numMaxPoints,
      numWindows = numObsWindows,
      indicator = z[i]
    )
  }
  
  ##---- DERIVED QUANTITIES
  ## Number of individuals in the population
  N <- sum(z[1:M])
})

1.4 Set up parameter values

We set parameter values for the simulation as below.

We use the data augmentation approach (Royle and Dorazio 2012) to estimate population size N. Thus, we need to choose a value M for the size of the superpopulation (detected + augmented individuals). Here, we set M to 150. The expected total number of individuals that are truly present in the population is M * psi.
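As a quick illustration of data augmentation, the sketch below draws the inclusion indicators z for a superpopulation of size M and compares the realized N with its expectation M * psi. The value of `psi` used here is a hypothetical choice for illustration only.

```r
set.seed(1)
M <- 150     # superpopulation size, as in the text
psi <- 0.6   # hypothetical inclusion probability, for illustration only

## One Bernoulli(psi) indicator per individual in the superpopulation
z <- rbinom(M, size = 1, prob = psi)

N <- sum(z)         # realized population size
expectedN <- M * psi
c(realized = N, expected = expectedN)
```

The realized N varies around M * psi from one simulation to the next, with binomial variance M * psi * (1 - psi).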

When simulating individual detections using the Poisson point process function ‘dpoisppDetection_normal’, all the information is stored in y, a 3D array containing i) the number of detections per individual, ii) the x- and y-coordinates of each detection, and iii) the index of the habitat grid cell for each detection (see ?dpoisppDetection_normal for more details).

Next, we need to provide the maximum number of detections that can be simulated per individual. We set this to be 19 + 1 to account for the fact that the first element of the second dimension of the detection array (y[ ,1,1]) does not contain detection data but the total number of detections for each individual.
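To make the layout of y concrete, the following base R sketch fills a toy detection array by hand and reads it back. We assume that, for each individual, rows 2 onward hold one detection each, with the coordinates in columns 1 and 2 and the index in column 3. The coordinates, indices and the accessor `getDetections` are made up for illustration.

```r
numMaxPoints <- 19 + 1   # at most 19 detections, plus one row for the count
nInd <- 3
y <- array(0, dim = c(nInd, numMaxPoints, 3))

## Individual 1: two detections (made-up coordinates and indices)
y[1, 1, 1] <- 2
y[1, 2, ] <- c(2.3, 4.1, 11)
y[1, 3, ] <- c(2.8, 4.6, 11)
## Individual 2: one detection; individual 3: never detected
y[2, 1, 1] <- 1
y[2, 2, ] <- c(5.2, 7.7, 30)

## Number of detections per individual
nDetections <- y[, 1, 1]

## Hypothetical accessor: detection coordinates of individual i as an n x 2 matrix
getDetections <- function(y, i) {
  n <- y[i, 1, 1]
  matrix(y[i, 1 + seq_len(n), 1:2], ncol = 2)
}

nDetections
getDetections(y, 1)
```

Unused rows stay at zero, which is why the second dimension must be large enough for the individual with the most detections.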

In this simulation, we also incorporate spatial covariates on the intensity of the point processes for AC distribution and individual detections. Values of both covariates are drawn from a uniform distribution, Unif[-1, 1].
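The sketch below generates such covariates and turns them into intensities using the same log-linear forms as in the model code. The numbers of windows follow from the grids in section 1.1 (8 x 10 habitat cells and 6 x 8 detection cells); the coefficient values are hypothetical and chosen only for illustration.

```r
set.seed(2)
numHabWindows <- 80   # 8 x 10 habitat cells
numObsWindows <- 48   # 6 x 8 detection cells

## One covariate value per window, drawn from Unif[-1, 1]
habCovs <- runif(numHabWindows, min = -1, max = 1)
detCovs <- runif(numObsWindows, min = -1, max = 1)

## Hypothetical coefficient values, for illustration only
habCoeffSlope <- -1.5
detCoeffInt <- 0.1
detCoeffSlope <- 0.5

## Log-linear intensities, as in the model definition
habIntensity <- exp(habCoeffSlope * habCovs)
detIntensity <- exp(detCoeffInt + detCoeffSlope * detCovs)

## Probability that an AC falls in each habitat cell (unit-area cells assumed)
cellProb <- habIntensity / sum(habIntensity)
range(cellProb)
```

With a negative slope, cells with low covariate values receive a larger share of the activity centers.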

1.5 Create data, constants and initial values

Here we prepare objects containing data, constants, and initial values that are needed for creating the NIMBLE model below.

In order to simulate directly from the NIMBLE model, we set the true parameter values as initial values. These will be used by the NIMBLE model object to randomly generate SCR data.

1.6 Create NIMBLE model

We can then build the NIMBLE model.

1.7 Simulate data

In this section, we demonstrate how to simulate data using the NIMBLE model code. Here, we want to simulate individual AC locations (‘sxy’), individual states (‘z’), and observation data (‘y’), based on the values provided as initial values. We first need to identify which nodes in the model need to be simulated, via the ‘getDependencies’ function in NIMBLE. Then, we can generate values for these nodes using the ‘simulate’ function in NIMBLE.

After running the code above, simulated data are stored in the ‘model’ object. For example, we can access the simulated ‘z’ and check the number of individuals that are truly present in the population:

We have simulated 89 individuals truly present in the population, of which 83 are detected.
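The two-step pattern described in this section (identify the nodes with ‘getDependencies’, then fill them with ‘simulate’) can be tried on a much smaller model. The toy model below is a stand-in that we made up for illustration; it is not the SCR model from section 1.3, and its parameter values are arbitrary.

```r
library(nimble)

## Toy stand-in model: inclusion indicators z and Poisson counts y
toyCode <- nimbleCode({
  psi ~ dunif(0, 1)
  lambda ~ dunif(0, 10)
  for (i in 1:M) {
    z[i] ~ dbern(psi)
    y[i] ~ dpois(lambda * z[i])
  }
})

## "True" parameter values are supplied as initial values
toyModel <- nimbleModel(code = toyCode,
                        constants = list(M = 20),
                        inits = list(psi = 0.6, lambda = 2))

## Nodes to simulate: z itself plus everything that depends on it (here, y)
nodesToSim <- toyModel$getDependencies("z", self = TRUE)
set.seed(1)
toyModel$simulate(nodesToSim, includeData = FALSE)

sum(toyModel$z)   # number of individuals simulated as present
toyModel$y        # simulated counts (0 wherever z is 0)
```

Because psi and lambda are not among the simulated nodes, they keep the values given as initial values, which is what lets those values act as the truth for the simulated data.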

To check the simulated data, we can also plot the locations of the simulated activity center and detections for a particular individual.


2. Fit model with data augmentation

2.1 Prepare the input data

We have already defined the model above and now need to build the NIMBLE model again using the simulated data ‘y’. For simplicity, we use the simulated ‘z’ as initial values. When using real-life SCR data, you will need to generate initial ‘z’ values for augmented individuals and initial ‘sxy’ values for all individuals.

## [1] -1416.63
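For real-life data, initial values for ‘z’ and ‘sxy’ could be generated along the following lines. This is one possible strategy, sketched in base R on a toy detection array: detected individuals start with z = 1 and an activity center at the mean of their detections, while augmented individuals get random values. The rescaled habitat extent of [0, 8] x [0, 10] and the toy detections are assumptions for illustration.

```r
set.seed(3)
M <- 150
numMaxPoints <- 20

## Toy detection array: two detected individuals, the rest augmented
y <- array(0, dim = c(M, numMaxPoints, 3))
y[1, 1, 1] <- 2
y[1, 2, 1:2] <- c(2.3, 4.1)
y[1, 3, 1:2] <- c(2.8, 4.6)
y[2, 1, 1] <- 1
y[2, 2, 1:2] <- c(5.2, 7.7)

detected <- y[, 1, 1] > 0

## z: 1 for detected individuals, random 0/1 for augmented ones
zInit <- as.numeric(detected)
zInit[!detected] <- rbinom(sum(!detected), size = 1, prob = 0.5)

## sxy: random location in the rescaled habitat, overwritten by the mean
## detection location for detected individuals
sxyInit <- cbind(runif(M, 0, 8), runif(M, 0, 10))
for (i in which(detected)) {
  n <- y[i, 1, 1]
  sxyInit[i, ] <- colMeans(matrix(y[i, 1 + seq_len(n), 1:2], ncol = 2))
}

head(cbind(zInit, sxyInit), 3)
```

Starting detected individuals with z = 1 matters because a detected individual with z = 0 has zero likelihood, which would prevent the MCMC from starting at a valid state.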

2.2 Run MCMC with NIMBLE

Now we can configure and run the MCMC in NIMBLE to fit the model.

## ===== Monitors =====
## thin = 10: N, detCoeffInt, detCoeffSlope, habCoeffSlope, psi, sigma
## ===== Samplers =====
## binary sampler (150)
##   - z[]  (150 elements)
## RW_block sampler (150)
##   - sxy[]  (150 multivariate elements)
## RW sampler (5)
##   - habCoeffSlope
##   - psi
##   - sigma
##   - detCoeffInt
##   - detCoeffSlope
##    user  system elapsed 
## 136.356   0.328 137.179
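The configure, build, compile and run steps that produce output like the above can be sketched on a toy data augmentation model. The model, data and MCMC settings below are made up for illustration and are much simpler than the SCR model; only standard NIMBLE functions are used.

```r
library(nimble)

## Toy data augmentation model: N is derived from the indicators z
toyCode <- nimbleCode({
  psi ~ dunif(0, 1)
  lambda ~ dunif(0, 10)
  for (i in 1:M) {
    z[i] ~ dbern(psi)
    y[i] ~ dpois(lambda * z[i])
  }
  N <- sum(z[1:M])
})

## Simulate toy counts in plain R
set.seed(1)
M <- 50
zTrue <- rbinom(M, 1, 0.5)
yObs <- rpois(M, 2 * zTrue)

## Individuals with at least one count must start with z = 1
toyModel <- nimbleModel(code = toyCode,
                        constants = list(M = M),
                        data = list(y = yObs),
                        inits = list(psi = 0.5, lambda = 1,
                                     z = as.numeric(yObs > 0)))

## Configure, build, compile and run the MCMC
conf <- configureMCMC(toyModel, monitors = c("N", "psi", "lambda"), thin = 1)
mcmc <- buildMCMC(conf)
cModel <- compileNimble(toyModel)
cMCMC <- compileNimble(mcmc, project = toyModel)
samples <- runMCMC(cMCMC, niter = 2000, nburnin = 500, nchains = 1)

summary(samples[, "N"])
```

Printing `conf` lists the monitors and the samplers assigned to each node, in the same format as the output shown above.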

3. Fit model without data augmentation

3.1 Model definition

We use the same simulated dataset to demonstrate how to fit a model using the semi-complete data likelihood (SCDL) approach (King et al. 2016). We first need to re-define the model.

modelCodeSemiCompleteLikelihood <- nimbleCode({
  #----- SPATIAL PROCESS
  ## Priors
  habCoeffInt ~ dnorm(0, sd = 10)
  habCoeffSlope ~ dnorm(0, sd = 10)

  ## Intensity of the AC distribution point process
  habIntensity[1:numHabWindows] <- exp(habCoeffInt + habCoeffSlope * habCovs[1:numHabWindows])
  sumHabIntensity <- sum(habIntensity[1:numHabWindows])
  logHabIntensity[1:numHabWindows] <- log(habIntensity[1:numHabWindows])
  logSumHabIntensity <- log(sum(habIntensity[1:numHabWindows] ))

  ## AC distribution
  for(i in 1:nDetected){
    sxy[i, 1:2] ~ dbernppAC(
      lowerCoords = lowerHabCoords[1:numHabWindows, 1:2],
      upperCoords = upperHabCoords[1:numHabWindows, 1:2],
      logIntensities = logHabIntensity[1:numHabWindows],
      logSumIntensity = logSumHabIntensity,
      habitatGrid = habitatGrid[1:numGridRows,1:numGridCols],
      numGridRows =  numGridRows,
      numGridCols = numGridCols
    )
  }

  ##---- DEMOGRAPHIC PROCESS
  ## Number of individuals in the population
  N ~ dpois(sumHabIntensity)
  ## Number of detected individuals
  nDetectedIndiv ~ dbin(probDetection, N)

  ##---- DETECTION PROCESS
  ## Probability that an individual in the population is detected at least once
  ## i.e. 1 - void probability over all detection windows
  probDetection <- 1 - marginalVoidProbNumIntegration(
    quadNodes = quadNodes[1:nNodes, 1:2, 1:numHabWindows],
    quadWeights = quadWeights[1:numHabWindows],
    numNodes = numNodes[1:numHabWindows],
    lowerCoords = obsLoCoords[1:numObsWindows, 1:2],
    upperCoords = obsUpCoords[1:numObsWindows, 1:2],
    sd = sigma,
    baseIntensities = detIntensity[1:numObsWindows],
    habIntensities = habIntensity[1:numHabWindows],
    sumHabIntensity = sumHabIntensity,
    numObsWindows = numObsWindows,
    numHabWindows = numHabWindows
  )

  ## Priors for detection parameters
  sigma ~ dunif(0, 50)
  detCoeffInt ~ dnorm(0, sd = 10)
  detCoeffSlope ~ dnorm(0, sd = 10)

  ## Intensity of the detection point process
  detIntensity[1:numObsWindows] <- exp(detCoeffInt + detCoeffSlope * detCovs[1:numObsWindows])
  ## Detection process
  ## Note that this conditions on the fact that individuals are detected (at least once)
  ## So, at the bottom of this model code we deduct log(probDetection) from the log-likelihood
  ## function for each individual
  for (i in 1:nDetected){
    y[i, 1:numMaxPoints, 1:3] ~ dpoisppDetection_normal(
      lowerCoords = obsLoCoords[1:numObsWindows, 1:2],
      upperCoords = obsUpCoords[1:numObsWindows, 1:2],
      s = sxy[i, 1:2],
      sd = sigma,
      baseIntensities = detIntensity[1:numObsWindows],
      numMaxPoints = numMaxPoints,
      numWindows = numObsWindows,
      indicator = 1
    )
  }
  ## Normalization: normData can be any scalar in the data provided when building the model
  ## The dnormalizer is a custom distribution defined for efficiency, where the input data
  ## does not matter. It makes it possible to use the general dpoisppDetection_normal function
  ## when either data augmentation or the SCDL is employed
  logDetProb <- log(probDetection)
  normData ~ dnormalizer(logNormConstant = -nDetected * logDetProb)
})

3.2 Prepare the input data

We use the same simulated data as above. Since we do not use data augmentation here, we have to remove all individuals that are not detected from ‘y’ and ‘sxy’.
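Dropping undetected individuals amounts to subsetting on the detection counts stored in y[ , 1, 1]. The sketch below does this on a toy array; the array contents and object names such as `yDetected` are made up for illustration.

```r
set.seed(4)
M <- 6
numMaxPoints <- 20

## Toy objects: individuals 1, 3 and 4 were detected at least once
y <- array(0, dim = c(M, numMaxPoints, 3))
y[c(1, 3, 4), 1, 1] <- c(2, 1, 3)
sxy <- cbind(runif(M, 0, 8), runif(M, 0, 10))

## Keep only individuals with at least one detection
detected <- y[, 1, 1] > 0
yDetected <- y[detected, , , drop = FALSE]
sxyDetected <- sxy[detected, , drop = FALSE]
nDetected <- sum(detected)

dim(yDetected)
nDetected
```

Using `drop = FALSE` keeps the array dimensions intact even if only one individual is detected.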

The values below are needed to calculate the void probability numerically (i.e. the probability that an individual is not detected at all; the probability of being detected at least once is one minus this value) using the midpoint rule.
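To illustrate the midpoint rule itself, the sketch below integrates a Gaussian kernel over a single unit cell and compares the result with the exact value obtained from the normal distribution function. This is a generic numerical integration example, not the internal implementation of ‘marginalVoidProbNumIntegration’; the helper `midpointNodes` and the values of `sigma` and `x0` are hypothetical.

```r
## Hypothetical helper: midpoints of nPtsPerDim^2 equal sub-cells of a rectangle,
## together with the area (quadrature weight) of each sub-cell
midpointNodes <- function(lower, upper, nPtsPerDim) {
  hx <- (upper[1] - lower[1]) / nPtsPerDim
  hy <- (upper[2] - lower[2]) / nPtsPerDim
  xs <- lower[1] + (seq_len(nPtsPerDim) - 0.5) * hx
  ys <- lower[2] + (seq_len(nPtsPerDim) - 0.5) * hy
  list(nodes = as.matrix(expand.grid(x = xs, y = ys)), weight = hx * hy)
}

sigma <- 0.5          # hypothetical scale parameter
x0 <- c(0.3, 0.8)     # hypothetical kernel center

## Isotropic Gaussian kernel evaluated at each row of s
kernel <- function(s) exp(-rowSums(sweep(s, 2, x0)^2) / (2 * sigma^2))

## Midpoint rule: sum of kernel values at the nodes times the sub-cell area
quad <- midpointNodes(c(0, 0), c(1, 1), nPtsPerDim = 10)
midpointValue <- sum(kernel(quad$nodes)) * quad$weight

## Exact integral over the unit cell, for comparison
exactValue <- 2 * pi * sigma^2 *
  (pnorm(1, x0[1], sigma) - pnorm(0, x0[1], sigma)) *
  (pnorm(1, x0[2], sigma) - pnorm(0, x0[2], sigma))

c(midpoint = midpointValue, exact = exactValue)
```

Increasing `nPtsPerDim` improves the approximation at the cost of more kernel evaluations, which is the trade-off to keep in mind when choosing the number of quadrature nodes per habitat cell.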

3.3 Run MCMC with NIMBLE

Finally, we can re-build the model and run the MCMC to fit the SCDL model.

## [1] -1058.118
## ===== Monitors =====
## thin = 10: N, detCoeffInt, detCoeffSlope, habCoeffInt, habCoeffSlope, probDetection, sigma
## ===== Samplers =====
## slice sampler (1)
##   - N
## RW_block sampler (83)
##   - sxy[]  (83 multivariate elements)
## RW sampler (5)
##   - habCoeffInt
##   - habCoeffSlope
##   - sigma
##   - detCoeffInt
##   - detCoeffSlope
##    user  system elapsed 
## 594.192   1.807 598.858