Chapter 11 Understanding the likelihood function

11.1 Background

Ari Martinez spent multiple summers doing field work in the Peruvian Amazonian. In 2011, Ari observed an interesting behavior in heterospecific flocks. Depending on where on the forest a bird was feeding its response would differ. For example if a bird is feeding on the ground(dead-leaf gleaning) then the bird would “freeze” for some time, but most of the flycatchers that feed on the top of the trees wouldn’t even bother to stop.

Ari started timing the freezing behavior for some of these birds and was able to collect a small sample. He would play an alarm call and then measure the time birds were freezing depending on where they were foraging.

This is his sample and you will be using it to understand evidence in likelihood.

bird_alarms <- read_csv("../../shared/AndesWorkshop2023/birdalarms.csv")
head(bird_alarms)

11.2 The model

We are going to model freezing time using an exponential distribution with parameter \(1/\theta\), where \(\theta\) is the expected time that birds of a group freeze.

What is an exponential distribution with parameter \(1/\theta\)?

11.3 The likelihood function

The likelihood function \(L(\theta;X)=P(\theta|X)\) is the product of the exponential density evaluated in every value of the sample observed. In order to do this calculation repeatedly we are going to make our own function in R. Just like the functions you have been using, your function is designed to do a suite of commands. This allows you to avoid copying and pasting the commands multiple times and changing all their arguments.

We define likelihood.fuction using the dexp() probability function. We return the negative likelihood, which is the product of all probabilities derived from our observations. Once you have defined this function and know what it does, you do not have to think about how to calculated likelihoods again.

likelihood.function <- function(parameter, observations){
    probabilities <- dexp(observations, rate=1/parameter,log=FALSE)
    L <- prod(probabilities)
    return(-L)
}

The likelihood function is much more difficult depending on the model we select. We will discuss more about its construction when we discuss substitution models. However the way to think about it is always as the probability of the data given a model (or set of parameters) \(P(Data|Model)\).

Knowing this definition of the likelihood then let’s ask

  • What is the input of the likelihood function?
  • What is the output of the likelihood function? ( and why is it negative?)

11.4 A mini example with a sample size of two

If we have a sample size of two birds freezing responses \(X=2,10\) the maximum likelihood estimate is the average \(\hat{\theta}=\sum_{i=1}^n x_i/n\). Therefore

(mle<-(2+10)/2)

Note that by putting the whole command in () we both assign the output to the variable and print it.

However, most of the time in difficult likelihoods it is impossible to calculate exactly who the mle so we do it numerically.

(likelihood.optimization <- optimize(f=likelihood.function, interval=c(0,10), observations=c(2,10)))

The function optimize is used when we have unidimensional functions (one parameter). For multidimensional we use optim or even better nloptr from the package with the same name.

What are the outputs?

11.5 The bird alarm example

In Ari’s example we have the dead-leaf gleaning species

dl.forager <- bird_alarms %>% filter(Forage == "DL") %>% pull(Response)

and the flycatchers

f.forager <- bird_alarms %>% filter(Forage =="F") %>% pull(Response) #11

11.5.1 Calculate the MLE for DL and F and the likelihood value evaluated at the MLE

You should be getting approximately the following values:

dl.optimization <- optimize(f=likelihood.function, interval=c(10,30), observations=dl.forager)

(mle.dl <- dl.optimization$minimum)
(likelihoodval.dl <- -dl.optimization$objective)

(mle.f <- mean(f.forager))
(likelihoodval.f <- likelihood.function(mle.f,observations=f.forager))

So, are the flycatchers behaving differently than the dead-leaf gleaners? What is the evidence? Can we compare these likelihoods or MLEs?

11.6 Where is the evidence for different behavior?

Likelihood functions are not only about the maximum likelihood estimate. Likelihoods represent plausibility. This is represented in the full likelihood function so it is important to explore it.

We select a series of parameters and measure their plausibility for example in the interval \((0,50)\)

parameter.vals <- seq(0.0001,50,0.01) #creating an interval for possible values for the likelihood
long <- length(parameter.vals) 
# Evaluating the likelihood for each of those values
p.likelihoodf <- rep(0,long)
for (i in 1:long){
p.likelihoodf[i]<- -likelihood.function(parameter.vals[i],observations=f.forager) # Remember is negative so we need to add a sign
}

In different studies likelihood can be represented using different scales

par(mfrow=c(1,3))
# Straight likelihood function
plot(parameter.vals, p.likelihoodf, type="l",main="Likelihood for flycatchers",xlab=expression(theta),ylab="Likelihood", lwd=2,xlim=c(0,5))

#log-likelihood function, most commonly used
plot(parameter.vals, log(p.likelihoodf), type="l",main="log-likelihood for flycatchers",xlab=expression(theta),ylab="Likelihood",lwd=2,xlim=c(0,5))

# Relative likelihood: Likelihood divided by the likelihood value at the MLE
plot(parameter.vals, p.likelihoodf/likelihoodval.f, type="l",main="Relative likelihood for flycatchers",xlab=expression(theta),ylab="Likelihood",lwd=2,xlim=c(0,5))

What do you think about the sample size for flycatchers?

11.6.1 Plot the likelihood for dead-leaf gleaners

It should look like this

p.likelihooddl<-rep(0,long)
for (i in 1:long){
p.likelihooddl[i]<- -likelihood.function(parameter.vals[i],observations=dl.forager)
}

par(mfrow=c(1,3))
plot(parameter.vals, p.likelihooddl, type="l",main="Likelihood for flycatchers",xlab="Rate Parameter",ylab="Likelihood",lty=2,col="red",lwd=2)

plot(parameter.vals, log(p.likelihooddl), type="l",main="log-likelihood for flycatchers",xlab="Rate Parameter",ylab="Likelihood",lty=2,col="red",lwd=2)

plot(parameter.vals, p.likelihooddl/likelihoodval.dl, type="l",main="Relative likelihood for flycatchers",xlab="Rate Parameter",ylab="Likelihood",lty=2,col="red",lwd=2)

11.7 Are the freezing times different for the two groups?

Usually you will go ahead and do some statistical test to say “I reject that the average freezing time of the groups is the same”. Except that you can’t do a T-test (not normal), sample sizes are really small (so not so much power). The evidence of likelihood comes to the rescue.

Using relative likelihoods we can compare the evidence between the two groups

par(mfrow=c(1,1))

plot(parameter.vals, p.likelihoodf/likelihoodval.f, type="l",main="Evidence for responses",xlab="Rate Parameter",ylab="Likelihood")
lines(parameter.vals, p.likelihooddl/likelihoodval.dl,lty=2,col="red",lwd=2)
legend(x=35,y=0.8, col=c("black","red"),legend=c("flycatcher","dead-leaf"),lty=1:2)