In biology, we often compare two alternative explanations for the same data.
Examples:
- Is a DNA sequence better explained by a motif model or by random background?
- Does adding an extra parameter significantly improve a model?
- Are observations from one experimental condition better explained by a different distribution than those from another?
The Likelihood Ratio Test (LRT) provides a principled statistical framework to compare two nested models using probability theory.
In this project, you will implement an LRT from scratch in F#.
After completing this project, you should be able to:
- Explain what a statistical model and likelihood are
- Implement likelihood functions for simple probabilistic models
- Compare nested models using a likelihood ratio
- Compute a test statistic and p-value
- Interpret statistical evidence in a biological context
You are given numerical observations and two competing statistical models:
- Null model $M_0$: a simpler model with fewer parameters
- Alternative model $M_1$: a more complex model that extends $M_0$
Your task is to determine whether the more complex model provides a significantly better explanation of the data.
The Likelihood Ratio Test asks:
Does the increase in likelihood justify the additional model complexity?
This is answered by comparing the maximum likelihoods of the two models.
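Given independent observations $x = (x_1, \dots, x_n)$ and a model with parameters $\theta$, the likelihood is the probability density of the data, viewed as a function of $\theta$:
$$L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$$
In practice, we work with the log-likelihood, which turns the product into a sum and avoids numerical underflow:
$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log p(x_i \mid \theta)$$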
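To see why the log-likelihood matters numerically, here is a minimal F# sketch (runnable with `dotnet fsi`). It assumes a standard normal density purely for the demonstration; `normalPdf` and `normalLogPdf` are hypothetical helper names, not part of the assignment. With 1000 observations the raw product of densities underflows to `0.0`, while the sum of log-densities stays finite.

```fsharp
// Illustrative only: a standard normal density is assumed for the demo.
let normalPdf (mu: float) (sigma: float) (x: float) =
    let z = (x - mu) / sigma
    exp (-0.5 * z * z) / (sigma * sqrt (2.0 * System.Math.PI))

let normalLogPdf (mu: float) (sigma: float) (x: float) =
    let z = (x - mu) / sigma
    -0.5 * z * z - log sigma - 0.5 * log (2.0 * System.Math.PI)

// 1000 observations, each 3 standard deviations away from the mean.
let data = List.replicate 1000 3.0

// Product of densities: each factor is about 0.0044, so the product underflows.
let likelihood = data |> List.map (normalPdf 0.0 1.0) |> List.fold ( * ) 1.0

// Sum of log-densities: a finite number that can be compared between models.
let logLikelihood = data |> List.sumBy (normalLogPdf 0.0 1.0)

printfn "likelihood     = %g" likelihood
printfn "log-likelihood = %g" logLikelihood
```

Only differences of log-likelihoods enter the test, so nothing is lost by never forming the product.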
Let:
- $\ell_0$ be the maximum log-likelihood under the null model
- $\ell_1$ be the maximum log-likelihood under the alternative model
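The test statistic is:
$$\Lambda = 2(\ell_1 - \ell_0)$$
Because $M_1$ contains $M_0$ as a special case, $\ell_1 \ge \ell_0$, so $\Lambda \ge 0$. Under standard regularity assumptions, and if $M_0$ is true, $\Lambda$ approximately follows a $\chi^2$ distribution whose degrees of freedom equal the difference in the number of free parameters between $M_1$ and $M_0$ (Wilks' theorem).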
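The statistic itself is one line of code. The sketch below (hypothetical names `lrtStatistic` and `rejectNullAt05`) also shows the simplest decision rule for one degree of freedom: reject $M_0$ at $\alpha = 0.05$ when $\Lambda$ exceeds the 95% quantile of $\chi^2(1)$, which is $1.959964^2 \approx 3.841459$. The log-likelihood values in the usage lines are made up for illustration.

```fsharp
/// Λ = 2(ℓ1 − ℓ0). For nested models fitted by maximum likelihood, ℓ1 ≥ ℓ0 in exact
/// arithmetic, so a tiny negative result can only be rounding error; clamp it to 0.
let lrtStatistic (l0: float) (l1: float) = max 0.0 (2.0 * (l1 - l0))

/// 95% quantile of the chi-square distribution with 1 df (= 1.959964^2).
let chiSq1Critical95 = 3.841459

/// Reject M0 at alpha = 0.05 when Λ exceeds the critical value (valid for df = 1 only).
let rejectNullAt05 (lambda: float) = lambda > chiSq1Critical95

// Hypothetical log-likelihoods, for illustration only.
let lam = lrtStatistic (-10.0) (-7.5)
printfn "Lambda = %.2f, reject M0: %b" lam (rejectNullAt05 lam)
```

Comparing against a critical value gives the same decision as comparing the p-value with $\alpha$; the p-value is still worth reporting because it shows how strong the evidence is.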
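You will use Gaussian models with known structure.
Null model $M_0$: all observations come from a single normal distribution:
$$x_i \sim \mathcal{N}(\mu, \sigma^2)$$
Parameters: $\mu$, $\sigma$
Alternative model $M_1$: observations come from two different groups, each with its own mean but with a shared standard deviation:
$$x_i \sim \mathcal{N}(\mu_{g(i)}, \sigma^2), \qquad g(i) \in \{A, B\}$$
where $g(i)$ is the group of observation $i$.
Parameters: $\mu_A$, $\mu_B$, $\sigma$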
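One way to keep the two models clearly separated in code is a discriminated union with one case per model. This is a sketch under assumptions: the type `GaussianModel`, the helpers `paramCount` and `meanAndSd`, and the restriction to the labels `"A"` and `"B"` are all illustrative choices, not a prescribed design.

```fsharp
/// The two competing models (hypothetical names, for illustration).
type GaussianModel =
    /// M0: every observation ~ N(mu, sigma^2)
    | NullModel of mu: float * sigma: float
    /// M1: an observation in group g ~ N(mu_g, sigma^2), with sigma shared by both groups
    | AltModel of muA: float * muB: float * sigma: float

/// Number of free parameters; the LRT degrees of freedom is the difference.
let paramCount model =
    match model with
    | NullModel _ -> 2
    | AltModel _ -> 3

/// Mean and standard deviation the model assigns to an observation with this label.
let meanAndSd model (label: string) =
    match model with
    | NullModel (mu, sigma) -> mu, sigma
    | AltModel (muA, _, sigma) when label = "A" -> muA, sigma
    | AltModel (_, muB, sigma) when label = "B" -> muB, sigma
    | AltModel _ -> invalidArg "label" (sprintf "unknown group label %s" label)
```

Because both cases answer the same question (which mean and standard deviation apply to this observation?), a single log-likelihood function can then be written once and reused for either model.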
This models a biological scenario such as:
- control vs treatment
- two experimental conditions
Input:
- A list of observations: `values`
- A list of group labels: `labels` (e.g. `"A"` or `"B"`)
Example:
values = [4.8; 5.1; 5.0; 6.2; 6.4; 6.1]
labels = ["A"; "A"; "A"; "B"; "B"; "B"]
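Before fitting $M_1$ you need the values split by label. A minimal sketch using only `FSharp.Core` list functions; `observations` and `groups` are illustrative names.

```fsharp
let values = [4.8; 5.1; 5.0; 6.2; 6.4; 6.1]
let labels = ["A"; "A"; "A"; "B"; "B"; "B"]

// Pair each value with its label; List.zip throws if the two lists differ in length.
let observations = List.zip values labels

// Group the values by label, keeping groups in order of first appearance.
let groups =
    observations
    |> List.groupBy snd
    |> List.map (fun (label, pairs) -> label, List.map fst pairs)
```

The length check from `List.zip` is a cheap guard against mismatched input files.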
Output:
- Log-likelihood under $M_0$
- Log-likelihood under $M_1$
- Likelihood ratio statistic $\Lambda$
- Degrees of freedom
- p-value
- Final interpretation
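Implement the log-likelihood for a normal distribution:
$$\ell(\mu, \sigma^2) = \sum_{i=1}^{n} \log \mathcal{N}(x_i \mid \mu, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$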
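The formula translates directly into F#. This sketch is parameterized by the variance `sigma2` rather than the standard deviation, since that is what the maximum likelihood estimate produces; the function names are hypothetical.

```fsharp
/// log N(x | mu, sigma2) = -1/2 log(2 pi sigma2) - (x - mu)^2 / (2 sigma2)
let normalLogPdf (mu: float) (sigma2: float) (x: float) =
    -0.5 * log (2.0 * System.Math.PI * sigma2) - (x - mu) ** 2.0 / (2.0 * sigma2)

/// Log-likelihood of a sample under one normal distribution: a sum, never a product.
let normalLogLikelihood (mu: float) (sigma2: float) (xs: float list) =
    xs |> List.sumBy (normalLogPdf mu sigma2)
```

For example, `normalLogLikelihood 5.6 (2.2 / 6.0) [5.0; 5.1; 4.9; 6.2; 6.3; 6.1]` evaluates the null model at its fitted parameters for the example data used later in this project.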
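Estimate parameters by maximum likelihood. For a normal distribution:
$$\hat{\mu} = \text{mean}(x), \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$$
Note the divisor $n$, not $n - 1$: this is the maximum likelihood estimate, not the unbiased sample variance.
Apply this:
- once for all data (null model)
- once per group for the means $\hat{\mu}_A$ and $\hat{\mu}_B$ (alternative model); because $\sigma$ is shared, estimate a single pooled variance over all $n$ observations, $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu}_{g(i)})^2$, where $g(i)$ is the group of observation $i$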
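A sketch of both fits, assuming exactly two groups passed as separate lists (`fitNull` and `fitAlt` are illustrative names). The key detail for $M_1$ is that the squared deviations from each group's own mean are summed across groups and divided by the total $n$, which is what a single shared $\sigma$ implies.

```fsharp
let mean (xs: float list) = List.average xs

/// Sum of squared deviations of xs from a given centre.
let sumSq (centre: float) (xs: float list) = xs |> List.sumBy (fun x -> (x - centre) ** 2.0)

/// M0: one mean and one MLE variance (divide by n) for all observations.
let fitNull (xs: float list) =
    let mu = mean xs
    mu, sumSq mu xs / float xs.Length

/// M1: one mean per group and a single pooled MLE variance shared by both groups.
let fitAlt (groupA: float list) (groupB: float list) =
    let muA, muB = mean groupA, mean groupB
    let n = groupA.Length + groupB.Length
    let sigma2 = (sumSq muA groupA + sumSq muB groupB) / float n
    muA, muB, sigma2
```

Fitting a separate variance per group would be a different alternative model with 4 parameters, and the degrees of freedom would change to 2.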
Compute:
- $\ell_0$: log-likelihood under the null model
- $\ell_1$: log-likelihood under the alternative model
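Plugging the maximum likelihood estimates back into the normal log-likelihood gives a closed form that is handy for checking an implementation. Since $\sum_i (x_i - \hat{\mu})^2 = n\hat{\sigma}^2$, the maximized log-likelihood is

$$\ell(\hat{\mu}, \hat{\sigma}^2) = -\frac{n}{2}\log(2\pi\hat{\sigma}^2) - \frac{n\hat{\sigma}^2}{2\hat{\sigma}^2} = -\frac{n}{2}\left(\log(2\pi\hat{\sigma}^2) + 1\right)$$

The same holds for $M_1$ with the pooled variance. Writing $\hat{\sigma}_0^2$ and $\hat{\sigma}_1^2$ for the fitted variances under $M_0$ and $M_1$, the $2\pi$ and $+1$ terms cancel in the difference:

$$\Lambda = 2(\ell_1 - \ell_0) = n \log\frac{\hat{\sigma}_0^2}{\hat{\sigma}_1^2}$$

For the example input `values: [5.0; 5.1; 4.9; 6.2; 6.3; 6.1]` with labels `A, A, A, B, B, B`: $\hat{\sigma}_0^2 = 2.2/6$, $\hat{\sigma}_1^2 = 0.04/6$, so $\ell_0 \approx -5.50$, $\ell_1 \approx 6.52$ and $\Lambda = 6\log 55 \approx 24.04$.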
Compute:
- Degrees of freedom: $df = k_1 - k_0 = 3 - 2 = 1$, where $k_0$ and $k_1$ are the numbers of free parameters in $M_0$ and $M_1$
- The p-value using the $\chi^2(df)$ distribution
(You may implement the
Decide whether the alternative model is significantly better than the null model at a chosen significance level (e.g.
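Without built-in statistical functions, the $\chi^2(1)$ p-value can still be computed exactly in terms of the complementary error function, because a $\chi^2(1)$ variable is the square of a standard normal: $P(\chi^2_1 > \Lambda) = \operatorname{erfc}(\sqrt{\Lambda/2})$. .NET's `System.Math` has no `erfc`, so this sketch uses the Chebyshev approximation `erfcc` from Numerical Recipes (fractional error below about $1.2 \times 10^{-7}$). The names and the value $\alpha = 0.05$ are assumptions for illustration. This shortcut only covers $df = 1$; other degrees of freedom need the regularized incomplete gamma function.

```fsharp
// Coefficients of the Numerical Recipes "erfcc" Chebyshev approximation to erfc.
let erfcCoeffs =
    [ -1.26551223; 1.00002368; 0.37409196; 0.09678418; -0.18628806;
      0.27886807; -1.13520398; 1.48851587; -0.82215223; 0.17087277 ]

/// Complementary error function erfc(x) = 1 - erf(x).
let erfc (x: float) =
    let z = abs x
    let t = 1.0 / (1.0 + 0.5 * z)
    // Horner evaluation of c0 + t*(c1 + t*(c2 + ... + t*c9))
    let poly = List.foldBack (fun c acc -> c + t * acc) erfcCoeffs 0.0
    let ans = t * exp (-z * z + poly)
    if x >= 0.0 then ans else 2.0 - ans

/// Upper-tail p-value for a chi-square statistic with df = 1:
/// P(chi2_1 > lambda) = P(|Z| > sqrt lambda) = erfc(sqrt(lambda / 2)).
let chiSquare1PValue (lambda: float) = erfc (sqrt (lambda / 2.0))

/// Decision at a chosen significance level alpha.
let conclude (alpha: float) (pValue: float) =
    if pValue < alpha then "significant difference between groups"
    else "no significant difference between groups"

let p = chiSquare1PValue 24.044
printfn "p-value: %.2g -> %s" p (conclude 0.05 p)
```

A quick sanity check: at the critical value $\Lambda = 3.841459$ the function should return a p-value of almost exactly 0.05.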
Input
values: [5.0; 5.1; 4.9; 6.2; 6.3; 6.1]
labels: ["A"; "A"; "A"; "B"; "B"; "B"]
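Output
Log-likelihood (null): -5.50
Log-likelihood (alternative): 6.52
LRT statistic: 24.04
p-value: 9.4e-07
Conclusion: significant difference between groups
A positive log-likelihood is not an error: for continuous data a density can exceed 1, and it does here because the pooled $\hat{\sigma}$ under $M_1$ is small.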
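Putting the steps together, here is one possible end-to-end sketch for the example input. It assumes exactly two groups, a shared variance, $\alpha = 0.05$ and $df = 1$; the record `LrtResult` and function `runLrt` are hypothetical names. For the example input it computes $\ell_0 \approx -5.50$, $\ell_1 \approx 6.52$, $\Lambda \approx 24.04$ and a p-value of roughly $9.4 \times 10^{-7}$.

```fsharp
/// Log-density of N(mu, sigma2) at x.
let logPdf (mu: float) (sigma2: float) (x: float) =
    -0.5 * log (2.0 * System.Math.PI * sigma2) - (x - mu) ** 2.0 / (2.0 * sigma2)

/// erfc via the Numerical Recipes "erfcc" Chebyshev fit (fractional error < 1.2e-7).
let erfc (x: float) =
    let coeffs =
        [ -1.26551223; 1.00002368; 0.37409196; 0.09678418; -0.18628806;
          0.27886807; -1.13520398; 1.48851587; -0.82215223; 0.17087277 ]
    let z = abs x
    let t = 1.0 / (1.0 + 0.5 * z)
    let ans = t * exp (-z * z + List.foldBack (fun c acc -> c + t * acc) coeffs 0.0)
    if x >= 0.0 then ans else 2.0 - ans

type LrtResult =
    { LogLikNull: float; LogLikAlt: float; Statistic: float; Df: int; PValue: float }

let runLrt (values: float list) (labels: string list) =
    let data = List.zip values labels
    let n = float data.Length
    // M0: one mean and one MLE variance (divide by n) for all observations.
    let mu0 = List.average values
    let s0 = (values |> List.sumBy (fun x -> (x - mu0) ** 2.0)) / n
    let l0 = values |> List.sumBy (logPdf mu0 s0)
    // M1: one mean per group, one pooled MLE variance (divide by the total n).
    let means =
        data
        |> List.groupBy snd
        |> List.map (fun (g, xs) -> g, List.averageBy fst xs)
        |> Map.ofList
    if means.Count <> 2 then invalidArg "labels" "expected exactly two groups"
    let s1 = (data |> List.sumBy (fun (x, g) -> (x - means.[g]) ** 2.0)) / n
    let l1 = data |> List.sumBy (fun (x, g) -> logPdf means.[g] s1 x)
    let stat = max 0.0 (2.0 * (l1 - l0))
    // df = 3 - 2 = 1, so P(chi2_1 > stat) = erfc(sqrt(stat / 2)).
    { LogLikNull = l0; LogLikAlt = l1; Statistic = stat; Df = 1
      PValue = erfc (sqrt (stat / 2.0)) }

let alpha = 0.05
let r = runLrt [5.0; 5.1; 4.9; 6.2; 6.3; 6.1] ["A"; "A"; "A"; "B"; "B"; "B"]
printfn "Log-likelihood (null): %.2f" r.LogLikNull
printfn "Log-likelihood (alternative): %.2f" r.LogLikAlt
printfn "LRT statistic: %.2f" r.Statistic
printfn "p-value: %.2g" r.PValue
printfn "Conclusion: %s"
    (if r.PValue < alpha then "significant difference between groups"
     else "no significant difference between groups")
```

Returning a record keeps the computation separate from the printing, so the same function can be reused on other datasets.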
Requirements:
- Do not use built-in statistical test functions
- Focus on numerical correctness and clarity
- Work in log-space only
- Structure your code so models are clearly separated
Optional extensions:
- Implement a one-sided alternative
- Visualize fitted distributions
- Apply the test to real biological data
- Extend to more than two groups
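For the multi-group extension, a sketch under the same shared-variance assumption: with $k$ groups, $M_1$ has $k + 1$ parameters (one mean per group plus $\sigma$), so $df = k - 1$, and plugging in the MLEs gives $\Lambda = n \log(\hat{\sigma}_0^2 / \hat{\sigma}_1^2)$. `multiGroupLrt` is a hypothetical name. It returns only the statistic and the degrees of freedom; a p-value for $df > 1$ needs the $\chi^2(k-1)$ upper tail (the regularized upper incomplete gamma function), which this sketch does not implement.

```fsharp
/// Returns (Lambda, df) for k >= 2 groups sharing one variance.
let multiGroupLrt (values: float list) (labels: string list) =
    let data = List.zip values labels
    let n = float data.Length
    // M0: one mean, MLE variance over all observations.
    let mu0 = List.average values
    let s0 = (values |> List.sumBy (fun x -> (x - mu0) ** 2.0)) / n
    // Mk: one mean per group, one pooled MLE variance.
    let means =
        data
        |> List.groupBy snd
        |> List.map (fun (g, xs) -> g, List.averageBy fst xs)
        |> Map.ofList
    let s1 = (data |> List.sumBy (fun (x, g) -> (x - means.[g]) ** 2.0)) / n
    // Closed form of 2(l1 - l0) once the MLEs are plugged in.
    let lambda = n * log (s0 / s1)
    lambda, means.Count - 1

let lam3, df3 = multiGroupLrt [1.0; 1.2; 2.0; 2.2; 3.0; 3.2] ["A"; "A"; "B"; "B"; "C"; "C"]
printfn "Lambda = %.2f on %d degrees of freedom" lam3 df3
```

With two groups this reduces to the statistic of the main task, which makes it a convenient consistency check.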
Submit:
- F# source code
- Documentation describing your approach
- One example dataset with interpretation