Project: Likelihood Ratio Test for Competing Biological Models

Background

In biology, we often compare two alternative explanations for the same data.

Examples:

Is a DNA sequence better explained by a motif model or by random background?
Does adding an extra parameter significantly improve a model?
Is one experimental condition better explained by a different distribution?

The Likelihood Ratio Test (LRT) provides a principled statistical framework to compare two nested models using probability theory.

In this project, you will implement an LRT from scratch in F#.

Learning Objectives

After completing this project, you should be able to:

Explain what a statistical model and likelihood are
Implement likelihood functions for simple probabilistic models
Compare nested models using a likelihood ratio
Compute a test statistic and p-value
Interpret statistical evidence in a biological context

Problem Description

You are given numerical observations and two competing statistical models:

Null model $M_0$: a simpler model with fewer parameters
Alternative model $M_1$: a more complex model that extends $M_0$

Your task is to determine whether the more complex model provides a significantly better explanation of the data.

Key Idea

The Likelihood Ratio Test asks:

Does the increase in likelihood justify the additional model complexity?

The test answers this by comparing the maximized likelihoods of the two models, each evaluated at its own best-fitting parameters.

Statistical Framework

Likelihood

Given data $x_1, \dots, x_n$ and a model with parameters $\theta$, the likelihood is:

$$ L(\theta) = \prod_{i=1}^n p(x_i \mid \theta) $$

In practice, we work with the log-likelihood:

$$ \ell(\theta) = \sum_{i=1}^n \log p(x_i \mid \theta) $$
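A minimal F# sketch of why the log form is preferred in practice: multiplying many densities underflows to zero in floating point, while summing their logarithms stays finite. The helper `normalPdf` and the 2000-point sample are illustrative assumptions, not part of the project specification.

```fsharp
// Hypothetical helper: density of N(mu, sigma^2) at x.
let normalPdf (mu: float) (sigma: float) (x: float) : float =
    let z = (x - mu) / sigma
    exp (-0.5 * z * z) / (sigma * sqrt (2.0 * System.Math.PI))

// Illustrative sample: 2000 observations, all equal to 0.0.
let sample = List.replicate 2000 0.0

// Direct product of densities: each factor is about 0.399, so the
// running product eventually drops below the smallest positive float.
let productLikelihood =
    sample |> List.fold (fun acc x -> acc * normalPdf 0.0 1.0 x) 1.0

// Sum of log-densities: same information, no underflow.
let summedLogLikelihood =
    sample |> List.sumBy (fun x -> log (normalPdf 0.0 1.0 x))

printfn "L = %g, log L = %f" productLikelihood summedLogLikelihood
```

The product collapses to zero, so any ratio built from it is meaningless, whereas the sum remains an ordinary finite number.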

Likelihood Ratio Test Statistic

Let:

$\ell_0$ be the maximum log-likelihood under the null model
$\ell_1$ be the maximum log-likelihood under the alternative model

The test statistic is:

$$ \Lambda = 2(\ell_1 - \ell_0) $$

Under the null model and standard regularity conditions (Wilks' theorem), $\Lambda$ approximately follows a $\chi^2$ distribution for large $n$, with degrees of freedom equal to the difference in the number of free parameters between the two models.

Models Used in This Project

You will use Gaussian models with known structure.

Null Model $M_0$

All observations come from a single normal distribution:

$$ x_i \sim \mathcal{N}(\mu, \sigma^2) $$

Parameters: $\mu, \sigma$

Alternative Model $M_1$

Observations come from two different groups, each with its own mean but sharing a common standard deviation; $g(i) \in \{A, B\}$ denotes the group of observation $i$:

$$ x_i \sim \mathcal{N}(\mu_{g(i)}, \sigma^2) $$

Parameters:

$\mu_A$
$\mu_B$
$\sigma$

This models a biological scenario such as:

control vs treatment
two experimental conditions

Input

A list of observations: values
A list of group labels: labels (e.g. "A" or "B")

Example:

values = [4.8; 5.1; 5.0; 6.2; 6.4; 6.1]
labels = ["A"; "A"; "A"; "B"; "B"; "B"]
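One possible way to pair the two input lists and split the observations by group is sketched below. The name `groups` and the choice of a `Map` are assumptions made for illustration; any structure that keeps each value with its label would work.

```fsharp
let values = [4.8; 5.1; 5.0; 6.2; 6.4; 6.1]
let labels = ["A"; "A"; "A"; "B"; "B"; "B"]

// Pair each label with its value, then collect the values per label.
// List.zip raises an exception if the two lists differ in length,
// which doubles as a basic input check.
let groups : Map<string, float list> =
    List.zip labels values
    |> List.groupBy fst
    |> List.map (fun (label, pairs) -> label, List.map snd pairs)
    |> Map.ofList

printfn "%A" groups
```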

Output

Log-likelihood under $M_0$
Log-likelihood under $M_1$
Likelihood ratio statistic $\Lambda$
Degrees of freedom
p-value
Final interpretation

Starting Tasks

Task 1: Log-Likelihood for a Gaussian Model

Implement the log-likelihood for a normal distribution:

$$ \log p(x \mid \mu, \sigma) = -\frac{1}{2}\log(2\pi\sigma^2) -\frac{(x - \mu)^2}{2\sigma^2} $$
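The formula above translates almost term by term into F#. This is a sketch under the assumption that $\sigma > 0$; the function names `logNormalPdf` and `sampleLogLikelihood` are hypothetical.

```fsharp
/// Log-density of one observation x under N(mu, sigma^2).
/// Assumes sigma > 0; a zero sigma would produce infinities or NaN.
let logNormalPdf (mu: float) (sigma: float) (x: float) : float =
    let variance = sigma * sigma
    let d = x - mu
    -0.5 * log (2.0 * System.Math.PI * variance) - d * d / (2.0 * variance)

/// Log-likelihood of a whole sample: the sum of per-observation terms.
let sampleLogLikelihood (mu: float) (sigma: float) (xs: float list) : float =
    xs |> List.sumBy (logNormalPdf mu sigma)
```

Computing the logarithm directly, rather than taking `log` of a density, avoids underflow for observations far from the mean.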

Task 2: Parameter Estimation

Estimate parameters by maximum likelihood.

For a normal distribution:

$\hat{\mu} = \text{mean}(x)$
$\hat{\sigma}^2 = \frac{1}{n} \sum (x_i - \hat{\mu})^2$

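Apply this:

once to all data (null model), giving $\hat{\mu}$ and $\hat{\sigma}$
once per group for the means only (alternative model), giving $\hat{\mu}_A$ and $\hat{\mu}_B$; because $M_1$ shares a single $\sigma$ across groups, estimate it from the pooled residuals: $\hat{\sigma}^2 = \frac{1}{n} \sum (x_i - \hat{\mu}_{g(i)})^2$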
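A sketch of the two fits, assuming the shared-variance alternative model $M_1$ defined earlier: each group gets its own mean, but the residuals of both groups are pooled into a single variance estimate divided by the total $n$. The names `fitNull` and `fitAlternative` are hypothetical.

```fsharp
/// MLE for M0: one mean and one standard deviation for all data.
/// List.average raises an exception on an empty list.
let fitNull (xs: float list) : float * float =
    let mu = List.average xs
    let variance = xs |> List.averageBy (fun x -> (x - mu) * (x - mu))
    mu, sqrt variance

/// MLE for M1: one mean per group and one shared standard deviation.
let fitAlternative (groupA: float list) (groupB: float list) : float * float * float =
    let sumSq (mu: float) (xs: float list) =
        xs |> List.sumBy (fun x -> (x - mu) * (x - mu))
    let muA = List.average groupA
    let muB = List.average groupB
    let n = float (List.length groupA + List.length groupB)
    // Pooled residual sum of squares divided by the total sample size.
    let variance = (sumSq muA groupA + sumSq muB groupB) / n
    muA, muB, sqrt variance
```

Note that both estimators divide by $n$, not $n - 1$: the maximum likelihood estimate of the variance is the biased one.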

Task 3: Compute Log-Likelihoods

Compute:

$\ell_0$: log-likelihood under the null model
$\ell_1$: log-likelihood under the alternative model
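The following self-contained sketch evaluates both maximized log-likelihoods for the six observations used in the Example section. It assumes exactly two labels, "A" and "B", and a shared $\sigma$ under $M_1$; all function and value names are illustrative.

```fsharp
let logNormalPdf (mu: float) (sigma: float) (x: float) : float =
    let variance = sigma * sigma
    let d = x - mu
    -0.5 * log (2.0 * System.Math.PI * variance) - d * d / (2.0 * variance)

let values = [5.0; 5.1; 4.9; 6.2; 6.3; 6.1]
let labels = ["A"; "A"; "A"; "B"; "B"; "B"]
let labelled = List.zip labels values

// Null model M0: a single mean and standard deviation for all data.
let mu0 = List.average values
let sigma0 =
    values |> List.averageBy (fun x -> (x - mu0) * (x - mu0)) |> sqrt
let logLik0 = values |> List.sumBy (logNormalPdf mu0 sigma0)

// Alternative model M1: one mean per group, one shared standard deviation.
let groupMean (g: string) =
    labelled |> List.filter (fun (label, _) -> label = g) |> List.averageBy snd
let muA, muB = groupMean "A", groupMean "B"
let meanOf (label: string) = if label = "A" then muA else muB
let sigma1 =
    labelled
    |> List.averageBy (fun (label, x) -> (x - meanOf label) * (x - meanOf label))
    |> sqrt
let logLik1 =
    labelled |> List.sumBy (fun (label, x) -> logNormalPdf (meanOf label) sigma1 x)

printfn "l0 = %f, l1 = %f" logLik0 logLik1
```

Because $M_0$ is a special case of $M_1$ (set $\mu_A = \mu_B$), `logLik1` can never be smaller than `logLik0` when both are evaluated at their maximum likelihood estimates.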

Task 4: Likelihood Ratio Statistic

Compute:

$$ \Lambda = 2(\ell_1 - \ell_0) $$
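The statistic itself is a one-liner. For the Gaussian models in this project there is also a useful cross-check: substituting the maximum likelihood estimates into the log-likelihood gives $\ell = -\frac{n}{2}\left(\log(2\pi\hat{\sigma}^2) + 1\right)$, so $\Lambda = n \log(\hat{\sigma}_0^2 / \hat{\sigma}_1^2)$. The sketch below shows both routes; the function names are hypothetical, and the clamp at zero is an added assumption to absorb tiny negative values caused by rounding.

```fsharp
/// LRT statistic from the two maximized log-likelihoods.
/// Clamped at 0.0 because rounding can make it very slightly negative.
let lrtStatistic (logLik0: float) (logLik1: float) : float =
    max 0.0 (2.0 * (logLik1 - logLik0))

/// Cross-check for the Gaussian case: n * log(var0 / var1), where var0
/// and var1 are the ML variance estimates under M0 and M1.
let lrtStatisticFromVariances (n: int) (var0: float) (var1: float) : float =
    float n * log (var0 / var1)
```

If the two routes disagree by more than rounding error, the parameter estimates or the log-likelihood function likely contain a bug.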

Task 5: p-value Calculation

Degrees of freedom: $df = (\text{number of parameters in } M_1) - (\text{number of parameters in } M_0) = 3 - 2 = 1$
Compute the p-value using the $\chi^2(df)$ distribution

(You may implement the $\chi^2$ CDF numerically or approximate it.)
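One way to obtain the p-value without a statistics library is sketched below for the $df = 1$ case only. It uses the power series of the lower incomplete gamma function, $P(a, x) = \frac{e^{-x} x^a}{\Gamma(a)} \sum_{k \ge 0} \frac{x^k}{a(a+1)\cdots(a+k)}$, with $a = df/2 = 1/2$, $x = \Lambda/2$, and $\Gamma(1/2) = \sqrt{\pi}$. The function name, the iteration cap, and the cutoff for very large $\Lambda$ are assumptions of this sketch, not requirements of the project.

```fsharp
/// Upper tail P(X > lambda) for X ~ chi-square with 1 degree of freedom.
let chiSquare1UpperTail (lambda: float) : float =
    if lambda <= 0.0 then 1.0
    elif lambda > 1000.0 then 0.0   // tail is far below float resolution here
    else
        let a = 0.5                 // shape parameter: df / 2
        let x = lambda / 2.0
        // Accumulate sum_{k >= 0} x^k / (a (a+1) ... (a+k)).
        let mutable term = 1.0 / a
        let mutable total = term
        let mutable k = 1
        while k < 1000 && term > 1e-17 * total do
            term <- term * x / (a + float k)
            total <- total + term
            k <- k + 1
        // Lower CDF: exp(-x) * x^a * series / Gamma(1/2).
        let lowerCdf = exp (-x) * sqrt x * total / sqrt System.Math.PI
        max 0.0 (1.0 - lowerCdf)
```

Since the p-value is formed as one minus the lower CDF, its absolute error is on the order of machine precision, which is adequate for p-values well above $10^{-12}$ but not for extremely small tails.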

Task 6: Interpretation

Decide whether the alternative model is significantly better than the null model at a chosen significance level (e.g. $\alpha = 0.05$).
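The decision rule can be kept separate from the numerical code, for example as a small discriminated union. The type and function names below are hypothetical.

```fsharp
/// Outcome of the test at a given significance level.
type Decision =
    | RejectNull        // M1 fits significantly better than M0
    | FailToRejectNull  // no significant improvement from M1

/// Reject the null model when the p-value falls below alpha.
let decide (alpha: float) (pValue: float) : Decision =
    if pValue < alpha then RejectNull else FailToRejectNull

printfn "%A" (decide 0.05 0.0017)
```

Failing to reject $M_0$ does not show that the groups are identical; it only means the data do not provide enough evidence for separate means at the chosen $\alpha$.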

Example

Input

values: [5.0; 5.1; 4.9; 6.2; 6.3; 6.1]
labels: ["A"; "A"; "A"; "B"; "B"; "B"]

Output

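Log-likelihood (null): -5.50
Log-likelihood (alternative): 6.52
LRT statistic: 24.04
Degrees of freedom: 1
p-value: 9.4e-7
Conclusion: significant difference between groups

(A positive log-likelihood is possible because a probability density can exceed 1 when $\hat{\sigma}$ is small.)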

Implementation Notes

Do not use built-in statistical test functions
Focus on numerical correctness and clarity
Work in log-space only
Structure your code so models are clearly separated

Extension Tasks

Implement a one-sided alternative
Visualize fitted distributions
Apply the test to real biological data
Extend to more than two groups

Submission

Submit:

F# source code
Documentation describing your approach
One example dataset with interpretation
