Commit c055405

Merge of #10991
2 parents 82e0fc4 + 930181c commit c055405

5 files changed

Lines changed: 294 additions & 0 deletions

src/content/docs/test-insights.mdx

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
---
title: Test Insights
description: Monitor, detect, and manage unreliable tests across your repositories.
---

Test Insights helps you manage test reliability across the full lifecycle.
It catches flaky tests on pull requests before they merge, surfaces unhealthy
tests across your repositories, and lets you quarantine problematic tests so
they don't block your CI pipeline.

## How it works

Test Insights is organized into three phases that follow the natural lifecycle
of a test reliability problem:

1. **[Prevention](/test-insights/prevention)**: Catch flaky and broken tests
   on pull requests before they reach your codebase. Mergify reruns tests on
   PRs to detect inconsistent behavior early.

2. **[Detection](/test-insights/detection)**: Identify and prioritize
   unhealthy tests across your repositories. See which tests are flaky or
   broken, and focus on the ones with the most impact.

3. **[Mitigation](/test-insights/mitigation)**: Quarantine problematic tests
   to unblock CI without removing them. Tests keep running, but their failures
   no longer block merges.

## Key concepts

- **Flaky test**: A test that produces different results on the same commit,
  for example passing on one run and failing on the next with identical code.

- **Broken test**: A test that fails consistently, with recent runs weighted
  more heavily.

- **Health status**: A test's reliability classification (healthy, flaky, or
  broken), based on results from multiple CI runs.

- **Confidence**: How much data is available to assess a test's health. Low
  confidence means the status could still change significantly as more runs
  are collected.

- **Quarantine**: Isolating a test so its failures are ignored for merge
  decisions. The test still runs and results are still collected, preserving
  full visibility.

## Setup

Test Insights is powered by the same CI integration as
[CI Insights](/ci-insights). To get started, configure your CI system and test
framework:

- [GitHub Actions setup](/ci-insights/setup/github-actions)
- [Jenkins setup](/ci-insights/setup/jenkins)
- [Test framework configuration](/ci-insights#test-framework-configuration)
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
---
title: Detection
description: Identify and prioritize unhealthy tests across your repositories.
---

Even with prevention in place, tests can degrade over time. Detection surfaces
all unhealthy tests (flaky and broken) across your repositories, so you can
see the full picture and prioritize what to fix.

## How tests are classified

Mergify classifies tests based on their results across multiple CI runs,
with recent results weighted more heavily:

- **Flaky**: The test produces inconsistent results on the same commit. It
  passes on some runs and fails on others, without any code changes.

- **Broken**: The test fails consistently. Recent runs are weighted more
  heavily, so a test that started failing recently will be classified as
  broken even if it passed in earlier runs.

Only unhealthy tests (flaky or broken) appear in Detection. Healthy tests
are not listed.
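To make the two definitions above concrete, here is a minimal sketch of how such a classification could be computed. It is not Mergify's actual algorithm: the exponential decay, the `DECAY` factor, the `BROKEN_THRESHOLD` value, and the `classify` function are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical parameters, chosen only for illustration.
BROKEN_THRESHOLD = 0.9  # weighted failure rate at or above which a test is "broken"
DECAY = 0.5             # each older run counts half as much as the next newer one


def classify(runs):
    """Classify a test from (commit_sha, passed) tuples ordered oldest to newest."""
    if not runs:
        return "healthy"

    # Broken: the recency-weighted failure rate is close to 1.
    weighted_failures = 0.0
    total_weight = 0.0
    for age, (_, passed) in enumerate(reversed(runs)):
        weight = DECAY ** age  # newest run has age 0 and weight 1
        total_weight += weight
        if not passed:
            weighted_failures += weight
    if weighted_failures / total_weight >= BROKEN_THRESHOLD:
        return "broken"

    # Flaky: at least one commit has both a passing and a failing run.
    outcomes_by_commit = defaultdict(set)
    for sha, passed in runs:
        outcomes_by_commit[sha].add(passed)
    if any(len(outcomes) == 2 for outcomes in outcomes_by_commit.values()):
        return "flaky"

    return "healthy"
```

With these assumed parameters, a test that passed twice on an older commit and then failed four times in a row on a newer one is classified as broken, because the recent failures dominate the weighted rate.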

## Understanding confidence

Confidence indicates how much data is available to assess a test's health.

- **High confidence**: Enough runs have been collected to make a reliable
  assessment. The health status is unlikely to change significantly.

- **Low confidence**: Limited data is available. The health status could
  still shift as more runs are collected. Treat low-confidence results as
  preliminary.

Confidence increases as more CI runs are collected for a given test.
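One way to see why more runs yield higher confidence is to look at how the uncertainty around an observed failure rate shrinks as the sample grows. The sketch below uses a Wilson score interval as a stand-in technique; the source does not say how Mergify computes confidence, and the `max_width` cutoff and both function names are assumptions.

```python
import math


def wilson_interval(failures, runs, z=1.96):
    """Wilson score interval (about 95% for z=1.96) for a test's failure rate."""
    if runs == 0:
        return (0.0, 1.0)  # no data: the failure rate could be anything
    p = failures / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def confidence_level(failures, runs, max_width=0.3):
    """Label confidence 'high' once the interval is narrower than max_width (assumed cutoff)."""
    low, high = wilson_interval(failures, runs)
    return "high" if high - low <= max_width else "low"
```

Under these assumptions, the same 25% failure rate is low confidence after 4 runs and high confidence after 100, which matches the advice to treat early results as preliminary.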

## Prioritizing with impact

The impact metric reflects how many failed executions a test causes. A
high-impact flaky test wastes more CI time and disrupts more workflows than
a low-impact one.

Use impact to decide which tests to fix first: high-impact tests give you
the most return on investment when fixed.
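As a rough illustration of ranking by impact, the sketch below counts failed executions per test and sorts them in descending order. The input shape and the `rank_by_impact` name are assumptions; the dashboard may compute impact differently.

```python
from collections import Counter


def rank_by_impact(executions):
    """Return [(test_name, failed_executions)] sorted from highest to lowest impact.

    `executions` is an iterable of (test_name, passed) tuples (assumed shape).
    """
    failures = Counter(name for name, passed in executions if not passed)
    return failures.most_common()
```

Tests that never failed do not appear in the result, which mirrors the fact that only unhealthy tests are listed in Detection.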

## Practical workflows

### Finding your worst tests

Sort by impact to surface the tests causing the most CI disruption. These
are the best candidates for immediate attention.

### Narrowing scope

Use filters to focus on specific areas:

- **Test name**: Search for a specific test or pattern
- **Job name**: Focus on tests within a particular CI job
- **Pipeline name**: Narrow to a specific CI pipeline
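The three filters above combine naturally as a conjunction. The following sketch applies them to an in-memory list; the `UnhealthyTest` record, the `filter_tests` function, and the shell-style wildcard matching are assumptions for illustration, not the dashboard's actual query syntax.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class UnhealthyTest:
    # Hypothetical record shape for a test listed in Detection.
    name: str
    job: str
    pipeline: str


def filter_tests(
    tests: Iterable[UnhealthyTest],
    name_pattern: str = "*",
    job: Optional[str] = None,
    pipeline: Optional[str] = None,
) -> List[UnhealthyTest]:
    """Keep tests matching every filter that was provided."""
    return [
        t
        for t in tests
        if fnmatchcase(t.name, name_pattern)       # case-sensitive wildcard match
        and (job is None or t.job == job)
        and (pipeline is None or t.pipeline == pipeline)
    ]
```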

### Checking quarantine status

Tests that have already been quarantined are indicated in the health status.
This helps you avoid spending time investigating tests that are already being
managed through [Mitigation](/test-insights/mitigation).

## Setup

Detection requires test metrics collection through repeated CI runs. See the
CI setup guides for your platform:

- [GitHub Actions setup](/ci-insights/setup/github-actions)
- [Jenkins setup](/ci-insights/setup/jenkins)
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
---
title: Mitigation
description: Quarantine problematic tests to unblock CI without losing visibility.
---

When a flaky or broken test blocks CI, teams face a tough choice: fix it
immediately, delete it, or ignore it. Quarantine offers a better option. The
test keeps running, but its failures no longer block merges. You maintain full
visibility without disruption.

## How quarantine works

A quarantined test still executes in your CI pipeline and its results are
still collected by Mergify. The difference is that failures are ignored for
merge decisions.

This means:

- Your CI stays green while you work on a fix

- Historical data is preserved, so you can track whether the test improves
  or worsens over time

- Other team members can see the test is quarantined and why

Quarantine works on any branch, not just the default branch.
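The rule described above, where every test runs but only non-quarantined failures count, can be sketched as follows. The function names and data shapes are assumptions for illustration and do not reflect how the Mergify CI integration is implemented.

```python
def blocking_failures(results, quarantined):
    """Return the failed tests that should still block a merge.

    `results` maps test name -> passed (bool); `quarantined` is a set of test names.
    Every test appears in `results`, quarantined or not, so history is preserved.
    """
    return sorted(
        name
        for name, passed in results.items()
        if not passed and name not in quarantined
    )


def can_merge(results, quarantined):
    """A merge is allowed when no non-quarantined test failed."""
    return not blocking_failures(results, quarantined)
```

Note that the quarantined test's failure is still present in `results`; it is only excluded from the merge decision.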

:::note
Quarantined tests must still be uploaded through one of the supported CI
integrations. See the
[test framework configuration](/ci-insights#test-framework-configuration)
for setup details.
:::

## Manual quarantine

You can manually add or remove specific tests from quarantine through the
Mergify dashboard. This is useful when you've identified a problematic test
through [Detection](/test-insights/detection) and want to stop it from
blocking your team while you investigate.

For technical details on how quarantine integrates with your CI pipeline,
see the [Quarantine documentation](/ci-insights/quarantine).

## Auto-quarantine

Auto-quarantine lets Mergify automatically quarantine tests without manual
intervention. By default, only flaky tests are quarantined automatically.
You can also enable quarantining of known broken tests through an additional
option.

This is useful for teams that want hands-off management of unreliable tests.
You can enable or disable auto-quarantine per repository from the Mitigation
page in the dashboard.
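The auto-quarantine policy above reduces to a small decision table. In this sketch the settings object and its field names (`enabled`, `include_broken`) are hypothetical; the real options are toggled in the dashboard, and their actual names are not given in this document.

```python
from dataclasses import dataclass


@dataclass
class AutoQuarantineSettings:
    # Hypothetical per-repository settings; names are illustrative only.
    enabled: bool = False
    include_broken: bool = False  # stands in for the "additional option"


def should_auto_quarantine(status, settings):
    """Decide whether a test with the given health status is quarantined automatically."""
    if not settings.enabled:
        return False
    if status == "flaky":
        return True  # flaky tests are covered by default
    return status == "broken" and settings.include_broken
```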

## Practical workflows

### Quarantining a test from Detection

When you identify a high-impact flaky or broken test in
[Detection](/test-insights/detection), you can quarantine it directly to
stop it from blocking merges while you work on a fix.

### Reviewing quarantined tests

Periodically check the Mitigation page to review quarantined tests. Look
for tests whose health status has improved; these may be ready to be
removed from quarantine.

### Enabling auto-quarantine

For repositories where flaky tests frequently block CI, enable
auto-quarantine to let Mergify handle them automatically. If broken tests
are also a recurring blocker, turn on the additional option to quarantine
known broken tests as well. This reduces manual overhead and keeps your CI
pipeline moving.

## Setup

Mitigation uses the same CI integration as Detection. To ensure quarantine
works correctly, your CI must be configured to check quarantine status. See
the [Quarantine documentation](/ci-insights/quarantine) for technical setup
details.
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
---
title: Prevention
description: Catch flaky and broken tests on pull requests before they reach your codebase.
---

Prevention monitors tests introduced or modified in pull requests. By
rerunning tests on PRs, it detects flaky behavior before code merges, keeping
your codebase reliable.

## How it works

When a pull request runs tests, Mergify reruns them to check for consistency.
Tests that produce different results on the same commit are flagged as flaky.
This happens transparently as part of your existing CI pipeline, with no changes
to your test code needed.

Tests caught as flaky on a PR are prevented from silently degrading your
test suite. You can review their health status before deciding to merge.
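The rerun idea can be illustrated with a few lines of plain Python. This is only a sketch of the principle, not how the framework plugins work internally: `detect_flaky`, the `reruns` default, and the sample test functions are all hypothetical.

```python
import itertools


def detect_flaky(test_fn, reruns=5):
    """Run the same test several times on unchanged code and compare outcomes."""
    outcomes = []
    for _ in range(reruns):
        try:
            test_fn()
            outcomes.append(True)
        except Exception:
            outcomes.append(False)
    if all(outcomes):
        return "healthy"
    if not any(outcomes):
        return "broken"
    return "flaky"  # mixed results on identical code


# Deterministic stand-in for a flaky test: it fails on every third call.
_calls = itertools.count()


def sometimes_fails():
    if next(_calls) % 3 == 0:
        raise AssertionError("intermittent failure")


def always_passes():
    pass


def always_fails():
    raise AssertionError("consistent failure")
```

Each extra rerun raises the chance of catching an intermittent failure, at the price of more CI time.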

## What you can track

Prevention provides key metrics to help you understand test reliability
on pull requests:

### Caught flaky tests

The number of flaky tests detected during PR reruns. This is the core value
of Prevention: every caught test is a reliability problem that didn't make it
into your codebase.

### New tests

Tests being introduced on PRs, along with their health status. Each new test
is classified as healthy, flaky, or broken based on its rerun results. This
helps you spot unreliable tests before they're merged.

### CI budget spent

The total CI time spent on reruns. This metric helps teams understand the
cost of flaky test prevention and make informed trade-offs between
thoroughness and CI budget.
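As a back-of-the-envelope illustration, the budget is the sum of the durations of rerun executions only. The input shape and the `ci_budget_spent` name below are assumptions; the dashboard reports this metric for you.

```python
from datetime import timedelta


def ci_budget_spent(executions):
    """Total time spent on reruns.

    `executions` is an iterable of (duration_seconds, is_rerun) tuples (assumed shape).
    Initial runs are excluded because they would have happened anyway.
    """
    return timedelta(seconds=sum(d for d, is_rerun in executions if is_rerun))
```

For example, one initial 120-second run followed by reruns of 118 and 121.5 seconds gives a rerun budget of 239.5 seconds, just under four minutes.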

## Practical workflows

### Reviewing tests before merging

When a PR introduces or modifies tests, check the Prevention page to see
their health status. Tests with a flaky or broken status should be
investigated before merging.

### Filtering by pull request state

Use the pull request state filter to focus on specific PRs:

- **Open**: Tests on PRs still in review
- **Merged**: Tests on PRs that have already been merged
- **Closed**: Tests on PRs that were closed without merging

### Understanding confidence on new tests

New tests have limited run data, so their confidence level may be low. Low
confidence means the health status could change as more data is collected.
Consider waiting for more runs before drawing conclusions about a test's
reliability.

## Setup

Prevention requires test framework plugins that instrument test runs to track
flakiness on pull requests.

See the [test framework configuration](/ci-insights#test-framework-configuration)
for setup instructions specific to your framework (pytest-mergify,
rspec-mergify, etc.).

src/content/navItems.tsx

Lines changed: 11 additions & 0 deletions
@@ -100,6 +100,17 @@ const navItems: NavItem[] = [
       },
     ],
   },
+  {
+    title: 'Test Insights',
+    path: '/test-insights',
+    icon: 'fa6-solid:flask-vial',
+    children: [
+      { title: 'Overview', path: '/test-insights', icon: 'fa6-regular:lightbulb' },
+      { title: 'Prevention', path: '/test-insights/prevention', icon: 'fa6-solid:shield-halved' },
+      { title: 'Detection', path: '/test-insights/detection', icon: 'fa6-solid:magnifying-glass' },
+      { title: 'Mitigation', path: '/test-insights/mitigation', icon: 'fa-solid:radiation' },
+    ],
+  },
   {
     title: 'Merge Queue',
     icon: MergeQueueIcon,
