Commits
18 commits
baf7550
feat(benchmark): add annotation load time baseline benchmarks (#566)
labradorite-dev Feb 25, 2026
6b60ec7
feat(benchmark): add 10k-URL scale seeder and scale benchmark tests (…
labradorite-dev Feb 25, 2026
1cea1da
fix(benchmark): replace per-URL suggestion loops with bulk inserts (#…
labradorite-dev Feb 25, 2026
e9db7d3
fix(benchmark): use confidence=1.0 for location suggestions (#566)
labradorite-dev Feb 25, 2026
51596ca
feat(db): materialize url_annotation_count_view and url_annotation_fl…
labradorite-dev Feb 26, 2026
2c414e5
chore(deps): upgrade pytest-benchmark from 4.0 to 5.2.3
labradorite-dev Feb 26, 2026
825eabb
fix(annotate): use LEFT JOINs on materialized views; refresh in sort …
labradorite-dev Feb 27, 2026
90118a0
style: fix lint issues in our changes (#566)
labradorite-dev Feb 27, 2026
b9a0c27
fix(migration): rebase onto upstream 1fb2286a016c to resolve multiple…
labradorite-dev Feb 27, 2026
f826efd
ci(benchmark): post results to GHA job summary (#566)
labradorite-dev Feb 27, 2026
d0861d0
ci(benchmark): capture per-phase timings in GHA job summary (#566)
labradorite-dev Feb 27, 2026
864a8bd
fix(benchmark): use removesuffix to avoid mangling phase labels (#566)
labradorite-dev Feb 27, 2026
72a1810
docs(benchmark): simplify README; remove stale diagram and hardcoded …
labradorite-dev Feb 27, 2026
ed5a7e5
style: fix flake8 warnings in benchmark test and timing module (#566)
labradorite-dev Feb 27, 2026
9aae5a4
style: add missing docstrings to benchmark package, conftest, and sca…
labradorite-dev Feb 27, 2026
00f24ec
style: fix flake8 annotations from PR check (#566)
labradorite-dev Feb 27, 2026
35248e0
refactor(benchmark): extract summary heredoc to scripts/post_benchmar…
labradorite-dev Feb 28, 2026
80b77b8
docs(development): replace hardcoded dev db password with reference t…
labradorite-dev Feb 28, 2026
61 changes: 61 additions & 0 deletions .github/workflows/benchmark.yml
@@ -0,0 +1,61 @@
name: Annotation Benchmark

on:
workflow_dispatch:
pull_request:
branches: [dev]
@maxachis (Collaborator) commented on Mar 2, 2026:

I like the idea of having a benchmark that we can repeatedly review! But I'm not convinced we should have it run on every PR to dev -- main perhaps (which is not listed here), but I want to be sensitive to GitHub action usage limits. We may be nowhere near those limits (I'll ask @josh-chamberlain for deets on that), but it's something I want to flag for closer inspection.

The other thing is that I am assuming relatively few code changes will impact annotation benchmarking, so it may not be efficient to have a GitHub workflow dedicated solely to one (albeit important) part of the app run on every code change, since most of the time it will pass. The lint and test actions, by contrast, are always relevant: even when most of the code they cover is unchanged, they can conceivably fail on any change. That's less probable with this action.

That might be an argument for a wider set of benchmarks, which would be a separate issue.

My first instinct is I like the benchmark, and I'm open to the idea of having GitHub actions for a more broad set of benchmarks. But I want to keep scope relatively narrow in this PR. I would either cut it for now or set it to workflow dispatch only.


jobs:
benchmark:
runs-on: ubuntu-latest
timeout-minutes: 30
container: python:3.11.9

services:
postgres:
image: postgres:15
env:
POSTGRES_PASSWORD: postgres
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5

env:
POSTGRES_PASSWORD: postgres
POSTGRES_USER: postgres
POSTGRES_DB: postgres
POSTGRES_HOST: postgres
POSTGRES_PORT: 5432
GOOGLE_API_KEY: TEST
GOOGLE_CSE_ID: TEST

steps:
- name: Checkout repository
uses: actions/checkout@v4

- name: Install uv and set the python version
uses: astral-sh/setup-uv@v5

- name: Install the project
run: uv sync --locked --all-extras --dev

- name: Run benchmark tests
run: |
uv run pytest tests/automated/integration/benchmark \
-m "manual and benchmark" \
--benchmark-json=benchmark-results.json \
-v

- name: Post benchmark summary
run: uv run python scripts/post_benchmark_summary.py

- name: Upload benchmark results
uses: actions/upload-artifact@v4
with:
name: benchmark-results-${{ github.sha }}
path: |
benchmark-results.json
per-phase-results.json
retention-days: 90
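The `scripts/post_benchmark_summary.py` script invoked above is not shown in this diff. A minimal sketch of the same idea, rendering pytest-benchmark JSON as a Markdown table and appending it to the GitHub Actions job summary, might look like the following; the function names and the exact table columns are assumptions, not the repository's actual implementation:

```python
import json
import os


def summary_markdown(results: dict) -> str:
    """Render pytest-benchmark JSON output as a Markdown table."""
    lines = [
        "| Benchmark | Mean (s) | Stddev (s) | Rounds |",
        "| --- | --- | --- | --- |",
    ]
    for bench in results.get("benchmarks", []):
        stats = bench["stats"]
        lines.append(
            f"| {bench['name']} | {stats['mean']:.4f} "
            f"| {stats['stddev']:.4f} | {stats['rounds']} |"
        )
    return "\n".join(lines)


def post_summary(results_path: str = "benchmark-results.json") -> None:
    """Append the table to the GHA job summary, or print it locally.

    GITHUB_STEP_SUMMARY is set by GitHub Actions runners; Markdown
    appended to that file renders in the job summary pane.
    """
    with open(results_path) as f:
        results = json.load(f)
    md = "## Annotation Benchmark\n\n" + summary_markdown(results) + "\n"
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:
        with open(summary_path, "a") as f:
            f.write(md)
    else:
        print(md)
```

Keeping the Markdown rendering as a pure function makes it easy to unit-test without touching the filesystem or the runner environment.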
@@ -5,20 +5,21 @@
Create Date: 2026-02-15 12:57:34.550327

"""
from typing import Sequence, Union
from typing import Optional, Sequence, Union

from alembic import op

from src.util.alembic_helpers import switch_enum_type

# revision identifiers, used by Alembic.
revision: str = '1fb2286a016c'
down_revision: Union[str, None] = '759ce7d0772b'
down_revision: Optional[str] = '759ce7d0772b'
@maxachis (Collaborator) commented on Mar 2, 2026:

I've no objections to this in principle, but do keep an eye on keeping scope focused to what's relevant to this issue! We're affecting a number of files as-is, and tighter PRs make things simpler to review.

branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
"""Add internet_archive to batch_strategy enum."""
switch_enum_type(
table_name="batches",
column_name="strategy",
@@ -38,6 +39,7 @@ def upgrade() -> None:


def downgrade() -> None:
"""Remove internet_archive from batch_strategy enum."""
op.execute("""
DELETE FROM BATCHES
WHERE STRATEGY = 'internet_archive'
@@ -0,0 +1,246 @@
"""Materialize url_annotation_count_view and url_annotation_flags

Revision ID: c8e4f1a2b3d5
Revises: 759ce7d0772b
Create Date: 2026-02-26 00:00:00.000000

"""
from typing import Optional, Sequence, Union

from alembic import op


# revision identifiers, used by Alembic.
revision: str = 'c8e4f1a2b3d5'
down_revision: Optional[str] = '1fb2286a016c'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None

_URL_ANNOTATION_COUNT_VIEW_SQL = """
WITH
auto_location_count AS (
SELECT
u_1.id,
count(anno.url_id) AS cnt
FROM
urls u_1
JOIN annotation__location__auto__subtasks anno
ON u_1.id = anno.url_id
GROUP BY
u_1.id
)
, auto_agency_count AS (
SELECT
u_1.id,
count(anno.url_id) AS cnt
FROM
urls u_1
JOIN annotation__agency__auto__subtasks anno
ON u_1.id = anno.url_id
GROUP BY
u_1.id
)
, auto_url_type_count AS (
SELECT
u_1.id,
count(anno.url_id) AS cnt
FROM
urls u_1
JOIN annotation__url_type__auto anno
ON u_1.id = anno.url_id
GROUP BY
u_1.id
)
, auto_record_type_count AS (
SELECT
u_1.id,
count(anno.url_id) AS cnt
FROM
urls u_1
JOIN annotation__record_type__auto anno
ON u_1.id = anno.url_id
GROUP BY
u_1.id
)
, user_location_count AS (
SELECT
u_1.id,
count(anno.url_id) AS cnt
FROM
urls u_1
JOIN annotation__location__user anno
ON u_1.id = anno.url_id
GROUP BY
u_1.id
)
, user_agency_count AS (
SELECT
u_1.id,
count(anno.url_id) AS cnt
FROM
urls u_1
JOIN annotation__agency__user anno
ON u_1.id = anno.url_id
GROUP BY
u_1.id
)
, user_url_type_count AS (
SELECT
u_1.id,
count(anno.url_id) AS cnt
FROM
urls u_1
JOIN annotation__url_type__user anno
ON u_1.id = anno.url_id
GROUP BY
u_1.id
)
, user_record_type_count AS (
SELECT
u_1.id,
count(anno.url_id) AS cnt
FROM
urls u_1
JOIN annotation__record_type__user anno
ON u_1.id = anno.url_id
GROUP BY
u_1.id
)
, anon_location_count AS (
SELECT
u_1.id,
count(anno.url_id) AS cnt
FROM
urls u_1
JOIN annotation__location__anon anno
ON u_1.id = anno.url_id
GROUP BY
u_1.id
)
, anon_agency_count AS (
SELECT
u_1.id,
count(anno.url_id) AS cnt
FROM
urls u_1
JOIN annotation__agency__anon anno
ON u_1.id = anno.url_id
GROUP BY
u_1.id
)
, anon_url_type_count AS (
SELECT
u_1.id,
count(anno.url_id) AS cnt
FROM
urls u_1
JOIN annotation__url_type__anon anno
ON u_1.id = anno.url_id
GROUP BY
u_1.id
)
, anon_record_type_count AS (
SELECT
u_1.id,
count(anno.url_id) AS cnt
FROM
urls u_1
JOIN annotation__record_type__anon anno
ON u_1.id = anno.url_id
GROUP BY
u_1.id
)
SELECT
u.id AS url_id,
COALESCE(auto_ag.cnt, 0::bigint) AS auto_agency_count,
COALESCE(auto_loc.cnt, 0::bigint) AS auto_location_count,
COALESCE(auto_rec.cnt, 0::bigint) AS auto_record_type_count,
COALESCE(auto_typ.cnt, 0::bigint) AS auto_url_type_count,
COALESCE(user_ag.cnt, 0::bigint) AS user_agency_count,
COALESCE(user_loc.cnt, 0::bigint) AS user_location_count,
COALESCE(user_rec.cnt, 0::bigint) AS user_record_type_count,
COALESCE(user_typ.cnt, 0::bigint) AS user_url_type_count,
COALESCE(anon_ag.cnt, 0::bigint) AS anon_agency_count,
COALESCE(anon_loc.cnt, 0::bigint) AS anon_location_count,
COALESCE(anon_rec.cnt, 0::bigint) AS anon_record_type_count,
COALESCE(anon_typ.cnt, 0::bigint) AS anon_url_type_count,
COALESCE(auto_ag.cnt, 0::bigint) + COALESCE(auto_loc.cnt, 0::bigint) + COALESCE(auto_rec.cnt, 0::bigint) +
COALESCE(auto_typ.cnt, 0::bigint) + COALESCE(user_ag.cnt, 0::bigint) + COALESCE(user_loc.cnt, 0::bigint) +
COALESCE(user_rec.cnt, 0::bigint) + COALESCE(user_typ.cnt, 0::bigint) + COALESCE(anon_ag.cnt, 0::bigint) +
COALESCE(anon_loc.cnt, 0::bigint) + COALESCE(anon_rec.cnt, 0::bigint) + COALESCE(anon_typ.cnt, 0::bigint) AS total_anno_count

FROM
urls u
LEFT JOIN auto_agency_count auto_ag
ON auto_ag.id = u.id
LEFT JOIN auto_location_count auto_loc
ON auto_loc.id = u.id
LEFT JOIN auto_record_type_count auto_rec
ON auto_rec.id = u.id
LEFT JOIN auto_url_type_count auto_typ
ON auto_typ.id = u.id
LEFT JOIN user_agency_count user_ag
ON user_ag.id = u.id
LEFT JOIN user_location_count user_loc
ON user_loc.id = u.id
LEFT JOIN user_record_type_count user_rec
ON user_rec.id = u.id
LEFT JOIN user_url_type_count user_typ
ON user_typ.id = u.id
LEFT JOIN anon_agency_count anon_ag
ON anon_ag.id = u.id
LEFT JOIN anon_location_count anon_loc
ON anon_loc.id = u.id
LEFT JOIN anon_record_type_count anon_rec
ON anon_rec.id = u.id
LEFT JOIN anon_url_type_count anon_typ
ON anon_typ.id = u.id
"""

_URL_ANNOTATION_FLAGS_SQL = """
SELECT u.id as url_id,
EXISTS (SELECT 1 FROM public.annotation__record_type__auto a WHERE a.url_id = u.id) AS has_auto_record_type_suggestion,
EXISTS (SELECT 1 FROM public.annotation__url_type__auto a WHERE a.url_id = u.id) AS has_auto_relevant_suggestion,
EXISTS (SELECT 1 FROM public.annotation__agency__auto__subtasks a WHERE a.url_id = u.id) AS has_auto_agency_suggestion,
EXISTS (SELECT 1 FROM public.annotation__location__auto__subtasks a WHERE a.url_id = u.id) AS has_auto_location_suggestion,
EXISTS (SELECT 1 FROM public.annotation__record_type__user a WHERE a.url_id = u.id) AS has_user_record_type_suggestion,
EXISTS (SELECT 1 FROM public.annotation__url_type__user a WHERE a.url_id = u.id) AS has_user_relevant_suggestion,
EXISTS (SELECT 1 FROM public.annotation__agency__user a WHERE a.url_id = u.id) AS has_user_agency_suggestion,
EXISTS (SELECT 1 FROM public.annotation__location__user a WHERE a.url_id = u.id) AS has_user_location_suggestion,
EXISTS (SELECT 1 FROM public.link_agencies__urls a WHERE a.url_id = u.id) AS has_confirmed_agency,
EXISTS (SELECT 1 FROM public.reviewing_user_url a WHERE a.url_id = u.id) AS was_reviewed
FROM urls u
"""


def upgrade() -> None:
"""Convert url_annotation_count_view and url_annotation_flags to materialized views."""
# Drop regular views
op.execute("DROP VIEW IF EXISTS url_annotation_count_view")
op.execute("DROP VIEW IF EXISTS url_annotation_flags")

# Recreate as materialized views
op.execute(
f"CREATE MATERIALIZED VIEW url_annotation_count_view AS {_URL_ANNOTATION_COUNT_VIEW_SQL}"
)
op.execute(
f"CREATE MATERIALIZED VIEW url_annotation_flags AS {_URL_ANNOTATION_FLAGS_SQL}"
)

# Unique indexes required for REFRESH MATERIALIZED VIEW CONCURRENTLY
op.execute("CREATE UNIQUE INDEX ON url_annotation_count_view (url_id)")
op.execute("CREATE UNIQUE INDEX ON url_annotation_flags (url_id)")


def downgrade() -> None:
"""Revert url_annotation_count_view and url_annotation_flags to regular views."""
op.execute("DROP MATERIALIZED VIEW IF EXISTS url_annotation_count_view")
op.execute("DROP MATERIALIZED VIEW IF EXISTS url_annotation_flags")

# Recreate as regular views
op.execute(
f"CREATE VIEW url_annotation_count_view AS {_URL_ANNOTATION_COUNT_VIEW_SQL}"
)
op.execute(
f"CREATE OR REPLACE VIEW url_annotation_flags AS ({_URL_ANNOTATION_FLAGS_SQL})"
)
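The commit messages mention refreshing these views in the sort path. With the unique indexes on `url_id` created in `upgrade()`, `REFRESH MATERIALIZED VIEW CONCURRENTLY` becomes legal. A hedged sketch of a refresh helper follows; only the view names come from this migration, while the function names and the `connection.execute()` calling convention are assumptions about how the application might wire it up:

```python
# Sketch, not the repository's actual code: a helper the sort path
# might call to refresh the two materialized views created above.
ANNOTATION_VIEWS = ("url_annotation_count_view", "url_annotation_flags")


def refresh_sql(view: str, concurrently: bool = True) -> str:
    """Build a REFRESH statement for one materialized view.

    CONCURRENTLY avoids taking an exclusive lock (readers keep
    working during the refresh), but it requires a unique index on
    the view and may not run inside a transaction block.
    """
    keyword = "CONCURRENTLY " if concurrently else ""
    return f"REFRESH MATERIALIZED VIEW {keyword}{view}"


def refresh_annotation_views(connection, concurrently: bool = True) -> None:
    """Refresh both views on any connection exposing execute()."""
    for view in ANNOTATION_VIEWS:
        connection.execute(refresh_sql(view, concurrently))
```

The trade-off behind materializing is that reads become index lookups instead of twelve aggregating joins, at the cost of staleness between refreshes.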
4 changes: 2 additions & 2 deletions docs/development.md
@@ -35,14 +35,14 @@ At minimum, you need the database connection variables:

```dotenv
POSTGRES_USER=test_source_collector_user
POSTGRES_PASSWORD=HanviliciousHamiltonHilltops
POSTGRES_PASSWORD=<see local_database/docker-compose.yml>
POSTGRES_DB=source_collector_test_db
POSTGRES_HOST=127.0.0.1
POSTGRES_PORT=5432
DEV=true
```

These match the defaults in `local_database/docker-compose.yml`.
The password and other defaults are defined in `local_database/docker-compose.yml`.

### API Keys

1 change: 1 addition & 0 deletions pyproject.toml
@@ -50,6 +50,7 @@ dev = [
"pytest>=7.2.2",
"pytest-asyncio~=0.25.2",
"pytest-mock==3.12.0",
"pytest-benchmark~=5.2",
"pytest-timeout~=2.3.1",
"vulture>=2.14",
]
1 change: 1 addition & 0 deletions pytest.ini
@@ -3,4 +3,5 @@ timeout = 300
asyncio_default_fixture_loop_scope=function
markers =
manual: mark test as manual-only (excluded from default test runs)
benchmark: mark test as a performance benchmark (subset of manual)
asyncio_mode = auto