Skip to content

Centralize materialized view registry and refactor refresh task#292

Open
shlokgilda wants to merge 4 commits into
chaoss:mainfrom
shlokgilda:feature/materialized-views-registry
Open

Centralize materialized view registry and refactor refresh task#292
shlokgilda wants to merge 4 commits into
chaoss:mainfrom
shlokgilda:feature/materialized-views-registry

Conversation

@shlokgilda

@shlokgilda shlokgilda commented May 5, 2026

Copy link
Copy Markdown
Contributor

Description:

Introduces a single source of truth for all 15 existing PostgreSQL materialized view definitions and replaces a fragile hardcoded refresh task with a dynamic loop.

  • add collectoss/application/db/materialized_views.py — registry of all 15 existing materialized views (SQL definitions + unique index columns). get_refresh_sql() helper builds REFRESH MATERIALIZED VIEW statements.
  • refactor collectoss/tasks/db/refresh_materialized_views.py — replaces 14 hardcoded try/except: pass blocks with a dynamic loop over the registry. Uses an AUTOCOMMIT connection so REFRESH CONCURRENTLY actually works
    (it cannot run inside a transaction block, which the old code silently violated). Raises RuntimeError at the end if any view failed both concurrent and non-concurrent refresh, so failures are visible in Celery instead of swallowed.
  • wire alembic_utils into collectoss/application/schema/alembic/env.py — registers all views from the registry so alembic revision --autogenerate detects SQL definition changes automatically (phase 2 of Add new materialized views for heatmaps on 8knot (and make a system for it) #243).
  • add alembic-utils==0.8.8 to pyproject.toml and regenerate uv.lock.

No schema changes in this PR. The 3 new heatmap views for 8Knot follow in a separate incremental PR on top of this one.

This PR fixes (partial — heatmap views follow separately) #243

Notes for Reviewers:

The key correctness fix worth understanding: REFRESH MATERIALIZED VIEW CONCURRENTLY which wraps everything in engine.begin(), so every concurrent refresh was silently failing and falling back to a blocking refresh on every run. This PR fixes that by using engine.connect().execution_options(isolation_level="AUTOCOMMIT") directly.

The ; COMMIT; embedded in the old SQL strings has also been removed — those were breaking SQLAlchemy's transaction management.

First post-deploy alembic revision --autogenerate may propose a no-op normalization revision for views whose SQL Postgres normalizes differently from what's in the registry (Postgres reformats SQL on storage). Safe to discard that revision.

Signed commits

  • Yes, I signed my commits.

AI Disclosure: Claude Code was used to draft this PR description and write docstrings in the new materialized_views.py module. I reviewed and verified all code changes, SQL definitions, and fixes before committing.

@shlokgilda shlokgilda requested a review from MoralCode as a code owner May 5, 2026 05:02
Comment thread collectoss/application/db/materialized_views.py Outdated
@MoralCode MoralCode added this to the v1.1 Migration Release milestone May 5, 2026
Comment thread collectoss/application/db/models/augur_data.py
Comment thread collectoss/tasks/git/facade_tasks.py
Comment thread collectoss/tasks/db/refresh_materialized_views.py
@MoralCode

Copy link
Copy Markdown
Contributor

How easily would this method handle a move of all materialized views to their own db schema?

@MoralCode MoralCode force-pushed the feature/materialized-views-registry branch from 37f7acd to 0db2f05 Compare May 18, 2026 18:08
@MoralCode

MoralCode commented May 18, 2026

Copy link
Copy Markdown
Contributor

im taking this over/picking up this issue on behalf of shlok, just rebased it and made an adjustment to more intelligently handle jumping straight to a non-concurrent refresh when there is no index (indicating we can run concurrently)

@MoralCode MoralCode moved this to In Progress in CollectOSS Feature Roadmap May 18, 2026
@MoralCode MoralCode moved this from In Progress to Paused in CollectOSS Feature Roadmap May 27, 2026
@MoralCode

Copy link
Copy Markdown
Contributor

Note to self: materialized views should be moved to a schema called stable

@MoralCode MoralCode force-pushed the feature/materialized-views-registry branch from 0db2f05 to b9b7f02 Compare June 9, 2026 22:36
@MoralCode

Copy link
Copy Markdown
Contributor

need to also test an actual refresh cycle for this

@MoralCode MoralCode force-pushed the feature/materialized-views-registry branch from b9b7f02 to 2a8f113 Compare June 10, 2026 23:19
shlokgilda and others added 4 commits June 10, 2026 19:20
- add collectoss/application/db/materialized_views.py: single source of
  truth for all 15 existing materialized view definitions (SQL + unique
  index columns). get_refresh_sql() helper constructs REFRESH statements.

- refactor collectoss/tasks/db/refresh_materialized_views.py: replace 14
  hardcoded try/except-pass blocks with a dynamic loop over the registry.
  uses an AUTOCOMMIT connection so REFRESH CONCURRENTLY works correctly
  (it cannot run inside a transaction block). raises RuntimeError if any
  view fails both concurrent and non-concurrent refresh.

- wire alembic_utils into collectoss/application/schema/alembic/env.py:
  register all views from the registry so that alembic revision
  --autogenerate detects SQL definition changes automatically.

- add alembic-utils==0.8.8 to pyproject.toml and regenerate uv.lock.

closes chaoss#243 (partial — heatmap views follow in a separate PR)

Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
Replace the list-of-dicts registry with a frozen MaterializedView dataclass
exposing fqn, refresh_sql(), and to_pg_view(). Brings the registry's shape
in line with the declarative ORM style used elsewhere in the codebase and
gives callers attribute access + type checking instead of string-keyed
dict lookups.

unique_index_columns is a tuple so frozen=True actually means immutable.
__repr__ is overridden to keep the multi-hundred-line view SQL out of
debug logs. Refresh task and alembic env.py updated to use the new API;
get_refresh_sql free function removed (only two call sites).

Emitted REFRESH SQL is byte-identical to the previous version.
…e correct way based on whether indexes exist or not

Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
@MoralCode MoralCode force-pushed the feature/materialized-views-registry branch from 2a8f113 to fd9a862 Compare June 10, 2026 23:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Paused/Background

Development

Successfully merging this pull request may close these issues.

2 participants