Merged

Dev #488

299 commits
15e8bee
Add scraping logic for non pending URLs
maxachis Jul 31, 2025
e92cd66
Clean up logic, refactor URL Requests Interface, begin setting up pro…
maxachis Aug 1, 2025
20f1f9b
Finish draft of Probe Task logic
maxachis Aug 1, 2025
0c8c5eb
Begin draft of test logic
maxachis Aug 1, 2025
24f2cac
Finish tests for URL Probe
maxachis Aug 1, 2025
ab3071e
Adjust URL Html Task logic.
maxachis Aug 1, 2025
b7a0af0
Add task to loader
maxachis Aug 1, 2025
7a78aed
Fix bugs and refine
maxachis Aug 2, 2025
98edd9a
Refactor
maxachis Aug 2, 2025
158f211
Refine HTML task
maxachis Aug 3, 2025
7b80acf
Fix broken imports
maxachis Aug 3, 2025
284eb66
fix bug when checking for marked as 404
maxachis Aug 3, 2025
073b247
Merge pull request #355 from Police-Data-Accessibility-Project/mc_89_…
maxachis Aug 3, 2025
6342a21
Add check constraint for status code
maxachis Aug 3, 2025
01b927d
Add limit of 500 for task at a time.
maxachis Aug 3, 2025
e305f73
Begin draft of URL Probe
maxachis Aug 8, 2025
8a9981a
Temporarily disable url probe task
maxachis Aug 8, 2025
58b7766
Temporarily disable url probe task
maxachis Aug 8, 2025
f3cf21c
Finish up new URL Probe tasks
maxachis Aug 9, 2025
bfc0998
Latest draft of URL Probe
maxachis Aug 9, 2025
a4362a0
Disable URL Probe task
maxachis Aug 9, 2025
8c9f5ed
Fix broken tests.
maxachis Aug 9, 2025
f14cab0
Merge pull request #359 from Police-Data-Accessibility-Project/mc_356…
maxachis Aug 9, 2025
00e7d27
Remove functional duplicates and set up constraints forbidding fragme…
maxachis Aug 10, 2025
1c6ee24
Merge pull request #361 from Police-Data-Accessibility-Project/mc_360…
maxachis Aug 10, 2025
789caea
Add feature flags for URL tasks
maxachis Aug 10, 2025
cda63ee
Add feature flags for URL tasks
maxachis Aug 10, 2025
ac69495
Fix bugs in test
maxachis Aug 10, 2025
5791951
Remove inconsistent test
maxachis Aug 10, 2025
50efe74
Merge pull request #364 from Police-Data-Accessibility-Project/mc_352…
maxachis Aug 10, 2025
d4fe41f
Clean collector URLs
maxachis Aug 11, 2025
fdec9c3
Fix broken imports
maxachis Aug 11, 2025
257c216
Merge pull request #365 from Police-Data-Accessibility-Project/mc_362…
maxachis Aug 11, 2025
83d88d5
Change `url.outcome` to `url.status`
maxachis Aug 11, 2025
9f7eebc
Merge pull request #366 from Police-Data-Accessibility-Project/mc_335…
maxachis Aug 11, 2025
8ccd1b6
Change name of url_data_sources to url_data_source
maxachis Aug 11, 2025
35e8da9
Merge pull request #367 from Police-Data-Accessibility-Project/mc_336…
maxachis Aug 11, 2025
a2d2ba8
Remove agencies_ds_updated_at
maxachis Aug 11, 2025
5cafa58
Merge pull request #368 from Police-Data-Accessibility-Project/mc_334…
maxachis Aug 11, 2025
06dec6e
Finesse new URL Probe logic
maxachis Aug 11, 2025
55bcff0
Merge pull request #369 from Police-Data-Accessibility-Project/mc_356…
maxachis Aug 11, 2025
a63c670
Set DELETE_OLD_LOGS scheduled task to occur first.
maxachis Aug 11, 2025
5115b3a
Deprecate URL Duplicate Task
maxachis Aug 11, 2025
9ca186c
Fix bugs in test and imports
maxachis Aug 11, 2025
f283b82
Merge pull request #370 from Police-Data-Accessibility-Project/mc_363…
maxachis Aug 11, 2025
58edb2e
Remove root URL cache and rename `db.model.instantiations` to `db.mod…
maxachis Aug 12, 2025
e5bf317
Begin draft of Root URL Task
maxachis Aug 12, 2025
8fe7b81
Add draft for operator
maxachis Aug 12, 2025
ed49821
Finish draft of automated tests
maxachis Aug 12, 2025
e221579
Finishing touches to Root URL Task
maxachis Aug 12, 2025
6616944
Merge pull request #372 from Police-Data-Accessibility-Project/mc_340…
maxachis Aug 12, 2025
76e3391
Finishing touches on Push to Huggingface Task
maxachis Aug 12, 2025
ab4a1fc
Merge pull request #373 from Police-Data-Accessibility-Project/mc_89_…
maxachis Aug 12, 2025
d2b795c
Add handling for ServerDisconnectedError
maxachis Aug 12, 2025
45b1ae7
Break up Huggingface Upload
maxachis Aug 12, 2025
f689718
Merge pull request #374 from Police-Data-Accessibility-Project/mc_89_…
maxachis Aug 12, 2025
bb23f5a
Revise URL Relevancy task
maxachis Aug 12, 2025
fbd88e8
Merge pull request #377 from Police-Data-Accessibility-Project/mc_376…
maxachis Aug 12, 2025
1d6d0a0
Begin draft of IA task
maxachis Aug 14, 2025
be4b277
Continue Internet Archive Draft
maxachis Aug 14, 2025
1f01391
Continue draft
maxachis Aug 16, 2025
88da28f
Set up Internet Archive Probe Task
maxachis Aug 17, 2025
2045aaa
Merge pull request #380 from Police-Data-Accessibility-Project/mc_378…
maxachis Aug 17, 2025
8fdd9b4
Finish setting up IA Save Task
maxachis Aug 19, 2025
0b335f8
Add test environment variable for INTERNET_ARCHIVE_S3_KEYS
maxachis Aug 19, 2025
3de27c1
Add test environment variable for INTERNET_ARCHIVE_S3_KEYS
maxachis Aug 19, 2025
28937d5
Merge pull request #383 from Police-Data-Accessibility-Project/mc_378…
maxachis Aug 19, 2025
124ca7d
Add test environment variable for INTERNET_ARCHIVE_S3_KEYS
maxachis Aug 19, 2025
aa1822f
Continue draft
maxachis Aug 21, 2025
e32c8ec
Progress draft
maxachis Aug 25, 2025
85b134f
Fix last tests
maxachis Aug 25, 2025
f47dbea
Continue draft
maxachis Aug 25, 2025
12eee24
Continue draft
maxachis Aug 26, 2025
2f08da1
Continue draft
maxachis Aug 26, 2025
497be00
/
maxachis Aug 28, 2025
fa63ec5
.
maxachis Aug 28, 2025
4968ab1
Add draft of Meta URL sync logic
maxachis Aug 29, 2025
b8749a4
Continue draft
maxachis Aug 30, 2025
7ae95c9
Continue draft
maxachis Aug 30, 2025
8bbefe5
Continue draft
maxachis Aug 30, 2025
0c760e2
Finish automated tests
maxachis Aug 30, 2025
01f7a50
Update draft
maxachis Sep 1, 2025
2bdaf1d
Continue draft
maxachis Sep 4, 2025
a8acbda
Update Draft
maxachis Sep 4, 2025
0dfb272
Continue Draft
maxachis Sep 4, 2025
db770be
Update Draft
maxachis Sep 5, 2025
e86e589
Resolve existing tests
maxachis Sep 6, 2025
e36bf18
Continue draft
maxachis Sep 6, 2025
2ac254e
Begin setting up Homepage CTE and additional views
maxachis Sep 6, 2025
fd16c86
Continue Draft
maxachis Sep 6, 2025
cd48315
Continue Draft
maxachis Sep 6, 2025
d07dfe5
Finish auto tests for homepage match
maxachis Sep 7, 2025
ef12a5c
Add framework of test for nlp
maxachis Sep 8, 2025
0471f15
Continue Draft
maxachis Sep 8, 2025
0346817
Continue draft
maxachis Sep 9, 2025
e3af970
Continue draft
maxachis Sep 9, 2025
008ab74
Continue Draft
maxachis Sep 11, 2025
f07b388
Continue draft
maxachis Sep 11, 2025
dd21a9c
Continue Draft
maxachis Sep 11, 2025
52abc9c
Finish draft
maxachis Sep 12, 2025
833f493
Merge pull request #404 from Police-Data-Accessibility-Project/mc_381…
maxachis Sep 12, 2025
05f7837
Fix bug in `sync_agencies`
maxachis Sep 12, 2025
91df8e8
Merge pull request #406 from Police-Data-Accessibility-Project/mc_381…
maxachis Sep 12, 2025
c6742d7
Bug fix and change configuration for NLP processor
maxachis Sep 12, 2025
63fa598
Fix bug when posting to Discord with large amounts and set up POST_TO…
maxachis Sep 12, 2025
3adc268
Merge pull request #411 from Police-Data-Accessibility-Project/mc_381…
maxachis Sep 12, 2025
2ea0b1e
Set up progress bar task flag
maxachis Sep 12, 2025
98b8791
Merge pull request #412 from Police-Data-Accessibility-Project/mc_408…
maxachis Sep 12, 2025
4182816
Set up progress bar task flag
maxachis Sep 12, 2025
54874a5
Merge pull request #413 from Police-Data-Accessibility-Project/mc_410…
maxachis Sep 12, 2025
0696e58
Begin draft
maxachis Sep 13, 2025
6f2ab38
Fix bug in URL Submit Approved Task, update test
maxachis Sep 13, 2025
8920279
Merge pull request #418 from Police-Data-Accessibility-Project/mc_416…
maxachis Sep 13, 2025
dcd2442
Update imports
maxachis Sep 13, 2025
4ad3a2d
Update imports
maxachis Sep 13, 2025
4706717
Continue draft
maxachis Sep 13, 2025
98a4546
Finish initial draft
maxachis Sep 14, 2025
4f9e61f
Adjust test
maxachis Sep 14, 2025
842ffc7
Fix bug in test
maxachis Sep 14, 2025
16dcd86
Merge pull request #420 from Police-Data-Accessibility-Project/mc_414…
maxachis Sep 14, 2025
ff0589b
Set task URL limit to 25
maxachis Sep 14, 2025
d463a2c
Bug fix and change configuration for NLP processor
maxachis Sep 15, 2025
53094c4
Add location tables
maxachis Sep 15, 2025
9ff4f4d
Merge remote-tracking branch 'origin/mc_423_add_location_tables' into…
maxachis Sep 15, 2025
6de7eca
Merge pull request #424 from Police-Data-Accessibility-Project/mc_423…
maxachis Sep 15, 2025
30560c2
Add location annotation database components
maxachis Sep 16, 2025
489c12c
Add location annotation database components
maxachis Sep 16, 2025
e830566
Update `annotate/all` `GET` logic and tests
maxachis Sep 16, 2025
ef84df3
Continue draft
maxachis Sep 16, 2025
91f2ebd
Begin splitting up Location Tasks
maxachis Sep 18, 2025
3a62dfd
Continue draft
maxachis Sep 18, 2025
c99c221
Finish Location Annotation Draft
maxachis Sep 21, 2025
768b1d6
Merge pull request #428 from Police-Data-Accessibility-Project/mc_425…
maxachis Sep 21, 2025
9a69a9a
Complete pre-auto validate draft
maxachis Sep 22, 2025
bfe9575
Merge pull request #429 from Police-Data-Accessibility-Project/mc_422…
maxachis Sep 22, 2025
46f01e0
Remove outdated unique constraints for suggestions
maxachis Sep 22, 2025
d4a5f36
Correct validation for confidence auto suggestion
maxachis Sep 22, 2025
d7c0051
Add conditional for when record type is none (i.e., meta url)
maxachis Sep 22, 2025
b6fc231
Begin auto-validate draft
maxachis Sep 22, 2025
09e50d6
Update Screenshot constants -- add compression quality
maxachis Sep 22, 2025
c271873
Continue draft
maxachis Sep 23, 2025
ba2a6f6
Continue draft
maxachis Sep 23, 2025
6755bd0
Continue draft
maxachis Sep 23, 2025
0b0e730
Finish initial draft
maxachis Sep 23, 2025
0b8aa60
Finish draft
maxachis Sep 24, 2025
4c66219
Require URLs to have names prior to submission.
maxachis Sep 24, 2025
4d000ae
Fix error in unit test for Individual record
maxachis Sep 24, 2025
4936e5a
Merge pull request #431 from Police-Data-Accessibility-Project/mc_422…
maxachis Sep 24, 2025
3bb0ffc
Fix underlying code kink
maxachis Sep 24, 2025
d4a6e9d
Merge pull request #437 from Police-Data-Accessibility-Project/mc_fix…
maxachis Sep 24, 2025
40a47fa
Add logic for adding automatic URL name suggestions.
maxachis Sep 25, 2025
fd9abbd
Merge pull request #438 from Police-Data-Accessibility-Project/mc_fix…
maxachis Sep 25, 2025
ac66d9e
Continue draft
maxachis Sep 25, 2025
dcbd185
Finish draft of adding annotation name logic
maxachis Sep 25, 2025
992a6e3
Merge pull request #439 from Police-Data-Accessibility-Project/mc_432…
maxachis Sep 25, 2025
b0bfd11
Fix bugs
maxachis Sep 25, 2025
bfd88a7
Merge pull request #440 from Police-Data-Accessibility-Project/mc_432…
maxachis Sep 25, 2025
777321f
Create /search/agency endpoint with test
maxachis Sep 25, 2025
c13f9ce
Create /search/agency endpoint with test
maxachis Sep 25, 2025
3921563
Merge pull request #441 from Police-Data-Accessibility-Project/mc_425…
maxachis Sep 25, 2025
3026bed
Add dependent location logic
maxachis Sep 26, 2025
8292b10
Merge pull request #442 from Police-Data-Accessibility-Project/mc_425…
maxachis Sep 26, 2025
df49b4a
Relax requirements for some URL types
maxachis Sep 26, 2025
89fec42
Fix tests
maxachis Sep 26, 2025
a2e8c16
Fix tests
maxachis Sep 26, 2025
54b1caf
Merge pull request #443 from Police-Data-Accessibility-Project/mc_425…
maxachis Sep 26, 2025
248cb91
Add annotation suggestions for Record Type and URL Type
maxachis Sep 26, 2025
1cfe599
Fix test
maxachis Sep 26, 2025
8194a3b
Merge pull request #444 from Police-Data-Accessibility-Project/mc_425…
maxachis Sep 26, 2025
b35e6cc
Add additional agency attributes
maxachis Sep 26, 2025
5ce8203
Merge pull request #445 from Police-Data-Accessibility-Project/mc_425…
maxachis Sep 26, 2025
d4928ac
Add filter for jurisdiction type
maxachis Sep 26, 2025
c543419
Merge pull request #446 from Police-Data-Accessibility-Project/mc_425…
maxachis Sep 26, 2025
4974049
Add new_agency_suggestion table and update locations expanded view
maxachis Sep 26, 2025
b45789a
Merge pull request #447 from Police-Data-Accessibility-Project/mc_425…
maxachis Sep 26, 2025
7dcf18e
Add annotation logic for new agency suggestion
maxachis Sep 26, 2025
284edc3
Merge pull request #448 from Police-Data-Accessibility-Project/mc_425…
maxachis Sep 26, 2025
c94ce77
Add link table for new agency suggestions
maxachis Sep 27, 2025
16c58a3
Merge pull request #449 from Police-Data-Accessibility-Project/mc_425…
maxachis Sep 27, 2025
7943554
Add additional attributes for agency search
maxachis Sep 27, 2025
d5971d7
Merge pull request #450 from Police-Data-Accessibility-Project/mc_425…
maxachis Sep 27, 2025
72e5261
Update Auto Validate to also require a settled name
maxachis Sep 28, 2025
faaa616
Merge pull request #451 from Police-Data-Accessibility-Project/mc_425…
maxachis Sep 28, 2025
f20d44c
Begin draft
maxachis Sep 29, 2025
b59dc5b
Add logic for missing locations/agencies, and URL suspension
maxachis Sep 29, 2025
6119755
Add suspend URL task to URL task loader
maxachis Sep 29, 2025
7a06472
Merge pull request #453 from Police-Data-Accessibility-Project/mc_452…
maxachis Sep 29, 2025
1861926
Add filtering by URL ID
maxachis Sep 30, 2025
902161e
Merge pull request #454 from Police-Data-Accessibility-Project/mc_396…
maxachis Sep 30, 2025
7e6e4c7
Begin draft
maxachis Sep 30, 2025
269985c
Add `submit/url` endpoint
maxachis Sep 30, 2025
30e1510
Merge pull request #457 from Police-Data-Accessibility-Project/mc_455…
maxachis Sep 30, 2025
fd6a76b
Continue draft
maxachis Oct 3, 2025
345862b
Continue draft
maxachis Oct 3, 2025
f09d881
Finish initial draft of `submit_meta_urls` task
maxachis Oct 3, 2025
65f4c97
Merge pull request #458 from Police-Data-Accessibility-Project/mc_sub…
maxachis Oct 3, 2025
b886dfb
Fix bug with wrong task status assigned.
maxachis Oct 3, 2025
9ffa9a9
Fix bugs
maxachis Oct 3, 2025
11c1967
Merge pull request #459 from Police-Data-Accessibility-Project/mc_sub…
maxachis Oct 3, 2025
69e1b33
Add task cleanup task
maxachis Oct 3, 2025
d100466
Adjust test
maxachis Oct 3, 2025
c39ae11
Merge pull request #460 from Police-Data-Accessibility-Project/mc_385…
maxachis Oct 3, 2025
a980f46
Continue draft
maxachis Oct 4, 2025
887b859
Add task cleanup and revise
maxachis Oct 4, 2025
58153ab
Merge pull request #462 from Police-Data-Accessibility-Project/mc_385…
maxachis Oct 4, 2025
558dd10
Remove agency locational information and review endpoint and logic
maxachis Oct 4, 2025
694a437
Merge pull request #463 from Police-Data-Accessibility-Project/mc_426…
maxachis Oct 4, 2025
85c15d3
Remove unused batch columns
maxachis Oct 4, 2025
1c9be51
Remove unused columns in batch creation
maxachis Oct 4, 2025
86a2f7c
Merge pull request #464 from Police-Data-Accessibility-Project/mc_426…
maxachis Oct 4, 2025
f328ed2
Add leaderboard and user contribution endpoints
maxachis Oct 5, 2025
8ab17d9
Merge pull request #465 from Police-Data-Accessibility-Project/mc_426…
maxachis Oct 5, 2025
b343495
Add URL Task Count views
maxachis Oct 5, 2025
e2a2823
Merge remote-tracking branch 'origin/dev' into dev
maxachis Oct 5, 2025
b72f6c6
Update Internet Archives Save
maxachis Oct 5, 2025
d3aab61
Merge pull request #467 from Police-Data-Accessibility-Project/mc_387…
maxachis Oct 5, 2025
b74fc4f
Update Internet Archives Save
maxachis Oct 5, 2025
6505ad5
Merge pull request #468 from Police-Data-Accessibility-Project/mc_387…
maxachis Oct 5, 2025
7c47373
Update IA Probe Task
maxachis Oct 5, 2025
066257c
Merge pull request #469 from Police-Data-Accessibility-Project/mc_386…
maxachis Oct 5, 2025
6638460
Fix bug in Delete Stale Screenshots Task
maxachis Oct 5, 2025
62242a0
Merge pull request #470 from Police-Data-Accessibility-Project/mc_386…
maxachis Oct 5, 2025
275e2c1
Change display name to full display name
maxachis Oct 5, 2025
62be32a
Merge pull request #471 from Police-Data-Accessibility-Project/mc_386…
maxachis Oct 5, 2025
d62f2f5
Add standalone route for agency suggestions
maxachis Oct 5, 2025
ece0305
Merge pull request #473 from Police-Data-Accessibility-Project/mc_386…
maxachis Oct 5, 2025
9cf810f
Add batch agency/location logic, modify AutoGoogler to utilize
maxachis Oct 11, 2025
64770b4
Merge pull request #477 from Police-Data-Accessibility-Project/mc_260…
maxachis Oct 11, 2025
44d3bfb
Add batch link subtasks for location/agency id tasks
maxachis Oct 11, 2025
444bf01
Merge pull request #478 from Police-Data-Accessibility-Project/mc_260…
maxachis Oct 11, 2025
24edf04
Add description for `get_user_contributions`
maxachis Oct 11, 2025
355d370
Merge pull request #479 from Police-Data-Accessibility-Project/mc_476…
maxachis Oct 11, 2025
4973760
Update users aggregated endpoint and add URL status materialized view
maxachis Oct 12, 2025
6324678
Merge pull request #480 from Police-Data-Accessibility-Project/mc_393…
maxachis Oct 12, 2025
240647a
Update internet archive save logic
maxachis Oct 12, 2025
22925fa
Merge pull request #481 from Police-Data-Accessibility-Project/mc_393…
maxachis Oct 12, 2025
3254f68
Consolidate 404 Probe into URL Probe Task
maxachis Oct 12, 2025
1387033
Merge branch 'dev' into mc_358_remove_404_probe_enhance_url_probe
maxachis Oct 12, 2025
cd0fd35
Fix alembic bugs
maxachis Oct 12, 2025
bfae051
Merge pull request #482 from Police-Data-Accessibility-Project/mc_358…
maxachis Oct 12, 2025
cbae3f6
Update batch status for `/batch` `GET`
maxachis Oct 12, 2025
3f7b94d
Add record type metrics count logic
maxachis Oct 12, 2025
e0a3dba
Merge pull request #483 from Police-Data-Accessibility-Project/mc_30_…
maxachis Oct 12, 2025
67d86eb
Remove Contact Info and Agency Meta Record Type
maxachis Oct 13, 2025
1aa6603
Merge pull request #485 from Police-Data-Accessibility-Project/mc_484…
maxachis Oct 13, 2025
d45f889
Add anonymous annotation endpoint
maxachis Oct 14, 2025
09166d6
Merge branch 'dev' into mc_434_anonymous_annotation_submissions
maxachis Oct 14, 2025
ef55307
Merge and fix alembic chain
maxachis Oct 14, 2025
8aaaee7
Merge pull request #487 from Police-Data-Accessibility-Project/mc_434…
maxachis Oct 14, 2025
cc581bf
Fix bug in agency contributions
maxachis Oct 14, 2025
2 changes: 2 additions & 0 deletions Dockerfile
@@ -14,6 +14,8 @@ RUN uv sync --locked --no-dev
# Must call from the root directory because uv does not add playwright to path
RUN playwright install-deps chromium
RUN playwright install chromium
+ # Download Spacy Model
+ RUN python -m spacy download en_core_web_sm

# Copy project files
COPY src ./src
129 changes: 110 additions & 19 deletions ENV.md

Large diffs are not rendered by default.

5 changes: 2 additions & 3 deletions alembic/env.py
@@ -1,13 +1,12 @@
import logging
from datetime import datetime
from logging.config import fileConfig

from alembic import context
from sqlalchemy import engine_from_config
from sqlalchemy import pool

- from src.db.helpers import get_postgres_connection_string
- from src.db.models.templates import Base
+ from src.db.helpers.connect import get_postgres_connection_string
+ from src.db.models.templates_.base import Base

# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
285 changes: 285 additions & 0 deletions alembic/versions/2025_07_21_0637-59d2af1bab33_setup_for_sync_data_sources_task.py
@@ -0,0 +1,285 @@
"""Setup for sync data sources task

Revision ID: 59d2af1bab33
Revises: 9552d354ccf4
Create Date: 2025-07-21 06:37:51.043504

"""
from typing import Sequence, Union

from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import JSONB

from src.util.alembic_helpers import switch_enum_type, id_column

# revision identifiers, used by Alembic.
revision: str = '59d2af1bab33'
down_revision: Union[str, None] = '9552d354ccf4'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None

SYNC_STATE_TABLE_NAME = "data_sources_sync_state"
URL_DATA_SOURCES_METADATA_TABLE_NAME = "url_data_sources_metadata"

CONFIRMED_AGENCY_TABLE_NAME = "confirmed_url_agency"
LINK_URLS_AGENCIES_TABLE_NAME = "link_urls_agencies"
CHANGE_LOG_TABLE_NAME = "change_log"

AGENCIES_TABLE_NAME = "agencies"

TABLES_TO_LOG = [
    LINK_URLS_AGENCIES_TABLE_NAME,
    "urls",
    "url_data_sources",
    "agencies",
]

OperationTypeEnum = sa.Enum("UPDATE", "DELETE", "INSERT", name="operation_type")


def upgrade() -> None:

    _create_data_sources_sync_state_table()
    _create_data_sources_sync_task()

    _rename_confirmed_url_agency_to_link_urls_agencies()
    _create_change_log_table()
    _add_jsonb_diff_val_function()
    _create_log_table_changes_trigger()

    _add_table_change_log_triggers()

    _add_agency_id_column()



def downgrade() -> None:

    _drop_data_sources_sync_task()
    _drop_data_sources_sync_state_table()
    _drop_change_log_table()
    _drop_table_change_log_triggers()
    _drop_jsonb_diff_val_function()
    _drop_log_table_changes_trigger()

    _rename_link_urls_agencies_to_confirmed_url_agency()

    OperationTypeEnum.drop(op.get_bind())
    _drop_agency_id_column()



def _add_jsonb_diff_val_function() -> None:

    op.execute(
        """
        CREATE OR REPLACE FUNCTION jsonb_diff_val(val1 JSONB, val2 JSONB)
        RETURNS JSONB AS
        $$
        DECLARE
            result JSONB;
            v RECORD;
        BEGIN
            result = val1;
            FOR v IN SELECT * FROM jsonb_each(val2)
            LOOP
                IF result @> jsonb_build_object(v.key, v.value)
                THEN
                    result = result - v.key;
                ELSIF result ? v.key THEN
                    CONTINUE;
                ELSE
                    result = result || jsonb_build_object(v.key, 'null');
                END IF;
            END LOOP;
            RETURN result;
        END;
        $$ LANGUAGE plpgsql;
        """
    )

def _drop_jsonb_diff_val_function() -> None:
    op.execute("DROP FUNCTION IF EXISTS jsonb_diff_val(val1 JSONB, val2 JSONB)")

def _create_log_table_changes_trigger() -> None:
    op.execute(
        f"""
        CREATE OR REPLACE FUNCTION public.log_table_changes()
            RETURNS trigger
            LANGUAGE 'plpgsql'
            COST 100
            VOLATILE NOT LEAKPROOF
        AS $BODY$
        DECLARE
            old_values JSONB;
            new_values JSONB;
            old_to_new JSONB;
            new_to_old JSONB;
        BEGIN
            -- Handle DELETE operations (store entire OLD row since all data is lost)
            IF (TG_OP = 'DELETE') THEN
                old_values = row_to_json(OLD)::jsonb;

                INSERT INTO {CHANGE_LOG_TABLE_NAME} (operation_type, table_name, affected_id, old_data)
                VALUES ('DELETE', TG_TABLE_NAME, OLD.id, old_values);

                RETURN OLD;

            -- Handle UPDATE operations (only log the changed columns)
            ELSIF (TG_OP = 'UPDATE') THEN
                old_values = row_to_json(OLD)::jsonb;
                new_values = row_to_json(NEW)::jsonb;
                new_to_old = jsonb_diff_val(old_values, new_values);
                old_to_new = jsonb_diff_val(new_values, old_values);

                -- Skip logging if both old_to_new and new_to_old are NULL or empty JSON objects
                IF (new_to_old IS NOT NULL AND new_to_old <> '{{}}') OR
                   (old_to_new IS NOT NULL AND old_to_new <> '{{}}') THEN
                    INSERT INTO {CHANGE_LOG_TABLE_NAME} (operation_type, table_name, affected_id, old_data, new_data)
                    VALUES ('UPDATE', TG_TABLE_NAME, OLD.id, new_to_old, old_to_new);
                END IF;

                RETURN NEW;

            -- Handle INSERT operations
            ELSIF (TG_OP = 'INSERT') THEN
                new_values = row_to_json(NEW)::jsonb;

                -- Skip logging if new_values is NULL or an empty JSON object
                IF new_values IS NOT NULL AND new_values <> '{{}}' THEN
                    INSERT INTO {CHANGE_LOG_TABLE_NAME} (operation_type, table_name, affected_id, new_data)
                    VALUES ('INSERT', TG_TABLE_NAME, NEW.id, new_values);
                END IF;

                RETURN NEW;
            END IF;
        END;
        $BODY$;
        """
    )

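def _drop_log_table_changes_trigger() -> None:
    # `_create_log_table_changes_trigger` creates the function `public.log_table_changes()`,
    # not a trigger, so the downgrade drops that function. The per-table triggers are
    # dropped separately in `_drop_table_change_log_triggers`.
    op.execute("DROP FUNCTION IF EXISTS public.log_table_changes()")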

def _create_data_sources_sync_state_table() -> None:
    table = op.create_table(
        SYNC_STATE_TABLE_NAME,
        id_column(),
        sa.Column('last_full_sync_at', sa.DateTime(), nullable=True),
        sa.Column('current_cutoff_date', sa.Date(), nullable=True),
        sa.Column('current_page', sa.Integer(), nullable=True),
    )
    # Add row to `data_sources_sync_state` table
    op.bulk_insert(
        table,
        [
            {
                "last_full_sync_at": None,
                "current_cutoff_date": None,
                "current_page": None
            }
        ]
    )

def _drop_data_sources_sync_state_table() -> None:
    op.drop_table(SYNC_STATE_TABLE_NAME)

def _create_data_sources_sync_task() -> None:
    switch_enum_type(
        table_name='tasks',
        column_name='task_type',
        enum_name='task_type',
        new_enum_values=[
            'HTML',
            'Relevancy',
            'Record Type',
            'Agency Identification',
            'Misc Metadata',
            'Submit Approved URLs',
            'Duplicate Detection',
            '404 Probe',
            'Sync Agencies',
            'Sync Data Sources'
        ]
    )

def _drop_data_sources_sync_task() -> None:
    switch_enum_type(
        table_name='tasks',
        column_name='task_type',
        enum_name='task_type',
        new_enum_values=[
            'HTML',
            'Relevancy',
            'Record Type',
            'Agency Identification',
            'Misc Metadata',
            'Submit Approved URLs',
            'Duplicate Detection',
            '404 Probe',
            'Sync Agencies',
        ]
    )

def _create_change_log_table() -> None:
    # Create change_log table
    op.create_table(
        CHANGE_LOG_TABLE_NAME,
        id_column(),
        sa.Column("operation_type", OperationTypeEnum, nullable=False),
        sa.Column("table_name", sa.String(), nullable=False),
        sa.Column("affected_id", sa.Integer(), nullable=False),
        sa.Column("old_data", JSONB, nullable=True),
        sa.Column("new_data", JSONB, nullable=True),
        sa.Column(
            "created_at", sa.DateTime(), server_default=sa.func.now(), nullable=False
        ),
    )

def _drop_change_log_table() -> None:
    op.drop_table(CHANGE_LOG_TABLE_NAME)

def _rename_confirmed_url_agency_to_link_urls_agencies() -> None:
    op.rename_table(CONFIRMED_AGENCY_TABLE_NAME, LINK_URLS_AGENCIES_TABLE_NAME)

def _rename_link_urls_agencies_to_confirmed_url_agency() -> None:
    op.rename_table(LINK_URLS_AGENCIES_TABLE_NAME, CONFIRMED_AGENCY_TABLE_NAME)

def _add_table_change_log_triggers() -> None:
    # Create trigger for tables:
    def create_table_trigger(table_name: str) -> None:
        op.execute(
            """
            CREATE OR REPLACE TRIGGER log_{table_name}_changes
                BEFORE INSERT OR DELETE OR UPDATE
                ON public.{table_name}
                FOR EACH ROW
                EXECUTE FUNCTION public.log_table_changes();
            """.format(table_name=table_name)
        )

    for table_name in TABLES_TO_LOG:
        create_table_trigger(table_name)

def _drop_table_change_log_triggers() -> None:
    def drop_table_trigger(table_name: str) -> None:
        op.execute(
            f"""
            DROP TRIGGER log_{table_name}_changes
            ON public.{table_name}
            """
        )

    for table_name in TABLES_TO_LOG:
        drop_table_trigger(table_name)

def _add_agency_id_column():
    op.add_column(
        AGENCIES_TABLE_NAME,
        id_column(),
    )


def _drop_agency_id_column():
    op.drop_column(
        AGENCIES_TABLE_NAME,
        'id',
    )
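
For reviewers who want to check what the `UPDATE` branch of `log_table_changes()` ends up writing to `change_log`, here is a rough Python model of `jsonb_diff_val`. It is a sketch only. It assumes flat rows with scalar column values, so plain equality stands in for the `@>` containment test, and it assumes the `'null'` literal in the SQL ends up as the JSON string `"null"`. `update_log_entry` is a hypothetical helper that is not part of the PR.

```python
from typing import Any, Optional


def jsonb_diff_val(val1: dict[str, Any], val2: dict[str, Any]) -> dict[str, Any]:
    """Approximate the PL/pgSQL jsonb_diff_val for flat, scalar-valued rows."""
    result = dict(val1)
    for key, value in val2.items():
        if key in result and result[key] == value:
            # Same key and value on both sides: not part of the diff.
            del result[key]
        elif key in result:
            # Key present with a different value: keep val1's value.
            continue
        else:
            # Key only in val2: recorded with the string 'null', as in the SQL.
            result[key] = "null"
    return result


def update_log_entry(old_row: dict[str, Any], new_row: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Hypothetical model of the UPDATE branch: return the logged row, or None if nothing changed."""
    new_to_old = jsonb_diff_val(old_row, new_row)  # stored as old_data
    old_to_new = jsonb_diff_val(new_row, old_row)  # stored as new_data
    if not new_to_old and not old_to_new:
        return None
    return {
        "operation_type": "UPDATE",
        "affected_id": old_row["id"],
        "old_data": new_to_old,
        "new_data": old_to_new,
    }
```

Under these assumptions, an update that changes a single column logs only that column's old and new values, and an update that changes nothing logs no row at all.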
74 changes: 74 additions & 0 deletions alembic/versions/2025_07_26_0830-637de6eaa3ab_setup_for_upload_to_huggingface_task.py
@@ -0,0 +1,74 @@
"""Setup for upload to huggingface task

Revision ID: 637de6eaa3ab
Revises: 59d2af1bab33
Create Date: 2025-07-26 08:30:37.940091

"""
from typing import Sequence, Union

from alembic import op
import sqlalchemy as sa

from src.util.alembic_helpers import id_column, switch_enum_type

# revision identifiers, used by Alembic.
revision: str = '637de6eaa3ab'
down_revision: Union[str, None] = '59d2af1bab33'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None

TABLE_NAME = "huggingface_upload_state"


def upgrade() -> None:

    op.create_table(
        TABLE_NAME,
        id_column(),
        sa.Column(
            "last_upload_at",
            sa.DateTime(),
            nullable=False
        ),
    )

    switch_enum_type(
        table_name='tasks',
        column_name='task_type',
        enum_name='task_type',
        new_enum_values=[
            'HTML',
            'Relevancy',
            'Record Type',
            'Agency Identification',
            'Misc Metadata',
            'Submit Approved URLs',
            'Duplicate Detection',
            '404 Probe',
            'Sync Agencies',
            'Sync Data Sources',
            'Push to Hugging Face'
        ]
    )


def downgrade() -> None:

    op.drop_table(TABLE_NAME)

    switch_enum_type(
        table_name='tasks',
        column_name='task_type',
        enum_name='task_type',
        new_enum_values=[
            'HTML',
            'Relevancy',
            'Record Type',
            'Agency Identification',
            'Misc Metadata',
            'Submit Approved URLs',
            'Duplicate Detection',
            '404 Probe',
            'Sync Agencies',
            'Sync Data Sources'
        ]
    )
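
Both migrations write out the full `task_type` value list twice, once for `upgrade` and once for `downgrade`, and the two copies differ by a single entry. One possible way to keep them in step is to derive both lists from a shared base, as in the sketch below. The names `BASE_TASK_TYPES`, `with_added_value`, `UPGRADE_VALUES` and `DOWNGRADE_VALUES` are hypothetical and do not appear in the PR. The resulting lists would be passed as `new_enum_values` to the project's `switch_enum_type` helper, which is not called here.

```python
from typing import Sequence

# Enum values that exist before this migration runs (copied from the downgrade list).
BASE_TASK_TYPES: tuple[str, ...] = (
    "HTML",
    "Relevancy",
    "Record Type",
    "Agency Identification",
    "Misc Metadata",
    "Submit Approved URLs",
    "Duplicate Detection",
    "404 Probe",
    "Sync Agencies",
    "Sync Data Sources",
)


def with_added_value(base: Sequence[str], new_value: str) -> list[str]:
    """Return a copy of the base values with new_value appended, rejecting duplicates."""
    if new_value in base:
        raise ValueError(f"{new_value!r} is already an enum value")
    return [*base, new_value]


# upgrade() would pass UPGRADE_VALUES, downgrade() would pass DOWNGRADE_VALUES.
UPGRADE_VALUES = with_added_value(BASE_TASK_TYPES, "Push to Hugging Face")
DOWNGRADE_VALUES = list(BASE_TASK_TYPES)
```

With this shape the downgrade list is by construction the upgrade list minus the new value, so the two cannot drift apart when a later migration copies the pattern.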