Merged

Dev #322

339 commits
bf756ac
DRAFT
maxachis Apr 2, 2025
8b33344
feat(app): Add batch filtering for annotation requests
maxachis Apr 2, 2025
eae4979
fix(tests): fix import bug
maxachis Apr 2, 2025
2e28c1b
Merge pull request #186 from Police-Data-Accessibility-Project/mc_179…
maxachis Apr 2, 2025
b669eab
feat(app): add `review/reject-source` endpoint
maxachis Apr 3, 2025
04b376c
Merge pull request #192 from Police-Data-Accessibility-Project/mc_191…
maxachis Apr 3, 2025
ea23d0c
feat(database): Adjust annotation logic for URLs marked not relevant
maxachis Apr 4, 2025
e348a95
Merge pull request #193 from Police-Data-Accessibility-Project/mc_187…
maxachis Apr 4, 2025
c20e8ac
feat(database): add agency not in database in annotate agencies
maxachis Apr 4, 2025
def4844
fix(tests): fix import bug
maxachis Apr 4, 2025
fcb9b2d
fix(tests): fix import bug
maxachis Apr 4, 2025
f9ae605
Merge pull request #194 from Police-Data-Accessibility-Project/mc_190…
maxachis Apr 4, 2025
443e767
feat(api): require final review permission for review endpoints
maxachis Apr 4, 2025
2bab320
Merge branch 'dev' into mc_update_misc_metadata_task_html
maxachis Apr 5, 2025
3b3253f
feat(app): update misc metadata task to use html title description as…
maxachis Apr 5, 2025
4acb72e
Merge branch 'dev' into mc_152_submit_approved_url_task
maxachis Apr 5, 2025
5777f5d
Merge pull request #196 from Police-Data-Accessibility-Project/mc_upd…
maxachis Apr 5, 2025
74cc0d3
Merge branch 'refs/heads/dev' into mc_152_submit_approved_url_task
maxachis Apr 5, 2025
77c7dff
DRAFT
maxachis Apr 9, 2025
1b0e6c3
feat(app): allow retrieving URLs for annotation without html info
maxachis Apr 9, 2025
27581eb
Fix import bug
maxachis Apr 9, 2025
0ba8dc1
Fix import bug
maxachis Apr 9, 2025
3275fe3
Fix import bug
maxachis Apr 9, 2025
b8a1276
Merge pull request #203 from Police-Data-Accessibility-Project/mc_162…
maxachis Apr 9, 2025
753a06d
Temporarily disable HTML Task Operator
maxachis Apr 10, 2025
87c1057
Re-enable HTML Task Operator with logging on fetch_and_render
maxachis Apr 10, 2025
0b7661e
Re-enable HTML Task Operator with logging on fetch_and_render
maxachis Apr 10, 2025
c3a8511
Transition relevancy pipeline to lazy loading
maxachis Apr 10, 2025
eba18d1
Remove log for fetch and render
maxachis Apr 10, 2025
d68ab30
feat(app): enable task loop to repeat if prerequisites met
maxachis Apr 10, 2025
197febb
Merge pull request #207 from Police-Data-Accessibility-Project/mc_204…
maxachis Apr 10, 2025
e2575af
DRAFT
maxachis Apr 12, 2025
cb3ed94
DRAFT
maxachis Apr 12, 2025
7bfd1e4
DRAFT
maxachis Apr 12, 2025
6c3fe10
feat(app): make collectors asynchronouns and add task trigger
maxachis Apr 14, 2025
24173fb
fix(app): fix import bug
maxachis Apr 14, 2025
f001fb8
fix(app): fix import bug
maxachis Apr 14, 2025
0dbb987
fix(tests): comment out inconsistent test
maxachis Apr 14, 2025
afe55d7
fix(tests): comment out inconsistent test
maxachis Apr 14, 2025
32c3f7b
Merge pull request #211 from Police-Data-Accessibility-Project/mc_204…
maxachis Apr 14, 2025
72caf70
feat(app): make logger async
maxachis Apr 14, 2025
ea88e2b
Merge pull request #214 from Police-Data-Accessibility-Project/mc_210…
maxachis Apr 14, 2025
5b3658f
feat(app): add task status endpoint
maxachis Apr 14, 2025
f1cf5b9
Merge pull request #215 from Police-Data-Accessibility-Project/mc_212…
maxachis Apr 14, 2025
07a4a09
feat(app): add task status endpoint
maxachis Apr 15, 2025
37f86c3
Merge pull request #217 from Police-Data-Accessibility-Project/mc_216…
maxachis Apr 15, 2025
43f9178
fix(app): fix bug with task repetition count
maxachis Apr 15, 2025
c84a7bd
fix(app): temporarily disable HTML Task Operator
maxachis Apr 15, 2025
a4f3be7
fix(app): fix bug with task repeating
maxachis Apr 15, 2025
8f7332d
Merge pull request #218 from Police-Data-Accessibility-Project/mc_216…
maxachis Apr 15, 2025
dcf43e4
Merge branch 'dev' into mc_152_submit_approved_url_task
maxachis Apr 15, 2025
82e65b9
feat(app): add submit approved URL task
maxachis Apr 15, 2025
beae1b9
fix(tests): fix import bug
maxachis Apr 15, 2025
a9787bc
Merge pull request #227 from Police-Data-Accessibility-Project/mc_152…
maxachis Apr 15, 2025
d134194
feat(database): allow one user annotation per url
maxachis Apr 17, 2025
b560594
fix(tests): fix broken tests
maxachis Apr 17, 2025
dec9bfd
Merge pull request #229 from Police-Data-Accessibility-Project/mc_226…
maxachis Apr 17, 2025
345a257
feat(api): adjust final review to reflect single user annotations
maxachis Apr 17, 2025
98d8a95
Merge pull request #231 from Police-Data-Accessibility-Project/mc_226…
maxachis Apr 17, 2025
6858bc0
feat(app): change batch status `completed` to `ready to label`
maxachis Apr 17, 2025
193e68f
Merge pull request #232 from Police-Data-Accessibility-Project/mc_228…
maxachis Apr 17, 2025
c18bd68
feat(app): Add `/batch` filter for batches with pending URLs
maxachis Apr 17, 2025
5e575f5
Merge pull request #233 from Police-Data-Accessibility-Project/mc_228…
maxachis Apr 17, 2025
85a2883
fix(database): fix duplicate bug in `/batch` get for `has_pending_urls`
maxachis Apr 17, 2025
ed0e956
Merge pull request #234 from Police-Data-Accessibility-Project/mc_228…
maxachis Apr 17, 2025
cfad874
fix(app): Change suggestion type `MANUAL_SUGGESTION` to `USER_SUGGEST…
maxachis Apr 17, 2025
5ce3779
Merge pull request #235 from Police-Data-Accessibility-Project/mc_230…
maxachis Apr 17, 2025
3519bf4
refactor(app): remove deprecated and unused code
maxachis Apr 17, 2025
7804bf7
Merge pull request #237 from Police-Data-Accessibility-Project/mc_213…
maxachis Apr 17, 2025
c239567
fix(build): remove nonexistent directory from dockerfile
maxachis Apr 17, 2025
333baf5
docs(api): Change `/docs` to `/api` for API display
maxachis Apr 17, 2025
93d7edb
Merge pull request #238 from Police-Data-Accessibility-Project/mc_160…
maxachis Apr 17, 2025
1830b16
refactor(app): consolidate environment variable usage
maxachis Apr 18, 2025
d58d361
Merge pull request #239 from Police-Data-Accessibility-Project/mc_160…
maxachis Apr 18, 2025
a863442
Merge branch 'dev' into mc_190_add_agency_ids_via_annotate_agencies
maxachis Apr 18, 2025
4853787
Merge pull request #195 from Police-Data-Accessibility-Project/mc_190…
maxachis Apr 18, 2025
06406bd
refactor(app): refactor to reduce memory strain from huggingface task
maxachis Apr 18, 2025
0c90280
fix(tests): fix broken test
maxachis Apr 18, 2025
d6a5d2d
Merge pull request #240 from Police-Data-Accessibility-Project/mc_185…
maxachis Apr 18, 2025
36d8b5d
refactor(tests): reduce time required to run time-bound tests
maxachis Apr 18, 2025
f7a9606
refactor(tests): reduce time required to run time-bound tests
maxachis Apr 18, 2025
d2cfc83
refactor(tests): reduce time required to run time-bound tests
maxachis Apr 18, 2025
569b46c
refactor(tests): reduce time required to run time-bound tests
maxachis Apr 18, 2025
55fe581
refactor(tests): reduce time required to run time-bound tests
maxachis Apr 18, 2025
9b58948
refactor(tests): reduce time required to run time-bound tests
maxachis Apr 18, 2025
f1b5a62
Merge pull request #242 from Police-Data-Accessibility-Project/mc_185…
maxachis Apr 18, 2025
48f33e6
feat(app): make delete logs job an asynchronous scheduled task
maxachis Apr 18, 2025
a3f0325
Merge pull request #244 from Police-Data-Accessibility-Project/mc_241…
maxachis Apr 18, 2025
22aa07e
feat(app): Create `/annotation/all` endpoints
maxachis Apr 21, 2025
3ac0fd7
Merge pull request #245 from Police-Data-Accessibility-Project/mc_183…
maxachis Apr 21, 2025
72ed9c9
DRAFT
maxachis Apr 21, 2025
a10e940
Refactor Docker Logic and add Data Sources Dumper Logic
maxachis Apr 22, 2025
ca27fdb
add .gitattributes
maxachis Apr 22, 2025
0e1eb39
DRAFT
maxachis Apr 22, 2025
fbb329e
DRAFT
maxachis Apr 22, 2025
e3c0091
DRAFT
maxachis Apr 22, 2025
27ef006
DRAFT
maxachis Apr 22, 2025
f4d4134
DRAFT
maxachis Apr 22, 2025
861ea71
feat(database): begin setting up FDW - initial link
maxachis Apr 22, 2025
8e47a33
feat(database): begin setting up FDW - initial link
maxachis Apr 22, 2025
cce04a1
Merge pull request #247 from Police-Data-Accessibility-Project/mc_246…
maxachis Apr 22, 2025
f12ef61
feat(database): begin setting up FDW - initial link
maxachis Apr 22, 2025
af7f0e0
feat(database): begin setting up FDW - initial link
maxachis Apr 22, 2025
857c771
Merge pull request #248 from Police-Data-Accessibility-Project/mc_246…
maxachis Apr 22, 2025
b8d3322
fix(remove FDW setup):
maxachis Apr 22, 2025
8e013bb
fix(database): Remove FDW setup and tests
maxachis Apr 22, 2025
fac1931
fix(database): Remove FDW setup and tests
maxachis Apr 22, 2025
4799d7e
fix(database): Remove FDW setup and tests
maxachis Apr 22, 2025
f4cbd17
Merge pull request #249 from Police-Data-Accessibility-Project/mc_246…
maxachis Apr 22, 2025
79ccfba
Update test_app.yml to use uv
maxachis May 2, 2025
2f7abc2
Update test_app.yml to use uv
maxachis May 2, 2025
e8575eb
Update test_app.yml to use uv
maxachis May 2, 2025
d867105
Update test_app.yml to use uv
maxachis May 2, 2025
88bad7c
Update test_app.yml to use uv
maxachis May 2, 2025
a6c79aa
Update test_app.yml to use uv
maxachis May 2, 2025
babcde8
Update test_app.yml to use uv
maxachis May 2, 2025
38c2a37
Merge remote-tracking branch 'origin/mc_251_uv' into mc_251_uv
maxachis May 2, 2025
7a8b373
Update test_app.yml to use uv
maxachis May 2, 2025
ac85798
Update Dockerfile to use uv
maxachis May 2, 2025
7e8e425
Merge pull request #252 from Police-Data-Accessibility-Project/mc_251_uv
maxachis May 2, 2025
220c319
DRAFT
maxachis May 3, 2025
25ced55
feat(app): Add `/collector/manual` endpoint
maxachis May 3, 2025
addc5f5
feat(app): Add `/collector/manual` endpoint
maxachis May 3, 2025
060bc11
feat(app): Add `/collector/manual` endpoint
maxachis May 3, 2025
2268bd9
Merge pull request #253 from Police-Data-Accessibility-Project/mc_200…
maxachis May 3, 2025
02f7b3a
Comment out URL relevance Huggingface Task Operator call
maxachis May 4, 2025
18be3c9
Comment out URL relevance Huggingface Task Operator call
maxachis May 4, 2025
1fcc949
feat(app): Add `/search/url` endpoint
maxachis May 4, 2025
41ca2ef
Merge pull request #256 from Police-Data-Accessibility-Project/mc_254…
maxachis May 4, 2025
e090cad
feat(app): Add special error message for annotation user conflict
maxachis May 4, 2025
18d9645
Merge pull request #257 from Police-Data-Accessibility-Project/mc_171…
maxachis May 4, 2025
154895e
DRAFT
maxachis May 6, 2025
ee4489e
DRAFT
maxachis May 6, 2025
c6c8299
DRAFT
maxachis May 7, 2025
8d62278
Convert to full uv/pyproject dependency management
maxachis May 7, 2025
6a9464a
Convert to full uv/pyproject dependency management
maxachis May 7, 2025
2e00141
Convert to full uv/pyproject dependency management
maxachis May 7, 2025
7118238
Merge pull request #262 from Police-Data-Accessibility-Project/mc_261…
maxachis May 7, 2025
9d9ca70
Convert to full uv/pyproject dependency management
maxachis May 7, 2025
1faba4e
Convert to full uv/pyproject dependency management
maxachis May 7, 2025
fe3de20
Merge pull request #263 from Police-Data-Accessibility-Project/mc_261…
maxachis May 7, 2025
a8facc9
Merge branch 'dev' into mc_125_metrics_endpoints
maxachis May 7, 2025
fb1acf2
DRAFT
maxachis May 9, 2025
1084d64
feat(app): Create metrics endpoints
maxachis May 9, 2025
2249c41
Merge pull request #264 from Police-Data-Accessibility-Project/mc_125…
maxachis May 9, 2025
7de9c50
DRAFT
maxachis May 11, 2025
c61d914
DRAFT
maxachis May 11, 2025
1281e40
Fix bug in `get_urls_breakdown_pending_metrics`
maxachis May 12, 2025
04e4551
fix(app): Address bug in agency identification
maxachis May 12, 2025
bd27536
fix(app): Address bug in agency identification
maxachis May 12, 2025
f09d60d
Merge pull request #266 from Police-Data-Accessibility-Project/mc_255…
maxachis May 12, 2025
dd643b6
feat(app): add url duplicate check task operator
maxachis May 13, 2025
845cb1b
feat(app): add url duplicate check task operator
maxachis May 13, 2025
f913410
Merge pull request #267 from Police-Data-Accessibility-Project/mc_198…
maxachis May 13, 2025
b4326d6
feat(app): replace in-project access manager with `pdap_access_manager`
maxachis May 13, 2025
3c5cd5a
Merge pull request #269 from Police-Data-Accessibility-Project/mc_268…
maxachis May 13, 2025
b4a445f
feat(app): Change metrics endpoints from per week to per month
maxachis May 13, 2025
dd4e3fe
Merge pull request #270 from Police-Data-Accessibility-Project/mc_268…
maxachis May 13, 2025
2813c27
fix(app): Update URL Duplicate Task to better handle 429 TOO MANY REQ…
maxachis May 13, 2025
2dcd3c6
Merge remote-tracking branch 'origin/dev' into dev
maxachis May 13, 2025
a2d6f97
feat(app): Add 404 Probe Task
maxachis May 15, 2025
54a0ae3
Merge pull request #272 from Police-Data-Accessibility-Project/mc_117…
maxachis May 15, 2025
cd32524
fix(app): fix task type for Probe Task Operator to `PROBE_404`
maxachis May 16, 2025
3e5f5ca
DRAFT
maxachis May 16, 2025
7935ac2
feat(app): Overhaul rejection/relevancy annotation logic
maxachis May 16, 2025
6334f5a
fix(tests): fix breaking tests
maxachis May 16, 2025
b72a810
fix(tests): fix breaking tests
maxachis May 16, 2025
82ce183
Merge pull request #273 from Police-Data-Accessibility-Project/mc_223…
maxachis May 16, 2025
2a58765
feat(app): remove huggingface logic
maxachis May 16, 2025
fe583b3
Merge pull request #274 from Police-Data-Accessibility-Project/mc_255…
maxachis May 16, 2025
e085574
Update README.md
maxachis May 19, 2025
1c7bd60
Merge pull request #275 from Police-Data-Accessibility-Project/mc_156…
maxachis May 19, 2025
4c7a7dc
Replace in-app DiscordPoster with pypi DiscordPoster
maxachis May 19, 2025
72b473b
Merge pull request #276 from Police-Data-Accessibility-Project/mc_243…
maxachis May 19, 2025
6a2e2a0
Adjust final review to not return URLs in the absence of user annotat…
maxachis May 27, 2025
b1878d5
Remove test_example_collector_lifecycle_multiple_batches
maxachis May 27, 2025
9a7ecc1
Merge pull request #282 from Police-Data-Accessibility-Project/mc_280…
maxachis May 27, 2025
86d15e4
Rename `collector_db` -> `db`
maxachis May 27, 2025
9e4b046
Merge pull request #283 from Police-Data-Accessibility-Project/mc_281…
maxachis May 27, 2025
0bdb82d
Begin moving files to `src`
maxachis May 27, 2025
943d29a
Move directories into `src` directory
maxachis May 27, 2025
5939bc4
Merge pull request #285 from Police-Data-Accessibility-Project/mc_281…
maxachis May 27, 2025
3911e90
Update `execute.sh` uvicorn destination.
maxachis May 27, 2025
04864ee
Reorganize directories
maxachis Jun 2, 2025
b6822eb
Reorganize directories
maxachis Jun 2, 2025
ed633f2
Merge pull request #289 from Police-Data-Accessibility-Project/mc_281…
maxachis Jun 2, 2025
66d56ad
Fix bug when parsing dynamic URLs
maxachis Jun 2, 2025
c759e4f
Merge pull request #292 from Police-Data-Accessibility-Project/mc_281…
maxachis Jun 2, 2025
c6cbd48
Add batch progress info for annotations
maxachis Jun 2, 2025
5a17f6d
Merge pull request #294 from Police-Data-Accessibility-Project/mc_288…
maxachis Jun 2, 2025
47b0412
Add batch progress info for annotations
maxachis Jun 3, 2025
cb58e5c
Update `/batch/{id}` endpoint and associated logic/tests
maxachis Jun 3, 2025
45fc3c9
Merge pull request #297 from Police-Data-Accessibility-Project/mc_295…
maxachis Jun 3, 2025
1051178
Temporarily disabled duplicate task
maxachis Jun 4, 2025
aba8518
Refactor get_next_url_for_final_review query
maxachis Jun 4, 2025
8910e09
Allow logs to persist for up to 7 days
maxachis Jun 5, 2025
152f2c2
Re-enable URL Duplicate Task
maxachis Jun 5, 2025
54f1887
Refactor to remove redundant ClientSession creation in `agency_identi…
maxachis Jun 5, 2025
57306ac
Merge pull request #306 from Police-Data-Accessibility-Project/mc_305…
maxachis Jun 5, 2025
e43ba30
Increase URLs per result count
maxachis Jun 5, 2025
776bb9a
Merge pull request #307 from Police-Data-Accessibility-Project/mc_301…
maxachis Jun 5, 2025
07d5159
Add wrapper for identifying response formatting exceptions
maxachis Jun 9, 2025
5b989ff
Merge pull request #310 from Police-Data-Accessibility-Project/sc_309…
maxachis Jun 9, 2025
41d34e1
Fix bug in final review when url is marked as new agency
maxachis Jun 9, 2025
e0b4357
Merge pull request #311 from Police-Data-Accessibility-Project/sc_309…
maxachis Jun 9, 2025
eda508c
Update PDAP Access manager
maxachis Jun 12, 2025
3e91932
Merge pull request #314 from Police-Data-Accessibility-Project/sc_309…
maxachis Jun 12, 2025
b6d6828
Reorganize SQLAlchemy models
maxachis Jun 16, 2025
be25827
Manual test for `sync_agencies`, set up task operator shell
maxachis Jun 16, 2025
f3baa47
Merge branch 'dev' into mc_315_agencies_sync
maxachis Jun 16, 2025
d1cebd8
Fix broken imports
maxachis Jun 16, 2025
733e0a5
Fix broken imports
maxachis Jun 16, 2025
1e4414f
Merge pull request #316 from Police-Data-Accessibility-Project/mc_315…
maxachis Jun 16, 2025
f68d4e0
Reorganize, remove unused logic
maxachis Jun 16, 2025
026f4b0
Begin draft
maxachis Jun 16, 2025
9e78a69
Fix bug
maxachis Jun 16, 2025
67eb6bf
Merge pull request #317 from Police-Data-Accessibility-Project/mc_313…
maxachis Jun 16, 2025
7e17fba
Refactor GetNextURLForFinalReviewQueryBuilder
maxachis Jun 16, 2025
b442d00
Refactor GetNDraft GetMetricsURLSAggregatedPendingQueryBuilder logic
maxachis Jun 16, 2025
f96bc67
Finish endpoint logic and test
maxachis Jun 17, 2025
956fd22
Merge pull request #318 from Police-Data-Accessibility-Project/mc_312…
maxachis Jun 17, 2025
a8117cc
Add `remaining` attribute to review get next source logic
maxachis Jun 17, 2025
325a0b5
Fix breaking tests
maxachis Jun 17, 2025
8fb6a1e
Merge pull request #319 from Police-Data-Accessibility-Project/mc_313…
maxachis Jun 17, 2025
725bc8a
Continue draft on agencies sync logic
maxachis Jun 17, 2025
7ee1d4c
Refactor DB Client with helpers
maxachis Jun 18, 2025
23e3311
Begin building test logic
maxachis Jun 19, 2025
a8fe575
Continue draft on agencies sync logic
maxachis Jun 19, 2025
ed28c3b
Set up agencies sync task
maxachis Jun 23, 2025
3632993
Merge pull request #320 from Police-Data-Accessibility-Project/mc_315…
maxachis Jun 23, 2025
51ad317
link to api tos
josh-chamberlain Jun 23, 2025
df10f74
Merge remote-tracking branch 'origin/dev' into api-tos
maxachis Jun 23, 2025
63fedd1
Adjust main description.
maxachis Jun 23, 2025
8b73904
Merge pull request #321 from Police-Data-Accessibility-Project/api-tos
maxachis Jun 23, 2025
7c790c4
Add logic for storing compressed html when scraping HTML
maxachis Jul 5, 2025
bbbf150
Merge pull request #325 from Police-Data-Accessibility-Project/setup_…
maxachis Jul 5, 2025
0e95f74
update readme chart
josh-chamberlain Jul 10, 2025
e172e76
Begin first draft
maxachis Jul 15, 2025
66ff19e
Finish developing auto relevant task
maxachis Jul 16, 2025
535fcf5
Update endpoints to use full relevancy confidence and model name as well
maxachis Jul 16, 2025
c43c7a3
Fix broken import
maxachis Jul 16, 2025
991230f
Merge pull request #330 from Police-Data-Accessibility-Project/mc_329…
maxachis Jul 16, 2025
2dd4c07
Fix broken logic
maxachis Jul 16, 2025
c137dbd
Fix broken logic
maxachis Jul 16, 2025
bc303bf
Merge pull request #331 from Police-Data-Accessibility-Project/mc_329…
maxachis Jul 16, 2025
744c1bd
Update logic to disconnect URLs from Batches
maxachis Jul 17, 2025
ffe914f
Extract logic, break up functions into separate domain directories an…
maxachis Jul 17, 2025
927cc73
Fix bug in `get_urls` logic
maxachis Jul 17, 2025
295b96a
Merge pull request #332 from Police-Data-Accessibility-Project/mc_89_…
maxachis Jul 17, 2025
aeec7ae
link to design principles
josh-chamberlain Jul 22, 2025
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
+*.sh text eol=lf
40 changes: 0 additions & 40 deletions .github/workflows/common_crawler.yaml

This file was deleted.

94 changes: 0 additions & 94 deletions .github/workflows/populate_labelstudio.yml

This file was deleted.

53 changes: 22 additions & 31 deletions .github/workflows/test_app.yml
@@ -1,27 +1,12 @@
# This workflow will test the Source Collector App
# Utilizing the docker-compose file in the root directory
name: Test Source Collector App
on: pull_request

#jobs:
# build:
# runs-on: ubuntu-latest
# steps:
# - name: Checkout repository
# uses: actions/checkout@v4
# - name: Run docker-compose
# uses: hoverkraft-tech/compose-action@v2.0.1
# with:
# compose-file: "docker-compose.yml"
# - name: Execute tests in the running service
# run: |
# docker ps -a && docker exec data-source-identification-app-1 pytest /app/tests/test_automated
on: pull_request

jobs:
container-job:
runs-on: ubuntu-latest
timeout-minutes: 20
container: python:3.12.8
container: python:3.11.9

services:
postgres:
@@ -34,22 +19,28 @@ jobs:
--health-timeout 5s
--health-retries 5

env:
POSTGRES_PASSWORD: postgres
POSTGRES_USER: postgres
POSTGRES_DB: postgres
POSTGRES_HOST: postgres
POSTGRES_PORT: 5432
GOOGLE_API_KEY: TEST
GOOGLE_CSE_ID: TEST

steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt

- name: Install uv and set the python version
uses: astral-sh/setup-uv@v5
with:
python-version: ${{ matrix.python-version }}

- name: Install the project
run: uv sync --locked --all-extras --dev

- name: Run tests
run: |
pytest tests/test_automated
pytest tests/test_alembic
env:
POSTGRES_PASSWORD: postgres
POSTGRES_USER: postgres
POSTGRES_DB: postgres
POSTGRES_HOST: postgres
POSTGRES_PORT: 5432
GOOGLE_API_KEY: TEST
GOOGLE_CSE_ID: TEST
uv run pytest tests/automated
uv run pytest tests/alembic
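The workflow above passes the database settings to the tests through the `POSTGRES_*` environment variables. As a rough illustration of how such variables could be turned into a connection URL, here is a minimal Python sketch. The function name `build_postgres_url` and the `postgresql` URL scheme are assumptions made for this example and are not taken from the repository.

```python
import os
from urllib.parse import quote


def build_postgres_url(env=None, scheme="postgresql"):
    """Build a connection URL from POSTGRES_* variables.

    Hypothetical helper for illustration only; the project's real
    configuration code may differ.
    """
    env = os.environ if env is None else env
    # Percent-encode credentials so characters like '@' or '/' stay valid in a URL.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    host = env["POSTGRES_HOST"]
    # Environment variables are strings; validate the port as an integer.
    port = int(env.get("POSTGRES_PORT", "5432"))
    database = env["POSTGRES_DB"]
    return f"{scheme}://{user}:{password}@{host}:{port}/{database}"


# Values copied from the workflow's `env:` block.
ci_env = {
    "POSTGRES_USER": "postgres",
    "POSTGRES_PASSWORD": "postgres",
    "POSTGRES_DB": "postgres",
    "POSTGRES_HOST": "postgres",
    "POSTGRES_PORT": "5432",
}
ci_url = build_postgres_url(ci_env)
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to test without touching the process environment.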
File renamed without changes.
25 changes: 20 additions & 5 deletions Dockerfile
@@ -1,17 +1,32 @@
 # Dockerfile for Source Collector FastAPI app

-FROM python:3.12.8
+FROM python:3.11.9-slim
+COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

 # Set working directory
 WORKDIR /app

-# Copy project files
-COPY . .
+COPY pyproject.toml uv.lock ./

 # Install dependencies
-RUN pip install --no-cache-dir -r requirements.txt
+ENV UV_PROJECT_ENVIRONMENT="/usr/local/"
+RUN uv sync --locked --no-dev
+# Must call from the root directory because uv does not add playwright to path
+RUN playwright install-deps chromium
+RUN playwright install chromium
+
+# Copy project files
+COPY src ./src
+COPY alembic.ini ./alembic.ini
+COPY alembic ./alembic
+COPY apply_migrations.py ./apply_migrations.py
+COPY execute.sh ./execute.sh
+COPY .project-root ./.project-root

 # Expose the application port
 EXPOSE 80

-RUN chmod +x execute.sh
+RUN chmod +x execute.sh
+# Use the below for ease of local development, but remove when pushing to GitHub
+# Because there is no .env file in the repository (for security reasons)
+#COPY .env ./.env
55 changes: 41 additions & 14 deletions ENV.md
@@ -2,17 +2,44 @@ This page provides a full list, with description, of all the environment variabl

 Please ensure these are properly defined in a `.env` file in the root directory.

-| Name | Description | Example |
-|--------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|
-| `LABEL_STUDIO_ACCESS_TOKEN` | The access token for the Label Studio API. The access token for the Label Studio API. This can be obtained by logging into Label Studio and navigating to the [user account section](https://app.heartex.com/user/account), where the access token can be copied. | `abc123` |
-| `LABEL_STUDIO_PROJECT_ID` | The project ID for the Label Studio API. This can be obtained by logging into Label Studio and navigating to the relevant project, where the project id will be in the URL, as in `https://app.heartex.com/projects/58475/` | `58475` |
-| `LABEL_STUDIO_ORGANIZATION_ID` | The organization ID for the Label Studio API. This can be obtained by logging into Label Studio and navigating to the [Organization section](https://app.heartex.com/organization?page=1), where the organization ID can be copied. | `6758` |
-| `GOOGLE_API_KEY` | The API key required for accessing the Google Custom Search API | `abc123` |
-| `GOOGLE_CSE_ID` | The CSE ID required for accessing the Google Custom Search API | `abc123` |
-|`POSTGRES_USER` | The username for the test database | `test_source_collector_user` |
-|`POSTGRES_PASSWORD` | The password for the test database | `HanviliciousHamiltonHilltops` |
-|`POSTGRES_DB` | The database name for the test database | `source_collector_test_db` |
-|`POSTGRES_HOST` | The host for the test database | `127.0.0.1` |
-|`POSTGRES_PORT` | The port for the test database | `5432` |
-|`DS_APP_SECRET_KEY`| The secret key used for decoding JWT tokens produced by the Data Sources App. Must match the secret token that is used in the Data Sources App for encoding. |`abc123`|
-|`DEV`| Set to any value to run the application in development mode. |`true`|
+| Name | Description | Example |
+|----------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------|
+| `GOOGLE_API_KEY` | The API key required for accessing the Google Custom Search API | `abc123` |
+| `GOOGLE_CSE_ID` | The CSE ID required for accessing the Google Custom Search API | `abc123` |
+|`POSTGRES_USER` | The username for the test database | `test_source_collector_user` |
+|`POSTGRES_PASSWORD` | The password for the test database | `HanviliciousHamiltonHilltops` |
+|`POSTGRES_DB` | The database name for the test database | `source_collector_test_db` |
+|`POSTGRES_HOST` | The host for the test database | `127.0.0.1` |
+|`POSTGRES_PORT` | The port for the test database | `5432` |
+|`DS_APP_SECRET_KEY`| The secret key used for decoding JWT tokens produced by the Data Sources App. Must match the secret token `JWT_SECRET_KEY` that is used in the Data Sources App for encoding. | `abc123` |
+|`DEV`| Set to any value to run the application in development mode. | `true` |
+|`DEEPSEEK_API_KEY`| The API key required for accessing the DeepSeek API. | `abc123` |
+|`OPENAI_API_KEY`| The API key required for accessing the OpenAI API. | `abc123` |
+|`PDAP_EMAIL`| An email address for accessing the PDAP API.[^1] | `abc123@test.com` |
+|`PDAP_PASSWORD`| A password for accessing the PDAP API.[^1] | `abc123` |
+|`PDAP_API_KEY`| An API key for accessing the PDAP API. | `abc123` |
+|`PDAP_API_URL`| The URL for the PDAP API| `https://data-sources-v2.pdap.dev/api`|
+|`DISCORD_WEBHOOK_URL`| The URL for the Discord webhook used for notifications| `abc123` |
+|`HUGGINGFACE_INFERENCE_API_KEY` | The API key required for accessing the Huggingface Inference API. | `abc123` |
+
+[^1:] The user account in question will require elevated permissions to access certain endpoints. At a minimum, the user will require the `source_collector` and `db_write` permissions.
+
+## Foreign Data Wrapper (FDW)
+```
+FDW_DATA_SOURCES_HOST=127.0.0.1 # The host of the Data Sources Database, used for FDW setup
+FDW_DATA_SOURCES_PORT=1234 # The port of the Data Sources Database, used for FDW setup
+FDW_DATA_SOURCES_USER=fdw_user # The username for the Data Sources Database, used for FDW setup
+FDW_DATA_SOURCES_PASSWORD=password # The password for the Data Sources Database, used for FDW setup
+FDW_DATA_SOURCES_DB=db_name # The database name for the Data Sources Database, used for FDW setup
+
+```
+
+## Data Dumper
+
+```
+PROD_DATA_SOURCES_HOST=127.0.0.1 # The host of the production Data Sources Database, used for Data Dumper
+PROD_DATA_SOURCES_PORT=1234 # The port of the production Data Sources Database, used for Data Dumper
+PROD_DATA_SOURCES_USER=dump_user # The username for the production Data Sources Database, used for Data Dumper
+PROD_DATA_SOURCES_PASSWORD=password # The password for the production Data Sources Database, used for Data Dumper
+PROD_DATA_SOURCES_DB=db_name # The database name for the production Data Sources Database, used for Data Dumper
+```
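
Because ENV.md asks that these variables be defined in a `.env` file, a startup check that reports missing ones can save debugging time. The Python sketch below shows one possible way to do that for the Data Dumper variables listed above. The helper `missing_env_vars` and the tuple `DATA_DUMPER_VARS` are hypothetical names introduced for this example, not part of the repository.

```python
import os

# Variable names taken from the "Data Dumper" section of ENV.md.
DATA_DUMPER_VARS = (
    "PROD_DATA_SOURCES_HOST",
    "PROD_DATA_SOURCES_PORT",
    "PROD_DATA_SOURCES_USER",
    "PROD_DATA_SOURCES_PASSWORD",
    "PROD_DATA_SOURCES_DB",
)


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty, in the order given.

    Hypothetical helper for illustration; it only inspects the mapping
    and never prints secret values.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example: only the host is configured, so the other four are reported.
partial_env = {"PROD_DATA_SOURCES_HOST": "127.0.0.1"}
missing = missing_env_vars(DATA_DUMPER_VARS, partial_env)
```

The same helper could be pointed at the `FDW_DATA_SOURCES_*` or `POSTGRES_*` names by passing a different tuple.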