Closed

Dev #271

247 commits
357adda
Add URL Cache
maxachis Jan 23, 2025
541ce85
Change transformers to use PyTorch
maxachis Jan 23, 2025
9c604de
Change transformers to use tf-keras
maxachis Jan 23, 2025
4231c47
Reduce size of Dockerfile
maxachis Jan 23, 2025
bc68f28
Reduce size of Dockerfile
maxachis Jan 23, 2025
f8826d8
Set playwright to only install chromium
maxachis Jan 23, 2025
aabc499
Remove unused components
maxachis Jan 23, 2025
c743cdd
Add url_record_type_labeling directory and add init files to hugging_…
maxachis Jan 24, 2025
076fbbb
Add url_record_type_labeling directory to Dockerfile
maxachis Jan 24, 2025
8f81089
Remove unused json files
maxachis Jan 27, 2025
4c9f03f
Begin draft
maxachis Jan 27, 2025
5cca756
Create first draft of DeepSeek record classifier and test
maxachis Jan 27, 2025
e1dce2f
Change "Cycle" term to "Task" for clarity. Add README description.
maxachis Jan 27, 2025
f861bd6
Add precise test data
maxachis Jan 27, 2025
593b219
Draft work
maxachis Jan 27, 2025
990cf3b
Refine DeepSeekRecordClassifier
maxachis Jan 27, 2025
6a30a43
Remove unused files
maxachis Jan 28, 2025
1e502a3
Implement draft of Record Type Classifier:
maxachis Jan 28, 2025
698902e
Create `/task` route
maxachis Jan 28, 2025
fbc9bcb
Rename directory
maxachis Jan 28, 2025
b90a7bd
Change name of url_error_info foreign key constraint
maxachis Jan 28, 2025
2cecaf5
Add logic for automatically assigning values to record types via OpenAI.
maxachis Jan 29, 2025
e139656
Add tests to Dockerfile
maxachis Jan 29, 2025
c3b8752
Fix error in import routing
maxachis Jan 29, 2025
42af80a
Fix error in import routing
maxachis Jan 29, 2025
49bb5a6
Fix error in import routing
maxachis Jan 29, 2025
3e6c0b4
Fix error in import routing
maxachis Jan 29, 2025
13eae1b
Fix error in import routing
maxachis Jan 29, 2025
f017c1b
Merge pull request #143 from Police-Data-Accessibility-Project/mc_rec…
maxachis Jan 29, 2025
b648382
Comment out `.env` copy command
maxachis Jan 29, 2025
776671e
Convert dockerfile to slim package
maxachis Jan 29, 2025
c2601a3
Remove unused files
maxachis Jan 30, 2025
a7f757e
Remove unused files
maxachis Jan 30, 2025
ecde93f
Remove unused libraries
maxachis Jan 30, 2025
1ab1eab
Update Dockerfile
maxachis Jan 30, 2025
da929dd
Update Dockerfile/requirements
maxachis Jan 30, 2025
34a7ef3
Begin draft on record type annotation.
maxachis Jan 31, 2025
d6efb62
Draft work
maxachis Jan 31, 2025
88df8d5
Update container python version
maxachis Jan 31, 2025
3b20923
Merge pull request #144 from Police-Data-Accessibility-Project/mc_rec…
maxachis Jan 31, 2025
14e3d66
Create new table: `url_agency_suggestions`
maxachis Feb 2, 2025
f73c084
Move alembic to standalone directory.
maxachis Feb 5, 2025
a6dcd00
Add logic for Agency Identification.
maxachis Feb 5, 2025
3eab887
Merge remote-tracking branch 'origin/mc_record_type_annotations' into…
maxachis Feb 5, 2025
154bf33
Merge pull request #145 from Police-Data-Accessibility-Project/mc_rec…
maxachis Feb 5, 2025
ff1999f
Add alembic directory
maxachis Feb 5, 2025
83f25cb
Add logic for Agency Identification task
maxachis Feb 5, 2025
2e5bc36
Add `user_id` and trigger enforcement logic for url_agency_suggestion
maxachis Feb 5, 2025
5975ce6
Add first draft of agency annotation api endpoint
maxachis Feb 5, 2025
d8076fd
Begin draft on revision of agency identification task
maxachis Feb 7, 2025
412ad7e
Revise TaskOperator logic
maxachis Feb 8, 2025
28df49d
Revise agency identification annotation logic
maxachis Feb 10, 2025
f98dcd7
Revise agency identification annotation logic
maxachis Feb 10, 2025
a1a2107
Merge remote-tracking branch 'origin/mc_add_agency_identification_ann…
maxachis Feb 10, 2025
615adef
Merge pull request #146 from Police-Data-Accessibility-Project/mc_add…
maxachis Feb 10, 2025
31266b6
Comment out `COPY .env`
maxachis Feb 10, 2025
a049497
DRAFT
maxachis Feb 21, 2025
e3b467e
feat(api): add review get next source endpoint
maxachis Feb 21, 2025
cb7b50f
Merge remote-tracking branch 'origin/mc_150_create_review_next_source…
maxachis Feb 21, 2025
96a98ac
feat(api): add review get next source endpoint
maxachis Feb 21, 2025
d8e254b
Merge pull request #153 from Police-Data-Accessibility-Project/mc_150…
maxachis Feb 21, 2025
c833c2d
DRAFT
maxachis Feb 21, 2025
953206f
DRAFT
maxachis Feb 22, 2025
b550fe5
DRAFT - Begin Overhaul
maxachis Feb 24, 2025
a3598e9
DRAFT
maxachis Feb 24, 2025
93faa1f
build(api): add approve source endpoint and overhaul metadata
maxachis Feb 25, 2025
6133f20
build(api): add approve source endpoint and overhaul metadata
maxachis Feb 25, 2025
0cbaf8d
build(api): add approve source endpoint and overhaul metadata
maxachis Feb 25, 2025
ede8d00
Merge pull request #154 from Police-Data-Accessibility-Project/mc_151…
maxachis Feb 25, 2025
7bcce5c
build(api): add approve source endpoint and overhaul metadata
maxachis Feb 26, 2025
048b317
feat(app): add table to record users validating URLs.
maxachis Mar 11, 2025
a0c5eb9
Merge pull request #155 from Police-Data-Accessibility-Project/mc_151…
maxachis Mar 11, 2025
74247a7
DRAFT
maxachis Mar 15, 2025
d0522c3
feat(app): Add Miscellaneous URL Metadata Task
maxachis Mar 25, 2025
d8183a7
Correct bug in import addressing
maxachis Mar 25, 2025
2301192
Merge pull request #165 from Police-Data-Accessibility-Project/mc_158…
maxachis Mar 25, 2025
590d719
feat(app): Add additional information to Final Review Process
maxachis Mar 27, 2025
e12cceb
Merge pull request #172 from Police-Data-Accessibility-Project/mc_159…
maxachis Mar 27, 2025
0552310
DRAFT
maxachis Mar 28, 2025
cc46ef7
feat(app): Allow multiple confirmed agencies for URL
maxachis Mar 28, 2025
8413370
Merge pull request #174 from Police-Data-Accessibility-Project/mc_173…
maxachis Mar 28, 2025
852a376
feat(app): `/review/approve-source` new agencies added to db
maxachis Mar 29, 2025
9dc2d1e
Merge pull request #176 from Police-Data-Accessibility-Project/mc_175…
maxachis Mar 29, 2025
d99d189
fix(database): Fix bug causing validated URLs to show up for some ann…
maxachis Mar 29, 2025
9ff1e31
Merge pull request #178 from Police-Data-Accessibility-Project/mc_177…
maxachis Mar 29, 2025
b6efea0
Merge branch 'dev' into mc_152_submit_approved_url_task
maxachis Mar 29, 2025
a3dedcd
DRAFT
maxachis Mar 31, 2025
c5e7528
Set default for snippet if none exists.
maxachis Apr 1, 2025
7f2033d
Merge pull request #182 from Police-Data-Accessibility-Project/mc_181…
maxachis Apr 1, 2025
bf756ac
DRAFT
maxachis Apr 2, 2025
8b33344
feat(app): Add batch filtering for annotation requests
maxachis Apr 2, 2025
eae4979
fix(tests): fix import bug
maxachis Apr 2, 2025
2e28c1b
Merge pull request #186 from Police-Data-Accessibility-Project/mc_179…
maxachis Apr 2, 2025
b669eab
feat(app): add `review/reject-source` endpoint
maxachis Apr 3, 2025
04b376c
Merge pull request #192 from Police-Data-Accessibility-Project/mc_191…
maxachis Apr 3, 2025
ea23d0c
feat(database): Adjust annotation logic for URLs marked not relevant
maxachis Apr 4, 2025
e348a95
Merge pull request #193 from Police-Data-Accessibility-Project/mc_187…
maxachis Apr 4, 2025
c20e8ac
feat(database): add agency not in database in annotate agencies
maxachis Apr 4, 2025
def4844
fix(tests): fix import bug
maxachis Apr 4, 2025
fcb9b2d
fix(tests): fix import bug
maxachis Apr 4, 2025
f9ae605
Merge pull request #194 from Police-Data-Accessibility-Project/mc_190…
maxachis Apr 4, 2025
443e767
feat(api): require final review permission for review endpoints
maxachis Apr 4, 2025
2bab320
Merge branch 'dev' into mc_update_misc_metadata_task_html
maxachis Apr 5, 2025
3b3253f
feat(app): update misc metadata task to use html title description as…
maxachis Apr 5, 2025
4acb72e
Merge branch 'dev' into mc_152_submit_approved_url_task
maxachis Apr 5, 2025
5777f5d
Merge pull request #196 from Police-Data-Accessibility-Project/mc_upd…
maxachis Apr 5, 2025
74cc0d3
Merge branch 'refs/heads/dev' into mc_152_submit_approved_url_task
maxachis Apr 5, 2025
77c7dff
DRAFT
maxachis Apr 9, 2025
1b0e6c3
feat(app): allow retrieving URLs for annotation without html info
maxachis Apr 9, 2025
27581eb
Fix import bug
maxachis Apr 9, 2025
0ba8dc1
Fix import bug
maxachis Apr 9, 2025
3275fe3
Fix import bug
maxachis Apr 9, 2025
b8a1276
Merge pull request #203 from Police-Data-Accessibility-Project/mc_162…
maxachis Apr 9, 2025
753a06d
Temporarily disable HTML Task Operator
maxachis Apr 10, 2025
87c1057
Re-enable HTML Task Operator with logging on fetch_and_render
maxachis Apr 10, 2025
0b7661e
Re-enable HTML Task Operator with logging on fetch_and_render
maxachis Apr 10, 2025
c3a8511
Transition relevancy pipeline to lazy loading
maxachis Apr 10, 2025
eba18d1
Remove log for fetch and render
maxachis Apr 10, 2025
d68ab30
feat(app): enable task loop to repeat if prerequisites met
maxachis Apr 10, 2025
197febb
Merge pull request #207 from Police-Data-Accessibility-Project/mc_204…
maxachis Apr 10, 2025
e2575af
DRAFT
maxachis Apr 12, 2025
cb3ed94
DRAFT
maxachis Apr 12, 2025
7bfd1e4
DRAFT
maxachis Apr 12, 2025
6c3fe10
feat(app): make collectors asynchronouns and add task trigger
maxachis Apr 14, 2025
24173fb
fix(app): fix import bug
maxachis Apr 14, 2025
f001fb8
fix(app): fix import bug
maxachis Apr 14, 2025
0dbb987
fix(tests): comment out inconsistent test
maxachis Apr 14, 2025
afe55d7
fix(tests): comment out inconsistent test
maxachis Apr 14, 2025
32c3f7b
Merge pull request #211 from Police-Data-Accessibility-Project/mc_204…
maxachis Apr 14, 2025
72caf70
feat(app): make logger async
maxachis Apr 14, 2025
ea88e2b
Merge pull request #214 from Police-Data-Accessibility-Project/mc_210…
maxachis Apr 14, 2025
5b3658f
feat(app): add task status endpoint
maxachis Apr 14, 2025
f1cf5b9
Merge pull request #215 from Police-Data-Accessibility-Project/mc_212…
maxachis Apr 14, 2025
07a4a09
feat(app): add task status endpoint
maxachis Apr 15, 2025
37f86c3
Merge pull request #217 from Police-Data-Accessibility-Project/mc_216…
maxachis Apr 15, 2025
43f9178
fix(app): fix bug with task repetition count
maxachis Apr 15, 2025
c84a7bd
fix(app): temporarily disable HTML Task Operator
maxachis Apr 15, 2025
a4f3be7
fix(app): fix bug with task repeating
maxachis Apr 15, 2025
8f7332d
Merge pull request #218 from Police-Data-Accessibility-Project/mc_216…
maxachis Apr 15, 2025
dcf43e4
Merge branch 'dev' into mc_152_submit_approved_url_task
maxachis Apr 15, 2025
82e65b9
feat(app): add submit approved URL task
maxachis Apr 15, 2025
beae1b9
fix(tests): fix import bug
maxachis Apr 15, 2025
a9787bc
Merge pull request #227 from Police-Data-Accessibility-Project/mc_152…
maxachis Apr 15, 2025
d134194
feat(database): allow one user annotation per url
maxachis Apr 17, 2025
b560594
fix(tests): fix broken tests
maxachis Apr 17, 2025
dec9bfd
Merge pull request #229 from Police-Data-Accessibility-Project/mc_226…
maxachis Apr 17, 2025
345a257
feat(api): adjust final review to reflect single user annotations
maxachis Apr 17, 2025
98d8a95
Merge pull request #231 from Police-Data-Accessibility-Project/mc_226…
maxachis Apr 17, 2025
6858bc0
feat(app): change batch status `completed` to `ready to label`
maxachis Apr 17, 2025
193e68f
Merge pull request #232 from Police-Data-Accessibility-Project/mc_228…
maxachis Apr 17, 2025
c18bd68
feat(app): Add `/batch` filter for batches with pending URLs
maxachis Apr 17, 2025
5e575f5
Merge pull request #233 from Police-Data-Accessibility-Project/mc_228…
maxachis Apr 17, 2025
85a2883
fix(database): fix duplicate bug in `/batch` get for `has_pending_urls`
maxachis Apr 17, 2025
ed0e956
Merge pull request #234 from Police-Data-Accessibility-Project/mc_228…
maxachis Apr 17, 2025
cfad874
fix(app): Change suggestion type `MANUAL_SUGGESTION` to `USER_SUGGEST…
maxachis Apr 17, 2025
5ce3779
Merge pull request #235 from Police-Data-Accessibility-Project/mc_230…
maxachis Apr 17, 2025
3519bf4
refactor(app): remove deprecated and unused code
maxachis Apr 17, 2025
7804bf7
Merge pull request #237 from Police-Data-Accessibility-Project/mc_213…
maxachis Apr 17, 2025
c239567
fix(build): remove nonexistent directory from dockerfile
maxachis Apr 17, 2025
333baf5
docs(api): Change `/docs` to `/api` for API display
maxachis Apr 17, 2025
93d7edb
Merge pull request #238 from Police-Data-Accessibility-Project/mc_160…
maxachis Apr 17, 2025
1830b16
refactor(app): consolidate environment variable usage
maxachis Apr 18, 2025
d58d361
Merge pull request #239 from Police-Data-Accessibility-Project/mc_160…
maxachis Apr 18, 2025
a863442
Merge branch 'dev' into mc_190_add_agency_ids_via_annotate_agencies
maxachis Apr 18, 2025
4853787
Merge pull request #195 from Police-Data-Accessibility-Project/mc_190…
maxachis Apr 18, 2025
06406bd
refactor(app): refactor to reduce memory strain from huggingface task
maxachis Apr 18, 2025
0c90280
fix(tests): fix broken test
maxachis Apr 18, 2025
d6a5d2d
Merge pull request #240 from Police-Data-Accessibility-Project/mc_185…
maxachis Apr 18, 2025
36d8b5d
refactor(tests): reduce time required to run time-bound tests
maxachis Apr 18, 2025
f7a9606
refactor(tests): reduce time required to run time-bound tests
maxachis Apr 18, 2025
d2cfc83
refactor(tests): reduce time required to run time-bound tests
maxachis Apr 18, 2025
569b46c
refactor(tests): reduce time required to run time-bound tests
maxachis Apr 18, 2025
55fe581
refactor(tests): reduce time required to run time-bound tests
maxachis Apr 18, 2025
9b58948
refactor(tests): reduce time required to run time-bound tests
maxachis Apr 18, 2025
f1b5a62
Merge pull request #242 from Police-Data-Accessibility-Project/mc_185…
maxachis Apr 18, 2025
48f33e6
feat(app): make delete logs job an asynchronous scheduled task
maxachis Apr 18, 2025
a3f0325
Merge pull request #244 from Police-Data-Accessibility-Project/mc_241…
maxachis Apr 18, 2025
22aa07e
feat(app): Create `/annotation/all` endpoints
maxachis Apr 21, 2025
3ac0fd7
Merge pull request #245 from Police-Data-Accessibility-Project/mc_183…
maxachis Apr 21, 2025
72ed9c9
DRAFT
maxachis Apr 21, 2025
a10e940
Refactor Docker Logic and add Data Sources Dumper Logic
maxachis Apr 22, 2025
ca27fdb
add .gitattributes
maxachis Apr 22, 2025
0e1eb39
DRAFT
maxachis Apr 22, 2025
fbb329e
DRAFT
maxachis Apr 22, 2025
e3c0091
DRAFT
maxachis Apr 22, 2025
27ef006
DRAFT
maxachis Apr 22, 2025
f4d4134
DRAFT
maxachis Apr 22, 2025
861ea71
feat(database): begin setting up FDW - initial link
maxachis Apr 22, 2025
8e47a33
feat(database): begin setting up FDW - initial link
maxachis Apr 22, 2025
cce04a1
Merge pull request #247 from Police-Data-Accessibility-Project/mc_246…
maxachis Apr 22, 2025
f12ef61
feat(database): begin setting up FDW - initial link
maxachis Apr 22, 2025
af7f0e0
feat(database): begin setting up FDW - initial link
maxachis Apr 22, 2025
857c771
Merge pull request #248 from Police-Data-Accessibility-Project/mc_246…
maxachis Apr 22, 2025
b8d3322
fix(remove FDW setup):
maxachis Apr 22, 2025
8e013bb
fix(database): Remove FDW setup and tests
maxachis Apr 22, 2025
fac1931
fix(database): Remove FDW setup and tests
maxachis Apr 22, 2025
4799d7e
fix(database): Remove FDW setup and tests
maxachis Apr 22, 2025
f4cbd17
Merge pull request #249 from Police-Data-Accessibility-Project/mc_246…
maxachis Apr 22, 2025
79ccfba
Update test_app.yml to use uv
maxachis May 2, 2025
2f7abc2
Update test_app.yml to use uv
maxachis May 2, 2025
e8575eb
Update test_app.yml to use uv
maxachis May 2, 2025
d867105
Update test_app.yml to use uv
maxachis May 2, 2025
88bad7c
Update test_app.yml to use uv
maxachis May 2, 2025
a6c79aa
Update test_app.yml to use uv
maxachis May 2, 2025
babcde8
Update test_app.yml to use uv
maxachis May 2, 2025
38c2a37
Merge remote-tracking branch 'origin/mc_251_uv' into mc_251_uv
maxachis May 2, 2025
7a8b373
Update test_app.yml to use uv
maxachis May 2, 2025
ac85798
Update Dockerfile to use uv
maxachis May 2, 2025
7e8e425
Merge pull request #252 from Police-Data-Accessibility-Project/mc_251_uv
maxachis May 2, 2025
220c319
DRAFT
maxachis May 3, 2025
25ced55
feat(app): Add `/collector/manual` endpoint
maxachis May 3, 2025
addc5f5
feat(app): Add `/collector/manual` endpoint
maxachis May 3, 2025
060bc11
feat(app): Add `/collector/manual` endpoint
maxachis May 3, 2025
2268bd9
Merge pull request #253 from Police-Data-Accessibility-Project/mc_200…
maxachis May 3, 2025
02f7b3a
Comment out URL relevance Huggingface Task Operator call
maxachis May 4, 2025
18be3c9
Comment out URL relevance Huggingface Task Operator call
maxachis May 4, 2025
1fcc949
feat(app): Add `/search/url` endpoint
maxachis May 4, 2025
41ca2ef
Merge pull request #256 from Police-Data-Accessibility-Project/mc_254…
maxachis May 4, 2025
e090cad
feat(app): Add special error message for annotation user conflict
maxachis May 4, 2025
18d9645
Merge pull request #257 from Police-Data-Accessibility-Project/mc_171…
maxachis May 4, 2025
154895e
DRAFT
maxachis May 6, 2025
ee4489e
DRAFT
maxachis May 6, 2025
c6c8299
DRAFT
maxachis May 7, 2025
8d62278
Convert to full uv/pyproject dependency management
maxachis May 7, 2025
6a9464a
Convert to full uv/pyproject dependency management
maxachis May 7, 2025
2e00141
Convert to full uv/pyproject dependency management
maxachis May 7, 2025
7118238
Merge pull request #262 from Police-Data-Accessibility-Project/mc_261…
maxachis May 7, 2025
9d9ca70
Convert to full uv/pyproject dependency management
maxachis May 7, 2025
1faba4e
Convert to full uv/pyproject dependency management
maxachis May 7, 2025
fe3de20
Merge pull request #263 from Police-Data-Accessibility-Project/mc_261…
maxachis May 7, 2025
a8facc9
Merge branch 'dev' into mc_125_metrics_endpoints
maxachis May 7, 2025
fb1acf2
DRAFT
maxachis May 9, 2025
1084d64
feat(app): Create metrics endpoints
maxachis May 9, 2025
2249c41
Merge pull request #264 from Police-Data-Accessibility-Project/mc_125…
maxachis May 9, 2025
7de9c50
DRAFT
maxachis May 11, 2025
c61d914
DRAFT
maxachis May 11, 2025
1281e40
Fix bug in `get_urls_breakdown_pending_metrics`
maxachis May 12, 2025
04e4551
fix(app): Address bug in agency identification
maxachis May 12, 2025
bd27536
fix(app): Address bug in agency identification
maxachis May 12, 2025
f09d60d
Merge pull request #266 from Police-Data-Accessibility-Project/mc_255…
maxachis May 12, 2025
dd643b6
feat(app): add url duplicate check task operator
maxachis May 13, 2025
845cb1b
feat(app): add url duplicate check task operator
maxachis May 13, 2025
f913410
Merge pull request #267 from Police-Data-Accessibility-Project/mc_198…
maxachis May 13, 2025
b4326d6
feat(app): replace in-project access manager with `pdap_access_manager`
maxachis May 13, 2025
3c5cd5a
Merge pull request #269 from Police-Data-Accessibility-Project/mc_268…
maxachis May 13, 2025
b4a445f
feat(app): Change metrics endpoints from per week to per month
maxachis May 13, 2025
dd4e3fe
Merge pull request #270 from Police-Data-Accessibility-Project/mc_268…
maxachis May 13, 2025
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
*.sh text eol=lf
40 changes: 0 additions & 40 deletions .github/workflows/common_crawler.yaml

This file was deleted.

94 changes: 0 additions & 94 deletions .github/workflows/populate_labelstudio.yml

This file was deleted.

53 changes: 22 additions & 31 deletions .github/workflows/test_app.yml
@@ -1,27 +1,12 @@
# This workflow will test the Source Collector App
# Utilizing the docker-compose file in the root directory
name: Test Source Collector App
on: pull_request

#jobs:
# build:
# runs-on: ubuntu-latest
# steps:
# - name: Checkout repository
# uses: actions/checkout@v4
# - name: Run docker-compose
# uses: hoverkraft-tech/compose-action@v2.0.1
# with:
# compose-file: "docker-compose.yml"
# - name: Execute tests in the running service
# run: |
# docker ps -a && docker exec data-source-identification-app-1 pytest /app/tests/test_automated
on: pull_request

jobs:
container-job:
runs-on: ubuntu-latest
timeout-minutes: 20
container: python:3.12.8
container: python:3.11.9

services:
postgres:
@@ -34,22 +19,28 @@ jobs:
--health-timeout 5s
--health-retries 5

env:
POSTGRES_PASSWORD: postgres
POSTGRES_USER: postgres
POSTGRES_DB: postgres
POSTGRES_HOST: postgres
POSTGRES_PORT: 5432
GOOGLE_API_KEY: TEST
GOOGLE_CSE_ID: TEST

steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt

- name: Install uv and set the python version
uses: astral-sh/setup-uv@v5
with:
python-version: ${{ matrix.python-version }}

- name: Install the project
run: uv sync --locked --all-extras --dev

- name: Run tests
run: |
pytest tests/test_automated
pytest tests/test_alembic
env:
POSTGRES_PASSWORD: postgres
POSTGRES_USER: postgres
POSTGRES_DB: postgres
POSTGRES_HOST: postgres
POSTGRES_PORT: 5432
GOOGLE_API_KEY: TEST
GOOGLE_CSE_ID: TEST
uv run pytest tests/test_automated
uv run pytest tests/test_alembic
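
The workflow above hands the database settings to the test run through the `POSTGRES_*` environment variables. As a rough sketch of how test code might turn those into a connection URL (the helper name `postgres_url` and the `postgresql://` URL shape are assumptions for illustration, not code from this PR):

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def postgres_url(environ: Mapping[str, str] = os.environ) -> str:
    """Build a connection URL from the POSTGRES_* variables.

    Hypothetical helper: the variable names come from test_app.yml,
    but the function itself is not part of the repository.
    """
    user = quote_plus(environ["POSTGRES_USER"])
    password = quote_plus(environ["POSTGRES_PASSWORD"])
    host = environ["POSTGRES_HOST"]
    port = environ["POSTGRES_PORT"]
    db = environ["POSTGRES_DB"]
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"


# Values mirror the `env:` block of the workflow.
ci_env = {
    "POSTGRES_USER": "postgres",
    "POSTGRES_PASSWORD": "postgres",
    "POSTGRES_HOST": "postgres",
    "POSTGRES_PORT": "5432",
    "POSTGRES_DB": "postgres",
}
print(postgres_url(ci_env))
```

Passing the mapping explicitly keeps the helper testable without touching the real process environment.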
File renamed without changes.
45 changes: 40 additions & 5 deletions Dockerfile
@@ -1,17 +1,52 @@
# Dockerfile for Source Collector FastAPI app

FROM python:3.12.8
FROM python:3.11.9-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

# Set working directory
WORKDIR /app

# Copy project files
COPY . .
COPY pyproject.toml uv.lock ./

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
ENV UV_PROJECT_ENVIRONMENT="/usr/local/"
RUN uv sync --locked --no-dev
# Must call from the root directory because uv does not add playwright to path
RUN playwright install-deps chromium
RUN playwright install chromium


# Copy project files
COPY api ./api
COPY collector_db ./collector_db
COPY collector_manager ./collector_manager
COPY core ./core
COPY html_tag_collector ./html_tag_collector
COPY hugging_face/url_relevance ./hugging_face/url_relevance
COPY hugging_face/url_record_type_labeling ./hugging_face/url_record_type_labeling
COPY hugging_face/HuggingFaceInterface.py ./hugging_face/HuggingFaceInterface.py
COPY source_collectors ./source_collectors
COPY util ./util
COPY alembic.ini ./alembic.ini
COPY alembic ./alembic
COPY apply_migrations.py ./apply_migrations.py
COPY security_manager ./security_manager
COPY pdap_api_client ./pdap_api_client
COPY execute.sh ./execute.sh
COPY .project-root ./.project-root

COPY tests/conftest.py ./tests/conftest.py
COPY tests/__init__.py ./tests/__init__.py
COPY tests/test_automated ./tests/test_automated
COPY tests/test_alembic ./tests/test_alembic
COPY tests/helpers ./tests/helpers

COPY llm_api_logic ./llm_api_logic

# Expose the application port
EXPOSE 80

RUN chmod +x execute.sh
RUN chmod +x execute.sh
# Use the below for ease of local development, but remove when pushing to GitHub
# Because there is no .env file in the repository (for security reasons)
#COPY .env ./.env
54 changes: 40 additions & 14 deletions ENV.md
@@ -2,17 +2,43 @@ This page provides a full list, with description, of all the environment variabl

Please ensure these are properly defined in a `.env` file in the root directory.

| Name | Description | Example |
|--------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|
| `LABEL_STUDIO_ACCESS_TOKEN` | The access token for the Label Studio API. The access token for the Label Studio API. This can be obtained by logging into Label Studio and navigating to the [user account section](https://app.heartex.com/user/account), where the access token can be copied. | `abc123` |
| `LABEL_STUDIO_PROJECT_ID` | The project ID for the Label Studio API. This can be obtained by logging into Label Studio and navigating to the relevant project, where the project id will be in the URL, as in `https://app.heartex.com/projects/58475/` | `58475` |
| `LABEL_STUDIO_ORGANIZATION_ID` | The organization ID for the Label Studio API. This can be obtained by logging into Label Studio and navigating to the [Organization section](https://app.heartex.com/organization?page=1), where the organization ID can be copied. | `6758` |
| `GOOGLE_API_KEY` | The API key required for accessing the Google Custom Search API | `abc123` |
| `GOOGLE_CSE_ID` | The CSE ID required for accessing the Google Custom Search API | `abc123` |
|`POSTGRES_USER` | The username for the test database | `test_source_collector_user` |
|`POSTGRES_PASSWORD` | The password for the test database | `HanviliciousHamiltonHilltops` |
|`POSTGRES_DB` | The database name for the test database | `source_collector_test_db` |
|`POSTGRES_HOST` | The host for the test database | `127.0.0.1` |
|`POSTGRES_PORT` | The port for the test database | `5432` |
|`DS_APP_SECRET_KEY`| The secret key used for decoding JWT tokens produced by the Data Sources App. Must match the secret token that is used in the Data Sources App for encoding. |`abc123`|
|`DEV`| Set to any value to run the application in development mode. |`true`|
| Name | Description | Example |
|----------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------|
| `GOOGLE_API_KEY` | The API key required for accessing the Google Custom Search API | `abc123` |
| `GOOGLE_CSE_ID` | The CSE ID required for accessing the Google Custom Search API | `abc123` |
|`POSTGRES_USER` | The username for the test database | `test_source_collector_user` |
|`POSTGRES_PASSWORD` | The password for the test database | `HanviliciousHamiltonHilltops` |
|`POSTGRES_DB` | The database name for the test database | `source_collector_test_db` |
|`POSTGRES_HOST` | The host for the test database | `127.0.0.1` |
|`POSTGRES_PORT` | The port for the test database | `5432` |
|`DS_APP_SECRET_KEY`| The secret key used for decoding JWT tokens produced by the Data Sources App. Must match the secret token `JWT_SECRET_KEY` that is used in the Data Sources App for encoding. | `abc123` |
|`DEV`| Set to any value to run the application in development mode. | `true` |
|`DEEPSEEK_API_KEY`| The API key required for accessing the DeepSeek API. | `abc123` |
|`OPENAI_API_KEY`| The API key required for accessing the OpenAI API. | `abc123` |
|`PDAP_EMAIL`| An email address for accessing the PDAP API.[^1] | `abc123@test.com` |
|`PDAP_PASSWORD`| A password for accessing the PDAP API.[^1] | `abc123` |
|`PDAP_API_KEY`| An API key for accessing the PDAP API. | `abc123` |
|`PDAP_API_URL`| The URL for the PDAP API| `https://data-sources-v2.pdap.dev/api`|
|`DISCORD_WEBHOOK_URL`| The URL for the Discord webhook used for notifications| `abc123` |

[^1]: The user account in question will require elevated permissions to access certain endpoints. At a minimum, the user will require the `source_collector` and `db_write` permissions.

## Foreign Data Wrapper (FDW)
```
FDW_DATA_SOURCES_HOST=127.0.0.1 # The host of the Data Sources Database, used for FDW setup
FDW_DATA_SOURCES_PORT=1234 # The port of the Data Sources Database, used for FDW setup
FDW_DATA_SOURCES_USER=fdw_user # The username for the Data Sources Database, used for FDW setup
FDW_DATA_SOURCES_PASSWORD=password # The password for the Data Sources Database, used for FDW setup
FDW_DATA_SOURCES_DB=db_name # The database name for the Data Sources Database, used for FDW setup

```

## Data Dumper

```
PROD_DATA_SOURCES_HOST=127.0.0.1 # The host of the production Data Sources Database, used for Data Dumper
PROD_DATA_SOURCES_PORT=1234 # The port of the production Data Sources Database, used for Data Dumper
PROD_DATA_SOURCES_USER=dump_user # The username for the production Data Sources Database, used for Data Dumper
PROD_DATA_SOURCES_PASSWORD=password # The password for the production Data Sources Database, used for Data Dumper
PROD_DATA_SOURCES_DB=db_name # The database name for the production Data Sources Database, used for Data Dumper
```
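
Since ENV.md asks for these variables to be defined in a `.env` file, a small startup check could report any that are missing. A minimal sketch, assuming a hypothetical helper `missing_env_vars` and assuming the subset listed in `REQUIRED_VARS` is mandatory (the PR does not state which variables are required):

```python
import os
from typing import Iterable, Mapping

# Assumed subset of the variables documented in ENV.md; adjust as needed.
REQUIRED_VARS = (
    "GOOGLE_API_KEY",
    "GOOGLE_CSE_ID",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_DB",
    "POSTGRES_HOST",
    "POSTGRES_PORT",
    "DS_APP_SECRET_KEY",
)


def missing_env_vars(
    environ: Mapping[str, str] = os.environ,
    required: Iterable[str] = REQUIRED_VARS,
) -> list[str]:
    """Return the names in `required` that are unset or empty in `environ`."""
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Treating an empty string as missing is a design choice here; a `.env` line such as `PDAP_API_KEY=` would otherwise pass the check silently.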