Merged
1 change: 1 addition & 0 deletions .gitattributes
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
*.sh text eol=lf
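As a quick illustration (not part of this PR), the effect of the attribute can be checked with `git check-attr` in a scratch repository:

```python
# Illustration only: in a throwaway repo, `git check-attr` confirms that the
# pattern assigns eol=lf to shell scripts.
import subprocess
import tempfile

repo = tempfile.mkdtemp()
subprocess.run(["git", "init", "-q", repo], check=True)
with open(f"{repo}/.gitattributes", "w") as f:
    f.write("*.sh text eol=lf\n")

# check-attr resolves the attributes a path would receive,
# whether or not the file exists yet
out = subprocess.run(
    ["git", "check-attr", "eol", "--", "run.sh"],
    cwd=repo, capture_output=True, text=True, check=True,
).stdout.strip()
print(out)
```

This guards against CRLF endings breaking shell scripts for contributors on Windows with `core.autocrlf` enabled.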
52 changes: 28 additions & 24 deletions .github/workflows/test_app.yml
@@ -1,21 +1,6 @@
# This workflow will test the Source Collector App
# Utilizing the docker-compose file in the root directory
name: Test Source Collector App
on: pull_request

#jobs:
# build:
# runs-on: ubuntu-latest
# steps:
# - name: Checkout repository
# uses: actions/checkout@v4
# - name: Run docker-compose
# uses: hoverkraft-tech/compose-action@v2.0.1
# with:
# compose-file: "docker-compose.yml"
# - name: Execute tests in the running service
# run: |
# docker ps -a && docker exec data-source-identification-app-1 pytest /app/tests/test_automated
on: pull_request

jobs:
container-job:
@@ -34,22 +19,41 @@ jobs:
--health-timeout 5s
--health-retries 5

env: # <-- Consolidated env block here
POSTGRES_PASSWORD: postgres
POSTGRES_USER: postgres
POSTGRES_DB: source_collector_test_db
POSTGRES_HOST: postgres
POSTGRES_PORT: 5432
DATA_SOURCES_HOST: postgres
DATA_SOURCES_PORT: 5432
DATA_SOURCES_USER: postgres
DATA_SOURCES_PASSWORD: postgres
DATA_SOURCES_DB: test_data_sources_db
FDW_DATA_SOURCES_HOST: postgres
FDW_DATA_SOURCES_PORT: 5432
FDW_DATA_SOURCES_USER: postgres
FDW_DATA_SOURCES_PASSWORD: postgres
FDW_DATA_SOURCES_DB: test_data_sources_db
GOOGLE_API_KEY: TEST
GOOGLE_CSE_ID: TEST

steps:
- name: Checkout repository
uses: actions/checkout@v4

- name: Install PostgreSQL client tools
run: |
apt-get update
apt-get install -y postgresql-client

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
python -m local_database.create_database --use-shell

- name: Run tests
run: |
pytest tests/test_automated
pytest tests/test_alembic
env:
POSTGRES_PASSWORD: postgres
POSTGRES_USER: postgres
POSTGRES_DB: postgres
POSTGRES_HOST: postgres
POSTGRES_PORT: 5432
GOOGLE_API_KEY: TEST
GOOGLE_CSE_ID: TEST
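With the env block consolidated at the job level, every step sees the same connection settings. A sketch of how test code might assemble them into a libpq-style DSN (the helper below is illustrative, not code from the repository):

```python
import os

def postgres_dsn(prefix: str = "POSTGRES") -> str:
    """Build a libpq-style URL from <PREFIX>_HOST/_PORT/_USER/_PASSWORD/_DB.

    Defaults mirror the workflow's consolidated env block.
    """
    host = os.getenv(f"{prefix}_HOST", "postgres")
    port = os.getenv(f"{prefix}_PORT", "5432")
    user = os.getenv(f"{prefix}_USER", "postgres")
    password = os.getenv(f"{prefix}_PASSWORD", "postgres")
    db = os.getenv(f"{prefix}_DB", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"

print(postgres_dsn())                    # POSTGRES_* settings
print(postgres_dsn("FDW_DATA_SOURCES"))  # same shape for the FDW settings
```

The same prefix convention covers the `DATA_SOURCES_*` and `FDW_DATA_SOURCES_*` groups, which is why consolidating them into one block removes the duplicated per-step `env:` sections.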
22 changes: 21 additions & 1 deletion ENV.md
@@ -21,4 +21,24 @@ Please ensure these are properly defined in a `.env` file in the root directory.
|`PDAP_API_URL`| The URL for the PDAP API| `https://data-sources-v2.pdap.dev/api`|
|`DISCORD_WEBHOOK_URL`| The URL for the Discord webhook used for notifications| `abc123` |

[^1]: The user account in question will require elevated permissions to access certain endpoints. At a minimum, the user will require the `source_collector` and `db_write` permissions.

## Foreign Data Wrapper (FDW)
```
FDW_DATA_SOURCES_HOST=127.0.0.1 # The host of the Data Sources Database, used for FDW setup
FDW_DATA_SOURCES_PORT=1234 # The port of the Data Sources Database, used for FDW setup
FDW_DATA_SOURCES_USER=fdw_user # The username for the Data Sources Database, used for FDW setup
FDW_DATA_SOURCES_PASSWORD=password # The password for the Data Sources Database, used for FDW setup
FDW_DATA_SOURCES_DB=db_name # The database name for the Data Sources Database, used for FDW setup
```
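The FDW migration reads these variables at runtime, so a missing value only surfaces mid-migration. A fail-fast check before invoking `alembic upgrade` can catch that earlier (the variable names come from this file; the helper itself is a sketch):

```python
import os

# Names taken from the FDW section of ENV.md; the helper is illustrative.
FDW_VARS = (
    "FDW_DATA_SOURCES_HOST",
    "FDW_DATA_SOURCES_PORT",
    "FDW_DATA_SOURCES_USER",
    "FDW_DATA_SOURCES_PASSWORD",
    "FDW_DATA_SOURCES_DB",
)

def missing_fdw_vars(env=os.environ) -> list:
    """Return the FDW settings that are absent or empty."""
    return [name for name in FDW_VARS if not env.get(name)]

# Example: only the host is set, so four settings are reported missing
print(missing_fdw_vars({"FDW_DATA_SOURCES_HOST": "127.0.0.1"}))
```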

## Data Dumper

```
PROD_DATA_SOURCES_HOST=127.0.0.1 # The host of the production Data Sources Database, used for Data Dumper
PROD_DATA_SOURCES_PORT=1234 # The port of the production Data Sources Database, used for Data Dumper
PROD_DATA_SOURCES_USER=dump_user # The username for the production Data Sources Database, used for Data Dumper
PROD_DATA_SOURCES_PASSWORD=password # The password for the production Data Sources Database, used for Data Dumper
PROD_DATA_SOURCES_DB=db_name # The database name for the production Data Sources Database, used for Data Dumper
```
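These settings map naturally onto a `pg_dump` invocation. The flag mapping below is an assumption about how the Data Dumper uses the values, not code from this PR; the password would travel via the `PGPASSWORD` environment variable rather than a flag:

```python
# Hypothetical sketch: build a pg_dump argv from the PROD_* settings above.
def pg_dump_argv(env: dict) -> list:
    return [
        "pg_dump",
        "--host", env.get("PROD_DATA_SOURCES_HOST", "127.0.0.1"),
        "--port", env.get("PROD_DATA_SOURCES_PORT", "5432"),
        "--username", env.get("PROD_DATA_SOURCES_USER", "postgres"),
        "--dbname", env.get("PROD_DATA_SOURCES_DB", "postgres"),
    ]

print(" ".join(pg_dump_argv({"PROD_DATA_SOURCES_DB": "db_name"})))
```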
250 additions & 0 deletions alembic/versions/2025_04_21_1817-13f1272f94b9_set_up_foreign_data_wrapper.py
@@ -0,0 +1,250 @@
"""Set up foreign data wrapper

Revision ID: 13f1272f94b9
Revises: e285e6e7cf71
Create Date: 2025-04-21 18:17:34.593973

"""
import os
from typing import Sequence, Union

from alembic import op
from dotenv import load_dotenv

# revision identifiers, used by Alembic.
revision: str = '13f1272f94b9'
down_revision: Union[str, None] = 'e285e6e7cf71'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:

[flake8 warning, line 21] D103 Missing docstring in public function

load_dotenv()
remote_host = os.getenv("FDW_DATA_SOURCES_HOST")
user = os.getenv("FDW_DATA_SOURCES_USER")
password = os.getenv("FDW_DATA_SOURCES_PASSWORD")
db_name = os.getenv("FDW_DATA_SOURCES_DB")
port = os.getenv("FDW_DATA_SOURCES_PORT")

op.execute(f"CREATE EXTENSION IF NOT EXISTS postgres_fdw;")

[flake8 warning, line 30] F541 f-string is missing placeholders

op.execute(f"""
CREATE SERVER data_sources_server
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host '{remote_host}', dbname '{db_name}', port '{port}');
""")

op.execute(f"""
CREATE USER MAPPING FOR {user}
SERVER data_sources_server
OPTIONS (user '{user}', password '{password}');
""")

op.execute('CREATE SCHEMA if not exists "remote";')

# Users table
op.execute("""
CREATE FOREIGN TABLE IF NOT EXISTS "remote".users
(
id bigint,
created_at timestamp with time zone,
updated_at timestamp with time zone,
email text,
password_digest text,
api_key character varying,
role text
)
SERVER data_sources_server
OPTIONS (
schema_name 'public',
table_name 'users'
);
""")

# Agencies
# -Enums
# --Jurisdiction Type
op.execute("""
CREATE TYPE jurisdiction_type AS ENUM
('school', 'county', 'local', 'port', 'tribal', 'transit', 'state', 'federal');
""")
# --Agency Type
op.execute("""
CREATE TYPE agency_type AS ENUM
('incarceration', 'law enforcement', 'aggregated', 'court', 'unknown');
""")

# -Table
op.execute("""
CREATE FOREIGN TABLE IF NOT EXISTS "remote".agencies
(
name character,
homepage_url character,
jurisdiction_type jurisdiction_type,
lat double precision,
lng double precision,
defunct_year character,
airtable_uid character,
agency_type agency_type,
multi_agency boolean,
no_web_presence boolean,
airtable_agency_last_modified timestamp with time zone,
rejection_reason character,
last_approval_editor character,
submitter_contact character,
agency_created timestamp with time zone,
id integer,
approval_status text,
creator_user_id integer
)
SERVER data_sources_server
OPTIONS (
schema_name 'public',
table_name 'agencies'
);
""")

# Locations Table
# -Enums
# --Location Type
op.execute("""
CREATE TYPE location_type AS ENUM
('State', 'County', 'Locality');
""")

# -Table
op.execute("""
CREATE FOREIGN TABLE IF NOT EXISTS "remote".locations
(
id bigint,
type location_type,
state_id bigint,
county_id bigint,
locality_id bigint
)
SERVER data_sources_server
OPTIONS (
schema_name 'public',
table_name 'locations'
);
""")

# Data Sources Table

# -Enums
# -- access_type
op.execute("""
CREATE TYPE access_type AS ENUM
('Download', 'Webpage', 'API');
""")

# -- agency_aggregation
op.execute("""
CREATE TYPE agency_aggregation AS ENUM
('county', 'local', 'state', 'federal');
""")
# -- update_method
op.execute("""
CREATE TYPE update_method AS ENUM
('Insert', 'No updates', 'Overwrite');
""")

# -- detail_level
op.execute("""
CREATE TYPE detail_level AS ENUM
('Individual record', 'Aggregated records', 'Summarized totals');
""")

# -- retention_schedule
op.execute("""
CREATE TYPE retention_schedule AS ENUM
('< 1 day', '1 day', '< 1 week', '1 week', '1 month', '< 1 year', '1-10 years', '> 10 years', 'Future only');
""")

# -Table
op.execute("""
CREATE FOREIGN TABLE IF NOT EXISTS "remote".data_sources
(
name character varying,
description character,
source_url character,
agency_supplied boolean,
supplying_entity character,
agency_originated boolean,
agency_aggregation agency_aggregation,
coverage_start date,
coverage_end date,
updated_at timestamp with time zone,
detail_level detail_level,
record_download_option_provided boolean,
data_portal_type character,
update_method update_method,
readme_url character,
originating_entity character,
retention_schedule retention_schedule,
airtable_uid character,
scraper_url character,
created_at timestamp with time zone,
submission_notes character,
rejection_note character,
submitter_contact_info character,
agency_described_not_in_database character,
data_portal_type_other character,
data_source_request character,
broken_source_url_as_of timestamp with time zone,
access_notes text,
url_status text,
approval_status text,
record_type_id integer,
access_types access_type[],
tags text[],
record_formats text[],
id integer,
approval_status_updated_at timestamp with time zone,
last_approval_editor bigint
)
SERVER data_sources_server
OPTIONS (
schema_name 'public',
table_name 'data_sources'
);
""")



def downgrade() -> None:

[flake8 warning, line 216] D103 Missing docstring in public function
[flake8 error, line 216] E303 too many blank lines (3)
# Drop foreign schema
op.execute('DROP SCHEMA IF EXISTS "remote" CASCADE;')

# Drop enums
enums = [
"jurisdiction_type",
"agency_type",
"location_type",
"access_type",
"agency_aggregation",
"update_method",
"detail_level",
"retention_schedule",
]
for enum in enums:
op.execute(f"""
DROP TYPE IF EXISTS {enum};
""")

# Drop user mapping (for the same user the upgrade created it for)
load_dotenv()
user = os.getenv("FDW_DATA_SOURCES_USER")
op.execute(f"""
DROP USER MAPPING FOR {user} SERVER data_sources_server;
""")

# Drop server
op.execute("""
DROP SERVER IF EXISTS data_sources_server CASCADE;
""")

# Drop FDW
op.execute("""
DROP EXTENSION IF EXISTS postgres_fdw CASCADE;
""")
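Because the migration builds its DDL with f-strings, the statements can be sanity-checked without a live database by rendering them with sample values. `render_create_server` below is a hypothetical test helper mirroring the migration's template, not part of the PR:

```python
def render_create_server(remote_host: str, db_name: str, port: str) -> str:
    # Mirrors the f-string template used in upgrade() above
    return f"""
    CREATE SERVER data_sources_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host '{remote_host}', dbname '{db_name}', port '{port}');
    """

sql = render_create_server("127.0.0.1", "test_data_sources_db", "5432")
print(sql)
```

This style of test catches an unset environment variable early: a `None` value would render as the literal string `'None'` inside the `OPTIONS` clause.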