Commit 47f0c07

Merge pull request #129 from Police-Data-Accessibility-Project/mc_128_source_collector_api
Mc 128 source collector api
2 parents dbdb56f + 33780bc commit 47f0c07

247 files changed: +870301 -2480 lines changed

.github/workflows/test_app.yml

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# This workflow will test the Source Collector App
+# Utilizing the docker-compose file in the root directory
+name: Test Source Collector App
+on: pull_request
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+    - name: Checkout repository
+      uses: actions/checkout@v4
+    - name: Run docker-compose
+      uses: hoverkraft-tech/compose-action@v2.0.1
+      with:
+        compose-file: "docker-compose.yml"
+    - name: Execute tests in the running service
+      run: |
+        docker exec data-source-identification-app-1 pytest /app/tests/test_automated

Dockerfile

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# Dockerfile for Source Collector FastAPI app
+
+FROM python:3.12.8
+
+# Set working directory
+WORKDIR /app
+
+# Copy project files
+COPY . .
+
+# Install dependencies
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Expose the application port
+EXPOSE 80
+
+# Run FastAPI app with uvicorn
+CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "80"]

ENV.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+This page provides a full list, with descriptions, of all the environment variables used by the application.
+
+Please ensure these are properly defined in a `.env` file in the root directory.
+
+| Name | Description | Example |
+|------|-------------|---------|
+| `LABEL_STUDIO_ACCESS_TOKEN` | The access token for the Label Studio API. This can be obtained by logging into Label Studio and navigating to the [user account section](https://app.heartex.com/user/account), where the access token can be copied. | `abc123` |
+| `LABEL_STUDIO_PROJECT_ID` | The project ID for the Label Studio API. This can be obtained by logging into Label Studio and navigating to the relevant project, where the project ID will be in the URL, as in `https://app.heartex.com/projects/58475/`. | `58475` |
+| `LABEL_STUDIO_ORGANIZATION_ID` | The organization ID for the Label Studio API. This can be obtained by logging into Label Studio and navigating to the [Organization section](https://app.heartex.com/organization?page=1), where the organization ID can be copied. | `6758` |
+| `GOOGLE_API_KEY` | The API key required for accessing the Google Custom Search API. | `abc123` |
+| `GOOGLE_CSE_ID` | The CSE ID required for accessing the Google Custom Search API. | `abc123` |
+| `POSTGRES_USER` | The username for the test database. | `test_source_collector_user` |
+| `POSTGRES_PASSWORD` | The password for the test database. | `HanviliciousHamiltonHilltops` |
+| `POSTGRES_DB` | The database name for the test database. | `source_collector_test_db` |
+| `POSTGRES_HOST` | The host for the test database. | `127.0.0.1` |
+| `POSTGRES_PORT` | The port for the test database. | `5432` |
+| `DS_APP_SECRET_KEY` | The secret key used for decoding JWT tokens produced by the Data Sources App. Must match the secret token that is used in the Data Sources App for encoding. | `abc123` |
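
Reviewer note: the variables listed in ENV.md could be checked at startup before anything tries to use them. The following is a minimal sketch, not part of this commit, and it assumes the `.env` file has already been loaded into the process environment (for example by Docker Compose). `REQUIRED_VARS` and `missing_env_vars` are hypothetical names introduced only for illustration.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variable names copied from the ENV.md table; grouping them in one tuple
# is an assumption of this sketch, not something the commit does.
REQUIRED_VARS = (
    "LABEL_STUDIO_ACCESS_TOKEN",
    "LABEL_STUDIO_PROJECT_ID",
    "LABEL_STUDIO_ORGANIZATION_ID",
    "GOOGLE_API_KEY",
    "GOOGLE_CSE_ID",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_DB",
    "POSTGRES_HOST",
    "POSTGRES_PORT",
    "DS_APP_SECRET_KEY",
)


def missing_env_vars(
    required: Iterable[str] = REQUIRED_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All listed environment variables are set.")
```

Passing the environment as a mapping keeps the check easy to exercise in tests without touching the real process environment.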

README.md

Lines changed: 26 additions & 0 deletions
@@ -14,6 +14,9 @@ openai-playground | Scripts for accessing the openai API on PDAP's shared accoun
 source_collectors| Tools for extracting metadata from different sources, including CKAN data portals and Common Crawler
 collector_db | Database for storing data from source collectors
 collector_manager | A module which provides a unified interface for interacting with source collectors and relevant data
+core | A module which integrates other components, such as collector_manager and collector_db
+api | API for interacting with collector_manager, core, and collector_db
+local_database | Resources for setting up a test database for local development
 
 ## How to use
 
@@ -30,6 +33,29 @@ Thank you for your interest in contributing to this project! Please follow these
 - If you want to work on something, create an issue first so the broader community can discuss it.
 - If you make a utility, script, app, or other useful bit of code: put it in a top-level directory with an appropriate name and dedicated README and add it to the index.
 
+# Testing
+
+Note that prior to running tests, you need to install [Docker](https://docs.docker.com/get-started/get-docker/) and have the Docker engine running.
+
+Tests can be run by spinning up the `docker-compose-test.yml` file in the root directory. This will start a two-container setup, consisting of the FastAPI Web App and a clean Postgres Database.
+
+This can be done via the following command:
+
+```bash
+docker compose up -d
+```
+
+Note that while the container may mention the web app running on `0.0.0.0:8000`, the actual host may be `127.0.0.1:8000`.
+
+To access the API documentation, visit `http://{host}:8000/docs`.
+
+To run tests on the container, run:
+
+```bash
+docker exec data-source-identification-app-1 pytest /app/tests/test_automated
+```
+
+Be sure to inspect the `docker-compose.yml` file in the root directory -- some environment variables are dependent upon the Operating System you are using.
 
 # Diagrams
 
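
Reviewer note: between `docker compose up -d` and `docker exec ... pytest`, the web app may need a moment before it accepts connections. The following is a minimal sketch, not part of this commit, of polling the documentation URL from the README until it responds. `wait_for_url` is a hypothetical helper, and the host and port in the usage comment are taken from the README text rather than verified against the compose file.

```python
import time
import urllib.error
import urllib.request


def wait_for_url(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval + 1.0) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused or an HTTP error: the service is not ready yet.
            pass
        time.sleep(interval)
    return False


# Example usage (assumes the README's host and port):
#   wait_for_url("http://127.0.0.1:8000/docs")
```

The function returns a boolean rather than raising, so a calling script can decide whether to run the tests or report that the container never became reachable.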

annotation_pipeline/populate_labelstudio.py

Lines changed: 1 addition & 1 deletion
@@ -202,7 +202,7 @@ def label_studio_upload(batch_id: str, FILENAME: str, record_type: str):
     api_manager = LabelStudioAPIManager(config)
 
     #import tasks
-    label_studio_response = api_manager.import_tasks_into_project(data)
+    label_studio_response = api_manager.export_tasks_into_project(data)
 
     #check import success
     if label_studio_response.status_code == HTTPStatus.CREATED:

api/README.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+To spin up a development version of the client, run:
+
+```bash
+fastapi dev main.py
+```
+
+For the client to function properly in a local environment, the local database must be set up. Consult the `README.md` file in the `local_database` directory for further instructions.
File renamed without changes.

api/dependencies.py

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+from core.SourceCollectorCore import SourceCollectorCore
+
+
+def get_core() -> SourceCollectorCore:
+    from api.main import app
+    return app.state.core

api/main.py

Lines changed: 47 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,47 @@
1+
from contextlib import asynccontextmanager
2+
3+
from fastapi import FastAPI
4+
5+
from api.routes.batch import batch_router
6+
from api.routes.collector import collector_router
7+
from api.routes.label_studio import label_studio_router
8+
from api.routes.root import root_router
9+
from collector_db.DatabaseClient import DatabaseClient
10+
from core.CoreLogger import CoreLogger
11+
from core.SourceCollectorCore import SourceCollectorCore
12+
13+
14+
@asynccontextmanager
15+
async def lifespan(app: FastAPI):
16+
# Initialize shared dependencies
17+
db_client = DatabaseClient()
18+
source_collector_core = SourceCollectorCore(
19+
core_logger=CoreLogger(
20+
db_client=db_client
21+
),
22+
db_client=DatabaseClient(),
23+
)
24+
25+
# Pass dependencies into the app state
26+
app.state.core = source_collector_core
27+
28+
# Startup logic
29+
yield # Code here runs before shutdown
30+
31+
# Shutdown logic (if needed)
32+
app.state.core.shutdown()
33+
# Clean up resources, close connections, etc.
34+
pass
35+
36+
37+
app = FastAPI(
38+
title="Source Collector API",
39+
description="API for collecting data sources",
40+
version="0.1.0",
41+
lifespan=lifespan
42+
)
43+
44+
app.include_router(root_router)
45+
app.include_router(collector_router)
46+
app.include_router(batch_router)
47+
app.include_router(label_studio_router)
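
Reviewer note: the lifespan pattern in `api/main.py` (build shared objects before `yield`, tear them down after) can be exercised without FastAPI or a database. The following is a minimal standard-library sketch, not the commit's code. `StubDatabaseClient`, `StubCore`, and `demo` are hypothetical stand-ins, and `get_core` here takes the app as an argument instead of importing it as `api/dependencies.py` does. Unlike the diff, the sketch wraps `yield` in `try`/`finally` so that shutdown also runs if the body raises.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


class StubDatabaseClient:
    """Hypothetical stand-in for collector_db.DatabaseClient."""


class StubCore:
    """Hypothetical stand-in for core.SourceCollectorCore."""

    def __init__(self, db_client: StubDatabaseClient) -> None:
        self.db_client = db_client
        self.is_shut_down = False

    def shutdown(self) -> None:
        self.is_shut_down = True


@asynccontextmanager
async def lifespan(app):
    # Startup: build shared dependencies once and attach them to app state.
    app.state.core = StubCore(db_client=StubDatabaseClient())
    try:
        yield  # the application would serve requests while suspended here
    finally:
        # Shutdown: runs when the context exits, even after an error.
        app.state.core.shutdown()


def get_core(app) -> StubCore:
    """Return the shared core stored on the app state during startup."""
    return app.state.core


async def demo() -> tuple:
    app = SimpleNamespace(state=SimpleNamespace())  # stand-in for a FastAPI app
    async with lifespan(app):
        core = get_core(app)
        during = core.is_shut_down  # False while the app is "running"
    return during, core.is_shut_down  # True once the context has exited


if __name__ == "__main__":
    print(asyncio.run(demo()))  # prints (False, True)
```

Storing the core on `app.state` gives every route the same instance for the lifetime of the process, which is what the accessor in `api/dependencies.py` relies on.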

api/routes/__init__.py

Whitespace-only changes.
