Data Source Manager

A FastAPI application for identifying and cataloguing police data sources. Part of the Police Data Accessibility Project (PDAP).

The Source Manager collects URLs from various sources, enriches them with metadata using automated tasks and ML models, supports human annotation for validation, and synchronizes approved data sources to the Data Sources App.

Quick Start

# Install dependencies
uv sync

# Start the local database
cd local_database && docker compose up -d && cd ..

# Create a .env file (see ENV.md for all variables)
# At minimum, set the POSTGRES_* variables to match local_database defaults.

# Run the app
fastapi dev main.py

Then open http://localhost:8000/api for the interactive API docs.

Note: accessing API endpoints requires a valid Bearer token from the Data Sources API.

Documentation

Document	Description
Architecture	System design, module structure, task system, data flow
API Reference	All 65 endpoints across 15 route groups
Development Guide	Local setup, environment variables, common workflows
Testing Guide	Running tests, CI pipeline, writing new tests
Deployment	Docker, Alembic migrations, DS App synchronization
Collectors	Collector architecture and how to build new ones
Environment Variables	Full reference for all env vars and feature flags

Project Structure

src/
├── api/            # FastAPI routers and endpoint logic
├── core/           # Integration layer and task system
├── db/             # SQLAlchemy models, async DB client, queries
├── collectors/     # Pluggable URL collection strategies
├── external/       # Clients for external services (HuggingFace, PDAP, etc.)
├── security/       # JWT auth and permissions
└── util/           # Shared helpers

Contributing

Thank you for your interest in contributing to this project! Please follow these guidelines:

These Design Principles may be used to make decisions or guide your work.
If you want to work on something, create an issue first so the broader community can discuss it.
If you make a utility, script, app, or other useful bit of code: put it in a top-level directory with an appropriate name and dedicated README and add it to the index.

Code Quality

Docstrings and type hints are checked via a GitHub Action (python_checks.yml) using pydocstyle and mypy. These produce advisory PR comments and do not block merges.

Note: python_checks.yml only runs on pull requests from within the repo, not from forks.

Name		Name	Last commit message	Last commit date
Latest commit History 1,637 Commits
.github		.github
alembic		alembic
docs		docs
local_database		local_database
src		src
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
.project-root		.project-root
.pydocstyle		.pydocstyle
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
ENV.md		ENV.md
LICENSE		LICENSE
README.md		README.md
alembic.ini		alembic.ini
apply_migrations.py		apply_migrations.py
docker-compose.yml		docker-compose.yml
execute.sh		execute.sh
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini
start_mirrored_local_app.py		start_mirrored_local_app.py
urls.csv		urls.csv
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data Source Manager

Quick Start

Documentation

Project Structure

Contributing

Code Quality

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Data Source Manager

Quick Start

Documentation

Project Structure

Contributing

Code Quality

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages