This document outlines the development requirements and guidelines for Django apps in the Boost Data Collector Django project.
- Django project: One Django project with multiple Django apps; all apps share the same virtual environment, settings, and database.
- Workflow: The project runs app tasks sequentially via management commands (e.g. `python manage.py run_boost_library_tracker`). Scheduling uses `boost_collector_runner` with `config/boost_collector_schedule.yaml`. In production, Celery Beat invokes `python manage.py run_scheduled_collectors --schedule default --group <group_id>` for a group batch, or `python manage.py run_scheduled_collectors --schedule interval --interval-minutes <n>` for an interval batch. Manual runs of a single command differ from Beat's per-group schedule; use the Beat-style flags above when testing the YAML-driven path.
- Configuration: Django settings (e.g. `settings.py`), environment variables for the database URL and API keys (e.g. via `django-environ` or `python-decouple`).
```mermaid
flowchart LR
    subgraph sched [Scheduling]
        Beat[Celery_Beat]
        Task[run_scheduled_collectors_task]
        Cmd[run_scheduled_collectors]
        YAML[boost_collector_schedule.yaml]
    end
    subgraph apps [Collector_apps]
        C1[boost_library_tracker]
        C2[other_trackers]
    end
    subgraph core [Core]
        BC[BaseCollectorCommand]
        CB[CollectorBase]
    end
    Beat --> Task
    Task --> Cmd
    YAML --> Cmd
    Cmd --> C1
    Cmd --> C2
    C1 --> BC
    C2 --> BC
    BC --> CB
```
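The schedule file referenced above groups commands for batch runs. Its exact schema is project-specific; the sketch below is purely illustrative (group names, key names, and intervals are all assumptions, not the real format - check `config/boost_collector_schedule.yaml` in the repository for the authoritative structure):

```yaml
# config/boost_collector_schedule.yaml -- hypothetical sketch only;
# key names and nesting here are assumptions, not the real schema.
default:
  groups:
    nightly:
      - run_boost_library_tracker
      - run_boost_github_activity_tracker
interval:
  every_30_minutes:
    - run_other_trackers
```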
GitHub activity vs Boost library tracker: scheduled GitHub sync for Boost repos runs through `boost_library_tracker` (`run_boost_github_activity_tracker`, `collect_boost_library`, etc.). The `github_activity_tracker` app holds shared fetch/sync utilities, models, and maintenance commands (e.g. workspace migration); it is not the primary entry point for the nightly Boost GitHub collector. Use `boost_library_tracker` as the reference when adding or debugging that pipeline.
For supported imports from core, see Core_public_API.md.
- Must be developed in Python, version 3.11 or newer.
- Must expose one or more Django management commands in the app's `management/commands/` folder (e.g. `run_boost_library_tracker.py`). Register commands in `boost_collector_runner`'s `config/boost_collector_schedule.yaml` for scheduled runs.
- Project dependencies (including app-specific ones) are listed in the project root `requirements.txt`; all apps use the same virtual environment.
- Use Django settings for environment variables and constants (e.g. from `settings.py` or env vars loaded via `django-environ`).
- Use the project's logging configuration (`settings.LOGGING`); get a logger in your app (e.g. `logging.getLogger(__name__)`).
- Must use Django ORM for database access; all data access goes through Django models.
- Use migrations for schema changes; run `python manage.py migrate` as part of setup and deployment.
- Write access: only to your app's models (or shared models your app owns). Avoid writing to other apps' tables directly.
- Read-only access: you may read other apps' models when needed; prefer loose coupling and avoid circular imports.
- Do not define ForeignKey or ORM relationships to another app's models if it would create tight coupling or circular dependencies. To use another app's data, query it in your command or service and use the result in your logic.
- Management commands must exit with proper exit codes when run as scripts (e.g. from `run_scheduled_collectors`): 0 for success, non-zero for failure.
- App tasks should implement restart logic so that if a command is interrupted and run again, it can resume without redoing completed work.
- Check the database or state to see what has already been done; skip already processed items to avoid duplicate work.
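The exit-code and restart requirements above can be sketched independently of Django. The helper below is a hypothetical stand-in for the body of a command's `handle()`: `run_collector`, `is_processed`, and `process_one` are illustrative names, and in a real app the "already done" check would be a Django ORM query against the app's own models.

```python
import logging

logger = logging.getLogger(__name__)

def run_collector(items, is_processed, process_one):
    """Process items idempotently; return 0 on success, 1 on failure.

    Hypothetical sketch: `is_processed` and `process_one` stand in for the
    app's real logic, where the skip check would query the database for
    state recorded by previous runs.
    """
    try:
        for item in items:
            if is_processed(item):
                # Restart support: skip work a previous run already finished.
                logger.info("skipping already-processed item %r", item)
                continue
            process_one(item)
        return 0
    except Exception:
        logger.exception("collector failed")
        return 1

# As a standalone script you would propagate the code to the shell,
# e.g. sys.exit(run_collector(...)); inside a Django management command,
# raising CommandError signals the same non-zero exit to the caller.
```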
The project provides:
- Settings and configuration: `settings.py` (Django settings; database, logging, installed apps), and environment variables for the database URL, credentials, and API keys (e.g. via `django-environ` or `python-decouple`).
- Database: one PostgreSQL database shared by all apps; migrations are run from the project root.
- Execution: `manage.py` and management commands; within a single `run_scheduled_collectors` batch, app commands run sequentially in order. Separate Celery Beat entries may still run concurrently across workers.
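The sequential-batch contract can be sketched as below. This is not the actual `run_scheduled_collectors` implementation (its internals are an assumption here); it only illustrates that commands in one batch run in order and that the batch fails if any command fails.

```python
import logging

logger = logging.getLogger(__name__)

def run_batch(commands):
    """Run (name, callable) pairs in order; each callable returns an exit code.

    Hypothetical sketch of the batch contract: commands run sequentially,
    and the batch exits non-zero if any individual command failed.
    """
    failed = []
    for name, command in commands:
        code = command()
        if code != 0:
            logger.error("%s exited with %d", name, code)
            failed.append(name)
        else:
            logger.info("%s succeeded", name)
    return 0 if not failed else 1
```

Whether a real batch stops at the first failure or continues to the end is a project decision; this sketch continues and reports failure afterwards.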
Use these steps to get the Django project running on your machine.
- Clone the repository and open the project root (where `manage.py` lives).
- Create a virtual environment (e.g. `python -m venv .venv`) and activate it.
- Install dependencies (e.g. `pip install -r requirements.txt`).
- Copy the sample env file (e.g. `.env.example`) to `.env` and fill in values for the database URL, credentials, and any API keys (e.g. via `django-environ` or `python-decouple`).
- Ensure the database is reachable, then run migrations: `python manage.py migrate`.
- Run a single app command (e.g. `python manage.py run_boost_library_tracker`) to confirm the project works. To test the YAML-driven path as Celery Beat does, use `python manage.py run_scheduled_collectors --schedule default --group <group_id>` for a group batch, or `python manage.py run_scheduled_collectors --schedule interval --interval-minutes <n>` for an interval batch (see `config/boost_collector_schedule.yaml`).
Run tests often so you catch problems early.
- PostgreSQL for pytest: `config.test_settings` requires `DATABASE_URL` pointing at PostgreSQL (see README.md: `docker compose -f docker-compose.test.yml up -d`, then export `DATABASE_URL`/`SECRET_KEY`). This matches CI and avoids SQLite-only passes that fail in production.
- Before each commit: run the test suite for the code you changed (`python -m pytest` or a subset).
- For app commands: ensure the command runs successfully (e.g. `python manage.py run_boost_library_tracker` exits with 0 and does the expected work).
- Full workflow: run `python manage.py run_scheduled_collectors --schedule default --group <group_id>` or `--schedule interval --interval-minutes <n>` when testing the YAML-driven path (this matches how Celery Beat invokes it). Add tests for new behavior and keep them passing.
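Much of a collector's logic can be unit-tested as plain pytest without a database. The helper and test below are illustrative assumptions, not real project code; ORM-backed code paths still need the PostgreSQL setup described above.

```python
# Hypothetical example: test the pure "skip already-processed" helper
# a collector command might use (function and field names are illustrative).
def select_unprocessed(items, processed_ids):
    """Return only the items whose ids have not been processed yet."""
    return [item for item in items if item["id"] not in processed_ids]

def test_select_unprocessed_skips_completed_work():
    items = [{"id": 1}, {"id": 2}, {"id": 3}]
    assert select_unprocessed(items, {2}) == [{"id": 1}, {"id": 3}]
```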
This guide walks you from setup to merged code.
- Set up locally - Follow "Local development setup" above.
- Create a feature branch - Branch from `develop` (e.g. `git checkout develop && git pull && git checkout -b feature/your-feature-name`).
- Develop and test - Make your changes in the Django app. Run the testing workflow (e.g. run tests and the app command) after each logical change.
- Commit and push - Commit with clear messages and push the feature branch to the remote.
- Open a pull request - Open a PR targeting the `develop` branch. Describe what changed and how to test it.
- Address review - Respond to reviewer comments and update the PR as needed.
- Merge - After approval and passing checks, merge into `develop`. Follow "Merge Process" below for exact steps.
- Adding a new Django app - Add the app to `INSTALLED_APPS`, create models and migrations, add a management command in `management/commands/`, and add it to `config/boost_collector_schedule.yaml` under the right group and schedule (`boost_collector_runner`). Update docs as needed.
- Create a feature branch from the `develop` branch.
- Develop and test your app locally (run tests and the app command).
- Create a pull request targeting the `develop` branch in the project repository.
- Wait for review and address feedback.
- Create the app (e.g. `python manage.py startapp my_app`, or add the app folder to the project).
- Add the app to `INSTALLED_APPS` in settings.
- Add a management command (e.g. in `my_app/management/commands/run_my_app.py`) that runs the app logic and returns the correct exit code.
- Add the command to `config/boost_collector_schedule.yaml` under the right group with the right schedule (see Workflow.md).
- Create and run migrations; update documentation.
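For reference, the command module sits under nested packages, and both `management/` and `commands/` need an `__init__.py` so Django can discover the command (layout sketch; `my_app` is a placeholder name):

```
my_app/
    management/
        __init__.py
        commands/
            __init__.py
            run_my_app.py   # command name: run_my_app
    migrations/
    models.py
```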
Reviewers should check:
- Security and malicious code: Check for code that could expose internal information; review outbound requests; verify no sensitive data (credentials, URLs, tokens) is sent out; ensure no hardcoded credentials or secrets.
- Code quality: Developed using Python; follows Python and Django best practices; proper error handling.
- Database access: Uses Django ORM; only writes to own app's models; read-only access to other apps when needed; no tight coupling or circular imports; migrations included and applied.
- Integration: Uses Django settings and logging; has a management command in `management/commands/`; command is in the run order if it is a collector; implements restart logic where needed.
- Testing: Tests included and passing; app command runs successfully; no breaking changes.
- Documentation: README or docstrings updated if needed; code comments where appropriate.
- Pull request approved by reviewers.
- All checks passing (tests, linting, etc.).
- Merge to the `develop` branch:

  ```shell
  git checkout develop
  git merge feature/your-feature-name
  git push origin develop
  ```

- After testing, merge to `main` when ready for production.
- `main`: Main/production branch (stable code).
- `develop`: Development branch (active development).
- Feature branches: Created from `develop` for features. Developers must branch from `develop`; do not branch from `main`.
Pull requests target the develop branch.
- Command lives in `my_app/management/commands/run_my_app.py` (the filename must match the command name: `run_my_app`).
- Uses Django logging and the Django ORM; no separate session or config module.
- Returns 0 for success and non-zero for failure so `run_scheduled_collectors` can detect failures.
- Implements restart logic inside the task (check what is already processed and skip it).
- Workflow.md - Main application workflow and execution order.
- Schema.md - Database schema and table relationships.
- README.md - Project overview and quick start.