This document outlines the development requirements and guidelines for Django apps in the Boost Data Collector Django project.
- Django project: One Django project with multiple Django apps; all apps share the same virtual environment, settings, and database.
- Workflow: The project runs app tasks sequentially via management commands (e.g. `python manage.py run_boost_library_tracker`). Scheduling uses `boost_collector_runner` with `config/boost_collector_schedule.yaml`. In production, Celery Beat invokes `python manage.py run_scheduled_collectors --schedule default --group <group_id>` for a group batch, or `python manage.py run_scheduled_collectors --schedule interval --interval-minutes <n>` for an interval batch. Manual runs of a single command differ from Beat's per-group schedule; use the Beat-style flags above when testing the YAML-driven path.
- Configuration: Django settings (e.g. `settings.py`), environment variables for the database URL and API keys (e.g. via `django-environ` or `python-decouple`).
```mermaid
flowchart LR
    subgraph sched [Scheduling]
        Beat[Celery_Beat]
        Task[run_scheduled_collectors_task]
        Cmd[run_scheduled_collectors]
        YAML[boost_collector_schedule.yaml]
    end
    subgraph apps [Collector_apps]
        C1[boost_library_tracker]
        C2[other_trackers]
    end
    subgraph core [Core]
        BC[BaseCollectorCommand]
        CB[CollectorBase]
    end
    Beat --> Task
    Task --> Cmd
    YAML --> Cmd
    Cmd --> C1
    Cmd --> C2
    C1 --> BC
    C2 --> BC
    BC --> CB
```
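The schedule file referenced above groups commands for batch runs. Its exact schema is project-specific; the sketch below is purely illustrative (group names, key names, and intervals are all assumptions, not the real format - check `config/boost_collector_schedule.yaml` in the repository for the authoritative structure):

```yaml
# config/boost_collector_schedule.yaml -- hypothetical sketch only;
# key names and nesting here are assumptions, not the real schema.
default:
  groups:
    nightly:
      - run_boost_library_tracker
      - run_boost_github_activity_tracker
interval:
  every_30_minutes:
    - run_other_trackers
```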
GitHub activity vs Boost library tracker: scheduled GitHub sync for Boost repos runs through `boost_library_tracker` (`run_boost_github_activity_tracker`, `collect_boost_library`, etc.). The `github_activity_tracker` app holds shared fetch/sync utilities, models, and maintenance commands (e.g. workspace migration); it is not the primary entry point for the nightly Boost GitHub collector. Use `boost_library_tracker` as the reference when adding or debugging that pipeline.
For supported imports from core, see Core_public_API.md.
- Must be developed in Python, version 3.11 or newer.
- Must expose one or more Django management commands in the app's `management/commands/` folder (e.g. `run_boost_library_tracker.py`). Register commands in `boost_collector_runner`'s `config/boost_collector_schedule.yaml` for scheduled runs.
- Project dependencies (including app-specific ones) are listed in the project root `requirements.txt`; all apps use the same virtual environment.
- Use Django settings for environment variables and constants (e.g. from `settings.py` or env vars loaded via `django-environ`).
- Use the project's logging configuration (`settings.LOGGING`); get a logger in your app (e.g. `logging.getLogger(__name__)`).
- Must use Django ORM for database access; all data access goes through Django models.
- Use migrations for schema changes; run `python manage.py migrate` as part of setup and deployment.
- Write access: only to your app's models (or shared models your app owns). Avoid writing to other apps' tables directly.
- Read-only access: you may read other apps' models when needed; prefer loose coupling and avoid circular imports.
- Do not define ForeignKey or ORM relationships to another app's models if it would create tight coupling or circular dependencies. To use another app's data, query it in your command or service and use the result in your logic.
- Management commands must exit with proper exit codes when run as scripts (e.g. from `run_scheduled_collectors`): 0 for success, non-zero for failure.
- App tasks should implement restart logic so that if a command is interrupted and run again, it can resume without redoing completed work.
- Check the database or state to see what has already been done; skip already processed items to avoid duplicate work.
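The exit-code and restart requirements above can be sketched independently of Django. The helper below is a hypothetical stand-in for the body of a command's `handle()`: `run_collector`, `is_processed`, and `process_one` are illustrative names, and in a real app the "already done" check would be a Django ORM query against the app's own models.

```python
import logging

logger = logging.getLogger(__name__)

def run_collector(items, is_processed, process_one):
    """Process items idempotently; return 0 on success, 1 on failure.

    Hypothetical sketch: `is_processed` and `process_one` stand in for the
    app's real logic, where the skip check would query the database for
    state recorded by previous runs.
    """
    try:
        for item in items:
            if is_processed(item):
                # Restart support: skip work a previous run already finished.
                logger.info("skipping already-processed item %r", item)
                continue
            process_one(item)
        return 0
    except Exception:
        logger.exception("collector failed")
        return 1

# As a standalone script you would propagate the code to the shell,
# e.g. sys.exit(run_collector(...)); inside a Django management command,
# raising CommandError signals the same non-zero exit to the caller.
```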
The project provides:
- Settings and configuration: `settings.py` (Django settings; database, logging, installed apps), and environment variables for the database URL, credentials, and API keys (e.g. via `django-environ` or `python-decouple`).
- Database: one PostgreSQL database shared by all apps; migrations are run from the project root.
- Execution: `manage.py` and management commands; within a single `run_scheduled_collectors` batch, app commands run sequentially in order. Separate Celery Beat entries may still run concurrently across workers.
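The sequential-batch contract can be sketched as below. This is not the actual `run_scheduled_collectors` implementation (its internals are an assumption here); it only illustrates that commands in one batch run in order and that the batch fails if any command fails.

```python
import logging

logger = logging.getLogger(__name__)

def run_batch(commands):
    """Run (name, callable) pairs in order; each callable returns an exit code.

    Hypothetical sketch of the batch contract: commands run sequentially,
    and the batch exits non-zero if any individual command failed.
    """
    failed = []
    for name, command in commands:
        code = command()
        if code != 0:
            logger.error("%s exited with %d", name, code)
            failed.append(name)
        else:
            logger.info("%s succeeded", name)
    return 0 if not failed else 1
```

Whether a real batch stops at the first failure or continues to the end is a project decision; this sketch continues and reports failure afterwards.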
Use these steps to get the Django project running on your machine.
- Clone the repository and open the project root (where `manage.py` lives).
- Create a virtual environment (e.g. `python -m venv .venv`) and activate it.
- Install dependencies (e.g. `pip install -r requirements.txt`).
- Copy the sample env file (e.g. `.env.example`) to `.env` and fill in values for the database URL, credentials, and any API keys (e.g. via `django-environ` or `python-decouple`).
- Ensure the database is reachable, then run migrations: `python manage.py migrate`.
- Run a single app command (e.g. `python manage.py run_boost_library_tracker`) to confirm the project works. To test the YAML-driven path as Celery Beat does, use `python manage.py run_scheduled_collectors --schedule default --group <group_id>` for a group batch, or `python manage.py run_scheduled_collectors --schedule interval --interval-minutes <n>` for an interval batch (see `config/boost_collector_schedule.yaml`).
Run tests often so you catch problems early.
- PostgreSQL for pytest: `config.test_settings` requires `DATABASE_URL` pointing at PostgreSQL (see README.md: `docker compose -f docker-compose.test.yml up -d`, then export `DATABASE_URL`/`SECRET_KEY`). This matches CI and avoids SQLite-only passes that fail in production.
- Before each commit: run the test suite for the code you changed (`python -m pytest` or a subset).
- For app commands: ensure the command runs successfully (e.g. `python manage.py run_boost_library_tracker` exits with 0 and does the expected work).
- Full workflow: run `python manage.py run_scheduled_collectors --schedule default --group <group_id>` or `--schedule interval --interval-minutes <n>` when testing the YAML-driven path (this matches how Celery Beat invokes it). Add tests for new behavior and keep them passing.
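Much of a collector's logic can be unit-tested as plain pytest without a database. The helper and test below are illustrative assumptions, not real project code; ORM-backed code paths still need the PostgreSQL setup described above.

```python
# Hypothetical example: test the pure "skip already-processed" helper
# a collector command might use (function and field names are illustrative).
def select_unprocessed(items, processed_ids):
    """Return only the items whose ids have not been processed yet."""
    return [item for item in items if item["id"] not in processed_ids]

def test_select_unprocessed_skips_completed_work():
    items = [{"id": 1}, {"id": 2}, {"id": 3}]
    assert select_unprocessed(items, {2}) == [{"id": 1}, {"id": 3}]
```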
This guide walks you from setup to merged code.
- Set up locally - Follow "Local development setup" above.
- Create a feature branch - Branch from `develop` (e.g. `git checkout develop && git pull && git checkout -b feature/your-feature-name`).
- Develop and test - Make your changes in the Django app. Run the testing workflow (e.g. run tests and the app command) after each logical change.
- Commit and push - Commit with clear messages and push the feature branch to the remote.
- Open a pull request - Open a PR targeting the `develop` branch. Describe what changed and how to test it.
- Address review - Respond to reviewer comments and update the PR as needed.
- Merge - After approval and passing checks, merge into `develop`. Follow "Merge Process" below for exact steps.
- Adding a new Django app - Add the app to `INSTALLED_APPS`, create models and migrations, add a management command in `management/commands/`, and add it to `config/boost_collector_schedule.yaml` under the right group and schedule (`boost_collector_runner`). Update docs as needed.
- Create a feature branch from the `develop` branch.
- Develop and test your app locally (run tests and the app command).
- Create a pull request targeting the `develop` branch in the project repository.
- Wait for review and address feedback.
- Create the app (e.g. `python manage.py startapp my_app`, or add the app folder to the project).
- Add the app to `INSTALLED_APPS` in settings.
- Add a management command (e.g. in `my_app/management/commands/run_my_app.py`) that runs the app logic and returns the correct exit code.
- Add the command to `config/boost_collector_schedule.yaml` under the right group with the right schedule (see Workflow.md).
- Create and run migrations; update documentation.
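For reference, the command module sits under nested packages, and both `management/` and `commands/` need an `__init__.py` so Django can discover the command (layout sketch; `my_app` is a placeholder name):

```
my_app/
    management/
        __init__.py
        commands/
            __init__.py
            run_my_app.py   # command name: run_my_app
    migrations/
    models.py
```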
Reviewers should check:
- Security and malicious code: Check for code that could expose internal information; review outbound requests; verify no sensitive data (credentials, URLs, tokens) is sent out; ensure no hardcoded credentials or secrets.
- Code quality: Developed using Python; follows Python and Django best practices; proper error handling.
- Database access: Uses Django ORM; only writes to own app's models; read-only access to other apps when needed; no tight coupling or circular imports; migrations included and applied.
- Integration: Uses Django settings and logging; has a management command in `management/commands/`; command is in the run order if it is a collector; implements restart logic where needed.
- Testing: Tests included and passing; app command runs successfully; no breaking changes.
- Documentation: README or docstrings updated if needed; code comments where appropriate.
- Pull request approved by reviewers.
- All checks passing (tests, linting, etc.).
- Merge to the `develop` branch:

  ```shell
  git checkout develop
  git merge feature/your-feature-name
  git push origin develop
  ```

- After testing, merge to `main` when ready for production.
- `main`: Main/production branch (stable code).
- `develop`: Development branch (active development).
- Feature branches: Created from `develop` for features. Developers must branch from `develop`; do not branch from `main`.
Pull requests target the develop branch.
- Command lives in `my_app/management/commands/run_my_app.py` (the filename must match the command name: `run_my_app`).
- Uses Django logging and the Django ORM; no separate session or config module.
- Returns 0 for success and non-zero for failure so `run_scheduled_collectors` can detect failures.
- Implements restart logic inside the task (check what is already processed and skip it).
- Workflow.md - Main application workflow and execution order.
- Schema.md - Database schema and table relationships.
- README.md - Project overview and quick start.