Command-line interface for Flash, a distributed inference and serving framework.
Install Flash:
pip install runpod-flash

Create a new project, navigate to it, and install dependencies:
flash init my-project
cd my-project
uv sync # or: pip install -r requirements.txt

Authenticate with RunPod (saves API key to ~/.runpod/config.toml):
flash login

Alternatively, set your API key via environment variable or .env file:
export RUNPOD_API_KEY=your_api_key_here
# or add to .env file

Deploy your application to RunPod:
flash deploy

Run your script to call deployed endpoints:
python gpu_demo.py

For local development with hot-reload:
flash dev

Create a new Flash project.
flash init [PROJECT_NAME] [OPTIONS]
Options:
--force, -f: Overwrite existing files
Examples:
flash init my-project
flash init .
flash init my-project --force

Build Flash application for deployment.
flash build [OPTIONS]
Options:
--no-deps: Skip transitive dependencies during pip install
--keep-build: Keep .flash/.build directory after creating archive
--output, -o: Custom archive name (default: artifact.tar.gz)
--exclude: Comma-separated packages to exclude (e.g., 'torch,torchvision')
--preview: Launch local test environment after build
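The --exclude value is a single comma-separated string. As a rough illustration of how such a value could be split and applied to a dependency list, here is a minimal Python sketch. parse_exclude and filter_requirements are hypothetical helpers written for this example, not part of Flash, and Flash's actual matching rules may differ.

```python
import re


def normalize(name: str) -> str:
    """Normalize a package name (lowercase, runs of -, _ and . become one -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_exclude(value: str) -> set[str]:
    """Split a comma-separated --exclude value into normalized package names."""
    # Hypothetical helper: tolerates stray spaces and empty entries.
    parts = (part.strip() for part in value.split(","))
    return {normalize(part) for part in parts if part}


def filter_requirements(requirements: list[str], exclude: set[str]) -> list[str]:
    """Drop requirement lines whose package name is in the exclude set."""
    kept = []
    for line in requirements:
        # Take the name before any version specifier, extras, or marker.
        name = re.split(r"[<>=!~\[;\s]", line.strip(), maxsplit=1)[0]
        if normalize(name) not in exclude:
            kept.append(line)
    return kept


if __name__ == "__main__":
    excluded = parse_exclude("torch,torchvision, torchaudio")
    print(filter_requirements(["torch==2.3.0", "numpy>=1.26", "torchvision"], excluded))
```

Normalizing both sides before comparing means that spellings such as Torch_Vision and torch-vision are treated as the same package, which is an assumption of this sketch.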
Example:
flash build
flash build --preview
flash build --keep-build --output deploy.tar.gz
flash build --exclude torch,torchvision,torchaudio

Build and deploy Flash applications to RunPod Serverless endpoints.
flash deploy [OPTIONS]
Options:
--env, -e: Target environment name
--app, -a: Flash app name
--no-deps: Skip transitive dependencies during pip install
--exclude: Comma-separated packages to exclude (e.g., 'torch,torchvision')
--output, -o: Custom archive name (default: artifact.tar.gz)
--preview: Build and launch local preview instead of deploying
Examples:
flash deploy
flash deploy --env staging
flash deploy --exclude torch,torchvision,torchaudio
flash deploy --preview

Start a Flash development server for testing, debugging, and local development.
flash run is a hidden alias for flash dev.
flash dev [OPTIONS]
Options:
--host: Host to bind to (default: localhost)
--port, -p: Port to bind to (default: 8888)
--reload/--no-reload: Enable auto-reload (default: enabled)
--auto-provision: Auto-provision Serverless endpoints on startup (default: disabled)
Example:
flash dev
flash dev --port 3000

Manage deployment environments for your Flash applications.
flash env <subcommand> [OPTIONS]
Subcommands:
list: Show all available environments
create <name>: Create a new environment
get <name>: Get detailed environment information
delete <name>: Delete an environment and its resources
Options:
--app, -a: Flash app name (auto-detected if in project directory)
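The auto-detection mentioned here, together with the FLASH_APP and FLASH_ENV variables and their documented defaults (the current directory name and "production"), suggests a simple fallback chain. The sketch below is one plausible reading. resolve_app and resolve_env are hypothetical helpers, and the precedence (explicit flag, then environment variable, then default) is an assumption rather than confirmed Flash behavior.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_app(flag: Optional[str] = None,
                env: Mapping[str, str] = os.environ,
                cwd: Optional[Path] = None) -> str:
    """Pick the app name: --app flag, then FLASH_APP, then directory name."""
    if flag:
        return flag
    if env.get("FLASH_APP"):
        return env["FLASH_APP"]
    # Documented default: the name of the current directory.
    return (cwd or Path.cwd()).name


def resolve_env(flag: Optional[str] = None,
                env: Mapping[str, str] = os.environ) -> str:
    """Pick the environment: --env flag, then FLASH_ENV, then "production"."""
    if flag:
        return flag
    return env.get("FLASH_ENV") or "production"
```

Passing the environment mapping and working directory as parameters keeps the helpers easy to test without touching the real process environment.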
Examples:
flash env list
flash env create staging
flash env get production
flash env delete dev

Manage Flash apps (cloud-side organizational units that group deployment environments, build artifacts, and configuration).
flash app <subcommand> [OPTIONS]
Subcommands:
list: Show all Flash apps
create <name>: Create a new Flash app
get <name>: Get detailed app information
delete: Delete an app and all associated resources
Options:
--app, -a: Flash app name (required for delete)
Examples:
flash app list
flash app create my-project
flash app get my-project
flash app delete --app my-project

Manage and delete RunPod serverless endpoints.
flash undeploy [NAME|list] [OPTIONS]
Options:
--all: Undeploy all endpoints (requires confirmation)
--interactive, -i: Interactive checkbox selection
--cleanup-stale: Remove inactive endpoints from tracking
Examples:
flash undeploy list
flash undeploy my-api
flash undeploy --all
flash undeploy --interactive
flash undeploy --cleanup-stale

Flash automatically logs CLI activity to local files during development for debugging and auditing.
Quick configuration:
export FLASH_FILE_LOGGING_ENABLED=false # disable file logging
export FLASH_LOG_RETENTION_DAYS=7 # keep only 7 days of logs
export FLASH_LOG_DIR=/var/log/flash # custom log directory
Default location: .flash/logs/activity.log
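To make the three logging variables concrete, the following Python sketch shows one way such settings could be read. LogSettings and load_log_settings are hypothetical and not part of Flash. The accepted spellings of "false", the fallback retention of 30 days, and the reuse of the activity.log file name inside a custom directory are assumptions made only for this example. The .flash/logs default comes from the location stated above.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class LogSettings:
    enabled: bool
    retention_days: int
    log_file: Path


def load_log_settings(env: Mapping[str, str] = os.environ) -> LogSettings:
    """Read the FLASH_* logging variables into a settings object."""
    # Assumption: only an explicit "false", "0", or "no" turns file logging off.
    raw_enabled = env.get("FLASH_FILE_LOGGING_ENABLED", "true").strip().lower()
    enabled = raw_enabled not in {"false", "0", "no"}
    # Assumption: 30 is a placeholder; the real default retention is not stated here.
    retention_days = int(env.get("FLASH_LOG_RETENTION_DAYS", "30"))
    # Assumption: a custom directory keeps the activity.log file name.
    log_dir = Path(env.get("FLASH_LOG_DIR", ".flash/logs"))
    return LogSettings(enabled, retention_days, log_dir / "activity.log")
```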
my-project/
├── gpu_worker.py # GPU worker with @Endpoint function
├── cpu_worker.py # CPU worker with @Endpoint function
├── lb_worker.py # Load-balanced HTTP endpoint
├── .env
├── pyproject.toml # Python dependencies (uv/pip compatible)
└── README.md
Required in .env:
RUNPOD_API_KEY=your_api_key_here
Optional:
FLASH_APP=my-project # defaults to current directory name
FLASH_ENV=staging # defaults to "production"
FLASH_SENTINEL_TIMEOUT=120 # sentinel request timeout in seconds (default: 90)

# health check
curl http://localhost:8888/ping
# QB endpoint
curl -X POST http://localhost:8888/gpu_worker/runsync \
-H "Content-Type: application/json" \
-d '{"input": {"message": "Hello GPU!"}}'
# LB endpoint
curl -X POST http://localhost:8888/lb_worker/process \
-H "Content-Type: application/json" \
-d '{"input": "test"}'

flash --help
flash init --help
flash dev --help
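
The curl requests shown earlier for the local dev server can also be issued from Python. This sketch uses only the standard library and assumes flash dev is listening on its default localhost:8888. build_request and call are hypothetical helpers, the paths and payloads are copied from the curl examples, and the assumption that responses are JSON is not confirmed by this document.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8888"  # default host and port of `flash dev`


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for a local dev-server route."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the response body (assumed to be JSON)."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example calls (these need a running `flash dev` server):
#   call("/gpu_worker/runsync", {"input": {"message": "Hello GPU!"}})
#   call("/lb_worker/process", {"input": "test"})
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body without a server running.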