Production-grade DevOps platform demonstrating end-to-end cloud-native system design and deployment.
Built as a real-world backend + DevOps project, this platform showcases hands-on experience with Kubernetes, Docker, CI/CD pipelines, Infrastructure as Code (Terraform on AWS), and distributed system design.
Designed and implemented by an MSc Computer Science student with a focus on Backend & DevOps engineering.
- End-to-end CI/CD pipeline (GitHub Actions → GHCR → Kubernetes)
- Kubernetes deployment on AWS EKS with Helm and HPA autoscaling
- Infrastructure provisioning with Terraform (VPC, EKS, RDS)
- Production-ready backend (FastAPI + PostgreSQL + Redis)
- Observability stack (Prometheus, Grafana, Jaeger)
- Load-tested system (k6, p95 < 500ms)

                      ┌──────────────────────────────────┐
                      │          GitHub Actions          │
                      │  lint → test → scan → build      │
                      │       → load test (k6) → deploy  │
                      └────────────┬─────────────────────┘
                                   │ push image
                                   ▼
                            ┌──────────────┐
                            │  GHCR Image  │
                            │   Registry   │
                            └──────┬───────┘
                                   │ helm upgrade
    ┌──────────────────────────▼────────────────────────────┐
    │                    AWS EKS Cluster                    │
    │  ┌──────────────────────────────────────────────┐     │
    │  │          devops-platform namespace           │     │
    │  │  ┌─────────┐  ┌─────────┐  ┌─────────┐       │     │
    │  │  │  Pod 1  │  │  Pod 2  │  │  Pod N  │       │     │
    │  │  │ FastAPI │  │ FastAPI │  │ FastAPI │       │     │
    │  │  └────┬────┘  └────┬────┘  └────┬────┘       │     │
    │  │       └───────────┬┘            │  HPA       │     │
    │  │               ┌───▼──────┐   min:2 max:6     │     │
    │  │               │ Service  │                   │     │
    │  │               └───┬──────┘                   │     │
    │  │               ┌───▼──────┐                   │     │
    │  │               │ Ingress  │                   │     │
    │  └───────────────┴──────────┴───────────────────┘     │
    │            │                          │               │
    │     ┌──────▼──────┐         ┌─────────▼──────┐        │
    │     │ RDS Postgres│         │  ElastiCache   │        │
    │     │  (private)  │         │     Redis      │        │
    │     └─────────────┘         └────────────────┘        │
    └───────────────────────────────────────────────────────┘
                                   │
                    ┌──────────────▼───────────────┐
                    │        Observability         │
                    │  Prometheus → Grafana :3000  │
                    │  Jaeger tracing :16686       │
                    └──────────────────────────────┘

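
The `min:2 max:6` bounds in the diagram constrain what the HPA may request. As a rough sketch of the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × currentMetric / targetMetric), skipped when the ratio is within a 10% tolerance), the function below clamps the result to those bounds. The metric values in the usage lines are hypothetical; the chart's actual target metric is not stated here.

```python
import math

MIN_REPLICAS = 2   # from the diagram (min:2)
MAX_REPLICAS = 6   # from the diagram (max:6)
TOLERANCE = 0.1    # Kubernetes' default HPA tolerance


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Approximate the HPA decision for a single metric."""
    ratio = current_metric / target_metric
    # Within tolerance the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(MIN_REPLICAS, min(MAX_REPLICAS, wanted))


# Hypothetical CPU utilisation figures, target 60%:
print(desired_replicas(2, 90, 60))   # scale up
print(desired_replicas(4, 200, 50))  # capped at MAX_REPLICAS
```

The real controller also accounts for pod readiness and stabilization windows, which this sketch ignores.
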
| Layer | Technology |
|---|---|
| Application | Python 3.12 / FastAPI / SQLAlchemy |
| Authentication | JWT (python-jose) + bcrypt password hashing |
| Database | PostgreSQL 16 + Alembic migrations |
| Caching | Redis 7 (write-through, TTL 60s, graceful degradation) |
| Rate Limiting | slowapi (60 req/min reads, 30 req/min writes) |
| Observability | Prometheus metrics + Jaeger distributed tracing |
| Containerization | Docker (multi-stage build, non-root user) |
| Local Dev | Docker Compose (app + postgres + redis + jaeger + grafana) |
| CI/CD | GitHub Actions (lint → test → scan → build → load test) |
| Load Testing | k6 (smoke test in CI, p95 < 500ms threshold) |
| Image Registry | GitHub Container Registry (GHCR) |
| Infrastructure | Terraform (AWS VPC + EKS + RDS) |
| Orchestration | Kubernetes via Helm chart (HPA, rolling updates) |
| Security Scanning | Trivy (CRITICAL/HIGH CVEs) |
Prerequisites: Docker + Docker Compose

    git clone https://github.com/APapafragkakis/DevOps-Platform
    cd DevOps-Platform
    docker compose up -d --build

| Service | URL |
|---|---|
| API docs | http://localhost:8000/docs |
| Metrics | http://localhost:8000/metrics |
| Grafana | http://localhost:3000 (admin/admin) |
| Prometheus | http://localhost:9090 |
| Jaeger UI | http://localhost:16686 |
All /items endpoints require a JWT token.

    # 1. Register
    curl -X POST http://localhost:8000/auth/register \
      -H "Content-Type: application/json" \
      -d '{"username": "alex", "password": "secret"}'

    # 2. Login: get token
    curl -X POST http://localhost:8000/auth/token \
      -d "username=alex&password=secret"

    # 3. Use token
    curl http://localhost:8000/items \
      -H "Authorization: Bearer <token>"

Or use the Authorize button in Swagger UI at /docs.

    make up          # Start all services
    make down        # Stop all services
    make logs        # Follow app logs
    make test        # Run tests with coverage
    make lint        # Run flake8
    make migrate     # Run Alembic migrations
    make migration msg="add users table"
    make health      # Curl /health
    make clean       # Remove containers + volumes

    git push (main)
        │
        ├── lint            flake8
        ├── test            pytest + coverage (SQLite in CI)
        ├── security        Trivy (CRITICAL/HIGH CVEs)
        ├── build           Docker multi-stage → push to GHCR
        ├── load-test       k6 smoke test (p95 < 500ms, error rate < 1%)
        ├── deploy staging  helm upgrade → smoke test /health
        └── deploy prod     manual approval → helm upgrade → auto-rollback

    helm upgrade --install devops-platform ./helm/devops-platform \
      --set secrets.databaseUrl="postgresql://..." \
      --set secrets.secretKey="your-secret-key" \
      --set secrets.redisUrl="redis://..." \
      --set image.tag="sha-abc1234"

    cd terraform
    terraform init
    terraform plan -var="environment=staging" -var="db_password=SECRET"
    terraform apply -var="environment=staging" -var="db_password=SECRET"
    aws eks update-kubeconfig --name $(terraform output -raw eks_cluster_name)

Cost note: EKS (~$0.10/hr) + RDS. Run terraform destroy when done.

    cd app
    pip install -r requirements.txt
    pytest tests/ -v --cov=. --cov-report=term-missing

Tests use in-memory SQLite: zero external dependencies.

Why JWT over sessions? JWTs are stateless — no shared session store needed across pods. Each token is self-contained and verified with a secret key, making horizontal scaling trivial.
Why Redis caching with graceful degradation? Every Redis call is wrapped in try/except. If Redis goes down, the app falls back to Postgres transparently — no cascading failure.
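
A minimal sketch of that fallback on the read path, assuming a client that exposes `get(key)` and `setex(key, ttl, value)` as redis-py does. `ResilientCache`, `get_item`, and `load_from_db` are hypothetical names, not the contents of the project's `cache.py`.

```python
import json
from typing import Any, Callable, Optional

CACHE_TTL_SECONDS = 60  # matches the TTL listed in the stack table


class ResilientCache:
    """Wraps a Redis-like client and treats any client failure as a cache miss."""

    def __init__(self, client: Any) -> None:
        self._client = client

    def get(self, key: str) -> Optional[Any]:
        try:
            raw = self._client.get(key)
        except Exception:  # real code would likely catch redis.exceptions.RedisError
            return None
        return None if raw is None else json.loads(raw)

    def set(self, key: str, value: Any) -> None:
        try:
            self._client.setex(key, CACHE_TTL_SECONDS, json.dumps(value))
        except Exception:
            pass  # a failed cache write must not fail the request


def get_item(item_id: int, cache: ResilientCache, load_from_db: Callable[[int], dict]) -> dict:
    key = f"item:{item_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    item = load_from_db(item_id)  # Postgres stays the source of truth
    cache.set(key, item)
    return item
```

With Redis unreachable, every call simply takes the database path, so the cost of an outage is latency rather than errors.
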
Why Helm over raw Kubernetes manifests? Helm templates let you deploy to staging and prod from the same chart with different values. No copy-pasting manifests, no config drift.
Why k6 in CI? Catching performance regressions before they hit production. The pipeline fails if p95 latency exceeds 500ms or error rate exceeds 1%.
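
The gate itself is a k6 threshold, but the arithmetic is easy to illustrate. The sketch below uses a nearest-rank percentile, which is a simplification (k6's own percentile calculation may interpolate differently); `gate_passes` and the sample data are hypothetical.

```python
import math
from typing import Sequence

P95_LIMIT_MS = 500.0      # pipeline threshold from the text
ERROR_RATE_LIMIT = 0.01   # 1%


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def gate_passes(durations_ms: Sequence[float], failed_requests: int) -> bool:
    """True when both thresholds hold; the CI job would fail otherwise."""
    p95 = percentile(durations_ms, 95)
    error_rate = failed_requests / len(durations_ms)
    return p95 < P95_LIMIT_MS and error_rate < ERROR_RATE_LIMIT


# 100 hypothetical requests: a few slow outliers are tolerated, a sixth one is not.
print(gate_passes([100.0] * 95 + [900.0] * 5, failed_requests=0))
print(gate_passes([100.0] * 94 + [900.0] * 6, failed_requests=0))
```
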
Why Alembic instead of create_all?
create_all can't alter existing tables and has no rollback.
Alembic gives versioned, reversible migrations that run automatically on startup.
Why maxUnavailable: 0 in rolling updates?
With 2 replicas, allowing 1 unavailable = 50% capacity loss during deploy.
maxUnavailable: 0 + maxSurge: 1 ensures zero-downtime deployments.
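
The capacity arithmetic can be checked with a few lines. This sketch assumes absolute (non-percentage) values for both settings; `rollout_bounds` is a hypothetical helper, not part of the chart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutBounds:
    min_available: int          # fewest serving pods at any moment of the rollout
    max_total: int              # most pods (old + new) that may exist at once
    worst_case_capacity: float  # min_available as a fraction of desired replicas


def rollout_bounds(replicas: int, max_unavailable: int, max_surge: int) -> RolloutBounds:
    if max_unavailable == 0 and max_surge == 0:
        # Kubernetes rejects this combination: the rollout could never progress.
        raise ValueError("maxUnavailable and maxSurge cannot both be 0")
    min_available = replicas - max_unavailable
    return RolloutBounds(min_available, replicas + max_surge, min_available / replicas)


print(rollout_bounds(2, max_unavailable=1, max_surge=0))  # half capacity mid-deploy
print(rollout_bounds(2, max_unavailable=0, max_surge=1))  # full capacity, one extra pod
```

The trade-off is that the cluster needs headroom for one additional pod during each deploy.
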
Why soft delete?
Hard deletes make production debugging much harder.
is_active = false means we can always audit what happened.
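
A minimal illustration of the pattern with the standard-library `sqlite3` module. The `items` schema and the helper names are assumptions made for the example; the project itself uses SQLAlchemy models.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " is_active INTEGER NOT NULL DEFAULT 1)"
    )
    return conn


def soft_delete(conn: sqlite3.Connection, item_id: int) -> bool:
    """Flip the flag instead of removing the row; False if nothing was active."""
    cur = conn.execute(
        "UPDATE items SET is_active = 0 WHERE id = ? AND is_active = 1", (item_id,)
    )
    conn.commit()
    return cur.rowcount == 1


def list_active(conn: sqlite3.Connection) -> list:
    """What the API would return: only rows that have not been soft-deleted."""
    rows = conn.execute("SELECT name FROM items WHERE is_active = 1 ORDER BY id")
    return [row[0] for row in rows]
```

The deleted row disappears from `list_active` but remains in the table, so it can still be inspected when debugging.
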

    DevOps-Platform/
    ├── app/
    │   ├── main.py              # routes, rate limiting, JWT protection
    │   ├── auth.py              # JWT token creation and validation
    │   ├── cache.py             # Redis layer with graceful degradation
    │   ├── tracing.py           # OpenTelemetry / Jaeger setup
    │   ├── database.py
    │   ├── models.py
    │   ├── schemas.py
    │   ├── crud.py
    │   ├── Dockerfile           # multi-stage, non-root, healthcheck
    │   ├── migrations/
    │   │   └── versions/
    │   │       ├── 0001_initial_items_table.py
    │   │       └── 0002_add_users_table.py
    │   └── tests/
    │       └── test_main.py     # 11 tests, SQLite in-memory
    ├── helm/
    │   └── devops-platform/     # Helm chart (Deployment, Service, HPA, Secret)
    ├── load-tests/
    │   └── smoke.js             # k6 load test (runs in CI)
    ├── docker-compose.yml       # app + postgres + redis + jaeger + grafana
    ├── Makefile
    ├── monitoring/
    │   └── prometheus.yml
    ├── k8s/                     # raw manifests (superseded by Helm chart)
    ├── terraform/               # AWS VPC + EKS + RDS
    └── .github/
        └── workflows/
            ├── ci.yml           # lint → test → security → build → load-test
            └── cd.yml           # staging → approval → production

- ArgoCD for GitOps-style continuous delivery
- AWS Secrets Manager integration
- Alembic migrations in the CD pipeline
- JWT refresh tokens
- Grafana dashboards as code
MIT