diff --git a/docs/.pages b/docs/.pages
index a2142ed..d6083f3 100644
--- a/docs/.pages
+++ b/docs/.pages
@@ -1,5 +1,6 @@
 nav:
   - index.md
   - project
+  - design
   - documentation
   - appendices
diff --git a/docs/design/.pages b/docs/design/.pages
new file mode 100644
index 0000000..e12805f
--- /dev/null
+++ b/docs/design/.pages
@@ -0,0 +1,7 @@
+title: Design
+nav:
+  - index.md
+  - stories
+  - requirements
+  - architecture
+  - specifications
diff --git a/docs/design/architecture/architecture.md b/docs/design/architecture/architecture.md
new file mode 100644
index 0000000..89cd997
--- /dev/null
+++ b/docs/design/architecture/architecture.md
@@ -0,0 +1,163 @@
+# Architecture
+
+A detailed implementation diagram will be produced to describe the overall architecture of the infrastructure, including its components, their interactions and their deployment.
+
+
+## Application
+
+
+### Proxy: Traefik
+
+* Application router
+* High-availability by fault tolerance
+* Load-balancing by request distribution
+    ALB ?
+* Helm chart
+* Version
+
+
+### Application: FastAPI
+
+* Python
+    * Version:
+    * Packaging
+* Project management? Poetry?
+
+
+### Databases: PostgreSQL
+
+* The application uses the well-known, battle-tested OSS PostgreSQL database management system.
+
+    FIXME pg version
+
+* Use a specialized Kubernetes operator to manage the PostgreSQL cluster
+
+    Zalando's **postgres-operator** is chosen because...
+    FIXME main reason(s) to choose it over alternatives
+
+    Alternative OSS solutions available are: ...
+    FIXME list alternative solutions considered
+
+* Helm chart
+* Single read/write Master / Multi read-only slaves architecture
+
+    It provides both high availability and read load balancing, without the additional complexity of multi-master configuration, which the application doesn't require at the moment.
+
+* Master failover
+
+
+## Infrastructure
+
+
+### Network
+
+
+#### Application Load Balancer (ALB)
+
+
+#### Virtual Private Cloud (VPC)
+
+* VPC subnet
+
+
+#### Internet Gateways (IGW)
+
+* Global egress
+
+
+#### Availability Zones (AZ)
+
+* 3 availability zones for high availability
+
+
+#### NAT Gateways (NGW)
+
+* Per AZ egress
+* In public subnet
+
+
+#### Subnets
+
+
+##### Public subnets
+
+* Application access
+* Bastion access
+* NAT gateways ?
+
+
+##### Private subnets
+
+* Database
+* Admin
+
+
+### Compute
+
+#### Nodes
+
+* EC2 compute nodes
+    * Application nodes
+    * Database nodes
+    * Bastion node
+* Auto-scaling groups
+
+
+#### Orchestration: Kubernetes
+
+#### Pods
+
+##### ReplicaSets
+
+* Traefik
+* Application
+* Database
+* cert-manager
+* Metrics server
+* bastion
+
+##### DaemonSets
+
+* Any ?
+
+
+### Storage
+
+#### CSI
+
+##### EBS
+
+* For persistent storage needs (application data, logs, and monitoring metrics), Amazon Elastic Block Store (EBS) provides persistent block devices to stateful pods
+* To integrate EBS with Kubernetes, the Container Storage Interface (CSI) is used through the `aws-ebs-csi-driver` EKS cluster add-on, which registers EBS volumes as a backend for Kubernetes Persistent Volumes.
+
+
+
+## CI/CD
+
+### GitHub Actions
+
+
+#### Actions
+
+#### Workflows
+
+* Application
+* Infrastructure
+
+
+## Observability
+
+### Log management (ELK/EFK)
+
+### Metrics (Prometheus/Grafana)
+
+### Event and alerting
+
+
+## Business continuity & Disaster Recovery
+
+### TODO Recap HA features
+
+### Backup
+
+### Disaster Recovery
diff --git a/docs/design/architecture/architecture_outline.md b/docs/design/architecture/architecture_outline.md
new file mode 100644
index 0000000..e00824b
--- /dev/null
+++ b/docs/design/architecture/architecture_outline.md
@@ -0,0 +1,236 @@
+# Architecture outline
+
+Kanban = Pull system
+WIP = 2 maximum
+
+? GH Action link status to labels
+
+
+## Architecture requirements
+
+[The twelve-factor app](https://12factor.net/)
+
+* Microservices application
+    * N-tier
+    * Containerized
+* Cloud (EKS)/Kubernetes based infrastructure
+* Automated provisioning and deployment
+* High availability
+* Auto-scaling
+* Load balancing
+* Backup
+* Disaster Recovery
+* Versioning and rollback management of application and infrastructure
+* Network isolation
+* Identity and Access Management
+* Docs as first class citizen
+
+
+## Architecture components
+
+
+### Application stack
+
+* Reverse-Proxy / Load-Balancer: Traefik
+* Application: FastAPI
+    * Language: Python
+* Database:
+    * AWS RDS
+        * Multi-AZ
+        * Master/Slave
+    * AWS Aurora
+    * PostgreSQL
+
+
+## Infrastructure stack
+
+* Compute nodes (EC2)
+    * Node types
+    * NodeGroups
+        * App, DB, Bastion, Logs, Metrics
+    * Spread across private subnets
+* Container orchestration: EKS / GCP / k8s
+    * Namespaces
+        * (Dev /) Staging / Production
+        * Others ?
+    * Deployments / ReplicaSets
+    * DaemonSets
+    * Services
+    * Ingress
+    * ConfigMaps
+    * Secrets
+    * Horizontal Pod Autoscalers (HPAs)
+* Network
+    * DNS
+        * Route 53
+        * Resolver/AmazonProvidedDNS uses the VPC base address + 2
+    * Public IP
+        * Internet gateway
+            * Number ?
+    * Region
+        * Number: eu-west-3
+        * Choice
+    * VPC
+        * Availability zones
+            * Three availability zones in the chosen AWS region (eu-west-3)
+        * Subnets
+            * Public subnets
+            * Private subnets
+                * Application subnets: host application service nodes/containers
+                * Database subnets: host database service nodes/containers
+                * Admin subnets: host system administration and operation services
+            * CIDRs
+                | --- | --- | --- |
+                | VPC || 10.0.0.0/13 |
+                | Public subnets || 10.6.0.0/22 |
+                | | Spare subnet capacity | 10.6.0.0/24 |
+                | | Availability Zone 1 | 10.6.1.0/24 |
+                | | Availability Zone 2 | 10.6.2.0/24 |
+                | | Availability Zone 3 | 10.6.3.0/24 |
+                | App subnets || 10.4.0.0/22 |
+                | | Spare subnet capacity | 10.4.0.0/24 |
+                | | Availability Zone 1 | 10.4.1.0/24 |
+                | | Availability Zone 2 | 10.4.2.0/24 |
+                | | Availability Zone 3 | 10.4.3.0/24 |
+                | Database subnets with dedicated custom network ACL || 10.2.0.0/22 |
+                | | Spare subnet capacity | 10.2.0.0/24 |
+                | | Availability Zone 1 | 10.2.1.0/24 |
+                | | Availability Zone 2 | 10.2.2.0/24 |
+                | | Availability Zone 3 | 10.2.3.0/24 |
+                | Admin subnets with dedicated custom network ACL || 10.0.0.0/22 |
+                | | Spare subnet capacity | 10.0.0.0/24 |
+                | | Availability Zone 1 | 10.0.1.0/24 |
+                | | Availability Zone 2 | 10.0.2.0/24 |
+                | | Availability Zone 3 | 10.0.3.0/24 |
+                | Spare subnets capacity || 10.[1,3,5,7].0.0/22 |
+            * IP addressing
+                * 5 IP addresses in each subnet CIDR block are reserved and unavailable for use
+        * VPC Endpoint(s)
+        * NAT Gateways for each public subnet
+    * Security
+        * Security Groups
+            * Stateful Inbound only: source / port range / protocol
+            * Act as virtual stateful inbound firewalls with a `deny all` default rule
+            * All rules are evaluated before a decision is made
+            * Default VPC security group
+        * Network ACLs
+            * Subnet level stateless inbound and outbound access control
+            * Evaluated on a first match basis
+            * Extra layer of security on top of security groups
+            * Default VPC NACL: allows all inbound and outbound network traffic
+            * Special **`*`** rule: denies traffic if no rule matches
+            * Number / Type / Protocol / Port range / Source / Destination / Allow or Deny
+    * Security SSL/TLS
+        * CertManager
+    * Bastion
+        * SSH
+        * VPN
+* Storage
+    * Persistent Volumes
+        * Block Storage: EBS
+        * File Storage: ?
+* Artifacts registries
+    * ECR / GCR / Private registry
+
+
+## IaC / GitOps stack
+
+* Git
+* Terraform
+    * Providers
+    * State storage (S3?)
+    * Versioning
+    * RBAC Policies
+    * Terraform cloud ?
+* Helm
+    * Charts
+    * Values
+* Bash or Python maintenance and glue scripts
+* Go or Jinja templating
+* YAML data
+* GitHub Actions
+
+
+## CI/CD
+
+* Developers
+    * Code
+    * Commit
+    * Push
+* GitHub Repository
+    * Branches
+    * Pull Request
+    * Merge
+* GitHub Actions
+    * Triggers
+        * Repository events
+        * Manual
+    * Environment creation
+        * Checkout
+    * Actions
+        * Test code
+            * Static analysis
+            * Unit testing
+            * Code security testing
+            * Build testing
+        * Build artifacts
+        * Test artifacts
+            * Images security (Clair, Trivy)
+        * Release (push) staging artifacts to registry(ies)
+        * Deploy to Staging
+        * Test Staging deployment
+            * User Acceptance Testing / Functional testing
+            * Load testing
+            * DAST: Dynamic Application Security Testing
+        * Release (push) stable artifacts to registry(ies)
+        * Deploy to production
+
+* Pipelines
+    * Separation per microservice + docs
+
+* Environments
+    * Dev: provides app view/access to dev code
+    * Test: application testing
+    * Staging: application+env test
+
+
+## Observability
+
+
+### Architecture
+
+* ELK / Graylog ?
+* Prometheus
+* Grafana
+* Auto-discovery / agents
+    * Exporters (FastAPI ?)
+    * Filebeat / Fluentbit
+    * rsyslog
+* Alerting ?
+
+
+### Logs
+
+* System logs
+* Application logs
+* Process logs
+
+### Metrics
+
+* Compute statistics
+    * CPU
+    * Memory
+* Network statistics
+    * Latency
+    * Bandwidth
+* Storage statistics
+    * IOPS
+    * Bandwidth
+* Services statistics
+    * HTTP statistics
+    * Database statistics
+* Processes statistics
+    * Deployment statistics
+    * Testing statistics
+
+### Tracing
diff --git a/docs/design/index.md b/docs/design/index.md
new file mode 100644
index 0000000..258f83f
--- /dev/null
+++ b/docs/design/index.md
@@ -0,0 +1,13 @@
+# Project requirements and design
+
+A DevOps project implementing a containerized microservices application architecture in the Cloud requires a detailed study.
+
+User and system needs are discovered and gathered by writing user stories.
+
+From these, functional and non-functional requirements can be extracted to ground and guide system design.
+
+After careful elicitation, we focus our attention on the specific requirements for the key components of the system.
+
+Finally, we can design a target architecture and specify system characteristics and parameters.
+
+Choices are logged in versioned decision records kept together with the code.
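Review note on the CIDR plan added in `docs/design/architecture/architecture_outline.md`: the addressing table can be sanity-checked mechanically. The sketch below is only an illustration using Python's standard `ipaddress` module; the helper names (`az_subnets`, `check_plan`, `usable_addresses`) are hypothetical and not part of this change. It assumes each /22 tier is split into four /24 blocks in order (spare, AZ 1, AZ 2, AZ 3), as the table suggests, and applies the 5 reserved addresses per subnet stated in the outline.

```python
import ipaddress
from itertools import combinations

# CIDR values copied from the plan in architecture_outline.md.
VPC = ipaddress.ip_network("10.0.0.0/13")
TIERS = {
    "public": ipaddress.ip_network("10.6.0.0/22"),
    "app": ipaddress.ip_network("10.4.0.0/22"),
    "database": ipaddress.ip_network("10.2.0.0/22"),
    "admin": ipaddress.ip_network("10.0.0.0/22"),
}
# The outline states that 5 addresses per subnet are reserved.
RESERVED_PER_SUBNET = 5


def az_subnets(tier):
    """Split a /22 tier into four /24 blocks (assumed order: spare, AZ 1-3)."""
    return list(tier.subnets(new_prefix=24))


def check_plan():
    """Return a list of problems found in the plan (empty when consistent)."""
    problems = []
    for name, tier in TIERS.items():
        if not tier.subnet_of(VPC):
            problems.append(f"{name} tier {tier} is outside VPC {VPC}")
    for (a, net_a), (b, net_b) in combinations(TIERS.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} and {b} tiers overlap")
    return problems


def usable_addresses(subnet):
    """Addresses left in a subnet after the reserved ones are removed."""
    return subnet.num_addresses - RESERVED_PER_SUBNET


if __name__ == "__main__":
    print(check_plan() or "plan is consistent")
    for name, tier in TIERS.items():
        for block in az_subnets(tier):
            print(name, block, usable_addresses(block))
```

A check of this kind could run in the infrastructure workflow before provisioning, so that a later change to one tier cannot silently collide with another.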
diff --git a/docs/design/requirements/.pages b/docs/design/requirements/.pages new file mode 100644 index 0000000..3b80b74 --- /dev/null +++ b/docs/design/requirements/.pages @@ -0,0 +1,5 @@ +title: Requirements +nav: + - index.md + - tech_requirements.md + - key_components.md diff --git a/docs/design/requirements/index.md b/docs/design/requirements/index.md new file mode 100644 index 0000000..fc6c49e --- /dev/null +++ b/docs/design/requirements/index.md @@ -0,0 +1 @@ +# Project requirements diff --git a/docs/project/requirements/key_components.md b/docs/design/requirements/key_components.md similarity index 100% rename from docs/project/requirements/key_components.md rename to docs/design/requirements/key_components.md diff --git a/docs/design/requirements/principles.md b/docs/design/requirements/principles.md new file mode 100644 index 0000000..91c5cbe --- /dev/null +++ b/docs/design/requirements/principles.md @@ -0,0 +1,15 @@ +# Architecture principles + + +## Microservices + +Each functionality of the application has been broken down into autonomous, independent and specialized services. +Each microservice is responsible for a specific part of the application, making it easy to maintain, deploy and scale each component individually. + +Microservices communicate with each other via REST APIs, enabling flexible and decentralized communication. +This architecture allows the application to benefit from greater scalability, resilience and ease of deployment of new features. 
+ +## The Twelve-Factor app methodology + + +## Well-architected frameworks and industry best practices diff --git a/docs/project/requirements/tech_requirements.md b/docs/design/requirements/tech_requirements.md similarity index 89% rename from docs/project/requirements/tech_requirements.md rename to docs/design/requirements/tech_requirements.md index 651f961..6f5a0b8 100644 --- a/docs/project/requirements/tech_requirements.md +++ b/docs/design/requirements/tech_requirements.md @@ -1,4 +1,4 @@ -## Architectural and Technical Requirements +# Architectural and Technical Requirements -### Functional Requirements +## Functional Requirements -#### Containerized Microservices Architecture (Kubernetes) +### High Performance Asynchronous API Service (FastAPI) + +- **Requirement**: Develop APIs capable of handling high concurrency. +- **Details**: + - Use FastAPI for its asynchronous features. + - Ensure APIs are scalable and can handle simultaneous requests efficiently. + - Implement proper error handling and validation in API endpoints. + + +### Containerized Microservices Architecture (Kubernetes) - **Requirement**: Implement a scalable and resilient containerized microservices architecture. - **Details**: @@ -28,7 +37,7 @@ - Enable service discovery and dynamic routing. -#### Automated CI/CD Pipeline (GitHub Actions) +### Automated CI/CD Pipeline (GitHub Actions) - **Requirement**: Automate testing, building, and deployment processes. - **Details**: @@ -37,16 +46,7 @@ - Automate deployment to different environments (staging, production). -#### High Performance Asynchronous API Service (FastAPI) - -- **Requirement**: Develop APIs capable of handling high concurrency. -- **Details**: - - Use FastAPI for its asynchronous features. - - Ensure APIs are scalable and can handle simultaneous requests efficiently. - - Implement proper error handling and validation in API endpoints. 
- - -#### Database High Availability (PostgreSQL) +### Database High Availability (PostgreSQL) - **Requirement**: Ensure data integrity and availability with database replication and failover. - **Details**: @@ -55,7 +55,7 @@ - Regularly test failover to ensure data integrity and availability. -#### Dynamic Resource Allocation (Kubernetes) +### Dynamic Resource Allocation (Kubernetes) - **Requirement**: Implement pod autoscaling based on usage metrics. - **Details**: @@ -64,7 +64,7 @@ - Ensure autoscaling does not impact ongoing transactions or operations. -#### Comprehensive Application Testing +### Comprehensive Application Testing - **Requirement**: Integrate end-to-end testing tools in the CI/CD pipeline. - **Details**: @@ -73,7 +73,7 @@ - Automate test execution as part of the CI/CD pipeline. -#### API Documentation Automation +### API Documentation Automation - **Requirement**: Ensure API documentation is auto-generated and stays current with code changes. - **Details**: @@ -82,7 +82,7 @@ - Ensure documentation is clear, accurate, and easily accessible. -#### Data Persistence (EBS with Kubernetes) +### Data Persistence (EBS with Kubernetes) - **Requirement**: Implement persistent storage solutions for Kubernetes. - **Details**: @@ -91,7 +91,7 @@ - Manage data backup and recovery processes. -#### Kubernetes Resource Management (Helm) +### Kubernetes Resource Management (Helm) - **Requirement**: Simplify and manage Kubernetes deployments using Helm. - **Details**: @@ -100,7 +100,7 @@ - Manage Helm chart versions and releases. -#### Zero-Downtime Deployment +### Zero-Downtime Deployment - **Requirement**: Implement deployment strategies to avoid downtime. - **Details**: @@ -112,12 +112,16 @@ These detailed functional requirements provide a clear roadmap for developing and managing the microservices architecture, ensuring robustness, scalability, and efficiency. 
They cover the core aspects of system functionality, from infrastructure setup to application deployment and operation. -### Non-Functional Requirements +## Non-Functional Requirements Detailing and structuring the architectural and technical non-functional requirements will focus on how the system should operate and the qualities it should possess. These requirements are crucial for ensuring the system's reliability, security, and efficiency. +Non-functional requirements play a crucial role in ensuring the system's resilience, efficiency, and alignment with technical, business, and compliance standards. + +These user stories focus on the specifics of the technology stack, how each component serves specific technical needs and contributes to the overall functionality and efficiency of the system, including performance, security, maintainability, and scalability. + -#### SSL/TLS Integration (Traefik) +### SSL/TLS Integration (Traefik) - **Requirement**: Ensure secure communication via SSL/TLS. - **Details**: @@ -126,7 +130,7 @@ Detailing and structuring the architectural and technical non-functional require - Integrate SSL/TLS termination with Traefik for encrypted traffic handling. -#### Code Quality Assurance (CI Pipeline) +### Code Quality Assurance (CI Pipeline) - **Requirement**: Maintain high standards of code quality. - **Details**: @@ -135,7 +139,7 @@ Detailing and structuring the architectural and technical non-functional require - Set up quality gates to prevent merging substandard code. -#### Kubernetes Cluster Monitoring (Prometheus) +### Kubernetes Cluster Monitoring (Prometheus) - **Requirement**: Implement comprehensive monitoring of Kubernetes clusters. - **Details**: @@ -144,7 +148,7 @@ Detailing and structuring the architectural and technical non-functional require - Configure alerts for abnormal metrics or system behaviors. 
-#### Effective Log Management (ELK Stack) +### Effective Log Management (ELK Stack) - **Requirement**: Efficient handling and analysis of logs. - **Details**: @@ -153,7 +157,7 @@ Detailing and structuring the architectural and technical non-functional require - Ensure real-time log analysis and accessibility. -#### Consistent Infrastructure Provisioning (Terraform) +### Consistent Infrastructure Provisioning (Terraform) - **Requirement**: Achieve consistent and repeatable cloud infrastructure setup. - **Details**: @@ -162,7 +166,7 @@ Detailing and structuring the architectural and technical non-functional require - Maintain Terraform state files for tracking and managing infrastructure changes. -#### Enhanced Kubernetes Security (Network Policies) +### Enhanced Kubernetes Security (Network Policies) - **Requirement**: Implement robust network security within Kubernetes. - **Details**: @@ -171,7 +175,7 @@ Detailing and structuring the architectural and technical non-functional require - Regularly audit and update network policies to address new security needs. -#### Routine Performance Benchmarking +### Routine Performance Benchmarking - **Requirement**: Establish performance benchmarks and regular evaluations. - **Details**: @@ -180,7 +184,7 @@ Detailing and structuring the architectural and technical non-functional require - Use benchmark results to guide performance optimization efforts. -#### GDPR Compliance in Logging/Monitoring +### GDPR Compliance in Logging/Monitoring - **Requirement**: Adhere to GDPR and other data protection regulations. - **Details**: @@ -189,7 +193,7 @@ Detailing and structuring the architectural and technical non-functional require - Regularly review and update compliance measures as regulations evolve. -#### Cloud Resource Cost Monitoring (AWS Tools) +### Cloud Resource Cost Monitoring (AWS Tools) - **Requirement**: Optimize and monitor cloud resource usage and costs. 
- **Details**: @@ -198,7 +202,7 @@ Detailing and structuring the architectural and technical non-functional require - Set up alerts for budget overruns or unexpected cost spikes. -#### DNS Routing and Service Discovery (Kubernetes) +### DNS Routing and Service Discovery (Kubernetes) - **Requirement**: Efficiently manage service accessibility and DNS routing. - **Details**: @@ -212,7 +216,7 @@ These categorized requirements give a clearer view of what the system is expecte These non-functional requirements address critical aspects such as security, compliance, cost management, and performance optimization. They are essential for the system's overall robustness, security, and operational efficiency, providing a comprehensive framework for the non-functional aspects of the system's architecture and technical setup. -#### Secure Admin Access (Bastion and VPN) +### Secure Admin Access (Bastion and VPN) - **Requirement**: Implement secure and controlled administrative access mechanisms. - **Details**: @@ -221,7 +225,7 @@ These non-functional requirements address critical aspects such as security, com - **Security Reviews and Assessments**: Regularly assess and update the security measures to ensure the highest level of protection against new threats. -#### Technical Content Management (Docs as Code, DocOps) +### Technical Content Management (Docs as Code, DocOps) - **Requirement**: Manage technical documentation efficiently using Docs as Code and DocOps methodologies. - **Details**: @@ -230,7 +234,7 @@ These non-functional requirements address critical aspects such as security, com - **Collaboration Tools**: Implement tools that support collaborative writing and reviewing. -#### GitOps +### GitOps - **Requirement**: Apply GitOps principles for managing and automating the cloud-native application lifecycle. 
- **Details**: @@ -239,7 +243,7 @@ These non-functional requirements address critical aspects such as security, com - **Backup and Recovery of Repositories**: Implement strategies for backing up Git repositories and quick recovery in case of data loss. -#### Event Monitoring and Alerting +### Event Monitoring and Alerting - **Requirement**: Set up a comprehensive event monitoring and alerting system. - **Details**: @@ -248,7 +252,7 @@ These non-functional requirements address critical aspects such as security, com - **Integration with Notification Systems**: Seamlessly integrate with communication tools for prompt alert dissemination. -#### API Endpoints Routing (Traefik) +### API Endpoints Routing (Traefik) - **Requirement**: Efficiently manage and route API endpoints using Traefik. - **Details**: @@ -257,7 +261,7 @@ These non-functional requirements address critical aspects such as security, com - **Dynamic Configuration**: Allow for dynamic changes in routing rules without disrupting service. -#### Terraform State Management +### Terraform State Management - **Requirement**: Ensure secure and efficient management of Terraform states. - **Details**: @@ -266,7 +270,7 @@ These non-functional requirements address critical aspects such as security, com - **Backup and Recovery**: Automate backups of the state files and setup clear recovery mechanism in case of corruption or loss. -#### Artifact/Container Registry +### Artifact/Container Registry - **Requirement**: Maintain a secure and efficient artifact and container registry. - **Details**: @@ -275,7 +279,7 @@ These non-functional requirements address critical aspects such as security, com - **CI/CD Integration**: Seamlessly integrate with CI/CD pipelines for automated pushing and pulling of artifacts/containers. -#### IP Address Management +### IP Address Management - **Requirement**: Automate and optimize the management of IP addresses. 
- **Details**: @@ -284,7 +288,7 @@ These non-functional requirements address critical aspects such as security, com - **Network Service Integration**: Ensure seamless integration with existing network infrastructure like DNS, DHCP, and other network services. -#### Identity and Access Management (IAM) +### Identity and Access Management (IAM) - **Requirement**: Implement comprehensive identity and access management solutions. - **Details**: @@ -293,7 +297,7 @@ These non-functional requirements address critical aspects such as security, com - **Audit and Compliance**: Maintain detailed access logs of access and changes for security audits and compliance with regulations. -#### Requirements Documentation and Traceability +### Requirements Documentation and Traceability - **Requirement**: Maintain comprehensive and traceable documentation of all system requirements. - **Details**: @@ -302,7 +306,7 @@ These non-functional requirements address critical aspects such as security, com - **Version Control**: Use version control systems to track changes in requirements over time. -#### Service Discovery and Service Mesh +### Service Discovery and Service Mesh - **Requirement**: Implement efficient service discovery mechanisms and service mesh architecture. - **Details**: @@ -311,7 +315,7 @@ These non-functional requirements address critical aspects such as security, com - **Resilience and Observability**: Ensure the service mesh provides enhanced resilience and observability features. -#### Nodes and Containers Hardening +### Nodes and Containers Hardening - **Requirement**: Harden nodes and containers to enhance security. - **Details**: @@ -320,7 +324,7 @@ These non-functional requirements address critical aspects such as security, com - **Regular Security Scans**: Conduct regular security scans and vulnerability assessments. 
-#### Network Segmentation/Separation +### Network Segmentation/Separation - **Requirement**: Implement network segmentation to improve security and manageability. - **Details**: @@ -329,7 +333,7 @@ These non-functional requirements address critical aspects such as security, com - **Monitoring and Logging**: Monitor network segments for unusual activities and maintain logs for security and troubleshooting. -#### Backup +### Backup - **Requirement**: Establish a robust backup strategy for all critical components and data. - **Details**: @@ -338,7 +342,7 @@ These non-functional requirements address critical aspects such as security, com - **Backup Testing**: Regularly test backups to ensure data integrity and recoverability. -#### Disaster Recovery +### Disaster Recovery - **Requirement**: Develop and implement a comprehensive disaster recovery plan. - **Details**: @@ -347,7 +351,7 @@ These non-functional requirements address critical aspects such as security, com - **Offsite Storage**: Use offsite storage solutions for critical backups to safeguard against local disasters. -#### PKI and Certificate Authority +### PKI and Certificate Authority - **Requirement**: Establish and manage a Public Key Infrastructure (PKI) and Certificate Authority for digital certificate management. 
 - **Details**:
diff --git a/docs/design/specifications/specifications.md b/docs/design/specifications/specifications.md
new file mode 100644
index 0000000..247b185
--- /dev/null
+++ b/docs/design/specifications/specifications.md
@@ -0,0 +1 @@
+# Specifications
diff --git a/docs/design/stories/.pages b/docs/design/stories/.pages
new file mode 100644
index 0000000..9597a67
--- /dev/null
+++ b/docs/design/stories/.pages
@@ -0,0 +1,5 @@
+title: User stories
+nav:
+  - index.md
+  - Functional: functional.md
+  - Non-functional: non-functional.md
diff --git a/docs/design/stories/functional.md b/docs/design/stories/functional.md
new file mode 100644
index 0000000..db1acb8
--- /dev/null
+++ b/docs/design/stories/functional.md
@@ -0,0 +1,85 @@
+# Functional user stories
+
+The functional user stories present the specific actions that our Traefik/FastAPI/PostgreSQL application on Kubernetes must perform, ranging from infrastructure setup to application behavior and user interaction.
+
+These stories outline the core operational capabilities and tasks the system is expected to execute, defining the primary objectives and features essential for the project's success.
+
+This section serves as a roadmap for the development and implementation of key functionalities, ensuring that our system meets the desired operational goals and user needs.
+
+## Application user
+
+- As an application user, I expect the system to remain available no matter what problem the system may face.
+
+- As an application user, I expect the system to automatically scale during peak usage times to maintain performance and availability.
+
+
+## Architects & DevOps
+
+- As a system architect, I need to design a scalable microservices architecture so that the application can handle varying loads efficiently.
+
+- As a DevOps engineer, I need to automate server provisioning to ensure a consistent and repeatable cloud environment setup.
+ +- As a cloud engineer, I need to integrate managed Kubernetes, as well as other cloud services for enhanced functionality and compliance with industry and provider recommended best practices. + +- As a DevOps team lead, I want to establish a continuous feedback loop with the development team to iteratively improve the system based on real-time usage data. + + +## Ops + +- As a system administrator, I want to automate server creation using scripts and IaC tools and techniques so that the infrastructure setup is efficient (fast, unattended), reliable (tested) and repeatable. + +- As a network administrator, I need to configure an application router and reverse proxy to efficiently route incoming requests to the appropriate services. + +- As a Kubernetes operator, I want to configure clusters with appropriate resource allocation and scaling policies to manage application deployment effectively. + +- As an infrastructure manager, I need to ensure high availability and disaster recovery capabilities for the Kubernetes cluster and its workloads. + +- As a performance analyst, I want to track and optimize the resource utilization of the cloud infrastructure to maintain cost-effectiveness. + +- As a system administrator, I want to implement a solution for monitoring system performance and setting up alerts for any anomalies. + +- As a system administrator, I need to configure services for efficient log collection, analysis, and visualization. + + +## Devs & DBAs + +- As a developer, I need a continuous integration process that automatically tests the latest code changes with static analysis and unit testing tools, ensuring code quality. + +- As a developer, I need a continuous deployment process that automatically deploys the latest code changes to a test environment, ensuring rapid feedback and iteration. + +- As a developer, I want to utilize a solution for easy packaging, deployment and management of cluster resources. 
+
+- As a backend developer, I want to build RESTful APIs using FastAPI that are well-documented and easy to consume for front-end developers.
+
+- As a database administrator, I need to design a normalized PostgreSQL database schema that supports efficient data retrieval and storage.
+
+- As a database administrator, I want to use a state-of-the-art, cloud-ready, open source relational database management system.
+
+
+## Testing & QA
+
+- As a CI/CD engineer, I need to set up workflows for automating the build, test, and deployment processes of the application.
+
+- As a QA engineer, I want to integrate automated testing in the CI/CD pipeline to ensure code quality and reliability.
+
+- As a DevOps engineer, I want to automatically test and provision infrastructure using IaC.
+
+
+## Technical Content Creators & UX/DX
+
+- As a documentation writer, I need to create comprehensive user guides and API documentation for the system to facilitate easy usage and maintenance.
+
+- As a developer experience manager, I want the application API documentation to be constantly updated, synced with development, and published.
+
+- As a system architect, I want the requirements and specifications documents for the system to be tracked and versioned.
+
+- As a technical content manager, I want all docs to be managed as code using lightweight markup languages and text-based diagram syntaxes.
+
+
+## Compliance & Security Specialists
+
+- As a security specialist, I want to implement robust security measures in the infrastructure setup to protect against vulnerabilities and attacks.
+
+- As a security analyst, I need to enforce role-based access control in Kubernetes to provide secure and restricted access to the cluster resources.
+
+- As a compliance officer, I need to ensure that the entire application stack, including all tools and processes, adheres to relevant industry standards and regulations.
diff --git a/docs/design/stories/index.md b/docs/design/stories/index.md
new file mode 100644
index 0000000..7b33130
--- /dev/null
+++ b/docs/design/stories/index.md
@@ -0,0 +1,20 @@
+# Architectural and Technical User Stories
+
+
+
+These stories try to capture the requirements pertinent to the architecture, development, deployment and operation of a performant, robust, scalable, and secure microservices application in a cloud environment.
+
+Two distinct sets of user stories guide our project’s development and operational approach.
+
+The first set comprises functional user stories, which concentrate on the specific actions, behaviors, and capabilities the system must exhibit to meet the core operational goals.
+
+In contrast, the second set consists of non-functional user stories. These are essential in outlining the operational qualities and constraints of our system.
+
+Together, these two sets of user stories create a comprehensive blueprint that addresses both the 'what' and the 'how' aspects of the system, ensuring a balanced focus on functionality and operational excellence.
+
+They should provide a solid foundation for the design, development, deployment, and maintenance of the system, ensuring that all critical aspects are addressed.
diff --git a/docs/design/stories/non-functional.md b/docs/design/stories/non-functional.md
new file mode 100644
index 0000000..0e2f050
--- /dev/null
+++ b/docs/design/stories/non-functional.md
@@ -0,0 +1,68 @@
+# Non-functional user stories
+
+The non-functional user stories delve into the operational qualities, performance standards, and constraints that shape the efficiency and reliability of our Traefik/FastAPI/PostgreSQL application on Kubernetes.
+
+These stories encompass critical aspects such as security, scalability, maintainability, and compliance, which are pivotal for the system's overall robustness and effectiveness.
+ +They are integral to ensuring that, beyond its functionalities, the system adheres to the highest standards of operational excellence and meets the broader technical and business requirements. + + +## Application user + +- As an application user, I expect quick and seamless deployment of updates without downtime using rolling updates or blue-green deployments in Kubernetes. + + +## Architects and DevOps + +- As a system architect, I want to design a containerized application architecture using orchestration to ensure scalability and resilience. + +- As a DevOps engineer, I need to implement a CI/CD pipeline using a workflow automation tool that automates testing, building, and deployment of microservices. + +- As a cloud engineer, I need to use an IaC tool and scripts to automate the provisioning of resources, ensuring infrastructure consistency and rollback capability. + +- As a DevOps engineer, I want to establish a performance benchmarking routine for the application to ensure it meets the desired performance criteria. + +- As an infrastructure engineer, I need to integrate a storage solution ensuring data persistence across stateful pod restarts. + + +## Ops + +- As a network administrator, I need to set up internal and external DNS routing to efficiently manage service discovery and accessibility. + +- As a container cluster operator, I want to set up horizontal autoscaling based on CPU and memory usage metrics to handle load dynamically. + +- As a system administrator, I want to set up detailed monitoring of container cluster metrics and application performance indicators. + +- As a logging manager, I need to configure log rotation and archiving strategies to manage log data effectively. + +- As a performance analyst, I want to monitor cloud resources using cost management tools and optimize where necessary.
+ + +## Devs & DBAs + +- As a backend developer, I want to utilize a modern Python framework to create asynchronous APIs that can handle high volumes of requests concurrently. + +- As a database administrator, I need to configure replication and failover mechanisms to ensure data integrity and availability. + +- As a developer, I want a solution to manage cluster resource complexities and simplify the deployment process. + + +## Testing & QA + +- As a CI/CD engineer, I need to integrate static code analysis tools in the CI pipeline to enforce code quality standards. + +- As a QA engineer, I want to implement end-to-end testing using tools in the CI/CD pipeline for thorough application testing. + + +## Compliance & Security Specialists + +- As a security specialist, I want to integrate SSL/TLS termination for secure communication and data protection. + +- As a security analyst, I need to set up cluster network policies to restrict internal traffic flows, enhancing network security. + +- As a compliance officer, I need to ensure logging and monitoring setups comply with GDPR and other relevant data protection regulations. + + +## Technical Content Creators & UX/DX + +- As a documentation writer, I need a solution to auto-generate API documentation that stays up-to-date with code changes.
diff --git a/docs/project/.pages b/docs/project/.pages index 991c7ae..7a77792 100644 --- a/docs/project/.pages +++ b/docs/project/.pages @@ -1,9 +1,9 @@ title: Project nav: - - index.md - - introduction.md + # - index.md + # - introduction.md - overview - - requirements + - app_overview.md - skills.md - situations - conclusion.md diff --git a/docs/project/requirements/app_overview.md b/docs/project/app_overview.md similarity index 100% rename from docs/project/requirements/app_overview.md rename to docs/project/app_overview.md diff --git a/docs/project/overview/.pages b/docs/project/overview/.pages index 93d8d10..43d8789 100644 --- a/docs/project/overview/.pages +++ b/docs/project/overview/.pages @@ -1,6 +1,6 @@ title: Project overview nav: - - index.md +# - index.md - brief.md - context.md - objectives.md diff --git a/docs/project/overview/brief.md b/docs/project/overview/brief.md index b42677d..42ede8e 100644 --- a/docs/project/overview/brief.md +++ b/docs/project/overview/brief.md @@ -8,7 +8,7 @@ tags: An example of a generic Python RESTful API application is proposed by DataScientest in a GitHub [repository](https://github.com/DataScientest/fastapi-docker-traefik). The project may also be suitable for those who want to create an application or modify it. -The candidate is asked to automatically deploy the application as a microservices architecture, on a completely new environment, in a secure way, and with API endpoinys proxied through {{Traekfik}}. +The candidate is asked to automatically deploy the application as a microservices architecture, on a completely new environment, in a secure way, and with API endpoints proxied through {{Traefik}}. The infrastructure should be containerized and orchestrated by {{Kubernetes}}, deployed in the Cloud on Amazon EKS, or on a local environment.
@@ -29,4 +29,4 @@ Project deliverables include: * A presentation slide desk for the project defense * A professional portfolio demonstrating DevOps related experiences -The full [DevOps project directives](../../appendices/directives/devops_project.md) is available in the appendices of the documentation. +The full [DevOps project directives](../../appendices/directives/devops_project.md) are available in the appendices of the documentation. diff --git a/docs/project/overview/context.md b/docs/project/overview/context.md index a2bc15e..98bebf5 100644 --- a/docs/project/overview/context.md +++ b/docs/project/overview/context.md @@ -12,19 +12,19 @@ This project is an integral part of the assessment for the {{DevU}} {{DevOps_Cer It serves as a comprehensive demonstration of the candidate's expertise and proficiency in key areas of {{DevOps}} practices. -The project aims to showcase a holistic approach to software development, deployment, and operations, emphasizing modern methodologies and tools like Agile Software Development Life Cycle, Continuous Integration and Continuous Deployment, Observability, and Infrastructure as Code. +The project aims to showcase a holistic approach to software development, deployment, and operations, emphasizing modern methodologies and tools like Agile Software Development Life Cycle, Continuous Integration and Deployment, Observability, and Infrastructure as Code. ## Purpose of the Project -The primary purpose of this project is to evaluate the candidate's ability to architect, build, deploy, and monitor a microservices-based application in a cloud environment, specifically within a Kubernetes cluster. +The primary purpose of this project is to evaluate the candidate's ability to architect, build, deploy, and monitor a microservices-based application system and CI/CD pipeline in a cloud environment, specifically within a Kubernetes cluster. 
It aims to assess the candidate's skill set in applying DevOps principles and practices to real-world scenarios, highlighting the importance of automation, continuous integration and deployment, and monitoring in the software development lifecycle and IT operations. The project is designed to showcase a wide range of skills and knowledge essential for a DevOps engineer. This includes, but is not limited to: -- Architecting a scalable and resilient microservices application. +- Architecting a scalable and resilient microservices application infrastructure. - Building the application with a focus on automation and best practices in software development. - Deploying the application to a container cluster, demonstrating skills in orchestration. - Implementing CI/CD tools and processes to enable automated testing and deployment pipelines. diff --git a/docs/project/overview/objectives.md b/docs/project/overview/objectives.md index 8b6b85f..68d2520 100644 --- a/docs/project/overview/objectives.md +++ b/docs/project/overview/objectives.md @@ -9,7 +9,7 @@ tags: ## Define application requirements and specification -- Analyze business needs to discover and gather the application's functional and non-functional requirements. +- Analyze application needs to discover and gather its functional and non-functional requirements. - Elicit and prioritize requirements to build a project plan for the application. @@ -41,7 +41,7 @@ tags: - Establish a test and pre-production environment for validating updates and configurations before they are deployed to production. 
-- Configure backup and Disaster recovery +- Configure backup and disaster recovery - Implement security best practices throughout diff --git a/docs/project/overview/planning.md b/docs/project/overview/planning.md index dfdbecc..faff8c8 100644 --- a/docs/project/overview/planning.md +++ b/docs/project/overview/planning.md @@ -8,19 +8,14 @@ tags: # Planning -## Stage 0: Project framing +## Stage 0: Project framing, ends 2023.10.25 ### Activities * Inception meeting -### Deadline - -2023.10.25 - - -## [Stage 1: Requirements specifications and Project setup](https://github.com/DevOps-Boot/fastapi-k8s/milestone/1) +## [Stage 1: Requirements specifications and Project setup](https://github.com/DevOps-Boot/fastapi-k8s/milestone/1), ends 2023.10.30 ### Activities @@ -42,12 +37,7 @@ tags: * Implementation architecture and workflow schemas -### Deadline - -2023.10.30 - - -## [Stage 2: Configure, deploy and secure application infrastructure](https://github.com/DevOps-Boot/fastapi-k8s/milestone/2) +## [Stage 2: Configure, deploy and secure application infrastructure](https://github.com/DevOps-Boot/fastapi-k8s/milestone/2), ends 2023.11.10 ### Activities @@ -70,12 +60,7 @@ tags: * Use Cloud services to host and deploy the application -### Deadline - -2023.11.10 - - -## [Stage 3: Data ingestion and consumption](https://github.com/DevOps-Boot/fastapi-k8s/milestone/3) +## [Stage 3: Data ingestion and consumption](https://github.com/DevOps-Boot/fastapi-k8s/milestone/3), ends 2023.11.20 ### Activities @@ -99,16 +84,11 @@ tags: * Database and/or storage implementation files * Query files -### Deadline -2023.11.20 - - -## [Stage 4: CI/CD pipelines](https://github.com/DevOps-Boot/fastapi-k8s/milestone/4) +## [Stage 4: CI/CD pipelines](https://github.com/DevOps-Boot/fastapi-k8s/milestone/4), ends 2023.11.25 !!! note This is the most important step in the project, as it is the heart of the DevOps training. -!!! 
### Activities @@ -130,12 +110,7 @@ The environments simply have to be separated. Source code should go through automated tests before being deployed to production to ensure correct behavior. -### Deadline - -2023.11.25 - - -## [Stage 5: Monitoring and Observability](https://github.com/DevOps-Boot/fastapi-k8s/milestone/5) +## [Stage 5: Monitoring and Observability](https://github.com/DevOps-Boot/fastapi-k8s/milestone/5), ends 2023.12.01 ### Activities @@ -158,12 +133,7 @@ Using a monitoring system, which includes a server, collection agents and a moni * Define automatic actions in response to these alarms. -### Deadline - -2023.12.01 - - -## Stage 6: Automation +## Stage 6: Automation, ends 2023.12.05 **Automate deployment of infrastructure, environments, and application.** @@ -173,12 +143,7 @@ Using a monitoring system, which includes a server, collection agents and a moni These templates will be reusable, repeatable regardless of the deployed environment -### Deadline - -2023.12.05 - - -## Training certification project demonstration and defense +## Training certification project demonstration and defense, ends 2023.12.12 For the demonstration, the application should be: @@ -200,12 +165,7 @@ Particular attention will be given to the form of intermediate renderings. Inclu If the renderings are not up to par and delivered on time, your project cannot be validated. -### Deadline - -Semaine du 12 Décembre - - -## Final interview +## Final interview, from 2024.01.22 To validate the {{RNCP36061}}: @@ -216,4 +176,4 @@ The Professional File (PD) completed by the candidate no later than 15 days befo The presentation of the project carried out before the session; The written report (Word format) or technical file of the project.
-Tous les détails concernant la validation se trouvent dans [ce document](../appendices/directives/rncp_validation.md) +All details concerning the validation can be found in [this document](../../appendices/directives/rncp_validation.md) diff --git a/docs/project/requirements/.pages b/docs/project/requirements/.pages deleted file mode 100644 index 8540f23..0000000 --- a/docs/project/requirements/.pages +++ /dev/null @@ -1,7 +0,0 @@ -title: Project requirements -nav: - - index.md - - app_overview.md - - user_stories.md - - tech_requirements.md - - key_components.md diff --git a/docs/project/requirements/index.md b/docs/project/requirements/index.md deleted file mode 100644 index 7deea79..0000000 --- a/docs/project/requirements/index.md +++ /dev/null @@ -1,14 +0,0 @@ -# Project requirements - - -A DevOps project implementing a containerized microservices application architecture in the Cloud requires a detailed study and careful elicitation of the functional and non-functional requirements. - -We start by gathering needs by collecting user stories, from which we extract technical requirements. Finally, before specifying and design the target architecture, we focus our attention on the specific requirements for the key components of the system. - -[Application Overview](app_overview.md) - -[Architectural and Technical User Stories](user_stories.md) - -[Architectural and Technical Requirements](tech_requirements.md) - -[Specific Requirements for Key Components](key_componenents.md) diff --git a/docs/project/requirements/user_stories.md b/docs/project/requirements/user_stories.md deleted file mode 100644 index 3337fd0..0000000 --- a/docs/project/requirements/user_stories.md +++ /dev/null @@ -1,149 +0,0 @@ -# Architectural and Technical User Stories - - - -## Functional user stories - -Theses stories try to capture the functional requirements pertinent to the architecture and operation of a robust, scalable, and secure microservices application in a cloud environment.
- -They should provide a solid foundation for the development, deployment, and maintenance of the system, ensuring that all critical aspects are addressed. - - -### Application user - -- As an application user, I expect the system to remain available no mater what problem the system may face. - -- As an application user, I expect the system to automatically scale during peak usage times to maintain performance and availability. - - -### Architects & DevOps - -- As a system architect, I want to design a scalable microservices architecture so that the application can handle varying loads efficiently. - -- As a DevOps engineer, I need to automate server provisioning using Terraform scripts to ensure a consistent and repeatable cloud environment setup. - -- As a cloud engineer, I need to integrate Kubernetes and Amazon EKS, as well as other AWS services for enhanced functionality and compliance with AWS best practices. - -- As a DevOps team lead, I want to establish a continuous feedback loop with the development team to iteratively improve the system based on real-time usage data. - - -### Ops - -- As a system administrator, I want to automate server creation using scripts and IaC tools and techniques so that the infrastructure setup is efficient (fast, unattended), reliable (tested) and repeatable. - -- As a network administrator, I need to configure Traefik as a reverse proxy to efficiently route incoming requests to the appropriate FastAPI services. - -- As a Kubernetes operator, I want to configure Kubernetes clusters with appropriate resource allocation and scaling policies to manage application deployment effectively. - -- As an infrastructure manager, I need to ensure high availability and disaster recovery capabilities for the Kubernetes cluster and its workloads. - -- As a performance analyst, I want to track and optimize the resource utilization of the cloud infrastructure to maintain cost-effectiveness. 
- -- As a system administrator, I want to implement Prometheus and Grafana for monitoring system performance and setting up alerts for any anomalies. - -- As a system administrator, I need to configure the ELK stack for efficient log collection, analysis, and visualization. - - -### Devs & DBAs - -- As a developer, I need a continuous integration process that automatically tests the latest code changes with static analysis and unit testing tools, ensuring code quality. - -- As a developer, I need a continuous deployment process that automatically deploys the latest code changes to a test environment, ensuring rapid feedback and iteration. - -- As a developer, I want to utilize Helm charts for easy deployment and management of Kubernetes resources. - -- As a backend developer, I want to build RESTful APIs using FastAPI that are well-documented and easy to consume for front-end developers. - -- As a database administrator, I need to design a normalized PostgreSQL database schema that supports efficient data retrieval and storage. - - -### Testing & QA - -- As a CI/CD engineer, I need to set up GitHub Actions workflows for automating the build, test, and deployment processes of the application. - -- As a QA engineer, I want to integrate automated testing in the CI/CD pipeline to ensure code quality and reliability. - - -### Technical Content Creators & UX/DX - -- As a documentation writer, I need to create comprehensive user guides and API documentation for the system to facilitate easy usage and maintenance. - - -### Compliance & Security Specialists - -- As a security specialist, I want to implement robust security measures in the infrastructure setup to protect against vulnerabilities and attacks. - -- As a security analyst, I need to enforce role-based access control in Kubernetes to provide secure and restricted access to the cluster resources. 
- -- As a compliance officer, I need to ensure that the entire application stack, including all tools and processes, adheres to relevant industry standards and regulations. - - -## Non functional user stories - -These user stories focus on the specifics of the technology stack, how each component serves specific technical needs and contributes to the overall functionality and efficiency of the system. - - -### Application user - -- As an application user, I expect quick and seamless deployment of updates without downtime using rolling updates or blue-green deployments in Kubernetes. - - -### Architects and DevOps - -- As a system architect, I want to design a containerized application architecture using Kubernetes to ensure scalability and resilience. - -- As a DevOps engineer, I need to implement a CI/CD pipeline using GitHub Actions that automates testing, building, and deployment of microservices. - -- As a cloud engineer, I need to use Terraform to script the provisioning of AWS resources like EKS clusters, ensuring infrastructure consistency. - -- As a DevOps team lead, I want to establish a performance benchmarking routine for the application to ensure it meets the desired performance criteria. - -- As an infrastructure manager, I need to integrate EBS for persistent storage in Kubernetes, ensuring data persistence across pod restarts. - - -### Ops - -- As a network administrator, I need to set up internal and external DNS routing in Kubernetes to efficiently manage service discovery and accessibility. - -- As a Kubernetes operator, I want to set up pod autoscaling based on CPU and memory usage metrics to handle load dynamically. - -- As a system administrator, I want to set up Prometheus for detailed monitoring of Kubernetes cluster metrics and application performance indicators. - -- As a logging manager, I need to configure log rotation and archiving strategies in the ELK stack to manage log data effectively. 
- -- As a performance analyst, I want to monitor the cost of cloud resources using AWS cost management tools and optimize where necessary. - - -### Devs & DBAs - -- As a backend developer, I want to utilize FastAPI to create asynchronous APIs that can handle high volumes of requests concurrently. - -- As a database administrator, I need to configure PostgreSQL replication and failover mechanisms to ensure data integrity and availability. - -- As a developer, I want to utilize Helm for managing Kubernetes resource complexities and simplifying the deployment process. - - -### Testing & QA - -- As a CI/CD engineer, I need to integrate static code analysis tools in the CI pipeline to enforce code quality standards. - -- As a QA engineer, I want to implement end-to-end testing using tools like Selenium or Cypress in the CI/CD pipeline for thorough application testing. - - -### Compliance & Security Specialists - -- As a security specialist, I want to integrate SSL/TLS in Traefik for secure communication and data protection. - -- As a security analyst, I need to set up network policies in Kubernetes to restrict traffic flow between pods, enhancing network security. - -- As a compliance officer, I need to ensure logging and monitoring setups comply with GDPR and other relevant data protection regulations. - - -### Technical Content Creators & UX/DX - -- As a documentation writer, I need to use Swagger or similar tools to auto-generate API documentation that stays up-to-date with code changes.