Based on cloudfoundry/bosh-community-stemcell-ci-infra
Terraform modules and terragrunt code for Concourse deployment running on Kubernetes on GCP.
You may watch an introductory video about this project and how you can use it to set up Concourse on your infrastructure.
Users who need to perform operations must be added to the role "WG CI Manage" via IAM in the Google Cloud console.
To consume the project with our terragrunt code and scripts, please create a folder structure in your project with a copy of:

- `terragrunt/scripts`
- `terragrunt/concourse-<gke_name>`
- `.tools-versions`
Use the git resource for terraform modules (see `terragrunt/concourse-wg-ci-test/config.yaml`), or copy the `terraform-modules` folder to your repository (see `terragrunt/concourse-wg-ci/config.yaml`).

Use the tool versions pinned in `.tools-versions`. Otherwise, there will be version mismatch errors when you run terragrunt. Alternatively, make use of the file `flake.nix` via `nix develop` or via direnv-load; see the direnv documentation.
Also make sure that your git ssh setup is working: https://docs.github.com/en/authentication/connecting-to-github-with-ssh. The referenced git URLs use ssh, not https.
The project does not automatically create a DNS zone. Either create one manually, or reuse an existing zone.
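If you want to manage the zone with Terraform yourself, a minimal sketch could look like the following (this resource is not part of this project's modules; the names and domain are placeholders):

```hcl
# Illustrative only -- this project does not create the zone for you.
resource "google_dns_managed_zone" "concourse" {
  name     = "concourse-ci"       # hypothetical zone name
  dns_name = "ci.example.com."    # note the trailing dot
}
```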
You should at least look at the following variables:

- `project` / `region` / `zone` / `secondary_zone`
- `gcs_bucket`
- `dns_record` / `dns_zone` / `dns_domain`
- `gke_name`
- `concourse_github_mainTeam`
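As a loose illustration only (the exact structure of `config.yaml` may differ; all values below are placeholders, so consult the existing `config.yaml` files in this repository for the real layout):

```yaml
# Placeholder values -- not a real configuration
project: my-gcp-project
region: europe-west3
zone: europe-west3-a
secondary_zone: europe-west3-b
gcs_bucket: my-terraform-state-bucket
dns_record: ci
dns_zone: my-zone
dns_domain: ci.example.com
gke_name: wg-ci
concourse_github_mainTeam: my-org:my-team
```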
Also make sure that the GKE version is not outdated:

- `gke_controlplane_version`

The latest stable version can be found at https://cloud.google.com/kubernetes-engine/docs/release-notes
If you need to fine-tune the Concourse worker placement strategy, you can configure it with:

- `concourse_container_placement_strategy`
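For example (the value below is illustrative; Concourse ships placement strategies such as `volume-locality`, `random`, `fewest-build-containers` and `limit-active-tasks`):

```yaml
# Illustrative value -- pick the strategy that fits your workload
concourse_container_placement_strategy: fewest-build-containers
```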
Log in to Google Cloud:

```shell
gcloud auth login && gcloud auth application-default login
```
When using asdf instead of Nix as the CLI management tool, there can be problems with the `gke-gcloud-auth-plugin`. If the `gcloud` CLI cannot find the plugin, you can copy the plugin into the shims folder as a workaround:

```shell
cp ~/.asdf/installs/gcloud/415.0.0/bin/gke-gcloud-auth-plugin ~/.asdf/shims
```
To set the correct project out of the ones that you have access to (see `gcloud projects list`), run:

```shell
gcloud config set project 'app-runtime-interfaces-wg'
```

This is necessary if you want to be able to authenticate with your GitHub profile.
- Create GitHub OAuth App
  - Log on to github.com, open https://github.com/settings/developers and click "New OAuth App".
  - As "Homepage URL", enter the Concourse base URL beginning with `https://`.
  - As "Authorization callback URL", enter the Concourse URL followed by `/sky/issuer/callback`, also beginning with `https://`.
- Create Google Secret
  - Open `terragrunt/scripts/concourse/create-github-oauth-gcp.sh` and enter your credentials for `id` and `secret`.
  - Run:

    ```shell
    cd <folder with config.yaml>
    ../scripts/create-github-oauth-gcp.sh
    ```
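Before pasting the two URLs into the OAuth App form, note that the callback URL is simply the base URL plus the fixed path. A quick sketch (the base URL is a placeholder; replace it with your own):

```shell
# Hypothetical Concourse base URL -- replace with your own
base_url="https://concourse.example.com"
callback_url="${base_url}/sky/issuer/callback"
echo "$callback_url"
# https://concourse.example.com/sky/issuer/callback
```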
For more information please refer to gcloud documentation.
The following command needs to be run from within your root directory (containing the `config.yaml` file).

- NOTE: it is not possible to plan for a fresh project, because kubernetes resources cannot be tested against a non-existing cluster
- NOTE: `terragrunt run-all` commands do not show changes before applying
- NOTE: if you need to update the providers, run `terragrunt run-all init -upgrade`

```shell
terragrunt run-all apply
```

- Log in to Google Cloud via `gcloud auth login && gcloud auth application-default login`.
- Configure your kubectl; see section "How to obtain GKE credentials for your terminal".
The `database.tf` configuration enables deletion protection on multiple levels. The Terraform hashicorp provider includes a deletion protection flag:

```hcl
resource "google_sql_database_instance" "concourse" {
  # This option prevents Terraform from deleting an instance
  deletion_protection = true
}
```
Note that if you really want to delete the database, Terraform will not allow this, because `deletion_protection = true` is stored in the state. You first have to disable this flag, then run apply, and only then can you run a deletion operation.
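The deletion flow described above amounts to flipping the flag first (a sketch of the relevant attribute only; not a complete resource definition):

```hcl
resource "google_sql_database_instance" "concourse" {
  # Step 1: set to false and apply.
  # Step 2: only then run the destroy operation.
  deletion_protection = false
}
```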
In addition, we are setting a flag that enables the "Prevent instance deletion" option in the GCP console:

```hcl
settings {
  deletion_protection_enabled = "true"
}
```

Cloud SQL -> Instances -> Edit configuration -> Data Protection -> Retain backups after instance deletion
Please see end to end testing
Please see developer notes about vendir sync and developing modules with terragrunt.
Credhub credentials expire when they are older than 30 days. As a result, the following error messages occur:

- Credhub pod:

  ```
  Get "https://credhub.concourse.svc.cluster.local:9000/info": x509: certificate has expired or is not yet valid: current time 2023-02-27T10:14:45Z is after 2023-02-25T15:05:44Z
  ```

- Concourse input resources:

  ```
  x509: certificate has expired or is not yet valid
  ```

Solution

Restart the credhub kubernetes deployment in the concourse namespace. This will destroy the old pod and create a new one.

This is a workaround. The bug is described in issues#61.
From the Cloud Console:

- Go to the Google Cloud Console
- Go to the managed pods and delete the pods

From local:

- Clone this project and either use nix or asdf to set up your environment.
- Log in to Google Cloud via `gcloud auth login && gcloud auth application-default login`.
- Configure your kubectl; see section "How to obtain GKE credentials for your terminal".
- Execute `kubectl delete pods --namespace='concourse' --selector='app=credhub'`.
If the above solution did not help, it might also be the CA, which expires once a year.

Check its age:

```shell
kubectl get secret -n concourse credhub-root-ca
```

If it is older than a year, delete it:

```shell
kubectl delete secret -n concourse credhub-root-ca
```

Afterwards, restart the secretgen-controller to trigger a recreation:

```shell
kubectl scale deploy -n secretgen-controller secretgen-controller --replicas=0
kubectl scale deploy -n secretgen-controller secretgen-controller --replicas=1
kubectl wait deployment -n secretgen-controller secretgen-controller --for=jsonpath='{.spec.replicas}'=1 --timeout=30s
```

Check that the CA has been recreated:

```shell
kubectl get secret -n concourse credhub-root-ca
```

Restart credhub and concourse:

```shell
kubectl delete pods --namespace='concourse' --selector='app=credhub'
kubectl delete pods --namespace='concourse' --selector='release=concourse'
```

If you have manually set the recommended CloudSQL instance deletion protection, please unset it.
Since we protect a backup of the CredHub encryption key (stored in GCP Secret Manager), to fully destroy the project the key first needs to be removed from the terraform state:

```shell
cd <folder with config.yaml>/dr_create
terragrunt state rm google_secret_manager_secret_version.credhub_encryption_key
terragrunt state rm google_secret_manager_secret.credhub_encryption_key
```

WARNING: to complete the deletion, remove the secret from GCP Secret Manager. Please be aware that doing so will permanently prevent DR recovery:

```shell
gcloud secrets delete <gke_name>-credhub-encryption-key --project=<your project name>
```
To destroy:

```shell
terragrunt run-all destroy
```

Delete the terraform state GCP bucket from the GCP console or via gsutil.
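A sketch of the gsutil variant (the bucket name is a placeholder for your `gcs_bucket` value; the block only prints the command, so nothing is deleted by accident):

```shell
# Placeholder bucket name -- use the gcs_bucket value from your config.yaml
bucket="my-terraform-state-bucket"
# -m parallelizes, -r removes the bucket and all contained objects
echo "gsutil -m rm -r gs://${bucket}"
# gsutil -m rm -r gs://my-terraform-state-bucket
```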
For example, to plan and apply a single module:

```shell
cd terragrunt/concourse-<gke_name>/concourse/app
terragrunt plan
terragrunt apply
```

Terraform code fetches GKE credentials automatically. In case you need to access the cluster with kubectl (or other kube clients) or to connect to the Credhub instance (via `terragrunt/scripts/concourse/start-credhub-cli.sh`):
```shell
gcloud container clusters list
# Example output:
# NAME   LOCATION        MASTER_VERSION   MASTER_IP     MACHINE_TYPE   NODE_VERSION     NUM_NODES  STATUS
# wg-ci  europe-west3-a  1.23.8-gke.1900  34.159.31.85  e2-standard-4  1.23.8-gke.1900  3          RUNNING

gcloud container clusters get-credentials wg-ci --zone europe-west3-a
# Example output:
# Fetching cluster endpoint and auth data.
# kubeconfig entry generated for wg-ci.

kubectl config current-context
# Example output:
# gke_app-runtime-interfaces-wg_europe-west3-a_wg-ci
```

Please see DR scenario for a fully automated recovery procedure.
Please see Secrets Rotation
Please see Certificate Regeneration
Starting with version 1.33, Google Kubernetes Engine (GKE) migrates clusters from Linux cgroupv1 to cgroupv2. The migration can also be done manually in advance. The general migration procedure is explained in Migrate nodes to Linux cgroupv2. Note however that we do not use an "Autopilot" GKE cluster, but a "Standard" cluster. The migration for a "Standard" cluster is not fully explained in the Google documentation, so we are documenting it here.
- Log on to the GKE cluster as explained in How to obtain GKE credentials for your terminal.
- Create a system configuration file `cgroupv2.yaml`:

  ```yaml
  linuxConfig:
    cgroupMode: 'CGROUP_MODE_V2'
  ```

- Apply the configuration to the two node pools:

  ```shell
  gcloud container node-pools update default-pool --system-config-from-file=./cgroupv2.yaml --region europe-west3-a --cluster wg-ci
  gcloud container node-pools update concourse-workers --system-config-from-file=./cgroupv2.yaml --region europe-west3-a --cluster wg-ci
  ```
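After the node pools have been updated, you can verify the cgroup mode from a shell on a node (for example via a debug pod). `cgroup2fs` indicates cgroupv2; `tmpfs` indicates cgroupv1:

```shell
# Prints the filesystem type of the cgroup mount on the current Linux host
stat -fc %T /sys/fs/cgroup
```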
Some regions are more expensive than others. For cost saving reasons, you can migrate the Concourse deployment to a different region. See Region Change Guide for detailed instructions.