# Create a Cluster

## Provision new cluster
- Confirm you have the `linode-cli` command installed and configured with the IndeVets account by verifying you can list the current IndeVets clusters:

  ```bash
  linode-cli lke clusters-list
  ```
- Save a name for the new cluster to a shell variable for future commands:

  ```bash
  clusterName="indevets-1.25"
  ```

  Production clusters are generally named `indevets-${kubernetesVersion}`.
- Create the new LKE cluster via `linode-cli`:

  ```bash
  linode-cli lke cluster-create \
    --label "${clusterName}" \
    --region us-east \
    --k8s_version 1.25 \
    --node_pools.type g6-standard-2 --node_pools.count 3 \
    --node_pools.type g6-standard-4 --node_pools.count 2 \
    --node_pools.type g6-dedicated-4 --node_pools.count 2 \
    --control_plane.high_availability true \
    --tags production
  ```
- List clusters:

  ```bash
  linode-cli lke clusters-list
  ```
- Save the `id` of the new cluster to a shell variable for future commands (see the lookup sketch at the end of this section):

  ```bash
  clusterId=36149
  ```
- Wait for all nodes to have `status=ready`:

  ```bash
  watch -n 1 linode-cli lke pools-list $clusterId --text
  ```
- Read the ids of the new pools into variables:

  ```bash
  { read productionPoolId; read stagingPoolId; read sandboxPoolId; } <<< $(linode-cli lke pools-list $clusterId --format 'id' --text --no-headers | awk '{print $1}' | uniq)
  ```
- Download and save a `KUBECONFIG` for accessing the new cluster with the `kubectl` client:

  ```bash
  linode-cli lke kubeconfig-view $clusterId --text --no-headers | base64 -d > ~/.kube/"${clusterName}.yaml"
  export KUBECONFIG=~/.kube/"${clusterName}.yaml"
  ```
- Confirm that `kubectl` can list the new and ready nodes:

  ```bash
  kubectl get nodes
  ```
- Apply environment labels and taints to nodes based on their pool ids:

  ```bash
  kubectl label nodes -l lke.linode.com/pool-id=$productionPoolId environment=production
  kubectl label nodes -l lke.linode.com/pool-id=$stagingPoolId environment=staging
  kubectl label nodes -l lke.linode.com/pool-id=$sandboxPoolId environment=sandbox

  kubectl taint nodes -l environment=production environment=production:NoSchedule
  kubectl taint nodes -l environment=sandbox environment=sandbox:NoSchedule
  ```
> **Tip:** These commands should be re-applied whenever any nodes are added or recycled. Each of the commands will gracefully fail when redundant, so it's safe to err on the side of re-running them frequently.
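Rather than copying the cluster `id` by hand, it can be looked up by label; a minimal sketch, assuming the cluster's label exactly matches `$clusterName`:

```bash
# look up the new cluster's id by label; assumes $clusterName is still set from earlier
clusterId=$(
  linode-cli lke clusters-list --format 'id,label' --text --no-headers \
    | awk -v name="$clusterName" '$2 == name { print $1 }'
)
echo "$clusterId"
```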
## Load manifests into new cluster
- Change to a clean clone of the CATS repository:

  ```bash
  cd ~/Repositories/indevets-cats
  ```
- Fetch the latest `releases/k8s-manifests` projection from GitHub:

  ```bash
  git fetch --all
  git holo branch pull --all --force
  ```
- If you're staging a new cluster ahead of putting it live, build the `k8s-manifests-next` projection, which patches all ingress definitions to use the `*.indevets-next.k8s.jarv.us` hostname suffix that can be pointed at the new cluster for testing without interfering with live hostnames:

  ```bash
  git holo project k8s-manifests-next --commit-to=releases/k8s-manifests --fetch='*'
  ```
- Check out the `k8s-manifests` projection:

  ```bash
  git checkout releases/k8s-manifests
  ```
- First, apply CRDs and namespaces to the cluster:

  ```bash
  kubectl apply -Rf ./_/CustomResourceDefinition
  kubectl apply -Rf ./_/Namespace
  ```
- Second, download the `k8s cluster sealed-secrets master keypair` item's attachment from Vaultwarden and apply it to the new cluster to restore the private keys for decrypting sealed secrets:

  ```bash
  kubectl apply -f ~/Downloads/cluster-sealed-secrets-master.key
  ```
- Third, initialize the sealed secrets service and load all sealed secrets so that decrypted secrets are in place ahead of other services initializing:

  ```bash
  kubectl apply -Rf ./sealed-secrets
  kubectl apply -f _/ClusterRole/secrets-unsealer.yaml
  kubectl apply -f _/ClusterRoleBinding/sealed-secrets.yaml

  find . \
    -type d \
    -name 'SealedSecret' \
    -print0 \
    | xargs -r0 -n 1 kubectl apply -Rf
  ```
- Finally, apply all remaining resources:

  ```bash
  find . \
    -maxdepth 1 \
    -type d \
    -not -name '.*' \
    -print0 \
    | sort -z \
    | xargs -r0 -n 1 kubectl apply -Rf
  ```

  Some resources will likely fail to apply the first and second time this command is run, as resources that other resources depend on are still coming online. Keep re-running the above command with a couple-second delay between attempts until there are no errors (or use the retry sketch at the end of this section).
- Monitor pods coming online across all namespaces:

  ```bash
  kubectl get -A pods
  ```
- If this cluster is not going live immediately, suspend all cron jobs:

  ```bash
  kubectl get --all-namespaces cronjobs \
      --no-headers \
      -o=custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name' \
    | while read namespace name; do
        kubectl patch -n "${namespace}" cronjobs "${name}" \
          -p '{"spec" : {"suspend" : true }}'
      done
  ```
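Rather than re-running the apply step by hand, the retries can be looped; a minimal sketch, assuming you are still at the repository root on the `releases/k8s-manifests` checkout:

```bash
# keep re-applying all top-level resource directories until every apply succeeds;
# xargs exits non-zero when any kubectl apply fails, so the loop retries on errors
until find . \
    -maxdepth 1 \
    -type d \
    -not -name '.*' \
    -print0 \
  | sort -z \
  | xargs -r0 -n 1 kubectl apply -Rf
do
  sleep 5
done
```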
## Verify new cluster
- List all `CertificateRequest` objects and verify each is in the ready state (commands for all three checks are sketched after this list)
- List all `Secret` objects and verify the sealed secrets service has populated them
- List all ingresses and verify every service loads
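One way to run these three checks from the command line:

```bash
# CertificateRequests across all namespaces should show as approved and ready
kubectl get -A certificaterequests

# Secrets across all namespaces; each SealedSecret should have a matching Secret
kubectl get -A secrets

# Ingresses across all namespaces; spot-check each hostname in a browser
kubectl get -A ingress
```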
## Prepare old cluster to go down
- Put the CATS production pod into maintenance mode on the old cluster by opening a shell on it (see the exec sketch at the end of this section) and running:

  ```bash
  artisan down
  ```
- Suspend all cron jobs on the old cluster:

  ```bash
  kubectl get --all-namespaces cronjobs \
      --no-headers \
      -o=custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name' \
    | while read namespace name; do
        kubectl patch -n "${namespace}" cronjobs "${name}" \
          -p '{"spec" : {"suspend" : true }}'
      done
  ```
- Dump the Metabase configuration database from the old cluster:

  ```bash
  kubectl -n metabase exec pod/database-0 -it -- pg_dumpall --clean -U metabase > /tmp/metabase.sql
  ```
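A sketch of opening that shell; the namespace and deployment names below are assumptions and should be adjusted to match the actual CATS manifests:

```bash
# names below are assumptions; adjust the namespace and deployment to match the manifests
kubectl -n cats-production exec -it deploy/cats -- bash
```

Then run `artisan down` inside the pod.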
## Go live with new cluster
- Find the external IP for the new `ingress-nginx` `LoadBalancer` instance on the new cluster:

  ```bash
  kubectl -n ingress-nginx get services
  ```
- Update the `k8s.indevets.com` and `schedule.indevets.com` A records to the new IP (see the verification sketch at the end of this section)
- If cron jobs have all been suspended, they can be re-activated after the new cluster is ready to go live:

  ```bash
  kubectl get --all-namespaces cronjobs \
      --no-headers \
      -o=custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name' \
    | while read namespace name; do
        kubectl patch -n "${namespace}" cronjobs "${name}" \
          -p '{"spec" : {"suspend" : false }}'
      done
  ```
- Reset the locally checked-out `releases/k8s-manifests` branch to the latest canonical version on GitHub, getting rid of the version with patched ingresses projected earlier:

  ```bash
  git fetch --all
  git reset --hard origin/releases/k8s-manifests
  ```
- Re-apply all ingress manifests:

  ```bash
  find . \
    -type d \
    -name 'Ingress' \
    -print0 \
    | xargs -r0 -n 1 kubectl apply -Rf
  ```
- Monitor `Certificate` objects progressing to the Ready state, checking the logs of the `cert-manager` pod if there seem to be issues:

  ```bash
  kubectl get -A Certificate
  ```
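A quick verification sketch for the cutover; it assumes cert-manager runs as a `cert-manager` deployment in the `cert-manager` namespace, which should be adjusted to match the manifests:

```bash
# confirm the A records now resolve to the new LoadBalancer IP
dig +short k8s.indevets.com
dig +short schedule.indevets.com

# tail cert-manager logs while certificates are being issued
kubectl -n cert-manager logs deploy/cert-manager --tail=100 -f
```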
## Update GitHub Actions to deploy to the new cluster
- Generate a kubeconfig file for the `cats-api-deployer` service account and paste a base64-encoded version of it into the `KUBECONFIG_BASE64` secret in the `api` repository's actions secrets (see the sketch after this list)
- Generate a kubeconfig file for the `github-actions` service account and paste a base64-encoded version of it into the `KUBECONFIG_BASE64` secret in the `core` repository's actions secrets
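A sketch of generating and encoding one of these kubeconfig files; the `cats-api` namespace, token lifetime, and output filename below are assumptions (and the requested token duration may be capped by the cluster):

```bash
# mint a token for the service account (requires kubectl >= 1.24)
token=$(kubectl -n cats-api create token cats-api-deployer --duration=8760h)

# pull the API server endpoint and CA bundle from the current kubeconfig
server=$(kubectl config view --minify --raw -o jsonpath='{.clusters[0].cluster.server}')
ca=$(kubectl config view --minify --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}')

# assemble a standalone kubeconfig for the service account
cat > cats-api-deployer.kubeconfig <<EOF
apiVersion: v1
kind: Config
clusters:
- name: lke
  cluster:
    server: ${server}
    certificate-authority-data: ${ca}
users:
- name: cats-api-deployer
  user:
    token: ${token}
contexts:
- name: cats-api-deployer@lke
  context:
    cluster: lke
    user: cats-api-deployer
current-context: cats-api-deployer@lke
EOF

# base64-encode it on a single line, ready to paste into the KUBECONFIG_BASE64 secret
base64 < cats-api-deployer.kubeconfig | tr -d '\n'
```

The same recipe applies to the `github-actions` service account and the `core` repository.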
## Restore configuration to new Metabase instance
- Scale the `metabase` deployment down to 0 (see the sketch after this list)
- Scale the `metabase` database statefulset down to 0
- Delete the `metabase` persistent volume claim
- Scale the `metabase` database statefulset back up to 1
- Once its pod is back online, restore the dump captured above to it:

  ```bash
  cat /tmp/metabase.sql | kubectl -n metabase exec pod/database-0 -it -- psql -U metabase
  ```

- Scale the `metabase` deployment back up to 1
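A consolidated sketch of the scaling steps above; it assumes the deployment is named `metabase` and the database statefulset is named `database` (matching `pod/database-0` above), and looks up the PVC name rather than guessing it:

```bash
kubectl -n metabase scale deployment/metabase --replicas=0
kubectl -n metabase scale statefulset/database --replicas=0

# identify and delete the database volume claim (name is cluster-specific)
kubectl -n metabase get pvc
kubectl -n metabase delete pvc <database-pvc-name>

# bring the database back up with a fresh volume, then wait for database-0 to be Running
kubectl -n metabase scale statefulset/database --replicas=1
kubectl -n metabase get pods -w

# after restoring the dump (see above), bring the Metabase application back
kubectl -n metabase scale deployment/metabase --replicas=1
```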