Create a Cluster

Provision new cluster

  1. Confirm the linode-cli command is installed and configured with the IndeVets account by listing the current IndeVets clusters:

    linode-cli lke clusters-list
    
  2. Save a name for the new cluster to a shell variable for future commands:

    clusterName="indevets-1.23"
    

    Production clusters are generally named indevets-${kubernetesVersion}

  3. Create new LKE cluster via linode-cli:

    linode-cli lke cluster-create \
        --label "${clusterName}" \
        --region us-east \
        --k8s_version 1.25 \
        --node_pools.type g6-standard-2 --node_pools.count 3 \
        --node_pools.type g6-standard-4 --node_pools.count 2 \
        --node_pools.type g6-dedicated-4 --node_pools.count 2 \
        --control_plane.high_availability true \
        --tags production
    
  4. List clusters:

    linode-cli lke clusters-list
    
  5. Save id of new cluster to a shell variable for future commands:

    clusterId=36149
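
    If you'd rather not hard-code the id, it can be looked up by label; a sketch using the same linode-cli flags as elsewhere in this guide:

    clusterId=$(
        linode-cli lke clusters-list --format 'id,label' --text --no-headers \
            | awk -v name="${clusterName}" '$2 == name { print $1 }'
    )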
    
  6. Wait for all nodes to have status=ready:

    watch -n 1 linode-cli lke pools-list $clusterId --text
    
  7. Read ids of new pools into variables:

    {
        read productionPoolId;
        read stagingPoolId;
        read sandboxPoolId;
    } <<< $(linode-cli lke pools-list $clusterId --format 'id' --text --no-headers | awk '{print $1}' | uniq)
    
  8. Download and save KUBECONFIG for accessing new cluster with kubectl client:

    linode-cli lke kubeconfig-view $clusterId --text --no-headers | base64 -d > ~/.kube/"${clusterName}.yaml"
    export KUBECONFIG=~/.kube/"${clusterName}.yaml"
    
  9. Confirm that kubectl can list the new nodes and that they are all Ready:

    kubectl get nodes
    
  10. Apply environment labels and taints to nodes based on their pool ids:

    kubectl label nodes -l lke.linode.com/pool-id=$productionPoolId environment=production
    kubectl label nodes -l lke.linode.com/pool-id=$stagingPoolId environment=staging
    kubectl label nodes -l lke.linode.com/pool-id=$sandboxPoolId environment=sandbox
    
    kubectl taint nodes -l environment=production environment=production:NoSchedule
    kubectl taint nodes -l environment=sandbox environment=sandbox:NoSchedule
    

    Tip

    These commands should be re-applied whenever any nodes are added or recycled. Each of the commands will gracefully fail when redundant so it’s safe to err on the side of re-running them frequently.
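
    To spot-check that the labels and taints landed where expected:

    kubectl get nodes -L environment
    kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'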

Load manifests into new cluster

  1. Change to a clean clone of the CATS repository:

    cd ~/Repositories/indevets-cats
    
  2. Fetch the latest releases/k8s-manifests projection from GitHub:

    git fetch --all
    git holo branch pull --all --force
    
  3. If you're staging a new cluster ahead of putting it live, build the k8s-manifests-next projection, which patches all ingress definitions to use the *.indevets-next.k8s.jarv.us hostname suffix. That suffix can be pointed at the new cluster for testing without interfering with live hostnames:

    git holo project k8s-manifests-next --commit-to=releases/k8s-manifests --fetch='*'
    
  4. Check out the k8s-manifests projection:

    git checkout releases/k8s-manifests
    
  5. First, apply CRDs and namespaces to the cluster:

    kubectl apply -Rf ./_/CustomResourceDefinition
    kubectl apply -Rf ./_/Namespace
    
  6. Second, download the k8s cluster sealed-secrets master keypair item's attachment from Vaultwarden and apply it to the new cluster to restore the private keys used to decrypt sealed secrets:

    kubectl apply -f ~/Downloads/cluster-sealed-secrets-master.key
    
  7. Third, initialize the sealed secrets service and load all sealed secrets so that decrypted secrets are in place ahead of other services initializing:

    kubectl apply -Rf ./sealed-secrets
    kubectl apply -f _/ClusterRole/secrets-unsealer.yaml
    kubectl apply -f _/ClusterRoleBinding/sealed-secrets.yaml
    find . \
        -type d \
        -name 'SealedSecret' \
        -print0 \
        | xargs -r0 -n 1 kubectl apply -Rf
    
  8. Finally, apply all remaining resources:

    find . \
        -maxdepth 1 \
        -type d \
        -not -name '.*' \
        -print0 \
        | sort -z \
        | xargs -r0 -n 1 kubectl apply -Rf
    

    Some resources will likely fail to apply the first couple of times this command is run, as resources that other resources depend on come online. Keep re-running the command, waiting a couple of seconds between attempts, until there are no errors.
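
    If you'd rather loop than re-run by hand, a minimal sketch that retries until a pass applies cleanly:

    while ! find . \
            -maxdepth 1 \
            -type d \
            -not -name '.*' \
            -print0 \
        | sort -z \
        | xargs -r0 -n 1 kubectl apply -Rf
    do
        sleep 5
    done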

  9. Monitor pods coming online across all namespaces:

    kubectl get -A pods
    
  10. If this cluster is not going live immediately, suspend all cron jobs:

    kubectl get --all-namespaces cronjobs \
        --no-headers \
        -o=custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name' \
        | while read namespace name; do
            kubectl patch -n "${namespace}" cronjobs "${name}" \
                -p '{"spec" : {"suspend" : true }}'
        done
    

Verify new cluster

  1. List all CertificateRequest objects and verify each is in the Ready state:
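
    For example (cert-manager adds a READY column that should show True for each request):

    kubectl get -A CertificateRequest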

  2. List all Secret objects and verify the sealed-secrets service has populated them:
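
    For example:

    kubectl get -A secrets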

  3. List all ingresses and verify every service loads:
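
    To enumerate the hostnames to check:

    kubectl get -A ingress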

Prepare old cluster to go down

  1. Put the CATS production pod into maintenance mode on the old cluster by opening a shell on it and running:

    artisan down
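
    A one-liner alternative to opening a shell, assuming the old cluster's kubeconfig is active and that the pod belongs to a Deployment named api in a cats-production namespace (both names are assumptions; substitute the actual ones):

    kubectl -n cats-production exec -it deploy/api -- artisan down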
    
  2. Suspend all cron jobs on the old cluster:

    kubectl get --all-namespaces cronjobs \
        --no-headers \
        -o=custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name' \
        | while read namespace name; do
            kubectl patch -n "${namespace}" cronjobs "${name}" \
                -p '{"spec" : {"suspend" : true }}'
        done
    
  3. Dump Metabase configuration database from the old cluster:

    kubectl -n metabase exec pod/database-0 -- pg_dumpall --clean -U metabase > /tmp/metabase.sql
    

Go live with new cluster

  1. Find the external IP for the new ingress-nginx LoadBalancer instance on the new cluster:

    kubectl -n ingress-nginx get services
    
  2. Update the k8s.indevets.com and schedule.indevets.com A records to the new IP
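
    After updating the records, propagation can be spot-checked with dig:

    dig +short k8s.indevets.com
    dig +short schedule.indevets.com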

  3. If cron jobs have all been suspended, they can be re-activated after the new cluster is ready to go live:

    kubectl get --all-namespaces cronjobs \
        --no-headers \
        -o=custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name' \
    | while read namespace name; do
        kubectl patch -n "${namespace}" cronjobs "${name}" \
            -p '{"spec" : {"suspend" : false }}'
    done
    
  4. Reset the locally checked-out releases/k8s-manifests branch to the latest canonical version on GitHub, getting rid of the version with patched ingresses projected earlier:

    git fetch --all
    git reset --hard origin/releases/k8s-manifests
    
  5. Re-apply all ingress manifests:

    find . \
        -type d \
        -name 'Ingress' \
        -print0 \
        | xargs -r0 -n 1 kubectl apply -Rf
    
  6. Monitor Certificate objects progressing to the Ready state, checking the logs of the cert-manager pod if there seem to be issues:

    kubectl get -A Certificate
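
    To tail the cert-manager logs (assuming the standard installation, with the controller Deployment named cert-manager in the cert-manager namespace):

    kubectl -n cert-manager logs deploy/cert-manager --tail=100 -f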
    

Update GitHub Actions to deploy to the new cluster

  1. Generate a kubeconfig file for the cats-api-deployer service account and paste a base64-encoded version of it into the KUBECONFIG_BASE64 secret in the api repository's Actions secrets (see the sketch after this list)

  2. Generate a kubeconfig file for the github-actions service account and paste a base64-encoded version of it into the KUBECONFIG_BASE64 secret in the core repository's Actions secrets
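
    The same recipe works for both service accounts; only the account name, namespace, and target repository differ. A minimal sketch, assuming Kubernetes 1.24+ (for kubectl create token) and that the service account lives in a cats-api namespace (the namespace is an assumption; substitute wherever the account actually exists):

    serviceAccount="cats-api-deployer"
    namespace="cats-api"   # assumption: use the service account's real namespace
    outFile="/tmp/${serviceAccount}.kubeconfig.yaml"

    # Reuse the cluster endpoint and CA from the admin kubeconfig downloaded earlier
    server=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')
    kubectl config view --minify --raw \
        -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' \
        | base64 -d > /tmp/cluster-ca.crt

    # Issue a long-lived token for the service account
    token=$(kubectl -n "${namespace}" create token "${serviceAccount}" --duration=8760h)

    # Assemble a standalone kubeconfig around that token
    KUBECONFIG="${outFile}" kubectl config set-cluster "${clusterName}" \
        --server="${server}" \
        --certificate-authority=/tmp/cluster-ca.crt \
        --embed-certs=true
    KUBECONFIG="${outFile}" kubectl config set-credentials "${serviceAccount}" --token="${token}"
    KUBECONFIG="${outFile}" kubectl config set-context default \
        --cluster="${clusterName}" \
        --namespace="${namespace}" \
        --user="${serviceAccount}"
    KUBECONFIG="${outFile}" kubectl config use-context default

    # Base64-encode for pasting into the repository's KUBECONFIG_BASE64 secret
    base64 -w 0 < "${outFile}"; echo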

Restore configuration to new Metabase instance

  1. Scale the metabase deployment down to 0

  2. Scale the metabase database statefulset down to 0

  3. Delete the metabase persistent volume claim

  4. Scale the metabase database statefulset back up to 1
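
    A minimal sketch of steps 1 through 4; the Deployment and StatefulSet names are assumptions inferred from pod/database-0 above, so confirm them first with kubectl -n metabase get deploy,statefulset,pvc:

    kubectl -n metabase scale deployment metabase --replicas=0
    kubectl -n metabase scale statefulset database --replicas=0
    kubectl -n metabase get pvc                    # note the claim name
    kubectl -n metabase delete pvc <claim-name>    # placeholder: use the name listed above
    kubectl -n metabase scale statefulset database --replicas=1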

  5. Once its pod is back online, restore the dump captured above to it:

    cat /tmp/metabase.sql | kubectl -n metabase exec -i pod/database-0 -- psql -U metabase
    
  6. Scale the metabase deployment back up to 1