Jérôme Decoster

3x AWS Certified - Architect, Developer, Cloud Practitioner

03 Jun 2022

EKS + ArgoCD + Multi-Environments and Multi-clusters Gitops

The Goal
  • Create a voting app
  • Automate docker image build on ECR after each push using Github Actions workflows
  • Automatic management of feature branches in dedicated, dynamically created ECR repositories
  • Infrastructure management with Terraform from a dedicated git repository
  • Creation of 2 EKS clusters : one for a staging environment, another for production
  • Installation of ArgoCD in the staging cluster. Management of the production cluster from this instance
  • Creation of 2 ArgoCD applications. One per environment
  • Managing GitOps deployments using separate repositories
  • Quick synchronization of ArgoCD applications from a Github Actions workflow

    architecture.svg

    The project

    The project is composed of 2 git repositories that you can fork :

    The voting app

    You can launch the voting application via this command :

    # run vote website using npm - dev mode (livereload + nodemon)
    $ make vote
    

    This command executes this script
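
    The script is not reproduced here ; in dev mode it roughly amounts to the following (file names are assumptions) :

    cd vote
    npm install
    # nodemon restarts the server on change, livereload refreshes the browser
    npx nodemon index.js
    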

    The site is available at http://localhost:4000/

    vote-static.png

    This is a static version of the site : it is not linked to a database

    The returned vote values are random

    function randInt(min, max) {
      return Math.floor(Math.random() * (max - min + 1)) + min
    }
    
    // ...
    
    app.get('/vote', async (req, res) => {
      let up = randInt(1, 9)
      let down = randInt(1, 9)
      return res.send({ up, down })
    })
    

    Continuous containerization of the voting application

    Containerization is managed by 3 Github Actions workflows

    vote-workflows.png

    The tag.yml workflow is triggered only when a tag using semver notation is pushed :

    name: tag
    
    on:  
      push:
        tags: [ "v*.*.*" ]
    
    # ...
    
    jobs:
      build:
        runs-on: ubuntu-latest
        if: github.ref_type == 'tag'
    

    In this specific case, the docker image is built and pushed to the ECR repository named multienv-vote :

    The pushed image will be tagged with a value similar to v1.0.0

    env: 
      ECR_REPOSITORY: multienv-vote
    
    # ...
    
      run: |
        cd vote
        docker image build \
          --build-arg NODE_ENV=production \
          --build-arg VERSION=${{ github.ref_name }} \
          --tag ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ github.ref_name }} \
          --tag ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:latest \
          .
        docker push --all-tags ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}
    

    The cd.yml workflow is triggered only when a commit is pushed to a feature branch :

    on: 
      push:
        branches-ignore:
          - master
      pull_request:
        branches-ignore:
          - master
    

    In this specific case, the docker image is built and pushed to a dynamically created ECR repository named multienv-vote-${ feature-branch-name } :

    The pushed image will be tagged with the first 8 characters of the commit sha, a value similar to 23e44fea

    env: 
      ECR_REPOSITORY: multienv-vote-${{ github.ref_name }}
      SHORT_SHA: $(echo ${{ github.sha }} | cut -c 1-8)
    
    # ...
    
      run: |
        cd vote
        docker image build \
          --build-arg NODE_ENV=${{ github.ref_name }} \
          --build-arg VERSION=${{ env.SHORT_SHA }} \
          --tag ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.SHORT_SHA }} \
          --tag ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:latest \
          .
        docker push --all-tags ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}
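
    The cd.yml workflow also has to create the feature repository the first time a branch is pushed. That step is not shown above ; a minimal sketch of it, reusing the ECR_REPOSITORY environment variable, could be :

    # create the ECR repository for the feature branch if it does not exist yet
    aws ecr describe-repositories \
      --repository-names "$ECR_REPOSITORY" >/dev/null 2>&1 \
    || aws ecr create-repository \
      --repository-name "$ECR_REPOSITORY"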
    

    The delete.yml workflow is triggered only when a branch is deleted on github :

    on: delete
    

    When a feature branch is deleted, we also delete its associated ECR repository :

    env:
      ECR_REPOSITORY: multienv-vote-${{ github.event.ref }}
    
    # ...
    
      run: |
        aws ecr delete-repository \
          --repository-name ${{ env.ECR_REPOSITORY }} \
          --force \
          --query 'repository.repositoryUri' \
          --output text \
          2>/dev/null
    

    Setting up the workflow

    To be able to push an image to ECR from Github, you must create an IAM user and associate an access key with it.

    You must then store the AccessKeyId / SecretAccessKey pair in the secrets settings of the Github project :

    vote-github-secrets.png

    To create this user we execute the following command :

    # create iam user + access key
    $ make user-create
    

    This command executes the user-create.sh script :

    The 2 variables are output in 2 files at the root of the project

    • .env_AWS_ACCESS_KEY_ID
    • .env_AWS_SECRET_ACCESS_KEY

    Add these values to the secrets of your Github repository
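
    The user-create.sh script is not reproduced in full here ; a minimal sketch of what it does (the user name and policy name are hypothetical) :

    # create a dedicated IAM user for the Github Actions workflows
    aws iam create-user --user-name multienv-vote-ci

    # attach a policy granting the ECR permissions used by the workflows (policy name is hypothetical)
    aws iam attach-user-policy \
      --user-name multienv-vote-ci \
      --policy-arn arn:aws:iam::$AWS_ID:policy/multienv-ecr-policy

    # the returned AccessKeyId / SecretAccessKey are the values written to the 2 .env_* files
    aws iam create-access-key \
      --user-name multienv-vote-ci \
      --query 'AccessKey.[AccessKeyId,SecretAccessKey]' \
      --output text
    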

    Workflow testing

    I’m going to create a feature branch named feature-one and make a CSS modification in it :

    # make sure we are on the master branch
    $ git checkout master
    
    # create + switch to the feature-one branch
    $ git checkout -b feature-one
    

    To modify our site we uncomment the following lines in the main.css file :

    /* uncomment below to create feature-one */
    /* 
    color: black;
    background-color: #eceff1;
    border: #37474f 1px dashed;
    padding: .7em; 
    */
    

    Changes can be seen at http://localhost:4000/ after running the following command :

    # run vote website using npm - dev mode (livereload + nodemon)
    $ make vote
    

    vote-feature-one.png

    We push these modifications on Github :

    $ git add vote
    $ git commit -m 'css update'
    $ git push -u origin feature-one
    

    The workflow gets started :

    vote-workflow-cd.png

    The workflow is quickly successful :

    vote-workflow-cd-success.png

    The repository was created automatically :

    vote-ecr-feature-one.png

    The image was successfully pushed :

    vote-ecr-feature-one-image.png

    To locally test the image we just pushed to the ECR repository we run this command :

    # run latest image pushed to ecr
    $ make ecr-run
    

    This command runs the ecr-run.sh script

    It launches the website via docker run, using the latest image pushed from a feature branch :

    # get the first feature repository (so, everything except master branch)
    REPOSITORY_NAME=$(aws ecr describe-repositories \
      --query "repositories[?starts_with(repositoryName, '$PROJECT_NAME-')].[repositoryName]" \
      --output text | \
      head -n 1)
    
    REPOSITORY_URI=$ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$REPOSITORY_NAME
    
    docker run \
      --rm \
      -e WEBSITE_PORT=4000 \
      -p 4000:4000 \
      --name vote \
      $REPOSITORY_URI:latest
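    
    Note that pulling the image from ECR requires docker to be authenticated against the registry. If it is not already, a standard ECR login (not necessarily part of the script above) looks like this :

    aws ecr get-login-password --region $AWS_REGION \
      | docker login \
          --username AWS \
          --password-stdin $ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com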
    

    By opening http://localhost:4000/ we have our website :

    vote-ecr-run.png

    Note that the footer no longer displays the same information

    The footer is created using this template :

    <footer>
    {% if node_env == 'development' %}<u>development</u> {% endif %}version: {{ version }}
    </footer>
    

    The node_env and version values are defined by these Dockerfile instructions :

    FROM node:18.2-slim
    ARG NODE_ENV
    ENV NODE_ENV $NODE_ENV
    ARG VERSION
    ENV VERSION $VERSION
    # ...
    

    And injected at build time by these workflow arguments :

    run: |
      docker image build \
        --build-arg NODE_ENV=${{ github.ref_name }} \
        --build-arg VERSION=${{ env.SHORT_SHA }} \
        # ...
    

    We are happy with these changes. We merge this feature into the master branch, push it and create a new tag :

    # switch to the master branch 
    $ git checkout master
    # merge feature
    $ git merge feature-one
    $ git push
    # create a tag
    $ git tag v1.0.0
    # push tag on github
    $ git push --tags
    

    The tag is added to the github repository :

    vote-tag.png

    The tag workflow starts :

    vote-workflows-tag.png

    The ECR repository is created :

    vote-ecr-tag.png

    The image is pushed :

    vote-ecr-tag-image.png

    The tag name is v1.0.0

    The infrastructure

    The infrastructure is managed via a specific repository

    Terraform is used to create the 2 EKS environments :

    infra-terraform.png

    2 environments are available, via 2 separate folders :

    Managing environments through separate directories instead of Terraform workspaces is a good solution : each environment keeps its own isolated state and variables

    We use a backend on S3 :

    terraform {
      backend "s3" {}
    }
    

    Which is defined via terraform init :

    terraform -chdir="$CHDIR" init \
      -input=false \
      -backend=true \
      -backend-config="bucket=$S3_BUCKET" \
      -backend-config="key=$CONFIG_KEY" \
      -backend-config="region=$AWS_REGION" \
      -reconfigure
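    
    For the staging environment, the variables passed to this init could look like this (the folder and key names are assumptions) :

    # hypothetical values for the staging environment
    CHDIR="staging"                          # the staging folder of the repository
    S3_BUCKET="xxxxx"                        # the value stored in .env_S3_BUCKET
    CONFIG_KEY="staging/terraform.tfstate"   # one tfstate key per environment
    AWS_REGION="eu-west-3"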
    

    To create our S3 bucket we run the command :

    # setup project + create S3 bucket
    $ make init
    

    This command creates the bucket with versioning enabled :

    aws s3 mb s3://$S3_BUCKET --region $AWS_REGION
    
    # /!\ important for terraform states : enable bucket versioning
    aws s3api put-bucket-versioning \
      --bucket $S3_BUCKET \
      --versioning-configuration Status=Enabled \
      --region $AWS_REGION
    

    The bucket is created :

    infra-bucket.png

    The script also creates 3 variables in 3 files at the root of the project :

    • .env_AWS_ID
    • .env_S3_BUCKET
    • .env_UUID

    Deployment of the staging environment

    To deploy the environment we just need to run these 2 commands :

    # terraform init the staging env
    $ make staging-init
    
    # terraform plan + apply the staging env
    $ make staging-apply
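    
    Under the hood, the apply target essentially runs a plan followed by an apply (a sketch, the exact flags may differ) :

    terraform -chdir="$CHDIR" plan \
      -input=false \
      -out=terraform.plan

    terraform -chdir="$CHDIR" apply \
      -input=false \
      terraform.plan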
    

    The cluster is deployed :

    infra-cluster-staging.png

    The tfstate is in the S3 bucket :

    infra-s3-staging.png

    2 important steps are still missing at this point :

    • kubectl has not been configured to interact with the cluster
    • We cannot see the contents of the cluster in the AWS console, as this unpleasant warning confirms :

    infra-eks-warning.png

    To correct these 2 points we execute the following command :

    # setup kubectl config + aws-auth configmap for staging env
    $ make eks-staging-config
    

    We configure kubectl using the update-kubeconfig command from this script :

    aws eks update-kubeconfig \
      --name $NAME \
      --region $REGION
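    
    A quick sanity check that the new kubectl configuration works :

    # show the context that was just configured
    kubectl config current-context

    # list the worker nodes of the staging cluster
    kubectl get nodes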
    

    Making resources visible is more complex. You must edit the aws-auth ConfigMap

    Editing it manually is the easier option, we just need to run :

    $ EDITOR=nano kubectl edit configmap aws-auth -n kube-system
    

    And add the mapUsers section defined like this :

    mapUsers: |
      - userarn: arn:aws:iam::[account_id]:root
        groups:
        - system:masters
    

    Automating this modification by a script is more complex. The script below uses jq and yq :

    kubectl get configmap aws-auth \
            --namespace kube-system \
            --output yaml > "$PROJECT_DIR/aws-auth-configmap.yaml"
    
    # convert to json
    yq aws-auth-configmap.yaml -o json > "$PROJECT_DIR/aws-auth-configmap.json"
    
    # add mapUsers (we use jq instead of yq here because it's much simpler and cleaner)
    jq '.data += {"mapUsers": "- userarn: arn:aws:iam::'$AWS_ID':root\n  groups:\n  - system:masters\n"}' aws-auth-configmap.json \
    | yq --prettyPrint > "$PROJECT_DIR/aws-auth-configmap.yaml"
    
    # apply the updated aws-auth-configmap.yaml
    kubectl apply --filename aws-auth-configmap.yaml --namespace kube-system
    

    Installing ArgoCD

    To install ArgoCD we run this script :

    # install argocd in staging env
    $ make argo-install
    

    Installing ArgoCD is done in a few command lines :

    kubectl create namespace argocd
    
    kubectl apply \
      --namespace argocd \
      --filename https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
    
    kubectl wait deploy argocd-server \
      --timeout=180s \
      --namespace argocd \
      --for=condition=Available=True
    
    kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "LoadBalancer"}}'
    

    You can now connect to the ArgoCD web application using the credentials displayed in the terminal :
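
    These credentials typically come from the initial admin secret and from the service patched above ; a sketch of how to retrieve them :

    # get the initial admin password (standard ArgoCD secret)
    kubectl get secret argocd-initial-admin-secret \
      --namespace argocd \
      --output jsonpath="{.data.password}" | base64 -d

    # get the external hostname of the argocd-server load balancer
    kubectl get service argocd-server \
      --namespace argocd \
      --output jsonpath="{.status.loadBalancer.ingress[0].hostname}"
    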

    Accept self-signed certificate :

    infra-argo-self-signed.png

    Then login :

    infra-argo-login.png

    Connect ArgoCD to Github repository

    ArgoCD must have access to our github repository gitops-multienv-infra

    Chances are that this repository is private

    It is therefore necessary to link the repository and ArgoCD using an SSH key

    These operations are performed with the following command. It uses gh to automate adding the ssh key to Github, but you can easily do that part manually.

    Important : the command uses the GIT_REPO variable which must be adjusted to your repository

    # add git repo connection + create ssh key + add ssh key to github
    $ make argo-add-repo
    

    The script executes a few lines :

    ssh-keygen -t ed25519 -N "" -f ~/.ssh/$PROJECT_NAME.pem
    
    # ...
    
    gh ssh-key add ~/.ssh/$PROJECT_NAME.pub --title $PROJECT_NAME
    
    argocd repo add $GIT_REPO \
      --insecure-ignore-host-key \
      --ssh-private-key-path ~/.ssh/$PROJECT_NAME.pem
    

    The repository is added :

    infra-argo-git-repo.png

    The public key is added to Github :

    infra-github-key.png

    The ArgoCD staging application

    The Kubernetes application is managed through the argocd folder

    Deploying multiple environments using GitOps principles is neither easy nor obvious

    We use the method explained by Kostis Kapelonis in this excellent article

    The application is defined using k8s manifests and a kustomization manifest within a base folder :

    infra-base.png

    The customization of the environments is then done via 2 specific folders within an overlays folder :

    infra-overlays.png

    The ArgoCD application is generated from a template :

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: app-${NAMESPACE}
      namespace: argocd
    spec:
      project: default
    
      source:
        repoURL: ${GIT_REPO}
        targetRevision: HEAD
        path: argocd/overlays/${NAMESPACE}
      destination: 
        server: ${SERVER}
        namespace: gitops-multienv
    

    The creation of the application is done in a few lines with envsubst :

    export NAMESPACE=staging
    export SERVER=https://kubernetes.default.svc
    kubectl config use-context $PROJECT_NAME-staging
    cat argocd/argocd-app.yaml | envsubst | kubectl apply -f -
    

    The application deploys the image set via this patch :

    Important: you must configure and push the correct docker image URI before installing the ArgoCD staging application

    patches:
    - target:
        kind: Deployment
        name: website
      patch: |-
        - op: replace
          path: /spec/template/spec/containers/0/image
          value: xxxxxx.dkr.ecr.eu-west-3.amazonaws.com/multienv-vote:v1.0.0
    

    We execute the following command :

    # create argocd staging app
    $ make argo-staging-app
    

    infra-argo-staging-app.png

    The application is successfully deployed :

    infra-argo-staging-tree.png

    We get the load balancer URL :

    infra-argo-staging-load-balancer.png

    Setup the production EKS cluster

    The deployment of the cluster is done via the commands :

    # terraform init the production env
    $ make production-init
    
    # terraform plan + apply the production env
    $ make production-apply
    

    After a long wait, the cluster is deployed :

    infra-cluster-production.png

    We configure the access to the cluster with this command :

    # setup kubectl config + aws-auth configmap for production env
    $ make eks-production-config
    

    Connect ArgoCD to the new EKS cluster

    We want a single ArgoCD instance to manage our multiple clusters

    We are therefore not going to install ArgoCD within the production cluster, but connect this cluster to the ArgoCD instance already present in the staging cluster

    We can see the list of clusters managed by argocd via this command :

    $ argocd cluster list
    SERVER                          NAME        VERSION  STATUS      MESSAGE  PROJECT
    https://kubernetes.default.svc  in-cluster  1.21+    Successful
    

    This corresponds to this visual interface :

    infra-argo-cluster-list-1.png

    To add the cluster, it must already be defined in the kubectl config. This was done in the previous steps

    You can check the list of clusters manageable by kubectl via this command :

    $ kubectl config get-contexts
    CURRENT   NAME                           CLUSTER                                                             AUTHINFO                                                               
    *         multienv-infra-production      arn:aws:eks:eu-west-3:xxxxx:cluster/multienv-infra-production       arn:aws:eks:eu-west-3:xxxxx:cluster/multienv-infra-production       
              multienv-infra-staging         arn:aws:eks:eu-west-3:xxxxx:cluster/multienv-infra-staging          arn:aws:eks:eu-west-3:xxxxx:cluster/multienv-infra-staging
    

    We associate a cluster with argocd using this simple command :

    $ argocd cluster add --yes <cluster-context-name>
    

    We execute this command :

    # argocd add production cluster
    $ make argo-add-cluster
    

    The cluster is added :

    $ argocd cluster list
    SERVER                                         NAME                       VERSION  STATUS      MESSAGE
    https://xxxxx.gr7.eu-west-3.eks.amazonaws.com  multienv-infra-production           Unknown     Cluster has no applications and is not being monitored.  
    https://kubernetes.default.svc                 in-cluster                 1.21+    Successful
    

    It can also be seen via the interface :

    infra-argo-cluster-list-2.png

    The ArgoCD production application

    We deploy the production application using the following command :

    # create argocd production app
    $ make argo-production-app
    

    The application deploys the image set via this patch :

    Important : you must configure and push the correct docker image URI before installing the ArgoCD production application

    patches:
    - target:
        kind: Deployment
        name: website
      patch: |-
        - op: replace
          path: /spec/template/spec/containers/0/image
          value: xxxxxx.dkr.ecr.eu-west-3.amazonaws.com/multienv-vote:v1.0.0
    

    infra-argo-production-app.png

    Faster synchronization of ArgoCD applications

    ArgoCD checks the git repository for updates every 3 minutes

    You can use a Github webhook to trigger a sync after each push

    But if that doesn’t work, we can also trigger the sync with the argocd CLI
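
    Done by hand, once logged in with the argocd CLI, a sync is a single command per application :

    # sync both applications immediately instead of waiting for the next poll
    argocd app sync app-staging
    argocd app sync app-production
    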

    We need to generate an ArgoCD token

    At the moment we don’t have any :

    $ argocd account get --account admin
    Name:               admin
    Enabled:            true
    Capabilities:       login
    
    Tokens:
    NONE
    

    We get an error if we try to generate one :

    $ argocd account generate-token
    FATA[0000] rpc error: code = Unknown desc = account 'admin' does not have apiKey capability
    

    We need to edit the argocd-cm ConfigMap :

    $ EDITOR=nano kubectl edit configmap argocd-cm -n argocd --context multienv-infra-staging
    

    Before :

    apiVersion: v1
    kind: ConfigMap
    metadata:
      annotations:
        # ...
    

    After :

    apiVersion: v1
    kind: ConfigMap
    data:
      accounts.admin: apiKey
    metadata:
      annotations:
        # ...
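    
    If you prefer a non-interactive command, the same change can be applied with a patch :

    kubectl patch configmap argocd-cm \
      --namespace argocd \
      --context multienv-infra-staging \
      --type merge \
      --patch '{"data":{"accounts.admin":"apiKey"}}'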
    

    Then generate the token again :

    $ argocd account generate-token
    eyJhb...
    

    Let’s check the data returned for the admin account :

    $ argocd account get --account admin
    Name:               admin
    Enabled:            true
    Capabilities:       login, apiKey
    
    Tokens:
    ID                                    ISSUED AT                  EXPIRING AT
    xxxxxxxx-3f07-44b7-a9d4-40972f273484  2022-02-20T00:00:00+02:00  never
    

    We get the address of the ArgoCD server with this command :

    # argocd cli login + show access data
    $ make argo-login 
    SERVER  xxxxx-xxxxx.eu-west-3.elb.amazonaws.com
    

    We add the token and the ArgoCD server URL as Github secrets :

    infra-github-secrets.png
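
    Since gh is already used in this project, the 2 secrets can also be added from the command line (the secret names match those used by the workflow below) :

    gh secret set ARGOCD_SERVER --body "xxxxx-xxxxx.eu-west-3.elb.amazonaws.com"
    gh secret set ARGOCD_TOKEN --body "eyJhb..."
    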

    The cd.yml workflow is used to synchronize our 2 applications :

    - name: Install argocd and run `app sync`
      run: |
        curl -sSL -o /usr/local/bin/argocd https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
        chmod +x /usr/local/bin/argocd
    
        argocd app get app-staging \
          --auth-token ${{ secrets.ARGOCD_TOKEN }} \
          --server ${{ secrets.ARGOCD_SERVER }} \
          --insecure \
          --grpc-web \
          --refresh >/dev/null
        argocd app wait app-staging \
          --auth-token ${{ secrets.ARGOCD_TOKEN }} \
          --server ${{ secrets.ARGOCD_SERVER }} \
          --insecure \
          --grpc-web \
          --sync \
          --health
    
        # ...
    

    Let’s deploy a new feature in staging

    We create a new branch in our multienv-vote project :

    $ git checkout -b feature-two
    

    We modify the css again :

    footer {
      /* ... */
    
      color: grey;
      background-color: #eceff1;
      border: grey 1px solid;
      padding: .7em;
    }
    

    Our local server shows this at http://localhost:4000/ :

    feature-two-local.png

    We push this branch on github :

    $ git add .
    $ git commit -m 'css again'
    $ git push --set-upstream origin feature-two
    

    The repository is added :

    feature-two-ecr.png

    We get the URI of the docker image :

    feature-two-ecr-image.png

    We modify our kustomization.yaml file of the staging environment :

    patches:
    - target:
        kind: Deployment
        name: website
      patch: |-
        - op: replace
          path: /spec/template/spec/containers/0/image
          value: xxxxx.dkr.ecr.eu-west-3.amazonaws.com/multienv-vote-feature-two:cf03d643
    

    We push this modification :

    $ git add .
    $ git commit -m feature-two
    $ git push
    

    The workflow is activated :

    feature-two-github-actions.png

    The update of the 2 applications took 11 seconds :

    feature-two-github-actions-steps.png

    By reloading the browser at the staging URL, we see the new version :

    feature-two-staging.png

    This demonstration is over, so it is important to remove all the resources :

    # delete argocd apps then argocd
    $ make argo-destroy
    
    # terraform destroy the production env
    $ make production-destroy
    
    # terraform destroy the staging env
    $ make staging-destroy
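    
    For reference, the ArgoCD part of the cleanup essentially amounts to deleting the 2 applications then the argocd namespace (a sketch ; the real script may differ) :

    # delete the 2 ArgoCD applications created earlier
    argocd app delete app-production
    argocd app delete app-staging

    # remove ArgoCD itself from the staging cluster
    kubectl delete namespace argocd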