GitOps + Terraform
- We have created a great HTML site.
- It is served by an Apache VirtualHost.
- The site is hosted on AWS. We use an Auto Scaling Group, an Elastic Load Balancer and EC2 instances.
- The AWS infrastructure is defined in Terraform files.
- We use CodeBuild to automatically deploy any new version of our infrastructure or our website.
- Each git push to our repository starts a new deployment.
- The deployment pipeline is also created via Terraform files.
Install and set up the project
Get the code from this GitHub repository :
# download the code
$ git clone \
--depth 1 \
https://github.com/jeromedecoster/aws-gitops-terraform.git \
/tmp/aws
# cd
$ cd /tmp/aws
Clone or fork the project.
Then adjust this git URL in the user-data.sh file :
# adjust the URL with your repository
git clone https://github.com/jeromedecoster/aws-gitops-terraform.git --depth 1
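If you forked the project, a one-liner like this does the job (a sketch : it assumes GNU sed, that user-data.sh lives in the infra directory, and that <your-user> stands for your GitHub account) :
# point the clone at your own fork
sed -i 's|github.com/jeromedecoster|github.com/<your-user>|' infra/user-data.sh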
GitOps in a few points
- The concept was created by WeaveWorks. Here is the founding post and here is an update.
- Infrastructure as Code : We define our infrastructure declaratively in configuration files. So we use tools like Terraform, Ansible, …
- Manual operations via ssh are prohibited. Everything must be done declaratively and stored in a git repository. So we say that : Git is our only source of truth.
- Since we do not make any manual changes, we can easily replicate our deployments. We are talking about immutable deployments or immutable infrastructure.
- The Infrastructure as Code is managed in a git repository, so we have a clear history of evolutions with the different commits and the log messages. It also facilitates rollbacks.
- There are 2 deployment strategies : push and pull.
- The push strategy is the simplest. It’s a classic approach :
- We have a repo for the application (app) and a repo for the environment (env).
- A new commit to the app repo starts a build pipeline.
- Once the tests are successful, we build a new container with the application and store it in a registry.
- We notify the env repo that a new image is available.
- This change will trigger the deployment pipeline to replace the old image with the new one.
- The pull strategy is more advanced. It requires specific tools. I’m skipping it for now…
Exploring the project
This project is simpler than what was described previously in the push strategy.
We use a monorepo here and we do not build a docker image. This approach diverges from the recommendations and should not be used for real projects. This approach is however sufficient to make a first demonstration.
Let’s look at some parts of the source code.
If we look at the Makefile, we have some actions to build and use the project :
setup-create: # create the settings.sh files + the AWS S3 bucket
bin/setup.sh create
ssh-key-create: # create SSH keys + import public key to AWS
bin/ssh-key.sh create
deployment-pipeline-init: # create terraform.tfvars + terraform init the deployment pipeline
bin/deployment-pipeline.sh init
deployment-pipeline-apply: # terraform plan + terraform apply the deployment pipeline
bin/deployment-pipeline.sh apply
The codebuild.tf file is used to :
- Create the CodeBuild project with aws_codebuild_project.
- Listen for changes on the GitHub repository with aws_codebuild_webhook.
resource "aws_codebuild_project" "codebuild" {
name = "${local.project_name}-codebuild"
service_role = aws_iam_role.codebuild_role.arn
build_timeout = 120
source {
type = "GITHUB"
location = "https://github.com/${var.github_owner}/${var.github_repository_name}.git"
git_clone_depth = 1
report_build_status = true
}
artifacts {
type = "NO_ARTIFACTS"
}
environment {
compute_type = "BUILD_GENERAL1_SMALL"
# https://github.com/aws/aws-codebuild-docker-images/blob/master/al2/x86_64/standard/3.0/Dockerfile
image = "aws/codebuild/amazonlinux2-x86_64-standard:3.0"
type = "LINUX_CONTAINER"
}
logs_config {
cloudwatch_logs {
group_name = "${local.project_name}-log-group"
stream_name = local.project_name
}
}
}
resource "aws_codebuild_webhook" "webhook" {
project_name = aws_codebuild_project.codebuild.name
filter_group {
filter {
type = "EVENT"
pattern = "PUSH"
}
filter {
type = "HEAD_REF"
pattern = "master"
}
}
filter_group {
filter {
type = "EVENT"
pattern = "PULL_REQUEST_CREATED,PULL_REQUEST_UPDATED,PULL_REQUEST_REOPENED"
}
filter {
type = "BASE_REF"
pattern = "master"
}
}
}
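Note that CodeBuild must be authenticated with GitHub before the webhook can be registered. The deployment pipeline receives a github_token variable for this purpose ; outside Terraform, the equivalent association could be done with the AWS CLI (a sketch, not necessarily how this repository wires it ; GITHUB_TOKEN is an assumed variable) :
# associate a GitHub personal access token with CodeBuild in this region
aws codebuild import-source-credentials \
--auth-type PERSONAL_ACCESS_TOKEN \
--server-type GITHUB \
--token "$GITHUB_TOKEN"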
The buildspec.yaml file is used to :
- Tell CodeBuild which commands to execute during the pre_build, build and post_build phases.
version: 0.2
phases:
pre_build:
commands:
- echo ······ pre_build `date` ······
- buildspec/pre-build.sh
build:
commands:
- echo ······ build `date` ······
- buildspec/build.sh
post_build:
commands:
- echo ······ post_build `date` ······
The pre_build step is defined in an executable bash script. The pre-build.sh file is used to :
- Simply install Terraform.
#!/bin/bash
echo ······ install terraform ······
cd /usr/bin
curl -s -qL -o terraform.zip https://releases.hashicorp.com/terraform/0.12.24/terraform_0.12.24_linux_amd64.zip
unzip -o terraform.zip
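A possible hardening, which the original script does not do, is to verify the archive against HashiCorp’s published checksums before unzipping it :
# compare the published SHA256 with the one of the downloaded archive
EXPECTED=$(curl -s https://releases.hashicorp.com/terraform/0.12.24/terraform_0.12.24_SHA256SUMS \
| grep linux_amd64.zip | awk '{print $1}')
echo "$EXPECTED  terraform.zip" | sha256sum -c -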
The build step is also defined in an executable bash script. The build.sh file is used to :
- Deploy the infrastructure described in the infra directory with Terraform.
- Destroy the previous EC2 instances and Auto Scaling Group with the AWS CLI before creating new ones.
#!/bin/bash
echo ······ source settings.sh ······
cd $CODEBUILD_SRC_DIR
source settings.sh
echo ······ AWS_REGION=$AWS_REGION ······
echo ······ S3_BUCKET=$S3_BUCKET ······
echo ······ SSH_KEY=$SSH_KEY ······
echo ······ terraform init ······
cd infra
terraform init \
-input=false \
-backend=true \
-backend-config="region=$AWS_REGION" \
-backend-config="bucket=$S3_BUCKET" \
-backend-config="key=terraform" \
-no-color
NAME=$(terraform output | grep ^project_name | sed 's|.*= ||')
echo ······ NAME=$NAME ······
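# if a previous deployment exists (project_name is found in the remote state),
# terminate its EC2 instances and delete the auto scaling group before applying the new plan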
if [[ -n "$NAME" ]]; then
ID=$(aws autoscaling describe-auto-scaling-groups \
--auto-scaling-group-names $NAME \
--query "AutoScalingGroups[?AutoScalingGroupName == '$NAME'].Instances[*].[InstanceId]" \
--output text)
echo ······ ID=$ID ······
if [[ -n "$ID" ]]; then
echo ······ terminate EC2 instances ······
echo "$ID" | while read line; do
aws ec2 terminate-instances --instance-ids $line
done
echo ······ sleep 5 seconds ······
sleep 5
fi
echo ······ delete auto scaling group ······
aws autoscaling delete-auto-scaling-group \
--auto-scaling-group-name $NAME \
--force-delete
echo ······ sleep 10 seconds ······
sleep 10
while [[ -n $(aws autoscaling describe-auto-scaling-groups \
--auto-scaling-group-names $NAME \
--query "AutoScalingGroups[?AutoScalingGroupName == '$NAME']" \
--output text) ]]; do
aws autoscaling describe-auto-scaling-groups \
--auto-scaling-group-names $NAME \
--query "AutoScalingGroups[?AutoScalingGroupName == '$NAME'].[Status]" \
--output text
echo ······ waiting auto-scaling-group destruction. sleep 20 seconds ······
sleep 20
done
fi
echo ······ terraform plan ······
terraform plan \
-var "ssh_key_name=$SSH_KEY" \
-out=terraform.plan \
-no-color
echo ······ terraform apply ······
terraform apply \
-auto-approve \
-no-color \
terraform.plan
It is important to note that we are using an S3 backend.
The state of our infrastructure will therefore be stored remotely.
The terraform.tf file declares :
terraform {
# 'backend-config' options must be passed like :
# terraform init -input=false -backend=true \
# [with] -backend-config="backend.json"
# [or] -backend-config="backend.tfvars"
# [or] -backend-config="<key>=<value>"
backend "s3" {}
}
The backend values are defined externally within the build.sh file :
terraform init \
-input=false \
-backend=true \
-backend-config="region=$AWS_REGION" \
-backend-config="bucket=$S3_BUCKET" \
-backend-config="key=terraform" \
-no-color
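The bucket itself must already exist when terraform init runs inside CodeBuild. It is created beforehand by make setup-create, presumably with something equivalent to this (a sketch, the real bin/setup.sh may differ) :
# create the S3 bucket that stores the remote terraform state
aws s3 mb "s3://$S3_BUCKET" --region "$AWS_REGION"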
The asg.tf file is used to :
- Create a Launch Configuration with aws_launch_configuration.
- Create a Load Balancer with aws_lb.
- Create an EC2 Auto Scaling Group with aws_autoscaling_group.
- Create a listener with aws_lb_listener.
resource "aws_launch_configuration" "launch_configuration" {
name = local.project_name
image_id = data.aws_ami.latest_amazon_linux.id
instance_type = "t2.micro"
security_groups = [aws_security_group.security_group.id]
user_data = file("${path.module}/user-data.sh")
lifecycle {
create_before_destroy = true
}
}
resource "aws_autoscaling_group" "default" {
name = local.project_name
max_size = 3
min_size = 1
desired_capacity = 1
launch_configuration = aws_launch_configuration.launch_configuration.name
target_group_arns = [aws_lb_target_group.target_group.arn]
vpc_zone_identifier = data.aws_subnet_ids.subnet_ids.ids
lifecycle {
create_before_destroy = true
}
}
resource "aws_lb_listener" "http" {
load_balancer_arn = aws_lb.lb.arn
port = "80"
protocol = "HTTP"
default_action {
target_group_arn = aws_lb_target_group.target_group.arn
type = "forward"
}
}
Each EC2 instance started will execute this user-data.sh script :
- We install httpd and git.
- We clone our git repository.
- We move our website to the /var/www/vhosts/example.com directory.
- Then we enable and start httpd with systemd.
#!/bin/bash
sudo yum --assumeyes update
sudo yum --assumeyes install httpd git
mkdir /var/www/vhosts
cd /tmp
git clone https://github.com/jeromedecoster/aws-gitops-terraform.git --depth 1
mv aws-gitops-terraform/www /var/www/vhosts/example.com
cat <<EOF > /etc/httpd/conf.d/vhost.conf
<VirtualHost *:80>
# REQUIRED. Set this to the host/domain/subdomain that
# you want this VirtualHost record to handle.
ServerName example.com
# REQUIRED. Set this to the directory you want to use for
# this vhost site's files.
DocumentRoot /var/www/vhosts/example.com
# REQUIRED. Let's make sure that .htaccess files work on
# this site. Don't forget to change the file path to
# match your DocumentRoot setting above.
<Directory /var/www/vhosts/example.com>
AllowOverride All
</Directory>
</VirtualHost>
EOF
sudo systemctl enable httpd
sudo systemctl start httpd
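Note that although ServerName is set to example.com, this is the only VirtualHost defined, so Apache treats it as the default host and also serves the site when requests arrive with the load balancer DNS name in the Host header. To check an instance directly (for example over SSH with the key created by make ssh-key-create), a quick test could be :
# request the site through the local Apache ; any Host value ends up on the same vhost
curl -s -H 'Host: example.com' http://localhost/ | head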
And of course we have our great website page :
<html lang="en">
<head>
<title>Fox or Bear</title>
</head>
<body>
<h1>Fox</h1>
<img src="fox.jpg" alt="A Fox">
<!-- <h1>Bear</h1>
<img src="bear.jpg" alt="A Bear"> -->
</body>
</html>
Setup Github
We need to create a GitHub token with the repo and admin:repo_hook scopes selected :
We receive our GitHub token :
Run the project
Let’s start :
# create the settings.sh files + the AWS S3 bucket
$ make setup-create
create settings.sh
# it creates a bucket with a random suffix
create gitops-terraform-vlw4 bucket
make_bucket: gitops-terraform-vlw4
The settings.sh file has been created. You can change the region if you want :
AWS_REGION=eu-west-3
SSH_KEY=gitops-terraform
S3_BUCKET=gitops-terraform-vlw4
We create an SSH key to connect to EC2 instances if we need to :
# create SSH keys + import public key to AWS
$ make ssh-key-create
create gitops-terraform.pem + gitops-terraform.pub keys — without passphrase
import gitops-terraform.pub key to AWS EC2
{
"KeyFingerprint": "2b:a0:de:aa:bb:cc:dd:ee:ff:gg:hh:ii:jj:kk:ll:mm",
"KeyName": "gitops-terraform"
}
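The bin/ssh-key.sh script itself is not shown here ; judging from the output above, it does something along these lines (a sketch only, file names taken from the log) :
# generate an RSA key pair without passphrase, then import the public half into EC2
ssh-keygen -t rsa -b 4096 -N '' -f gitops-terraform.pem
mv gitops-terraform.pem.pub gitops-terraform.pub
# with AWS CLI v2, fileb:// may be required instead of file://
aws ec2 import-key-pair \
--key-name gitops-terraform \
--public-key-material file://gitops-terraform.pub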
The key is displayed in the Key pairs sub-menu of the EC2 interface :
Creation of the deployment pipeline
We will create the CodeBuild pipeline linked to our GitHub project :
# create terraform.tfvars + terraform init the deployment pipeline
$ make deployment-pipeline-init
init terraform
Initializing the backend...
# ...
create terraform.tfvars file
warn you must define /tmp/aws/deployment-pipeline/terraform.tfvars
terraform.tfvars
github_token = ""
github_owner = ""
github_repository_name = ""
We must define these 3 variables :
- Use the GitHub token generated previously.
github_token = "2.....2"
github_owner = "jeromedecoster"
github_repository_name = "aws-gitops-terraform"
We must now commit and push the settings.sh file to our repository on GitHub, because its variables will be used by the deployment pipeline.
I choose to create the file directly from the web interface.
I create a file :
I name it settings.sh and copy and paste the content :
Then I commit it :
Our pipeline can now be deployed :
# terraform plan + terraform apply the deployment pipeline
$ make deployment-pipeline-apply
CodeBuild is created :
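We can also confirm it from the command line :
# list the CodeBuild projects of the region
aws codebuild list-projects --output table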
Creation of the infrastructure and publication of the site
We will now modify the file index.html. The commit will trigger the deployment pipeline.
We uncomment the Fox part :
We commit this modification :
CodeBuild activates automatically :
We can see in the logs that Terraform is installed :
After a short wait we have the final success message :
We can see in the EC2 interface that the Auto Scaling group is running :
We have 1 instance started :
Let’s go see the Load Balancer. We get the DNS name URL :
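The same DNS name can also be retrieved with the AWS CLI :
# print the name and the DNS name of each load balancer
aws elbv2 describe-load-balancers \
--query "LoadBalancers[].[LoadBalancerName,DNSName]" \
--output table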
And by pasting this address into our browser, we see :
Change the infrastructure
We now want to update our architecture.
Our site is a huge success. We want to have 2 instances started to support the incessant traffic.
We just need to edit the file asg.tf :
We change the desired_capacity value from 1 to 2 :
We commit this change :
The deployment pipeline activates again. We see in the logs that the Auto Scaling Group is being deleted :
This is the major weakness of this infrastructure : deleting and recreating an Auto Scaling Group takes about 5 minutes, during which the site is down ! Such an architecture is therefore not acceptable in production.
Here are some possible solutions :
- Using Terraform for zero downtime updates of an Auto Scaling group in AWS
- Amazon Web Services AutoScaling Group Roller
We would however like to use simpler solutions.
After a long wait, here is our new Auto Scaling Group :
We now have 2 instances running :
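This can also be verified from the command line :
# show the desired capacity and the actual number of instances of each auto scaling group
aws autoscaling describe-auto-scaling-groups \
--query "AutoScalingGroups[].[AutoScalingGroupName,DesiredCapacity,length(Instances)]" \
--output table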
Change the site
We now want to update our site. We will edit the file index.html :
- We comment out the Fox part.
- We uncomment the Bear part.
We commit this modification :
CodeBuild activates automatically :
And after 5 minutes, by reloading our browser :
We have 2 new instances started :
We are done. We can now destroy our infrastructure. We start with the deployment pipeline :
# terraform destroy the deployment pipeline
$ make deployment-pipeline-destroy
To destroy the site infrastructure, we must first initialize terraform locally :
# terraform init the project infrastructure
$ make infra-init
Then ask for destruction :
# terraform destroy the project infrastructure
$ make infra-destroy
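Note that the S3 bucket and the EC2 key pair were created outside Terraform, by make setup-create and make ssh-key-create, so they are not removed by the destroy targets above. If the Makefile has no dedicated targets for this, they can be cleaned up by hand :
# delete the state bucket (and its contents) and the imported key pair
aws s3 rb "s3://$S3_BUCKET" --force
aws ec2 delete-key-pair --key-name gitops-terraform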