Disaster recovery with a multi-region architecture
- Create a pseudo photo website with upload and comment features
- Uploaded photos are stored in an S3 bucket
- The website uses Cognito to register and log in users
- The website also uses the CloudFront CDN
- 2 DynamoDB tables are used to store the current application state and the photo comments
- 2 API Gateways are used to get and put data directly on the DynamoDB tables
- The S3 bucket, the DynamoDB tables and the API Gateways are replicated in another AWS region
- If the application fails in the primary region (for whatever reason), it is still available from the other AWS region
Install and set up the project
Get the code from this GitHub repository :
# download the code
$ git clone \
--depth 1 \
https://github.com/jeromedecoster/multi-region-application.git \
/tmp/aws
# cd
$ cd /tmp/aws
Before setting up the project, you need to change the email address value :
# use your email address to receive the email sent by AWS Cognito
export COGNITO_EMAIL=CHANGE_EMAIL_HERE@gmail.com
About
This project is an adaptation of this AWS Solutions project
My goal was to use Terraform instead of CloudFormation templates
I also wanted to remove all the configuration Lambda functions
The original project is a nice demonstration but not easy to set up. You can check this fork to install it
There are also 2 YouTube videos from AWS Events related to this project :
- Multi-Region deployment – Part 1: Needs, challenges, and approaches
- Multi-Region deployment – Part 2: Architectural best practices
The last chapter of the part 2 video shows the application running
My version of this project is decoupled as if it were 2 separate git repositories :
These 2 parts therefore use separate Terraform projects, with 2 backend configurations hosted in a dedicated S3 bucket
The frontend project uses Terraform data sources to retrieve data associated with resources created by the backend project
As can be seen in the cognito.tf file excerpt below :
- The aws_s3_bucket data source
- The aws_api_gateway_rest_api data source
data "aws_s3_bucket" "bucket_primary" {
bucket = "${var.project_name}-primary"
}
data "aws_api_gateway_rest_api" "config_primary" {
name = "${var.project_name}-config-primary"
}
Setup config
In the config folder, let's run the following command :
$ cd config
# setup config + terraform remote state S3 bucket
$ make setup
This command will :
- Generate 2 files config/uuid and config/rand which will contain random data
- Create an S3 bucket that will be used to host Terraform's backend.tfstate and frontend.tfstate files
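A minimal sketch of what this setup could look like is shown below. This is not the repository's exact script; the PROJECT_NAME and AWS_REGION_CONFIG variables are assumptions reused from the backend configuration shown in the next section :
# a sketch only : generate the random files and create the remote state bucket
uuidgen | tr '[:upper:]' '[:lower:]' > uuid
openssl rand -hex 8 > rand
# create the S3 bucket that will host backend.tfstate and frontend.tfstate
aws s3 mb s3://$PROJECT_NAME --region $AWS_REGION_CONFIG
# enable versioning, a good practice for Terraform remote states
aws s3api put-bucket-versioning \
  --bucket $PROJECT_NAME \
  --versioning-configuration Status=Enabled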
Setup backend
In the backend folder, let's run the following command :
$ cd backend
# terraform setup
$ make setup
The setup command initializes Terraform with a remote state :
# https://www.terraform.io/cli/commands/init
terraform init \
-input=false \
-backend=true \
-backend-config="region=$AWS_REGION_CONFIG" \
-backend-config="bucket=$PROJECT_NAME" \
-backend-config="key=backend.tfstate" \
-reconfigure
And now this command :
# terraform plan + apply (deploy)
$ make apply
The apply command deploys all these Terraform files
We have 2 DynamoDB tables in each region :
A config DynamoDB table is deployed
We have a table with a single item that describes the current state of the application :
Our application is referenced by its unique identifier appId, which was defined by this HCL code :
resource "aws_dynamodb_table_item" "config_item" {
table_name = aws_dynamodb_table.config.name
hash_key = aws_dynamodb_table.config.hash_key
item = <<ITEM
{
"appId": {"S": "${var.app_state_uuid}"},
"state": {"S": "active"}
}
ITEM
}
The current state of the application is active. In this case, the application behaves normally and everything is managed from the primary region us-east-1
If we set this state to failover, the application enters disaster recovery mode. Everything is then managed from the secondary region ap-northeast-1
We can imagine that the health of our application is monitored via CloudWatch. If a deficiency is detected, a CloudWatch alarm automatically changes the state value from active to failover.
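To illustrate this idea, here is a hedged sketch of a small watchdog that would flip the state when the primary config endpoint stops answering. The project itself switches the state manually with the switch-state command shown later; the STATE_PRIMARY, TABLE_NAME and UUID variables are the same ones used elsewhere in this article :
# a sketch only : switch to failover when the primary health check fails
if ! curl --silent --fail --max-time 5 "$STATE_PRIMARY" > /dev/null; then
  # write to the secondary region, the replicated table propagates the change
  aws dynamodb update-item \
    --table-name $TABLE_NAME \
    --key '{"appId":{"S":"'$UUID'"}}' \
    --update-expression "SET #sn = :sv" \
    --expression-attribute-names '{"#sn":"state"}' \
    --expression-attribute-values '{":sv":{"S":"failover"}}' \
    --region $AWS_REGION_SECONDARY
fi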
We have 2 API Gateways in each region :
The config API Gateway is deployed :
It is particularly interesting to look at the Integration Request Mapping Templates :
It corresponds to this HCL code :
request_templates = {
  "application/json" = jsonencode(
    {
      ExpressionAttributeValues = {
        ":v1" = {
          S = "$input.params('appId')"
        }
      }
      KeyConditionExpression = "appId = :v1"
      TableName              = aws_dynamodb_table.config.name
    }
  )
}
And look at the Response Mapping Templates part :
It corresponds to this HCL code :
response_templates = {
"application/json" = <<-EOT
#set($inputRoot = $input.path('$'))
#if($inputRoot.Items.size() == 1)
#set($item = $inputRoot.Items.get(0))
{
"state": "$item.state.S"
}
#{else}
{}
## Return an empty object
#end
EOT
}
We now query the state via this command :
# get application current state from primary region (with apigateway)
$ make get-state-primary
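Under the hood, this target presumably just curls the config API Gateway URL of the primary region (the same STATE_PRIMARY URL that is injected later into uiConfig.json). A sketch :
# probable content of the get-state-primary target (a sketch)
curl --silent $STATE_PRIMARY | jq
# expected output, given the response mapping template above :
# {
#   "state": "active"
# }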
Upload
The upload command allows you to upload a random image. This is the essential first step to :
- Test file replication within a bucket from one region to another
- Be able to add a comment related to this image
- Test the replication of this comment in the second region table
# upload an image to the primary region (with aws cli)
$ make upload
This command uses imagemagick (assuming it’s already installed on your machine) to convert this test image :
COLORS='aqua black blue chartreuse chocolate coral cyan fuchsia gray green lime magenta'
COLORS="$COLORS maroon navy olive orange orchid purple red silver teal white yellow"
COLOR=$(echo "$COLORS" | tr ' ' '\n' | sort --random-sort | head -n 1)
log COLOR $COLOR
TINT=$(echo '60 80 100 120 140' | tr ' ' '\n' | sort --random-sort | head -n 1)
log TINT $TINT
convert avatar.jpg -fill $COLOR -tint $TINT converted.jpg
The image is uploaded via aws cli :
UUID=$(uuidgen)
aws s3 cp converted.jpg s3://$UPLOAD_BUCKET/public/$UUID
The image is successfully uploaded in our bucket :
If we open the image in the browser :
We see the image modified by ImageMagick :
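Since the bucket is replicated, we can also check from the command line that the object has been copied to the secondary bucket. This is not part of the Makefile; BUCKET_SECONDARY and AWS_REGION_SECONDARY are assumptions matching the variables used later :
# a sketch only : list the replicated objects in the secondary bucket
aws s3 ls s3://$BUCKET_SECONDARY/public/ --region $AWS_REGION_SECONDARY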
The API Gateway store
The store API Gateway allows querying and writing directly to the DynamoDB store table :
It is particularly interesting to look at the Integration Request Mapping Templates :
Here is the HCL code :
request_templates = {
  "application/json" = jsonencode(
    {
      ExpressionAttributeValues = {
        ":v1" = {
          S = "$input.params('photoId')"
        }
      }
      IndexName              = "photoId"
      KeyConditionExpression = "photoId = :v1"
      TableName              = aws_dynamodb_table.store.name
    }
  )
}
The Response Mapping Templates :
Here is the HCL code :
response_templates = {
"application/json" = <<-EOT
#set($inputRoot = $input.path('$')) {
"comments": [
#foreach($elem in $inputRoot.Items) {
"commentId": "$elem.commentId.S",
"user": "$elem.user.S",
"message": "$elem.message.S"
}#if($foreach.hasNext),#end
#end
]
}
EOT
}
You can get the comments associated with a photo via this command :
$ make get-comment-primary
{
"comments": []
}
The script gets the id of a randomly chosen image from the S3 bucket :
# random photo id
RAND_PHOTO_ID=$(aws s3 ls s3://$BUCKET/public/ | \
sort --random-sort | \
head -n 1 | \
awk '{ print $NF }')
The script then queries the API Gateway via :
curl --silent $PHOTOS_API/comments/$RAND_PHOTO_ID | jq
The following command adds a random comment to an image :
# add a comment to a random photo at primary region (with apigateway)
$ make add-comment-primary
This command runs this script :
DATA='{"commentId":"'$UUID'", "message":"message '$RANDOM'", "photoId":"'$RAND_PHOTO_ID'", "user":"'$COGNITO_USERNAME'"}'
curl $PHOTOS_API/comments/$RAND_PHOTO_ID \
--header "Content-Type: application/json" \
--data "$DATA"
Invoking the get-comment-primary command again, we get the comment :
$ make get-comment-primary
{
"comments": [
{
"commentId": "abca91a3-b331-4ae8-a9fa-da371a712a7c",
"user": "jerome",
"message": "message 23632"
}
]
}
We can see that the item has been duplicated to the table located in Tokyo :
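We can verify this from the command line as well; a hedged sketch, assuming STORE_TABLE_NAME holds the name of the store table :
# a sketch only : read the replicated comments from the secondary region
aws dynamodb scan \
  --table-name $STORE_TABLE_NAME \
  --region $AWS_REGION_SECONDARY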
Setup frontend
In the frontend folder, let's run the following command :
$ cd frontend
# terraform setup + npm install the website
$ make setup
Then this command :
# terraform plan + apply (deploy)
$ make apply
The command creates a CloudFront distribution :
The /console origin :
It also creates a Cognito user pool :
The HCL code also creates a user via an AWS CLI call :
# Add aws_cognito_user resource
resource "null_resource" "cognito_users" {
  provisioner "local-exec" {
    command = <<COMMAND
aws cognito-idp admin-create-user \
  --user-pool-id ${aws_cognito_user_pool.pool.id} \
  --username ${var.cognito_username} \
  --user-attributes Name=email,Value=${var.cognito_email} \
  --region ${var.primary_region}
COMMAND
  }
}
An email has been sent with our temporary password :
The config file
The website is a React application that loads its initial data from a JSON file, uiConfig.json :
This file contains all the URLs and variables needed for the application to work
We will generate this file with this command :
# generate the uiConfig.json file
$ make ui-config
This command gets the variables :
IDENTITY_POOL_ID=$(echo "$FRONTEND_JSON" | jq --raw-output '.identity_pool_id.value')
USER_POOL_CLIENT_ID=$(echo "$FRONTEND_JSON" | jq --raw-output '.user_pool_client_id.value')
USER_POOL_ID=$(echo "$FRONTEND_JSON" | jq --raw-output '.user_pool_id.value')
UI_REGION=$AWS_REGION_PRIMARY
STATE_PRIMARY=$(echo "$BACKEND_JSON" | jq --raw-output '.apigateway_config_url_primary.value')/state/$UUID
BUCKET_PRIMARY=$(echo "$BACKEND_JSON" | jq --raw-output '.bucket_primary.value')
PHOTOS_PRIMARY=$(echo "$BACKEND_JSON" | jq --raw-output '.apigateway_store_url_primary.value')
REGION_PRIMARY=$AWS_REGION_PRIMARY
STATE_SECONDARY=$(echo "$BACKEND_JSON" | jq --raw-output '.apigateway_config_url_secondary.value')/state/$UUID
BUCKET_SECONDARY=$(echo "$BACKEND_JSON" | jq --raw-output '.bucket_secondary.value')
PHOTOS_SECONDARY=$(echo "$BACKEND_JSON" | jq --raw-output '.apigateway_store_url_secondary.value')
REGION_SECONDARY=$AWS_REGION_SECONDARY
And writes the file :
JSON=$(cat <<EOF
{
"identityPoolId": "$IDENTITY_POOL_ID",
"userPoolClientId": "$USER_POOL_CLIENT_ID",
"userPoolId": "$USER_POOL_ID",
"uiRegion": "$UI_REGION",
"primary": {
"stateUrl": "$STATE_PRIMARY",
"objectStoreBucketName": "$BUCKET_PRIMARY",
"photosApi": "$PHOTOS_PRIMARY",
"region": "$REGION_PRIMARY"
},
"secondary": {
"stateUrl": "$STATE_SECONDARY",
"objectStoreBucketName": "$BUCKET_SECONDARY",
"photosApi": "$PHOTOS_SECONDARY",
"region": "$REGION_SECONDARY"
}
}
EOF
)
echo "$JSON" > "$PROJECT_DIR/frontend/website/public/uiConfig.json"
Run the local website
We can test our website on localhost:3000. The compilation time can be quite long !
# browse the website in localhost
$ make website-local
We can now log in using our credentials received via email :
Changing the password :
Skipping this step :
We can see the image that was previously uploaded by the script :
We add an image by uploading it manually :
This image is uploaded :
We add a comment :
This image is commented :
The comment is stored in the DynamoDB table :
Testing the disaster recovery strategy
We will simulate a major problem and the need to switch to the fallback region via this command :
$ cd backend
# switch application current state (with aws cli)
$ make switch-state
This command gets the current value of state and changes it to the opposite value :
CURRENT_STATE=$(aws dynamodb get-item \
--table-name $TABLE_NAME \
--key '{"appId":{"S":"'$UUID'"}}' \
--region $AWS_REGION_PRIMARY \
| jq --raw-output '.Item.state.S')
[[ $CURRENT_STATE == 'active' ]] && NEW_STATE=failover || NEW_STATE=active;
aws dynamodb update-item \
--table-name $TABLE_NAME \
--key '{"appId":{"S":"'$UUID'"}}' \
--update-expression "SET #sn = :sv" \
--expression-attribute-names '{"#sn":"state"}' \
--expression-attribute-values '{":sv":{"S":"'$NEW_STATE'"}}' \
--return-values ALL_NEW \
--region $AWS_REGION_PRIMARY
By reloading our browser, we see that we have changed the reference region :
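We can also confirm the switch directly from the secondary region, with the same variables as the script above (assuming the config table is replicated under the same name) :
# read the state from the replicated config table in Tokyo
aws dynamodb get-item \
  --table-name $TABLE_NAME \
  --key '{"appId":{"S":"'$UUID'"}}' \
  --region $AWS_REGION_SECONDARY \
  | jq --raw-output '.Item.state.S'
# expected output : failover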
Website deployment
We publish the site with this command :
# deploy the static website to the S3 bucket
$ make website-deploy
Compiling the project can be quite long !
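The deploy target presumably builds the React application and syncs it to the S3 origin used by CloudFront. A minimal sketch, where FRONTEND_BUCKET and DISTRIBUTION_ID are assumptions :
# a sketch of a typical static website deployment (not the repository's exact script)
npm run build
aws s3 sync build/ s3://$FRONTEND_BUCKET --delete
aws cloudfront create-invalidation \
  --distribution-id $DISTRIBUTION_ID \
  --paths '/*'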
We get the website URL with this command :
# get the cloudfront URL
$ make cdn-url
The website navigation should be similar to the experience on localhost
The demonstration is over. We can delete our resources with these 3 commands :
$ cd ../frontend
# destroy all resources
$ make destroy
$ cd ../backend
# destroy all resources
$ make destroy
$ cd ../config
# destroy config + terraform remote state S3 bucket
$ make destroy