
AWS Infrastructure resources stage

NOTE: Use Terraform v1.2.9.
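A quick way to confirm the pinned version before running anything is sketched below. The helper name `tf_version_ok` is hypothetical and not part of this repository; it only compares strings, and the commented usage assumes `terraform` is on your PATH.

```shell
# Hypothetical helper (not part of this repo): succeed only if the given
# version string equals the Terraform version this stage expects.
REQUIRED_TF_VERSION="1.2.9"

tf_version_ok() {
  # Accept both "1.2.9" and "v1.2.9".
  [ "${1#v}" = "$REQUIRED_TF_VERSION" ]
}

# Demo with a literal value.
if tf_version_ok "v1.2.9"; then
  echo "version ok"
fi

# Real usage: the first line of `terraform version` looks like "Terraform v1.2.9".
#   tf_version_ok "$(terraform version | head -n 1 | awk '{print $2}')" \
#     || echo "Expected Terraform v${REQUIRED_TF_VERSION}" >&2
```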

Prerequisite (WARN): The setup stage must be completed and the backend migrated to the remote bucket.

WARN: For the first run, a dummy installation is recommended so that the NAT gateway IP address gets provisioned. By default, access to the ACR registry is closed to unknown networks; after the first dummy run, ask the aos cloud team to allow traffic to ACR from your provisioned NAT gateway IP address.

Infrastructure terraform provisioning

Export the common variables (they can be sourced from the env file generated by the setup stage):

# Get the needed env vars from setup stage
source ../setup/.envrc_aos-<env>

Check the identity you are using; you should get output like the following:

# aws sts get-caller-identity
{
"UserId": "AROA5J2TSAI3RHL7PCC37:AWSCLI-Session",
"Account": "XXXXXX",
"Arn": "arn:aws:sts::XXXXX:assumed-role/aos-staging-terraform-admin/AWSCLI-Session"
}
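The identity check can also be scripted. The sketch below assumes you want to confirm that the session belongs to the role shown in the sample output (`aos-staging-terraform-admin`); the helper name `is_assumed_role` is hypothetical.

```shell
# Hypothetical helper: succeed if an STS ARN is an assumed-role session
# of the given role name.
# $1: ARN as printed by `aws sts get-caller-identity`; $2: expected role name
is_assumed_role() {
  case "$1" in
    arn:aws:sts::*:assumed-role/"$2"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Demo with the ARN from the sample output above.
sample_arn="arn:aws:sts::XXXXX:assumed-role/aos-staging-terraform-admin/AWSCLI-Session"
if is_assumed_role "$sample_arn" "aos-staging-terraform-admin"; then
  echo "identity ok"
fi

# Real usage (requires valid AWS credentials):
#   is_assumed_role "$(aws sts get-caller-identity --query Arn --output text)" \
#     "aos-staging-terraform-admin" || echo "Unexpected identity" >&2
```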

Copy or reuse the terraform.tfvars file to define your own variables:

# Global variables
aws_resource_region = "eu-central-1"
project_name = "aos"
environment = "staging"
additional_resources_tags = { author = "example@epam.com" }

domain_name = "aws-stage.epmp-aos.projects.epam.com"

# For a first-time install, setting this to true is recommended
dummy_aos_helm_install = true

Set up the extra Terraform variables (environment variables are recommended).

  • export TF_VAR_aos_cloud_recaptcha_key=<key> The reCAPTCHA key needed by the environment and the frontend; it can be generated from GCP.
  • export TF_VAR_smtp_mail_server_password=<pwd> SMTP email password, needed for notifications and for sending keys/certs to new users.
  • export TF_VAR_smtp_mail_server_login=Auto_Reply@mydomain.com SMTP email login used for notifications and for sending keys/certs to new users (currently compatible only with Outlook servers).
  • export TF_VAR_aos_registry_properties='{password="***",server="***.azurecr.io",username="****"}' Ask the cloud team for these properties; they generate a temporary password and add a firewall rule allowing your NAT IP address. The value is single-quoted so the shell does not expand the braces or strip the inner quotes.
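Before planning, it can help to confirm that all four variables are actually set. The following is a minimal sketch; the variable names come from the list above, while the helper name `missing_tf_vars` is hypothetical.

```shell
# Variable names taken from the list above.
REQUIRED_TF_VARS="TF_VAR_aos_cloud_recaptcha_key TF_VAR_smtp_mail_server_password TF_VAR_smtp_mail_server_login TF_VAR_aos_registry_properties"

# Hypothetical helper: print the name of every required variable
# that is unset or empty, one per line.
missing_tf_vars() {
  for name in $REQUIRED_TF_VARS; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}

missing="$(missing_tf_vars)"
if [ -n "$missing" ]; then
  echo "Set these variables before running terraform plan:" >&2
  echo "$missing" >&2
fi
```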

Initialize the backend and select terraform workspace:

# Build the needed binaries for lambda and helm charts
pushd ../../
make build ENVIRONMENT=demo
popd

# Without assumed role
terraform init -backend-config "bucket=${TF_VAR_remote_state_bucket}" \
-backend-config "dynamodb_table=${TF_VAR_remote_state_db_table}"

terraform plan -out /tmp/tf.out
terraform apply /tmp/tf.out

If you ran with dummy_aos_helm_install=true, ask the aos cloud team to open ACR access to your newly provisioned public IP address, then set dummy_aos_helm_install=false and run plan and apply again.
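The two passes can be sketched as follows. This assumes you prefer overriding the flag with `-var` on the command line (which takes precedence over terraform.tfvars) instead of editing the file; the helper name `plan_command` is hypothetical, and the NAT gateway lookup lists every available NAT gateway in the region, not only the one created by this stage.

```shell
# Hypothetical helper: print the plan command for one pass.
# $1: value of dummy_aos_helm_install ("true" for pass 1, "false" for pass 2)
plan_command() {
  case "$1" in
    true|false)
      printf 'terraform plan -var dummy_aos_helm_install=%s -out /tmp/tf.out\n' "$1" ;;
    *)
      echo "expected true or false, got: $1" >&2
      return 1 ;;
  esac
}

# Pass 1: dummy install, which provisions the NAT gateway.
plan_command true
# Look up the NAT gateway public IP to send to the aos cloud team:
#   aws ec2 describe-nat-gateways --filter Name=state,Values=available \
#     --query 'NatGateways[].NatGatewayAddresses[].PublicIp' --output text
# Pass 2: real install, once ACR access has been opened.
plan_command false
# Run each printed command, followed by: terraform apply /tmp/tf.out
```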

Register your ALIAS DNS Record

Go to the AWS console and get the load balancer host name. Because the load balancer is managed by AWS for high availability, its DNS name resolves to 3 public IP addresses; therefore, you need to set up an ALIAS record that points to the load balancer.

  • EC2 > Load Balancing > Load Balancer > aos-<env>-public-lb > DNS name
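If the domain happens to be hosted in Route 53 (an assumption; this README does not say where DNS is managed), the ALIAS record can be created from the CLI as sketched below. The helper name `alias_change_batch`, the load balancer DNS name, the zone ID `ZEXAMPLE`, and the file path are placeholders.

```shell
# Hypothetical helper: print a Route 53 change batch that upserts an ALIAS A record.
# $1: record name, $2: load balancer DNS name, $3: load balancer canonical hosted zone ID
alias_change_batch() {
  cat <<EOF
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$1",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "$3",
          "DNSName": "$2",
          "EvaluateTargetHealth": false
        }
      }
    }
  ]
}
EOF
}

# Placeholder values; look up the real ones with:
#   aws elbv2 describe-load-balancers --names aos-<env>-public-lb \
#     --query 'LoadBalancers[0].[DNSName,CanonicalHostedZoneId]' --output text
alias_change_batch "aws-stage.epmp-aos.projects.epam.com" \
  "example-lb.elb.eu-central-1.amazonaws.com" "ZEXAMPLE" > /tmp/alias.json

# Then apply it to your own hosted zone:
#   aws route53 change-resource-record-sets --hosted-zone-id <your-zone-id> \
#     --change-batch file:///tmp/alias.json
```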

Check the known limitation for certificates in the Troubleshoot tips section.

Return to main readme

Troubleshoot tips

  • (Known issue) SSL Configuration wrong in Istio GW
    • To overcome this issue, provide your own TLS certificate properties, issued for the correct domain, in the following secrets (the secret values are not managed by Terraform). If you don't have your certificates yet, you can copy the values from aos-<env>-ingress/Fake* and use those.
      • aos-<env>-ingress/IngressCertificate
      • aos-<env>-ingress/IngressKey
      • aos-<env>-ingress/IngressCACertificate
    • After setting these secrets, reinstall the aos Helm release (see Helm re-installation).
  • (Known issue) NLB DNS cannot be used
    • Unfortunately, cert-manager does not support the domain name assigned to the NLB; you need to provide a new DNS host with an A record pointing to the NLB.
  • Lambda functions failing
    • To start troubleshooting why a Lambda function is failing, go to CloudWatch > Log groups >
      • /aws/lambda/aos-<env>-ecr-sync: ACR AOS image download and synchronization
      • /aws/lambda/aos-<env>-ecr-helm-install: Helm installation on EKS
  • Istio timeout in NLB
    • Due to security restrictions, the load balancer controller needs to be able to manage the security groups attached to the load balancer. If those permissions are not granted, or a boundary restricts them, the controller cannot manage or approve the security groups, and the Istio Gateway will never be reachable.
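For the Lambda log groups listed above, the logs can also be read from the terminal instead of the console. This is a sketch only: the helper name `lambda_log_group` is hypothetical, `staging` stands in for `<env>`, and `aws logs tail` requires AWS CLI v2.

```shell
# Hypothetical helper: print the CloudWatch log group of one of the stage's Lambda functions.
# $1: environment (the <env> placeholder), $2: function suffix (ecr-sync or ecr-helm-install)
lambda_log_group() {
  case "$2" in
    ecr-sync|ecr-helm-install)
      printf '/aws/lambda/aos-%s-%s\n' "$1" "$2" ;;
    *)
      echo "unknown function: $2" >&2
      return 1 ;;
  esac
}

# Demo: print the log group for the Helm installation Lambda.
lambda_log_group staging ecr-helm-install

# Tail the last hour of that group and keep following new events:
#   aws logs tail "$(lambda_log_group staging ecr-helm-install)" --since 1h --follow
```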

OpenVPN and EKS Access

The OpenVPN configuration is stored Base64-encoded in Secrets Manager; it is needed to get access to the control plane.

export AWS_DEFAULT_REGION="eu-central-1"
# Find the ARN of the secret that holds the OpenVPN profile
VPN_SECRET=$(aws secretsmanager list-secrets --filter Key="name",Values="${TF_VAR_project_name}-${TF_VAR_environment}-infrastructure/ovpn" --query 'SecretList[*].ARN' --output text)
# Decode the Base64 payload into an .ovpn file
aws secretsmanager get-secret-value --secret-id "$VPN_SECRET" --query 'SecretString' --output text | base64 -d > aos-staging.ovpn
# Configure kubectl for the cluster and check that the pods are visible
aws eks update-kubeconfig --region <region> --name aos-<env>-eks
kubectl get pods -n <env>

Helm re-installation

# Each plan overwrites /tmp/tf.out, so apply it before running the next plan
terraform plan -out /tmp/tf.out -replace module.helm_objects_secondary[\"prometheus\"].aws_lambda_invocation.helm_install
terraform apply /tmp/tf.out
terraform plan -out /tmp/tf.out -replace module.helm_objects_secondary[\"grafana\"].aws_lambda_invocation.helm_install \
-target module.helm_objects_secondary[\"grafana\"].aws_lambda_invocation.helm_install
terraform apply /tmp/tf.out

# Aos helm re-installation
terraform plan -out /tmp/tf.out \
-replace module.helm_aos.aws_lambda_invocation.helm_install \
-target module.helm_aos
terraform apply /tmp/tf.out
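The escaped quotes in the resource addresses are easy to get wrong. One way around that, sketched here with a hypothetical helper named `helm_install_address`, is to build the address once and pass it quoted to both `-replace` and `-target`.

```shell
# Hypothetical helper: print the resource address of the helm_install invocation
# for a chart deployed through module.helm_objects_secondary.
# $1: chart key, for example prometheus or grafana
helm_install_address() {
  printf 'module.helm_objects_secondary["%s"].aws_lambda_invocation.helm_install\n' "$1"
}

# Demo: show the address for the grafana chart.
addr="$(helm_install_address grafana)"
echo "$addr"

# Real usage: quoting "$addr" keeps the brackets and inner quotes intact.
#   terraform plan -out /tmp/tf.out -replace "$addr" -target "$addr"
#   terraform apply /tmp/tf.out
```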

If a change is required in the aos Helm charts:

pushd ../../
make build
popd
terraform plan -out /tmp/tf.out -replace module.helm_objects_secondary[\"aos\"].aws_lambda_invocation.helm_install \
-target module.helm_objects_secondary[\"aos\"].aws_lambda_invocation.helm_install
terraform apply /tmp/tf.out