Maintenance

Aos Edge for AWS is designed to self-heal after an application failure, thanks to the Kubernetes readiness and liveness probes that are configured by default. Aos Edge is also configured to alert maintainers when a service fails, whether the failure is detected by the external DNS provider or through pod or node failures.

Emergency maintenance

Common errors in Aos Edge for AWS and how to fix them:

  • Containers restart constantly, or the app returns 500 errors.
    • Open AWS CloudWatch, find the eks-<env>-app log group, and search it for the error.
    • Alternatively, install kubectl and the AWS CLI, log in to the EKS cluster as the Terraform role, and tail the logs of each pod to find the error. The deployment guide describes how to connect.
  • HTTP timeouts or random connectivity errors.
    • Check the Istio load balancer and the load balancer permissions. Sometimes the Istio ingress pod needs to be restarted. If the EKS load balancer controller lacks sufficient privileges, it cannot manage the security groups as it should, and redirection fails.
    • Check the Istio ingress and istiod logs in the istio-system namespace. You can also check connectivity with AWS Connectivity analyzer.
    • Check that the DNS host matches the host property of the VirtualService and of the Istio Gateway.
  • TLS errors.
    • Usually the secret update did not correctly update the TLS secrets stored in the secrets manager, so the apps are not running the correct certificate.
    • The Istio gateway has a different certificate than the backend app.
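
For the TLS errors above, comparing certificate fingerprints is one way to confirm whether an endpoint serves the certificate you expect. The following is a minimal Python sketch and not part of Aos Edge. The function names are hypothetical, and it assumes the PEM text of the stored certificate has already been retrieved from the secrets manager by other means.

```python
import hashlib
import ssl

PEM_FOOTER = "-----END CERTIFICATE-----"


def pem_fingerprint(pem_text: str) -> str:
    """Return the SHA-256 fingerprint (hex) of the first certificate in pem_text."""
    # A stored secret may hold a full chain; keep only the first (leaf) certificate.
    head, footer, _ = pem_text.strip().partition(PEM_FOOTER)
    if not footer:
        raise ValueError("no PEM certificate found")
    der = ssl.PEM_cert_to_DER_cert(head + footer)
    return hashlib.sha256(der).hexdigest()


def served_fingerprint(host: str, port: int = 443) -> str:
    """Fingerprint the certificate presented by host:port."""
    # Called without ca_certs, this retrieves the certificate without validating it.
    pem = ssl.get_server_certificate((host, port))
    return pem_fingerprint(pem)


def certs_match(stored_pem: str, host: str, port: int = 443) -> bool:
    """True if the endpoint serves the same certificate as the stored one."""
    return pem_fingerprint(stored_pem) == served_fingerprint(host, port)
```

For example, certs_match(stored_pem, "edge.example.com"), where the host is a placeholder, returns False when the endpoint presents a different certificate than the stored one. Running the same check against the gateway and the backend app shows which of the two is out of date.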

If the error persists or is not listed above, contact the Aos support team for further assistance.
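
For the host check listed under the connectivity errors above, the comparison can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not an Aos Edge tool. The function names are hypothetical, the host lists are assumed to be copied by hand from the Gateway and VirtualService resources, and a leading "*." is assumed to match any subdomain suffix.

```python
from typing import Iterable, List


def host_matches(dns_name: str, pattern: str) -> bool:
    """Check one DNS name against one host entry from a Gateway or VirtualService."""
    name = dns_name.lower().rstrip(".")
    # Gateway hosts may be written as "namespace/host"; keep only the host part.
    host = pattern.rsplit("/", 1)[-1].lower().rstrip(".")
    if host == "*":
        return True
    if host.startswith("*."):
        # Assumption: the wildcard matches any name ending in the remaining suffix.
        return name.endswith(host[1:])
    return name == host


def check_hosts(
    dns_name: str,
    gateway_hosts: Iterable[str],
    virtual_service_hosts: Iterable[str],
) -> List[str]:
    """Return a list of problems; an empty list means both resources match."""
    problems = []
    if not any(host_matches(dns_name, h) for h in gateway_hosts):
        problems.append(f"no Gateway host matches {dns_name}")
    if not any(host_matches(dns_name, h) for h in virtual_service_hosts):
        problems.append(f"no VirtualService host matches {dns_name}")
    return problems
```

An empty result from check_hosts means the DNS name is covered by both resources. Any returned message names the resource whose host property needs to be corrected.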