Kubernetes Deployment Architecture
AosCloud runs on Amazon EKS (Elastic Kubernetes Service) with a single managed node group, Bottlerocket-based worker nodes, and a comprehensive set of Helm charts covering service mesh, observability, security scanning, certificate management, and the AosCloud application itself. This page documents the cluster configuration, all deployed workloads, and how Kubernetes integrates with AWS services via IRSA.
Prerequisites
Before reading this page, you should be familiar with:
- AWS Resource Architecture — the full resource topology including VPC, managed databases, and storage
- IAM Roles and Policies — IRSA service accounts and their permissions
EKS Cluster Configuration
Cluster Settings
| Parameter | Value |
|---|---|
| Kubernetes version | 1.35 |
| API endpoint access | Private (default), optionally public |
| Networking | VPC CNI (pod-level ENI attachment) |
| Services CIDR | Configurable |
| Logging | CloudWatch Log Group (/aws/eks/<name>/cluster) |
The cluster control plane uses the AmazonEKSClusterPolicy, AmazonEKSServicePolicy, and AmazonEKSVPCResourceController managed policies via a dedicated IAM role.
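As a rough illustration of the control-plane role described above, the sketch below assembles the managed policy ARNs, the role's trust policy, and the log group name. The function names are hypothetical and the default `aws` partition is an assumption; the actual provisioning code is not shown on this page.

```python
# Hypothetical helpers; only the policy names and log group pattern come from this page.
CLUSTER_MANAGED_POLICIES = [
    "AmazonEKSClusterPolicy",
    "AmazonEKSServicePolicy",
    "AmazonEKSVPCResourceController",
]


def managed_policy_arn(policy_name: str, partition: str = "aws") -> str:
    """AWS-managed policies live under the reserved 'aws' account segment."""
    return f"arn:{partition}:iam::aws:policy/{policy_name}"


def cluster_role_trust_policy() -> dict:
    """Trust policy that lets the EKS service assume the control-plane role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "eks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def cluster_log_group(cluster_name: str) -> str:
    """CloudWatch log group name, following the pattern in the settings table."""
    return f"/aws/eks/{cluster_name}/cluster"


policy_arns = [managed_policy_arn(name) for name in CLUSTER_MANAGED_POLICIES]
```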
Node Group
AosCloud uses a single managed node group with the following configuration:
| Parameter | Value |
|---|---|
| AMI type | BOTTLEROCKET_x86_64 |
| Default instance type | t3a.2xlarge (dev) / m5.2xlarge (prod) |
| Scaling — minimum nodes | 3 |
| Scaling — maximum nodes | 10 |
| Root volume | 50 GB gp3 EBS |
| Metadata service | IMDSv2 required (http_tokens = "required") |
The instance type and scaling limits are set per environment, so development and production clusters can be sized independently.
Allowed instance types (validated): t3/t3a (small through 2xlarge), t4g (xlarge, 2xlarge), m5/m5a (large through 24xlarge), m6a.xlarge, c5 (large through 24xlarge), r5 (large through 24xlarge).
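The allow-list above can be read as a family-to-size-range lookup. The following is a minimal sketch of such a check, not the project's actual validation code: the names `SIZE_LADDER`, `ALLOWED_RANGES`, and `is_allowed_instance_type` are hypothetical, and the generic size ladder is an assumption (not every family offers every size, so this check is looser than the real EC2 catalogue).

```python
# Hypothetical sketch of the instance-type allow-list described above.

# Generic size ladder, smallest to largest (assumption: shared across families).
SIZE_LADDER = [
    "nano", "micro", "small", "medium", "large", "xlarge", "2xlarge",
    "4xlarge", "8xlarge", "9xlarge", "12xlarge", "16xlarge", "18xlarge", "24xlarge",
]

# family -> (smallest allowed size, largest allowed size), taken from the list above.
ALLOWED_RANGES = {
    "t3": ("small", "2xlarge"),
    "t3a": ("small", "2xlarge"),
    "t4g": ("xlarge", "2xlarge"),
    "m5": ("large", "24xlarge"),
    "m5a": ("large", "24xlarge"),
    "m6a": ("xlarge", "xlarge"),
    "c5": ("large", "24xlarge"),
    "r5": ("large", "24xlarge"),
}


def is_allowed_instance_type(instance_type: str) -> bool:
    """Return True if instance_type falls inside an allowed family/size range."""
    family, _, size = instance_type.partition(".")
    if family not in ALLOWED_RANGES or size not in SIZE_LADDER:
        return False
    low, high = ALLOWED_RANGES[family]
    rank = SIZE_LADDER.index(size)
    return SIZE_LADDER.index(low) <= rank <= SIZE_LADDER.index(high)
```

With this sketch, the two default instance types (t3a.2xlarge and m5.2xlarge) pass, while a type such as t3.micro or m6a.large is rejected.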
Launch Template
The launch template configures Bottlerocket OS-specific settings:
- Cluster name, endpoint, and CA certificate injected for kubelet registration
- Admin container enabled for debugging access
- Tag specifications applied to instances, volumes, and network interfaces
- Autoscaler ownership tags (k8s.io/cluster-autoscaler/enabled, k8s.io/cluster-autoscaler/<cluster-id>) for automatic node scaling
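To make the autoscaler tagging concrete, here is a small sketch that builds the two ownership tags and the matching Cluster Autoscaler auto-discovery flag. The tag values ("true" and "owned") are conventional and assumed here, since auto-discovery matches on tag keys; the function names are hypothetical.

```python
def autoscaler_tags(cluster_id: str) -> dict:
    """Build the two tag keys that Cluster Autoscaler auto-discovery matches on."""
    return {
        "k8s.io/cluster-autoscaler/enabled": "true",       # value is conventional
        f"k8s.io/cluster-autoscaler/{cluster_id}": "owned",  # value is conventional
    }


def auto_discovery_flag(cluster_id: str) -> str:
    """Render the --node-group-auto-discovery argument for the same tag keys."""
    keys = ",".join(autoscaler_tags(cluster_id))
    return f"--node-group-auto-discovery=asg:tag={keys}"
```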
EKS Managed Addons
The following addons are installed as EKS managed addons rather than Helm charts, so EKS checks each addon version for compatibility with the cluster's Kubernetes version:
| Addon | Version | Purpose | IRSA Role |
|---|---|---|---|
| kube-proxy | v1.35.3-eksbuild.11 | Network proxy on each node | None (DaemonSet) |
| aws-ebs-csi-driver | v1.60.0-eksbuild.1 | Persistent volumes via EBS | <cluster>-ebs-controller |
| aws-efs-csi-driver | v3.2.0-eksbuild.1 | Shared filesystem via EFS | <cluster>-efs-controller |
| coredns | v1.14.3-eksbuild.2 | Cluster DNS resolution | None |
| vpc-cni | v1.22.1-eksbuild.2 | Pod networking (VPC-native IPs) | <cluster>-vpc-cni-controller |
Helm Chart Catalog
All Helm charts are deployed to EKS as part of the infrastructure provisioning process.
Infrastructure Charts (Primary)
| Chart | Release Name | Namespace | Version | Purpose |
|---|---|---|---|---|
| aws_autoscaler | cluster-autoscaler | kube-system | 9.57.0 | Scales node group based on pod scheduling demand |
| aws_csi_secrets_provider | secrets-store-csi-aws | kube-system | 0.0.4 (local tgz) | Mounts AWS Secrets Manager values as Kubernetes volumes |
| aws_for_fluent_bit | aws-for-fluent-bit | kube-system | 0.1.35 | Ships pod logs to CloudWatch Logs |
| aws_load_balancer_controller | aws-load-balancer-controller | kube-system | 1.14.1 | Provisions NLB/ALB for Kubernetes Services and Ingresses |
| cert_manager | cert-manager | cert-manager | (matches app version) | Automates TLS certificate provisioning (Let's Encrypt, ACM, custom CA) |
| istio_base | base | istio-system | (configurable) | Istio CRDs and base resources |
| istio_discovery | istio-discovery | istio-system | (configurable) | Istiod control plane (pilot) |
| istio_cni | istio-cni | istio-system | (configurable) | CNI plugin for ambient mode traffic interception |
| istio_ztunnel | ztunnel | istio-system | (configurable) | L4 transparent proxy for ambient mesh |
| metrics_server | metrics-server | kube-system | 3.13.0 | Exposes pod/node resource metrics for HPA and kubectl top |
Secondary Charts
Deployed only after all primary charts are ready:
| Chart | Release Name | Namespace | Purpose |
|---|---|---|---|
| istio_ingress (ig_public) | istio-ingressgateway | istio-system | Public-facing NLB ingress gateway for external traffic |
Observability Charts
| Chart | Release Name | Namespace | Version | Purpose |
|---|---|---|---|---|
| prometheus | <base>-prometheus | monitoring | 23.1.0 | Cluster metrics collection, alerting, kube-state-metrics, node-exporter |
Application & Operator Charts
| Chart | Release Name | Namespace | Purpose |
|---|---|---|---|
| rabbitmq-cluster-operator | (Bitnami operator) | configurable | Deploys and manages RabbitMQ clusters as Kubernetes-native CRDs |
| aos | <base_name> | <environment> | Main AosCloud application (all microservices, InfluxDB2 dependency) |
Note: RabbitMQ is deployed via the rabbitmq-cluster-operator Helm chart. It runs as pods within EKS, managed via the Bitnami RabbitMQ Cluster Operator CRDs.
Istio Service Mesh — Ambient Mode
AosCloud deploys Istio in ambient mode, a sidecar-less architecture that uses per-node ztunnel proxies instead of per-pod sidecars. The four Istio charts together form the complete mesh:
| Component | Chart | Function |
|---|---|---|
| Base | istio_base | CRDs (VirtualService, Gateway, PeerAuthentication, etc.) |
| Discovery (Istiod) | istio_discovery | Control plane — pushes configuration to ztunnel and waypoint proxies |
| CNI | istio_cni | Node-level network plugin that redirects traffic into the mesh without init containers |
| Ztunnel | istio_ztunnel | Per-node L4 proxy handling mTLS, authorization, and telemetry |
The CNI chart is configured with profile: "ambient", which activates the ambient-specific traffic interception rules.
Istio Ingress Gateway
The public Istio ingress gateway (istio-ingressgateway) is deployed as a secondary chart and provisions an internet-facing AWS Network Load Balancer (NLB) with the following annotations:
- External NLB with IP target type
- Cross-zone load balancing enabled
- Source IP stickiness
- Proxy protocol enabled
All external HTTPS traffic enters the cluster through this gateway.
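The NLB behaviours listed above map onto Service annotations understood by the AWS Load Balancer Controller. The sketch below shows one plausible set of annotations; the annotation keys are the controller's documented ones, but the exact values AosCloud sets in its Helm values are an assumption, and the function name is hypothetical.

```python
def nlb_service_annotations() -> dict:
    """Annotations for an internet-facing NLB with IP targets (illustrative values)."""
    prefix = "service.beta.kubernetes.io/aws-load-balancer-"
    return {
        # External NLB managed by the AWS Load Balancer Controller, IP target type.
        prefix + "type": "external",
        prefix + "nlb-target-type": "ip",
        prefix + "scheme": "internet-facing",
        # Cross-zone load balancing is a load balancer attribute.
        prefix + "attributes": "load_balancing.cross_zone.enabled=true",
        # Source IP stickiness and proxy protocol v2 are target group attributes.
        prefix + "target-group-attributes": (
            "stickiness.enabled=true,"
            "stickiness.type=source_ip,"
            "proxy_protocol_v2.enabled=true"
        ),
    }
```

These annotations would sit under the gateway Service's metadata.annotations. With proxy protocol enabled on the NLB, the gateway must also be configured to accept it, otherwise connections fail at the listener.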
Pod-to-AWS-Service Mapping
The following table shows how AosCloud pods connect to AWS managed services:
| Pod / Service Account | AWS Service | Access Method | Purpose |
|---|---|---|---|
| Backend, API, Auth (sa_app) | S3 (backend bucket) | IRSA | Deployable Item storage |
| Backend, API, Auth (sa_app) | KMS | IRSA | Encryption/decryption of stored objects |
| Task runner, Message Handler (sa_task) | S3, KMS, Secrets Manager | IRSA | Full access to secrets and storage for deployment tasks |
| Task runner (sa_task) | EC2 | IRSA | Unit management operations |
| Service Discovery (sa_sd) | S3 (backend bucket) | IRSA | Service registration data storage |
| Secrets updater (sa_secrets_manager) | Secrets Manager | IRSA | Synchronizes secrets to Kubernetes |
| Base services (sa_base) | S3, KMS, Secrets Manager, EC2, CloudWatch | IRSA | Infrastructure-wide access for operational services |
| Data services (sa_data_services) | S3 (backend bucket) | IRSA | Data pipeline access |
| Units Queues Management (sa_uqm) | S3, EC2 | IRSA | Queue management and Unit operations |
| Fluent Bit | CloudWatch Logs | Node role | Ships container logs to CloudWatch |
| Cluster Autoscaler | Auto Scaling Groups, EC2 | IRSA | Scales node group |
| Load Balancer Controller | EC2, ELB | IRSA | Provisions NLB/ALB resources |
| EBS CSI Driver | EBS | IRSA (EKS addon) | Persistent volume lifecycle |
| EFS CSI Driver | EFS | IRSA (EKS addon) | Shared filesystem mounts |
| VPC CNI | EC2 (ENI management) | IRSA (EKS addon) | Pod IP address allocation |
| Prometheus | (in-cluster scraping) | — | Metrics collection from all pods |
| All pods | Aurora PostgreSQL | Network (security group) | Primary database |
| All pods | ElastiCache Redis | Network (security group) | Caching and session storage |
| Alert Handler | DocumentDB | Network (security group) | Alert storage (MongoDB-compatible) |
| All pods | RabbitMQ (in-cluster) | Kubernetes Service DNS | Message queue (via rabbitmq-cluster-operator) |
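For the in-cluster RabbitMQ row, pods reach the broker through standard Kubernetes Service DNS rather than an AWS endpoint. A minimal sketch of how such an address is formed follows; the service and namespace names in the usage line are hypothetical, the default `cluster.local` domain is assumed, and 5672 is the standard AMQP port.

```python
def service_dns(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Fully qualified in-cluster DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def amqp_url(service: str, namespace: str, port: int = 5672) -> str:
    """AMQP URL without credentials; those would come from a Kubernetes Secret."""
    return f"amqp://{service_dns(service, namespace)}:{port}"


# Hypothetical names, for illustration only.
example_url = amqp_url("rabbitmq", "messaging")
```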
EKS OIDC Provider and IRSA
IRSA (IAM Roles for Service Accounts) is the mechanism that grants Kubernetes pods fine-grained AWS permissions without embedding credentials.
How OIDC Integration Works
- OIDC Provider Creation: When the EKS cluster is created, the cluster's OIDC issuer URL is registered as an IAM OIDC identity provider.
- Trust Policy: Each IAM role defines a trust policy that allows sts:AssumeRoleWithWebIdentity from the OIDC provider, scoped to specific Kubernetes service account names:

      {
        "Effect": "Allow",
        "Principal": {"Federated": "<OIDC_PROVIDER_ARN>"},
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
          "StringLike": {
            "<OIDC_ISSUER>:sub": "system:serviceaccount:<namespace>:<sa-name>"
          }
        }
      }

- Service Account Annotation: Kubernetes service accounts are annotated with eks.amazonaws.com/role-arn: <ROLE_ARN>. The EKS pod identity webhook then injects the role ARN and a projected service account token into every pod that uses the account.
- Credential Injection: The AWS SDK in each pod automatically exchanges the projected service account token for temporary credentials through sts:AssumeRoleWithWebIdentity, requiring no application-level configuration.
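The trust policy and annotation described above can be generated from four inputs. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the ARN and issuer URL in the usage lines are placeholders. Note that the condition key uses the issuer without its https:// scheme.

```python
import json


def irsa_trust_policy(oidc_provider_arn: str, oidc_issuer: str,
                      namespace: str, service_account: str) -> dict:
    """Build a trust policy scoped to one service account (wildcards allowed)."""
    scheme = "https://"
    # IAM condition keys reference the issuer host/path, not the full URL.
    issuer = oidc_issuer[len(scheme):] if oidc_issuer.startswith(scheme) else oidc_issuer
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": oidc_provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringLike": {
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}"
                    }
                },
            }
        ],
    }


def service_account_annotation(role_arn: str) -> dict:
    """Annotation that tells the pod identity webhook which role to wire in."""
    return {"eks.amazonaws.com/role-arn": role_arn}


# Placeholder identifiers, for illustration only.
policy = irsa_trust_policy(
    "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
    "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
    "kube-system",
    "cluster-autoscaler",
)
policy_json = json.dumps(policy, indent=2)
```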
IRSA Roles for Cluster Infrastructure
The EKS cluster creates IRSA roles for cluster infrastructure components:
| Role Name | Service Accounts | Policies |
|---|---|---|
| <cluster>-autoscaling-role | kube-system:aws-node, kube-system:cluster-autoscaler | Custom autoscaler policy |
| <cluster>-load-balancer-controller | kube-system:aws-load-balancer-controller | Custom LB controller policy |
| <cluster>-ebs-controller | kube-system:ebs-csi-* | AmazonEBSCSIDriverPolicy |
| <cluster>-efs-controller | kube-system:efs-csi* | AmazonEFSCSIDriverPolicy |
| <cluster>-vpc-cni-controller | kube-system:aws-node* | AmazonEKS_CNI_Policy |
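Several service account entries in the table above end in a wildcard, which the StringLike condition in the trust policy evaluates as a glob. The sketch below approximates that matching with Python's fnmatch to show which roles a given service account could assume. The role keys omit the <cluster> prefix, the function name is hypothetical, and fnmatch is only an approximation of IAM matching (it also treats square brackets specially, which IAM does not).

```python
from fnmatch import fnmatchcase

# Role suffix -> service account patterns, copied from the table above.
INFRA_ROLE_SUBJECTS = {
    "autoscaling-role": ["kube-system:aws-node", "kube-system:cluster-autoscaler"],
    "load-balancer-controller": ["kube-system:aws-load-balancer-controller"],
    "ebs-controller": ["kube-system:ebs-csi-*"],
    "efs-controller": ["kube-system:efs-csi*"],
    "vpc-cni-controller": ["kube-system:aws-node*"],
}


def roles_for(namespace: str, service_account: str) -> list:
    """Return the role suffixes whose patterns match namespace:service_account."""
    subject = f"{namespace}:{service_account}"
    return sorted(
        role
        for role, patterns in INFRA_ROLE_SUBJECTS.items()
        if any(fnmatchcase(subject, pattern) for pattern in patterns)
    )
```

Running this against kube-system:aws-node shows that the account matches both the autoscaling role and the VPC CNI role, which is worth keeping in mind when auditing these trust policies.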
IRSA Roles for Application Workloads
Application-level IRSA roles:
| Role Name | Assigned Pods | Key Permissions |
|---|---|---|
| <project>-<env>-app | API, Auth, Backend | S3 read/write (backend bucket), KMS encrypt/decrypt |
| <project>-<env>-task | Task runner, Message Handler | S3, KMS, Secrets Manager (full), EC2 operations |
| <project>-<env>-sd | Service Discovery | S3 read/write (backend bucket) |
| <project>-<env>-secrets-manager | Secrets updater | Secrets Manager read/write |
| <project>-<env>-base | Operational services | S3, KMS, Secrets Manager, EC2, S3 (infra bucket), SaaS, CloudWatch |
| <project>-<env>-data-services | Data services | S3 read/write (backend bucket) |
| <project>-<env>-qm | Units Queues Management | S3, EC2 operations |
Deployment Ordering
Helm charts are deployed in a strict dependency order:
- Primary charts: All infrastructure charts install in parallel after ECR sync and auth patch complete
- Secondary charts: Istio ingress gateway installs after all primary charts
- AOS chart: The main application installs last, after secondary charts complete
- Prometheus stack: Deployed independently from the primary/secondary chain
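The ordering above is a small dependency graph, and it can be sketched with the standard library's topological sorter to show which steps may run together. The node names are hypothetical labels for the stages listed above, and treating the Prometheus stack as having no prerequisites is an assumption based on it being deployed independently.

```python
from graphlib import TopologicalSorter

# stage -> set of stages it waits for (labels are illustrative).
DEPENDS_ON = {
    "primary_charts": {"ecr_sync", "auth_patch"},
    "secondary_charts": {"primary_charts"},
    "aos_chart": {"secondary_charts"},
    "prometheus": set(),  # assumption: outside the primary/secondary chain
}


def deployment_waves(depends_on: dict) -> list:
    """Group stages into waves; every stage in a wave can start in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = deployment_waves(DEPENDS_ON)
```

In this sketch the first wave contains the ECR sync, the auth patch, and Prometheus; the primary charts, secondary charts, and AOS chart then follow one wave at a time.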
Additional Resources
EFS for InfluxDB2
A dedicated EFS filesystem is provisioned for InfluxDB2 persistent storage within the AOS chart:
- Encrypted with the infrastructure KMS key
- Access point at /influxdb2 (UID/GID 1000, permissions 775)
- Mounted into EKS pod subnets
- Referenced in the AOS Helm values as an EFS volume handle
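To illustrate what an EFS volume handle looks like, the sketch below builds a statically provisioned PersistentVolume for the EFS CSI driver, where the handle combines the filesystem ID and the access point ID. Whether the AOS chart uses a static PersistentVolume like this is an assumption; the function name, the volume name, and the IDs in the usage line are placeholders.

```python
def influxdb_efs_volume(file_system_id: str, access_point_id: str,
                        name: str = "influxdb2-efs") -> dict:
    """PersistentVolume manifest (as a dict) backed by an EFS access point."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            # Kubernetes requires a capacity; EFS itself does not enforce it.
            "capacity": {"storage": "5Gi"},
            "volumeMode": "Filesystem",
            "accessModes": ["ReadWriteMany"],
            "persistentVolumeReclaimPolicy": "Retain",
            "csi": {
                "driver": "efs.csi.aws.com",
                # Handle format: <filesystem-id>::<access-point-id>
                "volumeHandle": f"{file_system_id}::{access_point_id}",
            },
        },
    }


# Placeholder IDs, for illustration only.
volume = influxdb_efs_volume("fs-EXAMPLE", "fsap-EXAMPLE")
```

Because the access point fixes the root directory (/influxdb2) and the POSIX identity (UID/GID 1000), the pod does not need to specify a subpath or run an ownership-fixing init step.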
Related Documentation
- AWS Resource Architecture — full infrastructure topology
- IAM Roles and Policies — detailed permission documentation for all roles referenced above
- Helm Values Reference — detailed chart configuration parameters