Version: v1.1

Resource Sizing Guidelines

Overview

AosCloud infrastructure scales with the number of connected Units in your fleet. This page provides sizing recommendations for each managed AWS resource at three fleet tiers, along with guidance on how fleet growth affects resource consumption.

All instance types and scaling parameters referenced here are configurable through the Helm values and infrastructure configuration. The default values correspond to the Small tier, which suits development and initial production deployments.

Prerequisites

Required AWS Services — understand which services are provisioned

Fleet Size Tiers

AosCloud sizing is driven by the number of connected Units communicating with the cloud platform. The tiers below represent typical deployment scales:

Tier	Connected Units	Typical Use Case
Small	Up to 5,000	Development, staging, initial production rollout
Medium	5,000–50,000	Regional production deployment
Large	50,000–200,000+	Full-scale production fleet
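The tier boundaries above can be expressed as a small lookup. The following Python sketch is illustrative only: `fleet_tier` is a hypothetical helper, not part of AosCloud, and it assumes that a count sitting exactly on a boundary (5,000 or 50,000 Units) belongs to the lower tier, which the table does not specify.

```python
def fleet_tier(connected_units: int) -> str:
    """Map a connected-Unit count to a sizing tier name.

    Assumption: a count exactly on a boundary (5,000 or 50,000)
    is assigned to the lower tier.
    """
    if connected_units < 0:
        raise ValueError("connected_units must be non-negative")
    if connected_units <= 5_000:
        return "Small"
    if connected_units <= 50_000:
        return "Medium"
    return "Large"


if __name__ == "__main__":
    for units in (1_200, 5_000, 18_000, 120_000):
        print(units, fleet_tier(units))
```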

Sizing Recommendations by Resource

Aurora PostgreSQL

Aurora PostgreSQL stores all structured application data (fleet management, user accounts, configuration). Resource consumption scales with the number of Units and the volume of provisioning and telemetry operations.

Parameter	Small	Medium	Large
Instance class	`db.t3.medium`	`db.r6g.large`	`db.r6g.xlarge`
Cluster instances	1	2	2–3
Storage type	Standard	`aurora-iopt1`	`aurora-iopt1`
Max connections	100	200	400

Scaling drivers:

Each connected Unit maintains session state and periodic heartbeat records
Provisioning operations (deploying Deployment Bundles) generate write bursts proportional to fleet size
Connection capacity is configurable per environment
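Because the connection limit differs per tier, it can help to check that the application's connection pools fit under it. The sketch below is a hypothetical Python helper, not an AosCloud API. It assumes each service pod holds a fixed-size connection pool; `replicas` and `pool_size` are illustrative inputs that this page does not define.

```python
# Max connections per tier, copied from the Aurora PostgreSQL table above.
AURORA_MAX_CONNECTIONS = {"Small": 100, "Medium": 200, "Large": 400}


def connection_headroom(tier: str, replicas: int, pool_size: int) -> int:
    """Return the connections left if every replica fills its pool.

    `replicas` (pods that talk to Aurora) and `pool_size` (connections
    per pod) are hypothetical inputs. A negative result means the pools
    could exceed the tier's connection limit.
    """
    if tier not in AURORA_MAX_CONNECTIONS:
        raise KeyError(f"unknown tier: {tier!r}")
    return AURORA_MAX_CONNECTIONS[tier] - replicas * pool_size
```

For example, 8 pods with 10 connections each leave 20 spare connections on the Small tier, while 25 such pods would overshoot the Medium limit by 50.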

ElastiCache Redis

Redis serves as the session cache, message queue staging area, and real-time connection state store. Each connected Unit maintains an active session entry.

Parameter	Small	Medium	Large
Node type	`cache.t4g.medium`	`cache.r7g.large`	`cache.r7g.xlarge`
Read replicas	1	1–2	2
Availability zones	1	2	2

Scaling drivers:

Each active Unit session consumes Redis memory (session DB 3)
Message queue staging (DB 2) grows with concurrent provisioning operations
The default cache.t4g.medium node supports roughly 5,000 concurrent Unit sessions
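The session capacity figure above lends itself to a simple utilization check. This Python sketch is a rough planning aid under stated assumptions: the function names are hypothetical, and the 80% review threshold is an illustrative choice rather than a documented AosCloud limit.

```python
# Approximate session capacity quoted above for the default cache.t4g.medium node.
DEFAULT_NODE_SESSION_CAPACITY = 5_000


def session_utilization(active_sessions: int) -> float:
    """Fraction of the default node's approximate session capacity in use."""
    if active_sessions < 0:
        raise ValueError("active_sessions must be non-negative")
    return active_sessions / DEFAULT_NODE_SESSION_CAPACITY


def should_review_node_type(active_sessions: int, threshold: float = 0.8) -> bool:
    """True once utilization reaches the review threshold (assumed, not documented)."""
    return session_utilization(active_sessions) >= threshold
```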

DocumentDB (MongoDB-compatible)

DocumentDB stores alert data from the fleet. Write volume correlates directly with the number of Units reporting alerts and their reporting frequency.

Parameter	Small	Medium	Large
Instance class	`db.t3.medium`	`db.r5.large`	`db.r5.xlarge`
Cluster size	2	2	3
Availability zones	3	3	3

Scaling drivers:

Alert ingestion rate is proportional to active Unit count
Query performance degrades as the alert collection grows; larger instances provide more memory for the working set
The cluster runs a minimum of 2 instances for high availability across availability zones

EKS Node Groups

EKS worker nodes run all AosCloud microservices, the Istio service mesh, monitoring agents, and the rabbitmq-cluster-operator. Node count scales with total pod demand.

Parameter	Small	Medium	Large
Instance type	`m5.xlarge`	`m5.2xlarge`	`m5.2xlarge`
Min nodes	3	4	6
Max nodes	10	20	30+
Cluster autoscaler	Enabled	Enabled	Enabled

Scaling drivers:

Each additional ~50,000 Units requires approximately one additional m5.xlarge node
The default scaling configuration allows 3–20 nodes; adjust max size for large fleets
Development environments may use t3.2xlarge for cost savings (burstable)
Production environments should use m5.xlarge or m5.2xlarge for consistent performance
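The rule of thumb of roughly one extra m5.xlarge node per additional ~50,000 Units can be turned into a quick estimate. The Python sketch below is illustrative: the helper names are hypothetical, and it assumes the node group minimum is the baseline to which extra nodes are added, which this page does not state explicitly.

```python
UNITS_PER_EXTRA_NODE = 50_000  # approximate figure from the scaling drivers above


def extra_nodes_for_growth(current_units: int, projected_units: int) -> int:
    """Estimate additional m5.xlarge nodes for fleet growth, rounding up."""
    added_units = max(0, projected_units - current_units)
    # Ceiling division without floating point.
    return -(-added_units // UNITS_PER_EXTRA_NODE)


def fits_node_group(min_nodes: int, extra_nodes: int, max_nodes: int) -> bool:
    """Check that the baseline plus extra nodes stays within the node group max.

    Assumption: min_nodes is treated as the steady-state baseline.
    """
    return min_nodes + extra_nodes <= max_nodes
```

Rounding up is deliberate: a partial node's worth of demand still needs a whole node to schedule on.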

EFS (Elastic File System)

EFS provides persistent storage for InfluxDB time-series data (metrics). EFS uses elastic throughput by default — no provisioned throughput configuration is required.

Parameter	Small	Medium	Large
Throughput mode	Elastic (default)	Elastic	Elastic
Encryption	Enabled (KMS)	Enabled (KMS)	Enabled (KMS)
Mount targets	Per EKS subnet	Per EKS subnet	Per EKS subnet

Scaling drivers:

Storage consumption grows with metrics retention period and number of reporting Units
EFS elastic throughput automatically scales with I/O demand — no manual sizing needed
For very high write workloads, consider lifecycle policies to tier infrequently accessed data

Fleet Size and Resource Consumption

The relationship between connected Units and resource consumption is approximately linear for most resources, with the following key drivers:

┌─────────────────────────────────────────────────────────┐
│                 Resource Scaling Model                  │
├──────────────────┬──────────────────────────────────────┤
│ Fleet Growth     │ Primary Resource Impact              │
├──────────────────┼──────────────────────────────────────┤
│ +Units           │ Redis memory, Aurora connections,    │
│                  │ EKS pod count, DocumentDB writes     │
├──────────────────┼──────────────────────────────────────┤
│ +Provisioning    │ Aurora write IOPS, S3 storage,       │
│  Operations      │ EKS CPU burst, message queue depth   │
├──────────────────┼──────────────────────────────────────┤
│ +Alert Volume    │ DocumentDB writes and storage,       │
│                  │ EKS memory for alert-handler pods    │
├──────────────────┼──────────────────────────────────────┤
│ +Monitoring      │ EFS storage (InfluxDB), Prometheus   │
│  Retention       │ ingestion rate                       │
└──────────────────┴──────────────────────────────────────┘

Autoscaling Behavior

AosCloud uses the Kubernetes Cluster Autoscaler to handle dynamic load. When pods cannot be scheduled because node capacity is insufficient, the autoscaler provisions additional nodes up to the configured maximum. This covers:

Burst provisioning operations across the fleet
Temporary spikes in alert processing
Rolling updates that temporarily increase pod count
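The effect of the configured minimum and maximum can be modeled as a clamp. The Python sketch below is a deliberately simplified model, not the Cluster Autoscaler's actual algorithm: the real autoscaler reacts to unschedulable pods rather than a precomputed node count, and `autoscaler_outcome` is a hypothetical name used for illustration.

```python
from typing import NamedTuple


class ScalingOutcome(NamedTuple):
    target_nodes: int  # node count the node group settles at
    unmet_nodes: int   # demand beyond the configured maximum


def autoscaler_outcome(required_nodes: int, min_nodes: int, max_nodes: int) -> ScalingOutcome:
    """Clamp the required node count to the node group's [min, max] range."""
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    target = max(min_nodes, min(required_nodes, max_nodes))
    # Anything above max_nodes stays unscheduled until the limit is raised.
    unmet = max(0, required_nodes - max_nodes)
    return ScalingOutcome(target, unmet)
```

With the Small tier limits (3 to 10 nodes), a burst needing 12 nodes settles at 10 with 2 nodes' worth of pods left pending, which is why the maximum should be set with headroom.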

Managed services (Aurora, Redis, DocumentDB) do not autoscale. You must provision appropriate instance sizes based on your expected steady-state fleet size.

Cost Implications by Tier

The table below provides approximate monthly cost ranges for each tier. Actual costs depend on region, reserved instance discounts, and usage patterns.

Resource	Small (~$/month)	Medium (~$/month)	Large (~$/month)
Aurora PostgreSQL	200–400	800–1,500	2,000–3,500
ElastiCache Redis	80–150	400–600	800–1,200
DocumentDB	150–250	400–700	800–1,500
EKS nodes (compute)	800–1,500	2,000–4,000	5,000–10,000
EFS	<10	10–50	50–200
Estimated total infrastructure	~2,500–5,000	~5,000–10,000	~10,000–20,000

Note: These estimates cover compute and database infrastructure only. Additional costs for networking (VPC endpoints, data transfer), monitoring (Prometheus), CDN (CloudFront), and supporting services are documented in Costs.

For detailed per-service pricing and historical cost data, see the Costs documentation.

Sizing Decision Checklist

When selecting a tier, consider:

Current fleet size — how many Units are connected today?
Growth projection — what fleet size do you expect in 12 months?
Provisioning frequency — how often do you deploy Deployment Bundles (software updates)?
Alert density — how many alerts per Unit per day?
Availability requirements — do you need multi-AZ replicas for all databases?

Start with the tier matching your current fleet size, and configure the node group max size with headroom for the next tier. Database instance upgrades require a brief maintenance window but can be performed without data loss.
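The advice to start at the current tier while leaving headroom for the next one can be sketched as a projection. The Python example below is illustrative only: the helper names are hypothetical, compound monthly growth is an assumed model, and boundary counts are assumed to fall in the lower tier.

```python
import math


def fleet_tier(connected_units: int) -> str:
    """Map a Unit count to a tier (boundary values assumed to be the lower tier)."""
    if connected_units <= 5_000:
        return "Small"
    if connected_units <= 50_000:
        return "Medium"
    return "Large"


def projected_units(current_units: int, monthly_growth: float, months: int = 12) -> int:
    """Compound the fleet size forward, rounding up to whole Units."""
    return math.ceil(current_units * (1 + monthly_growth) ** months)


def sizing_plan(current_units: int, monthly_growth: float) -> dict:
    """Pick the starting tier and the tier to leave node group headroom for."""
    future = projected_units(current_units, monthly_growth)
    return {
        "start_tier": fleet_tier(current_units),
        "headroom_tier": fleet_tier(future),
        "projected_units": future,
    }


if __name__ == "__main__":
    # Hypothetical fleet: 4,000 Units growing 5% per month.
    print(sizing_plan(4_000, 0.05))
```

When the two tiers differ, the node group max size is the cheapest setting to raise ahead of time, since nodes are only provisioned when demand requires them.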

Related Documentation

Required AWS Services: full catalog of provisioned services
Helm Values Reference: chart configuration including sizing parameters
Costs: detailed pricing and historical cost analysis
