Resource Sizing Guidelines
Overview
AosCloud infrastructure scales with the number of connected Units in your fleet. This page provides sizing recommendations for each managed AWS resource at three fleet tiers, along with guidance on how fleet growth affects resource consumption.
All instance types and scaling parameters referenced here are configurable through the Helm values and infrastructure configuration. The default values reflect the Small fleet tier, which is suitable for development and initial production deployments.
Prerequisites
- Required AWS Services — understand which services are provisioned
Fleet Size Tiers
AosCloud sizing is driven by the number of connected Units communicating with the cloud platform. The tiers below represent typical deployment scales:
| Tier | Connected Units | Typical Use Case |
|---|---|---|
| Small | Up to 5,000 | Development, staging, initial production rollout |
| Medium | 5,000–50,000 | Regional production deployment |
| Large | 50,000–200,000+ | Full-scale production fleet |
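The tier boundaries above can be expressed as a simple lookup. The following is a minimal sketch, not part of AosCloud: the function name `select_tier` is hypothetical, and because the table lists 5,000 and 50,000 in two tiers each, the sketch assumes a boundary value belongs to the smaller tier.

```python
# Upper bounds per tier, taken from the Fleet Size Tiers table.
# Assumption: a boundary value (5,000 or 50,000) maps to the smaller tier.
TIER_LIMITS = [("small", 5_000), ("medium", 50_000)]


def select_tier(connected_units: int) -> str:
    """Return the fleet tier name for a number of connected Units."""
    if connected_units < 0:
        raise ValueError("connected_units must be non-negative")
    for name, upper_bound in TIER_LIMITS:
        if connected_units <= upper_bound:
            return name
    # Anything above the Medium bound falls into the open-ended Large tier.
    return "large"


if __name__ == "__main__":
    for units in (1_200, 5_000, 20_000, 120_000):
        print(units, select_tier(units))
```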
Sizing Recommendations by Resource
Aurora PostgreSQL
Aurora PostgreSQL stores all structured application data (fleet management, user accounts, configuration). Resource consumption scales with the number of Units and the volume of provisioning and telemetry operations.
| Parameter | Small | Medium | Large |
|---|---|---|---|
| Instance class | db.t3.medium | db.r6g.large | db.r6g.xlarge |
| Cluster instances | 1 | 2 | 2–3 |
| Storage type | Standard | aurora-iopt1 | aurora-iopt1 |
| Max connections | 100 | 200 | 400 |
Scaling drivers:
- Each connected Unit maintains session state and periodic heartbeat records
- Provisioning operations (deploying Deployment Bundles) generate write bursts proportional to fleet size
- Connection capacity is configurable per environment
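When choosing a max connections value, it can help to compare it against the connections your services may open. The sketch below assumes each service replica holds a fixed-size connection pool; the service names, replica counts, pool sizes, and the `reserved` margin are all hypothetical illustration values, not AosCloud defaults.

```python
def connection_budget(services, max_connections, reserved=10):
    """Compare worst-case pooled connections against the database limit.

    services: mapping of service name -> (replicas, pool_size_per_replica).
    reserved: connections held back for admin and maintenance sessions
              (an assumed margin, not a documented value).
    Returns (demand, available, fits).
    """
    demand = sum(replicas * pool for replicas, pool in services.values())
    available = max_connections - reserved
    return demand, available, demand <= available


# Hypothetical services and pool sizes, for illustration only.
services = {
    "fleet-api": (3, 10),
    "provisioning": (2, 15),
    "telemetry": (2, 10),
}

if __name__ == "__main__":
    # 100 is the Small tier max connections value from the table above.
    print(connection_budget(services, max_connections=100))
```

If the check fails, either reduce pool sizes or move to a tier with a higher connection limit.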
ElastiCache Redis
Redis serves as the session cache, message queue staging area, and real-time connection state store. Each connected Unit maintains an active session entry.
| Parameter | Small | Medium | Large |
|---|---|---|---|
| Node type | cache.t4g.medium | cache.r7g.large | cache.r7g.xlarge |
| Read replicas | 1 | 1–2 | 2 |
| Availability zones | 1 | 2 | 2 |
Scaling drivers:
- Each active Unit session consumes Redis memory (session DB 3)
- Message queue staging (DB 2) grows with concurrent provisioning operations
- The cache.t4g.medium default supports up to ~5,000 concurrent Unit sessions
DocumentDB (MongoDB-compatible)
DocumentDB stores alert data from the fleet. Write volume correlates directly with the number of Units reporting alerts and their reporting frequency.
| Parameter | Small | Medium | Large |
|---|---|---|---|
| Instance class | db.t3.medium | db.r5.large | db.r5.xlarge |
| Cluster size | 2 | 2 | 3 |
| Availability zones | 3 | 3 | 3 |
Scaling drivers:
- Alert ingestion rate is proportional to active Unit count
- Query performance degrades as the alert collection grows; larger instances provide more memory for the working set
- The cluster runs a minimum of 2 instances for high availability across availability zones
EKS Node Groups
EKS worker nodes run all AosCloud microservices, the Istio service mesh, monitoring agents, and the rabbitmq-cluster-operator. Node count scales with total pod demand.
| Parameter | Small | Medium | Large |
|---|---|---|---|
| Instance type | m5.xlarge | m5.2xlarge | m5.2xlarge |
| Min nodes | 3 | 4 | 6 |
| Max nodes | 10 | 20 | 30+ |
| Cluster autoscaler | Enabled | Enabled | Enabled |
Scaling drivers:
- Each additional ~50,000 Units requires approximately one additional m5.xlarge node
- The default scaling configuration allows 3–20 nodes; adjust max size for large fleets
- Development environments may use t3.2xlarge for cost savings (burstable)
- Production environments should use m5.xlarge or m5.2xlarge for consistent performance
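The rule of thumb of roughly one extra m5.xlarge node per additional 50,000 Units can be turned into a rough estimate. This sketch makes several assumptions: the tier minimum is treated as the baseline at zero Units, only complete 50,000-Unit blocks add a node, and the result is capped at the configured maximum. The function name `estimate_nodes` is hypothetical, and actual node count depends on total pod demand rather than on Unit count alone.

```python
def estimate_nodes(connected_units, min_nodes=3, max_nodes=10,
                   units_per_node=50_000):
    """Rough node estimate: baseline plus one node per full block of Units.

    Defaults use the Small tier min/max values from the table above.
    """
    if connected_units < 0:
        raise ValueError("connected_units must be non-negative")
    extra_nodes = connected_units // units_per_node  # full blocks only
    # Never go below the baseline or above the node group maximum.
    return max(min_nodes, min(min_nodes + extra_nodes, max_nodes))


if __name__ == "__main__":
    print(estimate_nodes(4_000))                               # Small tier
    print(estimate_nodes(120_000, min_nodes=6, max_nodes=30))  # Large tier
```

Treat the result as a starting point for the node group minimum and leave the Cluster Autoscaler room above it.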
EFS (Elastic File System)
EFS provides persistent storage for InfluxDB time-series data (metrics). EFS uses elastic throughput by default — no provisioned throughput configuration is required.
| Parameter | Small | Medium | Large |
|---|---|---|---|
| Throughput mode | Elastic (default) | Elastic | Elastic |
| Encryption | Enabled (KMS) | Enabled (KMS) | Enabled (KMS) |
| Mount targets | Per EKS subnet | Per EKS subnet | Per EKS subnet |
Scaling drivers:
- Storage consumption grows with metrics retention period and number of reporting Units
- EFS elastic throughput automatically scales with I/O demand — no manual sizing needed
- For very high write workloads, where stored metrics grow quickly, consider lifecycle policies to tier infrequently accessed data to lower-cost storage
Fleet Size and Resource Consumption
The relationship between connected Units and resource consumption is approximately linear for most resources, with the following key drivers:
Resource Scaling Model
| Fleet Growth | Primary Resource Impact |
|---|---|
| +Units | Redis memory, Aurora connections, EKS pod count, DocumentDB writes |
| +Provisioning Operations | Aurora write IOPS, S3 storage, EKS CPU burst, message queue depth |
| +Alert Volume | DocumentDB writes and storage, EKS memory for alert-handler pods |
| +Monitoring Retention | EFS storage (InfluxDB), Prometheus ingestion rate |
Autoscaling Behavior
AosCloud uses the Kubernetes Cluster Autoscaler to handle dynamic load. When pods cannot be scheduled due to insufficient node capacity, the autoscaler provisions additional nodes up to the configured maximum. This covers:
- Burst provisioning operations across the fleet
- Temporary spikes in alert processing
- Rolling updates that temporarily increase pod count
Managed services (Aurora, Redis, DocumentDB) do not autoscale. Provision their instance sizes based on your expected steady-state fleet size.
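The scale-up behavior described in this section can be illustrated with a deliberately simplified model. The real Cluster Autoscaler evaluates pod resource requests against node capacity; this sketch instead assumes a fixed number of pods per node (`pods_per_node`, a hypothetical parameter) purely to show how the configured maximum caps growth.

```python
import math


def nodes_to_add(current_nodes, max_nodes, unschedulable_pods, pods_per_node):
    """Simplified scale-up decision: add nodes for pending pods, up to the max."""
    if unschedulable_pods <= 0:
        return 0  # nothing is pending, so no scale-up is triggered
    needed = math.ceil(unschedulable_pods / pods_per_node)
    headroom = max(0, max_nodes - current_nodes)
    return min(needed, headroom)


if __name__ == "__main__":
    # 25 pending pods at an assumed 10 pods per node need 3 more nodes.
    print(nodes_to_add(current_nodes=3, max_nodes=10,
                       unschedulable_pods=25, pods_per_node=10))
    # At the maximum, pending pods stay unscheduled until max size is raised.
    print(nodes_to_add(current_nodes=10, max_nodes=10,
                       unschedulable_pods=5, pods_per_node=10))
```

The second call shows why the node group maximum needs headroom: once it is reached, bursts can no longer be absorbed.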
Cost Implications by Tier
The table below provides approximate monthly cost ranges for each tier. Actual costs depend on region, reserved instance discounts, and usage patterns.
| Resource | Small (~$) | Medium (~$) | Large (~$) |
|---|---|---|---|
| Aurora PostgreSQL | 200–400 | 800–1,500 | 2,000–3,500 |
| ElastiCache Redis | 80–150 | 400–600 | 800–1,200 |
| DocumentDB | 150–250 | 400–700 | 800–1,500 |
| EKS nodes (compute) | 800–1,500 | 2,000–4,000 | 5,000–10,000 |
| EFS | <10 | 10–50 | 50–200 |
| Estimated total infrastructure | ~2,500–5,000 | ~5,000–10,000 | ~10,000–20,000 |
These estimates cover compute and database infrastructure only. Additional costs for networking (VPC endpoints, data transfer), monitoring (Prometheus), CDN (CloudFront), and supporting services, along with detailed per-service pricing and historical cost data, are documented in Costs.
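To budget from the itemized rows, the per-resource ranges can be summed per tier. The sketch below copies the ranges from the table, treating the EFS "<10" entry as 0–10; the names `ITEMIZED_COSTS` and `itemized_total` are hypothetical. The itemized sums come out lower than the table's "Estimated total infrastructure" row, and this page does not state what accounts for the difference, so the total row is the more conservative planning figure.

```python
# Approximate monthly cost ranges (low, high) in USD, from the table above.
# Assumption: the EFS Small entry "<10" is represented as (0, 10).
ITEMIZED_COSTS = {
    "Aurora PostgreSQL": {"small": (200, 400), "medium": (800, 1500), "large": (2000, 3500)},
    "ElastiCache Redis": {"small": (80, 150), "medium": (400, 600), "large": (800, 1200)},
    "DocumentDB": {"small": (150, 250), "medium": (400, 700), "large": (800, 1500)},
    "EKS nodes (compute)": {"small": (800, 1500), "medium": (2000, 4000), "large": (5000, 10000)},
    "EFS": {"small": (0, 10), "medium": (10, 50), "large": (50, 200)},
}


def itemized_total(tier):
    """Sum the low and high ends of every itemized resource for one tier."""
    low = sum(ranges[tier][0] for ranges in ITEMIZED_COSTS.values())
    high = sum(ranges[tier][1] for ranges in ITEMIZED_COSTS.values())
    return low, high


if __name__ == "__main__":
    for tier in ("small", "medium", "large"):
        print(tier, itemized_total(tier))
```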
Sizing Decision Checklist
When selecting a tier, consider:
- Current fleet size — how many Units are connected today?
- Growth projection — what fleet size do you expect in 12 months?
- Provisioning frequency — how often do you deploy Deployment Bundles (software updates)?
- Alert density — how many alerts per Unit per day?
- Availability requirements — do you need multi-AZ replicas for all databases?
Start with the tier matching your current fleet size, and configure the node group max size with headroom for the next tier. Database instance upgrades require a brief maintenance window but can be performed without data loss.
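The headroom recommendation can be sketched as a lookup: keep the current tier's instance sizes, but take the node group maximum from the next tier up. The max node values come from the EKS Node Groups table, with "30+" represented as 30; the function name `node_group_max_with_headroom` is hypothetical.

```python
TIER_ORDER = ["small", "medium", "large"]

# Max nodes per tier from the EKS Node Groups table ("30+" stored as 30).
MAX_NODES = {"small": 10, "medium": 20, "large": 30}


def node_group_max_with_headroom(current_tier):
    """Return the node group max size of the next tier up.

    The Large tier has no next tier, so its own maximum is returned
    and should be treated as a floor rather than a ceiling.
    """
    index = TIER_ORDER.index(current_tier)  # raises ValueError if unknown
    next_tier = TIER_ORDER[min(index + 1, len(TIER_ORDER) - 1)]
    return MAX_NODES[next_tier]


if __name__ == "__main__":
    for tier in TIER_ORDER:
        print(tier, node_group_max_with_headroom(tier))
```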
Related Documentation
- Required AWS Services — full catalog of provisioned services
- Helm Values Reference — chart configuration including sizing parameters
- Costs — detailed pricing and historical cost analysis