Version: v1.1

Resource Sizing Guidelines

Overview

AosCloud infrastructure scales with the number of connected Units in your fleet. This page provides sizing recommendations for each managed AWS resource at three fleet tiers, along with guidance on how fleet growth affects resource consumption.

All instance types and scaling parameters referenced here are configurable through the Helm values and infrastructure configuration. The default values reflect the Small fleet tier, which is suitable for development and initial production deployments.

Fleet Size Tiers

AosCloud sizing is driven by the number of connected Units communicating with the cloud platform. The tiers below represent typical deployment scales:

| Tier | Connected Units | Typical Use Case |
| --- | --- | --- |
| Small | Up to 5,000 | Development, staging, initial production rollout |
| Medium | 5,000–50,000 | Regional production deployment |
| Large | 50,000–200,000+ | Full-scale production fleet |
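
As a rough illustration of the tier boundaries above, the following sketch maps a connected-Unit count to a tier name. The function name `tier_for_units` is hypothetical and not part of AosCloud. The table lists 5,000 and 50,000 in two tiers each; this sketch assumes a boundary value belongs to the lower tier.

```python
def tier_for_units(connected_units: int) -> str:
    """Return the fleet tier name for a connected-Unit count.

    Assumption: a count exactly on a boundary (5,000 or 50,000)
    is assigned to the lower tier.
    """
    if connected_units < 0:
        raise ValueError("connected_units must be non-negative")
    if connected_units <= 5_000:
        return "small"
    if connected_units <= 50_000:
        return "medium"
    return "large"


if __name__ == "__main__":
    for units in (1_200, 5_000, 20_000, 75_000):
        print(units, tier_for_units(units))
```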

Sizing Recommendations by Resource

Aurora PostgreSQL

Aurora PostgreSQL stores all structured application data (fleet management, user accounts, configuration). Resource consumption scales with the number of Units and the volume of provisioning and telemetry operations.

| Parameter | Small | Medium | Large |
| --- | --- | --- | --- |
| Instance class | db.t3.medium | db.r6g.large | db.r6g.xlarge |
| Cluster instances | 1 | 2 | 2–3 |
| Storage type | Standard | aurora-iopt1 | aurora-iopt1 |
| Max connections | 100 | 200 | 400 |

Scaling drivers:

  • Each connected Unit maintains session state and periodic heartbeat records
  • Provisioning operations (deploying Deployment Bundles) generate write bursts proportional to fleet size
  • Connection capacity is configurable per environment

ElastiCache Redis

Redis serves as the session cache, message queue staging area, and real-time connection state store. Each connected Unit maintains an active session entry.

| Parameter | Small | Medium | Large |
| --- | --- | --- | --- |
| Node type | cache.t4g.medium | cache.r7g.large | cache.r7g.xlarge |
| Read replicas | 1 | 1–2 | 2 |
| Availability zones | 1 | 2 | 2 |

Scaling drivers:

  • Each active Unit session consumes Redis memory (session DB 3)
  • Message queue staging (DB 2) grows with concurrent provisioning operations
  • The cache.t4g.medium default supports up to ~5,000 concurrent Unit sessions

DocumentDB (MongoDB-compatible)

DocumentDB stores alert data from the fleet. Write volume correlates directly with the number of Units reporting alerts and their reporting frequency.

| Parameter | Small | Medium | Large |
| --- | --- | --- | --- |
| Instance class | db.t3.medium | db.r5.large | db.r5.xlarge |
| Cluster size | 2 | 2 | 3 |
| Availability zones | 3 | 3 | 3 |

Scaling drivers:

  • Alert ingestion rate is proportional to active Unit count
  • Query performance degrades as the alert collection grows; larger instances provide more memory for the working set
  • The cluster runs a minimum of 2 instances for high availability across availability zones
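
Because alert ingestion is proportional to the active Unit count, a back-of-the-envelope write rate can be computed from the fleet size and the alert density. The sketch below is only that arithmetic; the function name is hypothetical, and it assumes one DocumentDB write per alert, spread evenly over the day (real traffic is likely burstier).

```python
SECONDS_PER_DAY = 86_400


def alert_writes_per_second(active_units: int, alerts_per_unit_per_day: float) -> float:
    """Average alert writes per second for a fleet.

    Assumptions: one write per alert, evenly distributed over 24 hours.
    """
    if active_units < 0 or alerts_per_unit_per_day < 0:
        raise ValueError("inputs must be non-negative")
    return active_units * alerts_per_unit_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # Example: 50,000 Units each reporting 24 alerts per day.
    print(round(alert_writes_per_second(50_000, 24), 2))
```

Doubling the Unit count doubles the average rate, which is why this value is worth recomputing before moving up a tier.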

EKS Node Groups

EKS worker nodes run all AosCloud microservices, the Istio service mesh, monitoring agents, and the rabbitmq-cluster-operator. Node count scales with total pod demand.

| Parameter | Small | Medium | Large |
| --- | --- | --- | --- |
| Instance type | m5.xlarge | m5.2xlarge | m5.2xlarge |
| Min nodes | 3 | 4 | 6 |
| Max nodes | 10 | 20 | 30+ |
| Cluster autoscaler | Enabled | Enabled | Enabled |

Scaling drivers:

  • Each additional ~50,000 Units requires approximately one additional m5.xlarge node
  • The default scaling configuration allows 3–20 nodes; adjust max size for large fleets
  • Development environments may use the burstable t3.2xlarge instance type to reduce cost
  • Production environments should use m5.xlarge or m5.2xlarge for consistent performance
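
The rule of thumb above (roughly one extra m5.xlarge node per additional ~50,000 Units) can be turned into a quick capacity check. This is a minimal sketch: the function names are hypothetical, the default `max_nodes=20` is taken from the default scaling configuration mentioned above, and the result is an approximation rather than a guarantee.

```python
import math

UNITS_PER_EXTRA_NODE = 50_000  # approximate rule of thumb from the scaling drivers


def additional_nodes(added_units: int) -> int:
    """Approximate number of extra m5.xlarge nodes for a fleet increase."""
    if added_units < 0:
        raise ValueError("added_units must be non-negative")
    return math.ceil(added_units / UNITS_PER_EXTRA_NODE)


def fits_node_group(current_nodes: int, added_units: int, max_nodes: int = 20) -> bool:
    """True if the projected node count stays within the node group maximum."""
    return current_nodes + additional_nodes(added_units) <= max_nodes


if __name__ == "__main__":
    print(additional_nodes(120_000))       # extra nodes for +120,000 Units
    print(fits_node_group(18, 100_000))    # does the default max still fit?
```

If the check fails, raise the node group max size before the fleet grows rather than after pods start going unscheduled.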

EFS (Elastic File System)

EFS provides persistent storage for InfluxDB time-series data (metrics). EFS uses elastic throughput by default — no provisioned throughput configuration is required.

| Parameter | Small | Medium | Large |
| --- | --- | --- | --- |
| Throughput mode | Elastic (default) | Elastic | Elastic |
| Encryption | Enabled (KMS) | Enabled (KMS) | Enabled (KMS) |
| Mount targets | Per EKS subnet | Per EKS subnet | Per EKS subnet |

Scaling drivers:

  • Storage consumption grows with metrics retention period and number of reporting Units
  • EFS elastic throughput automatically scales with I/O demand — no manual sizing needed
  • For workloads that accumulate large volumes of data, consider EFS lifecycle policies to move infrequently accessed data to a lower-cost storage class

Fleet Size and Resource Consumption

The relationship between connected Units and resource consumption is approximately linear for most resources, with the following key drivers:

┌─────────────────────────────────────────────────────────┐
│                 Resource Scaling Model                  │
├──────────────────┬──────────────────────────────────────┤
│ Fleet Growth     │ Primary Resource Impact              │
├──────────────────┼──────────────────────────────────────┤
│ +Units           │ Redis memory, Aurora connections,    │
│                  │ EKS pod count, DocumentDB writes     │
├──────────────────┼──────────────────────────────────────┤
│ +Provisioning    │ Aurora write IOPS, S3 storage,       │
│  Operations      │ EKS CPU burst, message queue depth   │
├──────────────────┼──────────────────────────────────────┤
│ +Alert Volume    │ DocumentDB writes and storage,       │
│                  │ EKS memory for alert-handler pods    │
├──────────────────┼──────────────────────────────────────┤
│ +Monitoring      │ EFS storage (InfluxDB), Prometheus   │
│  Retention       │ ingestion rate                       │
└──────────────────┴──────────────────────────────────────┘

Autoscaling Behavior

AosCloud leverages the Kubernetes Cluster Autoscaler to handle dynamic load. When pods cannot be scheduled due to insufficient node capacity, the autoscaler provisions additional nodes up to the configured maximum. This covers:

  • Burst provisioning operations across the fleet
  • Temporary spikes in alert processing
  • Rolling updates that temporarily increase pod count

Managed services (Aurora, Redis, DocumentDB) do not autoscale; you must provision appropriate instance sizes based on your expected steady-state fleet size.
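
The node-level behavior described above (scale up on unschedulable pods, stop at the configured maximum) can be approximated with a small model. This is a deliberately simplified sketch: the real Cluster Autoscaler decides based on pod resource requests and node capacity, whereas this model assumes a flat, hypothetical `pods_per_node` figure. The function name is also hypothetical.

```python
import math


def nodes_after_scale_up(pods: int, pods_per_node: int,
                         min_nodes: int, max_nodes: int) -> tuple[int, int]:
    """Return (node_count, unschedulable_pods) for a given pod demand.

    Simplification: every node holds exactly `pods_per_node` pods.
    The node count is clamped to [min_nodes, max_nodes]; any demand
    beyond the maximum stays unschedulable.
    """
    if pods < 0 or pods_per_node <= 0:
        raise ValueError("pods must be >= 0 and pods_per_node must be > 0")
    needed = math.ceil(pods / pods_per_node)
    nodes = max(min_nodes, min(needed, max_nodes))
    unschedulable = max(0, pods - nodes * pods_per_node)
    return nodes, unschedulable


if __name__ == "__main__":
    # A burst that fits within the maximum, then one that does not.
    print(nodes_after_scale_up(100, 10, min_nodes=3, max_nodes=20))
    print(nodes_after_scale_up(500, 10, min_nodes=3, max_nodes=20))
```

The second case shows why the maximum matters: once the cap is reached, extra demand stays pending until the burst subsides or the max size is raised.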

Cost Implications by Tier

The table below provides approximate monthly cost ranges for each tier. Actual costs depend on region, reserved instance discounts, and usage patterns.

| Resource | Small (~$/month) | Medium (~$/month) | Large (~$/month) |
| --- | --- | --- | --- |
| Aurora PostgreSQL | 200–400 | 800–1,500 | 2,000–3,500 |
| ElastiCache Redis | 80–150 | 400–600 | 800–1,200 |
| DocumentDB | 150–250 | 400–700 | 800–1,500 |
| EKS nodes (compute) | 800–1,500 | 2,000–4,000 | 5,000–10,000 |
| EFS | <10 | 10–50 | 50–200 |
| Estimated total infrastructure | ~2,500–5,000 | ~5,000–10,000 | ~10,000–20,000 |

Note: These estimates cover compute and database infrastructure only. Additional costs for networking (VPC endpoints, data transfer), monitoring (Prometheus), CDN (CloudFront), and supporting services are documented in Costs.

For detailed per-service pricing and historical cost data, see the Costs documentation.
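
To compare tiers quickly, the per-resource ranges from the cost table can be added up. The sketch below only sums the five listed line items; the dictionary and function names are hypothetical, and the EFS "<10" entry is treated as a 0–10 range. The sums come out below the table's "Estimated total infrastructure" row, so that row presumably includes items that are not broken out here; the sketch makes no attempt to reproduce it.

```python
# Approximate monthly ranges (low, high) in dollars, copied from the cost table.
# Assumption: EFS "<10" on the Small tier is modeled as (0, 10).
LINE_ITEMS = {
    "Aurora PostgreSQL":   {"small": (200, 400),  "medium": (800, 1500),  "large": (2000, 3500)},
    "ElastiCache Redis":   {"small": (80, 150),   "medium": (400, 600),   "large": (800, 1200)},
    "DocumentDB":          {"small": (150, 250),  "medium": (400, 700),   "large": (800, 1500)},
    "EKS nodes (compute)": {"small": (800, 1500), "medium": (2000, 4000), "large": (5000, 10000)},
    "EFS":                 {"small": (0, 10),     "medium": (10, 50),     "large": (50, 200)},
}


def line_item_total(tier: str) -> tuple[int, int]:
    """Sum the low and high ends of the listed line items for one tier."""
    low = sum(ranges[tier][0] for ranges in LINE_ITEMS.values())
    high = sum(ranges[tier][1] for ranges in LINE_ITEMS.values())
    return low, high


if __name__ == "__main__":
    for tier in ("small", "medium", "large"):
        print(tier, line_item_total(tier))
```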

Sizing Decision Checklist

When selecting a tier, consider:

  1. Current fleet size — how many Units are connected today?
  2. Growth projection — what fleet size do you expect in 12 months?
  3. Provisioning frequency — how often do you deploy Deployment Bundles (software updates)?
  4. Alert density — how many alerts per Unit per day?
  5. Availability requirements — do you need multi-AZ replicas for all databases?

Start with the tier matching your current fleet size, and configure the node group max size with headroom for the next tier. Database instance upgrades require a brief maintenance window but can be performed without data loss.
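
The headroom advice above can be sketched as a small lookup: take the minimum node count from the current tier and the maximum from the next tier up. The names below are hypothetical, the values are copied from the EKS Node Groups table, and the Large tier's "30+" is represented as 30, which should be read as a floor rather than a limit.

```python
# (min_nodes, max_nodes) per tier, from the EKS Node Groups table.
# Assumption: "30+" for the Large tier is modeled as 30.
EKS_NODE_BOUNDS = {
    "small": (3, 10),
    "medium": (4, 20),
    "large": (6, 30),
}
TIER_ORDER = ["small", "medium", "large"]


def node_group_bounds(current_tier: str) -> tuple[int, int]:
    """Min nodes from the current tier, max nodes from the next tier up.

    The largest tier has no next tier, so it keeps its own maximum.
    """
    index = TIER_ORDER.index(current_tier)  # raises ValueError for unknown tiers
    next_tier = TIER_ORDER[min(index + 1, len(TIER_ORDER) - 1)]
    return EKS_NODE_BOUNDS[current_tier][0], EKS_NODE_BOUNDS[next_tier][1]


if __name__ == "__main__":
    for tier in TIER_ORDER:
        print(tier, node_group_bounds(tier))
```

Only the node group maximum is borrowed from the next tier here; database instance classes stay at the current tier, since they are upgraded through a maintenance window rather than scaled on demand.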