Version: v1.1

Scheduling and Placement

Introduction

When a new desired state arrives from AosCloud, the Communication Manager (CM) must decide where each Instance should run. This decision — scheduling and placement — is handled by the CM's Launcher module and its internal Balancer component. The Balancer implements a multi-stage filtering pipeline that evaluates every connected Node against each Instance's requirements, selecting the optimal Node-runtime combination based on priority, labels, resource capacity, and platform compatibility.

This page documents the scheduling algorithm, the filtering stages, resource accounting, rebalancing triggers, and how the system handles placement failures. Understanding this mechanism is important for OEMs because it determines how workloads are distributed across Nodes in a Unit, and how Node configuration (labels, priorities, resource limits) influences placement decisions.

Scheduling Architecture

Scheduling is a CM-level responsibility. The SM Launcher on each Node only executes start/stop commands — it does not make placement decisions. The CM Launcher coordinates the full scheduling lifecycle:

┌─────────────────────────────────────────────────────────────────┐
│ CM Launcher                                                     │
│                                                                 │
│ ┌──────────────┐  ┌──────────────┐  ┌─────────────────────────┐ │
│ │ Instance     │  │ Node         │  │ Balancer                │ │
│ │ Manager      │  │ Manager      │  │                         │ │
│ │              │  │              │  │ • Priority sort         │ │
│ │ • Tracks     │  │ • Tracks     │  │ • Policy balancing      │ │
│ │   active     │  │   connected  │  │ • Node filtering        │ │
│ │   instances  │  │   nodes      │  │ • Runtime selection     │ │
│ │ • Manages    │  │ • Resource   │  │ • Best-fit pick         │ │
│ │   lifecycle  │  │   tracking   │  │                         │ │
│ └──────────────┘  └──────────────┘  └─────────────────────────┘ │
│                                                                 │
│ ┌──────────────────────────────────────┐                        │
│ │ Network Manager                      │                        │
│ │ • DNS, connectivity, exposed ports   │                        │
│ └──────────────────────────────────────┘                        │
└─────────────────────────────────────────────────────────────────┘
              │                                   │
              ▼                                   ▼
        ┌───────────┐                       ┌───────────┐
        │  Node A   │                       │  Node B   │
        │   (SM)    │                       │   (SM)    │
        └───────────┘                       └───────────┘

Key Components

| Component | Role |
| --- | --- |
| Launcher | Entry point: receives RunInstances requests from the Update Manager and coordinates the scheduling process |
| Balancer | Core scheduling logic: implements the multi-phase algorithm that assigns Instances to Nodes |
| Instance Manager | Tracks all active and scheduled Instances and manages their lifecycle states |
| Node Manager | Maintains the set of connected Nodes, their capabilities, and available resources |
| Network Manager | Configures networking (DNS, provider networks) for scheduled service Instances |

Scheduling Input

The Launcher receives a RunInstances call from the Update Manager containing an array of RunInstanceRequest entries. Each request specifies:

| Field | Description |
| --- | --- |
| Item ID | Identifies the Deployable Item (service image or component) |
| Item Type | Whether this is a service or component Instance |
| Version | Target version of the Deployable Item |
| Owner ID | The owner (provider) of the service |
| Subject Info | Identity subject under which the Instance runs |
| Priority | Scheduling priority; higher values are scheduled first |
| Num Instances | How many replicas of this Instance to schedule |
| Labels | Placement constraint labels that must match Node labels |

The Launcher expands each request into individual Instance objects (one per replica) and passes them to the Balancer for placement.
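The expansion step can be pictured with a minimal Python sketch, assuming a request shape that mirrors the table above. The class names `RunInstanceRequest` and `Instance`, the `replica` field, and the `expand` helper are hypothetical illustrations, not the CM's actual types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunInstanceRequest:
    # Fields mirror the RunInstanceRequest table; the Python types are assumptions.
    item_id: str
    item_type: str            # "service" or "component"
    version: str
    owner_id: str
    subject_id: str
    priority: int
    num_instances: int
    labels: tuple = ()


@dataclass(frozen=True)
class Instance:
    item_id: str
    subject_id: str
    replica: int              # hypothetical replica index, 0..num_instances-1
    priority: int
    labels: tuple


def expand(requests):
    """Turn each request into one Instance object per requested replica."""
    return [
        Instance(req.item_id, req.subject_id, replica, req.priority, req.labels)
        for req in requests
        for replica in range(req.num_instances)
    ]


reqs = [
    RunInstanceRequest("svc-a", "service", "1.0.0", "oem", "subj1", 10, 2),
    RunInstanceRequest("svc-b", "service", "2.1.0", "oem", "subj1", 5, 1),
]
print([(i.item_id, i.replica) for i in expand(reqs)])
# [('svc-a', 0), ('svc-a', 1), ('svc-b', 0)]
```

Each replica carries its request's priority and labels, so every later phase can treat replicas as independent Instances.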

Multi-Phase Scheduling Algorithm

The Balancer executes scheduling in five sequential phases:

Phase 1: Instance Prioritization

Before placement begins, Instances are sorted by priority (descending). Higher-priority Instances are scheduled first, ensuring they get first pick of available resources. When priorities are equal, Instances are ordered by Item ID for deterministic behavior.

This ordering applies uniformly to both service Instances (container processes) and component Instances (firmware, rootfs).
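As a sketch, the ordering rule fits in a single sort key: negated priority for descending order, then Item ID as the deterministic tiebreaker. The `Instance` class and `prioritize` function are illustrative names, not the Balancer's real code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    item_id: str
    priority: int


def prioritize(instances):
    """Order Instances for placement: highest priority first, then by Item ID.

    sorted() is stable, so replicas of the same Deployable Item keep their
    relative input order.
    """
    return sorted(instances, key=lambda inst: (-inst.priority, inst.item_id))


queue = prioritize([Instance("rootfs", 50), Instance("svc-b", 10),
                    Instance("svc-a", 10), Instance("logger", 1)])
print([inst.item_id for inst in queue])
# ['rootfs', 'svc-a', 'svc-b', 'logger']
```

Because the key ignores the Instance type, service and component Instances share one queue, matching the uniform ordering described above.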

Phase 2: Policy-Based Rebalancing

During a rebalance operation (not initial placement), the Balancer first handles Instances with a disabled balancing policy. These Instances are pinned to their current Node — they are re-scheduled to the same Node and runtime they were previously running on, bypassing the normal filtering pipeline.

| Balancing Policy | Behavior |
| --- | --- |
| Enabled (default) | Instance participates in normal scheduling and can be moved between Nodes during rebalancing |
| Disabled | Instance is pinned to its current Node; rebalancing will not move it |

The balancing policy is specified in the Deployable Item's configuration (the OCI item config).
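The pinning step might look like the following sketch. It assumes each Instance remembers the Node and runtime from the previous round (`prev_node`, `prev_runtime`); those field names and the `policy_balancing` helper are hypothetical. The sketch also assumes that a balancing-disabled Instance with no previous placement simply goes through the normal pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    item_id: str
    balancing_enabled: bool = True
    prev_node: Optional[str] = None     # placement from the previous round
    prev_runtime: Optional[str] = None
    node: Optional[str] = None          # placement chosen in this round
    runtime: Optional[str] = None


def policy_balancing(instances, rebalancing):
    """Pin balancing-disabled Instances to their previous Node and runtime.

    Returns the Instances that still need the normal filtering pipeline.
    """
    remaining = []
    for inst in instances:
        if rebalancing and not inst.balancing_enabled and inst.prev_node is not None:
            # Bypass filtering: reuse the previous Node-runtime pair as-is.
            inst.node, inst.runtime = inst.prev_node, inst.prev_runtime
        else:
            remaining.append(inst)
    return remaining


fw = Instance("bootloader", balancing_enabled=False, prev_node="node-a", prev_runtime="boot")
svc = Instance("telemetry", prev_node="node-b", prev_runtime="crun")
todo = policy_balancing([fw, svc], rebalancing=True)
print(fw.node, fw.runtime, [i.item_id for i in todo])
# node-a boot ['telemetry']
```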

Phase 3: Node Filtering Pipeline

For each Instance that needs placement, the Balancer runs a multi-stage filtering pipeline. The pipeline starts with all connected Nodes and progressively eliminates those that cannot host the Instance.

Stage 1: Node ID Filter

If the Instance has a previous Node assignment (from a prior scheduling round), the system checks whether that Node is still valid. This stage also handles explicit Node ID constraints if specified.

Stage 2: Label Matching

Nodes are filtered based on the Instance's required labels. A Node must possess all labels specified in the Instance's label set to be eligible. Labels are string key-value pairs configured on each Node.

Example: If an Instance requires labels ["gpu=true", "zone=edge"], only Nodes that have both of these labels pass this filter.
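Label matching is a subset test. The sketch below assumes Nodes are given as a dict from Node ID to label list and compares each label as an opaque `key=value` string, as in the example; `filter_by_labels` is an illustrative name.

```python
def filter_by_labels(nodes, required_labels):
    """Return the Nodes that carry every label the Instance requires.

    `nodes` maps a Node ID to its label list. Each label is compared as an
    opaque "key=value" string, matching the example above.
    """
    required = set(required_labels)
    return {node_id: labels for node_id, labels in nodes.items()
            if required.issubset(labels)}


nodes = {
    "node-a": ["gpu=true", "zone=edge", "os=linux"],
    "node-b": ["zone=edge"],
}
print(sorted(filter_by_labels(nodes, ["gpu=true", "zone=edge"])))
# ['node-a']
```

Extra labels on a Node never disqualify it, and an Instance with no required labels matches every Node.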

Stage 3: Shared Resource Availability

Nodes are filtered based on required shared resources (hardware devices, partitions, or other named resources). The Instance's item configuration declares which named resources it needs, and only Nodes that advertise those resources remain eligible.
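The first three stages can be chained as in this sketch, where each stage narrows the candidate list and an empty result reports that stage's failure reason. The `Node` shape, the `PlacementError` class, and the reason strings are assumptions. Stage 1 is simplified to an explicit Node ID constraint only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: set = field(default_factory=set)
    resources: set = field(default_factory=set)   # named shared resources


class PlacementError(Exception):
    pass


def filter_nodes(nodes, required_labels=(), required_resources=(), node_id=None):
    """Run the Node ID, label, and shared-resource stages in order."""
    candidates = list(nodes)
    if not candidates:
        raise PlacementError("no connected nodes")
    if node_id is not None:                                   # Stage 1
        candidates = [n for n in candidates if n.node_id == node_id]
        if not candidates:
            raise PlacementError("no matching node ID")
    candidates = [n for n in candidates                       # Stage 2
                  if set(required_labels) <= n.labels]
    if not candidates:
        raise PlacementError("no matching labels")
    candidates = [n for n in candidates                       # Stage 3
                  if set(required_resources) <= n.resources]
    if not candidates:
        raise PlacementError("no matching resources")
    return candidates


nodes = [Node("node-a", {"gpu=true"}, {"can0"}), Node("node-b", {"gpu=true"})]
print([n.node_id for n in filter_nodes(nodes, ["gpu=true"], ["can0"])])
# ['node-a']
```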

Stage 4: Runtime Selection

After Node-level filtering, the Balancer selects the best runtime on the remaining Nodes. Each Node can have multiple runtimes (e.g., a container runtime and a boot runtime). The runtime selection applies its own filtering stages:

| Runtime Filter | Criteria |
| --- | --- |
| Runtime type | The Instance's required runtime type must match (e.g., crun, boot, rootfs) |
| Platform | The runtime's architecture and OS must match the Instance's image platform |
| CPU capacity | The runtime must have sufficient available CPU (DMIPS) for the Instance's requirements |
| RAM capacity | The runtime must have sufficient available RAM for the Instance's requirements |
| Instance count | The runtime must not have reached its maximum Instance capacity |
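The runtime filters can be applied in table order, stopping at the first stage that leaves no candidates so the reason can be reported. This is a sketch under assumed field names (`Runtime`, `Requirements`, `filter_runtimes` are hypothetical). It simply sets component Instances' CPU and RAM needs to zero.

```python
from dataclasses import dataclass


@dataclass
class Runtime:
    node_id: str
    runtime_type: str        # e.g. "crun", "boot", "rootfs"
    arch: str
    os: str
    available_cpu: int       # DMIPS
    available_ram: int       # bytes
    max_instances: int
    num_instances: int = 0


@dataclass
class Requirements:
    runtime_type: str
    arch: str
    os: str
    cpu: int = 0             # DMIPS; left at 0 for component Instances here
    ram: int = 0             # bytes


def filter_runtimes(runtimes, req):
    """Apply the runtime filters in order; return (candidates, failure_reason)."""
    checks = [
        ("no matching runtime type", lambda r: r.runtime_type == req.runtime_type),
        ("no matching platform", lambda r: (r.arch, r.os) == (req.arch, req.os)),
        ("insufficient CPU", lambda r: r.available_cpu >= req.cpu),
        ("insufficient RAM", lambda r: r.available_ram >= req.ram),
        ("instance limit reached", lambda r: r.num_instances < r.max_instances),
    ]
    candidates = list(runtimes)
    for reason, ok in checks:
        candidates = [r for r in candidates if ok(r)]
        if not candidates:
            return [], reason
    return candidates, None


runtimes = [
    Runtime("node-a", "crun", "x86_64", "linux", 5000, 2**30, 10),
    Runtime("node-b", "crun", "arm64", "linux", 8000, 2**30, 10),
    Runtime("node-a", "boot", "x86_64", "linux", 0, 0, 1),
]
ok, reason = filter_runtimes(runtimes, Requirements("crun", "x86_64", "linux", cpu=1000, ram=2**20))
print([r.node_id for r in ok], reason)
# ['node-a'] None
```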

Stage 5: Priority Selection and Best Fit

From the remaining Node-runtime candidates:

  1. Top priority filter — only Nodes with the highest priority value are retained
  2. Best-fit selection — among equal-priority Nodes, the system selects the Node with the most available resources (CPU first, then RAM as tiebreaker, then Node ID for determinism)

This "most available resources" strategy distributes workloads to avoid hotspots, preferring Nodes that have the most headroom.
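A sketch of the final pick: keep only the top-priority candidates, then choose the one with the most available CPU, then RAM, then Node ID. The `Candidate` type is hypothetical, and resolving the last tie toward the lexicographically smallest Node ID is an assumption (the text only says Node ID is used for determinism).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    node_id: str
    node_priority: int
    available_cpu: int   # DMIPS
    available_ram: int   # bytes


def pick_best(candidates):
    """Keep only top-priority Nodes, then pick the one with the most headroom."""
    if not candidates:
        return None
    top = max(c.node_priority for c in candidates)
    finalists = [c for c in candidates if c.node_priority == top]
    # More CPU wins, then more RAM; smallest Node ID breaks remaining ties.
    return min(finalists, key=lambda c: (-c.available_cpu, -c.available_ram, c.node_id))


cands = [
    Candidate("node-a", 1, 9000, 512),
    Candidate("node-b", 2, 3000, 256),
    Candidate("node-c", 2, 3000, 1024),
]
print(pick_best(cands).node_id)
# node-c
```

Note that `node-a` has the most CPU but loses because its priority is lower; headroom only decides among equal-priority Nodes.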

Phase 4: Network Configuration

After all Instances are scheduled, the Balancer updates network configurations:

  1. Removes network parameters for Instances that are no longer scheduled
  2. Sets up provider networks for newly-scheduled service Instances on each Node
  3. Configures network parameters — first for Instances with exposed ports, then for all others
  4. Restarts the DNS server to reflect the new Instance topology

Phase 5: Instance Submission

The scheduling process concludes by:

  1. Submitting scheduled Instances to the Instance Manager (transitioning them from "scheduled" to "active")
  2. Sending the updated Instance lists to each Node's Service Manager via the SM Controller

The SM on each Node then executes the actual start/stop operations.

Resource Accounting

The Balancer tracks resource availability per Node and per runtime to make informed placement decisions.

Resource Types

| Resource | Unit | Applies To | Description |
| --- | --- | --- | --- |
| CPU | DMIPS | Service Instances | Processing capacity available on the Node/runtime |
| RAM | Bytes | Service Instances | Memory available on the Node/runtime |
| Storage | Bytes | Service Instances | Persistent storage partition space |
| State | Bytes | Service Instances | State partition space |
| Shared Resources | Named | Both | Hardware devices, partitions, or other named resources |

How Resource Requirements Are Determined

For service Instances, resource requirements come from two sources:

  1. Requested resources — explicit CPU and RAM values declared in the Deployable Item's configuration
  2. Node configuration ratios — if no explicit request is specified, the system calculates requirements using configurable ratios from the Node configuration (a percentage of total Node capacity)

The system also considers actual monitoring data during rebalancing — if a Node is under heavy load, the Balancer uses real consumption data rather than declared requirements to make better placement decisions.
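The order of precedence described above can be sketched as one function. The parameter names are hypothetical, the ratios are taken as integer percentages, and passing a `measured` pair stands in, in a simplified way, for the rebalancing case where monitoring data replaces declared values.

```python
def instance_requirements(node_cpu, node_ram, cpu_ratio_pct, ram_ratio_pct,
                          requested_cpu=None, requested_ram=None,
                          measured=None):
    """Return the (cpu_dmips, ram_bytes) to reserve for one service Instance.

    1. During rebalancing on a loaded Node, a measured (cpu, ram) pair from
       monitoring replaces the declared values (simplified here).
    2. Otherwise, explicit requests from the item config win.
    3. Otherwise, reserve a percentage of the Node's total capacity.
    """
    if measured is not None:
        return measured
    cpu = requested_cpu if requested_cpu is not None else node_cpu * cpu_ratio_pct // 100
    ram = requested_ram if requested_ram is not None else node_ram * ram_ratio_pct // 100
    return cpu, ram


print(instance_requirements(10000, 4096, 10, 25, requested_cpu=1500))
# (1500, 1024)
```

Here CPU comes from the explicit request, while RAM falls back to 25% of the Node's 4096 bytes.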

Available Resource Calculation

Each Node tracks:

  • Total capacity — the Node's declared CPU (DMIPS) and RAM
  • System usage — CPU and RAM consumed by non-AosEdge processes (derived from monitoring data minus Instance consumption)
  • Instance reservations — resources reserved by already-scheduled Instances

Available resources = Total capacity − System usage − Instance reservations

Per-runtime tracking ensures that Instances are only placed on runtimes that have sufficient capacity, even when a Node has multiple runtimes sharing the same hardware.
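The formula above can be expressed as a small accounting object. This is an illustrative sketch: the `Capacity` class and its field names are assumptions, and clamping negative values to zero is a choice made here rather than documented behavior. The same object could be kept per runtime as well as per Node.

```python
from dataclasses import dataclass


@dataclass
class Capacity:
    total_cpu: int              # declared capacity, DMIPS
    total_ram: int              # declared capacity, bytes
    measured_cpu: int = 0       # whole-Node usage reported by monitoring
    measured_ram: int = 0
    instances_cpu: int = 0      # share of measured usage consumed by Instances
    instances_ram: int = 0
    reserved_cpu: int = 0       # reservations of already-scheduled Instances
    reserved_ram: int = 0

    def available(self):
        # System usage = monitoring data minus Instance consumption.
        system_cpu = max(self.measured_cpu - self.instances_cpu, 0)
        system_ram = max(self.measured_ram - self.instances_ram, 0)
        cpu = self.total_cpu - system_cpu - self.reserved_cpu
        ram = self.total_ram - system_ram - self.reserved_ram
        return max(cpu, 0), max(ram, 0)

    def reserve(self, cpu, ram):
        """Reserve resources for one Instance if they fit; report success."""
        free_cpu, free_ram = self.available()
        if cpu > free_cpu or ram > free_ram:
            return False
        self.reserved_cpu += cpu
        self.reserved_ram += ram
        return True


node = Capacity(total_cpu=10000, total_ram=8000, measured_cpu=3000, measured_ram=2000,
                instances_cpu=1000, instances_ram=500)
print(node.available())                        # (8000, 6500)
print(node.reserve(5000, 1000), node.available())  # True (3000, 5500)
print(node.reserve(4000, 0))                   # False
```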

Rebalancing

Rebalancing redistributes Instances across Nodes when resource usage becomes uneven. It is triggered by the monitoring system when resource thresholds are exceeded.

Threshold Configuration

Each Node can have resource usage thresholds configured:

| Threshold | Purpose |
| --- | --- |
| Max threshold | Upper limit: when usage exceeds this percentage for longer than the minimum timeout, rebalancing is triggered |
| Min threshold | Lower limit: rebalancing is considered complete when usage drops below this percentage for longer than the minimum timeout |
| Min timeout | Duration that usage must continuously stay above or below a threshold before action is taken |

Thresholds are configured as percentages of total Node capacity. System-wide defaults apply unless overridden by per-Node configuration.

Rebalancing Flow

  1. Monitoring detects that a Node's resource usage exceeds the max threshold for the configured timeout
  2. The system triggers a Rebalance call on the Launcher
  3. The Balancer runs the full scheduling algorithm with rebalancing=true:
    • Instances with disabled balancing policy are pinned to their current Nodes
    • All other Instances are re-evaluated for optimal placement
    • Monitoring data is used for resource calculations (actual usage, not just declared requirements)
  4. Instances that should move are stopped on their current Node and started on the new target Node
  5. Rebalancing is considered complete when usage on every Node has stayed below the min threshold for the configured timeout
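Step 4 amounts to comparing the old and new placements. The `plan_moves` helper below is a hypothetical illustration of that comparison. In the flow described in Phase 5, the CM sends each Node's SM its updated Instance list, and the SM performs the actual stop and start operations.

```python
def plan_moves(previous, current):
    """Compare two placements (Instance -> Node ID) and list per-Node actions.

    Returns (stops, starts): Node ID -> Instances to stop / start there.
    """
    stops, starts = {}, {}
    for inst, node in previous.items():
        if current.get(inst) != node:
            stops.setdefault(node, []).append(inst)
    for inst, node in current.items():
        if previous.get(inst) != node:
            starts.setdefault(node, []).append(inst)
    return stops, starts


previous = {"telemetry": "node-a", "logger": "node-a", "bootloader": "node-b"}
current = {"telemetry": "node-b", "logger": "node-a", "bootloader": "node-b"}
stops, starts = plan_moves(previous, current)
print(stops, starts)   # {'node-a': ['telemetry']} {'node-b': ['telemetry']}
```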

Preventing Rebalancing Oscillation

The dual-threshold design (max/min) with minimum timeouts prevents oscillation:

  • A brief CPU spike does not trigger rebalancing (must exceed max threshold for the full timeout duration)
  • Once triggered, rebalancing continues until usage stabilizes below the min threshold (not just below max)
  • Short-term drops below min threshold do not prematurely end rebalancing (must remain below for the full timeout)
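The dual-threshold behavior is a hysteresis detector with a minimum dwell time. The sketch below is one way to express it for a single Node. The `RebalanceTrigger` class, its sample-driven `update` method, and treating "for the timeout" as "for at least `min_timeout` seconds" are all assumptions made for illustration.

```python
class RebalanceTrigger:
    """Dual-threshold detector with a minimum dwell time (hysteresis).

    Feed it (timestamp_seconds, usage_percent) samples. `active` turns True
    once usage stays above max_pct for at least min_timeout seconds, and
    turns False only after usage stays below min_pct for at least min_timeout.
    """

    def __init__(self, max_pct, min_pct, min_timeout):
        if min_pct >= max_pct:
            raise ValueError("min threshold must be below max threshold")
        self.max_pct = max_pct
        self.min_pct = min_pct
        self.min_timeout = min_timeout
        self.active = False
        self._since = None   # start time of the current qualifying streak

    def update(self, now, usage_pct):
        # Idle: watch for usage above max. Active: watch for usage below min.
        crossing = usage_pct < self.min_pct if self.active else usage_pct > self.max_pct
        if not crossing:
            self._since = None   # streak broken: brief spikes or dips are ignored
            return self.active
        if self._since is None:
            self._since = now
        if now - self._since >= self.min_timeout:
            self.active = not self.active
            self._since = None
        return self.active


trigger = RebalanceTrigger(max_pct=80, min_pct=60, min_timeout=10)
samples = [(0, 95), (5, 50), (6, 90), (16, 90), (20, 70), (21, 55), (25, 65), (26, 50), (36, 50)]
print([trigger.update(t, u) for t, u in samples])
# [False, False, False, True, True, True, True, True, False]
```

In the sample run, the spike at t=0 is ignored, rebalancing starts at t=16, survives the dip at t=21 to t=25, and ends only at t=36.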

Placement Failures

When the Balancer cannot find a suitable Node for an Instance, it records an error on that Instance and continues scheduling the remaining Instances. Placement can fail for several reasons:

| Failure Reason | Description |
| --- | --- |
| No connected Nodes | No Nodes are currently connected to the CM |
| No matching Node ID | The Instance requires a specific Node that is not available |
| No matching labels | No connected Node has all required labels |
| No matching resources | No Node has the required shared resources |
| No matching runtime type | No Node has a runtime of the required type |
| No matching platform | No runtime matches the Instance's architecture/OS |
| Insufficient CPU | All candidate runtimes lack sufficient CPU capacity |
| Insufficient RAM | All candidate runtimes lack sufficient RAM capacity |
| Instance limit reached | All candidate runtimes have reached their maximum Instance count |

Failed Instances are reported with an error status to the cloud, enabling operators to diagnose and resolve placement issues (e.g., by adding capacity, adjusting labels, or reducing Instance requirements).
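The "record and continue" behavior can be sketched as a loop that catches per-Instance failures instead of aborting the round. `PlacementError`, `schedule_all`, and the toy `place` function are hypothetical names used only for this illustration.

```python
class PlacementError(Exception):
    """Raised with one of the failure reasons listed above."""


def schedule_all(instances, place):
    """Try to place every Instance; a failure never aborts the round.

    `place` is a hypothetical callable that returns a Node ID or raises
    PlacementError. Returns (placements, errors), both keyed by Instance.
    """
    placements, errors = {}, {}
    for inst in instances:
        try:
            placements[inst] = place(inst)
        except PlacementError as err:
            errors[inst] = str(err)   # later reported to the cloud as an error status
    return placements, errors


def place(name):
    # Toy placement function for the demo: one Instance never fits.
    if name == "video-ai":
        raise PlacementError("insufficient RAM")
    return "node-a"


ok, failed = schedule_all(["telemetry", "video-ai", "logger"], place)
print(ok)      # {'telemetry': 'node-a', 'logger': 'node-a'}
print(failed)  # {'video-ai': 'insufficient RAM'}
```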

Instance Types and Scheduling Differences

The scheduling algorithm handles two Instance types with slightly different resource semantics:

| Aspect | Service Instance | Component Instance |
| --- | --- | --- |
| Runtime | Container (crun/runc) | Boot or Rootfs |
| CPU/RAM filtering | Uses requested resources or Node config ratios | Does not consume CPU/RAM quotas (system-level) |
| Shared resources | Optional (network devices, GPUs) | Required (boot partitions, rootfs devices) |
| Max instances per runtime | Configurable (typically many) | Typically 1 per runtime |
| Network configuration | Full (DNS, exposed ports, provider network) | None |
| Balancing policy | Configurable (enabled/disabled) | Typically disabled |

Scheduling Sequence

The following sequence shows the end-to-end flow from desired-state reception through Instance placement:

Cloud → CM Update Manager → CM Launcher.RunInstances()
                                        │
                                        ▼
                             Balancer.RunInstances()
                                        │
                    ┌───────────────────┼───────────────────┐
                    ▼                   ▼                   ▼
           PrepareForBalancing   PolicyBalancing      NodeBalancing
           (update monitoring,   (pin disabled-       (filter pipeline
            reset state)          rebalance            for each Instance)
                                  instances)
                    │                   │                   │
                    └───────────────────┼───────────────────┘
                                        ▼
                                  UpdateNetwork
                                (DNS, providers)
                                        │
                                        ▼
                            SubmitScheduledInstances
                                        │
                                        ▼
                      NodeManager.SendScheduledInstances()
                                        │
                  ┌─────────────────────┼─────────────────────┐
                  ▼                     ▼                     ▼
            SM Controller         SM Controller         SM Controller
              → Node A              → Node B              → Node C
            (start/stop)          (start/stop)          (start/stop)