Multi-Node Operation
Introduction
AosEdge supports Units composed of multiple Nodes — distinct computing elements (boards, VMs, or hardware partitions) that work together as a single managed system. Multi-Node operation enables OEMs to build systems from heterogeneous hardware while presenting a unified management interface to the cloud.
This section covers how Nodes coordinate within a Unit, how the Main Node orchestrates the system, and how services and files are distributed across Nodes.
Why Multi-Node?
Real-world edge systems often combine hardware with different capabilities:
- A gateway board with network connectivity handles cloud communication and management
- A compute board with a powerful CPU or GPU runs intensive workloads
- A sensor board with specialized I/O handles real-time data acquisition
- A safety-critical partition runs certified software in isolation
Rather than requiring a separate cloud connection and management stack for each board, AosEdge groups them into a single Unit. The cloud sees one entity, sends one desired state, and receives one unified status — regardless of how many Nodes compose the Unit internally.
The Main Node Model
Every multi-Node Unit has exactly one Main Node that serves as the coordination point:
| Responsibility | How It Works |
|---|---|
| Cloud connectivity | The Main Node's Communication Manager (CM) maintains the single WebSocket connection to AosCloud |
| Desired-state distribution | CM receives the desired state for the entire Unit and distributes service assignments to the appropriate Nodes |
| Status aggregation | CM collects status from all Nodes' Service Managers and reports a unified UnitStatus to the cloud |
| Configuration management | CM manages the Unit configuration and pushes per-Node configurations to each Node |
| Update orchestration | CM coordinates firmware and service updates across all Nodes |
Secondary Nodes run their own Service Manager (SM), Identity and Access Manager (IAM), and Message Proxy (MP) locally, but rely on the Main Node's CM for cloud interaction and coordination.
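The distribution and aggregation responsibilities in the table can be illustrated with a minimal sketch. It is not the AosEdge implementation: the `ServiceAssignment` and `NodeStatus` shapes, the `distribute` and `aggregate` helpers, and the Node and service names are all hypothetical, chosen only to show one desired state being split per Node and many statuses being merged into one report.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ServiceAssignment:
    """One service the cloud wants running (hypothetical shape)."""
    service_id: str
    node_id: str  # Node selected to run the service


@dataclass
class NodeStatus:
    """Status reported by one Node's SM (hypothetical shape)."""
    node_id: str
    running: list[str] = field(default_factory=list)


def distribute(desired: list[ServiceAssignment]) -> dict[str, list[str]]:
    """Split a Unit-wide desired state into per-Node service lists."""
    per_node: dict[str, list[str]] = {}
    for item in desired:
        per_node.setdefault(item.node_id, []).append(item.service_id)
    return per_node


def aggregate(statuses: list[NodeStatus]) -> dict[str, list]:
    """Merge per-Node statuses into a single Unit-level report."""
    return {
        "nodes": sorted(s.node_id for s in statuses),
        "instances": sorted(
            (s.node_id, service) for s in statuses for service in s.running
        ),
    }
```

In this sketch the cloud-facing side only ever sees the output of `aggregate`, which mirrors the idea that the Unit reports one unified status regardless of how many Nodes it contains.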
How Nodes Communicate
Nodes within a Unit communicate through two mechanisms:
SM-to-CM Connection (Service Coordination)
Each Node's Service Manager connects to the Main Node's CM via gRPC. Through this connection:
- SM registers itself and reports available resources, runtimes, and running instances
- CM pushes service deployment commands (run, stop, update) to the Node
- SM sends back status, monitoring data, alerts, and logs
On the Main Node, SM connects to CM locally. On Secondary Nodes, the Message Proxy (MP) bridges this connection across the inter-Node transport.
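The register, command, and status exchange described above can be modelled in a few lines. This sketch replaces the gRPC connection with direct method calls, and every class, method, and field name in it is an assumption made for illustration rather than part of the AosEdge API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ServiceManager:
    """Stand-in for a Node's SM; a real SM talks to CM over gRPC."""
    node_id: str
    runtimes: list[str]
    instances: set[str] = field(default_factory=set)

    def status(self) -> dict:
        """Build the status message sent back to CM."""
        return {"node": self.node_id, "instances": sorted(self.instances)}

    def handle(self, command: str, service_id: str) -> dict:
        """Apply a run/stop/update command and return the new status."""
        if command == "run":
            self.instances.add(service_id)
        elif command == "stop":
            self.instances.discard(service_id)
        elif command != "update":  # "update" leaves the instance set as-is here
            raise ValueError(f"unknown command: {command}")
        return self.status()


class CommunicationManager:
    """Stand-in for the Main Node's CM."""

    def __init__(self) -> None:
        self.registered: dict[str, ServiceManager] = {}
        self.last_status: dict[str, dict] = {}

    def register(self, sm: ServiceManager) -> None:
        """Record an SM together with its initial report."""
        self.registered[sm.node_id] = sm
        self.last_status[sm.node_id] = sm.status()

    def push(self, node_id: str, command: str, service_id: str) -> None:
        """Send a command to one Node and keep the status it returns."""
        sm = self.registered[node_id]
        self.last_status[node_id] = sm.handle(command, service_id)
```

A caller would register each SM once and then issue commands such as `cm.push("compute", "run", "vision")`; the latest status per Node stays available in `cm.last_status`.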
IAM Registration Stream (Node Management)
Each Secondary Node's IAM maintains a persistent bidirectional gRPC stream (RegisterNode) to the Main Node's IAM. This stream carries:
- Node identity and state information
- Provisioning and certificate operations
- Pause/resume commands from the cloud
- Connection state tracking
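As a rough illustration of the stream's two directions, the sketch below uses a Python generator in place of the gRPC stream: commands flow in and node-info messages flow out. The state names match those listed for the Node Lifecycle page, but the transitions in `apply_command`, the message shape, and the function names are assumptions, not documented AosEdge behavior.

```python
from enum import Enum


class NodeState(Enum):
    UNPROVISIONED = "unprovisioned"
    PROVISIONED = "provisioned"
    PAUSED = "paused"
    ERROR = "error"


def apply_command(state: NodeState, command: str) -> NodeState:
    """Return the state after a pause/resume command (assumed transitions)."""
    if command == "pause" and state is NodeState.PROVISIONED:
        return NodeState.PAUSED
    if command == "resume" and state is NodeState.PAUSED:
        return NodeState.PROVISIONED
    return state  # assumption: a command that does not apply is ignored


def registration_stream(node_id, commands, state=NodeState.PROVISIONED):
    """Yield the message a Secondary Node would send after each command."""
    yield {"node": node_id, "state": state.value}  # initial identity report
    for command in commands:
        state = apply_command(state, command)
        yield {"node": node_id, "state": state.value}
```

Iterating over `registration_stream("sensor", ["pause", "resume"])` produces one message for the initial report and one per command, which is the shape a Main Node would need in order to track each Node's current state.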
The Message Proxy's Role
The Message Proxy (MP) runs on Secondary Nodes and bridges the gap between the local components and the Main Node's CM. From the cloud's perspective, the Unit is a single entity — MP is purely an internal implementation detail that enables multi-Node operation.
MP provides:
- Message routing — relays CM commands to the local SM and forwards status back to the Main Node
- File distribution — downloads service images from CM's file server and delivers them to the local SM
- Transport abstraction — supports multiple inter-Node transports (Xen vchan for virtualized environments, TCP sockets for networked boards)
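The routing and transport-abstraction roles can be sketched with a small interface. The `Transport` protocol, the in-memory `LoopbackTransport`, and the `MessageProxy.relay_once` method are hypothetical names; a real MP would sit on a Xen vchan or TCP link and exchange structured messages rather than raw byte strings.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Protocol


class Transport(Protocol):
    """What MP needs from any inter-Node transport (hypothetical interface)."""

    def send(self, payload: bytes) -> None: ...

    def receive(self) -> bytes: ...


class LoopbackTransport:
    """In-memory stand-in for a vchan or TCP link, for illustration only."""

    def __init__(self) -> None:
        self.to_main: deque[bytes] = deque()    # traffic toward the Main Node
        self.from_main: deque[bytes] = deque()  # traffic from the Main Node

    def send(self, payload: bytes) -> None:
        self.to_main.append(payload)

    def receive(self) -> bytes:
        return self.from_main.popleft()


class MessageProxy:
    """Relays one CM command to the local SM and returns the SM's status."""

    def __init__(self, transport: Transport,
                 local_sm: Callable[[bytes], bytes]) -> None:
        self.transport = transport
        self.local_sm = local_sm  # callable mapping command bytes to status bytes

    def relay_once(self) -> None:
        command = self.transport.receive()  # command arriving from CM
        status = self.local_sm(command)     # hand it to the local SM
        self.transport.send(status)         # forward status to the Main Node
```

Because `MessageProxy` depends only on `send` and `receive`, swapping the loopback for another transport would not change the relay logic, which is the point of the abstraction.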
Component Distribution Summary
| Component | Main Node | Secondary Node | Purpose |
|---|---|---|---|
| CM | ✓ | — | Cloud connectivity, orchestration |
| SM | ✓ | ✓ | Local service lifecycle management |
| IAM | ✓ | ✓ | Per-Node identity and certificates |
| MP | — | ✓ | Inter-Node communication bridge |
In This Section
- Multi-Node Architecture — detailed topology, transport layers, and communication patterns between Nodes
- Node Lifecycle — Node state machine (unprovisioned, provisioned, paused, error), transitions, and connection state
- Inter-Node File Distribution — how service images are distributed from the Main Node to Secondary Nodes via MP
- Dynamic Node Registration — how Nodes dynamically join and leave a Unit at runtime
Related Pages
- Unit and Node Model — foundational explanation of the Unit/Node hierarchy
- Architecture Overview — system-wide component architecture and interactions
- Message Proxy — detailed MP component documentation
- Node Identity — how Nodes establish and report their identity
- Configuration — Unit and Node configuration reference