Version: v1.1

Multi-Node Architecture

Introduction

This page describes the internal architecture that enables multi-Node operation in AosEdge Units. It covers the physical and logical topology, the transport layer that connects Nodes, the channel multiplexing protocol that carries multiple traffic types over a single link, and how the Service Manager and IAM connections are established across Node boundaries.

The multi-Node architecture is an edge-side implementation detail — the cloud communicates with the Unit as a single entity and has no awareness of the Message Proxy or inter-Node transport. From the OEM integration perspective, understanding this architecture helps with:

  • Selecting the appropriate transport backend for your hardware topology
  • Configuring port assignments and channel security
  • Diagnosing inter-Node connectivity issues
  • Understanding how service deployments and certificate operations reach Secondary Nodes

Topology

A multi-Node Unit follows a star topology with the Main Node at the center:

           ┌─────────────────────────────────┐
           │            AosCloud             │
           └────────────────┬────────────────┘
                            │ WebSocket (JSON)
           ┌────────────────┴────────────────┐
           │            Main Node            │
           │                                 │
           │   ┌────┐    ┌────┐    ┌─────┐   │
           │   │ CM │    │ SM │    │ IAM │   │
           │   └────┘    └────┘    └─────┘   │
           └────────────────┬────────────────┘
                            │ Transport (vchan or TCP)
           ┌────────────────┴────────────────┐
┌──────────┴──────────┐           ┌──────────┴──────────┐
│  Secondary Node A   │           │  Secondary Node B   │
│  ┌──────────────┐   │           │  ┌──────────────┐   │
│  │      MP      │   │           │  │      MP      │   │
│  └──┬────────┬──┘   │           │  └──┬────────┬──┘   │
│     │        │      │           │     │        │      │
│  ┌──┴─┐   ┌──┴──┐   │           │  ┌──┴─┐   ┌──┴──┐   │
│  │ SM │   │ IAM │   │           │  │ SM │   │ IAM │   │
│  └────┘   └─────┘   │           │  └────┘   └─────┘   │
└─────────────────────┘           └─────────────────────┘

Key characteristics:

  • Single cloud connection — only the Main Node's CM maintains the WebSocket connection to AosCloud
  • Star topology — each Secondary Node connects independently to the Main Node; Secondary Nodes do not communicate with each other
  • Per-Node identity — each Node (Main and Secondary) runs its own IAM and maintains its own certificates
  • Centralized orchestration — CM on the Main Node distributes desired state, collects status, and coordinates updates across all Nodes

Transport Layer

The transport layer provides the raw byte-stream connection between a Secondary Node and the Main Node. Two backends are available, selected at build time:

Xen Virtual Channel (vchan)

Used when Nodes are Xen virtual machines running on the same physical host.

| Property | Value |
| --- | --- |
| Build flag | WITH_VCHAN=ON |
| Addressing | Xen domain ID + XenStore paths |
| Channels | Separate read and write vchan instances |
| Latency | Very low (shared-memory based) |
| Use case | Hypervisor-based multi-Node Units |

The vchan transport uses the libxenvchan library. Each direction (read/write) is a separate virtual channel, addressed by the peer domain's ID and a XenStore path. The Secondary Node's MP creates the vchan endpoints and waits for the Main Node to connect.

Configuration:

{
    "VChan": {
        "Domain": 0,
        "XSRXPath": "/local/domain/1/data/vchan_rx",
        "XSTXPath": "/local/domain/1/data/vchan_tx",
        "IAMCertStorage": "iam",
        "SMCertStorage": "sm"
    }
}
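
As a rough illustration of how these fields might be consumed, the Python sketch below parses the configuration shown above into a typed structure and fails early when a key is missing. The names `VChanConfig` and `load_vchan_config` are hypothetical and exist only for this example; the real MP reads its configuration in its own code. The reading of `XSRXPath` and `XSTXPath` as the read and write paths is an assumption based on their names.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VChanConfig:
    """Hypothetical typed view of the "VChan" configuration section."""
    domain: int              # Xen domain ID used for addressing
    xs_rx_path: str          # XenStore path, assumed to be the read channel
    xs_tx_path: str          # XenStore path, assumed to be the write channel
    iam_cert_storage: str
    sm_cert_storage: str


def load_vchan_config(text: str) -> VChanConfig:
    """Parse the JSON text; a missing key raises KeyError immediately."""
    section = json.loads(text)["VChan"]
    return VChanConfig(
        domain=int(section["Domain"]),
        xs_rx_path=section["XSRXPath"],
        xs_tx_path=section["XSTXPath"],
        iam_cert_storage=section["IAMCertStorage"],
        sm_cert_storage=section["SMCertStorage"],
    )


EXAMPLE_CONFIG = """
{
    "VChan": {
        "Domain": 0,
        "XSRXPath": "/local/domain/1/data/vchan_rx",
        "XSTXPath": "/local/domain/1/data/vchan_tx",
        "IAMCertStorage": "iam",
        "SMCertStorage": "sm"
    }
}
"""
```

Calling `load_vchan_config(EXAMPLE_CONFIG)` returns a `VChanConfig` whose fields mirror the JSON keys one to one.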

TCP Socket

Used when Nodes are separate physical boards connected over a network, or any non-Xen configuration.

| Property | Value |
| --- | --- |
| Build flag | WITH_VCHAN=OFF (default) |
| Addressing | TCP port (default: 30001) |
| Connection model | Server socket on Secondary Node; Main Node connects as client |
| Library | Poco::Net with reactor pattern |
| Use case | Networked boards, development environments |

The TCP socket transport runs a server socket on the Secondary Node. The Main Node initiates the TCP connection to the Secondary Node's listening port. A Poco SocketReactor handles connection acceptance asynchronously.

Connection direction: The Main Node connects to the Secondary Node (the Secondary Node listens). This may seem counterintuitive, but it allows the Secondary Node to be ready before the Main Node starts, simplifying startup ordering.
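
The connection direction can be tried out on a single machine with the short Python sketch below. It is not the Poco-based implementation: it uses the standard `socket` module, runs both roles in one process over the loopback interface, and lets the operating system pick a free port instead of the default 30001. The function names are hypothetical.

```python
import socket


def recv_exact(conn: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, since recv() may return fewer."""
    data = bytearray()
    while len(data) < size:
        chunk = conn.recv(size - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return bytes(data)


def demo_connection_direction() -> bytes:
    # Secondary Node role: open the listening socket first.
    # Port 0 asks the OS for any free port (the documented default is 30001).
    with socket.create_server(("127.0.0.1", 0)) as listener:
        listener.settimeout(5)
        port = listener.getsockname()[1]

        # Main Node role: connect as a client to the listening port.
        with socket.create_connection(("127.0.0.1", port), timeout=5) as main_side:
            secondary_side, _ = listener.accept()
            with secondary_side:
                secondary_side.settimeout(5)
                main_side.sendall(b"ping")
                return recv_exact(secondary_side, 4)
```

Because the listener exists before the client connects, the Secondary Node role never has to know when the Main Node role starts, which is the startup-ordering benefit described above.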

Transport Interface

Both backends implement the same TransportItf interface:

| Method | Purpose |
| --- | --- |
| Connect() | Establish the transport connection (blocks until peer connects) |
| Read(message) | Read raw bytes from the transport |
| Write(message) | Write raw bytes to the transport |
| Close() | Close the current connection (can reconnect) |
| Shutdown() | Permanently shut down the transport |

This abstraction makes the rest of the MP stack transport-agnostic — the channel multiplexing, message framing, and TLS wrapping work identically regardless of whether the underlying link is vchan or TCP.
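
To show what this kind of abstraction buys, here is a minimal Python sketch of an interface with the same five operations and a toy in-memory backend. It is an analogy only: the real TransportItf is not a Python class, the method signatures here (for example `read(size)`) are assumptions, and `LoopbackTransport` and `send_greeting` are hypothetical names.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Python stand-in for the five TransportItf operations."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def read(self, size: int) -> bytes: ...

    @abstractmethod
    def write(self, data: bytes) -> None: ...

    @abstractmethod
    def close(self) -> None: ...

    @abstractmethod
    def shutdown(self) -> None: ...


class LoopbackTransport(Transport):
    """Hypothetical backend: bytes written can be read back from memory."""

    def __init__(self) -> None:
        self._buffer = bytearray()
        self._connected = False
        self._shut_down = False

    def connect(self) -> None:
        if self._shut_down:
            raise RuntimeError("transport was shut down")
        self._connected = True

    def read(self, size: int) -> bytes:
        self._require_connection()
        data = bytes(self._buffer[:size])
        del self._buffer[:size]
        return data

    def write(self, data: bytes) -> None:
        self._require_connection()
        self._buffer += data

    def close(self) -> None:
        self._connected = False      # connect() may be called again

    def shutdown(self) -> None:
        self._connected = False
        self._shut_down = True       # connect() is refused from now on

    def _require_connection(self) -> None:
        if not self._connected:
            raise ConnectionError("transport is not connected")


def send_greeting(transport: Transport) -> bytes:
    """Upper-layer code depends only on the interface, not on the backend."""
    transport.connect()
    transport.write(b"hello")
    return transport.read(5)
```

`send_greeting` would work unchanged with any other `Transport` subclass, which mirrors how the layers above the transport stay independent of vchan or TCP.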

Channel Multiplexing

A single transport connection carries multiple logical channels simultaneously. Each channel is identified by a port number and carries a different type of traffic.

Protocol Header

Every message sent over the transport is prefixed with an AosProtocolHeader:

| Field | Type | Size | Purpose |
| --- | --- | --- | --- |
| mPort | uint32_t | 4 bytes | Identifies the destination logical channel |
| mDataSize | uint32_t | 4 bytes | Length of the payload in bytes |
| mCheckSum | uint8_t[32] | 32 bytes | SHA-256 checksum of the payload |

Within MP, the Communication Manager component (not to be confused with the CM service on the Main Node) reads the header, validates the checksum, looks up the channel by port number, and delivers the payload to that channel's receive buffer.

Maximum message size: 64 KB. Messages exceeding this limit are rejected.
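
The framing rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MP code: the byte order of the two integer fields is not given on this page, so little-endian is assumed, and the 64 KB limit is applied to the payload length. The function names are hypothetical.

```python
import hashlib
import struct
from typing import Tuple

# Header layout: mPort (4 bytes), mDataSize (4 bytes), mCheckSum (32 bytes).
# "<" means little-endian without padding; the byte order is an assumption.
HEADER = struct.Struct("<II32s")
MAX_MESSAGE_SIZE = 64 * 1024


def pack_message(port: int, payload: bytes) -> bytes:
    """Prefix the payload with port, size, and SHA-256 checksum."""
    if len(payload) > MAX_MESSAGE_SIZE:
        raise ValueError("payload exceeds the 64 KB limit")
    checksum = hashlib.sha256(payload).digest()
    return HEADER.pack(port, len(payload), checksum) + payload


def unpack_message(frame: bytes) -> Tuple[int, bytes]:
    """Validate a frame and return (port, payload)."""
    if len(frame) < HEADER.size:
        raise ValueError("frame is shorter than the header")
    port, size, checksum = HEADER.unpack_from(frame)
    if size > MAX_MESSAGE_SIZE:
        raise ValueError("declared size exceeds the 64 KB limit")
    payload = frame[HEADER.size:HEADER.size + size]
    if len(payload) != size:
        raise ValueError("payload is truncated")
    if hashlib.sha256(payload).digest() != checksum:
        raise ValueError("checksum mismatch")
    return port, payload
```

With this layout the header is always 40 bytes, so a 5-byte payload produces a 45-byte frame, and flipping any payload byte makes `unpack_message` raise `ValueError`.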

Logical Channels

The MP establishes the following logical channels over the single transport:

| Channel | Port Source | Security | Purpose |
| --- | --- | --- | --- |
| CM Open | CMConfig.mOpenPort | Unencrypted | CM communication during provisioning |
| CM Secure | CMConfig.mSecurePort | TLS-encrypted | CM communication in normal operation |
| IAM Public | IAMConfig.mOpenPort | Unencrypted | IAM public API (certificate requests, always available) |
| IAM Protected | IAMConfig.mSecurePort | TLS-encrypted | IAM provisioning and Node management operations |

In normal (non-provisioning) operation, the CM Secure and IAM Protected channels carry the primary traffic. The IAM Public channel remains available for certificate-related operations that must work before TLS is fully established.
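
The port-based routing between these channels might look like the Python sketch below. It only models the lookup step: each channel registers a port, and incoming payloads are appended to the receive buffer of the matching channel. `ChannelDemux` is a hypothetical name, and the port numbers in the usage note are placeholders, since the real values come from the configuration fields listed in the table.

```python
from collections import deque
from typing import Deque, Dict


class ChannelDemux:
    """Routes payloads to per-channel receive buffers keyed by port."""

    def __init__(self) -> None:
        self._buffers: Dict[int, Deque[bytes]] = {}
        self._names: Dict[int, str] = {}

    def register(self, name: str, port: int) -> None:
        """Create a receive buffer for a channel; ports must be unique."""
        if port in self._buffers:
            raise ValueError(f"port {port} is already in use")
        self._buffers[port] = deque()
        self._names[port] = name

    def dispatch(self, port: int, payload: bytes) -> str:
        """Deliver a payload and return the name of the receiving channel."""
        if port not in self._buffers:
            raise KeyError(f"no channel registered on port {port}")
        self._buffers[port].append(payload)
        return self._names[port]

    def receive(self, port: int) -> bytes:
        """Pop the oldest payload queued for a channel."""
        return self._buffers[port].popleft()
```

For example, after `register("CM Secure", 8081)` and `register("IAM Public", 8089)` (placeholder ports), `dispatch(8089, b"cert-request")` queues the payload on the IAM Public channel only, and `receive(8089)` returns it.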

Channel Security

Each logical channel can be either open (plaintext) or secure (TLS-encrypted):

  • Open channels — raw protobuf messages over the multiplexed transport. Used during provisioning when certificates are not yet available, and for the IAM public API.
  • Secure channels — wrap the underlying communication channel with TLS using OpenSSL. The SecureChannel class implements a custom BIO (Basic I/O) that reads/writes through the multiplexed channel rather than a raw socket. Authentication uses IAM-issued certificates.

When certificates are rotated (detected via IAM subscription notifications), the transport connection is closed and re-established with the new credentials. This triggers all channels to reconnect.
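
The idea of running TLS over something other than a socket can be demonstrated with Python's standard `ssl` module. The sketch below is an analogy, not the SecureChannel implementation: instead of a custom OpenSSL BIO it uses `ssl.MemoryBIO`, and it only performs the first handshake step, producing the bytes that a caller would then write to the multiplexed channel. The function name and the server name in the usage note are hypothetical, and no IAM-issued certificates are loaded.

```python
import ssl


def start_client_handshake(server_name: str) -> bytes:
    """Return the first TLS handshake bytes without touching a socket."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)

    incoming = ssl.MemoryBIO()   # bytes received from the channel go in here
    outgoing = ssl.MemoryBIO()   # bytes to send over the channel come out here
    tls = context.wrap_bio(incoming, outgoing, server_hostname=server_name)

    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected at this point: the peer has not replied yet, so the TLS
        # engine is waiting for data to be written into `incoming`.
        pass

    # These bytes would be framed and written to the logical channel.
    return outgoing.read()
```

Calling `start_client_handshake("main-node.example")` returns a TLS handshake record. A full implementation would keep shuttling bytes between the two buffers and the channel until the handshake completes, then use the TLS object's `read` and `write` for application data.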

SM Connection Across Nodes

The Service Manager on each Node connects to the Main Node's CM via gRPC using the servicemanager v5 proto (RegisterSM RPC). The connection path differs by Node type:

Main Node SM

On the Main Node, SM connects directly to CM's gRPC server (localhost). No MP involvement.

Main Node: SM ──gRPC──► CM (SMController)

Secondary Node SM

On a Secondary Node, SM connects to the local MP's CMClient, which relays messages through the inter-Node transport to the Main Node's CM:

Secondary Node: SM ──gRPC──► MP (CMClient) ──[transport]──► Main Node CM (SMController)

The CMClient module within MP:

  1. Establishes a gRPC connection to CM's SMService on the Main Node
  2. Registers the Secondary Node's SM with CM (the RegisterSM bidirectional stream)
  3. Relays incoming commands from CM (service deployments, configuration updates) to the local SM
  4. Forwards outgoing status, monitoring data, alerts, and logs from the local SM back to CM
  5. Caches outgoing messages when disconnected and sends them once the connection is restored

From CM's perspective, all SM connections look identical — it does not distinguish between a local SM and a remote SM connected via MP. The SM Controller manages all connected SMs uniformly.
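
Step 5 above (caching while disconnected) can be sketched as a small Python class. This is a simplified model with assumptions: the class name `CachingForwarder` is hypothetical, and the bounded cache that drops the oldest message when full is a design choice made for this example, since the page does not describe the cache limits of the real CMClient.

```python
from collections import deque
from typing import Callable, Deque


class CachingForwarder:
    """Forwards messages when connected, caches them otherwise."""

    def __init__(self, send: Callable[[bytes], None], max_cached: int = 100) -> None:
        self._send = send
        # Assumption: a bounded cache that discards the oldest entry when full.
        self._cache: Deque[bytes] = deque(maxlen=max_cached)
        self._connected = False

    def forward(self, message: bytes) -> None:
        """Send immediately if connected, otherwise keep for later."""
        if self._connected:
            self._send(message)
        else:
            self._cache.append(message)

    def on_connected(self) -> None:
        """Flush cached messages in their original order."""
        self._connected = True
        while self._cache:
            self._send(self._cache.popleft())

    def on_disconnected(self) -> None:
        self._connected = False
```

A caller passes any send function, for example `CachingForwarder(sent.append)` with a plain list `sent`, and messages forwarded before `on_connected()` arrive in order once the connection is restored.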

IAM Registration Across Nodes

Each Secondary Node's IAM registers with the Main Node's IAM through the RegisterNode bidirectional gRPC stream (defined in IAMPublicNodesService). This registration is carried over the MP's IAM channels.

Registration Flow

  1. Secondary Node's IAM establishes a RegisterNode stream to the Main Node's IAM
  2. The stream carries IAMOutgoingMessages (Secondary → Main) and IAMIncomingMessages (Main → Secondary)
  3. The Secondary Node sends its NodeInfo (node ID, type, capabilities, state) as the first outgoing message
  4. The Main Node's IAM tracks the Secondary Node's state and makes it visible to the cloud
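
The ordering rule in step 3 (NodeInfo goes first) can be modeled with plain Python iterators standing in for the gRPC stream. Everything here is a simplified assumption: `NodeInfo` carries only three of the fields mentioned above, `StatusUpdate` and `MainNodeRegistry` are hypothetical types, and no real gRPC or protobuf code is involved.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Union


@dataclass(frozen=True)
class NodeInfo:
    node_id: str
    node_type: str
    state: str


@dataclass(frozen=True)
class StatusUpdate:
    """Hypothetical follow-up message carrying a new node state."""
    state: str


Outgoing = Union[NodeInfo, StatusUpdate]


def secondary_outgoing(info: NodeInfo, updates: Iterable[StatusUpdate]) -> Iterator[Outgoing]:
    """Secondary side: NodeInfo is always the first outgoing message."""
    yield info
    yield from updates


class MainNodeRegistry:
    """Main side: tracks the last known state of each registered node."""

    def __init__(self) -> None:
        self.states: Dict[str, str] = {}

    def handle_stream(self, stream: Iterable[Outgoing]) -> str:
        messages = iter(stream)
        first = next(messages, None)
        if not isinstance(first, NodeInfo):
            raise ValueError("stream must start with NodeInfo")
        self.states[first.node_id] = first.state
        for message in messages:
            if isinstance(message, StatusUpdate):
                self.states[first.node_id] = message.state
        return first.node_id
```

Feeding `secondary_outgoing(...)` into `handle_stream` registers the node under its ID, and later updates overwrite the tracked state.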

Traffic Carried Over IAM Channels

IAM Public channel (open port):

  • Certificate requests (GetCert)
  • Certificate change subscriptions (SubscribeCertChanged)
  • Node info queries

IAM Protected channel (secure port):

  • Provisioning operations (StartProvisioning, FinishProvisioning, Deprovision)
  • Certificate management (CreateKey, ApplyCert)
  • Node control (PauseNode, ResumeNode)

The two IAM connections (IAMPublicConnection and IAMProtectedConnection) run as independent channels within the communication layer, each with its own port assignment and optional TLS wrapping.

Connection Lifecycle

Startup Sequence

  1. Transport initialization — MP initializes the transport backend (vchan or TCP socket)
  2. Communication Manager start — begins the connection loop (connect, read, reconnect on failure)
  3. CM Connection start — establishes open and secure CM channels, begins relaying messages
  4. IAM Public Connection start — establishes the IAM public channel (always available)
  5. IAM Protected Connection start — establishes the IAM protected channel (skipped in provisioning mode)

Reconnection Behavior

The Communication Manager runs a persistent connection loop:

  1. Attempt to connect the transport (blocks until peer is available)
  2. Enter the read loop — dispatch incoming messages to channels by port
  3. On transport error — close all channels, wait 3 seconds, retry from step 1
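
The three steps above can be sketched as a Python loop. The callbacks (`connect`, `read_and_dispatch`, `close_channels`, `should_stop`) are hypothetical hooks introduced for this example, and `ConnectionError` stands in for whatever error the real transport reports. The sleep function is injectable so the loop can be exercised without actually waiting.

```python
import time
from typing import Callable

RECONNECT_DELAY_SECONDS = 3


def run_connection_loop(
    connect: Callable[[], None],
    read_and_dispatch: Callable[[], None],
    close_channels: Callable[[], None],
    should_stop: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Connect, read until an error occurs, then wait and reconnect."""
    while not should_stop():
        try:
            connect()                      # step 1: blocks until peer is available
            while not should_stop():
                read_and_dispatch()        # step 2: route one message by port
        except ConnectionError:
            close_channels()               # step 3: drop all logical channels
            sleep(RECONNECT_DELAY_SECONDS)
```

Because every failure path ends in the same close, wait, and retry sequence, a transport error during either the connect or the read phase is handled identically.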

Individual connections (CM, IAM) also handle disconnection gracefully:

  • CMClient caches outgoing messages during disconnection
  • IAM connections retry with a 3-second delay between attempts
  • Certificate rotation triggers a full transport reconnect

Provisioning Mode

When MP starts with the --provisioning flag:

  • All connections use open (unencrypted) channels
  • The IAM Protected connection is not initialized
  • Certificate subscriptions are skipped
  • This allows the Node to operate before obtaining its initial certificates

Once provisioning completes and certificates are issued, MP is restarted in normal mode with full TLS security.
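
The effect of the flag on what gets started can be summarized in a short Python sketch. It reflects one reading of the rules on this page, not the actual MP startup code: `plan_connections` and the connection labels are hypothetical, and only the `--provisioning` flag name comes from the text above.

```python
import argparse
from typing import List


def plan_connections(argv: List[str]) -> List[str]:
    """Return the connections to start for the given command line."""
    parser = argparse.ArgumentParser(prog="mp-sketch")
    parser.add_argument("--provisioning", action="store_true")
    args = parser.parse_args(argv)

    if args.provisioning:
        # Open channels only; no IAM Protected connection and no
        # certificate subscriptions, since certificates do not exist yet.
        return ["cm:open", "iam-public:open"]

    # Normal mode: secure channels and certificate subscriptions are added.
    return [
        "cm:open",
        "cm:secure",
        "iam-public:open",
        "iam-protected:secure",
        "cert-subscription",
    ]
```

`plan_connections(["--provisioning"])` yields only the two open connections, while `plan_connections([])` yields the full normal-mode set.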

Message Flow Example

To illustrate how a service deployment reaches a Secondary Node:

  1. Cloud → CM: AosCloud sends a desired state update over WebSocket to the Main Node's CM
  2. CM → SM (via MP): CM determines the target Node and sends a RunInstances command through the SM Controller's gRPC stream
  3. MP relay: On the Secondary Node, MP's CMClient receives the command over the secure CM channel and forwards it to the local SM via gRPC
  4. SM execution: The local SM pulls the service image (via MP's file distribution), unpacks it, and starts the service
  5. Status return: SM reports instance status back through the same path (SM → MP CMClient → transport → CM → cloud)

The entire flow is transparent — CM treats all SMs uniformly, and the cloud sees a single Unit status regardless of which Node runs which service.

Build Configuration

The transport backend is selected at build time:

# Xen vchan transport (for hypervisor environments)
-DWITH_VCHAN=ON

# TCP socket transport (default, for networked boards)
-DWITH_VCHAN=OFF

A typical Secondary Node build includes:

-DWITH_SM=ON # Local service management
-DWITH_IAM=ON # Per-Node identity
-DWITH_MP=ON # Inter-Node communication
-DWITH_CM=OFF # No cloud connection (handled by Main Node)