Cloud Communication
Introduction
The communication module is the Communication Manager (CM) subcomponent responsible for maintaining the persistent connection between an
AosEdge Unit and AosCloud. It implements a WebSocket-based JSON protocol (version 7) that carries all bidirectional
traffic: desired-state delivery, status reporting, monitoring data, alerts, logs, certificate lifecycle events, and
provisioning operations.
This page documents the connection lifecycle, message envelope format, the complete message type catalog, the acknowledgment mechanism, and the reconnection strategy with exponential backoff.
Connection Lifecycle
The communication module establishes and maintains the cloud connection through a multi-step process:
- Service Discovery — CM sends an HTTPS POST to the configured service discovery URL to obtain the WebSocket endpoint and an authentication token
- WebSocket Establishment — CM opens a persistent WebSocket connection to the endpoint returned by service discovery, authenticating with the bearer token
- Bidirectional Messaging — JSON messages flow in both directions over the WebSocket using binary frames
- Reconnection — If the connection drops, CM reconnects using exponential backoff
Service Discovery
Before connecting to the cloud WebSocket, CM must discover the endpoint. The service discovery flow:
- CM obtains the service discovery URL from its configuration (or from the certificate's Subject Alternative Name via the crypto helper)
- CM sends an HTTPS POST request with a JSON body containing:
  - version — the protocol version (7)
  - systemId — the Unit's system identifier
  - supportedProtocols — list of supported transport protocols (currently ["wss"])
- The service discovery server responds with:
  - version — protocol version
  - systemId — echoed system ID
  - connectionInfo — list of WebSocket endpoint URLs to connect to
  - authToken — bearer token for WebSocket authentication
  - nextRequestDelay — suggested delay before the next discovery request
  - errorCode — one of NoError, Redirect, RepeatLater, or Error
The connection uses mTLS — CM presents its "online" certificate (obtained from IAM) for client authentication, and validates the server's certificate against the configured CA.
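The request body and the errorCode values above can be illustrated with a minimal C++17 sketch. BuildDiscoveryRequest, DiscoveryErrorCode, and ParseDiscoveryErrorCode are hypothetical names and do not belong to CM's API. The JSON is assembled by hand for brevity, on the assumption that systemId needs no escaping. This page does not describe how CM reacts to each errorCode, so the sketch does not attempt it.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Builds the service discovery POST body from the fields listed above.
// Hand-rolled JSON: assumes systemId and protocol names need no escaping.
std::string BuildDiscoveryRequest(const std::string& systemId,
                                  const std::vector<std::string>& protocols = {"wss"})
{
    std::ostringstream out;
    out << R"({"version":7,"systemId":")" << systemId << R"(","supportedProtocols":[)";

    for (std::size_t i = 0; i < protocols.size(); ++i) {
        out << (i > 0 ? "," : "") << '"' << protocols[i] << '"';
    }

    out << "]}";

    return out.str();
}

// The four errorCode values a discovery response may carry.
enum class DiscoveryErrorCode { NoError, Redirect, RepeatLater, Error };

// Maps the errorCode string to the enum; unknown values yield std::nullopt.
std::optional<DiscoveryErrorCode> ParseDiscoveryErrorCode(const std::string& code)
{
    if (code == "NoError") {
        return DiscoveryErrorCode::NoError;
    }
    if (code == "Redirect") {
        return DiscoveryErrorCode::Redirect;
    }
    if (code == "RepeatLater") {
        return DiscoveryErrorCode::RepeatLater;
    }
    if (code == "Error") {
        return DiscoveryErrorCode::Error;
    }

    return std::nullopt;
}
```

For example, BuildDiscoveryRequest("unit-1") produces {"version":7,"systemId":"unit-1","supportedProtocols":["wss"]}.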
WebSocket Connection
After successful service discovery, CM establishes the WebSocket connection:
- Creates an HTTPS session to the first URL in connectionInfo
- Sets the Authorization: Bearer <authToken> header
- Performs the WebSocket upgrade handshake
- Enables keep-alive on the socket
- Sets receive timeout to infinite (blocking read)
Once connected, CM notifies all registered connection listeners (other CM subcomponents that need to know about
connectivity state) via the OnConnect() callback.
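The endpoint selection and bearer-token header described above can be sketched as follows. DiscoveryResponse, UpgradeRequest, and PrepareUpgrade are hypothetical names. The actual HTTPS session, upgrade handshake, keep-alive, and receive-timeout settings depend on the networking library in use, so the sketch leaves them out.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical subset of the service discovery response needed to open the socket.
struct DiscoveryResponse {
    std::vector<std::string> connectionInfo; // WebSocket endpoint URLs
    std::string              authToken;      // bearer token
};

// Hypothetical description of the WebSocket upgrade request.
struct UpgradeRequest {
    std::string                        url;
    std::map<std::string, std::string> headers;
};

// Picks the first endpoint and attaches the bearer token.
// Returns std::nullopt when discovery returned no endpoints.
std::optional<UpgradeRequest> PrepareUpgrade(const DiscoveryResponse& discovery)
{
    if (discovery.connectionInfo.empty()) {
        return std::nullopt;
    }

    UpgradeRequest req;
    req.url                      = discovery.connectionInfo.front();
    req.headers["Authorization"] = "Bearer " + discovery.authToken;

    return req;
}
```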
Protocol Format
All messages use a consistent JSON envelope with two top-level fields:
```json
{
  "header": {
    "version": 7,
    "systemId": "<unit-system-id>",
    "createdAt": "<ISO-8601-timestamp>",
    "txn": "<uuid-transaction-id>"
  },
  "data": {
    "messageType": "<type-discriminator>",
    "correlationId": "<optional-correlation-id>",
    ...
  }
}
```
Header Fields
| Field | Type | Description |
|---|---|---|
| version | integer | Protocol version, always 7 |
| systemId | string | The Unit's unique system identifier |
| createdAt | string | ISO 8601 UTC timestamp of message creation |
| txn | string | UUID v4 transaction identifier, unique per message |
Data Fields
| Field | Type | Description |
|---|---|---|
| messageType | string | Discriminator identifying the payload type |
| correlationId | string (optional) | Links a response to its originating request |
| (additional fields) | varies | Payload-specific fields depending on messageType |
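The header fields in the table can be produced with a small C++17 sketch. NewUuidV4, NowIso8601Utc, and BuildHeaderJson are hypothetical helpers. The UUID comes from the non-cryptographic std::mt19937_64, and std::gmtime is not thread-safe, so a production implementation would probably make different choices.

```cpp
#include <array>
#include <cstdio>
#include <ctime>
#include <random>
#include <string>

// Random (version 4) UUID in 8-4-4-4-12 hex form.
// std::mt19937_64 is not cryptographically secure; this static generator is not thread-safe.
std::string NewUuidV4()
{
    static std::mt19937_64 rng{std::random_device{}()};

    std::array<unsigned char, 16> b{};
    for (auto& byte : b) {
        byte = static_cast<unsigned char>(rng() & 0xFF);
    }

    b[6] = static_cast<unsigned char>((b[6] & 0x0F) | 0x40); // version nibble = 4
    b[8] = static_cast<unsigned char>((b[8] & 0x3F) | 0x80); // RFC 4122 variant bits

    char buf[37];
    std::snprintf(buf, sizeof(buf),
        "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x",
        b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
        b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);

    return buf;
}

// Current UTC time in ISO 8601 with second precision, e.g. 2024-01-01T12:00:00Z.
// std::gmtime returns a shared buffer, so this is not thread-safe either.
std::string NowIso8601Utc()
{
    const std::time_t now = std::time(nullptr);
    char buf[32];
    std::strftime(buf, sizeof(buf), "%Y-%m-%dT%H:%M:%SZ", std::gmtime(&now));
    return buf;
}

// Serializes the "header" object: fixed version 7, a fresh timestamp, and a fresh txn.
std::string BuildHeaderJson(const std::string& systemId)
{
    return R"({"version":7,"systemId":")" + systemId + R"(","createdAt":")" + NowIso8601Utc()
        + R"(","txn":")" + NewUuidV4() + R"("})";
}
```

A fresh txn is generated on every call, which matches the "unique per message" requirement in the table.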
Messages are transmitted as WebSocket binary frames. CM uses 4 dedicated handler threads to process received messages concurrently.
Message Types
The protocol defines the following message types, organized by direction and functional category:
Unit → Cloud Messages
| messageType | Purpose |
|---|---|
| unitStatus | Reports current Unit state: node information, installed items, running instances, subjects |
| alerts | Forwards alert events (system alerts, quota alerts, instance alerts, download progress) |
| monitoringData | Sends resource monitoring data (CPU, RAM, disk usage per node and instance) |
| pushLog | Sends requested log data (system logs, instance logs, crash logs) |
| newState | Reports a service instance's new state data |
| stateRequest | Requests state data for a specific service instance |
| overrideEnvVarsStatus | Reports status of environment variable override operations |
| requestBlobUrls | Requests download URLs for image blobs (by OCI digest) |
| issueUnitCertificates | Sends CSRs for certificate issuance |
| installUnitCertificatesConfirmation | Confirms successful certificate installation |
| startProvisioningResponse | Responds to a provisioning start request |
| finishProvisioningResponse | Responds to a provisioning finish request |
| deprovisioningResponse | Responds to a deprovisioning request |
| ack | Acknowledges successful receipt of a cloud message |
| nack | Indicates failure to process a cloud message (includes retryAfter in milliseconds) |
Cloud → Unit Messages
| messageType | Purpose |
|---|---|
| desiredStatus | Delivers the desired state: items to deploy, instances to run, node states, unit config, certificates |
| blobUrls | Provides download URLs and decryption info for requested image blobs |
| requestLog | Requests log data from the Unit (system, instance, or crash logs) |
| stateAcceptance | Accepts or rejects a service instance's new state |
| updateState | Notifies the Unit of an update state change |
| overrideEnvVars | Requests environment variable overrides for service instances |
| renewCertificatesNotification | Notifies that certificates need renewal |
| issuedUnitCertificates | Delivers newly issued certificates |
| startProvisioningRequest | Initiates the provisioning workflow |
| finishProvisioningRequest | Completes the provisioning workflow |
| deprovisioningRequest | Initiates deprovisioning |
| ack | Acknowledges successful receipt of a Unit message |
| nack | Indicates failure to process a Unit message |
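Every message carries data.messageType as its discriminator, so received messages can be routed through a lookup table. The MessageDispatcher class below is a hypothetical sketch. It omits JSON parsing, and it does not model what CM does with an unrecognized messageType, because this page does not describe that behavior.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical dispatcher: routes a received payload to the handler
// registered for its messageType.
class MessageDispatcher {
public:
    using Handler = std::function<void(const std::string& payload)>;

    void Register(const std::string& messageType, Handler handler)
    {
        mHandlers[messageType] = std::move(handler);
    }

    // Returns false when no handler is registered for messageType.
    bool Dispatch(const std::string& messageType, const std::string& payload) const
    {
        const auto it = mHandlers.find(messageType);
        if (it == mHandlers.end()) {
            return false;
        }

        it->second(payload);

        return true;
    }

private:
    std::unordered_map<std::string, Handler> mHandlers;
};
```

In a pool like the four message handler threads, each worker could call Dispatch() after parsing a message. Handlers that touch shared state would then need their own synchronization.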
Delta Status Reporting
The unitStatus message supports delta reporting via the isDeltaInfo flag. When true, only changed fields are
included rather than the full Unit state. This reduces message size for periodic status updates.
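The idea behind isDeltaInfo can be illustrated by comparing the previously reported state with the current one and keeping only the entries that changed. This sketch assumes a flat map of field names to serialized values, which simplifies the nested unitStatus payload. ComputeDelta is a hypothetical name. The sketch ignores removed entries because this page does not describe how they are encoded in a delta.

```cpp
#include <map>
#include <string>

// Returns the entries of `current` that are new or differ from `previous`.
// Entries present only in `previous` (removals) are ignored in this sketch.
std::map<std::string, std::string> ComputeDelta(const std::map<std::string, std::string>& previous,
                                                const std::map<std::string, std::string>& current)
{
    std::map<std::string, std::string> delta;

    for (const auto& [field, value] : current) {
        const auto it = previous.find(field);
        if (it == previous.end() || it->second != value) {
            delta.emplace(field, value);
        }
    }

    return delta;
}
```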
Acknowledgment Mechanism
Messages that require delivery confirmation use an ACK/NACK mechanism:
- Sender transmits a message with a unique txn and, optionally, a correlationId
- Receiver processes the message and responds with either:
  - ack — message processed successfully (includes correlationId linking to the original)
  - nack — message processing failed (includes retryAfter suggesting when to retry, default 500 ms)
- Sender tracks unacknowledged messages and retries if no response arrives within the configured timeout
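On the receiving side, the choice between ack and nack can be sketched as follows. AckReply and MakeReply are hypothetical names. The 500 ms retryAfter default comes from the text above; carrying correlationId in a nack as well is an assumption.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical representation of the data part of an ack/nack reply.
struct AckReply {
    std::string                  messageType;   // "ack" or "nack"
    std::string                  correlationId; // links the reply to the original message
    std::optional<std::uint32_t> retryAfterMs;  // set only for nack
};

// Builds the reply for a received message; retryAfterMs defaults to 500 ms.
AckReply MakeReply(const std::string& correlationId, bool processed, std::uint32_t retryAfterMs = 500)
{
    if (processed) {
        return {"ack", correlationId, std::nullopt};
    }

    return {"nack", correlationId, retryAfterMs};
}
```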
Send Policies
CM uses two send policies for outgoing messages:
| Policy | Behavior |
|---|---|
| ExpectAck | Message is tracked after sending; if no ACK arrives within the response wait timeout, it is re-enqueued for retry (up to 3 attempts) |
| SendOnly | Message is sent without tracking; no retry on missing acknowledgment |
Important messages (alerts, logs, env var statuses, certificate operations) use ExpectAck. Non-critical messages
(monitoring data, state updates, unit status) use SendOnly.
Retry Logic
When a message expecting acknowledgment does not receive a response within the configured cloudResponseWaitTimeout:
- The message is re-enqueued to the send queue with an incremented retry counter
- Maximum retry attempts: 3 per message
- If all retries are exhausted, the message is dropped and the failure is logged
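The ExpectAck/SendOnly bookkeeping and the three-retry limit can be modeled with a single-threaded tracker. All names here are hypothetical. The sketch counts retries as re-sends after the first transmission; the text does not say whether the initial send counts as an attempt, so this is an assumption. In CM the equivalent work is split between the send queue handler and the unacknowledged message handler threads.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>

enum class SendPolicy { ExpectAck, SendOnly };

// Hypothetical outgoing message record.
struct OutgoingMessage {
    std::string txn;
    std::string body;
    SendPolicy  policy  = SendPolicy::SendOnly;
    int         retries = 0; // re-sends performed so far
};

class AckTracker {
public:
    using Clock = std::chrono::steady_clock;

    static constexpr int cMaxRetries = 3;

    explicit AckTracker(Clock::duration timeout)
        : mTimeout(timeout)
    {
    }

    // Call after the message was written to the socket; SendOnly messages are not tracked.
    void OnSent(const OutgoingMessage& msg, Clock::time_point now)
    {
        if (msg.policy == SendPolicy::ExpectAck) {
            mPending[msg.txn] = Pending {msg, now};
        }
    }

    // Call when an ack for txn arrives; returns true if the message was being tracked.
    bool OnAck(const std::string& txn) { return mPending.erase(txn) > 0; }

    // Re-enqueues timed-out messages with an incremented retry counter,
    // drops those that already used cMaxRetries, and returns the number dropped.
    std::size_t CheckTimeouts(Clock::time_point now, std::deque<OutgoingMessage>& sendQueue)
    {
        std::size_t dropped = 0;

        for (auto it = mPending.begin(); it != mPending.end();) {
            if (now - it->second.sentAt < mTimeout) {
                ++it;
                continue;
            }

            OutgoingMessage msg = it->second.msg;
            it                  = mPending.erase(it);

            if (msg.retries < cMaxRetries) {
                ++msg.retries;
                sendQueue.push_back(msg);
            } else {
                ++dropped; // a real implementation would log the failure here
            }
        }

        return dropped;
    }

private:
    struct Pending {
        OutgoingMessage   msg;
        Clock::time_point sentAt;
    };

    Clock::duration                          mTimeout;
    std::unordered_map<std::string, Pending> mPending;
};
```

Here the timeout passed to the constructor stands in for cloudResponseWaitTimeout.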
Reconnection Strategy
The communication module uses exponential backoff for reconnection:
| Parameter | Value |
|---|---|
| Initial reconnect delay | 1 second |
| Maximum reconnect delay | 10 minutes |
| Reconnect attempts per cycle | 5 |
| Backoff multiplier | 2× (exponential) |
Reconnection Flow
- The connection handler thread runs a continuous loop while CM is active
- On connection failure, the Retry utility is invoked with exponential backoff:
  - First retry after 1 second
  - Subsequent retries double the delay (2s, 4s, 8s, ...) up to the 10-minute cap
  - After 5 failed attempts at the current delay, the delay increases
- If the WebSocket receives a close frame or an error occurs during frame reception, CM disconnects and re-enters the connection loop
- If authorization fails (HTTP 401 during WebSocket upgrade), the cached service discovery response is cleared, forcing a fresh discovery on the next attempt
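The delay progression described above can be computed with a small helper. ReconnectDelay is a hypothetical name. It models only the doubling from 1 second and the 10-minute cap; it does not model the per-cycle limit of 5 attempts.

```cpp
#include <algorithm>
#include <chrono>

// Delay before reconnect attempt `attempt` (0-based): doubles from `initial`
// and is capped at `maxDelay`. The loop stops early to avoid overflow.
std::chrono::milliseconds ReconnectDelay(unsigned attempt,
                                         std::chrono::milliseconds initial = std::chrono::seconds(1),
                                         std::chrono::milliseconds maxDelay = std::chrono::minutes(10))
{
    auto delay = initial;

    for (unsigned i = 0; i < attempt && delay < maxDelay; ++i) {
        delay *= 2;
    }

    return std::min(delay, maxDelay);
}
```

With these defaults, attempts 0 through 9 wait 1, 2, 4, ..., 512 seconds, and every later attempt waits 10 minutes.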
Connection State Notifications
CM maintains a subscriber list for connection state changes. Other CM subcomponents (update manager, state handler) subscribe to receive:
- OnConnect() — called when the WebSocket connection is successfully established
- OnDisconnect() — called when the connection is lost
This allows dependent modules to pause operations during disconnection and resume when connectivity is restored.
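A subscriber list of this kind can be sketched with an abstract listener interface. ConnectionListener and ConnectionNotifier are hypothetical names; only the OnConnect()/OnDisconnect() callbacks come from the text. The notifier copies the list under its mutex and invokes callbacks without holding the lock. A listener can therefore unsubscribe from inside a callback without deadlocking.

```cpp
#include <algorithm>
#include <mutex>
#include <vector>

// Interface for components that need connectivity updates.
class ConnectionListener {
public:
    virtual ~ConnectionListener()  = default;
    virtual void OnConnect()       = 0;
    virtual void OnDisconnect()    = 0;
};

class ConnectionNotifier {
public:
    void Subscribe(ConnectionListener* listener)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        mListeners.push_back(listener);
    }

    void Unsubscribe(ConnectionListener* listener)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        mListeners.erase(std::remove(mListeners.begin(), mListeners.end(), listener), mListeners.end());
    }

    void NotifyConnected()
    {
        for (auto* listener : Snapshot()) {
            listener->OnConnect();
        }
    }

    void NotifyDisconnected()
    {
        for (auto* listener : Snapshot()) {
            listener->OnDisconnect();
        }
    }

private:
    // Copies the list under the lock so callbacks run without holding the mutex.
    std::vector<ConnectionListener*> Snapshot()
    {
        std::lock_guard<std::mutex> lock(mMutex);
        return mListeners;
    }

    std::mutex                       mMutex;
    std::vector<ConnectionListener*> mListeners;
};
```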
Threading Model
The communication module operates with the following dedicated threads:
| Thread | Responsibility |
|---|---|
| Connection handler | Manages the connection lifecycle (discovery → connect → receive loop → reconnect) |
| Send queue handler | Dequeues messages and transmits them over the WebSocket |
| Unacknowledged message handler | Monitors sent messages awaiting ACK; re-enqueues on timeout |
| Message handler pool (×4) | Processes received messages concurrently (parses JSON, dispatches to handlers) |
All threads coordinate through a shared mutex and condition variable. The send queue and receive queue are protected by
the mutex, and threads wake via notify_all() when new work is available.
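The mutex and condition-variable pattern described above can be sketched as a blocking queue. MessageQueue is a hypothetical class holding serialized messages. Close() is an added assumption that lets consumer threads exit cleanly.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

class MessageQueue {
public:
    // Producer side: enqueue under the mutex, then wake all waiting threads.
    void Push(std::string msg)
    {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mQueue.push_back(std::move(msg));
        }
        mCondVar.notify_all();
    }

    // Consumer side: blocks until a message is available or Close() is called.
    // Returns std::nullopt only once the queue is closed and drained.
    std::optional<std::string> Pop()
    {
        std::unique_lock<std::mutex> lock(mMutex);
        mCondVar.wait(lock, [this] { return mClosed || !mQueue.empty(); });

        if (mQueue.empty()) {
            return std::nullopt;
        }

        std::string msg = std::move(mQueue.front());
        mQueue.pop_front();

        return msg;
    }

    void Close()
    {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mClosed = true;
        }
        mCondVar.notify_all();
    }

private:
    std::mutex              mMutex;
    std::condition_variable mCondVar;
    std::deque<std::string> mQueue;
    bool                    mClosed = false;
};
```

A consumer thread, such as a send queue handler, can loop on Pop() and transmit each message until Pop() returns std::nullopt after Close().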
Security
The cloud connection is secured through multiple layers:
- mTLS Authentication: CM presents its "online" certificate for client authentication; the server's certificate is validated against the configured CA certificate
- Certificate Rotation: When IAM notifies CM of a certificate change, CM disconnects and reconnects with the new credentials
- Bearer Token: The service discovery auth token is included in the WebSocket upgrade request for session-level authentication
- Message Integrity: All messages include the systemId in the header; CM validates that the systemId of each received message matches its own system ID
Message Logging
CM optionally logs all sent and received messages to a file for debugging purposes. When the cloudMessageLog
configuration path is set, every transmitted (TX) and received (RX) message is written to the log file with a
direction indicator. This is useful for protocol-level debugging but should be disabled in production due to the volume
of data.
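Logging that is enabled only when cloudMessageLog is set can be sketched as follows. CloudMessageLog, FormatLogLine, and the "TX "/"RX " prefix format are hypothetical. This page says only that each entry carries a direction indicator.

```cpp
#include <fstream>
#include <string>

enum class Direction { TX, RX };

// Hypothetical line format: direction prefix followed by the raw message.
std::string FormatLogLine(Direction dir, const std::string& message)
{
    return std::string(dir == Direction::TX ? "TX " : "RX ") + message;
}

// Opens the log file only when a cloudMessageLog path is configured;
// otherwise Write() is a no-op.
class CloudMessageLog {
public:
    explicit CloudMessageLog(const std::string& path)
    {
        if (!path.empty()) {
            mFile.open(path, std::ios::app);
        }
    }

    bool Enabled() const { return mFile.is_open(); }

    void Write(Direction dir, const std::string& message)
    {
        if (Enabled()) {
            mFile << FormatLogLine(dir, message) << '\n';
        }
    }

private:
    std::ofstream mFile;
};
```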
Related Pages
- Communication Manager — overview of CM and all its subcomponents
- Update Manager — how CM processes desired-state updates received via this protocol
- Unit Configuration — unit config delivery via the desiredStatus message
- Cloud Protocol Reference — comprehensive protocol reference with full message schemas
- Connection Management — detailed connection lifecycle and failure handling
- Certificate Handler — certificate lifecycle that triggers reconnection
- Architecture Overview — high-level view of all AosCore components