Client Communication (smclient)
Introduction
The smclient module is the Service Manager's communication gateway to the Communication Manager (CM). It implements a
gRPC client that connects to CM's smcontroller service using the servicemanager/v5 protocol. Through this
bidirectional streaming connection, SM registers itself, receives operational commands, and reports telemetry back to
CM.
This page documents the connection lifecycle, the registration flow, the command/response message protocol, and the
interfaces that smclient aggregates from other SM submodules.
Architecture
The smclient operates as a gRPC client connecting to CM's smcontroller gRPC server. The connection uses a
bidirectional streaming RPC (RegisterSM), allowing both sides to send messages asynchronously over a single persistent
stream.
┌─────────────────────────────────────────────────────────────┐
│                    Service Manager (SM)                     │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                      smclient                       │    │
│  │                                                     │    │
│  │  ┌───────────┐  ┌──────────┐  ┌────────────────┐    │    │
│  │  │ Connection│  │ Message  │  │ Interface      │    │    │
│  │  │ Loop      │  │ Handler  │  │ Aggregator     │    │    │
│  │  └─────┬─────┘  └────┬─────┘  └───────┬────────┘    │    │
│  │        │             │                │             │    │
│  └────────┼─────────────┼────────────────┼─────────────┘    │
│           │             │                │                  │
│           │        ┌────┴────┐     ┌─────┴──────┐           │
│           │        │launcher │     │ monitoring │           │
│           │        │logprov. │     │ alerts     │           │
│           │        │netmgr   │     │imagemanager│           │
│           │        └─────────┘     └────────────┘           │
└───────────┼─────────────────────────────────────────────────┘
            │ gRPC (mTLS)
            ▼
┌───────────────────────┐
│   CM (smcontroller)   │
│   SMService server    │
└───────────────────────┘
Connection Lifecycle
Connection Loop
The smclient runs a dedicated connection thread that maintains a persistent connection to CM. The loop follows a
connect-register-process-reconnect pattern:
1. Connect: Create a gRPC channel to CM's server URL using TLS credentials
2. Register: Open the RegisterSM bidirectional stream
3. Send SMInfo: Immediately send the Node's identity, runtime capabilities, and available resources
4. Send Instance Status: Report current state of all running instances
5. Process Messages: Enter a blocking read loop, handling incoming commands from CM
6. Reconnect: If the stream closes (network failure, CM restart, certificate rotation), wait for a configurable timeout and retry from step 1
┌─────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│ Connect │────▶│ Register │────▶│ Send     │────▶│ Process  │
│         │     │ SM       │     │ SMInfo + │     │ Incoming │
│         │     │          │     │ Status   │     │ Messages │
└─────────┘     └──────────┘     └──────────┘     └────┬─────┘
     ▲                                                 │
     │              ┌──────────────┐                   │
     └──────────────│ Wait timeout │◀──────────────────┘
                    │ (reconnect)  │  (stream closed)
                    └──────────────┘
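The connect-register-process-reconnect cycle above can be sketched as follows. This is a minimal illustration, not the actual ConnectionLoop() implementation: FakeStream, RunConnectionLoop, and the bounded maxAttempts parameter are hypothetical stand-ins so that the example runs without gRPC, and messages are represented as plain strings.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the bidirectional gRPC stream (not the real smclient API).
struct FakeStream {
    std::vector<std::string> sent;      // messages written by SM
    std::vector<std::string> incoming;  // commands queued by the simulated CM
    std::size_t              next = 0;

    void Write(const std::string& msg) { sent.push_back(msg); }

    // Returns false once the stream is closed (here: no more queued commands).
    bool Read(std::string& msg)
    {
        if (next >= incoming.size()) {
            return false;
        }

        msg = incoming[next++];

        return true;
    }
};

// Runs up to maxAttempts connection cycles and returns the number of handled commands.
// connect returns nullptr when CM is unreachable. The real loop would run until stopped.
inline int RunConnectionLoop(const std::function<FakeStream*()>& connect, int maxAttempts,
    std::chrono::milliseconds reconnectTimeout)
{
    assert(connect != nullptr);
    assert(maxAttempts >= 0);

    int handled = 0;

    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (FakeStream* stream = connect()) {     // steps 1-2: connect and open the stream
            stream->Write("SMInfo");               // step 3: identity and capabilities
            stream->Write("NodeInstancesStatus");  // step 4: current instance state

            std::string msg;

            while (stream->Read(msg)) {            // step 5: read until the stream closes
                ++handled;
            }
        }

        std::this_thread::sleep_for(reconnectTimeout);  // step 6: fixed wait, then retry
    }

    return handled;
}
```

The wait between attempts is a fixed interval, which matches the single reconnect timeout described in the configuration; any backoff policy beyond that would be an addition to this sketch.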
Configuration
The connection is configured through the smclient::Config structure:
| Parameter | Description |
|---|---|
| mCMServerURL | gRPC endpoint of CM's smcontroller service (e.g., localhost:8093) |
| mCertStorage | Certificate storage identifier used to obtain mTLS credentials from IAM |
| mCMReconnectTimeout | Duration to wait before reconnecting after a connection loss |
TLS Security
The connection to CM is secured with mutual TLS (mTLS):
- Secure mode (default): SM obtains client certificates from the local IAM instance via the configured mCertStorage. When certificates are renewed, the smclient receives an OnCertChanged callback and gracefully reconnects with the new credentials.
- Insecure mode (development only): Uses server-side TLS without client certificate authentication.
Registration Flow
When the bidirectional stream is established, SM immediately sends two messages to identify itself and synchronize state:
1. SMInfo Message
The first message on the stream is SMInfo, which tells CM about this Node's capabilities:
    message SMInfo {
        string node_id = 1;                   // Unique Node identifier
        repeated RuntimeInfo runtimes = 2;    // Available execution runtimes
        repeated ResourceInfo resources = 3;  // Host resources (devices, shared resources)
    }
RuntimeInfo describes each available runtime (container, boot, rootfs):
| Field | Description |
|---|---|
| runtime_id | Unique runtime identifier |
| type | Runtime type string (e.g., "container", "boot", "rootfs") |
| max_dmips | Maximum compute capacity in DMIPS |
| allowed_dmips | DMIPS available for service allocation |
| total_ram | Total RAM available to the runtime |
| allowed_ram | RAM available for service allocation |
| max_instances | Maximum concurrent instances supported |
| os_info | Operating system information |
| arch_info | CPU architecture information |
ResourceInfo describes shared host resources:
| Field | Description |
|---|---|
| name | Resource name (e.g., a device path or shared resource identifier) |
| shared_count | Number of instances that can share this resource simultaneously |
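To make the shape of the registration payload concrete, the sketch below mirrors SMInfo, RuntimeInfo, and ResourceInfo as plain C++ structs. The field types are assumptions (the tables above list names only), os_info and arch_info are omitted for brevity, and FindRuntime is a hypothetical helper rather than part of smclient.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative mirrors of the protobuf messages; field types are assumed.
struct RuntimeInfo {
    std::string   runtimeID;
    std::string   type;  // e.g. "container", "boot", "rootfs"
    std::uint64_t maxDMIPS     = 0;
    std::uint64_t allowedDMIPS = 0;
    std::uint64_t totalRAM     = 0;
    std::uint64_t allowedRAM   = 0;
    std::uint64_t maxInstances = 0;
};

struct ResourceInfo {
    std::string   name;
    std::uint64_t sharedCount = 0;
};

struct SMInfo {
    std::string               nodeID;
    std::vector<RuntimeInfo>  runtimes;
    std::vector<ResourceInfo> resources;
};

// Hypothetical helper: returns the first runtime of the given type,
// or nullptr if the Node does not advertise one.
inline const RuntimeInfo* FindRuntime(const SMInfo& info, const std::string& type)
{
    for (const auto& runtime : info.runtimes) {
        if (runtime.type == type) {
            return &runtime;
        }
    }

    return nullptr;
}
```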
2. NodeInstancesStatus Message
Immediately after SMInfo, SM sends the current status of all service instances on this Node. This allows CM to
reconcile its view of the world with the actual state after a reconnection.
Incoming Messages (CM → SM)
After registration, SM enters a message processing loop. CM sends commands through the stream as SMIncomingMessages:
| Message | Purpose | SM Handler |
|---|---|---|
| GetNodeConfigStatus | Request current Node configuration state | Returns NodeConfigStatus with version and any errors |
| CheckNodeConfig | Validate a proposed Node configuration | Parses JSON config, runs validation, returns status |
| SetNodeConfig | Apply a new Node configuration | Parses and applies config, returns result status |
| UpdateInstances | Start and/or stop service instances | Delegates to launcher (the primary deployment command) |
| UpdateNetworks | Configure network parameters for services | Delegates to network manager |
| SystemLogRequest | Retrieve system-level logs | Delegates to log provider with time range filter |
| InstanceLogRequest | Retrieve logs for specific service instances | Delegates to log provider with instance filter |
| InstanceCrashLogRequest | Retrieve crash logs for service instances | Delegates to log provider for crash-specific logs |
| GetAverageMonitoring | Request time-averaged monitoring data | Retrieves from monitoring module and sends response |
| ConnectionStatus | Notify SM of cloud connection state changes | Updates internal state, notifies registered listeners |
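The routing in the table above amounts to a dispatch on the message variant. The following sketch models four of the message types with std::variant and routes each to the name of the submodule listed in the table. The empty structs, RouteOne, and Route are hypothetical; the real client works on the generated protobuf type, whose accessors are not shown here.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-ins for four of the SMIncomingMessages variants.
struct UpdateInstances { };
struct UpdateNetworks { };
struct SystemLogRequest { };
struct GetAverageMonitoring { };

using IncomingMessage = std::variant<UpdateInstances, UpdateNetworks, SystemLogRequest, GetAverageMonitoring>;

// One overload per message type; each returns the submodule that handles it.
inline std::string RouteOne(const UpdateInstances&) { return "launcher"; }
inline std::string RouteOne(const UpdateNetworks&) { return "networkmanager"; }
inline std::string RouteOne(const SystemLogRequest&) { return "logprovider"; }
inline std::string RouteOne(const GetAverageMonitoring&) { return "monitoring"; }

// Selects the overload that matches the alternative currently held by msg.
inline std::string Route(const IncomingMessage& msg)
{
    return std::visit([](const auto& m) { return RouteOne(m); }, msg);
}
```

With this layout, adding a message type to the variant without a matching RouteOne overload fails at compile time, which is one reason to prefer it over a switch with a default branch.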
UpdateInstances — The Core Deployment Command
The most important incoming message is UpdateInstances, which drives the service lifecycle:
    message UpdateInstances {
        repeated InstanceInfo start_instances = 1;
        repeated InstanceIdent stop_instances = 2;
    }
Each InstanceInfo in start_instances contains the full specification for a service instance:
| Field | Description |
|---|---|
| instance | Service/subject/instance identity triple |
| version | Service version string |
| manifest_digest | OCI manifest digest for image verification |
| owner_id | Owning entity identifier |
| runtime_id | Target runtime for execution |
| uid / gid | Unix user/group IDs for the instance process |
| priority | Scheduling priority |
| storage_path / state_path | Persistent storage locations |
| env_vars | Environment variables to inject |
| network_parameters | Network configuration (subnet, IP, DNS, firewall rules) |
| monitoring_parameters | Alert rules (CPU/RAM thresholds, traffic limits) |
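As a rough model of what UpdateInstances does to the set of running instances, the sketch below applies the stop list and then the start list. The struct layouts are reduced to a few fields, the field types are assumed, and the stop-before-start ordering is an assumption of this example rather than documented launcher behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Service/subject/instance identity triple (field types assumed).
struct InstanceIdent {
    std::string   serviceID;
    std::string   subjectID;
    std::uint64_t instance = 0;

    bool operator<(const InstanceIdent& other) const
    {
        return std::tie(serviceID, subjectID, instance)
            < std::tie(other.serviceID, other.subjectID, other.instance);
    }
};

// Reduced InstanceInfo: only a few of the fields from the table above.
struct InstanceInfo {
    InstanceIdent ident;
    std::string   version;
    std::string   runtimeID;
};

struct UpdateInstances {
    std::vector<InstanceInfo>  startInstances;
    std::vector<InstanceIdent> stopInstances;
};

// Applies the command to the running set. Stopping first is an assumption of this sketch.
inline void Apply(const UpdateInstances& cmd, std::set<InstanceIdent>& running)
{
    for (const auto& ident : cmd.stopInstances) {
        running.erase(ident);
    }

    for (const auto& info : cmd.startInstances) {
        running.insert(info.ident);
    }
}
```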
Outgoing Messages (SM → CM)
SM sends status updates, telemetry, and responses back to CM through the same bidirectional stream as
SMOutgoingMessages:
| Message | Purpose | Trigger |
|---|---|---|
| SMInfo | Node registration and capabilities | Sent once on connection establishment |
| NodeConfigStatus | Node configuration state response | In response to config commands |
| UpdateInstancesStatus | Result of an UpdateInstances command | After processing start/stop operations |
| NodeInstancesStatus | Current state of all instances | On connection (initial sync) and on state changes |
| InstantMonitoring | Real-time resource usage snapshot | Periodically from monitoring module |
| AverageMonitoring | Time-averaged resource metrics | In response to GetAverageMonitoring |
| Alert | Threshold violation or system event | When alert conditions are detected |
| LogData | Requested log content (chunked) | In response to log requests |
Alert Types
The Alert message carries a timestamp and one of several alert variants:
| Alert Type | Description |
|---|---|
| SystemQuotaAlert | Node-level resource threshold exceeded (CPU, RAM, disk) |
| InstanceQuotaAlert | Service instance exceeded its resource allocation |
| ResourceAllocateAlert | Failed to allocate a resource to an instance |
| SystemAlert | Critical system-level event |
| CoreAlert | AosCore component error (SM, CM, IAM) |
| InstanceAlert | Service-specific error or event |
Interface Aggregation
The smclient module implements multiple interfaces, acting as the single outbound communication channel for several SM
subsystems:
| Interface | Purpose | Methods |
|---|---|---|
| alerts::SenderItf | Forward alerts to CM | SendAlert() |
| monitoring::SenderItf | Forward monitoring data to CM | SendMonitoringData() |
| logging::SenderItf | Forward log data to CM | SendLog() |
| launcher::SenderItf | Forward instance status changes to CM | SendNodeInstancesStatuses(), SendUpdateInstancesStatuses() |
| imagemanager::BlobInfoProviderItf | Resolve blob digests to download URLs via CM | GetBlobsInfo() |
| cloudconnection::CloudConnectionItf | Expose cloud connection state to other modules | SubscribeListener(), UnsubscribeListener(), IsConnected() |
This aggregation pattern means other SM modules (monitoring, alerts, launcher, image manager) don't need to know about
gRPC or the communication protocol — they simply call their respective sender interface, and smclient handles
serialization and transport.
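A reduced model of this aggregation is shown below with two of the sender interfaces. The interface and method names follow the table above, but the signatures are simplified assumptions (plain strings and a bool result), and the Client class with its in-memory outbox is a hypothetical stand-in for the stream writer.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace alerts {
struct SenderItf {
    virtual ~SenderItf() = default;
    virtual bool SendAlert(const std::string& alert) = 0;  // simplified signature
};
}  // namespace alerts

namespace monitoring {
struct SenderItf {
    virtual ~SenderItf() = default;
    virtual bool SendMonitoringData(const std::string& data) = 0;  // simplified signature
};
}  // namespace monitoring

// Hypothetical client: one object implements both sender interfaces and
// funnels every call into a single outgoing channel.
class Client : public alerts::SenderItf, public monitoring::SenderItf {
public:
    bool SendAlert(const std::string& alert) override { return Write("Alert:" + alert); }

    bool SendMonitoringData(const std::string& data) override { return Write("InstantMonitoring:" + data); }

    const std::vector<std::string>& Outbox() const { return mOutbox; }

private:
    // Stand-in for serializing the message and writing it to the stream.
    bool Write(const std::string& msg)
    {
        mOutbox.push_back(msg);

        return true;
    }

    std::vector<std::string> mOutbox;
};
```

A module such as monitoring would hold only a monitoring::SenderItf reference, so it compiles and can be tested without any dependency on the transport.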
GetBlobsInfo — Unary RPC
In addition to the bidirectional stream, smclient uses a separate unary RPC for blob URL resolution:
rpc GetBlobsInfos(BlobsInfosRequest) returns (BlobsInfos) {}
This is used by the image manager during image download to resolve content-addressable blob digests into downloadable URLs. Unlike the streaming messages, this is a simple request-response call.
Cloud Connection Status
CM forwards cloud connection state changes to SM via the ConnectionStatus message. SM maintains this state and exposes
it through the CloudConnectionItf interface:
- CONNECTED — The Unit has an active connection to AosCloud (via CM)
- DISCONNECTED — The Unit has lost cloud connectivity
Other SM modules (e.g., monitoring) can subscribe to connection state changes to adjust their behavior (for example, buffering data during disconnection).
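The subscription mechanism can be sketched as a small observer. SubscribeListener, UnsubscribeListener, and IsConnected are the method names listed for CloudConnectionItf; their signatures here, the ConnectionListenerItf interface, OnConnectionStatus, and BufferingMonitor are assumptions made for illustration. The sketch is single-threaded and omits the locking a real implementation would need.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical listener interface notified on state changes.
struct ConnectionListenerItf {
    virtual ~ConnectionListenerItf() = default;
    virtual void OnConnect()    = 0;
    virtual void OnDisconnect() = 0;
};

class CloudConnection {
public:
    void SubscribeListener(ConnectionListenerItf& listener) { mListeners.push_back(&listener); }

    void UnsubscribeListener(ConnectionListenerItf& listener)
    {
        mListeners.erase(std::remove(mListeners.begin(), mListeners.end(), &listener), mListeners.end());
    }

    bool IsConnected() const { return mConnected; }

    // Would be called when a ConnectionStatus message arrives from CM.
    void OnConnectionStatus(bool connected)
    {
        if (connected == mConnected) {
            return;  // no state change, nothing to notify
        }

        mConnected = connected;

        for (auto* listener : mListeners) {
            if (connected) {
                listener->OnConnect();
            } else {
                listener->OnDisconnect();
            }
        }
    }

private:
    bool                                mConnected = false;
    std::vector<ConnectionListenerItf*> mListeners;
};

// Example subscriber: buffers data while the Unit is offline.
struct BufferingMonitor : ConnectionListenerItf {
    bool buffering = true;

    void OnConnect() override { buffering = false; }
    void OnDisconnect() override { buffering = true; }
};
```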
Error Handling
The smclient handles failures at multiple levels:
| Failure | Behavior |
|---|---|
| Stream write failure | Returns error to caller; connection loop will detect and reconnect |
| Stream read failure | Exits message processing loop; triggers reconnection after timeout |
| Certificate renewal | Receives OnCertChanged callback; cancels current context to force reconnection with new credentials |
| CM unavailable | Retries connection after mCMReconnectTimeout interval |
| Individual message processing error | Logs the error but continues processing subsequent messages |
The reconnection mechanism ensures SM automatically recovers from transient network issues, CM restarts, and certificate rotations without manual intervention.
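The last row of the table, where a failing message is logged and processing continues, can be illustrated with the sketch below. ProcessMessages, the string-based messages, and the in-memory error log are hypothetical; the point is only that a handler failure does not break out of the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Handles every message in order. A failing handler is recorded and skipped,
// so one bad command does not end message processing.
// Returns the number of messages that failed.
inline std::size_t ProcessMessages(const std::vector<std::string>& messages,
    const std::function<bool(const std::string&)>& handler, std::vector<std::string>& errorLog)
{
    assert(handler != nullptr);

    std::size_t failures = 0;

    for (const auto& msg : messages) {
        if (!handler(msg)) {
            errorLog.push_back("failed to process: " + msg);
            ++failures;
        }
    }

    return failures;
}
```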
Related Pages
- Service Manager Overview: SM architecture and subcomponent summary
- Image Manager: uses GetBlobsInfo via smclient for blob URL resolution
- Launcher: sends instance status updates through smclient
- Architecture Overview: system-wide component relationships
- Communication Manager: the CM component that hosts the smcontroller server
- SM Controller: CM's server-side counterpart to this client
- Monitoring Pipeline: monitoring data flow from collection to cloud
Open Questions
- None
Assumptions
- The mCMServerURL in production points to CM's smcontroller gRPC port on the Main Node
- The reconnect timeout is the only backoff mechanism (no exponential backoff observed in code)
- ProxyService (also defined in v5 proto) is handled separately by the Message Proxy path, not by smclient directly; smclient only uses SMService