Version: v1.1

Client Communication (smclient)

Introduction

The smclient module is the Service Manager's communication gateway to the Communication Manager (CM). It implements a gRPC client that connects to CM's smcontroller service using the servicemanager/v5 protocol. Through this bidirectional streaming connection, SM registers itself, receives operational commands, and reports telemetry back to CM.

This page documents the connection lifecycle, the registration flow, the command/response message protocol, and the interfaces that smclient aggregates from other SM submodules.

Architecture

The smclient operates as a gRPC client connecting to CM's smcontroller gRPC server. The connection uses a bidirectional streaming RPC (RegisterSM), allowing both sides to send messages asynchronously over a single persistent stream.

┌─────────────────────────────────────────────────────────────┐
│                    Service Manager (SM)                     │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                      smclient                       │    │
│  │                                                     │    │
│  │  ┌───────────┐  ┌──────────┐  ┌────────────────┐    │    │
│  │  │ Connection│  │ Message  │  │ Interface      │    │    │
│  │  │ Loop      │  │ Handler  │  │ Aggregator     │    │    │
│  │  └─────┬─────┘  └────┬─────┘  └───────┬────────┘    │    │
│  │        │             │                │             │    │
│  └────────┼─────────────┼────────────────┼─────────────┘    │
│           │             │                │                  │
│           │        ┌────┴────┐    ┌──────┴───────┐          │
│           │        │launcher │    │ monitoring   │          │
│           │        │logprov. │    │ alerts       │          │
│           │        │netmgr   │    │ imagemanager │          │
│           │        └─────────┘    └──────────────┘          │
└───────────┼─────────────────────────────────────────────────┘
            │ gRPC (mTLS)
            ▼
┌───────────────────────┐
│   CM (smcontroller)   │
│   SMService server    │
└───────────────────────┘

Connection Lifecycle

Connection Loop

The smclient runs a dedicated connection thread that maintains a persistent connection to CM. The loop follows a connect-register-process-reconnect pattern:

  1. Connect — Create a gRPC channel to CM's server URL using TLS credentials
  2. Register — Open the RegisterSM bidirectional stream
  3. Send SMInfo — Immediately send the Node's identity, runtime capabilities, and available resources
  4. Send Instance Status — Report current state of all running instances
  5. Process Messages — Enter a blocking read loop, handling incoming commands from CM
  6. Reconnect — If the stream closes (network failure, CM restart, certificate rotation), wait for a configurable timeout and retry from step 1
┌─────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│ Connect │────▶│ Register │────▶│ Send     │────▶│ Process  │
│         │     │ SM       │     │ SMInfo + │     │ Incoming │
│         │     │          │     │ Status   │     │ Messages │
└─────────┘     └──────────┘     └──────────┘     └────┬─────┘
     ▲                                                 │
     │              ┌──────────────┐                   │
     └──────────────│ Wait timeout │◀──────────────────┘
                    │ (reconnect)  │  (stream closed)
                    └──────────────┘
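
The connect-register-process-reconnect pattern can be sketched as follows. This is a minimal illustration, not the smclient implementation: `StreamSketch` and `RunConnectionLoop` are hypothetical names, the gRPC stream is replaced by plain callbacks, and the loop is bounded by `maxAttempts` only so that it terminates in a demo (the real loop would run until SM shuts down).

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the gRPC channel and RegisterSM stream.
struct StreamSketch {
    std::function<bool()>                   connect;  // steps 1-2: channel + stream
    std::function<bool(const std::string&)> send;     // steps 3-4: outgoing messages
    std::function<bool(std::string&)>       read;     // step 5: false once the stream closes
};

// Runs the loop for at most maxAttempts connections and returns a log of
// what happened, so the ordering of the steps is easy to inspect.
inline std::vector<std::string> RunConnectionLoop(
    StreamSketch& stream, std::chrono::milliseconds reconnectTimeout, int maxAttempts)
{
    std::vector<std::string> log;

    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (attempt > 0) {
            // Step 6: fixed wait before retrying (no exponential backoff).
            std::this_thread::sleep_for(reconnectTimeout);
            log.push_back("wait");
        }

        if (!stream.connect()) {
            log.push_back("connect failed");
            continue;
        }
        log.push_back("connected");

        // Registration: SMInfo first, then the instance status snapshot.
        if (!stream.send("SMInfo") || !stream.send("NodeInstancesStatus")) {
            log.push_back("send failed");
            continue;
        }
        log.push_back("registered");

        std::string msg;
        while (stream.read(msg)) {
            log.push_back("handled " + msg);
        }
        log.push_back("stream closed");
    }

    return log;
}
```

Because every failure path falls through to the next iteration, a failed connect, a failed registration write, and a closed stream all end up in the same wait-and-retry branch.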

Configuration

The connection is configured through the smclient::Config structure:

| Parameter | Description |
|---|---|
| mCMServerURL | gRPC endpoint of CM's smcontroller service (e.g., localhost:8093) |
| mCertStorage | Certificate storage identifier used to obtain mTLS credentials from IAM |
| mCMReconnectTimeout | Duration to wait before reconnecting after a connection loss |
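
As a rough illustration, the three parameters could be modelled like this. The field names come from the table; the C++ types, the placeholder default timeout, the `smclient_sketch` namespace, and the `IsUsable` helper are all assumptions made for this example and are not taken from the smclient sources.

```cpp
#include <cassert>
#include <chrono>
#include <string>

namespace smclient_sketch {

// Field names follow the table above; types and the default are assumed.
struct Config {
    std::string               mCMServerURL;
    std::string               mCertStorage;
    std::chrono::milliseconds mCMReconnectTimeout{10000};  // arbitrary placeholder
};

// Hypothetical pre-flight check: can this config be used to connect at all?
inline bool IsUsable(const Config& cfg, bool insecure)
{
    if (cfg.mCMServerURL.empty()) {
        return false;
    }
    // Secure mode needs a certificate storage to obtain client credentials.
    if (!insecure && cfg.mCertStorage.empty()) {
        return false;
    }
    return cfg.mCMReconnectTimeout.count() > 0;
}

}  // namespace smclient_sketch
```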

TLS Security

The connection to CM is secured with mutual TLS (mTLS):

  • Secure mode (default): SM obtains client certificates from the local IAM instance via the configured mCertStorage. When certificates are renewed, the smclient receives an OnCertChanged callback and gracefully reconnects with the new credentials.
  • Insecure mode (development only): Uses server-side TLS without client certificate authentication.

Registration Flow

When the bidirectional stream is established, SM immediately sends two messages to identify itself and synchronize state:

1. SMInfo Message

The first message on the stream is SMInfo, which tells CM about this Node's capabilities:

message SMInfo {
    string node_id = 1;                  // Unique Node identifier
    repeated RuntimeInfo runtimes = 2;   // Available execution runtimes
    repeated ResourceInfo resources = 3; // Host resources (devices, shared resources)
}

RuntimeInfo describes each available runtime (container, boot, rootfs):

| Field | Description |
|---|---|
| runtime_id | Unique runtime identifier |
| type | Runtime type string (e.g., "container", "boot", "rootfs") |
| max_dmips | Maximum compute capacity in DMIPS |
| allowed_dmips | DMIPS available for service allocation |
| total_ram | Total RAM available to the runtime |
| allowed_ram | RAM available for service allocation |
| max_instances | Maximum concurrent instances supported |
| os_info | Operating system information |
| arch_info | CPU architecture information |

ResourceInfo describes shared host resources:

| Field | Description |
|---|---|
| name | Resource name (e.g., a device path or shared resource identifier) |
| shared_count | Number of instances that can share this resource simultaneously |
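
To make the shape of the registration payload concrete, here is a plain C++ mirror of the three messages. It is only a sketch: the integer widths are assumed, `os_info` and `arch_info` are left out, and `IsConsistent` is a hypothetical sanity check (it assumes the allocatable share never exceeds the total), not something the document attributes to smclient.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Mirrors of the proto messages; names follow the tables, types are assumed.
struct RuntimeInfo {
    std::string   runtimeID;
    std::string   type;  // e.g. "container", "boot", "rootfs"
    std::uint64_t maxDMIPS = 0;
    std::uint64_t allowedDMIPS = 0;
    std::uint64_t totalRAM = 0;
    std::uint64_t allowedRAM = 0;
    std::uint32_t maxInstances = 0;
};

struct ResourceInfo {
    std::string   name;
    std::uint32_t sharedCount = 0;
};

struct SMInfo {
    std::string               nodeID;
    std::vector<RuntimeInfo>  runtimes;
    std::vector<ResourceInfo> resources;
};

// Hypothetical check: the share offered to services should not exceed
// the runtime's total capacity, and the Node must identify itself.
inline bool IsConsistent(const SMInfo& info)
{
    if (info.nodeID.empty()) {
        return false;
    }
    for (const auto& rt : info.runtimes) {
        if (rt.allowedDMIPS > rt.maxDMIPS || rt.allowedRAM > rt.totalRAM) {
            return false;
        }
    }
    return true;
}
```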

2. NodeInstancesStatus Message

Immediately after SMInfo, SM sends the current status of all service instances on this Node. This allows CM to reconcile its recorded state with what is actually running on the Node after a reconnection.

Incoming Messages (CM → SM)

After registration, SM enters a message processing loop. CM sends commands through the stream as SMIncomingMessages:

| Message | Purpose | SM Handler |
|---|---|---|
| GetNodeConfigStatus | Request current Node configuration state | Returns NodeConfigStatus with version and any errors |
| CheckNodeConfig | Validate a proposed Node configuration | Parses JSON config, runs validation, returns status |
| SetNodeConfig | Apply a new Node configuration | Parses and applies config, returns result status |
| UpdateInstances | Start and/or stop service instances | Delegates to launcher (the primary deployment command) |
| UpdateNetworks | Configure network parameters for services | Delegates to network manager |
| SystemLogRequest | Retrieve system-level logs | Delegates to log provider with time range filter |
| InstanceLogRequest | Retrieve logs for specific service instances | Delegates to log provider with instance filter |
| InstanceCrashLogRequest | Retrieve crash logs for service instances | Delegates to log provider for crash-specific logs |
| GetAverageMonitoring | Request time-averaged monitoring data | Retrieves from monitoring module and sends response |
| ConnectionStatus | Notify SM of cloud connection state changes | Updates internal state, notifies registered listeners |
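
The routing in the table amounts to a dispatch on the message type. The sketch below models that with `std::variant` and `std::visit` over a reduced set of four messages. The generated protobuf code would more likely expose a oneof with a case enum, so treat the variant, the payload fields, and the returned handler names as illustrative assumptions.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// Reduced, hypothetical payloads; the real messages come from the v5 proto.
struct GetNodeConfigStatus {};
struct UpdateInstances {
    int startCount = 0;
    int stopCount = 0;
};
struct UpdateNetworks {};
struct ConnectionStatus {
    bool connected = false;
};

using SMIncomingMessage
    = std::variant<GetNodeConfigStatus, UpdateInstances, UpdateNetworks, ConnectionStatus>;

// Returns the name of the SM component that would handle the message.
inline std::string Dispatch(const SMIncomingMessage& msg)
{
    return std::visit(
        [](const auto& m) -> std::string {
            using T = std::decay_t<decltype(m)>;

            if constexpr (std::is_same_v<T, GetNodeConfigStatus>) {
                return "nodeconfig";
            } else if constexpr (std::is_same_v<T, UpdateInstances>) {
                return "launcher";
            } else if constexpr (std::is_same_v<T, UpdateNetworks>) {
                return "networkmanager";
            } else {
                return "cloudconnection";
            }
        },
        msg);
}
```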

UpdateInstances — The Core Deployment Command

The most important incoming message is UpdateInstances, which drives the service lifecycle:

message UpdateInstances {
    repeated InstanceInfo start_instances = 1;
    repeated InstanceIdent stop_instances = 2;
}

Each InstanceInfo in start_instances contains the full specification for a service instance:

| Field | Description |
|---|---|
| instance | Service/subject/instance identity triple |
| version | Service version string |
| manifest_digest | OCI manifest digest for image verification |
| owner_id | Owning entity identifier |
| runtime_id | Target runtime for execution |
| uid / gid | Unix user/group IDs for the instance process |
| priority | Scheduling priority |
| storage_path / state_path | Persistent storage locations |
| env_vars | Environment variables to inject |
| network_parameters | Network configuration (subnet, IP, DNS, firewall rules) |
| monitoring_parameters | Alert rules (CPU/RAM thresholds, traffic limits) |
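
The effect of an UpdateInstances command on the set of running instances can be sketched as below. The struct fields are a small subset of the table, `Apply` is a hypothetical stand-in for the launcher, and processing stops before starts is an assumption of this example rather than documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Identity triple from the table above; member names are illustrative.
struct InstanceIdent {
    std::string serviceID;
    std::string subjectID;
    unsigned    instance = 0;

    bool operator<(const InstanceIdent& other) const
    {
        return std::tie(serviceID, subjectID, instance)
            < std::tie(other.serviceID, other.subjectID, other.instance);
    }
};

struct InstanceInfo {
    InstanceIdent ident;
    std::string   version;
    std::string   runtimeID;
};

struct UpdateInstances {
    std::vector<InstanceInfo>  startInstances;
    std::vector<InstanceIdent> stopInstances;
};

// Applies the command to the running set and returns how many instances
// changed state. Stops are handled before starts (assumed ordering).
inline std::size_t Apply(const UpdateInstances& cmd, std::set<InstanceIdent>& running)
{
    std::size_t changed = 0;

    for (const auto& ident : cmd.stopInstances) {
        changed += running.erase(ident);  // 0 if it was not running
    }
    for (const auto& info : cmd.startInstances) {
        if (running.insert(info.ident).second) {
            ++changed;  // newly started; already-running instances are left alone
        }
    }

    return changed;
}
```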

Outgoing Messages (SM → CM)

SM sends status updates, telemetry, and responses back to CM through the same bidirectional stream as SMOutgoingMessages:

| Message | Purpose | Trigger |
|---|---|---|
| SMInfo | Node registration and capabilities | Sent once on connection establishment |
| NodeConfigStatus | Node configuration state response | In response to config commands |
| UpdateInstancesStatus | Result of an UpdateInstances command | After processing start/stop operations |
| NodeInstancesStatus | Current state of all instances | On connection (initial sync) and on state changes |
| InstantMonitoring | Real-time resource usage snapshot | Periodically from monitoring module |
| AverageMonitoring | Time-averaged resource metrics | In response to GetAverageMonitoring |
| Alert | Threshold violation or system event | When alert conditions are detected |
| LogData | Requested log content (chunked) | In response to log requests |

Alert Types

The Alert message carries a timestamp and one of several alert variants:

| Alert Type | Description |
|---|---|
| SystemQuotaAlert | Node-level resource threshold exceeded (CPU, RAM, disk) |
| InstanceQuotaAlert | Service instance exceeded its resource allocation |
| ResourceAllocateAlert | Failed to allocate a resource to an instance |
| SystemAlert | Critical system-level event |
| CoreAlert | AosCore component error (SM, CM, IAM) |
| InstanceAlert | Service-specific error or event |
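
A "timestamp plus one of several variants" message maps naturally onto a struct holding a `std::variant`. The sketch below covers three of the six alert types. The payload fields and the `Describe` helper are invented for illustration and do not reflect the actual proto fields.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <variant>

// Hypothetical payloads for three of the alert variants.
struct SystemQuotaAlert {
    std::string   parameter;  // e.g. "cpu", "ram", "disk"
    unsigned long value = 0;
};
struct InstanceQuotaAlert {
    std::string   instance;
    std::string   parameter;
    unsigned long value = 0;
};
struct CoreAlert {
    std::string component;  // e.g. "SM", "CM", "IAM"
    std::string message;
};

struct Alert {
    std::chrono::system_clock::time_point                         timestamp;
    std::variant<SystemQuotaAlert, InstanceQuotaAlert, CoreAlert> payload;
};

// One overload per variant; adding a variant without an overload
// becomes a compile error instead of a silently ignored alert.
struct DescribeVisitor {
    std::string operator()(const SystemQuotaAlert& a) const { return "system quota: " + a.parameter; }
    std::string operator()(const InstanceQuotaAlert& a) const
    {
        return "instance quota: " + a.instance + "/" + a.parameter;
    }
    std::string operator()(const CoreAlert& a) const { return "core: " + a.component; }
};

inline std::string Describe(const Alert& alert)
{
    return std::visit(DescribeVisitor{}, alert.payload);
}
```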

Interface Aggregation

The smclient module implements multiple interfaces, acting as the single outbound communication channel for several SM subsystems:

| Interface | Purpose | Methods |
|---|---|---|
| alerts::SenderItf | Forward alerts to CM | SendAlert() |
| monitoring::SenderItf | Forward monitoring data to CM | SendMonitoringData() |
| logging::SenderItf | Forward log data to CM | SendLog() |
| launcher::SenderItf | Forward instance status changes to CM | SendNodeInstancesStatuses(), SendUpdateInstancesStatuses() |
| imagemanager::BlobInfoProviderItf | Resolve blob digests to download URLs via CM | GetBlobsInfo() |
| cloudconnection::CloudConnectionItf | Expose cloud connection state to other modules | SubscribeListener(), UnsubscribeListener(), IsConnected() |

This aggregation pattern means other SM modules (monitoring, alerts, launcher, image manager) don't need to know about gRPC or the communication protocol — they simply call their respective sender interface, and smclient handles serialization and transport.
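
The aggregation pattern can be illustrated with three of the sender interfaces. Everything here is simplified: the method names match the table, but the `std::string` parameters and `bool` return values are placeholders for the real typed arguments and error type, and `SMClientSketch` with its in-memory outbox is a hypothetical stand-in for the gRPC stream.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace alerts {
struct SenderItf {
    virtual ~SenderItf() = default;
    virtual bool SendAlert(const std::string& alert) = 0;
};
}  // namespace alerts

namespace monitoring {
struct SenderItf {
    virtual ~SenderItf() = default;
    virtual bool SendMonitoringData(const std::string& data) = 0;
};
}  // namespace monitoring

namespace logging {
struct SenderItf {
    virtual ~SenderItf() = default;
    virtual bool SendLog(const std::string& log) = 0;
};
}  // namespace logging

// One object implements every sender interface and funnels all traffic
// into a single outbound queue (standing in for the stream to CM).
class SMClientSketch : public alerts::SenderItf, public monitoring::SenderItf, public logging::SenderItf {
public:
    bool SendAlert(const std::string& alert) override { return Enqueue("Alert", alert); }
    bool SendMonitoringData(const std::string& data) override { return Enqueue("InstantMonitoring", data); }
    bool SendLog(const std::string& log) override { return Enqueue("LogData", log); }

    const std::vector<std::string>& Outbox() const { return mOutbox; }

private:
    bool Enqueue(const std::string& type, const std::string& body)
    {
        mOutbox.push_back(type + ":" + body);
        return true;
    }

    std::vector<std::string> mOutbox;
};

// A module depends only on its own interface, never on SMClientSketch.
inline void RaiseQuotaAlert(alerts::SenderItf& sender)
{
    sender.SendAlert("ram quota exceeded");
}
```

Passing the same object as `alerts::SenderItf&` to one module and `monitoring::SenderItf&` to another keeps each module's dependency surface to a single method while still sharing one connection.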

GetBlobsInfo — Unary RPC

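In addition to the bidirectional stream, smclient uses a separate unary RPC for blob URL resolution. Note that the RPC name is plural (GetBlobsInfos), while the interface method listed above is GetBlobsInfo():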

rpc GetBlobsInfos(BlobsInfosRequest) returns (BlobsInfos) {}

This is used by the image manager during image download to resolve content-addressable blob digests into downloadable URLs. Unlike the streaming messages, this is a simple request-response call.
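
The request-response shape of the call can be sketched as a pure function. The message names `BlobsInfosRequest` and `BlobsInfos` come from the RPC signature, but their fields are not shown in this document, so the `digests`, `blobs`, and `urls` members below are assumptions, as are the `BlobInfo` helper type, the in-memory `catalog` standing in for CM, and the choice to skip unknown digests.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Assumed message shapes; only the message names appear in the RPC signature.
struct BlobsInfosRequest {
    std::vector<std::string> digests;
};
struct BlobInfo {
    std::string              digest;
    std::vector<std::string> urls;
};
struct BlobsInfos {
    std::vector<BlobInfo> blobs;
};

// Stand-in for CM's side of the unary call: one request in, one response out.
// Digests missing from the catalog are skipped (an assumption of this sketch).
inline BlobsInfos GetBlobsInfos(
    const BlobsInfosRequest& request, const std::map<std::string, std::string>& catalog)
{
    BlobsInfos response;

    for (const auto& digest : request.digests) {
        auto it = catalog.find(digest);
        if (it == catalog.end()) {
            continue;
        }

        BlobInfo info;
        info.digest = digest;
        info.urls.push_back(it->second);
        response.blobs.push_back(info);
    }

    return response;
}
```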

Cloud Connection Status

CM forwards cloud connection state changes to SM via the ConnectionStatus message. SM maintains this state and exposes it through the CloudConnectionItf interface:

  • CONNECTED — The Unit has an active connection to AosCloud (via CM)
  • DISCONNECTED — The Unit has lost cloud connectivity

Other SM modules (e.g., monitoring) can subscribe to connection state changes to adjust their behavior (for example, buffering data during disconnection).
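
A subscriber reacting to connection state changes might look like the sketch below. `SubscribeListener`, `UnsubscribeListener`, and `IsConnected` are the method names from the interface table. The listener interface with `OnConnect`/`OnDisconnect`, the `OnConnectionStatus` entry point, and the `BufferingMonitor` subscriber are hypothetical, and the sketch omits the locking a multi-threaded implementation would need.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical listener interface; callback names are assumptions.
struct ConnectionListenerItf {
    virtual ~ConnectionListenerItf() = default;
    virtual void OnConnect() = 0;
    virtual void OnDisconnect() = 0;
};

class CloudConnectionSketch {
public:
    void SubscribeListener(ConnectionListenerItf& listener) { mListeners.push_back(&listener); }

    void UnsubscribeListener(ConnectionListenerItf& listener)
    {
        mListeners.erase(std::remove(mListeners.begin(), mListeners.end(), &listener), mListeners.end());
    }

    bool IsConnected() const { return mConnected; }

    // Called when a ConnectionStatus message arrives from CM.
    void OnConnectionStatus(bool connected)
    {
        if (connected == mConnected) {
            return;  // notify only on actual state changes (assumed policy)
        }
        mConnected = connected;

        for (auto* listener : mListeners) {
            if (connected) {
                listener->OnConnect();
            } else {
                listener->OnDisconnect();
            }
        }
    }

private:
    bool                                mConnected = false;
    std::vector<ConnectionListenerItf*> mListeners;
};

// Example subscriber: buffers data while the cloud is unreachable.
struct BufferingMonitor : ConnectionListenerItf {
    bool buffering = true;

    void OnConnect() override { buffering = false; }
    void OnDisconnect() override { buffering = true; }
};
```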

Error Handling

The smclient handles failures at multiple levels:

| Failure | Behavior |
|---|---|
| Stream write failure | Returns error to caller; connection loop will detect and reconnect |
| Stream read failure | Exits message processing loop; triggers reconnection after timeout |
| Certificate renewal | Receives OnCertChanged callback; cancels current context to force reconnection with new credentials |
| CM unavailable | Retries connection after mCMReconnectTimeout interval |
| Individual message processing error | Logs the error but continues processing subsequent messages |

The reconnection mechanism ensures SM automatically recovers from transient network issues, CM restarts, and certificate rotations without manual intervention.
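
The difference between a per-message error (log and continue) and a stream read failure (leave the loop) can be shown in a few lines. This sketch uses C++ exceptions to signal a handler failure purely for brevity; the actual code may well use error return values instead. `ProcessMessages` and `LoopResult` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

struct LoopResult {
    int                      handled = 0;
    std::vector<std::string> errors;  // stands in for the error log
};

// read() returns false when the stream fails or closes;
// handle() throws when a single message cannot be processed.
inline LoopResult ProcessMessages(const std::function<bool(std::string&)>& read,
    const std::function<void(const std::string&)>& handle)
{
    LoopResult  result;
    std::string msg;

    while (read(msg)) {
        try {
            handle(msg);
            ++result.handled;
        } catch (const std::exception& e) {
            // A bad message is recorded, but the stream stays open.
            result.errors.push_back(e.what());
        }
    }

    // Reaching this point means the read failed; the caller would now
    // wait for the reconnect timeout and start a new connection.
    return result;
}
```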


Assumptions

  • The mCMServerURL in production points to CM's smcontroller gRPC port on the Main Node
  • The reconnect timeout is the only backoff mechanism (no exponential backoff observed in code)
  • ProxyService (also defined in v5 proto) is handled separately by the Message Proxy path, not by smclient directly — smclient only uses SMService
