Version: v1.1

SM Controller

Introduction

The SM Controller (smcontroller) is the Communication Manager's interface to all Service Manager (SM) instances in the Unit. It runs a gRPC server implementing the SMService defined in servicemanager/v5/servicemanager.proto. Each SM instance (one per Node) connects to this server as a client, establishing a bidirectional streaming channel for command distribution and telemetry collection.

This page documents the SM Controller's architecture, how it manages per-Node connections, the commands it distributes, and the data it collects from SM instances.

Architecture

The SM Controller is composed of two layers:

| Component | Responsibility |
|---|---|
| SMController | gRPC server lifecycle, TLS credential management, Node lookup and routing, interface implementation for other CM modules |
| SMHandler (one per Node) | Per-connection message processing: sends commands to a specific SM and receives its responses, status, monitoring, alerts, and logs |

┌─────────────────────────────────────────────────────────────┐
│                 Communication Manager (CM)                  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │              SMController (gRPC Server)               │  │
│  │                                                       │  │
│  │  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐  │  │
│  │  │  SMHandler  │   │  SMHandler  │   │  SMHandler  │  │  │
│  │  │  (Node A)   │   │  (Node B)   │   │  (Node C)   │  │  │
│  │  └──────┬──────┘   └──────┬──────┘   └──────┬──────┘  │  │
│  │         │                 │                 │         │  │
│  └─────────┼─────────────────┼─────────────────┼─────────┘  │
│            │                 │                 │            │
└────────────┼─────────────────┼─────────────────┼────────────┘
             │ gRPC (mTLS)     │ gRPC (mTLS)     │ gRPC (mTLS)
             ▼                 ▼                 ▼
      ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
      │ SM (Node A) │   │ SM (Node B) │   │ SM (Node C) │
      └─────────────┘   └─────────────┘   └─────────────┘

Interfaces Implemented

The SM Controller implements multiple interfaces that other CM modules use to interact with Nodes:

| Interface | Consumer | Purpose |
|---|---|---|
| NodeConfigHandlerItf | Unit Config module | Check, get, and update Node configuration |
| InstanceRunnerItf | Launcher module | Start and stop service instances on a specific Node |
| MonitoringProviderItf | Monitoring module | Retrieve averaged monitoring data from a Node |
| NodeNetworkItf | Network Manager | Push network configuration updates to a Node |
| LogProviderItf | Log forwarding | Request system, instance, or crash logs from Nodes |

This design means other CM modules never interact with gRPC directly — they call typed interface methods on the SM Controller, which routes the request to the appropriate Node's SMHandler.
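To illustrate the pattern, here is a minimal C++ sketch in which one controller class implements two narrow interfaces and a consumer depends on only one of them. The interface names come from the table above, but the method signatures, the call-recording member mLog, and the Launcher stand-in are hypothetical simplifications; the real interfaces carry richer parameter types and error handling.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified versions of two interfaces from the table.
class InstanceRunnerItf {
public:
    virtual ~InstanceRunnerItf() = default;
    virtual bool UpdateInstances(const std::string& nodeID, const std::vector<std::string>& instances) = 0;
};

class NodeNetworkItf {
public:
    virtual ~NodeNetworkItf() = default;
    virtual bool UpdateNetworks(const std::string& nodeID, const std::string& subnet) = 0;
};

// One controller object implements every interface.
class SMController : public InstanceRunnerItf, public NodeNetworkItf {
public:
    bool UpdateInstances(const std::string& nodeID, const std::vector<std::string>& instances) override
    {
        // A real controller would route this to the Node's SMHandler.
        mLog.push_back("UpdateInstances:" + nodeID + ":" + std::to_string(instances.size()));
        return true;
    }

    bool UpdateNetworks(const std::string& nodeID, const std::string& subnet) override
    {
        mLog.push_back("UpdateNetworks:" + nodeID + ":" + subnet);
        return true;
    }

    std::vector<std::string> mLog; // records calls, for illustration only
};

// Hypothetical consumer: it sees only InstanceRunnerItf, never gRPC.
class Launcher {
public:
    explicit Launcher(InstanceRunnerItf& runner) : mRunner(runner) {}

    bool Launch(const std::string& nodeID) { return mRunner.UpdateInstances(nodeID, {"service-1"}); }

private:
    InstanceRunnerItf& mRunner;
};
```

Because the consumer holds only an InstanceRunnerItf reference, it can be unit-tested against a mock instead of a live SM connection.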

Dependencies

The SM Controller requires the following external interfaces:

| Interface | Purpose |
|---|---|
| CloudConnectionItf | Source of cloud connection events; the SM Controller subscribes and forwards the connection state to all connected SMs |
| CertProviderItf | Obtains TLS server certificates for the gRPC endpoint |
| CertLoaderItf | Loads certificate files from storage |
| x509::ProviderItf | Provides cryptographic operations for TLS |
| ItemInfoProviderItf | Resolves blob digests to download URLs (for the GetBlobsInfos RPC) |
| AlertsReceiverItf | Receives alerts forwarded from SM instances |
| SenderItf (log) | Receives log data forwarded from SM instances |
| launcher::SenderItf | Receives environment variable status updates from SM instances |
| MonitoringReceiverItf | Receives instant monitoring data from SM instances |
| InstanceStatusReceiverItf | Receives instance lifecycle status changes from SM instances |
| SMInfoReceiverItf | Receives SM registration info (Node capabilities, runtimes, resources) |

gRPC Server

Service Definition

The SM Controller implements two RPCs from the SMService:

service SMService {
    rpc RegisterSM(stream SMOutgoingMessages) returns (stream SMIncomingMessages) {}
    rpc GetBlobsInfos(BlobsInfosRequest) returns (BlobsInfos) {}
}

  • RegisterSM — A bidirectional streaming RPC. Each SM opens one stream that persists for the lifetime of the connection. CM sends commands through the return stream and receives status/telemetry through the request stream.
  • GetBlobsInfos — A unary RPC that resolves content-addressable blob digests into downloadable URLs. Used by SM's image manager during image download.

Server Configuration

The gRPC server is configured through the smcontroller::Config structure:

| Parameter | Description |
|---|---|
| mCMServerURL | Listen address for the gRPC server (e.g., :8093 or 0.0.0.0:8093) |
| mCertStorage | Certificate storage identifier for obtaining server TLS credentials |
| mCACert | CA certificate path for verifying SM client certificates |

If the address starts with ":" (port only, e.g., ":8093"), the server binds to 0.0.0.0 on that port.
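The address rule can be sketched in a few lines of C++. This is a minimal sketch: the field names match the table above, but the std::string field types and the body of CorrectAddress (named after the function mentioned under Assumptions) are illustrative guesses, not the actual implementation.

```cpp
#include <cassert>
#include <string>

// Field names as documented; the std::string types are an assumption.
struct Config {
    std::string mCMServerURL; // listen address, e.g. ":8093" or "0.0.0.0:8093"
    std::string mCertStorage; // certificate storage identifier for server TLS credentials
    std::string mCACert;      // CA certificate path for verifying SM clients
};

// Illustrative version of the port-only expansion rule.
std::string CorrectAddress(const std::string& address)
{
    // ":8093" carries no host part, so bind to all IPv4 interfaces.
    if (!address.empty() && address.front() == ':') {
        return "0.0.0.0" + address;
    }

    // A full host:port address is used unchanged.
    return address;
}
```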

TLS Security

The server uses mutual TLS (mTLS) by default:

  • Server credentials are loaded from the certificate storage identified by mCertStorage
  • Client verification uses the CA certificate at mCACert to validate connecting SM instances
  • Insecure mode is available for development (uses InsecureServerCredentials)

When certificates are renewed, the SM Controller receives an OnCertChanged callback from the certificate provider. It then schedules a server restart (with a 10-second retry timeout) to load the new credentials. All connected SMs are disconnected during the restart and automatically reconnect with their own renewed certificates.
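One reading of "a server restart with a 10-second retry timeout" is: try to restart, and if that fails, wait 10 seconds and try again. The C++ sketch below follows that reading, which is an assumption. cReconnectRetryTimeout is the constant named under Assumptions; RestartWithRetry, its injectable wait function, and the maxAttempts bound are hypothetical additions that make the logic testable without real delays.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// 10-second retry timeout for restarting the server after a certificate change.
constexpr auto cReconnectRetryTimeout = std::chrono::seconds(10);

// Hypothetical helper. tryRestart stands in for "stop the gRPC server, reload
// TLS credentials, start it again"; wait is injected so no real sleep is needed.
// Returns the number of attempts used, or -1 after maxAttempts failures.
int RestartWithRetry(const std::function<bool()>& tryRestart,
    const std::function<void(std::chrono::milliseconds)>& wait, int maxAttempts)
{
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (tryRestart()) {
            return attempt;
        }

        wait(cReconnectRetryTimeout); // back off before the next attempt
    }

    return -1;
}
```

In a real controller the wait would be a timer or std::this_thread::sleep_for, ideally off the thread that delivers OnCertChanged, so the callback itself does not block for 10 seconds.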

Connection Management

SM Registration Flow

When an SM instance connects:

  1. The RegisterSM RPC handler creates a new SMHandler for the connection
  2. The handler starts two threads: a read thread (receives messages from SM) and a message processing thread (dispatches received messages to appropriate receivers)
  3. The handler immediately sends the current cloud connection status to the SM
  4. The SM sends its SMInfo message (Node ID, runtimes, resources) as the first message on the stream
  5. The handler extracts the Node ID and notifies the SMInfoReceiver that the Node is connected
  6. The handler is added to the active handlers list, making the Node addressable by other CM modules
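The registration steps can be condensed into a C++ sketch that takes the first message of a new stream, extracts the Node ID, notifies a receiver, and only then adds the handler to the shared list. The message structs, SMInfoReceiver, and SMControllerSketch are hypothetical stand-ins for the proto types and real classes; rejecting a stream whose first message is not SMInfo is an assumption, and the thread start-up (step 2) and cloud-status send (step 3) are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for two SMOutgoingMessages payloads.
struct SMInfo {
    std::string nodeID;
};

struct Alert {
    std::string text;
};

using SMOutgoingMessage = std::variant<SMInfo, Alert>;

// Stands in for SMInfoReceiverItf: records "Node connected" notifications.
struct SMInfoReceiver {
    std::vector<std::string> connectedNodes;

    void OnNodeConnected(const std::string& nodeID) { connectedNodes.push_back(nodeID); }
};

struct SMHandler {
    std::string nodeID; // empty until the SMInfo message arrives
};

class SMControllerSketch {
public:
    explicit SMControllerSketch(SMInfoReceiver& receiver) : mReceiver(receiver) {}

    // Runs the registration steps for one new stream, given its first message.
    // Steps 2 (starting threads) and 3 (sending cloud status) are omitted.
    bool Register(const SMOutgoingMessage& firstMessage)
    {
        auto handler = std::make_shared<SMHandler>(); // step 1

        const auto* info = std::get_if<SMInfo>(&firstMessage); // step 4
        if (info == nullptr || info->nodeID.empty()) {
            return false; // assumption: a stream that does not start with SMInfo is not registered
        }

        handler->nodeID = info->nodeID; // step 5
        mReceiver.OnNodeConnected(handler->nodeID);

        std::lock_guard lock(mMutex); // step 6: the Node becomes addressable
        mHandlers.push_back(handler);

        return true;
    }

    std::size_t HandlerCount()
    {
        std::lock_guard lock(mMutex);
        return mHandlers.size();
    }

private:
    SMInfoReceiver& mReceiver;
    std::mutex mMutex;
    std::vector<std::shared_ptr<SMHandler>> mHandlers;
};
```

Adding the handler last means other CM modules can never route a command to a handler whose Node ID is still unknown.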

Per-Node Routing

When a CM module calls an interface method (e.g., UpdateInstances(nodeID, ...)), the SM Controller:

  1. Acquires the handlers mutex
  2. Searches the active handlers list for one matching the requested nodeID
  3. If found, delegates the call to that handler
  4. If not found, returns an eNotFound error

This routing ensures commands are delivered to the correct Node even in multi-Node deployments.
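The four routing steps map directly onto a mutex-guarded lookup. In this minimal C++ sketch, NodeRouter, the string-typed request, and the ErrorEnum type are hypothetical; only the eNotFound name comes from this page.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical error codes; only eNotFound is named on this page.
enum class ErrorEnum { eNone, eNotFound };

// Minimal per-Node handler that just records the commands it receives.
struct SMHandler {
    std::string nodeID;
    std::vector<std::string> commands;

    ErrorEnum UpdateInstances(const std::string& request)
    {
        commands.push_back(request);
        return ErrorEnum::eNone;
    }
};

class NodeRouter {
public:
    void AddHandler(std::shared_ptr<SMHandler> handler)
    {
        std::lock_guard lock(mMutex);
        mHandlers.push_back(std::move(handler));
    }

    ErrorEnum UpdateInstances(const std::string& nodeID, const std::string& request)
    {
        std::lock_guard lock(mMutex); // 1. acquire the handlers mutex

        // 2. search the active handlers for the requested Node
        auto it = std::find_if(mHandlers.begin(), mHandlers.end(),
            [&nodeID](const std::shared_ptr<SMHandler>& handler) { return handler->nodeID == nodeID; });

        if (it == mHandlers.end()) {
            return ErrorEnum::eNotFound; // 4. Node unknown or not connected
        }

        return (*it)->UpdateInstances(request); // 3. delegate to that Node's handler
    }

private:
    std::mutex mMutex;
    std::vector<std::shared_ptr<SMHandler>> mHandlers;
};
```

This sketch holds the mutex while delegating, which is harmless for fire-and-forget commands. For synchronous commands that may block for up to 5 seconds, copying the shared_ptr and releasing the lock before the call would avoid stalling requests to other Nodes; this page does not say which approach the SM Controller takes.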

Disconnection Handling

When an SM connection drops (network failure, SM restart, or certificate rotation):

  1. The read thread detects the stream closure and exits
  2. The OnNodeDisconnected callback fires, notifying the SMInfoReceiver
  3. The handler is removed from the active handlers list
  4. The RegisterSM RPC returns, releasing the gRPC thread

The SM is expected to reconnect automatically (handled by the SM's smclient reconnection loop).

Server Shutdown

When the SM Controller stops:

  1. All active SMHandler instances are stopped (their gRPC contexts are cancelled)
  2. The gRPC server is shut down
  3. The controller waits for all RegisterSM RPC threads to complete (all handlers removed from the list)
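The disconnection steps and the final shutdown wait fit together naturally around a condition variable: each RegisterSM RPC notifies the receiver, removes its handler, and signals, while shutdown blocks until the list is empty. The C++ sketch below assumes that design; HandlerList, OnStreamClosed, and WaitUntilEmpty are hypothetical names, and cancelling each handler's gRPC context (shutdown step 1) is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

struct SMHandler {
    std::string nodeID;
};

class HandlerList {
public:
    using DisconnectedCallback = std::function<void(const std::string& nodeID)>;

    explicit HandlerList(DisconnectedCallback onNodeDisconnected)
        : mOnNodeDisconnected(std::move(onNodeDisconnected))
    {
    }

    void Add(std::shared_ptr<SMHandler> handler)
    {
        std::lock_guard lock(mMutex);
        mHandlers.push_back(std::move(handler));
    }

    // Disconnection steps 2-3, run by a RegisterSM RPC just before it returns.
    void OnStreamClosed(const std::shared_ptr<SMHandler>& handler)
    {
        mOnNodeDisconnected(handler->nodeID); // notify the SMInfoReceiver (outside the lock)

        {
            std::lock_guard lock(mMutex);
            mHandlers.erase(std::remove(mHandlers.begin(), mHandlers.end(), handler), mHandlers.end());
        }

        mCondVar.notify_all(); // wake a shutdown waiting for the list to drain
    }

    // Shutdown step 3: block until every RegisterSM RPC has removed its handler.
    void WaitUntilEmpty()
    {
        std::unique_lock lock(mMutex);
        mCondVar.wait(lock, [this] { return mHandlers.empty(); });
    }

    std::size_t Count()
    {
        std::lock_guard lock(mMutex);
        return mHandlers.size();
    }

private:
    DisconnectedCallback mOnNodeDisconnected;
    std::mutex mMutex;
    std::condition_variable mCondVar;
    std::vector<std::shared_ptr<SMHandler>> mHandlers;
};
```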

Commands (CM → SM)

The SM Controller sends commands to SM instances through the SMIncomingMessages stream:

| Command | Purpose | Response |
|---|---|---|
| UpdateInstances | Start and/or stop service instances | UpdateInstancesStatus (async) |
| UpdateNetworks | Push network configuration (subnet, IP, VLAN) | None (fire-and-forget) |
| GetNodeConfigStatus | Query current Node configuration state | NodeConfigStatus (sync) |
| CheckNodeConfig | Validate a proposed configuration | NodeConfigStatus (sync) |
| SetNodeConfig | Apply a new Node configuration | NodeConfigStatus (sync) |
| SystemLogRequest | Request system-level logs | LogData (async, chunked) |
| InstanceLogRequest | Request instance-specific logs | LogData (async, chunked) |
| InstanceCrashLogRequest | Request crash logs for an instance | LogData (async, chunked) |
| GetAverageMonitoring | Request time-averaged monitoring data | AverageMonitoring (sync) |
| ConnectionStatus | Notify SM of cloud connection state | None (notification) |

Synchronous vs Asynchronous Commands

The SM Controller uses two communication patterns:

Synchronous (request-response): For commands that require an immediate answer — GetNodeConfigStatus, CheckNodeConfig, SetNodeConfig, and GetAverageMonitoring. The handler sends the request and blocks (up to 5 seconds) waiting for the matching response message on the stream.

Asynchronous (fire-and-forget): For commands where the response arrives later or not at all — UpdateInstances, UpdateNetworks, log requests, and ConnectionStatus. The handler sends the message and returns immediately. Responses (like UpdateInstancesStatus) arrive asynchronously and are dispatched to the appropriate receiver interface.
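The synchronous pattern can be sketched with a table of pending requests guarded by a mutex and a condition variable. The page only says the handler waits for the "matching" response; keying pending requests by a numeric requestID is an assumption of this sketch, as are the PendingRequests class and its method names. cResponseTime is the 5-second constant named under Assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <map>
#include <mutex>
#include <optional>
#include <string>
#include <thread>

// Fixed response timeout for synchronous commands.
constexpr auto cResponseTime = std::chrono::seconds(5);

class PendingRequests {
public:
    // Caller thread: register interest before sending the request on the stream.
    void Expect(int requestID)
    {
        std::lock_guard lock(mMutex);
        mPending[requestID] = std::nullopt;
    }

    // Caller thread: block until the response arrives or the timeout expires.
    // Returns std::nullopt on timeout. Expect(requestID) must have been called first.
    std::optional<std::string> Wait(int requestID, std::chrono::milliseconds timeout = cResponseTime)
    {
        std::unique_lock lock(mMutex);
        mCondVar.wait_for(lock, timeout, [&] { return mPending.at(requestID).has_value(); });

        auto response = mPending.at(requestID);
        mPending.erase(requestID); // late responses will no longer match
        return response;
    }

    // Read thread: returns true if the message answered a pending request;
    // false means it should be queued for the processing thread instead.
    bool Deliver(int requestID, const std::string& payload)
    {
        {
            std::lock_guard lock(mMutex);
            auto it = mPending.find(requestID);
            if (it == mPending.end()) {
                return false;
            }
            it->second = payload;
        }

        mCondVar.notify_all();
        return true;
    }

private:
    std::mutex mMutex;
    std::condition_variable mCondVar;
    std::map<int, std::optional<std::string>> mPending;
};
```

Registering the request before sending it closes the race where the SM answers faster than the caller starts waiting.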

Telemetry Collection (SM → CM)

The SM Controller receives the following data from connected SM instances through the SMOutgoingMessages stream:

| Message | Content | Receiver |
|---|---|---|
| SMInfo | Node ID, available runtimes (with DMIPS, RAM, OS/arch info), host resources | SMInfoReceiverItf |
| UpdateInstancesStatus | Per-instance status after a deployment command (state, errors, env var status) | InstanceStatusReceiverItf |
| NodeInstancesStatus | Complete snapshot of all instance states on the Node | InstanceStatusReceiverItf |
| InstantMonitoring | Real-time Node and per-instance resource metrics (CPU, RAM, disk, network) | MonitoringReceiverItf |
| AverageMonitoring | Time-averaged metrics (response to GetAverageMonitoring) | Returned synchronously to caller |
| Alert | Threshold violations and system events (6 alert types) | AlertsReceiverItf |
| LogData | Requested log content, delivered in parts with correlation IDs | SenderItf (log) |

Message Processing

Each SMHandler processes incoming messages using two dedicated threads:

  1. The read thread continuously reads from the gRPC stream
  2. Messages that are responses to synchronous requests are matched and delivered to the waiting caller
  3. All other messages are queued for the processing thread
  4. The processing thread dispatches each message to the appropriate receiver interface

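Because the read thread only matches or enqueues messages, it keeps draining the stream even while the processing thread is busy inside a receiver callback or a caller is blocked waiting for a synchronous response. This two-thread design lets the handler receive messages concurrently with sending commands, preventing deadlocks on the bidirectional stream.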
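The hand-off between the two threads is a classic producer/consumer queue. In this minimal C++ sketch, the Message struct, MessageQueue, and ProcessMessages are hypothetical; in the flow described above, the read thread would first offer each message to the synchronous-response matcher and Push only the rest.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical simplified message: a type tag plus an opaque payload.
struct Message {
    std::string type; // e.g. "Alert", "InstantMonitoring", "LogData"
    std::string payload;
};

// Blocking FIFO between the read thread (producer) and processing thread (consumer).
class MessageQueue {
public:
    void Push(Message msg)
    {
        {
            std::lock_guard lock(mMutex);
            mQueue.push_back(std::move(msg));
        }
        mCondVar.notify_one();
    }

    // Called when the stream closes; Pop() drains what is left, then returns nullopt.
    void Close()
    {
        {
            std::lock_guard lock(mMutex);
            mClosed = true;
        }
        mCondVar.notify_all();
    }

    std::optional<Message> Pop()
    {
        std::unique_lock lock(mMutex);
        mCondVar.wait(lock, [this] { return mClosed || !mQueue.empty(); });

        if (mQueue.empty()) {
            return std::nullopt; // closed and fully drained
        }

        Message msg = std::move(mQueue.front());
        mQueue.pop_front();
        return msg;
    }

private:
    std::mutex mMutex;
    std::condition_variable mCondVar;
    std::deque<Message> mQueue;
    bool mClosed = false;
};

// Processing-thread loop: hand each queued message to its receiver.
void ProcessMessages(MessageQueue& queue, const std::function<void(const Message&)>& dispatch)
{
    while (auto msg = queue.Pop()) {
        dispatch(*msg);
    }
}
```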

Cloud Connection Forwarding

The SM Controller subscribes to cloud connection events from CM's CloudConnectionItf. When the cloud connection state changes:

  1. OnConnect() or OnDisconnect() is called on the SM Controller
  2. The controller iterates over all active SMHandler instances
  3. Each handler sends a ConnectionStatus message to its SM with the new state (CONNECTED or DISCONNECTED)

This allows SM instances to know whether the Unit currently has cloud connectivity, which can influence local behavior (e.g., buffering monitoring data during disconnection).
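The forwarding logic amounts to remembering the latest state and broadcasting changes to every active handler, while newly registered SMs receive the current state immediately. This C++ sketch assumes that shape; CloudStatusForwarder, the enum, and the initial "disconnected" state are hypothetical, and folding the registration-time send into AddHandler is a simplification of the registration flow.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

enum class CloudConnectionStatus { eConnected, eDisconnected };

// Stand-in for SMHandler: records every ConnectionStatus it "sends" to its SM.
struct SMHandler {
    std::vector<CloudConnectionStatus> sent;

    void SendConnectionStatus(CloudConnectionStatus status) { sent.push_back(status); }
};

class CloudStatusForwarder {
public:
    // A newly registered SM is told the current state right away.
    void AddHandler(std::shared_ptr<SMHandler> handler)
    {
        std::lock_guard lock(mMutex);
        handler->SendConnectionStatus(mStatus);
        mHandlers.push_back(std::move(handler));
    }

    // Callbacks from the cloud connection subscription.
    void OnConnect() { Broadcast(CloudConnectionStatus::eConnected); }
    void OnDisconnect() { Broadcast(CloudConnectionStatus::eDisconnected); }

private:
    void Broadcast(CloudConnectionStatus status)
    {
        std::lock_guard lock(mMutex);
        mStatus = status;
        for (const auto& handler : mHandlers) {
            handler->SendConnectionStatus(status);
        }
    }

    std::mutex mMutex;
    CloudConnectionStatus mStatus = CloudConnectionStatus::eDisconnected; // assumed initial state
    std::vector<std::shared_ptr<SMHandler>> mHandlers;
};
```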

Multi-Node Operation

In a multi-Node Unit, the SM Controller manages connections from multiple SM instances simultaneously:

  • The main Node's SM connects directly to the local CM process
  • Secondary Nodes' SMs connect over the network (depending on network topology, their traffic may traverse the Message Proxy)

Each SM is identified by its Node ID (extracted from the SMInfo registration message). CM modules address specific Nodes by passing the nodeID parameter to SM Controller interface methods. The controller routes each request to the correct handler.

The SM Controller does not limit the number of concurrent SM connections — it dynamically creates and removes handlers as SMs connect and disconnect.

Related Pages

  • Communication Manager: parent overview of CM architecture and all subcomponents
  • Update Manager: uses SM Controller (via InstanceRunnerItf) to launch instances after downloading updates
  • Network Manager: uses SM Controller (via NodeNetworkItf) to push network configuration to Nodes
  • Unit Configuration: uses SM Controller (via NodeConfigHandlerItf) to manage Node configs


Open Questions

  • None

Assumptions

  • The 5-second response timeout (cResponseTime) for synchronous messages is a hardcoded constant, not configurable
  • The 10-second reconnect retry timeout (cReconnectRetryTimeout) for server restart after certificate change is also hardcoded
  • There is no limit on concurrent SM connections — the mSMHandlers vector grows dynamically
  • The CorrectAddress function expands a port-only address (e.g., ":8093") to "0.0.0.0:8093"
  • Secondary Node SMs connect over the network; the SM Controller does not distinguish between local and remote connections

Human Review Checklist

  • Technical accuracy verified against source code
  • Terminology compliance (no deprecated terms)
  • Cross-references resolve to correct targets
  • Interface list matches SMControllerItf inheritance chain
  • gRPC service description matches v5 proto definition
  • Command and telemetry tables accurately reflect message types
  • Synchronous vs asynchronous pattern correctly described
  • Connection lifecycle (register, disconnect, restart) accurately reflects implementation
  • Content appropriate for OEM audience level