Service Manager
Introduction
The Service Manager (SM) is the AosCore component responsible for the complete lifecycle of service instances on a Node. It handles everything from acquiring and storing OCI service images, through launching containers or native processes, to monitoring resource usage and collecting logs.
Every Node in a Unit runs its own SM instance. The SM connects to the Communication Manager (CM) on the Main Node via gRPC, receives deployment commands, and reports back instance status, monitoring data, alerts, and logs.
Process: aos_servicemanager
Role Within the Architecture
The SM operates as a managed executor: it does not make autonomous decisions about which services to run. Instead, it receives instructions from CM (via the smcontroller interface) and carries them out:
- Receives instance update commands from CM specifying which services to start or stop
- Acquires required service images (downloading and verifying OCI layers)
- Launches service instances using the appropriate runtime (container or native)
- Enforces resource limits (CPU, RAM, storage, network bandwidth)
- Monitors Node and instance resource consumption
- Reports status, metrics, alerts, and logs back to CM
This separation of concerns means the SM has no direct cloud connectivity — all communication with AosCloud flows through CM.
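The command-and-report pattern described above can be sketched as follows. This is a minimal illustration, not the SM implementation: the `UpdateInstances` dataclass, the `Executor` class, the instance ID strings, and the status strings `"stopped"` and `"active"` are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateInstances:
    """Hypothetical, simplified stand-in for the UpdateInstances command."""
    stop: list = field(default_factory=list)   # instance IDs to stop
    start: list = field(default_factory=list)  # instance IDs to start


class Executor:
    """Carries out commands; never decides on its own what should run."""

    def __init__(self):
        self.running = set()

    def apply(self, command):
        """Apply one command and return a status per touched instance."""
        statuses = {}
        # Sketch-level choice: process stops before starts.
        for instance_id in command.stop:
            self.running.discard(instance_id)
            statuses[instance_id] = "stopped"
        for instance_id in command.start:
            self.running.add(instance_id)
            statuses[instance_id] = "active"
        return statuses


executor = Executor()
executor.apply(UpdateInstances(start=["svc-a:0", "svc-b:0"]))
report = executor.apply(UpdateInstances(stop=["svc-a:0"], start=["svc-c:0"]))
```

The returned dictionary stands in for the status report sent back to CM; the executor holds no policy of its own about which instances should exist.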
Subcomponents
The SM is composed of the following internal modules:
| Module | Responsibility |
|---|---|
| imagemanager | Manages OCI service images: coordinates image download, layer unpacking, storage, and integrity verification |
| launcher | Starts and stops service instances: supports multiple runtime types (container, boot, rootfs) through a pluggable runtime interface |
| resourcemanager | Provides resource information: tracks available host resources (devices, shared resources) for service allocation |
| networkmanager | Configures per-service networking: CNI plugin execution for network setup/teardown, traffic monitoring via iptables |
| monitoring | Collects Node-level resource metrics: CPU, RAM, disk usage, and network traffic for the Node and individual service instances |
| alerts | Generates alert events from systemd journal: watches journal entries for configured patterns and threshold violations |
| logprovider | Collects and forwards logs on demand: system logs, per-instance logs, and crash logs |
| smclient | gRPC client connecting to CM: registers the SM, receives commands, sends status and telemetry upstream |
| iamclient | Client interface to IAM: obtains TLS certificates, Node identity, and service permissions |
| database | Local SQLite storage with schema migration: persists service image metadata, instance state, network configuration, and traffic data |
Key Interfaces
CM Interface (outbound gRPC client)
The SM connects to CM's smcontroller as a gRPC client using the servicemanager/v5 protocol. On connection, the SM:
- Registers itself by sending SMInfo, reporting its Node ID, available runtimes (with capabilities like DMIPS, RAM, OS/architecture info), and host resources
- Receives commands from CM:
  - UpdateInstances: start or stop specific service instances
  - UpdateNetworks: configure network parameters for services
  - GetNodeConfigStatus/CheckNodeConfig/SetNodeConfig: Node configuration management
  - SystemLogRequest/InstanceLogRequest/InstanceCrashLogRequest: log retrieval
  - GetAverageMonitoring: request averaged monitoring data
  - ConnectionStatus: cloud connection state notifications
- Reports back to CM:
  - UpdateInstancesStatus/NodeInstancesStatus: instance lifecycle state changes
  - InstantMonitoring/AverageMonitoring: resource usage metrics
  - Alert: threshold violations and system events
  - LogData: requested log content
The connection is secured with mutual TLS using certificates obtained from the local IAM instance.
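The message flow above (register first, then handle commands and send reports) can be modelled without any transport. The sketch below is an assumption-laden illustration: `run_session`, the tuple-based message shape, and the payload fields are hypothetical, and gRPC streaming and TLS are deliberately left out. Only the message names come from the protocol description above.

```python
def run_session(sm_info, incoming, handlers):
    """Model one SM session and return the messages sent to CM, in order.

    sm_info  -- payload of the registration message (hypothetical shape)
    incoming -- iterable of (command_name, payload) pairs received from CM
    handlers -- dict mapping a command name to a callable that returns
                either a (message_name, payload) reply or None
    """
    # Registration is always the first outgoing message.
    outgoing = [("SMInfo", sm_info)]
    for name, payload in incoming:
        handler = handlers.get(name)
        if handler is None:
            # Sketch-level choice: silently skip unknown commands.
            continue
        reply = handler(payload)
        if reply is not None:
            outgoing.append(reply)
    return outgoing


handlers = {
    # Placeholder metric value, not real monitoring data.
    "GetAverageMonitoring": lambda _payload: ("AverageMonitoring", {"cpu": 12.5}),
    # A pure notification: nothing is sent back.
    "ConnectionStatus": lambda _payload: None,
}

sent = run_session(
    {"node_id": "node-1"},
    [("ConnectionStatus", {"connected": True}), ("GetAverageMonitoring", {})],
    handlers,
)
```

Keeping the dispatch table separate from the transport makes it straightforward to test command handling in isolation.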
IAM Interface (outbound gRPC client)
The SM's iamclient module connects to the local IAM instance to obtain:
- TLS certificates for securing the gRPC connection to CM
- Node identity information (Node ID, Node type) used during SM registration
- Service permissions — register/unregister instance-level permissions
The IAM client automatically reconnects when certificates are renewed.
Proxy Interface (v5)
In the v5 protocol, SM also interacts with a ProxyService for content download operations. The proxy handles:
- DownloadRequest/CancelDownload: image content retrieval
- ClockSync: time synchronization between Nodes
This interface is used in multi-Node deployments where the Secondary Node's SM downloads content through the Message Proxy rather than directly from the network.
Runtime Types
The SM launcher supports multiple runtime types through a pluggable architecture:
| Runtime | Description | Use Case |
|---|---|---|
| Container | OCI-compatible container execution using systemd units (aos-service@.service) | Standard service deployment — isolated execution with cgroups, namespaces, and overlay filesystem |
| Boot | EFI boot-based deployment with partition management | Firmware-level components that require boot-time execution |
| Rootfs | Root filesystem deployment | System-level services that run directly on the host filesystem |
Each runtime reports its capabilities (DMIPS, RAM limits, OS info, architecture) to CM during registration, enabling the cloud to make informed scheduling decisions.
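A pluggable runtime architecture of this kind is commonly built around a shared interface plus a registry keyed by runtime type. The following is a minimal sketch under that assumption; the `Runtime`, `ContainerRuntime`, and `Launcher` classes, their method names, and the capability values are hypothetical and do not reflect the actual launcher API.

```python
from abc import ABC, abstractmethod


class Runtime(ABC):
    """Hypothetical plugin interface; the real launcher API may differ."""

    runtime_type = ""

    @abstractmethod
    def capabilities(self):
        """Return the capability record reported during registration."""

    @abstractmethod
    def start(self, instance_id):
        """Start one instance."""

    @abstractmethod
    def stop(self, instance_id):
        """Stop one instance."""


class ContainerRuntime(Runtime):
    runtime_type = "container"

    def __init__(self):
        self.active = set()

    def capabilities(self):
        # Placeholder values, not real measurements.
        return {"os": "linux", "arch": "x86_64"}

    def start(self, instance_id):
        self.active.add(instance_id)

    def stop(self, instance_id):
        self.active.discard(instance_id)


class Launcher:
    """Routes start requests to the runtime registered for a given type."""

    def __init__(self, runtimes):
        self._runtimes = {rt.runtime_type: rt for rt in runtimes}

    def runtime_info(self):
        # Collected once per runtime, e.g. for inclusion in registration data.
        return {name: rt.capabilities() for name, rt in self._runtimes.items()}

    def start(self, runtime_type, instance_id):
        runtime = self._runtimes.get(runtime_type)
        if runtime is None:
            raise KeyError(f"no runtime registered for type {runtime_type!r}")
        runtime.start(instance_id)


container = ContainerRuntime()
launcher = Launcher([container])
launcher.start("container", "svc-a:0")
```

Adding a boot or rootfs runtime in this sketch would mean writing another `Runtime` subclass and passing it to `Launcher`; the dispatch code stays unchanged.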
Data Persistence
The SM database (servicemanager.db) persists:
- Service image metadata — item IDs, versions, manifest digests, states, timestamps
- Instance information — full instance configuration including runtime assignment, network parameters, monitoring parameters, environment variables
- Network state — network configurations, instance-to-network mappings, VLAN assignments
- Traffic data — cumulative traffic counters per iptables chain for bandwidth monitoring
- Journal cursor — position in systemd journal for alert processing continuity across restarts
The database supports schema migration for upgrades between SM versions.
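One common way to implement schema migration on SQLite is to store the schema version in `PRAGMA user_version` and apply the missing steps in order. The sketch below illustrates that technique only. Whether the SM uses this mechanism is an assumption, and the `instances` table and its columns are invented for the example.

```python
import sqlite3

# Ordered migration steps; list position + 1 is the schema version.
# The table and columns are hypothetical, not the real SM schema.
MIGRATIONS = [
    "CREATE TABLE instances (id TEXT PRIMARY KEY, state TEXT NOT NULL)",
    "ALTER TABLE instances ADD COLUMN runtime TEXT NOT NULL DEFAULT 'container'",
]


def migrate(conn):
    """Apply every migration newer than the stored version; return the new version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS, start=1):
        if version <= current:
            continue  # already applied on a previous run
        conn.execute(statement)
        # PRAGMA does not accept bound parameters; version is a trusted int.
        conn.execute(f"PRAGMA user_version = {version}")
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


conn = sqlite3.connect(":memory:")
version = migrate(conn)
```

Calling `migrate` again on an up-to-date database applies nothing, which is what makes it safe to run on every start.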
Monitoring and Alerting
The SM provides two complementary observability mechanisms:
Resource Monitoring
The monitoring module collects metrics at configurable intervals:
- Node-level: CPU usage, RAM usage, disk partition usage, network traffic (upload/download)
- Instance-level: Per-service CPU, RAM, disk, and network usage (tracked via cgroups and iptables)
Metrics are reported to CM as both instant snapshots and time-averaged values.
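The difference between an instant snapshot and a time-averaged value can be shown with a fixed-size sliding window. This is a sketch under assumptions: the `MetricWindow` class, the window size, and the metric keys are hypothetical, and the averaging method the SM actually uses is not specified here.

```python
from collections import deque


class MetricWindow:
    """Keeps the last `size` samples and exposes instant and averaged views."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        # deque(maxlen=...) drops the oldest sample automatically.
        self._samples = deque(maxlen=size)

    def add(self, sample):
        self._samples.append(dict(sample))

    def instant(self):
        """Most recent sample."""
        if not self._samples:
            raise ValueError("no samples collected yet")
        return dict(self._samples[-1])

    def average(self):
        """Arithmetic mean of each metric over the samples in the window."""
        if not self._samples:
            raise ValueError("no samples collected yet")
        count = len(self._samples)
        keys = self._samples[0].keys()
        return {key: sum(s[key] for s in self._samples) / count for key in keys}


window = MetricWindow(size=3)
for cpu, ram in [(10, 100), (20, 200), (30, 300), (40, 400)]:
    window.add({"cpu": cpu, "ram": ram})
```

After four samples in a window of three, the first sample has been dropped, so the average covers only the last three readings while the instant view returns the fourth.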
Journal-Based Alerts
The alerts module watches the systemd journal and generates the following types of alerts:
- System quota alerts — Node resource usage exceeds configured thresholds
- Instance quota alerts — individual service exceeds its resource allocation
- Resource allocation alerts — resource assignment failures
- System alerts — critical system-level journal messages
- Core alerts — AosCore component errors (SM, CM, IAM)
- Instance alerts — service-specific error messages
Alert rules support configurable thresholds with duration-based evaluation (percentage-based for CPU/RAM, point-based for traffic).
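Duration-based evaluation means a threshold must be exceeded continuously for some time before an alert is raised, which filters out short spikes. The sketch below shows one possible reading of that idea; the `DurationRule` class, its parameters, and the raise-once behaviour are assumptions, not the SM's actual rule semantics.

```python
class DurationRule:
    """Raises once when a value stays above `threshold` for `min_duration` seconds."""

    def __init__(self, threshold, min_duration):
        self.threshold = threshold
        self.min_duration = min_duration
        self._above_since = None  # timestamp of the first sample above threshold
        self._raised = False

    def observe(self, timestamp, value):
        """Feed one sample; return True only at the moment the alert is raised."""
        if value <= self.threshold:
            # Back under the threshold: reset so a later violation can alert again.
            self._above_since = None
            self._raised = False
            return False
        if self._above_since is None:
            self._above_since = timestamp
        if not self._raised and timestamp - self._above_since >= self.min_duration:
            self._raised = True
            return True
        return False


# CPU usage in percent, sampled every 5 seconds; alert after 10 seconds above 80%.
rule = DurationRule(threshold=80.0, min_duration=10.0)
samples = [(0, 85.0), (5, 90.0), (10, 95.0), (15, 95.0), (20, 50.0), (25, 85.0)]
results = [rule.observe(ts, value) for ts, value in samples]
```

In this run the alert is raised at the third sample, once usage has stayed above the threshold for the full 10 seconds, and not repeated while the violation continues.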
Related Pages
- Architecture Overview — system-wide architecture and component relationships
- Image Manager — detailed image acquisition and storage documentation
- Launcher — service launching and runtime details
- Resource Manager — resource tracking and allocation
- Network Manager — per-service networking and CNI
- Client Communication — SM-to-CM gRPC interface details
- Key Concepts — terminology and foundational concepts
- Unit and Node Model — how Units and Nodes relate