Service Instance States
Introduction
Every service instance in AosCore has a well-defined state that reflects its current lifecycle phase. States are tracked at two levels: the Service Manager (SM) tracks runtime execution states on each Node, while the Communication Manager (CM) tracks scheduling states, driven by the desired state, at the Unit level. Understanding both state models is essential for diagnosing instance behavior, interpreting status reports, and building integrations that react to state changes.
This page documents all valid states and transitions at both levels, including the triggers that cause each transition.
State Machine Diagram
[Diagram: instance state machine]
SM-Level Instance States
The Service Manager on each Node tracks the runtime execution state of every instance it manages. These states are defined in the InstanceStateType enum and reported to CM via the InstanceStatus message in the SM gRPC protocol.
States
| State | Enum Value | Description |
|---|---|---|
| Activating | eActivating | The instance is being prepared for execution. The runtime is setting up image layers, configuring networking, applying resource limits, and starting the process. |
| Active | eActive | The instance is running normally. The runtime process is alive and the instance is being monitored for resource usage. |
| Inactive | eInactive | The instance has been stopped. This is either the initial state before first launch or the result of a graceful stop. |
| Failed | eFailed | The instance encountered an error. This can result from a launch failure, runtime crash, resource limit violation, or offline TTL expiration. An error message accompanies this state. |
Transitions
| From | To | Trigger | Description |
|---|---|---|---|
| (initial) | Inactive | Instance data created | When SM receives an UpdateInstances request with new instances to start, it first creates instance data in the Inactive state. |
| Inactive | Activating | StartInstance | SM begins the launch sequence — the runtime prepares the environment and starts the process. |
| Activating | Active | Runtime reports success | The runtime successfully started the process and reports a healthy state. |
| Activating | Failed | Start error | The runtime failed to start the instance — image layers could not be applied, networking setup failed, or the process exited immediately. |
| Active | Inactive | StopInstance (graceful) | SM receives an instruction to stop the instance (e.g., removed from desired state or being replaced by a new version). The runtime stops the process gracefully. |
| Active | Failed | Runtime crash / offline TTL expired | The runtime process exited unexpectedly, or the Node lost cloud connectivity and the instance's offline TTL expired. |
| Failed | Inactive | New UpdateInstances (restart) | When a new desired-state reconciliation cycle includes this instance, SM resets it to Inactive before attempting to start it again. |
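The transition table above can be sketched as a small validity check. This is a minimal illustration, assuming a simplified `SmState` enum that only mirrors the enum values listed on this page; `SmState` and `IsValidSmTransition` are hypothetical names and not part of the AosCore API.

```cpp
// Hypothetical stand-in for the SM-level InstanceStateType enum. The real
// definition lives in the AosCore sources and may differ.
enum class SmState { eActivating, eActive, eInactive, eFailed };

// Returns true if the (from, to) pair appears in the SM transition table.
bool IsValidSmTransition(SmState from, SmState to)
{
    switch (from) {
    case SmState::eInactive:
        // StartInstance begins the launch sequence.
        return to == SmState::eActivating;
    case SmState::eActivating:
        // The runtime reports success, or the start fails.
        return to == SmState::eActive || to == SmState::eFailed;
    case SmState::eActive:
        // Graceful stop, or runtime crash / offline TTL expiry.
        return to == SmState::eInactive || to == SmState::eFailed;
    case SmState::eFailed:
        // A new UpdateInstances cycle resets the instance before a restart.
        return to == SmState::eInactive;
    }

    return false;
}
```

For example, `IsValidSmTransition(SmState::eInactive, SmState::eActive)` returns `false`, because an instance has to pass through Activating first.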
Offline TTL Behavior
Each service instance can have an offline TTL (Time-To-Live) configured in its OCI image configuration. When the Node loses cloud connectivity:
- SM records the time the connection was lost
- A timer monitors all Active and Activating instances
- When an instance's offline TTL expires (time since disconnect exceeds the configured duration), SM stops the instance and transitions it to Failed
- If connectivity is restored before the TTL expires, the timer resets and instances continue running
This mechanism ensures that instances do not run indefinitely without cloud oversight when connectivity is lost.
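The expiry check described above can be sketched as follows. This is an illustration only: `OfflineTTLTracker` and its methods are hypothetical names, the use of `std::chrono::steady_clock` is an assumption, and the sketch does not model how SM treats an instance with no offline TTL configured.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical tracker for the connectivity state of one Node.
// Not an AosCore class; names and types are assumptions.
class OfflineTTLTracker {
public:
    // Records the time the cloud connection was lost.
    void OnDisconnected(Clock::time_point now)
    {
        mDisconnected   = true;
        mDisconnectedAt = now;
    }

    // Connectivity restored: the timer resets and instances keep running.
    void OnConnected() { mDisconnected = false; }

    // True when the Node is offline and the time since disconnect
    // exceeds the instance's configured offline TTL.
    bool IsExpired(Clock::time_point now, std::chrono::seconds offlineTTL) const
    {
        return mDisconnected && (now - mDisconnectedAt) > offlineTTL;
    }

private:
    bool              mDisconnected = false;
    Clock::time_point mDisconnectedAt {};
};
```

A periodic timer would call `IsExpired` for each Active or Activating instance with that instance's own TTL, and stop and fail the instance when it returns `true`.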
Error Reporting
When an instance enters the Failed state, the InstanceStatus message includes an ErrorInfo field with:
- An error code indicating the failure category
- A human-readable error message describing what went wrong
SM reports instance statuses to CM via the UpdateInstancesStatus and NodeInstancesStatus messages, allowing CM to aggregate status across all Nodes and forward it to the cloud.
CM-Level Instance States
The Communication Manager tracks instances from a scheduling and desired-state perspective. CM-level states are defined in a separate InstanceStateType enum within the CM launcher module and represent whether an instance is currently scheduled, temporarily unscheduled, or retained for potential future use.
States
| State | Enum Value | Description |
|---|---|---|
| Active | eActive | The instance is scheduled and assigned to a specific Node. CM has sent (or will send) an UpdateInstances request to the target Node's SM to run this instance. |
| Disabled | eDisabled | The instance exists in CM's records but is intentionally not running. This occurs when the instance's subject is disabled or no eligible Node is available to host it. |
| Cached | eCached | The instance is no longer part of the current desired state but its data (UID, GID, storage references) is retained locally. If the instance is re-added to the desired state later, CM can reuse this data instead of creating a new instance from scratch. |
Transitions
| From | To | Trigger | Description |
|---|---|---|---|
| (initial) | Active | RunInstanceRequest processed | When CM processes a new desired state that includes this instance, it creates the instance record, assigns it to a Node, and marks it Active. |
| Active | Cached | Removed from desired state | The instance is no longer in the desired state. CM stops it on the Node and retains the instance data locally in Cached state. |
| Active | Disabled | Subject disabled / no eligible Node | The instance's subject (user or group) has been disabled, or no Node with matching labels, resources, and runtime capabilities is available. CM retains the instance record but marks it Disabled rather than Cached. |
| Cached | Active | Re-added to desired state | A new desired state includes this instance again. CM reactivates it using the retained data (same UID/GID) and schedules it to a Node. |
| Disabled | Active | Subject re-enabled / Node becomes available | The blocking condition is resolved — the subject is re-enabled or a suitable Node connects. CM schedules the instance to the now-available Node. |
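The CM transition table can be expressed as an event-driven function. This is a sketch under stated assumptions: `CmState` mirrors the enum values listed above, while `CmEvent` and `NextCmState` are hypothetical names introduced here to group the triggers from the table. They are not AosCore identifiers.

```cpp
// Hypothetical stand-in for the CM-level InstanceStateType enum.
enum class CmState { eActive, eDisabled, eCached };

// Hypothetical grouping of the triggers from the transition table.
enum class CmEvent {
    eRemovedFromDesiredState, // instance no longer in the desired state
    eAddedToDesiredState,     // instance re-added to the desired state
    eBlocked,                 // subject disabled or no eligible Node
    eUnblocked,               // subject re-enabled or a suitable Node connects
};

// Returns the next CM state, or the current state unchanged when the
// table lists no transition for this (state, event) pair.
CmState NextCmState(CmState current, CmEvent event)
{
    switch (current) {
    case CmState::eActive:
        if (event == CmEvent::eRemovedFromDesiredState) return CmState::eCached;
        if (event == CmEvent::eBlocked) return CmState::eDisabled;
        break;
    case CmState::eCached:
        if (event == CmEvent::eAddedToDesiredState) return CmState::eActive;
        break;
    case CmState::eDisabled:
        if (event == CmEvent::eUnblocked) return CmState::eActive;
        break;
    }

    return current;
}
```

Returning the current state for unlisted pairs is a design choice of this sketch. The table does not say how CM handles, for example, a Disabled instance that is removed from the desired state.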
Relationship Between CM and SM States
The CM-level and SM-level states are tracked independently, but each CM state implies what SM state, if any, to expect on the target Node:
| CM State | Expected SM State on Target Node |
|---|---|
| Active | Activating → Active (normal operation) |
| Active | Failed (if runtime encountered an error) |
| Disabled | No SM state — instance is not sent to any Node |
| Cached | No SM state — instance is not sent to any Node |
When CM marks an instance as Active and sends it to a Node, the SM on that Node manages the runtime lifecycle (Inactive → Activating → Active). If the SM reports a Failed state, CM still considers the instance Active from a scheduling perspective — it remains assigned to that Node. The failure is reported to the cloud, which may trigger a new desired-state update to address the issue.
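The relationship above reduces to two simple predicates, sketched here. Both enums are simplified stand-ins that mirror the values on this page, and `IsSentToNode` and `IsScheduledButFailed` are hypothetical helper names, not AosCore functions.

```cpp
// Hypothetical stand-ins for the two InstanceStateType enums.
enum class CmState { eActive, eDisabled, eCached };
enum class SmState { eActivating, eActive, eInactive, eFailed };

// Only instances that CM marks Active are sent to a Node, so only they
// have an SM-level state at all.
bool IsSentToNode(CmState cm)
{
    return cm == CmState::eActive;
}

// True when the runtime reported a failure while CM still considers the
// instance scheduled. The instance stays assigned to its Node and the
// failure is reported to the cloud.
bool IsScheduledButFailed(CmState cm, SmState sm)
{
    return cm == CmState::eActive && sm == SmState::eFailed;
}
```

An integration could use a check like `IsScheduledButFailed` to flag instances that need a new desired-state update, without treating them as unscheduled.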
Instance Identification
Each instance is uniquely identified by a combination of fields in the InstanceIdent structure:
| Field | Description |
|---|---|
| mItemID | The Deployable Item (service) identifier |
| mSubjectID | The subject (user or group) the instance belongs to |
| mInstance | The instance index (for services with multiple instances) |
| mType | The update item type (service, component, layer, etc.) |
This composite key allows the system to track multiple instances of the same service running for different subjects or as scaled replicas.
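A composite key of this kind can be sketched as a struct with a lexicographic ordering, which makes it usable as a map key. The field names come from the table above, but the field types, the `InstanceIdentSketch` name, and the comparison operators are assumptions made for illustration. The real InstanceIdent structure may use different types, for example an enum for `mType`.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Illustrative stand-in for InstanceIdent; field types are assumed.
struct InstanceIdentSketch {
    std::string   mItemID;    // Deployable Item (service) identifier
    std::string   mSubjectID; // subject the instance belongs to
    std::uint64_t mInstance;  // instance index
    std::string   mType;      // update item type, e.g. "service"
};

// Lexicographic ordering over all four fields, so that two idents are
// equivalent as keys only when every field matches.
bool operator<(const InstanceIdentSketch& lhs, const InstanceIdentSketch& rhs)
{
    return std::tie(lhs.mItemID, lhs.mSubjectID, lhs.mInstance, lhs.mType)
        < std::tie(rhs.mItemID, rhs.mSubjectID, rhs.mInstance, rhs.mType);
}

bool operator==(const InstanceIdentSketch& lhs, const InstanceIdentSketch& rhs)
{
    return std::tie(lhs.mItemID, lhs.mSubjectID, lhs.mInstance, lhs.mType)
        == std::tie(rhs.mItemID, rhs.mSubjectID, rhs.mInstance, rhs.mType);
}

// Example of a per-instance table keyed by the composite identifier.
using InstanceTable = std::map<InstanceIdentSketch, std::string>;
```

With this ordering, two replicas of the same service for the same subject (differing only in `mInstance`) occupy separate entries in an `InstanceTable`.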
Related Pages
- Service Lifecycle — overview of the complete service lifecycle from desired state to execution
- SM Launcher — how SM launches and manages service instances using different runtimes
- Runtime Types — the available runtime types that execute instances
- Desired State Model — how the desired-state reconciliation drives instance creation and removal
- Scheduling and Placement — how CM assigns instances to Nodes
- Deployment Flows — the broader update flow that triggers instance state changes