Version: v1.1

Service Instance States

Introduction

Every service instance in AosCore has a well-defined state that reflects its current lifecycle phase. States are tracked at two levels: the Service Manager (SM) tracks runtime execution states on each Node, while the Communication Manager (CM) tracks scheduling states, derived from the desired state, at the Unit level. Understanding both state models is essential for diagnosing instance behavior, interpreting status reports, and building integrations that react to state changes.

This page documents all valid states and transitions at both levels, including the triggers that cause each transition.

State Machine Diagram

[Diagram: instance state machine]

SM-Level Instance States

The Service Manager on each Node tracks the runtime execution state of every instance it manages. These states are defined in the InstanceStateType enum and reported to CM via the InstanceStatus message in the SM gRPC protocol.

States

State | Enum Value | Description
Activating | eActivating | The instance is being prepared for execution. The runtime is setting up image layers, configuring networking, applying resource limits, and starting the process.
Active | eActive | The instance is running normally. The runtime process is alive and the instance is being monitored for resource usage.
Inactive | eInactive | The instance has been stopped. This is either the initial state before first launch or the result of a graceful stop.
Failed | eFailed | The instance encountered an error. This can result from a launch failure, runtime crash, resource limit violation, or offline TTL expiration. An error message accompanies this state.

Transitions

From | To | Trigger | Description
(initial) | Inactive | Instance data created | When SM receives an UpdateInstances request with new instances to start, it first creates instance data in the Inactive state.
Inactive | Activating | StartInstance | SM begins the launch sequence: the runtime prepares the environment and starts the process.
Activating | Active | Runtime reports success | The runtime successfully started the process and reports a healthy state.
Activating | Failed | Start error | The runtime failed to start the instance: image layers could not be applied, networking setup failed, or the process exited immediately.
Active | Inactive | StopInstance (graceful) | SM receives an instruction to stop the instance (e.g., removed from desired state or being replaced by a new version). The runtime stops the process gracefully.
Active | Failed | Runtime crash / offline TTL expired | The runtime process exited unexpectedly, or the Node lost cloud connectivity and the instance's offline TTL expired.
Failed | Inactive | New UpdateInstances (restart) | When a new desired-state reconciliation cycle includes this instance, SM resets it to Inactive before attempting to start it again.
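
The transition table above can be read as a small validity check. The following is a minimal sketch, not AosCore code: the SMState enum and the IsValidSMTransition function are hypothetical names that only mirror the states and transitions listed on this page.

```cpp
// Hypothetical mirror of the SM-level states described above.
// The real InstanceStateType enum may be declared differently.
enum class SMState { eActivating, eActive, eInactive, eFailed };

// Returns true only for the transitions listed in the SM transition table.
constexpr bool IsValidSMTransition(SMState from, SMState to)
{
    switch (from) {
    case SMState::eInactive:
        // Inactive instances can only begin the launch sequence.
        return to == SMState::eActivating;
    case SMState::eActivating:
        // A launch either succeeds or fails.
        return to == SMState::eActive || to == SMState::eFailed;
    case SMState::eActive:
        // A running instance is stopped gracefully or fails.
        return to == SMState::eInactive || to == SMState::eFailed;
    case SMState::eFailed:
        // A failed instance is reset to Inactive before a restart.
        return to == SMState::eInactive;
    }
    return false;
}
```

Under this reading, a Failed instance never moves directly to Active: it is first reset to Inactive and then goes through Activating again.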

Offline TTL Behavior

Each service instance can have an offline TTL (Time-To-Live) configured in its OCI image configuration. When the Node loses cloud connectivity:

  1. SM records the time the connection was lost.
  2. A timer monitors all Active and Activating instances.
  3. When an instance's offline TTL expires (the time since disconnect exceeds the configured duration), SM stops the instance and transitions it to Failed.
  4. If connectivity is restored before the TTL expires, the timer resets and instances continue running.

This mechanism ensures that instances do not run indefinitely without cloud oversight when connectivity is lost.
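
The expiry condition in step 3 can be sketched as a single comparison. This is an illustration under stated assumptions, not the SM implementation: IsOfflineTTLExpired is a hypothetical helper, and the choice of std::chrono::steady_clock and of seconds as the TTL unit is an assumption of this sketch.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: returns true once the time since the connection was
// lost exceeds the configured offline TTL for an instance.
inline bool IsOfflineTTLExpired(
    Clock::time_point disconnectedAt, Clock::time_point now, std::chrono::seconds offlineTTL)
{
    // "Exceeds" is read strictly here: an elapsed time equal to the TTL
    // does not count as expired.
    return (now - disconnectedAt) > offlineTTL;
}
```

A periodic timer could evaluate this for every Active and Activating instance while the Node is disconnected, and stop evaluating it once connectivity is restored.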

Error Reporting

When an instance enters the Failed state, the InstanceStatus message includes an ErrorInfo field with:

  • An error code indicating the failure category
  • A human-readable error message describing what went wrong

SM reports instance statuses to CM via the UpdateInstancesStatus and NodeInstancesStatus messages, allowing CM to aggregate status across all Nodes and forward it to the cloud.
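
An integration that consumes status reports might turn a failed status into a log line as sketched below. The struct layouts and field names (code, message, instanceName, error) are simplified assumptions for illustration; the actual InstanceStatus and ErrorInfo messages are defined by the SM gRPC protocol and may differ.

```cpp
#include <optional>
#include <string>

// Hypothetical mirror of the SM-level states described on this page.
enum class SMState { eActivating, eActive, eInactive, eFailed };

// Simplified stand-in for the error details: a category code plus a message.
struct ErrorInfo {
    int         code;
    std::string message;
};

// Simplified stand-in for a status report of one instance.
struct InstanceStatus {
    std::string              instanceName;
    SMState                  state;
    std::optional<ErrorInfo> error; // expected to be set when state is eFailed
};

// Builds a short human-readable summary of a status report.
inline std::string DescribeStatus(const InstanceStatus& status)
{
    if (status.state == SMState::eFailed && status.error.has_value()) {
        return status.instanceName + ": failed (code " + std::to_string(status.error->code)
            + "): " + status.error->message;
    }

    return status.instanceName + ": no error reported";
}
```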

CM-Level Instance States

The Communication Manager tracks instances from a scheduling and desired-state perspective. CM-level states are defined in a separate InstanceStateType enum within the CM launcher module and represent whether an instance is currently scheduled, temporarily unscheduled, or retained for potential future use.

States

State | Enum Value | Description
Active | eActive | The instance is scheduled and assigned to a specific Node. CM has sent (or will send) an UpdateInstances request to the target Node's SM to run this instance.
Disabled | eDisabled | The instance exists in CM's records but is intentionally not running. This occurs when the instance's subject is disabled or no eligible Node is available to host it.
Cached | eCached | The instance is no longer part of the current desired state but its data (UID, GID, storage references) is retained locally. If the instance is re-added to the desired state later, CM can reuse this data instead of creating a new instance from scratch.

Transitions

From | To | Trigger | Description
(initial) | Active | RunInstanceRequest processed | When CM processes a new desired state that includes this instance, it creates the instance record, assigns it to a Node, and marks it Active.
Active | Cached | Removed from desired state | The instance is no longer in the desired state. CM stops it on the Node and retains the instance data locally in Cached state.
Active | Disabled | Subject disabled / no eligible Node | The instance's subject (user or group) has been disabled, or no Node with matching labels, resources, and runtime capabilities is available. CM caches the instance but marks it Disabled.
Cached | Active | Re-added to desired state | A new desired state includes this instance again. CM reactivates it using the retained data (same UID/GID) and schedules it to a Node.
Disabled | Active | Subject re-enabled / Node becomes available | The blocking condition is resolved: the subject is re-enabled or a suitable Node connects. CM schedules the instance to the now-available Node.
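
The CM transitions above are event-driven, which the following sketch tries to make explicit. CMState, CMEvent, and NextCMState are hypothetical names introduced for illustration; they group the triggers from the table into four events and are not taken from the CM launcher module.

```cpp
#include <optional>

// Hypothetical mirror of the CM-level states described above.
enum class CMState { eActive, eDisabled, eCached };

// Hypothetical events that summarize the triggers in the transition table.
enum class CMEvent {
    eRemovedFromDesiredState, // instance no longer in the desired state
    eBlocked,                 // subject disabled or no eligible Node
    eReAddedToDesiredState,   // instance appears in a new desired state
    eUnblocked                // subject re-enabled or a suitable Node connects
};

// Returns the next state, or an empty optional if the table lists no
// transition for this state/event pair.
inline std::optional<CMState> NextCMState(CMState current, CMEvent event)
{
    if (current == CMState::eActive && event == CMEvent::eRemovedFromDesiredState) {
        return CMState::eCached;
    }
    if (current == CMState::eActive && event == CMEvent::eBlocked) {
        return CMState::eDisabled;
    }
    if (current == CMState::eCached && event == CMEvent::eReAddedToDesiredState) {
        return CMState::eActive;
    }
    if (current == CMState::eDisabled && event == CMEvent::eUnblocked) {
        return CMState::eActive;
    }

    return std::nullopt;
}
```

Returning an empty optional for unlisted pairs keeps the sketch faithful to the table: it does not guess at transitions the page does not document, such as Cached to Disabled.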

Relationship Between CM and SM States

The CM-level and SM-level states operate independently but are related:

CM State | Expected SM State on Target Node
Active | Activating → Active (normal operation)
Active | Failed (if the runtime encountered an error)
Disabled | No SM state; the instance is not sent to any Node
Cached | No SM state; the instance is not sent to any Node

When CM marks an instance as Active and sends it to a Node, the SM on that Node manages the runtime lifecycle (Inactive → Activating → Active). If the SM reports a Failed state, CM still considers the instance Active from a scheduling perspective — it remains assigned to that Node. The failure is reported to the cloud, which may trigger a new desired-state update to address the issue.
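
A consumer that sees both levels might combine them into a single view, as in the sketch below. CombinedView and Combine are hypothetical and exist only in this example; the point is that a Failed SM state does not change the CM state, so the instance is reported as failed but still scheduled.

```cpp
#include <optional>

// Hypothetical mirrors of the two state models described on this page.
enum class SMState { eActivating, eActive, eInactive, eFailed };
enum class CMState { eActive, eDisabled, eCached };

// Hypothetical combined view used only in this sketch.
enum class CombinedView { eNotDeployed, eStarting, eRunning, eFailedButScheduled };

// smState is empty when no SM has reported a status for the instance.
inline CombinedView Combine(CMState cmState, std::optional<SMState> smState)
{
    if (cmState != CMState::eActive) {
        // Disabled and Cached instances are not sent to any Node.
        return CombinedView::eNotDeployed;
    }
    if (!smState.has_value()) {
        // Scheduled by CM, but the target SM has not reported yet.
        return CombinedView::eStarting;
    }

    switch (*smState) {
    case SMState::eActive:
        return CombinedView::eRunning;
    case SMState::eFailed:
        // CM still considers the instance scheduled on that Node.
        return CombinedView::eFailedButScheduled;
    default:
        // Inactive or Activating: the launch sequence is in progress.
        return CombinedView::eStarting;
    }
}
```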

Instance Identification

Each instance is uniquely identified by a combination of fields in the InstanceIdent structure:

Field | Description
mItemID | The Deployable Item (service) identifier
mSubjectID | The subject (user or group) the instance belongs to
mInstance | The instance index (for services with multiple instances)
mType | The update item type (service, component, layer, etc.)

This composite key allows the system to track multiple instances of the same service running for different subjects or as scaled replicas.
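
To illustrate how such a composite key behaves, the sketch below compares all four fields together. The field names come from the table above, but the field types (std::string and uint64_t) and the comparison operators are assumptions of this example; the real InstanceIdent structure may use different types, for instance an enum for mType.

```cpp
#include <cstdint>
#include <string>
#include <tuple>

// Simplified stand-in for InstanceIdent; field types are assumed.
struct InstanceIdent {
    std::string mItemID;    // Deployable Item (service) identifier
    std::string mSubjectID; // subject the instance belongs to
    uint64_t    mInstance;  // instance index
    std::string mType;      // update item type, e.g. "service"
};

// Two identifiers are equal only if every field matches.
inline bool operator==(const InstanceIdent& lhs, const InstanceIdent& rhs)
{
    return std::tie(lhs.mItemID, lhs.mSubjectID, lhs.mInstance, lhs.mType)
        == std::tie(rhs.mItemID, rhs.mSubjectID, rhs.mInstance, rhs.mType);
}

// Lexicographic ordering over all fields, so the struct can be used as a
// key in ordered containers such as std::map.
inline bool operator<(const InstanceIdent& lhs, const InstanceIdent& rhs)
{
    return std::tie(lhs.mItemID, lhs.mSubjectID, lhs.mInstance, lhs.mType)
        < std::tie(rhs.mItemID, rhs.mSubjectID, rhs.mInstance, rhs.mType);
}
```

With this ordering, two replicas of the same service for the same subject differ only in mInstance, and the same service running for two subjects differs only in mSubjectID, so each gets its own entry in a map keyed by InstanceIdent.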

  • Service Lifecycle — overview of the complete service lifecycle from desired state to execution
  • SM Launcher — how SM launches and manages service instances using different runtimes
  • Runtime Types — the available runtime types that execute instances
  • Desired State Model — how the desired-state reconciliation drives instance creation and removal
  • Deployment Flows — the broader update flow that triggers instance state changes