Version: v1.1

Node Lifecycle

Introduction

Every Node in an AosEdge Unit follows a well-defined lifecycle from initial power-on through provisioning, normal operation, and potential removal. The lifecycle is governed by a state machine with four states, managed by the IAM component on each Node and coordinated through the Main Node's IAM server.

Understanding the Node lifecycle is essential for:

  • Configuring multi-Node Units where Nodes join and leave dynamically
  • Diagnosing Node connectivity and state issues
  • Integrating with the cloud-side pause/resume and deprovisioning operations
  • Understanding which AosCore components are active in each state

Node Lifecycle State Machine

[Diagram: Node lifecycle state machine]

Node States

The Node state is defined by the NodeStateType enum in the AosCore common types. Each Node maintains its current state in a provisioning status file that persists across restarts.

| State | Description | Active Components |
|---|---|---|
| Unprovisioned | Initial state before the Node has been provisioned. The Node has no certificates and cannot participate in service deployment. | IAM only |
| Provisioned | Fully operational state. The Node has valid certificates, is registered with the Main Node, and can run services. | IAM, SM, CM (Main Node only) |
| Paused | Suspended state triggered by the cloud. The Node remains registered but does not accept new service deployments or updates. Existing services are stopped. | IAM only |
| Error | The Node encountered an unrecoverable failure. An error message accompanies this state describing the failure. Requires operator intervention or deprovisioning. | IAM only |
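
As a rough illustration of the table above, the states and their active components can be modeled as shown below. The Python names (`NodeState`, `active_components`) are hypothetical and only mirror the NodeStateType enum conceptually; they are not the AosCore definitions.

```python
from enum import Enum


class NodeState(Enum):
    """Illustrative mirror of the four lifecycle states (not the AosCore enum)."""
    UNPROVISIONED = "unprovisioned"
    PROVISIONED = "provisioned"
    PAUSED = "paused"
    ERROR = "error"


def active_components(state: NodeState, is_main_node: bool) -> set:
    """Return the components active in a given state, per the table above."""
    if state is NodeState.PROVISIONED:
        components = {"IAM", "SM"}
        if is_main_node:
            components.add("CM")  # CM runs on the Main Node only
        return components
    # Unprovisioned, paused, and error Nodes run IAM only.
    return {"IAM"}
```

For example, `active_components(NodeState.PAUSED, is_main_node=False)` yields only IAM, which matches the Paused row.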

State Transitions

Unprovisioned → Provisioned

Trigger: FinishProvisioning request completes successfully.

During provisioning, the Node's IAM:

  1. Receives a StartProvisioning request (forwarded from the Main Node if this is a Secondary Node)
  2. Sets the owner for each certificate type
  3. Creates cryptographic keys (via PKCS#11 if hardware security is configured)
  4. Applies certificates issued by the provisioning authority
  5. Receives a FinishProvisioning request
  6. Encrypts the disk (if configured)
  7. Restarts IAM with the new certificates
  8. Transitions state to provisioned and reports the updated NodeInfo

Once provisioned, the Node starts its full component set (SM registers with CM, services can be deployed).
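
The ordering of the provisioning steps above can be sketched as a simple plan builder. This is only an illustration of the sequence: `provisioning_plan` is a hypothetical helper, the certificate type names are placeholders, and modeling steps 2 to 4 once per certificate type is an assumption of this sketch.

```python
def provisioning_plan(cert_types, encrypt_disk):
    """Return the ordered provisioning steps as human-readable strings."""
    plan = ["receive StartProvisioning request"]
    # Steps 2-4 are modeled once per certificate type (an assumption of this sketch).
    for action in ("set owner for", "create key for", "apply certificate for"):
        plan.extend(f"{action} {cert_type}" for cert_type in cert_types)
    plan.append("receive FinishProvisioning request")
    if encrypt_disk:
        # Disk encryption only happens when it is configured.
        plan.append("encrypt disk")
    plan.append("restart IAM with new certificates")
    plan.append("set state to provisioned and report NodeInfo")
    return plan


if __name__ == "__main__":
    # "type-a" and "type-b" are placeholder certificate type names.
    for step in provisioning_plan(["type-a", "type-b"], encrypt_disk=True):
        print(step)
```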

Provisioned → Paused

Trigger: PauseNode request received from the cloud via the Main Node's IAM.

The pause operation is initiated by the cloud and forwarded through the Main Node's IAMNodesService:

  1. The cloud sends a pause command to the Main Node's CM
  2. CM calls PauseNode on the IAMNodesService gRPC endpoint of the Main Node's IAM
  3. The Main Node's IAM forwards the request through the RegisterNode bidirectional stream to the target Node's IAM client
  4. The target Node's IAM client verifies the Node is currently in provisioned state
  5. The IAM client transitions the local state to paused
  6. The updated NodeInfo is sent back through the stream to the Main Node
  7. The Main Node's Node Manager updates its cache and notifies listeners
  8. CM reports the updated Node status to the cloud

Precondition: The Node must be in provisioned state. If the Node is in any other state, the pause request is rejected with an error.

Paused → Provisioned

Trigger: ResumeNode request received from the cloud via the Main Node's IAM.

The resume flow mirrors the pause flow:

  1. The cloud sends a resume command to the Main Node's CM
  2. CM calls ResumeNode on the IAMNodesService gRPC endpoint of the Main Node's IAM
  3. The Main Node's IAM forwards the request through the RegisterNode stream
  4. The target Node's IAM client verifies the Node is currently in paused state
  5. The IAM client transitions the local state to provisioned
  6. The updated NodeInfo is sent back through the stream
  7. The Node resumes normal operation (SM re-registers with CM, services can be deployed)

Precondition: The Node must be in paused state. If the Node is in any other state, the resume request is rejected with an error.
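
The pause and resume preconditions can be sketched as state checks on the target Node's IAM client. The class and exception names below are hypothetical, and the real client also sends the updated NodeInfo back through the stream, which this sketch omits.

```python
from enum import Enum


class NodeState(Enum):
    UNPROVISIONED = "unprovisioned"
    PROVISIONED = "provisioned"
    PAUSED = "paused"
    ERROR = "error"


class InvalidStateError(Exception):
    """Raised when a request arrives in a state that does not allow it."""


class IamClientSketch:
    """Hypothetical stand-in for the target Node's IAM client."""

    def __init__(self, state: NodeState):
        self.state = state

    def pause(self) -> NodeState:
        # PauseNode is only valid from the provisioned state.
        if self.state is not NodeState.PROVISIONED:
            raise InvalidStateError(f"cannot pause from {self.state.value}")
        self.state = NodeState.PAUSED
        return self.state

    def resume(self) -> NodeState:
        # ResumeNode is only valid from the paused state.
        if self.state is not NodeState.PAUSED:
            raise InvalidStateError(f"cannot resume from {self.state.value}")
        self.state = NodeState.PROVISIONED
        return self.state
```

Calling `resume()` on a Node that is already provisioned raises `InvalidStateError`, which corresponds to the request being rejected with an error.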

Provisioned → Error

Trigger: An unrecoverable failure occurs during Node operation.

The error state is entered when the Node encounters a critical failure that prevents normal operation. The error is recorded with a descriptive message in the NodeInfo.error field. Common causes include:

  • Certificate expiration or corruption that cannot be auto-renewed
  • Storage failures that prevent state persistence
  • Hardware security module (HSM/TPM) failures

The error state is reported to the Main Node through the RegisterNode stream and subsequently to the cloud.

Provisioned → Unprovisioned

Trigger: Deprovision request completes successfully.

Deprovisioning removes the Node's certificates and identity, returning it to the initial state:

  1. The cloud sends a deprovision command to the Main Node's CM
  2. CM forwards the request to the Main Node's IAM
  3. The Main Node's IAM forwards the Deprovision request through the RegisterNode stream to the target Node
  4. The target Node's IAM client processes the deprovisioning locally
  5. The Node transitions to unprovisioned state
  6. The updated NodeInfo is reported back to the Main Node

Note: Currently, only Secondary Nodes can be deprovisioned through this flow. Deprovisioning the Main Node requires direct access to the provisioning tool.

When a Node transitions to unprovisioned, the Node Manager removes its persistent storage entry (since unprovisioned Nodes are not persisted).
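
The persistence rule above can be sketched as follows. `NodeStoreSketch` is a hypothetical name, and a dictionary stands in for whatever persistent storage the Node Manager actually uses.

```python
class NodeStoreSketch:
    """Hypothetical persistent store keyed by node ID (a dict stands in for storage)."""

    def __init__(self):
        self.persisted = {}

    def update(self, node_id: str, state: str) -> None:
        if state == "unprovisioned":
            # Unprovisioned Nodes are not persisted, so drop any existing entry.
            self.persisted.pop(node_id, None)
        else:
            self.persisted[node_id] = state
```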

Error → Unprovisioned

Trigger: Deprovision request on a Node in error state.

A Node in error state can be deprovisioned to reset it to the initial state, allowing re-provisioning. This follows the same deprovisioning flow as above.
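
The transitions described in this section can be summarized as a lookup table. This is a minimal sketch that covers only the transitions documented here; the names `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical, and the actual implementation may allow transitions this page does not list.

```python
# Maps (current state, target state) to the trigger described in this section.
ALLOWED_TRANSITIONS = {
    ("unprovisioned", "provisioned"): "FinishProvisioning",
    ("provisioned", "paused"): "PauseNode",
    ("paused", "provisioned"): "ResumeNode",
    ("provisioned", "error"): "unrecoverable failure",
    ("provisioned", "unprovisioned"): "Deprovision",
    ("error", "unprovisioned"): "Deprovision",
}


def can_transition(current: str, target: str) -> bool:
    """Return True if this section documents a transition from current to target."""
    return (current, target) in ALLOWED_TRANSITIONS


def trigger_for(current: str, target: str) -> str:
    """Return the documented trigger, or raise ValueError if none is documented."""
    try:
        return ALLOWED_TRANSITIONS[(current, target)]
    except KeyError:
        raise ValueError(f"no documented transition {current} -> {target}") from None
```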

Node Registration

Nodes register with the Unit through the IAM RegisterNode bidirectional streaming RPC. This stream is the primary communication channel between Secondary Nodes and the Main Node's IAM.

Registration Flow

  1. The Secondary Node's IAM client establishes a RegisterNode bidirectional stream to the Main Node's IAMPublicNodesService
  2. Upon connection, the IAM client sends the local NodeInfo as the first message on the stream
  3. The Main Node's Node Controller receives the NodeInfo, extracts the node_id, and links the stream handler to that Node ID
  4. The Main Node's Node Manager updates its cache with the received Node information
  5. CM is notified of the Node change and reports the updated Unit status to the cloud
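
The Main Node side of the registration flow can be sketched as below. The class and method names are hypothetical, NodeInfo is represented as a plain dictionary, and the listener callback stands in for the notification that reaches CM.

```python
class NodeManagerSketch:
    """Hypothetical cache of Node information with change listeners."""

    def __init__(self):
        self.cache = {}
        self.listeners = []

    def update(self, node_info: dict) -> None:
        self.cache[node_info["node_id"]] = node_info
        # Notify listeners (for example CM) about the change.
        for listener in self.listeners:
            listener(node_info)


class NodeControllerSketch:
    """Hypothetical controller that links stream handlers to Node IDs."""

    def __init__(self, node_manager: NodeManagerSketch):
        self.node_manager = node_manager
        self.handlers = {}

    def on_stream_opened(self, handler, first_message: dict) -> str:
        # The first message on the stream is the Node's NodeInfo.
        node_id = first_message["node_id"]
        self.handlers[node_id] = handler
        self.node_manager.update(first_message)
        return node_id
```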

Stream Messages

The RegisterNode stream carries bidirectional messages:

Outgoing (Secondary → Main):

| Message | Purpose |
|---|---|
| NodeInfo | Reports current Node identity, state, and capabilities |
| StartProvisioningResponse | Result of a provisioning start operation |
| FinishProvisioningResponse | Result of a provisioning finish operation |
| DeprovisionResponse | Result of a deprovisioning operation |
| PauseNodeResponse | Result of a pause operation |
| ResumeNodeResponse | Result of a resume operation |
| CreateKeyResponse | Result of a key creation operation |
| ApplyCertResponse | Result of a certificate application |
| CertTypes | Available certificate types on this Node |

Incoming (Main → Secondary):

| Message | Purpose |
|---|---|
| StartProvisioningRequest | Begin provisioning on this Node |
| FinishProvisioningRequest | Complete provisioning on this Node |
| DeprovisionRequest | Remove provisioning from this Node |
| PauseNodeRequest | Suspend this Node |
| ResumeNodeRequest | Resume this Node |
| CreateKeyRequest | Create a cryptographic key |
| ApplyCertRequest | Apply a certificate |
| GetCertTypesRequest | Query available certificate types |
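
The pairing between incoming requests and outgoing responses can be sketched as a lookup. The pairing is inferred from the message names in the two tables above (including GetCertTypesRequest being answered by CertTypes) and is not taken from the protocol definition; `expected_response` is a hypothetical helper.

```python
# Request-to-response pairing inferred from the message names (an assumption).
RESPONSE_FOR_REQUEST = {
    "StartProvisioningRequest": "StartProvisioningResponse",
    "FinishProvisioningRequest": "FinishProvisioningResponse",
    "DeprovisionRequest": "DeprovisionResponse",
    "PauseNodeRequest": "PauseNodeResponse",
    "ResumeNodeRequest": "ResumeNodeResponse",
    "CreateKeyRequest": "CreateKeyResponse",
    "ApplyCertRequest": "ApplyCertResponse",
    "GetCertTypesRequest": "CertTypes",
}


def expected_response(request_type: str) -> str:
    """Return the response message name expected for an incoming request."""
    try:
        return RESPONSE_FOR_REQUEST[request_type]
    except KeyError:
        raise ValueError(f"unknown request type: {request_type}") from None
```

NodeInfo has no matching request in this mapping because the Secondary Node sends it on its own, first when the stream opens and again after each state change.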

Connection State

In addition to the provisioning state (unprovisioned/provisioned/paused/error), each Node has an orthogonal connection state (is_connected) that indicates whether the Node is currently reachable by the Main Node.

Key characteristics of the connection state:

  • Not persisted — connection state is runtime-only and resets to false on restart
  • Independent of provisioning state — a Node can be provisioned but disconnected, or unprovisioned but connected
  • Set automatically — the IAM client sets connected = true when the RegisterNode stream is established, and connected = false when the stream is lost
  • Reported to cloud — the Main Node includes connection state in the Unit status reported to the cloud

Disconnection Handling

When a Secondary Node loses its connection to the Main Node (network failure, Node shutdown, process crash):

  1. The gRPC stream terminates
  2. The Main Node's Node Controller detects the stream closure
  3. The Node Manager sets is_connected = false for that Node
  4. CM is notified and reports the disconnected status to the cloud
  5. The Secondary Node's IAM client detects the disconnection and sets its local connected state to false
  6. The IAM client attempts to re-establish the stream (gRPC reconnection with backoff)

When the connection is re-established:

  1. The IAM client opens a new RegisterNode stream
  2. The current NodeInfo is sent as the first message
  3. The Main Node updates its cache and notifies listeners
  4. Normal bidirectional communication resumes
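
Reconnection with backoff can be illustrated with a small retry loop. This is a generic exponential backoff sketch, not the gRPC reconnection mechanism itself; the function names and the default delay values are assumptions chosen for the example.

```python
def backoff_delays(initial=1.0, factor=2.0, maximum=30.0):
    """Yield an endless sequence of exponentially growing delays, capped at maximum."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)


def reconnect(connect, delays, sleep, max_attempts):
    """Call connect() until it returns True or max_attempts is reached.

    connect: callable returning True once the stream is re-established.
    sleep: callable taking a delay in seconds (time.sleep in real use).
    """
    for _attempt, delay in zip(range(max_attempts), delays):
        if connect():
            return True
        sleep(delay)  # wait before the next attempt
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in practice `time.sleep` would be supplied.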

Impact on Operations

| Node State | Connected | Behavior |
|---|---|---|
| Provisioned | Yes | Fully operational: services run, updates accepted |
| Provisioned | No | Services continue running locally; no new deployments or state changes from cloud |
| Paused | Yes | Registered but suspended: no services, awaiting resume |
| Paused | No | Suspended and unreachable: resume command queued until reconnection |
| Unprovisioned | Yes | Awaiting provisioning: only IAM stream active |
| Unprovisioned | No | Isolated: will attempt reconnection |
| Error | Yes | Error reported to cloud: awaiting intervention |
| Error | No | Error state persisted locally: will report when reconnected |
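
One consequence of the table above is that new deployments are accepted only when a Node is both provisioned and connected. A minimal sketch, with a hypothetical function name and plain strings for the states:

```python
def accepts_deployments(state: str, is_connected: bool) -> bool:
    """Return True only for the provisioned-and-connected row of the table above."""
    return state == "provisioned" and is_connected
```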

Main Node vs Secondary Node

The lifecycle model applies to all Nodes, but the Main Node has special characteristics:

| Aspect | Main Node | Secondary Node |
|---|---|---|
| Registration | Does not register via stream (is the registration target) | Registers via RegisterNode stream |
| Connection state | Always considered connected (local) | Tracked via stream liveness |
| Provisioning | Provisioned directly by aos-prov tool | Provisioned via forwarded requests through Main Node |
| Deprovisioning | Requires direct tool access | Can be deprovisioned via cloud command |
| Identification | Has MainNode attribute in attrs[] | Does not have MainNode attribute |
| Components | Runs CM, SM, IAM (and optionally MP) | Runs SM, IAM |