Node Lifecycle
Introduction
Every Node in an AosEdge Unit follows a well-defined lifecycle from initial power-on through provisioning, normal operation, and potential removal. The lifecycle is governed by a state machine with four states, managed by the IAM component on each Node and coordinated through the Main Node's IAM server.
Understanding the Node lifecycle is essential for:
- Configuring multi-Node Units where Nodes join and leave dynamically
- Diagnosing Node connectivity and state issues
- Integrating with the cloud-side pause/resume and deprovisioning operations
- Understanding which AosCore components are active in each state
Node Lifecycle State Machine
Node States
The Node state is defined by the NodeStateType enum in the AosCore common types. Each Node maintains its current state
in a provisioning status file that persists across restarts.
| State | Description | Active Components |
|---|---|---|
| Unprovisioned | Initial state before the Node has been provisioned. The Node has no certificates and cannot participate in service deployment. | IAM only |
| Provisioned | Fully operational state. The Node has valid certificates, is registered with the Main Node, and can run services. | IAM, SM; CM on the Main Node only |
| Paused | Suspended state triggered by the cloud. The Node remains registered but does not accept new service deployments or updates. Existing services are stopped. | IAM only |
| Error | The Node encountered an unrecoverable failure. An error message accompanies this state describing the failure. Requires operator intervention or deprovisioning. | IAM only |
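The state set and its persistence can be sketched in Python. This is a minimal illustration, not AosCore code: NodeState only mirrors the four NodeStateType values, and the JSON layout and file handling of the status file are assumptions, since the real provisioning status file format is not described here.

```python
import json
import os
import tempfile
from enum import Enum


class NodeState(str, Enum):
    """Hypothetical Python mirror of the four NodeStateType values."""
    UNPROVISIONED = "unprovisioned"
    PROVISIONED = "provisioned"
    PAUSED = "paused"
    ERROR = "error"


def active_components(state: NodeState, is_main_node: bool) -> set:
    """Components expected to run in a given state, following the table above."""
    if state is NodeState.PROVISIONED:
        return {"IAM", "SM", "CM"} if is_main_node else {"IAM", "SM"}
    return {"IAM"}  # Unprovisioned, Paused and Error run IAM only


def save_state(path: str, state: NodeState, error: str = "") -> None:
    """Persist state and error message (assumed JSON layout) via write-then-rename."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump({"state": state.value, "error": error}, f)
    os.replace(tmp_path, path)  # atomic when both paths are on the same filesystem


def load_state(path: str) -> tuple:
    """Read the persisted state; a missing file is treated as never provisioned."""
    try:
        with open(path) as f:
            data = json.load(f)
    except FileNotFoundError:
        return NodeState.UNPROVISIONED, ""
    return NodeState(data["state"]), data.get("error", "")
```

Writing to a temporary file and renaming it over the old one means a crash mid-write leaves the previous state intact, which matters for a file that must survive restarts.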
State Transitions
Unprovisioned → Provisioned
Trigger: FinishProvisioning request completes successfully.
During provisioning, the Node's IAM:
- Receives a StartProvisioning request (forwarded from the Main Node if this is a Secondary Node)
- Sets the owner for each certificate type
- Creates cryptographic keys (via PKCS#11 if hardware security is configured)
- Applies certificates issued by the provisioning authority
- Receives a FinishProvisioning request
- Encrypts the disk (if configured)
- Restarts IAM with the new certificates
- Transitions state to provisioned and reports the updated NodeInfo
Once provisioned, the Node starts its full component set (SM registers with CM, services can be deployed).
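The ordering constraints of these steps can be illustrated with a small Python sketch. ProvisioningSession, its method names, and the action log are hypothetical; the check that every certificate type has an applied certificate before FinishProvisioning is an assumption of this sketch rather than documented behavior, and disk encryption and the IAM restart are only recorded, not performed.

```python
class ProvisioningError(Exception):
    pass


class ProvisioningSession:
    """Hypothetical Node-side bookkeeping for one provisioning run."""

    def __init__(self, cert_types, encrypt_disk=False):
        self.cert_types = list(cert_types)
        self.encrypt_disk = encrypt_disk
        self.state = "unprovisioned"
        self.started = False
        self.keys = set()    # cert types that have a key (PKCS#11 on real hardware)
        self.certs = set()   # cert types with an applied certificate
        self.actions = []    # ordered log of side effects, for illustration only

    def start_provisioning(self):
        if self.state != "unprovisioned" or self.started:
            raise ProvisioningError("node is not awaiting provisioning")
        self.started = True
        for cert_type in self.cert_types:
            self.actions.append(f"set-owner:{cert_type}")

    def create_key(self, cert_type):
        self._require(cert_type)
        self.keys.add(cert_type)
        self.actions.append(f"create-key:{cert_type}")

    def apply_cert(self, cert_type):
        self._require(cert_type)
        if cert_type not in self.keys:
            raise ProvisioningError(f"no key created for {cert_type}")
        self.certs.add(cert_type)
        self.actions.append(f"apply-cert:{cert_type}")

    def finish_provisioning(self):
        if not self.started:
            raise ProvisioningError("StartProvisioning was not received")
        missing = sorted(set(self.cert_types) - self.certs)
        if missing:  # assumption: every cert type must be applied first
            raise ProvisioningError(f"certificates missing: {missing}")
        if self.encrypt_disk:
            self.actions.append("encrypt-disk")
        self.actions.append("restart-iam")
        self.started = False
        self.state = "provisioned"
        return {"state": self.state}  # stands in for the updated NodeInfo

    def _require(self, cert_type):
        if not self.started:
            raise ProvisioningError("StartProvisioning was not received")
        if cert_type not in self.cert_types:
            raise ProvisioningError(f"unknown cert type {cert_type}")
```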
Provisioned → Paused
Trigger: PauseNode request received from the cloud via the Main Node's IAM.
The pause operation is initiated by the cloud and forwarded through the Main Node's IAMNodesService:
- The cloud sends a pause command to the Main Node's CM
- CM calls PauseNode on the IAM's IAMNodesService gRPC endpoint
- The Main Node's IAM forwards the request through the RegisterNode bidirectional stream to the target Node's IAM client
- The target Node's IAM client verifies the Node is currently in provisioned state
- The IAM client transitions the local state to paused
- The updated NodeInfo is sent back through the stream to the Main Node
- The Main Node's Node Manager updates its cache and notifies listeners
- CM reports the updated Node status to the cloud
Precondition: The Node must be in provisioned state. If the Node is in any other state, the pause request is
rejected with an error.
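On the Main Node, forwarding amounts to looking up the target Node's RegisterNode stream by node_id and waiting for its answer. This sketch is hypothetical: NodeStreams, pause_node, the dictionary message shape, and the 10-second timeout are illustrative assumptions, and the real IAMNodesService is a gRPC service rather than Python coroutines.

```python
import asyncio


class NodeNotConnected(Exception):
    pass


class NodeStreams:
    """Hypothetical Main Node registry: node_id -> sender for that Node's RegisterNode stream."""

    def __init__(self):
        self._senders = {}

    def link(self, node_id, send):
        # send: coroutine function that writes a request and awaits the Node's response
        self._senders[node_id] = send

    def unlink(self, node_id):
        self._senders.pop(node_id, None)

    async def forward(self, node_id, request, timeout=10.0):
        send = self._senders.get(node_id)
        if send is None:
            raise NodeNotConnected(node_id)  # no live stream for this Node
        return await asyncio.wait_for(send(request), timeout)


async def pause_node(streams, node_id):
    """Roughly what handling PauseNode on the Main Node involves (hypothetical)."""
    response = await streams.forward(node_id, {"type": "PauseNodeRequest"})
    if response.get("error"):
        # The target Node rejected the request, e.g. it was not in provisioned state.
        raise RuntimeError(f"pause rejected by {node_id}: {response['error']}")
    return response
```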
Paused → Provisioned
Trigger: ResumeNode request received from the cloud via the Main Node's IAM.
The resume flow mirrors the pause flow:
- The cloud sends a resume command to the Main Node's CM
- CM calls ResumeNode on the IAM's IAMNodesService gRPC endpoint
- The Main Node's IAM forwards the request through the RegisterNode stream
- The target Node's IAM client verifies the Node is currently in paused state
- The IAM client transitions the local state to provisioned
- The updated NodeInfo is sent back through the stream
- The Node resumes normal operation (SM re-registers with CM, services can be deployed)
Precondition: The Node must be in paused state. If the Node is in any other state, the resume request is rejected
with an error.
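Because resume mirrors pause, both Node-side handlers can share one guarded transition. In this hypothetical Python sketch, IAMClient, NodeInfo, and the message dictionaries are illustrative; sending the response before the updated NodeInfo is an assumed ordering, and persisting the new state and stopping or restarting services are only noted in comments.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NodeInfo:
    node_id: str
    state: str
    error: str = ""


class IAMClient:
    """Hypothetical Node-side handling of PauseNode / ResumeNode stream requests."""

    def __init__(self, node_info, send):
        self.node_info = node_info
        self._send = send  # writes one message to the RegisterNode stream

    def _transition(self, expected, target, response_type):
        if self.node_info.state != expected:
            # Precondition failed: reject and leave the local state untouched.
            self._send({"type": response_type,
                        "error": f"node is {self.node_info.state}, expected {expected}"})
            return False
        # A real Node would also persist the new state and stop (pause) or
        # restart (resume) its service components at this point.
        self.node_info = replace(self.node_info, state=target)
        self._send({"type": response_type, "error": ""})
        self._send({"type": "NodeInfo", "node_info": self.node_info})
        return True

    def on_pause(self):
        return self._transition("provisioned", "paused", "PauseNodeResponse")

    def on_resume(self):
        return self._transition("paused", "provisioned", "ResumeNodeResponse")
```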
Provisioned → Error
Trigger: An unrecoverable failure occurs during Node operation.
The error state is entered when the Node encounters a critical failure that prevents normal operation. The error is
recorded with a descriptive message in the NodeInfo.error field. Common causes include:
- Certificate expiration or corruption that cannot be auto-renewed
- Storage failures that prevent state persistence
- Hardware security module (HSM/TPM) failures
The error state is reported to the Main Node through the RegisterNode stream and subsequently to the cloud.
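One way to picture entry into the error state is a guard around Node operations that converts unrecoverable failures into the error state with a descriptive message. The exception classes and run_guarded are hypothetical names chosen to match the causes listed above; which failures AosCore actually treats as unrecoverable is not specified here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NodeInfo:
    node_id: str
    state: str
    error: str = ""


class UnrecoverableError(Exception):
    """Hypothetical base class for failures the Node cannot fix on its own."""


class CertificateError(UnrecoverableError):
    pass


class StorageError(UnrecoverableError):
    pass


class SecurityModuleError(UnrecoverableError):
    pass


def run_guarded(info, operation, report):
    """Run operation; on an unrecoverable failure, enter the error state and report it.

    Any other exception propagates unchanged to the caller.
    """
    try:
        operation()
    except UnrecoverableError as exc:
        info = replace(info, state="error", error=f"{type(exc).__name__}: {exc}")
        report(info)  # e.g. send NodeInfo over the RegisterNode stream
    return info
```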
Provisioned → Unprovisioned
Trigger: Deprovision request completes successfully.
Deprovisioning removes the Node's certificates and identity, returning it to the initial state:
- The cloud sends a deprovision command to the Main Node's CM
- CM forwards the request to the Main Node's IAM
- The Main Node's IAM forwards the Deprovision request through the RegisterNode stream to the target Node
- The target Node's IAM client processes the deprovisioning locally
- The Node transitions to unprovisioned state
- The updated NodeInfo is reported back to the Main Node
Note: Currently only Secondary Nodes can be deprovisioned via this flow. The Main Node's deprovisioning requires direct access to the provisioning tool.
When a Node transitions to unprovisioned, the Node Manager removes its persistent storage entry (since unprovisioned
Nodes are not persisted).
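The Main Node's bookkeeping for this rule might look like the following sketch. NodeManager, the dict-based storage, and the listener list are hypothetical stand-ins; the sketch also assumes the runtime-only is_connected field is stripped before storing, consistent with connection state not being persisted.

```python
class NodeManager:
    """Hypothetical Main Node cache plus persistent store keyed by node_id."""

    def __init__(self, storage):
        self.storage = storage  # dict-like persistent store (assumption)
        self.cache = {}         # every known Node, including unprovisioned ones
        self.listeners = []     # callables notified on each update (e.g. CM)

    def on_node_info(self, info):
        node_id = info["node_id"]
        self.cache[node_id] = info
        if info["state"] == "unprovisioned":
            # Unprovisioned Nodes are not persisted: drop any stored entry.
            self.storage.pop(node_id, None)
        else:
            # is_connected is runtime-only, so it is never written to storage.
            self.storage[node_id] = {k: v for k, v in info.items() if k != "is_connected"}
        for listener in self.listeners:
            listener(info)
```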
Error → Unprovisioned
Trigger: Deprovision request on a Node in error state.
A Node in error state can be deprovisioned to reset it to the initial state, allowing re-provisioning. This follows the same deprovisioning flow as above.
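The documented transitions can be collected into a single lookup table. This Python sketch treats any transition not listed in this section as invalid, which is an assumption (for example, deprovisioning a paused Node is simply not described); the trigger name "Failure" is a hypothetical label for the unrecoverable-failure case.

```python
from enum import Enum


class State(str, Enum):
    UNPROVISIONED = "unprovisioned"
    PROVISIONED = "provisioned"
    PAUSED = "paused"
    ERROR = "error"


# (current state, trigger) -> next state, one entry per documented transition
TRANSITIONS = {
    (State.UNPROVISIONED, "FinishProvisioning"): State.PROVISIONED,
    (State.PROVISIONED, "PauseNode"): State.PAUSED,
    (State.PAUSED, "ResumeNode"): State.PROVISIONED,
    (State.PROVISIONED, "Failure"): State.ERROR,
    (State.PROVISIONED, "Deprovision"): State.UNPROVISIONED,
    (State.ERROR, "Deprovision"): State.UNPROVISIONED,
}


class InvalidTransition(Exception):
    pass


def next_state(current, trigger):
    """Return the next state, or raise if the transition is not in the table."""
    try:
        return TRANSITIONS[(current, trigger)]
    except KeyError:
        raise InvalidTransition(f"{trigger} is not allowed in state {current.value}") from None
```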
Node Registration
Nodes register with the Unit through the IAM RegisterNode bidirectional streaming RPC. This stream is the primary
communication channel between Secondary Nodes and the Main Node's IAM.
Registration Flow
- The Secondary Node's IAM client establishes a RegisterNode bidirectional stream to the Main Node's IAMPublicNodesService
- Upon connection, the IAM client sends the local NodeInfo as the first message on the stream
- The Main Node's Node Controller receives the NodeInfo, extracts the node_id, and links the stream handler to that Node ID
- The Main Node's Node Manager updates its cache with the received Node information
- CM is notified of the Node change and reports the updated Unit status to the cloud
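The Main Node side of this flow can be sketched with asyncio queues standing in for the gRPC stream. NodeController, the on_node_info and on_disconnect callbacks, and the use of None as an end-of-stream marker are assumptions of the sketch; it only shows the ordering rule (NodeInfo first) and the linking of the stream to node_id.

```python
import asyncio


class NodeController:
    """Hypothetical Main Node handler for RegisterNode streams."""

    def __init__(self, on_node_info, on_disconnect):
        self.handlers = {}                # node_id -> outgoing side of that Node's stream
        self.on_node_info = on_node_info  # e.g. Node Manager cache update + notify CM
        self.on_disconnect = on_disconnect

    async def serve(self, incoming, outgoing):
        first = await incoming.get()
        if first is None or first.get("type") != "NodeInfo":
            raise ValueError("first message on RegisterNode must be NodeInfo")
        node_id = first["node_info"]["node_id"]
        self.handlers[node_id] = outgoing  # link the stream to this Node ID
        self.on_node_info(first["node_info"])
        try:
            # None marks the end of the stream in this sketch
            while (msg := await incoming.get()) is not None:
                if msg.get("type") == "NodeInfo":
                    self.on_node_info(msg["node_info"])
                # responses to forwarded requests would be routed here
        finally:
            # Only unlink if a newer stream for the same Node has not replaced this one.
            if self.handlers.get(node_id) is outgoing:
                del self.handlers[node_id]
            self.on_disconnect(node_id)
```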
Stream Messages
The RegisterNode stream carries bidirectional messages:
Outgoing (Secondary → Main):
| Message | Purpose |
|---|---|
| NodeInfo | Reports current Node identity, state, and capabilities |
| StartProvisioningResponse | Result of a provisioning start operation |
| FinishProvisioningResponse | Result of a provisioning finish operation |
| DeprovisionResponse | Result of a deprovisioning operation |
| PauseNodeResponse | Result of a pause operation |
| ResumeNodeResponse | Result of a resume operation |
| CreateKeyResponse | Result of a key creation operation |
| ApplyCertResponse | Result of a certificate application |
| CertTypes | Available certificate types on this Node |
Incoming (Main → Secondary):
| Message | Purpose |
|---|---|
| StartProvisioningRequest | Begin provisioning on this Node |
| FinishProvisioningRequest | Complete provisioning on this Node |
| DeprovisionRequest | Remove provisioning from this Node |
| PauseNodeRequest | Suspend this Node |
| ResumeNodeRequest | Resume this Node |
| CreateKeyRequest | Create a cryptographic key |
| ApplyCertRequest | Apply a certificate |
| GetCertTypesRequest | Query available certificate types |
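On the Secondary Node, each incoming request produces one outgoing reply, which suggests a table-driven dispatcher. The pairing of GetCertTypesRequest with the CertTypes message, the error field on every reply, and the handler signature are assumptions of this hypothetical sketch.

```python
# Incoming request type -> outgoing reply type, paired from the two tables above.
RESPONSE_FOR = {
    "StartProvisioningRequest": "StartProvisioningResponse",
    "FinishProvisioningRequest": "FinishProvisioningResponse",
    "DeprovisionRequest": "DeprovisionResponse",
    "PauseNodeRequest": "PauseNodeResponse",
    "ResumeNodeRequest": "ResumeNodeResponse",
    "CreateKeyRequest": "CreateKeyResponse",
    "ApplyCertRequest": "ApplyCertResponse",
    "GetCertTypesRequest": "CertTypes",
}


def dispatch(message, handlers):
    """Hypothetical: route one incoming message to its handler and build the reply."""
    msg_type = message["type"]
    if msg_type not in RESPONSE_FOR:
        raise ValueError(f"unexpected message {msg_type}")
    handler = handlers.get(msg_type)
    try:
        if handler is None:
            raise NotImplementedError(msg_type)
        payload, error = handler(message), ""
    except Exception as exc:  # report the failure in the reply instead of dropping the stream
        payload, error = {}, str(exc)
    return {"type": RESPONSE_FOR[msg_type], "error": error, **payload}
```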
Connection State
In addition to the provisioning state (unprovisioned/provisioned/paused/error), each Node has an orthogonal connection
state (is_connected) that indicates whether the Node is currently reachable by the Main Node.
Key characteristics of the connection state:
- Not persisted: connection state is runtime-only and resets to false on restart
- Independent of provisioning state: a Node can be provisioned but disconnected, or unprovisioned but connected
- Set automatically: the IAM client sets connected = true when the RegisterNode stream is established, and connected = false when the stream is lost
- Reported to cloud: the Main Node includes connection state in the Unit status reported to the cloud
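The runtime-only flag can be tied to the lifetime of the stream so that it cannot be left stale. ConnectionState and its stream() context manager are hypothetical; open_stream stands in for opening the RegisterNode stream.

```python
from contextlib import contextmanager


class ConnectionState:
    """Hypothetical runtime-only connection flag on the IAM client (never persisted)."""

    def __init__(self):
        self.connected = False  # always starts false, e.g. after a restart

    @contextmanager
    def stream(self, open_stream):
        handle = open_stream()  # raises if the Main Node is unreachable; flag stays false
        self.connected = True
        try:
            yield handle
        finally:
            self.connected = False  # stream ended normally or with an error
```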
Disconnection Handling
When a Secondary Node loses its connection to the Main Node (network failure, Node shutdown, process crash):
- The gRPC stream terminates
- The Main Node's Node Controller detects the stream closure
- The Node Manager sets is_connected = false for that Node
- CM is notified and reports the disconnected status to the cloud
- The Secondary Node's IAM client detects the disconnection and sets its local connected state to false
- The IAM client attempts to re-establish the stream (gRPC reconnection with backoff)
When the connection is re-established:
- The IAM client opens a new RegisterNode stream
- The current NodeInfo is sent as the first message
- The Main Node updates its cache and notifies listeners
- Normal bidirectional communication resumes
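The reconnect loop can be sketched as follows. The delay parameters default to the values in gRPC's connection backoff specification (1 s initial delay, multiplier 1.6, jitter 0.2, 120 s cap); whether the IAM client relies on gRPC's built-in channel backoff or its own loop is not stated here, so reconnect, backoff_delays, and the stream interface are hypothetical.

```python
import random


def backoff_delays(initial=1.0, multiplier=1.6, maximum=120.0, jitter=0.2, rng=None):
    """Yield reconnect delays in seconds: exponential growth, capped, with +/- jitter."""
    rng = random.Random() if rng is None else rng
    delay = initial
    while True:
        yield delay * (1 + rng.uniform(-jitter, jitter))
        delay = min(delay * multiplier, maximum)


def reconnect(open_stream, get_node_info, sleep, max_attempts=None):
    """Hypothetical IAM client reconnect loop; NodeInfo is always the first message."""
    delays = backoff_delays()
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            stream = open_stream()
        except ConnectionError:
            sleep(next(delays))  # wait before the next attempt
            continue
        stream.send({"type": "NodeInfo", "node_info": get_node_info()})
        return stream
    raise ConnectionError(f"gave up after {attempt} attempts")
```

Passing sleep in (time.sleep in practice) keeps the loop testable; jitter spreads out reconnects when many Secondary Nodes lose the Main Node at the same moment.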
Impact on Operations
| Node State | Connected | Behavior |
|---|---|---|
| Provisioned | Yes | Fully operational — services run, updates accepted |
| Provisioned | No | Services continue running locally; no new deployments or state changes from cloud |
| Paused | Yes | Registered but suspended — no services, awaiting resume |
| Paused | No | Suspended and unreachable — resume command queued until reconnection |
| Unprovisioned | Yes | Awaiting provisioning — only IAM stream active |
| Unprovisioned | No | Isolated — will attempt reconnection |
| Error | Yes | Error reported to cloud — awaiting intervention |
| Error | No | Error state persisted locally — will report when reconnected |
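Two rules from this table lend themselves to a sketch: only a provisioned, connected Node takes new deployments, and commands for an unreachable Node wait until it reconnects. The table does not say which component holds queued commands, so PendingCommands and its send callback are hypothetical.

```python
from collections import defaultdict, deque


def accepts_deployments(state, connected):
    """Only a provisioned, connected Node takes new deployments (per the table above)."""
    return state == "provisioned" and connected


class PendingCommands:
    """Hypothetical per-Node queue for commands issued while a Node is unreachable."""

    def __init__(self, send):
        self._send = send  # send(node_id, command) over the RegisterNode stream
        self._connected = set()
        self._pending = defaultdict(deque)

    def submit(self, node_id, command):
        if node_id in self._connected:
            self._send(node_id, command)
        else:
            self._pending[node_id].append(command)  # hold until reconnection

    def on_connected(self, node_id):
        # Called after the Node's first NodeInfo arrives on the new stream.
        self._connected.add(node_id)
        queue = self._pending.pop(node_id, deque())
        while queue:
            self._send(node_id, queue.popleft())  # flush in submission order

    def on_disconnected(self, node_id):
        self._connected.discard(node_id)
```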
Main Node vs Secondary Node
The lifecycle model applies to all Nodes, but the Main Node has special characteristics:
| Aspect | Main Node | Secondary Node |
|---|---|---|
| Registration | Does not register via stream (is the registration target) | Registers via RegisterNode stream |
| Connection state | Always considered connected (local) | Tracked via stream liveness |
| Provisioning | Provisioned directly by aos-prov tool | Provisioned via forwarded requests through Main Node |
| Deprovisioning | Requires direct tool access | Can be deprovisioned via cloud command |
| Identification | Has MainNode attribute in attrs[] | Does not have MainNode attribute |
| Components | Runs CM, SM, IAM (and optionally MP) | Runs SM, IAM |
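Checking for the Main Node and guarding cloud-initiated deprovisioning might look like this. The attribute representation (a list of name/value pairs) and the function names are assumptions; the state check follows the Provisioned → Unprovisioned and Error → Unprovisioned transitions described earlier.

```python
from dataclasses import dataclass, field


@dataclass
class NodeAttribute:
    name: str
    value: str = ""


@dataclass
class NodeInfo:
    node_id: str
    state: str
    attrs: list = field(default_factory=list)  # assumed list of NodeAttribute


def is_main_node(info):
    """The Main Node is identified by a MainNode entry in attrs."""
    return any(attr.name == "MainNode" for attr in info.attrs)


def check_cloud_deprovision(info):
    """Raise if a cloud-initiated Deprovision must be refused for this Node."""
    if is_main_node(info):
        raise PermissionError(f"{info.node_id} is the Main Node; use the provisioning tool")
    if info.state not in ("provisioned", "error"):
        raise ValueError(f"{info.node_id} cannot be deprovisioned from state {info.state}")
```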
Related Pages
- Unit and Node Model — hierarchical relationship between Units and Nodes
- Node Identity — how Nodes establish and report their identity
- Provisioning Workflow — detailed provisioning sequence
- Architecture Overview — component distribution across Nodes
- Dynamic Node Registration — how Nodes dynamically join and leave a Unit
- Multi-Node Architecture — overall multi-Node topology and communication