Version: v1.1

Dynamic Node Registration

Introduction

In a multi-Node AosEdge Unit, Secondary Nodes must register themselves with the Main Node's Identity and Access Manager (IAM) before they can participate in the system. This registration is dynamic — Nodes can join and leave the Unit at runtime without requiring a restart of the Main Node or any static configuration of the Node roster.

Dynamic node registration uses a bidirectional gRPC streaming connection (RegisterNode) between the Secondary Node's IAM client and the Main Node's IAM server. This persistent stream serves as both the registration mechanism and the ongoing command channel for provisioning, certificate management, and lifecycle operations.

This page describes the registration protocol, the message exchange, how the system handles disconnection and reconnection, and how provisioning state affects which server endpoint a Node connects to.

Registration Protocol Overview

The registration protocol operates as follows:

  1. The Secondary Node's IAM client opens a RegisterNode bidirectional stream to the Main Node
  2. The client immediately sends its NodeInfo as the first outgoing message
  3. The Main Node's Node Controller receives the NodeInfo, validates the Node's state, and links the stream to the Node ID
  4. The Main Node's Node Manager persists the Node information and marks it as connected
  5. The stream remains open — the Main Node can send commands (provisioning, certificate operations, pause/resume) and the Secondary Node can send updated NodeInfo when its state changes

┌──────────────────┐                          ┌──────────────────┐
│  Secondary Node  │                          │    Main Node     │
│   (IAM Client)   │                          │   (IAM Server)   │
└────────┬─────────┘                          └────────┬─────────┘
         │                                             │
         │──── Open RegisterNode stream ──────────────▶│
         │                                             │
         │──── IAMOutgoingMessages { NodeInfo } ──────▶│
         │                                             │ HandleNodeInfo()
         │                                             │ LinkNodeIDToHandler()
         │                                             │ SetNodeInfo() → persist
         │                                             │
         │◀─── IAMIncomingMessages { commands } ───────│
         │                                             │
         │──── IAMOutgoingMessages { responses } ─────▶│
         │                                             │
         │         ... stream remains open ...         │
         │                                             │

Connection Establishment

Endpoint Selection

The IAM client selects which Main Node endpoint to connect to based on the Secondary Node's provisioning state:

| Node State | Target Endpoint | Connection Security |
|---|---|---|
| unprovisioned | mainIAMPublicServerURL | TLS (server-only authentication) or insecure |
| provisioned / paused | mainIAMProtectedServerURL | mTLS (mutual authentication with client certificate) |

This separation ensures that:

  • Unprovisioned Nodes can connect without client certificates (they have none yet) — they use the public endpoint
  • Provisioned Nodes use mutual TLS for stronger authentication — they connect to the protected endpoint using their IAM-issued certificates
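
The selection rule in the table above can be sketched as a small lookup. This is an illustration only, not the AosEdge implementation: `IAMClientConfig` and `select_endpoint` are hypothetical names, the field names simply mirror the configuration parameters on this page, and raising an error for any other state (such as `error`) is an assumption, since the page does not say which endpoint such a Node would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IAMClientConfig:
    # Hypothetical container; fields mirror the parameters named on this page.
    main_iam_public_server_url: str
    main_iam_protected_server_url: str


def select_endpoint(state: str, config: IAMClientConfig) -> tuple:
    """Return (target URL, use_mtls) for the given provisioning state."""
    if state == "unprovisioned":
        # No client certificate exists yet, so the public endpoint is used.
        return config.main_iam_public_server_url, False
    if state in ("provisioned", "paused"):
        # IAM-issued certificates are available, so mTLS is used.
        return config.main_iam_protected_server_url, True
    # Assumption: other states have no defined endpoint in this sketch.
    raise ValueError(f"no endpoint defined for state {state!r}")
```

The URLs passed in are whatever the deployment configures; the function only encodes the state-to-endpoint mapping.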

Credential Strategy

The IAM client supports a credential fallback mechanism for robust connectivity:

  1. Insecure mode (unprovisioned): The client attempts an insecure connection first and falls back to TLS if that fails. The fallback covers the case where the Main Node is already serving a secure listener.
  2. Secure mode (provisioned): Uses mTLS with the configured certificate storage. If the connection fails, the client cycles through available credentials before retrying.

When a credential fails, the client advances to the next available credential and rebuilds the gRPC stub before the next connection attempt.
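
A minimal sketch of the cycling behaviour, assuming credentials can be modelled as an ordered list of labels. `CredentialCycler` and `credential_options` are hypothetical names; a real client would hold gRPC channel credentials rather than strings, and the single `mtls` entry for secure mode is an assumption because the page does not list the available credentials.

```python
class CredentialCycler:
    """Round-robin over an ordered list of credential options."""

    def __init__(self, options):
        if not options:
            raise ValueError("at least one credential option is required")
        self._options = list(options)
        self._index = 0

    @property
    def current(self):
        return self._options[self._index]

    def advance(self):
        """Move to the next option, wrapping around at the end of the list."""
        self._index = (self._index + 1) % len(self._options)
        return self.current


def credential_options(state):
    # Assumption: plain labels stand in for real credential objects.
    if state == "unprovisioned":
        return ["insecure", "tls"]  # insecure first, TLS as the fallback
    return ["mtls"]  # provisioned or paused: certificate storage
```

After each `advance()` the caller would rebuild the gRPC stub with `cycler.current` before the next attempt.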

Connection Loop

The IAM client runs a dedicated connection thread that implements a persistent connection loop:

ConnectionLoop:
    while not stopped:
        1. Create gRPC channel with current credentials
        2. Wait for channel to become ready (with timeout)
        3. Create RegisterNode bidirectional stream
        4. Notify: OnConnected()
           └── Send NodeInfo as first message
        5. Read incoming messages until stream closes
        6. Notify: OnDisconnected()
        7. If connection failed, try next credential
        8. Wait reconnect interval (3 seconds)
        9. Retry from step 1

The connection loop runs continuously until explicitly stopped. Any stream closure — whether from network failure, server shutdown, or explicit cancellation — triggers a reconnection attempt after a brief delay.
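
The loop can be sketched in Python with the transport injected as a callable, which keeps the control flow visible without a real gRPC channel. Everything here is an assumption made for illustration: `connection_loop` and its parameters are hypothetical, `connect(credential)` stands in for steps 1 to 3 and is expected to raise `ConnectionError` when the channel does not become ready, and in this sketch the connect and disconnect notifications fire only when the stream was actually opened.

```python
import threading

RECONNECT_INTERVAL_S = 3.0  # reconnect interval stated on this page


def connection_loop(connect, credentials, on_connected, on_message,
                    on_disconnected, stop_event,
                    reconnect_interval=RECONNECT_INTERVAL_S):
    """Run the reconnect loop until stop_event is set.

    connect(credential) is a hypothetical callable that returns an iterable
    of incoming messages, or raises ConnectionError on failure.
    """
    index = 0
    while not stop_event.is_set():
        failed = False
        try:
            stream = connect(credentials[index])  # steps 1-3
        except ConnectionError:
            failed = True
        else:
            on_connected()            # step 4: caller sends NodeInfo here
            for message in stream:    # step 5: read until the stream closes
                on_message(message)
            on_disconnected()         # step 6
        if failed:
            index = (index + 1) % len(credentials)  # step 7
        # Step 8: wait, but wake up immediately if the loop is stopped.
        stop_event.wait(reconnect_interval)
```

Using `threading.Event.wait` for the delay lets a stop request interrupt the wait instead of blocking for the full interval.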

NodeInfo Message

The first message sent on the stream is always a NodeInfo containing the Secondary Node's complete identity:

| Field | Description |
|---|---|
| node_id | Unique identifier for this Node within the Unit |
| node_type | Node classification (e.g., "secondary") |
| title | Human-readable Node name |
| max_dmips | Maximum processing capacity |
| total_ram | Total RAM available |
| os_info | Operating system type and version |
| cpus[] | CPU information (model, cores, threads, architecture) |
| partitions[] | Storage partitions with names, types, and sizes |
| attrs[] | Custom key-value attributes |
| state | Current provisioning state (unprovisioned, provisioned, paused, error) |

The NodeInfo is also re-sent whenever the Node's state changes (e.g., after provisioning completes, or when the Node is paused/resumed).
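
As a rough illustration of the message shape, the fields above can be mirrored in a Python dataclass. The field names come from the table; the Python types, the default values, and the `with_state` helper are assumptions for this sketch and do not reflect the actual protobuf definition.

```python
from dataclasses import dataclass, replace

VALID_STATES = {"unprovisioned", "provisioned", "paused", "error"}


@dataclass(frozen=True)
class NodeInfo:
    node_id: str
    node_type: str
    title: str = ""
    max_dmips: int = 0          # assumed integer
    total_ram: int = 0          # assumed integer; unit not stated on this page
    os_info: str = ""
    cpus: tuple = ()            # per-CPU records
    partitions: tuple = ()      # per-partition records
    attrs: tuple = ()           # (key, value) pairs
    state: str = "unprovisioned"


def with_state(info: NodeInfo, state: str) -> NodeInfo:
    """Return a copy of info with a new state, ready to be re-sent."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown state {state!r}")
    return replace(info, state=state)
```

Keeping the record immutable means each state change produces a fresh value, which matches the idea of sending a complete NodeInfo on every update.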

Server-Side Registration Handling

Node Controller

The Main Node's Node Controller (NodeController) manages all active RegisterNode streams. It maintains a registry mapping Node IDs to their stream handlers.

When a new stream arrives:

  1. A NodeStreamHandler is created and stored in the controller's handler map
  2. The handler enters a read loop, waiting for messages from the Secondary Node
  3. The first message must be a NodeInfo — this triggers HandleNodeInfo()

HandleNodeInfo Processing

When the Node Controller receives a NodeInfo message:

  1. State validation — The controller checks whether the Node's provisioning state is appropriate for the connection type:

    • Public endpoint: Only unprovisioned Nodes are accepted
    • Protected endpoint: Only provisioned or paused Nodes are accepted
    • If the state doesn't match, the handler is unlinked (but the stream is not forcibly closed)
  2. Node Manager update — The Node information is passed to the Node Manager, which:

    • Stores the Node info in its cache and persistent database
    • Marks the Node as connected (is_connected = true)
    • Notifies all registered listeners of the new or updated Node
  3. Handler linking — The Node ID is linked to the stream handler. If a previous handler was linked to the same Node ID (stale connection), it is unlinked first.
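
The three steps above can be sketched as follows. This is a simplified model, not the real Node Controller: the class and method names are hypothetical Python stand-ins, the Node Manager is reduced to an in-memory dictionary, and listener notification is omitted.

```python
ACCEPTED_STATES = {
    "public": {"unprovisioned"},
    "protected": {"provisioned", "paused"},
}


class NodeManager:
    def __init__(self):
        self.nodes = {}  # stands in for the cache and persistent database

    def set_node_info(self, node_id, state):
        self.nodes[node_id] = {"state": state, "is_connected": True}


class NodeController:
    def __init__(self, manager):
        self.manager = manager
        self.handlers = {}  # Node ID -> stream handler

    def handle_node_info(self, endpoint, handler, node_id, state):
        # 1. State validation against the endpoint the stream arrived on.
        if state not in ACCEPTED_STATES[endpoint]:
            self._unlink(handler)  # the stream itself is left open
            return False
        # 2. Node Manager update: store the info and mark as connected.
        self.manager.set_node_info(node_id, state)
        # 3. Handler linking: a stale handler for this Node ID is replaced.
        self.handlers[node_id] = handler
        return True

    def _unlink(self, handler):
        for node_id, linked in list(self.handlers.items()):
            if linked is handler:
                del self.handlers[node_id]
```

Returning a boolean is a choice made for the sketch so that callers and tests can observe whether the Node was accepted.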

Command Forwarding

Once a Node is registered, the Main Node can forward operations to it through the stream. The Node Controller provides methods that:

  1. Look up the stream handler by Node ID
  2. Send a request message (IAMIncomingMessages) on the stream
  3. Wait for the corresponding response (IAMOutgoingMessages) with a timeout

Supported operations forwarded to Secondary Nodes:

| Operation | Request | Response |
|---|---|---|
| Get certificate types | GetCertTypesRequest | CertTypes |
| Start provisioning | StartProvisioningRequest | StartProvisioningResponse |
| Finish provisioning | FinishProvisioningRequest | FinishProvisioningResponse |
| Deprovision | DeprovisionRequest | DeprovisionResponse |
| Pause node | PauseNodeRequest | PauseNodeResponse |
| Resume node | ResumeNodeRequest | ResumeNodeResponse |
| Create key | CreateKeyRequest | CreateKeyResponse |
| Apply certificate | ApplyCertRequest | ApplyCertResponse |

Each forwarded operation uses a request-response pattern with a configurable timeout. If the Secondary Node does not respond within the timeout, the operation returns a timeout error.
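
One way to picture the request-response pattern over a stream is to park a future per expected response and wait on it with a timeout. The sketch below uses Python's `concurrent.futures.Future`; `StreamHandler`, `request`, and `on_response` are hypothetical names, messages are represented by their type names, and keying pending requests by response type is an assumption, not a description of how the real handler correlates messages.

```python
from concurrent.futures import Future
from concurrent.futures import TimeoutError as FutureTimeoutError


class StreamHandler:
    def __init__(self, write):
        self._write = write  # hypothetical callable that writes to the stream
        self._pending = {}   # expected response type -> Future

    def request(self, message, response_type, timeout):
        """Send message and block until the matching response or a timeout."""
        future = Future()
        self._pending[response_type] = future
        self._write(message)
        try:
            return future.result(timeout=timeout)
        except FutureTimeoutError:
            raise TimeoutError(
                f"no {response_type} within {timeout}s") from None
        finally:
            self._pending.pop(response_type, None)

    def on_response(self, response_type, payload):
        """Called from the read loop when a response arrives."""
        future = self._pending.get(response_type)
        if future is not None:
            future.set_result(payload)
```

In a real handler `on_response` would run on the stream's read thread while `request` blocks on the caller's thread.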

Disconnection and Reconnection

Disconnection Detection

Disconnection is detected when:

  • The gRPC stream read operation returns false (stream closed by peer or network failure)
  • The server context is cancelled (server shutdown)
  • A write operation fails (broken pipe)

Server-Side Cleanup

When a Secondary Node disconnects:

  1. The NodeStreamHandler destructor calls SetNodeConnected(nodeID, false) on the Node Manager
  2. The handler is removed from the Node Controller's registry
  3. The Node's information remains in the Node Manager's persistent storage — only the connected flag changes
  4. All pending response promises are cleared (any in-flight operations receive cancellation)
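
The cleanup steps can be sketched with an explicit `close()` method. Note the differences from the description above: Python has no deterministic destructors, so the sketch does not rely on one, and the Node Manager and controller registry are modelled as plain dictionaries. All names here are hypothetical.

```python
from concurrent.futures import Future


class NodeStreamHandler:
    def __init__(self, node_id, node_store, registry):
        self.node_id = node_id
        self._node_store = node_store  # stands in for Node Manager storage
        self._registry = registry      # stands in for the handler registry
        self.pending = []              # futures for in-flight requests
        registry[node_id] = self

    def close(self):
        # 1. Only the connected flag changes; the Node info is kept.
        self._node_store[self.node_id]["is_connected"] = False
        # 2. Remove this handler from the registry.
        self._registry.pop(self.node_id, None)
        # 3. Cancel in-flight operations so that waiters are released.
        for future in self.pending:
            future.cancel()
        self.pending.clear()
```

A caller waiting on one of the cancelled futures would see `concurrent.futures.CancelledError` instead of a response.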

Client-Side Reconnection

The IAM client handles disconnection automatically:

  1. OnDisconnected() is called — the CurrentNode handler is notified that the Node is no longer connected
  2. The connection loop waits for the reconnect interval (3 seconds by default)
  3. A new connection attempt begins. If the previous attempt failed, the client first advances to the next available credential
  4. On successful reconnection, OnConnected() fires and the client sends a fresh NodeInfo

Certificate-Triggered Reconnection

The IAM client also subscribes to certificate change notifications. When the Node's IAM certificate is renewed:

  1. The GRPCClientCertListener receives the certificate change event
  2. A reconnection is scheduled (with a 10-second retry timeout)
  3. The client rebuilds its credentials and reconnects with the new certificate

This ensures that certificate rotation does not permanently break the registration stream.

Provisioning State Transitions

The registration stream plays a central role during provisioning of Secondary Nodes:

Initial Registration (Unprovisioned)

Secondary Node                    Main Node (Public Endpoint)
│                                              │
│── RegisterNode stream (insecure/TLS) ───────▶│
│── NodeInfo { state: "unprovisioned" } ──────▶│
│                                              │ Node registered as unprovisioned
│◀── StartProvisioningRequest ─────────────────│
│── StartProvisioningResponse ────────────────▶│
│◀── CreateKeyRequest ─────────────────────────│
│── CreateKeyResponse { CSR } ────────────────▶│
│◀── ApplyCertRequest { signed cert } ─────────│
│── ApplyCertResponse ────────────────────────▶│
│◀── FinishProvisioningRequest ────────────────│
│── FinishProvisioningResponse ───────────────▶│
│                                              │
│── NodeInfo { state: "provisioned" } ────────▶│
│                                              │ State updated

Re-Registration (Provisioned)

After provisioning completes, the Node has certificates and reconnects to the protected endpoint:

  1. The IAM client detects the state change to provisioned
  2. The certificate change triggers a reconnection
  3. The client reconnects to mainIAMProtectedServerURL using mTLS
  4. A fresh NodeInfo with state: "provisioned" is sent
  5. The Node Controller validates the state against the protected endpoint (accepted)
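
From the Secondary Node's side, the provisioning exchange can be modelled as a small state machine that answers each request and re-sends NodeInfo when its state changes. This is a simplified sketch: `SecondaryNodeSketch` is a hypothetical name, messages are represented by their type names only, and the pause and resume transitions are inferred from the statement that NodeInfo is re-sent when the Node is paused or resumed.

```python
RESPONSES = {
    "StartProvisioningRequest": "StartProvisioningResponse",
    "CreateKeyRequest": "CreateKeyResponse",
    "ApplyCertRequest": "ApplyCertResponse",
    "FinishProvisioningRequest": "FinishProvisioningResponse",
    "PauseNodeRequest": "PauseNodeResponse",
    "ResumeNodeRequest": "ResumeNodeResponse",
}

# Requests that change the provisioning state, and the state they lead to.
TRANSITIONS = {
    "FinishProvisioningRequest": "provisioned",
    "PauseNodeRequest": "paused",
    "ResumeNodeRequest": "provisioned",
}


class SecondaryNodeSketch:
    def __init__(self):
        self.state = "unprovisioned"
        # The first outgoing message is always NodeInfo.
        self.sent = [("NodeInfo", self.state)]

    def handle(self, request):
        """Answer one incoming request and report any state change."""
        self.sent.append(RESPONSES[request])
        new_state = TRANSITIONS.get(request)
        if new_state is not None and new_state != self.state:
            self.state = new_state
            self.sent.append(("NodeInfo", new_state))
```

The reconnection to the protected endpoint that follows provisioning is outside the scope of this sketch.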

Node Departure

A Node "leaves" the Unit in one of two ways:

Graceful Departure (Deprovisioning)

  1. A DeprovisionRequest is sent to the Node through the registration stream
  2. The Node clears its provisioning state and certificates
  3. The Node sends updated NodeInfo with state: "unprovisioned"
  4. The Node Controller detects the state mismatch (unprovisioned on protected endpoint) and unlinks the handler
  5. The Node Manager removes the unprovisioned Node from persistent storage

Ungraceful Departure (Network Loss)

  1. The stream read fails — the handler detects disconnection
  2. The handler is removed from the Node Controller's registry, and the Node is marked as disconnected in the Node Manager
  3. If the Node never reconnects, its entry persists in the Node Manager's database
  4. The Node can be explicitly removed through administrative action

Configuration

The IAM client configuration controls registration behavior:

| Parameter | Description | Default |
|---|---|---|
| mainIAMPublicServerURL | Main Node's public IAM endpoint (for unprovisioned Nodes) | |
| mainIAMProtectedServerURL | Main Node's protected IAM endpoint (for provisioned Nodes) | |
| certStorage | Certificate storage name for mTLS credentials | |

Connection timing constants (defined in the implementation):

| Constant | Value | Purpose |
|---|---|---|
| Reconnect interval | 3 seconds | Delay between reconnection attempts |
| Connect timeout | 3 seconds | Maximum time to wait for channel handshake |
| Service timeout | 10 seconds | Timeout for individual RPC operations |
| Cert reconnect retry | 10 seconds | Timeout for certificate-triggered reconnection |