Add/remove nodes dynamically
In a Software-Defined Vehicle (SDV) environment, hardware configurations evolve over the vehicle’s lifetime. New compute modules or domain controllers may be added, and existing nodes may be replaced or removed. To stay flexible and operable over the long term, the platform should support adding and removing nodes without disrupting deployed applications.
AosEdge enables this capability when the device is configured with dynamic topologies and flexible placement rules. This page introduces the concepts and the high-level steps required to enable dynamic node operations.
Overview
Dynamic node management allows a multi-node device to:
- Detect and integrate newly added nodes
- Gracefully handle node removal or failure
- Rebalance active services to available nodes
- Maintain service continuity with minimal interruption
- Adapt topology based on real-time hardware availability
AosEdge’s Dynamic Rebalance, Unit Config, and Target Systems work together to support these behaviors.
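To make the detect/remove behavior concrete, here is a minimal sketch of heartbeat-based liveness tracking. It is an illustration only: this page does not describe how AosEdge detects nodes, and the NodeTracker class, its methods, and the timeout value are all hypothetical names chosen for the example.

```python
from __future__ import annotations


class NodeTracker:
    """Tracks which nodes are alive based on periodic heartbeats (hypothetical)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node_id: str, now: float) -> bool:
        """Record a heartbeat. Returns True if the node was not known before."""
        is_new = node_id not in self.last_seen
        self.last_seen[node_id] = now
        return is_new

    def expire(self, now: float) -> list[str]:
        """Drop nodes whose last heartbeat is older than the timeout.

        Returns the sorted IDs of the nodes that were dropped.
        """
        gone = sorted(
            node_id
            for node_id, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
        for node_id in gone:
            del self.last_seen[node_id]
        return gone


if __name__ == "__main__":
    tracker = NodeTracker(timeout_s=5.0)
    print(tracker.heartbeat("node-a", now=0.0))  # True: newly added node
    print(tracker.heartbeat("node-b", now=4.0))  # True: newly added node
    print(tracker.expire(now=7.0))               # ['node-a']: no heartbeat for 7 s
```

A scheduler built on such a tracker would treat a True result from heartbeat as a "node added" event and each ID returned by expire as a "node removed" event, then rebalance accordingly.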
Prepare your device
Before enabling dynamic node operations, ensure:
1. Unified base image
All nodes should run the same AosCore version and use compatible system configurations.
2. Node discovery
Nodes must be able to discover and communicate with each other through the chosen networking setup.
3. Stable node identity
Each node must have a persistent and unique identity so the orchestrator can track it reliably.
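One common way to obtain a stable identity is to generate an ID once and persist it, so the same value is reused after every reboot. The sketch below illustrates that idea under stated assumptions: the function name and the storage path are hypothetical, and this page does not specify how AosEdge itself assigns node identities.

```python
import uuid
from pathlib import Path


def load_or_create_node_id(path: Path) -> str:
    """Return the node ID stored at `path`, generating and saving one if absent."""
    if path.exists():
        stored = path.read_text(encoding="utf-8").strip()
        if stored:
            return stored
    # First boot (or empty file): create a random UUID and persist it.
    node_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(node_id + "\n", encoding="utf-8")
    return node_id


if __name__ == "__main__":
    import tempfile

    # A temporary directory stands in for persistent storage in this demo.
    with tempfile.TemporaryDirectory() as tmp:
        id_file = Path(tmp) / "state" / "node_id"
        first = load_or_create_node_id(id_file)
        second = load_or_create_node_id(id_file)
        print(first == second)  # True: the ID survives repeated calls
```

On real hardware the file would need to live on storage that survives reboots and software updates; a hardware-backed identifier is an alternative when one is available.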
Note: Detailed bootstrap/discovery instructions will be added once finalized.
Adjust Unit Config
Unit Config must be written to support dynamic topology changes.
Recommended patterns
- Avoid hard-coding specific node names in placement rules
- Prefer resource-based constraints (CPU, memory, GPU, accelerators)
- Define optional selectors for nodes that may not always be present
- Ensure Dynamic Rebalance is enabled for services that can move between nodes
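The patterns above can be illustrated with a small placement sketch. It is not the Unit Config format and does not reflect the AosEdge scheduler; the Node and Service types, their field names, and the pick_node function are hypothetical. The point is only that a service is matched by resources and labels, never by a node name, and that an optional selector expresses a preference rather than a requirement.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    cpu_free: int      # free CPU in millicores (hypothetical unit)
    mem_free_mb: int   # free memory in MiB
    labels: set[str] = field(default_factory=set)


@dataclass
class Service:
    name: str
    cpu: int
    mem_mb: int
    required_labels: set[str] = field(default_factory=set)
    preferred_labels: set[str] = field(default_factory=set)  # optional selector


def pick_node(service: Service, nodes: list[Node]) -> Node | None:
    """Choose a node by resources and labels; node names are never consulted."""
    candidates = [
        n for n in nodes
        if n.cpu_free >= service.cpu
        and n.mem_free_mb >= service.mem_mb
        and service.required_labels <= n.labels
    ]
    if not candidates:
        return None
    # Prefer nodes matching more optional labels, then nodes with more free CPU.
    return max(
        candidates,
        key=lambda n: (len(service.preferred_labels & n.labels), n.cpu_free),
    )


if __name__ == "__main__":
    nodes = [
        Node("node-a", cpu_free=2000, mem_free_mb=1024),
        Node("node-b", cpu_free=1000, mem_free_mb=4096, labels={"gpu"}),
    ]
    vision = Service("vision", cpu=500, mem_mb=512, preferred_labels={"gpu"})
    print(pick_node(vision, nodes).node_id)      # node-b: the GPU node is preferred
    print(pick_node(vision, nodes[:1]).node_id)  # node-a: still placeable without it
```

Because the GPU label is only preferred, the service keeps running when the GPU node is removed. Moving the label to required_labels would instead leave the service unplaced until such a node returns.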
Future additions:
- Example Unit Config snippets for dynamic placement
Edit Target Systems
Target Systems describe how the multi-node device is structured. To support dynamic add/remove:
- Avoid defining the system as a fixed list of nodes
- Describe roles or capability groups instead of enumerating all node IDs
- Mark nodes or groups as replaceable, expandable, or optional
- Use capability-based matching (e.g., type: compute, type: accelerator)
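As an illustration of describing a system by capability groups rather than by node IDs, the sketch below checks a set of discovered nodes against group requirements. It is not the Target Systems schema: the CapabilityGroup type, its fields (node_type, min_count, optional), and the unmet_groups function are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityGroup:
    node_type: str          # e.g. "compute" or "accelerator"
    min_count: int = 1      # nodes of this type needed for a valid topology
    optional: bool = False  # optional groups may be entirely absent


def unmet_groups(
    groups: list[CapabilityGroup], discovered_types: list[str]
) -> list[str]:
    """Return the node types of required groups that lack enough nodes."""
    counts = Counter(discovered_types)
    return [
        g.node_type
        for g in groups
        if not g.optional and counts[g.node_type] < g.min_count
    ]


if __name__ == "__main__":
    target = [
        CapabilityGroup("compute", min_count=1),
        CapabilityGroup("accelerator", optional=True),
    ]
    print(unmet_groups(target, ["compute"]))      # []: accelerator is optional
    print(unmet_groups(target, ["accelerator"]))  # ['compute']: required group missing
```

Since no node ID appears in the description, adding a second compute node or swapping one for a replacement leaves the description valid without any edits.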
Future additions:
- Example Target Systems schema for dynamic topologies
Confirm behavior
To verify that dynamic node operations work correctly:
1. Add a node
- Boot the new node
- Confirm the node joins the cluster
- Check that its capabilities appear in the orchestrator
- If Dynamic Rebalance is enabled, verify that rebalancing occurs
2. Remove a node
- Power off or detach the node
- Confirm services migrate to surviving nodes
- Ensure the system stabilizes without errors
3. Validate via logs
- Observe orchestrator logs for topology updates and scheduling events
4. Validate service continuity
- Ensure critical services remain operational, with no more than a brief interruption during migration
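The node-removal check can be partly automated once the current service placement is available as data. The sketch below assumes a plain mapping of service name to node ID; how that mapping is obtained from the orchestrator is not covered on this page, and the function name is hypothetical.

```python
from __future__ import annotations


def check_after_removal(
    placement: dict[str, str], removed_node: str, live_nodes: set[str]
) -> list[str]:
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    for service, node in sorted(placement.items()):
        if node == removed_node:
            problems.append(f"{service} still assigned to removed node {node}")
        elif node not in live_nodes:
            problems.append(f"{service} assigned to unknown node {node}")
    return problems


if __name__ == "__main__":
    # Placement observed after node-b was powered off (example data).
    placement = {"vision": "node-b", "telemetry": "node-a"}
    for problem in check_after_removal(placement, "node-b", {"node-a"}):
        print(problem)  # vision still assigned to removed node node-b
```

A check like this only confirms where services ended up. Measuring how long each service was unavailable during migration still requires timestamps from logs or health probes.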
Additional details, examples, and troubleshooting will be added as the dynamic topology feature matures.