SOTA/FOTA
Terms
- Bundle – a package of software/firmware files for different unit ECUs. Bundles can contain update data for several components.
- Platform ID – a unique identifier of a group of units that share the same upgradable HW components.
Generic architecture
The AosEdge update system is designed to update complex, multi-component systems and consists of the following parts:
- CM – the communication manager is responsible for downloading and decrypting the update image and checking its signature. It also provides each part of the image to the related UM and controls the update process at a high level;
- UM – the update manager is responsible for updating specific components. AosEdge provides its own reference implementation of UM, which has a modular architecture and can easily be extended with update plugins. A custom UM can also be implemented if the Aos UM cannot be used.
Preparation flow
Before any update is created, AosCloud must have information about all boards supported by each OEM. Information about the supported update components is available in the Unit configuration file. See Board configuration file for information on its format.
Image preparation flow
Software packages are developed by service providers, who use their own CI/CD pipelines to build and test them. Once all checks pass, the software is published to the AosEdge system.
- OEM builds and tests a new image.
- OEM sends the new image info to AosCloud using the bundle format.
- Edit by the cloud team.
- AosCloud creates a new metadata object for the update procedure.
- AosCloud signs the metadata object with its own key.
- AosCloud checks that each referenced file is present on the CDNs.
- AosCloud checks the signature of each file on the CDNs.
- AosCloud marks the metadata object as ready-for-update.
- AosCloud informs the OEM by email about the status of the update (ready-for-update or verification-fail).
The update procedure is very similar to campaign management.
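The verification part of the flow above (check that every file is on the CDN, check its signature, then mark the metadata object) can be sketched as follows. This is a minimal illustration, not the AosCloud implementation: the `FileChecker` interface and its `Exists`/`SignatureValid` methods are hypothetical stand-ins for whatever CDN and signature checks the cloud actually performs.

```go
package main

// MetadataStatus mirrors the two outcomes reported to the OEM by email.
type MetadataStatus string

const (
	ReadyForUpdate     MetadataStatus = "ready-for-update"
	VerificationFailed MetadataStatus = "verification-fail"
)

// FileChecker is a hypothetical abstraction over the CDN checks.
type FileChecker interface {
	Exists(url string) bool         // is the file present on the CDN?
	SignatureValid(url string) bool // does the file's signature verify?
}

// verifyMetadata returns ReadyForUpdate only if every referenced file is
// present on the CDN and its signature is valid; the first failing file
// makes the whole metadata object fail verification.
func verifyMetadata(urls []string, c FileChecker) MetadataStatus {
	for _, u := range urls {
		if !c.Exists(u) || !c.SignatureValid(u) {
			return VerificationFailed
		}
	}
	return ReadyForUpdate
}
```

Treating a single bad file as a failure of the whole object keeps a partially published update from ever being marked ready-for-update.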
Deploy component images to the target
During an update, AosCloud prepares the list of components to update based on component dependencies and on the component versions installed on the target. To keep this information up to date, each target sends the list of its components and their versions after it connects to AosCloud.
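As a rough sketch of how the reported component list could be used, the function below selects every component whose installed version differs from the version the cloud wants on the unit. It assumes, for illustration only, that a component missing from the report must also be installed; dependency resolution is deliberately omitted, and `ComponentInfo` and `componentsToUpdate` are hypothetical names.

```go
package main

import "sort"

// ComponentInfo is what a unit reports after connecting (hypothetical shape).
type ComponentInfo struct {
	ID      string
	Version string
}

// componentsToUpdate returns, in sorted order, the IDs of target components
// that are missing on the unit or installed with a different version.
// Dependencies between components are not considered in this sketch.
func componentsToUpdate(installed []ComponentInfo, target map[string]string) []string {
	current := make(map[string]string, len(installed))
	for _, c := range installed {
		current[c.ID] = c.Version
	}
	var ids []string
	for id, want := range target {
		if have, ok := current[id]; !ok || have != want {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids) // map iteration order is random; sort for stable output
	return ids
}
```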
{
  "imageVersion": 2,
  "components": [
    {
      "id": "rootfs",
      "version": "1.0",
      "description": "this is rootfs update",
      "annotations": {...},
      "urls": [
        "url1",
        "url2"
      ],
      "sha256": "sha256checksum",
      "sha512": "sha512checksum",
      "size": 1234
    },
    {
      "id": "boot",
      "version": "2.0",
      "description": "this is boot update",
      "urls": [
        "url1",
        "url2"
      ],
      "sha256": "sha256checksum",
      "sha512": "sha512checksum",
      "size": 1234
    }
  ],
  "decryptionInfo": { ... }
}
- imageVersion – incremental image version, used to distinguish between system upgrade requests and to prevent security attacks
- components – list of components to update
  - id – component ID
  - version – component version, for information purposes only
  - description – for information purposes only
  - urls – an array of URLs where the update image is located
  - sha256, sha512, size – checksums and file size used to check integrity
- decryptionInfo – data used to decrypt the image and check its signature
Update procedure
- When new metadata is uploaded, AosCloud assigns it a new image version and stores it in the internal DB
- When a unit connects to the cloud, it sends its platform ID and current image version
- AosCloud compares the unit image version with the latest cloud image version:
  - If the unit image version equals the cloud image version, the procedure is terminated
  - If the unit image version is lower than the cloud image version, AosCloud should switch the unit to the next cloud version
  - If the unit image version is higher than the cloud image version, AosCloud should switch the unit to the previous version
- To avoid frequent updates, AosCloud should send a new update only after a defined timeout (for example, once per day)
- AosCloud should provide a way to run the update procedure manually by user interaction; the update timeout does not apply in this case
- The last uploaded cloud image can be deleted from the cloud, but its version must not be assigned to a newly uploaded image. For example, if the last uploaded image version is 5 and it is deleted, the OEM should not be able to delete version 4, and the next image the OEM uploads is assigned version 6.
AosCore architecture
The update dispatcher component is responsible for performing the whole system update. The Aos Service Manager acts as a proxy that downloads the update image and provides it to the update dispatcher.
AosEdge uses a client-server architecture for communication between CM and multiple UMs: CM acts as the server, and the UMs act as clients. AosCore uses the gRPC framework as the communication protocol.
CM implements the following commands:
- PrepareUpdate – notifies UM that a system update is available and that UM should verify and validate the update;
- StartUpdate – instructs UM to perform the update and to check that the system is functional (the system must still be able to roll back the current update after this command);
- ApplyUpdate – applies the current update (no rollback is possible after this command);
- RevertUpdate – reverts the current update; this command can be sent at any time before ApplyUpdate.
UM implements only one command, which sends its status to CM:
- UpdateStatus – sent in response to any CM command once the requested operation is complete.
The following sequence diagram illustrates the update process.
When an update request is received from the cloud, CM downloads the update image, decrypts it, verifies its signature, and unpacks it. CM then sends a PrepareUpdate message to each UM referenced in the update image. Each UM retrieves its update artifacts from the update image, unpacks and validates them, prepares the update, and moves to the Prepared state. When all UMs to be updated are prepared, CM sends a StartUpdate message to them. Each UM performs the update and validates the system state afterwards; if the system is functional, the UM moves to the Updated state. Once CM has received an Updated status notification from all required UMs, it sends them an ApplyUpdate message. The UMs finish the update and return to the Idle state.
If the update fails on any UM, that UM moves to the Failed state and sends a status notification to CM. CM then sends a RevertUpdate message to all related UMs, which revert the update and return to the Idle state.
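The UM states and transitions described above can be modeled as a small state machine. This is a sketch of the described behavior only; the `next` function and the `ok` flag (whether the underlying operation succeeded) are hypothetical, and rejecting RevertUpdate in the Idle state is an assumption, since there is nothing to revert there.

```go
package main

import "fmt"

// State is a UM state from the update process description.
type State string

const (
	Idle     State = "Idle"
	Prepared State = "Prepared"
	Updated  State = "Updated"
	Failed   State = "Failed"
)

// Command is a CM command received by the UM.
type Command string

const (
	PrepareUpdate Command = "PrepareUpdate"
	StartUpdate   Command = "StartUpdate"
	ApplyUpdate   Command = "ApplyUpdate"
	RevertUpdate  Command = "RevertUpdate"
)

// next returns the UM state after handling cmd in state s. ok reports
// whether the prepare/update operation succeeded; Apply and Revert are
// expected to be reliable, so ok is ignored for them.
func next(s State, cmd Command, ok bool) (State, error) {
	switch {
	case cmd == RevertUpdate && s != Idle:
		// Allowed at any time before ApplyUpdate, including from Failed.
		return Idle, nil
	case cmd == PrepareUpdate && s == Idle:
		if ok {
			return Prepared, nil
		}
		return Failed, nil
	case cmd == StartUpdate && s == Prepared:
		if ok {
			return Updated, nil
		}
		return Failed, nil
	case cmd == ApplyUpdate && s == Updated:
		return Idle, nil
	}
	return s, fmt.Errorf("%s is not allowed in state %s", cmd, s)
}
```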
No errors are expected from the ApplyUpdate and RevertUpdate commands: these actions must be performed reliably and atomically. An error while applying or reverting an update usually indicates a serious HW problem.
UM consists of the following packages:
- umclient – communicates with CM by gRPC;
- updatehandler – handles update procedure;
- updatemodules – set of update modules responsible to update each individual system component;
- database – used to store persistent data;
- config – handles UM static configuration.
To work with updatehandler, an update module should implement the following interface:
- Init() (err error) - initializes the module;
- Prepare(imagePath string) (err error) - prepares the module update: performs image validation, any required unpacking, and other pre-update actions;
- Update() (err error) - performs the module update;
- Apply() (err error) - applies the current update;
- Revert() (err error) - reverts the current update;
- Reboot() (err error) - performs a module reboot; this function is called after Update, Apply, and Revert. The module decides whether to actually reboot the component. The module should also check the component state after the reboot and return an error if the component is not functional;
- Close() (err error) - closes the update module and frees allocated resources.
Some components require a component or full-system reboot to apply the changes made by an update, revert, or apply operation. For this reason, UM calls the Reboot function of each module after each of these operations. The module itself decides whether a reboot is required at that point and, if so, performs it. If the reboot succeeds and the component is updated correctly, the module returns an OK status; otherwise, it returns an error.
The following sequence diagram illustrates the module update.
UM calls the Update, Apply, and Revert functions of the relevant modules in parallel, but calls their Reboot functions one at a time, in the priority order specified in the metadata. This minimizes the number of reboots when rebooting one component also reboots another. For example, suppose the system has rootfs and bios components that both require a system reboot. If both components are updated, a single system reboot is enough: the bios module detects that the reboot has already been performed and simply returns an OK status.
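A simplified sketch of this scheduling follows: Update runs in parallel goroutines, and Reboot is then called sequentially by priority. It assumes that a higher priority value reboots first (the metadata's actual convention is not specified here) and simulates the shared system with a reboot counter; `entry`, `system` and `systemRebootModule` are hypothetical names, and a real system reboot would of course restart the UM process rather than increment a counter.

```go
package main

import (
	"sort"
	"sync"
)

// Module is a reduced, hypothetical view of an update module.
type Module interface {
	Update() error
	Reboot() error
}

type entry struct {
	name     string
	priority int // assumed: higher value reboots first
	module   Module
}

// updateAndReboot runs Update on all modules in parallel, then calls Reboot
// one module at a time in priority order.
func updateAndReboot(entries []entry) error {
	errs := make([]error, len(entries))
	var wg sync.WaitGroup
	for i, e := range entries {
		wg.Add(1)
		go func(i int, m Module) {
			defer wg.Done()
			errs[i] = m.Update()
		}(i, e.module)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	ordered := append([]entry(nil), entries...)
	sort.SliceStable(ordered, func(a, b int) bool { return ordered[a].priority > ordered[b].priority })
	for _, e := range ordered {
		if err := e.module.Reboot(); err != nil {
			return err
		}
	}
	return nil
}

// system simulates the shared machine by counting system reboots.
type system struct {
	mu      sync.Mutex
	reboots int
}

func (s *system) rebootCount() int { s.mu.Lock(); defer s.mu.Unlock(); return s.reboots }
func (s *system) reboot()          { s.mu.Lock(); s.reboots++; s.mu.Unlock() }

// systemRebootModule needs a system reboot after Update, but skips it if
// the system has already rebooted since its Update (like bios in the text).
type systemRebootModule struct {
	sys          *system
	bootAtUpdate int
}

func (m *systemRebootModule) Update() error {
	m.bootAtUpdate = m.sys.rebootCount()
	return nil
}

func (m *systemRebootModule) Reboot() error {
	if m.sys.rebootCount() > m.bootAtUpdate {
		return nil // a higher-priority module already rebooted the system
	}
	m.sys.reboot()
	return nil
}
```

With rootfs and bios both registered as `systemRebootModule`s, only the first one in priority order triggers the reboot; the second sees the increased counter and returns OK.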
Image deployment
On an update request, CM downloads the update image to its internal storage. After decrypting and validating the image, CM extracts the archive to storage shared with the UMs. If shared storage cannot be provided to a UM, CM can be configured to act as an HTTP file server. In this scenario, CM provides a URL to the UM, and the UM downloads the corresponding image to its internal storage, unpacks it, and processes it.
The following figure illustrates shared storage deployment:
The following figure illustrates the HTTP file server deployment.