Error Propagation
Introduction
This page documents how errors propagate through the AosCore system — from the point of failure within a component, through inter-component gRPC interfaces, and ultimately to AosCloud via the WebSocket JSON protocol. Understanding this flow is essential for OEMs who need to interpret error reports from the cloud dashboard, correlate them with on-device behavior, or build monitoring integrations.
AosCore uses a layered error propagation model. Each layer has its own representation of error information, with well-defined conversion functions at each boundary:
- Internal: the C++ Error class with typed enumerations and optional messages
- Inter-component: the ErrorInfo protobuf message carried in gRPC streams
- gRPC transport: gRPC status codes for RPC-level failures
- Cloud protocol: JSON errorInfo objects embedded in unitStatus and update notification messages
Internal Error Representation
All AosCore components use the Error class defined in aos_core_lib_cpp as their internal error type. This class
carries:
| Field | Type | Description |
|---|---|---|
| mErr | Error::Enum | Typed error category (see table below) |
| mErrno | int | System errno value (non-zero for runtime errors wrapping OS calls) |
| mMessage | char[] | Human-readable error description (fixed-size buffer) |
| mFileName | const char* | Source file where the error originated |
| mLineNumber | int | Line number where the error originated |
Error Categories
The Error::Enum defines the following error categories:
| Enum Value | Integer Code | String Representation | Typical Cause |
|---|---|---|---|
| eNone | 0 | "none" | No error; operation succeeded |
| eFailed | 1 | "failed" | Generic failure without a more specific category |
| eRuntime | 2 | "runtime error" | OS-level failure (wraps system errno) |
| eNoMemory | 3 | "not enough memory" | Memory allocation failure |
| eOutOfRange | 4 | "out of range" | Index or value exceeds valid bounds |
| eNotFound | 5 | "not found" | Requested resource does not exist |
| eInvalidArgument | 6 | "invalid argument" | Invalid parameter passed to a function |
| eTimeout | 7 | "timeout" | Operation exceeded its time limit |
| eAlreadyExist | 8 | "already exist" | Attempted to create a resource that already exists |
| eWrongState | 9 | "wrong state" | Operation invalid in the current state |
| eInvalidChecksum | 10 | "invalid checksum" | Integrity verification failed |
| eAlreadyLoggedIn | 11 | "already logged in" | Duplicate login attempt |
| eNotSupported | 12 | "not supported" | Operation not supported by this implementation |
| eEOF | 13 | "EOF" | End of data stream reached |
| eCanceled | 14 | "canceled" | Operation was explicitly canceled |
Return Value Pattern
Functions that can fail return errors using two patterns:
Direct return — when the function has no other return value:
Error DoSomething();
RetWithError — when the function returns both a value and a potential error:
RetWithError<ImageInfo> GetImageInfo(const String& digest);
Callers use the Tie() helper to destructure the result:
ImageInfo info;
Error err;
Tie(info, err) = GetImageInfo(digest);
if (!err.IsNone()) {
    return AOS_ERROR_WRAP(err);
}
The AOS_ERROR_WRAP macro attaches the current source file and line number to an error as it propagates up the call
stack, creating a traceable error chain.
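
To make the two return patterns concrete, here is a self-contained sketch built on simplified stand-ins. The Error, RetWithError, and Tie definitions below are illustrative reductions, not the aos_core_lib_cpp implementations. WrapAt, SKETCH_ERROR_WRAP, and LoadImage are hypothetical names, and the real macro may record more than a single file and line.

```cpp
#include <string>
#include <utility>

// Simplified stand-in for the internal error type (illustration only).
struct Error {
    int         code = 0;        // 0 plays the role of eNone
    std::string message;
    const char* file = nullptr;  // set when the error is wrapped
    int         line = 0;

    Error() = default;
    Error(int c, std::string msg) : code(c), message(std::move(msg)) {}

    bool IsNone() const { return code == 0; }
};

// Value plus error, returned together.
template <typename T>
struct RetWithError {
    T     mValue;
    Error mError;
};

// Binds two lvalues so a RetWithError can be assigned into both at once.
template <typename T>
struct TieProxy {
    T&     value;
    Error& error;

    void operator=(RetWithError<T> ret) {
        value = std::move(ret.mValue);
        error = std::move(ret.mError);
    }
};

template <typename T>
TieProxy<T> Tie(T& value, Error& error) {
    return {value, error};
}

// Hypothetical equivalent of AOS_ERROR_WRAP: records where the error passed through.
inline Error WrapAt(Error err, const char* file, int line) {
    err.file = file;
    err.line = line;
    return err;
}
#define SKETCH_ERROR_WRAP(err) WrapAt((err), __FILE__, __LINE__)

struct ImageInfo {
    std::string digest;
};

// Value-and-error pattern: fails with code 5 ("not found") for an empty digest.
RetWithError<ImageInfo> GetImageInfo(const std::string& digest) {
    if (digest.empty()) {
        return {ImageInfo{}, Error{5, "not found: empty digest"}};
    }
    return {ImageInfo{digest}, Error{}};
}

// Direct-return pattern: propagates the callee's error with location attached.
Error LoadImage(const std::string& digest) {
    ImageInfo info;
    Error     err;

    Tie(info, err) = GetImageInfo(digest);
    if (!err.IsNone()) {
        return SKETCH_ERROR_WRAP(err);
    }
    return Error{};
}
```

Calling LoadImage("") in this sketch yields an error with code 5 whose file and line point at the wrap site inside LoadImage, which is the kind of location a caller would log.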
Proto ErrorInfo Structure
When errors cross component boundaries (SM → CM, IAM → CM), they are serialized into the ErrorInfo protobuf message:
// common/v2/common.proto
message ErrorInfo {
    int32 aos_code = 1;   // Maps to Error::Enum integer value
    int32 exit_code = 2;  // Maps to Error::Errno() (system errno)
    string message = 3;   // Human-readable error description
}
Conversion: Error → ErrorInfo
The ConvertAosErrorToProto() function performs the mapping:
| Error Field | ErrorInfo Field | Mapping |
|---|---|---|
| Error::Value() | aos_code | Direct cast of Error::Enum to int32 |
| Error::Errno() | exit_code | System errno value (0 if not a runtime error) |
| Error::Message() | message | Full error string including context |
When aos_code is 0 (eNone), the error is considered absent. Components only populate the ErrorInfo field in proto
messages when an actual error exists.
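
The mapping and the "populate only on error" rule can be sketched as follows. This is not the ConvertAosErrorToProto() source: Error, ErrorInfo, and InstanceStatus are plain-struct stand-ins for the internal class and the generated protobuf types, and ToErrorInfo, SetError, and the has_error flag are hypothetical names used to model field presence.

```cpp
#include <cstdint>
#include <string>

// Simplified stand-in for the internal error type (hypothetical layout).
struct Error {
    int         value = 0;       // Error::Enum as an integer, 0 == eNone
    int         errnoValue = 0;  // system errno, 0 if not a runtime error
    std::string message;

    bool IsNone() const { return value == 0; }
};

// Plain-struct stand-in for the generated ErrorInfo protobuf class.
struct ErrorInfo {
    std::int32_t aos_code = 0;
    std::int32_t exit_code = 0;
    std::string  message;
};

// Stand-in for a status message carrying an optional error field.
struct InstanceStatus {
    std::string state;
    bool        has_error = false;  // models whether the error field is set
    ErrorInfo   error;
};

// Field-by-field mapping from the conversion table above.
ErrorInfo ToErrorInfo(const Error& err) {
    ErrorInfo info;
    info.aos_code  = static_cast<std::int32_t>(err.value);
    info.exit_code = static_cast<std::int32_t>(err.errnoValue);
    info.message   = err.message;
    return info;
}

// Populates the error field only when an actual error exists.
void SetError(InstanceStatus& status, const Error& err) {
    if (err.IsNone()) {
        return;  // leave the field unset
    }
    status.error     = ToErrorInfo(err);
    status.has_error = true;
}
```

A receiver built this way would check field presence first and treat an absent field the same as aos_code 0.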
Where ErrorInfo Appears in Proto Messages
The ErrorInfo message is embedded in status messages across all inter-component APIs:
Service Manager → CM (servicemanager/v5):
| Message | Field | Meaning |
|---|---|---|
| NodeConfigStatus | error | Node configuration application failure |
| InstanceStatus | error | Service instance failure (crash, resource limit, launch error) |
| EnvVarStatus | error | Environment variable injection failure |
| LogData | error | Log retrieval failure |
| DownloadContent | error | Content download failure |
IAM → CM (iamanager/v6):
| Message | Field | Meaning |
|---|---|---|
| NodeInfo | error | Node-level error (registration failure, state error) |
| CreateCertResponse | error | Certificate creation failure |
| ApplyCertResponse | error | Certificate application failure |
| PauseNodeResponse | error | Node pause operation failure |
| ResumeNodeResponse | error | Node resume operation failure |
| StartProvisioningResponse | error | Provisioning initiation failure |
| FinishProvisioningResponse | error | Provisioning completion failure |
| DeprovisionResponse | error | Deprovisioning failure |
Update Manager → CM (updatemanager/v2):
| Message | Field | Meaning |
|---|---|---|
| ComponentStatus | error | Individual firmware component update failure |
| UpdateStatus | error | Overall update operation failure |
CM Update Scheduler (communicationmanager/v3):
| Message | Field | Meaning |
|---|---|---|
| UpdateFOTAStatus | error | FOTA update cycle error |
| UpdateSOTAStatus | error | SOTA update cycle error |
gRPC Status Codes
In addition to ErrorInfo fields within messages, AosCore uses gRPC status codes for RPC-level error signaling. This
applies to unary RPCs and stream establishment failures.
Conversion: Error → gRPC Status
The ConvertAosErrorToGrpcStatus() function maps all non-None errors to grpc::StatusCode::INTERNAL with the error
message as the status detail:
| Error Condition | gRPC Status Code | Detail |
|---|---|---|
| Error::IsNone() returns true | OK | (empty) |
| Any non-None error | INTERNAL | Full error message string |
This simplified mapping means that gRPC status codes alone do not convey the specific error category — the ErrorInfo
field within the response message provides the detailed classification. The gRPC status is primarily used for:
- Signaling that an RPC itself failed (network error, server unavailable)
- IAM server responses where the operation-level error is returned as a gRPC status rather than an in-message field
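
The all-or-nothing mapping described above can be sketched without the gRPC library. StatusCode and Status below are stand-ins for grpc::StatusCode and grpc::Status (the numeric values 0 and 13 match gRPC's OK and INTERNAL), and ToGrpcStatus is a hypothetical name, not the ConvertAosErrorToGrpcStatus() source.

```cpp
#include <string>

// Stand-in for grpc::StatusCode; only the two codes this mapping produces.
enum class StatusCode { OK = 0, INTERNAL = 13 };

// Stand-in for grpc::Status: a code plus a detail string.
struct Status {
    StatusCode  code;
    std::string detail;
};

// Simplified stand-in for the internal error type.
struct Error {
    int         value = 0;  // 0 == eNone
    std::string message;

    bool IsNone() const { return value == 0; }
};

// Every non-None error collapses to INTERNAL; the category survives only
// as text inside the detail string.
Status ToGrpcStatus(const Error& err) {
    if (err.IsNone()) {
        return Status{StatusCode::OK, ""};
    }
    return Status{StatusCode::INTERNAL, err.message};
}
```

Because a "not found" and a "timeout" both arrive as INTERNAL under this mapping, a client that needs the category has to read it from the detail text or from an in-message ErrorInfo field.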
Usage Pattern
IAM server endpoints use gRPC status codes as their primary error reporting mechanism:
Client → IAM: GetCert(request)
IAM → Client: gRPC Status(INTERNAL, "not found: certificate for storage 'online' not found")
SM and UM use in-message ErrorInfo fields as their primary mechanism, with gRPC status reserved for transport-level
failures:
SM → CM: InstanceStatus { state: "failed", error: { aos_code: 2, exit_code: 137, message: "OOM killed" } }
Cloud Protocol Error Encoding
When CM reports the Unit's status to AosCloud, errors are serialized as JSON objects within the unitStatus message.
The JSON encoding uses the same three-field structure as the proto ErrorInfo:
{
  "errorInfo": {
    "aosCode": 2,
    "exitCode": 137,
    "message": "runtime error: process exited with signal SIGKILL"
  }
}
Error Fields in unitStatus
The unitStatus message carries errors at multiple levels of the status hierarchy:
Node-level errors — reported in the nodes array:
{
  "messageType": "unitStatus",
  "nodes": [
    {
      "identity": { "codename": "main-node" },
      "state": "error",
      "errorInfo": {
        "aosCode": 2,
        "exitCode": 28,
        "message": "runtime error: no space left on device"
      }
    }
  ]
}
Deployable Item errors — reported in the items array:
{
  "messageType": "unitStatus",
  "items": [
    {
      "item": { "id": "service-navigation" },
      "version": "2.1.0",
      "state": "error",
      "errorInfo": {
        "aosCode": 10,
        "exitCode": 0,
        "message": "invalid checksum: image digest mismatch"
      }
    }
  ]
}
Instance-level errors — reported in the instances array, nested within each instance entry:
{
  "messageType": "unitStatus",
  "instances": [
    {
      "item": { "id": "service-navigation" },
      "subject": { "id": "subject-001" },
      "version": "2.1.0",
      "instances": [
        {
          "node": { "codename": "main-node" },
          "runtime": { "codename": "container-runtime" },
          "instance": 0,
          "state": "failed",
          "errorInfo": {
            "aosCode": 2,
            "exitCode": 137,
            "message": "runtime error: container exited with code 137"
          }
        }
      ]
    }
  ]
}
Unit configuration errors — reported in the unitConfig array:
{
  "messageType": "unitStatus",
  "unitConfig": [
    {
      "version": "1.2.0",
      "state": "error",
      "errorInfo": {
        "aosCode": 6,
        "exitCode": 0,
        "message": "invalid argument: unknown field 'networkMode' in unit config"
      }
    }
  ]
}
Error Fields in Update Notifications
During active updates, the UpdateSchedulerService streams notifications that include error information:
FOTA status — firmware update errors:
{
  "fotaStatus": {
    "state": "updating",
    "components": [
      { "componentId": "bios", "componentType": "firmware", "version": "3.0.1" }
    ],
    "error": {
      "aosCode": 1,
      "exitCode": 0,
      "message": "failed: component update rejected by update manager"
    }
  }
}
SOTA status — software update errors:
{
  "sotaStatus": {
    "state": "downloading",
    "error": {
      "aosCode": 7,
      "exitCode": 0,
      "message": "timeout: image download exceeded deadline"
    }
  }
}
Conditional Error Inclusion
Error fields are only included in JSON output when an actual error exists. If the internal Error is eNone, the
errorInfo key is omitted entirely from the JSON object. This keeps status messages compact during normal operation.
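
A minimal sketch of conditional inclusion, written with plain string building rather than the JSON library the real serializer uses. ErrorInfo here is a stand-in struct, and EscapeJson and NodeStatusToJson are hypothetical helpers. The escaping covers only quotes, backslashes, and newlines, which is enough for this illustration but not for arbitrary input.

```cpp
#include <sstream>
#include <string>

// Stand-in for the error carried by a status entry; aosCode 0 means "no error".
struct ErrorInfo {
    int         aosCode = 0;
    int         exitCode = 0;
    std::string message;
};

// Escapes quotes, backslashes and newlines; other control characters
// are not handled in this sketch.
std::string EscapeJson(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            default:   out += c;      break;
        }
    }
    return out;
}

// Serializes one node entry; the errorInfo key is written only for real errors.
std::string NodeStatusToJson(const std::string& codename, const std::string& state,
                             const ErrorInfo& err) {
    std::ostringstream os;
    os << "{\"identity\":{\"codename\":\"" << EscapeJson(codename) << "\"},"
       << "\"state\":\"" << EscapeJson(state) << "\"";
    if (err.aosCode != 0) {
        os << ",\"errorInfo\":{"
           << "\"aosCode\":" << err.aosCode << ","
           << "\"exitCode\":" << err.exitCode << ","
           << "\"message\":\"" << EscapeJson(err.message) << "\"}";
    }
    os << "}";
    return os.str();
}
```

Consumers of unitStatus should therefore treat a missing errorInfo key as "no error" rather than expecting an object with aosCode 0.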
End-to-End Error Flow Example
The following traces a service instance failure from detection to cloud reporting:
1. Container runtime detects process exit (exit code 137, OOM killed)
   └─ SM Launcher: Error(eRuntime, errno=137, "container exited with code 137")
2. SM Launcher reports to SM Client
   └─ InstanceStatus { state: "failed", error: ErrorInfo { aos_code: 2, exit_code: 137, message: "..." } }
3. SM Client sends via gRPC stream to CM's SM Controller
   └─ SMOutgoingMessages::NodeInstancesStatus { instances: [ InstanceStatus { ... } ] }
4. CM SM Controller receives and updates internal state
   └─ CM Update Manager aggregates into UnitStatus
5. CM Communication module serializes to JSON and sends to AosCloud
   └─ unitStatus { instances: [ { instances: [ { state: "failed", errorInfo: { ... } } ] } ] }
At each boundary, the error information is preserved:
- SM internal → gRPC: Error → ErrorInfo via ConvertAosErrorToProto()
- gRPC → CM internal: ErrorInfo → Error (reconstructed from exit_code and message)
- CM internal → Cloud JSON: Error → JSON errorInfo via ToJSON(error, json)
Error Code Reference
For quick reference, the mapping between internal error codes and their meaning when seen in cloud error reports:
| aosCode | Category | Common Causes |
|---|---|---|
| 0 | None | No error (should not appear in errorInfo) |
| 1 | Failed | Generic failure — check message for details |
| 2 | Runtime | OS-level error — exitCode contains the system errno or process exit code |
| 3 | No Memory | Memory allocation failure on the Node |
| 4 | Out of Range | Configuration value exceeds limits |
| 5 | Not Found | Referenced resource (image, service, certificate) does not exist |
| 6 | Invalid Argument | Malformed configuration or invalid parameter in desired state |
| 7 | Timeout | Operation deadline exceeded (download, connection, update step) |
| 8 | Already Exists | Duplicate resource creation attempt |
| 9 | Wrong State | Operation attempted in an invalid lifecycle state |
| 10 | Invalid Checksum | Image or component integrity verification failed |
| 12 | Not Supported | Requested feature not available on this Node |
| 14 | Canceled | Operation was explicitly canceled (e.g., superseded update) |
The exitCode field is most meaningful for runtime errors (aosCode: 2), where it carries the system errno or process
exit code. For other error categories, exitCode is typically 0.
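
For monitoring integrations, a small helper can turn the aosCode/exitCode pair into a readable hint. The sketch below is a heuristic, not AosCore behavior. DescribeExitCode is a hypothetical name, and the sketch assumes that values from 129 to 192 follow the common 128 + signal-number exit-code convention (137 = 128 + 9, SIGKILL) while smaller positive values are errno numbers. The two ranges can overlap on some platforms, and the errno text comes from the machine running the tool, not from the device.

```cpp
#include <cstring>
#include <string>

// Heuristic description of exitCode for dashboards and log enrichment.
// Assumption: 129..192 means "terminated by signal (exitCode - 128)";
// other positive values are treated as errno numbers.
std::string DescribeExitCode(int aosCode, int exitCode) {
    if (exitCode == 0) {
        return "no exit code";
    }
    if (aosCode != 2) {
        // Outside the runtime category the value is reported without interpretation.
        return "exit code " + std::to_string(exitCode);
    }
    if (exitCode > 128 && exitCode <= 128 + 64) {
        return "terminated by signal " + std::to_string(exitCode - 128);
    }
    // std::strerror text is platform dependent and describes the local errno table.
    return "errno " + std::to_string(exitCode) + ": " + std::strerror(exitCode);
}
```

With these assumptions, the instance failure from the end-to-end example (aosCode 2, exitCode 137) is described as "terminated by signal 9".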
Related Pages
- Error Handling and Recovery — overview of error handling philosophy and recovery strategies
- Service Failure Handling — how SM detects and reports service instance failures
- Update Failure and Rollback — update-specific error handling and rollback procedures
- SM Client Communication — the gRPC interface SM uses to report errors to CM
- Architecture Overview — component relationships and communication paths
- Cloud Communication — the WebSocket JSON protocol carrying error reports
Assumptions
- common.v2.ErrorInfo is the current version used by SM v5 and IAM v6; older components (UM v2, CM v3 update scheduler) still reference common.v1.ErrorInfo, which has the same structure
- All non-None errors map to the gRPC INTERNAL status code; there is no fine-grained mapping to other gRPC codes (UNAVAILABLE, NOT_FOUND, etc.)
- The cloud JSON encoding uses camelCase field names (aosCode, exitCode, message) matching the cloud protocol convention
- Error fields are conditionally included in JSON — omitted entirely when the error is None, keeping normal-operation messages compact
- The file/line information from AOS_ERROR_WRAP is used for internal debugging only and is not propagated to proto or cloud representations