Version: v1.1

Error Propagation

Introduction

This page documents how errors propagate through the AosCore system — from the point of failure within a component, through inter-component gRPC interfaces, and ultimately to AosCloud via the WebSocket JSON protocol. Understanding this flow is essential for OEMs who need to interpret error reports from the cloud dashboard, correlate them with on-device behavior, or build monitoring integrations.

AosCore uses a layered error propagation model. Each layer has its own representation of error information, with well-defined conversion functions at each boundary:

  1. Internal — the C++ Error class with typed enumerations and optional messages
  2. Inter-component — the ErrorInfo protobuf message carried in gRPC streams
  3. gRPC transport — gRPC status codes for RPC-level failures
  4. Cloud protocol — JSON errorInfo objects embedded in unitStatus and update notification messages

Internal Error Representation

All AosCore components use the Error class defined in aos_core_lib_cpp as their internal error type. This class carries:

| Field | Type | Description |
| --- | --- | --- |
| mErr | Error::Enum | Typed error category (see table below) |
| mErrno | int | System errno value (non-zero for runtime errors wrapping OS calls) |
| mMessage | char[] | Human-readable error description (fixed-size buffer) |
| mFileName | const char* | Source file where the error originated |
| mLineNumber | int | Line number where the error originated |

Error Categories

The Error::Enum defines the following error categories:

| Enum Value | Integer Code | String Representation | Typical Cause |
| --- | --- | --- | --- |
| eNone | 0 | "none" | No error; operation succeeded |
| eFailed | 1 | "failed" | Generic failure without a more specific category |
| eRuntime | 2 | "runtime error" | OS-level failure (wraps system errno) |
| eNoMemory | 3 | "not enough memory" | Memory allocation failure |
| eOutOfRange | 4 | "out of range" | Index or value exceeds valid bounds |
| eNotFound | 5 | "not found" | Requested resource does not exist |
| eInvalidArgument | 6 | "invalid argument" | Invalid parameter passed to a function |
| eTimeout | 7 | "timeout" | Operation exceeded its time limit |
| eAlreadyExist | 8 | "already exist" | Attempted to create a resource that already exists |
| eWrongState | 9 | "wrong state" | Operation invalid in the current state |
| eInvalidChecksum | 10 | "invalid checksum" | Integrity verification failed |
| eAlreadyLoggedIn | 11 | "already logged in" | Duplicate login attempt |
| eNotSupported | 12 | "not supported" | Operation not supported by this implementation |
| eEOF | 13 | "EOF" | End of data stream reached |
| eCanceled | 14 | "canceled" | Operation was explicitly canceled |
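
The table above can be mirrored in a small lookup for tooling that needs to turn integer codes back into their string form. The following is a minimal sketch, not the aos_core_lib_cpp definition: `ErrorEnum`, `kErrorStrings`, and `ErrorToString` are hypothetical names, and only the values and strings come from the table.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for Error::Enum; the values mirror the table above.
enum class ErrorEnum : int {
    eNone = 0, eFailed, eRuntime, eNoMemory, eOutOfRange, eNotFound,
    eInvalidArgument, eTimeout, eAlreadyExist, eWrongState, eInvalidChecksum,
    eAlreadyLoggedIn, eNotSupported, eEOF, eCanceled
};

// String representations from the table, indexed by integer code.
constexpr std::array<std::string_view, 15> kErrorStrings = {
    "none", "failed", "runtime error", "not enough memory", "out of range",
    "not found", "invalid argument", "timeout", "already exist", "wrong state",
    "invalid checksum", "already logged in", "not supported", "EOF", "canceled"
};

// Returns the string for a code, or "unknown" for codes outside the table.
constexpr std::string_view ErrorToString(int code) {
    return (code >= 0 && static_cast<std::size_t>(code) < kErrorStrings.size())
        ? kErrorStrings[static_cast<std::size_t>(code)]
        : std::string_view("unknown");
}
```

Falling back to "unknown" rather than failing keeps such tooling tolerant of codes added in later versions.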

Return Value Pattern

Functions that can fail return errors using two patterns:

Direct return — when the function has no other return value:

Error DoSomething();

RetWithError — when the function returns both a value and a potential error:

RetWithError<ImageInfo> GetImageInfo(const String& digest);

Callers use the Tie() helper to destructure the result:

ImageInfo info;
Error err;
Tie(info, err) = GetImageInfo(digest);
if (!err.IsNone()) {
    return AOS_ERROR_WRAP(err);
}

The AOS_ERROR_WRAP macro attaches the current source file and line number to an error as it propagates up the call stack, creating a traceable error chain.
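
To make the pattern concrete without depending on the library, the sketch below models it with simplified stand-ins. Everything here is an assumption for illustration: the `Error`, `RetWithError`, and `Tie` shown are reduced re-creations, `SKETCH_ERROR_WRAP`, `GetImageSize`, and `CheckImage` are hypothetical, and the wrap helper keeps only the most recent source location rather than a full chain.

```cpp
#include <string>
#include <utility>

// Simplified, hypothetical stand-ins; not the aos_core_lib_cpp definitions.
struct Error {
    int         mCode = 0;            // 0 plays the role of eNone
    std::string mMessage;
    const char* mFileName = nullptr;  // location recorded by the wrap helper
    int         mLineNumber = 0;

    bool IsNone() const { return mCode == 0; }
};

template <typename T>
struct RetWithError {
    T     mValue;
    Error mError;
};

// Holds references to the caller's variables so a RetWithError can be assigned into them.
template <typename T>
struct TieHelper {
    T&     mValue;
    Error& mError;

    void operator=(RetWithError<T> ret) {
        mValue = std::move(ret.mValue);
        mError = std::move(ret.mError);
    }
};

template <typename T>
TieHelper<T> Tie(T& value, Error& err) { return {value, err}; }

// Records the wrap site on a non-None error (this sketch keeps only the latest site).
inline Error Wrap(Error err, const char* file, int line) {
    if (!err.IsNone()) {
        err.mFileName   = file;
        err.mLineNumber = line;
    }
    return err;
}
#define SKETCH_ERROR_WRAP(err) Wrap((err), __FILE__, __LINE__)

// Hypothetical callee: fails with code 6 (invalid argument) for an empty digest.
inline RetWithError<int> GetImageSize(const std::string& digest) {
    if (digest.empty()) return {0, Error{6, "invalid argument: empty digest"}};
    return {static_cast<int>(digest.size()), Error{}};
}

// Caller side: destructure the result, then wrap and return on failure.
inline Error CheckImage(const std::string& digest) {
    int   size = 0;
    Error err;
    Tie(size, err) = GetImageSize(digest);
    if (!err.IsNone()) return SKETCH_ERROR_WRAP(err);
    return Error{};
}
```

Calling `CheckImage("")` in this sketch returns an error with code 6 whose file and line point at the wrap site inside `CheckImage`.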

Proto ErrorInfo Structure

When errors cross component boundaries (SM → CM, IAM → CM), they are serialized into the ErrorInfo protobuf message:

// common/v2/common.proto
message ErrorInfo {
    int32 aos_code = 1;   // Maps to Error::Enum integer value
    int32 exit_code = 2;  // Maps to Error::Errno() (system errno)
    string message = 3;   // Human-readable error description
}

Conversion: Error → ErrorInfo

The ConvertAosErrorToProto() function performs the mapping:

| Error Field | ErrorInfo Field | Mapping |
| --- | --- | --- |
| Error::Value() | aos_code | Direct cast of Error::Enum to int32 |
| Error::Errno() | exit_code | System errno value (0 if not a runtime error) |
| Error::Message() | message | Full error string including context |

When aos_code is 0 (eNone), the error is considered absent. Components only populate the ErrorInfo field in proto messages when an actual error exists.
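
The mapping and the "populate only on error" rule can be sketched with plain structs in place of the generated protobuf classes. This is an illustration under stated assumptions, not the body of ConvertAosErrorToProto(): the `ErrorInfo` and `Error` structs, `ToErrorInfo`, and `MaybeErrorInfo` are hypothetical, and `std::optional` stands in for leaving the proto field unset.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Plain-struct stand-in for the generated protobuf ErrorInfo class.
struct ErrorInfo {
    std::int32_t aos_code = 0;
    std::int32_t exit_code = 0;
    std::string  message;
};

// Hypothetical minimal view of the internal error.
struct Error {
    int         mValue = 0;  // Error::Enum as an integer, 0 == eNone
    int         mErrno = 0;  // system errno, 0 when not a runtime error
    std::string mMessage;

    bool IsNone() const { return mValue == 0; }
};

// Field-by-field mapping from the table above.
inline ErrorInfo ToErrorInfo(const Error& err) {
    return ErrorInfo{static_cast<std::int32_t>(err.mValue),
                     static_cast<std::int32_t>(err.mErrno), err.mMessage};
}

// Models "populate only when an error exists": no value means the field stays unset.
inline std::optional<ErrorInfo> MaybeErrorInfo(const Error& err) {
    if (err.IsNone()) return std::nullopt;
    return ToErrorInfo(err);
}
```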

Where ErrorInfo Appears in Proto Messages

The ErrorInfo message is embedded in status messages across all inter-component APIs:

Service Manager → CM (servicemanager/v5):

| Message | Field | Meaning |
| --- | --- | --- |
| NodeConfigStatus | error | Node configuration application failure |
| InstanceStatus | error | Service instance failure (crash, resource limit, launch error) |
| EnvVarStatus | error | Environment variable injection failure |
| LogData | error | Log retrieval failure |
| DownloadContent | error | Content download failure |

IAM → CM (iamanager/v6):

| Message | Field | Meaning |
| --- | --- | --- |
| NodeInfo | error | Node-level error (registration failure, state error) |
| CreateCertResponse | error | Certificate creation failure |
| ApplyCertResponse | error | Certificate application failure |
| PauseNodeResponse | error | Node pause operation failure |
| ResumeNodeResponse | error | Node resume operation failure |
| StartProvisioningResponse | error | Provisioning initiation failure |
| FinishProvisioningResponse | error | Provisioning completion failure |
| DeprovisionResponse | error | Deprovisioning failure |

Update Manager → CM (updatemanager/v2):

| Message | Field | Meaning |
| --- | --- | --- |
| ComponentStatus | error | Individual firmware component update failure |
| UpdateStatus | error | Overall update operation failure |

CM Update Scheduler (communicationmanager/v3):

| Message | Field | Meaning |
| --- | --- | --- |
| UpdateFOTAStatus | error | FOTA update cycle error |
| UpdateSOTAStatus | error | SOTA update cycle error |

gRPC Status Codes

In addition to ErrorInfo fields within messages, AosCore uses gRPC status codes for RPC-level error signaling. This applies to unary RPCs and stream establishment failures.

Conversion: Error → gRPC Status

The ConvertAosErrorToGrpcStatus() function maps all non-None errors to grpc::StatusCode::INTERNAL with the error message as the status detail:

| Error Condition | gRPC Status Code | Detail |
| --- | --- | --- |
| Error::IsNone() returns true | OK | (empty) |
| Any non-None error | INTERNAL | Full error message string |

This simplified mapping means that gRPC status codes alone do not convey the specific error category — the ErrorInfo field within the response message provides the detailed classification. The gRPC status is primarily used for:

  • Signaling that an RPC itself failed (network error, server unavailable)
  • IAM server responses where the operation-level error is returned as a gRPC status rather than an in-message field
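
The all-to-INTERNAL mapping described above is small enough to sketch in full. The code below is an illustration only, not the ConvertAosErrorToGrpcStatus() source: `StatusCode`, `Status`, `Error`, and `ToStatus` are hypothetical stand-ins so the example compiles without gRPC, with the numeric values 0 and 13 chosen to match gRPC's OK and INTERNAL.

```cpp
#include <string>

// Stand-ins so the sketch compiles without gRPC; values match grpc::StatusCode.
enum class StatusCode : int { OK = 0, INTERNAL = 13 };

struct Status {
    StatusCode  code = StatusCode::OK;
    std::string detail;
};

// Hypothetical minimal view of the internal error.
struct Error {
    int         mValue = 0;  // Error::Enum as an integer, 0 == eNone
    std::string mMessage;

    bool IsNone() const { return mValue == 0; }
};

// Every non-None error collapses to INTERNAL; only the message survives as detail.
inline Status ToStatus(const Error& err) {
    if (err.IsNone()) return Status{};
    return Status{StatusCode::INTERNAL, err.mMessage};
}
```

In this sketch a "not found" error and a "timeout" error produce the same status code, which is why a client that needs the category has to look at the detail string or at an in-message ErrorInfo.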

Usage Pattern

IAM server endpoints use gRPC status codes as their primary error reporting mechanism:

Client → IAM: GetCert(request)
IAM → Client: gRPC Status(INTERNAL, "not found: certificate for storage 'online' not found")

SM and UM use in-message ErrorInfo fields as their primary mechanism, with gRPC status reserved for transport-level failures:

SM → CM: InstanceStatus { state: "failed", error: { aos_code: 2, exit_code: 137, message: "OOM killed" } }
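
Because an IAM failure arrives only as INTERNAL plus a detail string, a client that wants the category has to recover it from the text. The sketch below assumes the detail starts with the category string followed by ": ", as in the GetCert example above. That format is inferred from the examples on this page, not a documented contract, and `CategoryPrefix` is a hypothetical helper.

```cpp
#include <string_view>

// Returns the text before the first ": ", or an empty view when no separator is present.
// Assumes the "category: detail" layout seen in the examples; treat the result as a hint.
inline std::string_view CategoryPrefix(std::string_view detail) {
    const auto pos = detail.find(": ");
    return pos == std::string_view::npos ? std::string_view{} : detail.substr(0, pos);
}
```

A message without the separator, such as "OOM killed", yields an empty prefix, so callers should fall back to treating the error as uncategorized.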

Cloud Protocol Error Encoding

When CM reports the Unit's status to AosCloud, errors are serialized as JSON objects within the unitStatus message. The JSON encoding uses the same three-field structure as the proto ErrorInfo:

{
  "errorInfo": {
    "aosCode": 2,
    "exitCode": 137,
    "message": "runtime error: process exited with signal SIGKILL"
  }
}

Error Fields in unitStatus

The unitStatus message carries errors at multiple levels of the status hierarchy:

Node-level errors — reported in the nodes array:

{
  "messageType": "unitStatus",
  "nodes": [
    {
      "identity": { "codename": "main-node" },
      "state": "error",
      "errorInfo": {
        "aosCode": 2,
        "exitCode": 28,
        "message": "runtime error: no space left on device"
      }
    }
  ]
}

Deployable Item errors — reported in the items array:

{
  "messageType": "unitStatus",
  "items": [
    {
      "item": { "id": "service-navigation" },
      "version": "2.1.0",
      "state": "error",
      "errorInfo": {
        "aosCode": 10,
        "exitCode": 0,
        "message": "invalid checksum: image digest mismatch"
      }
    }
  ]
}

Instance-level errors — reported in the instances array, nested within each instance entry:

{
  "messageType": "unitStatus",
  "instances": [
    {
      "item": { "id": "service-navigation" },
      "subject": { "id": "subject-001" },
      "version": "2.1.0",
      "instances": [
        {
          "node": { "codename": "main-node" },
          "runtime": { "codename": "container-runtime" },
          "instance": 0,
          "state": "failed",
          "errorInfo": {
            "aosCode": 2,
            "exitCode": 137,
            "message": "runtime error: container exited with code 137"
          }
        }
      ]
    }
  ]
}

Unit configuration errors — reported in the unitConfig array:

{
  "messageType": "unitStatus",
  "unitConfig": [
    {
      "version": "1.2.0",
      "state": "error",
      "errorInfo": {
        "aosCode": 6,
        "exitCode": 0,
        "message": "invalid argument: unknown field 'networkMode' in unit config"
      }
    }
  ]
}

Error Fields in Update Notifications

During active updates, the UpdateSchedulerService streams notifications that include error information:

FOTA status — firmware update errors:

{
  "fotaStatus": {
    "state": "updating",
    "components": [
      { "componentId": "bios", "componentType": "firmware", "version": "3.0.1" }
    ],
    "error": {
      "aosCode": 1,
      "exitCode": 0,
      "message": "failed: component update rejected by update manager"
    }
  }
}

SOTA status — software update errors:

{
  "sotaStatus": {
    "state": "downloading",
    "error": {
      "aosCode": 7,
      "exitCode": 0,
      "message": "timeout: image download exceeded deadline"
    }
  }
}

Conditional Error Inclusion

Error fields are only included in JSON output when an actual error exists. If the internal Error is eNone, the errorInfo key is omitted entirely from the JSON object. This keeps status messages compact during normal operation.
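
The omission rule can be illustrated with a tiny hand-rolled serializer. This is a sketch under stated assumptions, not the cloudprotocol ToJSON implementation: `ErrorInfo`, `Escape`, and `StatusToJSON` are hypothetical, the "active" state in the usage note is a made-up value, and the escaping covers only quotes and backslashes.

```cpp
#include <string>

// Cloud-side view of an error; field names follow the JSON keys.
struct ErrorInfo {
    int         aosCode = 0;
    int         exitCode = 0;
    std::string message;
};

// Escapes only quotes and backslashes; a real serializer must also handle control characters.
inline std::string Escape(const std::string& in) {
    std::string out;
    for (char c : in) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Serializes a status entry, adding "errorInfo" only when aosCode is non-zero.
inline std::string StatusToJSON(const std::string& state, const ErrorInfo& err) {
    std::string json = "{\"state\":\"" + Escape(state) + "\"";
    if (err.aosCode != 0) {
        json += ",\"errorInfo\":{\"aosCode\":" + std::to_string(err.aosCode)
              + ",\"exitCode\":" + std::to_string(err.exitCode)
              + ",\"message\":\"" + Escape(err.message) + "\"}";
    }
    return json + "}";
}
```

With a default-constructed `ErrorInfo`, `StatusToJSON("active", {})` produces `{"state":"active"}` with no errorInfo key at all, so consumers should test for the key's presence rather than for `aosCode == 0`.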

End-to-End Error Flow Example

The following traces a service instance failure from detection to cloud reporting:

1. Container runtime detects process exit (exit code 137 — OOM killed)
└─ SM Launcher: Error(eRuntime, errno=137, "container exited with code 137")

2. SM Launcher reports to SM Client
└─ InstanceStatus { state: "failed", error: ErrorInfo { aos_code: 2, exit_code: 137, message: "..." } }

3. SM Client sends via gRPC stream to CM's SM Controller
└─ SMOutgoingMessages::NodeInstancesStatus { instances: [ InstanceStatus { ... } ] }

4. CM SM Controller receives and updates internal state
└─ CM Update Manager aggregates into UnitStatus

5. CM Communication module serializes to JSON and sends to AosCloud
└─ unitStatus { instances: [ { instances: [ { state: "failed", errorInfo: { ... } } ] } ] }

At each boundary, the error information is preserved:

  • SM internal → gRPC: Error → ErrorInfo via ConvertAosErrorToProto()
  • gRPC → CM internal: ErrorInfo → Error (reconstructed from exit_code and message)
  • CM internal → Cloud JSON: Error → JSON errorInfo via ToJSON(error, json)

Error Code Reference

For quick reference, the mapping between internal error codes and their meaning when seen in cloud error reports:

| aosCode | Category | Common Causes |
| --- | --- | --- |
| 0 | None | No error (should not appear in errorInfo) |
| 1 | Failed | Generic failure; check message for details |
| 2 | Runtime | OS-level error; exitCode contains the system errno or process exit code |
| 3 | No Memory | Memory allocation failure on the Node |
| 4 | Out of Range | Configuration value exceeds limits |
| 5 | Not Found | Referenced resource (image, service, certificate) does not exist |
| 6 | Invalid Argument | Malformed configuration or invalid parameter in desired state |
| 7 | Timeout | Operation deadline exceeded (download, connection, update step) |
| 8 | Already Exists | Duplicate resource creation attempt |
| 9 | Wrong State | Operation attempted in an invalid lifecycle state |
| 10 | Invalid Checksum | Image or component integrity verification failed |
| 12 | Not Supported | Requested feature not available on this Node |
| 14 | Canceled | Operation was explicitly canceled (e.g., superseded update) |

The exitCode field is most meaningful for runtime errors (aosCode: 2), where it carries the system errno or process exit code. For other error categories, exitCode is typically 0.
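
For a monitoring integration, the reference table can be turned into a small decoder. The sketch below is hypothetical throughout (`Decoded`, `CategoryName`, `Decode`). Codes 11 and 13 are filled in from the Error Categories table rather than this reference table. The signal field is a heuristic: it applies the common convention that a process killed by signal N exits with code 128 + N, which can misfire when exitCode is actually a large errno value.

```cpp
#include <string>

struct Decoded {
    std::string category;
    int         signal = 0;  // non-zero when exitCode looks like 128 + signal number
};

// Category names from the reference table; unknown codes are reported as "Unknown".
inline std::string CategoryName(int aosCode) {
    switch (aosCode) {
    case 0:  return "None";
    case 1:  return "Failed";
    case 2:  return "Runtime";
    case 3:  return "No Memory";
    case 4:  return "Out of Range";
    case 5:  return "Not Found";
    case 6:  return "Invalid Argument";
    case 7:  return "Timeout";
    case 8:  return "Already Exists";
    case 9:  return "Wrong State";
    case 10: return "Invalid Checksum";
    case 11: return "Already Logged In";
    case 12: return "Not Supported";
    case 13: return "EOF";
    case 14: return "Canceled";
    default: return "Unknown";
    }
}

// Decodes the two numeric errorInfo fields into a category and, heuristically, a signal.
inline Decoded Decode(int aosCode, int exitCode) {
    Decoded d{CategoryName(aosCode), 0};
    // Heuristic: only runtime errors are inspected, and only the 129..255 range.
    if (aosCode == 2 && exitCode > 128 && exitCode < 256) d.signal = exitCode - 128;
    return d;
}
```

For the instance failure used throughout this page, `Decode(2, 137)` yields category "Runtime" and signal 9 (SIGKILL).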

Open Questions

  • None

Assumptions

  • common.v2.ErrorInfo is the current version used by SM v5 and IAM v6; older components (UM v2, CM v3 update scheduler) still reference common.v1.ErrorInfo, which has the same structure
  • All non-None errors map to gRPC INTERNAL status code — there is no fine-grained mapping to other gRPC codes (UNAVAILABLE, NOT_FOUND, etc.)
  • The cloud JSON encoding uses camelCase field names (aosCode, exitCode, message) matching the cloud protocol convention
  • Error fields are conditionally included in JSON — omitted entirely when the error is None, keeping normal-operation messages compact
  • The file/line information from AOS_ERROR_WRAP is used for internal debugging only and is not propagated to proto or cloud representations
