Version: v1.1

Logging Pipeline

Introduction

The logging pipeline provides on-demand access to system and service instance logs. Unlike the monitoring pipeline (which pushes metrics continuously), logs are pulled — the cloud sends a request specifying what logs are needed, and AosCore responds with compressed log archives delivered in parts.

This page documents the end-to-end flow from cloud log request to archive delivery, the three log request types, how journal entries are filtered and formatted, and how large log responses are split into manageable parts for transport.

Request Flow Overview

A log request traverses the following path through the system:

AosCloud                    CM                         SM (per Node)
   │                         │                              │
   │  requestLog (JSON/WS)   │                              │
   │────────────────────────▶│                              │
   │                         │  gRPC: SystemLogRequest /    │
   │                         │  InstanceLogRequest /        │
   │                         │  InstanceCrashLogRequest     │
   │                         │─────────────────────────────▶│
   │                         │                              │ Read systemd journal
   │                         │                              │ Filter + format entries
   │                         │                              │ Compress (gzip)
   │                         │                              │ Split into parts
   │                         │         gRPC: Log (part N)   │
   │                         │◀─────────────────────────────│
   │   pushLog (JSON/WS)     │                              │
   │◀────────────────────────│                              │

1. Cloud initiates: AosCloud sends a requestLog message over the WebSocket connection to CM, specifying the log type, optional time range, and target Node(s).
2. CM routes to Node(s): The CM Communication module parses the request and calls SMController::RequestLog(). The SM Controller iterates over the requested Nodes and forwards the appropriate gRPC message (SystemLogRequest, InstanceLogRequest, or InstanceCrashLogRequest) to each Node's SM via the bidirectional gRPC stream.
3. SM collects logs: The SM's LogProvider reads matching entries from the systemd journal, formats them, compresses the output using gzip, and splits the result into parts.
4. SM sends response: Each compressed part is sent back to CM as a Log gRPC message containing the part number, total part count, and compressed content.
5. CM forwards to cloud: CM receives the log parts from the SM Handler and forwards each as a pushLog message over the WebSocket connection to AosCloud.

Log Request Types

The cloud can request three types of logs:

Type	gRPC Message	Description
System log	`SystemLogRequest`	Journal entries from AosCore system components (CM, SM, IAM)
Instance log	`InstanceLogRequest`	Journal entries from a specific service instance, filtered by its systemd cgroup
Crash log	`InstanceCrashLogRequest`	Journal entries surrounding a service instance crash event

All request types share a common filter structure:

Filter Field	Description
`correlationId`	Unique identifier linking the request to its response parts
`from`	Optional start timestamp — only entries at or after this time are included
`till`	Optional end timestamp — only entries before this time are included
`nodes`	List of Node IDs to collect logs from
Instance filter	For instance and crash logs: service ID, subject ID, and instance number to identify the target service
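
To make the time-range semantics concrete, here is a minimal sketch of the shared filter fields. The class name, attribute names, and the in_range helper are hypothetical and not taken from the AosCore sources; only the inclusive `from` bound and exclusive `till` bound come from the table above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class LogFilter:
    """Hypothetical mirror of the shared filter fields listed above."""
    correlation_id: str
    nodes: List[str] = field(default_factory=list)
    from_time: Optional[datetime] = None  # `from` is a Python keyword
    till_time: Optional[datetime] = None

    def in_range(self, ts: datetime) -> bool:
        # `from` is inclusive, `till` is exclusive; a missing bound is open.
        if self.from_time is not None and ts < self.from_time:
            return False
        if self.till_time is not None and ts >= self.till_time:
            return False
        return True
```

With both bounds set, an entry stamped exactly at `from` is kept and one stamped exactly at `till` is excluded.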

Log Collection

Journal Access

The LogProvider in SM reads log entries from the systemd journal using the sd-journal API. It supports both cgroup v1 and cgroup v2 environments for filtering service instance logs.

System logs — When no instance IDs are specified, the LogProvider reads all journal entries within the requested time range. Each entry includes the systemd unit name to identify which component produced it.

Instance logs — The LogProvider resolves the requested service filter to concrete instance IDs, then applies cgroup-based journal filters:

Cgroup Version	Filter Pattern
v1	`_SYSTEMD_CGROUP=/system.slice/system-aos\x2dservice.slice/aos-service@<instanceID>.service`
v2	`_SYSTEMD_CGROUP=/system.slice/system-aos\x2dservice.slice/<instanceID>`

Both patterns are added as disjunctive matches, so entries from either cgroup layout are captured.
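
The two match strings can be sketched as follows. This is an illustration in Python, not the SM implementation, which uses the sd-journal API; the function names are hypothetical, and the patterns are copied from the table above.

```python
from typing import List

# Slice prefix copied from the table above. "\x2d" is a literal escape in the
# journal field value, so a raw string keeps the backslash intact.
CGROUP_PREFIX = r"_SYSTEMD_CGROUP=/system.slice/system-aos\x2dservice.slice"


def cgroup_matches(instance_id: str) -> List[str]:
    """Return the cgroup v1 and v2 match strings for one instance ID."""
    return [
        f"{CGROUP_PREFIX}/aos-service@{instance_id}.service",  # cgroup v1
        f"{CGROUP_PREFIX}/{instance_id}",                      # cgroup v2
    ]


def entry_matches(cgroup_field: str, instance_id: str) -> bool:
    # Disjunction: an entry is kept if it matches either layout.
    return cgroup_field in cgroup_matches(instance_id)
```

Because the matches are combined with OR, the same request works unchanged on hosts using either cgroup layout.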

Crash Log Extraction

Crash log collection uses a two-phase approach:

1. Find crash time: The LogProvider seeks backward from the till timestamp (or journal tail) looking for a "process exited" message. The monotonic timestamp of this entry becomes the crash boundary.
2. Collect surrounding entries: Using the crash time as an upper bound, the LogProvider collects all journal entries from the service instance's cgroup that occurred between the service start ("Started" message) and the crash. This captures the full execution context leading up to the failure.

If no crash event is found within the requested time range, a response with the absent status and no log content is returned.
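
The two phases can be sketched over an in-memory list of entries. Several things here are assumptions: the Entry shape, the function names, the substring matching on the messages, and the choice to include the crash entry itself in the result. The real LogProvider walks the systemd journal instead of a list.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, str]  # (monotonic timestamp, message); hypothetical shape


def find_crash_index(entries: List[Entry], till: Optional[int]) -> Optional[int]:
    """Phase 1: walk backward from `till` (or the tail) to the crash entry."""
    for i in range(len(entries) - 1, -1, -1):
        ts, msg = entries[i]
        if till is not None and ts >= till:
            continue  # entry lies at or after the requested upper bound
        if "process exited" in msg:
            return i
    return None


def collect_crash_log(
    entries: List[Entry], till: Optional[int] = None
) -> Optional[List[Entry]]:
    """Phase 2: return entries from the last "Started" up to the crash."""
    crash = find_crash_index(entries, till)
    if crash is None:
        return None  # the caller would report the absent status
    start = 0
    for i in range(crash, -1, -1):
        if "Started" in entries[i][1]:
            start = i
            break
    return entries[start:crash + 1]
```

Searching backward means the most recent crash before `till` is reported, even when the instance has restarted since.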

Entry Formatting

Each journal entry is formatted as a single line:

<ASN1-timestamp> [<systemd-unit>] <message>

The timestamp is formatted as an ASN.1 time string for consistent parsing.
The systemd unit field is included only for system logs (where entries come from multiple components). For instance logs, the unit is omitted since all entries belong to the same service.
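
A small sketch of this formatting rule follows. It rests on two assumptions the text does not confirm: that the ASN.1 time string is GeneralizedTime in UTC (YYYYMMDDHHMMSSZ), and that the square brackets in the template mark an optional field, not literal characters. The function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def format_entry(ts: datetime, message: str, unit: Optional[str] = None) -> str:
    """Format one journal entry as a single line.

    Assumption: GeneralizedTime in UTC; the source does not name the
    exact ASN.1 time type. `ts` should be timezone-aware.
    """
    stamp = ts.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")
    if unit is not None:
        return f"{stamp} {unit} {message}"  # system log: unit included
    return f"{stamp} {message}"             # instance log: unit omitted
```

Passing a unit models a system log request, and leaving it out models an instance log request.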

Compression and Archiving

Log responses are compressed using gzip (via the Poco DeflatingOutputStream with Z_BEST_COMPRESSION) and split into multiple parts for efficient transport.

Archive Configuration

Two parameters control archive splitting:

Parameter	Description
`maxPartSize`	Maximum uncompressed size (in bytes) of log content per part. When this threshold is reached, the current part is finalized and a new part begins.
`maxPartCount`	Maximum number of parts allowed per response. Once the log content reaches `maxPartSize × maxPartCount` uncompressed bytes, collection stops and no further entries are accepted.

Archiving Process

1. The Archiver creates a gzip compression stream for the first part.
2. As formatted log entries are added, the Archiver tracks the uncompressed byte count.
3. When maxPartSize is reached, the current compression stream is finalized, a new stream is created for the next part, and the part counter increments.
4. If maxPartCount is reached, no further entries are accepted.
5. When all entries are processed, SendLog() is called; it finalizes the last compression stream and sends all parts sequentially.

Each part is independently compressed, meaning the receiver can decompress parts individually without needing the full archive.
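
The splitting logic can be sketched with Python's gzip module standing in for Poco's DeflatingOutputStream. The class and method names are hypothetical, and the sketch returns the parts where the real Archiver would send them. Compression level 9 corresponds to Z_BEST_COMPRESSION.

```python
import gzip
import io
from typing import List


class Archiver:
    """Sketch of size-bounded, per-part gzip archiving (names hypothetical)."""

    def __init__(self, max_part_size: int, max_part_count: int) -> None:
        self.max_part_size = max_part_size
        self.max_part_count = max_part_count
        self.parts: List[bytes] = []
        self._open_part()

    def _open_part(self) -> None:
        self._buf = io.BytesIO()
        self._gz = gzip.GzipFile(fileobj=self._buf, mode="wb", compresslevel=9)
        self._raw_size = 0  # uncompressed bytes written to the current part

    def _close_part(self) -> None:
        self._gz.close()  # finalizes the gzip stream; the buffer stays open
        self.parts.append(self._buf.getvalue())

    def add(self, line: str) -> bool:
        """Add one formatted entry; return False once the part limit is hit."""
        if self._raw_size >= self.max_part_size:
            if len(self.parts) + 1 >= self.max_part_count:
                return False  # the last allowed part is already full
            self._close_part()
            self._open_part()
        data = line.encode("utf-8")
        self._gz.write(data)
        self._raw_size += len(data)
        return True

    def finish(self) -> List[bytes]:
        """Finalize the last stream and return every part in order."""
        self._close_part()
        return self.parts
```

In this sketch the size check runs before each write, so a part can exceed max_part_size by at most one entry. Whether the real Archiver behaves the same way at the boundary is not stated in the text.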

Response Delivery

Response Structure

Each log response part contains:

Field	Description
`correlationId`	Matches the original request, allowing the cloud to associate parts with their request
`partsCount`	Total number of parts in this response
`part`	Part number (1-indexed)
`content`	Gzip-compressed log data for this part
`status`	Response status: `ok`, `empty`, `error`, or `absent`
`error`	Error description (populated when status is `error` or `absent`)
`nodeId`	Identifies which Node produced this log data

Response Statuses

Status	Meaning
`ok`	Log data collected successfully — content contains compressed entries
`empty`	Request was valid but no matching entries found in the time range
`absent`	For crash logs: no crash event found for the specified instance
`error`	Processing failed — the error field contains the failure reason
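
On the receiving side, the fields above are enough to reassemble a response. The following is a sketch only: LogPart and Reassembler are hypothetical names, and keying the pending parts by correlation ID plus Node ID is an assumption based on each Node sending its own set of parts.

```python
import gzip
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class LogPart:
    """Hypothetical in-memory form of one response part."""
    correlation_id: str
    node_id: str
    parts_count: int
    part: int        # 1-indexed
    content: bytes   # gzip-compressed
    status: str = "ok"


class Reassembler:
    def __init__(self) -> None:
        self._pending: Dict[Tuple[str, str], Dict[int, bytes]] = {}

    def add(self, p: LogPart) -> Optional[str]:
        """Store a part; return the full text once all parts have arrived."""
        if p.status != "ok":
            return None  # empty, absent, and error carry no log content
        key = (p.correlation_id, p.node_id)
        got = self._pending.setdefault(key, {})
        # Each part is its own gzip stream, so it can be decompressed alone.
        got[p.part] = gzip.decompress(p.content)
        if len(got) < p.parts_count:
            return None
        del self._pending[key]
        joined = b"".join(got[i] for i in range(1, p.parts_count + 1))
        return joined.decode("utf-8")
```

Parts may arrive in any order here; the 1-indexed part number restores the original sequence.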

Delivery Path

On the SM side, the Archiver calls SenderItf::SendLog() for each part. This interface is implemented by the SM Client, which writes each part as a Log message on the gRPC stream to CM.

On the CM side, the SMHandler receives each Log message from the gRPC stream, converts it to a PushLog structure, and calls SenderItf::SendLog() — which is implemented by the CM Communication module. Communication serializes the log part as a pushLog JSON message and sends it to AosCloud over the WebSocket connection.

Request Processing Model

The LogProvider processes requests asynchronously using a dedicated worker thread and a request queue:

1. When a log request arrives (via GetInstanceLog, GetInstanceCrashLog, or GetSystemLog), it is placed in a thread-safe queue.
2. The worker thread picks requests from the queue one at a time.
3. Each request is processed to completion (journal read → format → compress → send) before the next request begins.
4. If processing throws an exception, an error response is sent for that request's correlation ID, and the worker continues with the next queued request.

This serialized processing model prevents concurrent journal access issues and bounds resource usage, while the queue ensures no requests are lost during processing.
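
The single-worker model can be sketched with a standard queue and thread. LogWorker, submit, and close are hypothetical names, and responses are collected in a list where the real LogProvider would send them.

```python
import queue
import threading
from typing import Callable, List, Optional, Tuple

Request = Tuple[str, Callable[[], str]]  # (correlation ID, work to run)


class LogWorker:
    def __init__(self) -> None:
        self._queue: "queue.Queue[Optional[Request]]" = queue.Queue()
        self.responses: List[Tuple[str, str, str]] = []  # (id, status, payload)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, correlation_id: str, work: Callable[[], str]) -> None:
        self._queue.put((correlation_id, work))

    def close(self) -> None:
        self._queue.put(None)  # sentinel: stop after earlier requests finish
        self._thread.join()

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:
                return
            correlation_id, work = item
            try:
                self.responses.append((correlation_id, "ok", work()))
            except Exception as exc:
                # A failed request yields an error response; the loop goes on.
                self.responses.append((correlation_id, "error", str(exc)))
```

Because only one thread consumes the queue, responses come out in submission order, and a failing request does not block the ones queued behind it.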

Multi-Node Log Collection

When a log request targets multiple Nodes, the SM Controller iterates over the Node list and sends the request to each Node's SM independently. Each Node processes the request and responds with its own set of log parts. The cloud correlates all parts using the shared correlationId.

If a requested Node is not connected, the SM Controller returns an error immediately for that Node without affecting other Nodes in the request.
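
The per-Node fan-out can be sketched as follows. The function name, the connected mapping of Node IDs to send callables, and the outcome strings are all hypothetical; the sketch only shows that a disconnected Node fails on its own.

```python
from typing import Callable, Dict, List, Tuple


def fan_out(
    correlation_id: str,
    nodes: List[str],
    connected: Dict[str, Callable[[str], None]],
) -> List[Tuple[str, str]]:
    """Forward a request to each Node; return (Node ID, outcome) pairs.

    `connected` maps Node IDs to a hypothetical per-Node send function.
    """
    results: List[Tuple[str, str]] = []
    for node_id in nodes:
        send = connected.get(node_id)
        if send is None:
            # A missing Node fails on its own; the other Nodes still proceed.
            results.append((node_id, "error: node not connected"))
            continue
        send(correlation_id)
        results.append((node_id, "sent"))
    return results
```

Every Node receives the same correlation ID, which is what lets the cloud group the resulting parts by request.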

Related Pages

Monitoring and Observability: overview of all monitoring subsystems
Monitoring Pipeline: continuous metrics collection and transmission
Alerts and Thresholds: threshold-based alerting including journal-based alerts
SM Controller: gRPC server managing SM connections
Service Manager: SM component hosting the LogProvider
