Version: v1.1

Logging Pipeline

Introduction

The logging pipeline provides on-demand access to system and service instance logs. Unlike the monitoring pipeline (which pushes metrics continuously), logs are pulled — the cloud sends a request specifying what logs are needed, and AosCore responds with compressed log archives delivered in parts.

This page documents the end-to-end flow from cloud log request to archive delivery, the three log request types, how journal entries are filtered and formatted, and how large log responses are split into manageable parts for transport.

Request Flow Overview

A log request traverses the following path through the system:

AosCloud                   CM                          SM (per Node)
  │                         │                               │
  │  requestLog (JSON/WS)   │                               │
  │────────────────────────▶│                               │
  │                         │  gRPC: SystemLogRequest /     │
  │                         │        InstanceLogRequest /   │
  │                         │        InstanceCrashLogRequest│
  │                         │──────────────────────────────▶│
  │                         │                               │  Read systemd journal
  │                         │                               │  Filter + format entries
  │                         │                               │  Compress (gzip)
  │                         │                               │  Split into parts
  │                         │  gRPC: Log (part N)           │
  │                         │◀──────────────────────────────│
  │  pushLog (JSON/WS)      │                               │
  │◀────────────────────────│                               │
  1. Cloud initiates — AosCloud sends a requestLog message over the WebSocket connection to CM, specifying the log type, optional time range, and target Node(s).
  2. CM routes to Node(s) — The CM Communication module parses the request and calls SMController::RequestLog(). The SM Controller iterates over the requested Nodes and forwards the appropriate gRPC message (SystemLogRequest, InstanceLogRequest, or InstanceCrashLogRequest) to each Node's SM via the bidirectional gRPC stream.
  3. SM collects logs — The SM's LogProvider reads matching entries from the systemd journal, formats them, compresses the output using gzip, and splits the result into parts.
  4. SM sends response — Each compressed part is sent back to CM as a Log gRPC message containing the part number, total part count, and compressed content.
  5. CM forwards to cloud — CM receives the log parts from the SM Handler and forwards each as a pushLog message over the WebSocket connection to AosCloud.

Log Request Types

The cloud can request three types of logs:

| Type | gRPC Message | Description |
|---|---|---|
| System log | SystemLogRequest | Journal entries from AosCore system components (CM, SM, IAM) |
| Instance log | InstanceLogRequest | Journal entries from a specific service instance, filtered by its systemd cgroup |
| Crash log | InstanceCrashLogRequest | Journal entries surrounding a service instance crash event |
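The routing step in CM amounts to choosing one of three gRPC messages per request type. As a sketch only: the LogType enum and the GrpcMessageFor() helper below are hypothetical names; only the three message names come from the table.

```cpp
#include <string_view>

// Hypothetical enum for the three request types the cloud can send.
enum class LogType { System, Instance, Crash };

// Returns the gRPC message name CM would forward to each SM for a given type.
constexpr std::string_view GrpcMessageFor(LogType type)
{
    switch (type) {
    case LogType::System:
        return "SystemLogRequest";
    case LogType::Instance:
        return "InstanceLogRequest";
    case LogType::Crash:
        return "InstanceCrashLogRequest";
    }
    return ""; // unreachable for valid enum values
}
```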

All request types share a common filter structure:

| Filter Field | Description |
|---|---|
| correlationId | Unique identifier linking the request to its response parts |
| from | Optional start timestamp; only entries at or after this time are included |
| till | Optional end timestamp; only entries before this time are included |
| nodes | List of Node IDs to collect logs from |
| Instance filter | For instance and crash logs: service ID, subject ID, and instance number to identify the target service |
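A minimal sketch of the shared filter and its time-window rule, taking from as inclusive and till as exclusive, as the table states. The struct names and field types are illustrative rather than AosCore's actual definitions; making the instance-filter fields optional (so that one filter can resolve to several instances) is an assumption.

```cpp
#include <chrono>
#include <optional>
#include <string>
#include <vector>

using Clock = std::chrono::system_clock;

// Hypothetical instance filter; optional fields are an assumption.
struct InstanceFilter {
    std::optional<std::string> serviceId;
    std::optional<std::string> subjectId;
    std::optional<unsigned> instance;
};

// Hypothetical mirror of the shared request filter.
struct LogFilter {
    std::string correlationId;
    std::optional<Clock::time_point> from; // inclusive lower bound
    std::optional<Clock::time_point> till; // exclusive upper bound
    std::vector<std::string> nodes;
    std::optional<InstanceFilter> instanceFilter; // instance and crash logs only
};

// True if an entry stamped at t satisfies from <= t < till.
// An absent bound leaves that side of the window open.
bool InTimeRange(const LogFilter& filter, Clock::time_point t)
{
    if (filter.from && t < *filter.from) {
        return false;
    }
    if (filter.till && t >= *filter.till) {
        return false;
    }
    return true;
}
```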

Log Collection

Journal Access

The LogProvider in SM reads log entries from the systemd journal using the sd-journal API. It supports both cgroup v1 and cgroup v2 environments for filtering service instance logs.

System logs — When no instance IDs are specified, the LogProvider reads all journal entries within the requested time range. Each entry includes the systemd unit name to identify which component produced it.

Instance logs — The LogProvider resolves the requested service filter to concrete instance IDs, then applies cgroup-based journal filters:

| Cgroup Version | Filter Pattern |
|---|---|
| v1 | _SYSTEMD_CGROUP=/system.slice/system-aos\x2dservice.slice/aos-service@<instanceID>.service |
| v2 | _SYSTEMD_CGROUP=/system.slice/system-aos\x2dservice.slice/<instanceID> |

Both patterns are added as disjunctive matches, so entries from either cgroup layout are captured.
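The two match strings for one instance can be built as shown below. This is a sketch: CgroupMatches() is a hypothetical helper, and the strings would then be registered with the sd-journal match API (for example sd_journal_add_match(), with sd_journal_add_disjunction() between them); exactly how AosCore wires that up is not shown here. Note that \x2d is systemd's literal escape for "-" in unit names, so in C++ source it must be written as "\\x2d".

```cpp
#include <string>
#include <vector>

// Returns the cgroup v1 and v2 journal matches for one instance ID.
std::vector<std::string> CgroupMatches(const std::string& instanceID)
{
    // "\\x2d" yields the four literal characters \x2d, as systemd stores them.
    const std::string slice = "_SYSTEMD_CGROUP=/system.slice/system-aos\\x2dservice.slice/";

    return {
        slice + "aos-service@" + instanceID + ".service", // cgroup v1 layout
        slice + instanceID,                               // cgroup v2 layout
    };
}
```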

Crash Log Extraction

Crash log collection uses a two-phase approach:

  1. Find crash time — The LogProvider seeks backward from the till timestamp (or journal tail) looking for a "process exited" message. The monotonic timestamp of this entry becomes the crash boundary.
  2. Collect surrounding entries — Using the crash time as an upper bound, the LogProvider collects all journal entries from the service instance's cgroup that occurred between the service start ("Started" message) and the crash. This captures the full execution context leading up to the failure.

If no crash event is found within the requested time range, the response carries no log content and its status is set to absent.
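The two-phase search can be illustrated over an in-memory list of entries that is assumed to be already restricted to one instance's cgroup, ordered oldest first, and cut off at till. Real code would walk the journal with sd-journal rather than a vector; JournalEntry, ExtractCrash(), and matching on plain substrings of "process exited" and "Started" are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Hypothetical journal entry for one instance.
struct JournalEntry {
    std::uint64_t monotonicUsec; // the crash entry's value marks the boundary
    std::string message;
};

// Returns entries from the last "Started" before the crash through the crash
// itself, or std::nullopt (reported as status "absent") if no crash is found.
std::optional<std::vector<JournalEntry>> ExtractCrash(const std::vector<JournalEntry>& entries)
{
    auto isExit = [](const JournalEntry& e) { return e.message.find("process exited") != std::string::npos; };
    auto isStart = [](const JournalEntry& e) { return e.message.find("Started") != std::string::npos; };

    // Phase 1: search backward for the newest "process exited" entry.
    auto crashIt = std::find_if(entries.rbegin(), entries.rend(), isExit);
    if (crashIt == entries.rend()) {
        return std::nullopt;
    }

    // Phase 2: keep searching backward for the start of that run.
    auto startIt = std::find_if(crashIt, entries.rend(), isStart);

    // Assumption: without a "Started" entry, keep everything up to the crash.
    auto first = (startIt == entries.rend()) ? entries.begin() : std::prev(startIt.base());
    auto last = crashIt.base(); // one past the crash entry

    return std::vector<JournalEntry>(first, last);
}
```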

Entry Formatting

Each journal entry is formatted as a single line:

<ASN1-timestamp> [<systemd-unit>] <message>
  • The timestamp is formatted as an ASN.1 time string for consistent parsing.
  • The systemd unit field is included only for system logs (where entries come from multiple components). For instance logs, the unit is omitted since all entries belong to the same service.
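A sketch of the line format, under two assumptions: the ASN.1 time string is rendered as UTC GeneralizedTime ("YYYYMMDDHHMMSSZ"), and the square brackets in the format above mark the unit as optional rather than being printed literally. AosCore's actual variant and precision may differ; ToASN1Time() and FormatEntry() are hypothetical names.

```cpp
#include <ctime>
#include <optional>
#include <string>

// Renders a UTC timestamp as ASN.1 GeneralizedTime (assumed variant).
std::string ToASN1Time(std::time_t t)
{
    // std::gmtime is not thread-safe; acceptable when one worker formats entries.
    const std::tm* utc = std::gmtime(&t);
    if (utc == nullptr) {
        return "";
    }

    char buf[32];
    const std::size_t n = std::strftime(buf, sizeof(buf), "%Y%m%d%H%M%SZ", utc);

    return std::string(buf, n);
}

// Formats one entry; the unit is passed only for system logs.
std::string FormatEntry(std::time_t t, const std::optional<std::string>& unit, const std::string& message)
{
    std::string line = ToASN1Time(t);

    if (unit) {
        line += " " + *unit;
    }

    line += " " + message;

    return line;
}
```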

Compression and Archiving

Log responses are compressed using gzip (via the Poco DeflatingOutputStream with Z_BEST_COMPRESSION) and split into multiple parts for efficient transport.

Archive Configuration

Two parameters control archive splitting:

| Parameter | Description |
|---|---|
| maxPartSize | Maximum uncompressed size (in bytes) of log content per part. When this threshold is reached, the current part is finalized and a new part begins. |
| maxPartCount | Maximum number of parts allowed per response. If the log content exceeds maxPartSize × maxPartCount, collection stops and the remaining entries are not included. |

Archiving Process

  1. The Archiver creates a gzip compression stream for the first part.
  2. As formatted log entries are added, the Archiver tracks the uncompressed byte count.
  3. When maxPartSize is reached, the current compression stream is finalized, a new stream is created for the next part, and the part counter increments.
  4. If maxPartCount is reached, no further entries are accepted.
  5. When all entries are processed, SendLog() is called — it finalizes the last compression stream and sends all parts sequentially.
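The splitting rules can be sketched without the compression itself. In this hypothetical PartSplitter, each finished part is kept as a plain string; in the described design each part would instead be written through its own gzip stream (Poco's DeflatingOutputStream), which is what lets parts be decompressed independently.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits formatted log lines into parts by uncompressed size.
class PartSplitter {
public:
    PartSplitter(std::size_t maxPartSize, std::size_t maxPartCount)
        : mMaxPartSize(maxPartSize)
        , mMaxPartCount(maxPartCount)
    {
    }

    // Appends one line; returns false once maxPartCount parts are full.
    bool AddEntry(const std::string& line)
    {
        // Open the next part lazily so no empty trailing part is produced.
        if (mParts.empty() || mCurrentSize >= mMaxPartSize) {
            if (mParts.size() == mMaxPartCount) {
                return false; // limit reached: no further entries accepted
            }

            mParts.emplace_back();
            mCurrentSize = 0;
        }

        mParts.back() += line;
        mParts.back() += '\n';
        mCurrentSize += line.size() + 1;

        return true;
    }

    const std::vector<std::string>& Parts() const { return mParts; }

private:
    std::size_t mMaxPartSize;
    std::size_t mMaxPartCount;
    std::size_t mCurrentSize = 0;
    std::vector<std::string> mParts;
};
```

Because the size check runs after a line is appended, a part in this sketch can exceed maxPartSize by at most one line.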

Each part is independently compressed, meaning the receiver can decompress parts individually without needing the full archive.

Response Delivery

Response Structure

Each log response part contains:

| Field | Description |
|---|---|
| correlationId | Matches the original request, allowing the cloud to associate parts with their request |
| partsCount | Total number of parts in this response |
| part | Part number (1-indexed) |
| content | Gzip-compressed log data for this part |
| status | Response status: ok, empty, error, or absent |
| error | Error description (populated when status is error or absent) |
| nodeId | Identifies which Node produced this log data |
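On the receiving side, these fields are enough to reassemble a response. The page does not describe the cloud's implementation, so the PartCollector below is purely hypothetical: it groups parts by (correlationId, nodeId) and reports completion once every part from 1 to partsCount has arrived.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Subset of the response fields (status and error omitted for brevity).
struct LogPart {
    std::string correlationId;
    std::string nodeId;
    std::uint64_t partsCount;
    std::uint64_t part; // 1-indexed
    std::string content; // gzip data, treated as opaque bytes here
};

class PartCollector {
public:
    // Stores a part; returns true when that node's response is complete.
    bool Add(const LogPart& p)
    {
        if (p.part == 0 || p.part > p.partsCount) {
            return false; // malformed index: ignored
        }

        auto& byIndex = mParts[{p.correlationId, p.nodeId}];
        byIndex[p.part] = p.content; // a duplicate part overwrites the earlier copy

        return byIndex.size() == p.partsCount;
    }

    // Returns the parts in order; each can be decompressed on its own.
    std::vector<std::string> Ordered(const std::string& correlationId, const std::string& nodeId) const
    {
        std::vector<std::string> out;

        auto it = mParts.find({correlationId, nodeId});
        if (it != mParts.end()) {
            for (const auto& entry : it->second) {
                out.push_back(entry.second);
            }
        }

        return out;
    }

private:
    std::map<std::pair<std::string, std::string>, std::map<std::uint64_t, std::string>> mParts;
};
```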

Response Statuses

| Status | Meaning |
|---|---|
| ok | Log data collected successfully; content contains compressed entries |
| empty | Request was valid but no matching entries found in the time range |
| absent | For crash logs: no crash event found for the specified instance |
| error | Processing failed; the error field contains the failure reason |
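The table can be read as a small decision function. The order of the checks below (a failure wins over everything, then a missing crash, then an empty result) is an assumption made for this sketch, as are the CollectionOutcome struct and ChooseStatus() helper.

```cpp
#include <cstddef>
#include <optional>
#include <string>

enum class LogStatus { Ok, Empty, Absent, Error };

// Hypothetical summary of one collection attempt.
struct CollectionOutcome {
    bool isCrashRequest = false;
    bool crashFound = false;
    std::size_t entryCount = 0;
    std::optional<std::string> failure; // set when processing failed
};

LogStatus ChooseStatus(const CollectionOutcome& o)
{
    if (o.failure) {
        return LogStatus::Error;
    }
    if (o.isCrashRequest && !o.crashFound) {
        return LogStatus::Absent;
    }
    if (o.entryCount == 0) {
        return LogStatus::Empty;
    }
    return LogStatus::Ok;
}

// String form as listed in the status table.
const char* ToString(LogStatus status)
{
    switch (status) {
    case LogStatus::Ok:
        return "ok";
    case LogStatus::Empty:
        return "empty";
    case LogStatus::Absent:
        return "absent";
    case LogStatus::Error:
        return "error";
    }
    return "error";
}
```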

Delivery Path

On the SM side, the Archiver calls SenderItf::SendLog() for each part. This interface is implemented by the SM Client, which writes each part as a Log message on the gRPC stream to CM.

On the CM side, the SMHandler receives each Log message from the gRPC stream, converts it to a PushLog structure, and calls SenderItf::SendLog() — which is implemented by the CM Communication module. Communication serializes the log part as a pushLog JSON message and sends it to AosCloud over the WebSocket connection.

Request Processing Model

The LogProvider processes requests asynchronously using a dedicated worker thread and a request queue:

  1. When a log request arrives (via GetInstanceLog, GetInstanceCrashLog, or GetSystemLog), it is placed in a thread-safe queue.
  2. The worker thread picks requests from the queue one at a time.
  3. Each request is processed to completion (journal read → format → compress → send) before the next request begins.
  4. If processing throws an exception, an error response is sent for that request's correlation ID, and the worker continues with the next queued request.

This serialized processing model prevents concurrent journal access issues and bounds resource usage, while the queue ensures no requests are lost during processing.
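The queue-plus-worker model is the standard single-consumer pattern built from a mutex, a condition variable, and one thread. This is a minimal sketch, not the LogProvider itself: LogWorker, Push(), Stop(), and the error callback are hypothetical names, and each queued job stands in for a full read, format, compress, and send cycle.

```cpp
#include <condition_variable>
#include <exception>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

class LogWorker {
public:
    using Job = std::function<void()>;
    using OnError = std::function<void(const std::string& correlationId, const std::string& what)>;

    explicit LogWorker(OnError onError)
        : mOnError(std::move(onError))
        , mThread([this] { Run(); })
    {
    }

    ~LogWorker() { Stop(); }

    // Enqueues a request; callers return immediately.
    void Push(std::string correlationId, Job job)
    {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mQueue.push({std::move(correlationId), std::move(job)});
        }
        mCV.notify_one();
    }

    // Lets the worker drain the queue, then joins it.
    void Stop()
    {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mStopping = true;
        }
        mCV.notify_one();

        if (mThread.joinable()) {
            mThread.join();
        }
    }

private:
    struct Request {
        std::string correlationId;
        Job job;
    };

    void Run()
    {
        while (true) {
            Request req;
            {
                std::unique_lock<std::mutex> lock(mMutex);
                mCV.wait(lock, [this] { return mStopping || !mQueue.empty(); });
                if (mQueue.empty()) {
                    return; // stopping and fully drained
                }
                req = std::move(mQueue.front());
                mQueue.pop();
            }

            // One request runs to completion before the next is taken;
            // a failure is reported for its correlation ID only.
            try {
                req.job();
            } catch (const std::exception& e) {
                mOnError(req.correlationId, e.what());
            } catch (...) {
                mOnError(req.correlationId, "unknown error");
            }
        }
    }

    OnError mOnError;
    std::mutex mMutex;
    std::condition_variable mCV;
    std::queue<Request> mQueue;
    bool mStopping = false;
    std::thread mThread; // declared last so it starts after the members above
};
```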

Multi-Node Log Collection

When a log request targets multiple Nodes, the SM Controller iterates over the Node list and sends the request to each Node's SM independently. Each Node processes the request and responds with its own set of log parts. The cloud correlates all parts using the shared correlationId.

If a requested Node is not connected, the SM Controller returns an error immediately for that Node without affecting other Nodes in the request.
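The fan-out can be sketched as a loop that never aborts early. FanOut(), NodeDispatch, the connected-node set, and the "node not connected" message are all hypothetical; in the described system the send step is the gRPC request to that Node's SM, and the per-node error would reach the cloud as a response with status error.

```cpp
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-node result of dispatching one log request.
struct NodeDispatch {
    std::string nodeId;
    bool sent;
    std::string error; // set for nodes that are not connected
};

// Sends the request to each connected node independently; a disconnected
// node gets an immediate error entry and the loop continues.
std::vector<NodeDispatch> FanOut(const std::vector<std::string>& nodes, const std::set<std::string>& connected,
    const std::function<void(const std::string& nodeId)>& sendToNode)
{
    std::vector<NodeDispatch> results;

    for (const auto& nodeId : nodes) {
        if (connected.count(nodeId) == 0) {
            results.push_back({nodeId, false, "node not connected"});
            continue;
        }

        sendToNode(nodeId);
        results.push_back({nodeId, true, ""});
    }

    return results;
}
```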