Logging Pipeline
Introduction
The logging pipeline provides on-demand access to system and service instance logs. Unlike the monitoring pipeline (which pushes metrics continuously), logs are pulled — the cloud sends a request specifying what logs are needed, and AosCore responds with compressed log archives delivered in parts.
This page documents the end-to-end flow from cloud log request to archive delivery, the three log request types, how journal entries are filtered and formatted, and how large log responses are split into manageable parts for transport.
Request Flow Overview
A log request traverses the following path through the system:
AosCloud                  CM                             SM (per Node)
│                         │                              │
│ requestLog (JSON/WS)    │                              │
│────────────────────────▶│                              │
│                         │ gRPC: SystemLogRequest /     │
│                         │       InstanceLogRequest /   │
│                         │       InstanceCrashLogRequest│
│                         │─────────────────────────────▶│
│                         │                              │ Read systemd journal
│                         │                              │ Filter + format entries
│                         │                              │ Compress (gzip)
│                         │                              │ Split into parts
│                         │ gRPC: Log (part N)           │
│                         │◀─────────────────────────────│
│ pushLog (JSON/WS)       │                              │
│◀────────────────────────│                              │
- Cloud initiates: AosCloud sends a requestLog message over the WebSocket connection to CM, specifying the log type, optional time range, and target Node(s).
- CM routes to Node(s): The CM Communication module parses the request and calls SMController::RequestLog(). The SM Controller iterates over the requested Nodes and forwards the appropriate gRPC message (SystemLogRequest, InstanceLogRequest, or InstanceCrashLogRequest) to each Node's SM via the bidirectional gRPC stream.
- SM collects logs: The SM's LogProvider reads matching entries from the systemd journal, formats them, compresses the output using gzip, and splits the result into parts.
- SM sends response: Each compressed part is sent back to CM as a Log gRPC message containing the part number, total part count, and compressed content.
- CM forwards to cloud: CM receives the log parts from the SM Handler and forwards each as a pushLog message over the WebSocket connection to AosCloud.
Log Request Types
The cloud can request three types of logs:
| Type | gRPC Message | Description |
|---|---|---|
| System log | SystemLogRequest | Journal entries from AosCore system components (CM, SM, IAM) |
| Instance log | InstanceLogRequest | Journal entries from a specific service instance, filtered by its systemd cgroup |
| Crash log | InstanceCrashLogRequest | Journal entries surrounding a service instance crash event |
All request types share a common filter structure:
| Filter Field | Description |
|---|---|
| correlationId | Unique identifier linking the request to its response parts |
| from | Optional start timestamp; only entries at or after this time are included |
| till | Optional end timestamp; only entries before this time are included |
| nodes | List of Node IDs to collect logs from |
| Instance filter | For instance and crash logs: service ID, subject ID, and instance number to identify the target service |
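
The time-window semantics of the filter can be illustrated with a short Python sketch. It assumes, based on the table above, that from is an inclusive lower bound and till an exclusive upper bound. The names LogFilter and in_time_range are hypothetical and are not taken from AosCore, whose real structures are C++ and protobuf types.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class LogFilter:
    """Hypothetical model of the filter fields shared by all request types."""
    correlation_id: str
    nodes: List[str] = field(default_factory=list)
    from_time: Optional[datetime] = None  # inclusive lower bound, may be absent
    till_time: Optional[datetime] = None  # exclusive upper bound, may be absent


def in_time_range(log_filter: LogFilter, timestamp: datetime) -> bool:
    """Return True when timestamp lies inside [from_time, till_time)."""
    if log_filter.from_time is not None and timestamp < log_filter.from_time:
        return False
    if log_filter.till_time is not None and timestamp >= log_filter.till_time:
        return False
    return True
```

A filter with neither bound set accepts every entry, which corresponds to a request without a time range.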
Log Collection
Journal Access
The LogProvider in SM reads log entries from the systemd journal using the sd-journal API. It supports both cgroup v1 and cgroup v2 environments for filtering service instance logs.
System logs — When no instance IDs are specified, the LogProvider reads all journal entries within the requested time range. Each entry includes the systemd unit name to identify which component produced it.
Instance logs — The LogProvider resolves the requested service filter to concrete instance IDs, then applies cgroup-based journal filters:
| Cgroup Version | Filter Pattern |
|---|---|
| v1 | _SYSTEMD_CGROUP=/system.slice/system-aos\x2dservice.slice/aos-service@<instanceID>.service |
| v2 | _SYSTEMD_CGROUP=/system.slice/system-aos\x2dservice.slice/<instanceID> |
Both patterns are added as disjunctive matches, so entries from either cgroup layout are captured.
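
As a rough sketch of how the two match strings from the table could be built and applied, consider the Python below. The helper names (cgroup_matches, entry_matches) are hypothetical. In the sd-journal C API, such strings would typically be passed to sd_journal_add_match(), with sd_journal_add_disjunction() expressing the logical OR; how the LogProvider actually issues these calls is not shown here.

```python
from typing import List

# "\x2d" is systemd's escaped form of "-" inside unit names. It must stay a
# literal backslash sequence, so the path is written as a raw string.
AOS_SERVICE_SLICE = r"/system.slice/system-aos\x2dservice.slice"


def cgroup_matches(instance_id: str) -> List[str]:
    """Build the cgroup v1 and v2 match strings for one instance ID."""
    return [
        f"_SYSTEMD_CGROUP={AOS_SERVICE_SLICE}/aos-service@{instance_id}.service",  # v1
        f"_SYSTEMD_CGROUP={AOS_SERVICE_SLICE}/{instance_id}",                      # v2
    ]


def entry_matches(entry_cgroup: str, instance_id: str) -> bool:
    """Disjunctive filter: keep the entry if either cgroup layout matches."""
    return f"_SYSTEMD_CGROUP={entry_cgroup}" in cgroup_matches(instance_id)
```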
Crash Log Extraction
Crash log collection uses a two-phase approach:
- Find crash time: The LogProvider seeks backward from the till timestamp (or journal tail) looking for a "process exited" message. The monotonic timestamp of this entry becomes the crash boundary.
- Collect surrounding entries: Using the crash time as an upper bound, the LogProvider collects all journal entries from the service instance's cgroup that occurred between the service start ("Started" message) and the crash. This captures the full execution context leading up to the failure.
If no crash event is found within the requested time range, an empty response with absent status is returned.
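
The two phases can be sketched over an in-memory list of journal entries. This is only an illustration under stated assumptions: entries are (monotonic timestamp, message) tuples sorted in ascending order and already restricted to the instance's cgroup, and the function names are hypothetical. The real LogProvider works on the journal through sd-journal cursors rather than a Python list.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, str]  # (monotonic timestamp, message)


def find_crash_time(entries: List[Entry], till: Optional[int] = None) -> Optional[int]:
    """Phase 1: scan backward for the most recent "process exited" entry."""
    for timestamp, message in reversed(entries):
        if till is not None and timestamp >= till:
            continue  # entry lies at or after the requested upper bound
        if "process exited" in message:
            return timestamp
    return None


def collect_crash_log(entries: List[Entry], till: Optional[int] = None) -> Optional[List[Entry]]:
    """Phase 2: return entries from the last "Started" message up to the crash."""
    crash_time = find_crash_time(entries, till)
    if crash_time is None:
        return None  # a caller would report the "absent" status here
    before_crash = [entry for entry in entries if entry[0] <= crash_time]
    start_index = 0  # fall back to the first entry if no "Started" is found
    for index in range(len(before_crash) - 1, -1, -1):
        if "Started" in before_crash[index][1]:
            start_index = index
            break
    return before_crash[start_index:]
```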
Entry Formatting
Each journal entry is formatted as a single line:
<ASN1-timestamp> [<systemd-unit>] <message>
- The timestamp is formatted as an ASN.1 time string for consistent parsing.
- The systemd unit field is included only for system logs (where entries come from multiple components). For instance logs, the unit is omitted since all entries belong to the same service.
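
A minimal formatting sketch follows. It makes two assumptions that the text above does not confirm: that the ASN.1 time string is a UTC GeneralizedTime value (YYYYMMDDHHMMSSZ), and that the square brackets in the format line mark the unit as optional rather than being literal characters. The function name format_entry is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def format_entry(timestamp: datetime, message: str, unit: Optional[str] = None) -> str:
    """Format one journal entry as a single line."""
    # Assumption: ASN.1 GeneralizedTime in UTC, e.g. 20240305140709Z.
    stamp = timestamp.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")
    if unit is not None:
        # System logs: include the unit so the producing component is visible.
        return f"{stamp} {unit} {message}"
    # Instance logs: the unit is omitted because all entries share one service.
    return f"{stamp} {message}"
```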
Compression and Archiving
Log responses are compressed using gzip (via the Poco DeflatingOutputStream with Z_BEST_COMPRESSION) and split into multiple parts for efficient transport.
Archive Configuration
Two parameters control archive splitting:
| Parameter | Description |
|---|---|
| maxPartSize | Maximum uncompressed size (in bytes) of log content per part. When this threshold is reached, the current part is finalized and a new part begins. |
| maxPartCount | Maximum number of parts allowed per response. If the log content exceeds maxPartSize × maxPartCount, collection stops. |
Archiving Process
- The Archiver creates a gzip compression stream for the first part.
- As formatted log entries are added, the Archiver tracks the uncompressed byte count.
- When maxPartSize is reached, the current compression stream is finalized, a new stream is created for the next part, and the part counter increments.
- If maxPartCount is reached, no further entries are accepted.
- When all entries are processed, SendLog() is called: it finalizes the last compression stream and sends all parts sequentially.
Each part is independently compressed, meaning the receiver can decompress parts individually without needing the full archive.
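
The splitting steps above can be approximated in Python with the standard gzip module. This is a sketch, not the AosCore Archiver, which is C++ code built on Poco. The class layout and method names (add_log, finish) are hypothetical, and the sketch assumes a part is closed right after the entry that pushes it to or past max_part_size.

```python
import gzip
import io
from typing import List


class Archiver:
    """Hypothetical sketch: splits log text into independently gzipped parts."""

    def __init__(self, max_part_size: int, max_part_count: int) -> None:
        self.max_part_size = max_part_size
        self.max_part_count = max_part_count
        self.parts: List[bytes] = []
        self._full = False
        self._open_part()

    def _open_part(self) -> None:
        self._buffer = io.BytesIO()
        # compresslevel=9 corresponds to zlib's best-compression setting.
        self._stream = gzip.GzipFile(fileobj=self._buffer, mode="wb", compresslevel=9)
        self._part_size = 0  # uncompressed bytes written to the current part

    def _close_part(self) -> None:
        self._stream.close()  # writes the gzip trailer; the buffer stays readable
        self.parts.append(self._buffer.getvalue())

    def add_log(self, line: str) -> bool:
        """Add one formatted entry; return False once the part limit is reached."""
        if self._full:
            return False
        data = line.encode("utf-8")
        self._stream.write(data)
        self._part_size += len(data)
        if self._part_size >= self.max_part_size:
            self._close_part()
            if len(self.parts) >= self.max_part_count:
                self._full = True
            else:
                self._open_part()
        return True

    def finish(self) -> List[bytes]:
        """Finalize the last open part and return all compressed parts."""
        if not self._full:
            # Skip a freshly opened part that never received any data.
            if self._part_size > 0 or not self.parts:
                self._close_part()
            self._full = True
        return self.parts
```

Because every part is a complete gzip stream, a receiver can call gzip.decompress() on any single part without holding the others.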
Response Delivery
Response Structure
Each log response part contains:
| Field | Description |
|---|---|
| correlationId | Matches the original request, allowing the cloud to associate parts with their request |
| partsCount | Total number of parts in this response |
| part | Part number (1-indexed) |
| content | Gzip-compressed log data for this part |
| status | Response status: ok, empty, error, or absent |
| error | Error description (populated when status is error or absent) |
| nodeId | Identifies which Node produced this log data |
Response Statuses
| Status | Meaning |
|---|---|
| ok | Log data collected successfully; content contains compressed entries |
| empty | Request was valid but no matching entries found in the time range |
| absent | For crash logs: no crash event found for the specified instance |
| error | Processing failed; the error field contains the failure reason |
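
To show how a consumer might use these fields, the following sketch reassembles a response from its parts. It is a hypothetical receiver-side helper, not part of AosCore or AosCloud. It assumes parts may arrive in any order, are keyed by correlationId and nodeId, and are numbered from 1 to partsCount.

```python
import gzip
from collections import defaultdict
from typing import Dict, Optional, Tuple


class LogAssembler:
    """Hypothetical helper that collects parts per (correlationId, nodeId)."""

    def __init__(self) -> None:
        self._parts: Dict[Tuple[str, str], Dict[int, bytes]] = defaultdict(dict)

    def add_part(self, correlation_id: str, node_id: str, part: int,
                 parts_count: int, content: bytes, status: str = "ok") -> Optional[str]:
        """Store one part; return the full log text once all parts have arrived."""
        if status != "ok":
            return None  # empty, absent, and error responses carry no log content
        key = (correlation_id, node_id)
        self._parts[key][part] = content
        if len(self._parts[key]) < parts_count:
            return None  # still waiting for more parts
        received = self._parts.pop(key)
        # Each part is an independent gzip stream, so decompress one by one.
        chunks = [gzip.decompress(received[i]) for i in range(1, parts_count + 1)]
        return b"".join(chunks).decode("utf-8")
```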
Delivery Path
On the SM side, the Archiver calls SenderItf::SendLog() for each part. This interface is implemented by the SM Client, which writes each part as a Log message on the gRPC stream to CM.
On the CM side, the SMHandler receives each Log message from the gRPC stream, converts it to a PushLog structure, and calls SenderItf::SendLog(), which is implemented by the CM Communication module. Communication serializes the log part as a pushLog JSON message and sends it to AosCloud over the WebSocket connection.
Request Processing Model
The LogProvider processes requests asynchronously using a dedicated worker thread and a request queue:
- When a log request arrives (via GetInstanceLog, GetInstanceCrashLog, or GetSystemLog), it is placed in a thread-safe queue.
- The worker thread picks requests from the queue one at a time.
- Each request is processed to completion (journal read → format → compress → send) before the next request begins.
- If processing throws an exception, an error response is sent for that request's correlation ID, and the worker continues with the next queued request.
This serialized processing model prevents concurrent journal access issues and bounds resource usage, while the queue ensures no requests are lost during processing.
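
The queue-and-worker pattern described above can be sketched with Python's standard queue and threading modules. The LogWorker class, its send callback signature, and the None shutdown sentinel are all assumptions made for illustration; the actual LogProvider is a C++ component.

```python
import queue
import threading
from typing import Callable, Optional, Tuple

# A request pairs a correlation ID with a job that produces the log text.
Request = Tuple[str, Callable[[], str]]


class LogWorker:
    """Hypothetical single-worker processor for queued log requests."""

    def __init__(self, send: Callable[[str, str, str], None]) -> None:
        self._send = send  # send(correlation_id, status, payload_or_error)
        self._queue: "queue.Queue[Optional[Request]]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, correlation_id: str, job: Callable[[], str]) -> None:
        """Enqueue a request; queue.Queue is thread-safe."""
        self._queue.put((correlation_id, job))

    def stop(self) -> None:
        """Process everything already queued, then stop the worker."""
        self._queue.put(None)  # sentinel marks the end of the queue
        self._thread.join()

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:
                break
            correlation_id, job = item
            try:
                # One request runs to completion before the next is taken.
                self._send(correlation_id, "ok", job())
            except Exception as exc:
                # Report the failure for this request and keep going.
                self._send(correlation_id, "error", str(exc))
```

A failing job affects only its own correlation ID; requests queued behind it are still processed in order.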
Multi-Node Log Collection
When a log request targets multiple Nodes, the SM Controller iterates over the Node list and sends the request to each Node's SM independently. Each Node processes the request and responds with its own set of log parts. The cloud correlates all parts using the shared correlationId.
If a requested Node is not connected, the SM Controller returns an error immediately for that Node without affecting other Nodes in the request.
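
The per-Node fan-out can be sketched as a simple loop. The function name request_log, the connected mapping of Node IDs to forwarding callables, and the outcome strings are hypothetical stand-ins for the SM Controller's gRPC stream handling.

```python
from typing import Callable, Dict, List


def request_log(nodes: List[str],
                connected: Dict[str, Callable[[str], None]],
                correlation_id: str) -> Dict[str, str]:
    """Forward a request to each Node and report disconnected Nodes individually."""
    outcome: Dict[str, str] = {}
    for node_id in nodes:
        forward = connected.get(node_id)
        if forward is None:
            # A missing Node yields an error for that Node only.
            outcome[node_id] = "error: node not connected"
            continue
        forward(correlation_id)  # every Node receives the same correlation ID
        outcome[node_id] = "sent"
    return outcome
```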
Related Pages
- Monitoring and Observability — overview of all monitoring subsystems
- Monitoring Pipeline — continuous metrics collection and transmission
- Alerts and Thresholds — threshold-based alerting including journal-based alerts
- SM Controller — gRPC server managing SM connections
- Service Manager — SM component hosting the LogProvider