Version: v1.1

Image Deployment Pipeline

Introduction

The image deployment pipeline is the sequence of operations that transforms a cloud-published Deployable Item into a running service instance on a Node. This pipeline spans multiple AosCore components — the Communication Manager (CM) coordinates the overall update, the Service Manager's (SM) Image Manager handles download and storage, and the Launcher assembles the runtime environment and starts the container.

Understanding this pipeline is essential for OEMs because it governs how quickly services deploy, what happens when downloads fail, and how the system ensures that only verified, untampered images execute on the Unit.

Pipeline Overview

The image deployment pipeline consists of five stages:

| Stage | Component | Description |
| --- | --- | --- |
| 1. Blob URL resolution | CM → SM | CM provides download URLs for image blobs when SM requests them via GetBlobsInfo |
| 2. Download | SM Image Manager + Downloader | Blobs are retrieved over HTTP/HTTPS with retry, resume, and progress reporting |
| 3. Verification | SM Image Manager | Each downloaded blob is verified against its SHA-256 content digest |
| 4. Layer unpacking | SM Image Manager + ImageHandler | Compressed tar layers are extracted to overlay-compatible filesystem layout |
| 5. Rootfs assembly and launch | SM Launcher | Layers are stacked into an OverlayFS mount and the container runtime starts the service |

Stage 1: Blob URL Resolution

When CM receives a new desired state from AosCloud, it determines which Deployable Items need to be deployed to each Node. CM's own Image Manager downloads the image index and manifests from cloud-provided URLs, then makes these blobs available to SM through a local gRPC interface.

SM's Image Manager does not communicate directly with the cloud. Instead, when it needs to download a blob (identified by its SHA-256 digest), it calls GetBlobsInfo on the SM client, which issues a gRPC request to CM. CM resolves the digest to a download URL — either a cloud-hosted blob URL or a local file server URL (in multi-Node configurations where the Message Proxy serves images to secondary Nodes).

SM Image Manager            SM Client (gRPC)                     CM
│                                   │                             │
│── GetBlobsInfo([digest1, ...]) ──▶│                             │
│                                   │── GetBlobsInfos(digests) ──▶│
│                                   │                             │── resolve digests
│                                   │◀── BlobsInfos(urls) ────────│   to URLs
│◀── urls[] ────────────────────────│                             │
│                                   │                             │

This indirection allows CM to control blob distribution — for example, serving images from a local cache rather than re-downloading from the cloud when multiple Nodes need the same image.
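
The digest-to-URL indirection can be illustrated with a minimal sketch. The `BlobResolver` class, its fields, and the example URLs are hypothetical and only model the behavior described above (prefer a locally served copy, fall back to the cloud URL); they are not the AosCore gRPC API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BlobResolver:
    """Hypothetical stand-in for CM's digest-to-URL resolution."""

    cloud_urls: dict[str, str]  # digest -> cloud-hosted blob URL
    local_urls: dict[str, str] = field(default_factory=dict)  # digest -> local file server URL

    def get_blobs_info(self, digests: list[str]) -> list[str]:
        """Return one URL per digest, preferring a locally served copy."""
        urls = []
        for digest in digests:
            if digest in self.local_urls:
                urls.append(self.local_urls[digest])
            elif digest in self.cloud_urls:
                urls.append(self.cloud_urls[digest])
            else:
                raise KeyError(f"unknown blob digest: {digest}")
        return urls
```

In this sketch the caller never learns where a blob is hosted; it only passes digests and receives URLs, which is what lets the resolving side change its distribution strategy without touching the downloader.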

Stage 2: Download

Once the SM Image Manager has a URL for a blob, it delegates the actual download to the Downloader module. The download process includes:

  1. Space allocation — the Image Manager reserves storage space via the Space Allocator before starting the download
  2. Duplicate detection — if the same blob (by digest) is already being downloaded (e.g., shared layer between two services), the second request waits for the first to complete
  3. HTTP/HTTPS retrieval — the Downloader fetches the blob with automatic retry (up to 3 attempts with exponential backoff) and resume support via HTTP range requests
  4. Progress tracking — CM-initiated downloads report progress to the cloud; SM-initiated downloads operate silently
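
Duplicate detection (step 2) can be sketched as a table of in-flight downloads keyed by digest. The `InFlightDownloads` class and the `fetch` callable are hypothetical names used for illustration; the real Image Manager may organize this differently.

```python
import threading
from concurrent.futures import Future


class InFlightDownloads:
    """Hypothetical helper: one download per digest, later callers wait."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable(digest) -> local path (assumed interface)
        self._lock = threading.Lock()
        self._pending = {}   # digest -> Future shared by all waiters

    def download(self, digest):
        with self._lock:
            future = self._pending.get(digest)
            owner = future is None
            if owner:
                future = Future()
                self._pending[digest] = future
        if not owner:
            # Another request is already fetching this blob: wait for it.
            return future.result()
        try:
            path = self._fetch(digest)
            future.set_result(path)
            return path
        except Exception as exc:
            future.set_exception(exc)  # waiters see the same failure
            raise
        finally:
            with self._lock:
                self._pending.pop(digest, None)
```

The entry is removed once the first request finishes, so this sketch only deduplicates concurrent requests; a completed blob would be found through the stored-blob lookup instead.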

Download Parameters

| Parameter | Value | Description |
| --- | --- | --- |
| Max retries | 3 | Total download attempts before failure |
| Initial backoff | 1 second | Delay before first retry |
| Max backoff | 5 seconds | Upper bound on retry delay |
| Connection timeout | 10 seconds | Maximum time to establish connection |

If all retry attempts fail, the download error propagates up to the Launcher, which marks the instance as failed. The Space Allocator releases the reserved space.
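
As a rough illustration of the retry parameters, the sketch below computes the delay schedule and wraps a single-attempt fetch. The doubling factor is an assumption (the source only says "exponential"), and `backoff_delays`, `range_header`, `download_with_retry`, and `DownloadError` are hypothetical names, not Downloader APIs.

```python
import time


class DownloadError(Exception):
    """Raised when every attempt has failed."""


def backoff_delays(max_attempts=3, initial=1.0, maximum=5.0, factor=2.0):
    """Delays in seconds between consecutive attempts (factor is assumed)."""
    delays = []
    delay = initial
    for _ in range(max_attempts - 1):  # no wait after the final attempt
        delays.append(min(delay, maximum))
        delay *= factor
    return delays


def range_header(offset):
    """Standard HTTP header asking the server to resume at a byte offset."""
    return {"Range": f"bytes={offset}-"} if offset > 0 else {}


def download_with_retry(fetch_once, max_attempts=3, sleep=time.sleep):
    """Call fetch_once() until it succeeds or the attempts run out."""
    delays = backoff_delays(max_attempts)
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fetch_once()
        except OSError as exc:
            last_error = exc
            if attempt < len(delays):
                sleep(delays[attempt])
    raise DownloadError(f"download failed after {max_attempts} attempts") from last_error
```

With the documented values (3 attempts, 1 second initial, 5 second cap) and the assumed factor of 2, the waits are 1 second and then 2 seconds, so the cap only takes effect if the attempt count is raised.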

Stage 3: Verification

After each blob is downloaded, the Image Manager computes its SHA-256 hash and compares it against the declared digest from the OCI manifest. This verification ensures:

  • Integrity — the blob was not corrupted during transfer
  • Authenticity — the blob matches what was declared in the signed manifest

Verification occurs at multiple points in the pipeline:

| Check | When | Action on failure |
| --- | --- | --- |
| Blob digest | After download completes | Delete blob, return error, fail installation |
| Layer digest | After unpacking | Delete unpacked content, return error |
| Periodic integrity | Background timer (every 24 hours) | Remove corrupted item, reclaim space |

If verification fails during installation, the entire Deployable Item installation is aborted and any partially downloaded content is cleaned up. The Launcher receives an error and reports the instance as failed to CM.
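
The blob digest check can be sketched with the standard library alone. The function names are hypothetical, and the `sha256:<hex>` digest form follows the OCI convention; the delete-on-mismatch behavior mirrors the "Blob digest" row above.

```python
import hashlib
import os


def verify_blob(path, expected_digest, chunk_size=1024 * 1024):
    """Return True if the file's SHA-256 matches an OCI-style digest string."""
    algorithm, _, expected_hex = expected_digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm!r}")
    hasher = hashlib.sha256()
    with open(path, "rb") as blob:
        # Hash in chunks so large layers are never loaded into memory at once.
        for chunk in iter(lambda: blob.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected_hex


def verify_or_delete(path, expected_digest):
    """Delete the blob and raise if its digest does not match."""
    if not verify_blob(path, expected_digest):
        os.remove(path)
        raise ValueError(f"digest mismatch for {path}")
```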

Stage 4: Layer Unpacking

For service-type Deployable Items, each layer blob is a compressed tar archive that must be unpacked into a filesystem tree suitable for OverlayFS. The platform-specific ImageHandler performs this operation:

  1. Extract tar archive — decompress (gzip) and extract the layer contents to a dedicated directory under {imagePath}/layers/sha256/{diffID}/layer/
  2. Convert OCI whiteouts to OverlayFS format:
    • .wh.<filename> markers → character device nodes (major/minor 0) signaling file deletion
    • .wh..wh..opq markers → trusted.overlay.opaque extended attribute on the parent directory
  3. Set ownership — adjust file UID/GID to the configured service execution context
  4. Compute unpacked digest — calculate the digest of the unpacked layer content for future integrity checks
  5. Remove compressed blob — delete the original tar archive to reclaim space; the blob path is repurposed to store the diff ID as a pointer to the unpacked layer location
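
The whiteout conversion in step 2 can be sketched as a classification of tar entry names followed by the corresponding filesystem operation. `classify_entry` and `apply_entry` are hypothetical helpers; `apply_entry` uses standard Linux calls (`os.mknod`, `os.setxattr`) that need root privileges, and the `y` attribute value follows the kernel's OverlayFS documentation rather than anything stated in this document.

```python
import os
import posixpath
import stat

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def classify_entry(name):
    """Map a tar entry name to ('opaque', dir), ('whiteout', path) or ('file', path)."""
    parent, base = posixpath.split(name)
    if base == OPAQUE_MARKER:  # must be checked before the generic prefix
        return ("opaque", parent)
    if base.startswith(WHITEOUT_PREFIX):
        hidden = base[len(WHITEOUT_PREFIX):]
        return ("whiteout", posixpath.join(parent, hidden))
    return ("file", name)


def apply_entry(layer_root, name):
    """Apply the OverlayFS form of a whiteout entry (Linux, root only)."""
    kind, target = classify_entry(name)
    path = os.path.join(layer_root, target)
    if kind == "whiteout":
        # Character device 0/0 tells OverlayFS the file is deleted.
        os.mknod(path, stat.S_IFCHR, os.makedev(0, 0))
    elif kind == "opaque":
        # Opaque directory: hide everything from lower layers.
        os.setxattr(path, "trusted.overlay.opaque", b"y")
    # Regular entries would be extracted normally; omitted in this sketch.
```

For example, an entry named `etc/.wh.passwd` classifies as a whiteout for `etc/passwd`, and `var/cache/.wh..wh..opq` marks `var/cache` as opaque.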

After unpacking, layers are stored by their diff ID (the uncompressed content digest declared in the image config's rootfs.diffIDs array). This content-addressable layout enables layer sharing — if two services use the same base layer, it is stored only once.
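
To make the distinction between a blob digest and a diff ID concrete, the sketch below hashes the decompressed content of a gzip layer and derives the storage directory from the result. `diff_id_of_gzip_layer` and `layer_dir` are hypothetical helpers, and the assumption that `{diffID}` in the path is the hex part of the digest is mine.

```python
import gzip
import hashlib
import io
import posixpath


def diff_id_of_gzip_layer(blob):
    """SHA-256 over the uncompressed tar stream of a gzip-compressed layer."""
    hasher = hashlib.sha256()
    with gzip.GzipFile(fileobj=io.BytesIO(blob)) as stream:
        for chunk in iter(lambda: stream.read(65536), b""):
            hasher.update(chunk)
    return "sha256:" + hasher.hexdigest()


def layer_dir(image_path, diff_id):
    """Directory for an unpacked layer (hex-only path component is assumed)."""
    algorithm, _, hex_digest = diff_id.partition(":")
    return posixpath.join(image_path, "layers", algorithm, hex_digest, "layer")
```

Because the diff ID depends only on the uncompressed content, two blobs that compress the same layer differently still map to the same directory, which is what makes sharing possible.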

Supported Layer Formats

| Media Type | Description |
| --- | --- |
| application/vnd.oci.image.layer.v1.tar | Uncompressed tar |
| application/vnd.oci.image.layer.v1.tar+gzip | Gzip-compressed tar (most common) |

For non-service items (firmware components), layers are stored as raw blobs without unpacking — the boot and rootfs runtimes handle them differently.

Stage 5: Rootfs Assembly and Launch

Once all layers are verified and unpacked, the Launcher prepares the runtime environment and starts the container:

5.1 Load Configurations

The container runtime's Instance module loads three configuration files from the Image Manager's blob storage:

  1. Image Manifest — identifies the image config, item config, and layer references
  2. Image Config — provides the container's entrypoint, environment variables, working directory, and the rootfs.diffIDs array listing layers in stack order
  3. Item Config — provides AosEdge-specific metadata: resource quotas, runtime selection, permissions, network rules, and alert thresholds

5.2 Assemble Root Filesystem

The Launcher creates an OverlayFS mount by stacking multiple directories:

OverlayFS mount (container rootfs)
├── Mount points directory      ← top layer (proc, dev, sys mount points)
├── Image layer N               ← uppermost service layer
├── Image layer N-1             ← ...
├── Image layer 1               ← base service layer
├── Host whiteouts directory    ← masks host files not needed in container
└── Host root filesystem (/)    ← provides system binaries (bin, sbin, lib, usr)

Each image layer path is resolved by calling GetLayerPath on the Image Manager with the layer's diff ID from the image config. The layers are stacked in order — the first diff ID is the bottom layer, the last is the top.
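
The stacking order maps onto the OverlayFS `lowerdir` option, where the leftmost directory is the topmost layer. The sketch below assumes every directory in the diagram is passed as a lower directory; whether the Launcher also uses an `upperdir` is not stated here, and `build_lowerdir` and the example paths are hypothetical.

```python
def build_lowerdir(mount_points_dir, layer_paths_bottom_to_top,
                   host_whiteouts_dir, host_root="/"):
    """Build an OverlayFS lowerdir option, topmost directory first."""
    lower = [mount_points_dir]
    # Diff IDs list the base layer first, but lowerdir wants the top first.
    lower.extend(reversed(layer_paths_bottom_to_top))
    lower.append(host_whiteouts_dir)
    lower.append(host_root)
    return "lowerdir=" + ":".join(lower)
```

With layers `["/l/base", "/l/app"]`, the result places `/l/app` before `/l/base`, so files in the application layer shadow those in the base layer.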

5.3 Generate Runtime Config

The Launcher generates an OCI Runtime Config (config.json) that includes:

  • Process — entrypoint, arguments, environment variables (including AOS_ITEM_ID, AOS_INSTANCE_ID, etc.), UID/GID
  • Resource limits — CPU quota/period, memory limit, PID limit (from item config quotas)
  • Namespaces — PID, mount, IPC, UTS, and optionally network namespace
  • Mounts — state directory, storage directory, tmpfs, proc, dev, sys
  • Devices — hardware access rules from the Resource Manager
  • Network — hostname, DNS configuration, hosts file
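
A trimmed-down version of such a config can be sketched as a dictionary that serializes to `config.json`. The field names follow the OCI Runtime Specification; the function name, its parameters, and the example values are hypothetical, and mounts, devices, and DNS files are left out for brevity.

```python
import json


def build_runtime_config(args, env, uid, gid, cpu_quota, cpu_period,
                         memory_limit, pids_limit, rootfs, hostname,
                         use_network_ns=True):
    """Return a minimal OCI runtime config as a plain dictionary."""
    namespaces = [{"type": kind} for kind in ("pid", "mount", "ipc", "uts")]
    if use_network_ns:
        namespaces.append({"type": "network"})
    return {
        "ociVersion": "1.0.2",
        "process": {
            "args": args,
            "env": env,  # e.g. AOS_ITEM_ID, AOS_INSTANCE_ID
            "cwd": "/",
            "user": {"uid": uid, "gid": gid},
        },
        "root": {"path": rootfs},
        "hostname": hostname,
        "linux": {
            "namespaces": namespaces,
            "resources": {
                "cpu": {"quota": cpu_quota, "period": cpu_period},
                "memory": {"limit": memory_limit},
                "pids": {"limit": pids_limit},
            },
        },
    }


if __name__ == "__main__":
    config = build_runtime_config(
        args=["/usr/bin/app"], env=["AOS_ITEM_ID=item1"], uid=5000, gid=5000,
        cpu_quota=50000, cpu_period=100000, memory_limit=256 * 1024 * 1024,
        pids_limit=64, rootfs="/run/aos/rootfs", hostname="svc",
    )
    print(json.dumps(config, indent=2))
```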

5.4 Start Container

The container is started as a systemd transient unit (aos-service@<instanceID>.service) using an OCI-compatible runtime binary (e.g., crun). The Runner module:

  1. Creates a systemd drop-in with instance-specific parameters
  2. Starts the systemd unit, which invokes the OCI runtime to create and run the container
  3. Begins monitoring the unit status for state changes

Once the container process is running, the Launcher reports the instance state as Active to CM.

Error Handling

The pipeline handles a failure at each stage as follows:

| Failure Point | Behavior |
| --- | --- |
| URL resolution fails | SM retries via GetBlobsInfo; if CM is unavailable, instance remains in Activating state |
| Download fails (all retries exhausted) | Space allocation released; instance reported as Failed |
| Verification fails | Corrupted blob deleted; installation aborted; instance reported as Failed |
| Layer unpacking fails | Partial content cleaned up; installation aborted; instance reported as Failed |
| Rootfs assembly fails | Mount cleaned up; instance reported as Failed |
| Container start fails | Runtime reports failure; Launcher marks instance as Failed |

In all failure cases, the instance status is reported back to CM, which forwards it to the cloud. The cloud can then decide whether to retry the deployment or take corrective action.

Concurrency and Parallelism

The pipeline runs work in parallel where it can and enforces a fixed ordering where it must:

  • Multiple items in parallel — when an UpdateInstances command includes multiple services, their image installations run concurrently in a thread pool
  • Shared layer deduplication — if two services share a common layer, only one download occurs; the second waits for the first to complete and reuses the result
  • Stop before install — outdated instances are stopped first, then new images are installed, then new instances are started (stop → install → start ordering within a single update)
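
The stop, install, start ordering with parallel installs can be sketched with a standard thread pool. `apply_update` and its callable parameters are hypothetical, and the pool size is an arbitrary choice for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_update(to_stop, to_install, to_start, stop, install, start, workers=4):
    """Stop outdated instances, install images in parallel, then start."""
    for instance in to_stop:
        stop(instance)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises the first install error,
        # so nothing is started if any installation failed.
        list(pool.map(install, to_install))
    for instance in to_start:
        start(instance)
```

Only the install phase is concurrent in this sketch; the pool is fully drained before any new instance starts, which keeps the three phases strictly ordered.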

Storage Lifecycle

After deployment, the Image Manager continues to manage stored images:

  • Version retention — at most 2 versions of each Deployable Item are kept simultaneously
  • TTL-based cleanup — removed items are permanently deleted after 30 days (configurable)
  • Orphan removal — blobs and layers no longer referenced by any item are automatically cleaned up
  • Space pressure eviction — the Space Allocator can trigger removal of outdated items when storage is constrained
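
The retention and TTL rules can be sketched as two small predicates. The function names are hypothetical, and evicting the oldest versions first is an assumption; the source only states the limit of 2 versions and the 30-day default.

```python
from datetime import datetime, timedelta


def versions_to_evict(installed_versions, max_versions=2):
    """Given versions ordered oldest to newest, return those over the limit."""
    excess = len(installed_versions) - max_versions
    return installed_versions[:excess] if excess > 0 else []


def is_expired(removed_at, now, ttl=timedelta(days=30)):
    """True once a removed item has outlived its TTL and can be deleted."""
    return now - removed_at >= ttl
```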