Slow First Launch

Symptom

The first session.create on a newly registered BYOVM machine takes much longer to reach READY than later launches. The dashboard launch panel shows the timeline stuck on image_pull (or image_warm) for 60+ seconds; subsequent launches of the same template finish in a few seconds.

You will see this in cm-runner agent logs:


INFO  session_create runner_id=runner_xxx template=openclaw-base
INFO  image_pull_start image=ghcr.io/curate-me-ai/openclaw-base:vYYYY.M.D
INFO  image_pull_progress layer=... mb_received=420/1843
INFO  image_pull_done elapsed_seconds=87
INFO  session_ready
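
To compare first and later launches, it can help to pull just the pull durations out of the agent logs. The following is a minimal sketch that assumes the log lines look like the excerpt above; the function name `pull_durations` is hypothetical and not part of cm-runner.

```shell
# Print the elapsed seconds of every completed image pull found on stdin.
# Assumes lines shaped like: "INFO  image_pull_done elapsed_seconds=87".
pull_durations() {
  sed -n 's/.*image_pull_done elapsed_seconds=\([0-9][0-9]*\).*/\1/p'
}

# Demo on a single line copied from the excerpt; prints 87.
printf 'INFO  image_pull_done elapsed_seconds=87\n' | pull_durations
```

Feed it the agent logs however you normally collect them. A first launch should show one large value, and warm launches of the same template should show no image_pull_done line at all.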

Why it happens

cm-runner agents lazy-pull session images the first time a template is launched on the host. OpenClaw-based images are 1-4 GB of compressed layers per template, and a fresh machine’s Docker daemon has nothing cached. The download is bounded by the host’s bandwidth to the registry, not by Curate-Me’s gateway. Once an image is on disk, the next launch of the same template skips the pull entirely and meets the 60-second startup SLO.
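
As a back-of-the-envelope check on the bandwidth bound, the arithmetic can be sketched as below. The 1843 MB and 87 s figures come from the log excerpt (treated here as the total image size, which is an assumption); the 200 Mbit/s link speed and both function names are hypothetical.

```shell
# Rough estimate only: ignores registry throttling, parallel layer downloads,
# and the time Docker spends decompressing layers.

# Seconds to download size_mb megabytes over a link_mbps megabit/s link.
estimate_pull_seconds() {
  awk -v mb="$1" -v mbps="$2" 'BEGIN { printf "%.0f\n", mb * 8 / mbps }'
}

# Effective throughput in Mbit/s for size_mb megabytes pulled in secs seconds.
effective_mbps() {
  awk -v mb="$1" -v secs="$2" 'BEGIN { printf "%.0f\n", mb * 8 / secs }'
}

estimate_pull_seconds 1843 200   # hypothetical 200 Mbit/s link
effective_mbps 1843 87           # figures from the log excerpt
```

The first call prints 74 (seconds) and the second prints 169 (Mbit/s). If the effective throughput is far below what the host's link should deliver, the slowdown is probably not a plain cold cache.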

This is also expected after:

A host reboot, if Docker’s image store lives on ephemeral storage such as a tmpfs mount (do not do this in production; see Cleanup pruned the image below).
Manual docker system prune removing the cached layers.
A template version bump that changes the image digest.

Fix

Option 1 — Ask support to pre-pull for you

The fastest unblock for a single machine is to have support dispatch a pre-pull job. Open a ticket with your agent_id and the template name and support can run:


curl -X POST \
  -H "X-CM-API-Key: $SUPPORT_KEY" -H "X-Org-Id: $ORG_ID" \
  -H "Content-Type: application/json" \
  https://api.curate-me.ai/gateway/admin/runners/byovm/agents/$AGENT_ID/pre-pull \
  -d '{"image_ref": "<image_from_template>"}'

The pre-pull runs in the background; once it finishes, the next launch hits the cache.

Option 2 — Pre-pull manually from the host

If you have shell access to the machine and know the image you need, pull it ahead of time:


docker pull ghcr.io/curate-me-ai/openclaw-base:2026.5.21
docker pull ghcr.io/curate-me-ai/cm-runner:2026.5.21

Use the exact tag your template references, not :latest. The agent caches by digest, so pulling :latest does not warm the cache entry that the next template launch looks up.
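
A small helper can make the manual pre-pull idempotent and show which digest ended up on disk. This is a sketch that uses only standard Docker CLI commands (`docker image inspect`, `docker pull`); the function names `image_cached` and `warm_image` are hypothetical.

```shell
# Return 0 if the exact image reference is already in the local Docker cache.
image_cached() {
  docker image inspect "$1" >/dev/null 2>&1
}

# Pull the image only when it is missing, then print the repo digest(s)
# that Docker recorded for it.
warm_image() {
  ref="$1"
  if image_cached "$ref"; then
    echo "cached: $ref"
  else
    docker pull "$ref" || return 1
  fi
  docker image inspect --format '{{join .RepoDigests "\n"}}' "$ref"
}

# Example: warm_image ghcr.io/curate-me-ai/openclaw-base:2026.5.21
```

Comparing the printed digest with the one your template resolves to is a quick way to confirm that the pull warmed the right cache entry.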

Option 3 — Opt the machine into pre-pull policy

For templates you launch often, set the machine to pre-pull on a schedule so first-launch is always warm. Today this is configured via support; a self-serve pre-pull policy UI is on the roadmap. See the Runner Operations runbook for the operational path.

Option 4 — Use the warm pool (managed runners only)

If you are using Curate-Me-managed runners (not BYOVM), the warm pool keeps N provisioned VMs idle with pre-pulled images. Set HETZNER_WARM_POOL_SIZE=2 (or higher) in the runner control plane to keep first-launch latency under the SLO. The warm pool does not apply to BYOVM hosts you control directly.

When this is not what you have

The symptom looks similar to several other failure modes — check these if a pre-pull does not help:

Looks similar but isn’t	What you actually have
Pull starts, then fails with `denied`, `manifest unknown`, or `no space left`	Image Pull Failed
Pull completes, OpenClaw still does not reach `READY`	OpenClaw Boot Failed
Launch reaches `READY` and only the first prompt is slow	LLM cold-start, not runner-side. Pre-warm the provider connection.
Every launch on the host is slow, not just the first	Machine Offline (intermittent network) or under-provisioned host
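
The first rows of the table can be approximated with a log check. The sketch below assumes that the error strings from the table appear verbatim in the agent log and that event names match the earlier excerpt; both are assumptions, and `triage_launch_log` is a hypothetical helper, not a cm-runner command.

```shell
# Read one launch's agent log on stdin and print the most likely page to consult.
triage_launch_log() {
  log=$(cat)
  if printf '%s\n' "$log" | grep -Eq 'denied|manifest unknown|no space left'; then
    # The pull started and then failed with a registry or disk error.
    echo "Image Pull Failed"
  elif printf '%s\n' "$log" | grep -q 'image_pull_done' &&
       ! printf '%s\n' "$log" | grep -q 'session_ready'; then
    # The pull finished but the session never became ready
    # (only meaningful once the launch has had time to complete).
    echo "OpenClaw Boot Failed"
  elif printf '%s\n' "$log" | grep -q 'image_pull_done'; then
    # The pull finished and the session came up: a cold cache is the likely cause.
    echo "Slow First Launch"
  else
    echo "unknown"
  fi
}
```

It does not cover the last two rows, which depend on prompt latency and on behavior across several launches rather than on a single log.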

Cleanup pruned the image

If launches were fast yesterday and slow today, check whether docker system prune wiped the cache, or whether Docker’s image store sits on ephemeral storage (for example a tmpfs mount) that a reboot cleared. docker images should list the template image; if it does not, you are pulling from scratch each time.

For production hosts:

Do not run docker system prune -af on a cadence. If you must prune on a schedule, use docker image prune with --filter until=... and --filter label=... so that only old, orphaned layers are removed.
Allocate enough disk for the templates you use — see the Connect Your Machine cloud-VM notes for the 20 GB recommended floor.
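
A targeted prune along those lines might look like the following sketch. It uses `docker image prune` without `--all`, so only dangling (untagged) images are candidates and tagged template images stay cached. The function name, the 168h age, and the example label are hypothetical.

```shell
# Remove dangling images older than max_age, optionally restricted to a label.
prune_dangling_images() {
  max_age="$1"   # duration understood by Docker, e.g. 168h
  label="$2"     # optional label filter, e.g. purpose=scratch (hypothetical)
  if [ -n "$label" ]; then
    docker image prune --force --filter "until=$max_age" --filter "label=$label"
  else
    docker image prune --force --filter "until=$max_age"
  fi
}

# Example: prune_dangling_images 168h
```

After a scheduled prune, docker images should still list every template image you launch regularly.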

Related

Image Pull Failed — pull errors that look like slow pulls
OpenClaw Boot Failed — image is cached but the runner never reaches READY
Machine Offline — agent flapping that interrupts pulls
Runner Startup SLO — the 60s SLO target