Slow First Launch
Symptom
The first session.create on a newly registered BYOVM machine takes
much longer to reach READY than later launches. The dashboard launch
panel shows the timeline stuck on image_pull (or image_warm) for
60+ seconds; subsequent launches of the same template finish in a few
seconds.
You will see this in cm-runner agent logs:
INFO session_create runner_id=runner_xxx template=openclaw-base
INFO image_pull_start image=ghcr.io/curate-me-ai/openclaw-base:vYYYY.M.D
INFO image_pull_progress layer=... mb_received=420/1843
INFO image_pull_done elapsed_seconds=87
INFO session_ready
Why it happens
cm-runner agents lazy-pull session images the first time a template is launched on the host. OpenClaw-based images are 1-4 GB of compressed layers per template, and a fresh machine’s Docker daemon has nothing cached. The download is bounded by your upstream bandwidth, not by Curate-Me’s gateway. Once an image is on disk, the next launch of the same template skips the pull entirely and meets the 60-second startup SLO.
This is also expected after:
- A host reboot if Docker was started with --storage-driver=tmpfs (you should not do this in production; see Cleanup pruned the image below).
- Manual docker system prune removing the cached layers.
- A template version bump that changes the image digest.
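To judge whether a slow first launch is simply the bandwidth-bound pull described above, a rough estimate can help. The sketch below divides compressed image size by link speed; estimate_pull_seconds is a hypothetical helper (not part of cm-runner), and the 170 Mbit/s link speed is an illustrative assumption. It ignores layer decompression and registry-side throttling, so treat the result as a lower bound.

```shell
#!/usr/bin/env bash
# Rough lower bound on pull time:
#   seconds = compressed size (MB) * 8 bits per byte / link speed (Mbit/s)
# estimate_pull_seconds is a hypothetical helper, not a cm-runner command.
estimate_pull_seconds() {
  local size_mb="$1" link_mbps="$2"
  awk -v s="$size_mb" -v b="$link_mbps" 'BEGIN { printf "%.0f\n", (s * 8) / b }'
}

# The 1843 MB image from the sample log, on an assumed 170 Mbit/s uplink.
estimate_pull_seconds 1843 170
```

If the estimate is close to the elapsed_seconds value the agent logs for image_pull_done, the launch is slow because of the download itself and a pre-pull is the right fix.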
Fix
Option 1 — Ask support to pre-pull for you
The fastest unblock for a single machine is to have support dispatch a
pre-pull job. Open a ticket with your agent_id and the template name
and support can run:
curl -X POST \
-H "X-CM-API-Key: $SUPPORT_KEY" -H "X-Org-Id: $ORG_ID" \
-H "Content-Type: application/json" \
https://api.curate-me.ai/gateway/admin/runners/byovm/agents/$AGENT_ID/pre-pull \
-d '{"image_ref": "<image_from_template>"}'
The pre-pull runs in the background, and the next launch will hit the cache.
Option 2 — Pre-pull manually from the host
If you have shell access to the machine and know the image you need, pull it ahead of time:
docker pull ghcr.io/curate-me-ai/openclaw-base:2026.5.21
docker pull ghcr.io/curate-me-ai/cm-runner:2026.5.21
Use the exact tag your template references, not :latest. The agent
caches by digest, so a :latest pull does not warm the cache entry that
the next template launch will look up.
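One way to avoid warming the wrong cache entry is to refuse floating tags before pulling. The following is a minimal sketch; is_pinned_ref is a hypothetical helper name, and the image reference is the example tag from this page, so substitute the tag your template actually references.

```shell
#!/usr/bin/env bash
# is_pinned_ref is a hypothetical helper: it succeeds only if the image
# reference carries a digest or an explicit tag other than :latest.
is_pinned_ref() {
  local ref="$1"
  local last="${ref##*/}"      # drop registry/namespace so a registry port is not read as a tag
  case "$last" in
    *@sha256:*) return 0 ;;    # pinned by digest
    *:latest)   return 1 ;;    # floating tag
    *:*)        return 0 ;;    # explicit version tag
    *)          return 1 ;;    # no tag: Docker would default to :latest
  esac
}

IMAGE="ghcr.io/curate-me-ai/openclaw-base:2026.5.21"
if is_pinned_ref "$IMAGE"; then
  echo "pinned: $IMAGE"        # replace this echo with: docker pull "$IMAGE"
else
  echo "refusing floating tag: $IMAGE" >&2
fi
```

The check is purely syntactic. It does not confirm that the tag matches your template, only that it is not a floating reference.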
Option 3 — Opt the machine into pre-pull policy
For templates you launch often, set the machine to pre-pull on a schedule so first-launch is always warm. Today this is configured via support; a self-serve pre-pull policy UI is on the roadmap. See the Runner Operations runbook for the operational path.
Option 4 — Use the warm pool (managed runners only)
If you are using Curate-Me-managed runners (not BYOVM), the warm pool
keeps N provisioned VMs idle with pre-pulled images. Set
HETZNER_WARM_POOL_SIZE=2 (or higher) in the runner control plane to
keep first-launch latency under the SLO. The warm pool does not apply
to BYOVM hosts you control directly.
When this is not what you have
The symptom looks similar to several other failure modes — check these if a pre-pull does not help:
| Looks similar but isn’t | What you actually have |
|---|---|
| Pull starts, then fails with denied, manifest unknown, or no space left | Image Pull Failed |
| Pull completes, OpenClaw still does not reach READY | OpenClaw Boot Failed |
| Launch reaches READY and only the first prompt is slow | LLM cold-start, not runner-side. Pre-warm the provider connection. |
| Every launch on the host is slow, not just the first | Machine Offline (intermittent network) or under-provisioned host |
Cleanup pruned the image
If launches were fast yesterday and slow today, check whether
docker system prune, or a Docker daemon restart on a host started with
a non-persistent setting such as --storage-driver=tmpfs, wiped the
cache. Run docker images and look for the template image; if it is
missing, you are pulling from scratch each time.
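The docker images check can be scripted. This is a minimal sketch, assuming the example image tag used elsewhere on this page; image_in_list is a hypothetical helper that reads repository:tag lines on stdin, which keeps the matching logic separate from the Docker call.

```shell
#!/usr/bin/env bash
# image_in_list is a hypothetical helper: it reads "repository:tag" lines
# on stdin and succeeds if the given reference appears as an exact line.
image_in_list() {
  grep -Fxq -- "$1"
}

IMAGE="ghcr.io/curate-me-ai/openclaw-base:2026.5.21"   # use the tag your template references

# Only query Docker if the CLI is installed on this host.
if command -v docker >/dev/null 2>&1; then
  if docker images --format '{{.Repository}}:{{.Tag}}' | image_in_list "$IMAGE"; then
    echo "cached: $IMAGE"
  else
    echo "not cached, next launch will pull: $IMAGE"
  fi
fi
```

Running this before and after a scheduled cleanup job is a quick way to see whether that job is what removes the image.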
For production hosts:
- Do not run docker system prune -af on a cadence. Use prune --filter with until= and label= to target only orphaned layers.
- Allocate enough disk for the templates you use; see the Connect Your Machine cloud-VM notes for the 20 GB recommended floor.
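A filtered prune might look like the sketch below. The 168h age window and the cm.cleanup=true label are illustrative assumptions, not values defined by Curate-Me; prune_args is a hypothetical helper that only prints the arguments so the command can be reviewed before it is run. Because the command omits -a, tagged images that are merely unused stay on disk.

```shell
#!/usr/bin/env bash
# prune_args is a hypothetical helper: it prints docker arguments for a
# filtered prune. Without -a/--all, unused tagged images are not removed.
prune_args() {
  local max_age="$1" label="$2"
  printf 'system prune --force --filter until=%s --filter label=%s\n' "$max_age" "$label"
}

# Dry run: print the command for review instead of executing it.
# 168h and cm.cleanup=true are assumed example values.
echo "docker $(prune_args 168h cm.cleanup=true)"
```

With the label filter, only objects that carry that label are candidates for removal, so template images without the label are left alone.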
Related pages
- Image Pull Failed: pull errors that look like slow pulls
- OpenClaw Boot Failed: image is cached but the runner never reaches READY
- Machine Offline: agent flapping that interrupts pulls
- Runner Startup SLO: the 60s SLO target