OpenClaw Boot Failed

Symptom

The runner container starts, but the session never transitions from PROVISIONING to READY within the per-tier timeout. The dashboard launch panel shows openclaw_boot_failed or openclaw_ready_timeout. The cm-runner agent log contains:


WARN  openclaw_not_ready_yet attempts=N max=10
ERROR openclaw_boot_failed reason=startup_timeout | gateway_unreachable |
      out_of_memory | provider_key_missing
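
To jump straight to the matching row in the table of causes, the reason value can be pulled out of the agent log. The helper below is a minimal sketch that assumes the log line has the `ERROR openclaw_boot_failed reason=<value>` shape shown above; `boot_failure_reason` is a hypothetical name, not part of cm-runner.

```shell
# Print the reason= value from the most recent openclaw_boot_failed line on stdin.
# Assumes the "openclaw_boot_failed reason=<value>" log shape shown above.
boot_failure_reason() {
  grep 'openclaw_boot_failed' \
    | sed -n 's/.*reason=\([a-z_]*\).*/\1/p' \
    | tail -n 1
}

# Usage (not run here):
#   docker logs cm-runner --tail 500 2>&1 | boot_failure_reason
```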

Likely causes

| Reason | What happened | Fix |
| --- | --- | --- |
| `startup_timeout` | The OpenClaw process was still doing first-launch setup when the supervisor gave up | Increase the container memory limit; check disk I/O throughput. |
| `gateway_unreachable` | The container can’t reach the configured `CM_GATEWAY_PROXY_URL` for LLM calls | Verify the value is set and reachable from inside the container. |
| `out_of_memory` | The kernel OOM-killed the OpenClaw process | Give the host more RAM or run fewer concurrent sessions. |
| `provider_key_missing` | The session needs an LLM provider key that the agent didn’t inject | See Missing Credentials. |
| Skip flags missing | OpenClaw is spawning ~19 worker processes and exhausting RAM | Confirm the executor is setting the `OPENCLAW_SKIP_*` env vars (already the default in cm-runner). |
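
For the skip-flag row, one way to confirm what the executor actually set is to inspect the session container's environment. This is a sketch only: `skip_flag_summary` is a hypothetical helper, it assumes `env` is available in the container, and it treats a value of `0` as "skip disabled", which is inferred from the `OPENCLAW_SKIP_*=0` overrides mentioned in this runbook.

```shell
# Read `env`-style NAME=value lines on stdin and summarise OPENCLAW_SKIP_* flags.
# A flag set to 0 is reported as disabled (assumption based on this runbook).
skip_flag_summary() {
  awk -F= '
    /^OPENCLAW_SKIP_/ {
      total++
      if ($2 == "0") { disabled++; print "disabled: " $1 }
    }
    END {
      printf "skip flags set: %d, disabled (=0): %d\n", total, disabled
    }
  '
}

# Usage (not run here):
#   docker exec "session_$SESSION_ID" env | skip_flag_summary
```

A count of zero flags suggests the executor is not injecting the skip profile at all; any `disabled:` line points at an override worth removing.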

Fix

Step 1 — Read OpenClaw’s own log

The cm-runner executor streams the container’s stdout/stderr to the agent log, prefixed with the session ID:


# docker logs replays the container's stderr on stderr, so merge it before grep.
docker logs cm-runner --tail 500 2>&1 | grep "session_$SESSION_ID"

Look for the last non-info line before the timeout — it almost always identifies the failure (missing key, port collision, malformed config).
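
Finding that last non-info line can be scripted. The filter below is a rough sketch: it assumes log levels appear as upper-case tokens (`INFO`, `DEBUG`, `TRACE`, `WARN`, `ERROR`) somewhere in each line, which matches the WARN/ERROR sample in this runbook but is not confirmed for every OpenClaw line. `last_non_info_line` is a hypothetical helper name.

```shell
# Drop INFO/DEBUG/TRACE lines from stdin and print the last remaining line.
# Assumes upper-case level tokens delimited by non-letters.
last_non_info_line() {
  grep -v -E '(^|[^A-Za-z])(INFO|DEBUG|TRACE)([^A-Za-z]|$)' | tail -n 1
}

# Usage (not run here):
#   docker logs cm-runner --tail 500 2>&1 \
#     | grep "session_$SESSION_ID" | last_non_info_line
```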

Step 2 — Confirm gateway reachability from inside the container


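# $CM_GATEWAY_PROXY_URL is expanded by your host shell before docker exec runs,
# so set it in your shell to the agent's configured value first.
docker exec "session_$SESSION_ID" curl -s -o /dev/null -w "%{http_code}\n" \
  "$CM_GATEWAY_PROXY_URL/health"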

A 200 confirms the LLM proxy path is open. Anything else means that either the agent’s CM_GATEWAY_PROXY_URL points to the wrong host or the container’s network can’t reach it.
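
The status code printed by curl can narrow this down a little further. The sketch below relies on one documented curl behaviour: `%{http_code}` prints `000` when no HTTP response was received at all. The wording of the other branches is an interpretation, and `explain_gateway_code` is a hypothetical helper name.

```shell
# Map the curl %{http_code} output to a short diagnosis.
explain_gateway_code() {
  case "$1" in
    200) echo "ok: gateway proxy reachable" ;;
    000) echo "no response: DNS, routing, or connection failure" ;;
    *)   echo "unexpected status $1: something answered, check the URL" ;;
  esac
}

# Usage (not run here):
#   code="$(docker exec "session_$SESSION_ID" curl -s -o /dev/null \
#             -w '%{http_code}' "$CM_GATEWAY_PROXY_URL/health")"
#   explain_gateway_code "$code"
```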

Step 3 — Check memory headroom


docker stats --no-stream session_$SESSION_ID

The OpenClaw skip-flag profile (default in current cm-runner) keeps each session at ~450 MB steady-state. If you see > 2 GB sustained, either:

The session is running a heavy workload — expected, scale the host.
A non-default OpenClaw configuration is spawning the full worker fan-out — unset any OPENCLAW_SKIP_*=0 overrides.
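
To compare a session against the 2 GB mark without reading the table by eye, the memory column can be converted to a plain number. This sketch assumes the `{{.MemUsage}}` format of `docker stats`, which prints values such as `452.1MiB / 7.667GiB`, and treats 2 GB as roughly 2048 MiB. `mem_usage_mib` and `over_memory_budget` are hypothetical helper names.

```shell
# Convert the "used" half of a docker stats MemUsage string to whole MiB.
mem_usage_mib() {
  awk '{
    v = $1
    if (v ~ /GiB$/)      { sub(/GiB$/, "", v); printf "%d\n", v * 1024 }
    else if (v ~ /MiB$/) { sub(/MiB$/, "", v); printf "%d\n", v }
    else if (v ~ /KiB$/) { sub(/KiB$/, "", v); printf "%d\n", v / 1024 }
    else                 { print "unknown" }
  }'
}

# Succeed when the given MiB figure exceeds the ~2 GB threshold from this step.
over_memory_budget() {
  [ "$1" -gt 2048 ]
}

# Usage (not run here):
#   used="$(docker stats --no-stream --format '{{.MemUsage}}' \
#             "session_$SESSION_ID" | mem_usage_mib)"
#   over_memory_budget "$used" && echo "above 2 GB: check workload and skip flags"
```

A single `--no-stream` sample is a snapshot; repeat it a few times before treating the reading as sustained.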

Step 4 — Rule out a bad image

If the same template fails on every machine, the published image is the suspect. Pin the template back to a known-good tag:


# Current pinned tag (default for new templates).
curl -X PUT \
  -H "X-CM-API-Key: cm_sk_your_key_here" \
  -H "Content-Type: application/json" \
  https://api.curate-me.ai/gateway/admin/runners/templates/$TEMPLATE_ID \
  -d '{"base_image_ref": "ghcr.io/curate-me-ai/openclaw-base:v2026.5.22"}'
 
# Rollback to the previous pin (2026.4.2 stays available for 14 days
# post-canary as the approved rollback target).
curl -X PUT \
  -H "X-CM-API-Key: cm_sk_your_key_here" \
  -H "Content-Type: application/json" \
  https://api.curate-me.ai/gateway/admin/runners/templates/$TEMPLATE_ID \
  -d '{"base_image_ref": "ghcr.io/curate-me-ai/openclaw-base:v2026.4.2"}'

(See Runner Operations runbook § 5 for the full template / skill-pack rollback procedure.)
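
Because the two requests above differ only in the tag, they can be wrapped in a small helper. This is a convenience sketch, not an official tool: `image_pin_payload` and `pin_template_image` are hypothetical names, and `CM_API_KEY` is an assumed environment variable holding your API key. The endpoint, headers, and image path are copied from the commands above.

```shell
# Build the JSON body that pins a template to a given openclaw-base tag.
image_pin_payload() {
  printf '{"base_image_ref": "ghcr.io/curate-me-ai/openclaw-base:%s"}\n' "$1"
}

# Send the pin request. $1 = template ID, $2 = image tag.
# CM_API_KEY is a hypothetical variable name for your API key.
pin_template_image() {
  curl -X PUT \
    -H "X-CM-API-Key: $CM_API_KEY" \
    -H "Content-Type: application/json" \
    "https://api.curate-me.ai/gateway/admin/runners/templates/$1" \
    -d "$(image_pin_payload "$2")"
}

# Usage (not run here):
#   pin_template_image "$TEMPLATE_ID" v2026.4.2
```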

Where to find logs


# Agent + OpenClaw merged
docker logs cm-runner --tail 1000 2>&1 | grep -E "openclaw|session_"
 
# Container itself
docker logs session_$SESSION_ID --tail 200

Server-side: look for runner_state_transition_failed events, which record the rejected target state.

Missing Credentials
Runner Startup SLO
Runner Operations runbook: Section 4 — OpenClaw CLI failure rollback