OpenClaw Boot Failed

Symptom

The runner container starts, but the session never transitions from PROVISIONING to READY within the per-tier timeout. The dashboard launch panel shows openclaw_boot_failed or openclaw_ready_timeout, and the cm-runner agent log contains:

WARN openclaw_not_ready_yet attempts=N max=10
ERROR openclaw_boot_failed reason=startup_timeout | gateway_unreachable | out_of_memory | provider_key_missing
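To pull the failure reason out of a long log without reading it by eye, a small shell helper can be used. This is a minimal sketch that assumes the reason token looks exactly like the excerpt above (reason= followed by a lowercase identifier); extract_boot_reason is a hypothetical name, not something shipped with cm-runner.

```shell
# Hypothetical helper (not part of cm-runner): print the reason of the most
# recent openclaw_boot_failed line read from stdin.
# Assumes the "reason=<lowercase_identifier>" format shown in the log excerpt.
extract_boot_reason() {
  grep -o 'openclaw_boot_failed reason=[a-z_]*' \
    | tail -n 1 \
    | sed 's/.*reason=//'
}
```

For example: docker logs cm-runner --tail 500 2>&1 | extract_boot_reason. An empty result means no boot failure was logged in that window.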

Likely causes

Reason | What happened | Fix
--- | --- | ---
startup_timeout | OpenClaw is still running first-launch setup when the supervisor gives up. | Increase container memory; check disk I/O speed.
gateway_unreachable | The container can't reach the configured CM_GATEWAY_PROXY_URL for LLM calls. | Verify the value is set and reachable from inside the container (Step 2).
out_of_memory | The kernel OOM-killed the OpenClaw process. | Add RAM to the host or run fewer concurrent sessions.
provider_key_missing | The session needs an LLM provider key that the agent didn't inject. | See Missing Credentials.
Skip flags missing | OpenClaw spawns ~19 worker processes and exhausts RAM. | Confirm the executor sets the OPENCLAW_SKIP_* env vars (already the default in cm-runner).
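The reason column can be encoded as a lookup so a script prints the next step for a given reason code. This is an illustrative sketch: next_step_for_reason is a hypothetical name, and its messages only restate the table.

```shell
# Hypothetical lookup (not part of cm-runner): map an openclaw_boot_failed
# reason code to the fix listed in the table. Returns 1 for unknown codes.
next_step_for_reason() {
  case "$1" in
    startup_timeout)      echo "Increase container memory; check disk I/O speed." ;;
    gateway_unreachable)  echo "Verify CM_GATEWAY_PROXY_URL is set and reachable from inside the container." ;;
    out_of_memory)        echo "Add RAM to the host or run fewer concurrent sessions." ;;
    provider_key_missing) echo "See Missing Credentials." ;;
    *)                    echo "Unknown reason '$1': read the session log (Step 1)."; return 1 ;;
  esac
}
```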

Fix

Step 1 — Read OpenClaw’s own log

The cm-runner executor streams the container’s stdout/stderr to the agent log, prefixed with the session ID:

docker logs cm-runner --tail 500 2>&1 | grep "session_$SESSION_ID"

Look for the last non-INFO line before the timeout. It almost always identifies the failure: a missing key, a port collision, or a malformed config.
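Finding that line for one session can be scripted. This sketch assumes each relevant line carries session_<id> as a separate word and that the log level appears as an uppercase INFO token; last_non_info_line is a hypothetical helper name.

```shell
# Hypothetical helper: from log lines on stdin, print the last line for the
# given session ID that is not at INFO level.
# Assumes "session_<id>" appears as a whole word and levels are uppercase tokens.
last_non_info_line() {
  local session_id="$1"
  grep -Fw "session_${session_id}" \
    | grep -vw 'INFO' \
    | tail -n 1
}
```

For example: docker logs cm-runner --tail 500 2>&1 | last_non_info_line "$SESSION_ID". The -w match keeps session_42 from also matching session_420.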

Step 2 — Confirm gateway reachability from inside the container

docker exec session_$SESSION_ID curl -s -o /dev/null -w "%{http_code}\n" \
  "$CM_GATEWAY_PROXY_URL/health"

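A 200 confirms the LLM proxy path is open. Anything else (including 000, which curl prints when no HTTP response arrives at all) means the agent's CM_GATEWAY_PROXY_URL either points to the wrong host or the container's network can't reach it. Because $CM_GATEWAY_PROXY_URL is expanded by your host shell before docker exec runs, set it there to the agent's configured value first.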
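In a script, the status code can be turned into a verdict. This is a sketch: gateway_verdict is a hypothetical name, and the 000 branch relies on curl printing 000 for %{http_code} when no HTTP response was received.

```shell
# Hypothetical helper: interpret the status code printed by the Step 2 curl.
# Returns 0 only for 200.
gateway_verdict() {
  case "$1" in
    200) echo "ok: LLM proxy path is open" ;;
    000) echo "unreachable: no HTTP response (DNS, routing, or firewall)"; return 1 ;;
    *)   echo "unexpected HTTP $1: check that CM_GATEWAY_PROXY_URL points to the right host"; return 1 ;;
  esac
}
```

For example: code=$(docker exec session_$SESSION_ID curl -s -o /dev/null -w "%{http_code}" "$CM_GATEWAY_PROXY_URL/health"), then gateway_verdict "$code".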

Step 3 — Check memory headroom

docker stats --no-stream session_$SESSION_ID

The OpenClaw skip-flag profile (the default in current cm-runner) keeps each session at about 450 MB steady-state. If usage stays above 2 GB, one of two things is happening:

  • The session is running a heavy workload. This is expected; scale the host.
  • A non-default OpenClaw configuration is spawning the full worker fan-out. Unset any OPENCLAW_SKIP_*=0 overrides.
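Both checks can be scripted. This is a sketch under stated assumptions: it parses the MemUsage column of docker stats (for example 450MiB / 7.6GiB), treats 2 GB as 2048 MiB, and scans env output for OPENCLAW_SKIP_* variables set to exactly 0. The helper names are hypothetical, and one sample does not show that usage is sustained, so repeat the check a few times.

```shell
# Hypothetical helpers (not part of cm-runner).

# Convert the "used" half of a docker stats MemUsage value
# (e.g. "450MiB / 7.6GiB") to whole MiB, truncating any fraction.
mem_used_mib() {
  local used="${1%% /*}"   # drop everything from " /" onward
  awk -v v="$used" 'BEGIN {
    n = v + 0                            # leading number, e.g. 450
    if (v ~ /GiB$/)      n = n * 1024
    else if (v ~ /MiB$/) n = n           # already MiB; must precede the B case
    else if (v ~ /KiB$/) n = n / 1024
    else if (v ~ /B$/)   n = n / 1048576
    printf "%d\n", n
  }'
}

# Print "above_2gb" or "ok" for one MemUsage sample (2 GB taken as 2048 MiB).
mem_verdict() {
  if [ "$(mem_used_mib "$1")" -gt 2048 ]; then
    echo "above_2gb"
  else
    echo "ok"
  fi
}

# From env output on stdin, list OPENCLAW_SKIP_* variables forced to 0.
find_skip_overrides() {
  grep -E '^OPENCLAW_SKIP_[A-Za-z0-9_]*=0$' | cut -d= -f1
}
```

For example: mem_verdict "$(docker stats --no-stream --format '{{.MemUsage}}' session_$SESSION_ID)" and, if the overrides are set on the container, docker exec session_$SESSION_ID env | find_skip_overrides.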

Step 4 — Rule out a bad image

If the same template fails on every machine, the published image is the suspect. Pin the template back to a known-good tag:

# Current pinned tag (default for new templates).
curl -X PATCH \
  -H "X-CM-API-Key: cm_sk_your_key_here" \
  -H "Content-Type: application/json" \
  https://api.curate-me.ai/gateway/admin/runners/templates/$TEMPLATE_ID \
  -d '{"image_ref": "ghcr.io/curate-me-ai/openclaw-base:v2026.5.22"}'

# Rollback to the previous pin (2026.4.2 stays available for 14 days
# post-canary as the approved rollback target).
curl -X PATCH \
  -H "X-CM-API-Key: cm_sk_your_key_here" \
  -H "Content-Type: application/json" \
  https://api.curate-me.ai/gateway/admin/runners/templates/$TEMPLATE_ID \
  -d '{"image_ref": "ghcr.io/curate-me-ai/openclaw-base:v2026.4.2"}'

(See Runner Operations runbook § 5 for the full template / skill-pack rollback procedure.)
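To avoid pasting the API key into each command, the PATCH can be wrapped in a function. This is a sketch, not an official client: pin_template_image, image_ref_payload, and the CM_API_KEY variable are hypothetical names, and the function reuses only the endpoint, headers, and payload shown above.

```shell
# Hypothetical wrapper around the template PATCH call shown above.
# Reads the API key from CM_API_KEY (an assumed variable name).

# Build the JSON body for a given openclaw-base tag.
image_ref_payload() {
  printf '{"image_ref": "ghcr.io/curate-me-ai/openclaw-base:%s"}\n' "$1"
}

# Usage: pin_template_image <template_id> <tag>
pin_template_image() {
  local template_id="$1" tag="$2"
  if [ -z "${CM_API_KEY:-}" ]; then
    echo "CM_API_KEY is not set" >&2
    return 1
  fi
  # -f makes curl exit non-zero on HTTP errors; -sS hides progress but keeps errors.
  curl -fsS -X PATCH \
    -H "X-CM-API-Key: ${CM_API_KEY}" \
    -H "Content-Type: application/json" \
    "https://api.curate-me.ai/gateway/admin/runners/templates/${template_id}" \
    -d "$(image_ref_payload "$tag")"
}
```

For example, rolling back: pin_template_image "$TEMPLATE_ID" v2026.4.2.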

Where to find logs

# Agent + OpenClaw merged
docker logs cm-runner --tail 1000 2>&1 | grep -E "openclaw|session_"

# Container itself
docker logs session_$SESSION_ID --tail 200

Server-side, look for runner_state_transition_failed, which records the rejected target state.
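When opening a ticket, it helps to capture both log sources at once. This is a sketch: openclaw_lines and save_openclaw_logs are hypothetical helpers, the output file names are arbitrary, and 2>&1 is there because docker logs replays a container's stderr on stderr, which a plain pipe would skip. The filter uses the same pattern as the merged-log command above.

```shell
# Hypothetical helper: keep only OpenClaw / session lines (same pattern as above).
openclaw_lines() {
  grep -E 'openclaw|session_'
}

# Hypothetical helper: save the filtered agent log and the session container
# log for one session into the current directory.
save_openclaw_logs() {
  local session_id="$1"
  docker logs cm-runner --tail 1000 2>&1 | openclaw_lines > "cm-runner-openclaw.log"
  docker logs "session_${session_id}" --tail 200 > "session_${session_id}.log" 2>&1
}
```

For example: save_openclaw_logs "$SESSION_ID", then attach both files.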