Machine Offline
Symptom
A machine that was previously Online now shows Offline (or Stale)
in Runners → Your Machines. Active sessions on that machine fail to
launch with agent_unreachable.
Likely causes
| Cause | What you’d see in agent logs | Fix |
|---|---|---|
| Network blip | Heartbeat HTTP errors (connection_reset, dial_timeout) | Usually self-heals on next heartbeat (30s). If it lasts > 5 min, check egress. |
| Container exited | No agent logs at all (docker ps doesn’t show cm-runner) | Restart the container — see below. |
| Host crashed / rebooted | Host SSH refuses connections too | Boot the host, then docker start cm-runner. |
| Docker daemon hung | Agent process up but heartbeats fail with EOF from socket | Restart Docker (systemctl restart docker) then the agent. |
| Agent OOM-killed | OOMKilled in docker inspect cm-runner | Bump host RAM or remove other heavy containers. |
| Gateway-side state drift | Agent thinks it’s healthy; dashboard says offline | See State drift below. |
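The container-level rows of the table can usually be told apart from `docker inspect` output alone. The following is a minimal sketch, assuming the container is named cm-runner as elsewhere on this page. The function name `classify_runner` and its messages are illustrative and not part of the cm-runner CLI; `.State.Status` and `.State.OOMKilled` are standard `docker inspect` fields.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cm-runner): map the container state
# to the matching row of the causes table.
classify_runner() {
  local name="${1:-cm-runner}" state status oom
  # Prints e.g. "running false" or "exited true". Fails if the container
  # does not exist or the Docker daemon is not responding.
  if ! state="$(docker inspect --format '{{.State.Status}} {{.State.OOMKilled}}' "$name" 2>/dev/null)"; then
    echo "no-container: $name not found or Docker not responding"
    return 1
  fi
  status="${state%% *}"   # first word: running, exited, ...
  oom="${state##* }"      # last word: true or false
  if [ "$oom" = "true" ]; then
    echo "oom-killed: free up host RAM, then docker start $name"
  elif [ "$status" = "running" ]; then
    echo "running: check egress, heartbeats, or state drift"
  else
    echo "stopped ($status): docker start $name"
  fi
}
```

Calling `classify_runner` with no arguments inspects cm-runner; a non-zero return means the container could not be inspected at all, which points at the "Container exited" or "Docker daemon hung" rows.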
Fix
Step 1 — Confirm the agent process
docker ps -a --filter name=cm-runner --format "table {{.Names}}\t{{.Status}}\t{{.RunningFor}}"
If Status is Exited, restart it:
docker start cm-runner
docker logs cm-runner --tail 50 -f
If Status is Up but the dashboard still says offline, check egress:
docker exec cm-runner curl -s -o /dev/null -w "%{http_code}\n" \
https://api.curate-me.ai/health
A non-200 means the gateway is unreachable from inside the container: inspect the host firewall, DNS, and TLS.
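The egress check in Step 1 can be wrapped so that the status code comes with a hint about what to look at next. This is a sketch under assumptions: `explain_health_code` and `check_egress` are hypothetical names, and the hints are suggestions rather than documented gateway behaviour. It relies on curl printing `000` for `%{http_code}` when no HTTP response was received, and adds `--max-time` so a hung connection does not block the check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the code printed by the Step 1 curl command
# into a short hint.
explain_health_code() {
  case "$1" in
    200) echo "ok: gateway reachable from inside the container" ;;
    000) echo "no-response: no HTTP reply at all; check host firewall, DNS, TLS" ;;
    "")  echo "no-output: docker exec produced nothing; is the container running?" ;;
    *)   echo "non-200: something replied with HTTP $1; check proxies, then firewall, DNS, TLS" ;;
  esac
}

# Runs the same health request as Step 1, with a 10-second cap (an assumed value).
check_egress() {
  local code
  code="$(docker exec cm-runner curl -s -o /dev/null -w '%{http_code}' \
    --max-time 10 https://api.curate-me.ai/health 2>/dev/null)"
  explain_health_code "$code"
}
```

Running `check_egress` on the host prints one line; anything other than the `ok:` line means the next step is network debugging rather than forcing a heartbeat.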
Step 2 — Force a fresh heartbeat
docker exec cm-runner cm-runner agent --send-heartbeat-now
The dashboard should flip back to Online within ~5s.
Step 3 — Last resort: re-register
If the agent is healthy but the control plane no longer knows about it (e.g. the agent’s persistent state was wiped), generate a fresh registration token and re-register:
docker rm -f cm-runner
docker volume rm cm-runner-data # clears the saved agent_id
# ... then run the install command from /quickstart/connect-your-machine
State drift
If the agent log shows heartbeat_sent every 30s but the dashboard still
says offline, the gateway’s stale-detection job is misclassifying the agent.
File a support ticket with:
- Agent ID (visible in docker logs cm-runner | grep agent_id)
- Org ID
- Timestamp of the last heartbeat the agent thinks it sent
- Last 50 lines of docker logs cm-runner
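The ticket details listed above can be gathered in one pass. This is a rough sketch, not a supported tool: `collect_ticket_info` and the default file name are hypothetical, the Org ID has to be supplied by hand, and the script only assumes that the agent log contains the strings `agent_id` and `heartbeat_sent` mentioned on this page. `docker logs --timestamps` is used so that the last heartbeat line carries a time even if the agent's own log format does not.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the support-ticket details to a text file.
# Usage: collect_ticket_info <org-id> [output-file]
collect_ticket_info() {
  local org_id="$1" out="${2:-cm-runner-ticket.txt}" logs
  if [ -z "$org_id" ]; then
    echo "usage: collect_ticket_info <org-id> [output-file]" >&2
    return 2
  fi
  # --timestamps prefixes each line with the time Docker recorded it;
  # 2>&1 captures the agent's stderr as well as its stdout.
  logs="$(docker logs --timestamps cm-runner 2>&1)" || return 1
  {
    echo "Org ID: $org_id"
    echo "Agent ID line: $(printf '%s\n' "$logs" | grep agent_id | tail -n 1)"
    echo "Last heartbeat_sent line: $(printf '%s\n' "$logs" | grep heartbeat_sent | tail -n 1)"
    echo "--- last 50 log lines ---"
    printf '%s\n' "$logs" | tail -n 50
  } > "$out"
  echo "wrote $out"
}
```

For example, `collect_ticket_info <your-org-id>` writes cm-runner-ticket.txt in the current directory; review the file for secrets before attaching it to a ticket.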
Where to find logs
# Agent side (2>&1 so grep also sees lines the agent writes to stderr)
docker logs cm-runner --tail 200 -f 2>&1 | grep -E "heartbeat|registered|disconnect"
# Server side (support team)
./scripts/errors by-source gateway | grep byovm_agent
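When the filtered agent log is long, a per-keyword count can be quicker to read than the raw lines. A small sketch, assuming only the three keywords from the grep pattern above; `summarize_events` is a hypothetical name, and it counts keyword occurrences without assuming any particular log line format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log text on stdin and print "keyword=count"
# for each of the keywords used in the agent-side grep above.
summarize_events() {
  grep -oE "heartbeat|registered|disconnect" | sort | uniq -c | awk '{print $2 "=" $1}'
}

# Example (not run automatically):
#   docker logs cm-runner --tail 200 2>&1 | summarize_events
```

A window with many `disconnect` matches and few `heartbeat` matches suggests a network or daemon problem rather than state drift.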