Machine Offline

Symptom

A machine that was previously Online now shows Offline (or Stale) in Runners → Your Machines, and sessions on that machine fail to launch with `agent_unreachable`.

Likely causes

Cause	What you’d see	Fix
Network blip	Heartbeat HTTP errors in agent logs (`connection_reset`, `dial_timeout`)	Usually self-heals on the next heartbeat (every 30s). If it persists for more than 5 min, check egress (Step 1).
Container exited	No new agent logs; `docker ps` doesn’t list `cm-runner`	Restart the container (Step 1).
Host crashed / rebooted	SSH to the host also fails	Boot the host, then `docker start cm-runner`.
Docker daemon hung	Agent process is up, but heartbeats fail with `EOF` from the Docker socket	Restart Docker (`systemctl restart docker`), then the agent (`docker restart cm-runner`).
Agent OOM-killed	`docker inspect cm-runner` shows `"OOMKilled": true`	Add host RAM or stop other memory-heavy containers, then `docker start cm-runner`.
Gateway-side state drift	Agent logs look healthy; dashboard says Offline	See State drift below.
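Before working through the steps, Docker’s own view of the container can usually tell you which row applies. The following is a minimal sketch, not part of the cm-runner tooling: `classify_runner_state` and `triage_runner` are hypothetical names, the container name `cm-runner` comes from the commands on this page, and the 10-second cap uses coreutils `timeout`. It only covers rows Docker can observe; a network blip and state drift both show up as a running container.

```shell
# Hypothetical helpers: classify_runner_state and triage_runner are
# illustrative names, not part of cm-runner.

# classify_runner_state STATUS OOMKILLED
#   STATUS    - Docker's {{.State.Status}} (running, exited, ...) or "missing"
#   OOMKILLED - Docker's {{.State.OOMKilled}} ("true" or "false")
# Prints a label naming the matching row of the table.
classify_runner_state() {
  if [ "$2" = "true" ]; then
    echo "agent_oom_killed"            # row: Agent OOM-killed
    return 0
  fi
  case "$1" in
    running)
      # Container is up: check egress (Step 1), then state drift.
      echo "check_egress_or_drift" ;;
    missing|exited|dead|created)
      echo "container_exited" ;;       # row: Container exited
    *)
      echo "unknown:$1" ;;
  esac
}

# triage_runner: asks Docker about cm-runner and classifies the result.
triage_runner() {
  # A hung daemon makes docker commands block, so cap the probe at 10s.
  if ! timeout 10 docker info >/dev/null 2>&1; then
    echo "docker_daemon_hung_or_down"  # row: Docker daemon hung
    return 0
  fi
  state=$(docker inspect --format '{{.State.Status}} {{.State.OOMKilled}}' \
    cm-runner 2>/dev/null) || state="missing false"
  # Word splitting is intended: "exited true" becomes two arguments.
  # shellcheck disable=SC2086
  classify_runner_state $state
}
```

Source the file on the host and run `triage_runner`. A result of `check_egress_or_drift` means the container is up, so continue with the egress check in Step 1.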

Fix

Step 1 — Confirm the agent process


docker ps -a --filter name=cm-runner --format "table {{.Names}}\t{{.Status}}\t{{.RunningFor}}"

If Status is Exited, restart it:


docker start cm-runner
docker logs cm-runner --tail 50 -f

If Status is Up but the dashboard still says offline, check egress:


docker exec cm-runner curl -s -o /dev/null -w "%{http_code}\n" \
  https://api.curate-me.ai/health

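Anything other than `200` means the container is not getting a healthy response from the gateway. A `000` means curl never received an HTTP response at all, which usually points at the host firewall, DNS, or TLS; inspect those first.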
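The status-code check can be wrapped so the result maps directly to a next step. This is a sketch under assumptions: `interpret_health_code` and `check_egress` are hypothetical helpers, `curl` must exist inside the `cm-runner` image (as the command above already assumes), and the `--max-time 10` limit is an addition so a silently dropped connection fails fast instead of hanging.

```shell
# interpret_health_code CODE
# Maps the value printed by curl's -w "%{http_code}" to a next step.
interpret_health_code() {
  case "$1" in
    200) echo "reachable" ;;
    000) echo "no_connection" ;;   # no HTTP response: firewall, DNS, or TLS
    *)   echo "unhealthy:$1" ;;    # a response arrived, but not a healthy one
  esac
}

# check_egress: runs the health probe from inside the container.
# An empty result (e.g. docker exec itself failed) is treated as 000.
check_egress() {
  code=$(docker exec cm-runner curl -s -o /dev/null --max-time 10 \
    -w "%{http_code}" https://api.curate-me.ai/health)
  interpret_health_code "${code:-000}"
}
```

If `check_egress` reports `no_connection` but the same URL works from the host (for example, `curl -sv https://api.curate-me.ai/health`), the problem is more likely in the container’s DNS or network settings than in the host firewall.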

Step 2 — Force a fresh heartbeat


docker exec cm-runner cm-runner agent --send-heartbeat-now

The dashboard should flip back to Online within ~5s.
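To confirm from the host that the forced heartbeat actually went out, you can poll the agent log for a fresh `heartbeat_sent` line, the same event the State drift section relies on. A minimal sketch, assuming that event name appears verbatim in the log; `retry` and `heartbeat_seen` are hypothetical helpers, not cm-runner commands.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping DELAY_SECONDS between attempts.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# heartbeat_seen: succeeds if the agent logged heartbeat_sent in the last 15s.
# docker logs replays the container's stderr on stderr, hence 2>&1.
heartbeat_seen() {
  docker logs --since 15s cm-runner 2>&1 | grep -q heartbeat_sent
}
```

For example, after running the command above, `retry 6 5 heartbeat_seen` checks six times, 5 seconds apart, and returns non-zero if no heartbeat was logged. If the heartbeat is logged but the dashboard stays Offline, that matches the State drift case.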

Step 3 — Last resort: re-register

If the agent is healthy but the control plane no longer knows about it (e.g. the agent’s persistent state was wiped), generate a fresh registration token and re-register:


docker rm -f cm-runner
docker volume rm cm-runner-data   # clears the saved agent_id
# ... then run the install command from /quickstart/connect-your-machine
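Because `docker volume rm` permanently deletes the agent’s saved state, it can be worth previewing the commands and archiving the volume first. This is a hypothetical wrapper, not the official install flow: `run`, `reset_runner`, the `DRY_RUN` variable, and the `alpine` helper image are all assumptions. The backup uses the common pattern of mounting a volume into a throwaway container and archiving it with `tar`.

```shell
# run: executes its arguments, or only prints them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# reset_runner [BACKUP_DIR]
# Archives the cm-runner-data volume to BACKUP_DIR/cm-runner-data.tgz,
# then removes the container and the volume.
# BACKUP_DIR should be an absolute path (default: current directory);
# a bare name after -v would be treated as a named volume.
reset_runner() {
  backup_dir=${1:-$PWD}
  # Throwaway alpine container mounts the volume read-only and tars it.
  # Stop here if the backup fails, so nothing is deleted without a copy.
  run docker run --rm -v cm-runner-data:/data:ro -v "$backup_dir":/backup \
    alpine tar czf /backup/cm-runner-data.tgz -C /data . || return 1
  run docker rm -f cm-runner
  run docker volume rm cm-runner-data   # clears the saved agent_id
  echo "Next: run the install command from /quickstart/connect-your-machine"
}
```

Run `DRY_RUN=1 reset_runner /absolute/backup/dir` to see the exact commands, then rerun without `DRY_RUN` to execute them.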

State drift

If the agent log shows `heartbeat_sent` every 30s but the dashboard still says Offline, the gateway’s stale-detection job is most likely misclassifying the agent. File a support ticket with:

Agent ID (visible in `docker logs cm-runner 2>&1 | grep agent_id`)
Org ID
Timestamp of the last heartbeat the agent reports having sent
Last 50 lines of `docker logs cm-runner`
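The log-based items in that list can be collected in one pass. A minimal sketch, assuming the agent log contains `agent_id` and `heartbeat_sent` as described above; `last_match` and `collect_drift_report` are hypothetical names. `docker logs --timestamps` prefixes each line with the time Docker recorded it, so the heartbeat timestamp does not depend on the agent’s own log format. The Org ID still has to be added by hand.

```shell
# last_match PATTERN: prints the last stdin line containing PATTERN
# (prints nothing if there is no match).
last_match() {
  grep -- "$1" | tail -n 1
}

# collect_drift_report [OUTFILE]: gathers the log-based ticket details.
# 2>&1 on every docker logs call, because the agent may log to stderr.
collect_drift_report() {
  outfile=${1:-cm-runner-drift.txt}
  {
    echo "== agent_id (last mention) =="
    docker logs cm-runner 2>&1 | last_match agent_id
    echo "== last heartbeat_sent (Docker timestamp first) =="
    docker logs --timestamps cm-runner 2>&1 | last_match heartbeat_sent
    echo "== last 50 lines =="
    docker logs --timestamps --tail 50 cm-runner 2>&1
    echo "== Org ID: add manually =="
  } > "$outfile"
  echo "wrote $outfile"
}
```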

Where to find logs


# Agent side (2>&1 so grep also sees lines the agent writes to stderr)
docker logs cm-runner --tail 200 -f 2>&1 | grep -E "heartbeat|registered|disconnect"
 
# Server side (support team)
./scripts/errors by-source gateway | grep byovm_agent