Troubleshooting Your Machines

Common issues you may encounter when setting up and running connected machines, and how to fix them.

Registration Issues

Machine stuck in REGISTERING state

Symptom: Machine appears in the dashboard but never transitions to ONLINE.

Causes and fixes:

Network connectivity — Verify the machine can reach the gateway:
```
docker exec curateme-agent curl -s https://api.curate-me.ai/health
```
If this fails, check that your firewall rules allow outbound HTTPS (port 443).

Expired token — Registration tokens expire after 1 hour. Generate a new one from the dashboard and restart the container with the new token.

DNS resolution — Ensure DNS works inside the container:


```
docker exec curateme-agent nslookup api.curate-me.ai
```
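
The exit code of the curl health check can help tell these causes apart. The sketch below maps curl's documented exit codes to the likely cause; the helper name explain_curl_exit and the advice strings are illustrative assumptions, not part of the agent.

```shell
# Hypothetical helper: map a curl exit code to a likely cause.
# The numeric codes are curl's documented exit codes; the messages are a sketch.
explain_curl_exit() {
  case "$1" in
    0)     echo "gateway reachable" ;;
    6)     echo "DNS resolution failed" ;;
    7)     echo "could not connect; check outbound port 443" ;;
    28)    echo "timed out; check firewall or proxy" ;;
    35|60) echo "TLS problem; check for an intercepting proxy" ;;
    *)     echo "unexpected curl exit code $1" ;;
  esac
}

# Example (not run here); docker exec returns the exit code of the command it ran:
# docker exec curateme-agent curl -s -o /dev/null https://api.curate-me.ai/health
# explain_curl_exit "$?"
```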

“Token already used” error

Each registration token is single-use. If the container crashed after using the token but before completing registration, you need to:

Remove the old container: docker rm -f curateme-agent
Generate a new token from the dashboard
Start a new container with the fresh token

“Token expired” error

Tokens are valid for 1 hour. Generate a new one and try again. To get a token with a longer lifetime, set ttl_seconds when requesting it through the API:


```
curl -X POST https://api.curate-me.ai/gateway/admin/byovm/register-token \
  -H "X-CM-API-Key: cm_sk_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"ttl_seconds": 7200}'
```
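
To avoid arithmetic slips when choosing a TTL, the request body can be generated from a number of hours. This is a minimal sketch: ttl_payload is a hypothetical helper, it assumes only the ttl_seconds field shown above, and it does not know whether the gateway enforces a maximum TTL.

```shell
# Hypothetical helper: build the JSON body for the register-token request
# from a whole number of hours. Only the ttl_seconds field is assumed.
ttl_payload() {
  hours="$1"
  # Reject empty input, non-digits, zero, and leading zeros.
  case "$hours" in
    ''|*[!0-9]*|0*) echo "usage: ttl_payload <hours>" >&2; return 1 ;;
  esac
  printf '{"ttl_seconds": %d}\n' "$((hours * 3600))"
}

# Example (not run here):
# curl -X POST https://api.curate-me.ai/gateway/admin/byovm/register-token \
#   -H "X-CM-API-Key: cm_sk_your_key_here" \
#   -H "Content-Type: application/json" \
#   -d "$(ttl_payload 2)"
```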

Connection Issues

Machine shows STALE or DEAD

Symptom: Machine was ONLINE but is now STALE (5+ min no heartbeat) or DEAD (15+ min).

Fixes:

Check container is running:
```
docker ps | grep curateme-agent
```
Check container logs:
```
docker logs curateme-agent --tail 50
```
Restart the container:
```
docker restart curateme-agent
```
Check network — Temporary network issues cause missed heartbeats. The agent auto-recovers when connectivity returns.
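
The state thresholds described in the symptom can be written down as a small function, which is handy when scripting your own fleet checks. This sketch assumes you already have the age of the last heartbeat in seconds; classify_heartbeat is a hypothetical name, and how you obtain the timestamp is left open.

```shell
# Classify a machine from the age of its last heartbeat, in seconds.
# Thresholds follow the text above: STALE at 5+ minutes, DEAD at 15+ minutes.
classify_heartbeat() {
  age="$1"
  if [ "$age" -ge 900 ]; then
    echo DEAD
  elif [ "$age" -ge 300 ]; then
    echo STALE
  else
    echo ONLINE
  fi
}

# Example (not run here), with last_seen as a hypothetical Unix timestamp:
# classify_heartbeat "$(( $(date +%s) - last_seen ))"
```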

WebSocket connection drops

Symptom: Real-time features stop working, dashboard shows stale data.

Cause: Corporate proxies or firewalls may terminate long-lived WebSocket connections.

Fix: The agent falls back to HTTP polling automatically. For better real-time performance, ensure WebSocket connections to wss://api.curate-me.ai are allowed through your network.

Resource Issues

High CPU/memory from agent container

The OpenClaw container is lightweight (~200 MB idle). High resource usage usually means an agent is actively processing a job.

To check:


```
docker stats curateme-agent --no-stream
```
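
For scripted checks, docker stats can print a single field through its --format option, and the result can be compared against a limit. The sketch below is one way to do that; over_threshold is a hypothetical helper and the 80 percent limit is an arbitrary example value.

```shell
# Hypothetical helper: succeed (exit 0) when a percentage string such as
# "85.30%" is strictly greater than the given numeric limit.
over_threshold() {
  value=$(printf '%s' "$1" | tr -d '%')   # strip the percent sign
  limit="$2"
  awk -v v="$value" -v l="$limit" 'BEGIN { exit !(v + 0 > l + 0) }'
}

# Example (not run here):
# mem=$(docker stats curateme-agent --no-stream --format '{{.MemPerc}}')
# if over_threshold "$mem" 80; then echo "memory above 80%"; fi
```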

To limit resources:


```
docker run -d \
  --name curateme-agent \
  --cpus=2 \
  --memory=4g \
  -e CM_REGISTRATION_TOKEN="..." \
  -e CM_GATEWAY_URL="https://api.curate-me.ai" \
  ghcr.io/curate-me-ai/openclaw-base:latest
```

Disk space running low

The agent stores session data and logs locally. Clean up old data:


```
# Remove stopped containers
docker container prune

# Remove unused images
docker image prune

# Check container disk usage
docker system df
```

Job Execution Issues

Jobs stuck in “queued” state

Cause: No ONLINE machines available to pick up the job.

Fixes:

Verify at least one machine shows ONLINE in the dashboard
Check the agent is polling (2>&1 also captures log lines written to stderr): docker logs curateme-agent 2>&1 | grep "poll"
Ensure the job’s required_labels match a machine’s capability labels
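
The label check amounts to a subset test: every entry in required_labels must appear among the machine's labels. The sketch below illustrates that test on comma-separated lists; the list format and the labels_match name are assumptions for illustration, not the gateway's actual matching code.

```shell
# Hypothetical helper: succeed only if every required label appears among
# the machine's labels. Both arguments are comma-separated lists (assumed format).
labels_match() {
  required="$1"
  available=",$2,"          # pad with commas so whole labels are matched
  old_ifs="$IFS"
  IFS=','
  for label in $required; do
    case "$available" in
      *",$label,"*) ;;      # label present, keep checking
      *) IFS="$old_ifs"; echo "missing label: $label" >&2; return 1 ;;
    esac
  done
  IFS="$old_ifs"
  return 0
}

# Example: labels_match "gpu,linux" "linux,gpu,x86" succeeds;
# labels_match "gpu" "linux,x86" fails and reports the missing label.
```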

Jobs failing repeatedly

Cause: Command execution errors on the machine.

To debug:

Check the job’s last_error field in the dashboard (Sessions tab)
Check agent logs: docker logs curateme-agent --tail 100
Verify the required tools are available in your container image:
- openclaw-base: Shell, git, Node.js
- openclaw-web: Browser (Playwright)
- openclaw-locked: No external tools

Job timeout

Jobs that stay in the dispatched state for over 5 minutes are automatically requeued. If jobs consistently time out:

Increase timeout in the dispatch request
Check machine health — a slow heartbeat means the machine may be overloaded
Scale the fleet — add more machines to distribute load

Governance Issues

“Rate limit exceeded” errors

Your machine’s LLM requests are being throttled by the governance chain.

Fix: Increase the rate limit in the dashboard under Runners > Your Machines > Machine > Policies, or through the API:


```
curl -X PUT https://api.curate-me.ai/gateway/admin/byovm/agents/{agent_id}/policies \
  -H "X-CM-API-Key: cm_sk_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"rate_limit_rpm": 300}'
```

“Daily budget exceeded” errors

The machine has hit the daily spending cap.

Fix: Increase the daily budget, or wait for the reset at midnight UTC. To raise the budget through the API:


```
curl -X PUT https://api.curate-me.ai/gateway/admin/byovm/agents/{agent_id}/policies \
  -H "X-CM-API-Key: cm_sk_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"daily_budget_usd": 50.0}'
```
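
The rate limit and daily budget updates share the same request shape, so a small wrapper can reduce copy-paste errors. The sketch below only prints the curl command for review instead of running it. policy_update_cmd and the CM_API_KEY environment variable are hypothetical names; the endpoint and the two field names come from the examples in this section.

```shell
# Hypothetical helper: print (not run) the curl command for a policy update.
# Accepts only the two policy fields shown in this section.
policy_update_cmd() {
  agent_id="$1"; field="$2"; value="$3"
  case "$field" in
    rate_limit_rpm|daily_budget_usd) ;;
    *) echo "unknown policy field: $field" >&2; return 1 ;;
  esac
  printf 'curl -X PUT https://api.curate-me.ai/gateway/admin/byovm/agents/%s/policies' "$agent_id"
  # CM_API_KEY is printed literally so the key never appears in the output.
  printf ' -H "X-CM-API-Key: $CM_API_KEY" -H "Content-Type: application/json"'
  printf " -d '{\"%s\": %s}'\n" "$field" "$value"
}

# Example: policy_update_cmd my-agent-id daily_budget_usd 50.0
```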

PII detected in requests

The governance chain is blocking requests that contain potential PII or secrets.

To investigate: Check the gateway logs (Gateway > Logs) for the specific PII patterns detected. Common triggers:

API keys or tokens in prompts
Email addresses
Phone numbers
Social security numbers

To allow: If these are intentional (e.g., processing customer data), adjust PII scanning rules in Gateway > Policies.
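
Before adjusting the rules, it can help to scan a prompt locally and see which of the common triggers it contains. The patterns below are rough illustrations and are not the gateway's actual detection rules; scan_pii is a hypothetical helper, the cm_sk_ prefix is taken from the key format used in this guide, and phone numbers are left out because a simple pattern overlaps too easily with other digit sequences.

```shell
# Hypothetical pre-flight scan: print one line per PII category found in
# the given text. The regexes are deliberately simple approximations.
scan_pii() {
  text="$1"
  printf '%s\n' "$text" | grep -Eq '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' && echo email
  printf '%s\n' "$text" | grep -Eq '[0-9]{3}-[0-9]{2}-[0-9]{4}' && echo ssn
  printf '%s\n' "$text" | grep -Eq 'cm_sk_[A-Za-z0-9_]+' && echo api_key
  return 0
}

# Example: scan_pii "contact jane@example.com" prints "email"
```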

Container-Specific Issues

macOS Docker Desktop performance

Docker Desktop on macOS runs containers inside a Linux VM, which adds overhead.

Tips:

Allocate at least 4 GB RAM to Docker Desktop
Use --platform linux/amd64 if on Apple Silicon for best compatibility
Consider running the agent natively if Docker overhead is an issue
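
To confirm how much memory the Docker VM actually has, docker info can report the total in bytes through its --format option. The sketch below converts that figure to GiB; bytes_to_gib is a hypothetical helper, and the reported total may be slightly below the amount allocated in Docker Desktop settings.

```shell
# Hypothetical helper: convert a byte count to GiB with one decimal place.
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1073741824 }'
}

# Example (not run here):
# bytes_to_gib "$(docker info --format '{{.MemTotal}}')"
```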

Windows container issues

Windows containers require Docker Desktop with Windows containers enabled.

Common fixes:

Switch Docker Desktop to Windows containers mode
Use ghcr.io/curate-me-ai/openclaw-windows:latest image
Ensure Hyper-V is enabled in Windows features

Getting Help

If none of these solutions resolve your issue:

Check the Runners API Reference for endpoint details
Review Runners Security for policy configuration
Visit the Dashboard to check machine status and logs
Contact support at support@curate-me.ai with your machine ID and error details