Troubleshooting

If something isn’t working, this page covers the ten most common failure modes we’ve seen during onboarding. Each has a quick diagnosis check and a concrete fix. If none match, open a support ticket and attach a diagnostics bundle (see #7).

Each runner-specific failure that the dashboard surfaces also has a dedicated page with logs to grep, env vars to check, and recovery steps.

1. The runner agent won’t start on my machine

Diagnose. Run the bundled doctor command:


cm-runner doctor --gateway-url https://api.curate-me.ai

It checks Docker, socket permissions, architecture, disk, memory, gateway reachability, and required environment in under 5 seconds, and prints actionable remediation for each failure.

Common causes:

Docker isn’t running → start Docker Desktop, or sudo systemctl start docker.
Current user can’t talk to the Docker socket → sudo usermod -aG docker $USER then log out and back in.
Outbound HTTPS to api.curate-me.ai is blocked → check corporate proxy / VPN egress rules.
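If you want to script the three checks above yourself (for example in a provisioning job), the following is a minimal sketch in Python. It is not how cm-runner doctor is implemented; the function names and the default socket path /var/run/docker.sock are assumptions chosen for illustration.

```python
import os
import shutil
import socket


def check_docker_cli() -> bool:
    """Return True if a `docker` executable is on PATH."""
    return shutil.which("docker") is not None


def check_socket_access(path: str = "/var/run/docker.sock") -> str:
    """Classify access to the Docker socket as 'missing', 'denied', or 'ok'.

    ASSUMPTION: the daemon listens on the default Unix socket path.
    """
    if not os.path.exists(path):
        return "missing"  # the daemon is probably not running
    if not os.access(path, os.R_OK | os.W_OK):
        return "denied"  # the user is probably not in the docker group
    return "ok"


def check_gateway(host: str = "api.curate-me.ai", port: int = 443,
                  timeout: float = 5.0) -> bool:
    """Return True if a direct TCP connection to the HTTPS port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Note that check_gateway opens a direct TCP connection, so it can report a failure on networks where HTTPS only works through a proxy. In that case the doctor command's own reachability check is the better signal.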

2. My registration token says “Invalid” or “expired”

Tokens are single-use by default and expire after one hour. Each new machine needs its own token.

Fix: generate a fresh token in Runners → Your Machines → Connect Machine. The dashboard’s install command pre-fills it. If you’re using the API directly, POST /gateway/admin/runners/byovm/register-token returns a new one.

If a token has been revoked by an org admin (e.g. after rotation), it will say “revoked” specifically. Ask the admin to issue a new one.
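For the API route, a request could be built along these lines. This is a sketch under stated assumptions: the endpoint path comes from this page, but the base URL being the same gateway host, the bearer-token Authorization header, and the empty JSON body are all assumptions, so check the API reference before relying on them.

```python
import json
import urllib.request

# ASSUMPTION: the admin API is served from the same host the runner uses.
GATEWAY_URL = "https://api.curate-me.ai"
REGISTER_TOKEN_PATH = "/gateway/admin/runners/byovm/register-token"


def build_register_token_request(
    api_key: str, base_url: str = GATEWAY_URL
) -> urllib.request.Request:
    """Build (but do not send) the POST that requests a fresh registration token.

    ASSUMPTION: bearer-token auth and an empty JSON body.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + REGISTER_TOKEN_PATH,
        data=json.dumps({}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending it is a call to urllib.request.urlopen(request, timeout=10). Because tokens are single-use, request one per machine rather than caching the response.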

3. A template launch says `admin_role_required_for_tool_profile`

That template uses the full_vm_tools profile (full VM access — shell, filesystem, browser), which the platform restricts to admin / operator / owner roles on every launch path. Non-admin members cannot launch it even if it’s visible in the gallery.

Fix: ask an admin in your org to launch it, or pick a template with a more restricted profile (locked, web_automation, or developer). The launch UI shows the profile next to each template.
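The rule described above can be pictured as a small gate. This is an illustrative sketch only, not the platform's code: the role and profile names come from this page, while the function name and the "member" role string are hypothetical.

```python
from typing import Optional

# Roles allowed to launch templates that use the full_vm_tools profile.
ADMIN_ROLES = {"admin", "operator", "owner"}
RESTRICTED_PROFILES = {"full_vm_tools"}


def launch_block_reason(role: str, tool_profile: str) -> Optional[str]:
    """Return the error code that would block the launch, or None if allowed."""
    if tool_profile in RESTRICTED_PROFILES and role not in ADMIN_ROLES:
        return "admin_role_required_for_tool_profile"
    return None
```

For example, launch_block_reason("member", "full_vm_tools") returns the error code, while the same member launching a developer-profile template gets None.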

4. A run says `daily_budget_exceeded` or `cost_per_request_exceeded`

Your org has hit one of its plan-tier spending caps: the daily budget or the per-request cost limit. See Settings → Plan for the live limits. Daily caps by tier: Free $10/day; Starter $25/day; Growth $250/day.

Fix:

Raise the org’s daily budget (admin only) in Guardrails → Budgets.
Pick a cheaper model — the template’s launch modal shows current estimates per model.
Upgrade your plan — the Settings → Billing page lists each tier’s entitlements.
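To estimate ahead of time whether a launch will fit, the arithmetic can be sketched as below. The daily caps are the ones listed on this page; the function name, the order of the two checks, and the idea that the gateway compares an estimated cost before the run are assumptions.

```python
from decimal import Decimal
from typing import Optional

# Daily caps from this page, in USD.
DAILY_CAP_USD = {
    "free": Decimal("10"),
    "starter": Decimal("25"),
    "growth": Decimal("250"),
}


def budget_block_reason(
    tier: str,
    spent_today: Decimal,
    estimated_cost: Decimal,
    per_request_cap: Optional[Decimal] = None,
) -> Optional[str]:
    """Return the error code a run would likely hit, or None if it fits.

    ASSUMPTION: the per-request check runs first and the daily check
    compares today's spend plus the estimate against the tier cap.
    """
    if per_request_cap is not None and estimated_cost > per_request_cap:
        return "cost_per_request_exceeded"
    if spent_today + estimated_cost > DAILY_CAP_USD[tier]:
        return "daily_budget_exceeded"
    return None
```

With $9.50 already spent on the Free tier, a run estimated at $0.75 would exceed the $10 cap, while the same run on Starter fits comfortably.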

5. My BYOVM agent is “Online” but the run sits at `provisioning` forever

Diagnose. Check the launch progress panel and read the timeline at the bottom of the run detail card. If it stalls at image_pull for more than 60 seconds, you’re likely blocked at network egress — or the image just isn’t cached yet on a fresh machine. See Slow First Launch for the warm-pull path, and Image Pull Failed for the error-suffix table.

Fix:

Confirm the host can pull ghcr.io/curate-me-ai/* images: docker pull ghcr.io/curate-me-ai/cm-runner:latest
If the pull fails with denied, your corporate firewall is likely blocking ghcr.io. Either configure an HTTPS proxy in /etc/docker/daemon.json, or mirror the image to your private registry and override runner_image in the install command.
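For the proxy option, recent Docker Engine releases (23.0 and later) read daemon-level proxy settings from a "proxies" object in daemon.json. The sketch below merges that setting into an existing config without dropping other keys. The helper name and the proxy URL used in the example are hypothetical, and the daemon must be restarted after the file changes.

```python
import json


def with_https_proxy(
    daemon_config: dict,
    proxy_url: str,
    no_proxy: str = "localhost,127.0.0.1",
) -> dict:
    """Return a copy of a daemon.json dict with the HTTPS proxy set."""
    updated = dict(daemon_config)
    proxies = dict(updated.get("proxies", {}))
    proxies["https-proxy"] = proxy_url
    # Keep an existing no-proxy list if the admin already set one.
    proxies.setdefault("no-proxy", no_proxy)
    updated["proxies"] = proxies
    return updated


def render(daemon_config: dict) -> str:
    """Serialize the config the way it would be written to daemon.json."""
    return json.dumps(daemon_config, indent=2, sort_keys=True) + "\n"
```

A typical use is to load /etc/docker/daemon.json with json.load, pass it through with_https_proxy(config, "http://proxy.example.internal:3128"), review the output of render, and only then write it back with root privileges.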

6. The template I want is grayed out with “Preview”

That template hasn’t been certified for production yet. Preview templates work but may have rough edges (missing inputs, partial output, surprise costs). The public gallery is filtered to certified templates only — each one has been smoke-tested by CI and carries an owner, a runbook URL, and a cost / duration estimate.

Fix: if you need it certified, open a ticket. Until it’s certified, admins can still launch it from Runners → Templates → All by unchecking the “Public only” filter.

7. I need to attach something to a support ticket

Every failed run has a Copy diagnostics button on its detail page. It generates a JSON support bundle with org/run/runner/agent/template IDs, the failure reason, recent timeline events, runner capabilities, and the agent version. The gateway has already redacted all secrets from the bundle.

Paste it directly into a support ticket. No screen-sharing needed.

For runner-side issues, cm-runner doctor --json produces a machine-readable health report you can also attach. Tokens and keys are masked automatically.
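If your own security policy asks for a second look before anything leaves the building, a local scan of the JSON is easy to script. The sketch below is a generic check, not part of cm-runner: the key-name pattern and the notion of what a masked value looks like (asterisks or the word "redacted") are assumptions, and the real bundle's field names may differ.

```python
import re

# ASSUMPTION: key names that suggest a credential.
SENSITIVE_KEY = re.compile(r"(token|secret|password|api[_-]?key)", re.IGNORECASE)


def is_masked(value: str) -> bool:
    """Treat empty, all-asterisk, or 'redacted' values as masked (assumption)."""
    return set(value) <= {"*"} or "redact" in value.lower()


def unmasked_paths(node, path="$"):
    """Yield JSON paths whose key looks sensitive but whose value is not masked."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if (isinstance(value, str) and SENSITIVE_KEY.search(key)
                    and not is_masked(value)):
                yield child
            else:
                yield from unmasked_paths(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from unmasked_paths(item, f"{path}[{index}]")
```

Run it over the parsed bundle, for example list(unmasked_paths(json.loads(text))), and investigate any path it reports before attaching the file. An empty list is the expected result.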

8. The dashboard shows `Audit logs (Preview)` — where’s my real data?

You’re viewing the demo dataset (URL contains ?demo=1). The yellow “Preview data” banner says so explicitly. Click Show my events to get back to your org’s live audit feed. Real entries appear there as soon as your first run, token rotation, or policy change happens.

9. The cm-runner test suite hangs locally

Pre-launch, the cm-runner test suite was known to hang at TestSessionCRUD::test_create_session_success. That’s been fixed — every Docker call now has an env-tunable timeout and the polling loops are bounded by a wall-clock budget.

If you see a hang now, you’re on a stale checkout. Pull main (or develop), reinstall (pip install -e "packages/cm-runner[dev]"; the quotes stop shells such as zsh from treating the brackets as a glob pattern), and re-run:


python3 -m pytest packages/cm-runner/tests -q --maxfail=1

Expect the suite to finish in under a second on a host without Docker (Docker-only tests auto-skip).
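The fix mentioned above combines two ideas: a timeout that can be tuned through the environment, and a polling loop that gives up once a wall-clock budget is spent. A minimal sketch of that pattern follows. It is not the cm-runner source, and the environment variable name in the usage note is hypothetical.

```python
import os
import time


def env_timeout(name: str, default: float) -> float:
    """Read a timeout in seconds from the environment, falling back to a default."""
    raw = os.environ.get(name)
    try:
        return float(raw) if raw else default
    except ValueError:
        return default  # ignore malformed values rather than crash


def poll_until(predicate, budget_s: float, interval_s: float = 0.05,
               clock=time.monotonic, sleep=time.sleep) -> bool:
    """Call predicate until it returns True or the wall-clock budget runs out."""
    deadline = clock() + budget_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        # Never sleep past the deadline.
        sleep(min(interval_s, max(0.0, deadline - clock())))
```

A caller might write poll_until(container_is_ready, env_timeout("CM_RUNNER_POLL_BUDGET_S", 30.0)), where both the predicate and the variable name are placeholders. Using time.monotonic keeps the budget correct even if the system clock is adjusted mid-run.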

10. How do I cancel my subscription / get a refund?

Cancel: Settings → Billing → Cancel subscription. Your plan keeps working until the end of the current billing period.

Refund: within 14 days of a charge, email billing@curate-me.ai with your org ID and the Stripe receipt. Charges older than 14 days are handled case by case, but we default to “yes” for design-partner customers.

Still stuck? Email support@curate-me.ai with your org ID, the run ID (if applicable), and a “Copy diagnostics” bundle from the failed run. Median first-response time during business hours is under 2 hours.