Runbook: Self-Service Onboarding
Owner: Platform Team
Backup owner: On-call engineer
Last validated: 2026-05-15 (Phase 0 turnkey audit)
Validation method: Smoke test (tests/integration/test_signup_smoke.py) + manual signup
Severity trigger: SEV2 (blocked signups directly impact PLG funnel)
Customer impact: New users cannot create accounts or activate; existing users unaffected
Required access: MongoDB (read), Redis (read), VPS shell (write)
Related services: curateme-backend-b2b, dashboard
Self-service onboarding is the entry point of the PLG funnel. A new user sends POST /api/v1/platform/onboard; the request passes the anti-spam layers, creates a user + org + governance policy + API key, and returns active credentials. Any breakage here silently kills signup conversion. This runbook covers the architecture, the anti-spam layers, the auto-activate feature flag, and how to diagnose and fix common failures.
Architecture overview
The signup endpoint lives at services/backend/src/api/routes/platform/onboard.py. Three relevant routes:
| Route | Purpose |
|---|---|
| POST /api/v1/platform/onboard | Self-service signup: creates user + org + membership + governance policy + API key |
| POST /api/v1/platform/onboard/approve/{user_id} | Admin activates a pending user (used when SIGNUP_AUTO_ACTIVATE=False) |
| GET /api/v1/platform/onboard/verify-email?token={token} | Marks email verified; activates account if auto-activate is on |
The behavior depends on the SIGNUP_AUTO_ACTIVATE feature flag (see below).
The SIGNUP_AUTO_ACTIVATE flag
This is the most critical config in the funnel. Defined at services/backend/src/config/feature_flags.py:444, default value at feature_flags.py:784:
FeatureFlag.SIGNUP_AUTO_ACTIVATE: True, # Auto-activate — ON for self-service free tier (no credit card)
Default: True. New signups immediately get status="active" + an issued API key + a governance policy.
If set to False (override via env or org_feature_flag_overrides): new signups land in status="pending" and must be activated manually by an admin via POST /api/v1/platform/onboard/approve/{user_id}.
Regression guard: unit test tests/unit/test_feature_flags.py:99 (test_signup_auto_activate_enabled) asserts the default is True. The integration smoke test at tests/integration/test_signup_smoke.py exercises the end-to-end golden path.
Do NOT flip this off in production unless you’re intentionally rolling out a waitlist. The dashboard does not currently surface “pending” accounts well — pending users will land on /welcome and see a confusing state.
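To make the default-versus-override behavior concrete, here is a minimal sketch. It is not the code in feature_flags.py. It assumes the environment override is named FF_SIGNUP_AUTO_ACTIVATE and wins over the default when present; the helper names and the set of accepted truthy strings are hypothetical.

```python
from typing import Mapping

DEFAULT_SIGNUP_AUTO_ACTIVATE = True  # the documented default

# Assumption: which strings count as "on" is illustrative, not taken from the codebase.
_TRUTHY = {"1", "true", "yes", "on"}


def resolve_auto_activate(env: Mapping[str, str]) -> bool:
    """Return the effective flag: the env override if present, else the default."""
    raw = env.get("FF_SIGNUP_AUTO_ACTIVATE")  # assumed override variable name
    if raw is None or raw.strip() == "":
        return DEFAULT_SIGNUP_AUTO_ACTIVATE
    return raw.strip().lower() in _TRUTHY


def initial_status(env: Mapping[str, str]) -> str:
    """Status a new signup would receive under the resolved flag."""
    return "active" if resolve_auto_activate(env) else "pending"
```

Passing `os.environ` as `env` shows what a process with the current environment would resolve to. Per-org overrides in org_feature_flag_overrides are not modeled here.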
Anti-spam layers (checked in order, cheapest first)
The endpoint stacks 5 anti-spam layers before any DB write. Each layer either returns a fake success (a 200 response meant to confuse the bot) or a 4xx error (400, 422, or 429, depending on the layer):
| Layer | Check | Failure mode | Logged event |
|---|---|---|---|
| 1. Honeypot | Hidden website field; bots auto-fill | Fake success returned (status=pending, fake IDs) | onboard_rejected_honeypot |
| 2. Submit-too-fast | Form t timestamp; under 2s = bot | Fake success returned | onboard_rejected_too_fast |
| 3. IP rate limit | 5 signups/IP/hour | 429 Too Many Requests | signup_rate_limit_exceeded |
| 4. Cloudflare Turnstile | Proof-of-humanity token | 400 with “Verification failed” | turnstile_verification_failed |
| 5. Disposable email | Pydantic validator | 422 Unprocessable Entity (rejected at schema layer) | (no signup attempt logged) |
Local dev: if TURNSTILE_SECRET_KEY is empty, layer 4 short-circuits to success — Turnstile is skipped. This is intentional for dev but means tests must explicitly handle Turnstile mocking.
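The layer ordering in the table can be sketched as a single function that returns on the first failing check. This is an illustrative model, not the handler in onboard.py. The assumptions are: `check_signup`, `Outcome`, and `DISPOSABLE_DOMAINS` are hypothetical names; the rate limiter is an in-memory stand-in for whatever store the service actually uses; `turnstile_ok` stands for the result of the Cloudflare verification call; and layer 5 is modeled as a plain check even though the real one is a Pydantic validator.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional

MIN_FILL_SECONDS = 2.0        # layer 2: a submit under 2s after render is treated as a bot
MAX_SIGNUPS_PER_IP = 5        # layer 3: 5 signups per IP...
WINDOW_SECONDS = 3600.0       # ...per rolling hour
DISPOSABLE_DOMAINS = {"mailinator.com"}  # illustrative list only


@dataclass
class Outcome:
    http_status: int
    event: Optional[str]          # logged event name, if any
    fake_success: bool = False    # True when a 200 is returned to mislead a bot


def check_signup(form, ip, now, turnstile_secret, turnstile_ok, hits_by_ip):
    """Run the five layers in table order and return the first rejection."""
    # Layer 1: honeypot. Humans never see the hidden "website" field.
    if form.get("website"):
        return Outcome(200, "onboard_rejected_honeypot", fake_success=True)
    # Layer 2: "t" is the client-side Date.now() at mount, in milliseconds.
    if now - form["t"] / 1000.0 < MIN_FILL_SECONDS:
        return Outcome(200, "onboard_rejected_too_fast", fake_success=True)
    # Layer 3: sliding-window rate limit per IP.
    hits = hits_by_ip[ip]
    while hits and now - hits[0] >= WINDOW_SECONDS:
        hits.popleft()
    if len(hits) >= MAX_SIGNUPS_PER_IP:
        return Outcome(429, "signup_rate_limit_exceeded")
    hits.append(now)
    # Layer 4: Turnstile is skipped entirely when no secret is configured.
    if turnstile_secret and not turnstile_ok:
        return Outcome(400, "turnstile_verification_failed")
    # Layer 5: disposable email domains; nothing is logged for these.
    domain = form["email"].rsplit("@", 1)[-1].lower()
    if domain in DISPOSABLE_DOMAINS:
        return Outcome(422, None)
    return Outcome(200, None)
```

The model also shows why an empty TURNSTILE_SECRET_KEY passes layer 4 in local dev: the check is guarded by the secret being set, so a failed or missing token is never looked at.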
Common failures and diagnosis
Symptom: every signup returns 400 “Verification failed”
Cause: Turnstile is misconfigured.
# On VPS:
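[ -n "$TURNSTILE_SECRET_KEY" ] && echo "TURNSTILE_SECRET_KEY is set" || echo "TURNSTILE_SECRET_KEY is EMPTY" # must be set in production; avoids printing the secret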
echo $TURNSTILE_SITE_KEY # must match dashboard env
# Check Cloudflare dashboard for Turnstile widget status; a disabled widget returns failure
Fix: rotate keys in Cloudflare → update .env on VPS → restart curateme-backend-b2b.
Symptom: every signup succeeds with status="pending" but never activates
Cause: SIGNUP_AUTO_ACTIVATE is OFF.
# On VPS:
poetry run python -c "
from src.config.feature_flags import FeatureFlag, is_feature_enabled
print('SIGNUP_AUTO_ACTIVATE:', is_feature_enabled(FeatureFlag.SIGNUP_AUTO_ACTIVATE))
"
Fix: clear the override (set FF_SIGNUP_AUTO_ACTIVATE=true in the environment, or remove the entry from the org_feature_flag_overrides Mongo collection), then restart the service. Verify the unit test still passes (pytest tests/unit/test_feature_flags.py::TestFeatureDefaults::test_signup_auto_activate_enabled).
Symptom: bot traffic is bypassing layers 1-2
Cause: Honeypot field accidentally removed from the form, or t timestamp not being sent.
# On the dashboard:
curl -s https://dashboard.curate-me.ai/signup | grep -E 'name="website"|name="t"'
# Should find both inputs
Fix: check apps/dashboard/app/signup/page.tsx. Both fields must be present (website hidden, t populated with Date.now() at mount).
Symptom: duplicate users created on retry
Cause: Honeypot or too-fast layer returned a fake success with random IDs; user clicked submit again.
The endpoint should be idempotent: a duplicate email returns a real error (handled at lines 432-440 of onboard.py), but the fake-success paths bypass this check.
Fix: check the users collection for duplicates by email; deduplicate manually. Long-term fix tracked in #2095.
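The duplicate check can be sketched in two forms: an aggregation pipeline to run against the users collection, and an equivalent pure-Python grouping for an exported list of user documents. Both are illustrative and assume the `_id` and `email` fields used elsewhere in this runbook; `find_duplicate_emails` and the constant name are hypothetical, and comparing emails case-insensitively is an assumption about how duplicates should be defined.

```python
from collections import defaultdict

# Pipeline form: pass to db.users.aggregate(...) in mongosh or a driver.
DUPLICATE_EMAIL_PIPELINE = [
    {"$group": {
        "_id": {"$toLower": "$email"},   # group case-insensitively (assumption)
        "ids": {"$push": "$_id"},
        "count": {"$sum": 1},
    }},
    {"$match": {"count": {"$gt": 1}}},   # keep only emails seen more than once
]


def find_duplicate_emails(users):
    """Group user documents by lowercased email; return only emails with >1 user."""
    by_email = defaultdict(list)
    for user in users:
        by_email[user["email"].lower()].append(user["_id"])
    return {email: ids for email, ids in by_email.items() if len(ids) > 1}
```

Review the returned IDs by hand before deleting anything; which record to keep (for example, the one that owns the org and API key) is a judgment call this sketch does not make.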
Symptom: signup succeeds but no API key in response
Cause: _generate_api_key() at onboard.py:157 succeeded but the Mongo write failed silently.
# Check the org has an api_key:
mongosh "$MONGO_URI" --eval 'db.api_keys.find({org_id: "org_XXX"}).pretty()'
Fix: if missing, run the recreate script (see docs/runbooks/api-key-recovery.md, which is not yet written; raise an issue) OR have the user re-onboard.
Manually activate a pending user
If SIGNUP_AUTO_ACTIVATE=False was set (intentionally or by accident) and accounts are stuck:
# Find pending users:
mongosh "$MONGO_URI" --eval 'db.users.find({status: "pending"}, {_id: 1, email: 1, created_at: 1}).pretty()'
# Activate via admin API (requires platform-admin token):
curl -X POST "https://api.curate-me.ai/api/v1/platform/onboard/approve/usr_XXX" \
  -H "Authorization: Bearer $PLATFORM_ADMIN_TOKEN"
The approval endpoint at onboard.py:558+ activates user + org + membership, generates the API key, and creates the governance policy.
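When many accounts are stuck, the same approval call can be scripted. The sketch below only builds the requests; it sends nothing by itself. `build_approve_request` is a hypothetical helper, and the base URL and bearer-token header simply mirror the curl command above.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.curate-me.ai"


def build_approve_request(user_id: str, admin_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST that approves one pending user."""
    # Quote the ID so an unexpected character cannot alter the URL path.
    safe_id = urllib.parse.quote(user_id, safe="")
    url = f"{API_BASE}/api/v1/platform/onboard/approve/{safe_id}"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {admin_token}"},
    )
```

To send a request, pass it to `urllib.request.urlopen(req)`, which raises `urllib.error.HTTPError` on a 4xx or 5xx response. Approving one user first and confirming the result in Mongo before looping over the rest is the safer order.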
Smoke test
The end-to-end smoke test at services/backend/tests/integration/test_signup_smoke.py:
- POSTs to /api/v1/platform/onboard with a valid free-tier payload
- Bypasses Turnstile (sets TURNSTILE_SECRET_KEY="" via autouse fixture)
- Asserts response 200, status="active", api_key.startswith("cm_sk_")
- Mocks heavy deps (auth service, DB, Stripe, governance policy, entitlements)
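The three golden-path assertions can also be expressed as a small checker, which is handy when verifying a manual signup against a live response. This is a sketch, not the code in test_signup_smoke.py. It assumes `status` and `api_key` are top-level keys of the JSON body; `check_golden_path` is a hypothetical name.

```python
def check_golden_path(status_code: int, body: dict) -> list:
    """Return a list of human-readable problems; an empty list means the golden path held."""
    problems = []
    if status_code != 200:
        problems.append(f"expected HTTP 200, got {status_code}")
    if body.get("status") != "active":
        problems.append(f"expected status 'active', got {body.get('status')!r}")
    # A missing key and a key with the wrong prefix are both reported here.
    if not str(body.get("api_key") or "").startswith("cm_sk_"):
        problems.append("api_key missing or not prefixed with 'cm_sk_'")
    return problems
```

A response with status "pending" and no key yields two problems, which points at the SIGNUP_AUTO_ACTIVATE symptom rather than the silent API key write failure.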
Run locally:
cd services/backend
poetry run pytest tests/integration/test_signup_smoke.py -v
In CI: runs on every PR that touches services/backend/src/api/routes/platform/onboard.py (per .github/workflows/verify-claims.yml related triggers).
Related
- #2095 — original signup-flow audit + acceptance criteria
- #2113 — lifecycle email (welcome, day-1, day-3, dormant) integrates with this flow once shipped
- #2114 — free trial mechanics — when the user clicks “Start trial”, the same onboarding endpoint fires; this runbook covers the underlying mechanism
- #2122 — agent identity (M365) is downstream consumer of signup events
Out of scope
This runbook does not cover:
- Org provisioning (handled by b2b_auth_service.create_org)
- API key rotation (separate runbook needed; file as follow-up)
- Stripe customer creation (handled by metered_billing_service.create_customer)
- Email verification flow specifics (see verify-email route handler)