Runbook: Family Manager M365 Email Lane Setup
Owner: Platform Team (Family Manager dogfood)
Backup owner: On-call engineer
Last validated: 2026-06-14
Validation method: Live E2E on the curate-me.ai tenant: real forwarded email → pending approval (proposal) in the iOS app
Severity trigger: SEV3 (consumer inbound lane inoperable, no data loss)
Customer impact: Forwarded household email is silently not ingested; no proposals are created from email until the lane is restored
Required access: Microsoft 365 / Entra admin center (Global Admin), Azure portal, SSH (VPS curateme@178.105.8.25), MongoDB on the VPS
Related services: curateme-backend-b2b, curateme-celery-beat, Microsoft Graph API
Time to complete: ~30–45 minutes (one-time tenant setup); ~5 minutes to wire each additional household org
This runbook stands up the consumer inbound-email lane for Family Manager: a single license-free shared mailbox (assistant@curate-me.ai) receives every household’s forwarded mail via Exchange plus-addressing (assistant+<token>@curate-me.ai). A Microsoft Graph change-notification subscription fires the platform webhook on each new message; the dispatcher routes the message into the mobile alias pipeline, which resolves the plus-tag to an org, gates on a verified sender, and creates a pending approval (proposal) — exactly the HITL surface the iOS app reads.
This is the active consumer lane (it replaced the Resend lane, which is the dormant default-off fallback). It is distinct from the per-org premium identity provisioning in Agent Identity Provisioning — that flow creates a licensed per-org user; this flow uses one shared, license-free mailbox for all households.
Live facts from the production setup (referenced throughout):
- Shared mailbox: assistant@curate-me.ai, Graph user id 06af3503-5c27-4ea8-8add-d65a65d29b2e
- Azure app registration: “Curate-Me Agent Identity”, client id bd0a8fc3-13a6-4035-8a41-e622ce120ac8, tenant id 1488d628-c536-4788-978b-51b679de1a39
Prerequisites
Before starting, confirm you have:
- M365 / Entra admin access: Global Admin in the 1488d628-… tenant (needed to create the mailbox, grant application permissions, and consent on behalf of the org).
- Azure portal access to the app registration “Curate-Me Agent Identity” (client id bd0a8fc3-13a6-4035-8a41-e622ce120ac8).
- SSH access to the platform VPS: ssh curateme@178.105.8.25. Env vars live in ~/platform/.env.production (never commit values).
- The Graph client secret value, kept locally at ~/Documents/fm-dogfood/m365_graph_secret.txt. This file holds the value of MICROSOFT_GRAPH_CLIENT_SECRET; the first secret was rotated after a leak, so never paste it into a doc, ticket, or transcript.
- A test sender mailbox you can send from (the operator’s own address), to drive the end-to-end verification.
Step 1: Create the license-free shared mailbox
- Open the Exchange admin center > Recipients > Mailboxes > Add a shared mailbox.
- Create the mailbox:
  - Display name: Family Manager Assistant
  - Email address: assistant@curate-me.ai
- Do not assign a license. A shared mailbox is license-free, which is the whole point of this lane (per-household plus-tags need no Exchange provisioning, only a Mongo alias row).
- Note the mailbox’s Graph user object id. For the production tenant this is 06af3503-5c27-4ea8-8add-d65a65d29b2e.
Verification: Resolve the mailbox in Graph. A freshly created shared mailbox takes several minutes to appear in the Graph directory, so retry until it returns. $TOKEN is a Microsoft Graph access token that can read users, for example an app-only token for the “Curate-Me Agent Identity” app registration:
curl -s -H "Authorization: Bearer $TOKEN" \
  "https://graph.microsoft.com/v1.0/users/assistant@curate-me.ai?\$select=id,userPrincipalName,mail" | jq
The id field must equal 06af3503-5c27-4ea8-8add-d65a65d29b2e. If you get a 404, wait and retry; the new shared mailbox has not propagated to Graph yet. Do not proceed until it resolves.
Step 2: Create / verify the Azure app registration + application permissions
The app registration “Curate-Me Agent Identity” (client id bd0a8fc3-13a6-4035-8a41-e622ce120ac8) authenticates app-only via the MSAL client-credentials flow (GraphConfig / GraphClient._get_access_token in src/services/agent_identity/graph_client.py).
- In the Azure portal, open App registrations > Curate-Me Agent Identity.
- Under API permissions, confirm these Application (app-only) Microsoft Graph permissions are present:
  - Mail.ReadWrite: read inbox messages, delete after ingest
  - Mail.Send: send the sender-verification link and any agent replies
  - User.ReadWrite.All: user lookups / per-org premium provisioning
  - Organization.Read.All: license SKU / org reads
- Click Grant admin consent for Curate-Me and confirm every permission shows a green “Granted for Curate-Me” status. (Application permissions are inert without admin consent — the Graph token will be issued but every call 403s.)
Verification: The API permissions blade shows all four permissions with Granted status and no pending consent banner.
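Several verification commands in this runbook pass a bearer token as $TOKEN. As a minimal sketch of the app-only client-credentials flow described above, the following builds the token request against the Microsoft identity platform v2.0 token endpoint using only the standard library. The helper names build_token_request and fetch_token are hypothetical illustrations, not the platform's GraphClient; the three inputs correspond to the MICROSOFT_GRAPH_TENANT_ID, MICROSOFT_GRAPH_CLIENT_ID, and MICROSOFT_GRAPH_CLIENT_SECRET values used elsewhere in this runbook.

```python
import json
import urllib.parse
import urllib.request

# The ".default" scope asks for every application permission already
# admin-consented on the app registration.
GRAPH_SCOPE = "https://graph.microsoft.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str):
    """Return (url, form_body_bytes) for an app-only token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": GRAPH_SCOPE,
    }
    return url, urllib.parse.urlencode(form).encode("ascii")


def fetch_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """POST the request and return the access token (performs a network call)."""
    url, body = build_token_request(tenant_id, client_id, client_secret)
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["access_token"]
```

If you call fetch_token from a one-off script, read the secret from the local secret file or the environment rather than typing it on the command line, and avoid echoing the returned token into shared logs.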
Step 3: Create / verify the client secret
- Still in the app registration, open Certificates & secrets > Client secrets.
- If there is no current secret (or it is near expiry), click New client secret, give it a description and an expiry, and copy the value immediately — Azure shows it only once.
- Store the value in ~/Documents/fm-dogfood/m365_graph_secret.txt locally. Never paste it into this runbook, a PR, a ticket, or a chat transcript. (The original secret was leaked once and had to be rotated.)
Verification: The secret value is validated end-to-end in Step 6, when the renewal sweep acquires a Graph token (the from_env() token acquisition succeeds). For now, confirm a non-expired secret exists in the Client secrets list.
Step 4: Set the VPS environment variables
SSH to the VPS and edit ~/platform/.env.production. Set the Graph credentials, the shared-mailbox ingest vars, the webhook shared-secret, and the license SKU. Set only the variable names below — paste the actual secret value from ~/Documents/fm-dogfood/m365_graph_secret.txt, never from this doc.
ssh curateme@178.105.8.25
# edit ~/platform/.env.production:

| Env var | Where the value comes from |
|---|---|
| MICROSOFT_GRAPH_TENANT_ID | 1488d628-c536-4788-978b-51b679de1a39 |
| MICROSOFT_GRAPH_CLIENT_ID | bd0a8fc3-13a6-4035-8a41-e622ce120ac8 |
| MICROSOFT_GRAPH_CLIENT_SECRET | value from ~/Documents/fm-dogfood/m365_graph_secret.txt |
| M365_WEBHOOK_CLIENT_STATE | a strong random shared secret you generate (Graph echoes it on every notification; the webhook rejects mismatches) |
| M365_DEFAULT_LICENSE_SKU_ID | Business/premium SKU id (only used by per-org premium provisioning, not the shared lane) |
| M365_CONSUMER_INGEST_USER_ID | 06af3503-5c27-4ea8-8add-d65a65d29b2e (the shared mailbox’s Graph id) |
| M365_CONSUMER_INGEST_ADDRESS | assistant@curate-me.ai |
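M365_WEBHOOK_CLIENT_STATE only needs to be unpredictable and short enough for Graph to accept. One way to generate it is sketched below with the standard library. The helper name new_client_state is hypothetical, and the 128-character ceiling is Microsoft Graph's documented limit for clientState rather than something this runbook states.

```python
import secrets

# Assumption: Microsoft Graph's documented maximum length for clientState.
GRAPH_CLIENT_STATE_MAX_CHARS = 128


def new_client_state(nbytes: int = 48) -> str:
    """Return a URL-safe random string suitable for use as a shared secret."""
    state = secrets.token_urlsafe(nbytes)  # 48 random bytes -> 64 characters
    if len(state) > GRAPH_CLIENT_STATE_MAX_CHARS:
        raise ValueError("clientState would exceed Graph's 128-character limit")
    return state
```

Generate the value once, write it straight into ~/platform/.env.production, and treat it like the client secret: it should not appear in tickets or transcripts.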
Notes grounded in the code:
- Both M365_CONSUMER_INGEST_USER_ID and M365_CONSUMER_INGEST_ADDRESS must be set to arm the lane. With either unset, the M365 ingest branch in inbound_dispatcher._match_ingest_recipients never matches and no consumer mail is ingested (.env.example lines 815–829).
- M365_CONSUMER_INGEST_ADDRESS must contain an @ and its local part must have no +. The alias generator in src/services/mobile/inbound_alias.py (_ingest_address_base) splits it into (local, domain) and produces assistant+<token>@curate-me.ai.
- M365_WEBHOOK_CLIENT_STATE is read by both the subscription creator (provisioning_service.ensure_inbound_subscription) and the webhook validator (agent_identity_webhook._get_expected_client_state). If it is empty, ensure_inbound_subscription returns {"status": "skipped", "reason": "no_client_state"} and no subscription is created.
- Optional: M365_WEBHOOK_NOTIFICATION_URL overrides the notification URL. If unset, the code uses _DEFAULT_WEBHOOK_NOTIFICATION_URL = https://api.curate-me.ai/api/v1/platform/agent-identity/m365-webhook (set this only for a staging/local tunnel).
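The address rules above can be checked before deploying. The following is a minimal sketch of the split-and-plus-address behaviour the notes describe, not the code in inbound_alias.py; the function names split_ingest_address, plus_address, and lane_armed are hypothetical.

```python
def split_ingest_address(address: str) -> tuple[str, str]:
    """Split the base ingest address into (local, domain), enforcing both rules."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("ingest address must look like local@domain")
    if "+" in local:
        raise ValueError("ingest address local part must not contain '+'")
    return local, domain


def plus_address(address: str, token: str) -> str:
    """Build the per-household alias, e.g. assistant+<token>@curate-me.ai."""
    local, domain = split_ingest_address(address)
    return f"{local}+{token}@{domain}"


def lane_armed(env: dict) -> bool:
    """The lane is armed only when both ingest variables are non-empty."""
    return bool(env.get("M365_CONSUMER_INGEST_USER_ID")) and bool(
        env.get("M365_CONSUMER_INGEST_ADDRESS")
    )
```

For example, plus_address("assistant@curate-me.ai", "abc123") yields assistant+abc123@curate-me.ai, while a base address that already contains a plus sign is rejected before any alias is generated.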
Recreate the backend so it picks up the new env:
# from your local machine
./scripts/deploy-to-vps.sh --backend
Verification: Confirm the container has all three Graph vars (the renewal task no-ops unless all three are present; see m365_subscription_renewal._graph_env_configured) plus the ingest and webhook vars:
ssh curateme@178.105.8.25
docker exec curateme-backend-b2b env | grep -E \
  'MICROSOFT_GRAPH_(TENANT_ID|CLIENT_ID|CLIENT_SECRET)|M365_CONSUMER_INGEST_(USER_ID|ADDRESS)|M365_WEBHOOK_CLIENT_STATE' \
  | sed 's/=.*/=<set>/'
You should see MICROSOFT_GRAPH_TENANT_ID, MICROSOFT_GRAPH_CLIENT_ID, MICROSOFT_GRAPH_CLIENT_SECRET, M365_WEBHOOK_CLIENT_STATE, M365_CONSUMER_INGEST_USER_ID, and M365_CONSUMER_INGEST_ADDRESS all marked <set> (the sed masks values so nothing secret prints).
Step 5: Confirm the webhook route is public-path exempt
Microsoft Graph calls the webhook with no auth headers — it validates itself with the clientState shared secret. The route must therefore be exempt from TenantIsolationMiddleware, or every Graph notification (and the subscription-creation validation handshake) gets a 401/redirect and the subscription create fails.
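The two things the webhook must do without auth headers can be sketched as a pure function: echo the validationToken during the handshake, and accept only notifications whose clientState matches the shared secret. This is an illustration under stated assumptions, not agent_identity_webhook.m365_webhook. The name handle_webhook and its return shape are hypothetical, and the 202 status for notifications follows Microsoft's general guidance rather than anything this runbook states about the real handler.

```python
import hmac
from typing import Optional


def handle_webhook(validation_token: Optional[str], notifications: list,
                   expected_state: str):
    """Return (status, content_type, body, accepted_notifications)."""
    # Handshake: Graph sends ?validationToken=... and expects it echoed
    # back verbatim as text/plain with a 200.
    if validation_token is not None:
        return 200, "text/plain", validation_token, []

    # Notification delivery: keep only items whose clientState matches.
    # An empty expected secret accepts nothing.
    accepted = [
        n for n in notifications
        if expected_state and hmac.compare_digest(
            str(n.get("clientState", "")).encode("utf-8"),
            expected_state.encode("utf-8"),
        )
    ]
    return 202, "text/plain", "", accepted
```

Neither branch can run unless the unauthenticated request reaches the handler at all, which is why the route has to be exempt from the tenant middleware.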
This is already wired: /api/v1/platform/agent-identity/m365-webhook is listed in PUBLIC_PATHS in src/middleware/tenant_isolation.py (line 89). Confirm it is still present on the deployed image:
ssh curateme@178.105.8.25
docker exec curateme-backend-b2b \
  grep -n "agent-identity/m365-webhook" src/middleware/tenant_isolation.py
Verification: The grep returns the line inside PUBLIC_PATHS. As a live check, the Graph validation handshake must echo the token back as text/plain 200:
curl -s -o /dev/null -w "%{http_code} %{content_type}\n" \
  "https://api.curate-me.ai/api/v1/platform/agent-identity/m365-webhook?validationToken=ping"
Expect 200 text/plain (the handler echoes the token in agent_identity_webhook.m365_webhook). A 401/307/404 means the route is not public-path exempt; fix PUBLIC_PATHS and redeploy before continuing.
Step 6: Create the Graph change-notification subscription
The shared ingest mailbox has no agent_identities row (it belongs to no org), so its subscription is created by the renewal task’s _ensure_ingest_subscription path (stored under the platform_settings doc id m365_consumer_ingest_subscription). The fastest way to create it immediately is to run the renewal sweep once inside the backend container:
ssh curateme@178.105.8.25
docker exec -e PYTHONPATH=/app curateme-backend-b2b \
  python -c "from src.tasks.m365_subscription_renewal import renew_m365_subscriptions; print(renew_m365_subscriptions())"
This calls _ensure_ingest_subscription, which POSTs https://graph.microsoft.com/v1.0/subscriptions with resource = users/<id>/mailFolders('Inbox')/messages, changeType=created, clientState=<M365_WEBHOOK_CLIENT_STATE>, and an expiration of now + 4200 minutes (~70h), then persists the subscription_id in platform_settings.
The expected return is a summary dict, e.g. {'scanned': 0, 'renewed': 0, 'recreated': 0, 'created': 1, 'failed': 0} (the ingest mailbox counts into the same buckets). If you instead get {'skipped': 1}, the three MICROSOFT_GRAPH_* vars are not all set — go back to Step 4.
Per-org premium mailboxes (created via the agent-identity flow) get their subscription from ProvisioningService.ensure_inbound_subscription instead; the shared consumer lane uses the path above.
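The subscription request described in this step can be sketched as a small payload builder. It uses the field values this step names (resource, changeType, clientState, a 4200-minute expiry) and the standard Graph subscription field names; build_subscription_payload is a hypothetical helper, not the code in _ensure_ingest_subscription.

```python
from datetime import datetime, timedelta, timezone

SUBSCRIPTION_LIFETIME_MINUTES = 4200  # ~70h, the lifetime quoted in this step


def build_subscription_payload(user_id: str, client_state: str,
                               notification_url: str, now=None) -> dict:
    """Build the JSON body for POST https://graph.microsoft.com/v1.0/subscriptions."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(minutes=SUBSCRIPTION_LIFETIME_MINUTES)
    return {
        "changeType": "created",
        "notificationUrl": notification_url,
        "resource": f"users/{user_id}/mailFolders('Inbox')/messages",
        # Graph expects a UTC ISO 8601 timestamp.
        "expirationDateTime": expires.astimezone(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        ),
        "clientState": client_state,
    }
```

Passing now explicitly makes the expiry deterministic, which is useful when comparing a stored subscription against what Graph reports.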
Verification: Confirm the stored subscription and that Graph has it live:
# Stored subscription id
docker exec curateme-backend-b2b mongosh "$MONGO_URI" --quiet --eval \
'db.platform_settings.findOne({_id:"m365_consumer_ingest_subscription"})'
# Graph sees it (use the same $TOKEN as Step 1)
curl -s -H "Authorization: Bearer $TOKEN" \
"https://graph.microsoft.com/v1.0/subscriptions" \
  | jq '.value[] | {id, resource, expirationDateTime}'
You should see a subscription whose resource is users/06af3503-5c27-4ea8-8add-d65a65d29b2e/mailFolders('Inbox')/messages with an expirationDateTime ~70h out.
Step 7: Confirm the renewal beat task is scheduled
Graph subscriptions expire after ~70h. The m365-subscription-renewal beat task renews them every 12 hours, so at least five scheduled attempts fit inside one subscription lifetime and a single transient failure does not cause an outage. It is registered in src/celery_app.py:
"m365-subscription-renewal": {
"task": "src.tasks.m365_subscription_renewal.renew_m365_subscriptions",
"schedule": crontab(minute=10, hour="*/12"),
}
For active per-identity subscriptions it PATCH-renews and recreates on a Graph 404; for the shared ingest mailbox it runs _ensure_ingest_subscription (renew → recreate-on-404 → create-when-absent).
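The renew → recreate-on-404 → create-when-absent order can be sketched as a small decision function. This is an illustration of the control flow only. NotFound, ensure_subscription, and the renew/create callables are hypothetical stand-ins for the Graph calls, and the outcome labels simply mirror the bucket names in the sweep summary.

```python
class NotFound(Exception):
    """Stands in for a Graph 404 on the stored subscription id."""


def ensure_subscription(stored_id, renew, create):
    """Return (outcome, subscription_id) for one ensure pass.

    renew(subscription_id) extends an existing subscription or raises NotFound.
    create() makes a new subscription and returns its id.
    """
    if stored_id is None:
        # Nothing persisted yet: first-time creation.
        return "created", create()
    try:
        renew(stored_id)
        return "renewed", stored_id
    except NotFound:
        # Graph no longer knows the id (expired or deleted): start over.
        return "recreated", create()
```

Any other exception propagates to the caller, which corresponds to an attempt that simply fails and is retried on the next scheduled run.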
Verification: Confirm beat is running and emitting the sweep:
ssh curateme@178.105.8.25
docker logs curateme-celery-beat --since 13h 2>&1 | grep -i "m365-subscription-renewal"
docker logs curateme-celery-worker --since 13h 2>&1 | grep -i "m365_subscription_renewal_sweep\|m365_ingest_subscription_ensured"
You should see the task being dispatched on the ~12h cadence and a m365_subscription_renewal_sweep log line with the summary counts.
Step 8: Wire a household org and seed a verified sender
A household must (a) have an active forwarding alias row and (b) a verified sender so the inbound gate (inbound_alias.is_verified_sender) lets the message through. For a dogfood org, use the one-time wiring script scripts/fm_m365_wire_org.py, which is idempotent and seeds the operator’s own address as a pre-verified sender.
The script lives at the repo root (scripts/fm_m365_wire_org.py), which is outside the curateme-backend-b2b image build context (the image builds from services/backend/), so it is not baked into the running container. Copy it into the container first, then exec it (run from the VPS ~/platform checkout):
ssh curateme@178.105.8.25
cd ~/platform
docker cp scripts/fm_m365_wire_org.py curateme-backend-b2b:/app/scripts/fm_m365_wire_org.py
docker exec -e PYTHONPATH=/app curateme-backend-b2b \
python /app/scripts/fm_m365_wire_org.py \
--org org_xxxxxxxxxxxx \
--mailbox assistant@curate-me.ai \
--m365-user-id 06af3503-5c27-4ea8-8add-d65a65d29b2e \
  --seed-sender you@example.com:member_xxxxxxxx
The script flips the fm_premium_identity per-org flag, adopts the mailbox as the org’s ACTIVE primary identity, links the assistant profile, ensures the inbound subscription, and (with --seed-sender EMAIL:MEMBER_ID) inserts a pre-verified mobile_allowed_senders row, which is the dogfood shortcut around the signed-link verification flow. Production senders should instead go through the in-app add_sender → signed-link verification flow.
The household’s plus-addressed alias (assistant+<token>@curate-me.ai) is created from the app’s Forwarding Setup screen (inbound_alias.create_alias), which is idempotent — one active alias per org.
Verification: Confirm the alias row and the verified sender exist for the org:
docker exec curateme-backend-b2b mongosh "$MONGO_URI" --quiet --eval '
print("alias:", JSON.stringify(db.mobile_inbound_aliases.findOne({org_id:"org_xxxxxxxxxxxx", status:"active"})));
print("senders:", JSON.stringify(db.mobile_allowed_senders.find({org_id:"org_xxxxxxxxxxxx"},{email:0}).toArray()));
'
You should see one active alias whose address is assistant+<token>@curate-me.ai (provider m365) and at least one sender row with status: "verified". (The projection drops email so no address prints to the terminal.)
Step 9: Verify an inbound email becomes a proposal (end-to-end)
From the verified sender address, send an actionable email (e.g. subject “Soccer practice Tuesday 4pm”) to the household’s plus-addressed alias assistant+<token>@curate-me.ai.
What happens, grounded in the code:
- Graph delivers to the shared mailbox and POSTs a notification to the webhook. Live Graph notifications use the capitalized, no-mailFolders resource shape Users/<id>/Messages/<id>, even though the subscription was created on users/<id>/mailFolders('Inbox')/messages. agent_identity_webhook._extract_user_id matches the users segment case-insensitively, so both shapes resolve.
- inbound_dispatcher.process_inbound_message sees the message is for M365_CONSUMER_INGEST_USER_ID and calls _route_ingest_mailbox_message, which fetches the message, matches the plus-tagged recipient (_match_ingest_recipients), and routes into inbound_email.process_m365_alias_inbound.
- The alias pipeline resolves the plus-tag → org (inbound_alias.resolve_alias), gates on is_verified_sender, then creates a pending approval (proposal).
- After a handled outcome the message is deleted from the shared mailbox (graph_client.delete_message) so household mail never accumulates in the shared inbox.
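Two of the steps above are plain string matching and can be sketched in isolation: pulling the user id out of either resource shape, and pulling the plus-tag out of a recipient. The functions below are hypothetical illustrations (they are not _extract_user_id or _match_ingest_recipients), and whether the real code compares the mailbox name and domain case-insensitively is an assumption.

```python
import re
from typing import Optional

# Matches "users/<id>" or "Users/<id>" as a path segment.
_USER_SEGMENT = re.compile(r"(?:^|/)users/([^/]+)", re.IGNORECASE)


def extract_user_id(resource: str) -> Optional[str]:
    """Return the Graph user id from a notification or subscription resource."""
    match = _USER_SEGMENT.search(resource)
    return match.group(1) if match else None


def extract_plus_tag(recipient: str, ingest_address: str) -> Optional[str]:
    """Return <token> if recipient is <local>+<token>@<domain> for the ingest address."""
    base_local, _, base_domain = ingest_address.strip().rpartition("@")
    local, sep, domain = recipient.strip().rpartition("@")
    if not sep or domain.lower() != base_domain.lower():
        return None
    head, plus, tag = local.partition("+")
    if not plus or not tag or head.lower() != base_local.lower():
        return None
    return tag
```

A recipient with no plus-tag, or one addressed to a different mailbox, returns None, which corresponds to mail the consumer lane should not route to any household.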
Verification (two ways):
# Logs — content-free machine codes only (no sender/subject/body)
ssh curateme@178.105.8.25
docker logs curateme-backend-b2b --since 5m 2>&1 | grep -iE \
  "webhook_notification_received|ingest_routed_to_alias_pipeline|graph_message_deleted"
Look for webhook_notification_received → ingest_routed_to_alias_pipeline (with status set) → graph_message_deleted.
# A pending approval (proposal) now exists for the org
docker exec curateme-backend-b2b mongosh "$MONGO_URI" --quiet --eval \
  'db.approvals.find({org_id:"org_xxxxxxxxxxxx", status:"pending"}).sort({created_at:-1}).limit(3).toArray()'
Finally, open the iOS Family Manager app signed in as that household and confirm the new item appears in the approval / Inbox surface. That is the lane working end to end.
Rollback / If it goes wrong
Disarm the lane (fastest safe rollback): unset M365_CONSUMER_INGEST_USER_ID (or M365_CONSUMER_INGEST_ADDRESS) in ~/platform/.env.production and redeploy with ./scripts/deploy-to-vps.sh --backend. With either unset, the ingest branch never matches, no consumer mail is ingested, and Graph notifications fall back to per-identity behavior. The Resend lane (FM_RESEND_INBOUND_ENABLED) remains the separate, default-off fallback.
| Symptom | Likely cause | Fix |
|---|---|---|
| Subscription create returns {"status": "skipped", "reason": "no_client_state"} | M365_WEBHOOK_CLIENT_STATE not set in the container | Set it in .env.production, redeploy backend, re-run Step 6 |
| Renewal task returns {'skipped': 1} | One of the three MICROSOFT_GRAPH_* vars missing (_graph_env_configured) | Set all three, redeploy, re-run |
| Graph 403 Insufficient privileges on subscription create / message fetch | App permissions not admin-consented | Re-grant admin consent (Step 2) |
| Graph 401 Invalid client secret | Secret expired or wrong value | Rotate in Azure, update MICROSOFT_GRAPH_CLIENT_SECRET from ~/Documents/fm-dogfood/m365_graph_secret.txt, redeploy |
| Validation handshake returns 401/307/404 | Webhook not public-path exempt | Confirm /api/v1/platform/agent-identity/m365-webhook in PUBLIC_PATHS, redeploy (Step 5) |
| Webhook fires but nothing ingests; logs show webhook_client_state_mismatch | M365_WEBHOOK_CLIENT_STATE differs from the value used at subscription creation | Align the env var, then recreate the subscription (run Step 6; _ensure_ingest_subscription recreates on next sweep) |
| Email arrives but no proposal; logs show no ingest_routed_to_alias_pipeline | Sender not verified, or no active alias for the org | Re-run Step 8 (seed sender / confirm alias row); production senders must complete signed-link verification |
| curl .../users/assistant@curate-me.ai returns 404 right after creating the mailbox | Shared mailbox not yet propagated to Graph | Wait several minutes and retry; a new shared mailbox takes minutes to appear in Graph |
| Inbound stopped after ~3 days | Subscription expired and renewal isn’t running | Confirm curateme-celery-beat is up and the m365-subscription-renewal task is dispatching (Step 7); run Step 6 manually to recreate |
If a single message fails to process, the dispatcher skips the delete, so the copy left in the shared mailbox is the only replay artifact. Graph does not re-notify for that message, so it is reprocessed only if you re-trigger it; do not rely on automatic redelivery.
Related
- Agent Identity Provisioning: the per-org premium licensed-mailbox flow (this runbook is the shared, license-free consumer lane)
- scripts/fm_m365_wire_org.py: one-time per-org wiring script at the repo root (copy into the container with docker cp scripts/fm_m365_wire_org.py curateme-backend-b2b:/app/scripts/, then run via docker exec -e PYTHONPATH=/app curateme-backend-b2b python /app/scripts/fm_m365_wire_org.py)
- services/backend/src/services/agent_identity/provisioning_service.py: ensure_inbound_subscription + _DEFAULT_WEBHOOK_NOTIFICATION_URL
- services/backend/src/api/routes/platform/agent_identity_webhook.py: the m365-webhook route + _extract_user_id (handles the capitalized Users/<id>/Messages/<id> live shape)
- services/backend/src/tasks/m365_subscription_renewal.py: the 12h renewal sweep (crontab(minute=10, hour="*/12")) + _ensure_ingest_subscription
- services/backend/src/services/mobile/inbound_alias.py: _ingest_address_base / plus-addressing alias generation + sender verification
- services/backend/src/services/agent_identity/graph_client.py: GraphConfig.from_env, create_subscription, delete_message