Webwright Preview
Webwright Browser Agent is in preview. It ships behind the
webwright_preview per-org feature flag and is hidden from the runner
template gallery and autopilot launcher for orgs without the flag.
Reach out to hello@curate-me.ai to be
added.
Webwright is a code-as-action browser harness. The agent solves a web task by writing a rerunnable Playwright script and producing a fixed-shape evidence bundle: a plan, the final script, an action log, screenshots at each critical point, and an optional self-verification record. The script is the output — not just an answer.
The preview ships in two surfaces:
- A managed runner template named Webwright Browser Agent (use the runner gallery to launch an interactive session).
- An autopilot template with id webwright_agent (use for one-shot or repeatable browser workflows triggered by Slack/Teams, the dashboard, or the API).
Both surfaces share the same image, tool profile, viewport, and artifact contract documented below.
Webwright vs Web Agent
Curate-Me already ships a web_agent autopilot template. Webwright is
not a replacement — they target different shapes of work.
| Dimension | web_agent | webwright_agent |
|---|---|---|
| Best for | Open-ended research, multi-source synthesis, narrative answers | Repeatable form-fill, filtering, structured extraction, lookup-with-evidence |
| Primary output | Markdown report with citations | A rerunnable Playwright script (final_script.py) plus evidence bundle |
| Determinism | Low — same task may produce different prose | Higher — the script is the contract; rerunning the script yields the same data |
| Screenshots | Optional, narrative-driven | Required at every critical point |
| Self-verification | Reviewer LLM critiques the report | Agent re-runs final_script.py end-to-end and checks artifacts |
| Failure mode | Reports may hallucinate when blocked | Agent must report a blocker with evidence rather than guess |
| Image | Default worker image | curate-me/openclaw-web:latest (Chromium + Playwright preinstalled) |
Reach for Webwright when the task fits this shape:
- “Find the cheapest refundable hotel in downtown Austin for June 12–14.”
- “Filter this job board to remote senior backend roles posted this week.”
- “Compare the listed monthly price of the Pro plan across these three SaaS sites.”
- “Look up the renewal fee on the official state DMV form page.”
Reach for web_agent when the task is:
- “Summarize what’s new in Playwright since v1.50.”
- “Find three case studies of teams migrating off Selenium.”
- “Compile a market overview of the headless browser landscape.”
When to use Webwright
Webwright is designed for the following workflow shapes:
- Ecommerce filter + sort — exact brand, size, price ceiling.
- Travel search — exact dates, refundable filter, cheapest price capture.
- Job board — exact location, remote filter, newest sort.
- SaaS pricing extraction — compare plan prices from official pages.
- Government form / public-info lookup — find official requirement and cite source URL.
- Marketplace listings — filter used/new, distance, price ceiling.
- Restaurant reservation availability — given a date/party size, surface the open slots.
- Academic paper lookup — exact venue, year, author.
- Multi-site comparison with final table — same query across N official pages.
The common thread: the task has a definite answer that lives behind a site’s normal UI, and the user wants the lookup to be rerunnable later (next week, next quarter) without rebuilding from scratch.
Launching Webwright
Managed runner (interactive)
From the dashboard, open Runners → New Runner and pick the
Webwright Browser Agent template (preview badge visible to enabled
orgs). The runner provisions on openclaw-web with the
browser_coding tool profile and lands at
/workspace/webwright_runs/ as the working directory. Use the
session terminal to issue the task — the workspace contract is
preloaded into the agent’s boot instructions.
Or, via API:
```bash
curl -X POST https://api.curate-me.ai/gateway/admin/runners/ \
  -H "Content-Type: application/json" \
  -H "X-CM-API-Key: cm_sk_your_key_here" \
  -d '{
    "template_name": "Webwright Browser Agent",
    "ttl_seconds": 3600
  }'
```

Autopilot template (one-shot / scheduled)
```bash
curl -X POST https://api.curate-me.ai/api/v1/autopilot/run \
  -H "Content-Type: application/json" \
  -H "X-CM-API-Key: cm_sk_your_key_here" \
  -d '{
    "template_id": "webwright_agent",
    "task": "Find the cheapest refundable hotel in downtown Austin for June 12-14 and save the evidence"
  }'
```

Python SDK:
```python
from curate_me import CurateMe

client = CurateMe(api_key="cm_sk_your_key_here")

run = client.autopilot.start(
    template_id="webwright_agent",
    task="Find the cheapest refundable hotel in downtown Austin for June 12-14 and save the evidence",
)
print(run["task_id"], run["report_url"])
```

The autopilot run produces both a standard autopilot report and the full Webwright artifact bundle inside the worker's runner workspace.
The artifact contract
Every Webwright session writes into a single workspace tree under
/workspace/webwright_runs/<task_id>/. The shape is fixed — UIs,
support bundles, and the optional self-reflection step depend on it.
```text
/workspace/webwright_runs/<task_id>/
├── plan.md                  # Critical points + verification checklist
├── final_script.py          # The reusable, end-to-end Playwright script
└── final_runs/
    ├── run_1/
    │   ├── final_script.py            # Snapshot of the script as run
    │   ├── final_script_log.txt       # stdout + action log; final datum printed at the end
    │   ├── screenshots/
    │   │   ├── 01_search.png
    │   │   ├── 02_filter_applied.png
    │   │   └── 03_result.png
    │   └── self_reflect_result.json   # optional; checklist + screenshot review
    └── run_2/
        └── ...
```

| File | Required | What it contains |
|---|---|---|
| plan.md | Yes | The agent's decomposition of the task into critical points: concrete claims that must be verified (e.g. "dates are June 12 check-in and June 14 check-out", "refundable filter is on", "price is lowest visible"). |
| final_script.py | Yes | A self-contained Playwright script that, run with no human in the loop, reproduces the answer. It must launch Chromium, navigate to a known start URL, apply filters via site controls, take screenshots at each critical point, and print the final datum to stdout. |
| final_runs/run_<n>/final_script.py | Yes | A snapshot of the script exactly as it was executed for run <n>. Each run re-executes the script from scratch, so snapshots can differ between runs if the agent iterated on the script. |
| final_runs/run_<n>/final_script_log.txt | Yes | Captured stdout + stderr. Includes the action log and the final datum (the answer) printed on the last line. Secrets are redacted before storage. |
| final_runs/run_<n>/screenshots/*.png | Yes | One PNG per critical point, viewport-only (1280×1800, full_page=False). Used as visual evidence that the constraint was actually applied at runtime. |
| final_runs/run_<n>/self_reflect_result.json | Optional | The agent's own checklist verdict per critical point. v1 uses a deterministic checklist; native host vision is used when available. |
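If you have a copy of a task directory on disk, you can check it locally against the required rows of the table above. `missing_artifacts` is a hypothetical helper, not part of Webwright. It only checks that the required files exist. It does not check that their contents are correct.

```python
from pathlib import Path

# Files the contract marks as required inside every final_runs/run_<n>/ directory.
REQUIRED_RUN_FILES = ("final_script.py", "final_script_log.txt")


def missing_artifacts(task_root: Path) -> list[str]:
    """Return relative paths of required artifacts missing under task_root.

    task_root is assumed to be a copy of /workspace/webwright_runs/<task_id>/.
    An empty list means every required file is present.
    """
    missing = [name for name in ("plan.md", "final_script.py")
               if not (task_root / name).is_file()]

    runs_dir = task_root / "final_runs"
    runs = sorted(p for p in runs_dir.glob("run_*") if p.is_dir()) if runs_dir.is_dir() else []
    if not runs:
        missing.append("final_runs/run_<n>/")

    for run in runs:
        rel = run.relative_to(task_root).as_posix()
        for name in REQUIRED_RUN_FILES:
            if not (run / name).is_file():
                missing.append(f"{rel}/{name}")
        # At least one screenshot is expected per run (one per critical point).
        if not any((run / "screenshots").glob("*.png")):
            missing.append(f"{rel}/screenshots/*.png")
    return missing
```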
The final_script.py is what distinguishes Webwright from a one-off
research run: rerunning it next quarter should produce the same shape
of answer against the same site, with fresh screenshots. Treat it as
the deliverable.
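The required behavior of final_script.py can be sketched as the hypothetical skeleton below, which uses Playwright's sync Python API. The following are all placeholders, not anything Webwright generates: `START_URL`, the "Max price" label, the "Apply" button, and the `.result` selector. Playwright is imported inside `main()` only so the screenshot-naming helper can be used without a browser installed.

```python
"""Hypothetical skeleton of a Webwright-style final_script.py (placeholder site and selectors)."""
from pathlib import Path

START_URL = "https://example.com/search"    # placeholder start URL
VIEWPORT = {"width": 1280, "height": 1800}  # viewport size from the artifact contract
SHOTS = Path("screenshots")


def shot_path(index: int, label: str) -> Path:
    """Numbered screenshot path for a critical point, e.g. screenshots/01_search.png."""
    return SHOTS / f"{index:02d}_{label}.png"


def main() -> None:
    from playwright.sync_api import sync_playwright

    SHOTS.mkdir(exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(viewport=VIEWPORT)
        page.goto(START_URL)
        page.screenshot(path=str(shot_path(1, "search")), full_page=False)

        # Apply the constraint through the site's own controls (placeholder labels).
        page.get_by_label("Max price").fill("200")
        page.get_by_role("button", name="Apply").click()
        page.wait_for_load_state("networkidle")
        page.screenshot(path=str(shot_path(2, "filter_applied")), full_page=False)

        result = page.locator(".result").first.inner_text()
        page.screenshot(path=str(shot_path(3, "result")), full_page=False)
        browser.close()

    # Print the final datum last so it lands on the last line of final_script_log.txt.
    print(result)
```

Saved as a standalone final_script.py, this would end with the usual `if __name__ == "__main__": main()` guard so that running the file executes the whole flow.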
Expected limitations
Webwright is a preview. Several behaviors are intentional, not bugs:
- No CAPTCHA bypass. If a site presents a CAPTCHA, the agent records a screenshot, marks a blocker, and stops. It does not attempt to solve, evade, or proxy around CAPTCHAs.
- No paywall bypass. Content behind a paywall is treated as unreachable. The agent will not log in with credentials it was not given, and will not use cached cookies from another session.
- No login-gate bypass. If a site requires authentication and no credentials were provided (or required HITL was denied), the agent reports a blocker. Login flows with provided credentials are still human-in-the-loop on the shell channel.
- No bypass of access controls. Geofencing, IP blocks, and bot protection are treated as terminal blockers — the agent does not rotate identity to evade them.
- Exact filters may be unavailable on some sites. When a site lacks a control for an exact filter (e.g. price ceiling), the agent uses site sort controls (e.g. “sort by lowest price”) and applies the remaining filter in code. If neither is possible, it reports a blocker.
- final_script.py may not always rerun from scratch. Some sites inject anti-replay tokens or randomize selectors per session. The eval suite tracks the rerun success rate; the current target is for at least 70% of successful scripts to rerun cleanly once.
- No PII or secret entry by default. The agent is instructed not to type secrets, credit card numbers, or PII into form fields unless the task explicitly provided them and HITL approves.
- No automated purchases, submissions, or destructive actions. These require human confirmation through the shell HITL channel.
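The exact-filter fallback in the list above works in two steps. First the agent sorts with the site's own control. Then it applies the remaining constraint in code. That second step might look like the hypothetical helpers below. `Listing`, `parse_price`, and `apply_price_ceiling` are illustrative names, not part of Webwright. The sketch also assumes the listing text has already been scraped from the sorted page.

```python
import re
from dataclasses import dataclass

PRICE = re.compile(r"\d[\d,]*(?:\.\d+)?")


@dataclass
class Listing:
    title: str
    price: float


def parse_price(text: str) -> float:
    """Extract the first number from scraped price text such as '$1,299.00'."""
    match = PRICE.search(text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return float(match.group().replace(",", ""))


def apply_price_ceiling(listings: list[Listing], ceiling: float) -> list[Listing]:
    """Apply a price ceiling in code when the site has no control for it.

    The input is assumed to come from a page already sorted by lowest price;
    sorting again here keeps the result correct even if that assumption fails.
    """
    kept = [item for item in listings if item.price <= ceiling]
    return sorted(kept, key=lambda item: item.price)
```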
The agent reports a blocker only after gathering evidence (screenshot + log entry) of the obstacle. If you see a “success” reported without artifacts that prove the critical points, that’s a defect — see Inspecting failures.
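One way to spot the defect described above is to check a local copy of a run directory. The log should end with a final datum, and the screenshots directory should not be empty. `summarize_run` is a hypothetical helper. It assumes you already have the run_<n> directory on disk. Following the artifact contract, it also assumes the final datum is the last non-blank line of final_script_log.txt.

```python
from pathlib import Path


def summarize_run(run_dir: Path) -> dict:
    """Gather the evidence a reviewer checks for one final_runs/run_<n> directory."""
    log_lines = (run_dir / "final_script_log.txt").read_text(encoding="utf-8").splitlines()
    # The final datum is printed last; ignore trailing blank lines.
    non_blank = [line for line in log_lines if line.strip()]
    return {
        "final_datum": non_blank[-1] if non_blank else None,
        "screenshots": sorted(p.name for p in (run_dir / "screenshots").glob("*.png")),
        "has_self_reflect": (run_dir / "self_reflect_result.json").is_file(),
    }
```

A run reported as successful with `final_datum` set to `None` or an empty `screenshots` list lacks the evidence the contract requires.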
Inspecting failures
When a Webwright run fails or returns an unexpected answer, every artifact is browsable through the runner detail page. From the dashboard:
- Open Runners and pick the session.
- Open the Files tab.
- Navigate to /workspace/webwright_runs/<task_id>/.
- Read plan.md to understand what the agent set out to verify.
- Open final_runs/run_<n>/final_script_log.txt to see the action log and the final printed datum.
- Step through screenshots/ to confirm each critical point was visible on screen at runtime.
- If present, open self_reflect_result.json for the agent's own per-checkpoint verdict.
Or, via API:
```bash
curl -sS "https://api.curate-me.ai/gateway/admin/runners/sessions/<session_id>/files?path=/workspace/webwright_runs" \
  -H "X-CM-API-Key: cm_sk_your_key_here"
```

Support bundle fields
If you escalate to Curate-Me support, the support bundle includes the
following Webwright-specific summary fields. Quote the task_id in
your ticket — support can then pull these without asking for screen
recordings:
| Field | Source | Purpose |
|---|---|---|
| template_id | autopilot run / runner template | webwright_agent or Webwright Browser Agent |
| tool_profile | runner session | Should always be browser_coding |
| image_ref | runner session | curate-me/openclaw-web:latest |
| artifact_root | runner workspace | /workspace/webwright_runs/<task_id>/ |
| final_run_id | bundle | The most recent run_<n> directory |
| screenshot_count | bundle | Count of PNGs in the final run |
| final_script_log_excerpt | bundle | Last ~200 lines of final_script_log.txt, secrets redacted |
| self_reflect_status | bundle, optional | Per-checkpoint verdict if self_reflect_result.json is present |
| redaction_applied | bundle | true if any tokens were redacted before storage |
The bundle is stored in runner_output_bundles keyed by session_id.
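Before filing a ticket, you could approximate a few of these fields from a local copy of the artifact root. This is a hypothetical sketch, not the support tooling. It treats the highest-numbered run_<n> as the most recent run. Its redaction rule, which masks strings shaped like cm_sk_ keys, is an assumption, because the real redaction rules are not documented here.

```python
import re
from pathlib import Path

RUN_DIR = re.compile(r"run_(\d+)")
# Hypothetical redaction rule: mask anything shaped like a Curate-Me API key.
KEY_PATTERN = re.compile(r"cm_sk_[A-Za-z0-9_]+")


def local_bundle_summary(task_root: Path, excerpt_lines: int = 200) -> dict:
    """Approximate some support-bundle fields from a local artifact root."""
    runs = [p for p in (task_root / "final_runs").iterdir()
            if p.is_dir() and RUN_DIR.fullmatch(p.name)]
    if not runs:
        raise FileNotFoundError("no final_runs/run_<n> directories found")
    # Compare run numbers numerically so run_10 counts as newer than run_9.
    latest = max(runs, key=lambda p: int(RUN_DIR.fullmatch(p.name).group(1)))

    log_text = (latest / "final_script_log.txt").read_text(encoding="utf-8")
    tail = "\n".join(log_text.splitlines()[-excerpt_lines:])
    redacted, hits = KEY_PATTERN.subn("[REDACTED]", tail)

    return {
        "artifact_root": str(task_root),
        "final_run_id": latest.name,
        "screenshot_count": len(list((latest / "screenshots").glob("*.png"))),
        "final_script_log_excerpt": redacted,
        "self_reflect_present": (latest / "self_reflect_result.json").is_file(),
        "redaction_applied": hits > 0,
    }
```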
Cost and time budgets
The preview is budget- and time-limited at the template level:
| Surface | Default cost cap | Default time cap |
|---|---|---|
| Managed runner template | $10 USD / session | 60 minutes |
| Autopilot webwright_agent (per worker) | $4 USD | Inherits autopilot task budget |
Org-level gateway budgets, plan entitlements, and HITL gates apply on top, exactly as for any other runner. See the Governance Chain and Cost Tracking guides.
Disabling the preview
Org admins can opt out at any time by flipping the webwright_preview
feature flag for the org. While disabled:
- The Webwright Browser Agent runner template is hidden from the template gallery.
- New webwright_agent autopilot launches return 403 webwright_preview_disabled.
- In-flight sessions are allowed to complete unless a separate security stop is required.
From the dashboard, open Settings → Feature Flags → Webwright Preview and toggle off. Programmatic toggle is available through the admin API:
```bash
curl -X POST https://api.curate-me.ai/api/v1/admin/feature-flags/webwright_preview \
  -H "Content-Type: application/json" \
  -H "X-CM-API-Key: cm_sk_your_admin_key" \
  -d '{"enabled": false}'
```

To re-enable, repeat with "enabled": true. The toggle is per-org: disabling it for one org does not affect others.
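Client code that launches webwright_agent may want to tell a disabled preview apart from other failures. The sketch below uses only the Python standard library, plus the endpoint and headers from the curl examples on this page. It assumes the webwright_preview_disabled code appears verbatim in the 403 response body, because the exact error shape is not documented here. `WebwrightPreviewDisabled`, `is_preview_disabled`, and `launch_webwright` are hypothetical names.

```python
import json
import urllib.error
import urllib.request

AUTOPILOT_RUN_URL = "https://api.curate-me.ai/api/v1/autopilot/run"


class WebwrightPreviewDisabled(RuntimeError):
    """Raised when the org's webwright_preview flag is off."""


def is_preview_disabled(status: int, body: str) -> bool:
    # Assumption: the error code string appears verbatim in the 403 response body.
    return status == 403 and "webwright_preview_disabled" in body


def launch_webwright(task: str, api_key: str) -> dict:
    request = urllib.request.Request(
        AUTOPILOT_RUN_URL,
        data=json.dumps({"template_id": "webwright_agent", "task": task}).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-CM-API-Key": api_key},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        body = err.read().decode("utf-8", errors="replace")
        if is_preview_disabled(err.code, body):
            raise WebwrightPreviewDisabled(body) from err
        raise
```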
Roadmap
Promotion criteria from preview to broader availability are documented in the implementation plan. Highlights:
- Webwright matches or beats web_agent on structured browser tasks in the internal eval suite.
- At least 80% of successful runs have a complete artifact bundle.
- At least 70% of successful final_script.py snapshots rerun cleanly once.
- No security regression in logs, egress, or credentials.
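Both rate-based criteria use successful runs as the denominator. The sketch below shows that arithmetic over hypothetical run records. Each record is a dict with boolean `succeeded`, `complete_bundle`, and `reran_cleanly` keys, which is not the eval suite's real format.

```python
def promotion_metrics(runs: list[dict]) -> dict:
    """Compute the bundle-completeness and rerun rates over successful runs only."""
    successful = [r for r in runs if r["succeeded"]]
    if not successful:
        return {"bundle_rate": 0.0, "rerun_rate": 0.0, "meets_thresholds": False}
    bundle_rate = sum(r["complete_bundle"] for r in successful) / len(successful)
    rerun_rate = sum(r["reran_cleanly"] for r in successful) / len(successful)
    return {
        "bundle_rate": bundle_rate,
        "rerun_rate": rerun_rate,
        # Thresholds from the promotion criteria: 80% complete bundles, 70% clean reruns.
        "meets_thresholds": bundle_rate >= 0.80 and rerun_rate >= 0.70,
    }
```

For example, take 10 successful runs where 8 have complete bundles and 7 rerun cleanly. Those runs sit exactly at both thresholds, and failed runs do not change either rate.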
A reusable script registry — promote validated scripts to a shared library with source-site metadata, last-verified date, and a retirement policy — is on the roadmap after preview, not in v1.
Next steps
- Runners Quickstart — the sandboxed execution layer Webwright runs on
- Runners Security — egress policies, sandbox tiers, shell HITL
- Reports — autopilot report anatomy and retrieval
- Governance Chain — what every gateway-routed call goes through
- Feature Flags — flag system overview