
Webwright Preview

Webwright Browser Agent is in preview. It ships behind the webwright_preview per-org feature flag and is hidden from the runner template gallery and autopilot launcher for orgs without the flag. Reach out to hello@curate-me.ai to be added.

Webwright is a code-as-action browser harness. The agent solves a web task by writing a rerunnable Playwright script and producing a fixed-shape evidence bundle: a plan, the final script, an action log, screenshots at each critical point, and an optional self-verification record. The script itself is the output, not just the answer it finds.

The preview is available on two surfaces:

  • A managed runner template named Webwright Browser Agent (use the runner gallery to launch an interactive session).
  • An autopilot template with id webwright_agent (use for one-shot or repeatable browser workflows triggered by Slack/Teams, the dashboard, or the API).

Both surfaces share the same image, tool profile, viewport, and artifact contract documented below.

Webwright vs Web Agent

Curate-Me already ships a web_agent autopilot template. Webwright is not a replacement — they target different shapes of work.

| Dimension | web_agent | webwright_agent |
|---|---|---|
| Best for | Open-ended research, multi-source synthesis, narrative answers | Repeatable form-fill, filtering, structured extraction, lookup-with-evidence |
| Primary output | Markdown report with citations | A rerunnable Playwright script (final_script.py) plus evidence bundle |
| Determinism | Low: same task may produce different prose | Higher: the script is the contract; rerunning the script yields the same data |
| Screenshots | Optional, narrative-driven | Required at every critical point |
| Self-verification | Reviewer LLM critiques the report | Agent re-runs final_script.py end-to-end and checks artifacts |
| Failure mode | Reports may hallucinate when blocked | Agent must report a blocker with evidence rather than guess |
| Image | Default worker image | curate-me/openclaw-web:latest (Chromium + Playwright preinstalled) |

Reach for Webwright when the task fits this shape:

  • “Find the cheapest refundable hotel in downtown Austin for June 12–14.”
  • “Filter this job board to remote senior backend roles posted this week.”
  • “Compare the listed monthly price of the Pro plan across these three SaaS sites.”
  • “Look up the renewal fee on the official state DMV form page.”

Reach for web_agent when the task is:

  • “Summarize what’s new in Playwright since v1.50.”
  • “Find three case studies of teams migrating off Selenium.”
  • “Compile a market overview of the headless browser landscape.”

When to use Webwright

Webwright is designed for the following workflow shapes:

  1. Ecommerce filter + sort — exact brand, size, price ceiling.
  2. Travel search — exact dates, refundable filter, cheapest price capture.
  3. Job board — exact location, remote filter, newest sort.
  4. SaaS pricing extraction — compare plan prices from official pages.
  5. Government form / public-info lookup — find official requirement and cite source URL.
  6. Marketplace listings — filter used/new, distance, price ceiling.
  7. Restaurant reservation availability — given a date/party size, surface the open slots.
  8. Academic paper lookup — exact venue, year, author.
  9. Multi-site comparison with final table — same query across N official pages.

The common thread: the task has a definite answer that lives behind a site’s normal UI, and the user wants the lookup to be rerunnable later (next week, next quarter) without rebuilding from scratch.

Launching Webwright

Managed runner (interactive)

From the dashboard, open Runners → New Runner and pick the Webwright Browser Agent template (the preview badge is visible to enabled orgs). The runner is provisioned on openclaw-web with the browser_coding tool profile and starts with /workspace/webwright_runs/ as its working directory. Use the session terminal to issue the task; the workspace contract is preloaded into the agent’s boot instructions.

Or, via API:

curl -X POST https://api.curate-me.ai/gateway/admin/runners/ \
  -H "Content-Type: application/json" \
  -H "X-CM-API-Key: cm_sk_your_key_here" \
  -d '{
    "template_name": "Webwright Browser Agent",
    "ttl_seconds": 3600
  }'

Autopilot template (one-shot / scheduled)

curl -X POST https://api.curate-me.ai/api/v1/autopilot/run \
  -H "Content-Type: application/json" \
  -H "X-CM-API-Key: cm_sk_your_key_here" \
  -d '{
    "template_id": "webwright_agent",
    "task": "Find the cheapest refundable hotel in downtown Austin for June 12-14 and save the evidence"
  }'

Python SDK:

from curate_me import CurateMe

client = CurateMe(api_key="cm_sk_your_key_here")
run = client.autopilot.start(
    template_id="webwright_agent",
    task="Find the cheapest refundable hotel in downtown Austin for June 12-14 and save the evidence",
)
print(run["task_id"], run["report_url"])

The autopilot run produces both a standard autopilot report and the full Webwright artifact bundle inside the worker’s runner workspace.

The artifact contract

Every Webwright session writes into a single workspace tree under /workspace/webwright_runs/<task_id>/. The shape is fixed — UIs, support bundles, and the optional self-reflection step depend on it.

/workspace/webwright_runs/<task_id>/
├── plan.md                          # Critical points + verification checklist
├── final_script.py                  # The reusable, end-to-end Playwright script
└── final_runs/
    ├── run_1/
    │   ├── final_script.py          # Snapshot of the script as run
    │   ├── final_script_log.txt     # stdout + action log; final datum printed at the end
    │   ├── screenshots/
    │   │   ├── 01_search.png
    │   │   ├── 02_filter_applied.png
    │   │   └── 03_result.png
    │   └── self_reflect_result.json # optional; checklist + screenshot review
    └── run_2/
        └── ...

| File | Required | What it contains |
|---|---|---|
| plan.md | Yes | The agent’s decomposition of the task into critical points: concrete claims that must be verified (e.g. “dates are June 12 check-in and June 14 check-out”, “refundable filter is on”, “price is lowest visible”). |
| final_script.py | Yes | A self-contained Playwright script that, run with no human in the loop, reproduces the answer. Must launch Chromium, navigate to a known start URL, apply filters via site controls, take screenshots at each critical point, and print the final datum to stdout. |
| final_runs/run_<n>/final_script.py | Yes | A snapshot of the script that was actually executed for run <n>. The agent re-runs the script from scratch; different snapshots may exist if the agent iterated. |
| final_runs/run_<n>/final_script_log.txt | Yes | Captured stdout + stderr. Includes the action log and the final datum (the answer) printed on the last line. Secrets are redacted before storage. |
| final_runs/run_<n>/screenshots/*.png | Yes | One PNG per critical point, viewport-only (1280×1800, full_page=False). Used as visual evidence the constraint was actually applied at runtime. |
| final_runs/run_<n>/self_reflect_result.json | Optional | The agent’s own checklist verdict per critical point. v1 uses a deterministic checklist; native host vision is used when available. |
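Because the shape is fixed, a bundle can be checked mechanically. The following is a minimal sketch of such a check using only the Python standard library; the function name check_bundle and the wording of its messages are illustrative assumptions, not part of the platform.

```python
from pathlib import Path

# Required entries per the artifact table; self_reflect_result.json is optional.
REQUIRED_TOP_LEVEL = ("plan.md", "final_script.py")
REQUIRED_PER_RUN = ("final_script.py", "final_script_log.txt")


def check_bundle(task_root: Path) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks complete."""
    problems = []
    for name in REQUIRED_TOP_LEVEL:
        if not (task_root / name).is_file():
            problems.append(f"missing {name}")

    runs_dir = task_root / "final_runs"
    run_dirs = []
    if runs_dir.is_dir():
        run_dirs = sorted(p for p in runs_dir.glob("run_*") if p.is_dir())
    if not run_dirs:
        problems.append("no final_runs/run_<n> directory")

    for run in run_dirs:
        for name in REQUIRED_PER_RUN:
            if not (run / name).is_file():
                problems.append(f"{run.name}: missing {name}")
        shots = run / "screenshots"
        if not (shots.is_dir() and any(shots.glob("*.png"))):
            problems.append(f"{run.name}: no screenshots")
    return problems
```

Calling check_bundle(Path("/workspace/webwright_runs/<task_id>")) with a real task id substituted returns an empty list when every required file is present. It checks presence only, not whether the screenshots actually prove the critical points.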

final_script.py is what distinguishes Webwright from a one-off research run: rerunning it next quarter should produce the same shape of answer against the same site, with fresh screenshots. Treat it as the deliverable.
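To make the contract concrete, here is a hedged skeleton of what a final_script.py could look like. It is a sketch under assumptions, not a script the agent actually emits: the start URL is a placeholder, the run-directory command-line argument and the shot_path helper are hypothetical, the "action:" log line format is invented for illustration, and page.title() stands in for a real extracted answer. The viewport size and full_page=False come from the artifact table.

```python
import sys
from pathlib import Path

START_URL = "https://example.com/"  # placeholder start URL, not a real target site
VIEWPORT = {"width": 1280, "height": 1800}  # viewport size from the artifact table


def shot_path(shots_dir: Path, index: int, label: str) -> Path:
    """Build a numbered screenshot path such as 01_search.png."""
    return shots_dir / f"{index:02d}_{label}.png"


def main(run_dir: Path) -> None:
    # Imported here so the helper above stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    shots_dir = run_dir / "screenshots"
    shots_dir.mkdir(parents=True, exist_ok=True)

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(viewport=VIEWPORT)

        page.goto(START_URL)
        print("action: opened start URL")  # assumed log format
        page.screenshot(path=str(shot_path(shots_dir, 1, "search")), full_page=False)

        # A real script would apply filters through the site's own controls here,
        # taking one screenshot per critical point listed in plan.md.

        final_datum = page.title()  # stand-in for the extracted answer
        page.screenshot(path=str(shot_path(shots_dir, 2, "result")), full_page=False)
        browser.close()

    print(final_datum)  # the final datum goes on the last line of stdout


# Runs only when a run directory is passed, e.g.:
#   python final_script.py final_runs/run_1
if __name__ == "__main__" and len(sys.argv) == 2:
    main(Path(sys.argv[1]))
```

Printing the answer last keeps it on the final line of stdout, which is where the artifact table says the final datum belongs.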

Expected limitations

Webwright is a preview. Several behaviors are intentional, not bugs:

  • No CAPTCHA bypass. If a site presents a CAPTCHA, the agent records a screenshot, marks a blocker, and stops. It does not attempt to solve, evade, or proxy around CAPTCHAs.
  • No paywall bypass. Content behind a paywall is treated as unreachable. The agent will not log in with credentials it was not given, and will not use cached cookies from another session.
  • No login-gate bypass. If a site requires authentication and no credentials were provided (or the required human-in-the-loop (HITL) approval was denied), the agent reports a blocker. Login flows with provided credentials still require a human in the loop on the shell channel.
  • No bypass of access controls. Geofencing, IP blocks, and bot protection are treated as terminal blockers — the agent does not rotate identity to evade them.
  • Exact filters may be unavailable on some sites. When a site lacks a control for an exact filter (e.g. price ceiling), the agent uses site sort controls (e.g. “sort by lowest price”) and applies the remaining filter in code. If neither is possible, it reports a blocker.
  • final_script.py may not always rerun from scratch. Some sites inject anti-replay tokens or randomize selectors per session. The eval suite tracks rerun success rate; the current target is for 70% of successful scripts to rerun cleanly once.
  • No PII or secret entry by default. The agent is instructed not to type secrets, credit card numbers, or PII into form fields unless the task explicitly provided them and HITL approves.
  • No automated purchases, submissions, or destructive actions. These require human confirmation through the shell HITL channel.

The agent reports a blocker only after gathering evidence (screenshot + log entry) of the obstacle. If you see a “success” reported without artifacts that prove the critical points, that’s a defect — see Inspecting failures.
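The exact-filter fallback in the list above (sort with site controls, then apply the remaining filter in code) can be sketched as follows. The Listing record and the cheapest_within function are hypothetical names chosen for illustration; a real script would fill the rows from whatever the page exposes.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    name: str
    price_usd: float
    refundable: bool


def cheapest_within(listings, price_ceiling, require_refundable=True):
    """Apply in code the constraints the site could not filter on, then pick the cheapest."""
    matches = [
        item
        for item in listings
        if item.price_usd <= price_ceiling
        and (item.refundable or not require_refundable)
    ]
    if not matches:
        return None  # nothing satisfies the constraints; the caller should not guess
    return min(matches, key=lambda item: item.price_usd)


rows = [
    Listing("A", 120.0, False),
    Listing("B", 150.0, True),
    Listing("C", 300.0, True),
]
best = cheapest_within(rows, price_ceiling=200.0)
```

With these sample rows, best is listing B: A is cheaper but not refundable, and C exceeds the ceiling.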

Inspecting failures

When a Webwright run fails or returns an unexpected answer, every artifact is browsable through the runner detail page. From the dashboard:

  1. Open Runners and pick the session.
  2. Open the Files tab.
  3. Navigate to /workspace/webwright_runs/<task_id>/.
  4. Read plan.md to understand what the agent set out to verify.
  5. Open final_runs/run_<n>/final_script_log.txt to see the action log and the final printed datum.
  6. Step through screenshots/ to confirm each critical point was visible on screen at runtime.
  7. If present, open self_reflect_result.json for the agent’s own per-checkpoint verdict.

Or, via API:

curl -sS "https://api.curate-me.ai/gateway/admin/runners/sessions/<session_id>/files?path=/workspace/webwright_runs" \
  -H "X-CM-API-Key: cm_sk_your_key_here"
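The same request can be issued from Python with only the standard library. This is a minimal sketch that mirrors the curl call above; the function names are illustrative, and because the response format is not described here, the body is returned as raw text rather than parsed.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.curate-me.ai/gateway/admin/runners/sessions"


def build_files_request(session_id, api_key, path="/workspace/webwright_runs"):
    """Build (but do not send) the GET request for a session's file listing."""
    query = urllib.parse.urlencode({"path": path})
    quoted_id = urllib.parse.quote(session_id, safe="")
    url = f"{BASE_URL}/{quoted_id}/files?{query}"
    return urllib.request.Request(url, headers={"X-CM-API-Key": api_key}, method="GET")


def fetch_listing(request, timeout=30.0):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

urlencode percent-encodes the slashes in the path value, which a standards-compliant server decodes to the same string the curl example sends.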

Support bundle fields

If you escalate to Curate-Me support, the support bundle includes the following Webwright-specific summary fields. Quote the task_id in your ticket — support can then pull these without asking for screen recordings:

| Field | Source | Purpose |
|---|---|---|
| template_id | autopilot run / runner template | webwright_agent or Webwright Browser Agent |
| tool_profile | runner session | Should always be browser_coding |
| image_ref | runner session | curate-me/openclaw-web:latest |
| artifact_root | runner workspace | /workspace/webwright_runs/<task_id>/ |
| final_run_id | bundle | The most recent run_<n> directory |
| screenshot_count | bundle | Count of PNGs in the final run |
| final_script_log_excerpt | bundle | Last ~200 lines of final_script_log.txt, secrets redacted |
| self_reflect_status | bundle, optional | Per-checkpoint verdict if self_reflect_result.json is present |
| redaction_applied | bundle | true if any tokens were redacted before storage |

The bundle is stored in runner_output_bundles keyed by session_id.
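For a local sanity check before filing a ticket, the bundle-sourced fields can be approximated from the workspace tree. The sketch below rests on several assumptions: "most recent" is taken to mean the highest run number, the redaction pattern is guessed from the cm_sk_ prefix used in the examples in this guide, and the function name is hypothetical. It is not the platform's own bundling or redaction logic.

```python
import re
from pathlib import Path

# Assumed key shape, based on the cm_sk_ prefix in the examples; not the platform's rule.
KEY_PATTERN = re.compile(r"cm_sk_[A-Za-z0-9_]+")


def summarize_final_run(task_root: Path, excerpt_lines: int = 200) -> dict:
    """Derive bundle-sourced summary fields from the highest-numbered run_<n> directory."""
    runs_dir = task_root / "final_runs"
    runs = []
    if runs_dir.is_dir():
        runs = [
            p for p in runs_dir.glob("run_*")
            if p.is_dir() and p.name[4:].isdigit()
        ]
    if not runs:
        raise FileNotFoundError("no run_<n> directories found")
    latest = max(runs, key=lambda p: int(p.name[4:]))  # numeric, so run_10 beats run_9

    log_path = latest / "final_script_log.txt"
    lines = log_path.read_text(encoding="utf-8").splitlines() if log_path.is_file() else []
    tail = "\n".join(lines[-excerpt_lines:])
    excerpt, hits = KEY_PATTERN.subn("[REDACTED]", tail)

    shots = latest / "screenshots"
    return {
        "final_run_id": latest.name,
        "screenshot_count": len(list(shots.glob("*.png"))) if shots.is_dir() else 0,
        "final_script_log_excerpt": excerpt,
        "redaction_applied": hits > 0,
    }
```

Comparing run numbers as integers avoids the lexical trap where run_9 would sort after run_10.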

Cost and time budgets

The preview is budget- and time-limited at the template level:

| Surface | Default cost cap | Default time cap |
|---|---|---|
| Managed runner template | $10 USD / session | 60 minutes |
| Autopilot webwright_agent (per worker) | $4 USD | Inherits autopilot task budget |

Org-level gateway budgets, plan entitlements, and HITL gates apply on top, exactly as for any other runner. See the Governance Chain and Cost Tracking guides.

Disabling the preview

Org admins can opt out at any time by flipping the webwright_preview feature flag for the org. While disabled:

  • The Webwright Browser Agent runner template is hidden from the template gallery.
  • New webwright_agent autopilot launches return 403 webwright_preview_disabled.
  • In-flight sessions are allowed to complete unless a separate security stop is required.

From the dashboard, open Settings → Feature Flags → Webwright Preview and toggle off. Programmatic toggle is available through the admin API:

curl -X POST https://api.curate-me.ai/api/v1/admin/feature-flags/webwright_preview \
  -H "Content-Type: application/json" \
  -H "X-CM-API-Key: cm_sk_your_admin_key" \
  -d '{"enabled": false}'

To re-enable, repeat with "enabled": true. The toggle is per-org — disabling for one org does not affect others.

Roadmap

Promotion criteria from preview to broader availability are documented in the implementation plan. Highlights:

  • Webwright matches or beats web_agent on structured browser tasks in the internal eval suite.
  • At least 80% of successful runs have a complete artifact bundle.
  • At least 70% of successful final_script.py snapshots rerun cleanly once.
  • No security regression in logs, egress, or credentials.
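The two rate-based criteria above reduce to simple arithmetic over the set of successful runs. A minimal sketch, assuming both rates use the number of successful runs as the denominator; the function name and the sample counts are made up for illustration and are not eval results.

```python
def promotion_metrics(successful_runs: int, complete_bundles: int, clean_reruns: int) -> dict:
    """Compute the bundle-completeness and rerun rates and compare them to the targets."""
    if successful_runs <= 0:
        raise ValueError("need at least one successful run")
    bundle_rate = complete_bundles / successful_runs
    rerun_rate = clean_reruns / successful_runs
    return {
        "bundle_rate": bundle_rate,
        "rerun_rate": rerun_rate,
        "bundle_ok": bundle_rate >= 0.80,  # at least 80% complete bundles
        "rerun_ok": rerun_rate >= 0.70,    # at least 70% clean reruns
    }


# Illustrative counts only: 50 successful runs, 42 complete bundles, 33 clean reruns.
sample = promotion_metrics(50, 42, 33)
```

In this made-up sample the bundle rate is 42/50 = 84%, which meets the target, while the rerun rate is 33/50 = 66%, which falls short of 70%.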

A reusable script registry — promote validated scripts to a shared library with source-site metadata, last-verified date, and a retirement policy — is on the roadmap after preview, not in v1.

Next steps