Files
openclaw/docs/tools/browser-control.md
Peter Steinberger ed8f50f240 refactor: simplify plugin dependency handling
Simplify plugin installation and runtime loading around package-manager-owned dependencies, with Jiti reserved for local/TS fallback paths.

Also scans npm plugin install roots so hoisted transitive dependencies are covered by dependency denylist and node_modules symlink checks.
2026-05-01 21:32:22 +01:00

381 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
summary: "OpenClaw browser control API, CLI reference, and scripting actions"
read_when:
- Scripting or debugging the agent browser via the local control API
- Looking for the `openclaw browser` CLI reference
- Adding custom browser automation with snapshots and refs
title: "Browser control API"
---
For setup, configuration, and troubleshooting, see [Browser](/tools/browser).
This page is the reference for the local control HTTP API, the `openclaw browser`
CLI, and scripting patterns (snapshots, refs, waits, debug flows).
## Control API (optional)
For local integrations only, the Gateway exposes a small loopback HTTP API:
- Status/start/stop: `GET /`, `POST /start`, `POST /stop`
- Tabs: `GET /tabs`, `POST /tabs/open`, `POST /tabs/focus`, `DELETE /tabs/:targetId`
- Snapshot/screenshot: `GET /snapshot`, `POST /screenshot`
- Actions: `POST /navigate`, `POST /act`
- Hooks: `POST /hooks/file-chooser`, `POST /hooks/dialog`
- Downloads: `POST /download`, `POST /wait/download`
- Permissions: `POST /permissions/grant`
- Debugging: `GET /console`, `POST /pdf`
- Debugging: `GET /errors`, `GET /requests`, `POST /trace/start`, `POST /trace/stop`, `POST /highlight`
- Network: `POST /response/body`
- State: `GET /cookies`, `POST /cookies/set`, `POST /cookies/clear`
- State: `GET /storage/:kind`, `POST /storage/:kind/set`, `POST /storage/:kind/clear`
- Settings: `POST /set/offline`, `POST /set/headers`, `POST /set/credentials`, `POST /set/geolocation`, `POST /set/media`, `POST /set/timezone`, `POST /set/locale`, `POST /set/device`
All endpoints accept `?profile=<name>`. `POST /start?headless=true` requests a
one-shot headless launch for local managed profiles without changing persisted
browser config; attach-only, remote CDP, and existing-session profiles reject
that override because OpenClaw does not launch those browser processes.
If shared-secret gateway auth is configured, browser HTTP routes require auth too:
- `Authorization: Bearer <gateway token>`
- `x-openclaw-password: <gateway password>` or HTTP Basic auth with that password
Notes:
- This standalone loopback browser API does **not** consume trusted-proxy or
Tailscale Serve identity headers.
- If `gateway.auth.mode` is `none` or `trusted-proxy`, these loopback browser
routes do not inherit those identity-bearing modes; keep them loopback-only.
### `/act` error contract
`POST /act` uses a structured error response for route-level validation and
policy failures:
```json
{ "error": "<message>", "code": "ACT_*" }
```
Current `code` values:
- `ACT_KIND_REQUIRED` (HTTP 400): `kind` is missing or unrecognized.
- `ACT_INVALID_REQUEST` (HTTP 400): action payload failed normalization or validation.
- `ACT_SELECTOR_UNSUPPORTED` (HTTP 400): `selector` was used with an unsupported action kind.
- `ACT_EVALUATE_DISABLED` (HTTP 403): `evaluate` (or `wait --fn`) is disabled by config.
- `ACT_TARGET_ID_MISMATCH` (HTTP 403): top-level or batched `targetId` conflicts with request target.
- `ACT_EXISTING_SESSION_UNSUPPORTED` (HTTP 501): action is not supported for existing-session profiles.
Other runtime failures may still return `{ "error": "<message>" }` without a
`code` field.
### Playwright requirement
Some features (navigate/act/AI snapshot/role snapshot, element screenshots,
PDF) require Playwright. If Playwright isnt installed, those endpoints return
a clear 501 error.
What still works without Playwright:
- ARIA snapshots
- Role-style accessibility snapshots (`--interactive`, `--compact`,
`--depth`, `--efficient`) when a per-tab CDP WebSocket is available. This is
a fallback for inspection and ref discovery; Playwright remains the primary
action engine.
- Page screenshots for the managed `openclaw` browser when a per-tab CDP
WebSocket is available
- Page screenshots for `existing-session` / Chrome MCP profiles
- `existing-session` ref-based screenshots (`--ref`) from snapshot output
What still needs Playwright:
- `navigate`
- `act`
- AI snapshots that depend on Playwright's native AI snapshot format
- CSS-selector element screenshots (`--element`)
- full browser PDF export
Element screenshots also reject `--full-page`; the route returns `fullPage is
not supported for element screenshots`.
If you see `Playwright is not available in this gateway build`, the packaged
Gateway is missing the core browser runtime dependency. Reinstall or update
OpenClaw, then restart the gateway. For Docker, also install the Chromium
browser binaries as shown below.
#### Docker Playwright install
If your Gateway runs in Docker, avoid `npx playwright` (npm override conflicts).
Use the bundled CLI instead:
```bash
docker compose run --rm openclaw-cli \
node /app/node_modules/playwright-core/cli.js install chromium
```
To persist browser downloads, set `PLAYWRIGHT_BROWSERS_PATH` (for example,
`/home/node/.cache/ms-playwright`) and make sure `/home/node` is persisted via
`OPENCLAW_HOME_VOLUME` or a bind mount. See [Docker](/install/docker).
## How it works (internal)
A small loopback control server accepts HTTP requests and connects to Chromium-based browsers via CDP. Advanced actions (click/type/snapshot/PDF) go through Playwright on top of CDP; when Playwright is missing, only non-Playwright operations are available. The agent sees one stable interface while local/remote browsers and profiles swap freely underneath.
## CLI quick reference
All commands accept `--browser-profile <name>` to target a specific profile, and `--json` for machine-readable output.
<AccordionGroup>
<Accordion title="Basics: status, tabs, open/focus/close">
```bash
openclaw browser status
openclaw browser start
openclaw browser start --headless # one-shot local managed headless launch
openclaw browser stop # also clears emulation on attach-only/remote CDP
openclaw browser tabs
openclaw browser tab # shortcut for current tab
openclaw browser tab new
openclaw browser tab select 2
openclaw browser tab close 2
openclaw browser open https://example.com
openclaw browser focus abcd1234
openclaw browser close abcd1234
```
</Accordion>
<Accordion title="Inspection: screenshot, snapshot, console, errors, requests">
```bash
openclaw browser screenshot
openclaw browser screenshot --full-page
openclaw browser screenshot --ref 12 # or --ref e12
openclaw browser screenshot --labels
openclaw browser snapshot
openclaw browser snapshot --format aria --limit 200
openclaw browser snapshot --interactive --compact --depth 6
openclaw browser snapshot --efficient
openclaw browser snapshot --labels
openclaw browser snapshot --urls
openclaw browser snapshot --selector "#main" --interactive
openclaw browser snapshot --frame "iframe#main" --interactive
openclaw browser console --level error
openclaw browser errors --clear
openclaw browser requests --filter api --clear
openclaw browser pdf
openclaw browser responsebody "**/api" --max-chars 5000
```
</Accordion>
<Accordion title="Actions: navigate, click, type, drag, wait, evaluate">
```bash
openclaw browser navigate https://example.com
openclaw browser resize 1280 720
openclaw browser click 12 --double # or e12 for role refs
openclaw browser click-coords 120 340 # viewport coordinates
openclaw browser type 23 "hello" --submit
openclaw browser press Enter
openclaw browser hover 44
openclaw browser scrollintoview e12
openclaw browser drag 10 11
openclaw browser select 9 OptionA OptionB
openclaw browser download e12 report.pdf
openclaw browser waitfordownload report.pdf
openclaw browser upload /tmp/openclaw/uploads/file.pdf
openclaw browser fill --fields '[{"ref":"1","type":"text","value":"Ada"}]'
openclaw browser dialog --accept
openclaw browser wait --text "Done"
openclaw browser wait "#main" --url "**/dash" --load networkidle --fn "window.ready===true"
openclaw browser evaluate --fn '(el) => el.textContent' --ref 7
openclaw browser highlight e12
openclaw browser trace start
openclaw browser trace stop
```
</Accordion>
<Accordion title="State: cookies, storage, offline, headers, geo, device">
```bash
openclaw browser cookies
openclaw browser cookies set session abc123 --url "https://example.com"
openclaw browser cookies clear
openclaw browser storage local get
openclaw browser storage local set theme dark
openclaw browser storage session clear
openclaw browser set offline on
openclaw browser set headers --headers-json '{"X-Debug":"1"}'
openclaw browser set credentials user pass # --clear to remove
openclaw browser set geo 37.7749 -122.4194 --origin "https://example.com"
openclaw browser set media dark
openclaw browser set timezone America/New_York
openclaw browser set locale en-US
openclaw browser set device "iPhone 14"
```
</Accordion>
</AccordionGroup>
Notes:
- `upload` and `dialog` are **arming** calls; run them before the click/press that triggers the chooser/dialog.
- `click`/`type`/etc require a `ref` from `snapshot` (numeric `12`, role ref `e12`, or actionable ARIA ref `ax12`). CSS selectors are intentionally not supported for actions. Use `click-coords` when the visible viewport position is the only reliable target.
- Download, trace, and upload paths are constrained to OpenClaw temp roots: `/tmp/openclaw{,/downloads,/uploads}` (fallback: `${os.tmpdir()}/openclaw/...`).
- `upload` can also set file inputs directly via `--input-ref` or `--element`.
Stable tab ids and labels survive Chromium raw-target replacement when OpenClaw
can prove the replacement tab, such as same URL or a single old tab becoming a
single new tab after form submission. Raw target ids are still volatile; prefer
`suggestedTargetId` from `tabs` in scripts.
Snapshot flags at a glance:
- `--format ai` (default with Playwright): AI snapshot with numeric refs (`aria-ref="<n>"`).
- `--format aria`: accessibility tree with `axN` refs. When Playwright is available, OpenClaw binds refs with backend DOM ids to the live page so follow-up actions can use them; otherwise treat the output as inspection-only.
- `--efficient` (or `--mode efficient`): compact role snapshot preset. Set `browser.snapshotDefaults.mode: "efficient"` to make this the default (see [Gateway configuration](/gateway/configuration-reference#browser)).
- `--interactive`, `--compact`, `--depth`, `--selector` force a role snapshot with `ref=e12` refs. `--frame "<iframe>"` scopes role snapshots to an iframe.
- `--labels` adds a viewport-only screenshot with overlayed ref labels (prints `MEDIA:<path>`).
- `--urls` appends discovered link destinations to AI snapshots.
## Snapshots and refs
OpenClaw supports two “snapshot” styles:
- **AI snapshot (numeric refs)**: `openclaw browser snapshot` (default; `--format ai`)
- Output: a text snapshot that includes numeric refs.
- Actions: `openclaw browser click 12`, `openclaw browser type 23 "hello"`.
- Internally, the ref is resolved via Playwrights `aria-ref`.
- **Role snapshot (role refs like `e12`)**: `openclaw browser snapshot --interactive` (or `--compact`, `--depth`, `--selector`, `--frame`)
- Output: a role-based list/tree with `[ref=e12]` (and optional `[nth=1]`).
- Actions: `openclaw browser click e12`, `openclaw browser highlight e12`.
- Internally, the ref is resolved via `getByRole(...)` (plus `nth()` for duplicates).
- Add `--labels` to include a viewport screenshot with overlayed `e12` labels.
- Add `--urls` when link text is ambiguous and the agent needs concrete
navigation targets.
- **ARIA snapshot (ARIA refs like `ax12`)**: `openclaw browser snapshot --format aria`
- Output: the accessibility tree as structured nodes.
- Actions: `openclaw browser click ax12` works when the snapshot path can bind
the ref through Playwright and Chrome backend DOM ids.
- If Playwright is unavailable, ARIA snapshots can still be useful for
inspection, but refs may not be actionable. Re-snapshot with `--format ai`
or `--interactive` when you need action refs.
- Docker proof for the raw-CDP fallback path: `pnpm test:docker:browser-cdp-snapshot`
starts Chromium with CDP, runs `browser doctor --deep`, and verifies role
snapshots include link URLs, cursor-promoted clickables, and iframe metadata.
Ref behavior:
- Refs are **not stable across navigations**; if something fails, re-run `snapshot` and use a fresh ref.
- `/act` returns the current raw `targetId` after action-triggered replacement
when it can prove the replacement tab. Keep using stable tab ids/labels for
follow-up commands.
- If the role snapshot was taken with `--frame`, role refs are scoped to that iframe until the next role snapshot.
- Unknown or stale `axN` refs fail fast instead of falling through to
Playwright's `aria-ref` selector. Run a fresh snapshot on the same tab when
that happens.
## Wait power-ups
You can wait on more than just time/text:
- Wait for URL (globs supported by Playwright):
- `openclaw browser wait --url "**/dash"`
- Wait for load state:
- `openclaw browser wait --load networkidle`
- Wait for a JS predicate:
- `openclaw browser wait --fn "window.ready===true"`
- Wait for a selector to become visible:
- `openclaw browser wait "#main"`
These can be combined:
```bash
openclaw browser wait "#main" \
--url "**/dash" \
--load networkidle \
--fn "window.ready===true" \
--timeout-ms 15000
```
## Debug workflows
When an action fails (e.g. “not visible”, “strict mode violation”, “covered”):
1. `openclaw browser snapshot --interactive`
2. Use `click <ref>` / `type <ref>` (prefer role refs in interactive mode)
3. If it still fails: `openclaw browser highlight <ref>` to see what Playwright is targeting
4. If the page behaves oddly:
- `openclaw browser errors --clear`
- `openclaw browser requests --filter api --clear`
5. For deep debugging: record a trace:
- `openclaw browser trace start`
- reproduce the issue
- `openclaw browser trace stop` (prints `TRACE:<path>`)
## JSON output
`--json` is for scripting and structured tooling.
Examples:
```bash
openclaw browser status --json
openclaw browser snapshot --interactive --json
openclaw browser requests --filter api --json
openclaw browser cookies --json
```
Role snapshots in JSON include `refs` plus a small `stats` block (lines/chars/refs/interactive) so tools can reason about payload size and density.
## State and environment knobs
These are useful for “make the site behave like X” workflows:
- Cookies: `cookies`, `cookies set`, `cookies clear`
- Storage: `storage local|session get|set|clear`
- Offline: `set offline on|off`
- Headers: `set headers --headers-json '{"X-Debug":"1"}'` (legacy `set headers --json '{"X-Debug":"1"}'` remains supported)
- HTTP basic auth: `set credentials user pass` (or `--clear`)
- Geolocation: `set geo <lat> <lon> --origin "https://example.com"` (or `--clear`)
- Media: `set media dark|light|no-preference|none`
- Timezone / locale: `set timezone ...`, `set locale ...`
- Device / viewport:
- `set device "iPhone 14"` (Playwright device presets)
- `set viewport 1280 720`
## Security and privacy
- The openclaw browser profile may contain logged-in sessions; treat it as sensitive.
- `browser act kind=evaluate` / `openclaw browser evaluate` and `wait --fn`
execute arbitrary JavaScript in the page context. Prompt injection can steer
this. Disable it with `browser.evaluateEnabled=false` if you do not need it.
- For logins and anti-bot notes (X/Twitter, etc.), see [Browser login + X/Twitter posting](/tools/browser-login).
- Keep the Gateway/node host private (loopback or tailnet-only).
- Remote CDP endpoints are powerful; tunnel and protect them.
Strict-mode example (block private/internal destinations by default):
```json5
{
browser: {
ssrfPolicy: {
dangerouslyAllowPrivateNetwork: false,
hostnameAllowlist: ["*.example.com", "example.com"],
allowedHostnames: ["localhost"], // optional exact allow
},
},
}
```
## Related
- [Browser](/tools/browser) — overview, configuration, profiles, security
- [Browser login](/tools/browser-login) — signing in to sites
- [Browser Linux troubleshooting](/tools/browser-linux-troubleshooting)
- [Browser WSL2 troubleshooting](/tools/browser-wsl2-windows-remote-cdp-troubleshooting)