openclaw/docs/concepts/queue.md
2026-04-30 01:22:43 +01:00

summary: Auto-reply queue modes, defaults, and per-session overrides
read_when:
  • Changing auto-reply execution or concurrency
  • Explaining /queue modes or message steering behavior
Command queue

We serialize inbound auto-reply runs (all channels) through a tiny in-process queue to prevent multiple agent runs from colliding, while still allowing safe parallelism across sessions.

Why

  • Auto-reply runs can be expensive (LLM calls) and can collide when multiple inbound messages arrive close together.
  • Serializing avoids competing for shared resources (session files, logs, CLI stdin) and reduces the chance of upstream rate limits.

How it works

  • A lane-aware FIFO queue drains each lane with a configurable concurrency cap (default 1 for unconfigured lanes; main defaults to 4, subagent to 8).
  • runEmbeddedPiAgent enqueues by session key (lane session:<key>) to guarantee only one active run per session.
  • Each session run is then queued into a global lane (main by default) so overall parallelism is capped by agents.defaults.maxConcurrent.
  • When verbose logging is enabled, queued runs emit a short notice if they waited more than ~2s before starting.
  • Typing indicators still fire immediately on enqueue (when supported by the channel) so user experience is unchanged while we wait our turn.
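
As a rough illustration of the lane model described above, the following is a minimal sketch of a lane-aware FIFO queue built only on promises. The names LaneQueue, setConcurrency, enqueue, and runForSession are hypothetical and are not OpenClaw's actual API; only the lane names (session:<key>, main) and the default cap of 1 for unconfigured lanes come from this page.

```typescript
// Hypothetical sketch: class and method names are illustrative, not OpenClaw's API.
type Task<T> = () => Promise<T>;

interface Lane {
  maxConcurrent: number;
  active: number;
  pending: Array<() => void>; // FIFO list of "start this task" callbacks
}

class LaneQueue {
  private lanes = new Map<string, Lane>();

  // Lanes are created on first use; unconfigured lanes get a cap of 1.
  private lane(name: string): Lane {
    let lane = this.lanes.get(name);
    if (!lane) {
      lane = { maxConcurrent: 1, active: 0, pending: [] };
      this.lanes.set(name, lane);
    }
    return lane;
  }

  setConcurrency(name: string, maxConcurrent: number): void {
    const lane = this.lane(name);
    lane.maxConcurrent = Math.max(1, Math.floor(maxConcurrent));
    this.drain(lane);
  }

  enqueue<T>(name: string, task: Task<T>): Promise<T> {
    const lane = this.lane(name);
    return new Promise<T>((resolve, reject) => {
      lane.pending.push(() => {
        lane.active += 1;
        Promise.resolve()
          .then(task)
          .then(resolve, reject)
          .then(() => {
            // Runs after success or failure: free the slot and start the next task.
            lane.active -= 1;
            this.drain(lane);
          });
      });
      this.drain(lane);
    });
  }

  // Start pending tasks in arrival order while the lane has free slots.
  private drain(lane: Lane): void {
    while (lane.active < lane.maxConcurrent && lane.pending.length > 0) {
      const start = lane.pending.shift()!;
      start();
    }
  }
}

// One active run per session first, then a slot in the global "main" lane.
function runForSession<T>(queue: LaneQueue, sessionKey: string, run: Task<T>): Promise<T> {
  return queue.enqueue(`session:${sessionKey}`, () => queue.enqueue("main", run));
}
```

In this sketch, two runs for the same session key never overlap, while runs for different sessions proceed in parallel up to whatever cap was set on main with setConcurrency.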

Defaults

When unset, all inbound channel surfaces use:

  • mode: "steer"
  • debounceMs: 500
  • cap: 20
  • drop: "summarize"

steer is the default because it keeps the active model turn responsive without starting a second session run. It drains all steering messages that arrived before the next model boundary. If the current run cannot accept steering, OpenClaw falls back to a followup queue entry.

Queue modes

Inbound messages can steer the current run, wait for a followup turn, or do both:

  • steer: queue steering messages into the active runtime. Pi delivers all pending steering messages after the current assistant turn finishes executing its tool calls, before the next LLM call; Codex app-server receives one batched turn/steer. If the run is not actively streaming or steering is unavailable, OpenClaw falls back to a followup queue entry.
  • queue (legacy): old one-at-a-time steering. Pi delivers one queued steering message at each model boundary; Codex app-server receives separate turn/steer requests. Prefer steer unless you need the previous serialized behavior.
  • followup: enqueue each message for a later agent turn after the current run ends.
  • collect: coalesce queued messages into a single followup turn after the quiet window. If messages target different channels/threads, they drain individually to preserve routing.
  • steer-backlog (aka steer+backlog): steer now and preserve the same message for a followup turn.
  • interrupt (legacy): abort the active run for that session, then run the newest message.

steer-backlog can produce a followup response after the steered run, so on streaming surfaces the two responses can look like duplicates. Prefer collect or steer if you want one response per inbound message.
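
The mode list above can be summarized as a small decision function. This is a sketch only: the Action type, planActions, and the canSteer flag are hypothetical names, and the steer-backlog behavior when steering is unavailable is an assumption (the page only documents the fallback for steer and legacy queue).

```typescript
// Illustrative only: type and function names are assumptions, not OpenClaw's API.
type QueueMode = "steer" | "queue" | "followup" | "collect" | "steer-backlog" | "interrupt";
type Action = "steer" | "followup" | "collect" | "abort-and-run";

// canSteer: the run is actively streaming and the runtime accepts steering.
function planActions(mode: QueueMode, canSteer: boolean): Action[] {
  switch (mode) {
    case "steer":
    case "queue":
      // Both steer into the active run (batched vs. one at a time, not modeled here)
      // and fall back to a followup queue entry when steering is unavailable.
      return canSteer ? ["steer"] : ["followup"];
    case "followup":
      return ["followup"];
    case "collect":
      return ["collect"];
    case "steer-backlog":
      // Assumption: without steering, only the followup copy remains.
      return canSteer ? ["steer", "followup"] : ["followup"];
    case "interrupt":
      return ["abort-and-run"];
  }
}
```

For example, planActions("steer-backlog", true) yields both a steer and a followup, which is why that mode can produce two visible responses.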

For runtime-specific timing and dependency behavior, see Steering queue.

Configure globally or per channel via messages.queue:

{
  messages: {
    queue: {
      mode: "steer",
      debounceMs: 500,
      cap: 20,
      drop: "summarize",
      byChannel: { discord: "collect" },
    },
  },
}

Queue options

Options apply to followup, collect, and steer-backlog (and to steer or legacy queue when steering falls back to followup):

  • debounceMs: quiet window before draining queued followups. Bare numbers are milliseconds; units ms, s, m, h, and d are accepted by /queue options.
  • cap: max queued messages per session. Values below 1 are ignored.
  • drop: "summarize": default. Drop the oldest queued entries as needed, keep compact summaries, and inject them as a synthetic followup prompt.
  • drop: "old": drop the oldest queued entries as needed, without preserving summaries.
  • drop: "new": reject the newest message when the queue is already full.

Defaults: debounceMs: 500, cap: 20, drop: summarize.
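
To make the cap and drop policies concrete, here is a minimal sketch of how an overflow could be handled. FollowupQueue, enqueueWithCap, and summarizeLine are hypothetical helpers; the summary format and the choice to fall back to the built-in default of 20 when cap is below 1 are assumptions, since this page only says such values are ignored.

```typescript
// Hypothetical sketch: the data shapes and helper names are assumptions.
type DropPolicy = "summarize" | "old" | "new";

interface FollowupQueue {
  items: string[];            // queued message texts, oldest first
  droppedSummaries: string[]; // compact summaries of dropped entries
}

const DEFAULT_CAP = 20;

// Returns false when the newest message is rejected (drop: "new" on a full queue).
function enqueueWithCap(q: FollowupQueue, message: string, cap: number, drop: DropPolicy): boolean {
  // Assumption: a cap below 1 is ignored in favor of the built-in default.
  const effectiveCap = cap >= 1 ? Math.floor(cap) : DEFAULT_CAP;
  if (drop === "new" && q.items.length >= effectiveCap) {
    return false;
  }
  q.items.push(message);
  while (q.items.length > effectiveCap) {
    const oldest = q.items.shift()!;
    if (drop === "summarize") {
      q.droppedSummaries.push(summarizeLine(oldest));
    }
    // drop === "old": the oldest entry is discarded without a summary.
  }
  return true;
}

// Placeholder summarizer: collapse whitespace and truncate to maxLen characters.
function summarizeLine(text: string, maxLen = 60): string {
  const flat = text.replace(/\s+/g, " ").trim();
  return flat.length <= maxLen ? flat : flat.slice(0, maxLen - 1) + "…";
}
```

How the collected summaries are turned into the synthetic followup prompt is not shown here, because this page does not specify that format.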

Precedence

For mode selection, OpenClaw resolves:

  1. Inline or stored per-session /queue override.
  2. messages.queue.byChannel.<channel>.
  3. messages.queue.mode.
  4. Default steer.

For options, inline or stored /queue options win over config. After that, OpenClaw applies channel-specific debounce (messages.queue.debounceMsByChannel), plugin debounce defaults, global messages.queue options, and built-in defaults, in that order. cap and drop are global/session options, not per-channel config keys.
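
The precedence rules can be sketched as a chain of fallbacks. The function names, the QueueConfig shape, and the DebounceSources fields below are hypothetical; the debounce order assumes the sources listed above are applied from highest to lowest priority.

```typescript
// Illustrative sketch: names and shapes are assumptions, not OpenClaw's API.
type QueueMode = "steer" | "queue" | "followup" | "collect" | "steer-backlog" | "interrupt";

interface QueueConfig {
  mode?: QueueMode;
  byChannel?: Record<string, QueueMode>;
}

// Mode: session override, then per-channel config, then global config, then "steer".
function resolveMode(sessionOverride: QueueMode | undefined, config: QueueConfig, channel: string): QueueMode {
  return sessionOverride ?? config.byChannel?.[channel] ?? config.mode ?? "steer";
}

interface DebounceSources {
  session?: number;       // inline or stored /queue option
  channelConfig?: number; // messages.queue.debounceMsByChannel entry
  pluginDefault?: number; // plugin debounce default
  globalConfig?: number;  // messages.queue.debounceMs
}

// Debounce: first defined source wins, falling back to the built-in 500 ms.
function resolveDebounceMs(src: DebounceSources): number {
  return src.session ?? src.channelConfig ?? src.pluginDefault ?? src.globalConfig ?? 500;
}
```

With the config example from this page (mode "steer", byChannel discord set to "collect"), resolveMode returns "collect" for discord and "steer" elsewhere unless the session has stored an override.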

Per-session overrides

  • Send /queue <mode> as a standalone command to store the mode for the current session.
  • Options can be combined: /queue collect debounce:0.5s cap:25 drop:summarize
  • /queue default or /queue reset clears the session override.
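
For illustration, a standalone /queue command such as the one above could be parsed along these lines. This is a sketch under assumptions: parseQueueCommand, parseDurationMs, and the QueueCommand shape are hypothetical, and silently ignoring unknown tokens is a choice made here, not documented behavior. The duration units (ms, s, m, h, d, with bare numbers as milliseconds) follow the Queue options section.

```typescript
// Hypothetical parser sketch; not OpenClaw's actual command handling.
const UNIT_MS: Record<string, number> = { ms: 1, s: 1000, m: 60000, h: 3600000, d: 86400000 };
const MODES = new Set(["steer", "queue", "followup", "collect", "steer-backlog", "steer+backlog", "interrupt"]);
const DROPS = new Set(["summarize", "old", "new"]);

interface QueueCommand {
  reset: boolean; // true for "/queue default" or "/queue reset"
  mode?: string;
  debounceMs?: number;
  cap?: number;
  drop?: string;
}

// Bare numbers are milliseconds; otherwise a unit suffix scales the value.
function parseDurationMs(text: string): number | undefined {
  const match = /^(\d+(?:\.\d+)?)(ms|s|m|h|d)?$/.exec(text.trim());
  if (!match) return undefined;
  return Math.round(Number(match[1]) * UNIT_MS[match[2] ?? "ms"]);
}

function parseQueueCommand(line: string): QueueCommand | undefined {
  const tokens = line.trim().split(/\s+/);
  if (tokens[0] !== "/queue") return undefined;
  const cmd: QueueCommand = { reset: false };
  for (const token of tokens.slice(1)) {
    const sep = token.indexOf(":");
    if (sep === -1) {
      if (token === "default" || token === "reset") cmd.reset = true;
      else if (MODES.has(token)) cmd.mode = token === "steer+backlog" ? "steer-backlog" : token;
      continue; // unknown bare tokens are ignored in this sketch
    }
    const key = token.slice(0, sep);
    const value = token.slice(sep + 1);
    if (key === "debounce") {
      cmd.debounceMs = parseDurationMs(value);
    } else if (key === "cap") {
      const n = Number(value);
      if (Number.isInteger(n) && n >= 1) cmd.cap = n; // values below 1 are ignored
    } else if (key === "drop" && DROPS.has(value)) {
      cmd.drop = value;
    }
  }
  return cmd;
}
```

Under these assumptions, the example command /queue collect debounce:0.5s cap:25 drop:summarize parses to mode collect with a 500 ms debounce, a cap of 25, and the summarize drop policy.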

Scope and guarantees

  • Applies to auto-reply agent runs across all inbound channels that use the gateway reply pipeline (WhatsApp web, Telegram, Slack, Discord, Signal, iMessage, webchat, etc.).
  • Default lane (main) is process-wide for inbound + main heartbeats; set agents.defaults.maxConcurrent to allow multiple sessions in parallel.
  • Additional lanes may exist (e.g. cron, cron-nested, nested, subagent) so background jobs can run in parallel without blocking inbound replies. Isolated cron agent turns hold a cron slot while their inner agent execution uses cron-nested; both use cron.maxConcurrentRuns. Shared non-cron nested flows keep their own lane behavior. These detached runs are tracked as background tasks.
  • Per-session lanes guarantee that only one agent run touches a given session at a time.
  • No external dependencies or background worker threads; pure TypeScript + promises.

Troubleshooting

  • If commands seem stuck, enable verbose logs and look for “queued for …ms” lines to confirm the queue is draining.
  • If you need queue depth, enable verbose logs and watch for queue timing lines.
  • When diagnostics are enabled, sessions that remain in processing past diagnostics.stuckSessionWarnMs log a stuck-session warning. Active embedded runs, active reply operations, and active lane tasks remain warning-only by default; stale startup bookkeeping with no active session work can release the affected session lane so queued work drains.