PPT Agent Capability Gating: Stop Blind Retries

Over the past week, OUTBIRD's PPT add-in and AI Agent received important updates. The core improvement: the PPT Agent now detects the host's capability boundaries and plans and executes accordingly ("know what you can, avoid what you can't"), instead of blindly attempting API calls that are doomed to fail.

The Problem

PowerPoint Office.js Add-in APIs are not uniform across platforms and versions. Microsoft uses requirement sets (e.g. PowerPointApi 1.4, 1.5, 1.10) to indicate which APIs a given host supports:

  • Web / Windows M365: typically up to 1.10
  • Windows retail 2021+: up to 1.9
  • Windows LTSC 2024: only 1.5
  • iPad: only 1.1

If the Agent calls set_slide_background_color (which depends on the 1.10 SlideBackgroundFill model) on iPad, the call fails, yet the Agent may still optimistically report completion. This "try and see" behavior erodes user trust.

Solution: Capability Matrix + Tool Gating

We did three things:

1. Tool-to-Requirement-Set Mapping

In claude-agent-ppt.js we maintain a TOOL_MIN_SET_POLICY table that maps each tool to the minimum PowerPointApi requirement set it needs:

  • Shape ops like insert_textbox, set_shape_fill_color: ≥ 1.4
  • set_selected_text: ≥ 1.5 (selection APIs)
  • set_slide_background_color: ≥ 1.10 (new background model)

Before emitting tool calls, the Planner checks each tool's minimum set against the host's isSetSupported("PowerPointApi", "x.y") results and emits only the tools the host supports.
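To make the gating concrete, here is a minimal TypeScript sketch of how such a policy table and Planner-side filter could look. The tool names and minimum sets come from the list above; the object shape, `compareVersions`, and `filterToolsForHost` are hypothetical illustrations, not the actual code in claude-agent-ppt.js. It assumes the host's support can be summarized by its highest PowerPointApi set, which fits how `isSetSupported(name, minVersion)` treats its version argument as a minimum.

```typescript
// Hypothetical encoding of TOOL_MIN_SET_POLICY: tool name -> minimum
// PowerPointApi set. Tool names and sets are from the post; the structure
// and helper names are assumptions for illustration.
const TOOL_MIN_SET_POLICY: Readonly<Record<string, string>> = {
  insert_textbox: "1.4",
  set_shape_fill_color: "1.4",
  set_selected_text: "1.5",
  set_slide_background_color: "1.10",
};

// Compare "major.minor" versions numerically. A plain string comparison
// would rank "1.10" below "1.9".
function compareVersions(a: string, b: string): number {
  const [aMajor, aMinor = 0] = a.split(".").map(Number);
  const [bMajor, bMinor = 0] = b.split(".").map(Number);
  return aMajor !== bMajor ? aMajor - bMajor : aMinor - bMinor;
}

// Planner-side filter: keep a tool only if the host's highest supported
// PowerPointApi set meets the tool's minimum. Tools without a policy entry
// are kept here; a stricter planner might drop them instead.
function filterToolsForHost(tools: string[], hostMaxSet: string): string[] {
  return tools.filter((tool) => {
    const minSet: string | undefined = TOOL_MIN_SET_POLICY[tool];
    return minSet === undefined || compareVersions(hostMaxSet, minSet) >= 0;
  });
}

const allTools = Object.keys(TOOL_MIN_SET_POLICY);
console.log(filterToolsForHost(allTools, "1.9"));
// → ['insert_textbox', 'set_shape_fill_color', 'set_selected_text']
```

The numeric comparison is the one detail that is easy to get wrong: with 1.10 in play, comparing version strings lexically would silently hide the newest tools from the most capable hosts.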

2. Runtime Capability Probing

On each Agent turn, the frontend probes support for PowerPointApi 1.1–1.10 via Office.context.requirements.isSetSupported() and writes the results into slide_context.capabilities, which is sent with the request. The backend Planner filters tools accordingly, and the Executor re-checks support before execution, returning an explicit "unsupported by host" error instead of optimistic success.
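The probe-then-recheck flow could be sketched roughly as follows. In the add-in, the probe function would wrap `Office.context.requirements.isSetSupported(name, version)`; it is injected here so the sketch runs outside Office. The `Capabilities` shape, `probeCapabilities`, `executeTool`, and the error wording are assumptions, not the actual slide_context schema or Executor code.

```typescript
// In the add-in this would be:
//   (name, version) => Office.context.requirements.isSetSupported(name, version)
type IsSetSupported = (name: string, version: string) => boolean;

// Hypothetical shape for slide_context.capabilities.
interface Capabilities {
  supportedSets: string[]; // e.g. ["1.1", "1.2", ..., "1.5"]
}

// PowerPointApi 1.1 through 1.10, the range probed on each Agent turn.
const POWERPOINT_SETS = Array.from({ length: 10 }, (_, i) => `1.${i + 1}`);

// Frontend: probe every set once per turn and record which ones the host supports.
function probeCapabilities(isSetSupported: IsSetSupported): Capabilities {
  return {
    supportedSets: POWERPOINT_SETS.filter((v) => isSetSupported("PowerPointApi", v)),
  };
}

// Hypothetical Executor result: success, or an explicit refusal.
type ExecResult =
  | { ok: true; tool: string }
  | { ok: false; tool: string; error: string };

// Executor: re-check the tool's minimum set before running the mutation,
// and refuse explicitly instead of reporting optimistic success.
async function executeTool(
  tool: string,
  minSet: string,
  caps: Capabilities,
  run: () => Promise<void>,
): Promise<ExecResult> {
  if (!caps.supportedSets.includes(minSet)) {
    return { ok: false, tool, error: `unsupported by host: ${tool} requires PowerPointApi ${minSet}` };
  }
  await run();
  return { ok: true, tool };
}

// Usage with a simulated host that supports up to PowerPointApi 1.5.
const hostUpTo15: IsSetSupported = (name, v) =>
  name === "PowerPointApi" && Number(v.split(".")[1]) <= 5;
const caps = probeCapabilities(hostUpTo15);
console.log(caps.supportedSets); // → ['1.1', '1.2', '1.3', '1.4', '1.5']
executeTool("set_slide_background_color", "1.10", caps, async () => {})
  .then((r) => console.log(r.ok ? "ran" : r.error));
```

Re-checking in the Executor is deliberately redundant with the Planner filter: if the Planner ever emits a tool it should not have, the user sees a clear refusal rather than a false "done".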

3. Visual Verification Order

Previously we ran visual verification before the Office mutations, which produced false negatives ("operation not done yet"). We reversed the order: run Office mutations first, then verify visuals. Screenshots/OCR are used for confirmation only after the operation has completed, which avoids bogus completion reports.
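A minimal sketch of the corrected ordering, assuming the mutation and verification steps are async callbacks. In the add-in the mutation would be a `PowerPoint.run(...)` batch ending in `context.sync()`, and verification would be the screenshot/OCR check; `mutateThenVerify` and the result shape are hypothetical names used only for illustration.

```typescript
type VerifyResult = { passed: boolean; detail: string };

// Hypothetical helper enforcing "mutate first, verify second".
async function mutateThenVerify(
  mutate: () => Promise<void>,
  verify: () => Promise<VerifyResult>,
): Promise<{ status: "done" | "failed"; detail: string }> {
  // 1. Apply the Office mutation and wait until it has fully finished.
  try {
    await mutate();
  } catch (err) {
    // Surface the mutation failure instead of claiming completion.
    return { status: "failed", detail: `mutation error: ${String(err)}` };
  }
  // 2. Only now inspect the rendered slide, so the check sees the new state.
  const result = await verify();
  return result.passed
    ? { status: "done", detail: result.detail }
    : { status: "failed", detail: `verification failed: ${result.detail}` };
}

// Usage with a simulated slide background instead of a real slide.
let background = "#FFFFFF";
mutateThenVerify(
  async () => { background = "#003366"; },
  async () => ({ passed: background === "#003366", detail: `background is ${background}` }),
).then((r) => console.log(r.status)); // → done
```

Because verification awaits the finished mutation, a "failed" status now means the change genuinely did not take effect, not that the check simply ran too early.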

Related Work: Payment, Shadow Debugging, Docs

Also shipped:

  • Real payment gateway: Removed fallback mocks; unified WeChat Pay; min recharge 0.1 CNY; WECHAT_PAY_NATIVE_APP_ID config
  • Shadow debugging: Dropped local-mac-shadow; unified manifest.shadow.xml; PPT_TEST_* auto-login; taskpane-standalone entry
  • Orchestrator cwd overrides: OUTBIRD_EMPLOYEE_CWD_OVERRIDES per virtual employee
  • Office.js capability research: docs/ops/ppt-officejs-capability-research-2026-03.md documents PowerPoint requirement sets 1.1–1.10, platform support, and tool-to-set mapping for future work

Takeaways

  1. Capability awareness over optimistic retry: In cross-platform Add-ins, probe first, then call. This means fewer invalid attempts and less user confusion.
  2. Docs drive engineering: Writing the requirement-set-to-tool mapping down as docs helps team alignment and gives CI regression tests a spec (e.g. set_slide_background_color should succeed on 1.10 hosts and return "unsupported by host" on hosts below 1.10).
  3. Visual verification timing matters: Mutation first, verification second. Only then does a completion report reflect whether the operation actually took effect.
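The CI regression idea from takeaway 2 could be expressed as a small table-driven check, sketched below. Host profiles mirror the platform list earlier in this post; `gateTool`, the minor-version encoding (PowerPointApi 1.x only), and the result strings are assumptions for illustration, not OUTBIRD's actual test suite.

```typescript
type GateResult = "ok" | "unsupported by host";

// Minimum PowerPointApi minor version per tool (1.x), from the documented mapping.
const TOOL_MIN_MINOR: Record<string, number> = {
  insert_textbox: 4,
  set_selected_text: 5,
  set_slide_background_color: 10,
};

// Hypothetical gate: a tool runs only if the host's highest 1.x set meets its minimum.
function gateTool(tool: string, hostMaxMinor: number): GateResult {
  return hostMaxMinor >= TOOL_MIN_MINOR[tool] ? "ok" : "unsupported by host";
}

// Expected outcomes are written out explicitly, so changing the policy
// (e.g. lowering the background tool to 1.9) makes this check fail.
const cases: { host: string; maxMinor: number; expected: GateResult }[] = [
  { host: "Web / Windows M365", maxMinor: 10, expected: "ok" },
  { host: "Windows retail 2021+", maxMinor: 9, expected: "unsupported by host" },
  { host: "Windows LTSC 2024", maxMinor: 5, expected: "unsupported by host" },
  { host: "iPad", maxMinor: 1, expected: "unsupported by host" },
];

const failures: string[] = [];
for (const { host, maxMinor, expected } of cases) {
  const actual = gateTool("set_slide_background_color", maxMinor);
  if (actual !== expected) failures.push(`${host}: expected ${expected}, got ${actual}`);
}
console.log(failures.length === 0 ? "all host profiles pass" : failures.join("\n"));
// → all host profiles pass
```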

If you're building Office Add-ins or integrating AI Agents with host APIs, let's connect.
