UE 5.7 · Editor Plugin

WeiMCP — an AI control surface for Unreal Engine

A Large Language Model can drive the Unreal Editor as a teammate, not a script runner.

First-class chat, first-class terminal, first-class tools, first-class wire — engineered for low cost, low latency, and the multi-LLM reality of 2026.

01 · What WeiMCP is

WeiMCP is an editor plugin for Unreal Engine 5.7. It runs an in-process MCP (Model Context Protocol) server inside the editor and exposes 1,200+ first-class tools across 23 categories, covering authoring, materials, animation, world building, GAS, Niagara, sequencer, foliage, landscape, world partition, automation, AI/EQS, source control, and more.

An AI assistant — Claude, GPT, Gemini, Grok, a local model — connects over standard JSON-RPC, sees the catalog, and starts working. Inside the editor itself you also get a chat panel, an embedded Claude Code Terminal, a job queue, and a status bar that reports live state. It's the editor with an AI seat at the table, not a chat window glued to the side.

The wire stack and the tool runtime are the parts that took the engineering. Brief modes, deflate compression, schema trim, HTTP 304/ETag, auto-summarizer, result stash, and a split-mode worker pattern push per-session cost 3.18× lower than a naive MCP server and 2.04× lower than the typical Python-wrapper pattern, while keeping warm latency around 46 ms.

02 · Features

Everything below ships in the plugin. No external service, no companion process for the basics, no vendor lock-in: the LLM provider is swappable at runtime.

Tool catalog

1,200+ first-class tools, 23 categories

Authoring (actors, components, blueprints, materials, animations, sequences), pipeline (import, datatables, source control), world (landscape, foliage, world partition, streaming), systems (GAS, Niagara, AI/EQS, behavior trees, state trees), editor utilities (widgets, EUWs, MVVM, automation). Every tool runs inside an editor undo transaction.

Chat panel

Native AI chat docked in the editor

Talk to the active LLM right in the editor — no Alt-Tab to a browser, no copy-paste loop. Conversation context is local, history persists per project, and the panel can hand off prompts directly into the tool layer.

Claude Code Terminal

Embedded Claude Code Terminal tab

The Claude Code CLI runs inside the editor as a real terminal tab. The same agent that knows your codebase can now also drive the editor through the MCP socket — one assistant, two surfaces, zero context switch.

Multi-LLM runtime

Provider-agnostic by design

Anthropic (Haiku / Sonnet / Opus), OpenAI (GPT-5.4), Google (Gemini 3.1 Pro/Flash), xAI (Grok 4), DeepSeek (V3.2), and local Ollama all work. Swap providers at runtime — no recompile, no re-install. AnthropicCLI mode routes through a Claude Max subscription for a flat-rate ceiling.

Job queue + scheduler

Scheduled and event-triggered automation

The job queue panel runs tool sequences on a timer, on save, on package, or on custom editor events. Drop in a recipe — “every Monday rebuild HLODs and post a thumbnail diff” — and let it execute unattended. Composes any of the 1,200+ tools.
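
A recipe is a trigger plus an ordered tool sequence. The plugin's actual recipe schema isn't documented in this section, so the sketch below is a hypothetical shape with invented field and tool names, just to show what the Monday example could look like as data:

```python
# Hypothetical recipe shape -- field names and tool names are illustrative,
# not the plugin's actual schema.
hlod_monday = {
    "name": "weekly-hlod-rebuild",
    "trigger": {"type": "schedule", "cron": "0 6 * * MON"},  # every Monday, 06:00
    "steps": [
        {"tool": "worldpartition.build_hlods", "args": {"map": "/Game/Maps/Main"}},
        {"tool": "editor.capture_thumbnail_diff", "args": {"against": "last_run"}},
    ],
    "on_failure": "notify",  # surface in the status bar rather than retry
}
```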

Dynamic port fallback

Multiple editor instances, no conflict

The MCP server walks an offset window if its preferred port is in use, then publishes the bound port to the editor status bar. Two open projects on one machine, both with WeiMCP active, both addressable — the way the editor itself handles multi-instance.
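
Since the default port is 3000 (see Compatibility) and fallbacks sit in a small offset window, a client can find every live instance by probing that window. A minimal sketch, assuming a 10-port window:

```python
# Probe a small port window for live editor instances. The 10-port window size
# is an assumption; the plugin publishes its actual bound port in the status bar.
import socket

def find_open_ports(base_port: int = 3000, window: int = 10) -> list[int]:
    live = []
    for port in range(base_port, base_port + window):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(0.2)
            if s.connect_ex(("127.0.0.1", port)) == 0:  # 0 means something accepted
                live.append(port)
    return live
```

To tell which project answers on which port, send initialize to each and read the project identity it reports (see the Project-ID feature below).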

Permission gating

Per-tool authorization, opt-in destructive ops

Tools advertise their risk class (read / write / mutate-asset / mutate-source-control). The plugin can block, prompt, or auto-allow based on user policy. Source-control pushes, project-wide renames, and asset deletes default to prompt-on-call.
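
The policy itself is just a mapping from risk class to action. The plugin's real config format isn't shown here, so the following is a hypothetical sketch of the block / prompt / auto-allow idea:

```python
# Hypothetical policy shape -- the plugin's actual config format may differ.
POLICY = {
    "read": "allow",                    # listing actors, reading properties
    "write": "allow",                   # transient editor state
    "mutate-asset": "prompt",           # renames, deletes, imports
    "mutate-source-control": "prompt",  # pushes, checkouts
}

def gate(risk_class: str) -> str:
    # Unknown or unadvertised risk classes fail closed.
    return POLICY.get(risk_class, "block")
```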

HTTP/JSON-RPC

Plain MCP over HTTP — no shims

Streamable HTTP, JSON-RPC 2.0, the published MCP 2.0 wire format. Anything that speaks MCP works: the Anthropic SDK, the Claude Code CLI, OpenAI's tool-use, Continue.dev, custom agents. POST /mcp is all there is.
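
A complete round-trip needs nothing beyond an HTTP client. A minimal Python sketch; the initialize params follow the MCP spec, and the exact shape of the project fields in the response (next feature) is assumed rather than documented here:

```python
import requests

URL = "http://127.0.0.1:3000/mcp"  # default port; see dynamic fallback above

def rpc(method: str, params: dict, id_: int = 1) -> dict:
    r = requests.post(URL, json={"jsonrpc": "2.0", "id": id_,
                                 "method": method, "params": params})
    r.raise_for_status()
    return r.json()["result"]

info = rpc("initialize", {
    "protocolVersion": "2.0",  # illustrative version string
    "capabilities": {},
    "clientInfo": {"name": "my-agent", "version": "0.1"},
})
# Per the next feature, `info` carries the project name, root path, and
# engine version -- exact key names assumed.

tools = rpc("tools/list", {}, id_=2)  # the full 1,200+ tool catalog
```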

Project-ID on initialize

Server tells the client which editor it’s in

The initialize response carries the project name, root path, and engine version. No more “am I talking to GASP or to Meatbags?” — the agent knows on connect, and can route accordingly when several projects are open.

Status bar widget

Live state at a glance

Bound port, active provider, request count, last error, queue depth — all visible in the editor status bar. Click through for a session log of every tool call, ETag hit, and rate-limit decision.

Result stash

Re-fetch large results without re-running

Heavy responses are stashed by content hash. The agent can ask for the next page or the verbose follow-up without paying the original execution cost a second time.
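
From the agent's side the interaction is two calls, reusing the rpc helper from the JSON-RPC sketch above. The handle field and the follow-up method name are hypothetical, not the plugin's documented API:

```python
# Hypothetical stash flow: tools/call is standard MCP; "results/fetch" and the
# "stash" field are invented names standing in for the plugin's real ones.
result = rpc("tools/call", {"name": "level.list_actors", "arguments": {}}, id_=3)
if "stash" in result:           # response was trimmed (e.g. at the 16 KiB cap)
    handle = result["stash"]    # content-hash handle
    page2 = rpc("results/fetch", {"handle": handle, "page": 2}, id_=4)
```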

Editor-only by construction

Zero footprint in shipping builds

Module type is UncookedOnly. The MCP server, the chat panel, the terminal, the job queue — none of it is in the packaged game. WeiMCP is a workshop tool, not a runtime dependency.

03 · Efficiencies

The reason WeiMCP costs less to run than a naive MCP server isn't one big trick; it's eight small disciplines compounding through the request stack. Each was measured against the live tools/list payload and a representative session.

3.18× cheaper per session · vs naive MCP, same provider, same 1,200+ tool catalog
2.04× cheaper than Python wrappers · avoids the ~2,050 ms localhost penalty most Python MCP shims pay on Windows
63.5× auto-summarizer compression · heavy list responses collapse to a brief by default; verbose on request
76.3% wire compression · 68,820-byte tools/list → 16,317 bytes deflated, transparent to the client

How each one earns its line item

Brief mode default: every list-style tool returns a compact summary first (counts, names, paths); the agent asks for verbose only when it actually needs the detail. Cuts typical response payload by 60-95%, which directly cuts input-token cost on the next agent turn (cached-in or not).

Auto-summarizer: when a tool has to return a long list anyway (every actor in a level, every material in a project), the runtime collapses it into a structured digest with sample rows. 63.5× reduction on the heavy lists; the agent gets shape, distribution, and exemplars, usually enough to decide the next call.

Deflate wire compression: HTTP-level Content-Encoding: deflate on every response over a threshold, negotiated transparently. 76.3% on the worst-case tools/list (68,820 → 16,317 bytes); bandwidth cost matters for hosted editors and remote-dev rigs.

HTTP 304 / ETag: tools/list ships a strong ETag; if the catalog hasn't changed since the last fetch, the server returns 304 with no body. Zero-byte response on every reconnect after the first, and spec-compliant clients (most modern MCP clients) skip the 16 KB re-download entirely (see the client sketch after this list).

Schema-emitter trim: tool input schemas omit empty properties:{} and required:[] blocks at emit time. Shaves ~6% off the catalog payload; small per-tool, real across 1,200+ tools.

Result stash: heavy responses are content-hashed and held server-side for a short window. The agent can ask for "show me page 2" or "the full version of that" without re-running the tool, which is often the most expensive part.

Split-mode worker: brief responses run inline on the worker thread; only verbose runs marshal to the game thread for full editor-API access. The cheap path stays cheap and never blocks the editor frame; verbose gets the full editor surface when it actually needs it.

16 KiB cap on tool output: a hard ceiling on any single tool result. Over the cap, the runtime trims and inserts a stash handle. Predictable upper bound on input-token spend per turn; the price tag never surprises you.
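
The 304 path needs nothing special on the client; it is plain conditional HTTP. A minimal sketch (requests negotiates deflate via Accept-Encoding on its own; applying If-None-Match to the POSTed RPC is how this section describes the server behaving):

```python
import requests

URL = "http://127.0.0.1:3000/mcp"
body = {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}

first = requests.post(URL, json=body)
etag = first.headers.get("ETag")       # strong ETag on the catalog
catalog = first.json()["result"]

second = requests.post(URL, json=body, headers={"If-None-Match": etag})
if second.status_code == 304:          # catalog unchanged: zero-byte body
    pass                               # keep using the cached `catalog`
else:
    catalog = second.json()["result"]  # catalog changed: take the new one
```
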
Why this matters now

Token cost compounds across a workday. A studio running an AI-pair-programming workflow makes thousands of MCP calls per developer per week. Cutting payload at every layer of the stack — not just at the model — is how you keep the bill in the same band as the engineer’s coffee budget instead of their salary.

04 · Performance

Live measurements off a current build, talking to a running UE 5.7 editor over loopback. Numbers cover both cold-start and warm-cache paths, so you see the steady-state floor as well as the first-hit cost.

~46 ms warm round-trip · typical brief tool call, editor on same machine
16,317 B tools/list deflated · down from 68,820 bytes uncompressed (76.3% smaller)
1,200+ tools registered · across 23 categories, all undo-transacted
0 ms on warm 304 · unchanged catalog re-fetch returns nothing

Latency profile (warm, brief mode, single tool call)

WeiMCP (in-process) · ~46 ms
Naive MCP server · ~140 ms
Python wrapper (Windows) · ~2,050 ms

tools/list payload (1,200+ tool catalog)

Uncompressed JSON · 68,820 B
After schema-emitter trim · ~64,400 B
After deflate · 16,317 B
On warm 304 reconnect · 0 B

Throughput is bounded by the editor’s game-thread rate, which is exactly the right place for it to be bounded. The server itself can handle hundreds of concurrent brief calls per second; verbose calls queue against the editor frame so they never starve other editor work.

05 · Cost savings

All numbers below come from four progressively deeper cost analyses of the same plugin against the same editor, with current-as-of-2026 provider pricing. References at the bottom of the section.

Per-session cost (warm, one provider conversation, mixed brief + verbose)

Provider · Model · Input $/MTok · Output $/MTok · WeiMCP / session · Naive MCP / session · Savings
Anthropic · Haiku 4.5 · $1.00 · $5.00 · $0.0416 · $0.132 · 3.17×
Anthropic · Sonnet 4.6 · $3.00 · $15.00 · $0.1247 · $0.397 · 3.18×
Anthropic · Opus 4.7 · $5.00 · $25.00 · $0.2078 · $0.661 · 3.18×
OpenAI · GPT-5.5 · $5.00 · $30.00 · $0.2478 · $0.700 · 2.82×
Google · Gemini 3.1 Pro · $2.00 · $12.00 · $0.0931 · $0.272 · 2.92×
Google · Gemini 3.1 Flash · $0.30 · $2.50 · $0.0181 · $0.054 · 2.98×
xAI · Grok 4 · $3.00 · $15.00 · $0.1374 · $0.397 · 2.89×
DeepSeek · V3.2 · $0.14 · $1.10 · $0.0083 · $0.024 · 2.89×
Anthropic · Claude Max sub · flat $200/mo subscription, ~hundreds of sessions · ~$0.00 · n/a · unbounded

Per-session figures assume warm prompt-cache (10% effective input rate) — the discount the WeiMCP wire stack is specifically tuned to land. GPT-5.5 cache discount is comparable to Anthropic’s; output-token cost is what compresses the savings ratio relative to Sonnet.
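
Under that assumption the per-session figure reduces to one formula, with $T_{\text{in}}, T_{\text{out}}$ the session's token counts (not published in this section) and $P_{\text{in}}, P_{\text{out}}$ the $/MTok prices from the table:

$$C_{\text{session}} = \frac{0.10 \cdot T_{\text{in}} \cdot P_{\text{in}} + T_{\text{out}} \cdot P_{\text{out}}}{10^{6}}$$

The 0.10 factor is the warm-cache effective input rate stated above.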

Annual savings vs the same workflow on a naive MCP server

Workflow · Sessions / year · Sonnet baseline · Opus baseline · Haiku baseline
Solo developer · ~30,000 · $8,208 · $10,476 · $6,084
Pro / small team (~5 seats) · ~150,000 · $50,396 · $62,340 · $39,720
Automation / agentic farm · ~2,000,000 · $697,975 · $921,400 · $475,200

How fast it pays for itself — a worked example

Take Sonnet 4.6 as the realistic mid-line model. WeiMCP costs $0.1247 per session warm; the naive-MCP baseline costs $0.397, a savings of $0.272 per session. That is the only number doing work below.

Profile · Sessions / day · Daily savings · Weekly savings · Annualized
Solo developer, light AI use · ~30 · $8.16 · $57 · $2,975
Solo developer, daily AI pair-programmer · ~80 · $21.76 · $152 · $8,208
Five-seat studio, normal AI workflow · ~400 · $108.80 · $762 · $50,396
Automation farm, 24/7 agentic work · ~5,500 · $1,496 · $10,470 · $697,975
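
Every cell in the daily column is the same multiplication; a short check:

```python
# Reproduce the "Daily savings" column from the single per-session number.
DELTA = 0.272  # $0.397 (naive) - $0.1247 (WeiMCP), rounded as in the text

profiles = {
    "solo, light AI use": 30,
    "solo, daily AI pair-programmer": 80,
    "five-seat studio": 400,
    "automation farm": 5500,
}
for label, sessions_per_day in profiles.items():
    print(f"{label}: ${sessions_per_day * DELTA:,.2f} per day")
# -> $8.16, $21.76, $108.80, $1,496.00, matching the table above
```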

Most paid editor plugins in this category sell in the $400-$800 band. Match WeiMCP’s feature surface against any plugin in that band, and the token savings alone retire the purchase price inside the first month for a busy solo developer, inside the first week for a five-seat studio, and inside the first day for an automation farm. That is before you count the latency, the embedded terminal, the chat panel, the job queue, or the multi-LLM swap — all of which a Python-wrapper MCP would not give you at any price.

Source documents

Four payback analyses were produced across the development of WeiMCP. Each measured the same plugin against the same editor; what changed is the framing question.

PAYBACK · v1
Phase 4 cost study
First end-to-end measurement of the wire savings against a freshly-instrumented build. Established the 3.18× baseline.
PAYBACK · v2
Sonnet 4.6 baseline
Per-session economics at $3/$15 with cache-aware reading; produced the $8,208 / $50,396 / $697,975 annual band.
PAYBACK · v3
Opus 4.7 frontier model
Per-session economics at the top of the Anthropic line. Establishes that even on the most expensive model, the wire savings still land at the same 3.18× ratio.
PAYBACK · v4
Side-by-side by provider
Anthropic / OpenAI / Google / xAI / DeepSeek / Claude Max in one frame. Surfaces which optimizations land hardest on which billing model.

06 · Compatibility

Engine
  • Unreal Engine 5.7 (primary)
  • 5.6 supported with minor adjustments
  • Editor-only — zero runtime footprint
  • Module type: UncookedOnly
Platforms
  • Windows 10 / 11 (primary)
  • macOS (in-development)
  • Linux editor (untested but architecturally clean)
LLM providers
  • Anthropic (Haiku, Sonnet, Opus)
  • OpenAI (GPT-5.4)
  • Google (Gemini 3.1 Pro / Flash)
  • xAI (Grok 4)
  • DeepSeek (V3.2)
  • Local Ollama (Llama, Qwen, etc.)
  • AnthropicCLI (Claude Max subscription)
Wire
  • MCP 2.0 (Streamable HTTP)
  • JSON-RPC 2.0
  • HTTP/1.1 with deflate
  • Default port 3000, dynamic fallback
  • Optional Bearer auth
  • Loopback by default; bind override gated

Anything that speaks MCP works without modification — the Anthropic SDK, Claude Code CLI, OpenAI’s tool-use, Continue.dev, custom agents, the upcoming Linux Foundation MCP runtimes. There is no proprietary client, no SaaS dependency, no phone-home.