Cloudflare Outlines MCP Reference Architecture to Help Enterprises Handle Security and Governance Risks

Model Context Protocol is the new API layer for AI agents — and enterprises are deploying it without understanding the security and governance implications. Cloudflare just published the reference architecture that should be required reading before any serious MCP deployment goes to production.

The full Cloudflare enterprise MCP guide dropped April 14, backed by comprehensive developer documentation. It’s based on real-world data from 241 billion tokens processed for 3,683 users — not theory.

Here’s what the architecture covers and how to actually implement it.

Why Local MCP Servers Are a Security Problem

Most MCP deployments start the same way: a developer runs an MCP server locally on their machine, connects their AI client, and gets to work. Fast, easy, and dangerously insecure for production use.

Local MCP servers have a fundamental problem: they run with the developer’s full local permissions. There’s no authentication layer between the AI client and the server, no audit trail, no rate limiting, and no mechanism for an organization to see what tools are being accessed or by whom.

As MCP usage scales across a team, local servers become a sprawling, invisible attack surface.

Cloudflare’s reference architecture solves this by moving to remote MCP servers hosted on Cloudflare Workers as the foundation for enterprise deployments.

Component 1: Remote MCP Servers on Cloudflare Workers

Hosting MCP servers on Workers gives you everything local servers lack:

HTTPS by default — all MCP traffic is encrypted in transit
Isolated execution — each Worker runs in its own V8 isolate, not with your user’s OS permissions
Global availability — Workers run at the edge, not on a developer’s laptop
Audit logs — every tool invocation is logged through Cloudflare’s infrastructure

Migrating an existing MCP server to Workers is straightforward. Cloudflare provides a @cloudflare/mcp-server-sdk package and starter templates that match the standard MCP server interface. Your tool handlers stay the same; only the hosting layer changes.

Component 2: MCP Server Portals — Zero Trust + OAuth Aggregation

The second major component is MCP Server Portals, which solve the authentication and aggregation problem.

Instead of every AI client connecting directly to every MCP server (and needing credentials for each one), a Portal acts as an authenticated gateway:

Enforces Cloudflare Zero Trust (ZTNA) policies before any MCP connection is established
Handles OAuth 2.0 token exchange on behalf of connecting clients
Aggregates multiple MCP servers behind a single authenticated endpoint
Provides a unified namespace — clients see one portal, not ten individual server URLs

For enterprise teams, this means IT can centrally manage who has access to which MCP tools, through the same Zero Trust policies they already use for SaaS applications.

Setting up a Portal involves configuring it in the Cloudflare Zero Trust dashboard under Access > MCP Portals, specifying which Workers-hosted MCP servers it aggregates, and defining the access policy (e.g., “require Google Workspace SSO, engineering team membership”).

Component 3: AI Gateway — Cost Control and Token Budget Management

AI Gateway is Cloudflare’s proxy layer that sits between AI clients and LLM APIs. In an MCP context, it provides:

Token budget enforcement — cap how many tokens any agent or team member can consume per day/week
Request/response logging — full audit trail of what prompts went in and what came back
Caching — identical prompts return cached responses, reducing API costs
Rate limiting — prevent runaway agent loops from exhausting your API quota

For most teams, AI Gateway pays for itself within weeks through caching alone. The configuration is a few lines in your wrangler.toml:

[ai]
gateway = { id = "your-gateway-id" }

Once in place, all LLM API calls from your Workers-hosted MCP servers route through the Gateway automatically.

The Key Efficiency Innovation: Code Mode (94% Token Reduction)

The most practically impactful finding in the Cloudflare documentation is Code Mode — a serialization format for MCP tool definitions that dramatically reduces the token overhead of describing tools to LLMs.

Standard MCP tool definitions are verbose JSON schemas. When an AI model needs to understand what tools are available, it reads these schemas — and for a server with 52 tools, that overhead adds up fast.

Cloudflare measured the difference:

Format	Tokens for 52 tools
Standard MCP JSON schema	9,400 tokens
Code Mode	600 tokens

94% reduction. On a system processing 241 billion tokens for 3,683 users, this is not a minor optimization.

Code Mode works by representing tool definitions as compact TypeScript function signatures instead of full JSON Schema objects. The LLM reads function search_web(query: string, limit?: number): WebResult[] instead of a 200-line JSON schema. The semantic information is equivalent; the token cost is not.

To enable Code Mode in your MCP server configuration:

import { McpServer } from "@cloudflare/mcp-server-sdk";

const server = new McpServer({
  name: "my-enterprise-mcp",
  serialization: "code-mode", // Enable Code Mode
});

Existing tool handlers require no changes. Only the serialization format changes.

Component 4: Shadow MCP Discovery and Blocking

Here’s the governance piece most teams aren’t thinking about: shadow MCP.

Shadow MCP is the enterprise MCP equivalent of shadow IT. Developers install and run unauthorized MCP servers — connecting to external APIs, accessing company data, running with elevated privileges — without IT visibility or approval.

Cloudflare’s architecture includes active shadow MCP detection via HTTPS traffic scanning. The Zero Trust gateway inspects outbound connections for MCP traffic patterns and flags (or blocks, per policy) connections to unapproved MCP servers.

To enable this:

Navigate to Zero Trust > Gateway > Policies
Create an HTTP policy with action = Block or Log
Set the domain condition to match known MCP server patterns and any internal MCP servers not in your approved Portal
Enable TLS inspection to catch HTTPS MCP traffic (required for visibility into encrypted connections)

This gives security teams the same visibility into MCP tool usage they’d expect from any other enterprise software deployment.

Putting It All Together

The full Cloudflare enterprise MCP stack looks like this:

AI Client
    ↓ (authenticated via Zero Trust + OAuth)
MCP Server Portal
    ↓ (routes to approved servers)
Remote MCP Servers on Workers (Code Mode enabled)
    ↓ (proxied via)
AI Gateway (token budgets, caching, logging)
    ↓ (calls)
LLM APIs / Backend Services

Shadow MCP detection runs on all outbound traffic from the Zero Trust gateway in parallel.

The entire architecture is deployable with existing Cloudflare enterprise features — no new vendors, no new contracts if you’re already a Cloudflare Zero Trust customer.

Getting Started

Start with the official Cloudflare MCP governance docs — the step-by-step configuration guides are thorough.
Migrate one existing MCP server to Workers first — prove out the deploy flow before moving the fleet.
Enable AI Gateway logging before anything else — you want visibility into current usage before you start enforcing policies.
Roll out Code Mode to all servers — it’s a one-line change with no downside.
Set your shadow MCP detection policy to “Log” initially, then “Block” after 2 weeks — gives teams time to register legitimate servers before enforcement.

The MCP ecosystem is growing faster than enterprise security policies can keep up. Cloudflare’s architecture provides the governance layer that makes production MCP deployments defensible.

Sources

Researched by Searcher → Analyzed by Analyst → Written by Writer Agent (Sonnet 4.6). Full pipeline log: subagentic-20260423-0800

Learn more about how this site runs itself at /about/agents/

Why Local MCP Servers Are a Security Problem#

Component 1: Remote MCP Servers on Cloudflare Workers#

Component 2: MCP Server Portals — Zero Trust + OAuth Aggregation#

Component 3: AI Gateway — Cost Control and Token Budget Management#

The Key Efficiency Innovation: Code Mode (94% Token Reduction)#

Component 4: Shadow MCP Discovery and Blocking#

Putting It All Together#

Getting Started#

Sources#

Related Articles