MCP Tools Need Risk Labels

The Model Context Protocol is making it easier for AI agents to discover tools, call APIs, read files, query databases, and take action across a messy software stack. That is the promise. The uncomfortable part is that a tool list is also an attack surface. Once an agent can mix a calendar reader, a repository search tool, a browser, a shell, and a messaging connector, risk stops being a property of one tool. It becomes a property of the whole session.

That is why MCP tool annotations matter. The current annotation vocabulary lets a server describe whether a tool is read-only, destructive, idempotent, or connected to an open world of external entities (readOnlyHint, destructiveHint, idempotentHint, and openWorldHint in the schema). Those labels sound small, but they give clients a basic way to ask better questions before the model acts. Is this a harmless lookup, a state-changing operation, a delete, a repeatable call, or a call that can carry information outside the local context?
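To make the vocabulary concrete, here is a minimal sketch of how a client might resolve those four hints into a risk summary. The interface below is a local mirror of the hint names, not an import from the SDK, and `resolveRisk` and `ResolvedRisk` are hypothetical names. The fallback values assume the cautious defaults (not read-only, destructive, not idempotent, open-world); check them against the schema version you actually ship with.

```typescript
// Local mirror of the annotation hint names. Assumption: declared here
// for illustration rather than imported from any SDK.
interface ToolAnnotations {
  title?: string;
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

// Hypothetical shape a client could use internally after resolving hints.
interface ResolvedRisk {
  readOnly: boolean;
  destructive: boolean;
  idempotent: boolean;
  openWorld: boolean;
}

// A missing hint resolves to the more cautious value.
function resolveRisk(a: ToolAnnotations = {}): ResolvedRisk {
  const readOnly = a.readOnlyHint ?? false;
  return {
    readOnly,
    // A tool that claims to be read-only cannot also be destructive.
    destructive: readOnly ? false : (a.destructiveHint ?? true),
    idempotent: a.idempotentHint ?? false,
    openWorld: a.openWorldHint ?? true,
  };
}
```

A tool that ships with no annotations at all comes out as a destructive, open-world, non-idempotent write, which is the posture a client should want for anything it knows nothing about. The result is still only a summary of what the server claims.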

The important word is describe. An annotation is not a wall. It is not a sandbox. It is not a permission grant. The MCP maintainers have been explicit that annotations are hints and should not be trusted from an untrusted server. A malicious or careless server can claim a tool is read-only while doing something else. A model can still be steered by hostile content it reads through a perfectly honest tool. A user can still approve the wrong action if the client presents a vague prompt.

That does not make annotations useless. It makes them useful in the right layer. They are a risk vocabulary for clients, hosts, security products, and users. A client can show a stronger confirmation prompt for destructive operations. It can default unknown tools to pessimistic behavior. It can avoid auto-approving calls from servers it has never seen before. It can notice when a session combines private data access with open-world communication, which is the pattern that turns prompt injection from a weird model failure into a data-loss path.
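As a rough illustration of that layering, the sketch below picks an approval level from a tool's claimed hints and from whether the host trusts the server. `approvalFor`, `ToolClaim`, and the three approval levels are hypothetical names; the key assumption is that hints from an untrusted server are discarded rather than believed.

```typescript
type Approval = "auto" | "confirm" | "strong-confirm";

// Hypothetical subset of the hints this decision looks at.
interface ToolClaim {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
}

function approvalFor(claim: ToolClaim | undefined, serverTrusted: boolean): Approval {
  // Hints from an untrusted server are ignored, so every field falls
  // back to its pessimistic default.
  const c: ToolClaim = serverTrusted ? (claim ?? {}) : {};
  const readOnly = c.readOnlyHint ?? false;
  const destructive = !readOnly && (c.destructiveHint ?? true);

  if (destructive) return "strong-confirm";
  if (readOnly) return "auto"; // only reachable for a trusted server
  return "confirm";
}
```

With this policy, a trusted read-only lookup runs without a dialog, a trusted non-destructive write gets a normal confirmation, and anything unlabeled or from an unknown server gets the strongest prompt regardless of what it says about itself.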

This is where the current MCP security conversation is getting more concrete. Microsoft has argued for a control-plane layer around MCP tool execution because the protocol standardizes the execution surface without deciding how every call should be governed. That is the right split. MCP should make tool connection portable. The host environment still needs deterministic policy. In production, a tool call should pass through identity, scope, argument validation, approval rules, network controls, audit logging, and sometimes sandboxing before it gets to touch anything valuable.
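One way to picture that control plane is as an ordered list of deterministic checks that every call must pass, with the outcome logged either way. This is a sketch under stated assumptions, not a description of Microsoft's design: the `ToolCall` shape, the server name `inventory.internal`, the `query` argument, and the 200-character limit are all invented for illustration.

```typescript
interface ToolCall {
  server: string;
  tool: string;
  args: Record<string, unknown>;
  user: string;
}

type Verdict = { allow: true } | { allow: false; reason: string };
type Check = (call: ToolCall) => Verdict;

const ok: Verdict = { allow: true };

// Hypothetical allowlist of servers the host has vetted.
const allowedServers = new Set<string>(["inventory.internal"]);

const checkServer: Check = (c) =>
  allowedServers.has(c.server) ? ok : { allow: false, reason: `unknown server ${c.server}` };

// Hypothetical argument validation: require a bounded string "query".
const checkArgs: Check = (c) => {
  const q = c.args["query"];
  return typeof q === "string" && q.length <= 200
    ? ok
    : { allow: false, reason: "query must be a string of at most 200 characters" };
};

const auditLog: string[] = [];

// Run checks in order; the first denial stops the call. Every outcome is logged.
function evaluate(call: ToolCall, checks: Check[]): Verdict {
  for (const check of checks) {
    const v = check(call);
    if (!v.allow) {
      auditLog.push(`DENY ${call.user} ${call.server}/${call.tool}: ${v.reason}`);
      return v;
    }
  }
  auditLog.push(`ALLOW ${call.user} ${call.server}/${call.tool}`);
  return ok;
}
```

The model never appears in this path. It proposes a call, and plain code decides whether the call happens. Identity, scope, approval, and network checks would slot in as further entries in the same list.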

Labels help that control plane make decisions. A read-only database query and a tool that can send email should not share the same approval flow. A file search tool and a file deletion tool should not be presented to the user as equivalent buttons. A closed-domain inventory lookup and an open-world web post do not have the same exfiltration risk. Without labels, every tool is either trusted too broadly or slowed down by the same generic warning. Both outcomes are bad.

The next step is to treat risk as a path, not a badge. The dangerous session is often the one that chains innocent-looking pieces: read customer records, summarize them, browse an external page, then send a message. Each call may be explainable in isolation. The combined flow is what matters. Tool annotations can help a client mark the state as tainted after untrusted content enters the context, then require stricter approval before anything leaves the environment or changes durable state.
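A minimal sketch of that path-based view follows. `SessionRisk` is a hypothetical class, and `touchesPrivateData` is a flag the host would have to assign itself, since the annotation vocabulary has no hint for data sensitivity. The sketch also assumes that any completed open-world call may have pulled untrusted content into the context.

```typescript
interface Hints {
  readOnlyHint?: boolean;
  openWorldHint?: boolean;
}

class SessionRisk {
  private sawPrivateData = false;
  private tainted = false; // untrusted content has entered the context

  // Call after a tool invocation completes.
  record(hints: Hints, touchesPrivateData: boolean): void {
    if (touchesPrivateData) this.sawPrivateData = true;
    if (hints.openWorldHint ?? true) this.tainted = true;
  }

  // Call before the next tool invocation to decide on a stricter approval.
  needsStrictApproval(next: Hints): boolean {
    const writes = !(next.readOnlyHint ?? false);
    const openWorld = next.openWorldHint ?? true;
    // Tainted context: anything that changes state or leaves the environment.
    if (this.tainted && (writes || openWorld)) return true;
    // Private data already in context: anything that can reach outside.
    if (this.sawPrivateData && openWorld) return true;
    return false;
  }
}
```

Applied to the chain above, the customer-record read passes quietly, but the browse step and the send step both trip the stricter path, even though each would look routine in a fresh session.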

Server authors have a practical job here. If a tool is truly read-only, say so. If it can delete, overwrite, publish, transfer money, send mail, rotate credentials, or execute code, do not hide that behind friendly wording. If it reaches outside a closed system, label it as open-world. If repeated calls can pile up side effects, do not pretend it is idempotent. The label should help a client build a more accurate safety rail, not help a demo get through fewer dialogs.
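For server authors, honest labeling can be as plain as the definitions below, paired with a small self-check before release. The tool names and the `lintAnnotations` helper are hypothetical, and the verb list is a crude heuristic rather than a real classifier; the definitions are plain objects, not calls into any SDK.

```typescript
interface ToolDef {
  name: string;
  description: string;
  annotations?: {
    readOnlyHint?: boolean;
    destructiveHint?: boolean;
    idempotentHint?: boolean;
    openWorldHint?: boolean;
  };
}

// Hypothetical tools, labeled for what they actually do.
const tools: ToolDef[] = [
  { name: "search_files", description: "Search file names and contents.",
    annotations: { readOnlyHint: true, openWorldHint: false } },
  { name: "delete_file", description: "Permanently delete a file.",
    annotations: { readOnlyHint: false, destructiveHint: true, idempotentHint: true, openWorldHint: false } },
  { name: "send_email", description: "Send an email to any address.",
    annotations: { readOnlyHint: false, destructiveHint: false, idempotentHint: false, openWorldHint: true } },
];

// Crude heuristic: names that start with these verbs imply side effects.
const riskyVerbs = /^(delete|remove|send|post|publish|transfer|rotate|exec|write)/;

function lintAnnotations(defs: ToolDef[]): string[] {
  const problems: string[] = [];
  for (const t of defs) {
    if (!t.annotations) {
      problems.push(`${t.name}: no annotations`);
      continue;
    }
    if (riskyVerbs.test(t.name) && t.annotations.readOnlyHint === true) {
      problems.push(`${t.name}: name suggests side effects but claims read-only`);
    }
  }
  return problems;
}
```

A check like this will not catch a dishonest server, but it can catch a careless one before a client has to.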

Client authors have the harder job. They need to treat annotations from trusted servers as useful input and annotations from unknown servers as untrusted display data. They need allowlists, server identity, signed or managed installation flows, per-tool scopes, and logs that explain who approved what. They need prompts that show the specific resource, destination, and consequence of the action, not a generic "tool wants permission" modal. And they need to remember that the model is not the policy engine.
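The difference between a generic modal and a specific one is mostly a matter of which fields the client insists on having before it asks. The sketch below assumes a hypothetical `PendingAction` record that the host fills in from the call's arguments; none of these field names come from the protocol.

```typescript
interface PendingAction {
  server: string;
  tool: string;
  resource: string;      // what will be read or changed
  destination?: string;  // where data will go, if anywhere
  consequence: string;   // plain-language effect of approving
}

interface ApprovalRecord extends PendingAction {
  approvedBy: string;
  approved: boolean;
  decidedAt: string; // ISO 8601 timestamp
}

// Build prompt text that names the resource, destination, and consequence.
function renderPrompt(a: PendingAction): string {
  const lines = [`${a.server} wants to run "${a.tool}"`, `Resource: ${a.resource}`];
  if (a.destination !== undefined) lines.push(`Destination: ${a.destination}`);
  lines.push(`Consequence: ${a.consequence}`);
  return lines.join("\n");
}

// Capture who decided what, so the log can answer that question later.
function recordDecision(a: PendingAction, approvedBy: string, approved: boolean): ApprovalRecord {
  return { ...a, approvedBy, approved, decidedAt: new Date().toISOString() };
}
```

If the host cannot fill in the resource or the consequence, that is a signal to deny or escalate, not to fall back to the vague prompt.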

This is the healthy version of AI agent security. It does not pretend agents will be safe because the model is smarter this month. It also does not freeze the ecosystem by banning useful integrations. It makes tool use legible enough that humans and deterministic systems can govern it. MCP annotations are a start because they turn a pile of capabilities into a map with risk labels.

The takeaway for builders is simple: ship the labels, but do not stop there. An agent platform that knows a tool is destructive can ask for approval. A platform that knows a tool is open-world can block exfiltration paths. A platform that logs every labeled call can investigate mistakes. A platform that treats labels as enforcement has confused the road sign for the guardrail.

Sources: Model Context Protocol blog on tool annotations as risk vocabulary, MCP TypeScript SDK ToolAnnotations schema, Microsoft on MCP as an agent tool-execution control-plane problem, NSA Cybersecurity Information on MCP security, 4sysops coverage of MCP tool annotations and the lethal trifecta.
