MCP Is Not Optional — Alchemist AI

Six months ago, when Anthropic’s account manager pitched us on the Model Context Protocol, I assumed it was a sales gimmick. Yet another standard. Probably a way to lock us further into Claude. I was tired. I shrugged. I didn’t try it for a month.

That was, retrospectively, the most expensive month I’ve ever had. Once we adopted MCP, the autonomous cluster I described in the manifesto became possible. Without MCP, I’m still up at 2 a.m. fixing production. With it, the agents fix production while I sleep.

This post explains what MCP actually is, why nothing else moves the ceiling on agentic systems the same way, and how to use it well — including the gotchas that cost us several thousand dollars of token spend before we figured them out.

The core thing to internalize

A language model only does one thing: output text. Every demo of an “AI agent” you’ve ever seen — the ones that browse the web, query a database, send a Slack message, file a GitHub issue — those are not the model doing anything. They’re the model emitting text, and a piece of software around the model interpreting that text as an instruction.

Re-read that, because it’s the entire game. The model never executes anything. It just describes what it wants executed, and software outside the model does the executing.

The conventional way for the model to describe what it wants is via a tool call. The model emits something like (in pseudocode):

I want to call a tool named "read_file" with the argument {"path": "src/index.ts"}

The framework around the model — Claude Code, Cursor, your own agent loop — parses that, runs the actual read_file tool on your machine, gets the result, and feeds the result back into the model’s next turn. The model now “knows” the contents of src/index.ts. It didn’t read the file. The framework read the file. The model just got text saying “here’s what was in it.”

This is the loop. Every interesting capability of every modern AI agent reduces to this loop.

MCP is the standard for the toolbox

Tool calls have been around for a couple of years. What MCP — Anthropic’s Model Context Protocol — added is a standard for how tools are described, registered, and called.

Before MCP, every framework had its own format for tools. If you wanted to give Claude Code a tool that talked to your database, you had to write Claude Code-specific glue. If you wanted to also give it to Cursor, you wrote Cursor-specific glue. The same tool, three times.

After MCP, you write one MCP server — a normal HTTP service that responds to a couple of standard endpoints — and any MCP-compatible client (Claude Code, Cursor, increasingly anything else) can use it. The tool advertises its name, its description, and its parameters. The framework loads the advertisement into the model’s context. When the model decides to call the tool, the framework translates that to an HTTP call against your MCP server. Your server runs the actual logic and returns a result.

This sounds boring. It’s not. Standardization unlocked the entire ecosystem. Now you have hundreds of MCP servers maintained by third parties — Stripe’s official MCP, GitHub’s, Cloudflare’s, Firecrawl’s, Supabase’s. You don’t write a Stripe integration anymore. You add the official Stripe MCP to your agent’s tool list, and the agent can do everything Stripe’s API supports.

That’s the difference between two months of tool-integration work and zero.

Two flavors: local and remote

You’ll see two kinds of MCP servers in the wild.

Local MCP servers run on your machine, as a child process of Claude Code (or whatever client you use). Claude Code starts the process when you start a session, and they communicate over standard input/output. Local servers are great for development, prototyping, and anything that needs access to your machine — file system, local databases, locally-running browsers.

Remote MCP servers are HTTP endpoints somewhere on the internet, maintained by a vendor or by you. They’re long-lived, accessible from any client, and don’t require local installation. The vendor can update the server without you doing anything. Stripe’s MCP is a remote server. So is Firecrawl’s. So is ours, increasingly, as we move features over to remote.

The boring rule: prefer remote when one exists, local when it doesn’t or when you need machine access.

Why tool descriptions matter more than you think

This is the part that took me longest to get right.

The model decides whether to call a tool based entirely on the description in the tool’s advertisement. Not the tool’s name. Not the parameter list. The description.

I had a database MCP server with a tool called query. The description said “run a SQL query.” The model would constantly call it for things that weren’t SQL queries — checking config, looking up environment variables — because the description was so vague the model couldn’t tell what the tool was actually for.

I rewrote the description: “Run a read-only SQL query against the application’s PostgreSQL database. Returns rows as JSON arrays. Use this for inspecting application state, debugging customer issues, and verifying data integrity. Common tables: app.users, chat.sessions, app.applications. Schema uses snake_case column names.” Hallucinations dropped to nearly zero.

The lesson: write tool descriptions like you’re writing a system prompt, because they’re being read by the same kind of model that reads system prompts. Be specific. Mention edge cases. Tell the model when not to use the tool. List the schema if you can. Let the model choose intelligently.

I’ve started having Claude itself write my tool descriptions. The model knows what other models will respond to better than I do.

The browser MCP is the autonomy unlock

If I had to pick one MCP that more than any other turned our cluster from “occasionally useful” to “actually shipping production code without me,” it’s the browser MCP.

Here’s why. When the agent writes code that adds a new button to our settings page, it can compile the code without errors. It can pass the tests. It can think the code is correct. None of that proves the button actually renders, that the click handler fires, that the new feature works in a browser.

With a browser MCP — we use Chromium under the hood, accessed via Chrome’s DevTools Protocol — the agent can:

Spin up a local instance of our app on a unique port (each parallel agent gets its own).
Open the page in a headless browser.
Take a screenshot.
Inspect the DOM.
Click things, fill out forms, submit.
Read the browser console for errors.
Read the dev server logs for backend errors.

The model is multimodal — it can ingest screenshots as input. So the agent literally sees the page it just built, the same way you and I would see it. If the button is the wrong color, the agent notices. If clicking it throws a stack trace in the console, the agent notices. If the network request to the API fails, the agent notices.

Closing this verification loop is what turns “AI generates plausible code” into “AI ships working code.” Without it, you’re trusting the model’s training data. With it, you have ground truth.

Off-the-shelf vs roll-your-own

Most discussions of MCP frame it as “use the official server.” That’s good advice 80% of the time and terrible the other 20%.

The 80%: when an official server exists for a service you want to integrate with — Stripe, Firecrawl, GitHub, Cloudflare — use it. The vendor maintains it, updates it, fixes its bugs. You shouldn’t be in that business.

The 20%: when the integration is with your own systems — your database, your internal tools, your custom backend — write your own MCP server. Off-the-shelf database MCPs hallucinated column names on us constantly. They didn’t know our schema, didn’t know our conventions, didn’t have our gotchas baked in. We rolled a custom MCP server that knew about our CamelCasePlugin, our soft-delete patterns, our common query mistakes. Hallucinations dropped by an order of magnitude.

Our custom MCP server isn’t fancy. It’s about 200 lines of TypeScript wrapping our existing query layer with MCP-flavored metadata. Claude wrote most of it. The point isn’t that the server is impressive. The point is that the description of the tools is precise enough that the model never has to guess.

The gotchas

Two failure modes will eat a few hours of your life if I don’t warn you.

console.log in your MCP server breaks the protocol. Local MCP servers communicate with Claude Code over standard I/O, in a structured format. If you console.log anything, that text gets injected into the protocol stream, the framework chokes on the malformed bytes, and your tools mysteriously stop working. Use a real logger that writes to stderr. (I learned this debugging at midnight.)

MCP servers must be restarted for code changes to take effect. Claude Code starts the MCP process once, at session start. If you change the server’s source code, you have to restart Claude Code. Otherwise, you’ll spend an hour debugging why “your fix doesn’t seem to be working” when the issue is that the agent is calling the old version of your tool.

The bigger picture

Without MCP, you have a model that emits text. That’s it. It can’t read your code, can’t look at your screen, can’t query your database, can’t verify anything it produces. Every “intelligent” agent without tool use is, fundamentally, a parrot with a thesaurus.

With MCP, the model can interact with the world. It can check its own work. It can correct its own mistakes. It can chain together capabilities that no single component of your system would have alone.

That’s not an upgrade. That’s a phase transition.

If your agents are still hallucinating, you don’t need a smarter model. You need to give the model you have senses. MCP is how you do that.

In the next post, I’ll walk through the architecture of the autonomous cluster I’ve described — five-stage pipeline, Loki integration, GitHub work-trees, ports per agent, the whole thing. By the end of that post, you’ll have a complete picture of what an autonomous coding system looks like in production, and what it costs.

If you want all of this packaged for you — the MCP servers, the cluster, the deployment infrastructure — join the Alchemist waitlist. We’ve spent a year tuning this. You don’t have to.