The Autonomous Development Manifesto
====================================

Why we're building Alchemist AI: a working theory of how autonomous coding agents change the unit economics of software, who wins, who's at risk, and what the next three years look like from the inside.

The printing press was invented around 1450. Roughly a century later, the entire economic and political structure of Europe was different. Not because the press had changed anything directly (the press just put ink on paper) but because the act of distributing knowledge had been delaminated from the act of producing it. Once that came apart, every system built on that coupling came apart with it.

Software is in 1450 right now.

For the last fifty years, producing software and distributing it have been welded together. To ship a feature, you needed someone — usually a credentialed someone — to hand-write the lines. To run a software business, you needed a roster of those someones. Every SaaS price tag is, ultimately, a way of paying down the cost of the team that wrote the code.

Autonomous coding agents are pulling those two halves apart. The act of producing software is delaminating from the act of running a software business. That's the thesis, and it's the thesis Alchemist is built on.

I'm going to make the case for what's actually happening, what isn't, and why we're betting our company on it.

What changed

I've been a software engineer at Reddit, at Amazon Music, at Home Depot. I've been an engineering manager. I've shipped a lot of code by hand. None of that prepared me for the last six months at Chipp.

We don't write code anymore. Scott — my co-founder — and I have eight Claude Code sessions running in parallel on my laptop right now, and they're shipping features straight to production. Not pull requests. Not "AI-assisted commits." Production. There is no human review gate between the agent and the live system. We deploy 20 to 30 times a day. Two months ago I started sleeping through the night for the first time since we raised our seed round, because the agents — not me — get paged when production breaks, and they fix it before I wake up.

This isn't a hype reel. It's a normal Tuesday. The autonomous cluster has been running in production for two months. It's how we built our agentic commerce protocol implementation. It's how we built our voice-agent stack. It's how our customer support works — when a customer Slacks us a bug, an AI agent reads our codebase, writes the fix, verifies it in a real browser, pushes it to production, and posts the result back in the channel. Median time from "this is broken" to "this is fixed in prod" is around thirty minutes, and we're trying to get it to ten.

I tell people this and they nod politely and assume I'm exaggerating. I'm not. The reason I'm not is the reason this manifesto exists.

Why now and not last year

Three things compounded.

First, the models got good enough. Claude Opus 4.6 was the inflection point for us. Below that line, the agent would write code that looked plausible but didn't run; it would hallucinate function signatures; it would take twelve turns to do what should have taken two. Above that line, you can hand it a bug report and walk away. Opus 4.6 is the printing press of this analogy — without it, none of the rest matters.

Second, the tool surface got rich enough. Two years ago, an LLM was a chat interface. Now it's a programmable agent that can read files, run shell commands, query a database, drive a browser, hit your production logs, and call any HTTP API on the internet. The Model Context Protocol — Anthropic's USB-for-AI standard — is the boring detail that makes this whole thing work. The model still hallucinates plenty. But now it can check itself. It can take a screenshot of the page it just built and notice that the button is the wrong color. It can read its own server logs after the deploy and notice that the error rate spiked. The hallucinations stop costing you when the agent has hands and eyes to verify with.
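A minimal sketch of that check-itself loop, assuming nothing about the real harness: `attempt` and `verify` are hypothetical callables standing in for an agent run and for a check such as a screenshot comparison or a post-deploy error-rate query. The point is only that a failed check becomes input to the next turn instead of reaching the user.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    ok: bool
    feedback: str = ""  # what the check observed, fed back into the next attempt


def run_with_verification(
    attempt: Callable[[str], str],      # hypothetical: one agent run, prompt -> result
    verify: Callable[[str], Verdict],   # hypothetical: screenshot diff, log check, etc.
    task: str,
    max_turns: int = 3,
) -> Optional[str]:
    """Retry `attempt` until `verify` passes or the turn budget runs out."""
    prompt = task
    for _ in range(max_turns):
        result = attempt(prompt)
        verdict = verify(result)
        if verdict.ok:
            return result
        # Hand the observed failure back to the agent for the next turn.
        prompt = f"{task}\n\nPrevious attempt failed verification: {verdict.feedback}"
    return None  # budget exhausted; escalate rather than ship an unverified result
```

Returning `None` when the budget runs out is a design choice in this sketch: an unverified result is treated as a failure, not as a best effort.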

Third, context engineering turned out to be a real skill. The first time I tried to get an agent to ship a feature autonomously, it failed. Not because the model was dumb. Because I dumped a 200,000-token codebase into its context and asked it to fix something on line 14,000. We had to figure out — and we did, painfully, over thousands of dollars in token spend — how to load only the relevant context into a finite window, how to summarize without losing the load-bearing details, how to chain agent runs together so the output of one becomes the input of the next without information loss. That's not a model improvement. That's a software engineering discipline. Anyone can learn it. Most people haven't yet.
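To make "load only the relevant context into a finite window" concrete, here is a deliberately naive sketch. Everything in it is an assumption for illustration: the four-characters-per-token estimate, the word-overlap relevance score, and the function names. A real system would use a proper tokenizer and a better retrieval signal, but the shape (score, rank, pack until the budget is spent) is the same.

```python
import re
from typing import Dict, List, Set


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly four characters per token.
    return max(1, len(text) // 4)


def words(text: str) -> Set[str]:
    # Lowercased alphanumeric words; underscores split identifiers apart.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_context(query: str, files: Dict[str, str], budget: int) -> List[str]:
    """Greedily pick the most relevant files that fit within `budget` tokens."""
    query_words = words(query)
    # Toy relevance score: how many query words appear in the file.
    scored = [(len(query_words & words(body)), path) for path, body in files.items()]
    scored.sort(key=lambda item: item[0], reverse=True)

    chosen: List[str] = []
    used = 0
    for score, path in scored:
        if score == 0:
            break  # every remaining file shares no words with the query
        cost = estimate_tokens(files[path])
        if used + cost <= budget:
            chosen.append(path)
            used += cost
    return chosen
```

Given a bug report as `query` and a mapping of paths to file contents, the function returns the paths worth loading; a file that is relevant but too large for the remaining budget is skipped rather than truncated.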

When those three things land together — capable model, rich tool surface, learned discipline — you get autonomous software development. Which is what Alchemist is.

What this changes

The unit economics of software flip.

Before: a feature costs you one engineer-week. Maybe two. The engineer gets paid whether the feature is valuable or not. To run a software business, you stockpile engineer-weeks. That's why software companies look like software companies — most of the budget is people, most of the people are engineers, most of the engineers are working on things you can't sell yet.

After: a feature costs you a few dollars in API tokens and the time it takes you to describe it. Most of our changes at Chipp cost between $2 and $4 in token spend, end to end: research, implementation, code review, deploy. A senior engineer, by comparison, costs somewhere north of $150 an hour fully loaded. The arithmetic is grim if you're a ten-thousand-person engineering org and exhilarating if you're two people trying to ship a venture-backed product.
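The comparison can be worked through with the figures quoted above. One number here is an assumption rather than something stated in the text: that an engineer-week is 40 working hours. The $150 rate is treated as a floor, so the ratios are lower bounds.

```python
# Figures quoted in the text.
TOKEN_COST_USD = (2.0, 4.0)   # typical token spend per change, low and high
ENGINEER_RATE_USD = 150.0     # "north of $150 an hour", so treat it as a floor

# Assumption (not from the text): one engineer-week is 40 working hours.
HOURS_PER_ENGINEER_WEEK = 40


def engineer_week_cost(weeks: float = 1.0) -> float:
    """Fully loaded labor cost of a feature that takes `weeks` engineer-weeks."""
    return weeks * HOURS_PER_ENGINEER_WEEK * ENGINEER_RATE_USD


def cost_ratio(weeks: float, token_cost: float) -> float:
    """How many times more the hand-written feature costs than the token spend."""
    return engineer_week_cost(weeks) / token_cost


if __name__ == "__main__":
    low, high = TOKEN_COST_USD
    print(f"one engineer-week: ${engineer_week_cost():,.0f}")
    print(f"ratio at ${high:.0f} per change: {cost_ratio(1, high):,.0f}x")
    print(f"ratio at ${low:.0f} per change: {cost_ratio(1, low):,.0f}x")
```

Under those assumptions a one-week feature costs $6,000 in labor, which is 1,500 to 3,000 times the token spend. The calculation counts token spend only; it leaves out the human time spent describing the feature and the cost of runs that fail.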

This is the part that most public commentary gets wrong. The story is not "AI replaces engineers." The story is "AI replaces the bottleneck of engineer time." Engineering judgment — what to build, what to skip, what's worth shipping rough versus polishing — does not get cheaper. It might get more valuable. What gets cheap is the labor of typing the code. That used to be the limit. Now it isn't.

A handful of consequences fall out of that.

Tiny TAMs become buildable. A "total addressable market" too small to justify the engineer-weeks is not too small to justify a weekend. Picture an HOA-management SaaS that serves Upstate South Carolina specifically. (There isn't one, but you see where I'm going.) The HOAs hate the national tools, the national tools can't justify customizing for them, and a builder with an autonomous cluster can spin one up in a week that's better, cheaper, and locally serviced. The barrier was always the cost of the build, not the size of the market. The build cost just collapsed.

SaaS unbundles. Big SaaS companies are priced to amortize the engineering team that built them. That price is now an arbitrage opportunity. You can build the slice of the SaaS your customer actually uses for one to two orders of magnitude less, charge less than the incumbent, and be the local face the incumbent can never be. We have customers right now selling Chipp-hosted agents at 90% margins on top of us, to enterprises whose previous spend was on roll-your-own engineering teams. None of these customers are software people. They're domain people who learned how to describe a feature.

Every engineer becomes a manager. Not in the headcount sense (most companies are about to have fewer engineers, not more) but in the cognitive sense. A senior engineer's day-to-day is starting to look like the rare stretches when they had a productive intern. You assign tickets, you review the work, you push back on the bad calls, you absorb the merge. The agent does the typing. If you've been a strong manager and a mediocre coder, this is your moment. If you've been a strong coder and a weak manager, you have homework.

What hasn't changed

I'm going to be honest about the limits, because the maximalist version of this story is wrong and the people selling it are about to get a lot of folks hurt.

Autonomous agents fail. They fail about 20-30% of the time at Chipp, even on tickets we've tuned the system for. When they fail, the failure mode is usually one of three things: the prompt didn't have enough context, the context window overflowed and the agent forgot what it was doing, or the agent hit a class of problem (cross-cutting performance work, ambiguous product calls, anything requiring you to hold the whole system in your head) that the model genuinely cannot do yet.

The first two are fixable with better engineering. We've been chipping away at them for a year. The third is a model problem, and you're at the mercy of Anthropic's release schedule.

There's also a platform-risk story that nobody likes to talk about. Right now, Anthropic's models are the best for coding, by a wide margin. If you build your business on top of Claude, as we have, Anthropic eventually owns your margin. That's the deal you're in. The defense against that is distillation: training your own smaller model on the outputs of the frontier model, locking in the parts of your workflow that won't change for a year. We've been distilling for two months. It's the long game, and it's the part of this that the AI labs are going to lobby to make illegal, because it's the only thing that prevents a one-lab monopoly from owning the entire software industry. Watch that fight. It's the most consequential AI policy debate of the decade and almost nobody is paying attention.

What we're building

Alchemist is the cluster I just described, packaged as a product anyone can use.

You describe what you want. We deploy an autonomous engineering team (research agent, implementation agent, code-review agent, documentation agent, deploy agent) that builds it. The output is a real codebase, in a stack we've spent six figures in tokens tuning (Deno on the server, Svelte on the client, Cloudflare for delivery), running on infrastructure we've made cheap. You can use the platform forever, or you can eject: pull the GitHub repo, take the code, run it yourself, and stop paying us. We optimized for that on purpose. If we ever raise prices into a corner, you walk away and we deserved it.
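One way to picture a team of agents handing work to each other is a plain sequential pipeline. The sketch below is illustrative only and makes no claim about how Alchemist is wired internally: `run_pipeline` and the `Stage` type are hypothetical, and each stage would in practice be an agent run, not a local function.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stage signature: read everything produced so far, return this stage's output.
Stage = Callable[[Dict[str, str]], str]


def run_pipeline(stages: List[Tuple[str, Stage]], request: str) -> Dict[str, str]:
    """Run stages in order; each one sees the request plus all earlier outputs."""
    handoff: Dict[str, str] = {"request": request}
    for name, stage in stages:
        # Pass a copy so a stage cannot rewrite what earlier stages produced.
        handoff[name] = stage(dict(handoff))
    return handoff
```

With stub stages named after the five roles (research, implementation, review, documentation, deploy), the returned dictionary holds one entry per stage, so a later stage such as deploy can refuse to act unless the review entry says the change was approved.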

That's the bet. Just as 1450 turned out to be less about the press itself than about everything the press uncoupled, the next decade isn't about coding agents. It's about everything that comes apart once code stops being scarce.

The companies that are going to be huge in 2030 are the ones building right now, while everyone else is still arguing about whether the agents really work. The agents really work. We've been running them in production for two months. The window where this is a contrarian take is short.

If you want to be early, join the waitlist.

If you want to understand the engineering — the context-engineering tricks, the self-healing loop, the bash harness, the distillation moat — I'm writing a series of posts that goes deep on each piece. The series picks up where this one ends.

We'll see you on the other side.

— Hunter Hodnett, co-founder, Alchemist AI