Twelve months of building and throwing away work.

Aaron Levie posted this on X this week.

It’s remarkable how often you need to be dramatically upgrading your AI architecture given the pace of progress in AI models right now.

If you’re building agents, you basically need to throw away large parts of previous work that you setup to compensate for model limitations… https://t.co/TXNTGLE2nQ
— Aaron Levie (@levie) April 19, 2026

That’s been my whole year.

I started Vettaris a little over a year ago. At the time I was using Cursor to build things faster. It was error-prone, so I had to tell it exactly how to do things (not just what), and I was actively watching every line, steering every move. It felt amazing.

That was twelve months ago. Today I’m running multiple Claude Code sessions in parallel. My involvement isn’t line-by-line anymore. I hand off requirements. I hand off design files. The agents write the code, and my job is mostly to check their work.

In the first four months of this year, with AI, I’ve (we’ve?) written more code than I had written in my entire life before that.

The pattern Levie’s describing isn’t just that the tools keep improving — it’s that whole categories of complexity we built up recently have just evaporated.

When LLMs first showed up, RAG was the answer for how to search large document collections. Companies invested in vector databases, chunking strategies, embedding pipelines, re-ranking layers. There were conference talks. There were consulting practices. There were startup rounds funded on it.

Whispers started late last year that RAG might not be needed. What I didn’t see coming was the replacement: grep and markdown files.

That’s it. Plain text, searched with a fifty-year-old Unix utility, handed to a model with a large enough context window. The model reads the files the same way you would.
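
For the curious, the grep-and-markdown idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product does it: the function names `grep_markdown` and `build_context`, the two-line context window, and the 20,000-character budget are all made up for the example. It finds matching lines in `.md` files under a folder and stitches the surrounding snippets into one block of text you could hand to a model.

```python
import re
from pathlib import Path


def grep_markdown(root, pattern, context_lines=2):
    """Find lines matching `pattern` in every .md file under `root`.

    Returns a list of (path, line_number, snippet) tuples, where snippet is
    the matching line plus `context_lines` lines on either side.
    (Hypothetical helper, named for this example only.)
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*.md")):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for i, line in enumerate(lines):
            if regex.search(line):
                start = max(0, i - context_lines)
                end = min(len(lines), i + context_lines + 1)
                hits.append((str(path), i + 1, "\n".join(lines[start:end])))
    return hits


def build_context(hits, max_chars=20000):
    """Join snippets into one text block, stopping at an assumed size budget."""
    parts = []
    used = 0
    for path, line_no, snippet in hits:
        block = f"## {path}:{line_no}\n{snippet}\n"
        if used + len(block) > max_chars:
            break  # stay within the (arbitrary) budget for the prompt
        parts.append(block)
        used += len(block)
    return "\n".join(parts)
```

Something like `build_context(grep_markdown("notes", "refund policy"))` would then produce the text to paste into a prompt. In practice an agent usually runs `grep` itself and decides what to read next, so treat this as a picture of the idea rather than a recipe.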

The vector database companies are not going to put that on their homepage.

Same story with agents.

A year ago the discourse was all about how to control them. Orchestration frameworks. Whole architectures designed to keep agents on task and prevent them from wandering off into the weeds.

Then OpenClaw came out in late January and it turned out the answer was: just tell them what you want. They go do it (mostly). Check their work when they’re done. Tell them what you don’t like so they don’t do that again. Repeat.

Overnight, a whole category of software became unnecessary.

A year ago, when something I tried with AI worked, it was surprising enough that I’d run to LinkedIn to describe the experience.

Now I’m surprised when something doesn’t work. The baseline has flipped.

A few days ago my wife asked me for help with something she was stuck on: take a poor-quality photo of some sheet music and produce a clean PDF. This should have been a slam dunk. Every model I tried failed. To its credit, Grok (from xAI) thought for several minutes and then told me that my request was beyond its capabilities. This is the first time I’ve seen a model back off on a request because it was aware of its own limitations. If this becomes the norm, “overconfidence” will join “hallucination” as a problem that mostly goes away with better model training.

The hardest part of all this is keeping up with where the weak spots are and tracking how they change.

Claude can’t read my wife’s sheet music. But the same week I was struggling with that, Anthropic released Claude Design, a UI design product that’s about to give Figma a run for its money. I would not have expected Claude to be so good at visual design.

It’s impossible for me to imagine what this looks like a year from now because humans are bad at exponentials. I’m just resigned to the fact that anything I build today is likely to be almost useless in a quarter or two. The key is to build things that the models understand so they can do the refactoring for you when new things come along. The pace is only going to get faster, so building with that in mind is the only real defense. Buckle up!