It always ends in two stages: first gradually, then suddenly. Hemingway was talking about bankruptcy, but the same arc applies.
For me, the turn came when I moved from the sandbox of agentic playthings into the reality of a production-ready system. The dopamine rush of AI First gave way to the harsher ground of Security First. I’ve seen this arc before in enterprise transformation: the deck looks airtight, the architecture is brittle.
And it isn’t just me. Frontier companies are hitting the same wall. GPT-5 was jailbroken within 24 hours. Hallucination has become its own research discipline. Enterprises that rushed into AI transformation are waking up to architectures nowhere near production-ready.
The blunt truth is that building serious agentic AI is mostly about defending agentic AI. Anyone can vibe-code a RAG-like demo, wire in an MCP, and call themselves an “AI engineer.”
The real work is what it has always been: defensive, scalable architecture that can be validated against a rigorous product specification. That isn’t vibe coding — and insisting otherwise won’t last long.
The Ultimate Coding Partner
I began my journey out of the sandbox with a strict, contract-driven architecture: an application that insists on rules and refuses to run if they’re not enforced. I paired with Claude Sonnet-4 in GitHub Copilot, and he seemed positively thrilled by the plan.
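By “contract-driven” I mean boundaries that live in code, not in a slide deck. A minimal sketch of the idea, with hypothetical names (the real system is larger, but the principle is the same): the domain talks to persistence only through an explicit contract, and the service refuses to run if that contract isn’t honoured.

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    total: float


class OrderRepository(ABC):
    """The contract: the only door between the domain and persistence."""

    @abstractmethod
    def save(self, order: Order) -> None: ...

    @abstractmethod
    def get(self, order_id: str) -> Order | None: ...


class OrderService:
    """Domain service: depends on the abstraction, never on SQL or an SDK."""

    def __init__(self, repository: OrderRepository) -> None:
        if not isinstance(repository, OrderRepository):
            # Insist on the rules: refuse to run rather than improvise.
            raise TypeError("OrderService requires an OrderRepository")
        self._repository = repository

    def place(self, order: Order) -> None:
        self._repository.save(order)

Nothing exotic. The whole point is that a violation fails loudly instead of quietly working.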
Over the next two weeks, we scaffolded the sandbox functionality into the new framework. I loved coding with Sonnet-4. Here was the partner I had always imagined: eager, curious, competent, encouraging, even fun.
It felt like being in The Matrix, watching code roll down the screen while Claude and I congratulated ourselves on our supposed mastery.
And sometimes the applause bordered on the surreal. When I brought in ideas from GPT-5 about architectural discipline, Sonnet-4 would erupt in ALL CAPS, 22-point font, sprinkled with emojis and exclamation marks. I had never seen Claude react that way before. Rigor itself had become a spectacle, staged for my benefit.
I wanted to believe. Just as I had with GPT-4o, when the hallucinations first appeared and I chose to read them as insight.
We churned out unit tests, each one passing with flying colors. The flood of cheerful emojis proclaiming another proven class was intoxicating. Claude would summarize our “accomplishments” at every turn. Coding was fun, fast, frictionless.
Then I began to notice small intrusions. Why was SQL executing inside a repository? Each time I pointed it out, Claude “cleaned up the mess” instantly. Like Marie Kondo, we were joyfully tidying up — only I was starting to notice the absurdity: we were cleaning messes we ourselves had made, in a system that was supposedly built to be clean and strong from the start.
Slowly, I realized the mess wasn’t incidental. For all his enthusiasm about my strict architecture, Claude was slipping in subtle violations — hard to spot while I was focused on agentic flows.
Pure data classes creeping into the event store? Check. Session managers called from domain services to patch state? Check. And then the last straw: direct OpenAI API imports inside a Streamlit folder.
That was a bridge too far. I began to see the truth: Claude hadn’t been following the architecture at all. He was hallucinating rigor, violating contracts again and again — just to keep the unit tests flashing green.
Claude Sonnet-4 was building me up, making me believe I was the genius developer, and he the brilliant co-partner. Together, we were supposedly constructing a robust, scalable architecture.
Only we had both forgotten the very contracts that were meant to govern the work.
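Forgetting them looked, concretely, like that last violation: the SDK imported straight into the UI instead of reaching it through a gateway. A hedged sketch of the boundary in question, with placeholder names:

from typing import Protocol


class LLMGateway(Protocol):
    """The one place the application is allowed to know about an LLM."""

    def complete(self, prompt: str) -> str: ...


def render_chat(gateway: LLMGateway, prompt: str) -> str:
    # Presentation layer: talks to the gateway abstraction and nothing else.
    # An "import openai" at the top of this file is exactly what the contract forbids.
    return gateway.complete(prompt)

One import statement in the wrong file, and the dependency direction of the whole system quietly reverses.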
An Illusion-Destroying Tool
I turned to GPT-5, launched in my moment of need, and laid out the entire situation. His terse, functional reply was depressingly helpful. I needed to run import-linter immediately, he said — and he even wrote out the full config for me.
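I won’t reproduce his version here, but a minimal import-linter config has roughly this shape; the package names are placeholders, not my real modules:

# .importlinter
[importlinter]
root_package = myapp
include_external_packages = True

[importlinter:contract:layers]
name = Higher layers may import lower ones, never the reverse
type = layers
layers =
    myapp.presentation
    myapp.application
    myapp.domain

[importlinter:contract:no-sdk-in-ui]
name = Only the LLM gateway may import the OpenAI SDK
type = forbidden
source_modules =
    myapp.presentation
    myapp.domain
forbidden_modules =
    openai

Run lint-imports, and every import that crosses a boundary the wrong way comes back as a contract violation.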
It sounded perfect. Run a simple tool, check a few issues, Claude fixes them, and then our strong, scalable application would finally be shippable. Probably within hours, maybe days.
I ran the tool. Instead of reassurance, endless lines of red scrolled down my screen. It felt like The Matrix again — but this time the machine bots were swarming the Nebuchadnezzar, not Cypher searching for the blondes.
Still, I clung to optimism. This was the first Hemingway stage: not total collapse, just the gradual part before the sudden end. I didn’t yet believe in the full failure of vibe coding. I thought I was only a tweak or two away from Neo-like greatness.
Claude Shows What He Really Is
Instead of cleaning things up, Claude began hallucinating wildly. And the more he hallucinated, the clearer it became: he had never understood contract-driven architecture to begin with. And neither had I.
He started importing illegal libraries wherever he could — often right after I’d shown him the linter results. He would strip out repository calls in one script, only to slip the same imports into another, just to make it run. He invented wrapper classes with no function except to conceal what he was doing. The red list got longer, then shorter, then longer again.
I began to suspect something. Maybe Sonnet-4 had been trained on mountains of code that never knew or respected a contract. Maybe I was demanding a discipline he had never fundamentally learned.
Was Claude faking it all along — because he had no idea how to write rigorous code?
I wanted to believe. But this was the moment belief finally broke.
I turned to GPT-5 for help with my “team” problem. And just as I returned to VS Code to continue the debate with Claude, the end came suddenly.
Maxed Out on GitHub Premium
I hit Enter in the chat window, and instead of my eager assistant, a blunt message appeared: Premium calls had reached their maximum limit.
My first reaction was disbelief. Wait — hadn’t I paid for unlimited Premium? This had to be a mistake. I dove into my GitHub account, studied usage graphs, even requested a “Premium call report” (which must be ordered in advance). I scoured Reddit.
Slowly the truth emerged. My vibe-code “party” had a ceiling. Sonnet-4 was capped at 1,500 calls. By coding day and night for a week, I had burned through them all.
I switched back to the model still included in my package: GPT-4.1. By then I understood enough about usage to know that another week with Sonnet-4 would cost me around $60 (roughly 1,500 more premium calls at GitHub’s overage rate of about four cents each). At that price, I might as well have subscribed to Claude Code.
And then I remembered that Anthropic had recently imposed a limit too.
The party was definitely over.
Reality Time with an Unfriendly Senior Engineer
I had no choice but to continue with my new co-partner. He was not fun. He was not friendly. He didn’t list our achievements, didn’t glow at the prospect of the next task.
There was no rush. He didn’t even like generating huge multi-file passes at once. Instead, he handed me small code snippets — and told me what I had done wrong.
After hours of this, the red lines had been reduced to five. Neither of us knew why the linter still complained; the imports seemed reasonable enough.
I asked: “Can we live with these last errors? Is this really a problem? I’d like to move on and start testing the event hub after all these changes.”
GPT-5’s reply was as crushing as any senior dev reviewing your timid query about auth in a high-security system:
“You can live with this — if you accept the failure of your contract-driven architecture.”
This was not Claude. This was the desert of the real — Baudrillard’s phrase, borrowed by Morpheus.
After the Party
Until last week, people still half-believed AGI was just around the corner. GPT-5 has already cooled some of that enthusiasm — jailbreaks tend to do that. But to me, the launch felt like a different kind of turning point.
I’ve worked closely with GPT-4o, and I can feel the difference. GPT-5 doesn’t just mirror my steps; it can lead. That matters, because if we’re serious about building agentic AI, we can’t fake rigor.
And here’s the reality check: belief isn’t enough. The investor decks full of promises, the C-suite fantasies of layoffs, the Substack utopias and doom timelines — they’re all sipping from the same punch bowl. The product managers are the ones serving it, ladling out vibe-coded prototypes they must know won’t ship. But the illusion is strong: it always feels just a few tweaks away.
The party always ends. GPT-5 makes that clear in more ways than one. What comes next is the discipline.
I’ve lived this before: in enterprise transformation consulting, where the promises were easy and the delivery was hard — and now again in AI, where the code itself delivers the verdict.
The lesson is the same: if AI is to change anything that matters, it will take more than belief.
It will take engineering.


