I vibe coded something good. That was the easy part.
The PoC worked. The demo was strong. Then the team asked how to take it to production - and the real story started.
The retail store briefing agent understood natural language queries. Store managers could ask about yesterday's wastage, discount performance, today's plan - and get structured answers across four stores. It looked real.
The architect reviewed it and asked good questions. The code structure was reasonable. The documentation I generated explained the schema clearly — tables, columns, relationships. It was not a throwaway prototype. It was a working system built through careful iteration, domain knowledge baked into every prompt, and multiple refinements until the outputs felt right.
Then the real question arrived.
How do we make this production-ready?
What the Code Believed (vs. What Production Needs)
The agent worked because I told it what to believe during the build.
For perishable items — bread, milk, fresh produce — I had encoded logic that made intuitive sense: as the day progresses and stock remains, increase the discount to move product. Cap it at 60%. The agent followed this faithfully and the demo outputs looked realistic.
But in a real retail system that logic is not a rule. It is a starting point. There are supplier agreements that restrict how low a price can go. There are store manager overrides when a promotion is already running. There are category-specific behaviours — bread on Thursday behaves differently from bread on Saturday in a store near a market. There is a provision to override the discount ceiling entirely when circumstances require it.
None of that existed in the code. The agent had an assumption. Production needs a parameter, a configuration layer, a connection to actual pricing rules, and a human override mechanism.
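To make the gap concrete, here is a minimal sketch of what that configuration layer might look like. Every name here is hypothetical — the point is that the 60% cap becomes one default among several constraints, each of which can come from real data or a human:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DiscountPolicy:
    """Hypothetical config layer replacing the PoC's hardcoded 60% cap."""
    base_ceiling: float = 0.60                     # the demo's assumption, now a default
    supplier_floor_price: Optional[float] = None   # supplier agreement: price cannot go below this
    promo_active: bool = False                     # a promotion is already running in this store
    manager_override_ceiling: Optional[float] = None  # the human override mechanism

def effective_discount(policy: DiscountPolicy, proposed: float, unit_price: float) -> float:
    """Clamp a proposed discount against every constraint the PoC ignored."""
    if policy.promo_active:
        return 0.0  # don't stack an automated discount on a running promotion
    ceiling = (policy.manager_override_ceiling
               if policy.manager_override_ceiling is not None
               else policy.base_ceiling)
    discount = min(proposed, ceiling)
    if policy.supplier_floor_price is not None:
        # the deepest discount that still respects the supplier's floor price
        max_by_floor = max(1 - policy.supplier_floor_price / unit_price, 0.0)
        discount = min(discount, max_by_floor)
    return round(discount, 4)
```

The shape matters more than the details: the agent's "increase the discount as the day goes on" heuristic survives, but it proposes a discount rather than deciding one, and the policy object decides what actually ships.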
The same problem appeared in loyalty segmentation.
I had built four loyalty tiers as a dictionary — a clean, logical structure that worked perfectly in the demo. Online customers were identified by email or mobile. POS customers swiped a card or provided a loyalty number. The agent applied discount multipliers based on tier.
But real loyalty in retail is messier. The same customer might exist in the online system under their email and in the POS system under a card number that was never linked. The tiers I defined were conceptually sound but operationally disconnected from how loyalty data actually flows across channels. The dictionary was not wrong. It was a simulation of loyalty, not a connection to it.
When the team looked at the code they found columns and fields that existed purely to generate randomness for the demo. Category-wise discount percentages hardcoded as defaults. Start and end date assumptions that made sense for a PoC running in a single session but would break in a system running across real days with real data.
The CLAUDE.md problem
The architect asked about the CLAUDE.md file.
For anyone who has not encountered it — this is the file that tells Claude Code what the project is, what decisions have been made, what rules to follow, and what context to carry across sessions. It is the memory of the build. Without it every new session starts without the reasoning behind the code.
I had not created one.
Everything that shaped the agent — the decisions about how to handle perishable items, the loyalty tier logic, the reasoning behind the schema structure, the specific retail context I had brought to every prompt — lived inside a single chat session. When the session ended the reasoning ended with it. What remained was the code, a schema document, and the demo.
The team could see what the code did. They could not see why it was built that way. The CLAUDE.md file would have captured that. Without it the handover was a transfer of outputs without a transfer of intent.
This is not a vibe coding failure. It is a documentation gap specific to how AI-assisted development works. The code is generated fast. The reasoning that produced it evaporates unless someone deliberately captures it. In traditional development that reasoning lives in PRs, commit messages, architecture decision records, and the conversations that happen during code review. In a vibe coding session it lives in the prompts — and prompts are not automatically preserved anywhere.
The Gap Every Enterprise Will Face
This is where the story gets broader than one PoC.
Enterprise teams are building with vibe coding in two distinct situations. Some have Copilot licences and work within an integrated development environment where context can be maintained and teams can collaborate around the AI. Many others — and this is more common than the tooling conversations suggest — are building through prompt sessions, often without licences, often on personal accounts, often in a single extended conversation that produces something good and then closes.
The PoC gets built. It works. The demo is strong. And then the organisation asks the question it always asks: how do we take this to production?
At that point the real requirements appear. Code review against actual standards. Security audit — not just environment variables but the logic itself, the assumptions, the edge cases the prompt never surfaced. NFR assessment — can this handle real POS load across four stores simultaneously? What happens when the inventory feed is stale? What happens when a store manager’s query hits a category the agent was never trained on?
And underneath all of it the question the architect was really asking: where is the reasoning? Not the code. The reasoning.
What production-ready actually means for AI-assisted code
The vibe coding session produced something genuinely good. The domain knowledge I brought to the prompts meant the outputs were grounded in real retail understanding. The perishable discount logic made sense. The loyalty tier structure was logical. The briefing format was useful.
But production-ready is a different standard from demo-ready. Not just technically — organisationally.
Production-ready means a developer who did not build this can understand why it was built this way. It means a store manager can override the discount ceiling without calling a developer. It means the loyalty segmentation connects to actual customer data rather than a dictionary that approximates it. It means the randomness that made the demo look realistic has been replaced by real data feeds or configurable parameters.
And it means the CLAUDE.md file — or its equivalent — exists. The document that says: here is what this system is responsible for, here is the domain context it operates in, here is what the agent should never assume, here are the constraints that must hold when reality does not match the template.
That file is more important than the code. You can regenerate code from a well-written context document. You cannot regenerate the reasoning from the code alone.
The pattern every enterprise will face
The store briefing agent is not unusual. Across retail, logistics, finance, and operations, teams are building AI PoCs through vibe coding sessions — some with enterprise tooling, many without — and producing things that genuinely work. The barrier to a working demo has collapsed.
The barrier to production has not.
It has just moved. From “can we build this” to “can we hand this over, maintain this, extend this, and trust this when something breaks at 2am on a Saturday.”
The PoC that took hours to build will take weeks to productionise — not because the code was bad but because the reasoning was never captured, the assumptions were never validated against real data, and the domain knowledge that made it work lived in one person’s head and one chat session that is now closed.
The demo is the easy part.
The CLAUDE.md file is where the real work begins.
Have you hit this wall — a working PoC that stalled on the way to production?
I would like to hear what specifically broke down. Hit reply.

What takes weeks isn’t usually the code. It’s capturing the reasoning, testing assumptions against reality, and turning one person’s domain knowledge into something a team can trust and run. Huge difference!