When Claude AI meets senior engineers, how much faster can delivery actually get?

Over the past six months, "vibe coding" has been abused by the industry to mean "AI writes code, it looks fine to me, ship it." We insist on taking the definition back:
VIBE = Vision-led, Iteration-driven, Boundary-aware, Engineered.
What's the difference? Counter-example first: opening Cursor, copy-pasting ChatGPT output, watching it run, and committing — that isn't vibe coding, that's a tech-debt assembly line. Our methodology is disciplined AI-assisted development: plan first, generate next, then review strictly. AI's role at every step is explicit, and "let AI handle everything" is not on the menu.
Every week in internal review I ask the same question: "This block AI wrote — do you understand why it's written this way?"
The essence of that question is judgment. AI can ship a reasonable-looking React component in 30 seconds, but it can't tell you why it's written that way, or whether it should be.
Making exactly those judgment calls is what gives senior engineers their leverage today. AI writes code fast, but reviewing AI-written code takes more skill. Our internal phrasing: "AI made writing code cheap; it made understanding code expensive."
People keep asking us: "Is VIBE Coding really fast?" The most concrete proof is that we turned the knife on ourselves — we ditched Swingvy after nearly two years and built EKel's internal HR / operations system from scratch with VIBE Coding. Traditional estimate: 4–6 months. Actual: live in 4 weeks, with all three platforms (web application, iOS app, Android app) shipped in lockstep.
Why operate on ourselves: because it's the most honest test. Working on yourself, you can't hide behind client pressure, blame ambiguous requirements, or wave away anything with "the client asked for it." From spec to launch the entire project was on us, every hour with a clear owner.
The technology choices were deliberately conservative, made to stay maintainable for the long haul.
Below is the breakdown of who did what across these 4 weeks — AI vs human (5-person consulting team):
| Stage | Traditional hours | AI-assisted hours | Main work |
|---|---|---|---|
| Requirements → spec | 60h | 30h | AI turns interview transcripts into user stories; humans review and patch holes |
| Data model design | 40h | 40h | Fully manual: architectural decisions are not delegated to AI |
| Schema / API generation | 160h | 16h | AI generates Supabase schema, API routes, type definitions from the spec |
| Web frontend | 240h | 60h | AI writes components; engineers review and refactor |
| iOS / Android Mobile App | 280h | 80h | AI writes cross-platform shared logic; engineers handle platform-specific differences |
| Receipt AI integration | 80h | 30h | Claude API prompt design; humans gate edge-case handling |
| Automated testing | 120h | 32h | AI generates test cases; engineers decide what to cover |
| Code Review | 80h | 80h | Hours didn't drop: every line of AI output still has to be reviewed |
| Integration testing | 100h | 60h | AI helps scaffold integration tests |
| Documentation | 60h | 8h | AI reverse-engineers docs from code; humans proofread |
| Total | 1220h | 436h | ~64% saved |
Two key observations from the table: code-review hours didn't fall (AI's output still has to be read line by line), and the architectural work of data-model design stayed fully manual.
Related reading on this project: EKel internal HR system — full case — Sprint 1 / Sprint 2 delivery cadence in detail, the concrete implementation of receipt AI, why we chose this stack, and how it's running in production now.
Every ticket walks through the same eight-step flow, no exceptions.
In this flow, AI is always the first draft, humans are always the final draft.
Claude once produced code calling a particular REST API endpoint — and that endpoint didn't exist. It took us an hour to debug before we realized AI was hallucinating.
Countermeasure: All AI-generated calls to external APIs must come with a link to the original documentation, and engineers verify before shipping. We added a rule to our prompt template: "If you aren't sure an API exists, output a TODO instead of pretending to know." Hallucination rate dropped noticeably.
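To make that rule concrete, here's a minimal TypeScript sketch of the "every external call carries a docs link" convention. All names, URLs, and types here are illustrative, not from our codebase:

```typescript
// Every external API call is declared alongside the vendor documentation
// that proves the endpoint exists, so a reviewer can verify it in one click.
interface ExternalCall<T> {
  docsUrl: string; // link to the vendor's API reference for this endpoint
  run: () => Promise<T>;
}

// Hypothetical example: a rates endpoint with its documentation attached.
const fetchExchangeRates: ExternalCall<Record<string, number>> = {
  docsUrl: "https://example.com/docs/v1/rates", // placeholder, not a real vendor
  run: async () => {
    const res = await fetch("https://api.example.com/v1/rates");
    if (!res.ok) throw new Error(`rates API failed: ${res.status}`);
    return res.json();
  },
};
```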
Passing the TypeScript compiler doesn't mean the code is correct. AI-written integration layers often mix `Date` and `string` at the boundary — fine on localhost, blows up in production the moment timezone-aware data lands.
Countermeasure: Boundary layers (anywhere we talk to an external system) are written by humans; AI is used heavily inside internal business logic.
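As one illustration of that split, here's a minimal sketch of a human-written boundary layer that parses an external timestamp exactly once; the names and shapes are hypothetical:

```typescript
// The boundary parses raw wire data into real types once; internal
// business logic never sees raw strings.
interface RawInvoice {
  id: string;
  issuedAt: string; // ISO 8601 string as delivered by the external system
}

interface Invoice {
  id: string;
  issuedAt: Date; // a real Date from here on, timezone handled once
}

function parseInvoice(raw: RawInvoice): Invoice {
  const issuedAt = new Date(raw.issuedAt);
  if (Number.isNaN(issuedAt.getTime())) {
    // Fail loudly at the boundary instead of letting a bad string leak inward.
    throw new Error(`Invalid issuedAt timestamp: ${raw.issuedAt}`);
  }
  return { id: raw.id, issuedAt };
}
```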
AI has strong opinions about aesthetics and zero opinions about usage context. It will hand you a dashboard that looks slick — with the button users press 30 times a day buried three menus deep.
Countermeasure: AI can draft the UI, but face-to-face walkthroughs with users are non-negotiable. We require at least 5 real-user feedback notes on file before any UI ships.
AI loves writing `try { ... } catch (e) { console.log(e) }` — the kind of "fake error handling" that looks safe but swallows the error, so nothing ever reaches your monitoring when production breaks.
Countermeasure: A CI lint rule bans catch blocks that only console.log without rethrowing; code review explicitly checks that every catch does something meaningful.
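One way to express such a rule without writing a custom plugin is ESLint's built-in no-restricted-syntax; the selector below is a sketch and worth testing against your own AST before adopting:

```typescript
// eslint.config.ts (flat config) — flags catch blocks whose only statement
// is a console.log call. Assumes ESLint 9+ with TypeScript config support.
export default [
  {
    rules: {
      "no-restricted-syntax": [
        "error",
        {
          selector:
            'CatchClause[body.body.length=1] > BlockStatement > ExpressionStatement > ' +
            'CallExpression[callee.object.name="console"][callee.property.name="log"]',
          message:
            "A catch block must handle or rethrow the error, not just console.log it.",
        },
      ],
    },
  },
];
```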
VIBE Coding isn't just a tool upgrade — it's quietly rewriting the engineering team pyramid:
| Role | Past team mix | VIBE team mix |
|---|---|---|
| Senior engineers (10+ yrs) | 10–15% | 30–40% |
| Mid-level engineers (3–7 yrs) | 50–60% | 30–40% |
| Junior engineers | 25–35% | 10–20% |
The pyramid gets squeezed into a diamond. The "write the first version from spec" work that used to belong to mid-level engineers — AI takes 60–80% of it. The "boilerplate code" work juniors used to do — AI eats it whole. What's left in demand are senior people who can design, review, and judge across domains.
The implication for enterprise hiring follows directly: demand concentrates on the senior end of that table.
Many companies miscalculate the ROI of VIBE Coding by subtracting "AI tool subscriptions" from "engineer salary saved." That math is wrong. The correct formula is:
ROI = (time-to-market acceleration × business opportunity) − (AI subscription + extra review cost + training investment)
For B2B SaaS, launching 3 months earlier usually means two extra quarters of contract revenue; for internal-controls / compliance apps, launching earlier shrinks the risk window. These business effects vastly outweigh the tool fees.
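As a worked example of the formula, here's a back-of-the-envelope calculation; every figure below is a hypothetical assumption for illustration, not a number from our books:

```typescript
// ROI = (time-to-market acceleration × business opportunity)
//       − (AI subscription + extra review cost + training investment)
const quarterlyContractRevenue = 500_000; // assumed revenue per quarter
const quartersGained = 2;                 // the B2B SaaS example above
const businessGain = quartersGained * quarterlyContractRevenue; // 1,000,000

const aiSubscription = 5 * 12 * 200; // 5 engineers × 12 months × assumed $200/mo = 12,000
const extraReview = 80 * 150;        // 80 extra review hours × assumed $150/h    = 12,000
const training = 5 * 40 * 150;       // assumed 40h of training per engineer      = 30,000

const roi = businessGain - (aiSubscription + extraReview + training); // 946,000
```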
At the same time, companies often overlook the hidden costs: the extra review hours, the training investment, and the ongoing tool spend all land on the cost side of the formula above.
Only when all of these are in the ledger can you honestly judge whether VIBE Coding is worth it for your organization.
The truth about VIBE Coding: it makes senior engineers more valuable and junior engineers more anxious. Because AI has replaced the rote work of writing code, what's left is all judgment.
If no one on your team is doing AI-assisted development by the rules yet, it's not too late to start. If you're looking for senior people to lead this transformation, come talk to us.
Not all prompts are useful. The four templates below survived hundreds of attempts in our team:
Template 1: generate a service class from a user story

```
You are a senior engineer. Please write a service class for the following user story:
User Story: <paste user story>
Requirements:
- Handle permissions and data boundaries (do not bypass row-level security)
- All database queries must be at the top of the method; avoid queries inside loops
- Throw explicit custom exceptions; do not silent-fail
- Include matching test cases covering positive, negative, and bulk scenarios
- If the user story is ambiguous, list the parts you assumed
```

Template 2: review a UI component

```
Please review the UI component below, focusing on:
1. Whether the data-loading strategy is reasonable (sync vs async)
2. Any unnecessary re-renders
3. Whether error handling silent-fails
4. Any reusable sub-components that should be extracted
5. Accessibility (a11y) issues
Provide concrete suggestions (not just "could be improved" — say "change it to this").
```

Template 3: enumerate failure modes in integration code

```
The following is integration code that calls an external system. Please list every possible failure mode (network, auth, schema mismatch, timeout, rate limit, etc.) and for each one explain:
1. How will our code react?
2. If the reaction is not good enough, how do you suggest changing it?
```

Template 4: surface assumptions before writing code

```
The following is a requirement description from the client. Before writing any code, please:
1. List the parts you are assuming (e.g., "I assume 'all customers' means customers where IsActive=true")
2. List at least 5 boundary cases that could leave the requirement ambiguous (e.g., VIPs, dormant accounts, accounts in arrears)
3. Write two versions of the user story, each representing a reasonable interpretation
```

These four templates get used every day on our team. A good prompt template isn't fancy writing — it's writing the traps up front.
The two most common failure modes for new engineers: either they don't trust AI at all (slow output), or they trust AI completely (buggy output). We designed a three-week program:
| Week | Activities | Evaluation |
|---|---|---|
| Week 1 | Pure manual coding, AI banned; ship 5 small tickets | Establish baseline code quality |
| Week 2 | AI required, but accepting v1 as-is is banned; minimum 3 rounds of AI ↔ human dialogue | Assess prompting and review skill |
| Week 3 | Free AI use, but every PR includes "what AI wrote, what I changed" notes | Assess judgment |
The key principle: first let them trust themselves (Week 1), then let them learn to ride AI (Week 2), and finally let them develop discernment (Week 3). The order can't be reversed, or you'll grow engineers with "AI dependency syndrome."
The last commonly ignored question: when AI-written code breaks in production, who carries the blame?
Our internal rule: every AI-generated commit must carry an `ai-assisted` tag in its commit message, but the PR author is the final accountable party. AI cannot sign off on a PR, so when a human signs, that human owns the judgment.
For monitoring, do two things:
- Attach an `ai-assisted=true` tag to events in Sentry / DataDog: it makes it easy to analyze the production failure rate of AI-assisted code after the fact.

Only when both of these are in place does AI-assisted development become "engineering practice" rather than "gambling."
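On the Sentry side, the tagging itself is a one-liner. This minimal Node sketch assumes the @sentry/node SDK and leaves the "which deploys count as AI-assisted" wiring to your release pipeline:

```typescript
import * as Sentry from "@sentry/node";

Sentry.init({ dsn: process.env.SENTRY_DSN });

// Tag every event from this deployment so the failure rate of AI-assisted
// code can be segmented after the fact, per the convention above.
Sentry.setTag("ai-assisted", "true");
```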
AI makes "just build it ourselves" look trivial: two internal engineers, Cursor plus Claude Code, a working demo by week 4. But enterprises don't need demos — they need systems that employees still want to use 18 months later, that audits clear, that don't blow up at the next compliance check. This essay walks the timeline of the DIY-with-AI path — what it looks like at week 4, month 6, month 12, and month 18 — and why the gap between expert and non-expert AI use is the 5–10× output multiplier that decides which path you actually walk.
We haven't shipped Agentforce for a client yet — but we've spent 18 months tracking it. This post compiles failure modes from Western early adopters, Salesforce's platform evolution from Agent Builder to Testing Center to Agentforce Script, and a decision framework with code samples for enterprises preparing to launch in 2026.
Our financial-services delivery experience comes from Australia — our CTO led FSC implementations at two Australian Tier 1 banks and one mid-sized bank. This article maps that experience onto Taiwan's regulations, core systems, and budget structures, giving decision-makers about to kick off a project a frank, vendor-spin-free basis for judgment.
A 30-minute conversation with a CTA. Based on your situation, we will answer directly: worth doing, too early, or not our fit.