More AI-generated code didn't make the team faster. Charity Majors aimed for 2× anyway.

June 13, 2026 · Amir Behrouzi · 7 min read

AI
Engineering
Leadership
Observability
Productivity

[Infographic: "More AI Code Won't Make Your Team Faster," contrasting Output with Impact. Left panel: 10× output focus, scattered code, "It Might Slow You Down." Right panel: 2× with an ownership focus, "Who owns this code?" and "I'll support this in production."]

More AI-generated code doesn't make your team faster. It might actually slow you down.

That is not an anti-AI take. It is Charity Majors—co-founder and CTO of Honeycomb—describing what happens when generation scales and everything downstream does not. Honeycomb builds observability tools for complex production systems. If anyone should trust the speed narrative, it is them. They still chose a 2× productivity target instead of chasing 10×, and they wrote AI values before they wrote mandates.

The uncomfortable part for most engineering leaders: the bottleneck was never typing. It is releasing software, debugging it in production, and keeping it running well. Flood the repo with agent output and you do not buy velocity—you buy queue depth at the exact steps that already hurt.

Writing code was never the hard part

Majors has said this plainly for years, long before generative AI made it fashionable: writing code was not the constraint on shipping. Review, integration, deployment, incident response, and customer-visible reliability were.

AI amplifies whatever foundation you already have. Strong observability and release discipline turn codegen into leverage. Weak foundations turn it into noise you still have to fix when the pager goes off overnight.

That is why a company whose entire brand is "see what production is doing" worries less about keystrokes per hour and more about whether anyone can own what ships.

Why Honeycomb chose 2×, not 10×

Honeycomb adopted a company-wide 2× challenge, modeled on a similar push at Intercom, not as a stunt but as an honest ceiling. Double impact with AI over a year. Not ten times the tokens. Not ten times the lines merged.

Emily Nakashima, Honeycomb's VP of Engineering, has described the rollout in public conversations: a founder memo encouraging experimentation, then a recurring question from engineers—how will you measure this?

Honeycomb's answer is instructive for anyone copy-pasting a "10× engineer" memo from social media:

**De-emphasize single metrics** that invite gaming—token spend, lines of code, PR count
**Trust self-reporting** on whether AI actually increased impact, not whether it increased activity
**Treat 2× as a direction**, not a quota you can hit by dumping unowned output into main

A 10× narrative sounds bold in a board deck. A 2× goal with accountability is what survives contact with production.

AI values beat AI mandates

Honeycomb did not stop at a productivity target. The team wrote AI values—principles about transparency, emotional safety, and what "good" looks like when machines draft the first pass.

The line that travels fastest on engineering Twitter is also the most operational:

**Every AI output has to have a human owner.** If you do not want your name on it, it is probably not good work.
**Quality first, quantity second.**

Read that as policy, not poetry. Ownership is how you keep the bottleneck from moving from "typing" to "code changes nobody will debug." It is the same instinct behind treating agent output like vendor code—except here the vendor is your own team on a good day.

Mandates without values produce theater: everyone uses the tool enough to look busy, nobody improves the release path, and production incidents pile up from code changes nobody can fully explain.
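One way to read the ownership rule as policy is a merge gate. The sketch below is a minimal illustration, not Honeycomb's tooling: the `PullRequest` record, its `agent_assisted` flag, and the `can_merge` function are hypothetical names, and a real team would wire the same check into whatever CI or review system it already uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PullRequest:
    """Hypothetical PR record; field names are illustrative, not a real API."""
    number: int
    agent_assisted: bool          # assumed to come from a label or PR template
    owner: Optional[str] = None   # the human who will support this in production


def can_merge(pr: PullRequest) -> tuple[bool, str]:
    """Return (allowed, reason). Agent-assisted changes need a named human owner."""
    has_owner = bool(pr.owner and pr.owner.strip())
    if pr.agent_assisted and not has_owner:
        return False, f"PR #{pr.number}: agent-assisted change has no human owner"
    return True, f"PR #{pr.number}: ok"
```

The gate deliberately checks only for a name. Whether that person can actually explain and debug the change is still a judgment for review, which is the point of pairing the rule with values rather than a mandate.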

What actually moves the bottleneck

If generation is cheap, invest where generation was never the problem:

**Release cadence and rollback clarity**: who can roll back agent-assisted code during an overnight page, without the chat log?
**Observability on the paths AI touches**: auth, payments, data writes, third-party integrations
**Review depth on integration**, not syntax: does this change behave under real traffic?
**On-call readiness**: can the on-call engineer explain the feature without opening the prompt history?

Honeycomb's worldview is not accidental here. Observability is how you debug systems you did not fully author. That gets more true, not less, when AI drafts the first version.

Web agencies and product teams feel this at smaller scale: a client landing page with a broken form is not an "AI failure." It is a shipping failure that happened to start in a chat window.

A practical 2× checklist for web teams

You do not need Honeycomb's stack to use the same discipline:

Pick one **bounded** experiment per sprint—one funnel, one integration, one perf fix—not "AI everywhere"
Name a **human owner** on every agent-assisted PR before merge; no owner, no ship
Measure **cycle time to production**, not words generated
Track **incidents or reverts** on AI-touched paths monthly; that curve matters more than token charts
Write three **AI values** your team agrees on; shorter than a policy doc, stronger than a tool mandate
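The two measurement items in the checklist above can be sketched in a few lines. This is an illustration under stated assumptions: the `Change` record and its fields are hypothetical, and "cycle time" is taken here to mean hours from PR opened to deployed, which your team may define differently.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Change:
    """Hypothetical record of one shipped change; fields are illustrative."""
    opened_at: datetime      # when the work started (assumed: PR opened)
    deployed_at: datetime    # when it reached production
    ai_touched: bool         # agent-assisted or not
    reverted: bool = False   # rolled back or tied to an incident


def median_cycle_time_hours(changes: list[Change]) -> float:
    """Median hours from opened to deployed. Raises StatisticsError if empty."""
    hours = [(c.deployed_at - c.opened_at).total_seconds() / 3600 for c in changes]
    return median(hours)


def monthly_revert_rate(changes: list[Change]) -> dict[str, float]:
    """Share of AI-touched changes that were reverted, keyed by deploy month."""
    totals: dict[str, int] = {}
    reverts: dict[str, int] = {}
    for c in changes:
        if not c.ai_touched:
            continue
        month = c.deployed_at.strftime("%Y-%m")
        totals[month] = totals.get(month, 0) + 1
        reverts[month] = reverts.get(month, 0) + int(c.reverted)
    return {m: reverts[m] / totals[m] for m in sorted(totals)}
```

Neither number is a target on its own. In keeping with the advice to de-emphasize single metrics, they are meant to be read together and alongside what engineers report about impact.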

A 2× goal is realistic. A 10× goal is mostly a headline. Realistic goals build momentum over time.

The contrarian takeaway

Charity Majors did not tell engineers to type less. She told them to own more—and to aim for impact that can survive debugging.

More AI-generated code without human ownership does not make your team faster. It makes your incident queue longer. Choose 2× with values, not 10× with metrics that look impressive but do not measure real impact. If you want a second opinion on where agent speed is helping—or outpacing—your release and review loop on a marketing site or product frontend, send a note.
