We're Designing AI-Era Teams Without a Blueprint

Abstraction Has Always Been the Job
The Job Quietly Moved Up a Layer
The Evidence That Tool-Drop Isn't Working
Why There's No Blueprint Yet
What Designed (Not Procured) Looks Like
Sitting With the Discomfort

In the first years of my career, I built almost everything from scratch. Authentication, ORMs, queues, deployment scripts, the boring middle of the web app: if I needed it, I wrote it. There were libraries, but most of the load-bearing pieces were mine. The job was, in the most literal sense, implementation. You learned an idea, you typed it out, you owned every line.

That world ended slowly, then all at once. By the mid-2010s the load-bearing pieces had names and version numbers. By the early 2020s almost every meaningful capability — payments, search, vector databases, observability — was one import away. And then AI added another layer above that, one that does not just hand you a function but proposes whole shapes of solutions. Each of these waves did more than change the tools. They changed what engineering means. The current wave changed it more than most leaders are willing to admit, and the awkward truth is that we have not yet figured out how teams should be designed around it.

There is no playbook. We are improvising in public, and the data on how the improvisation is going has finally started to come in.

Abstraction Has Always Been the Job

It is tempting to treat AI as a discontinuity, a thing that broke the timeline. It is not. It is the latest entry in a forty-year pattern of rising abstraction, and the pattern has rules.

Fred Brooks named the rules in 1986. In No Silver Bullet he split the difficulties of software into two kinds: essential and accidental (Brooks, 1986). Essential complexity is the problem itself: the messy, irreducible business of modeling reality in code. Accidental complexity is the cruft we accumulate around it: yak-shaving, boilerplate, ceremony, the work of translating intent into the machine's vocabulary. Brooks argued that the big productivity gains of his era, such as high-level languages, time-sharing, and unified programming environments, had attacked accidental complexity rather than essential complexity. Every wave since, from object-oriented programming to frameworks to package managers, has followed the same pattern. The essential problem stayed roughly the same size; the cost of the surrounding ritual kept shrinking.

AI is doing exactly the same thing, just faster and at a higher altitude. It is not making the problem smaller. It is removing another slice of the ritual around it: the lookup, the boilerplate, the translation between what you mean and what the machine accepts. That is genuinely valuable, and it is also genuinely familiar. The discontinuity story is wrong; the cumulative-effect story is right. Each layer raised what was expected of the engineer above it, and the latest layer raised it sharply. As I argued in Why AI System Design Demands a New Engineering Mindset, the constraints engineers operate under have shifted; the discipline has to shift with them.

The Job Quietly Moved Up a Layer

What an engineer actually does has not changed: make decisions under constraint. What has changed, repeatedly, is which constraints land in scope.

A staff engineer in 2005 made decisions about memory layout, threading, and module boundaries. A staff engineer in 2015 made decisions about which framework, which queue, which database, and how to make four services not corrupt each other's state. A staff engineer today still makes those decisions, but increasingly they are also making decisions about which model handles which workflow, what level of autonomy a given AI surface should have, where the evaluation harnesses sit, and which guardrails the team will actually use rather than route around. The team itself has become part of the system being engineered. Tool choice is architectural.

This is the quieter shift inside the loud one. Engineering judgment that used to apply to the codebase now also applies to the team's AI stack — and the two are not separable, because the team's stack shapes the codebase the team produces. A team that uses AI heavily for scaffolding produces a different codebase than one that uses it heavily for review. A team where junior engineers get the same autonomy surface as principals produces different code than one where the autonomy surface is tiered. None of this is in a textbook yet.

The first anti-pattern of the moment falls out of this directly: rolling out one tool, one workflow, and one autonomy level uniformly across a heterogeneous team. A staff engineer, a domain specialist, and a junior six months into the job do not need the same AI surface. Treating them as if they do is not democratization; it is under-engineering. The same point shows up in You Can't Vibe Code Past Your Own Engineering Judgment: the tool's leverage tracks the skill it is leveraging.
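One way to make a tiered autonomy surface concrete is to write the policy down as code instead of as a memo. The sketch below is hypothetical throughout: the tier names (`SUGGEST`, `DRAFT`, `AGENT`), the three roles, the `needs_senior_review` flag, and the `allowed` helper are illustrative assumptions, not any vendor's configuration format and not a recommended tiering.

```python
from dataclasses import dataclass
from enum import IntEnum


class Autonomy(IntEnum):
    """Hypothetical autonomy tiers for an AI coding surface, lowest to highest."""
    SUGGEST = 1  # inline suggestions only; a human writes every change
    DRAFT = 2    # AI drafts changes; a human reviews before merge
    AGENT = 3    # AI runs multi-step tasks in a sandbox and opens pull requests


@dataclass(frozen=True)
class RolePolicy:
    max_autonomy: Autonomy
    needs_senior_review: bool


# Hypothetical policy table: roles and tiers are illustrative only.
POLICY: dict[str, RolePolicy] = {
    "junior": RolePolicy(Autonomy.SUGGEST, needs_senior_review=True),
    "domain_specialist": RolePolicy(Autonomy.DRAFT, needs_senior_review=True),
    "staff": RolePolicy(Autonomy.AGENT, needs_senior_review=False),
}


def allowed(role: str, requested: Autonomy) -> bool:
    """True if `role` may use the `requested` tier; unknown roles get nothing."""
    policy = POLICY.get(role)
    return policy is not None and requested <= policy.max_autonomy


print(allowed("junior", Autonomy.AGENT))             # False
print(allowed("domain_specialist", Autonomy.DRAFT))  # True
print(allowed("staff", Autonomy.AGENT))              # True
```

Keeping the table in version control means that a change in who gets which tier goes through the same review as any other change to the system, which is the point of treating the team's AI stack as architecture.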

The Evidence That Tool-Drop Isn't Working

If the team is part of the system, then dropping a tool into a team without redesigning anything is the AI-era equivalent of bolting a new database onto a service without changing the schema. It usually works for a while. Then the measurements come in.

The most uncomfortable measurement came from METR in 2025. Sixteen experienced open-source developers, averaging about five years of prior experience on the repositories they worked in, completed 246 real tasks, each randomly assigned to allow or disallow AI tools (primarily Cursor Pro with Claude 3.5 and 3.7 Sonnet). Before starting, the developers forecast that AI would make them 24% faster. Outside experts in economics and machine learning predicted gains closer to 38%. The actual result went the other way: with AI tools allowed, the developers took 19% longer (METR, 2025). After completing the study, they still estimated that AI had made them 20% faster. The gap between belief and measurement, roughly 39 percentage points, was about twice the size of the slowdown itself.
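The perception-gap arithmetic is worth spelling out. This is a minimal sketch that puts METR's reported figures on a single "speedup" axis (positive means faster with AI, negative means slower); that is a simplification, since METR's measured result is a change in completion time, and the variable names are illustrative.

```python
# METR (2025) figures, expressed as "speedup" in percent: positive means
# faster with AI, negative means slower. Treating them on one axis is a
# simplification; METR's measured result is a change in completion time.
forecast_before = 24   # developers' forecast before the study
measured = -19         # measured result: 19% slower with AI allowed
estimate_after = 20    # developers' own estimate after the study


def gap_points(believed: int, actual: int) -> int:
    """Percentage-point distance between a belief and the measurement."""
    return believed - actual


forecast_gap = gap_points(forecast_before, measured)    # 24 - (-19) = 43
perception_gap = gap_points(estimate_after, measured)   # 20 - (-19) = 39
slowdown = abs(measured)                                # 19

print(f"forecast gap: {forecast_gap} points")
print(f"post-study perception gap: {perception_gap} points")
print(f"ratio to slowdown: {perception_gap / slowdown:.2f}")  # 2.05
```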

That study is about individual time. Google's 2024 DORA report, drawing on a much larger population, looks at team-level delivery. It estimated that every 25% increase in AI adoption was associated with a 1.5% decrease in delivery throughput and a 7.2% decrease in delivery stability, even as individual flow and job satisfaction rose. Burnout, despite all the talk of AI relieving tedium, did not meaningfully decline. The report's authors point to a plausible mechanism: AI makes it easier to produce more code per change, batch sizes grow, and large batches have always been corrosive to delivery performance (DORA & Google Cloud, 2024). The tool sped up the upstream step; the team's process did not adapt to it.

The third measurement is at the level of the code artifact itself. GitClear analyzed roughly 211 million changed lines of code authored between 2020 and 2024. Between 2021 and 2024, the share of changes classified as refactoring (code moved rather than rewritten) collapsed from 25% to under 10%. Two-week code churn, the share of new code revised within a fortnight and an indirect signal of "ship it then fix it," climbed from 5.5% to 7.9%. Duplicated code blocks rose roughly eightfold in 2024 alone (GitClear, 2025). The pattern is consistent: less reuse, more rewriting, more redundancy.
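To make the churn metric less abstract, here is a minimal sketch of a two-week churn calculation. It assumes you already have, for each newly added line, the time it was added and the time (if any) it was next revised or removed; extracting that from git history is the hard part, and GitClear's actual methodology uses its own classification heuristics. The `AddedLine` type and `two_week_churn` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

CHURN_WINDOW = timedelta(days=14)  # "revised within two weeks"


@dataclass(frozen=True)
class AddedLine:
    added_at: datetime
    revised_at: Optional[datetime]  # None if the line was never changed again


def two_week_churn(lines: list[AddedLine]) -> float:
    """Fraction of newly added lines revised or removed within CHURN_WINDOW."""
    if not lines:
        return 0.0
    churned = sum(
        1
        for line in lines
        if line.revised_at is not None
        and line.revised_at - line.added_at <= CHURN_WINDOW
    )
    return churned / len(lines)


base = datetime(2024, 1, 1)
sample = [
    AddedLine(base, base + timedelta(days=3)),   # churned: revised on day 3
    AddedLine(base, base + timedelta(days=30)),  # revised, but outside the window
    AddedLine(base, None),                       # never revised
    AddedLine(base, None),                       # never revised
]
print(two_week_churn(sample))  # 0.25
```

A rising value says nothing about why code was revised; it is a team-level smell to investigate, not a verdict on any individual change.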

Three independent measurements, three different layers — individual time, team delivery, artifact quality — all moving the same direction when tools are dropped into teams without team-level redesign. That convergence is the story. It also produces the second anti-pattern of the moment: measuring individual velocity, declaring victory, and missing that the team-level outcomes are quietly degrading underneath. Vanity metrics have rarely been more available, or more misleading.

Why There's No Blueprint Yet

The honest answer to "what should an AI-era engineering team look like?" is that nobody knows yet, and the ecosystem around the question is making the discovery harder rather than easier.

Look at the developers themselves. Stack Overflow's 2024 survey found that 76% of respondents were using or planning to use AI tools, up from 70% the year before. But favorability dropped from 77% to 72%, the trust gap between use and confidence in output widened, and 45% of professional developers explicitly said AI tools were bad or very bad at complex tasks (Stack Overflow, 2024). Adoption is still rising. Confidence is falling. That gap is not a sign of failure — it is what an industry in the middle of its discovery phase looks like. People are using the thing while learning that the thing does not yet do what was promised.

The AI-influencer economy actively makes that discovery slower. Engagement-optimized content rewards novelty, not the long, boring signal of team-level outcomes over many quarters. The loudest voices in the space have, almost by construction, not run an experiment for long enough to learn anything from it. Frameworks are starting to appear — McKinsey's framing of future teams as "orchestrators of parallel and asynchronous AI agents" is one of the more useful sketches, and the 40–70% productivity gains it cites are real on routine tasks, which is the asterisk most coverage drops (McKinsey & Company, 2025). But a sketch is not a playbook. The patterns we will actually use are still being discovered in places that do not generate viral content: inside teams quietly measuring their own outcomes, past the novelty curve, against fundamentals that have not changed. I covered the wider rhythm of this in The AI Rollercoaster: Lessons from Hype Cycles — every prior wave of AI followed roughly the same arc, and the useful patterns always emerged after the noise quieted.

We are, right now, in the noisy phase. Acting otherwise is wishful thinking.

What Designed (Not Procured) Looks Like

Even without a settled playbook, the shape of what a designed AI-era team needs to respect is becoming visible. Three things, at minimum.

The first is skill fit. Different roles need different AI surfaces, and the same combination that helps a staff engineer can erode a junior's craft. Autonomy is a knob that should be tied to the skill it is leveraging, not handed out uniformly because procurement bought a license.

The second is guardrails as design, not policy. Review tooling, evaluation harnesses, sandboxing, isolated environments, and team rituals are part of the tool fit, not a memo sent out after rollout. A guardrail that lives in a wiki is not a guardrail.

The third is team-level metrics. Prioritize DORA fundamentals (small batches, change-failure rate, lead time, stability) over individual-velocity vanity numbers that mostly measure how good the autocomplete feels in the moment.
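Team-level metrics are cheap to compute once deployments are recorded. The sketch below is a rough approximation under stated assumptions: the `Deployment` record, its fields, and the `team_metrics` helper are hypothetical, and DORA's published definitions (deployment frequency, lead time for changes, change failure rate, and time to recover from failed deployments) are more precise than this. The batch-size field is included here to make the small-batches point measurable.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class Deployment:
    first_commit_at: datetime  # oldest commit shipped in this deployment
    deployed_at: datetime
    caused_failure: bool       # needed a rollback, hotfix, or incident response
    commits: int               # batch size: commits shipped together


def team_metrics(deploys: list[Deployment], window_days: int) -> dict[str, float]:
    """Team-level delivery metrics over a time window, loosely modeled on DORA."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    lead_times_h = [
        (d.deployed_at - d.first_commit_at) / timedelta(hours=1) for d in deploys
    ]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(lead_times_h),
        "change_failure_rate": sum(d.caused_failure for d in deploys) / len(deploys),
        "median_batch_size": median(d.commits for d in deploys),
    }


t0 = datetime(2024, 6, 3, 9, 0)
week = [
    Deployment(t0, t0 + timedelta(hours=4), caused_failure=False, commits=2),
    Deployment(t0, t0 + timedelta(hours=10), caused_failure=True, commits=9),
    Deployment(t0, t0 + timedelta(hours=6), caused_failure=False, commits=3),
    Deployment(t0, t0 + timedelta(hours=8), caused_failure=False, commits=4),
]
print(team_metrics(week, window_days=7))
```

Tracking median batch size next to change-failure rate is the design choice that matters: if AI assistance pushes batches up while individual velocity looks better, these are the numbers where that would show.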

The third anti-pattern of the moment shows up here: leadership treating AI adoption as a procurement problem ("we bought Copilot, we are AI-enabled") rather than a team-design problem. The tool is one input. The team's structure, autonomy distribution, review process, training, and incentives are the rest of the inputs, and they were never going to install themselves. This is the same dynamic I described in The AI Paradox: How Innovation Can Lead to Failure — the failure mode is rarely the technology. The failure mode is the missing organizational response to the technology. A Business Leader's Guide to Strategic AI Evaluation covers the upstream version of this: deciding whether a given AI investment fits a real pain point before designing the team around it.

None of these three principles is a playbook. They are the constraints any future playbook will have to respect.

Sitting With the Discomfort

The conclusion most leaders want is a checklist. There is not one yet, and the people offering you one are usually selling something. The honest conclusion is harder: we are in the discovery phase, the noise around the discovery is unusually loud, and the only way through is to run experiments long enough to read team-level signal — not individual-velocity signal, not anecdote, not whatever the AI-influencer feed served up this morning.

The teams that figure this out first will not be the ones with the most tools. They will be the ones that treated team design as an engineering problem in its own right — same judgment, same iteration, same willingness to revise the design when the measurements come back wrong, applied to the team rather than only to the code.

My career started in a world where the abstraction was the language. Today the abstraction is the team itself. The job did not shrink. It moved.

References

Brooks, F. P. (1986). No Silver Bullet: Essence and Accidents of Software Engineering (Technical Report No. TR86-020). University of North Carolina at Chapel Hill. https://www.cs.unc.edu/techreports/86-020.pdf

DORA, & Google Cloud. (2024). Accelerate State of DevOps Report 2024. https://dora.dev/research/2024/dora-report/

GitClear. (2025). AI Copilot Code Quality: 2025 Data Suggests 4x Growth in Code Clones. https://www.gitclear.com/ai_assistant_code_quality_2025_research

McKinsey & Company. (2025). Leading AI-driven software organizations: Unlocking the value of AI in software development. https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/unlocking-the-value-of-ai-in-software-development

METR. (2025). Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

Stack Overflow. (2024). 2024 Stack Overflow Developer Survey: AI. https://survey.stackoverflow.co/2024/ai