"How Are You Going to Prove That?" — The Client Question That Forces Better AI Experiments


The Question That Reframes Everything

Is architecture going to be replaced by AI? Not the profession. But the architects and firms whose AI use cannot survive a client's "how are you going to prove that?" question are already being displaced, regardless of how many tools they have licensed.

Picture the moment. A project review on a Tuesday afternoon. An AI-influenced deliverable on the table: a programming study, a code-research summary, a generative site option. The client taps the page and asks the only question that actually matters: how are you going to prove that? Right there, in that pause, the architect either has an answer or doesn't.

That question is the forcing function this article is about. The literal question has a literal answer: the profession survives. The operational answer is harder, and it is the one your partners and clients will judge you on: can your firm run a disciplined AI experiment that produces a defensible outcome? AI is not replacing architecture. It is replacing the architects and firms whose AI use cannot withstand client or regulatory scrutiny.

The rest of this piece is the framework for being on the right side of that question.

The 95% That Nobody in AEC Is Talking About

Across industries, 95% of generative AI pilots produce no measurable business return, and architecture is not exempt. Only 6% of U.S. architects use AI regularly [2], only 8% of firms have implemented it [2], and the AIA data shows no widespread audit-trail practice across those deployments. Adoption is not the same as outcome.

The cross-industry data point comes from MIT's NANDA initiative, an academic research effort tracking enterprise AI outcomes. Their 2025 GenAI Divide report [1] found that "despite $30-40 billion in enterprise investment, 95% of generative AI projects yield no measurable business return." The research draws on 150 leader interviews, a 350-employee survey, and analysis of 300 public AI deployments. This is not a hot take. It is the most thoroughly sourced statement of the failure curve currently available.

The American Institute of Architects [2] provides the AEC-side mirror. Six percent of U.S. architects report regular use; another 53% are experimenting. Eight percent of firms have implemented AI; another 20% are working on it. The activity skews heavily toward firms with 50-plus employees, meaning the smallest practices are barely in the data at all.

| Measure | United States (AIA) | United Kingdom (RIBA) |
| --- | --- | --- |
| Practice-level adoption | 8% implemented | 59% using AI in some capacity (up from 41% in 2024) |
| Individual regular use | 6% of architects | Not separately reported |
| Practices with a formal AI policy | Not reported | 15% |
| Practices that have invested in AI R&D | Not reported | Fewer than 1 in 5 |

Both figures are correct. Both measure different things. Neither tells you whether the work is defensible. RIBA [3] tracks practice-level any-use; AIA tracks individual regular-use. An honest read pairs them rather than picking one.

Then there is the buyer-behavior dimension that does not appear in the surveys. Fielding Jezreel, a federal grant-writing consultant with a decade of practice, bought and requested refunds for "numerous AI tools that claimed to do things that they absolutely could not do." He concluded, in October 2024, that "the MVP wasn't there." Different industry, identical pattern: an experienced professional applying his own "prove it" question to AI tool vendors and finding most of them could not answer it. That is the operational version of adoption-versus-outcome.

McKinsey's 2025 State of AI survey [4] closes the loop. Only 39% of organizations report any level of enterprise-wide EBIT impact from AI. Roughly one-third of organizations are scaling AI across the enterprise; two-thirds remain in what McKinsey calls "pilot purgatory": running experiments that never graduate to production. If this were any other professional service, that failure rate would be a margin problem.

For architects, it is also a liability problem.

AIA Code 4.102 and the 1–27% Hallucination Problem

Architects bear personal professional responsibility for any work they sign or seal: AIA Code of Conduct Rule 4.102 prohibits stamping deliverables "for which they do not have responsible control," which means every AI-generated claim that ends up on a stamped drawing must be independently verified.

The exact text matters. From the AIA Code of Conduct [5]:

"Members shall not sign or seal drawings, specifications, reports, or other professional work for which they do not have responsible control."

Rule 1.101 [5] reinforces it: "Members shall demonstrate a consistent pattern of reasonable care and competence." Read those two rules together and the implication is operational. "Responsible control" cannot be delegated to a tool that hallucinates.

How often does it hallucinate? Studies estimate AI hallucination rates from 1–3% on the low end to 27% on the high end, depending on the task and study methodology [5]. The range is wide because tasks and models vary. The implication does not change with the percentage. If even three percent of the AI-generated content in a stamped deliverable is fabricated and unverified, the responsible-control standard becomes very difficult to defend, independent of any liability insurance carrier's position on AI use.
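To put numbers on that: a minimal back-of-envelope sketch in Python, using the published rates and treating each claim as independently fabricated (a simplifying assumption; real hallucination rates vary by task). Even at the 1% low end, a deliverable carrying 100 AI-surfaced claims has roughly a two-in-three chance of containing at least one fabrication; at 3%, it is close to certain.

```python
# Probability that a deliverable contains at least one fabricated claim,
# assuming each of n AI-generated claims is independently fabricated at
# rate p. Independence is an assumption for illustration, not a finding.

def p_any_fabrication(p: float, n_claims: int) -> float:
    """P(at least one fabricated claim) = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n_claims

for rate in (0.01, 0.03, 0.27):
    print(f"rate {rate:.0%}: {p_any_fabrication(rate, 100):.1%} chance of "
          f"at least one fabrication across 100 claims")
```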

Verification at this scale is not a sentence on a project page that says "AI was used." It is checking every code citation the AI surfaces against the published code text, not against the AI's summary. It is logging every AI output and every human review before the deliverable goes out. It is building the kind of AI governance strategy that survives outside scrutiny.
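What that logging can look like in practice: a minimal sketch, assuming an append-only JSON-lines file. The function name and fields (tool, prompt, output hash, reviewer, verdict) are illustrative, not an industry schema; any format works so long as each record exists before the deliverable goes out and a named reviewer is attached to every output.

```python
import hashlib
import json
from datetime import datetime, timezone

def log_ai_review(path: str, tool: str, prompt: str, output: str,
                  reviewer: str, verdict: str, notes: str = "") -> None:
    """Append one AI-output / human-review record to a JSON-lines audit trail."""
    record = {
        # UTC timestamp so the trail orders cleanly across machines.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,          # which AI tool produced the output
        "prompt": prompt,      # what the tool was asked
        # Hash of the raw output: compact, and proves what the reviewer saw.
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
        "reviewer": reviewer,  # the human exercising responsible control
        "verdict": verdict,    # e.g. "verified", "corrected", "rejected"
        "notes": notes,        # e.g. which published code text was checked
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```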

Architects themselves recognize the stakes. Ninety-four percent of AIA survey respondents [2] expressed concern about AI inaccuracy, the highest concern category in the entire study. RIBA's parallel risk guidance [6] for UK practices runs the same direction: AI use creates specific liability and risk-management obligations for the practice, not just the practitioner.

The professional bodies are aligned. NCARB, the U.S. architectural licensing authority, has formally positioned AI as an enhancer for the architect, not a replacement [7]. That language is not marketing. It is licensing-body framing that any state board will quote back to you in a complaint review.

Hallucination is not a bug to manage. For a professional with stamping authority, it is a liability category. And verification at scale doesn't happen by accident. It happens because the firm built it before the deliverable went out.

The Five-Part Discipline That Separates the 5% from the 95%

A disciplined AI experiment in an architecture practice has five parts: a single narrow use case, three to five pre-defined KPIs, a 1–2 week test window, a documented go/no-go criterion, and an audit trail of every AI output and human review. Pilots without these parts are theater.

Here is the structure, in practice:

  1. A single, narrow use case. One high-impact, low-complexity task: automated meeting-minute capture, spec-section drafting from a model, a code-research first-pass. Pick one. Run it.
  2. Three to five pre-defined KPIs. Both technical staff and business stakeholders [8] have to be able to read the same scorecard. Three is enough. Five is plenty.
  3. A 1–2 week test window. Long enough to produce signal [8]. Short enough to kill cleanly if the answer is no.
  4. A documented go/no-go criterion. Written before the pilot starts. Not negotiated after the team gets attached to the tool.
  5. An audit trail. Every AI output and every human review logged before the deliverable goes out. This is the same documentation that Rule 4.102's responsible-control standard demands. (A sketch of a full pilot definition follows this list.)
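What the five parts can look like as one artifact: a minimal sketch of a pilot definition, written down and version-controlled before the first output is generated. The class, the field names, and the example targets are all illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PilotDefinition:
    """One AI pilot, defined in full before launch and frozen thereafter."""
    use_case: str            # 1. single, narrow use case
    kpis: dict[str, float]   # 2. three to five KPIs with numeric targets
    test_window_days: int    # 3. a 1-2 week window
    go_criterion: str        # 4. go/no-go rule, fixed before the pilot starts
    audit_log_path: str      # 5. where every output and review gets logged

    def __post_init__(self):
        assert 3 <= len(self.kpis) <= 5, "define three to five KPIs"
        assert 7 <= self.test_window_days <= 14, "keep the window to 1-2 weeks"

# Hypothetical example for a code-research pilot.
pilot = PilotDefinition(
    use_case="code-research first-pass on one project type",
    kpis={
        "citation_accuracy_pct_min": 95.0,  # % citations correctly attributed
        "correction_rate_pct_max": 10.0,    # % outputs needing reviewer fixes
        "reviewer_hours_saved_min": 4.0,    # per project, vs. pre-pilot baseline
    },
    test_window_days=10,
    go_criterion="all three KPI targets met across the full window",
    audit_log_path="pilots/code-research/audit.jsonl",
)
```

Because the definition is frozen and checked in before launch, the go/no-go criterion cannot be quietly renegotiated after the team gets attached to the tool.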

The single biggest variable here is KPI selection. Vague KPIs are why pilots produce no measurable return. Concrete ones are why some pilots survive scrutiny.

| Use Case | Bad KPI (vague) | Defensible KPI (measurable) |
| --- | --- | --- |
| Automated meeting-minute capture | "Saves time for project managers" | Hours of PM time per project recovered vs. baseline; minutes-to-distribution after meeting; error rate on action items vs. human-captured baseline |
| Code-research first-pass | "Speeds up code research" | % of code citations correctly attributed; % requiring correction by reviewer; reviewer hours per project vs. baseline |
| Spec-section drafting | "Better drafts" | Time-to-first-draft; revision count to final; words of fabricated content per spec section (target: zero) |

The pattern is consistent: a defensible KPI is one you would actually report to a client or a managing partner. This is the operational meaning of measuring AI success: a number a peer can argue with, not a feeling.
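The scorecard can also be computed straight from the audit trail, which is what makes the number arguable rather than asserted. A sketch, assuming the JSON-lines format and "verified" verdict value from the logging example earlier:

```python
import json

def citation_accuracy_pct(audit_log_path: str) -> float:
    """% of logged AI outputs that a reviewer verified without correction."""
    with open(audit_log_path, encoding="utf-8") as f:
        verdicts = [json.loads(line)["verdict"] for line in f]
    if not verdicts:
        return 0.0  # no logged reviews yet; nothing to score
    return 100.0 * verdicts.count("verified") / len(verdicts)
```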

Then there is the build-versus-buy question. MIT NANDA [1] found that purchasing AI tools from specialized vendors and building partnerships succeeds about 67% of the time, while internal builds succeed at one-third that rate. The pattern isn't about the tool; it's about whether somebody outside your firm has already debugged the workflow. For most AEC firms, that finding alone should reframe the AI decision framework conversation for founders.

Harvard Business Review's Furr and Shipilov [9] make the same point from a different angle: "The real problem isn't trying new things—it's unfocused efforts that fail to connect to real business opportunities." The discipline that separates the 5% of AI pilots that work from the 95% that fail is not better AI. It is measurable, falsifiable experiments tied to a real client outcome. And one of the hidden costs of AI projects is what happens when a firm skips this step and has to reconstruct the audit trail under deadline pressure.

This discipline will not stay optional much longer. Clients are about to start asking for it.

Why "Prove It" Is Coming for Every AEC Firm

Most architecture clients are not yet asking "how are you going to prove that?" about AI. They will. AIA Code 4.102, the broader regulatory direction, and insurance underwriting trends all point at the same accountability layer, and the firms that already have the answer are the ones who will keep the work.

Three pressures are converging on the same point:

  • Client pressure. The "prove it" question is already routine in design-build engagements about traditional deliverables. Extending it to AI-influenced work is a small step, not a large one, and the more sophisticated the client, the sooner it happens.
  • Regulatory direction. AIA Code 4.102 is functionally an attestation requirement applied through professional responsibility. Emerging state-level proposals on AI attestation are moving in the same direction. RIBA's professional risk guidance [6] is the UK analog. The text is consistent across jurisdictions.
  • Insurance underwriting. In our experience working with AEC firms, professional indemnity conversations are starting to include AI audit-trail documentation as part of risk assessment. A firm with documented AI experiments and human-review logs is having a different underwriting conversation than a firm without one.

The exposure is real. Anthropic research [10] identified architects and engineers among the professions most exposed to AI automation. But exposure acts task by task, on drafting, visualization, code research, and specification, not profession-wide. Stamped accountability, client interpretation, and judgment under uncertainty don't transfer to a system that fabricates between 1% and 27% of the time.

The forcing function logic is straightforward. Build the audit trail and the discipline before clients ask for it, and you have a competitive advantage at exactly the moment when the market starts to differentiate. Wait until clients ask, and you're reverse-engineering documentation under deadline pressure with your professional liability on the line.

Client pressure is not a future risk. It is the only forcing function strong enough to convert AI adoption into AI outcomes, and it is closer than most AEC leaders think.

These questions come up in almost every AEC client conversation. Here are the clean answers.

Frequently Asked Questions: AI and the Future of Architecture

Will AI replace architects?

Is architecture going to be replaced by AI? Not the profession: professional responsibility, client interpretation, and stamped accountability cannot be transferred to a system that hallucinates. But specific tasks like drafting, visualization, and code research are being automated, and architects whose AI use cannot withstand client scrutiny will be displaced before the profession as a whole is. [2][5][7]

What percentage of AI pilots fail?

95% of generative AI pilots produce no measurable business return, according to MIT NANDA's 2025 GenAI Divide report. The research is based on 150 leader interviews, a 350-employee survey, and analysis of 300 public AI deployments. [1]

How many architects actually use AI?

In the United States, 6% of architects use AI regularly and 8% of firms have implemented it, mostly larger firms with 50 or more employees. In the United Kingdom, 59% of practices use AI in some capacity (up from 41% in 2024), but RIBA measures practice-level any-use, not individual regular-use. [2][3][11]

What are the liability risks of AI for architects?

AIA Code of Conduct Rule 4.102 requires architects to maintain "responsible control" over any work they sign or seal, including AI-influenced deliverables. AI hallucination rates range from 1–3% in some studies to 27% in others, which means independent verification is not optional for stamped work. [5]

What separates successful AI pilots from failed ones?

Disciplined measurement: a single narrow use case, 3–5 KPIs defined before launch, a 1–2 week test window, clear go/no-go criteria, and an audit trail of human reviews. Specialized vendor partnerships succeed about 67% of the time; internal AI builds succeed at one-third that rate. [1][8][9]

The question for an AEC firm leader is no longer whether to use AI. It's whether the use is defensible.

What Separates the Displaced from the Durable

AI is not going to replace architecture. It is going to replace the architects and firms whose AI use cannot survive a client's "how are you going to prove that?" question. Is architecture going to be replaced by AI? No. Are specific architects and firms going to be displaced by it? Yes, and the deciding variable is operational discipline, not tool selection.

The work in front of an AEC firm leader right now is simple to describe and hard to do: build the audit trail, the KPI structure, and the experimentation discipline before clients require it. The 67% versus one-third finding [1] on specialized vendors versus internal builds is the empirical case for partnering on the build rather than hiring three more people to figure it out internally.

If you're sizing up which AI experiments your firm should run first, and how to structure them so they survive client and regulatory scrutiny, that's exactly the conversation a fractional implementation partner can help shape. Our AI strategy services for founder-led firms are built around this work with AEC practices.

What separates the displaced from the durable is not whether you use AI. It is whether your use survives scrutiny.


References

  1. MIT NANDA Initiative, "The GenAI Divide: State of AI in Business 2025" (August 2025) — https://www.aigl.blog/state-of-ai-in-business-2025/
  2. American Institute of Architects, "Architects and AI: Practical Guidance for a Changing Profession" (2025) — https://www.aia.org/aia-architect/article/architects-and-ai-practical-guidance-changing-profession
  3. Royal Institute of British Architects, "RIBA AI Report 2025" (2025) — https://www.riba.org/work/insights-and-resources/ai-report/riba-ai-report-2025/
  4. McKinsey & Company, "The State of AI in 2025" (2025) — https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
  5. Fabyanske, Westra, Hart & Thomson, "Legal Risks of the Use of AI in the Design-Build Process" (May 2025) — https://www.fwhtlaw.com/blog/2025/05/16/legal-risks-of-the-use-of-ai-in-the-design-build-process/
  6. Royal Institute of British Architects, "What are the risks to architects and practices who use AI" (2025) — https://www.riba.org/work/insights-and-resources/professional-features/ai-professional-features/what-are-the-risks-to-architects-and-practices-who-use-ai/
  7. National Council of Architectural Registration Boards, "AI as an Enhancer for the Architect, Not a Replacement" (2025) — https://www.ncarb.org/blog/ai-enhancer-the-architect-not-a-replacement
  8. aec+tech, "Practical AI in AEC: How to Start, What to Measure, and What to Avoid" (2025) — https://www.aecplustech.com/blog/practical-ai-in-aec-how-to-start-what-to-measure-and-what-to-avoid
  9. Furr, Nathan, and Andrew Shipilov, "Beware the AI Experimentation Trap," Harvard Business Review (August 2025) — https://hbr.org/2025/08/beware-the-ai-experimentation-trap
  10. Dezeen (reporting Anthropic Economic Index research), "Architects and engineers among professions most automatable by AI according to Anthropic" (March 2026) — https://www.dezeen.com/2026/03/11/architects-highly-expose-ai-anthropic-research/
  11. Dezeen (reporting AIA Journey to Specification data), "Only six per cent of architects regularly using AI says AIA study" (March 2025) — https://www.dezeen.com/2025/03/14/ai-architecture-study-american-architect/
