The Dark-Data Reality (Why Your Past Is Already the Future)
Approximately 96% of construction project data is collected but never used again [1], compared to roughly 52% across other industries. A 2021 Autodesk and FMI survey of more than 3,900 professionals, covering 2020 industry data [2], put the global cost of bad data at $1.85 trillion in 2020, including $88.69 billion in rework, or 14% of all rework performed that year.
Dark data is the operating term for content a firm collects but never queries again. Drawings sit in a project folder. Lessons learned live in a closeout document no one reopens. Proposal responses get filed by date. Specifications are stored, then forgotten. The data exists. It just isn't findable.
The Cost of Dark Data in Construction
| Metric | Value | Source | Date |
|---|---|---|---|
| Global bad-data cost | $1.85 trillion | Autodesk + FMI | 2020 data, 2021 publication |
| Rework cost from bad data | $88.69 billion | Autodesk + FMI | 2020 data, 2021 publication |
| Share of rework attributable to bad data | 14% | Autodesk + FMI | 2020 data |
| Firms saying >50% of project data is "bad" | 30% | Autodesk + FMI | 2020 data |
| Dark data rate in construction | ~96% | Trillium (synthesis of Veritas + Autodesk Construction Insights) | ~2020 era |
| Dark data rate in other industries | ~52% | Trillium (same synthesis) | ~2020 era |
| Share of working hours professionals spend searching for data | 13% | Trillium | ~2020 era |
| Hours/week architects waste searching detail libraries | 5–10 | Pirros (vendor) | 2026 |
| Detail conditions already designed but not findable | 90% | Pirros (vendor) | 2026 |
The numbers are not new. That's part of the argument. Construction professionals spend roughly 13% of working hours searching for data they already have [1]. Architecture and engineering firms waste 5–10 hours per week per professional searching through fragmented detail libraries, and 90% of detailing conditions have already been designed, according to Pirros [9]. The survey figures are not 2026 readings. They are 2020-era benchmarks for a problem the industry has chosen, collectively, not to fix.
A few caveats matter. The $1.85 trillion figure is a survey synthesis of 2020 industry conditions, not a 2026 reading. The 96% dark-data figure traces through Trillium's synthesis of Veritas dark-data analysis and Autodesk Construction Insights— directional, not a fresh primary statistic. The Pirros numbers come from a vendor; they are useful as illustrative scale, not as an independent benchmark.
None of those caveats make the gap smaller. They make it more striking. Five years on, the industry has the same dark-data problem, the same time-loss problem, and the same retrieval problem. The future of construction is not whether AI shows up. It is whether the firm can finally use the project archive it already paid to create.
If that's the size of the gap, the obvious next question is: what does the trade press actually mean when it talks about the future of construction? It's worth being honest about that before arguing why most of it comes second.
What the Trade Press Calls "The Future" (And Why It's Not Wrong)
The 2026 construction technology stack, per Deloitte's November 2025 outlook [4], is agentic AI systems for scheduling and workflow coordination, computer vision for real-time site safety analytics, digital twins integrated with BIM, IoT devices with 5G connectivity, and autonomous equipment and robotics. Each is real. Each is being adopted. None of it is the first move for most firms reading this article.
Only 27% of AEC professionals currently use AI in their operations, though 94% of those plan to expand usage in 2026, per Autodesk's 2026 expert survey [5]. Some 38% of contractors now report measurable business impact from AI, up from 17% a year ago [6]: meaningful acceleration, but from a low base.
Deloitte's 2026 Construction Technology Stack
| Technology | What It Does | Maturity |
|---|---|---|
| Agentic AI systems | Autonomously manages scheduling and workflow coordination | Early production, enterprise-led |
| Computer vision for site safety | Real-time hazard identification and incident analysis | Production for top-decile firms |
| Digital twins with BIM | Operating model of the physical asset; timeline reductions cited up to 20% | Production but data-quality-bound |
| IoT with 5G | Asset tracking, predictive maintenance | Production, expanding |
| Autonomous equipment + robotics | Addresses labor shortages, improves safety | Early field trials for general use |
The stack is not vaporware. Computer vision for jobsite safety has measurable business impact at firms running it on production sites. Digital twins with BIM integration deliver real timeline reductions for owners who have the underlying model discipline. IoT-and-5G asset tracking is real, in production, and saving money. Agentic AI scheduling has moved from concept to early production at firms with the integration capacity to deploy it.
The "AEC industry lags other sectors in AI adoption" finding is well-established. ASCE's December 2025 survey confirms it [7], and McKinsey has flagged construction as near-bottom on digitization for years. Data quality, security concerns, and ROI uncertainty are the cited causes. None of those causes are solved by buying a tool.
McKinsey estimates AI can boost construction productivity by up to 20% when applied across planning, risk, safety, design optimization, and predictive maintenance [8]. That figure comes from McKinsey's widely-cited 2018 report and is the foundational number every downstream "AI in construction" article echoes. It is real. It is also seven years old. It compounds best in firms that redesign workflow alongside tool deployment, not in firms that swap a manual process for an AI-shaped version of the same manual process.
The data-center construction boom is the visible adoption story right now. AI-related data centers are the standout growth segment for 2026 per Deloitte [4], and the firms building them are the firms most likely to deploy agentic scheduling and computer-vision safety in production. That's a real adoption signal. It's also concentrated at the top of the market, not at $20M–$100M firms.
If the stack is real and the adoption curve is steepening, why isn't this the first move for most firms reading this article? Three reasons. Each rooted in data we already cited.
Why the Futuristic Stack Isn't Move #1 for Your Firm
Three reasons the futuristic stack isn't the first AI move for a $20M–$100M AEC firm. First, the data substrate underneath it doesn't exist yet. Second, McKinsey's 20% productivity gain is contingent on workflow redesign, not tool purchase. Third, the people who hold the firm's institutional knowledge are retiring faster than tooling can compensate.
Reason 1: The substrate gap. Computer-vision safety models perform better when paired with the firm's historical incident data. Agentic scheduling models work better grounded in the firm's actual past project durations. Digital twins compound on structured BIM, not on a messy archive. Deloitte's 2026 outlook [4] flags data quality as the persistent bottleneck undermining AI ROI. The stack is being bought, but it isn't paying back, and the diagnosis points squarely at the substrate.
Reason 2: The McKinsey caveat. McKinsey's 20% productivity figure is from its widely-cited 2018 report on AI in construction technology [8]. The number is contingent on workflow redesign, not tool purchase alone. Buying agentic scheduling without re-architecting how scheduling decisions move through the firm doesn't unlock the gain. It adds a layer of tooling on top of a process that wasn't designed for it. This is one of the hidden costs of AI projects that gets written off as "low adoption" when the real issue was sequence.
41% of construction workers are expected to retire by 2031. The undocumented knowledge in their heads is leaving on a fixed schedule.
Reason 3: The labor clock. Deloitte projects 499,000 new construction workers needed for 2026, up from 439,000 in 2025 [4]. 41% of the construction workforce is expected to retire by 2031, with a potential $124 billion loss in construction output from unfilled positions [4]. The senior partners and project leads who hold institutional knowledge (the rules of thumb about how a particular building type behaves, the lessons from a job that went sideways in 2003, the reason a detail was drawn the way it was) are walking out the door. A back-catalog AI move preserves that knowledge before they leave. A futuristic-stack move doesn't touch the problem.
None of this means skip the futuristic stack. It means sequence it. The substrate compounds. The stack works better on top of it. Both are true.
If those are the reasons, what's the actual move? Concretely. The next section is the sequence— what a principal can authorize this quarter.
The Back-Catalog Move (Audit → Tag → Retrieve → Evaluate)
The first AI move for a $20M–$100M AEC firm is a four-step sequence: audit the project archive, tag and structure it using AI-driven indexing, layer a retrieval interface on top, and only then evaluate the futuristic stack. Each step is something the firm can authorize, scope, and measure independently.
The Back-Catalog Move — Four Steps
| Step | What It Is | What to Authorize | Outcome |
|---|---|---|---|
| 1. Audit | Inventory the project archive: what exists, where it lives, what's findable | A half-day to two days, no tool purchase | Baseline of current findability |
| 2. Tag & Structure | AI-driven indexing of details, documents, project archives | A scoped pilot against one content category | Structured, queryable corpus |
| 3. Layer Retrieval (RAG) | Search interface over the firm's own content; ask a question, get a sourced answer | A pilot RAG deployment scoped to one category | Hours per query reclaimed; surfaced past work |
| 4. Evaluate the Stack | Computer vision safety, agentic scheduling, digital twins— now on top of a real substrate | A quarter-by-quarter decision matrix | Tooling decisions grounded in structured firm data |
Step 1: Audit the Archive
The audit is a planning exercise, not a tool purchase. The questions are basic. What's in the archive? Where is it stored— network drive, Egnyte, SharePoint, BIM 360, project-management tool, partner laptops? What categories matter most: detail libraries, proposal and RFP responses, lessons learned, project archives, specifications? And what's the firm's findability baseline today, measured in hours per typical search?
This is a half-day to two-day exercise. It produces a one-page inventory and a candid assessment of how much is currently retrievable. In our work with $20M–$100M firms, the archive is almost always bigger, messier, and more fragmented than the firm expected. That's not a problem. It's the diagnosis the rest of the sequence is built on.
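As a rough illustration of the "what exists, where it lives" part of the audit, a short script can count files by type under one archive root. This is a minimal sketch, assuming the archive is reachable as an ordinary folder; the function name `inventory_archive` and the shape of its result are hypothetical, and cloud stores such as Egnyte or SharePoint would need their own export or sync step first.

```python
import os
from collections import Counter
from pathlib import Path


def inventory_archive(root: str) -> dict:
    """Walk one archive folder and count files and bytes per file extension.

    Hypothetical helper for illustration; it only sees what is on a
    mounted or synced drive.
    """
    counts = Counter()
    sizes = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Normalize case so ".PDF" and ".pdf" land in the same bucket.
            ext = Path(name).suffix.lower() or "(no extension)"
            counts[ext] += 1
            try:
                sizes[ext] += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Unreadable or vanished file: count it, skip its size.
                pass
    return {
        "total_files": sum(counts.values()),
        "files_by_type": dict(counts),
        "bytes_by_type": dict(sizes),
    }
```

Running this once per storage location gives the raw counts for the one-page inventory. It says nothing about findability, which still has to be measured by timing real searches.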
Step 2: Tag and Structure
AI-driven indexing reduces the cleanup burden dramatically vs. the manual-tagging era. The math has changed. Detail libraries can now be indexed with computer-vision recognition of drawing content. Documents can be indexed semantically, by meaning rather than by filename. Project archives can be tagged by phase, building type, client, and outcome with far less human curation than was required even three years ago.
Vendor categories illustrative of the space:
- Detail-management platforms: Pirros, PiAxis, AVAIL, Autodesk Content Catalog
- Proposal automation: Joist AI, AutoRFP.ai, Workorb
- Data layer for documents: Egnyte and similar
These are categories, not endorsements. Each platform has tradeoffs— what content it indexes, how it handles security, whether it integrates with Revit, what the per-seat economics look like. The point of naming them is to anchor the sequence in tools that exist today, not to argue any one of them is the right fit for a given firm.
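Whatever platform does the indexing, the output of this step can be pictured as one structured record per archived item, carrying the tags this step describes (phase, building type, client, outcome). The sketch below is illustrative only: the `ArchiveRecord` class, its field names, and the `find_records` helper are hypothetical and do not reflect any vendor's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveRecord:
    """One tagged item in the firm's archive (hypothetical schema)."""
    path: str            # where the file lives
    category: str        # e.g. "detail", "proposal", "lessons_learned"
    phase: str           # e.g. "schematic", "construction", "closeout"
    building_type: str   # e.g. "healthcare", "high-rise residential"
    client: str
    outcome: str         # e.g. "won", "lost", "built"


def find_records(records, **criteria):
    """Return the records whose tags match every given field=value pair."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]
```

A call such as `find_records(records, category="detail", building_type="healthcare")` would then narrow the corpus by tag before any retrieval layer is involved.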
Step 3: Layer Retrieval (RAG)
Retrieval-augmented generation (RAG) is the technical category that lets a firm query its own historical projects. The firm-facing experience is a search box. Ask a question. Get a sourced answer pulled from the firm's own content. The technology category matters less than what gets indexed.
What this looks like in practice:
- "Find every detail we've drawn for a curtain wall at a high-rise residential project."
- "Summarize the lessons learned from our last three healthcare projects."
- "Draft a project-experience narrative for this RFP from our prior wins in this building type."
This dynamic isn't construction-specific. Fielding Jezreel, a federal grant-writing consultant, had a realization that surprised him when he started building AI tools for his domain: the prompt matters less than the context underneath it. A modest prompt over rich context will produce useful output; a clever prompt over messy context won't. The principle applies whether the context is grant-program criteria or 30 years of architectural details.
This is also where measuring AI success matters. Hours saved per query. Accuracy spot-checks on retrieved details. Time from RFP question to first-draft answer. Pilot the layer. Measure it against a baseline. Move only when the metrics are real.
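The pilot metrics named here reduce to simple arithmetic over timed searches and spot-check results. A minimal sketch, assuming the firm logs minutes per search before and during the pilot and records each spot-check as correct or not; the function name and input format are assumptions for illustration.

```python
from statistics import mean


def pilot_summary(baseline_minutes, pilot_minutes, spot_checks):
    """Compare timed searches before and during a pilot.

    baseline_minutes / pilot_minutes: minutes per search, one entry per search.
    spot_checks: booleans, True when a retrieved result was judged correct.
    """
    if not (baseline_minutes and pilot_minutes and spot_checks):
        raise ValueError("each input needs at least one observation")
    baseline_avg = mean(baseline_minutes)
    pilot_avg = mean(pilot_minutes)
    return {
        "baseline_avg_minutes": baseline_avg,
        "pilot_avg_minutes": pilot_avg,
        "minutes_saved_per_query": baseline_avg - pilot_avg,
        "spot_check_accuracy": sum(spot_checks) / len(spot_checks),
    }
```

The same structure works for time from RFP question to first-draft answer: log the baseline first, then compare.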
One honest caveat. RAG is for first-pass retrieval and summarization— proposals, lessons learned, detail discovery, RFP narrative drafting. It is not for sign-off-grade engineering output. Specifications still need licensed-professional review. Models still hallucinate, particularly when asked to extrapolate beyond what's in the indexed corpus. Name the limit. Build the workflow around it.
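To make the retrieval half of RAG concrete, the toy sketch below ranks a few documents against a question and returns the matching sources. It is a stand-in under stated assumptions: it scores by plain word overlap (cosine similarity over word counts) rather than the semantic embeddings a production system would use, it omits the generation step entirely, and the file paths and text in `SAMPLE_CORPUS` are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical corpus: source path -> extracted text.
SAMPLE_CORPUS = {
    "details/curtain-wall-highrise.pdf":
        "curtain wall anchor detail for high-rise residential tower",
    "closeout/hospital-lessons.docx":
        "lessons learned healthcare project mechanical coordination",
    "proposals/school-rfp.docx":
        "project experience narrative for K-12 school RFP",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, corpus, top_k=3):
    """Return up to top_k (source, score) pairs, best match first.

    Documents with no word overlap are dropped, so an empty list means
    "nothing in the corpus supports an answer".
    """
    q = Counter(tokenize(query))
    scored = []
    for source, text in corpus.items():
        score = cosine(q, Counter(tokenize(text)))
        if score > 0:
            scored.append((score, source))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(source, round(score, 3)) for score, source in scored[:top_k]]
```

Returning the source path with every hit is the design choice that matters: it keeps each answer traceable to a file a licensed professional can open and review.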
Step 4: Then— and Only Then— Evaluate the Futuristic Stack
Now the firm has structured data underneath. Computer-vision safety models work better when paired with the firm's own historical incident data. Agentic scheduling works better when grounded in actual past project durations. Digital twins compound on structured BIM. Every element of the stack performs better on a substrate the firm controls.
The substrate becomes the firm's source of truth for past work. Every retrieval, every summary, every project-experience narrative the firm produces from this point compounds the substrate's value. As AEC Magazine puts it: every architecture practice, structural consultancy, and MEP engineer possesses decades of intellectual property in standard details, design rules, and lessons learned, and that accumulated knowledge is their competitive advantage [3]. The back catalog is the moat. The stack is what runs on it.
Sequencing matters, but it isn't a refusal of the rest of the stack. The futuristic technologies compound when the substrate is there— which is exactly what the both/and reading of the future of construction sounds like.
Both/And: The Futuristic Stack Works Better on This Substrate
The futuristic stack— computer vision safety, agentic AI scheduling, digital twins, autonomous equipment— is not in competition with the back-catalog move. It's downstream of it. Structured firm data is the substrate that makes the rest compound.
Stack element by stack element:
- Computer vision safety: Pairs with the firm's historical incident data to flag patterns specific to the firm's work, not just generic hazards.
- Agentic scheduling: Grounds projections in the firm's actual past project durations, not industry averages that misrepresent the firm's portfolio.
- Digital twins: Compound on structured BIM, not a messy archive. The twin is only as good as the substrate.
The "AI replaces architects and engineers" framing is the wrong question. The dominant use cases amplify senior expertise— faster proposals, faster detail retrieval, better lessons-learned access, sharper go/no-go decisions on RFPs. Senior judgment becomes scalable. AI amplifies human capability. It doesn't replace the architect. It makes the architect's last 20 years of judgment accessible to the next junior on the team.
That framing matters most when the labor-shortage clock is ticking. The answer to 41% retirement by 2031 isn't replacing a senior partner with a robot. It's encoding the partner's 30-year intuition into queryable form before they leave. That's an AI culture-building move first, and a tooling move second.
Both are true. All of it matters. Which leaves the practical question every reader actually has— what do I do this quarter?
What to Do This Quarter (Five Moves a Principal Can Authorize)
Five concrete moves any $20M–$100M AEC firm leader can authorize, starting this quarter. None requires a major capital outlay. All of them compound. The first three fit inside this quarter and the last two follow in later quarters, so stop at the back-catalog work for now and don't try to solve everything in one cycle.
- Audit your archive (Week 1). Who owns it. Where it lives. What categories matter. Whether anything is findable today. A half-day with the project leads and the IT lead is enough to produce the inventory.
- Pick one back-catalog category to start (Week 2). Detail libraries, past proposals, or lessons learned. One category, not all three. Measure the baseline— how long does it currently take to find what the team needs? That baseline is the metric every subsequent step measures against.
- Pilot one AI-indexing tool against that category (Weeks 3–8). A detail-management platform for details. A proposal-automation tool for proposals. A RAG layer over Egnyte or SharePoint for documents. Time-bounded pilot. Real metrics— hours saved per query, accuracy spot-checks, user adoption among the people actually doing the work.
- Encode one retiring expert (Quarter 2). Interview the senior person closest to retirement. Capture the rules of thumb that don't live in any drawing. Add them as structured content the RAG layer can retrieve. This is the institutional-knowledge move that no off-the-shelf tool replaces.
- Re-evaluate the futuristic stack (Quarter 3+). With the substrate in place, computer-vision safety, agentic scheduling, and digital-twin integration become meaningful decisions to scope. Not before.
If sequencing these moves and picking the right pilot feels like the part that's actually hard, that's exactly the work that AI implementation services are built for. Not as a sales pitch. As the natural next step for a firm leader who knows what they want to do but doesn't want to burn a quarter learning what every other mid-size firm has already learned.
The Future of Construction Is Whether Your Past Survives Into It
The future of construction is not whether your firm adopts AI. Every firm will. The future is whether your firm's past survives into it— whether the 30 years of detail libraries, proposal wins, and lessons learned that built the firm are still queryable five years from now.
The retiring partner walks out with three decades of judgment. Some of it is in drawings. Most of it isn't. The encoded version, the version the next generation of associates can actually use, is the version that lives inside a structured, queryable archive. Without that, the firm starts each project a little closer to a blank page than the last. With it, every senior judgment compounds forward.
AI amplifies human capability. The senior partner's expertise is the firm's moat. The back-catalog move is how that moat compounds. The futuristic stack is coming. The substrate decides whether it works for this firm.
Both are true. All of it matters. The firm that wins the next decade isn't the one with the best tools. It's the one whose past is still alive in the system.
FAQ
What is the future of construction for mid-size AEC firms?
The future of construction for most $20M–$100M AEC firms is making the firm's existing project archive queryable, not adopting humanoid robots or autonomous equipment. Roughly 96% of construction project data is collected but never used again [1], and that dark data is the highest-leverage AI asset most firms can act on. The futuristic stack (computer vision safety, agentic AI scheduling, digital twins) is real and worth adopting, but it compounds on top of structured firm data the firm doesn't yet have.
How much does bad data cost the construction industry?
Bad data may have cost the global construction industry $1.85 trillion in 2020, including $88.69 billion in rework alone, according to a 2021 Autodesk and FMI survey of more than 3,900 industry professionals [2]. The figure is five years old and remains the most-cited industry number on the topic. Construction professionals also spend roughly 13% of working hours searching for data they already have [1].
What is the first AI move an AEC firm should make?
The first AI move for a $20M–$100M AEC firm is making the existing project archive queryable through a four-step sequence: audit the archive, tag and structure its contents with AI-driven indexing, layer a retrieval interface (RAG) on top, and only then evaluate the futuristic stack. Each step is independently scopeable and measurable. Architecture and engineering firms typically waste 5–10 hours per week per professional searching fragmented detail libraries, and 90% of detailing conditions have already been designed [9], which gives a clear baseline for judging whether a pilot pays back.
Will AI replace architects and engineers?
No. The dominant AI use cases in AEC amplify senior expertise rather than replace it: faster proposals, faster detail retrieval, better lessons-learned access, sharper go/no-go decisions on RFPs. With 41% of construction workers expected to retire by 2031 [4], the load-bearing question is how to encode senior judgment before it walks out the door, not how to replace the people who hold it.
What is retrieval-augmented generation (RAG) in construction?
Retrieval-augmented generation is the technical category that lets an AEC firm query its own historical projects. In practice, it's a search interface over the firm's indexed content— ask a question, get an answer sourced from the firm's own drawings, specs, proposals, and lessons learned. RAG is well-suited to first-pass retrieval and summarization (proposals, lessons learned, detail discovery) but not to sign-off-grade engineering output, which still requires licensed-professional review.
References
1. Trillium Group, "Construction's Dark Data Problem" (Dec 2021, updated Mar 2023; synthesizing Veritas and Autodesk Construction Insights data). https://www.trilliumgroup.io/post/constructions-dark-data-problem
2. Autodesk + FMI Corporation, "Harnessing the Data Advantage in Construction" (press release, 2021; covering 2020 industry data; survey of 3,900+ professionals). https://adsknews.autodesk.com/en/pressrelease/study-from-autodesk-and-fmi-finds-better-data-strategies-could-save-the-global-construction-industry-1-85-trillion/
3. AEC Magazine, "From Information to Intelligence" (2026). https://aecmag.com/ai/from-information-to-intelligence
4. Deloitte Insights, "2026 Engineering and Construction Industry Outlook" (November 13, 2025). https://www.deloitte.com/us/en/insights/industry/engineering-and-construction/engineering-and-construction-industry-outlook.html
5. Autodesk Digital Builder, "2026 AI Construction Trends: 25+ Experts Share Insights" (2026). https://www.autodesk.com/blogs/construction/2026-ai-trends-25-experts-share-insights/
6. ForConstructionPros, "Industry Report Finds AI Adoption Accelerating Across Commercial Construction" (ServiceTitan report, 2026). https://www.forconstructionpros.com/construction-technology/project-management/article/22963634/servicetitan-industry-report-finds-ai-adoption-accelerating-across-commercial-construction
7. American Society of Civil Engineers, Civil Engineering Source, "Architecture, engineering, construction sector slow to adopt AI, survey shows" (December 18, 2025). https://www.asce.org/publications-and-news/civil-engineering-source/article/2025/12/18/architecture-engineering-construction-sector-slow-to-adapt-ai-survey-shows
8. McKinsey & Company, "Artificial Intelligence: Construction Technology's Next Frontier" (originally 2018; continuously cited). https://www.mckinsey.com/capabilities/operations/our-insights/artificial-intelligence-construction-technologys-next-frontier
9. Pirros, "Detail Management for AEC Teams" (vendor product site, accessed 2026). https://www.pirros.com/