The types of data warehouse architecture, explained simply
The three traditional types of data warehouse architecture are single-tier, two-tier, and three-tier, distinguished by how many layers separate your data sources from the dashboards people actually look at.1 Modern variants— cloud-native warehouses, data lakehouses, and hub-and-spoke designs— are deployment patterns layered on top of that same idea. A "tier" here just means a layer in the stack. Get that, and the whole taxonomy stops being intimidating.
Single-tier architecture collapses storage and analytics into one layer. It's fine for small, isolated datasets and rarely shows up in an enterprise.3
Two-tier architecture lets the warehouse connect more directly to your BI tools— the software that turns data into charts and dashboards. It's faster. But Snowflake notes that direct connections between data sources and end-user tools can hinder scalability and performance as volume grows.1 Hold onto that idea. Two-tier is the conceptual ancestor of the direct-connect path we'll get to.
Three-tier architecture is the most common enterprise setup, and it's the one people usually mean when they say "data warehouse." IBM breaks it into three layers: the bottom tier stores the data, the middle tier— an OLAP server, the engine that runs queries and transformations— processes it, and the top tier delivers the BI dashboards and reports.2
| Tier | What it is | What it does | The AEC translation |
|---|---|---|---|
| Bottom | The database / storage layer | Holds the raw data | Your project records, timesheets, GL entries |
| Middle | The OLAP server | Runs queries, joins, and transformations | The "math"— margin by project, utilization by team |
| Top | The BI / dashboard layer | Delivers the reports people read | The screen your principals actually open |
Here's the part that matters for you: the top tier is the only piece you really want. And it can connect to data without the two tiers beneath it.
Ask "how many types of data warehouse architecture are there" and you'll hear both three and five. Both are right. They're answering different questions— three describes the structure (the tiers), five describes the deployment pattern. The academic taxonomy from Ariyachandra and Watson lists five: centralized, independent data mart, federated, hub-and-spoke, and data mart bus.4
The modern variants you'll see named are deployment choices on top of the tier model:
- Cloud-native warehouses (Snowflake, BigQuery, Redshift) that separate storage from compute so each scales on its own.
- Data lakehouses that blend warehouse structure with data-lake flexibility.
- Hub-and-spoke and federated designs that connect a central store to satellite data marts.
Every type of data warehouse architecture on that list is still some arrangement of those same three layers. The labels multiply; the underlying idea doesn't.
Two more pairs of terms get thrown around in the same breath. Clear them in sixty seconds, then we get to the real question.
Warehouse vs. lake vs. lakehouse— and Inmon vs. Kimball— in 60 seconds
A data warehouse stores cleaned, structured data and decides the structure before storing it (schema-on-write). A data lake stores raw data and applies structure when you read it (schema-on-read). A lakehouse combines the two.5 Put plainly: a warehouse decides the structure before it stores the data; a lake decides when you ask the question.
| Schema | Best for | |
|---|---|---|
| Warehouse | Schema-on-write (structure first) | Clean, repeatable reporting |
| Lake | Schema-on-read (structure later) | Raw, varied, exploratory data |
| Lakehouse | Hybrid | Teams that want both in one place |
Worth flagging: "lakehouse" was coined by Databricks, a vendor. Treat it as a useful category, not neutral consensus.
The two classic design philosophies show up just as often. Bill Inmon's top-down approach builds one normalized, enterprise-wide warehouse first— a single source of truth— then feeds department reports from it. Ralph Kimball's bottom-up approach builds dimensional data marts for specific needs first, then combines them.6 Top-down for control; bottom-up for speed.
And the last pair: ETL transforms data before loading it; ELT loads raw data first and transforms it inside the warehouse. ELT became the default with cloud-native warehouses that scale compute and storage independently.7 For a one-firm benchmark, none of this is load-bearing. It matters to big-data teams, not to you.
Do you actually need a data warehouse to benchmark?
For an internal benchmarking dashboard, you usually do not need a data warehouse. A BI tool can connect directly to the systems and spreadsheets you already have.8 A warehouse only starts to pay off once your data is scattered across many systems and your questions span departments.
That boundary is worth stating plainly, because it's where the honest answer lives. Panoply— a company that sells data warehouses— says you need one when data lives in multiple source systems and questions span departments, the point where you're constantly exporting and stitching things together in spreadsheets.8 When the recommendation to skip the product comes from the people who sell it, it's worth trusting.
A warehouse earns its keep when data lives in multiple systems and questions cross departments— not when one firm wants to watch its own margins.
Then there's the cost. A warehouse build is a six-figure, multi-month engineering commitment. Your benchmarking question is one you want answered this quarter. Spending half a year and a big budget on it is chasing pennies while the dollars— the decisions those numbers would inform— sit and wait.
So here's the reframe that runs through the rest of this piece: the best data warehouse, for most firms starting out, is the one you don't build. The proportionate move is a right-sized AI implementation matched to the question in front of you, not the platform someone's selling.
| Your situation | The call |
|---|---|
| One firm, a few data sources (ERP + spreadsheets) | Skip it— connect a BI tool directly |
| Growing data, occasional cross-system questions | Consider a warehouse-lite (one managed database) |
| Many source systems, cross-department questions, governance needs | Build it— a warehouse now earns its keep |
So if not a warehouse— what? The lighter path is shorter than you think.
The lighter path— a BI tool connected to where your data already lives
The lighter path is a BI tool connected directly to your existing database, ERP, and spreadsheets— often for free. Looker Studio is free and reaches data through hundreds of prebuilt connectors, which skips the database step entirely. Metabase's basic version is free to self-host. Power BI connects natively to SQL Server, Azure, and Excel.9
You can stand up a real benchmarking dashboard with a free BI tool and the data you already have. No warehouse. No six-month project.
| Tool | Cost | Connects to | Best for |
|---|---|---|---|
| Looker Studio | Free | Hundreds of sources via connectors | Fast start, Google-stack shops |
| Metabase | Free (self-host) | Databases directly | Teams with a database already |
| Power BI | Low entry, paid tiers | SQL Server, Azure, Excel, many ERPs | Microsoft-stack firms |
These tools read straight from source databases, CSVs, and Sheets.8 For a single firm watching its own numbers, that's usually all the plumbing you need to build the dashboards that replace your monthly financial packet.
Now the caveat, because skipping the warehouse has trade-offs. Direct-connect dashboards can strain the production systems they query, and they don't come with built-in history or governance. Run a heavy report against your live ERP at month-end and you might slow it down for everyone.
The sane middle ground is what I'd call a warehouse-lite: a single managed database (PostgreSQL, say) or your BI tool's own extract and semantic layer. It gives you history and a stable place to query without committing to a full three-tier build. It's the step between spreadsheet sprawl and an enterprise warehouse— and for a lot of firms, it's the last step they ever need.
For AEC firms specifically, the "where your data already lives" answer is unusually concrete.
For AEC firms, your benchmarking data already lives in your ERP
For architecture and engineering firms, the benchmarking data already lives in the ERP. As of March 2025, Deltek Vantagepoint users can view and incorporate the Deltek Clarity A&E Study— ten years of North America industry benchmarks— directly inside their Vantagepoint dashboards. You filter by year, firm type, and size, then compare your own KPIs against industry standards.10 The tooling to benchmark your firm against the industry already ships inside the system your project data lives in.
It goes further. Power BI can hold a live connection to Vantagepoint (not just static exports), refreshing dashboards multiple times a day, with role-specific views for project managers, principals, and business development.11 No warehouse in between. And the same logic reaches the budgeting module already sitting unused in your Deltek and your closed-project cost data that rarely gets a second look.
A useful AEC benchmarking dashboard tracks the numbers your principals already argue about:
- Gross margin and project profitability
- Utilization rate
- Backlog
- AR aging
- Win rate
- Cost-per-square-foot
For context on where the bar sits: most general contractors run gross margins in the 15–25% range.12 And in CFMA's 2024 Construction Financial Benchmarker, top-quartile "Best in Class" firms hit a 21.8% gross profit margin.13 Treat that figure as a directional marker from one industry survey, not gospel— but it's a real number to aim at.
The principle generalizes past Deltek. If your firm runs on a different system, the move is the same: connect to where the data already lives. And the part a warehouse project used to justify— the joining, cleaning, and modeling— is increasingly something AI can do.
Where AI compresses the work (and where it doesn't)
AI now does much of the analysis a warehouse project was once needed to justify— joining, cleaning, and modeling your data for a first-pass benchmark. ChatGPT's Advanced Data Analysis lets you upload an Excel or CSV file, ask questions in plain language, and get charts and analysis back; it writes and runs the code for you in the background.14 Claude is strong at spreadsheet reasoning, and Microsoft Copilot fits governed Microsoft 365 shops.
What AI does well here is the first pass: analyzing your exports, drafting the data model, writing the queries you'd otherwise wait on. That's real, and it's fast.
Now the boundary, because this is where overclaiming gets firms in trouble. Ad-hoc AI on a file upload is not a governed, scheduled, multi-user pipeline. Never present AI output as audited financials without a human check on the numbers.
AI is the fastest way to get your first benchmark out of a spreadsheet— and the wrong tool to run that benchmark unattended every month.
The pairing that works: AI for the first and ad-hoc analysis, plus a lightweight store (the warehouse-lite from earlier) for anything recurring and shared.
I think about this as intellectual augmentation, not artificial intelligence. AI compresses the data work; you keep the judgment about what the numbers mean. No matter the question, people are the answer— the tool amplifies your finance lead, it doesn't make the infrastructure decision for you.
A few quick questions before we close.
FAQ
What are the three types of data warehouse architecture?
Single-tier, two-tier, and three-tier— defined by the number of layers between your data sources and the end-user tools.1 Three-tier (storage → OLAP → BI) is the most common enterprise setup.2
Do I need a data warehouse for Power BI?
No. Power BI connects directly to databases, ERPs, and spreadsheets.8 A warehouse only helps at larger scale or when your data is spread across many source systems.
What's the difference between a data warehouse and a data lake?
A warehouse stores cleaned, structured data with the structure applied before storage (schema-on-write). A lake stores raw data and applies structure when you read it (schema-on-read).5
What is the 5-architecture model of a data warehouse?
It's an academic taxonomy of deployment patterns: centralized, independent data mart, federated, hub-and-spoke, and data mart bus.4 It describes how a warehouse is deployed, while the three-tier model describes its internal structure.
Can I benchmark my firm without building a data warehouse?
Yes. Connect a BI tool to where your data already lives— for AEC firms that's often the ERP, like Deltek Vantagepoint, which embeds the Clarity A&E benchmarking study.810
Understand the architecture, choose the right-sized path
Knowing the types of data warehouse architecture is worth the ten minutes— mostly so you can recognize when you don't need one. For a single firm's benchmarking dashboard, the right-sized path is almost always a BI tool, the data you already have, and AI to compress the analysis. Live this quarter, not next year.
Both things are true: the heavy architecture is real and worth understanding, and most firms should still choose the light path. Understand the architecture; choose the light path; get to the benchmark this quarter. If mapping that right-sized path to your firm's actual workflows is the part that feels hard, that's exactly where Dan Cumberland Labs helps firms map the right-sized path.
References
- Snowflake, "Data Warehouse Architecture and Design: Best Practices" (2024) — https://www.snowflake.com/en/fundamentals/data-warehouse-architecture-and-design/
- IBM, "What Is a Data Warehouse?" (2024) — https://www.ibm.com/think/topics/data-warehouse
- KANINI, "Different Data Warehouse Architecture Types: Pros & Cons of Each" (2024) — https://kanini.com/blog/data-warehouse-architecture-types/
- Ariyachandra & Watson, "Basic five data warehouse architectures identified" (ResearchGate, 2015) — https://www.researchgate.net/figure/Basic-five-data-warehouse-architectures-identified_fig2_277476655
- IBM, "Data Warehouses vs. Data Lakes vs. Data Lakehouses" (2024) — https://www.ibm.com/think/topics/data-warehouse-vs-data-lake-vs-data-lakehouse
- Keboola, "Kimball vs Inmon" (2023) — https://www.keboola.com/blog/kimball-vs-inmon
- CloudOptimo, "ETL vs ELT in Cloud Data Stacks: Key Differences and Best Use Cases" (2024) — https://www.cloudoptimo.com/blog/etl-vs-elt-in-cloud-data-stacks-key-differences-and-best-use-cases/
- Panoply, "BI and Data Warehousing: Do You Need a Data Warehouse Anymore?" (2023) — https://panoply.io/data-warehouse-guide/bi-and-data-warehousing/
- iKemo, "Power BI vs Tableau vs Looker Studio vs Metabase" (2025) — https://ikemo.io/blog/power-bi-vs-tableau-vs-looker-vs-metabase
- Deltek, "Deltek Announces Powerful Enhancements for Vantagepoint" (2025) — https://www.deltek.com/en-au/about/media-center/press-releases/2025/deltek-announces-powerful-enhancements-for-vantagepoint
- Stambaugh Ness, "Your Competitive Edge: Power BI for Deltek Vantagepoint" (2024) — https://www.stambaughness.com/blog/your-competitive-edge-power-bi-for-deltek-vantagepoint/
- Projul, "15 Construction KPIs Every Contractor Should Track" (2024) — https://projul.com/blog/construction-business-kpis-metrics-guide/
- RedHammer, "CFMA's 2024 Benchmarker Highlights Strong Construction Industry Performance" (reporting CFMA's 2024 Construction Financial Benchmarker) (2024) — https://www.redhammer.io/blog/cfmas-2024-benchmarker-highlights-strong-construction-industry-performance
- GPT for Work, "10 Best AI Tools for Data Analysis in Excel & Google Sheets (2026 Guide)" (2026) — https://gptforwork.com/blog/best-ai-tools-excel-google-sheets