system design
System Design: Data Warehouse
Design a scalable data warehouse for a growing e-commerce company. Walk through decisions on storage, processing, modelling, and serving.
System Design: Data Warehouse
The Problem
Design a data warehouse for an e-commerce company with 5M daily orders, 50+ data sources, and a team of 20 analysts. Stakeholders need dashboards updated every hour. The team uses Python and SQL.
Framework: Ask Before You Design
Before jumping to solutions, clarify:
Architecture Decision: Cloud Warehouse
For 5M orders/day ≈ 150M rows/month, a managed cloud warehouse is appropriate:
| Option | Best for |
|---|---|
| Snowflake | Multi-cloud, fine-grained compute control, data sharing |
| BigQuery | GCP ecosystem, serverless, low-maintenance |
| Redshift | AWS ecosystem, tight integration with S3/Glue |
| Databricks | Spark workloads, ML, open format (Delta Lake) |
Ingestion Layer
Three categories of sources:
All land in a raw/bronze schema — no transformation, append-only.
Transformation Layer
Use dbt for SQL transforms:
Run on hourly schedule (matches the 1-hour SLA) via Airflow or dbt Cloud.
Serving Layer
Reliability & Governance
What to Say in an Interview
Structure your answer:
Avoid: jumping straight to specific tools without justification.
Your design notes
Work through this problem yourself before reading the walkthrough above. Your notes are stored locally and not submitted anywhere — only sent to the AI when you click Review.