lesson

Data Modelling

Star schema, snowflake schema, normalisation, slowly changing dimensions, and wide tables.

Data Modelling

Dimensional Modelling

Kimball-style dimensional modelling organises data for analytics — optimised for reads, not writes.

Fact Tables

Store measurable events or metrics. Rows are thin — mostly foreign keys and measures.

fact_orders (order_id, date_key, customer_key, product_key, quantity, revenue, cost)

Dimension Tables

Descriptive context for facts. Wide and denormalised for easy filtering.

dim_customer (customer_key, name, email, city, country, segment, created_at)


Star Schema vs Snowflake Schema

StarSnowflake
DimensionsDenormalised (flat)Normalised (split into sub-dims)
Query complexitySimple JOINsMore JOINs
StorageMore redundancyLess redundancy
BI tool performanceFaster (fewer joins)Slower
RecommendedMost analytics casesWhen dimension tables are very large
Prefer star schema for most data warehouse use cases — simpler queries, better BI tool performance.


Slowly Changing Dimensions (SCD)

How do you handle dimension changes over time? (e.g. customer changes city)

TypeStrategyUse when
SCD Type 1Overwrite old valueHistory not needed
SCD Type 2Add new row with valid_from/valid_toFull history needed
SCD Type 3Add "previous_value" columnOnly current + one prior value needed
SCD Type 4Separate history tableHigh-change dimensions
SCD Type 2 example:
dim_customer (
  customer_key   SERIAL PRIMARY KEY,  -- surrogate key
  customer_id    INT,                  -- natural/business key
  name           TEXT,
  city           TEXT,
  valid_from     DATE NOT NULL,
  valid_to       DATE,                 -- NULL = current record
  is_current     BOOLEAN DEFAULT TRUE
)


Wide Tables (One Big Table)

Modern cloud warehouses (Snowflake, BigQuery, Redshift) handle wide tables well. Flattening joins into one wide table:

  • Eliminates join cost at query time
  • Works well with columnar storage (only touched columns are scanned)
  • Popular in dbt with final or mart layer
  • Tradeoff: harder to maintain, larger storage, schema changes require full refresh.


    Normalisation

    Normal FormRule
    1NFAtomic values, no repeating groups
    2NF1NF + no partial dependency on composite PK
    3NF2NF + no transitive dependency
    BCNFStricter 3NF — every determinant is a candidate key
    OLTP systems typically normalise to 3NF/BCNF to minimise update anomalies. OLAP systems intentionally denormalise for read performance.

    Sign in to use the AI study buddy on this lesson.

    Resources