
Interview Prep: Behavioural & Technical

Common data engineering interview questions, the STAR framework, and what interviewers actually look for.

Data Engineering Interview Prep

What Interviewers Look For

Data engineering interviews test:

SQL proficiency — window functions, CTEs, optimisation (most common)

Data modelling — dimensional model, SCD, normalisation

Pipeline design — batch vs streaming, idempotency, orchestration

System design — design a data warehouse or ETL pipeline at scale

Debugging — "my pipeline is slow / producing duplicates / failing — what do you do?"

Behavioural — STAR format, teamwork, handling ambiguity

STAR Framework

Situation: set the context

Task: what you were responsible for

Action: what YOU did (not "we")

Result: quantified outcome where possible

Common Behavioural Questions

"Tell me about a time you dealt with a data quality issue."

Situation: a production dashboard was showing incorrect revenue figures.

Task: find the root cause, fix it, and prevent recurrence.

Action: traced the error through dbt lineage to a join that fanned out and multiplied rows; added a dbt test asserting that the join leaves the row count unchanged; added a monitoring alert.

Result: the next time the issue occurred, it was caught within 15 minutes.
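The row-count check described in that answer can be sketched in a few lines. This is a minimal illustration, not the dbt test itself: it uses Python's built-in sqlite3 module as a stand-in for the warehouse, and the orders and customers tables, their columns, and the helper name are all hypothetical.

```python
import sqlite3


def row_counts_before_and_after_join(conn):
    """Return (orders_count, joined_count) for the hypothetical tables below."""
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    joined = conn.execute(
        """
        SELECT COUNT(*)
        FROM orders o
        JOIN customers c ON c.customer_id = o.customer_id
        """
    ).fetchone()[0]
    return orders, joined


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, revenue REAL);
    CREATE TABLE customers (customer_id INTEGER, segment TEXT);
    INSERT INTO orders VALUES (1, 10, 100.0), (2, 11, 50.0);
    -- customer 10 appears twice, so the join is no longer one-to-one
    INSERT INTO customers VALUES (10, 'retail'), (10, 'wholesale'), (11, 'retail');
    """
)

orders_count, joined_count = row_counts_before_and_after_join(conn)
# A join that should only enrich orders must not change the number of rows.
fanout_detected = joined_count > orders_count

# Find the join keys responsible for the extra rows.
duplicated_keys = conn.execute(
    """
    SELECT customer_id, COUNT(*)
    FROM customers
    GROUP BY customer_id
    HAVING COUNT(*) > 1
    """
).fetchall()

print(orders_count, joined_count, fanout_detected, duplicated_keys)
```

In an interview it is usually enough to name the two checks: compare the row count before and after the join, then group by the join key on the lookup side to find the keys that appear more than once.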

"Tell me about a complex pipeline you built." Focus on: scale, design decisions, failure handling, testing.

"How do you handle disagreements with stakeholders about data definitions?" Show that you use data as the arbiter, document agreed definitions in the dbt model description, and involve analytics leadership.

Technical: "Debugging a Slow Query"

Structured answer:

EXPLAIN ANALYZE — find the bottleneck (Seq Scan? bad estimates?)

Check indexes — missing index on join/filter column?

Check statistics — ANALYZE table to update planner statistics

Simplify — isolate the slow subquery, test in isolation

Partition — large table? Partition by date and add partition pruning

Materialise — move to dbt table or incremental if CTE is re-evaluated many times
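The first three steps can be rehearsed on a laptop. The sketch below is an assumption-laden stand-in: it uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module rather than PostgreSQL's EXPLAIN ANALYZE, and the orders table, the index name and the plan helper are hypothetical. The exact wording of the plan text varies between SQLite versions, so the code only looks for keywords.

```python
import sqlite3


def plan(conn, sql):
    """Return the human-readable plan lines SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the plan detail


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT * FROM orders WHERE customer_id = 42"

# Step 1: read the plan. Without an index the filter needs a full table scan.
before = plan(conn, query)

# Step 2: add the missing index on the filter column.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Step 3: refresh planner statistics, then read the plan again.
conn.execute("ANALYZE")
after = plan(conn, query)

print("before:", before)
print("after: ", after)
```

The same loop applies in PostgreSQL: read the plan, change one thing, read the plan again. There, EXPLAIN ANALYZE also reports actual row counts and timings, which is what exposes bad estimates.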

Technical: "Pipeline is Producing Duplicates"

Is the ingestion idempotent? Check for double-trigger of the DAG

Is there a fanout in a JOIN? (many-to-many)

Is there a missing DISTINCT or ROW_NUMBER() dedup step?

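Is the unique key constraint in the warehouse actually enforced? Many cloud warehouses, for example Snowflake, BigQuery and Redshift, treat primary and unique keys as informational and do not enforce them.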

Is the incremental model missing a unique_key, so that rows are appended instead of upserted?
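Two of the checks above, the ROW_NUMBER() dedup step and upserting on a unique key, can be combined into a load that is safe to run twice. The sketch below assumes SQLite (via Python's sqlite3 module, with a build recent enough to support window functions and ON CONFLICT) as a stand-in for the warehouse; the staging_events and events tables and the load function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE staging_events (event_id INTEGER, payload TEXT, loaded_at TEXT);
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, payload TEXT, loaded_at TEXT);
    INSERT INTO staging_events VALUES
        (1, 'a',  '2024-01-01'),
        (1, 'a2', '2024-01-02'),
        (2, 'b',  '2024-01-01');
    """
)

# Keep only the latest staging row per event_id, then upsert on the key so
# that a repeated run updates existing rows instead of appending new ones.
UPSERT_LATEST = """
    INSERT INTO events (event_id, payload, loaded_at)
    SELECT event_id, payload, loaded_at
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY event_id ORDER BY loaded_at DESC
               ) AS rn
        FROM staging_events
    )
    WHERE rn = 1
    ON CONFLICT (event_id) DO UPDATE SET
        payload = excluded.payload,
        loaded_at = excluded.loaded_at
"""


def load(conn):
    """Run the load inside a transaction; calling it again changes nothing."""
    with conn:
        conn.execute(UPSERT_LATEST)


load(conn)
load(conn)  # simulates a double-triggered DAG run

rows = conn.execute(
    "SELECT event_id, payload FROM events ORDER BY event_id"
).fetchall()
print(rows)
```

The second call stands in for a double-triggered DAG: because the target has a key and the statement upserts on it, the table ends up with one row per event_id either way. A dbt incremental model with unique_key set aims for the same effect.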

Salary Negotiation Notes

Poland data engineering market: 15,000–30,000 PLN/month (B2B) depending on seniority

Remote EU roles: €60k–€120k+ depending on company stage

Always negotiate — first offer is rarely the best offer