Interview Prep: Behavioural & Technical

Common data engineering interview questions, the STAR framework, and what interviewers actually look for.

Data Engineering Interview Prep

What Interviewers Look For

Data engineering interviews test:

  • SQL proficiency — window functions, CTEs, optimisation (most common)
  • Data modelling — dimensional model, SCD, normalisation
  • Pipeline design — batch vs streaming, idempotency, orchestration
  • System design — design a data warehouse or ETL pipeline at scale
  • Debugging — "my pipeline is slow / producing duplicates / failing — what do you do?"
  • Behavioural — STAR format, teamwork, handling ambiguity

STAR Framework

  • Situation: set the context
  • Task: what you were responsible for
  • Action: what YOU did (not "we")
  • Result: a quantified outcome where possible


Common Behavioural Questions

"Tell me about a time you dealt with a data quality issue."

  • Situation: a production dashboard was showing incorrect revenue figures.
  • Task: find the root cause, fix it, and prevent recurrence.
  • Action: traced the error through dbt lineage to a join that fanned out and multiplied rows; added a dbt test asserting row-count equality; added a monitoring alert.
  • Result: the next time the issue occurred, it was caught within 15 minutes.

"Tell me about a complex pipeline you built." Focus on scale, design decisions, failure handling, and testing.

"How do you handle disagreements with stakeholders about data definitions?" Show that you let the data arbitrate, document agreed definitions in the dbt model description, and involve analytics leadership.


Technical: "Debugging a Slow Query"

Structured answer:

  • EXPLAIN ANALYZE: find the bottleneck (a sequential scan? row estimates far from the actual counts?)
  • Check indexes: is an index missing on a join or filter column?
  • Check statistics: run ANALYZE on the table to refresh the planner statistics
  • Simplify: isolate the slow subquery and test it on its own
  • Partition: for a large table, partition by date and filter on the partition key so the planner can prune partitions
  • Materialise: move the logic into a dbt table or incremental model if a CTE is re-evaluated many times
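
The first three steps can be tried end to end on a toy table. The sketch below is an illustration under stated assumptions, not a PostgreSQL session: it uses Python's built-in sqlite3 module, whose EXPLAIN QUERY PLAN stands in for EXPLAIN ANALYZE (the PostgreSQL command additionally reports actual timings and row counts). The table `events` and the index `idx_events_user_id` are hypothetical names.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    [(i % 100, "x") for i in range(1000)],
)

QUERY = "SELECT payload FROM events WHERE user_id = 42"


def plan(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Step 1: read the plan. With no index on user_id, the filter needs a full scan.
plan_before = plan(QUERY)

# Step 2: add the missing index on the filter column.
conn.execute("CREATE INDEX idx_events_user_id ON events (user_id)")

# Step 3: refresh the planner statistics, then read the plan again.
conn.execute("ANALYZE")
plan_after = plan(QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, so compare the two plans for the switch from a scan to an index search rather than for a specific string. In an interview, narrating this before-and-after comparison is usually more convincing than naming the commands alone.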

Technical: "Pipeline is Producing Duplicates"

  • Is the ingestion idempotent? Check whether the DAG was triggered twice
  • Is there a fanout in a JOIN (for example, a many-to-many relationship)?
  • Is a DISTINCT or ROW_NUMBER() dedup step missing?
  • Is the unique key constraint actually enforced? Many cloud warehouses accept the declaration but do not enforce it
  • Is the incremental model missing a unique_key, so rows are appended instead of upserted?
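
The checks above can be sketched in a few queries. This is a minimal illustration, assuming an in-memory SQLite database (version 3.25 or later, for window functions and upserts) as a stand-in for the warehouse; the tables `staging_orders`, `customer_segments`, and `orders` are hypothetical. In a real warehouse the load step would more likely be a MERGE statement or a dbt incremental model with a unique_key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_orders (order_id INTEGER, customer_id INTEGER,
                                 status TEXT, loaded_at TEXT);
    CREATE TABLE customer_segments (customer_id INTEGER, segment TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         status TEXT, loaded_at TEXT);

    -- Order 1 arrives twice in staging (two versions of the same key).
    INSERT INTO staging_orders VALUES
        (1, 10, 'created', '2024-01-01T10:00:00'),
        (1, 10, 'shipped', '2024-01-01T12:00:00'),
        (2, 20, 'created', '2024-01-01T11:00:00');

    -- Customer 10 has two segment rows, so joining on customer_id fans out.
    INSERT INTO customer_segments VALUES
        (10, 'retail'), (10, 'wholesale'), (20, 'retail');
""")


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


# Fanout check: a join that should preserve row count must not add rows.
staging_rows = scalar("SELECT COUNT(*) FROM staging_orders")
joined_rows = scalar("""
    SELECT COUNT(*) FROM staging_orders o
    JOIN customer_segments s ON s.customer_id = o.customer_id
""")
fanout_keys = [r[0] for r in conn.execute("""
    SELECT customer_id FROM customer_segments
    GROUP BY customer_id HAVING COUNT(*) > 1
""")]

# Dedup with ROW_NUMBER() (keep the latest version per key), then upsert on the
# unique key so that re-running the load updates rows instead of appending them.
LOAD_SQL = """
    INSERT INTO orders (order_id, customer_id, status, loaded_at)
    SELECT order_id, customer_id, status, loaded_at
    FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY order_id ORDER BY loaded_at DESC
        ) AS rn
        FROM staging_orders
    ) AS ranked
    WHERE rn = 1
    ON CONFLICT (order_id) DO UPDATE SET
        customer_id = excluded.customer_id,
        status = excluded.status,
        loaded_at = excluded.loaded_at
"""


def load():
    with conn:  # commit on success, roll back on error
        conn.execute(LOAD_SQL)


load()
load()  # simulate the DAG being triggered twice

final_rows = conn.execute(
    "SELECT order_id, status FROM orders ORDER BY order_id"
).fetchall()
print(staging_rows, joined_rows, fanout_keys, final_rows)
```

Here the join returns 5 rows from 3 staging rows, which exposes the fanout on customer 10, while the target table still holds exactly one row per order after two loads. The upsert only works because `orders` has an enforced primary key; where the warehouse does not enforce the constraint, the dedup step and the merge key have to carry that guarantee.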

Salary Negotiation Notes

  • Polish data engineering market: 15,000–30,000 PLN/month (B2B), depending on seniority
  • Remote EU roles: €60k–€120k+, depending on company stage
  • Always negotiate: the first offer is rarely the best one