Build data pipelines you can trust

September 27, 2025

A compact view of ETL/ELT, data lakes, and streaming systems with a focus on quality and operational simplicity.

Model the business first, then the tables

Data projects fail when the answer to “what does this mean?” is unclear.

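Two common patterns are ETL (extract, transform, then load the cleaned result) and ELT (extract, load the raw data as-is, then transform it inside the warehouse or lake).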

In practice, you want both: raw ingestion plus curated, versioned transforms.
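The "raw plus curated, versioned" idea can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the payload fields (`id`, `amount`, `country`), the `CuratedEvent` record and the `TRANSFORM_VERSION` tag are all hypothetical. Raw payloads are kept untouched, and every curated row records which version of the transform produced it.

```python
import json
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical version tag; bump it whenever the curation logic changes so
# every curated row records which rules produced it.
TRANSFORM_VERSION = "v2"

# Raw layer: payloads kept exactly as received (append-only, never edited).
raw_events = [
    '{"id": 1, "amount": "19.90", "country": "de"}',
    '{"id": 2, "amount": "5", "country": "FR"}',
]

@dataclass(frozen=True)
class CuratedEvent:
    id: int
    amount_cents: int
    country: str
    transform_version: str

def curate(raw: str) -> CuratedEvent:
    """Parse one raw payload into a typed, normalized curated record."""
    doc = json.loads(raw)
    return CuratedEvent(
        id=int(doc["id"]),
        # Decimal avoids float rounding; assumes at most two decimal places.
        amount_cents=int(Decimal(doc["amount"]) * 100),
        country=doc["country"].upper(),
        transform_version=TRANSFORM_VERSION,
    )

# Curated layer: always rebuildable from raw by re-running the transform.
curated = [curate(r) for r in raw_events]
```

Because the raw layer is never modified, a bug fix in `curate` means bumping the version and rebuilding the curated layer, rather than trying to repair already-transformed rows.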

Example (incremental load pattern):

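-- Append only rows newer than the current high-water mark in curated.events.
-- select * assumes both tables share the same columns in the same order.
insert into curated.events
select *
from raw.events
where ingested_at > (
  select coalesce(max(ingested_at), timestamp '1970-01-01') from curated.events
);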
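To see how the high-water mark behaves, here is a small sketch using Python's built-in `sqlite3`. SQLite has no schemas without `ATTACH`, so the hypothetical tables `raw_events` and `curated_events` stand in for `raw.events` and `curated.events`, and timestamps are stored as ISO-8601 text in one fixed format so they compare correctly as strings. It shows both the good property (re-running the load does not duplicate rows) and the main caveat (a late row stamped before the watermark is skipped).

```python
import sqlite3

# In-memory stand-ins for raw.events / curated.events (hypothetical names).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_events     (id INTEGER, payload TEXT, ingested_at TEXT);
    CREATE TABLE curated_events (id INTEGER, payload TEXT, ingested_at TEXT);
""")

# Same high-water-mark logic as the query above, with explicit columns.
INCREMENTAL_LOAD = """
    INSERT INTO curated_events (id, payload, ingested_at)
    SELECT id, payload, ingested_at
    FROM raw_events
    WHERE ingested_at > (
        SELECT COALESCE(MAX(ingested_at), '1970-01-01T00:00:00') FROM curated_events
    )
"""

def run_load() -> int:
    """Run one incremental load and return the number of rows copied."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(INCREMENTAL_LOAD)
    return cur.rowcount

def ingest(rows):
    """Append rows to the raw table."""
    with conn:
        conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", rows)

ingest([(1, "a", "2025-09-01T10:00:00"), (2, "b", "2025-09-01T11:00:00")])
first = run_load()   # 2: both rows are newer than the empty-table default
second = run_load()  # 0: re-running is safe, the watermark has moved

# A late row stamped before the current watermark is silently skipped.
ingest([(3, "c", "2025-09-01T10:30:00")])
late = run_load()    # 0
```

If late or out-of-order arrivals are possible, one common mitigation is to re-read a small overlap window behind the watermark and deduplicate on a key, instead of relying on a strict `>` comparison alone.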

A lake without structure becomes a dumping ground. Establish:

Treat datasets like products: documented, owned, and monitored.
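One way to make "documented, owned, and monitored" checkable rather than aspirational is to keep a small catalog entry per dataset and validate it automatically. The sketch below is hypothetical: `DatasetSpec`, its fields and the freshness rule are illustrative assumptions, not a specific catalog tool's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class DatasetSpec:
    """Hypothetical catalog entry: what a dataset promises its consumers."""
    name: str
    owner: str                # team or person accountable for the dataset
    description: str          # what one row means, in business terms
    max_staleness: timedelta  # freshness promise used for monitoring

def check_dataset(spec: DatasetSpec, last_updated: datetime, now: datetime) -> list[str]:
    """Return a list of problems; an empty list means the dataset is healthy."""
    problems = []
    if not spec.owner.strip():
        problems.append(f"{spec.name}: no owner")
    if not spec.description.strip():
        problems.append(f"{spec.name}: undocumented")
    if now - last_updated > spec.max_staleness:
        problems.append(f"{spec.name}: stale (last updated {last_updated.isoformat()})")
    return problems

spec = DatasetSpec("curated.events", "data-platform",
                   "One row per ingested event.", timedelta(hours=2))
now = datetime(2025, 9, 27, 12, 0, tzinfo=timezone.utc)
fresh = check_dataset(spec, now - timedelta(hours=1), now)  # []
stale = check_dataset(spec, now - timedelta(hours=3), now)  # one "stale" problem
```

Running a check like this on a schedule turns ownership and freshness into alerts someone actually receives.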

Real-time pipelines introduce new failure modes:

Design for idempotency and implement clear replay procedures early.
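A minimal sketch of an idempotent consumer, assuming every event carries a unique `event_id` (a hypothetical field; real streams need some equivalent stable key). Duplicate deliveries and full replays are ignored, so reprocessing a stream from the start leaves the result unchanged.

```python
from dataclasses import dataclass, field

@dataclass
class IdempotentBalanceSink:
    """Applies each event at most once, keyed by its event_id."""
    balances: dict[str, int] = field(default_factory=dict)
    seen_ids: set[str] = field(default_factory=set)

    def apply(self, event_id: str, account: str, delta: int) -> bool:
        """Apply the event; return False if it was already applied."""
        if event_id in self.seen_ids:
            return False  # duplicate delivery or replay: ignore
        self.balances[account] = self.balances.get(account, 0) + delta
        self.seen_ids.add(event_id)
        return True

events = [("e1", "acct-a", 100), ("e2", "acct-a", -30), ("e3", "acct-b", 50)]
sink = IdempotentBalanceSink()
first_pass = [sink.apply(*e) for e in events]  # all applied
replay = [sink.apply(*e) for e in events]      # all ignored
# sink.balances == {"acct-a": 70, "acct-b": 50}
```

In a real pipeline the processed-ID record and the state change must be committed atomically (for example in the same database transaction); otherwise a crash between the two steps can still double-apply or drop an event.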

If you can’t trust the data, it won’t be used:
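Trust is easier to earn when quality is checked automatically before data is published. The sketch below runs three simple checks (not-null, uniqueness, value range) over rows represented as dicts; the column names `id` and `amount_cents` and the range limits are hypothetical. A pipeline would typically refuse to publish a batch when any check fails.

```python
from typing import Any

Row = dict[str, Any]

def check_not_null(rows: list[Row], column: str) -> list[str]:
    bad = sum(1 for r in rows if r.get(column) is None)
    return [] if bad == 0 else [f"{column}: {bad} null value(s)"]

def check_unique(rows: list[Row], column: str) -> list[str]:
    values = [r.get(column) for r in rows]
    dupes = len(values) - len(set(values))
    return [] if dupes == 0 else [f"{column}: {dupes} duplicate value(s)"]

def check_range(rows: list[Row], column: str, lo: int, hi: int) -> list[str]:
    # Nulls are left to check_not_null; only present values are range-checked.
    bad = sum(1 for r in rows
              if r.get(column) is not None and not lo <= r[column] <= hi)
    return [] if bad == 0 else [f"{column}: {bad} value(s) outside [{lo}, {hi}]"]

def run_checks(rows: list[Row]) -> list[str]:
    """Collect all failures; an empty list means the batch may be published."""
    return (check_not_null(rows, "id")
            + check_unique(rows, "id")
            + check_range(rows, "amount_cents", 0, 10_000_000))

good = [{"id": 1, "amount_cents": 100}, {"id": 2, "amount_cents": 250}]
bad = [{"id": 1, "amount_cents": 100},
       {"id": 1, "amount_cents": -5},
       {"id": None, "amount_cents": 200}]
```

Here `run_checks(good)` returns an empty list, while `run_checks(bad)` reports one null, one duplicate and one out-of-range value.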

Hi, I'm Martin Duchev. You can find more about my projects on my GitHub page.