Open Source Data Engineering Tools

A curated collection of tools for modern data engineering, grouped by category

1. Data Ingestion & ETL/ELT

  • Apache NiFi
  • Airbyte
  • Singer
  • Meltano
  • Talend Open Studio (retired by Talend in early 2024)
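These tools differ in interface, but most of them automate the same extract, transform, load sequence. The following is a minimal stdlib-only sketch of that pattern. It assumes a CSV source and an in-memory SQLite target, both hypothetical stand-ins, and it does not use the API of any listed tool.

```python
import csv
import io
import sqlite3

# Hypothetical raw source; in practice this would be an API, file drop, or database.
RAW_CSV = """id,email,amount
1, Alice@Example.com ,10.50
2,bob@example.com,
3,carol@example.com,7.25
"""

def extract(text):
    """Read rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Normalise emails, cast amounts, and drop rows that have no amount."""
    out = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        out.append((int(row["id"]), row["email"].strip().lower(), float(row["amount"])))
    return out

def load(records, conn):
    """Write cleaned records to the target table, replacing on primary-key conflict."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (id INTEGER PRIMARY KEY, email TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO payments VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT id, email, amount FROM payments ORDER BY id").fetchall())
```

ELT reverses the last two steps. Raw rows are loaded first and then transformed inside the target system.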

2. Workflow Orchestration

  • Apache Airflow
  • Dagster
  • Luigi
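Airflow, Dagster, and Luigi all model a pipeline as a directed acyclic graph (DAG) of tasks. Each task runs only after its upstream dependencies have finished. The scheduling core can be sketched with Python's standard `graphlib` module (Python 3.9+). The task names are hypothetical, and this is not any listed tool's API.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dag = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join": {"clean_orders", "extract_customers"},
    "publish_report": {"join"},
}

def run(task):
    # Placeholder for real work (a shell command, SQL query, Spark job, ...).
    print(f"running {task}")
    return task

def execute(dag):
    """Run tasks in dependency order; tasks become ready once all upstreams are done."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    completed = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # these could run in parallel
        for task in ready:
            completed.append(run(task))
            sorter.done(task)
    return completed

order = execute(dag)
```

A real orchestrator adds further features on top of this ordering, such as retries, schedules, and parallel workers.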

3. Data Processing & Transformation

  • Apache Spark
  • Apache Flink
  • dbt (data build tool)
  • Pandas
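Spark and Flink scale transformations by splitting data into partitions. Each partition is aggregated locally, and the partial results are merged by key after a shuffle. The following is a single-process sketch of that idea in plain Python. It illustrates the pattern only and is not Spark's execution engine. The input data is hypothetical. In Pandas the same per-region average would roughly be `df.groupby("region")["amount"].mean()`.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input, already split into partitions as a distributed engine would do.
partitions = [
    [("eu", 120.0), ("us", 80.0)],
    [("eu", 30.0), ("apac", 55.0)],
    [("us", 20.0), ("eu", 10.0)],
]

def map_partition(rows):
    """Partial aggregation inside one partition: [sum, count] per key."""
    acc = defaultdict(lambda: [0.0, 0])
    for region, amount in rows:
        acc[region][0] += amount
        acc[region][1] += 1
    return dict(acc)

def merge(a, b):
    """Combine two partial results without mutating either input."""
    out = {k: list(v) for k, v in a.items()}
    for k, (s, c) in b.items():
        cur = out.setdefault(k, [0.0, 0])
        cur[0] += s
        cur[1] += c
    return out

partials = [map_partition(p) for p in partitions]
totals = reduce(merge, partials, {})
averages = {k: s / c for k, (s, c) in sorted(totals.items())}
print(averages)
```

The code carries sums and counts rather than averages because averages cannot be merged correctly, while sums and counts can.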

4. Data Storage & Management

  • Apache Hadoop (HDFS)
  • Apache Iceberg
  • Delta Lake
  • DuckDB
  • ClickHouse
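Iceberg and Delta Lake add table semantics on top of plain data files. They keep a log or metadata tree that records which files belong to each table version, and this is what enables atomic commits and time travel. Below is a toy in-memory model of that idea. The class and file names are hypothetical, and it is not either project's on-disk format.

```python
from dataclasses import dataclass, field

@dataclass
class TableLog:
    """Toy table format: each commit records the data files it adds and removes."""
    commits: list = field(default_factory=list)

    def commit(self, add=(), remove=()):
        self.commits.append({"add": list(add), "remove": list(remove)})
        return len(self.commits) - 1  # version number of this commit

    def files_at(self, version=None):
        """Replay the log up to `version` (inclusive) to find the live data files."""
        if version is None:
            version = len(self.commits) - 1
        live = set()
        for entry in self.commits[: version + 1]:
            live.difference_update(entry["remove"])
            live.update(entry["add"])
        return sorted(live)

log = TableLog()
log.commit(add=["part-000.parquet", "part-001.parquet"])
v1 = log.commit(add=["part-002.parquet"])
# Compaction: replace two small files with one; older versions remain readable.
log.commit(add=["part-003.parquet"], remove=["part-000.parquet", "part-001.parquet"])

print(log.files_at())    # current snapshot
print(log.files_at(v1))  # "time travel" to an earlier version
```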

5. Data Streaming

  • Apache Kafka
  • Redpanda
  • Apache Pulsar
  • Flink SQL
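Kafka and the Kafka-API-compatible Redpanda organise each topic as a partitioned, append-only log. Producers route records by key, which preserves per-key order within a partition. Consumers track their position as an offset. The following is a toy single-process model of that design. The classes are hypothetical, and `zlib.crc32` is only a stand-in for a real partitioner.

```python
import zlib
from collections import defaultdict

class Topic:
    """Toy partitioned log: records with the same key always land in the same partition."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        p = zlib.crc32(key.encode()) % len(self.partitions)  # stand-in partitioner
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

class Consumer:
    """Reads every partition sequentially and remembers the next offset to read."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self):
        batch = []
        for p, records in enumerate(self.topic.partitions):
            batch.extend(records[self.offsets[p]:])
            self.offsets[p] = len(records)  # "commit" the new position
        return batch

topic = Topic(num_partitions=3)
for user, event in [("u1", "click"), ("u2", "view"), ("u1", "buy")]:
    topic.produce(user, event)

consumer = Consumer(topic)
first = consumer.poll()
topic.produce("u2", "click")
second = consumer.poll()  # only the record produced after the first poll
```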

6. Data Warehousing & Query Engines

  • Presto/Trino
  • Apache Druid
  • Apache Pinot
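Apache Druid can roll up rows at ingestion time. It stores pre-aggregated metrics for each time bucket and set of dimension values, so later queries scan far fewer rows. Below is a plain-Python sketch of that idea. The event schema is hypothetical, and the code models the concept rather than Druid's ingestion spec.

```python
from datetime import datetime

# Hypothetical raw events: (ISO timestamp, country, bytes)
events = [
    ("2024-05-01T10:00:12", "DE", 300),
    ("2024-05-01T10:00:47", "DE", 200),
    ("2024-05-01T10:00:59", "FR", 100),
    ("2024-05-01T10:01:05", "DE", 50),
]

def rollup(rows):
    """Pre-aggregate rows by (minute bucket, country), keeping a count and a summed metric."""
    table = {}
    for ts, country, nbytes in rows:
        minute = datetime.fromisoformat(ts).replace(second=0, microsecond=0)
        metrics = table.setdefault((minute.isoformat(), country), {"count": 0, "bytes": 0})
        metrics["count"] += 1
        metrics["bytes"] += nbytes
    return dict(sorted(table.items()))

table = rollup(events)
# A query now reads 3 rolled-up rows instead of 4 raw events.
de_bytes = sum(m["bytes"] for (_, country), m in table.items() if country == "DE")
print(table, de_bytes)
```

The trade-off is that rollup discards detail below the chosen granularity. For example, individual event timestamps within a minute can no longer be queried.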

7. Data Cataloging & Governance

  • Apache Atlas
  • Amundsen
  • DataHub
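Catalogs such as Apache Atlas and DataHub record lineage, meaning which datasets are derived from which. A common use of lineage is impact analysis: finding everything downstream of a table before changing it. Below is a minimal sketch using breadth-first search over a hypothetical lineage graph. It does not use either tool's API.

```python
from collections import deque

# Hypothetical lineage: dataset -> datasets derived directly from it.
lineage = {
    "raw.orders": ["staging.orders"],
    "raw.customers": ["staging.customers"],
    "staging.orders": ["marts.revenue", "marts.order_facts"],
    "staging.customers": ["marts.order_facts"],
    "marts.order_facts": ["dashboards.sales"],
}

def downstream(dataset, graph):
    """Breadth-first search for every asset affected by a change to `dataset`."""
    seen, queue = set(), deque([dataset])
    while queue:
        for child in graph.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)

print(downstream("raw.customers", lineage))
```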

8. Data Quality & Monitoring

  • Great Expectations
  • Deequ
  • Monte Carlo (commercial data observability platform, not open source)
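Great Expectations and Deequ both express data quality rules as declarative checks that are evaluated against each batch of data. Typical rules require that a column is never null, always unique, or within a valid range. Below is a dependency-free sketch of that pattern. The helper names `not_null`, `unique`, and `between` are hypothetical and are not part of either library's API.

```python
# Hypothetical batch of records to validate.
rows = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},
    {"id": 3, "email": "c@example.com", "age": 140},
]

def not_null(column):
    def check(batch):
        bad = [r for r in batch if r[column] is None]
        return f"{column} not null", not bad, len(bad)
    return check

def unique(column):
    def check(batch):
        values = [r[column] for r in batch]
        dupes = len(values) - len(set(values))
        return f"{column} unique", dupes == 0, dupes
    return check

def between(column, low, high):
    def check(batch):
        bad = [r for r in batch if not low <= r[column] <= high]
        return f"{column} in [{low}, {high}]", not bad, len(bad)
    return check

# Each check returns (rule name, passed?, number of failing rows).
suite = [not_null("email"), unique("id"), between("age", 0, 120)]
results = [check(rows) for check in suite]
for name, passed, failures in results:
    print(f"{'PASS' if passed else 'FAIL'} {name} ({failures} failing)")
```

A pipeline would typically stop, or quarantine the batch, when any check fails.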