dbt

dbt (data build tool) is an open source Python package that enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications. dbt allows you to build your data transformation pipeline using SQL queries.

dbt fits nicely into the modern BI stack, coupling with products like Stitch, Fivetran, Redshift, Snowflake, BigQuery, Looker, and Mode. Here’s how the pieces fit together:

dbt is the T in ELT. It doesn’t extract or load data, but it’s extremely good at transforming data that’s already loaded into your warehouse. This “transform after load” architecture is becoming known as ELT (extract, load, transform).

ELT has become commonplace because of the power of modern analytic databases. Data warehouses like Redshift, Snowflake, and BigQuery are extremely performant and very scalable such that at this point most data transformation use cases can be much more effectively handled in-database rather than in some external processing layer. Add to this the separation of compute and storage and there are decreasingly few reasons to want to execute your data transformation jobs elsewhere.

dbt is a tool to help you write and execute the data transformation jobs that run inside your warehouse. dbt’s only function is to take code, compile it to SQL, and then run against your database.

Want to deep-dive?

This is the playlist if you get into deep-dive :

https://www.youtube.com/playlist?list=PLy4OcwImJzBLJzLYxpxaPUmCWp8j1esvT

Data modeling techniques for more modularity

https://www.getdbt.com/analytics-engineering/modular-data-modeling-technique/

Commands

Makefile
init:
    dbt init ${PROJECT_NAME}
debug:
    dbt debug
run:
    dbt run
    dbt run --profiles-dir path/to/directory
    export DBT_PROFILES_DIR=path/to/directory
test:
    dbt test -m model1 [model2]
seed:
    dbt seed

requirements.txt
dbt-core
dbt-databricks
dbt-postgres
dbt-snowflake

profiles.yml
postgres:
  outputs:
    dev:
      type: postgres
      threads: 1
      host: <host>
      port: 5432
      user: postgres
      pass: <pass>
      dbname: postgres
      schema: public
    prod:
      type: postgres
      threads: 1
      host: <host>
      port: 5432
      user: postgres
      pass: <pass>
      dbname: postgres
      schema: public
  target: dev
  
databricks:
  outputs:
    dev:
      host: {HOST}
      http_path: {HTTP_PATH}
      schema: default
      threads: 1
      token: {TOKEN}
      type: databricks
  target: dev

dbt

Data modeling techniques for more modularity​

Commands​

More Resources​

Data modeling techniques for more modularity

Commands

More Resources