Data Engineering with Positron + Databricks

A webinar demonstration showing how to use Positron as an IDE for building production Data Engineering pipelines with Databricks.

Overview

This repository accompanies a webinar that highlights:

Positron as a professional IDE for data work (vs notebook-centric development)
Databricks Asset Bundles (DABs) for infrastructure-as-code deployments
Lakeflow Declarative Pipelines for defining transformations declaratively
AI-assisted development with Positron Assistant

The Use Case

Business Question: How does weather impact NYC taxi operations?

We build a data pipeline that:

Fetches historical weather data from the Open-Meteo API
Joins weather conditions with NYC taxi trip data
Produces analytics showing weather impact on ridership and fares
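
As a rough illustration of the first step, the sketch below builds a request URL for Open-Meteo's historical archive endpoint and flattens the column-oriented `hourly` block of a response into one record per hour. It is not the code in `weather.py`; the function names `build_archive_url` and `hourly_rows`, the choice of hourly variables, and the timezone are assumptions made for this example.

```python
from urllib.parse import urlencode

# Open-Meteo's historical archive endpoint.
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_archive_url(latitude, longitude, start_date, end_date):
    """Build a request URL for hourly temperature and precipitation.

    Hypothetical helper: the variables and timezone are illustrative choices.
    """
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,  # ISO date, e.g. "2024-01-01"
        "end_date": end_date,
        "hourly": "temperature_2m,precipitation",
        "timezone": "America/New_York",
    }
    return f"{ARCHIVE_URL}?{urlencode(params)}"


def hourly_rows(payload):
    """Flatten the column-oriented "hourly" block into one dict per hour."""
    hourly = payload["hourly"]
    return [
        {"time": t, "temperature_2m": temp, "precipitation": precip}
        for t, temp, precip in zip(
            hourly["time"], hourly["temperature_2m"], hourly["precipitation"]
        )
    ]
```

Keeping URL construction and response parsing as separate pure functions means both can be unit tested without network access.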

Repository Structure

├── slides/                    # Quarto presentation
│   └── webinar.qmd           # reveal.js slide deck
├── example_pipeline/          # Databricks Asset Bundle project
│   ├── databricks.yml        # DAB configuration
│   ├── resources/            # Job and pipeline definitions
│   ├── src/
│   │   ├── example_pipeline/         # Python modules
│   │   │   ├── weather.py           # Open-Meteo API client
│   │   │   └── weather_cli.py       # CLI for weather fetch job
│   │   └── example_pipeline_etl/    # Declarative pipeline transformations
│   │       └── transformations/
│   │           ├── weather_data_source.py    # Bronze: raw weather
│   │           ├── weather_taxi_join.py      # Silver: enriched trips
│   │           └── weather_impact_metrics.py # Gold: aggregations
│   ├── tests/                # pytest test suite
│   └── fixtures/             # Sample data for testing
└── .beads/                   # Issue tracking (bd)
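
The silver and gold layers listed above can be pictured with a small plain-Python sketch: match each trip to the weather record for its pickup hour, then aggregate by condition. This only illustrates the logic and is not the pipeline code in `transformations/`; the field names `pickup_datetime`, `fare_amount`, and `precipitation`, and the wet/dry split, are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime


def enrich_trips(trips, weather):
    """Silver step: attach the weather for each trip's pickup hour."""
    weather_by_hour = {w["time"]: w for w in weather}
    enriched = []
    for trip in trips:
        pickup = datetime.fromisoformat(trip["pickup_datetime"])
        # Truncate to the hour so the key matches hourly weather timestamps.
        hour_key = pickup.replace(minute=0, second=0, microsecond=0).strftime(
            "%Y-%m-%dT%H:%M"
        )
        match = weather_by_hour.get(hour_key)
        if match is not None:  # inner join: trips without weather are dropped
            enriched.append({**trip, "precipitation": match["precipitation"]})
    return enriched


def fare_by_condition(enriched):
    """Gold step: trip count and average fare for wet vs dry hours."""
    totals = defaultdict(lambda: [0, 0.0])  # condition -> [trip count, fare sum]
    for trip in enriched:
        condition = "wet" if trip["precipitation"] > 0 else "dry"
        totals[condition][0] += 1
        totals[condition][1] += trip["fare_amount"]
    return {
        condition: {"trips": count, "avg_fare": fare_sum / count}
        for condition, (count, fare_sum) in totals.items()
    }
```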

Getting Started

Prerequisites

Positron or VS Code with the Databricks extension
uv package manager
Databricks CLI
Access to a Databricks workspace

Setup

Clone the repository:

```
git clone https://github.com/blairj09/databricks-data-engineering.git
cd databricks-data-engineering
```

Install dependencies:
```
cd example_pipeline
uv sync --dev
```
Configure Databricks authentication:
```
databricks configure
```
Deploy to your workspace:
```
databricks bundle deploy --target dev
```

Running Tests

```
cd example_pipeline
uv run pytest
```
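
For a sense of what pytest collects, here is a minimal sketch of a test module. The helper `precipitation_bucket` and its thresholds are hypothetical and are not taken from this repository's test suite; pytest discovers any function whose name starts with `test_` and treats a failing `assert` as a test failure.

```python
def precipitation_bucket(mm):
    """Hypothetical helper: label an hourly precipitation total in millimetres."""
    if mm < 0:
        raise ValueError("precipitation cannot be negative")
    if mm == 0:
        return "dry"
    # The 2.5 mm cut-off is an illustrative threshold, not a project constant.
    return "light" if mm < 2.5 else "heavy"


def test_dry_hour():
    assert precipitation_bucket(0.0) == "dry"


def test_light_rain():
    assert precipitation_bucket(0.4) == "light"


def test_threshold_counts_as_heavy():
    assert precipitation_bucket(2.5) == "heavy"


def test_negative_value_is_rejected():
    try:
        precipitation_bucket(-1.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for negative precipitation")
```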

Viewing the Slides

```
cd slides
quarto preview webinar.qmd
```

Key Concepts Demonstrated

Why IDE > Notebooks for Data Engineering

| Challenge | IDE Solution |
| --- | --- |
| Meaningless git diffs | Standard Python files with clean diffs |
| No code review possible | PR-based workflows with readable changes |
| Can't unit test | pytest with Databricks Connect |
| Copy-paste code | Proper imports and shared modules |
| Production debugging | Breakpoints and stack traces |

Lakeflow Declarative Pipelines

Declarative pipeline definitions with built-in:

Dependency management
Data quality expectations
Incremental processing
Schema evolution
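
To make the idea of a data quality expectation concrete, the sketch below imitates "drop rows that violate a rule" semantics in plain Python. It is a stand-in written for this example and deliberately does not use the Lakeflow API, which only runs inside a Databricks pipeline; the decorator `expect_or_drop`, the table function `weather_bronze`, and the rule itself are all assumptions.

```python
def expect_or_drop(name, predicate):
    """Plain-Python stand-in for a drop-on-violation expectation (not the Lakeflow API)."""

    def decorator(table_fn):
        def wrapper(*args, **kwargs):
            rows = list(table_fn(*args, **kwargs))
            kept = [row for row in rows if predicate(row)]
            # Record how many rows the rule removed, keyed by expectation name.
            wrapper.dropped[name] = len(rows) - len(kept)
            return kept

        wrapper.dropped = {}
        return wrapper

    return decorator


@expect_or_drop("valid_temperature", lambda row: row["temperature_2m"] is not None)
def weather_bronze(raw_rows):
    """Hypothetical table function that passes raw weather rows through."""
    return raw_rows
```

The point of the pattern is that the rule lives next to the table definition, so the quality check is declared once and the number of rejected rows is tracked alongside the data.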

Databricks Asset Bundles (DABs)

Infrastructure-as-code for Databricks:

Version-controlled job/pipeline definitions
Environment-specific deployments (dev/prod)
Repeatable, automated deployments

Resources

License

MIT
