A webinar demonstration showing how to use Positron as an IDE for building production data engineering pipelines with Databricks.
This repository accompanies a webinar that highlights:
- Positron as a professional IDE for data work (vs notebook-centric development)
- Databricks Asset Bundles (DABs) for infrastructure-as-code deployments
- Lakeflow Declarative Pipelines for building transformations declaratively
- AI-assisted development with Positron Assistant
Business Question: How does weather impact NYC taxi operations?
We build a data pipeline that:
- Fetches historical weather data from the Open-Meteo API
- Joins weather conditions with NYC taxi trip data
- Produces analytics showing weather impact on ridership and fares
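As a loose sketch of the first step, the fetch in `src/example_pipeline/weather.py` might look something like the following. The Open-Meteo archive endpoint and its parameters are real; the function name and the exact daily variables requested are assumptions, and the actual module may differ.

```python
# Hypothetical sketch of an Open-Meteo fetch; the real client lives in
# src/example_pipeline/weather.py and may be structured differently.
import requests

OPEN_METEO_ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"

def fetch_nyc_weather(start_date: str, end_date: str) -> dict:
    """Fetch daily historical weather for NYC (dates as YYYY-MM-DD)."""
    params = {
        "latitude": 40.71,   # New York City
        "longitude": -74.01,
        "start_date": start_date,
        "end_date": end_date,
        "daily": "temperature_2m_max,precipitation_sum,snowfall_sum",
        "timezone": "America/New_York",
    }
    response = requests.get(OPEN_METEO_ARCHIVE_URL, params=params, timeout=30)
    response.raise_for_status()
    # Response shape: {"daily": {"time": [...], "precipitation_sum": [...], ...}}
    return response.json()

# Example: weather = fetch_nyc_weather("2024-01-01", "2024-01-31")
```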
```
.
├── slides/                              # Quarto presentation
│   └── webinar.qmd                      # reveal.js slide deck
├── example_pipeline/                    # Databricks Asset Bundle project
│   ├── databricks.yml                   # DAB configuration
│   ├── resources/                       # Job and pipeline definitions
│   ├── src/
│   │   ├── example_pipeline/            # Python modules
│   │   │   ├── weather.py               # Open-Meteo API client
│   │   │   └── weather_cli.py           # CLI for weather fetch job
│   │   └── example_pipeline_etl/        # Declarative pipeline transformations
│   │       └── transformations/
│   │           ├── weather_data_source.py    # Bronze: raw weather
│   │           ├── weather_taxi_join.py      # Silver: enriched trips
│   │           └── weather_impact_metrics.py # Gold: aggregations
│   ├── tests/                           # pytest test suite
│   └── fixtures/                        # Sample data for testing
└── .beads/                              # Issue tracking (bd)
```
To follow along, you will need:

- Positron or VS Code with the Databricks extension
- uv package manager
- Databricks CLI
- Access to a Databricks workspace
- Clone the repository:

  ```bash
  git clone https://github.com/blairj09/databricks-data-engineering.git
  cd databricks-data-engineering
  ```

- Install dependencies:

  ```bash
  cd example_pipeline
  uv sync --dev
  ```

- Configure Databricks authentication:

  ```bash
  databricks configure
  ```

- Deploy to your workspace:

  ```bash
  databricks bundle deploy --target dev
  ```

Run the test suite:

```bash
cd example_pipeline
uv run pytest
```

Preview the slides:

```bash
cd slides
quarto preview webinar.qmd
```

Moving from notebook-centric development to an IDE addresses several common pain points:

| Challenge | IDE Solution |
|---|---|
| Meaningless git diffs | Standard Python files with clean diffs |
| No code review possible | PR-based workflows with readable changes |
| Can't unit test | pytest with Databricks Connect |
| Copy-paste code | Proper imports and shared modules |
| Production debugging | Breakpoints and stack traces |
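To make the testing row concrete, here is a minimal sketch of a unit test driven through Databricks Connect, assuming workspace authentication is already configured. The `add_rain_flag` transformation is hypothetical and defined inline; it is not a function from this repo.

```python
# tests/test_transformations.py -- illustrative sketch only; the real suite
# in example_pipeline/tests/ may be organized differently.
import pytest
from databricks.connect import DatabricksSession
from pyspark.sql import DataFrame, functions as F

def add_rain_flag(df: DataFrame) -> DataFrame:
    """Hypothetical transformation: flag days with measurable precipitation."""
    return df.withColumn("is_rainy", F.col("precipitation_sum") > 0)

@pytest.fixture(scope="session")
def spark():
    # Databricks Connect returns a Spark session backed by the remote workspace
    return DatabricksSession.builder.getOrCreate()

def test_add_rain_flag(spark):
    df = spark.createDataFrame(
        [("2024-01-01", 0.0), ("2024-01-02", 12.5)],
        ["date", "precipitation_sum"],
    )
    flags = {r["date"]: r["is_rainy"] for r in add_rain_flag(df).collect()}
    assert flags == {"2024-01-01": False, "2024-01-02": True}
```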
Lakeflow Declarative Pipelines provide declarative pipeline definitions with built-in:
- Dependency management
- Data quality expectations
- Incremental processing
- Schema evolution
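As a rough sketch of what a transformation in `src/example_pipeline_etl/transformations/` could look like, using the `dlt` Python module that Lakeflow Declarative Pipelines builds on (the upstream table and column names here are assumptions):

```python
# Illustrative sketch of a declarative transformation; the real definitions
# live under src/example_pipeline_etl/transformations/. Names are assumed.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Silver: taxi trips enriched with daily weather")
@dlt.expect_or_drop("valid_fare", "fare_amount >= 0")  # data quality expectation
def weather_taxi_join():
    trips = dlt.read("taxi_trips_bronze")      # assumed upstream bronze table
    weather = dlt.read("weather_data_source")  # bronze weather table
    return trips.join(
        weather,
        F.to_date(trips.pickup_datetime) == weather.date,
        "left",
    )
```

Table-to-table dependencies are inferred from the `dlt.read()` calls, which is what drives the built-in dependency management listed above.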
Databricks Asset Bundles provide infrastructure-as-code for Databricks:
- Version-controlled job/pipeline definitions
- Environment-specific deployments (dev/prod)
- Repeatable, automated deployments
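For a sense of shape only, a minimal bundle definition looks roughly like the sketch below; the workspace hosts and include glob are placeholders, and `example_pipeline/databricks.yml` holds the real configuration.

```yaml
# Illustrative skeleton of a databricks.yml; see example_pipeline/databricks.yml
# for the real configuration. Workspace hosts are placeholders.
bundle:
  name: example_pipeline

include:
  - resources/*.yml   # job and pipeline definitions live in resources/

targets:
  dev:
    mode: development   # prefixes resources per-user for isolated dev deploys
    default: true
    workspace:
      host: https://<your-workspace>.cloud.databricks.com
  prod:
    mode: production
    workspace:
      host: https://<your-workspace>.cloud.databricks.com
```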
License: MIT