Vertex AI Pipeline
Architecture

I built a full data pipeline around a product performance dataset to learn orchestration with Dagster, Cloud Run, and Vertex AI.

Components

I used Dagster as the orchestration tool, with the following components:

  1. Source data - I pull a public product usage dataset from BigQuery and model it with dbt (see the first sketch after this list)
  2. DuckDB and Evidence.dev - I persist the modeled data as .duckdb files that populate the Evidence.dev dashboard
  3. Vertex AI integration - Calls the Vertex AI API to generate AI highlights and trend summaries based on the source data (see the second sketch)
  4. Dashboard refresh - Updates the Evidence.dev dashboard with the latest data via a scheduled GitHub Actions workflow
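
To make the first two components concrete, here is a minimal sketch of how the dbt models and the DuckDB export could be wired as Dagster assets. It assumes the dagster-dbt package; the manifest path, BigQuery table, and .duckdb path are hypothetical placeholders, not the project's actual values.

```python
# Minimal sketch: dbt models on BigQuery feeding a persistent .duckdb file.
# Paths and table names below are hypothetical placeholders.
import duckdb
from dagster import AssetExecutionContext, asset
from dagster_dbt import DbtCliResource, dbt_assets
from google.cloud import bigquery

@dbt_assets(manifest="dbt_project/target/manifest.json")  # assumed layout
def product_usage_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    # Run `dbt build` against BigQuery; each dbt model becomes a Dagster asset.
    yield from dbt.cli(["build"], context=context).stream()

@asset(deps=[product_usage_dbt_assets])
def duckdb_export() -> None:
    # Pull the modeled mart out of BigQuery and persist it as a .duckdb file
    # that the Evidence.dev dashboard queries.
    df = bigquery.Client().query(
        "SELECT * FROM analytics.product_usage_monthly"  # hypothetical model
    ).to_dataframe()
    con = duckdb.connect("sources/usage/product_usage.duckdb")
    # DuckDB can scan the in-scope pandas DataFrame `df` directly.
    con.execute("CREATE OR REPLACE TABLE product_usage AS SELECT * FROM df")
    con.close()
```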
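
The Vertex AI step can then be a downstream asset that hands the dashboard an AI-written summary. The project ID, region, model name, and the load_monthly_metrics helper are assumptions, but the SDK calls follow the google-cloud-aiplatform generative API:

```python
# Minimal sketch: ask Vertex AI for highlights and trends over the modeled data.
import vertexai
from dagster import asset
from vertexai.generative_models import GenerativeModel

@asset(deps=[duckdb_export])
def ai_highlights() -> str:
    vertexai.init(project="my-gcp-project", location="us-central1")  # assumptions
    model = GenerativeModel("gemini-1.5-flash")  # assumed model choice
    metrics = load_monthly_metrics()  # hypothetical helper reading the .duckdb file
    response = model.generate_content(
        "Summarize notable highlights and trends in this product usage data:\n"
        f"{metrics}"
    )
    return response.text
```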

Notes

Each module can be run individually for testing and debugging (a sketch of this follows). The project is currently deployed via a GitHub Actions workflow that runs the Dagster job on the first day of each month.
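
As an illustration of the manual-run workflow, a single asset can be materialized in isolation with Dagster's in-process test API. The import path and asset names reuse the hypothetical sketches above:

```python
# Minimal sketch: materialize individual assets for testing and debugging.
from dagster import materialize

from pipeline.assets import ai_highlights, duckdb_export  # hypothetical module

if __name__ == "__main__":
    result = materialize([duckdb_export, ai_highlights])
    assert result.success
```

The same selection can also be run from the command line with `dagster asset materialize --select duckdb_export`.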