XO-Data Platform

Welcome to the XO-Data Platform technical documentation. This is a monorepo-based data engineering platform that orchestrates ELT pipelines from various data sources into Snowflake using a modern medallion architecture.

What is XO-Data?

XO-Data is a unified platform for:

  • Data Extraction from APIs (Gladly, Sprout Social, Gmail, Google Sheets, S3)
  • Data Pipeline Orchestration using Apache Airflow with YAML-driven DAG Factory
  • Data Warehousing in Snowflake with BRONZE/SILVER/GOLD layers
  • Reusable Python Packages for data engineering operations

Platform Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Data Sources                            │
│  Gladly API │ Sprout Social │ Gmail │ Google Sheets │ S3    │
└──────────────────────┬──────────────────────────────────────┘
                       │ Extract (xo-foundry tasks)
┌─────────────────────────────────────────────────────────────┐
│                  S3 Staging (Ingest/Stage)                   │
│  Copy-then-Peek, Standardize columns, Load strategy paths   │
└──────────────────────┬──────────────────────────────────────┘
                       │ Load (TRUNCATE + COPY INTO)
┌─────────────────────────────────────────────────────────────┐
│              Snowflake Medallion Architecture                │
│                                                              │
│  BRONZE (Raw)  →  SILVER (Cleaned)  →  GOLD (Analytics)    │
│  All VARCHAR      Typed & Historical   fct_ dim_ agg_ rpt_ │
└─────────────────────────────────────────────────────────────┘

Monorepo Structure

xo-data/
├── packages/              # Reusable Python packages
│   ├── xo-core/          # Foundation: extractors, utilities
│   ├── xo-foundry/       # Orchestration: DAG Factory, tasks
│   ├── xo-lens/          # Analytics: data analysis
│   └── xo-bosun/         # CLI: monorepo navigation
├── apps/                 # Deployment targets
│   ├── airflow/xo-pipelines/  # Airflow DAGs + configs
│   ├── snowflake-schema/      # Snowflake schema migrations
│   └── material-mkdocs/       # This documentation
└── development/          # Development environments

Key Concepts

Medallion Architecture

A three-tier data architecture in Snowflake:

  • BRONZE: Raw data from sources (all VARCHAR, truncated daily, 6 metadata columns)
  • SILVER: Cleaned, typed, historical data (no enrichment, no filtering)
  • GOLD: Analytics-ready data in four types: fct_, dim_, agg_, rpt_
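The four GOLD prefixes lend themselves to a simple naming check. The sketch below is illustrative only: `gold_table_type` is a hypothetical helper, not part of any xo-* package, and it assumes table names are compared case-insensitively.

```python
# Hypothetical helper: classify a GOLD table by its naming prefix.
GOLD_PREFIXES = ("fct_", "dim_", "agg_", "rpt_")


def gold_table_type(table_name: str) -> str:
    """Return the GOLD prefix of table_name, or raise ValueError."""
    name = table_name.lower()
    for prefix in GOLD_PREFIXES:
        # Require at least one character after the prefix.
        if name.startswith(prefix) and len(name) > len(prefix):
            return prefix
    raise ValueError(
        f"{table_name!r} does not start with one of {GOLD_PREFIXES}"
    )
```

A check like this could run in CI so that a GOLD table outside the four types is caught before deployment.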

Learn more about Medallion Architecture →

DAG Factory

YAML-driven pipeline generation: define pipelines in YAML, then generate production-ready Airflow DAGs. Configs are validated with Pydantic and DAG files are rendered from Jinja2 templates.

uv run xo-foundry generate-dag --config pipeline.yaml --output dags/
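The validate-then-render idea can be sketched with the standard library alone. This is not the xo-foundry implementation: a dataclass stands in for the Pydantic model, `string.Template` stands in for Jinja2, the config is assumed to be already parsed from YAML into a dict, and the field names (`dag_id`, `schedule`, `load_strategy`) are hypothetical.

```python
from dataclasses import asdict, dataclass
from string import Template

LOAD_STRATEGIES = {"full_refresh", "incremental", "historical"}


@dataclass(frozen=True)
class PipelineConfig:
    """Stand-in for a Pydantic model; field names are assumptions."""

    dag_id: str
    schedule: str
    load_strategy: str

    def __post_init__(self) -> None:
        # Fail fast on invalid configs, before any file is generated.
        if not self.dag_id.isidentifier():
            raise ValueError(f"invalid dag_id: {self.dag_id!r}")
        if self.load_strategy not in LOAD_STRATEGIES:
            raise ValueError(f"unknown load_strategy: {self.load_strategy!r}")


# Stand-in for a Jinja2 template; renders a small Python module.
DAG_TEMPLATE = Template(
    "# Generated from a YAML config; do not edit by hand.\n"
    'DAG_ID = "$dag_id"\n'
    'SCHEDULE = "$schedule"\n'
    'LOAD_STRATEGY = "$load_strategy"\n'
)


def render_dag(raw_config: dict) -> str:
    """Validate a parsed config, then render the DAG source text."""
    config = PipelineConfig(**raw_config)
    return DAG_TEMPLATE.substitute(asdict(config))
```

Validating before rendering means a bad config produces an error message rather than a broken DAG file.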

Learn more about DAG Factory →

Load Strategies

Three load strategies are defined in ADR 001:

  • full_refresh: Immutable daily snapshots (most common)
  • incremental: Full pulls with warehouse deduplication
  • historical: Late-arriving data, SCD Type 2
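For the incremental strategy, the deduplication itself happens in the warehouse. The logic can still be illustrated in Python: keep the most recent row per business key. The column names `id` and `updated_at` are assumptions for this sketch, as is the choice to let later rows win ties.

```python
def deduplicate_latest(
    rows: list[dict[str, str]],
    key: str = "id",
    order_by: str = "updated_at",
) -> list[dict[str, str]]:
    """Keep one row per key: the one with the greatest order_by value.

    ISO 8601 timestamps compare correctly as strings, which suits a
    BRONZE layer where every column is VARCHAR.
    """
    latest: dict[str, dict[str, str]] = {}
    for row in rows:
        current = latest.get(row[key])
        # On ties, the row seen later wins.
        if current is None or row[order_by] >= current[order_by]:
            latest[row[key]] = row
    return list(latest.values())
```

Because each run pulls the full dataset, overlapping pulls contain the same keys repeatedly; deduplicating by key makes reloading safe.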

ELT Pipeline Pattern

  1. Extract: Source systems → S3 Ingest bucket (csv.DictWriter, no pandas)
  2. Stage: Copy-then-Peek, standardize columns → S3 Stage bucket
  3. Load: TRUNCATE + COPY INTO → Snowflake BRONZE (idempotent)
  4. Transform: dbt → SILVER → GOLD
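A minimal sketch of steps 1 and 3, assuming records arrive as dicts: serialize them with `csv.DictWriter` (no pandas), then build the TRUNCATE + COPY INTO statements. The function names, table name, and stage path are hypothetical, and the file format options are one plausible choice rather than the platform's actual settings.

```python
import csv
import io


def records_to_csv(records: list[dict[str, object]], fieldnames: list[str]) -> str:
    """Serialize records to CSV text using only the standard library."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # drop keys not listed in fieldnames
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty strings
    return buffer.getvalue()


def build_load_statements(table: str, stage_path: str) -> list[str]:
    """Return the SQL for a TRUNCATE + COPY INTO load.

    Truncating first is what makes a rerun idempotent: the table ends
    up holding exactly the staged file's rows, however often it runs.
    """
    return [
        f"TRUNCATE TABLE {table}",
        f"COPY INTO {table} FROM {stage_path} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)",
    ]
```

In a real pipeline the CSV text would be uploaded to the S3 Ingest bucket and the statements executed through a Snowflake connection; both are left out of this sketch.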

Learn more about ELT Flow →

Time Windows

Centralized time window management:

  • Daily: Single date (execution date minus lag)
  • Intraday Relative: Window from lookback to lag
  • Intraday Absolute: Fixed start/end times
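The three window types can be sketched as small functions. The names and signatures are hypothetical, and the intraday relative reading (start at execution time minus lookback, end at execution time minus lag) is an interpretation of the description above, not a confirmed definition.

```python
from datetime import date, datetime, time, timedelta


def daily_window(execution_date: date, lag_days: int) -> date:
    """Daily: a single date, the execution date minus the lag."""
    return execution_date - timedelta(days=lag_days)


def intraday_relative_window(
    execution_time: datetime, lookback: timedelta, lag: timedelta
) -> tuple[datetime, datetime]:
    """Intraday relative: from (now - lookback) to (now - lag)."""
    if lookback <= lag:
        raise ValueError("lookback must be longer than lag")
    return execution_time - lookback, execution_time - lag


def intraday_absolute_window(
    day: date, start: time, end: time
) -> tuple[datetime, datetime]:
    """Intraday absolute: fixed start and end times on a given day."""
    if end <= start:
        raise ValueError("end must be after start")
    return datetime.combine(day, start), datetime.combine(day, end)
```

Keeping this arithmetic in one place means every extractor computes its window the same way.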

Copy-then-Peek Pattern

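A performance optimization: copy the file S3-to-S3, then issue an 8KB range request to read only the column headers instead of downloading the whole file. The operation takes roughly constant time (~0.5s) regardless of file size.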
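A minimal sketch of the pattern, assuming a boto3-style S3 client is passed in (for example `boto3.client("s3")`). `copy_then_peek` is a hypothetical name, and peeking at the copied object rather than the source is an assumption. `copy_object` and `get_object` with a `Range` header are standard S3 client calls.

```python
import csv

PEEK_BYTES = 8 * 1024  # read only the first 8KB of the object


def copy_then_peek(
    s3, src_bucket: str, src_key: str, dst_bucket: str, dst_key: str
) -> list[str]:
    """Copy an object server-side, then read just its CSV header row."""
    # Server-side copy: the file's bytes never pass through this process.
    s3.copy_object(
        Bucket=dst_bucket,
        Key=dst_key,
        CopySource={"Bucket": src_bucket, "Key": src_key},
    )
    # Ranged GET: fetch only the first 8KB, whatever the file size.
    response = s3.get_object(
        Bucket=dst_bucket, Key=dst_key, Range=f"bytes=0-{PEEK_BYTES - 1}"
    )
    chunk = response["Body"].read().decode("utf-8", errors="replace")
    lines = chunk.splitlines()
    if not lines:
        return []  # empty file: no header to report
    # Assumes the header fits in 8KB and contains no quoted newlines.
    return next(csv.reader([lines[0]]))
```

The returned header list is what a column standardization step would then operate on.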

Learn more about Copy-then-Peek →

Essential Commands

Package Management

# Install all dependencies
uv sync

# Install specific package
uv sync --package xo-foundry

# Add dependency
uv add --package xo-foundry requests

Code Quality

# Type checking (must pass with zero errors)
uv run ty check --project packages/xo-core
uv run ty check --project packages/xo-foundry

# Linting and formatting
uv run ruff check .
uv run ruff format .

DAG Factory

# Generate DAG from YAML config
uv run xo-foundry generate-dag \
  --config apps/airflow/xo-pipelines/dags/configs/warbyparker-gladly-daily.yaml \
  --output apps/airflow/xo-pipelines/dags/

# Validate config
uv run xo-foundry validate-config --config pipeline.yaml

Airflow Development

# Start local Airflow
cd apps/airflow/xo-pipelines
astro dev start

# Deploy to production
astro deploy <deployment-id>

Documentation Structure

This documentation is organized by topic:

  • Getting Started -- Installation, setup, and first steps
  • Architecture -- System design, ELT flow, layer architecture
  • Packages -- xo-core, xo-foundry, xo-bosun documentation
  • Snowflake -- Database architecture and medallion layers
  • Reference -- Naming conventions, client registry, ADR index

Last Updated: 2026-02-12
Platform Version: 2.0 (DAG Factory + Medallion Architecture)