XO-Data Platform
Welcome to the XO-Data Platform technical documentation. This is a monorepo-based data engineering platform that orchestrates ELT pipelines from various data sources into Snowflake using a modern medallion architecture.
What is XO-Data?
XO-Data is a unified platform for:
- Data Extraction from APIs (Gladly, Sprout Social, Gmail, Google Sheets, S3)
- Data Pipeline Orchestration using Apache Airflow with a YAML-driven DAG Factory
- Data Warehousing in Snowflake with BRONZE/SILVER/GOLD layers
- Reusable Python Packages for data engineering operations
Quick Links
For New Developers
- Getting Started -- Set up your development environment
- Architecture Overview -- Understand the system design
- Naming Conventions -- Standards and conventions
For Package Developers
- xo-core Package -- Foundation utilities and extractors
- xo-foundry Package -- DAG Factory and orchestration
- xo-bosun Package -- Monorepo navigation CLI
For Data Engineers
- Snowflake Architecture -- BRONZE/SILVER/GOLD layers
- ELT Pipeline Flow -- How data flows through the system
- DAG Factory Guide -- YAML-driven pipeline generation
- Architecture Decisions -- ADRs shaping the platform
Platform Architecture
┌─────────────────────────────────────────────────────────────┐
│  Data Sources                                               │
│  Gladly API │ Sprout Social │ Gmail │ Google Sheets │ S3    │
└──────────────────────┬──────────────────────────────────────┘
                       │ Extract (xo-foundry tasks)
                       ▼
┌─────────────────────────────────────────────────────────────┐
│  S3 Staging (Ingest/Stage)                                  │
│  Copy-then-Peek, Standardize columns, Load strategy paths   │
└──────────────────────┬──────────────────────────────────────┘
                       │ Load (TRUNCATE + COPY INTO)
                       ▼
┌─────────────────────────────────────────────────────────────┐
│  Snowflake Medallion Architecture                           │
│                                                             │
│  BRONZE (Raw)  →  SILVER (Cleaned)  →  GOLD (Analytics)     │
│  All VARCHAR      Typed & Historical   fct_ dim_ agg_ rpt_  │
└─────────────────────────────────────────────────────────────┘
Monorepo Structure
xo-data/
├── packages/                  # Reusable Python packages
│   ├── xo-core/               # Foundation: extractors, utilities
│   ├── xo-foundry/            # Orchestration: DAG Factory, tasks
│   ├── xo-lens/               # Analytics: data analysis
│   └── xo-bosun/              # CLI: monorepo navigation
│
├── apps/                      # Deployment targets
│   ├── airflow/xo-pipelines/  # Airflow DAGs + configs
│   ├── snowflake-schema/      # Snowflake schema migrations
│   └── material-mkdocs/       # This documentation
│
└── development/               # Development environments
Key Concepts
Medallion Architecture
A three-tier data architecture in Snowflake:
- BRONZE: Raw data from sources (all VARCHAR, truncated daily, 6 metadata columns)
- SILVER: Cleaned, typed, historical data (no enrichment, no filtering)
- GOLD: Analytics-ready data in four types, distinguished by table prefix: fct_, dim_, agg_, rpt_
Learn more about Medallion Architecture →
DAG Factory
YAML-driven pipeline generation: pipelines are defined in YAML, validated with Pydantic, and rendered into Airflow DAGs through Jinja2 templates.
Learn more about DAG Factory →
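The validate-then-render idea can be sketched with the standard library alone. This is a minimal sketch, not the xo-foundry implementation. A frozen dataclass stands in for the Pydantic model, `string.Template` stands in for Jinja2, and the config arrives as an already-parsed mapping, so YAML parsing is omitted. The field names (`dag_id`, `schedule`, `source`, `load_strategy`) and the `render_dag` helper are hypothetical.

```python
from dataclasses import asdict, dataclass
from string import Template

# Allowed values mirror the three load strategies defined in ADR 001.
LOAD_STRATEGIES = {"full_refresh", "incremental", "historical"}


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical, simplified stand-in for a Pydantic pipeline model."""

    dag_id: str
    schedule: str
    source: str
    load_strategy: str

    def __post_init__(self) -> None:
        # Reject bad configs before any DAG code is generated.
        if not self.dag_id.replace("_", "").replace("-", "").isalnum():
            raise ValueError(f"invalid dag_id: {self.dag_id!r}")
        if self.load_strategy not in LOAD_STRATEGIES:
            raise ValueError(f"unknown load_strategy: {self.load_strategy!r}")


# Stand-in for a Jinja2 template; string.Template uses $name placeholders.
DAG_TEMPLATE = Template(
    """from airflow import DAG

with DAG(dag_id="$dag_id", schedule="$schedule", catchup=False) as dag:
    ...  # extract from $source, stage, then load ($load_strategy)
"""
)


def render_dag(raw: dict[str, str]) -> str:
    """Validate an already-parsed config mapping, then render DAG source."""
    config = PipelineConfig(**raw)  # unknown or missing keys raise TypeError
    return DAG_TEMPLATE.substitute(asdict(config))


print(render_dag({
    "dag_id": "warbyparker_gladly_daily",
    "schedule": "@daily",
    "source": "gladly",
    "load_strategy": "full_refresh",
}))
```

Validation runs before rendering. A typo such as `load_strategy: ful_refresh` therefore fails at generation time, instead of producing a DAG that only breaks once Airflow runs it.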
Load Strategies
Three strategies per ADR 001:
- full_refresh: Immutable daily snapshots (most common)
- incremental: Full pulls with warehouse deduplication
- historical: Late-arriving data, SCD Type 2
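For `incremental` loads, "warehouse deduplication" amounts to keeping the latest row per key. The Python below illustrates that rule. In Snowflake it would more likely be written in SQL, for example with `QUALIFY ROW_NUMBER() OVER (...) = 1`. The `ticket_id` key and the `_loaded_at` column are hypothetical names, not the platform's actual metadata columns.

```python
from datetime import datetime
from typing import Any

Record = dict[str, Any]


def deduplicate_latest(records: list[Record], key: str, loaded_at: str) -> list[Record]:
    """Keep the most recently loaded row per business key (first seen wins ties)."""
    latest: dict[Any, Record] = {}
    for row in records:
        current = latest.get(row[key])
        if current is None or row[loaded_at] > current[loaded_at]:
            latest[row[key]] = row
    return list(latest.values())


# Two overlapping full pulls: ticket 1 changed between them.
rows = [
    {"ticket_id": 1, "status": "open", "_loaded_at": datetime(2026, 2, 10)},
    {"ticket_id": 2, "status": "open", "_loaded_at": datetime(2026, 2, 10)},
    {"ticket_id": 1, "status": "closed", "_loaded_at": datetime(2026, 2, 11)},
]
print([(r["ticket_id"], r["status"]) for r in deduplicate_latest(rows, "ticket_id", "_loaded_at")])
# → [(1, 'closed'), (2, 'open')]
```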
ELT Pipeline Pattern
- Extract: Source systems → S3 Ingest bucket (csv.DictWriter, no pandas)
- Stage: Copy-then-Peek, standardize columns → S3 Stage bucket
- Load: TRUNCATE + COPY INTO → Snowflake BRONZE (idempotent)
- Transform: dbt → SILVER → GOLD
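The Extract and Load steps can be sketched as below. All names are hypothetical:

- `standardize_column` encodes an assumed UPPER_SNAKE_CASE rule, since the platform's actual standardization rules are not described here.
- An in-memory buffer replaces the S3 upload.
- The table and stage names are placeholders.

For brevity, the Stage step's column standardization is folded into the CSV write.

```python
import csv
import io
import re


def standardize_column(name: str) -> str:
    """Hypothetical rule: 'createdAt' or 'Created At' -> 'CREATED_AT'."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())  # split camelCase
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)  # spaces/punctuation -> "_"
    return name.strip("_").upper()


def records_to_csv(records: list[dict[str, object]]) -> str:
    """Serialize API records to CSV text with csv.DictWriter (no pandas)."""
    fieldnames: list[str] = []
    for record in records:  # union of keys, in first-seen order
        for field in record:
            if field not in fieldnames:
                fieldnames.append(field)
    buffer = io.StringIO()  # stands in for the S3 Ingest object body
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    # Header row carries standardized names; missing fields become empty strings.
    writer.writerow({field: standardize_column(field) for field in fieldnames})
    writer.writerows(records)
    return buffer.getvalue()


def load_statements(table: str, stage_path: str) -> list[str]:
    """TRUNCATE + COPY INTO: each run replaces the table's contents."""
    return [
        f"TRUNCATE TABLE {table}",
        f"COPY INTO {table} FROM @{stage_path} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 "
        "FIELD_OPTIONALLY_ENCLOSED_BY = '\"')",
    ]
```

Run the two statements from `load_statements` in order. TRUNCATE, unlike DELETE, also clears Snowflake's load metadata for the table. Without that, COPY INTO would skip files it has already loaded; with it, a rerun reloads the same files. This is one reason the step can be repeated safely.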
Time Windows
Centralized time window management:
- Daily: Single date (execution date minus lag)
- Intraday Relative: Rolling window from run time minus lookback to run time minus lag
- Intraday Absolute: Fixed start/end times
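Below is a sketch of the three window types. It assumes the lag and lookback are plain `timedelta` offsets from the run time. The function and parameter names (`daily_window`, `lag_days`, `lookback`, `lag`) are hypothetical rather than the platform's API.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass(frozen=True)
class TimeWindow:
    start: datetime
    end: datetime


def daily_window(execution_date: date, lag_days: int = 1) -> date:
    """Daily: a single date, the execution date minus a lag."""
    return execution_date - timedelta(days=lag_days)


def intraday_relative(now: datetime, lookback: timedelta, lag: timedelta) -> TimeWindow:
    """Intraday relative: from (now - lookback) up to (now - lag)."""
    if lookback <= lag:
        raise ValueError("lookback must be longer than lag")
    return TimeWindow(start=now - lookback, end=now - lag)


def intraday_absolute(start: datetime, end: datetime) -> TimeWindow:
    """Intraday absolute: fixed start/end times, checked for ordering."""
    if end <= start:
        raise ValueError("end must be after start")
    return TimeWindow(start=start, end=end)
```

For example, a run at 12:00 with a 3-hour lookback and a 15-minute lag covers 09:00 to 11:45. The lag leaves room for source records that land late.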
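Copy-then-Peek Pattern
Performance optimization: each file is copied S3-to-S3 on the server side, and only the first 8 KB is fetched with a range request to read the header row. The step therefore takes roughly constant time (~0.5s) regardless of file size.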
Learn more about Copy-then-Peek →
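A sketch of the pattern using boto3-style calls: `copy_object`, then a ranged `get_object`. Several details are assumptions:

- The `s3` argument is a boto3 S3 client.
- The key is the same in both buckets.
- The copied object, not the source, is the one peeked.

The helper names are hypothetical.

```python
import csv

PEEK_BYTES = 8 * 1024  # 8 KB header peek


def parse_header(prefix: bytes, peek_size: int = PEEK_BYTES) -> list[str]:
    """Return the CSV header row from the first bytes of a file."""
    end = prefix.find(b"\n")
    if end == -1:
        if len(prefix) >= peek_size:
            raise ValueError(f"header row longer than {peek_size} bytes")
        end = len(prefix)  # whole file fit in the peek and has no newline
    first_line = prefix[:end].decode("utf-8-sig").rstrip("\r")  # drop BOM and CR
    return next(csv.reader([first_line]))


def copy_then_peek(s3, src_bucket: str, dst_bucket: str, key: str) -> list[str]:
    """Copy an object server-side, then range-read only its header bytes.

    `s3` is assumed to be a boto3 S3 client; the file body never passes
    through the worker, only the first PEEK_BYTES bytes do.
    """
    s3.copy_object(
        Bucket=dst_bucket,
        Key=key,
        CopySource={"Bucket": src_bucket, "Key": key},
    )
    response = s3.get_object(
        Bucket=dst_bucket, Key=key, Range=f"bytes=0-{PEEK_BYTES - 1}"
    )
    return parse_header(response["Body"].read())
```

Only the header bytes travel to the worker, which keeps the step's runtime roughly independent of file size. A single `copy_object` call handles objects up to 5 GB. Larger files need a multipart copy, such as the managed `copy` method on the boto3 client.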
Essential Commands
Package Management
# Install all dependencies
uv sync
# Install specific package
uv sync --package xo-foundry
# Add dependency
uv add --package xo-foundry requests
Code Quality
# Type checking (must pass with zero errors)
uv run ty check --project packages/xo-core
uv run ty check --project packages/xo-foundry
# Linting and formatting
uv run ruff check .
uv run ruff format .
DAG Factory
# Generate DAG from YAML config
uv run xo-foundry generate-dag \
--config apps/airflow/xo-pipelines/dags/configs/warbyparker-gladly-daily.yaml \
--output apps/airflow/xo-pipelines/dags/
# Validate config
uv run xo-foundry validate-config --config pipeline.yaml
Airflow Development
# Start local Airflow
cd apps/airflow/xo-pipelines
astro dev start
# Deploy to production
astro deploy <deployment-id>
Documentation Structure
This documentation is organized by topic:
- Getting Started -- Installation, setup, and first steps
- Architecture -- System design, ELT flow, layer architecture
- Packages -- xo-core, xo-foundry, xo-bosun documentation
- Snowflake -- Database architecture and medallion layers
- Reference -- Naming conventions, client registry, ADR index
Last Updated: 2026-02-12
Platform Version: 2.0 (DAG Factory + Medallion Architecture)