
Merge Checklist - DAG Factory Implementation

Branch: feature/xo-foundry-dag-factory
Target Branches: main and develop
Date: 2025-12-03

Pre-Merge Checklist

✅ Code Quality

  • All code passes mypy type checking (zero errors)
  • All code passes ruff linting
  • All code is formatted with ruff format
  • Generated DAGs validate successfully in Python
  • No temporary/debug code left in files

✅ Testing

  • Path builder module tested (full_refresh, incremental, historical)
  • YAML validation tested (Pydantic schemas)
  • DAG generation tested (YAML → Python)
  • CLI commands tested (generate-dag, validate-config)
  • Reference DAG regenerated and validated
  • Load strategy paths verified in generated DAG

✅ Documentation

  • CLAUDE.md updated with DAG factory information
  • Architecture documentation archived
  • Implementation summary created
  • Load strategy decision record created (ADR 001)
  • Roadmap document created with future work
  • Code comments and docstrings complete

✅ Migration Readiness

  • Backward compatibility maintained (old configs still work)
  • Default load_strategy: full_refresh
  • Clear migration path documented
  • Example YAML configs provided
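
The defaulting rule behind this backward compatibility can be sketched as follows. This is a stdlib stand-in, not the Pydantic schema in dag_config.py: LoadStrategy, SourceConfig, parse_source(), and the name field are hypothetical names chosen for illustration.

```python
# Hypothetical sketch: the real schemas (schemas/dag_config.py) use Pydantic.
# This stdlib version only illustrates the "missing means full_refresh" rule.
from dataclasses import dataclass
from enum import Enum
from typing import Any


class LoadStrategy(str, Enum):
    FULL_REFRESH = "full_refresh"
    INCREMENTAL = "incremental"
    HISTORICAL = "historical"


@dataclass(frozen=True)
class SourceConfig:
    name: str
    load_strategy: LoadStrategy = LoadStrategy.FULL_REFRESH


def parse_source(raw: dict[str, Any]) -> SourceConfig:
    """Build a SourceConfig from one parsed YAML source entry.

    Entries without load_strategy fall back to full_refresh, which keeps
    pre-existing configs working. Unknown values raise ValueError.
    """
    strategy = raw.get("load_strategy", LoadStrategy.FULL_REFRESH.value)
    return SourceConfig(name=raw["name"], load_strategy=LoadStrategy(strategy))


legacy = parse_source({"name": "timestamps"})  # old-style entry, no strategy
print(legacy.load_strategy.value)  # full_refresh
```

Failing fast on an unknown strategy is what lets validate-config catch a typo before any DAG is generated.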

What's Being Merged

New Files

DAG Factory Core

packages/xo-foundry/src/xo_foundry/
├── dag_factory/
│   ├── __init__.py
│   ├── factory.py                    # DAG generator
│   ├── builders/
│   │   ├── __init__.py
│   │   └── path_builder.py           # S3 path builder with load strategy
│   └── templates/
│       └── snowflake_load.py.j2      # Jinja2 template
├── schemas/
│   └── dag_config.py                 # Pydantic validation schemas
└── cli/
    └── generate_dags.py              # CLI tool
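
A minimal sketch of the factory's YAML → Python step, assuming the config has already been loaded into a dict. Python's string.Template stands in for the Jinja2 template (snowflake_load.py.j2) so the example needs no third-party packages; render_dag(), the dag_id/load_strategy fields, and the values in the usage call are hypothetical. The ast.parse() call mirrors the "Generated DAGs validate successfully in Python" check.

```python
import ast
from string import Template

# Stand-in for snowflake_load.py.j2; the real template is far larger.
DAG_TEMPLATE = Template(
    "# Generated from $config_file by the DAG factory; do not edit by hand.\n"
    'DAG_ID = "$dag_id"\n'
    'LOAD_STRATEGY = "$load_strategy"\n'
)


def render_dag(config: dict[str, str], config_file: str) -> str:
    """Render DAG source from a parsed config and confirm it is valid Python."""
    # substitute() raises KeyError if the config lacks a placeholder's value.
    source = DAG_TEMPLATE.substitute(config, config_file=config_file)
    ast.parse(source)  # raises SyntaxError if the rendered code is malformed
    return source


print(render_dag(
    {"dag_id": "warbyparker_timestamps_daily", "load_strategy": "full_refresh"},
    "warbyparker-timestamps.yaml",
))
```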

Configuration

packages/xo-foundry/configs/
└── warbyparker-timestamps.yaml       # Updated with load_strategy

Documentation

.claude/ongoing/
├── guides/
│   ├── xo-foundry-roadmap.md         # Future work roadmap
│   └── merge-checklist.md            # This file
├── decisions/
│   └── 001-load-strategy-terminology.md
├── reference/
│   └── data-refresh-patterns.md      # Industry patterns research
└── archived/
    └── 2025-12-03-dag-factory-implementation/
        ├── README.md
        ├── dag-factory-architecture.md
        ├── dag-factory-implementation-summary.md
        ├── load-strategy-implementation.md
        └── warbyparker-timestamps-test-results.md

Modified Files

Task Updates (Load Strategy Integration)

packages/xo-foundry/src/xo_foundry/tasks/
├── extract_tasks.py                  # Now uses build_ingest_path()
└── stage_tasks.py                    # Now uses build_stage_path()
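
The idea of a path builder that folds the load strategy into every S3 path can be sketched like this. The bucket/stage/source/strategy/date layout and the *_sketch function names are assumptions for illustration; the real build_ingest_path() and build_stage_path() signatures and layout may differ.

```python
from datetime import date

# Assumed set of strategies, matching the three named in this checklist.
LOAD_STRATEGIES = frozenset({"full_refresh", "incremental", "historical"})


def _build_path(bucket: str, stage: str, source: str,
                load_strategy: str, run_date: date) -> str:
    """Join the path parts, rejecting unknown load strategies early."""
    if load_strategy not in LOAD_STRATEGIES:
        raise ValueError(f"unknown load_strategy: {load_strategy!r}")
    # Hypothetical layout: s3://<bucket>/<stage>/<source>/<strategy>/YYYY/MM/DD/
    return f"s3://{bucket}/{stage}/{source}/{load_strategy}/{run_date:%Y/%m/%d}/"


def build_ingest_path_sketch(bucket: str, source: str,
                             load_strategy: str, run_date: date) -> str:
    return _build_path(bucket, "ingest", source, load_strategy, run_date)


def build_stage_path_sketch(bucket: str, source: str,
                            load_strategy: str, run_date: date) -> str:
    return _build_path(bucket, "stage", source, load_strategy, run_date)


print(build_ingest_path_sketch("example-bucket", "timestamps", "incremental", date(2025, 12, 3)))
# s3://example-bucket/ingest/timestamps/incremental/2025/12/03/
```

Routing both task modules through one shared builder keeps the ingest and stage layouts from drifting apart.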

Configuration

packages/xo-foundry/pyproject.toml    # Added CLI entry point
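
A sketch of the two subcommands the entry point exposes, using argparse. Only the command names (generate-dag, validate-config) and the --config/--output flags come from this checklist; build_parser() and main() are hypothetical, and the real CLI in cli/generate_dags.py may use a different framework. A [project.scripts] table in pyproject.toml is the standard way to map the xo-foundry command to a function like main().

```python
from __future__ import annotations

import argparse
from collections.abc import Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="xo-foundry")
    sub = parser.add_subparsers(dest="command", required=True)

    gen = sub.add_parser("generate-dag", help="Render a DAG file from a YAML config")
    gen.add_argument("--config", required=True)
    gen.add_argument("--output", required=True)

    val = sub.add_parser("validate-config", help="Validate a YAML config only")
    val.add_argument("--config", required=True)
    return parser


def main(argv: Sequence[str] | None = None) -> int:
    args = build_parser().parse_args(argv)
    # Placeholder actions: a real CLI would call the validator / factory here.
    if args.command == "generate-dag":
        print(f"would generate {args.output} from {args.config}")
    else:
        print(f"would validate {args.config}")
    return 0
```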

Generated DAG

apps/airflow/xo-pipelines/dags/
└── warbyparker_timestamps_daily.py   # Regenerated with load_strategy

Documentation

.claude/CLAUDE.md                     # Added DAG factory section

What's NOT Being Merged (Future Work)

See xo-foundry-roadmap.md for details:

Phase 2: Additional Extractors

  • Gmail extractor task
  • Google Sheets extractor task
  • S3 file extractor task
  • Generic API extractor task

Phase 3: dbt Integration

  • dbt Cloud API integration (replace placeholder)
  • dbt Core integration (alternative)
  • dbt metadata integration

Phase 4: Additional Templates

  • Data export template (Snowflake → External)
  • Hybrid pipeline template
  • Reverse ETL template

Phase 5: Quality & Observability

  • Data quality validation tasks
  • Monitoring and alerting
  • Lineage tracking

Deployment Steps

1. Merge to Develop

# Ensure branch is up to date
git checkout feature/xo-foundry-dag-factory
git pull origin feature/xo-foundry-dag-factory

# Merge to develop
git checkout develop
git pull origin develop
git merge feature/xo-foundry-dag-factory
git push origin develop

2. Test in Development Environment

# Deploy to Astronomer
cd apps/airflow/xo-pipelines
astro deploy <dev-deployment-id>

# Monitor DAG execution
# Verify S3 paths include load_strategy
# Check Snowflake BRONZE tables
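
For the "Verify S3 paths include load_strategy" step, a small helper can flag object keys that lack a strategy segment. It assumes the strategy appears as its own path segment (a hypothetical layout) and works on keys you have already listed, for example with aws s3 ls --recursive; it does not call S3 itself.

```python
# Assumed strategy names; a key passes if any "/"-separated segment matches one.
LOAD_STRATEGIES = ("full_refresh", "incremental", "historical")


def keys_missing_strategy(keys: list[str]) -> list[str]:
    """Return the S3 keys that contain no load-strategy path segment."""
    return [
        key for key in keys
        if not any(segment in LOAD_STRATEGIES for segment in key.split("/"))
    ]


keys = [
    "ingest/timestamps/full_refresh/2025/12/03/data.csv",
    "ingest/timestamps/2025/12/03/data.csv",
]
print(keys_missing_strategy(keys))  # ['ingest/timestamps/2025/12/03/data.csv']
```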

3. Merge to Main (After Dev Testing)

# Merge to main
git checkout main
git pull origin main
git merge develop
git push origin main

# Tag release
git tag -a v0.4.0 -m "Release v0.4.0: DAG Factory with Load Strategy Support"
git push origin v0.4.0

4. Deploy to Production

# Deploy to Astronomer prod
cd apps/airflow/xo-pipelines
astro deploy <prod-deployment-id>

Breaking Changes

None. This release is backward compatible:

  • Existing DAGs continue to work without changes
  • Old path structure still supported (defaults to full_refresh)
  • Configurations without load_strategy default to full_refresh

Migration Guide for Existing DAGs

To adopt the new DAG factory for existing pipelines:

  1. Create YAML config based on existing DAG
  2. Add load_strategy to each source (usually full_refresh)
  3. Validate config: uv run xo-foundry validate-config --config ...
  4. Generate DAG: uv run xo-foundry generate-dag --config ... --output ...
  5. Compare: Verify generated DAG matches existing DAG logic
  6. Test locally: astro dev start
  7. Deploy: Replace manual DAG with generated version
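
For step 5, a textual diff shows every change, while comparing ASTs ignores comments and formatting so only structural differences remain. These are hypothetical helpers, not part of xo-foundry; read each file with Path.read_text() and pass the strings in. A structural match is a good signal but still not proof that the two DAGs behave the same at runtime.

```python
import ast
import difflib


def dag_diff(existing_src: str, generated_src: str) -> str:
    """Unified diff of the hand-written DAG against the generated one."""
    return "".join(
        difflib.unified_diff(
            existing_src.splitlines(keepends=True),
            generated_src.splitlines(keepends=True),
            fromfile="existing",
            tofile="generated",
        )
    )


def same_structure(existing_src: str, generated_src: str) -> bool:
    """True if both sources parse to the same AST.

    ast.dump() omits line/column attributes by default and comments never
    reach the AST, so formatting and comment changes are ignored; docstrings
    and other string literals still count.
    """
    return ast.dump(ast.parse(existing_src)) == ast.dump(ast.parse(generated_src))
```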

Rollback Plan

If issues are discovered after merge:

  1. Immediate: Revert the merge commit on the affected branch (git revert -m 1 <merge-commit>)
  2. DAG Level: Replace generated DAG with previous manual version
  3. Package Level: Pin xo-foundry to previous version in requirements

Post-Merge Tasks

  • Update team on new DAG factory workflow
  • Create example DAGs for other clients
  • Monitor first production runs
  • Gather feedback on YAML configuration structure
  • Plan Phase 2 (Additional Extractors)

Success Criteria

  • All type checking passes (mypy)
  • All linting passes (ruff)
  • Generated DAGs validate successfully
  • CLI tools work as documented
  • Reference DAG runs successfully in local Airflow
  • Reference DAG runs successfully in dev environment (post-merge)
  • S3 paths include load strategy (post-merge)
  • Snowflake loads complete successfully (post-merge)

Team Communication

Announcement Template:

🎉 DAG Factory v0.4.0 is ready for merge!

What's new:
- Generate Airflow DAGs from YAML configurations
- Load strategy support (full_refresh, incremental, historical)
- S3 paths now include the load strategy, keeping full-refresh, incremental, and historical data in separate prefixes
- CLI tool: `xo-foundry generate-dag`

Quick start:
1. Create YAML config in packages/xo-foundry/configs/
2. Run: uv run xo-foundry generate-dag --config ... --output ...
3. Test in local Airflow
4. Deploy!

Documentation: .claude/CLAUDE.md (see "DAG Factory" section)
Roadmap: .claude/ongoing/guides/xo-foundry-roadmap.md

Questions? Check the docs or ask!

Approval

  • Code review complete
  • Documentation reviewed
  • Testing complete
  • Ready for merge

Approved by: Data Engineering Lead
Date: 2025-12-03