Merge Checklist - DAG Factory Implementation
Branch: feature/xo-foundry-dag-factory
Target Branches: main and develop
Date: 2025-12-03
Pre-Merge Checklist
✅ Code Quality
- All code passes `mypy` type checking (zero errors)
- All code passes `ruff` linting
- All code is formatted with `ruff format`
- Generated DAGs validate successfully in Python
- No temporary/debug code left in files
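The last two checks can be partly automated with a small script that uses only the standard library. It parses each generated DAG with `ast` and flags obvious debug leftovers. This is a sketch, not part of `xo-foundry`. The function name `check_dag_source` and the debug patterns it looks for (`breakpoint()`, `pdb`/`ipdb` imports, `set_trace()` calls) are assumptions chosen for illustration.

```python
import ast

# Hypothetical debug patterns to flag; extend as needed.
DEBUG_CALLS = {"breakpoint", "set_trace"}
DEBUG_MODULES = {"pdb", "ipdb"}


def check_dag_source(source: str, filename: str = "<dag>") -> list[str]:
    """Return problems found in a generated DAG's source code.

    An empty list means the file parses as valid Python (syntax only)
    and contains none of the debug patterns listed above.
    """
    try:
        tree = ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return [f"{filename}:{exc.lineno}: syntax error: {exc.msg}"]

    problems: list[str] = []
    for node in ast.walk(tree):
        # Flag `import pdb` / `import ipdb`.
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in DEBUG_MODULES:
                    problems.append(f"{filename}:{node.lineno}: debug import {alias.name}")
        # Flag `from pdb import set_trace`.
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in DEBUG_MODULES:
                problems.append(f"{filename}:{node.lineno}: debug import from {node.module}")
        # Flag `breakpoint()` and `pdb.set_trace()` style calls.
        elif isinstance(node, ast.Call):
            func = node.func
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name in DEBUG_CALLS:
                problems.append(f"{filename}:{node.lineno}: debug call {name}()")
    return problems
```

To use it, run `check_dag_source(path.read_text(), filename=str(path))` for every file in `apps/airflow/xo-pipelines/dags/` and fail the check if any result is non-empty. `ast.parse` only proves the code is syntactically valid. Loading the DAGs in local Airflow is still the stronger test.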
✅ Testing
- Path builder module tested (full_refresh, incremental, historical)
- YAML validation tested (Pydantic schemas)
- DAG generation tested (YAML → Python)
- CLI commands tested (generate-dag, validate-config)
- Reference DAG regenerated and validated
- Load strategy paths verified in generated DAG
✅ Documentation
- CLAUDE.md updated with DAG factory information
- Architecture documentation archived
- Implementation summary created
- Load strategy decision record (ADR 001)
- Roadmap document created with future work
- Code comments and docstrings complete
✅ Migration Readiness
- Backward compatibility maintained (old configs still work)
- Default `load_strategy`: `full_refresh`
- Clear migration path documented
- Example YAML configs provided
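The backward-compatibility rule can be expressed as a schema default: a source without `load_strategy` is treated as `full_refresh`. The real schemas live in `schemas/dag_config.py` and use Pydantic. The sketch below substitutes a standard-library dataclass so it runs without dependencies. The `SourceConfig` and `parse_sources` names and their fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Any

# The three load strategies named in this checklist.
VALID_LOAD_STRATEGIES = ("full_refresh", "incremental", "historical")


@dataclass(frozen=True)
class SourceConfig:
    """Hypothetical stand-in for one source entry in a DAG YAML config."""

    name: str
    # Old configs omit this key, so it defaults to full_refresh.
    load_strategy: str = "full_refresh"

    def __post_init__(self) -> None:
        # Reject anything outside the supported strategies.
        if self.load_strategy not in VALID_LOAD_STRATEGIES:
            raise ValueError(
                f"source {self.name!r}: load_strategy must be one of "
                f"{VALID_LOAD_STRATEGIES}, got {self.load_strategy!r}"
            )


def parse_sources(raw_sources: list[dict[str, Any]]) -> list[SourceConfig]:
    """Build validated SourceConfig objects from already-loaded YAML data."""
    return [SourceConfig(**entry) for entry in raw_sources]
```

In a Pydantic model, the same rule is typically written as a field annotated `Literal["full_refresh", "incremental", "historical"]` with `"full_refresh"` as its default. Pydantic then rejects any other value at validation time.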
What's Being Merged
New Files
DAG Factory Core
packages/xo-foundry/src/xo_foundry/
├── dag_factory/
│ ├── __init__.py
│ ├── factory.py # DAG generator
│ ├── builders/
│ │ ├── __init__.py
│ │ └── path_builder.py # S3 path builder with load strategy
│ └── templates/
│ └── snowflake_load.py.j2 # Jinja2 template
├── schemas/
│ └── dag_config.py # Pydantic validation schemas
└── cli/
└── generate_dags.py # CLI tool
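According to the tree above, `factory.py` generates DAGs from the Jinja2 template `snowflake_load.py.j2`. The sketch below illustrates the YAML → Python step with `string.Template` in place of Jinja2, so it runs on the standard library alone. The template text, the `render_dag` function, and the config keys (`dag_id`, `schedule`, `load_strategy`) are illustrative assumptions, not the actual template. After rendering, the code calls `ast.parse` as a cheap check that the output is valid Python.

```python
import ast
from string import Template

# Hypothetical, heavily simplified stand-in for snowflake_load.py.j2.
DAG_TEMPLATE = Template('''\
# Generated DAG (simplified illustration)
from datetime import datetime

from airflow import DAG

with DAG(
    dag_id=$dag_id,
    schedule=$schedule,
    start_date=datetime(2025, 1, 1),
    catchup=False,
) as dag:
    LOAD_STRATEGY = $load_strategy
''')


def render_dag(config: dict[str, str]) -> str:
    """Render DAG source from a flat config dict and check that it parses."""
    # repr() turns each value into a valid Python string literal, quotes included.
    # substitute() raises KeyError if the config lacks a placeholder's key.
    source = DAG_TEMPLATE.substitute({key: repr(value) for key, value in config.items()})
    ast.parse(source)  # raises SyntaxError if the rendered code is invalid
    return source
```

Quoting the values with `repr()` stops a stray quote in a config value from breaking the generated file. Jinja2 templates need the equivalent escaping.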
Configuration
Documentation
.claude/ongoing/
├── guides/
│ ├── xo-foundry-roadmap.md # Future work roadmap
│ └── merge-checklist.md # This file
├── decisions/
│ └── 001-load-strategy-terminology.md
├── reference/
│ └── data-refresh-patterns.md # Industry patterns research
└── archived/
└── 2025-12-03-dag-factory-implementation/
├── README.md
├── dag-factory-architecture.md
├── dag-factory-implementation-summary.md
├── load-strategy-implementation.md
└── warbyparker-timestamps-test-results.md
Modified Files
Task Updates (Load Strategy Integration)
packages/xo-foundry/src/xo_foundry/tasks/
├── extract_tasks.py # Now uses build_ingest_path()
└── stage_tasks.py # Now uses build_stage_path()
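`extract_tasks.py` and `stage_tasks.py` now delegate path construction to `build_ingest_path()` and `build_stage_path()`. This checklist does not show their signatures or the exact S3 layout, so the sketch below is hypothetical. The parameters (`bucket`, `client`, `source`, `run_date`, `load_strategy`) and the `ingest/` and `stage/` prefixes are assumptions. The sketch only illustrates the idea of giving the load strategy its own path segment.

```python
from datetime import date

LOAD_STRATEGIES = ("full_refresh", "incremental", "historical")


def _build_path(layer: str, bucket: str, client: str, source: str,
                run_date: date, load_strategy: str) -> str:
    """Assemble an S3 URI with the load strategy as its own path segment."""
    if load_strategy not in LOAD_STRATEGIES:
        raise ValueError(f"unknown load_strategy: {load_strategy!r}")
    return (
        f"s3://{bucket}/{layer}/{client}/{source}/"
        f"{load_strategy}/{run_date.isoformat()}/"
    )


def build_ingest_path(bucket: str, client: str, source: str, run_date: date,
                      load_strategy: str = "full_refresh") -> str:
    """Hypothetical: location where extract tasks write raw files."""
    return _build_path("ingest", bucket, client, source, run_date, load_strategy)


def build_stage_path(bucket: str, client: str, source: str, run_date: date,
                     load_strategy: str = "full_refresh") -> str:
    """Hypothetical: location where stage tasks write files for Snowflake loads."""
    return _build_path("stage", bucket, client, source, run_date, load_strategy)
```

In this layout, full-refresh and incremental files for the same source and date land under different prefixes and never overwrite each other. The `full_refresh` default mirrors the backward-compatibility rule above.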
Configuration
Generated DAG
apps/airflow/xo-pipelines/dags/
└── warbyparker_timestamps_daily.py # Regenerated with load_strategy
Documentation
What's NOT Being Merged (Future Work)
See `xo-foundry-roadmap.md` for details:
Phase 2: Additional Extractors
- Gmail extractor task
- Google Sheets extractor task
- S3 file extractor task
- Generic API extractor task
Phase 3: dbt Integration
- dbt Cloud API integration (replace placeholder)
- dbt Core integration (alternative)
- dbt metadata integration
Phase 4: Additional Templates
- Data export template (Snowflake → External)
- Hybrid pipeline template
- Reverse ETL template
Phase 5: Quality & Observability
- Data quality validation tasks
- Monitoring and alerting
- Lineage tracking
Deployment Steps
1. Merge to Develop
```bash
# Ensure branch is up to date
git checkout feature/xo-foundry-dag-factory
git pull origin feature/xo-foundry-dag-factory
# Merge to develop
git checkout develop
git pull origin develop
git merge feature/xo-foundry-dag-factory
git push origin develop
```
2. Test in Development Environment
```bash
# Deploy to Astronomer
cd apps/airflow/xo-pipelines
astro deploy <dev-deployment-id>
# Monitor DAG execution
# Verify S3 paths include load_strategy
# Check Snowflake BRONZE tables
```
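The "verify S3 paths" step can be scripted once you have the object keys, for example the `Key` values returned by S3's ListObjectsV2 API. The helper below is hypothetical and works on plain key strings. It assumes only that the load strategy appears as a whole path segment, so adjust it to the layout that `build_ingest_path()` actually produces.

```python
LOAD_STRATEGIES = {"full_refresh", "incremental", "historical"}


def keys_missing_load_strategy(keys: list[str]) -> list[str]:
    """Return keys that lack a load strategy as a complete path segment."""
    missing = []
    for key in keys:
        segments = key.strip("/").split("/")
        # A partial match such as "full_refresh_old" does not count.
        if not LOAD_STRATEGIES.intersection(segments):
            missing.append(key)
    return missing
```

Keys written before this release use the old path structure and will legitimately appear in the result. Filter by prefix or run date so the check covers only objects written by the new DAG.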
3. Merge to Main (After Dev Testing)
```bash
# Merge to main
git checkout main
git pull origin main
git merge develop
git push origin main
# Tag release
git tag -a v0.4.0 -m "Release v0.4.0: DAG Factory with Load Strategy Support"
git push origin v0.4.0
```
4. Deploy to Production
Breaking Changes
None. This release is backward compatible.
- Existing DAGs continue to work without changes
- Old path structure still supported (defaults to `full_refresh`)
- Configurations without `load_strategy` default to `full_refresh`
Migration Guide for Existing DAGs
To adopt the new DAG factory for existing pipelines:
- Create a YAML config based on the existing DAG
- Add `load_strategy` to each source (usually `full_refresh`)
- Validate config: `uv run xo-foundry validate-config --config ...`
- Generate DAG: `uv run xo-foundry generate-dag --config ... --output ...`
- Compare: Verify the generated DAG matches the existing DAG logic
- Test locally: `astro dev start`
- Deploy: Replace the manual DAG with the generated version
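For the "Compare" step, a unified diff of the existing DAG against the generated one is often enough to spot logic differences. The sketch below uses `difflib.unified_diff` from the standard library. The helper names `diff_dags` and `diff_dag_files` are placeholders.

```python
import difflib
from pathlib import Path


def diff_dags(existing_source: str, generated_source: str,
              existing_name: str = "existing", generated_name: str = "generated") -> str:
    """Return a unified diff between two DAG sources ('' if they are identical)."""
    diff = difflib.unified_diff(
        existing_source.splitlines(keepends=True),
        generated_source.splitlines(keepends=True),
        fromfile=existing_name,
        tofile=generated_name,
    )
    return "".join(diff)


def diff_dag_files(existing_path: str, generated_path: str) -> str:
    """Convenience wrapper that reads both DAG files from disk."""
    return diff_dags(
        Path(existing_path).read_text(),
        Path(generated_path).read_text(),
        existing_name=existing_path,
        generated_name=generated_path,
    )
```

Expect cosmetic differences (imports, comments, formatting) alongside real logic changes. Running `ruff format` on both files first reduces that noise.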
Rollback Plan
If issues are discovered after merge:
- Immediate: Revert the commit on the affected branch
- DAG Level: Replace the generated DAG with the previous manual version
- Package Level: Pin `xo-foundry` to the previous version in requirements
Post-Merge Tasks
- Update team on new DAG factory workflow
- Create example DAGs for other clients
- Monitor first production runs
- Gather feedback on YAML configuration structure
- Plan Phase 2 (Additional Extractors)
Success Criteria
- All type checking passes (mypy)
- All linting passes (ruff)
- Generated DAGs validate successfully
- CLI tools work as documented
- Reference DAG runs successfully in local Airflow
- Reference DAG runs successfully in dev environment (post-merge)
- S3 paths include load strategy (post-merge)
- Snowflake loads complete successfully (post-merge)
Team Communication
Announcement Template:
🎉 DAG Factory v0.4.0 is ready for merge!
What's new:
- Generate Airflow DAGs from YAML configurations
- Load strategy support (full_refresh, incremental, historical)
- S3 paths now include load strategy for better data management
- CLI tool: `xo-foundry generate-dag`
Quick start:
1. Create YAML config in packages/xo-foundry/configs/
2. Run: uv run xo-foundry generate-dag --config ... --output ...
3. Test in local Airflow
4. Deploy!
Documentation: .claude/CLAUDE.md (see "DAG Factory" section)
Roadmap: .claude/ongoing/guides/xo-foundry-roadmap.md
Questions? Check the docs or ask!
Approval
- Code review complete
- Documentation reviewed
- Testing complete
- Ready for merge
Approved by: Data Engineering Lead
Date: 2025-12-03