ADR 015 — SANDBOX Schema for Ad-Hoc and Supplementary Data¶
Status¶
Accepted
Context¶
Some datasets are complementary sources for analyst use — they do not have dbt consumers, do not
need the full BRONZE → SILVER → GOLD pipeline overhead, and have no production SLA. The immediate
trigger was onboarding WBP QA evaluation data (issue #568): a manually-loaded Google Sheet used for
analyst evaluation that supplements, but does not replace, the existing GSHEETS_QA Bronze table.
Loading these datasets directly into BRONZE creates ambiguity:
- BRONZE implies a production pipeline owns the table (xo-foundry + Airflow).
- BRONZE tables have metadata columns (
RECORD_KEY,RECORD_HASH,BATCH_ID, etc.) populated by the loader. Manual analyst loads would lack these, breaking downstream assumptions. - Mixing governed pipeline tables with ad-hoc analyst tables makes Bronze schema inspection unreliable as a proxy for "what pipelines exist."
Without a formal schema, analysts resort to loading into arbitrary locations (OPERATIONS, one-off schemas, etc.), losing traceability.
Decision¶
Create a SANDBOX schema in every client database (WBP_DB, CND_DB, CORE_DB and their
_DEV equivalents). SANDBOX is provisioned by schemachange alongside BRONZE/SILVER/GOLD, making
it available to analysts from day one of client onboarding.
Characteristics:
- Owner: Analyst (not a pipeline). Tables are manually created and loaded.
- Naming convention:
{SOURCE}_{OBJECT}_{ISSUE_NUMBER}— e.g.,GSHEET_XO_EXTERNAL_QA_568. The issue number is a metadata suffix, not a topic prefix. It provides a direct pointer to the context behind the table and ensures names stay unique across loads. - No pipeline automation: No Airflow DAGs, no xo-foundry tasks own SANDBOX tables.
- No dbt models: SANDBOX tables are not referenced by any dbt model. If analyst convenience
views are needed, they are written as Snowflake
R__repeatable scripts via schemachange — not dbt models. - No SLA: SANDBOX is best-effort. Tables may be stale, incomplete, or temporary.
- No metadata columns required:
RECORD_KEY,RECORD_HASH, etc. are optional; their absence distinguishes SANDBOX from BRONZE.
Promotion path: If a SANDBOX table proves permanent and needs pipeline ownership:
- Create a proper Bronze table in BRONZE schema (rename without issue suffix, add metadata columns).
- Build the full xo-foundry pipeline and dbt Silver model.
- Retain the SANDBOX copy briefly for traceability, then drop it.
Consequences¶
Gets easier:
- Analysts have a safe, tracked location for supplementary loads without polluting BRONZE.
- Every SANDBOX table name points to a GitHub issue with full context.
- New client onboarding automatically gets SANDBOX via the V1.0.0 template migration.
- Clear promotion path from ad-hoc to production pipeline.
Gets harder / trade-offs:
- Analysts must follow the {SOURCE}_{OBJECT}_{ISSUE_NUMBER} naming convention consistently.
- Governance is lower by design — no automated freshness checks or data quality tests.
- SANDBOX tables are not visible in the dbt model registry or lineage graph.
Options Considered¶
WORKSPACE schema — Rejected. "Workspace" implies a persistent, curated space; it would
attract permanent objects that should stay in BRONZE or GOLD. The name doesn't signal lower
governance.
BRONZE_EXT schema — Rejected. Not an industry-standard term. Would be confused with
BRONZE by new team members. The "EXT" suffix is ambiguous (external? extended?).
Flat dump into OPERATIONS — Rejected. OPERATIONS is a legacy multi-client database being deprecated. Routing new analyst work there reinforces technical debt and loses per-client isolation.
No schema, issue-specific schemas (e.g., ISSUE_568) — Rejected. One schema per issue would
create schema sprawl and make permissions unmanageable. A single SANDBOX schema is simpler.