# ADR 013 — Bronze All-VARCHAR + Silver Type Casting

## Status

Accepted

## Context
xo-foundry loads all API data into Snowflake Bronze tables using Python's csv.DictWriter, which preserves all values as strings. This matches our pandas-rules policy (see rules/pandas-rules.md): extraction code must never infer types.
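For illustration, a minimal sketch of the `csv.DictWriter` behavior this policy relies on (the payload is hypothetical, not xo-foundry code):

```python
import csv
import io

# Hypothetical API record: note the int, None, and float values.
rows = [{"id": 1, "due_at": None, "score": 4.5}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "due_at", "score"])
writer.writeheader()
writer.writerows(rows)  # every value is serialized to text; None becomes ""

buf.seek(0)
loaded = next(csv.DictReader(buf))
# Everything round-trips as strings; no type inference happens anywhere.
print(loaded)  # {'id': '1', 'due_at': '', 'score': '4.5'}
```

This is why Bronze receives only strings: the CSV layer erases source types, and the extraction code never tries to restore them.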
However, the original Bronze DDL for 4 Gladly tables contained typed data columns (TIMESTAMP_NTZ, NUMBER):
| Table | Typed Columns |
|---|---|
| GLADLY_CONTACT_TIMESTAMPS | TIMESTAMP TIMESTAMP_NTZ, DUE_AT TIMESTAMP_NTZ |
| GLADLY_CONVERSATION_TIMESTAMPS | TIMESTAMP TIMESTAMP_NTZ |
| GLADLY_WORK_SESSIONS | 5× TIMESTAMP_NTZ, 5× NUMBER |
| GLADLY_AGENT_DURATIONS | 2× TIMESTAMP_NTZ, 1× NUMBER |
Snowflake's COPY INTO performs implicit type casting when loading CSV data into typed columns. This means:
- If the API returns an unexpected format (empty string, null literal, non-ISO timestamp), the load fails silently or raises a stage error.
- The Gsheet tables (GSHEET_CSAT, GSHEET_QA, added in V1.7.0/V1.8.0) are already fully VARCHAR, leaving the typed Gladly tables inconsistent with the rest of Bronze.
- Issue #553 removed loader-side type normalization from xo-foundry, making typed Bronze columns a correctness risk.
## Decision
All Bronze data columns are VARCHAR. Metadata columns (DATE_TO_WAREHOUSE, UPDATE_DATE) keep TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP() because they are set by xo-foundry code at insert time, not from API data.
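A minimal sketch of the resulting DDL shape (the data column names are illustrative, not the real DDL; the table name and metadata columns are from this ADR):

```sql
-- Illustrative sketch only: data column names are examples.
CREATE OR REPLACE TABLE GLADLY_AGENT_DURATIONS (
    -- Data columns: always VARCHAR, loaded verbatim from the API CSV.
    AGENT_ID          VARCHAR,
    TIMESTAMP_FIELD   VARCHAR,
    DURATION_SECONDS  VARCHAR,
    -- Metadata columns: typed, populated by xo-foundry at insert time,
    -- never from API data.
    DATE_TO_WAREHOUSE TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
    UPDATE_DATE       TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);
```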
Silver dbt models own all type conversion. Every Silver model must explicitly cast typed columns from VARCHAR using Snowflake safe functions:
```sql
-- Timestamps
TRY_TO_TIMESTAMP_NTZ(col) AS col

-- Integers / decimals
TRY_TO_NUMBER(col) AS col

-- Booleans
-- (note: NULL and unrecognized values become FALSE, unlike TRY_TO_*,
-- which return NULL on bad input)
IFF(UPPER(col) = 'TRUE', TRUE, FALSE) AS col

-- IDs and text: keep as VARCHAR, no cast
```
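Put together, a Silver model over one of these tables might look like the following sketch (model path, source name, and column names are assumptions for illustration):

```sql
-- models/silver/slv_gladly_agent_durations.sql (illustrative name)
SELECT
    AGENT_ID,                                            -- ID: stays VARCHAR
    TRY_TO_TIMESTAMP_NTZ(TIMESTAMP_FIELD) AS TIMESTAMP_FIELD,
    TRY_TO_NUMBER(DURATION_SECONDS)       AS DURATION_SECONDS,
    IFF(UPPER(IS_ACTIVE) = 'TRUE', TRUE, FALSE) AS IS_ACTIVE
FROM {{ source('bronze', 'GLADLY_AGENT_DURATIONS') }}
```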
Migration V1.9.0__recreate_gladly_bronze_as_all_varchar.sql drops and recreates the 4 Gladly Bronze tables as fully VARCHAR. The V1.2.0 TIMESTAMP → TIMESTAMP_FIELD rename is absorbed into the new DDL.
## Consequences
Easier:
- Bronze DDL is consistent: all data columns VARCHAR across all tables.
- Pipeline loads never fail due to type mismatch between API output and DDL.
- Type logic is in version-controlled SQL (Silver dbt models), not buried in DDL comments.
- TRY_TO_* functions return NULL on bad data instead of aborting the pipeline.
Harder:
- Every Silver model must explicitly cast columns. Forgetting a cast leaves a column as VARCHAR in Silver, which will cause type errors in Gold models that expect a timestamp or number.
- When a new Silver model is created, the developer must inspect the Bronze source and determine appropriate casts (see workflow in rules/dbt.md and docs/guides/dbt-development-workflow.md).
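One way to catch a forgotten cast is an audit query over the column metadata (a sketch; the `SILVER` schema name and the name patterns are assumptions):

```sql
-- Sketch: Silver columns still typed as VARCHAR whose names suggest they
-- should have been cast. Snowflake reports VARCHAR as TEXT in this view.
SELECT table_name, column_name
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = 'SILVER'
  AND data_type = 'TEXT'
  AND (column_name ILIKE '%TIMESTAMP%'
       OR column_name ILIKE '%_AT'
       OR column_name ILIKE '%COUNT%');
```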
## Options Considered
**Option A: Keep typed Bronze DDL, fix loader to match**
Rejected. xo-foundry intentionally loads as strings (ADR-adjacent decision, enforced by pandas-rules). Keeping typed Bronze columns means maintaining a fragile implicit cast at the COPY INTO boundary.
**Option B: Use Bronze typed columns only for well-known stable types**
Rejected. Inconsistency is worse than a uniform rule. "All VARCHAR" is easy to explain and enforce.
**Option C: Cast in Bronze DDL using DEFAULT expressions**
Not applicable. Snowflake COPY INTO does not evaluate DEFAULT expressions for columns with incoming data.