Skip to content

ADR 013 — Bronze All-VARCHAR + Silver Type Casting

Status

Accepted

Context

xo-foundry loads all API data into Snowflake Bronze tables using Python's csv.DictWriter, which preserves all values as strings. This matches our pandas-rules policy (see rules/pandas-rules.md): extraction code must never infer types.

However, the original Bronze DDL for 4 Gladly tables contained typed data columns (TIMESTAMP_NTZ, NUMBER):

Table Typed Columns
GLADLY_CONTACT_TIMESTAMPS TIMESTAMP TIMESTAMP_NTZ, DUE_AT TIMESTAMP_NTZ
GLADLY_CONVERSATION_TIMESTAMPS TIMESTAMP TIMESTAMP_NTZ
GLADLY_WORK_SESSIONS TIMESTAMP_NTZ, 5× NUMBER
GLADLY_AGENT_DURATIONS TIMESTAMP_NTZ, 1× NUMBER

Snowflake's COPY INTO performs implicit type casting when loading CSV data into typed columns. This means: - If the API returns an unexpected format (empty string, null literal, non-ISO timestamp), the load fails silently or raises a stage error. - The Gsheet tables (GSHEET_CSAT, GSHEET_QA, added in V1.7.0/V1.8.0) are already fully VARCHAR — inconsistency across Bronze tables. - Issue #553 removed loader-side type normalization from xo-foundry, making typed Bronze columns a correctness risk.

Decision

All Bronze data columns are VARCHAR. Metadata columns (DATE_TO_WAREHOUSE, UPDATE_DATE) keep TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP() because they are set by xo-foundry code at insert time, not from API data.

Silver dbt models own all type conversion. Every Silver model must explicitly cast typed columns from VARCHAR using Snowflake safe functions:

-- Timestamps
TRY_TO_TIMESTAMP_NTZ(col) AS col

-- Integers / decimals
TRY_TO_NUMBER(col) AS col

-- Booleans
IFF(UPPER(col) = 'TRUE', TRUE, FALSE) AS col

-- IDs and text — keep as VARCHAR, no cast

Migration V1.9.0__recreate_gladly_bronze_as_all_varchar.sql drops and recreates the 4 Gladly Bronze tables as fully VARCHAR. The V1.2.0 TIMESTAMP → TIMESTAMP_FIELD rename is absorbed into the new DDL.

Consequences

Easier: - Bronze DDL is consistent: all data columns VARCHAR across all tables. - Pipeline loads never fail due to type mismatch between API output and DDL. - Type logic is in version-controlled SQL (Silver dbt models), not buried in DDL comments. - TRY_TO_* functions return NULL on bad data instead of aborting the pipeline.

Harder: - Every Silver model must explicitly cast columns. Forgetting a cast leaves a column as VARCHAR in Silver, which will cause type errors in Gold models that expect a timestamp or number. - When a new Silver model is created, the developer must inspect the Bronze source and determine appropriate casts (see workflow in rules/dbt.md and docs/guides/dbt-development-workflow.md).

Options Considered

Option A: Keep typed Bronze DDL, fix loader to match Rejected. xo-foundry intentionally loads as strings (ADR-adjacent decision, enforced by pandas-rules). Keeping typed Bronze columns means maintaining a fragile implicit cast at the COPY INTO boundary.

Option B: Use Bronze typed columns only for well-known stable types Rejected. Inconsistency is worse than a uniform rule. "All VARCHAR" is easy to explain and enforce.

Option C: Cast in Bronze DDL using DEFAULT expressions Not applicable. Snowflake COPY INTO does not evaluate DEFAULT expressions for columns with incoming data.