ADR 014 — XOOS Analytics Data Delivery: Direct Snowflake Connection

Status

Accepted — 2026-03-11

Supersedes the original spike recommendation (Option 00: keep MongoDB, Option 01a: evaluate Cube) documented in .claude/ongoing/archived/2026-03-13-snowflake-xoos-integration-spike/.


Context

XOOS (XO Operations Suite) is the performance dashboard used by CX leadership. As of 2026-03, its analytics data flows via reverse-ETL:

Snowflake Gold → Airflow DAG → MongoDB Atlas → XOOS frontend

Three structural problems emerged as the platform scaled:

  1. Maintenance cost grows with metric count. Every new KPI requires changes in three places: a dbt model, an Airflow DAG task, and a MongoDB schema update. Adding 8 Tier 1 KPIs (AHT, ATT, EPH, CPH, FCR, Hold%, CSAT, NPS) would multiply the DAG surface area.

  2. Two copies of data that can drift. MongoDB is a snapshot of Gold at the time of the last DAG run. If a dbt model is fixed, users see stale (wrong) data until the next batch. Gold is not actually "display-ready."

  3. The loading-time bug fix resolved the original concern. The original spike (xo-data#539) recommended keeping MongoDB because of filter-query latency concerns. Those concerns have since been addressed:
     - Missing MongoDB compound index (fixed by Rupert): collection scan reduced from 13k docs to 817 docs, 10ms → 3ms.
     - Svelte {#key} + <svelte:boundary> pattern destroyed the component tree on every filter change (fixed by Yurii): 24s freeze at 5k points → 114ms.
     Performance is no longer the justification for MongoDB; architecture is the question.

A latency test was run on 2026-03-11 against Snowflake Gold (WBP_DB_DEV.GOLD, XO_DEV_WH XS warehouse) to measure whether direct Snowflake queries were fast enough for interactive dashboards.

Key results (cached / result cache active):

| Query | Rows | p95 | Notes |
|-------|------|-----|-------|
| RPT_EMAIL_DAILY 7d | 27 | 388ms | Main dashboard view |
| RPT_EMAIL_DAILY 30d | 251 | 357ms | Timeframe switch |
| Weekly rollup 90d | 2,641 | 357ms | GROUP BY aggregation |
| AI durations 30d | 956 | 386ms | AI report view |
| AGG_CONTACTS 30d | 29,488 | 1,526ms | Raw aggregate — not a dashboard query |
| FCT_CONTACTS 30d | 36,403 | 2,504ms | Fact table — not a dashboard query |

Dashboard-shaped queries (RPT_ views, rollups, <3k rows) are under 400ms cached. Slow queries return 29–36k rows — this is a query design issue, not a Snowflake latency issue. XOOS should query purpose-built RPT_ views, not raw aggregates or fact tables.


Decision

XOOS will query Snowflake Gold directly via Node.js SDK. The Airflow reverse-ETL DAG and MongoDB Analytics collection will be decommissioned for analytics once the direct connection is validated.

The full architecture:

XOOS frontend → GET /api/analytics/metrics → CloudFlare edge cache
                                                   │ (cache miss)
                                           Snowflake Gold (RPT_ views)
                                           XO_XOOS_WH (XS, auto-suspend 60s)
                                           XO_XOOS_READER role
                                           xoos_svc service account

Caching layer: CloudFlare edge cache (already in front of both ECS and Vercel deployments). HTTP Cache-Control: public, s-maxage=3600, stale-while-revalidate=86400 on analytics API responses. Cache key = URL + query params. Cache rule deployed 2026-03-11 on xoos.xtendops.com.
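The caching policy above can be sketched as a SvelteKit-style GET handler. This is a hypothetical shape, not the actual XOOS route: the route path, query parameters, and `queryMetrics` helper are placeholders; only the Cache-Control values come from the decision.

```typescript
// Hypothetical +server.ts GET handler for an analytics endpoint.
// Header values mirror the decided policy: 1h edge TTL,
// 24h stale-while-revalidate. queryMetrics() is a placeholder.

const CACHE_CONTROL = "public, s-maxage=3600, stale-while-revalidate=86400";

export async function GET({ url }: { url: URL }): Promise<Response> {
  // CloudFlare keys the cache on URL + query params, so each
  // metric/timeframe combination is served from the edge after
  // its first miss.
  const metric = url.searchParams.get("metric") ?? "aht";
  const days = Number(url.searchParams.get("days") ?? "7");

  const rows = await queryMetrics(metric, days); // placeholder for the Snowflake call
  return new Response(JSON.stringify(rows), {
    headers: {
      "content-type": "application/json",
      "cache-control": CACHE_CONTROL,
    },
  });
}

// Placeholder; the real implementation would use the Snowflake Node.js SDK.
async function queryMetrics(metric: string, days: number): Promise<unknown[]> {
  return [{ metric, days }];
}
```

Because the response is a plain GET with deterministic cache keys, no application code is needed to invalidate the edge cache: entries simply expire after the 1-hour TTL.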

Snowflake infrastructure:

  - Dedicated XO_XOOS_WH warehouse (XS, auto-suspend 60s). Nickname: "Zeus."
  - XO_XOOS_READER role with read-only access to Gold schemas.
  - xoos_svc service account with key-pair auth.
  - With a 1-hour CloudFlare TTL, the warehouse only wakes on cache misses: estimated 30–60 min/day = $1.50–3/day.
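The daily cost estimate is simple arithmetic. A minimal sketch, assuming an XS warehouse burns 1 credit per active hour and roughly $3 per credit (the actual credit price depends on Snowflake edition and contract, and is not stated in this ADR):

```typescript
// Back-of-envelope daily cost for the XO_XOOS_WH warehouse.
// Assumptions (not from the ADR's billing data): XS = 1 credit/hour,
// ~$3/credit. Only time the warehouse is awake is billed.

const CREDITS_PER_HOUR_XS = 1;
const USD_PER_CREDIT = 3; // assumed; varies by edition/region

function dailyCostUsd(activeHoursPerDay: number): number {
  return activeHoursPerDay * CREDITS_PER_HOUR_XS * USD_PER_CREDIT;
}

// 30–60 min/day of cache-miss activity:
dailyCostUsd(0.5); // 1.5 → $1.50/day
dailyCostUsd(1);   // 3   → $3.00/day
```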

Gold layer becomes truly display-ready. A new metric requires only a dbt model. XOOS queries are generic (read registry, query table). No DAG change, no MongoDB schema update.
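To illustrate what "generic" means here, a hypothetical registry lookup (the registry shape, view names, and column names are illustrative, not the actual KPI registry from xo-monorepo#6371):

```typescript
// Hypothetical metric registry: metric key → purpose-built RPT_ view.
// Adding a KPI means adding a dbt model and a registry row — no DAG
// task, no MongoDB schema change, no bespoke query code.

interface MetricEntry {
  view: string;        // RPT_ view in the Gold schema
  valueColumn: string; // column holding the metric value
}

const REGISTRY: Record<string, MetricEntry> = {
  aht: { view: "RPT_EMAIL_DAILY", valueColumn: "AHT_SECONDS" }, // illustrative
  csat: { view: "RPT_CSAT_DAILY", valueColumn: "CSAT_SCORE" },  // illustrative
};

function buildQuery(metric: string, days: number): string {
  const entry = REGISTRY[metric];
  if (!entry) throw new Error(`unknown metric: ${metric}`);
  // One generic query shape serves every registered metric.
  return (
    `SELECT REPORT_DATE, ${entry.valueColumn} FROM ${entry.view} ` +
    `WHERE REPORT_DATE >= DATEADD(day, -${days}, CURRENT_DATE)`
  );
}
```

Note the query only ever touches registered RPT_ views, which also enforces the "no raw aggregates or fact tables" rule from the latency test.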

Master layer (cross-client, out of scope for this epic): CORE_DB will eventually hold cross-client reporting built with Snowflake native semantic views (not dbt semantic layer, per CTO direction 2026-03-11). Per-client RPT_ views are built first; native semantic views layer on top for cross-client queries.


Consequences

Gets easier:

  - Adding a new KPI = add a dbt model. XOOS reads the metric registry and queries the appropriate view automatically.
  - Data freshness matches Gold layer refresh (dynamic tables). No separate Airflow schedule to manage.
  - Single source of truth: Gold IS the display layer. Drift between MongoDB and Snowflake is eliminated.
  - Cost is predictable: CloudFlare absorbs repeated requests, and the warehouse barely runs.

Gets harder / trade-offs accepted:

  - First warm query per URL is slow (4s for Q1, view compilation). Application-side caching eliminates this for end users — CloudFlare serves subsequent requests from cache.
  - Snowflake result cache is fragile (invalidated when dynamic tables refresh, limited to identical query text + same role within 24h). CloudFlare edge cache is the primary caching mechanism; Snowflake result cache is a bonus.
  - XOOS must serve analytics via GET endpoints (not POST remote functions) for CloudFlare caching to work. This requires refactoring analytics.remote.ts to a +server.ts GET handler.
  - Snowflake service account credentials must be managed in the XOOS environment (key-pair auth, not password).
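For the credential-handling point, a sketch of the connection options that would be passed to the Snowflake Node.js SDK's `createConnection`. The account identifier is a placeholder; the warehouse, role, service account, and database/schema names come from this ADR, but the exact wiring is an assumption:

```typescript
// Sketch of snowflake-sdk connection options for the xoos_svc service
// account using key-pair (JWT) auth. The account identifier is a
// placeholder, and the private key must come from a secret store,
// never from source control.

interface SnowflakeKeyPairOptions {
  account: string;
  username: string;
  authenticator: "SNOWFLAKE_JWT";
  privateKey: string; // PEM-encoded private key
  warehouse: string;
  role: string;
  database: string;
  schema: string;
}

function xoosConnectionOptions(privateKeyPem: string): SnowflakeKeyPairOptions {
  return {
    account: "myorg-myaccount", // placeholder account identifier
    username: "xoos_svc",
    authenticator: "SNOWFLAKE_JWT",
    privateKey: privateKeyPem,
    warehouse: "XO_XOOS_WH",
    role: "XO_XOOS_READER",
    database: "WBP_DB_DEV",
    schema: "GOLD",
  };
}
```

The options object would then be handed to `snowflake.createConnection(...)`, with `connect()` and `execute()` completing the query flow.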

Decommissioned:

  - Airflow reverse-ETL DAG for analytics (once the direct connection is validated and stable)
  - MongoDB Analytics collection (after one sprint of overlap as fallback)
  - analytics.remote.ts and analytics.remote-helpers.ts in XOOS


Options Considered

Option A: Extend MongoDB Reverse-ETL (Status Quo)

Keep the existing Airflow DAG → MongoDB → XOOS pattern. Every new metric = DAG change + MongoDB schema + XOOS ingest. At 8 KPIs × N clients, this becomes unmaintainable.

Rejected because: Maintenance cost grows linearly with metric count. Two copies of data. Gold is never actually display-ready.

Option B: Direct Snowflake Connection (SELECTED)

XOOS queries Snowflake Gold directly. CloudFlare edge cache absorbs repeat requests. Dedicated XS warehouse with aggressive auto-suspend.

Selected because: Latency test confirmed <400ms for dashboard-shaped queries. Simplest long-term architecture. Eliminates dual-copy drift. Cost is controlled via CloudFlare + auto-suspend.

Option C: Cube (Self-Hosted or Cloud)

Semantic layer between Snowflake and XOOS. Pre-aggregation engine.

Rejected because: Cost ($580+/mo Cloud, significant DevOps for self-hosted) is not justified given that Snowflake latency is already acceptable for dashboard queries. Adds a second modeling layer on top of dbt. For cross-client reporting, Snowflake native semantic views are chosen instead (zero new infrastructure, no vendor lock-in on Cube's DSL).

Cost comparison:

| Option | Monthly est. |
|--------|--------------|
| Cube Cloud (Starter) | ~$580+ |
| Cube Cloud (Premium) | ~$1,160+ |
| Cube Self-Hosted | ~$75 + DevOps time |
| Selected: Snowflake + CloudFlare | ~$30–90 |

Option D: Snowflake + Redis Cache

Cache-aside pattern with Redis between Snowflake and XOOS.

Rejected because: CloudFlare edge cache already sits in front of both ECS and Vercel deployments. Redis would add a new dependency, new secrets, and a network hop on every cache check for the same functional outcome. CloudFlare solves the same problem with zero new infrastructure.

Snowflake Native Semantic Layer (for dashboard queries)

Evaluated during the original spike (xo-data#539).

Rejected for dashboard queries: No pre-aggregation for filter-heavy interactive queries. Same warehouse credits as querying Gold tables directly. Semantic views are governance/metric-consistency features, not caching layers.

Accepted for cross-client master layer: Native semantic views support cross-database refs (CORE_DB → WBP_DB.GOLD, CND_DB.GOLD, etc.) and can be managed via dbt_semantic_view package. Per-client RPT_ views are built first; semantic views layer on top for company-wide reporting in a future epic.


References

  • xo-data#539 — Original spike issue (closed by this decision)
  • xo-monorepo#6287 — Parent epic: Standardized Analytics Dashboard
  • xo-monorepo#6371 — Feature 1: KPI Registry and Client Metric Map
  • xo-monorepo#6446 — Loading time bug (resolved, informed this decision)
  • ADR 008 — Gold layer: fct_, dim_, agg_, rpt_
  • ADR 010 — Rosters in CORE_DB, glossaries in client DB