xo-core Package

xo-core v1.5.0 -- The foundation package of the XO-Data platform. Provides core utilities, data extractors, and managers for common data engineering operations.

Purpose

xo-core serves as the shared utility layer for all XO-Data packages and applications. It contains:

  • Extractors: Data source connectors (Gladly API, Sprout Social, Gmail, Google Sheets, S3)
  • Managers: Service wrappers (Snowflake, S3, Google Services)
  • Utilities: DataFrame operations, file handling, logging

Installation

# Install xo-core package
uv sync --package xo-core

# Add as dependency to another package
uv add --package xo-foundry xo-core

Package Structure

packages/xo-core/src/xo_core/
├── extractors/                  # Data source connectors
│   ├── gladly_extractor.py      # Gladly API reports
│   ├── sprout_extractor.py      # Sprout Social API
│   ├── gmail_extractor.py       # Gmail attachments
│   ├── gsheets_extractor.py     # Google Sheets
│   └── s3_extractor.py          # S3 files
├── google_services/             # Google API integrations
│   ├── auth.py                  # OAuth2 authentication
│   ├── gmail_manager.py         # Gmail operations
│   ├── gdrive_manager.py        # Google Drive operations
│   └── gsheets_manager.py       # Google Sheets operations (gspread)
├── managers/                    # Service wrappers
│   ├── snowflake_manager.py     # Snowflake operations
│   └── s3_manager.py            # S3 operations
├── utils/                       # Shared utilities
│   ├── df_utils.py              # DataFrame operations
│   ├── file_utils.py            # File handling
│   └── logger.py                # Logging configuration
└── __init__.py

Key Components

Extractors

Data source connectors that extract data and upload to S3.

No pandas in extraction

xo-foundry extraction tasks (which wrap these extractors) use Python's built-in csv.DictWriter, never pandas. Pandas' automatic type inference corrupts data: numeric IDs can become floats (for example when a column has missing values) and leading zeros are stripped. Leave type conversion to Snowflake.
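
A minimal sketch of the csv.DictWriter approach, assuming API records arrive as dicts of strings. `records_to_csv` is a hypothetical helper for illustration, not the actual xo-foundry task code.

```python
import csv
import io


def records_to_csv(records: list[dict[str, str]], fieldnames: list[str]) -> str:
    """Serialize API records to CSV text without any type inference.

    Hypothetical helper; not part of xo-core or xo-foundry.
    Every value is written exactly as received.
    """
    buffer = io.StringIO()
    # extrasaction="ignore" drops unexpected keys instead of raising ValueError;
    # keys missing from a record are written as restval (an empty string).
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore", restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


rows = [
    {"CONTACT_ID": "00123", "AGENT_ID": "9007199254740993"},
    {"CONTACT_ID": "00456"},  # AGENT_ID missing, so it is written as empty
]
csv_text = records_to_csv(rows, ["CONTACT_ID", "AGENT_ID"])
```

Because nothing is parsed, `00123` keeps its leading zeros and the large ID is never rounded through a float.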

GladlyExtractor

Extract reports from the Gladly API.

from xo_core.extractors.gladly_extractor import GladlyExtractor

extractor = GladlyExtractor(
    api_user="user@example.com",
    api_token="your-token",
    org="your-org"
)

# Extract the contact timestamps report for a single day
report_data = extractor.extract_report(
    report_type="ContactTimestampsReport",
    start_date="2026-01-15",
    end_date="2026-01-15"
)
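
Backfills can reuse the single-day pattern shown here by splitting a longer range into one-day windows. The sketch assumes `extract_report` accepts ISO `YYYY-MM-DD` strings, as in the example; `daily_windows` is a hypothetical helper, not part of xo-core.

```python
from collections.abc import Iterator
from datetime import date, timedelta


def daily_windows(start: str, end: str) -> Iterator[tuple[str, str]]:
    """Yield one (start_date, end_date) pair per calendar day, inclusive.

    Hypothetical helper; not part of xo-core. Being a generator, it raises
    ValueError for a reversed range only once iteration begins.
    """
    current = date.fromisoformat(start)
    last = date.fromisoformat(end)
    if current > last:
        raise ValueError(f"start {start} is after end {end}")
    while current <= last:
        day = current.isoformat()
        yield day, day  # same-day window, matching the single-day example
        current += timedelta(days=1)


windows = list(daily_windows("2026-01-30", "2026-02-02"))
```

Each pair can then be passed as `start_date` and `end_date` to `extract_report` inside a loop.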

SproutExtractor

Extract data from the Sprout Social API.

from xo_core.extractors.sprout_extractor import SproutExtractor

extractor = SproutExtractor(
    api_token="your-token",
    customer_id="your-customer-id"
)

# Extract messages
messages = extractor.extract_messages(
    start_date="2026-01-15",
    end_date="2026-01-15"
)

GmailExtractor

Extract email attachments from Gmail.

from xo_core.extractors.gmail_extractor import GmailExtractor

extractor = GmailExtractor(credentials_path="credentials.json")

# Extract attachments from label
attachments = extractor.extract_attachments(
    label="Reports/Daily",
    start_date="2026-01-15"
)

GSheetsExtractor

Extract data from Google Sheets.

from xo_core.extractors.gsheets_extractor import GSheetsExtractor

extractor = GSheetsExtractor(credentials_path="credentials.json")

# Extract sheet data
df = extractor.extract_sheet(
    spreadsheet_id="abc123",
    sheet_name="Data",
    range="A1:Z1000"
)
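
The `range` argument uses A1 notation. When the sheet width is known as a column count, a helper like the hypothetical `a1_range` below can build that string; it is an illustration, not an xo-core function.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column index to A1 letters (1 -> A, 26 -> Z, 27 -> AA).

    Hypothetical helper; not part of xo-core.
    """
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        # Shift to 0-based so 26 maps to "Z" rather than rolling over.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def a1_range(n_columns: int, n_rows: int, first_row: int = 1) -> str:
    """Build an A1 range covering n_columns starting at column A (hypothetical)."""
    last_row = first_row + n_rows - 1
    return f"A{first_row}:{column_letter(n_columns)}{last_row}"


sheet_range = a1_range(26, 1000)
```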

S3Extractor

Extract files from S3 buckets.

from xo_core.extractors.s3_extractor import S3Extractor

extractor = S3Extractor()

# Extract and read CSV
df = extractor.extract_csv(
    bucket="xo-ingest",
    key="client/report/data.csv"
)
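
The documented `extract_csv` returns a DataFrame. If string-preserving rows are wanted instead, in line with the no-pandas rule for extraction, raw object bytes can be parsed with the standard library. `parse_csv_bytes` is a hypothetical helper, not part of xo-core.

```python
import csv
import io


def parse_csv_bytes(body: bytes, encoding: str = "utf-8-sig") -> list[dict[str, str]]:
    """Decode raw CSV bytes into row dicts, keeping every value a string.

    Hypothetical helper; not part of xo-core. The "utf-8-sig" codec strips a
    leading byte-order mark, so the first header name is read cleanly.
    """
    text = body.decode(encoding)
    return list(csv.DictReader(io.StringIO(text, newline="")))


rows = parse_csv_bytes(b"\xef\xbb\xbfID,ZIP\r\n00123,02134\r\n")
```

With boto3, the bytes would come from `boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()`.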

Managers

Service wrappers that provide high-level operations for external services.

SnowflakeManager

Snowflake connector wrapper with DataFrame upload/download helpers and deduplication.

from xo_core.managers.snowflake_manager import SnowflakeManager

manager = SnowflakeManager()

# Drop rows already present in the target table, then upload
prepped_df = manager.prep_dataframe_for_table(
    df=raw_df,
    table_name="GLADLY_CONTACT_TIMESTAMPS",
    filter_existing=True
)

manager.upload_dataframe(
    prepped_df,
    table_name="GLADLY_CONTACT_TIMESTAMPS",
    database="WBP_DB",
    schema="BRONZE"
)

# Execute query
results = manager.execute_query("SELECT COUNT(*) FROM WBP_DB.BRONZE.GLADLY_CONTACT_TIMESTAMPS")

# Download table to DataFrame
df = manager.download_table("DATABASE.SCHEMA.TABLE")

Key Features:

  • Automatic deduplication (single-field or multi-field)
  • Type-safe DataFrame uploads
  • Connection pooling
  • Prepared statement support
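
The single-field versus multi-field deduplication listed above can be sketched on plain dicts. This assumes the keys already present in the target table have been fetched into a set of tuples; `filter_new_rows` is hypothetical and does not reproduce `prep_dataframe_for_table`.

```python
from collections.abc import Iterable, Sequence

Row = dict[str, str]


def filter_new_rows(
    rows: Iterable[Row],
    existing_keys: set[tuple[str, ...]],
    key_fields: Sequence[str],
) -> list[Row]:
    """Keep only rows whose key (one or more fields) is not already present.

    Hypothetical helper; not part of xo-core. Duplicates inside the incoming
    batch are also dropped, and the first occurrence wins.
    """
    seen = set(existing_keys)  # copy so the caller's set is not mutated
    fresh: list[Row] = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            fresh.append(row)
    return fresh


incoming = [
    {"CONTACT_ID": "c1", "EVENT": "OPENED"},
    {"CONTACT_ID": "c1", "EVENT": "CLOSED"},
    {"CONTACT_ID": "c2", "EVENT": "OPENED"},
]
# Multi-field key: (CONTACT_ID, EVENT) pairs assumed to be loaded from the table.
new_rows = filter_new_rows(incoming, {("c1", "OPENED")}, ["CONTACT_ID", "EVENT"])
```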

S3Manager

S3 operations with DataFrame support.

from xo_core.managers.s3_manager import S3Manager

manager = S3Manager()

# Upload DataFrame
manager.upload_dataframe(
    df,
    bucket="xo-ingest",
    key="client/report/2026-01-15/data.csv"
)

# Download to DataFrame
df = manager.download_to_dataframe(
    bucket="xo-ingest",
    key="client/report/2026-01-15/data.csv"
)

# List files
files = manager.list_files(bucket="xo-ingest", prefix="client/")
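
The keys in these examples follow a `client/report/YYYY-MM-DD/filename` layout, which lets `list_files` narrow results with a client or report prefix. A hypothetical builder for that layout, assuming it is the convention you want to enforce:

```python
from datetime import date


def build_s3_key(client: str, report: str, report_date: date, filename: str) -> str:
    """Build a key of the form <client>/<report>/<YYYY-MM-DD>/<filename>.

    Hypothetical helper; not part of xo-core.
    """
    parts = [client.strip("/"), report.strip("/"), report_date.isoformat(), filename.lstrip("/")]
    if not all(parts):
        raise ValueError("client, report, and filename must be non-empty")
    return "/".join(parts)


key = build_s3_key("client", "report", date(2026, 1, 15), "data.csv")
```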

Google Services

GSheetsManager

Google Sheets operations via the gspread library.

from xo_core.google_services.gsheets_manager import GSheetsManager

gsheets = GSheetsManager(credentials_path="credentials.json")

# Read spreadsheet
data = gsheets.read_sheet(spreadsheet_id="abc123", sheet_name="Data")
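
Assuming `read_sheet` returns the header row followed by data rows as lists of strings (not confirmed here), those rows can be turned into records as below. The Sheets API omits trailing empty cells, so short rows are padded. `rows_to_records` is hypothetical.

```python
def rows_to_records(values: list[list[str]]) -> list[dict[str, str]]:
    """Turn a header row plus data rows into dicts, padding short rows with "".

    Hypothetical helper; not part of xo-core. Extra cells beyond the
    header width are dropped.
    """
    if not values:
        return []
    header, *data = values
    width = len(header)
    records: list[dict[str, str]] = []
    for row in data:
        padded = list(row[:width]) + [""] * (width - len(row))
        records.append(dict(zip(header, padded)))
    return records


records = rows_to_records([["ID", "NAME", "NOTES"], ["001", "Ana"], ["002", "Bo", "x"]])
```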

GmailManager

Gmail API operations.

from xo_core.google_services.gmail_manager import GmailManager

gmail = GmailManager(credentials_path="credentials.json")

# List messages
messages = gmail.list_messages(label="Reports", after_date="2026-01-15")

# Get attachments
attachments = gmail.get_attachments(message_id="abc123")
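
The Gmail API returns attachment content (`users.messages.attachments.get`) as base64url-encoded text. Whether `get_attachments` already decodes it is not stated here; if raw API data does need decoding, a sketch looks like this, with `decode_gmail_data` as a hypothetical helper that defensively restores any stripped `=` padding.

```python
import base64


def decode_gmail_data(data: str) -> bytes:
    """Decode a base64url string from a Gmail API body into raw bytes.

    Hypothetical helper; not part of xo-core.
    """
    padding = "=" * (-len(data) % 4)  # pad to a multiple of 4 characters
    return base64.urlsafe_b64decode(data + padding)


encoded = base64.urlsafe_b64encode(b"a,b\n1,2\n").decode("ascii").rstrip("=")
decoded = decode_gmail_data(encoded)
```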

Utilities

df_utils

DataFrame cleaning and standardization utilities.

from xo_core.utils.df_utils import (
    clean_column_names,
    clean_dataframe,
    datetime_columns_handler,
    convert_to_nullable_int_columns,
    generate_record_key
)

# Clean column names (UPPERCASE, replace special chars)
df.columns = clean_column_names(df.columns)

# Comprehensive DataFrame cleaning
df = clean_dataframe(
    df,
    uppercase_columns=True,
    datetime_columns=['CREATED_AT'],
    auto_detect_int_columns=True,
    replace_empty_strings=True
)
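
An approximation of the column-name rule described in the comment above (UPPERCASE, special characters replaced). `normalize_column_name` is hypothetical and xo-core's exact rules may differ; the leading-underscore step reflects Snowflake's rule that unquoted identifiers start with a letter or underscore.

```python
import re


def normalize_column_name(name: str) -> str:
    """Uppercase a column name and collapse non-alphanumeric runs to "_".

    Hypothetical helper; not xo-core's clean_column_names().
    """
    cleaned = re.sub(r"[^0-9A-Z]+", "_", name.strip().upper()).strip("_")
    # Unquoted Snowflake identifiers cannot start with a digit.
    if cleaned[:1].isdigit():
        cleaned = f"_{cleaned}"
    return cleaned


columns = [normalize_column_name(c) for c in ["Created At (UTC)", "  order-id ", "1st Reply"]]
```

Distinct source names can collapse to the same result (`Order ID` and `order-id` both become `ORDER_ID`), so a full implementation should check for collisions.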

Key Functions:

Function                            Purpose
----------------------------------  ------------------------------------------------------
clean_column_names()                Standardize column names (UPPERCASE, special chars to _)
clean_dataframe()                   All-in-one DataFrame cleaning
datetime_columns_handler()          Convert columns to datetime with timezone
convert_to_nullable_int_columns()   Convert columns to Int64 (allows nulls)
replace_empty_strings_with_na()     Replace "" with NA
drop_empty_columns()                Remove columns that are entirely null
generate_record_key()               Generate unique keys for deduplication
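
A record key for deduplication is typically a hash of the fields that identify a row. This is a sketch under that assumption, not the algorithm `generate_record_key()` uses; `make_record_key` is hypothetical.

```python
import hashlib
from collections.abc import Mapping, Sequence


def make_record_key(row: Mapping[str, object], fields: Sequence[str]) -> str:
    """Hash the selected fields into a stable 64-character hex key.

    Hypothetical helper; not xo-core's generate_record_key(). A unit separator
    joins the values so ("ab", "c") and ("a", "bc") differ, assuming values
    never contain that separator. Missing or None values get a sentinel so
    they differ from the empty string.
    """
    parts = ["\x00NULL" if row.get(f) is None else str(row[f]) for f in fields]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


k1 = make_record_key({"A": "ab", "B": "c"}, ["A", "B"])
k2 = make_record_key({"A": "a", "B": "bc"}, ["A", "B"])
```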

logger

Consistent logging configuration.

from xo_core.utils.logger import get_module_logger

logger = get_module_logger(__name__)

logger.info("Processing started")
logger.warning("Missing optional field")
logger.error("Failed to connect", exc_info=True)
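
A stdlib-only sketch of what a helper like `get_module_logger` might do. The name `get_stdlib_logger`, the format string, and the handler setup are assumptions, not xo-core's implementation.

```python
import logging

_FORMAT = "%(asctime)s %(levelname)s %(name)s: %(message)s"


def get_stdlib_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a named logger with exactly one stream handler attached.

    Hypothetical helper; not xo-core's get_module_logger().
    """
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(_FORMAT))
        logger.addHandler(handler)
        logger.setLevel(level)
        logger.propagate = False  # don't also emit through root handlers
    return logger


first = get_stdlib_logger("xo_core.demo")
second = get_stdlib_logger("xo_core.demo")
```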

Type Safety

xo-core is fully typed and must pass ty with zero errors:

uv run ty check --project packages/xo-core

Annotate every function signature, for example:

import pandas as pd

def process_data(items: list[dict[str, str]]) -> pd.DataFrame:
    """Build a DataFrame from string-valued records."""
    return pd.DataFrame(items)

Testing

# Run xo-core tests
uv run pytest packages/xo-core/tests/

# Run specific test file
uv run pytest packages/xo-core/tests/test_df_utils.py

# Run with coverage
uv run pytest --cov=xo_core packages/xo-core/tests/

Dependencies

Key dependencies:

  • pandas>=2.0 -- DataFrame operations
  • snowflake-connector-python -- Snowflake connectivity
  • boto3 -- AWS S3 operations
  • google-auth -- Google API authentication
  • gspread -- Google Sheets API
  • requests -- HTTP client for APIs
  • tqdm -- Progress bars

Package Location: packages/xo-core/
Version: 1.5.0
Dependencies: See packages/xo-core/pyproject.toml