Catalogs - Daft Cloud

What are Catalogs?

Catalogs are connections to external data catalog systems. Once configured, they can be automatically injected into your functions, giving you access to tables and metadata.

Quick Start

1. Create a Catalog

In the Daft Cloud dashboard:

Navigate to Catalogs in your project sidebar
Click Create catalog
Select your catalog type
Enter a name like analytics_db
Configure the connection credentials
Click Create

2. Use It in Your Code

Reference the catalog using a type annotation:

import daft

def analyze_users(analytics_db: daft.Catalog):
    """
    The `analytics_db` parameter will be automatically injected
    with the configured catalog connection.
    """
    # Load a table from the catalog
    users = analytics_db.load_table("users")

    # Query and process
    active_users = users.where(daft.col("status") == "active")

    return active_users.to_pydict()

3. Create a Run

When creating a run in the dashboard, map your function parameters to catalogs using keyword arguments:

Select the Function entrypoint type
Enter your file path and function name (e.g., my_script.py:analyze_users)
In the Keyword Arguments section, add an argument where:
- The key matches your function parameter name (e.g., analytics_db)
- The value is your configured catalog name
Click Create
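
Because each key has to match a parameter name exactly, it can help to check the mapping locally before creating the run. The following is a minimal sketch using only the Python standard library. The helper name `check_keyword_arguments` is hypothetical and not part of Daft or Daft Cloud, and the `daft.Catalog` annotation is written as a string so the snippet runs without Daft installed.

```python
import inspect
from typing import Any, Callable


def check_keyword_arguments(
    func: Callable[..., Any], keyword_args: dict[str, str]
) -> list[str]:
    """Return a list of problems; an empty list means the keys line up."""
    params = inspect.signature(func).parameters
    problems = []
    # Keys that do not correspond to any parameter of the function.
    for key in keyword_args:
        if key not in params:
            problems.append(f"unknown parameter: {key}")
    # Required parameters (no default) that were not given a value.
    for name, param in params.items():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue
        if param.default is inspect.Parameter.empty and name not in keyword_args:
            problems.append(f"missing argument: {name}")
    return problems


# Hypothetical entrypoint; the annotation is a string, so Daft is not imported.
def analyze_users(analytics_db: "daft.Catalog"):
    ...


ok = check_keyword_arguments(analyze_users, {"analytics_db": "production_warehouse"})
typo = check_keyword_arguments(analyze_users, {"analytics": "production_warehouse"})
```

Here `ok` is an empty list, while `typo` reports both the unknown key and the parameter that was left unmapped.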

How Injection Works

Daft Cloud uses type annotations to inject catalogs:

Annotate a parameter with daft.Catalog
Map the parameter to a catalog when creating a run
At runtime, the system:
- Fetches the catalog configuration
- Loads credentials from your project secrets
- Creates a connected Daft Catalog object
- Passes it to your function

No special decorators or syntax are required, only type annotations.
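
To make the mechanism concrete, here is a rough sketch of how annotation-based injection can be built with `inspect.signature`. It is an illustration under stated assumptions, not Daft Cloud's actual implementation: `StubCatalog` is a hypothetical stand-in for `daft.Catalog`, `inject_catalogs` is an invented helper, and the step that fetches configuration and secrets is reduced to a comment.

```python
import inspect
from typing import Any, Callable


class StubCatalog:
    """Hypothetical stand-in for daft.Catalog so the sketch runs without Daft."""

    def __init__(self, name: str) -> None:
        self.name = name


def inject_catalogs(
    func: Callable[..., Any],
    mapping: dict[str, str],
    catalog_type: type = StubCatalog,
) -> dict[str, Any]:
    """Build keyword arguments for every parameter annotated with catalog_type."""
    kwargs: dict[str, Any] = {}
    for param in inspect.signature(func).parameters.values():
        if param.annotation is not catalog_type:
            continue  # not a catalog parameter, so it is left untouched
        if param.name not in mapping:
            raise KeyError(f"no catalog mapped to parameter {param.name!r}")
        # A real system would fetch the catalog configuration and load
        # credentials here; this sketch only records the catalog name.
        kwargs[param.name] = catalog_type(mapping[param.name])
    return kwargs


def analyze_users(analytics_db: StubCatalog, limit: int = 10) -> str:
    return f"{analytics_db.name}:{limit}"


# Key = function parameter name, value = configured catalog name.
kwargs = inject_catalogs(analyze_users, {"analytics_db": "production_warehouse"})
result = analyze_users(**kwargs)
```

Only the annotated parameter is filled in; `limit` keeps its default because its annotation is not the catalog type.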

Combining Data Sources and Catalogs

You can use data sources and catalogs together:

import daft

def enrich_data(
    raw_events: daft.DataFrame,      # From S3 data source
    product_catalog: daft.Catalog,   # From database catalog
):
    """Join streaming data with reference tables."""
    # Load reference table from catalog
    products = product_catalog.load_table("products")

    # Join with raw event data
    enriched = raw_events.join(products, on="product_id")

    return enriched.to_pydict()

When creating the run, map each parameter to its corresponding resource in the Keyword Arguments section.

Supported Catalogs

Supabase Database

Unity Catalog

Catalog Naming

Catalog names must follow these rules:

Use only lowercase letters, numbers, and underscores
Start with a letter or number
Maximum 255 characters

Examples: production_warehouse, analytics_db, unity_prod
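
The rules above can be expressed as a single regular expression, which is handy for validating names in scripts before creating catalogs. This is a sketch based only on the three rules listed here. The function name `is_valid_catalog_name` is hypothetical, and the dashboard may apply additional checks.

```python
import re

# Lowercase letters, digits, and underscores only; the first character must be
# a letter or digit; at most 255 characters in total (1 + up to 254 more).
CATALOG_NAME = re.compile(r"[a-z0-9][a-z0-9_]{0,254}")


def is_valid_catalog_name(name: str) -> bool:
    """Check a candidate name against the documented naming rules."""
    return CATALOG_NAME.fullmatch(name) is not None


valid = [is_valid_catalog_name(n) for n in ("production_warehouse", "analytics_db", "unity_prod")]
invalid = [is_valid_catalog_name(n) for n in ("_private", "Analytics", "my-db", "a" * 256)]
```

All three documented examples pass, while names that start with an underscore, contain uppercase letters or hyphens, or exceed 255 characters are rejected.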

Security

All catalog credentials are stored as project secrets:

Secrets are encrypted at rest using AWS KMS
Connection strings and tokens are never exposed in the UI after creation
Each project has isolated secret storage
