
What are Catalogs?

Catalogs are connections to external data catalog systems. Once configured, they can be automatically injected into your functions, giving you access to tables and metadata.

Quick Start

1. Create a Catalog

In the Daft Cloud dashboard:
  1. Navigate to Catalogs in your project sidebar
  2. Click Create catalog
  3. Select your catalog type
  4. Enter a name like analytics_db
  5. Configure the connection credentials
  6. Click Create

2. Use It in Your Code

Declare a function parameter annotated with daft.Catalog:
import daft

def analyze_users(analytics_db: daft.Catalog):
    """
    The `analytics_db` parameter will be automatically injected
    with the configured catalog connection.
    """
    # Load a table from the catalog
    users = analytics_db.load_table("users")

    # Query and process
    active_users = users.where(daft.col("status") == "active")

    return active_users.to_pydict()

3. Create a Run

When creating a run in the dashboard, map your function parameters to catalogs using keyword arguments:
  1. Select the Function entrypoint type
  2. Enter your file path and function name (e.g., my_script.py:analyze_users)
  3. In the Keyword Arguments section, add an argument where:
    • The key matches your function parameter name (e.g., analytics_db)
    • The value is the name of a catalog you configured (e.g., analytics_db)
  4. Click Create
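The Keyword Arguments section holds a small mapping from parameter names to catalog names. The sketch below shows one way to check such a mapping locally before you create a run. `validate_run_kwargs` and the list of configured catalog names are hypothetical and exist only for illustration. This is not a Daft Cloud API, and the dashboard may validate differently.

```python
import inspect
from typing import Callable, Iterable


def validate_run_kwargs(
    func: Callable,
    kwargs: dict[str, str],
    configured_catalogs: Iterable[str],
) -> list[str]:
    """Hypothetical helper: report problems in a parameter -> catalog-name mapping.

    Every key must name a parameter of `func`, and every value must name a
    catalog that has been configured in the project.
    """
    params = inspect.signature(func).parameters
    known = set(configured_catalogs)
    problems = []
    for param_name, catalog_name in kwargs.items():
        if param_name not in params:
            problems.append(f"unknown parameter: {param_name}")
        if catalog_name not in known:
            problems.append(f"unknown catalog: {catalog_name}")
    return problems


def analyze_users(analytics_db):
    """Stand-in for the function above (daft.Catalog annotation omitted)."""


print(validate_run_kwargs(analyze_users, {"analytics_db": "analytics_db"}, ["analytics_db"]))
# []
print(validate_run_kwargs(analyze_users, {"db": "analytics"}, ["analytics_db"]))
# ['unknown parameter: db', 'unknown catalog: analytics']
```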

How Injection Works

Daft Cloud uses type annotations to inject catalogs:
  1. Annotate a parameter with daft.Catalog
  2. Map the parameter to a catalog when creating a run
  3. At runtime, the system:
    • Fetches the catalog configuration
    • Loads credentials from your project secrets
    • Creates a connected Daft Catalog object
    • Passes it to your function
No special decorators or syntax are required; the type annotation is enough.
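The following is a minimal plain-Python sketch of the runtime steps listed above. It assumes a stand-in `Catalog` dataclass (not `daft.Catalog`) and two hypothetical dictionaries, `CATALOG_CONFIGS` and `PROJECT_SECRETS`, that take the place of Daft Cloud's catalog configuration and secret storage. The service's real internals are not described here. The sketch only shows how an annotation can drive injection.

```python
import typing
from dataclasses import dataclass


@dataclass
class Catalog:
    """Hypothetical stand-in for a connected catalog object (not daft.Catalog)."""
    name: str
    catalog_type: str
    credentials: dict


# Hypothetical stores standing in for catalog configs and project secrets.
CATALOG_CONFIGS = {"analytics_db": {"type": "unity", "secret": "analytics_db_token"}}
PROJECT_SECRETS = {"analytics_db_token": "example-token"}


def connect_catalog(catalog_name: str) -> Catalog:
    config = CATALOG_CONFIGS[catalog_name]         # fetch the catalog configuration
    token = PROJECT_SECRETS[config["secret"]]      # load credentials from project secrets
    return Catalog(catalog_name, config["type"], {"token": token})  # create the connected object


def run_with_injection(func, run_kwargs: dict[str, str]):
    hints = typing.get_type_hints(func)
    call_kwargs = {}
    for param_name, catalog_name in run_kwargs.items():
        # Only parameters annotated with the catalog type are injected.
        if hints.get(param_name) is not Catalog:
            raise TypeError(f"{param_name!r} is not annotated as a Catalog")
        call_kwargs[param_name] = connect_catalog(catalog_name)
    return func(**call_kwargs)                     # pass the objects to the function


def analyze_users(analytics_db: Catalog) -> str:
    return f"{analytics_db.name} ({analytics_db.catalog_type})"


print(run_with_injection(analyze_users, {"analytics_db": "analytics_db"}))
# analytics_db (unity)
```

Because `typing.get_type_hints` reads the annotations directly from the function, the type hint is the only thing the function author has to supply.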

Combining Data Sources and Catalogs

You can use data sources and catalogs together:
import daft

def enrich_data(
    raw_events: daft.DataFrame,      # From S3 data source
    product_catalog: daft.Catalog,   # From database catalog
):
    """Join raw event data with a reference table from the catalog."""
    # Load reference table from catalog
    products = product_catalog.load_table("products")

    # Join with raw event data
    enriched = raw_events.join(products, on="product_id")

    return enriched.to_pydict()
When creating the run, map each parameter to its corresponding resource in the Keyword Arguments section.
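When a function mixes resource types, each parameter's annotation tells the runtime which kind of resource a mapped name refers to. The sketch below assumes hypothetical stand-in `DataFrame` and `Catalog` classes and a `REGISTRIES` dictionary keyed by annotation. It illustrates the routing idea and is not Daft Cloud's implementation.

```python
import typing


class DataFrame:
    """Hypothetical stand-in for daft.DataFrame."""
    def __init__(self, source: str):
        self.source = source


class Catalog:
    """Hypothetical stand-in for daft.Catalog."""
    def __init__(self, name: str):
        self.name = name


# Hypothetical registries: annotation -> {resource name -> factory}.
REGISTRIES = {
    DataFrame: {"s3_events": DataFrame},
    Catalog: {"products_db": Catalog},
}


def resolve_arguments(func, run_kwargs: dict[str, str]) -> dict:
    hints = typing.get_type_hints(func)
    resolved = {}
    for param_name, resource_name in run_kwargs.items():
        # Pick the registry that matches the parameter's annotation.
        registry = REGISTRIES.get(hints.get(param_name))
        if registry is None:
            raise TypeError(f"no injectable annotation on {param_name!r}")
        if resource_name not in registry:
            raise KeyError(f"no {hints[param_name].__name__} named {resource_name!r}")
        resolved[param_name] = registry[resource_name](resource_name)
    return resolved


def enrich_data(raw_events: DataFrame, product_catalog: Catalog):
    return raw_events.source, product_catalog.name


args = resolve_arguments(enrich_data, {"raw_events": "s3_events", "product_catalog": "products_db"})
print(enrich_data(**args))
# ('s3_events', 'products_db')
```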

Supported Catalogs


Catalog Naming

Catalog names must follow these rules:
  • Use only lowercase letters, numbers, and underscores
  • Start with a letter or number
  • Be at most 255 characters long
Examples: production_warehouse, analytics_db, unity_prod
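The naming rules can be expressed as a single regular expression. This is an assumed reading of the rules above, and Daft Cloud's own validator may differ in details. `is_valid_catalog_name` is a hypothetical helper for checking names before you create a catalog.

```python
import re

# Assumed reading of the rules: first character is a lowercase letter or digit,
# followed by up to 254 lowercase letters, digits, or underscores (255 total).
CATALOG_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_]{0,254}")


def is_valid_catalog_name(name: str) -> bool:
    # fullmatch rejects names with any extra character, including a trailing newline.
    return CATALOG_NAME_RE.fullmatch(name) is not None


print([is_valid_catalog_name(n) for n in ["analytics_db", "unity_prod", "_staging", "Analytics", "a" * 256]])
# [True, True, False, False, False]
```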

Security

All catalog credentials are stored as project secrets:
  • Secrets are encrypted at rest using AWS KMS
  • Connection strings and tokens are never exposed in the UI after creation
  • Each project has isolated secret storage