Data Sources - Daft Cloud

What are Data Sources?

Data sources are named connections to external data storage systems. Once configured, they can be automatically injected into your functions as Daft DataFrames.

Quick Start

1. Create a Data Source

In the Daft Cloud dashboard:

1. Navigate to Data Sources in your project sidebar
2. Click Create data source
3. Select your source type
4. Enter a name like sales_data
5. Configure the connection (bucket, paths, credentials)
6. Click Create

2. Use It in Your Code

Reference the data source using a type annotation:

import daft

def process_sales(sales_data: daft.DataFrame):
    """
    The `sales_data` parameter will be automatically injected
    with the configured data source.
    """
    # sales_data is already a Daft DataFrame pointing to your data
    result = sales_data.select("product_id", "revenue", "quantity")

    # Process the data
    summary = result.groupby("product_id").agg(
        daft.col("revenue").sum(),
        daft.col("quantity").sum(),
    )

    return summary.to_pydict()

3. Create a Run

When creating a run in the dashboard, map your function parameters to data sources using keyword arguments:

1. Select the Function entrypoint type
2. Enter your file path and function name (e.g., my_script.py:process_sales)
3. In the Keyword Arguments section, add an argument where:
   - The key matches your function parameter name (e.g., sales_data)
   - The value is your configured data source name
4. Click Create
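
A mismatch between a key and a parameter name is easy to introduce, so it can help to check the mapping locally before creating the run. The helper below is a hypothetical illustration built on the standard library's `inspect` module. `check_keyword_arguments` is not part of Daft or Daft Cloud.

```python
import inspect


def check_keyword_arguments(func, keyword_arguments):
    """Return a list of problems in a parameter-to-data-source mapping."""
    params = inspect.signature(func).parameters
    problems = []
    # Every key must name a real parameter of the function.
    for key in keyword_arguments:
        if key not in params:
            problems.append(f"'{key}' is not a parameter of {func.__name__}")
    # Every parameter without a default needs a mapped value.
    for name, param in params.items():
        if name not in keyword_arguments and param.default is inspect.Parameter.empty:
            problems.append(f"parameter '{name}' has no value mapped")
    return problems


def process_sales(sales_data):
    return sales_data


# Key = function parameter name, value = configured data source name.
good = check_keyword_arguments(process_sales, {"sales_data": "sales_data"})
bad = check_keyword_arguments(process_sales, {"sales": "sales_data"})
print(good)
print(bad)
```

The first mapping yields no problems. The second reports both the unknown key `sales` and the unmapped parameter `sales_data`.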

How Injection Works

Daft Cloud uses type annotations to inject data sources:

1. Annotate a parameter with daft.DataFrame
2. Map the parameter to a data source when creating a run
3. At runtime, the system:
   - Fetches the data source configuration
   - Loads credentials from your project secrets
   - Creates a Daft DataFrame pointing to your data
   - Passes it to your function

No special decorators or syntax are required; type annotations are enough.
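
The runtime steps above can be pictured with a small, self-contained sketch. This is not Daft Cloud's implementation. The `DataFrame` class is a stand-in for `daft.DataFrame` so the example runs without Daft, and `DATA_SOURCES`, `SECRETS`, `load_data_source`, and `run_with_injection` are hypothetical names invented for illustration.

```python
from typing import get_type_hints


class DataFrame:
    """Stand-in for daft.DataFrame; it only remembers where the data lives."""

    def __init__(self, location):
        self.location = location


# Hypothetical project state: data source configurations and secrets.
DATA_SOURCES = {
    "sales_data": {"path": "s3://example-bucket/sales/", "secret": "example_secret"},
}
SECRETS = {"example_secret": {"token": "placeholder"}}


def load_data_source(name):
    config = DATA_SOURCES[name]              # fetch the data source configuration
    credentials = SECRETS[config["secret"]]  # load credentials from project secrets
    # A real reader would use `credentials`; this stand-in ignores them.
    return DataFrame(config["path"])         # create a DataFrame pointing to the data


def run_with_injection(func, keyword_arguments):
    hints = get_type_hints(func)
    call_kwargs = {}
    for param, value in keyword_arguments.items():
        # Only parameters annotated as DataFrame are treated as data sources.
        if hints.get(param) is DataFrame:
            call_kwargs[param] = load_data_source(value)
        else:
            call_kwargs[param] = value
    return func(**call_kwargs)               # pass the DataFrame to the function


def process_sales(sales_data: DataFrame):
    return sales_data.location


result = run_with_injection(process_sales, {"sales_data": "sales_data"})
print(result)
```

The annotation is the only signal the sketch relies on: a parameter without the `DataFrame` annotation would receive its mapped value unchanged.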

Multiple Data Sources

You can inject multiple data sources into a single function:

import daft

def combine_data(
    orders: daft.DataFrame,
    products: daft.DataFrame,
    customers: daft.DataFrame,
):
    """Combine data from multiple sources."""
    result = orders.join(products, on="product_id")
    result = result.join(customers, on="customer_id")
    return result.to_pydict()

When creating the run, add one entry per parameter in the Keyword Arguments section, mapping each parameter name to its corresponding data source.

Supported Data Sources

- Amazon S3
- Supabase Storage
