
Configuration

| Field | Required | Description |
|-------|----------|-------------|
| `bucket` | Yes | S3 bucket name (3-63 characters, lowercase) |
| `paths` | Yes | Array of S3 object paths (supports glob patterns) |
| `format` | No | File format: `parquet`, `json`, `csv`, or `file` |
| `region` | No | AWS region (e.g., `us-east-1`) |
| `endpoint` | No | Custom endpoint for S3-compatible services |
| `secret_name` | No | Reference to a project secret containing AWS credentials |
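Putting the fields together, a source definition might look like the following sketch (the bucket name, path, and secret name are illustrative, not real values):

```json
{
  "bucket": "my-data-bucket",
  "paths": ["data/*.parquet"],
  "format": "parquet",
  "region": "us-east-1",
  "secret_name": "aws-credentials"
}
```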

Credentials

Store your AWS credentials as a project secret in the following JSON format:
```json
{
  "aws_access_key_id": "AKIA...",
  "aws_secret_access_key": "..."
}
```
If no secret is specified, Daft Cloud will attempt to use environment credentials.
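One way to produce a payload in this shape is to assemble it from the standard AWS environment variables before storing it as a secret. This is a minimal sketch, assuming those variables are set in your shell (placeholders are used as fallbacks here so the snippet runs standalone):

```python
import json
import os

# Build the secret payload in the JSON shape shown above.
# Falls back to placeholder strings if the variables are unset.
secret = json.dumps({
    "aws_access_key_id": os.environ.get("AWS_ACCESS_KEY_ID", "AKIA..."),
    "aws_secret_access_key": os.environ.get("AWS_SECRET_ACCESS_KEY", "..."),
})
```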

Glob Pattern Support

Paths support standard glob patterns:
| Pattern | Description |
|---------|-------------|
| `*` | Matches any characters except `/` |
| `?` | Matches any single character |
| `[...]` | Matches any character in the brackets |
| `**` | Matches any number of path segments (recursive) |

Examples:

- `data/*.parquet` - All Parquet files in the `data` folder
- `logs/2024/**/*.json` - All JSON files in any `2024` subdirectory
- `images/batch_[0-9].png` - Specific numbered batch files
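The semantics in the table above can be sketched as a small translator from glob patterns to regular expressions. This is an illustration of the matching rules, not Daft Cloud's actual implementation; the function names are ours:

```python
import re

def glob_to_regex(pattern: str) -> str:
    """Translate the glob syntax from the table into a regex:
    * stays within one path segment, ** spans segments."""
    i, out = 0, []
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")          # ** crosses / boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")       # * stops at /
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")        # exactly one character
            i += 1
        elif pattern[i] == "[":
            j = pattern.index("]", i) # pass bracket class through as-is
            out.append(pattern[i:j + 1])
            i = j + 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "^" + "".join(out) + "$"

def matches(pattern: str, path: str) -> bool:
    return re.match(glob_to_regex(pattern), path) is not None
```

For instance, `matches("data/*.parquet", "data/sub/a.parquet")` is false because `*` does not cross the `/` separator, while the `**` in `logs/2024/**/*.json` does.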

Example

```python
import daft

def process_s3_data(sales_data: daft.DataFrame):
    """Process data from an S3 data source."""
    return sales_data.select("product_id", "revenue").to_pydict()
```

File Formats

| Format | Extension | Description |
|--------|-----------|-------------|
| Parquet | `.parquet` | Columnar format, best for analytics |
| JSON | `.json` | JSON files, one object per line (JSONL) |
| CSV | `.csv` | Comma-separated values |
| File | Any | Binary files (images, PDFs, audio, video, etc.) |
The format is automatically detected from the file extension, or you can specify it explicitly via the `format` field.
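Extension-based detection can be sketched as a simple lookup that falls back to the generic `file` format. This is a hypothetical helper mirroring the behavior described above, not Daft Cloud's actual API:

```python
from pathlib import PurePosixPath

# Map known extensions to formats; anything else is treated as a
# binary "file". The table above defines the supported formats.
EXTENSION_FORMATS = {
    ".parquet": "parquet",
    ".json": "json",
    ".csv": "csv",
}

def detect_format(path: str) -> str:
    """Return the format for an S3 object path based on its extension."""
    suffix = PurePosixPath(path).suffix.lower()
    return EXTENSION_FORMATS.get(suffix, "file")
```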