Configuration
| Field | Required | Description |
|---|---|---|
| bucket | Yes | S3 bucket name (3-63 characters, lowercase) |
| paths | Yes | Array of S3 object paths (supports glob patterns) |
| format | No | File format: parquet, json, csv, or file |
| region | No | AWS region (e.g., us-east-1) |
| endpoint | No | Custom endpoint for S3-compatible services |
| secret_name | No | Reference to a project secret containing AWS credentials |
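
The constraints in the table can be checked before a configuration is submitted. The following is a minimal sketch, assuming the configuration is available as a Python dictionary keyed by the field names above. The `validate_config` helper and its error messages are hypothetical and not part of the product.

```python
# Hypothetical helper: checks a config dict against the rules in the table above.
ALLOWED_FORMATS = {"parquet", "json", "csv", "file"}


def validate_config(config: dict) -> list:
    """Return a list of problems found in the config (empty if none)."""
    errors = []

    # bucket: required, 3-63 characters, lowercase
    bucket = config.get("bucket")
    if not isinstance(bucket, str) or not bucket:
        errors.append("bucket is required")
    elif not 3 <= len(bucket) <= 63 or bucket != bucket.lower():
        errors.append("bucket must be 3-63 characters, lowercase")

    # paths: required, must be an array
    paths = config.get("paths")
    if not isinstance(paths, list):
        errors.append("paths is required and must be an array")

    # format: optional, but must be one of the documented values when present
    fmt = config.get("format")
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        errors.append("format must be one of: " + ", ".join(sorted(ALLOWED_FORMATS)))

    return errors


if __name__ == "__main__":
    print(validate_config({"bucket": "my-data", "paths": ["data/*.parquet"]}))
```

The sketch only covers rules stated in the table. Additional S3 bucket naming rules (allowed characters, for example) are not checked.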
Credentials
Store your AWS credentials as a project secret in the following JSON format:

Glob Pattern Support
Paths support standard glob patterns:

| Pattern | Description |
|---|---|
| * | Matches any characters except / |
| ? | Matches any single character |
| [...] | Matches any character in the brackets |
| ** | Matches across path segments, including / |
- data/*.parquet - All Parquet files in the data folder
- logs/2024/**/*.json - All JSON files in any 2024 subdirectory
- images/batch_[0-9].png - Specific numbered batch files
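
To make the matching rules concrete, here is a minimal sketch that translates a glob pattern into a regular expression following the pattern table above. It is an illustration only, not the service's actual matcher. The `glob_to_regex` and `matches` helpers are hypothetical, and the sketch assumes that `**/` may also match zero directories.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob pattern into a compiled regex (illustrative only)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if pattern.startswith("**/", i):
            # Assumption: "**/" matches zero or more whole directories.
            out.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # crosses "/" boundaries
            i += 2
        elif ch == "*":
            out.append("[^/]*")  # any characters except "/"
            i += 1
        elif ch == "?":
            out.append(".")  # any single character, per the table
            i += 1
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1 or end == i + 1:
                out.append(re.escape(ch))  # unmatched or empty "[": literal
                i += 1
            else:
                body = pattern[i + 1:end]
                if body.startswith("!"):
                    body = "^" + body[1:]  # glob negation to regex negation
                out.append("[" + body + "]")
                i = end + 1
        else:
            out.append(re.escape(ch))
            i += 1
    return re.compile("".join(out))


def matches(pattern: str, key: str) -> bool:
    """Return True if the whole object key matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(key) is not None


if __name__ == "__main__":
    print(matches("data/*.parquet", "data/part-0.parquet"))
    print(matches("data/*.parquet", "data/sub/part-0.parquet"))
```

Under these rules, `data/*.parquet` does not descend into `data/sub/`, while `logs/2024/**/*.json` does reach nested keys such as `logs/2024/01/15/app.json`.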
Example
File Formats
| Format | Extension | Description |
|---|---|---|
| Parquet | .parquet | Columnar format, best for analytics |
| JSON | .json | JSON files, one object per line (JSONL) |
| CSV | .csv | Comma-separated values |
| File | Any | Binary files (images, PDFs, audio, video, etc.) |
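
The JSON format expects one object per line (JSONL) rather than a single top-level array. As a rough sketch of what that means for a file's contents, the hypothetical `parse_jsonl` helper below parses such text line by line. Skipping blank lines is an assumption of this sketch, not documented behavior.

```python
import json


def parse_jsonl(text: str) -> list:
    """Parse JSONL text: one JSON value per line."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Report which line failed so the file is easy to fix.
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc
    return records


if __name__ == "__main__":
    sample = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'
    print(parse_jsonl(sample))
```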