DuckDB single source
A SQL node with no in: slots can be an ingress node. DuckDB reads files or
URLs directly, and the query result becomes the node output. This is the
lightest Rime setup: no Python, no R, no JavaScript, and often no separate
source node.
The checked-in examples/sql-source-duckdb fixture shows the older v1 shape.
The v2.1 pattern below is the same idea expressed with current kind: sql
syntax.
What it Teaches
Section titled “What it Teaches”- Ingress SQL reads external data with DuckDB functions.
- Downstream SQL nodes read upstream tables through named slots.
- Pure-SQL DAGs still get Rime caching, artifacts, validation, and reports.
Minimal v2.1 DAG
Section titled “Minimal v2.1 DAG”specification_version: "2.1"
nodes: - id: cars kind: sql source: queries/cars.sql
- id: by_origin kind: sql in: cars: cars source: queries/by_origin.sqlqueries/cars.sql:
SELECT CAST(Name AS VARCHAR) AS name, CAST(Miles_per_Gallon AS DOUBLE) AS mpg, CAST(Cylinders AS INTEGER) AS cylinders, CAST(Origin AS VARCHAR) AS originFROM read_json_auto( 'https://raw.githubusercontent.com/vega/vega-datasets/main/data/cars.json')WHERE Miles_per_Gallon IS NOT NULLqueries/by_origin.sql:
SELECT origin, COUNT(*) AS n, AVG(mpg) AS mean_mpg, MIN(mpg) AS min_mpg, MAX(mpg) AS max_mpgFROM carsGROUP BY originORDER BY mean_mpg DESCThe cars table name in by_origin.sql comes from the YAML slot key:
in: cars: carsUse cases
Section titled “Use cases”- Ad-hoc warehouse reports without leaving SQL
- Quick exploration over Parquet / CSV files
- CI pipelines where Python/R interpreters add deployment overhead
- Pre-shaping large files before a smaller Python/R analysis node
Run It
Section titled “Run It”Copy the snippets above into a new folder:
sql-only/├── pipeline.dag.yaml└── queries/ ├── cars.sql └── by_origin.sqlThen run:
rime run pipeline.dag.yamlrime build pipeline.dag.yamlIf you are running the older checked-in fixture as-is, use the migration guide or migrate it to the v2.1 shape above first.
- DuckDB is bundled with the Rime runtime.
- SQL nodes have one default output. Use multiple SQL nodes for multiple publishable intermediate tables.
- Ingress SQL works well for Parquet, CSV, JSON, and remote files supported by DuckDB extensions.