Quick start
This page runs the smallest complete Rime project: one CSV, one
pipeline.dag.yaml, one command. It uses only core nodes, so no Python or R
interpreter setup is required.
For a slower tutorial that creates every file from scratch, see the first pipeline workshop.
Clone an Example
Section titled “Clone an Example”git clone https://github.com/danielsjoo/rimecd rimeThe example lives at examples/single-file/:
examples/single-file/├── data/│ └── penguins.csv└── pipeline.dag.yamlRead the DAG
Section titled “Read the DAG”specification_version: "2.1"
nodes: - id: penguins kind: source path: data/penguins.csv
- id: adelie_only kind: filter inputs: [penguins] expr: '[species] == "Adelie"'
- id: by_island kind: aggregate inputs: [adelie_only] groupBy: ["[island]"] metrics: - "[mean_bill_length] = [bill_length_mm].mean()" - "[mean_flipper_length] = [flipper_length_mm].mean()" - "[n] = [bill_length_mm].count()"What each node does:
| Node | Kind | Result |
|---|---|---|
penguins | source | Load data/penguins.csv. |
adelie_only | filter | Keep rows where species is Adelie. |
by_island | aggregate | Summarize the Adelie rows by island. |
All relative paths resolve from the directory that contains the DAG file.
Validate
Section titled “Validate”rime validate examples/single-file/pipeline.dag.yamlValidation checks YAML syntax, node schemas, input references, graph cycles, and source paths. It does not execute any nodes.
rime run examples/single-file/pipeline.dag.yamlRime writes artifacts next to the DAG:
examples/single-file/outputs/├── manifest.json├── penguins/│ └── default.parquet├── adelie_only/│ └── default.parquet└── by_island/ └── default.parquetThe final table has three rows:
| island | mean_bill_length | mean_flipper_length | n |
|---|---|---|---|
| Biscoe | 40.30 | 195.00 | 1 |
| Dream | 39.15 | 180.00 | 2 |
| Torgersen | 39.30 | 183.50 | 2 |
Build the HTML Report
Section titled “Build the HTML Report”rime build examples/single-file/pipeline.dag.yamlrime build runs the DAG and writes:
examples/single-file/outputs/run_report.htmlOpen that file to inspect node status, cache state, output sizes, schemas, preview rows, and the final aggregate output.
Hide Noisy Nodes
Section titled “Hide Noisy Nodes”Reports include every node by default. Add metadata.report: false to source
or staging nodes that should stay out of the HTML report:
- id: penguins kind: source path: data/penguins.csv metadata: report: falseAdd a Language Node
Section titled “Add a Language Node”Use a language node when a built-in node does not fit. Language nodes use a
named in: map instead of positional inputs:.
params: min_mass: { type: float, default: 3500 }
nodes: - id: heavy_penguins kind: python source: scripts/heavy.py in: penguins: penguins min_mass: params.min_massdef run(penguins, min_mass): return penguins[penguins["body_mass_g"] >= float(min_mass)]Then run with a specific interpreter or a param override:
rime run pipeline.dag.yaml --python-bin .venv/bin/pythonrime run pipeline.dag.yaml --param min_mass=3800Common Commands
Section titled “Common Commands”| Command | Use it for |
|---|---|
rime validate pipeline.dag.yaml | Check schema and graph integrity before running. |
rime run pipeline.dag.yaml | Execute the DAG and persist artifacts. |
rime build pipeline.dag.yaml | Execute the DAG and render HTML. |
rime run pipeline.dag.yaml --no-cache-read | Force recompute while still writing fresh cache. |
rime run pipeline.dag.yaml --lean | Recompute without reading or writing cache artifacts. |
- Workshop: build a first pipeline
- Concepts: DAG specification
- Concepts: Polyglot runtime
- Node Reference — every built-in operator