Two ways to use Rime
Rime has two complementary ways to describe data work:
- Core nodes for common table operations using Rime’s built-in YAML grammar.
- Script nodes for custom logic in Python, R, JavaScript, or SQL.
Both live in the same pipeline.dag.yaml. That file declares where data comes from, how each transform depends on earlier outputs, and what each step returns. When you run it, Rime validates the graph, computes cache keys, executes only what changed, writes outputs, and records logs.
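The validate, key, and skip steps can be pictured with a small sketch. This is not Rime's implementation: the `NODES` dictionary, the function names, and the hashing scheme are all assumptions chosen for illustration. It shows one common way to order a DAG and to derive cache keys that change whenever a node definition or anything upstream of it changes.

```python
import hashlib
import json
from graphlib import TopologicalSorter

# Hypothetical in-memory form of a pipeline: node id -> definition.
NODES = {
    "line_items": {"kind": "source", "path": "data/orders.csv"},
    "paid_items": {"kind": "filter", "inputs": ["line_items"],
                   "expr": "[status] == 'paid'"},
    "revenue_items": {"kind": "derive", "inputs": ["paid_items"],
                      "as": "revenue", "expr": "[unit_price] * [quantity]"},
}


def execution_order(nodes):
    """Check that every input exists, then return ids in dependency order."""
    for node_id, spec in nodes.items():
        for dep in spec.get("inputs", []):
            if dep not in nodes:
                raise ValueError(f"{node_id} references unknown node {dep}")
    graph = {node_id: spec.get("inputs", []) for node_id, spec in nodes.items()}
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(graph).static_order())


def cache_keys(nodes, source_digests):
    """Hash each node's definition together with the keys of its inputs."""
    keys = {}
    for node_id in execution_order(nodes):
        spec = nodes[node_id]
        payload = {
            "definition": spec,
            "upstream": [keys[dep] for dep in spec.get("inputs", [])],
            # Assumed: source nodes contribute a digest of the file contents.
            "source": source_digests.get(node_id),
        }
        blob = json.dumps(payload, sort_keys=True).encode("utf-8")
        keys[node_id] = hashlib.sha256(blob).hexdigest()
    return keys


def stale_nodes(old_keys, new_keys):
    """Nodes whose key is new or different have to run again."""
    return [n for n, k in new_keys.items() if old_keys.get(n) != k]


before = cache_keys(NODES, {"line_items": "digest-a"})
edited = {**NODES, "revenue_items": {**NODES["revenue_items"],
                                     "expr": "[unit_price] * [quantity] * 1.0"}}
after = cache_keys(edited, {"line_items": "digest-a"})
print(stale_nodes(before, after))  # ['revenue_items']
```

Because each key includes the keys of its inputs, editing one node invalidates that node and everything downstream of it, while untouched upstream nodes keep their cached outputs.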
Core nodes
Core nodes are the quickest way to express routine data shaping without choosing a programming language. They cover the operations most analyses repeat: load a source, filter rows, derive columns, aggregate, join, pivot, select, sort, concatenate, and run statistical checks.
The syntax is intentionally close to spreadsheet formulas. Column references are wrapped in brackets:
    specification_version: "2.1"

    nodes:
      - id: line_items
        kind: source
        path: data/orders.csv

      - id: paid_items
        kind: filter
        inputs: [line_items]
        expr: "[status] == 'paid'"

      - id: revenue_items
        kind: derive
        inputs: [paid_items]
        as: revenue
        expr: "[unit_price] * [quantity]"

      - id: review_columns
        kind: select
        inputs: [revenue_items]
        columns: [order_id, store_id, revenue]

This is language agnostic. You do not need pandas, dplyr, JavaScript arrays, or SQL just to say “keep paid rows,” “make these new columns,” or “carry forward these fields.” The YAML is the contract, and the runtime decides how to execute it.
What to expect when you make this file:
- source nodes read files into named tables.
- Core transform nodes point at earlier node IDs with inputs: [...].
- Formula strings operate on columns from the input table.
- Each node emits a named output that downstream nodes can reference.
- rime validate pipeline.dag.yaml catches schema and graph mistakes before a run.
- rime run pipeline.dag.yaml writes node outputs under outputs/ and reuses cached work when inputs and node definitions have not changed.
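To see what "formula strings operate on columns from the input table" implies for validation, here is a minimal sketch that pulls bracketed column references out of a formula and compares them with the columns an input provides. The regular expression and both function names are assumptions for illustration. Rime's real formula grammar is not shown on this page and may accept names this pattern rejects.

```python
import re

# Assumed reference syntax: an identifier inside square brackets, e.g. [status].
COLUMN_REF = re.compile(r"\[([A-Za-z_][A-Za-z0-9_]*)\]")


def referenced_columns(expr):
    """Return the distinct column names in a formula, in first-seen order."""
    seen = []
    for name in COLUMN_REF.findall(expr):
        if name not in seen:
            seen.append(name)
    return seen


def missing_columns(expr, available):
    """Columns the formula needs that the input table does not provide."""
    return [name for name in referenced_columns(expr) if name not in available]


print(referenced_columns("[unit_price] * [quantity]"))
# ['unit_price', 'quantity']
print(missing_columns("[status] == 'paid'", {"order_id", "store_id"}))
# ['status']
```

A check like this can only run before execution if each node's output columns are known up front. The sketch also ignores brackets that appear inside quoted string literals, which a real parser would have to handle.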
Core nodes are usually the right first move when the transformation can be described as table algebra. They keep the pipeline readable for reviewers who may not know the language you would otherwise have written.
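To make "table algebra" concrete, the filter, derive, and select nodes from the example in this section correspond roughly to the plain Python below. The rows are made up for illustration and stand in for data/orders.csv. Nothing here is Rime code. It only shows the operations the YAML describes.

```python
# Hypothetical rows standing in for data/orders.csv.
line_items = [
    {"order_id": 1, "store_id": "A", "status": "paid", "unit_price": 4.0, "quantity": 3},
    {"order_id": 2, "store_id": "A", "status": "refunded", "unit_price": 9.0, "quantity": 1},
    {"order_id": 3, "store_id": "B", "status": "paid", "unit_price": 2.5, "quantity": 2},
]

# filter: keep rows where [status] == 'paid'
paid_items = [row for row in line_items if row["status"] == "paid"]

# derive: add revenue = [unit_price] * [quantity]
revenue_items = [
    {**row, "revenue": row["unit_price"] * row["quantity"]} for row in paid_items
]

# select: carry forward only the listed columns
columns = ["order_id", "store_id", "revenue"]
review_columns = [{name: row[name] for name in columns} for row in revenue_items]

for row in review_columns:
    print(row)
# {'order_id': 1, 'store_id': 'A', 'revenue': 12.0}
# {'order_id': 3, 'store_id': 'B', 'revenue': 5.0}
```

Each step consumes one named table and produces another, which is the same shape as the node graph. The YAML version states the same thing without committing reviewers to any one language.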
Script nodes
Script nodes are for everything that needs a real language: API clients, custom parsing, model inference, specialized statistical packages, unusual joins, plotting helpers, or project-specific business logic.
In a script node, the YAML still declares the node and its dependencies, but the implementation lives in a script file:
    specification_version: "2.1"

    nodes:
      - id: customers
        kind: source
        path: data/customers.csv

      - id: scored_customers
        kind: python
        source: scripts/score_customers.py
        in:
          customers: customers
        out:
          scored: table

The Python file exports the function Rime should run:
    import pandas as pd

    def run(customers: pd.DataFrame) -> pd.DataFrame:
        scored = customers.copy()
        scored["risk_score"] = (
            scored["late_payments"] * 2
            + scored["open_balance"] / 1000
        )
        return scored

Rime interprets that script as a node function. The engine reads the declared inputs, materializes them in the native shape for the language, calls the function, captures the return value, validates the declared outputs, and makes those outputs available to the rest of the DAG.
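The call sequence described above can be sketched as a small harness. This is an illustration, not Rime's engine: `call_node`, its parameters, and the rule for single versus multiple outputs are assumptions, and plain lists of dictionaries stand in for DataFrames so the sketch has no dependencies.

```python
import inspect


def call_node(func, in_map, out_decl, tables):
    """Run one script-node function against already-computed upstream tables.

    in_map:   parameter name -> upstream node id (the YAML `in:` block)
    out_decl: output name -> declared type name (the YAML `out:` block)
    tables:   node id -> materialized table
    """
    params = set(inspect.signature(func).parameters)
    if params != set(in_map):
        raise TypeError(
            f"function takes {sorted(params)}, node declares {sorted(in_map)}"
        )

    missing = [node for node in in_map.values() if node not in tables]
    if missing:
        raise KeyError(f"upstream outputs not available: {missing}")

    result = func(**{param: tables[node] for param, node in in_map.items()})

    # Assumption: a single declared output receives the bare return value,
    # while several outputs come back as a dict keyed by output name.
    if len(out_decl) == 1:
        result = {next(iter(out_decl)): result}
    if not isinstance(result, dict) or set(result) != set(out_decl):
        raise ValueError(f"expected outputs {sorted(out_decl)}")
    return result


def run(customers):
    # Same scoring rule as the example script, applied to plain dict rows.
    return [
        {**row, "risk_score": row["late_payments"] * 2 + row["open_balance"] / 1000}
        for row in customers
    ]


tables = {"customers": [{"late_payments": 2, "open_balance": 500}]}
outputs = call_node(run, {"customers": "customers"}, {"scored": "table"}, tables)
print(outputs["scored"][0]["risk_score"])  # 4.5
```

The function never sees a file path or a cache directory. It receives named inputs and returns named outputs, which is the division of labor script nodes are meant to preserve.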
The same idea applies across languages:
- SQL nodes run in DuckDB with upstream tables available as named views.
- Python nodes receive pandas DataFrames and return DataFrames or declared objects.
- R nodes receive data frames or tibbles and return data frames.
- JavaScript nodes use Rime’s defineNode helper to declare inputs, outputs, and a run function.
Script nodes keep custom code focused on computation. You should not need to write intermediate Parquet files, invent file naming conventions, pass paths through environment variables, or manually wire logs into downstream steps. That work belongs to the Rime runtime.
Mixing both
Most useful pipelines mix the two styles. Start with core nodes for the readable, reviewable table operations. Add script nodes where a language earns its place.
    specification_version: "2.1"

    nodes:
      - id: orders
        kind: source
        path: data/orders.csv

      - id: clean_orders
        kind: filter
        inputs: [orders]
        expr: "[status] != 'cancelled'"

      - id: model_inputs
        kind: select
        inputs: [clean_orders]
        columns: [customer_id, ordered_at, subtotal, discount, region]

      - id: predictions
        kind: python
        source: scripts/predict_churn.py
        in:
          orders: model_inputs
        out:
          scored_customers: table

That is the main Rime habit: declare the data flow in one file, write custom functions only where they clarify the work, and let the runtime handle the execution machinery.