filter

A filter node is a named row gate. The schema stays the same; only the set of rows changes.

Good filter nodes read like cohort decisions: adults only, visits after baseline, active accounts, non-null outcomes. If the expression needs a paragraph to explain it, split the logic into an upstream derive with a readable feature name.
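Here is a minimal sketch of that split in plain Python. It assumes a table is a list of dicts that share the same keys. `derive` and `filter_rows` are hypothetical helpers that stand in for derive and filter nodes, not the tool's API. The multi-clause rule gets a readable column name, and the filter only checks that column.

```python
from typing import Any, Callable

Row = dict[str, Any]


def derive(rows: list[Row], name: str, fn: Callable[[Row], Any]) -> list[Row]:
    """Return new rows with an extra column `name` computed per row."""
    return [{**row, name: fn(row)} for row in rows]


def filter_rows(rows: list[Row], column: str) -> list[Row]:
    """Keep rows whose `column` value is truthy; the columns are unchanged."""
    return [row for row in rows if row[column]]


visits = [
    {"patient": "a", "day": -3, "status": "active"},
    {"patient": "a", "day": 14, "status": "active"},
    {"patient": "b", "day": 30, "status": "withdrawn"},
]

# The multi-clause rule lives in a derive step with a readable feature name...
visits = derive(
    visits,
    "eligible_followup",
    lambda r: r["day"] > 0 and r["status"] == "active",
)
# ...so the filter itself reads as a single cohort decision.
eligible = filter_rows(visits, "eligible_followup")
print([(r["patient"], r["day"]) for r in eligible])  # [('a', 14)]
```

A reviewer sees one named decision (`eligible_followup`) in the filter, and can read the derive step on its own to see how that decision is made.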

| Field | Required | Notes |
| --- | --- | --- |
| inputs | yes | Exactly one upstream table. |
| expr | yes | Boolean expression evaluated per row. Truthy rows are kept. |
| metadata.label | no | Use a readable label such as “Keep visits after baseline”; the expression itself is usually too terse for reviewers. |
  • Write a boolean expression such as [age] >= 18 or [status] == "active".
  • Use bracketed column refs and plain literals. Row-level functions like coalesce([score], 0) are fine.
  • Do not hide aggregations inside a filter. Build summaries with aggregate, then filter the summarized table.
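The following sketches the aggregate-then-filter pattern under the same list-of-dicts assumption. `aggregate_count` is a hypothetical stand-in for an aggregate node that produces one row per patient. The filter then applies an ordinary row-level condition to the summary table, so no counting happens inside the expression.

```python
from collections import Counter
from typing import Any

Row = dict[str, Any]


def aggregate_count(rows: list[Row], key: str) -> list[Row]:
    """Summarize to one row per distinct `key` value, with an `n_rows` column."""
    counts = Counter(row[key] for row in rows)
    return [{key: k, "n_rows": n} for k, n in counts.items()]


visits = [
    {"patient": "a", "day": 1},
    {"patient": "a", "day": 8},
    {"patient": "a", "day": 15},
    {"patient": "b", "day": 1},
]

# Step 1: aggregate. Step 2: filter the summarized table row by row.
per_patient = aggregate_count(visits, "patient")
frequent = [row for row in per_patient if row["n_rows"] >= 2]
print(frequent)  # [{'patient': 'a', 'n_rows': 3}]
```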

The key review question is row loss. Compare the input row count with the output row count, and make sure a zero-row result is intentional.
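A row-loss check could look like the following sketch. `row_loss_report` and its `allow_empty` flag are hypothetical and not part of the tool. The function reports how many rows a filter dropped. It raises an error when a non-empty input produces zero rows, unless that outcome is explicitly allowed.

```python
def row_loss_report(node_id: str, rows_in: int, rows_out: int,
                    allow_empty: bool = False) -> str:
    """Describe a filter's row loss; raise on impossible or unexpected results."""
    # A filter only removes rows, so more rows out than in signals a bug.
    if rows_out > rows_in:
        raise ValueError(f"{node_id}: a filter cannot add rows ({rows_in} -> {rows_out})")
    # A zero-row result from a non-empty input must be opted into explicitly.
    if rows_out == 0 and rows_in > 0 and not allow_empty:
        raise ValueError(
            f"{node_id}: filter removed all {rows_in} rows; "
            "pass allow_empty=True if this is intentional"
        )
    dropped = rows_in - rows_out
    pct = 100.0 * dropped / rows_in if rows_in else 0.0
    return f"{node_id}: {rows_in} -> {rows_out} rows ({dropped} dropped, {pct:.1f}%)"


print(row_loss_report("adults", 1200, 950))
# adults: 1200 -> 950 rows (250 dropped, 20.8%)
```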

Expression parse or evaluation errors fail the node and all of its downstream dependents. Error messages in the UI and in reports should point at the failing expression, not at the whole DAG.
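The following hedged sketch shows one way to picture that failure scope. The DAG is a hypothetical dict that maps each node id to its upstream ids. `failed_nodes` walks the dependents of the failing filter breadth-first. `expression_error` builds a message that names the node and quotes its expression. Neither helper is part of the tool's documented API.

```python
from collections import deque


def failed_nodes(upstreams: dict[str, list[str]], failing: str) -> list[str]:
    """Return `failing` plus every node that transitively depends on it."""
    # Invert the edges: for each node, which nodes consume its output?
    downstream: dict[str, list[str]] = {n: [] for n in upstreams}
    for node, ups in upstreams.items():
        for up in ups:
            downstream.setdefault(up, []).append(node)
    seen = [failing]
    queue = deque([failing])
    while queue:
        for child in downstream.get(queue.popleft(), []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def expression_error(node_id: str, expr: str, reason: str) -> str:
    """Error text that points at the expression, not at the whole DAG."""
    return f'filter "{node_id}": cannot evaluate expr {expr!r}: {reason}'


dag = {
    "patients": [],
    "adults": ["patients"],
    "adult_visits": ["adults"],
    "report": ["adult_visits"],
    "site_counts": ["patients"],
}
print(failed_nodes(dag, "adults"))  # ['adults', 'adult_visits', 'report']
print(expression_error("adults", "[age] >== 18", "unexpected token '>=='"))
```

`site_counts` shares an upstream with the failing filter but does not depend on it, so in this sketch it keeps running.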

```yaml
- id: adults
  kind: filter
  inputs: [patients]  # exactly one upstream table
  expr: "[age] >= 18"
```
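The Python sketch below makes the row-gate semantics of the `adults` node concrete. It assumes rows are dicts and uses a regex to translate bracketed column refs into dict lookups. Python truthiness stands in for the engine's rules. `compile_expr` and `apply_filter` are hypothetical names. Python's `eval` is not a safe sandbox, so treat this as an illustration of the semantics, not of how the tool evaluates expressions. Null handling in particular may differ.

```python
import re
from types import CodeType
from typing import Any

Row = dict[str, Any]
COLUMN_REF = re.compile(r"\[([A-Za-z_][A-Za-z0-9_]*)\]")


def compile_expr(expr: str) -> CodeType:
    """Rewrite [col] as row['col'] and compile the expression once."""
    py_expr = COLUMN_REF.sub(lambda m: f"row[{m.group(1)!r}]", expr)
    return compile(py_expr, "<filter expr>", "eval")


def apply_filter(rows: list[Row], expr: str) -> list[Row]:
    """Keep rows where the expression is truthy; the schema is unchanged."""
    code = compile_expr(expr)  # a parse error surfaces here, before any row is read
    return [row for row in rows if eval(code, {"__builtins__": {}}, {"row": row})]


patients = [
    {"id": 1, "age": 17},
    {"id": 2, "age": 18},
    {"id": 3, "age": 42},
]
adults = apply_filter(patients, "[age] >= 18")
print([p["id"] for p in adults])  # [2, 3]
```

The output keeps every input column. Only the row with an age below 18 is dropped.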