derive

derive adds one computed column. It is the right node when a feature deserves a name and should be inspectable in the DAG.

Prefer a chain of small derives over one opaque script when each intermediate feature is useful for review. Use a language node when the computation needs loops, model code, external packages, or multiple output columns at once.

| Field | Required | Notes |
|---|---|---|
| inputs | yes | Exactly one upstream table. |
| as | yes | Identifier-shaped output column name. |
| expr | yes | Expression compiled to a Polars expression and aliased to `as`. |
  • `expr` is a bare expression, not an assignment: its result is aliased to `as` automatically.
  • By default nulls follow Polars semantics (for example, arithmetic involving a null yields null). Use coalesce() when nulls should become a default value instead.
  • The `as` column must not collide with an existing column. Drop or rename the existing column first if you mean to replace it.
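The rules in the table and list above can be checked mechanically before a node runs. The sketch below is a hypothetical validator, not the tool's own: `validate_derive` is an illustrative name, and it assumes "identifier-shaped" means an ASCII letter or underscore followed by letters, digits, or underscores.

```python
import re

# Assumption: "identifier-shaped" = ASCII letter/underscore, then letters, digits, underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_derive(node: dict, upstream_columns: set) -> list:
    """Return the problems found in a derive node; an empty list means valid.

    Hypothetical helper that mirrors the documented rules, not the real validator.
    """
    problems = []
    inputs = node.get("inputs", [])
    if len(inputs) != 1:
        problems.append(f"inputs must name exactly one table, got {len(inputs)}")
    out = node.get("as", "")
    if not IDENTIFIER.fullmatch(out):
        problems.append(f"as={out!r} is not identifier-shaped")
    elif out in upstream_columns:
        # The new column may not silently overwrite an existing one.
        problems.append(f"as={out!r} collides with an existing column")
    if not node.get("expr", "").strip():
        problems.append("expr is required")
    return problems


node = {"inputs": ["patient_lab"], "as": "lab_load", "expr": "[crp_mean] * [ldl_max] / 1000.0"}
print(validate_derive(node, {"crp_mean", "ldl_max"}))
print(validate_derive(node, {"crp_mean", "ldl_max", "lab_load"}))
```

The first call returns an empty list; the second reports the collision, which is the case where you would drop or rename `lab_load` upstream first.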

The preview should make the new column easy to find. For numeric features, a distribution/profile is usually more useful than a long row sample.
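As an illustration of what a numeric profile might contain, here is a small sketch using only the standard library. `profile_column` is a hypothetical name, and the chosen fields (row and null counts, range, mean, quartiles) are an assumption about a useful summary, not a description of the actual preview.

```python
import statistics
from typing import List, Optional


def profile_column(values: List[Optional[float]]) -> dict:
    """Summarize a numeric column; nulls are counted, then excluded from the statistics.

    Hypothetical helper sketching a preview profile.
    """
    present = [v for v in values if v is not None]
    profile = {"rows": len(values), "nulls": len(values) - len(present)}
    if present:
        profile.update(min=min(present), max=max(present), mean=statistics.fmean(present))
    if len(present) >= 2:
        # quantiles(n=4) returns the three quartile cut points (default "exclusive" method).
        q1, median, q3 = statistics.quantiles(present, n=4)
        profile.update(q1=q1, median=median, q3=q3)
    return profile


print(profile_column([1.0, 2.0, None, 3.0, 4.0, 5.0]))
```

A summary like this surfaces nulls and outliers in a derived feature at a glance, which a long row sample tends to hide.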

The `default` output is the input table plus the new column.

```yaml
- id: lab_load
  kind: derive
  inputs: [patient_lab]   # length 1
  as: lab_load            # new column name
  expr: "[crp_mean] * [ldl_max] / 1000.0"
```
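To make "compiled to a Polars expression and aliased to `as`" concrete, the sketch below rewrites the example's `expr` into equivalent Polars source text. It is a text-level illustration only: `to_polars_source` is a hypothetical name, it assumes `[name]` is the column-reference syntax (as in the example) and `pl` is the conventional `polars` import alias, and a real compiler would parse and validate the expression rather than use a regex.

```python
import re

# Assumption: [name] references a column, with an identifier-shaped name.
COLUMN_REF = re.compile(r"\[([A-Za-z_][A-Za-z0-9_]*)\]")


def to_polars_source(expr: str, alias: str) -> str:
    """Rewrite a derive expr as Polars expression source, aliased to the output name.

    Hypothetical sketch: swaps [col] for pl.col("col") and appends .alias().
    """
    body = COLUMN_REF.sub(lambda m: f'pl.col("{m.group(1)}")', expr)
    return f'({body}).alias("{alias}")'


print(to_polars_source("[crp_mean] * [ldl_max] / 1000.0", "lab_load"))
```

In Polars, passing an expression like this to `DataFrame.with_columns` keeps every existing column and adds the new one, which lines up with the `default` output described above.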