Decision Tree Lab

How much of the data we hold back for testing the model.
Locks randomness so results repeat.
Notes: ID3 is classification-only (it needs a categorical target). Regression Tree requires a numeric target.

Preview

Summary


Quick plot

Fit / Load

When to use this: Categorical targets with clear if/then rules.
Presets are quick starting points; you can still tweak knobs.
Sets the maximum number of question layers. Deeper trees fit tighter but can overfit.
Why this matters: Max depth
  • Deep trees can memorize quirks in the training data.
  • Shallow trees may miss real patterns.
  • Use this to show overfitting vs underfitting.
Smallest group size allowed to split. Bigger values make the tree steadier.
Why this matters: Min samples
  • Small groups are noisy and unstable.
  • Larger minimums create steadier splits.
  • Too large can hide useful structure.
Groups numeric values into ranges (bins) so ID3 can split on them. More bins means more possible splits.
Why this matters: Bins
  • More bins create more split options.
  • Too many bins can chase noise.
  • Too few bins can blur real differences.

Download model

Tree

Metrics

Confusion matrix


After you click Fit
  • Start with Findings for the main story.
  • Use the tree to see the exact questions.
  • Check accuracy and the confusion matrix.
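
To make the last check concrete, here is a minimal sketch of how accuracy and a confusion matrix can be computed from actual and predicted labels. The function names and the sample labels are made up for illustration; the app's own calculation is not shown in this text.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs into a square table."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    # Rows are actual classes, columns are predicted classes.
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def accuracy(actual, predicted):
    """Share of predictions that match the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical test-set labels, for illustration only.
actual = ["Yes", "Yes", "No", "No", "Yes", "No"]
predicted = ["Yes", "No", "No", "No", "Yes", "Yes"]

labels, matrix = confusion_matrix(actual, predicted)
print(labels)                       # ['No', 'Yes']
print(matrix)                       # [[2, 1], [1, 2]]
print(accuracy(actual, predicted))  # 4 correct out of 6
```

The diagonal of the matrix holds the correct predictions, so accuracy is the diagonal sum divided by the total count. The off-diagonal cells show which classes get confused with each other.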

Findings


What the model saw

Notes

ID3 expects a categorical target. Numeric predictors are binned into quantiles.
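
A minimal sketch of quantile binning, assuming cut points are placed so that each bin holds roughly the same number of rows. The exact rule the app uses is not stated here, and the function name and sample values are hypothetical.

```python
import statistics
from bisect import bisect_right


def quantile_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using quantile cut points."""
    # statistics.quantiles returns n_bins - 1 cut points.
    cuts = statistics.quantiles(values, n=n_bins, method="inclusive")
    # bisect_right counts how many cut points are <= the value.
    bins = [bisect_right(cuts, v) for v in values]
    return cuts, bins


# Hypothetical numeric predictor (for example, ages).
ages = [23, 35, 41, 52, 29, 60, 47, 38]
cuts, bins = quantile_bins(ages, n_bins=4)
print(cuts)  # three cut points
print(bins)  # [0, 1, 2, 3, 0, 3, 2, 1]
```

After binning, each number becomes a category such as "bin 0" or "bin 3", which is the kind of input ID3 can split on.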

Fit / Load

When to use this: Numeric targets you want to predict.
Presets are quick starting points; you can still tweak knobs.
Sets the maximum number of question layers. Deeper trees fit tighter but can overfit.
Why this matters: Max depth
  • Deeper trees fit training data more closely.
  • Shallow trees may miss real patterns.
  • Use this to show underfitting vs overfitting.
Smallest group size allowed to split. Bigger values avoid noisy splits.
Why this matters: Min split
  • Small groups are noisy and unstable.
  • Larger minimums create steadier splits.
  • Too large can hide useful structure.
Prunes weak splits to keep the tree simpler. Higher values prune more.
Why this matters: Complexity (cp)
  • Higher values prune weak splits.
  • Lower values allow more detail.
  • Tuning cp balances fit vs simplicity.
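
One way to picture what cp does, assuming it behaves like the complexity parameter in rpart (this text names rpart only for the classification module, so treat that as an assumption here). A split is kept only if it reduces the total squared error by at least cp times the error of the unsplit data. The helper names and numbers below are illustrative.

```python
def sse(values):
    """Sum of squared errors around the group mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_passes_cp(parent, left, right, root_sse, cp):
    """Return (keep_split, relative_improvement) for one candidate split."""
    improvement = (sse(parent) - sse(left) - sse(right)) / root_sse
    return improvement >= cp, improvement


# Hypothetical target values at the root of the tree.
root = [10, 12, 11, 30, 32, 31]
root_sse = sse(root)

# A strong split that separates the low values from the high values.
strong = split_passes_cp(root, [10, 12, 11], [30, 32, 31], root_sse, cp=0.01)

# A weak split inside the already tight low group.
weak = split_passes_cp([10, 12, 11], [10, 11], [12], root_sse, cp=0.01)

print(strong)  # kept: the improvement is close to 1
print(weak)    # pruned: the improvement is well below 0.01
```

Raising cp would prune even moderately useful splits, and lowering it would let the weak split through.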

Download model

Tree plot

Metrics

Variable importance


After you click Fit
  • Read Findings for the headline result.
  • Use the tree to see split points.
  • Check RMSE and R^2 to judge fit.
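
For reference, a minimal sketch of the two metrics using their standard definitions. The sample numbers are made up, and the app may compute these on the held-out test rows.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: the typical size of a prediction miss."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Share of the target's variance that the predictions explain."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted values.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 8.0]

print(rmse(actual, predicted))       # about 0.612
print(r_squared(actual, predicted))  # about 0.925
```

Lower RMSE is better and is in the same units as the target. R^2 closer to 1 is better, and a value near 0 means the tree does little better than always predicting the mean.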

Findings


What the model saw

Fit / Load

When to use this: Categorical targets with numeric predictors.
Presets are quick starting points; you can still tweak knobs.
How we score a split. Gini impurity and information gain (entropy) are two common choices.
Sets the maximum number of question layers. Deeper trees fit tighter but can overfit.
Why this matters: Max depth
  • Deep trees can memorize quirks in the training data.
  • Shallow trees may miss real patterns.
  • Use this to show overfitting vs underfitting.
Smallest group size allowed to split. Bigger values avoid noisy splits.
Why this matters: Min split
  • Small groups are noisy and unstable.
  • Larger minimums create steadier splits.
  • Too large can hide useful structure.
Prunes weak splits to keep the tree simpler. Higher values prune more.
Why this matters: Complexity (cp)
  • Higher values prune weak splits.
  • Lower values allow more detail.
  • Tuning cp balances fit vs simplicity.
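
The two split-scoring options in this panel can be sketched as follows. This is a minimal illustration using the standard Gini impurity and entropy formulas, with made-up labels. It is not the fitting library's internal code.

```python
import math
from collections import Counter


def proportions(labels):
    """Class shares within one group of rows."""
    counts = Counter(labels)
    return [c / len(labels) for c in counts.values()]


def gini(labels):
    """Gini impurity: 0 for a pure group, higher when classes are mixed."""
    return 1 - sum(p * p for p in proportions(labels))


def entropy(labels):
    """Entropy in bits: 0 for a pure group, 1 for a 50/50 two-class group."""
    return -sum(p * math.log2(p) for p in proportions(labels))


def gain(parent, children, measure):
    """Drop in impurity from parent to the size-weighted children."""
    n = len(parent)
    weighted = sum(len(c) / n * measure(c) for c in children)
    return measure(parent) - weighted


# Hypothetical node with 5 "Yes" and 5 "No" rows, split into two groups.
parent = ["Yes"] * 5 + ["No"] * 5
left = ["Yes"] * 4 + ["No"]
right = ["Yes"] + ["No"] * 4

print(gain(parent, [left, right], gini))     # about 0.18
print(gain(parent, [left, right], entropy))  # about 0.278
```

Both measures reward splits that make the child groups purer. They usually rank candidate splits similarly, so the choice often changes the tree only slightly.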

Download model

Tree plot

Metrics

Confusion matrix


After you click Fit
  • Read Findings for the headline result.
  • Use the tree to see the split rules.
  • Check accuracy and the confusion matrix.

Findings


What the model saw

Notes

This module fits a CART classification tree (rpart). Use it for categorical targets.

A decision tree is a flowchart of simple questions.
Each split asks about a feature.
Each leaf gives a final prediction.
Classification predicts a category (like Yes/No).
Regression predicts a number (like price).
Train/test split keeps some data aside for checking.
The seed locks randomness so results repeat.
Steps: choose data, pick a target column, and fit a model.
Then compare metrics, inspect the tree, and view the quick plot.
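
The train/test split and the seed described above can be sketched like this. It is a minimal illustration with hypothetical names (`train_test_split`, `test_fraction`), not the app's own code.

```python
import random


def train_test_split(rows, test_fraction, seed):
    """Shuffle row positions with a fixed seed and hold out a share for testing."""
    rng = random.Random(seed)  # same seed, same shuffle order
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [r for i, r in enumerate(rows) if i not in test_idx]
    test = [r for i, r in enumerate(rows) if i in test_idx]
    return train, test


# Hypothetical dataset of 10 row IDs.
rows = list(range(10))

train_a, test_a = train_test_split(rows, test_fraction=0.2, seed=42)
train_b, test_b = train_test_split(rows, test_fraction=0.2, seed=42)

print(len(train_a), len(test_a))  # 8 2
print(test_a == test_b)           # True: the seed makes the split repeat
```

The model is fit on the training rows only, and the metrics are then computed on the held-out test rows it has not seen.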