Decision Tree Lab

Preview

Summary

Quick plot

Fit / Load

When to use this: Categorical targets with clear if/then rules.

Preset

Presets are quick starting points; you can still tweak knobs.

Max depth

Sets the maximum number of question layers. Deeper trees fit the training data more closely but can overfit.

Why this matters: Max depth

Deep trees can memorize quirks in the training data.
Shallow trees may miss real patterns.
Use this to show overfitting vs underfitting.

Min samples

Smallest number of rows a group must have before it can be split. Bigger values make the tree steadier.

Why this matters: Min samples

Small groups are noisy and unstable.
Larger minimums create steadier splits.
Too large a minimum can hide useful structure.
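
A minimal sketch of how Max depth and Min samples could act together as stopping rules. The function name and the convention that the root sits at depth 0 are assumptions for illustration, not the app's own code.

```python
def can_split(depth, n_rows, max_depth, min_samples):
    """Return True if a node is still allowed to split.

    depth: number of questions asked to reach this node (root = 0, an assumption).
    n_rows: number of training rows that landed in this node.
    """
    if depth >= max_depth:
        return False  # the tree is already as deep as allowed
    if n_rows < min_samples:
        return False  # the group is too small to split reliably
    return True


print(can_split(depth=2, n_rows=40, max_depth=3, min_samples=20))
print(can_split(depth=3, n_rows=40, max_depth=3, min_samples=20))
print(can_split(depth=1, n_rows=8, max_depth=3, min_samples=20))
```

Raising min_samples or lowering max_depth makes the function say "stop" sooner, which is what produces a smaller, steadier tree.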

Numeric bins

Turns numeric columns into ranges so ID3 can ask questions about them. More bins mean more possible splits.

Why this matters: Bins

More bins create more split options.
Too many bins can chase noise.
Too few bins can blur real differences.
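
To make binning concrete, here is a small Python sketch of quantile binning. The ages are made up, and the way ties at a cut point are handled (they go to the upper bin) is an assumption; the app may draw its boundaries differently.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 using quantile cut points."""
    # quantiles() returns the n_bins - 1 interior cut points.
    cuts = quantiles(values, n=n_bins, method="inclusive")
    # bisect_right counts how many cut points are <= the value.
    return [bisect_right(cuts, v) for v in values], cuts


ages = [18, 22, 25, 31, 38, 44, 52, 67]  # made-up example data
bins, cuts = quantile_bins(ages, n_bins=4)
print(cuts)
print(bins)
```

With 4 bins each range holds about a quarter of the rows. Asking for more bins gives narrower ranges, each backed by fewer rows, which is where the risk of chasing noise comes from.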

Load model (.rds)

Download model

Tree

Metrics

Confusion matrix

After you click Fit

Start with Findings for the main story.
Use the tree to see the exact questions.
Check accuracy and the confusion matrix.

Findings

What the model saw

Notes

ID3 expects a categorical target. Numeric predictors are binned into quantiles.
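
ID3 picks each question by information gain: how much the split reduces the entropy of the target. The sketch below shows that calculation on a tiny made-up table. The column names and rows are hypothetical, and this is not the app's own implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy of the target minus the weighted entropy after splitting on feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    before = entropy([row[target] for row in rows])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after


# Made-up rows: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]
gain_outlook = information_gain(rows, "outlook", "play")
gain_windy = information_gain(rows, "windy", "play")
print(gain_outlook, gain_windy)
```

The feature with the larger gain would be asked first. Because the calculation groups rows by category, numeric columns need to be binned before it can be applied.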

Fit / Load

When to use this: Numeric targets you want to predict.

Preset

Presets are quick starting points; you can still tweak knobs.

Max depth

Sets the maximum number of question layers. Deeper trees fit the training data more closely but can overfit.

Why this matters: Max depth

Deeper trees fit training data more closely.
Shallow trees may miss real patterns.
Use this to show underfit vs overfit.

Min split

Smallest number of rows a group must have before it can be split. Bigger values avoid noisy splits.

Why this matters: Min split

Small groups are noisy and unstable.
Larger minimums create steadier splits.
Too large a minimum can hide useful structure.

Complexity (cp)

Prunes weak splits to keep the tree simpler. Higher values prune more.

Why this matters: Complexity (cp)

Higher values prune weak splits.
Lower values allow more detail.
Tuning cp balances fit vs simplicity.
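
One way to picture cp for a numeric target: a split is worth keeping only if it cuts the squared error by at least cp times the error of the whole data set. The sketch below is a simplified illustration of that idea with made-up numbers. The function names are hypothetical, and the app's actual pruning procedure may differ in detail.

```python
def sse(values):
    """Sum of squared errors around the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_passes_cp(root_values, left, right, cp):
    """True if splitting a node into left/right cuts error by at least cp of the root error."""
    parent = left + right
    gain = sse(parent) - (sse(left) + sse(right))
    return gain / sse(root_values) >= cp


root = [1, 2, 3, 10, 11, 12]  # made-up target values

# A strong split: separates the low group from the high group.
strong = split_passes_cp(root, [1, 2, 3], [10, 11, 12], cp=0.01)
# A weak split inside the low group: small gain relative to the root error.
weak_low_cp = split_passes_cp(root, [1, 2], [3], cp=0.01)
weak_high_cp = split_passes_cp(root, [1, 2], [3], cp=0.05)
print(strong, weak_low_cp, weak_high_cp)
```

The weak split survives at cp = 0.01 but is dropped at cp = 0.05, which is the "higher values prune more" behavior described above.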

Load model (.rds)

Download model

Tree plot

Metrics

Variable importance

After you click Fit

Read Findings for the headline result.
Use the tree to see split points.
Check RMSE and R^2 to judge fit.
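
For reference, a short Python sketch of how RMSE and R^2 are computed from actual and predicted values. The numbers are made up for illustration.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction miss, in target units."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Share of the target's variance explained by the predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [3.0, 5.0, 7.0, 9.0]      # made-up values
predicted = [2.0, 5.0, 8.0, 9.0]
print(round(rmse(actual, predicted), 4))
print(round(r_squared(actual, predicted), 4))
```

Lower RMSE and higher R^2 both mean a closer fit. To spot overfitting, compare the values on training rows with the values on held-out rows.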

Findings

What the model saw

Fit / Load

When to use this: Categorical targets with numeric predictors.

Preset

Presets are quick starting points; you can still tweak knobs.

Split criterion

How a candidate split is scored. Gini impurity and information gain (entropy) are two common choices.
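
A small Python sketch comparing the two criteria on the same split. The labels are made up and the helper names are hypothetical; this illustrates the formulas, not the module's internals.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: chance that two rows drawn at random have different classes."""
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits; the 'information' criterion is based on this."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def impurity_decrease(left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two children."""
    parent = left + right
    n = len(parent)
    return (impurity(parent)
            - len(left) / n * impurity(left)
            - len(right) / n * impurity(right))


left, right = ["a", "a"], ["b", "b"]  # a perfect split of a 50/50 node
print(impurity_decrease(left, right, gini))
print(impurity_decrease(left, right, entropy))
```

Both criteria reward splits that leave purer groups. They use different scales, so their scores are not directly comparable, but in practice they often favor the same splits.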

Max depth

Sets the maximum number of question layers. Deeper trees fit the training data more closely but can overfit.

Why this matters: Max depth

Deep trees can memorize quirks in the training data.
Shallow trees may miss real patterns.
Use this to show overfitting vs underfitting.

Min split

Smallest number of rows a group must have before it can be split. Bigger values avoid noisy splits.

Why this matters: Min split

Small groups are noisy and unstable.
Larger minimums create steadier splits.
Too large a minimum can hide useful structure.

Complexity (cp)

Prunes weak splits to keep the tree simpler. Higher values prune more.

Why this matters: Complexity (cp)

Higher values prune weak splits.
Lower values allow more detail.
Tuning cp balances fit vs simplicity.

Load model (.rds)

Download model

Tree plot

Metrics

Confusion matrix

After you click Fit

Read Findings for the headline result.
Use the tree to see the split rules.
Check accuracy and the confusion matrix.
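
As a reading aid, here is a Python sketch of how accuracy and a confusion matrix are built from actual and predicted labels. The labels are made up for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Share of rows where the prediction matches the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


actual = ["yes", "yes", "no", "no", "yes"]     # made-up labels
predicted = ["yes", "no", "no", "no", "yes"]
matrix = confusion_matrix(actual, predicted)

labels = sorted(set(actual) | set(predicted))
print("actual \\ predicted", labels)
for a in labels:
    # One row per actual class, one column per predicted class.
    print(a, [matrix[(a, p)] for p in labels])
print("accuracy:", accuracy(actual, predicted))
```

Accuracy is a single number, while the matrix shows which classes get confused with which. Here the only error is one actual "yes" predicted as "no".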

Findings

What the model saw

Notes

This module fits a CART classification tree (rpart). Use it for categorical targets.

A decision tree is a flowchart of simple questions.
Each split asks about a feature.
Each leaf gives a final prediction.
Classification predicts a category (like Yes/No).
Regression predicts a number (like price).
A train/test split holds some data back to check the model on rows it has not seen.
The seed fixes the random number generator so results repeat.
Steps: choose data, pick a target column, fit a model.
Compare metrics, inspect the tree, and view the quick plot.
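
The train/test split and the role of the seed can be sketched in a few lines of Python. The function name, the 30% test share, and the seed value are assumptions for illustration; the app's own split may work differently.

```python
import random


def train_test_split(rows, test_fraction, seed):
    """Shuffle rows with a seeded generator, then hold back a share for testing."""
    rng = random.Random(seed)   # same seed -> same shuffle -> same split
    shuffled = list(rows)       # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for 10 data rows
train_a, test_a = train_test_split(rows, test_fraction=0.3, seed=42)
train_b, test_b = train_test_split(rows, test_fraction=0.3, seed=42)
print(len(train_a), len(test_a))
print(test_a == test_b)
```

Running it twice with the same seed gives the same test rows, so metrics can be compared fairly while you change the model settings.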