3. R Programming Language in Data Science

Decision Trees in R

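A decision tree is a tree-shaped model that reaches a decision, such as a class label or a predicted value, by applying a sequence of tests to the input data. Each internal node tests one attribute, and each branch leaving that node represents one possible outcome of the test (a possible decision, occurrence, or reaction).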
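To make the idea concrete, a small decision tree can be written by hand as nested if/else tests. The sketch below classifies flowers in R's built-in iris data; the function name classify_iris is hypothetical, and the thresholds 2.45 and 1.75 are illustrative values rather than the result of a training run described here.

```r
# Hand-written decision tree for the built-in iris data (illustrative thresholds).
# Each if/else test is a decision node; each returned label is a leaf.
classify_iris <- function(petal_length, petal_width) {
  if (petal_length < 2.45) {
    "setosa"               # leaf reached after one decision
  } else if (petal_width < 1.75) {
    "versicolor"           # leaf reached after a second decision
  } else {
    "virginica"
  }
}

# Apply the tree to every flower and measure the share labelled correctly
pred <- mapply(classify_iris, iris$Petal.Length, iris$Petal.Width)
accuracy <- mean(pred == iris$Species)
print(accuracy)
```

In practice the tests and thresholds are learned from data by an algorithm rather than written by hand.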


Root node: The topmost node. It represents the entire population or sample, which is then divided into two or more subsets that are more homogeneous than the whole.

Splitting: The process of dividing a node into two or more sub-nodes.

Decision node: A sub-node that splits into further sub-nodes is called a decision node.

Leaf/terminal node: Nodes with no children (no further split) are called leaf or terminal nodes.

Pruning: Reducing the size of a decision tree by removing sub-nodes (the opposite of splitting) is called pruning. It is commonly used to reduce overfitting.

Branch/sub-tree: A subsection of the decision tree is called a branch or sub-tree.

Parent and child nodes: A node that is divided into sub-nodes is the parent of those sub-nodes, and the sub-nodes are its children.
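The terms above can be seen on a tree fitted in R. This is a minimal sketch, assuming the rpart package (a recommended package that ships with standard R installations) is available. The cp and minsplit values are illustrative choices that deliberately grow a large tree so that pruning has something to remove; they are not tuned settings.

```r
library(rpart)

set.seed(42)  # rpart's internal cross-validation assigns folds at random

# Grow a classification tree; split = "information" asks rpart to choose
# splits by information gain (entropy) instead of its default Gini index
fit <- rpart(Species ~ ., data = iris, method = "class",
             parms = list(split = "information"),
             control = rpart.control(cp = 0.001, minsplit = 5))

# Each row of fit$frame is a node; the first row is the root node,
# which holds the whole sample
root_n <- fit$frame$n[1]

# Rows whose split variable is "<leaf>" are leaf (terminal) nodes
n_leaves <- sum(fit$frame$var == "<leaf>")

# Pruning: cut the tree back to the complexity level with the lowest
# cross-validated error (xerror) in the complexity table
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)
n_leaves_pruned <- sum(pruned$frame$var == "<leaf>")

print(pruned)
```

Printing the pruned tree lists each node with its split condition, so the root, the decision nodes, and the leaves (marked with `*`) can be read off directly.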

There are two more important concepts that you should know before implementing a decision tree algorithm: entropy and information gain.


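Entropy measures the randomness, or impurity, of the class labels in a dataset. For class proportions pᵢ, it is H = −Σ pᵢ log₂(pᵢ): it is 0 when every example belongs to the same class and largest when the classes are evenly mixed.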
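The entropy definition translates into a few lines of base R. This is a sketch: the function name entropy is defined here for illustration (it is not a built-in R function), and it measures entropy in bits by using log base 2.

```r
# Shannon entropy, in bits, of a vector of class labels
entropy <- function(labels) {
  p <- table(labels) / length(labels)  # proportion of examples in each class
  p <- p[p > 0]                        # drop empty classes: 0 * log2(0) is treated as 0
  -sum(p * log2(p))
}

e_mixed <- entropy(c("yes", "yes", "no", "no"))   # 50/50 mix of two classes
e_pure  <- entropy(c("yes", "yes", "yes", "yes")) # a single class
e_iris  <- entropy(iris$Species)                  # three equally sized classes
print(c(e_mixed, e_pure, e_iris))
```

An evenly mixed two-class set gives 1 bit, a pure set gives 0, and the three equal iris classes give log₂(3), about 1.585 bits. The `p > 0` filter matters for factors, where `table()` also counts levels that have no examples.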

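Information gain measures the decrease in entropy after the dataset is split on an attribute: the entropy of the parent node minus the average entropy of the child nodes, each weighted by its share of the examples. When growing a tree, the split with the highest information gain is preferred. It is also known as entropy reduction.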
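Information gain for a candidate split can be computed as the parent's entropy minus the size-weighted entropy of the children. In this sketch, information_gain is a hypothetical helper name, an entropy helper is included so the block runs on its own, and the condition Petal.Length < 2.45 is only an illustrative split of the built-in iris data.

```r
# Shannon entropy, in bits, of a vector of class labels
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Parent entropy minus the weighted average entropy of the child nodes.
# `groups` assigns each example to a child (e.g. TRUE/FALSE for a binary split).
information_gain <- function(labels, groups) {
  children <- split(labels, groups)                   # one vector of labels per child
  weights <- lengths(children) / length(labels)       # each child's share of examples
  child_entropy <- vapply(children, entropy, numeric(1))
  entropy(labels) - sum(weights * child_entropy)
}

gain <- information_gain(iris$Species, iris$Petal.Length < 2.45)
print(gain)
```

For this split, one child holds only setosa flowers (entropy 0) and the other holds versicolor and virginica in equal numbers (entropy 1), so the gain is log₂(3) − 2/3, about 0.918 bits. A tree-growing algorithm would evaluate many candidate splits this way and keep the one with the largest gain.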

