Before diving into how the random forest algorithm works, let's first look at how a decision tree works, using the following example:
Suppose you want to predict whether a person will buy a phone or not based on the phone’s features. For that, you can build a simple decision tree.
In this decision tree, the root node and the internal nodes represent the phone's features, while the leaf nodes are the outputs. The edges connect the nodes based on the feature values. Based on the price, RAM, and internal storage, a consumer can decide whether to purchase the phone. The problem with a single decision tree is that it works from limited information, so its predictions may not always be accurate.
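To make this concrete, here is a minimal sketch of such a decision tree using scikit-learn. The feature values and purchase labels below are invented purely for illustration:

```python
# A toy decision tree: predict phone purchase from price, RAM, and storage.
# All data here is hypothetical, made up only to illustrate the idea.
from sklearn.tree import DecisionTreeClassifier

# Each row: [price in USD, RAM in GB, internal storage in GB]
X = [
    [300,  4,  64],
    [500,  6, 128],
    [800,  8, 256],
    [1000, 12, 256],
    [250,  3,  32],
    [700,  8, 128],
]
y = [1, 1, 1, 0, 0, 1]  # 1 = bought the phone, 0 = did not

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# The fitted tree splits on the features above; the leaves hold the outputs.
print(tree.predict([[600, 6, 128]]))  # a single tree gives a single opinion
```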
Using a random forest model improves these results, as it introduces diversity: the model is built from several decision trees, each trained on a different subset of the data and features.
Suppose we have created three different decision trees to build a random forest model.
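A minimal sketch of such a forest, again with scikit-learn and the same hypothetical data as above, might look like this. Setting `n_estimators=3` mirrors the three trees in our example, though real forests typically use far more:

```python
# A three-tree random forest on the same toy data as the previous sketch.
from sklearn.ensemble import RandomForestClassifier

X = [
    [300,  4,  64],
    [500,  6, 128],
    [800,  8, 256],
    [1000, 12, 256],
    [250,  3,  32],
    [700,  8, 128],
]
y = [1, 1, 1, 0, 0, 1]

# Each tree is trained on a bootstrap sample of the rows and considers a
# random subset of features at every split; this is where the diversity
# of the ensemble comes from.
forest = RandomForestClassifier(n_estimators=3, max_features="sqrt", random_state=0)
forest.fit(X, y)

print(len(forest.estimators_))  # 3 individual decision trees
```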
Now, suppose a new phone is launched with specific features, and you want to decide whether to buy that phone or not.
Let's pass this data to our random forest model and check the model's output.
The first two trees predict that you will buy the phone, while the third predicts that you will not. Since the random forest takes a majority vote of its trees, and two out of three trees vote in favor, our model predicts that you should buy the newly launched phone.
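Continuing the sketch above, we can query each of the forest's three trees individually and then let the forest take the majority vote. The new phone's feature values are again hypothetical, and the exact votes you see will depend on the toy data and the random seed:

```python
# Hypothetical features of the newly launched phone: price, RAM, storage.
new_phone = [[650, 8, 128]]

# Ask each of the three trees for its individual vote (1 = buy, 0 = don't buy).
votes = [int(t.predict(new_phone)[0]) for t in forest.estimators_]
print("Individual tree votes:", votes)

# The forest aggregates the votes; the majority class wins.
print("Forest prediction:", forest.predict(new_phone))
```

If, as in the walkthrough, two trees vote to buy and one votes against, the majority vote comes out in favor of the purchase.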