7. Random Forest in R programming language

Key Terms in the Random Forest Algorithm

Before we start working with R, we need to understand a few terms that are used in random forest algorithms:

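1. Variance – A measure of how much a model's predictions change when the training data changes. A single deep decision tree typically has high variance: a small change in the training data can produce a very different tree.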

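2. Bagging – Short for bootstrap aggregating. This is a variance-reducing method that trains many models on random samples of the training data, drawn with replacement, and then combines their predictions: by majority vote for classification or by averaging for regression.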

3. Out-of-bag (OOB) error estimate – The random forest classifier is trained using bootstrap aggregation, where each new tree is fit to a bootstrap sample of the training observations. Because sampling is done with replacement, each tree leaves out some observations; these are its out-of-bag observations. The OOB error is the average error over the training observations, where each observation is predicted using only the trees whose bootstrap sample did not contain it. This lets the random forest classifier be validated during training, without a separate validation set.

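4. Information gain – Used to determine which feature/attribute gives us the maximum information about a class. It is based on the concept of entropy, which is the degree of uncertainty, impurity, or disorder in a set of observations. The information gain of a split is the entropy of the parent node minus the weighted average entropy of the child nodes. At each node, the tree chooses the split with the highest information gain, so entropy decreases from the root node toward the leaf nodes.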

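The formula for entropy is as shown below:

$$E(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

Where p_i represents the probability (proportion) of class i in the set S, c is the number of classes, and E(S) represents the entropy.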

5. Gini index – The Gini index, or Gini impurity, measures the probability that a randomly chosen element would be misclassified if it were labeled at random according to the class distribution in the node. It ranges from zero up to 1 - 1/c for c classes, so it always stays below one. Zero denotes that all elements belong to a single class. The maximum is reached when the elements are spread evenly across the classes; with two classes, that maximum is 0.5.

The Gini index formula is shown below:

$$Gini = 1 - \sum_{i=1}^{c} p_i^2$$

Where p_i is the probability of an object being classified to class i, and c is the number of classes.
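To make the entropy, Gini index, and information gain definitions concrete, here is a minimal base R sketch. The helper names (`class_probs`, `entropy`, `gini`, `information_gain`) and the toy vectors are hypothetical and chosen only for illustration; they are not part of any package, and real random forest implementations compute these quantities internally.

```r
# Class proportions of a label vector (factor() drops unused levels).
class_probs <- function(labels) {
  counts <- table(factor(labels))
  as.numeric(counts) / sum(counts)
}

# Entropy: E(S) = -sum(p_i * log2(p_i)).
entropy <- function(labels) {
  p <- class_probs(labels)
  p <- p[p > 0]                 # guard against log2(0)
  -sum(p * log2(p))
}

# Gini impurity: 1 - sum(p_i^2).
gini <- function(labels) {
  p <- class_probs(labels)
  1 - sum(p^2)
}

# Information gain of splitting `labels` by a categorical `feature`:
# parent entropy minus the size-weighted entropy of the child groups.
information_gain <- function(labels, feature) {
  groups <- split(labels, feature)
  weights <- sapply(groups, length) / length(labels)
  child_entropy <- sapply(groups, entropy)
  entropy(labels) - sum(weights * child_entropy)
}

# Hypothetical toy data: four observations, two classes.
play    <- c("yes", "yes", "no", "no")
outlook <- c("sunny", "sunny", "rain", "rain")     # separates the classes perfectly
coin    <- c("heads", "tails", "heads", "tails")   # unrelated to the class

entropy(play)                     # 1   (two equally likely classes)
gini(play)                        # 0.5 (maximum for two classes)
information_gain(play, outlook)   # 1   (split removes all uncertainty)
information_gain(play, coin)      # 0   (split tells us nothing)
```

In this toy example, a tree choosing between the two features would split on `outlook`, since it has the higher information gain.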

Let’s now look at how we can implement the random forest algorithm.
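As a warm-up, the bootstrap sampling behind bagging and the out-of-bag estimate can be illustrated with a short base R sketch. The values of `n` and `n_trees` and the seed are arbitrary assumptions, and no trees are actually fitted here; the sketch only shows which rows each tree would and would not see.

```r
set.seed(42)     # arbitrary seed so the sketch is reproducible

n <- 1000        # hypothetical number of training observations
n_trees <- 50    # hypothetical number of trees in the forest

# For each tree, draw a bootstrap sample: n row indices sampled with replacement.
bootstrap_samples <- lapply(seq_len(n_trees), function(i) {
  sample(n, size = n, replace = TRUE)
})

# Rows never drawn for a given tree are that tree's out-of-bag rows.
oob_rows <- lapply(bootstrap_samples, function(idx) setdiff(seq_len(n), idx))

# Fraction of rows left out of each bootstrap sample.
oob_fraction <- sapply(oob_rows, length) / n
mean(oob_fraction)   # close to (1 - 1/n)^n, roughly 0.37 for large n
```

In a real forest, each tree would be fit on its bootstrap rows, and its predictions for its out-of-bag rows would feed the OOB error estimate.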
