
A support vector machine (SVM) is a classification algorithm that assigns data to classes based on its features. In its basic form, an SVM assigns each new element to one of two classes.

Once you give it some labeled inputs, the algorithm learns a boundary that separates the classes. When you feed in new data (an unknown fruit in this example), the algorithm assigns it to one of the classes: e.g., “apple” versus “orange”.
Understanding the SVM
The following are some examples to understand SVM in detail:
1st example: Linear SVM classification problem with a 2D data set
The goal of this example is to classify cricket players into batsmen or bowlers using the runs-to-wicket ratio. A player with more runs would be considered a batsman and a player with more wickets would be considered a bowler.
If you take a data set of cricket players with runs and wickets in columns next to their names, you could create a two-dimensional plot showing a clear separation between bowlers and batsmen. Here we present a data set with clear segregation between bowlers versus batsmen to help understand SVM.

Before separating anything using high-level mathematics, let’s look at an unknown value, which is new data being introduced into the dataset without a predesignated classification.

The next step is to draw a decision boundary, or a line separating the two classes to help classify the new data points.
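As a minimal sketch of what a decision boundary does, the snippet below classifies a player by checking which side of a line the point falls on. The weights `w`, the offset `b`, and the player statistics are made-up values for illustration, not figures from the data set above.

```python
# Features are (runs, wickets). The line w . x + b = 0 below is a hypothetical
# boundary (wickets = runs / 20), chosen only to illustrate the idea.

def decision_value(point, w, b):
    """Return w . x + b; its sign says which side of the line the point is on."""
    return sum(wi * xi for wi, xi in zip(w, point)) + b

def classify(point, w, b):
    """Label a player by the side of the boundary the point falls on."""
    return "batsman" if decision_value(point, w, b) > 0 else "bowler"

w = (1.0, -20.0)  # assumed weights for (runs, wickets)
b = 0.0           # assumed offset

unknown_player = (3000, 40)  # hypothetical new data point
print(classify(unknown_player, w, b))
```

Any line can be written this way; the rest of the example is about choosing which line to use.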

You can actually draw several boundaries, as shown above. Then, you need to find the line that best separates those two groups. The correct line will help you classify the new data point.

You can find the best line by maximizing the margin, with the line placed equidistant from the support vectors on either side. Support vectors, in this context, are the points from each class that lie closest to the other class, and the margin is the gap between them that the line runs through.

Note: You may think that the word vector simply refers to data points. That intuition works in two-dimensional or three-dimensional spaces, but once you get into higher dimensions with more features in your data set, it helps to treat each point as a vector of feature values. They are called support vectors because they are the points closest to the other class, so they alone determine, or “support,” the position of the decision boundary.
There are a couple of points at the top that are pretty close to one another, and similarly at the bottom of the graph. Shown below are the points that you need to consider. The rest of the points are too far away. The bowler points to the right and the batsman points to the left.

Mathematically, you can calculate the distances between points of opposite classes and find the closest ones. Once you have picked the support vectors, draw a dividing line and measure the distance from each support vector to the line. The best line is the one with the greatest margin, or distance, between the support vectors.

For instance, if you consider the yellow line as a decision boundary, the new data point is classified as a bowler. But, as the margin doesn’t appear to be the largest possible, you can come up with a better line.

Use other support vectors, draw the decision boundary between those, and then calculate the margin. Notice now that the unknown data point would be considered a batsman.

Continue doing this until you find the correct decision boundary with the greatest margin.
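The trial-and-error search described above can be sketched in a few lines. The points and the three candidate lines below are invented for illustration (they are not the values behind the plots), and the color names only echo the lines discussed in the text. For each candidate, the code measures the smallest distance from the line to any point and keeps the line where that distance is largest.

```python
import math

# Hypothetical, already-scaled feature values for each class.
batsmen = [(1, 1), (2, 1), (1, 2)]
bowlers = [(4, 4), (5, 4), (4, 5)]

# Candidate boundaries in the form w . x + b = 0 (assumed for illustration).
candidates = {
    "yellow": ((1.0, 0.0), -2.5),  # vertical line x = 2.5
    "blue": ((0.0, 1.0), -3.5),    # horizontal line y = 3.5
    "green": ((1.0, 1.0), -5.5),   # diagonal line x + y = 5.5
}

def signed_distance(point, w, b):
    """Perpendicular distance from point to the line, with a sign for the side."""
    return (w[0] * point[0] + w[1] * point[1] + b) / math.hypot(*w)

def smallest_distance(w, b):
    """Distance from the line to its nearest point, or None if it fails to separate."""
    neg = [signed_distance(p, w, b) for p in batsmen]
    pos = [signed_distance(p, w, b) for p in bowlers]
    if max(neg) >= 0 or min(pos) <= 0:
        return None  # some point is on the wrong side
    return min(min(pos), -max(neg))

margins = {name: smallest_distance(w, b) for name, (w, b) in candidates.items()}
valid = [name for name in margins if margins[name] is not None]
best = max(valid, key=lambda name: margins[name])
print(margins, best)
```

A real SVM solver does not enumerate candidates like this; it solves an optimization problem that finds the maximum-margin line directly. The loop is only meant to mirror the manual procedure.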

If you look at the green decision boundary, the line appears to have a maximum margin compared to the other two. This is the boundary of greatest margin, and when you classify your unknown data value, you can see that it clearly belongs to the batsman’s class. The green line divides the data best because it has the maximum margin between the support vectors. At this point, you can be confident in the classification: the new data point is indeed a batsman.

This dividing line is called a hyperplane. In two-dimensional spaces, we typically refer to the dividing boundaries as “lines,” but in three-dimensional and higher-dimensional spaces, they’re considered “planes” or “hyperplanes.” Technically, they are all hyperplanes.

The hyperplane with the maximum distance from the support vectors is the one you want. Two distances describe it: (D+) is the shortest distance from the hyperplane to the closest positive point, and (D-) is the shortest distance from the hyperplane to the closest negative point. The parallel boundaries passing through those closest points are sometimes called the positive hyperplane and the negative hyperplane.

The sum of (D+) and (D-) is called the distance margin. You should always try to maximize the distance margin to avoid misclassification. For instance, you can see the yellow margin is much smaller than the green margin.
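In symbols, the distances above can be written as follows. The notation is an assumption introduced here for illustration: the hyperplane is written as w · x + b = 0, and each training point x_i carries a label y_i of +1 or -1.

```latex
\[
d(x) = \frac{\lvert w \cdot x + b \rvert}{\lVert w \rVert}, \qquad
D^{+} = \min_{i \,:\, y_i = +1} d(x_i), \qquad
D^{-} = \min_{i \,:\, y_i = -1} d(x_i), \qquad
\text{margin} = D^{+} + D^{-}
\]
```

If w and b are rescaled so that |w · x + b| = 1 at the support vectors, then D+ and D- both equal 1/‖w‖ and the margin becomes 2/‖w‖, which is why maximizing the margin is commonly stated as minimizing ‖w‖.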

This data set is two-dimensional because each player is described by two features, runs and wickets. Because a straight line separates the two classes, this is called a linear SVM.

2nd example: Kernel SVM classification problem with higher-dimensional data
The data set shown below has no clear linear separation between the two classes. In machine learning parlance, you would say that these are not linearly separable. How can you get the support vector machine to work on such data?

Since you can’t separate it into two classes using a line, you need to transform it into a higher dimension by applying a kernel function to the data set.

A higher dimension enables you to clearly separate the two groups with a plane. Here, you can draw some planes between the green dots and the red dots — with the end goal of maximizing the margin.

Writing R for the set of real numbers, the kernel function here maps the data from a two-dimensional space (R2) to a three-dimensional space (R3). Once the data is mapped into three dimensions, you can apply SVM and separate the two groups using a two-dimensional plane.
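A small sketch of the idea, under assumptions: the mapping (x, y) → (x, y, x² + y²) and the sample points below are chosen for illustration, since the text does not say which mapping was used. One class sits near the origin and the other on a ring around it, so no straight line separates them in 2D, but a flat plane does once the third coordinate is added.

```python
def lift(point):
    """Map a 2D point into 3D by adding its squared distance from the origin."""
    x, y = point
    return (x, y, x * x + y * y)

# Hypothetical data: inner points near the origin, outer points on a ring.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4)]
outer = [(2.0, 0.0), (0.0, -2.0), (-1.2, 1.6)]

# In the lifted space the horizontal plane z = 2 separates the classes
# (the threshold is an assumed value that fits this toy data).
PLANE_HEIGHT = 2.0

def classify(point):
    """Label a point by which side of the plane its lifted image falls on."""
    return "outer" if lift(point)[2] > PLANE_HEIGHT else "inner"

print([classify(p) for p in inner + outer])
```

In practice a kernel SVM rarely builds the lifted coordinates explicitly. The kernel function computes the inner products the algorithm needs as if the data had been mapped, which keeps the cost manageable even when the target space has many dimensions.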

The same idea carries over to higher dimensions (3D and beyond):

There are many types of kernel functions, such as:
- Gaussian RBF kernel
- Sigmoid kernel
- Polynomial kernel
Depending on the dimensions and how you want to transform the data, you can choose from any of these kernel functions.
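For reference, the three kernels listed above can be sketched in plain Python using their common textbook forms. The parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions chosen for illustration; libraries differ in their defaults.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def rbf_kernel(u, v, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)

def sigmoid_kernel(u, v, gamma=0.5, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <u, v> + coef0)."""
    return math.tanh(gamma * dot(u, v) + coef0)

def polynomial_kernel(u, v, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <u, v> + coef0) ** degree."""
    return (gamma * dot(u, v) + coef0) ** degree

a, b = (1.0, 2.0), (3.0, 4.0)
print(rbf_kernel(a, b), sigmoid_kernel(a, b), polynomial_kernel(a, b))
```

Each function takes two data points and returns a single similarity score. The RBF kernel gives identical points a score of 1 and decays toward 0 as they move apart, which is one reason it is a common starting choice when the shape of the boundary is unknown.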