
# Machine Learning [#2 Build Concept]

Updated: Sep 27, 2019

As we discussed in the previous blog post, "Machine learning [#1 Journey Guide]", our goal is to learn how to use the tools and when to use them. Understanding deep mathematical concepts is not my goal; however, some calculus and algebra are unavoidable.

This approach will give you a good overview of this industry and practical tools to solve real-world problems without having to reinvent the wheel.

Nonetheless, I will include links for those who would like a deeper understanding of the mathematical concepts behind the algorithms and techniques we use.

## Building up concepts

fig.1 Linearly separable

We all know the simple linear equation y=aX+c (fig.1), where 'a' is a constant representing the slope of the line and 'c' is a constant representing the y-intercept (where the line crosses the y-axis).

Suppose we have data from two sets: Set_1 (orange dots) and Set_2 (blue dots).

To represent the classification of the two sets mathematically, we need to find the linear equation that separates them on the graph. By finding the equation, I mean finding the values of the constants 'a' and 'c'. We can then draw a line that perfectly separates both data sets.

Now we can define a function Z=y-aX-c. For any point, for example (3,4), if Z>0 the point belongs to Set_1, and if Z<0 it belongs to Set_2. (Z=0 means the point lies exactly on the line.)
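To make the sign rule concrete, here is a small Python sketch. The slope a = 1 and intercept c = 0 are hypothetical values chosen for illustration, not values read from fig.1.

```python
def z_value(x, y, a, c):
    """Return Z = y - a*x - c for the point (x, y)."""
    return y - a * x - c


def which_set(x, y, a, c):
    """Assign a point to a set by the sign of Z; points exactly on the line get None."""
    z = z_value(x, y, a, c)
    if z > 0:
        return "Set_1"
    if z < 0:
        return "Set_2"
    return None  # Z == 0: the point lies on the separating line


# Hypothetical line y = 1*X + 0
a, c = 1.0, 0.0
print(which_set(3, 4, a, c))  # Z = 4 - 3 - 0 = 1 > 0, so Set_1
print(which_set(4, 3, a, c))  # Z = 3 - 4 - 0 = -1 < 0, so Set_2
```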

Let's rewrite Z in a more "machine learning" style. Since y and X are the only variables, we can call them the features of the system. Why? Because they are the characteristics that vary from point to point, and which set a point belongs to depends on their values.

So let x1 = y, x2 = X and x0 = 1. Then y = (w1)(x1), -aX = (w2)(x2) and -c = (w0)(x0), where (w0), (w1) and (w2) are constants (here w0 = -c, w1 = 1 and w2 = -a). We call them weights because each one weights a feature.

Rewriting the Z function gives:

Z= (w0)(x0) + (w1)(x1) + (w2)(x2)

If we have more features, the Z function looks like this:

Z= (w0)(x0) + (w1)(x1) + (w2)(x2)+.....+(wj)(xj)

where j is the number of features in the system.
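In code, the Z function is just a dot product between the weight vector and the feature vector with x0 = 1 prepended. A minimal NumPy sketch: for the two-feature case it uses x1 = y and x2 = X, so matching Z = y - aX - c requires w0 = -c, w1 = 1 and w2 = -a (the values a = 1 and c = 0 are hypothetical).

```python
import numpy as np


def predictor(weights, features):
    """Z = w0*x0 + w1*x1 + ... + wj*xj, with the bias input x0 fixed to 1."""
    x = np.concatenate(([1.0], features))  # prepend x0 = 1
    return float(np.dot(weights, x))


# Two features: x1 = y, x2 = X; hypothetical a = 1, c = 0
a, c = 1.0, 0.0
w = np.array([-c, 1.0, -a])  # [w0, w1, w2]
print(predictor(w, np.array([4.0, 3.0])))  # point (X=3, y=4): 4 - 3 - 0 = 1.0
```

The same function works unchanged for any number of features j, as long as the weight vector has j + 1 entries.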

fig.2 Not Linearly separable

Most real-life datasets are not linearly separable like the data in fig.1; the data may instead be distributed like in fig.2.

With data like in fig.2, we cannot separate the two sets with a straight line as in fig.1, but we can set a threshold for the error we are willing to tolerate in the model we develop later on.

Essentially, though, most machine learning algorithms follow these steps:

1. We split the data into two parts: a training data set and a testing data set.

2. For each training sample, we substitute the feature values into the Z function (the predictor function).

3. We substitute the value of Z into an "activation function" F(Z) to get an output. (An activation function just maps a range of inputs to a certain output; for instance, a threshold function outputs 1 if Z>num. Don't worry about it for now.)

4. We determine how far the output deviates from the actual value and update the weights toward the right answer in steady steps whose size is set by the "learning rate".

5. We then repeat from step 2 (each full pass over the training data is called an epoch) until we reach a maximum number of iterations and/or the misclassification (deviation from the actual value) falls below a tolerated threshold.
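To make steps 2 to 5 concrete, here is a minimal NumPy sketch (step 1 is skipped because the toy data are tiny). It assumes a perceptron-style update rule and a unit-step activation (output 1 if Z >= 0, else 0); the post does not say which algorithm it has in mind, so treat this as one illustrative choice. The data points, learning rate `eta` and epoch limit are hypothetical.

```python
import numpy as np


def step(z):
    """Unit-step activation F(Z): 1 if Z >= 0, otherwise 0."""
    return np.where(z >= 0.0, 1, 0)


def train_perceptron(X, y, eta=0.5, epochs=10):
    """Learn w0..wj with a perceptron-style update. X: (n_samples, j); y: 0/1 labels."""
    w = np.zeros(X.shape[1] + 1)  # w[0] is the weight w0 paired with x0 = 1
    errors_per_epoch = []
    for _ in range(epochs):  # step 5: each pass over the training data is one epoch
        errors = 0
        for xi, target in zip(X, y):
            z = w[0] + np.dot(w[1:], xi)      # step 2: predictor function Z
            output = step(z)                  # step 3: activation F(Z)
            update = eta * (target - output)  # step 4: deviation scaled by the learning rate
            w[1:] += update * xi
            w[0] += update
            errors += int(update != 0.0)
        errors_per_epoch.append(errors)
        if errors == 0:  # stop once no training sample is misclassified
            break
    return w, errors_per_epoch


def predict(w, X):
    """Apply Z and the activation to every row of X."""
    return step(w[0] + X @ w[1:])


# Hypothetical, linearly separable toy data (two features per sample)
X_train = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 5.0], [5.0, 4.0]])
y_train = np.array([0, 0, 1, 1])
w, errors = train_perceptron(X_train, y_train)
print(errors)                   # misclassifications per epoch, ending at 0
print(predict(w, X_train))      # [0 0 1 1]
```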

For the most widely used machine learning algorithms, the popular scikit-learn library does all of these steps for us. All you have to do is provide the dataset, train the model, and then use the trained model (yes, many more skills are needed, but seriously, most of the work is done for you).
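As a rough illustration (not necessarily the algorithm this series will use), scikit-learn's `Perceptron` class wraps steps 2 to 5 in a single `fit()` call. The toy data are hypothetical; `eta0` is scikit-learn's name for the learning rate, `max_iter` caps the number of epochs, and `tol=None` turns off early stopping so it simply runs `max_iter` epochs.

```python
import numpy as np
from sklearn.linear_model import Perceptron

# Hypothetical toy data: two features per sample, two classes (0 and 1)
X_train = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 5.0], [5.0, 4.0]])
y_train = np.array([0, 0, 1, 1])

model = Perceptron(eta0=0.1, max_iter=1000, tol=None, random_state=0)
model.fit(X_train, y_train)  # the training loop runs inside fit()

# New points lying between training points of the same class
print(model.predict([[1.5, 1.0], [4.5, 4.5]]))
print(model.coef_, model.intercept_)  # learned w1..wj and w0
```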

Enough talking and explaining; let's get our hands dirty with the classic iris data set. It consists of 150 samples of three different types of iris (Setosa, Versicolour, and Virginica), each described by 4 features (sepal length, sepal width, petal length, and petal width).

## Prepare the data and visualize it

Visualizing and preparing data are among the most important skills you need while learning machine learning. You will have to deal with a lot of numbers and CSV files. To visualize the data, we will use the seaborn, matplotlib and mlxtend libraries.

First, we import the libraries.

Then we load the iris data and print the first five entries.
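The original code for this step isn't shown in this version of the post. A minimal sketch, assuming pandas and scikit-learn are installed: it builds the iris DataFrame from scikit-learn's bundled copy of the dataset (so no download is needed). The column names sepal_length, sepal_width, petal_length, petal_width and species are my choice for illustration. Note that scikit-learn spells the second species 'versicolor'.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load scikit-learn's bundled copy of the iris data
raw = load_iris()
iris = pd.DataFrame(raw.data,
                    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
# Map the integer targets 0, 1, 2 back to species names
iris["species"] = raw.target_names[raw.target]

print(iris.head())  # first five samples
```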

Now let's plot two features, sepal_length against petal_length.
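A sketch of this plot with seaborn's `scatterplot`, assuming seaborn 0.9 or later (where `scatterplot` was introduced). The DataFrame is rebuilt from scikit-learn's bundled iris data so the snippet runs on its own, and the output filename is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line and call plt.show() for a window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

raw = load_iris()
iris = pd.DataFrame(raw.data,
                    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
iris["species"] = raw.target_names[raw.target]

# One point per sample, colored by species
ax = sns.scatterplot(data=iris, x="sepal_length", y="petal_length",
                     hue="species", palette=["red", "orange", "green"])
ax.set_xlabel("sepal_length")
ax.set_ylabel("petal_length")
plt.tight_layout()
plt.savefig("iris_sepal_vs_petal.png")  # hypothetical output file
```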

As you can see, with the help of the seaborn library we have plotted two features, and we can see the three 'species' categories color-coded red, orange and green. We can choose any two features to plot and look at the relationship between them. We can also plot in more than two dimensions, but that is out of scope for now.

As we discussed, we need the sample values of the features (x) and the corresponding Z:

Z= (w0)(x0) + (w1)(x1) + (w2)(x2)+.....+(wj)(xj)

Each row printed by print(iris.head()) is a sample: the feature values are the (x), and the species is the class we want the model to output for that sample (what we call Z here). So now we need to extract both from the iris variable.
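A minimal sketch of the extraction, assuming the iris data sits in a pandas DataFrame with four feature columns and a 'species' column (rebuilt here from scikit-learn's bundled copy so the snippet runs on its own). Following this post's naming, the label array is called `Z`.

```python
import pandas as pd
from sklearn.datasets import load_iris

feature_cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
raw = load_iris()
iris = pd.DataFrame(raw.data, columns=feature_cols)
iris["species"] = raw.target_names[raw.target]

# Feature matrix: each row is a sample, each column one feature (x1..x4)
X = iris[feature_cols].to_numpy()
# Labels: the species string for each sample
Z = iris["species"].to_numpy()

print(X.shape)  # (150, 4)
print(Z[:3])    # the first three samples are all setosa
```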

Z actually now looks like this

String labels are not convenient for computation, so we convert the string class labels to integer class labels. Integers are cheaper to store and compare than strings, which saves processing power for the more demanding tasks ahead.
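One common way to do this conversion, assuming scikit-learn's `LabelEncoder` (the post doesn't show which method it uses). The short label array is a hypothetical sample in the style of the species column.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels in the style of the iris 'species' column
Z_str = np.array(["setosa", "versicolor", "virginica", "setosa", "virginica"])

encoder = LabelEncoder()
Z = encoder.fit_transform(Z_str)     # class names sorted, then numbered 0, 1, 2
print(encoder.classes_)              # ['setosa' 'versicolor' 'virginica']
print(Z)                             # [0 1 2 0 2]
print(encoder.inverse_transform(Z))  # back to the original strings
```

`LabelEncoder` sorts the class names before numbering them, and `inverse_transform` maps the integers back whenever you need readable labels again.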

Now Z looks like this

For the X data, we need to perform two steps:

1. Split the data into training samples and test samples.

2. "Feature scaling": we will mainly use the standardization method.
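A sketch of step 1 using scikit-learn's `train_test_split`. The 30% test size and `random_state=1` are hypothetical choices; `stratify=y` keeps the same proportion of each species in both parts. Here the labels come straight from `load_iris(return_X_y=True)`, which already returns them as integers.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # y is already integer-encoded: 0, 1, 2

# Hold out 30% of the samples for testing, keeping the class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

print(X_train.shape, X_test.shape)                # (105, 4) (45, 4)
print(np.bincount(y_train), np.bincount(y_test))  # [35 35 35] [15 15 15]
```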

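You may want to read the linked Wikipedia article to learn why we scale our features. In short, scaling puts all features on a comparable range, so features with large values don't dominate those with small values, and gradient-based learning algorithms tend to converge faster.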
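Standardization rescales each feature as x' = (x - μ)/σ, where μ and σ are that feature's mean and standard deviation, so every feature ends up with mean 0 and standard deviation 1. The sketch below, on hypothetical numbers, computes this by hand with NumPy and checks it against scikit-learn's `StandardScaler`. Note that μ and σ are learned from the training data only and then reused for the test data, so no information from the test set leaks into training.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical samples with two features on very different scales
X_train = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])
X_test = np.array([[2.5, 200.0]])

# By hand: x' = (x - mean) / std, computed per feature (column)
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses
X_train_manual = (X_train - mu) / sigma

# With scikit-learn: fit on the training data, then reuse mu and sigma for the test data
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

print(np.allclose(X_train_manual, X_train_std))           # True
print(X_train_std.mean(axis=0), X_train_std.std(axis=0))  # means near 0, stds near 1
```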

## Conclusion

In this article, we discussed the main concept behind what we are trying to do. We learned what the Z function is, and that we pass its value to an "activation function". We also learned some new terminology, such as "learning rate" and "epochs"; next, we will learn how to use them in our code.

We also plotted our data using the seaborn library and converted string class labels to integer class labels. Finally, we split our samples into training and testing samples and standardized them using the scikit-learn library.

The code below is the final code we will use next for all our training with the scikit-learn library.
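The final code itself is missing from this version of the post. Here is a minimal sketch of the preparation pipeline this article describes, assuming pandas and scikit-learn. The column names, the 30% test split and `random_state=1` are hypothetical choices, and `Z` follows this post's naming for the label array.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# 1. Load the data into a DataFrame with a string 'species' column
raw = load_iris()
feature_cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
iris = pd.DataFrame(raw.data, columns=feature_cols)
iris["species"] = raw.target_names[raw.target]

# 2. Extract features (X) and labels (Z), converting string labels to 0, 1, 2
X = iris[feature_cols].to_numpy()
Z = LabelEncoder().fit_transform(iris["species"].to_numpy())

# 3. Split into training and test samples (30% held out, class proportions kept)
X_train, X_test, Z_train, Z_test = train_test_split(
    X, Z, test_size=0.3, random_state=1, stratify=Z)

# 4. Standardize: learn mean/std from the training data only, apply to both sets
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)
```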

Finally, let me know what you think about this blog post. Was it useful? Did I oversimplify the topic? Did I overcomplicate it?