# Machine Learning [#2 Build Concept]

Updated: Sep 27, 2019

As we discussed in the previous blog __"Machine Learning [#1 Journey Guide]"__, our goal is to learn how to use the tools and when to use them. Understanding deep mathematical concepts is not my goal; however, calculus and algebra are unavoidable.

This approach will give you a good overview of this industry and practical tools to solve real-world problems without having to reinvent the wheel.

Nonetheless, I will include links for those who would like a deeper understanding of the mathematical concepts behind the algorithms and techniques we use.

**Building up concepts**

We all know the simple linear equation *y = aX + c* (fig. 1), where *a* is a constant representing the slope of the line and *c* is a constant representing the intercept.

Suppose we have data from two sets, Set_1 (orange dots) and Set_2 (blue dots).

To mathematically represent the classification of the two sets, we need to find the linear equation that separates them on the graph. By finding the equation, I mean finding the values of the constants *a* and *c*. Then we can draw a line that perfectly separates the two data sets.

Now we can define a function *Z = y - aX - c*. For a given point, say (3, 4): if *Z > 0*, the point belongs to Set_1, and if *Z < 0*, it belongs to Set_2.
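As a tiny sketch in code (the values *a* = 1 and *c* = 0 are illustrative, not taken from the figure):

```python
# Classify a point by the sign of Z = y - a*x - c.
# The slope a and intercept c here are arbitrary illustrative values.

def classify(x, y, a=1.0, c=0.0):
    """Return 'Set_1' if Z > 0, otherwise 'Set_2'."""
    z = y - a * x - c
    return "Set_1" if z > 0 else "Set_2"

print(classify(3, 4))  # Z = 4 - 3 - 0 = 1 > 0, so: Set_1
```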

Let's rewrite *Z* in a more "machine learning" style. Since *y* and *X* are the only variables, we call them **features** of the system. Why? Because they are the only changing characteristics between the two sets, and when they change, the boundary between the two sets changes.

So *y = (w1)(x1)*, *aX = (w2)(x2)*, and *c = (w0)(x0)*, where *(x0) = 1* and *(w0)*, *(w1)*, and *(w2)* are constants. We will call them **weights** because each one weights a **feature**.

So we rewrite the *Z* function as:

*Z = (w0)(x0) + (w1)(x1) + (w2)(x2)*

What if we have more features? Then the *Z* function looks like this:

*Z = (w0)(x0) + (w1)(x1) + (w2)(x2) + ... + (wj)(xj)*

where *j* is the number of features in the system.
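In code, this is just a dot product of a weight vector and a feature vector (the weight and feature values below are illustrative):

```python
import numpy as np

# Z generalizes to a dot product of weights and features,
# with x0 fixed to 1 so that w0 acts as the constant term.
w = np.array([0.5, -1.0, 2.0])   # w0, w1, w2 (illustrative values)
x = np.array([1.0, 3.0, 4.0])    # x0 = 1, x1, x2

z = np.dot(w, x)                 # 0.5*1 + (-1.0)*3 + 2.0*4
print(z)  # 5.5
```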

Most real-life datasets will not be linearly separable like the graph in fig. 1; we can have a data distribution like the one in fig. 2, for example.

If we have data like in fig. 2, we cannot separate the two sets with a straight line, but we can set a threshold for the error we are willing to tolerate in the model we develop later on.

But essentially, we follow these steps for most machine learning algorithms:

1. Separate the data into two parts: a training dataset and a testing dataset.
2. For each training sample, substitute the feature values into the *Z* function (the **predictor function**).
3. Substitute the value of *Z* into the "__activation function__" *F(Z)* and get the *output*. (An activation function is just a function that maps a range of inputs to a certain output; for instance, a threshold outputs 1 if *Z > num*. Just move on for now.)
4. Determine how much the *output* deviates from the actual value, and update the weights in steady steps, called the "__learning rate__", toward the right answer.
5. Repeat from step 2 (each iteration is called an __epoch__) until a maximum number of iterations is reached and/or the misclassification (deviation from the output) falls within a tolerated threshold.
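These steps can be sketched as a minimal perceptron-style trainer. This is a simplification for illustration: the step activation, learning rate, epoch count, and tiny dataset are all arbitrary choices, not anyone's production code.

```python
import numpy as np

def train(X, targets, learning_rate=0.1, epochs=10):
    X = np.c_[np.ones(len(X)), X]        # prepend x0 = 1 for the bias weight w0
    w = np.zeros(X.shape[1])             # all weights start at zero
    for _ in range(epochs):              # each pass over the data is one epoch
        for xi, target in zip(X, targets):
            z = np.dot(w, xi)            # predictor function Z
            output = 1 if z > 0 else 0   # threshold activation F(Z)
            # update the weights toward the right answer by the learning rate
            w += learning_rate * (target - output) * xi
    return w

# Tiny linearly separable example: points further up and to the right are class 1
X = np.array([[1, 1], [2, 0], [3, 2], [4, 3]])
y = np.array([0, 0, 1, 1])
w = train(X, y)
```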

All these steps are already implemented for the most commonly used machine learning algorithms in the famous __Scikit-Learn__ library. All you have to do is provide the dataset, train the model, and use the trained model afterward (yes, there are a lot more skills needed, but seriously, most of the work is done for you).

Enough talking and explaining; let's get our hands dirty with the classic __iris__ dataset. This dataset consists of 3 different types of __irises__ (Setosa, Versicolour, and Virginica): 150 samples with 4 **features** each (Sepal Length, Sepal Width, Petal Length, and Petal Width).

**Prepare the data and visualize it**

Visualizing and preparing data are the most important skills you need while learning "Machine Learning". Obviously, you will have to deal with a lot of numbers and CSV files. For visualizing data we will use the __seaborn__, __matplotlib__, and __mlxtend__ libraries.

First, we import the libraries.
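The original listing isn't reproduced here, but the imports would look something like this (a sketch assuming seaborn, matplotlib, pandas, and scikit-learn are installed):

```python
# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Data handling and preprocessing (used in the following steps)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
```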

Load the __iris__ data and print the first five entries.
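One way to do this is via scikit-learn's bundled copy of the data, wrapped in a pandas DataFrame (a sketch; seaborn's `load_dataset("iris")` yields an equivalent DataFrame):

```python
import pandas as pd
from sklearn.datasets import load_iris

# scikit-learn ships the iris data; wrap it in a DataFrame for convenience
raw = load_iris()
iris = pd.DataFrame(raw.data, columns=[
    "sepal_length", "sepal_width", "petal_length", "petal_width"])
iris["species"] = [raw.target_names[t] for t in raw.target]

# Four feature columns plus the 'species' label column
print(iris.head())
```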

Now let's plot two features: sepal length against petal length.
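A minimal sketch of that plot with seaborn (the data loading mirrors the previous step; seaborn's bundled iris dataset would work just as well):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

# Build the iris DataFrame
raw = load_iris()
iris = pd.DataFrame(raw.data, columns=[
    "sepal_length", "sepal_width", "petal_length", "petal_width"])
iris["species"] = [raw.target_names[t] for t in raw.target]

# Scatter plot of two features, color-coded by species
ax = sns.scatterplot(data=iris, x="sepal_length", y="petal_length",
                     hue="species")
plt.show()
```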

As you can see, we have plotted two features, and with the help of the __seaborn__ library we can now see that the *'species'* column has three categories, color-coded red, orange, and green. Obviously, we can choose any two features to plot and see the relation between them. We could also plot in more than two dimensions, but that is out of scope for now.

As we discussed, we need the samples' feature values *(x)* and the corresponding *Z*:

*Z = (w0)(x0) + (w1)(x1) + (w2)(x2) + ... + (wj)(xj)*

Each line of *print(iris.head())* is a sample, where each feature is an *(x)* and the corresponding *Z* is the species. So now we need to extract these from the iris variable.

Right now, *Z* is a column of string labels ('setosa', 'versicolor', 'virginica').

That is not computationally efficient; we need to convert the string class labels to integer class labels. This saves computing power for the more demanding tasks the processor has to do.
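One possible encoding, using a simple pandas mapping (the 0/1/2 assignment is an arbitrary but common choice):

```python
import pandas as pd
from sklearn.datasets import load_iris

raw = load_iris()
iris = pd.DataFrame(raw.data, columns=[
    "sepal_length", "sepal_width", "petal_length", "petal_width"])
iris["species"] = [raw.target_names[t] for t in raw.target]

# Map the three string labels to integer class labels
mapping = {"setosa": 0, "versicolor": 1, "virginica": 2}
Z = iris["species"].map(mapping)
print(Z.unique())  # [0 1 2]
```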

Now *Z* is a column of integer labels (0, 1, and 2).

For the X data, we need to perform 2 steps:

1. Split the data into training samples and test samples.
2. Apply __"Feature Scaling"__; mainly we will use the __Standardization__ method.

You may want to read in the mentioned Wikipedia link why we scale our features, but in short, scaling the features optimizes the performance of our system.
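Both steps can be sketched with scikit-learn's utilities (the 30% test split and the random seed are illustrative choices, not requirements):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Step 1: hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Step 2: standardize; fit the scaler on the training data only,
# then apply the same transformation to both sets
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
```

Note that the scaler is fitted on the training data only; reusing its mean and standard deviation on the test data avoids leaking information from the test set into the model.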

**Conclusion**

In this article, we discussed the main concept behind what we are trying to do. We understood what the *Z* function is, and that, after all, we'll pass the value of the *Z* function to an "__activation function__". We learned some new terminology like "learning rate" and "epochs"; next we will learn how to use them in our code.

We also plotted our data using the __seaborn__ library and converted string class labels to integer class labels. We also standardized our samples and split them into training and testing samples using the __scikit-learn__ library.

The code below is the final code we will build on next for all our training with the __scikit-learn__ library.
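Since the original listing isn't reproduced here, this is a sketch of the full preparation pipeline as described above (loading via scikit-learn's bundled copy of the data; seaborn's `load_dataset("iris")` is equivalent):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the iris data into a DataFrame
raw = load_iris()
iris = pd.DataFrame(raw.data, columns=[
    "sepal_length", "sepal_width", "petal_length", "petal_width"])
iris["species"] = [raw.target_names[t] for t in raw.target]

# Convert string class labels to integer class labels
Z = iris["species"].map({"setosa": 0, "versicolor": 1, "virginica": 2}).values

# Feature matrix
X = iris[["sepal_length", "sepal_width",
          "petal_length", "petal_width"]].values

# Split into training and testing samples (30% held out for testing)
X_train, X_test, Z_train, Z_test = train_test_split(
    X, Z, test_size=0.3, random_state=1, stratify=Z)

# Standardize the features, fitting the scaler on the training data only
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
```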

Finally, let me know what you think about this blog. Was it useful? Did I oversimplify the topic? Did I overcomplicate it?

Please leave a comment below, and if you have any questions, I will be happy to answer.

Don't forget to subscribe, join our forum, and share this with friends.