So far, we have only seen the theoretical aspects and the mathematical workings of a few machine learning algorithms. Today, we will look at the Python implementation of the same algorithms.
The next question is: why Python? Why not R, Java, or any other language?
There are a few reasons:
It is concise and simple to understand and interpret.
Machine learning algorithms can get extremely complicated, and if the language or framework is also complicated, developing code in it becomes difficult.
Python has a plethora of libraries (TensorFlow, scikit-learn, NumPy) that can simply be imported and used to implement algorithms.
It is supported on platforms such as Linux, macOS, and Windows.
If you are new to Python you can easily learn Python in no time from Studytonight.
To run the code examples below, you need Spyder (Python 3.7). Spyder is a powerful IDE written in Python. You can download it from here: https://www.spyder-ide.org/ (This IDE comes pre-installed with all the required libraries.)
Before running the code below, install the necessary packages using "pip install PACKAGE-NAME". It is highly recommended to visit the Anaconda site, search for the required packages, and download them from there.
In the code example below, we use the NumPy, scikit-learn, and Matplotlib libraries to implement Linear Regression, which is fitted by minimizing the mean squared error.
The code above follows these steps:
First, we generate random (x, y) values using the NumPy library.
Then we initialize the LinearRegression() model.
We then fit the model to the (x, y) data and use it to make predictions.
Finally, the program prints the slope, intercept, root mean squared error, and R2 score before plotting a graph using the Matplotlib library.
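The steps above can be sketched roughly as follows. This is a minimal sketch, not the article's original listing: the synthetic data (y = 2 + 3x plus noise), the random seed, and the helper name plot_fit are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Assumption: synthetic data of the form y = 2 + 3x + noise.
# The seed, sample size and coefficients are illustrative only.
rng = np.random.default_rng(0)
x = rng.random((100, 1))                           # 100 samples, 1 feature
y = 2 + 3 * x[:, 0] + rng.normal(0, 0.1, 100)      # noisy linear relationship

model = LinearRegression()    # ordinary least squares model
model.fit(x, y)               # learn the slope and intercept
y_pred = model.predict(x)     # predictions for the training inputs

slope = model.coef_[0]
intercept = model.intercept_
rmse = np.sqrt(mean_squared_error(y, y_pred))      # root mean squared error
r2 = r2_score(y, y_pred)

print("Slope:", slope)
print("Intercept:", intercept)
print("Root mean squared error:", rmse)
print("R2 score:", r2)


def plot_fit(x, y, y_pred):
    """Hypothetical helper: scatter the data and draw the fitted line."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    order = np.argsort(x[:, 0])      # sort by x so the line is drawn left to right
    plt.scatter(x[:, 0], y, s=10)
    plt.plot(x[order, 0], y_pred[order], color="r")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()
```

Calling plot_fit(x, y, y_pred) afterwards draws the scatter plot with the fitted line. With this setup the printed slope and intercept should land close to the values used to generate the data.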
We also familiarized ourselves with Logistic Regression and its mathematical working.
One important point to note here is that, just as Linear Regression is fitted by minimizing the Mean Squared Error, Logistic Regression is fitted using Maximum Likelihood Estimation (MLE). MLE is an iterative process that starts with a random weight for each predictor (independent variable) and continues until the optimum weights are reached, that is, until the output changes little or not at all when the weights change.
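One common way to carry out this iteration is gradient ascent on the log-likelihood. The notation below is an assumption introduced for illustration (weights w, inputs x_i, labels y_i in {0, 1}, learning rate alpha); other optimizers are also used in practice.

```latex
% Predicted probability for sample i (sigmoid of the linear score)
p_i = \sigma(w^\top x_i) = \frac{1}{1 + e^{-w^\top x_i}}

% Log-likelihood of the labels under the model
\ell(w) = \sum_{i=1}^{n} \Big[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Big]

% Its gradient, and the update repeated until w barely changes
\nabla \ell(w) = \sum_{i=1}^{n} (y_i - p_i)\, x_i
\qquad
w \leftarrow w + \alpha \, \nabla \ell(w)
```

Each update moves the weights in the direction that makes the observed labels more probable, which matches the stopping rule described above.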
Logistic Regression can be implemented using scikit-learn or from scratch. First, we will see how to use the scikit-learn library to implement Logistic Regression.
Below is the scikit-learn implementation of Logistic Regression:
As you can see above, we are reading data from the marks.csv file and then applying the Logistic Regression algorithm on it.
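A scikit-learn version along these lines might look like the sketch below. The contents of marks.csv are not reproduced here, so the sketch generates a synthetic stand-in: two hypothetical mark columns and a pass/fail label. The pass rule, sample size, and split ratio are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: a synthetic stand-in for marks.csv with two mark columns
# and a binary label. With the real file you would load it instead.
rng = np.random.default_rng(0)
marks = rng.uniform(0, 100, size=(200, 2))         # two hypothetical exam marks
labels = (marks.sum(axis=1) > 100).astype(int)     # hypothetical pass rule

# Hold out a quarter of the rows to check the fitted model
X_train, X_test, y_train, y_test = train_test_split(
    marks, labels, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=1000)   # raise the iteration cap for unscaled marks
clf.fit(X_train, y_train)                 # estimate the weights from the training rows
accuracy = clf.score(X_test, y_test)      # fraction of correct test predictions

print("Coefficients:", clf.coef_)
print("Intercept:", clf.intercept_)
print("Test accuracy:", accuracy)
```

To use the real data, the two synthetic arrays would be replaced by the feature and label columns read from marks.csv.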
Now we will implement Logistic Regression from scratch, without using the scikit-learn library. The data that we are using is saved in the marks.csv file, which you can see in the terminal.
NOTE: Copy the data from the terminal below, paste it into an Excel sheet, split each row into 3 separate cells, save the sheet as a CSV file, and then start working.
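A from-scratch version can be sketched with NumPy alone, as below. This is an illustrative sketch rather than the article's original listing: the function names (sigmoid, fit_logistic, predict), the learning rate, the stopping tolerance, and the synthetic stand-in for marks.csv are all assumptions.

```python
import numpy as np


def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=5000, tol=1e-6, seed=0):
    """Fit weights by gradient ascent on the mean log-likelihood."""
    rng = np.random.default_rng(seed)
    Xb = np.column_stack([np.ones(len(X)), X])   # add an intercept column
    w = rng.normal(0.0, 0.01, Xb.shape[1])       # small random starting weights
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)                      # current predicted probabilities
        step = lr * Xb.T @ (y - p) / len(y)      # learning rate times the gradient
        w += step
        if np.max(np.abs(step)) < tol:           # stop once the weights barely change
            break
    return w


def predict(X, w):
    """Label a row 1 when its predicted probability is at least 0.5."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return (sigmoid(Xb @ w) >= 0.5).astype(int)


# Assumption: synthetic stand-in for marks.csv (two marks, one binary label)
rng = np.random.default_rng(1)
marks = rng.uniform(0, 100, size=(200, 2))
labels = (marks.sum(axis=1) > 100).astype(int)

# Standardize the marks so a single learning rate suits both columns
features = (marks - marks.mean(axis=0)) / marks.std(axis=0)

weights = fit_logistic(features, labels)
accuracy = np.mean(predict(features, weights) == labels)

print("Weights (intercept first):", weights)
print("Training accuracy:", accuracy)
```

Standardizing the inputs is a design choice in this sketch: plain gradient ascent converges slowly when features are on a 0 to 100 scale.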
The model parameters vary greatly between the scikit-learn implementation and the from-scratch implementation. To understand the reason behind this, don't forget to tune in to my next article.
| Linear Regression | Logistic Regression |
|---|---|
| The output (outcome/dependent variable) is a continuous value, i.e., it can take any value. | The output (outcome/dependent variable) is discrete, i.e., it can take only a limited set of values. |
| It is used in cases where a value is to be predicted. For example: predicting house prices when the area, locality, and other independent attributes have been provided. | It is used in cases where the response variable is binary/categorical. For example: whether it will rain today or not, or whether a student will pass or fail. |
| The equation used to calculate linear regression is, in its simplest single-predictor form, y = b0 + b1x, with intercept b0 and slope b1. | The equation used to calculate logistic regression is, in its simplest single-predictor form, p = 1 / (1 + e^-(b0 + b1x)), where p is the predicted probability. |
| Interpreting a coefficient is simple because the equation is first order: hold the other variables constant and observe how the dependent variable changes. | Interpreting a coefficient depends on the family of logistic regression and the link function used (for example, logit). |
| This regression uses the ordinary least squares method to minimize the errors and reach the best possible fit of the data in the graph. | This regression uses the maximum likelihood method to estimate its parameters and predict binary values. |
| Use case: Predicting house prices when provided with a labeled dataset. | Use case: Predicting whether or not the house will be sold for a certain amount X. |
In the next post, we will see some accuracy metrics used to determine the performance of Machine Learning algorithms, a quick guide to decide which algorithm can be used for various real-life scenarios, and some more Machine Learning algorithms and their working.