As discussed previously, regression and classification problems both fall under supervised learning. Today's topic is regression problems.
Within regression, today we will be learning about linear regression. "Regression" means that we are trying to predict a real-valued output, learning from examples for which the "right answer" is provided. "Linear" means that we assume a linear relationship between the input variables and the single output variable. This is the simplest form of regression, so it is a natural starting point for getting familiar with regression.
Now for some notation. Consider the previous car example, where we have different models of cars and their second-hand buying price. Modifying it slightly, we now record two prices for each car: the price when the car is newly purchased and its second-hand selling price. For this example, let us assume that after a year the selling price of a car drops by 30 percent, which gives the second column. This now forms our training data set.
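To make the 30 percent assumption concrete, here is a minimal sketch that builds such a training set. The new-car prices are hypothetical placeholders, since the original table is not reproduced here. Only the 30 percent first-year drop comes from the text. The name `second_hand_price` is an illustrative helper, not part of any library.

```python
# Hypothetical new-car prices; only the 30% first-year drop comes from the text.
new_prices = [20000, 15000, 30000, 12000, 25000]

DROP = 0.30  # assumed fraction of value lost after one year

def second_hand_price(new_price: float) -> float:
    """Selling price after one year: the new price minus 30%, rounded to cents."""
    return round(new_price * (1 - DROP), 2)

# Each row pairs the input x (new price) with the output y (second-hand price).
training_set = [(p, second_hand_price(p)) for p in new_prices]

for x, y in training_set:
    print(x, y)
```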
Consider the following:
n - let ‘n’ be the number of training examples, in our case, it is 5
x – let ‘x’ denote the input variable. These input variables are also known as "features".
y – let ‘y’ denote the output variable. In simple words, it is the output/target variable that the algorithm will predict.
(x, y) denotes a single row of the training data set. You can access the i-th row as (x(i), y(i)), where i is the row number, starting from 1. Note that (i) is an index, not an exponent.
h – represents hypothesis function
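The notation above can be mirrored in a few lines of Python. This is a sketch under stated assumptions. The five (x, y) rows are hypothetical values consistent with the 30 percent drop, and `example` is an illustrative helper that keeps the 1-based row numbering used in the text.

```python
# Hypothetical training set: (x, y) = (new price, second-hand price).
training_set = [
    (20000, 14000.0),
    (15000, 10500.0),
    (30000, 21000.0),
    (12000, 8400.0),
    (25000, 17500.0),
]

n = len(training_set)  # number of training examples

def example(i: int) -> tuple:
    """Return (x(i), y(i)) using the 1-based row numbering from the text."""
    if not 1 <= i <= n:
        raise IndexError(f"i must be between 1 and {n}")
    return training_set[i - 1]  # Python lists are 0-based, so shift by one

x_2, y_2 = example(2)
print(n, x_2, y_2)  # 5 15000 10500.0
```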
We first feed the training data set above to the learning algorithm, which outputs a function called the "hypothesis". The name is historical: it has been in use for a long time and nobody has bothered to change it. The hypothesis function takes an input x and tries to produce an estimate of the corresponding y. Simply put, the hypothesis is a function that maps x to y. The first step is therefore to decide how to represent the hypothesis function.
The hypothesis function is represented as a linear equation that combines a specific set of input values (x) to produce the predicted output value (y). Both the input values (x) and the output value (y) are numeric.
The linear equation assigns one scale factor, called a coefficient, to each input value or column. These coefficients are represented by the Greek letter theta (θ). One additional coefficient gives the line an extra degree of freedom (e.g. moving it up and down on a two-dimensional plot), and it is often called the intercept or the bias coefficient.
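The structure described above (one coefficient per input plus an intercept) can be sketched as a small function. `linear_predict` is a hypothetical name chosen for illustration. Here theta[0] plays the role of the intercept, and theta[j] scales input column j.

```python
from typing import Sequence

def linear_predict(theta: Sequence[float], x: Sequence[float]) -> float:
    """Return theta[0] + theta[1]*x[0] + ... + theta[k]*x[k-1].

    theta[0] is the intercept (bias); each remaining theta scales one input.
    """
    if len(theta) != len(x) + 1:
        raise ValueError("need one coefficient per input plus one intercept")
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))

# Two inputs: the intercept 1.0 shifts the result, 2.0 and 0.5 scale the inputs.
print(linear_predict([1.0, 2.0, 0.5], [3.0, 4.0]))  # 9.0
```

With a single input, the same function reduces to the univariate case discussed next.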
When there is only one input variable, this kind of regression model is known as univariate linear regression: we predict the output as a function of a single input variable (x). There are more complex models, but it is best to start with the simplest of all, linear regression with just one variable.
Given the training data set, we now want to predict the second-hand price of a new car, "CAR E", whose price when new has been provided. With a single input, the hypothesis can be written as h(x) = θ0 + θ1·x, where θ0 (theta zero, the intercept) and θ1 (theta one, the coefficient of x) are the parameters of the model.
Given values of θ0 and θ1, predicting the output is trivial: we simply evaluate the hypothesis function at the new input.
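As a worked example, evaluating h(x) = θ0 + θ1·x only takes a single line. This sketch assumes the post's 30 percent depreciation rule, under which y = 0.7·x exactly, so θ0 = 0 and θ1 = 0.7 describe the training data perfectly. The new price of CAR E (18000) is a hypothetical value, and `hypothesis` is an illustrative name.

```python
def hypothesis(theta0: float, theta1: float, x: float) -> float:
    """Univariate hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Under the 30% depreciation assumption, y = 0.7 * x exactly,
# so these parameter values fit the training data perfectly.
THETA0, THETA1 = 0.0, 0.7

car_e_new_price = 18000  # hypothetical new price for CAR E
predicted = hypothesis(THETA0, THETA1, car_e_new_price)
print(f"Predicted second-hand price of CAR E: {predicted:.2f}")
# Predicted second-hand price of CAR E: 12600.00
```

With real data, the parameters would not be known in advance and would have to be learned from the training set.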
In linear regression, we start from a training set and need to choose values for the parameters θ0 (theta zero) and θ1 (theta one) so that the resulting straight line fits the data well. How to choose these values will be discussed in the next post.
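To illustrate what "fits the data well" means, the sketch below compares two hand-picked parameter pairs on hypothetical training rows (consistent with the 30 percent drop). It measures how far each line's predictions are from the actual outputs. Summing squared residuals is one common way to score a fit and is used here only as an assumption for comparison. Choosing the parameters systematically is left to the next post. The helper names are hypothetical.

```python
# Hypothetical training set built with the 30% depreciation assumption.
training_set = [(20000, 14000.0), (15000, 10500.0), (30000, 21000.0),
                (12000, 8400.0), (25000, 17500.0)]

def residuals(theta0, theta1, data):
    """Prediction minus actual value, h(x) - y, for every training row."""
    return [theta0 + theta1 * x - y for x, y in data]

def sum_of_squares(values):
    """Add up squared residuals so larger misses count more."""
    return sum(v * v for v in values)

good = residuals(0.0, 0.7, training_set)     # the line implied by the 30% rule
poor = residuals(1000.0, 0.5, training_set)  # an arbitrary worse line
print(sum_of_squares(good) < sum_of_squares(poor))  # True
```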