Defining a Cost Function and Figuring Out How to Fit a Graph

         
 JUNE 26, 2018   by SmritiS

In my previous post, we explored linear regression and how to predict values given a dataset with certain parameters. We also discussed choosing the right values so that an almost straight line fits the data properly. In this post, we need to come up with values for the parameters $\theta_0$ (theta zero) and $\theta_1$ (theta one) such that each predicted value is close to the actual value provided in the data.

I will extend my previous post's car dataset example, where a car's second-hand price is predicted from its price when new, and check how close each prediction is to the actual value using the squared error function. The squared error function measures the difference between the predicted value and the value provided. The smaller its value, the better the fit.

How do we find values of $\theta_0$ (theta 0) and $\theta_1$ (theta 1) that give the right fit, that is, a straight line that follows the data? As a brief recap of my previous post, $\theta_0$ and $\theta_1$ are the parameters of the hypothesis function: the intercept and the slope of the line, respectively. They are what we adjust in curve fitting, and together they determine the hypothesis function and hence the values it predicts.

We can choose any values for $\theta_0$ and $\theta_1$, but to fit a straight line to the data it is important to have a strategy: choose $\theta_0$ and $\theta_1$ so that h(x) is close to y, where y comes from the training dataset (x, y). Recall that our original task is to predict the second-hand price of a car from its price when brand new, using training examples that give both prices (see the previous tutorial for details). The difference between the predicted value (from the hypothesis function) and the actual value should be as small as possible, i.e.

$$h(x) - y$$

where h(x) is the predicted value of Car D and y is the actual second-hand price of Car D, must be minimum. This also implies that the square of this difference should be minimum, i.e.

$$\left(h(x) - y\right)^2$$

A natural question is why we take the square of the difference (that is, why we use the squared error cost function). Squaring makes every error positive, so over-predictions and under-predictions cannot cancel each other out, and it has turned out to work well for most regression problems. There are other equally good functions, but the squared error cost function works well for many problems.

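Therefore,

$$\frac{1}{2n}\sum_{i=1}^{n}\left(h(x_i) - y_i\right)^2$$

should be minimum. Here, n is the number of training examples, $\sum$ denotes summation over those examples, and the factor of one half is included to ease later mathematical calculations, where

$$h(x_i) = \theta_0 + \theta_1 x_i$$

Formally,

$$J(\theta_0, \theta_1)$$

should be minimum, where

$$J(\theta_0, \theta_1) = \frac{1}{2n}\sum_{i=1}^{n}\left(h(x_i) - y_i\right)^2$$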

This is known as the cost function, or the squared error cost function. We need to find the values of $\theta_0$ and $\theta_1$ that minimize it, so that the difference between the predicted values and the actual values is as small as possible.
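
As a minimal sketch of the squared error cost described above, the Python below computes one over 2n times the sum of squared differences between predictions and actual values for a straight-line hypothesis. The function names `hypothesis` and `squared_error_cost` and the three data points are assumptions made up for illustration; they are not the car dataset from this series.

```python
def hypothesis(theta0, theta1, x):
    """Straight-line prediction h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def squared_error_cost(theta0, theta1, xs, ys):
    """Return (1 / 2n) * sum((h(x) - y)^2) over n training examples."""
    n = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * n)


# Toy points (illustrative only, not the car data) that lie exactly on y = 2x.
toy_x = [1.0, 2.0, 3.0]
toy_y = [2.0, 4.0, 6.0]

perfect_fit = squared_error_cost(0.0, 2.0, toy_x, toy_y)  # line passes through every point
poor_fit = squared_error_cost(0.0, 0.0, toy_x, toy_y)     # h(x) = 0 for every x
print(perfect_fit, poor_fit)
```

The line that passes through every toy point has a cost of zero, while the flat line h(x) = 0 has a clearly larger cost, which is the behaviour we want from a measure of fit.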

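Let us simplify the hypothesis function to make the cost function easier to work with. Until specified otherwise, assume $\theta_0$ = 0. Our hypothesis function then becomes

$$h(x) = \theta_1 x$$

Consequently, our cost function becomes

$$J(\theta_1) = \frac{1}{2n}\sum_{i=1}^{n}\left(\theta_1 x_i - y_i\right)^2$$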

Now, let us put all the above expressions to work and try to predict the second-hand price of a car, given a set of cars with their original and second-hand prices. We will also find out whether the second-hand price of Car E provided in the data is close to our predicted value (that is, whether it fits a reasonably straight line).

Car E has an original price of 12.78 L and a second-hand price of 8.94 L. Initially, $\theta_0$ = 0 and $\theta_1$ = 0, which gives h(x) = 0. With h(x) = 0, the sum of (h(x) - y)^2 over the dataset comes out to 190.09. Dividing by 2 times the number of observations gives a cost of roughly 19.01. This is a very high value of the cost function, which leads us to look for parameter values that give better predictions and a lower cost.
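
The arithmetic for this starting point can be checked in a few lines of Python. The sketch below uses only figures quoted in this post (Car E's second-hand price of 8.94, the squared-error total of 190.09, and the five observations used in the calculation); the variable names are illustrative assumptions.

```python
# Figures quoted in the post; variable names are illustrative.
car_e_actual = 8.94          # second-hand price of Car E
prediction = 0.0             # h(x) = 0 when both parameters are 0
sum_squared_errors = 190.09  # total of (h(x) - y)^2 over all cars, as stated
n_observations = 5           # number of cars in the worked example

# Car E's own contribution to the total when the prediction is 0.
car_e_squared_error = (prediction - car_e_actual) ** 2

# Cost = sum of squared errors / (2 * n).
cost = sum_squared_errors / (2 * n_observations)

print(car_e_squared_error, cost)
```

Car E alone contributes 8.94 squared, a little under 80, to the total of 190.09, and the cost evaluates to 19.009.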

Now, we need to calculate (1/2n) × (sum of squared errors), which becomes 1/(2 × 5) × (sum of squared errors), where n is the number of observations. This yields a cost J of 1.49, which is decently low. The parameters of h(x) can be modified further to yield an even lower J.

Consider the equation where we assume= x and = - 0.25. This leads to the following predicted values.

It is clear from the above table that the actual second-hand values and the predicted values are similar. The values of $\theta_0$ and $\theta_1$ can be tweaked further to yield even better predictions.
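
Tweaking the parameters by hand can be mimicked with a simple search. The sketch below, assuming the simplified hypothesis with theta 0 fixed at 0, tries a handful of candidate values for theta 1 on made-up data and keeps the one with the lowest cost. The data points, the candidate list, and the function name `cost_for_slope` are assumptions for illustration; they are not values from the car dataset.

```python
def cost_for_slope(theta1, xs, ys):
    """Squared error cost (1 / 2n) * sum((theta1 * x - y)^2), with theta0 fixed at 0."""
    n = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)


# Made-up points lying exactly on y = 0.5 * x (illustrative only).
toy_x = [2.0, 4.0, 6.0, 8.0]
toy_y = [1.0, 2.0, 3.0, 4.0]

# Hand-picked candidate slopes to compare.
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]
costs = {theta1: cost_for_slope(theta1, toy_x, toy_y) for theta1 in candidates}

# The candidate with the smallest cost gives the best-fitting line among those tried.
best_theta1 = min(costs, key=costs.get)
print(best_theta1, costs[best_theta1])
```

On this toy data the cost falls as the slope approaches 0.5 and rises again beyond it, so the search settles on 0.5. Trying candidates one by one is only practical for a single parameter and a short list of values.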

NOTE: All the squared values are precise up to 2 decimal places only. The values might change if the precision is higher.

This is how a cost function is calculated and a relationship is established between the dependent variable (y - second-hand car price) and independent variable (x - original price of the car). You can use datasets from Kaggle and apply linear regression to get better at it.

In the upcoming posts, I will discuss artificial intelligence and machine learning, deep learning and much more. So stay tuned.

