
Linear Regression with Gradient Descent

Posted in Machine Learning   LAST UPDATED: FEBRUARY 11, 2020

    In a regression problem, the program predicts a real-valued output. In linear regression, we have a training set (or dataset) to which we have to fit the best possible straight line, i.e. the best possible linear relation, with the least possible error.




    Regression problem:

    Let's say that we have a dataset of used vehicles that consists of a relation between kilometers driven and price. We have observed the sale of used vehicles for 6 months and came up with this data, which we will call our training set. Using the training set, a linear relation (a straight line) has to be generated with as little error as possible. The linear relation (let's name it the hypothesis) will then be able to predict the price of any other vehicle whenever we give it the kilometers driven as input.

    The linear relation can be represented as:

    O = a1 + (a2)*x    { the equation of a straight line, y = mx + c }    ...(1)


    Our dataset includes the sale records of 50 used vehicles. So let's denote our number of training examples by T,

    T = 50

    And, x = input (which we will enter), and

    O = output (predicted price by the program)

    Now, as you can see in equation (1) above, a1 and a2 are the parameters which determine the linear relation, or in simple words, the orientation of the line (its slope and where it sits).

    The values of the parameters a1 and a2 should be chosen such that the error is as small as possible. Here the sum of squared errors will be our measure for evaluating the relation or model, after which we have to work on our hypothesis so that it generates the best possible line with the least error.

    Squaring the error is necessary to eliminate negative values, so that errors in opposite directions don't cancel each other out; it also penalizes larger errors more heavily.

    Our hypothesis will be,

    H(x) = a1 + (a2)*x

    So our squared error cost function or mean squared error would look like this:

    E(a1, a2) = (1/2T) * Σ [i = 1 to T] ( H(x_i) - y_i )^2

    where (x_i, y_i) is the i-th training example, i.e. the kilometers driven and the actual selling price of the i-th vehicle.
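    To make the hypothesis and the cost function concrete, here is a minimal Python/NumPy sketch. The vehicle numbers and the helper names (hypothesis, cost) are made up for illustration; they are not from the article.

    import numpy as np

    # Toy training set (made-up numbers): kilometers driven and actual selling price.
    x = np.array([15000.0, 30000.0, 45000.0, 60000.0, 80000.0])   # kilometers driven
    y = np.array([9000.0, 7800.0, 6900.0, 6100.0, 5000.0])        # actual price
    T = len(x)                                                     # number of training examples

    def hypothesis(a1, a2, x):
        """H(x) = a1 + a2*x -- the straight line our model predicts with."""
        return a1 + a2 * x

    def cost(a1, a2, x, y):
        """E(a1, a2) = (1/2T) * sum((H(x_i) - y_i)^2), the squared error cost."""
        errors = hypothesis(a1, a2, x) - y
        return np.sum(errors ** 2) / (2 * len(x))

    # Two different parameter choices give very different costs.
    print(cost(10000.0, -0.06, x, y))   # a reasonable line -> comparatively small cost
    print(cost(0.0, 0.0, x, y))         # predicting 0 for every vehicle -> huge cost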

    About H(x),

    • For a fixed a2, this is a function of x.
    • For different values of a2, the slope of this function varies (something like this),

      [Figure: lines H(x) with different slopes for different values of a2]

    About E(a2) {let's assume E depends on only one parameter, for better understanding}

    • It is a function of the parameter a2,
    • which controls the slope of the line (as seen below),

      [Figure: the cost E plotted as a function of a2]

    Now, to minimize the squared error cost function E, we have an algorithm called Gradient Descent. It is used all over machine learning. Let's state the problem: we have E(a1, a2), and we want to find values of a1 and a2 such that E(a1, a2) is minimum.

    So what we are going to do is:

    • Start with some random values of a1 and a2.

    • Keep changing a1, a2 to reduce E(a1, a2) until we reach the minimum.




    Gradient Descent Algorithm

    Repeat until convergence {

    a_j := a_j - α * (∂/∂a_j) E(a1, a2)    (for j = 1 and j = 2)

    }

    Now we also have to update the values of our parameters simultaneously.

    The correct way is:

    Variable1 := a1 - α * (∂/∂a1) E(a1, a2)

    Variable2 := a2 - α * (∂/∂a2) E(a1, a2)

    a1 := Variable1

    a2 := Variable2

    Here := is the assignment operator, = is considered a truth assertion, and α is the learning rate.

    The incorrect way is:

    Variable1 := a1 - α * (∂/∂a1) E(a1, a2)
    a1 := Variable1
    Variable2 := a2 - α * (∂/∂a2) E(a1, a2)
    a2 := Variable2

    Here a1 is overwritten before Variable2 is computed, so the derivative for a2 is evaluated with the new a1 instead of the old one, which is not the update we defined.
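    As a small Python sketch of the difference (the quadratic cost used here and the finite-difference derivatives are just illustrative stand-ins, not the article's cost function):

    # Partial derivatives approximated numerically, so we don't need the analytic
    # formulas that are derived later in the article.
    def dE_da1(a1, a2, cost, eps=1e-6):
        return (cost(a1 + eps, a2) - cost(a1 - eps, a2)) / (2 * eps)

    def dE_da2(a1, a2, cost, eps=1e-6):
        return (cost(a1, a2 + eps) - cost(a1, a2 - eps)) / (2 * eps)

    def correct_update(a1, a2, cost, alpha):
        # Both derivatives are evaluated at the OLD (a1, a2) before either parameter changes.
        temp1 = a1 - alpha * dE_da1(a1, a2, cost)
        temp2 = a2 - alpha * dE_da2(a1, a2, cost)
        return temp1, temp2

    def incorrect_update(a1, a2, cost, alpha):
        # a1 is overwritten first, so the derivative for a2 already sees the NEW a1.
        a1 = a1 - alpha * dE_da1(a1, a2, cost)
        a2 = a2 - alpha * dE_da2(a1, a2, cost)
        return a1, a2

    # A made-up cost, just to show that the two update orders give different results.
    cost = lambda a1, a2: (a1 - 3) ** 2 + (a1 * a2 - 6) ** 2
    print(correct_update(1.0, 1.0, cost, alpha=0.1))     # ≈ (2.4, 2.0)
    print(incorrect_update(1.0, 1.0, cost, alpha=0.1))   # ≈ (2.4, 2.73) -- a2 differs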

    About parameter a2,

    • If a2 > 0, then x (the input, feature, or predictor) and O (the output or target) have a positive relationship, i.e. if x increases then O will also increase.

    • If a2 < 0, then x (the input, feature, or predictor) and O (the output or target) have a negative relationship, i.e. if x increases then O will decrease.

    About parameter a1,

    • It determines the intercept of the line generated by our hypothesis.

    About the learning rate α,

    • If the learning rate is too small, gradient descent can be very slow.

    • If the learning rate is too large, then gradient descent can overshoot the minimum point of the function. As a result, it may fail to converge, or even diverge in some cases (a small code sketch after this list demonstrates both effects).

      [Figure: gradient descent overshooting the minimum when the learning rate is too large]

    • Gradient descent can converge to a local minimum, even when the learning rate α is fixed.

    • As we approach a local minimum, gradient descent will automatically take smaller steps, hence we don't need to decrease α over time. We can see that in the image below,

      [Figure: gradient descent taking progressively smaller steps as it approaches the minimum]
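    A rough one-parameter sketch of this behaviour (the cost E(a2) = (a2 - 5)^2 and the specific learning rates are made up for illustration):

    # Gradient descent on E(a2) = (a2 - 5)^2, whose minimum is at a2 = 5.
    def gradient_descent_1d(alpha, a2=0.0, steps=20):
        for _ in range(steps):
            grad = 2 * (a2 - 5)      # dE/da2
            a2 = a2 - alpha * grad
        return a2

    print(gradient_descent_1d(alpha=0.01))   # too small: after 20 steps still far from 5
    print(gradient_descent_1d(alpha=0.5))    # well chosen: reaches 5 in a single step
    print(gradient_descent_1d(alpha=1.1))    # too large: overshoots and moves away from 5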


    Putting it all together,

    Gradient Descent Algorithm is,

    Repeat until convergence {

    a_j := a_j - α * (∂/∂a_j) E(a1, a2)    (for j = 1 and j = 2)

    }

    And our linear regression model is,

    H(x) = a1 + (a2)*x

    E(a1, a2) = (1/2T) * Σ [i = 1 to T] ( H(x_i) - y_i )^2

    Now let's combine them. For that, we need to take the partial derivative of E(a1, a2) with respect to a1 and a2,

    (∂/∂a_j) E(a1, a2) = (∂/∂a_j) (1/2T) * Σ [i = 1 to T] ( H(x_i) - y_i )^2

                       = (∂/∂a_j) (1/2T) * Σ [i = 1 to T] ( a1 + (a2)*x_i - y_i )^2

    [ substituting the value of H(x)]

    After doing the partial derivative we get,

    (∂/∂a1) E(a1, a2) = (1/T) * Σ [i = 1 to T] ( H(x_i) - y_i )

    (∂/∂a2) E(a1, a2) = (1/T) * Σ [i = 1 to T] ( H(x_i) - y_i ) * x_i
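    These two formulas translate directly into NumPy; here is a small sketch (the function name gradients and the toy numbers are illustrative assumptions):

    import numpy as np

    def gradients(a1, a2, x, y):
        """Analytic partial derivatives of E(a1, a2), following the formulas above."""
        T = len(x)
        errors = (a1 + a2 * x) - y            # H(x_i) - y_i for every training example
        dE_da1 = np.sum(errors) / T           # (1/T) * Σ ( H(x_i) - y_i )
        dE_da2 = np.sum(errors * x) / T       # (1/T) * Σ ( H(x_i) - y_i ) * x_i
        return dE_da1, dE_da2

    # Toy data (made-up numbers) just to exercise the function.
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([3.0, 5.0, 7.0])
    print(gradients(0.0, 0.0, x, y))   # both are negative, so both parameters should increase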

    So here's what our algorithm looks like:

    Repeat until convergence {

    a1 := a1 - α * (1/T) * Σ [i = 1 to T] ( H(x_i) - y_i )

    a2 := a2 - α * (1/T) * Σ [i = 1 to T] ( H(x_i) - y_i ) * x_i

    } and we have to update a1 and a2 simultaneously.

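    Putting the whole procedure into code, here is a rough Python/NumPy sketch of the final algorithm (the function name, the data lying exactly on y = 2x + 1, and the fixed iteration count standing in for "repeat until convergence" are all illustrative assumptions):

    import numpy as np

    def gradient_descent(x, y, alpha=0.1, iterations=1000):
        """Fits H(x) = a1 + a2*x by repeating the update rule above."""
        a1, a2 = 0.0, 0.0                                  # start from some initial values
        T = len(x)
        for _ in range(iterations):                        # stands in for "repeat until convergence"
            errors = (a1 + a2 * x) - y                     # H(x_i) - y_i
            temp1 = a1 - alpha * np.sum(errors) / T        # a1 := a1 - α * (∂/∂a1) E
            temp2 = a2 - alpha * np.sum(errors * x) / T    # a2 := a2 - α * (∂/∂a2) E
            a1, a2 = temp1, temp2                          # simultaneous update
        return a1, a2

    # Data lying exactly on the line y = 2x + 1, so we expect a1 ≈ 1 and a2 ≈ 2.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
    print(gradient_descent(x, y))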
