Types of Loss functions in ML

    Hello Everyone!

    In this article, I will be discussing (with code implementations and explanations) the various loss functions used in ML, along with a comparison of their performance on a graph.

    There are various loss functions used in Machine Learning depending on the user's purpose.

    Out of them, the loss functions that I am going to cover in this article are:

    1. Mean Squared Loss

    2. Mean Absolute Loss

    3. Mean Log Cosh Loss

    4. Root Mean Squared Loss

    So let's get started!

    What are Loss functions in Machine Learning?

    Most of the algorithms in Machine Learning rely on Optimizing (minimizing or maximizing) a function, which we call an Objective Function.

    Out of these, the group of functions that we tend to minimize is called Loss Functions.

    As the name suggests, a loss function is used to determine the loss of information or error in a particular Machine Learning algorithm under consideration.

    Formally speaking:

    A loss function is a measure of how well a prediction model performs in terms of being able to predict the expected outcome.

    Out of the many available methods, the most commonly used one for finding the minimum point of a function (the point of minima, since we are focusing on reducing the error) is Gradient Descent.
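
    To make this concrete, below is a minimal, self-contained sketch of such a gradient descent loop for the Mean Squared Loss (the data, learning rate, and number of epochs are made up purely for illustration; the full training loop used in this article appears in the complete code at the end):

    import numpy as np

    xdata = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]])   # input feature matrix [N X D]
    ydata = np.array([[5.0], [4.0], [11.0]])                  # actual output values [N X 1]
    weights = np.zeros((2, 1))                                # weight vector [D X 1]
    lr = 0.05                                                 # learning rate

    for epoch in range(2000):
        error = np.dot(xdata, weights) - ydata                    # prediction error (X·w - y)
        gradient = 2 * np.dot(xdata.T, error) / xdata.shape[0]    # gradient of the Mean Squared Loss
        weights = weights - lr * gradient                         # step in the direction that reduces the loss

    print(weights)   # approaches the exact solution, here close to [[1], [2]]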

    Categorization of Loss Functions:

    All the loss functions can be broadly categorized into 2 types:

    1. Classification Loss

    2. Regression Loss

    Most of the loss functions discussed in this article fall into the category of Regression Loss.

    So let's study each one of them, one at a time.

    1. Mean Squared Loss/Error:

    It is defined as the mean of the squared differences between the actual values and the predicted values.

    Formula:

    Mathematically, it is defined as below:
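
    Mean Squared Loss (MSE) = (1/N) * Σ (y_pred - y_actual)²,   where y_pred = X·w is the model's prediction, y_actual is the actual value, N is the number of samples, and the sum runs over all N samples.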

    Code Implementation:

    Below we have the code implementation,

    import numpy as np

    def mean_squared_loss(xdata, ydata, weights):
        '''
        weights = weight vector [D X 1] #input weight vector
        xdata = input feature matrix [N X D] #input values
        ydata = output values [N X 1] #actual output values

        '''
        new = np.dot(xdata,weights)          # model predictions (X·w)
        predict_y = np.subtract(new,ydata)   # prediction error (X·w - y)
        MSE = np.mean(np.square(predict_y))  # mean of the squared errors

        return MSE
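
    As a quick sanity check with a tiny made-up dataset (the numbers are chosen only for illustration), the function defined above can be called like this:

    xdata = np.array([[1.0, 2.0], [2.0, 0.0]])    # [N X D] = [2 X 2]
    ydata = np.array([[3.0], [1.0]])              # [N X 1]
    weights = np.array([[1.0], [1.0]])            # [D X 1]

    print(mean_squared_loss(xdata, ydata, weights))   # predictions are [3, 2], so MSE = ((3-3)^2 + (2-1)^2) / 2 = 0.5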

    Applications of Mean Square Error:

    Out of various use-cases/applications of Mean Square Error, below I have discussed some of the most important ones.

    • In Statistical Modelling, the Mean Square Error represents the difference between the actual observation values and the observation values predicted by the model.

    • In Linear Regression, the mathematical benefits of Mean Square Error are particularly evident when analyzing the model's performance. It helps in separating the variation in a dataset into the following two categories (a short numerical sketch follows this list):

      • Variation explained by the Model

      • Variation explained by Randomness.

    • The key criterion in selecting estimators is minimizing the Mean Square Error. Among unbiased estimators, minimizing the Mean Square Error is equivalent to minimizing the Variance, and the estimator that achieves this is the minimum variance unbiased estimator.
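
    As a small, made-up numerical sketch of the decomposition mentioned above (the numbers are arbitrary; the identity used in the last step holds exactly for least-squares fits with an intercept):

    import numpy as np

    y = np.array([1.0, 2.0, 2.0, 4.0])        # actual observation values
    y_pred = np.array([1.1, 1.9, 2.3, 3.7])   # values predicted by the model

    ss_total = np.sum((y - y.mean())**2)      # total variation in the data
    ss_residual = np.sum((y - y_pred)**2)     # variation explained by randomness (unexplained)
    ss_explained = ss_total - ss_residual     # variation explained by the model

    print(ss_explained / ss_total)            # fraction of the total variation explained by the model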

    Graphical Representation of Mean Square Error:

    2. Mean Absolute Loss

    It is defined as the mean of the absolute differences between the actual values and the predicted values.

    The term absolute difference refers to the distance, or the magnitude of deviation, of the predicted values from the actual values.

    Formula:

    Mathematically, it is defined as below:
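
    Mean Absolute Loss (MAE) = (1/N) * Σ |y_actual - y_pred|,   where y_pred = X·w is the model's prediction, y_actual is the actual value, and N is the number of samples.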

    Code Implementation:

    Below we have the code implementation,

    def mean_absolute_loss(xdata, ydata, weights):
        '''
        weights = weight vector [D X 1] #input weight vector
        xdata = input feature matrix [N X D] #input values
        ydata = output values [N X 1] #actual output values
    
        '''
        predict_y = np.subtract(ydata, np.dot(xdata,weights)) # (y - w*x)
        MAL = np.sum(np.abs(predict_y)) #sum of the absolute difference between the individual values.
        MAL = MAL/xdata.shape[0] #taking the mean by dividing the sum by the number of input values.
    
        return MAL

    Applications of Mean Absolute Error:

    Out of various use-cases/applications of Mean Absolute Error, below I have discussed some of the most important ones.

    • Mean Absolute Error is mostly used to determine the accuracy of industry forecasts.
    • It is of great help in the process of strategic planning, as it determines the accuracy of the predictions and helps in providing relevant recommendations.

    Graphical Representation of Mean Absolute Error:

    3. Mean Log Cosh Loss:

    Log-cosh loss is the logarithm of the hyperbolic cosine of the prediction error, averaged over all the samples.

    Formula:

    Mathematically, it is defined as below:
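
    Mean Log Cosh Loss = (1/N) * Σ log(cosh(y_pred - y_actual)),   where y_pred = X·w is the model's prediction, y_actual is the actual value, and N is the number of samples.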

    Code Implementation:

    Below we have the code implementation,

    def mean_log_cosh_loss(xdata, ydata, weights):
        '''
        weights = weight vector [D X 1] #input weight vector
        xdata = input feature matrix [N X D] #input values
        ydata = output values [N X 1] #actual output values
    
        '''
        predict_y = np.abs(np.subtract(xdata@weights,ydata)) # |X·w - y| (cosh is an even function, so the sign of the error does not matter)
        MLCL = np.log(np.cosh(predict_y))
        MLCL = np.mean(MLCL)
    
        return MLCL

    Applications of Mean Log Cosh Loss:

    Out of various use-cases/applications of Mean Log Cosh Loss, below I have discussed some of the most important ones.

    • log(cosh(x)) is approximately equal to (x ** 2) / 2 for small x and to abs(x) - log(2) for large x (see the quick numerical check below).
    • This means that 'logcosh' works mostly like the mean squared error, but will not be so strongly affected by the occasional wildly incorrect prediction.
    • It has all the advantages of the Huber loss, and it is twice differentiable everywhere, unlike the Huber loss.
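
    A quick numerical check of these two approximations (the sample values 0.1 and 10 are arbitrary):

    import numpy as np

    x_small, x_large = 0.1, 10.0
    print(np.log(np.cosh(x_small)), x_small**2 / 2)            # both are roughly 0.005
    print(np.log(np.cosh(x_large)), abs(x_large) - np.log(2))  # both are roughly 9.307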

    Graphical Representation of Mean Log Cosh Loss:

    4. Root Mean Squared Loss

    Root Mean Square Loss/Error (RMSE) is the standard deviation of the residuals (prediction errors).

    Residuals are a measure of how far the data points are from the regression line; RMSE is a measure of how spread out these residuals are.

    In simple terms, it tells you how concentrated the data is around the line of best fit.

    Formula:

    Mathematically, it is defined as below:
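
    Root Mean Squared Loss (RMSE) = sqrt( (1/N) * Σ (y_pred - y_actual)² ) = sqrt(MSE),   with y_pred = X·w the model's prediction, y_actual the actual value, and N the number of samples.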

    Code Implementation:

    Below we have the code implementation,

    def root_mean_squared_loss(xdata, ydata, weights):
        '''
        weights = weight vector [D X 1] #input weight vector
        xdata = input feature matrix [N X D] #input values
        ydata = output values [N X 1] #actual output values
    
        '''
    
        predict_y = np.subtract(np.dot(xdata,weights),ydata)
        RMSL = np.sqrt(np.mean((predict_y)**2))
        return RMSL
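
    Since RMSE is simply the square root of MSE, the two loss functions must agree on any input. As a quick check, reusing the tiny made-up arrays from the MSE example above (and assuming numpy is imported and both functions are defined):

    xdata = np.array([[1.0, 2.0], [2.0, 0.0]])    # [N X D]
    ydata = np.array([[3.0], [1.0]])              # [N X 1]
    weights = np.array([[1.0], [1.0]])            # [D X 1]

    print(root_mean_squared_loss(xdata, ydata, weights))       # ~0.7071
    print(np.sqrt(mean_squared_loss(xdata, ydata, weights)))   # the same value, sqrt(0.5)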

    Applications of Root Mean Squared Loss:

    Out of various use-cases/applications of Root Mean Squared Loss, below I have discussed some of the most important ones.

    • climatology

    • forecasting

    • regression analysis to verify experimental results.

    Comparison graph for all the Loss/Error Functions discussed above:

    We have tested the code given below, and this is the output graph we obtained.

    Note: In the graph below, the logcosh and Mean Absolute Error curves overlap completely, which shows that the results depend on the type and amount of input data provided to the model. You may try running the code below with a different input size, and you will get a different graph.

    The complete code (putting it all together):

    Note: Below we have provided the complete code (programmed in Python) to run and test all the above discussed error functions yourself.

    The entire code was developed by our team, so we recommend that you go through it carefully, one function at a time; we are sure it will help you get your concepts crystal clear.

    Don't get overwhelmed by the size of the code. We have provided comments wherever possible.

    Hope you find it helpful.

    import numpy as np
    import argparse
    import csv
    import matplotlib.pyplot as plt
    import sys
    import math
    
    ''' 
    You are only required to learn the following functions
    
    mean_squared_loss
    mean_absolute_loss
    mean_log_cosh_loss
    root_mean_squared_loss
    
    Don't modify any other functions or command line arguments because autograder will be used
    Don't modify function declaration (arguments)
    '''
    
    def mean_squared_loss(xdata, ydata, weights):
    
        '''
        weights = weight vector [D X 1] #input weight vector
        xdata = input feature matrix [N X D] #input values
        ydata = output values [N X 1] #actual output values
        '''
        new = np.dot(xdata,weights)
        predict_y = np.subtract(new,ydata)
        MSE=np.mean(np.square(predict_y))
        
        return MSE
    
    def mean_squared_gradient(xdata, ydata, weights):
    
        '''
        weights = weight vector [D X 1]
        xdata = input feature matrix [N X D]
        ydata = output values [N X 1]
        Return the mean squared gradient
        '''
        predict_y = np.subtract(np.dot(xdata,weights),ydata)
        predict_y = np.asarray(predict_y)
        MSG = np.dot(xdata.T,(predict_y))
        MSG=MSG/xdata.shape[0]
        MSG=2*MSG
        
        return MSG
    
    def mean_absolute_loss(xdata, ydata, weights):
    
        '''
        weights = weight vector [D X 1] #input weight vector
        xdata = input feature matrix [N X D] #input values
        ydata = output values [N X 1] #actual output values
        '''
        predict_y = np.subtract(ydata, np.dot(xdata,weights)) # (y - w*x)
        MAL = np.sum(np.abs(predict_y)) #sum of the absolute difference between the individual values.
        MAL = MAL/xdata.shape[0] #taking the mean by dividing the sum by the number of input values.
    
        return MAL
    
    def mean_absolute_gradient(xdata, ydata, weights):
        # gradient of the mean absolute loss: (1/N) * X^T * sign(X·w - y)
        mul = np.dot(xdata , weights)
        predict_y = mul-ydata
        abst = np.abs(predict_y)
        aj = np.divide(predict_y ,abst)   # elementwise sign of the prediction error
        MAG = np.dot(xdata.T , aj)
        MAG=MAG/xdata.shape[0]
        return MAG
    
    def mean_log_cosh_loss(xdata, ydata, weights):
    
        '''
        weights = weight vector [D X 1] #input weight vector
        xdata = input feature matrix [N X D] #input values
        ydata = output values [N X 1] #actual output values
        '''
    
        predict_y = np.abs(np.subtract(xdata@weights,ydata)) # |X·w - y| (cosh is even, so the sign of the error does not matter)
        MLCL = np.log(np.cosh(predict_y))
        MLCL=np.mean(MLCL)
    
        return MLCL
    
    def mean_log_cosh_gradient(xdata, ydata, weights):
        # gradient of the mean log-cosh loss: (1/N) * X^T * tanh(X·w - y)
        predict_y = (np.subtract(np.dot(xdata,weights),ydata))
        MLCG = (np.dot (np.tanh(predict_y).T,xdata)).T
        MLCG=MLCG/xdata.shape[0]
        return MLCG
    
    def root_mean_squared_loss(xdata, ydata, weights):
    
        '''
        weights = weight vector [D X 1] #input weight vector
        xdata = input feature matrix [N X D] #input values
        ydata = output values [N X 1] #actual output values
        '''
        predict_y = np.subtract(np.dot(xdata,weights),ydata)
        RMSL = np.sqrt(np.mean((predict_y)**2))
        return RMSL
    
    def root_mean_squared_gradient(xdata, ydata, weights):
        # gradient of RMSE: X^T * (X·w - y) / (N * RMSE)
        predict_y = np.subtract(np.dot(xdata,weights),ydata)
        numerator = (np.dot(predict_y.T,xdata))/xdata.shape[0]
        denominator = np.sqrt(np.mean((predict_y)**2))
        RMSG = np.divide(numerator,denominator)
        RMSG = RMSG.T
        return RMSG
     
    
    class LinearRegressor:
    
        def __init__(self,dims):
    
            # dims is the number of the features
            # You can use __init__ to initialise your weight and biases
            # Create all class related variables here
    
            self.weights =np.ones((dims,1))
            self.weights = self.weights.astype('float64')
            return 
    
        def train(self, xtrain, ytrain, loss_function, gradient_function, epoch=100, lr=1.0):
    
            '''
            xtrain = input feature matrix [N X D]
            ytrain = output values [N X 1]
            learn weight vector [D X 1]
            epoch = scalar parameter epoch
            lr = scalar parameter learning rate
            loss_function = loss function name for linear regression training
            gradient_function = gradient name of loss function
            '''
    
            # You need to write the training loop to update weights here
    
            ytrain = np.array(ytrain)
            ytrain = np.reshape(ytrain, (xtrain.shape[0],1))
            ytrain = ytrain.astype('float64')
            arr_err = []
    
            for iteration in range(epoch):
                err = loss_function(xtrain, ytrain,self.weights)
                self.weights = self.weights - lr*gradient_function(xtrain,ytrain,self.weights)
                # print("error =",err)
                arr_err.append(err)
    
            return arr_err
            
    
        def predict(self, xtest):
    
            count=np.dot(xtest,self.weights)
    
            for i in range(xtest.shape[0]):
                adi=int(count[i])
                if adi<0:
                    adi=0
                print(str(adi))
    
    
            ''' 
            This code is to make the output csv file
            file = open("prediction.csv","w")
            file.write("instance (id),count\n")
            for i in range(xtest.shape[0]):
                row=""
                adi=int(count[i])
                if adi<0:
                    adi=0
                row=str(i)+","+str(adi)+"\n"
                print(str(adi))
                file.write(row)
    
            file.close()
    
            # This returns your prediction on xtest
    
            '''
    
            return count
    
    
    def read_dataset(trainfile, testfile):
    
        '''
        Reads the input data from train and test files and 
        Returns the matrices Xtrain : [N X D] and Ytrain : [N X 1] and Xtest : [M X D] 
        where D is number of features and N is the number of train rows and M is the number of test rows
        '''
    
        xtrain = []
        ytrain = []
        xtest = []
    
        with open(trainfile,'r') as f:
            reader = csv.reader(f,delimiter=',')
            next(reader, None)
            for row in reader:
                xtrain.append(row[:-1])
                ytrain.append(row[-1])
    
        with open(testfile,'r') as f:
            reader = csv.reader(f,delimiter=',')
            next(reader, None)
            for row in reader:
                xtest.append(row)
    
        return np.array(xtrain), np.array(ytrain), np.array(xtest)
    
    
    def preprocess_dataset(xdata, ydata=None):
    
        '''
        xdata = input feature matrix [N X D] 
        ydata = output values [N X 1]
        Convert data xdata, ydata obtained from read_dataset() to a usable format by loss function
        The ydata argument is optional so this function must work for the both the calls
        xtrain_processed, ytrain_processed = preprocess_dataset(xtrain,ytrain)
        xtest_processed = preprocess_dataset(xtest) 
        
        NOTE: You can ignore/drop few columns. You can feature scale the input  data before processing further.
        '''
    
        xtrain = xdata[:,[8,9,10,11]]
        n,m = xdata.shape 
        X0 = np.ones((n,1))
        xtrain = np.append(xtrain,X0,axis=1)
        xtrain = np.asarray(xtrain)
        aa = xtrain[...,0]
        aa = aa.astype(float)
        m1 = np.mean(aa)
        sd1 = np.std(aa)
        xtrain[...,0] = (aa-m1)/sd1  
        aa = xtrain[...,1]
        aa = aa.astype(float)
        m1 = np.mean(aa)
        sd1 = np.std(aa)
        xtrain[...,1] = (aa-m1)/sd1
        aa = xtrain[...,2]
        aa = aa.astype(float)
        
        for i in range(n):
            if aa[i]==3:
                aa[i]=0
                    
        aa = aa.astype(float)
        m1 = np.mean(aa)
        sd1 = np.std(aa)
        xtrain[...,2]=(aa-m1)/sd1
        aa = xtrain[...,3]
        aa = aa.astype(float)
    
        for i in range(n):
            if aa[i]==3:
                aa[i]=0            
    
        aa = aa.astype(float)
        m1 = np.mean(aa)
        sd1 = np.std(aa)
        xtrain[...,3]=(aa-m1)/sd1
    
        dt=xdata[:,1]
    
        year=np.zeros((n,3))
    
        for i in range(n):
            yr=dt[i]
            yr=yr[3:4]
            yr=int(yr)
            year[i][yr-1]=1
    
        xtrain = np.concatenate((xtrain,year),axis=1)
    
        mon=np.zeros((n,12))
    
        for i in range(n):
            m = dt[i]
            m = m[5:7]
            m = int(m)
            mon[i][m-1]=1
    
        xtrain = np.concatenate((xtrain,mon),axis=1)
    
        day=np.zeros((n,31))
    
        for i in range(n):
            d = dt[i]
            d = d[8:10]
            d = int(d)
            day[i][d-1] = 1   
    
        xtrain = np.concatenate((xtrain,day),axis=1)
    
        season = {
            "1" : [1,0,0,0],
            "2" : [0,1,0,0],
            "3" : [0,0,1,0],
            "4" : [0,0,0,1]
        }
    
        sea = xdata[:,2]
        ss = []
    
        for i in sea:
            ss.append(season[i])
    
        ss = np.array(ss)
        xtrain = np.append(xtrain,ss,axis=1)
    
        hr = xdata[:,3]
        hours = np.zeros((n,24))
     
        for i in range(n):
            hours[i][int(hr[i])]=1
    
        hours = np.asarray(hours)
        xtrain = np.concatenate((xtrain,hours),axis=1)
    
        hl1 = np.eye(2)[xtrain[:, 4].astype(float).astype(int)]
    
        xtrain = np.concatenate((xtrain,hl1),axis=1)
    
        '''
        holi = {
            "0" : [1,0],
            "1" : [0,1]
        }
    
        hl=xtrain[:,4]
        hl1=[]
    
        for i in hl:
            hl1.append(holi[i])
    
        hl1=np.array(hl1)
    
        xtrain = np.append(xtrain,hl1,axis=1) 
    
        '''
    
        days = {
    
            "Monday" : [1,0,0,0,0,0,0],
    
            "Tuesday" : [0,1,0,0,0,0,0],
    
            "Wednesday" : [0,0,1,0,0,0,0],
    
            "Thursday" : [0,0,0,1,0,0,0],
    
            "Friday" : [0,0,0,0,1,0,0],
    
            "Saturday" : [0,0,0,0,0,1,0],
    
            "Sunday" : [0,0,0,0,0,0,1]
    
        }
    
        abc=xdata[:,5]
    
        dayss=[]
    
        for i in abc:
            dayss.append(days[i])
    
        dayss=np.asarray(dayss)
    
        xtrain = np.append(xtrain,dayss,axis=1)
    
        wda=np.eye(2)[xtrain[:, 6].astype(float).astype(int)]
    
        xtrain = np.concatenate((xtrain,wda),axis=1)
    
        '''
    
        wda = {
            "0" : [1,0],
            "1" : [0,1]
        }
    
        wd=xtrain[:,6]
    
        wd1=[]
    
        for i in wd:
            wd1.append(wda[i])
    
        wd1=np.asarray(wd1)
    
        xtrain = np.append(xtrain,wd1,axis=1)
    
        '''
    
        st = np.eye(2)[xtrain[:, 7].astype(float).astype(int)]
        xtrain = np.concatenate((xtrain,st),axis=1)
    
    
        '''
        situation = {
            "1" : [1,0],
            "2" : [0,1]        
        }   
    
        st=xtrain[:,7]
        st1 = []
        
        for i in st:
            st1.append(situation[i])
    
        st1=np.asarray(st1)
    
        xtrain = np.append(xtrain,st1,axis=1)
    
        '''
    
        xtrain = xtrain.astype('float64')
        return xtrain, ydata 
      
    
    dictionary_of_losses = {
    
        'mse':(mean_squared_loss, mean_squared_gradient),
    
        'mae':(mean_absolute_loss, mean_absolute_gradient),
    
        'rmse':(root_mean_squared_loss, root_mean_squared_gradient),
    
        'logcosh':(mean_log_cosh_loss, mean_log_cosh_gradient),
    
    }
    
    
    def main():
    
        # You are free to modify the main function as per your requirements.
        # Uncomment the below lines and pass the appropriate value
    
        #mean_squared_loss()
    
        xtrain, ytrain, xtest = read_dataset(args.train_file, args.test_file)
    
        xtrainprocessed, ytrainprocessed = preprocess_dataset(xtrain, ytrain)
    
        xtestprocessed = preprocess_dataset(xtest)
    
        model1 = LinearRegressor(xtrainprocessed.shape[1])
    
        mse = model1.train(xtrainprocessed, ytrainprocessed, mean_squared_loss , mean_squared_gradient, args.epoch, args.lr)
    
        '''
        Code to plot graph for all the 4 errors
    
        model2 = LinearRegressor(xtrainprocessed.shape[1])
    
        model3 = LinearRegressor(xtrainprocessed.shape[1])
    
        model4 = LinearRegressor(xtrainprocessed.shape[1])
    
        # The loss function is provided by command line argument    
    
        loss_fn, loss_grad = dictionary_of_losses[args.loss]
    
        mae = model2.train(xtrainprocessed, ytrainprocessed, mean_squared_loss, mean_absolute_gradient, args.epoch, args.lr)
    
        logcosh = model3.train(xtrainprocessed, ytrainprocessed, mean_squared_loss, mean_log_cosh_gradient, args.epoch, args.lr)
    
        rmse = model4.train(xtrainprocessed, ytrainprocessed, mean_squared_loss, root_mean_squared_gradient, args.epoch, args.lr)
    
        # print("MSE",mse,"MAE",mae,"LOGCOSH",logcosh,"RMSE",rmse)
    
        plt.plot(range(args.epoch),mse,"-",label="mse")
    
        plt.plot(range(args.epoch),mae,"-", label="mae")
    
        plt.plot(range(args.epoch),logcosh,"-",label="logcosh")
    
        plt.plot(range(args.epoch),rmse,"-",label="rmse")
    
        plt.legend()
    
        plt.show()
    
        '''
    
        model1.predict(xtestprocessed[0])
    
         
    if __name__ == '__main__':  
    
        '''
    
        You can remove the comments and run or test your code on the following input:
    
        code to test all the 8 functions for the sample data provided on the moodle
    
        ydata = [-2,  1,  1,  2,  0]
    
        xdata = [[ 1,  0,  2, -3], [ 1, -1,  0, -3], [-2, -5,  1, -3], [ 0, -5,  3, -3], [ 0, -4,  3, -2]]
    
        weights = [ 1, 0, -2, -1]
    
        xdata=np.asarray(xdata)
    
        ydata=np.asarray(ydata)
    
        weights = np.asarray(weights)
    
        ydata=ydata.reshape(ydata.shape[0],1)
    
        weights=weights.reshape(weights.shape[0],1)
    
        ydata=ydata.reshape(ydata.shape[0],1)
    
        print(xdata.shape)
    
        print(ydata.shape)
    
        print(weights.shape)
    
        print(mean_squared_loss(xdata, ydata, weights))
    
        print(mean_squared_gradient(xdata, ydata, weights))
    
        print(mean_absolute_loss(xdata, ydata, weights))
    
        print(mean_absolute_gradient(xdata, ydata, weights))
    
        print(root_mean_squared_loss(xdata, ydata, weights))
    
        print(root_mean_squared_gradient(xdata, ydata, weights) )
    
        print(mean_log_cosh_loss(xdata, ydata, weights))
    
        print(mean_log_cosh_gradient(xdata, ydata, weights))
    
        sys.exit(0)
    
        '''
    
        parser = argparse.ArgumentParser()
    
        parser.add_argument('--loss', default='mse', choices=['mse','mae','rmse','logcosh'], help='loss function')
    
        parser.add_argument('--lr', default=0.01, type=float, help='learning rate')
    
        parser.add_argument('--epoch', default=50000, type=int, help='number of epochs')
    
        parser.add_argument('--train_file', type=str, help='location of the training file')
    
        parser.add_argument('--test_file', type=str, help='location of the test file')
    
        args = parser.parse_args()
    
        main()

    Just copy and paste this Python code into your IDE and run it after uncommenting the input parameters.

    What to uncomment and how to run it, we have left up to you so that you can validate your understanding of the code implementation.
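
    For example, assuming you save the script as loss_functions.py and have training and test CSV files in the expected format (both file names below are just placeholders), a typical run would look like:

    python loss_functions.py --train_file train.csv --test_file test.csv --loss mse --lr 0.01 --epoch 50000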

    In case of any queries, feel free to get them cleared by posting them in our comment section down below!

    Happy Learning : )
