Python Dataclass decorator - Part 1

Posted in Programming   LAST UPDATED: SEPTEMBER 21, 2021

    If you are already using Python 3.7 or later, you may know about its new features, one of which is the dataclass. If you haven't upgraded yet, here is some news for you.

    Python 3.7 has introduced a new feature, the dataclass.

    But wait, even on Python 3.6 you can use dataclasses by installing the backport package from PyPI with the following command:

    pip install dataclasses


    So what is a dataclass?

    Dataclasses are basically Python classes that store data objects. Data objects include (but are not limited to) values of specific data types, such as a number or a class instance.

    They come with basic functionality already implemented, such as instantiation, a printable representation, and comparison of instances. A dataclass is created by applying the @dataclass decorator to a normal class.


    Why a dataclass now?

    Readability has been a core goal of Python from the start. As the Zen of Python puts it, "Readability counts." Dataclasses were created for the same reason: they cut down the boilerplate needed for classes that mainly hold data. You will see what we mean by readability in a few minutes.

    How can I differentiate between a regular class and a dataclass?

    This is quite simple. Python comes with a dataclass decorator (@dataclass) that indicates that the class is a dataclass. This is usually done in the following way:

    from dataclasses import dataclass
    
    @dataclass
    class class_name:
        pass  # class definition goes here

    Now let us compare a normal class and a dataclass to see what a dataclass has to offer to us.


    Normal Python class

    A normal class is implemented by using the class keyword followed by the name of the class.

    class Website:
        def __init__(self, val):
            self.val = val
    
    # creating class object
    class_instance = Website(12)
    print(class_instance.val)

    Output:

    12


    Python Dataclass

    A dataclass is marked with the @dataclass decorator.

    from dataclasses import dataclass
    
    @dataclass
    class Website:
        val: float
    
    class_instance = Website(12.21)
    print(class_instance)

    Output:

    Website(val=12.21)


    Comparing the above two classes, the following can be inferred:

    1. The __init__ method does not have to be written in the dataclass; it is generated automatically.

    2. In the dataclass, the variables are declared in the class body together with their types, instead of being assigned through self (which represents the object of the class) inside __init__ as in the normal class. Indicating the type of a value in this way is known as type hinting.

    3. The output clearly shows that the value belongs to the class Website.

    In addition to this, default values can be specified in the dataclass's class members.
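As a small sketch of default values (the field names `name` and `visits` are made up for illustration), a field that has a default can be left out when the object is created:

```python
from dataclasses import dataclass

@dataclass
class Website:
    name: str          # no default, so it must always be passed in
    visits: int = 0    # default used when the argument is omitted

first = Website("example")
second = Website("example", visits=5)

print(first)   # Website(name='example', visits=0)
print(second)  # Website(name='example', visits=5)
```

Fields without a default have to be listed before fields with one, just like parameters in a function signature.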

    Under the hood, the dataclass implements a __repr__() method that presents the object of the class in a readable string format. It also implements an __eq__() method, which comes into play when we compare two objects of the dataclass. We will cover this in detail below.
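To make the generated methods more concrete, here is a rough hand-written approximation of what the decorator saves us from typing. The class name `ManualWebsite` is hypothetical, and the real generated code differs in its details; this is only a sketch of the idea.

```python
from dataclasses import dataclass

@dataclass
class Website:
    val: float

class ManualWebsite:
    """Hand-written approximation of the methods @dataclass generates."""

    def __init__(self, val: float):
        self.val = val

    def __repr__(self):
        return f"{type(self).__name__}(val={self.val!r})"

    def __eq__(self, other):
        # Only compare with objects of exactly the same class
        if other.__class__ is self.__class__:
            return (self.val,) == (other.val,)
        return NotImplemented

print(Website(12.21))                                # Website(val=12.21)
print(ManualWebsite(12.21))                          # ManualWebsite(val=12.21)
print(Website(12.21) == Website(12.21))              # True
print(ManualWebsite(12.21) == ManualWebsite(12.21))  # True
```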

    Well, this is simple. Is this the only reason I should use a dataclass?

    No, this isn't the only reason. In addition to readability, the dataclasses (as mentioned previously) have pre-implemented methods. This means such methods don't need to be explicitly defined in a dataclass.

    The dataclass decorator can be written in different ways. Below is a demonstration:

    import dataclasses
     
    # the decorator can also be written as @dataclasses.dataclass()
    @dataclasses.dataclass
    class Website:
        val: int = 0

    The init, repr and eq parameters are set to True by default when a dataclass is created. In other words, the decorator above is interpreted as follows:

    @dataclasses.dataclass(init=True, repr=True, eq=True)
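The following sketch puts the three spellings side by side to show that they behave the same. The class names `Bare`, `Called` and `Explicit` are made up for this comparison.

```python
import dataclasses

@dataclasses.dataclass
class Bare:
    val: int = 0

@dataclasses.dataclass()
class Called:
    val: int = 0

@dataclasses.dataclass(init=True, repr=True, eq=True)
class Explicit:
    val: int = 0

# Each class gets a working __init__, __repr__ and __eq__
for cls in (Bare, Called, Explicit):
    obj = cls(3)
    print(obj, obj == cls(3))

# Bare(val=3) True
# Called(val=3) True
# Explicit(val=3) True
```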

    Let's cover these special methods one by one.


    Representation:

    When we create a normal class, we generally add only the __init__ method to it, for initializing the object of the class:

    class Website:
        def __init__(self, val):
            self.val = val
    
    class_instance = Website(12)
    print(class_instance)
    print(class_instance.val)

    Output:

    <__main__.Website object at 0x000002C0B4FE4E80>
    12

    What do you understand from the above code?

    The second line shows that the val attribute of the Website instance is 12. But what about the first line, i.e. <__main__.Website object at 0x000002C0B4FE4E80>?

    Well, that is how Python displays an object of a class by default: the class name followed by the object's memory address.

    This makes debugging tough, since the output says nothing about the data the object holds. A neat representation of the data in a normal class needs to be implemented with the help of the __repr__ method. See the code below to understand the implementation of the __repr__ method in a normal class.

    class Website:
    
        def __init__(self, val):
            self.val = val
    
        # special method __repr__; it must return a string
        def __repr__(self):
            return str(self.val)
    
    class_instance = Website('12')
    print(class_instance)

    Output:

    12

    This means the __repr__ method must be explicitly defined in normal classes. In a dataclass, on the other hand, this method comes already implemented.

    Consider the following dataclass. Notice that the __repr__ method doesn't have to be explicitly defined:

    from dataclasses import dataclass
    
    @dataclass
    class data_class():
        value : int
    
    class_instance = data_class(12)
    print(class_instance)
    print(type(class_instance))

    Output:

    data_class(value=12)
    <class '__main__.data_class'>
    

    The representation method, like the other generated methods, is included in a dataclass by default because the corresponding keyword (here, repr) is set to True.

    If we want to exclude the default __repr__ method from our dataclass, we can do so by using the following code:

    from dataclasses import dataclass
    
    @dataclass(repr=False)
    class data_class():
        value: int
        
    class_instance = data_class(12)
    print(class_instance)

    Output:

    <__main__.data_class object at 0x000002C0B4FE4E80>


    Comparing Objects:

    In a dataclass, the __eq__ method is implemented, which is used for checking whether two objects of the class are equal.

    We will compare the implementation of == (checking for the equality of two objects) in a normal class and a dataclass.

    from dataclasses import dataclass
    
    @dataclass
    class data_class():
        value : int
    
    class_instance = data_class(12)
    print(class_instance)
    print(type(class_instance))
    
    
    class normal_class():
        def __init__(self, val):
            self.val = val
    
    #Two objects instantiated for the dataclass
    instance_one = data_class(12)
    instance_two = data_class(12)
    
    #Two objects instantiated for the normal class
    instance_three = normal_class(12)
    instance_four = normal_class(12)
    
    print("DataClass Equal:", instance_one == instance_two)
    print("Normal Class Equal:", instance_three == instance_four)

    Output:

    data_class(value=12)
    <class '__main__.data_class'>
    DataClass Equal: True
    Normal Class Equal: False

    The last two lines of this output might seem confusing. Here is the explanation for this behaviour of dataclass and normal class.

    A normal class that doesn't define __eq__ inherits the default behaviour, in which == checks whether both operands are the very same object (the same memory location). Two different instances of a class are always different objects, even when they hold the same data. Hence, the result is False in the case of a normal class.

    On the other hand, when the == is used to compare objects of a dataclass, it checks to see if the contents of both the instances of the same class are the same or not. Since both instances contain the same data, it returns True.

    The __eq__ method generated by a dataclass only compares two instances of exactly the same class. It does so by comparing the fields of one instance, taken in order as a tuple, with the fields of the other instance.
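The tuple comparison can be made visible with a short sketch. The classes `Point` and `OtherPoint` are invented for illustration, and `dataclasses.astuple` is used here only to show the idea; the generated __eq__ does not have to call it internally.

```python
from dataclasses import dataclass, astuple

@dataclass
class Point:
    x: int
    y: int

@dataclass
class OtherPoint:
    x: int
    y: int

a = Point(1, 2)
b = Point(1, 2)
c = OtherPoint(1, 2)

print(astuple(a))                # (1, 2)
print(astuple(a) == astuple(b))  # True
print(a == b)                    # True, same class and same field values
print(a == c)                    # False, same values but a different class
```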

    If we have complex class logic and want to implement our own way of comparing objects of our dataclass, we can specify eq=False so that the __eq__ method is not generated, and define our own.
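Here is a minimal sketch of a custom comparison, assuming a hypothetical rule that two websites are equal when their URLs match regardless of letter case. The `url` and `visits` fields are made up for this example.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Website:
    url: str
    visits: int = 0

    def __eq__(self, other):
        # Hypothetical rule: compare URLs case-insensitively, ignore visits
        if not isinstance(other, Website):
            return NotImplemented
        return self.url.lower() == other.url.lower()

print(Website("Example.com", 10) == Website("example.com", 99))  # True
print(Website("example.com") == Website("other.com"))            # False
```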

    Note: The ordering methods (<, >, <=, and >=) can be generated by setting the keyword order to True, i.e. by passing order=True to the dataclass decorator.
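A short sketch of order=True, using a made-up `Version` class. The generated ordering methods compare the fields in the order they are declared, so `major` is compared before `minor`.

```python
from dataclasses import dataclass

@dataclass(order=True)
class Version:
    major: int
    minor: int

print(Version(1, 2) < Version(1, 10))   # True
print(Version(2, 0) > Version(1, 10))   # True

versions = [Version(2, 0), Version(1, 10), Version(1, 2)]
print(sorted(versions))
# [Version(major=1, minor=2), Version(major=1, minor=10), Version(major=2, minor=0)]
```

Without order=True, using < or > on two instances raises a TypeError.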


    Conclusion

    In today's post, we understood what a dataclass is, its significance, its usage and its advantages over normal classes. In the upcoming posts, we will dive deeper into dataclasses and understand more about them.

    About the author:
    I love writing about Python and have more than 5 years of professional experience in Python development. I like sharing about various standard libraries in Python and other Python Modules.