Coder Perfect

What exactly are data classes, and how do they differ from regular classes?

Problem

PEP 557 introduces data classes to the Python standard library.

They use the @dataclass decorator and are supposed to be “mutable namedtuples with defaults,” but I’m not sure what that actually means or how they differ from common classes.

What exactly are Python data classes, and when is it best to use them?

Asked by kingJulian

Solution #1

Data classes are just regular classes that are geared towards storing state rather than containing a lot of logic. Every time you write a class that consists mostly of attributes, you have made a data class.

The dataclasses module makes it easier to create data classes. It takes care of a lot of boilerplate for you.

This is especially useful when your data class must be hashable, because that requires both a __hash__ method and an __eq__ method. If you also add a custom __repr__ method for ease of debugging, the class can become quite verbose:

class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def __init__(
            self, 
            name: str, 
            unit_price: float,
            quantity_on_hand: int = 0
        ) -> None:
        self.name = name
        self.unit_price = unit_price
        self.quantity_on_hand = quantity_on_hand

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

    def __repr__(self) -> str:
        return (
            'InventoryItem('
            f'name={self.name!r}, unit_price={self.unit_price!r}, '
            f'quantity_on_hand={self.quantity_on_hand!r})'
        )

    def __hash__(self) -> int:
        return hash((self.name, self.unit_price, self.quantity_on_hand))

    def __eq__(self, other) -> bool:
        if not isinstance(other, InventoryItem):
            return NotImplemented
        return (
            (self.name, self.unit_price, self.quantity_on_hand) == 
            (other.name, other.unit_price, other.quantity_on_hand))

With dataclasses you can reduce it to:

from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
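
As a quick check of what the decorator generates, the following sketch repeats the dataclass version and exercises the generated __repr__, __eq__, and __hash__. The item values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand


# Illustrative values, not taken from the original answer.
a = InventoryItem("widget", 2.5, 4)
b = InventoryItem("widget", 2.5, 4)

print(a)               # InventoryItem(name='widget', unit_price=2.5, quantity_on_hand=4)
print(a == b)          # True: the generated __eq__ compares field by field
print(len({a, b}))     # 1: the generated __hash__ lets equal items collapse in a set
print(a.total_cost())  # 10.0: the hand-written method is untouched
```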

The same class decorator can also manage immutability and generate comparison methods (__lt__, __gt__, etc.).
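
A minimal sketch of both options in one decorator call, assuming a hypothetical Version class that is not part of the original answer:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True, order=True)
class Version:  # hypothetical example class
    major: int
    minor: int = 0


old, new = Version(1, 4), Version(2, 0)

# order=True generates __lt__, __le__, __gt__ and __ge__; instances compare
# like tuples of their fields, in definition order.
print(old < new)      # True
print(max(old, new))  # Version(major=2, minor=0)

# frozen=True makes attribute assignment raise FrozenInstanceError.
try:
    old.major = 3
except FrozenInstanceError as exc:
    print("cannot modify:", exc)
```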

namedtuple classes are also data classes, but they are immutable by default (as well as being sequences). dataclasses are much more flexible in this regard, and can easily be structured so that they fill the same role as a namedtuple class.
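
The difference can be seen side by side in a small sketch. The names PointNT and PointDC are hypothetical and chosen only for this comparison:

```python
from collections import namedtuple
from dataclasses import dataclass, astuple

PointNT = namedtuple("PointNT", ["x", "y"])  # hypothetical names


@dataclass
class PointDC:
    x: int
    y: int


nt = PointNT(1, 2)
dc = PointDC(1, 2)

# A namedtuple is a sequence: it supports indexing and unpacking.
print(nt[0], tuple(nt))  # 1 (1, 2)

# A namedtuple is immutable: assignment raises AttributeError.
try:
    nt.x = 10
except AttributeError:
    print("namedtuple fields cannot be reassigned")

# A plain dataclass is mutable and is not a sequence.
dc.x = 10
print(dc)          # PointDC(x=10, y=2)
print(astuple(dc)) # (10, 2): explicit conversion when a tuple is needed
```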

The PEP was inspired by the attrs project, which can do even more (including slots, validators, converters, metadata, etc.).

If you want to see some examples, check out the solutions for Day 7, Day 8, Day 11, and Day 20 of the Advent of Code.

If you want to use the dataclasses module in Python versions earlier than 3.7, you can install the backported module (requires 3.6) or use the attrs project mentioned above.

Answered by Martijn Pieters

Solution #2

The question has already been answered. This answer, however, adds some practical examples to help with a basic understanding of dataclasses.

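The phrase “mutable namedtuples with defaults” from the question means that attributes can be reassigned by default (mutable), are read with dotted attribute access as in a namedtuple, and can be given default values.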

Compared to common classes, you mainly save on typing boilerplate code.

This is an overview of dataclass features. (TL;DR? See the Summary Table in the next section.)

Dataclasses come with the following features by default.

Comparison + Attributes + Representation

import dataclasses


@dataclasses.dataclass
#@dataclasses.dataclass()                                       # alternative
class Color:
    r : int = 0
    g : int = 0
    b : int = 0

These defaults are provided by automatically setting the following keywords to True:

@dataclasses.dataclass(init=True, repr=True, eq=True)
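
A short sketch of what those three defaults provide in practice, reusing the Color class from above:

```python
import dataclasses


@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


# init=True: the generated __init__ accepts positional or keyword arguments.
c = Color(g=50)

# repr=True: a readable representation listing every field.
print(c)                     # Color(r=0, g=50, b=0)

# eq=True: instances are compared field by field...
print(c == Color(0, 50))     # True

# ...but only against instances of the same class.
print(Color() == (0, 0, 0))  # False
```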

Additional features are available if the appropriate keywords are set to True.

Order

@dataclasses.dataclass(order=True)
class Color:
    r : int = 0
    g : int = 0
    b : int = 0

The ordering methods are now implemented (overloading the operators < > <= >=), similar to functools.total_ordering but with stricter equality tests.
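
As a sketch of the ordering behaviour: instances sort like tuples of their fields, and comparing against a different type is rejected rather than silently answered. The sample colors are arbitrary.

```python
import dataclasses


@dataclasses.dataclass(order=True)
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


colors = [Color(0, 50, 0), Color(), Color(0, 0, 255)]

# Fields are compared in definition order: r first, then g, then b.
ordered = sorted(colors)
print(ordered)
# [Color(r=0, g=0, b=0), Color(r=0, g=0, b=255), Color(r=0, g=50, b=0)]

# Ordering is only defined between instances of the same class.
try:
    Color() < (0, 0, 0)
except TypeError as exc:
    print("TypeError:", exc)
```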

Hashable, Mutable

@dataclasses.dataclass(unsafe_hash=True)                        # override base `__hash__`
class Color:
    ...

Although the object is potentially mutable (which may be undesirable), a hash is implemented.
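
The reason the keyword carries "unsafe" in its name can be shown with a small sketch: once a hashed object is mutated, containers that rely on the hash can no longer find it.

```python
import dataclasses


@dataclasses.dataclass(unsafe_hash=True)
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


c = Color(255, 0, 0)
seen = {c}
found_before = c in seen
print(found_before)  # True

# Mutating a field changes the hash, so the set looks in the wrong place.
c.r = 0
found_after = c in seen
print(found_after)   # False, although the object is still stored in the set
print(len(seen))     # 1
```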

Hashable, Immutable

@dataclasses.dataclass(frozen=True)                             # `eq=True` (default) to be immutable 
class Color:
    ...

A hash is now implemented, and changing the object or assigning to its attributes is no longer allowed.

Overall, the object is hashable if either unsafe_hash=True or frozen=True.
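
A minimal sketch of the main hashing cases, using three hypothetical classes. With the default eq=True and frozen=False, the decorator sets __hash__ to None; with eq=False, the identity-based hash from object is inherited.

```python
import dataclasses


@dataclasses.dataclass
class Plain:  # eq=True, frozen=False: __hash__ is set to None
    x: int = 0


@dataclasses.dataclass(frozen=True)
class Frozen:  # eq=True, frozen=True: __hash__ is generated from the fields
    x: int = 0


@dataclasses.dataclass(eq=False)
class Identity:  # no __eq__ generated: object.__hash__ is inherited
    x: int = 0


print(Plain.__hash__ is None)                 # True
print(hash(Frozen(1)) == hash(Frozen(1)))     # True: equal fields, equal hash
print(Identity.__hash__ is object.__hash__)   # True

try:
    hash(Plain())
except TypeError as exc:
    print("TypeError:", exc)
```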

For further information, see the original hashing logic table.

Special methods must be manually implemented to obtain the following features:

Unpacking

@dataclasses.dataclass
class Color:
    r : int = 0
    g : int = 0
    b : int = 0

    def __iter__(self):
        yield from dataclasses.astuple(self)
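
With __iter__ defined this way, instances can be unpacked like tuples. The sketch below repeats the class and shows the usage. Note that dataclasses.astuple converts nested dataclasses recursively and copies the values, which is harmless for plain integers.

```python
import dataclasses


@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

    def __iter__(self):
        # Yield the field values in definition order.
        yield from dataclasses.astuple(self)


r, g, b = Color(128, 0, 255)
print(r, g, b)                   # 128 0 255
print(list(Color(1, 2, 3)))      # [1, 2, 3]
```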

Optimization

@dataclasses.dataclass
class SlottedColor:
    __slots__ = ["r", "b", "g"]
    r : int
    g : int
    b : int

The size of the object has been reduced:

>>> import sys
>>> sys.getsizeof(Color)
1056
>>> sys.getsizeof(SlottedColor)
888

In some circumstances, __slots__ also speeds up creating instances and accessing attributes. Also, slots do not allow default assignments; otherwise, a ValueError is raised.
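
The default-value restriction can be reproduced with a short sketch. BadSlotted is a hypothetical class written only to trigger the error; the error comes from Python's class creation itself, because the class attribute holding the default collides with the slot of the same name.

```python
import dataclasses

conflict = None
try:
    @dataclasses.dataclass
    class BadSlotted:  # hypothetical class, defined only to show the error
        __slots__ = ("r",)
        r: int = 0  # the default becomes a class attribute named like the slot
except ValueError as exc:
    conflict = str(exc)
    print("ValueError:", conflict)


@dataclasses.dataclass
class SlottedColor:
    __slots__ = ("r", "g", "b")
    r: int
    g: int
    b: int


c = SlottedColor(1, 2, 3)
print(c)                       # SlottedColor(r=1, g=2, b=3)
print(hasattr(c, "__dict__"))  # False: no per-instance dictionary
```

Since Python 3.10 the decorator also accepts slots=True, which generates __slots__ for you and permits field defaults.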

In this blog post, you can learn more about slots.

+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+
|       Feature        |       Keyword        |                      Example                       |           Implement in a Class          |
+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+
| Attributes           |  init                |  Color().r -> 0                                    |  __init__                               |
| Representation       |  repr                |  Color() -> Color(r=0, g=0, b=0)                   |  __repr__                               |
| Comparison*          |  eq                  |  Color() == Color(0, 0, 0) -> True                 |  __eq__                                 |
|                      |                      |                                                    |                                         |
| Order                |  order               |  sorted([Color(0, 50, 0), Color()]) -> ...         |  __lt__, __le__, __gt__, __ge__         |
| Hashable             |  unsafe_hash/frozen  |  {Color(), Color()} -> {Color(r=0, g=0, b=0)}      |  __hash__                               |
| Immutable            |  frozen + eq         |  Color().r = 10 -> FrozenInstanceError             |  __setattr__, __delattr__               |
|                      |                      |                                                    |                                         |
| Unpacking+           |  -                   |  r, g, b = Color()                                 |   __iter__                              |
| Optimization+        |  -                   |  sys.getsizeof(SlottedColor) -> 888                |  __slots__                              |
+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+

+These methods are not automatically generated and require manual implementation in a dataclass.

* __ne__ is not needed and is therefore not implemented; Python derives != from __eq__.
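
The __ne__ footnote can be checked directly. In this sketch, the class dictionary contains a generated __eq__ but no __ne__, and != still works because Python falls back to inverting the result of __eq__.

```python
import dataclasses


@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


print("__eq__" in Color.__dict__)  # True: generated by the decorator
print("__ne__" in Color.__dict__)  # False: not generated

print(Color(1, 2, 3) != Color(1, 2, 3))  # False
print(Color(1, 2, 3) != Color())         # True
```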

Post-initialization

@dataclasses.dataclass
class RGBA:
    r : int = 0
    g : int = 0
    b : int = 0
    a : float = 1.0

    def __post_init__(self):
        self.a : int =  int(self.a * 255)


RGBA(127, 0, 255, 0.5)
# RGBA(r=127, g=0, b=255, a=127)
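
Another common use of __post_init__ is to compute a field that should not be passed to the constructor. The sketch below assumes a hypothetical Rectangle class and uses field(init=False) to exclude the derived field from the generated __init__.

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:  # hypothetical example class
    width: float
    height: float
    area: float = field(init=False)  # not a constructor parameter

    def __post_init__(self):
        # Runs at the end of the generated __init__.
        self.area = self.width * self.height


rect = Rectangle(3.0, 2.0)
print(rect)  # Rectangle(width=3.0, height=2.0, area=6.0)
```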

Inheritance

@dataclasses.dataclass
class RGBA(Color):
    a : int = 0
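
A sketch of how inherited fields behave: base-class fields come first in the generated __init__ and __repr__, and a subclass field without a default cannot follow inherited fields that have defaults. The Broken class is hypothetical and exists only to show that error.

```python
import dataclasses


@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0


@dataclasses.dataclass
class RGBA(Color):
    a: int = 0


print(RGBA(1, 2, 3, 4))                             # RGBA(r=1, g=2, b=3, a=4)
print([f.name for f in dataclasses.fields(RGBA)])   # ['r', 'g', 'b', 'a']
print(isinstance(RGBA(), Color))                    # True

ordering_error = None
try:
    @dataclasses.dataclass
    class Broken(Color):  # hypothetical class, defined only to show the error
        a: int  # no default, but the inherited fields all have defaults
except TypeError as exc:
    ordering_error = str(exc)
    print("TypeError:", ordering_error)
```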

Conversions

Convert a dataclass to a tuple or a dict, recursively:

>>> dataclasses.astuple(Color(128, 0, 255))
(128, 0, 255)
>>> dataclasses.asdict(Color(128, 0, 255))
{'r': 128, 'g': 0, 'b': 255}
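
The word "recursively" matters once dataclasses are nested. A small sketch, using hypothetical Point and Line classes:

```python
from dataclasses import dataclass, asdict, astuple


@dataclass
class Point:  # hypothetical example classes
    x: int
    y: int


@dataclass
class Line:
    start: Point
    end: Point


line = Line(Point(0, 0), Point(3, 4))

# Nested dataclass instances are converted as well, not left as objects.
print(asdict(line))   # {'start': {'x': 0, 'y': 0}, 'end': {'x': 3, 'y': 4}}
print(astuple(line))  # ((0, 0), (3, 4))
```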

Limitations

Answered by pylang

Solution #3

The following summarizes the PEP specification:

The @dataclass decorator adds generated methods to the class that you’d otherwise define yourself, such as __repr__, __init__, __lt__, and __gt__.

Answered by Mahmoud Hanafy

Solution #4

Consider this simple class, Foo:

from dataclasses import dataclass


@dataclass
class Foo:
    def bar(self):
        pass

Here is a comparison of the dir() built-in output: Foo without the @dataclass decorator on the left, and Foo with the @dataclass decorator on the right.

[Figure: side-by-side dir() output for the two classes]

Here is another diff, made after comparing with the inspect module.

[Figure: side-by-side inspect output for the two classes]
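
The dir() comparison can also be reproduced with a short script. This is a sketch with two hypothetical class names, PlainFoo and DataFoo, standing in for the undecorated and decorated versions. The exact result depends on the Python version, so no full output is listed.

```python
from dataclasses import dataclass


class PlainFoo:  # Foo without the decorator
    def bar(self):
        pass


@dataclass
class DataFoo:  # Foo with the decorator
    def bar(self):
        pass


# Names visible through dir() only on the decorated class.
added_names = sorted(set(dir(DataFoo)) - set(dir(PlainFoo)))
print(added_names)  # includes '__dataclass_fields__' and '__dataclass_params__'

# Names the decorator placed in the class body itself. Methods such as
# __eq__ and __repr__ already exist on object, so dir() alone hides them.
own_overrides = sorted(set(vars(DataFoo)) - set(vars(PlainFoo)))
print(own_overrides)  # includes '__eq__', '__init__' and '__repr__'
```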

Answered by prosti

Post is based on https://stackoverflow.com/questions/47955263/what-are-data-classes-and-how-are-they-different-from-common-classes