Dunder method in Python

Environment: This article’s Python version is CPython3.9.9 from deadsnake running on Ubuntu 18.04.

In Python, there are some special attributes of a class or object starting and ending with two consecutive underscores “__”, for example, __init__. They are usually called “Dunder method” or “magic” method. The latter name originates from the fact that these methods are usually fairly mysterious for Python beginners. This post will demystify these Dunder methods through analogy of similar concepts in other general programming languages.

In the official document of Python, all the details of these special methods are categorized under the standard data model of Python: Special Method Names. Briefly speaking, there methods with double underscores, is Python’s approach to operator overloading, allowing classes to define their own behavior with respect to language operators.

Typical dunder methods

Let’s start with __str__ method. It is called by the built-in method str() and other text-formatting method such as print() or format(). It is similar to Java’s toString() method associated with every object. The following is an example of overriding this method.

class MyStringClass:
    def __init__(self, var):
        self.var = var
        
    def __str__(self):
        return "My String Class Value:" + str(self.var)
    
if __name__ == '__main__':
    my_str_class = MyStringClass(6)
    print(my_str_class)

The output is

My String Class Value:6

If you comment out the overrided __str__ method. It will output

<__main__.MyStringClass object at 0x7f8cf1777fd0>

The latter output indicates where the object is stored, but it does not show the actual value wrapped in the class. This behavior is related to the builtin(default) method’s implementation of __str__. In CPython’s implementation standard, when the __str__ method is missing, the __repr__ method to obtain the object address. The __repr__ method provides an “official” string representation of an object, mostly used by the interpreter. You can verify it with the following definition:

class MyStringClass:
    def __init__(self, var):
        self.var = var
    
    def __new__(cls, *args, **kwargs):
        cls.__init__()

if __name__ == '__main__':
    my_str_class = MyStringClass(6)
    print(my_str_class)
    print(hex(id(my_str_class)))

Note:

This above behavior is only related to the standard CPython’s implementation. Other Python interpreters might have their own behaviors.

Other Simple cases

Similar to __str__(self), there are other simple cases of these dunder methods that can be invoked through a built-in method. For example, hash() calls __hash__(self) method, and it is also used in set or fronzenset; bool() calls the __bool__(self) method of a instantiated object; etc. These methods are similar to XYZ-able interfaces in Java world. When any class implements these interfaces, they need to provide a corresponding implementations explicitly in that subclass. In Python, things are much simpler(… or more error-prone at runtime, :P) thanks to dynamic typing.

Operator overloading

Special methods also allows operator overloading or keywords overloading, such as +, +=, in etc. These semantics are called “emulating” in Python. Refer to:

Emulating Numeric Types: Number class in Java
Emulating Callables: Function<T, R> interface in Java
Emulating Generics: reflections in Java
Emulating Container Types: Iterable interface in Java

These relations between Java concepts and Python concepts are not precisely 1-1 related. However, you can find some similarities that help you understand how Python works internally.

Python even allows you to overload with statement as context managers. It comes with a pair of methods: __enter__ and __exit__. This allows you to have a full control over closeable resources within a specialized class, similar to try(...){ } semantics in Java. The following is a mock example:

class SafeDatabaseConnection:

    def __enter__(self):
        # initialize a db connection and return it
        ...
        return self.dbconn

    def __exit__(self, exc_type, exc_value, traceback):
        # close the db connection after context expires.
        self.dbconn.close()

if __name__ == '__main__':
    with SafeDatabaseConnection() as mydbconn:
        # do something
        pass

Higher-level of designs

Overall, these dunder methods are simple but super powerful. They allow you to program in an objective-oriented way like many other OOP languages. Instead of inheriting base classes or implementing interfaces like XYZable (what you must do in Java), you can simply override dunder methods in your own class and use built-in functions or keywords such as with or in immediately with your own customized classes.

If you think about this problem in the other way, these built-in functions are not only simply “functions”, but also genuinely built-in “patterns”. Many Python libraries follow these standard “functions/patterns” to design their software structure. For example, Keras Layer is a callable in Python. Understanding and programming in these “built-in” patterns let your code follow Python’s conventions. Consider the following class definition. If you define a 2-D point with (x, y) coordinate, naturally you want to let them become add-able. Then you can override the __add__ method to allow the operator + to be able to compose them in a straight-forward way.

class Point:

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return Point(self.x + other.x, self.y +other.y)

    def __str__(self):
        return "x_coor:{}, y_coor:{}".format(self.x, self.y)

if __name__ == '__main__':
    p1 = Point(1, 2)
    p2 = Point(3, 4)
    print(p1 + p2)
    # output "x_coor:4, y_coor:6"

These pre-defined methods encourage programmers to think more about the inherent algebra relations through these built-in functions, and how to fit these inherent algebra relations into the built-in functions/paradigms.

Controlled class initialization

Python has very few “black” magic when instantiating a class. There is no implicit initialization order of attributes when you initialize a derived class. If anything from the base class is needed, for example, calling base-class constructors, you will have to use super().__init__(...) explicitly.

Basics: object lifecycles with `init`, `new` and `del`

__new__(cls, [...]) is a static method that takes the class of which an instance was requested as its first argument. The remaining arguments are those passed to the object constructor expression and the return value is usually an instance of the object. There might be an exception if you customize the class initialization process (see below).

If __new__() is invoked during object construction and it returns an instance of cls, then the new instance’s __init__() method will be invoked like __init__(self[, ...]), where self is the new instance and the remaining arguments are the same as were passed to the object constructor.

If __new__() does not return an instance of cls, then the new instance’s __init__() method will not be invoked.

__init__(self, [...]) is probably the most popular dunder method across Python projects. It is called after an instance is created by the __new__ method but before returning to the caller. If a base class has an __init__() method, the derived class’s __init__() method, if any, must explicitly call it to ensure proper initialization of the base class part of the instance; for example: super().__init__([args...]).

__del__(self) is called before the instance is destroyed, e.g. collected by Garbage Collector. Many folks, especially those with C++ background, might call it as “destructor” but the newest Python doc suggests that it is not a good name because any languages with GC is not suitable for RAII (Resource Acquisition Is Initialization), as the instance’s deletion time is not under programmer’s control. (See Destructor)

Note:

del x doesn’t directly call x.__del__() — the former decrements the reference count for x by one, and the latter is only called when x’s reference count reaches zero. By design, del is a keyword instead of a built-in function to avoid confusion.
__del__ and __delete__ are different! The latter is used to delete an attribute of Python object.

Advanced: Customize the class initialization

Caution: Avoid using this feature in your work unless you know the exact effect of your code. Customizing class initialization makes your code deviates from Python standard and usually do more harm than good. However, understanding how it works might help you understand Python internals libraries better.

These dunder methods allow us to revisit and revise the standard Python class creation.

It leverages two major methods to modify classes.

`__set_name__(self, owner, name)`

This method is automatically called at the time the owning class owner is created. The object has been assigned to name in class owner. Here is a pair of examples of how this method is invoked from Python’s official doc.

class A:
    
    x = C()  # This is a class varaible bound to the class itself. Automatically calls: x.__set_name__(A, 'x')

class A:
   pass

c = C()
A.x = c                  # The hook is not called. But you injected a class variable to the class itself. 
c.__set_name__(A, 'x')   # Manually invoke the hook

`__init_subclass__(cls)`

This method is called while the a subclass of a class is initialized.

Metaclass

The above two methods are only elementary approach to manipulate single changes of class initialization. To make class initialization more systematic, metaclass comes into the play. It is also probably the most useful piece of “customizing” your own class definition.

Before diving into what metaclass is, let’s look at what class is. My favorite definition is from the book Essential Scala

Class provides a template for creating objects.

In Python, the instantiated instance of a class is object. Moreover, the Class definition itself is also an object. The second part is critical for us if we want to templatize some code because we can create another layer of “template” of our code. Metaclass is built to create such second layer. In other words,

Metaclass is a template for creating classes.

To define a metaclass, you need to derive from type instead of object.

class Meta(type):
    def __new__(cls, name, bases, dct):
        x = super().__new__(cls, name, bases, dct)
        x.meta_name = "my_meta"
        return x

Then you’re free to create a list of classes out of the Meta:

class Class1(metaclass=Meta):
    pass

class Class2(metaclass=Meta):
    pass

After that, the class attributes can be fetched directly, or from instantiated instances.

c1 = Class1()
c2 = Class2()
print(c1.meta_name)     # my_meta
print(Class1.meta_name) # my_meta