Dunder method in Python
- Environment: This article uses CPython 3.9.9 from the deadsnakes PPA, running on Ubuntu 18.04.
In Python, some special attributes of a class or object start and end with two consecutive underscores (“__”), for example __init__. They are usually called “dunder” (double underscore) methods or “magic” methods. The latter name originates from the fact that these methods tend to look mysterious to Python beginners. This post demystifies them through analogies with similar concepts in other general-purpose programming languages.
In the official Python documentation, the details of these special methods are covered under the data model, in the section Special Method Names. Briefly, these double-underscore methods are Python’s approach to operator overloading: they allow classes to define their own behavior with respect to language operators.
Typical dunder methods
Let’s start with the __str__ method. It is called by the built-in function str() and by text-formatting functions such as print() or format(). It is similar to the toString() method that every Java object has. The following example overrides it.
class MyStringClass:
    def __init__(self, var):
        self.var = var

    def __str__(self):
        return "My String Class Value:" + str(self.var)

if __name__ == '__main__':
    my_str_class = MyStringClass(6)
    print(my_str_class)
The output is
My String Class Value:6
If you comment out the overridden __str__ method, the output becomes
<__main__.MyStringClass object at 0x7f8cf1777fd0>
The latter output shows where the object is stored, but not the actual value wrapped in the class. This comes from the default implementation of __str__. When a class does not define __str__, str() falls back to __repr__, and the default __repr__ in CPython includes the object’s address. The __repr__ method provides the “official” string representation of an object and is mostly used by the interpreter and for debugging. You can verify that the printed address matches id() with the following definition:
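class MyStringClass:
    def __init__(self, var):
        self.var = var

    def __new__(cls, *args, **kwargs):
        # Create the instance; Python then calls __init__ on it automatically.
        return super().__new__(cls)

if __name__ == '__main__':
    my_str_class = MyStringClass(6)
    print(my_str_class)            # default repr, e.g. <__main__.MyStringClass object at 0x7f8cf1777fd0>
    print(hex(id(my_str_class)))   # the same address on CPython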
Note:
- The address shown above is specific to the standard CPython implementation. Other Python interpreters might behave differently.
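To make the fallback concrete, here is a minimal sketch. The class names OnlyRepr and Both are made up for illustration. It shows that str() uses __repr__ when __str__ is absent, while repr() never uses __str__:

```python
class OnlyRepr:
    """Hypothetical class that defines __repr__ but not __str__."""

    def __init__(self, var):
        self.var = var

    def __repr__(self):
        return "OnlyRepr({!r})".format(self.var)


class Both(OnlyRepr):
    """Hypothetical subclass that adds a friendlier __str__."""

    def __str__(self):
        return "value is {}".format(self.var)


only_repr = OnlyRepr(6)
both = Both(6)

# str() falls back to __repr__ because OnlyRepr has no __str__ of its own.
str_of_only_repr = str(only_repr)
# With both methods defined, str() and repr() give different answers.
str_of_both = str(both)
repr_of_both = repr(both)

print(str_of_only_repr)  # OnlyRepr(6)
print(str_of_both)       # value is 6
print(repr_of_both)      # OnlyRepr(6)
```

A common convention is to always define __repr__ first, since it doubles as the fallback for str().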
Other Simple cases
Similar to __str__(self), other simple dunder methods are invoked through a built-in function. For example, hash() calls the __hash__(self) method, which is also used by set and frozenset; bool() calls the __bool__(self) method of an instance; and so on. These methods are similar to the XYZ-able interfaces in the Java world. In Java, any class that implements such an interface must declare it explicitly and provide the corresponding implementation. In Python, things are much simpler (… or more error-prone at runtime, :P) thanks to dynamic typing.
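As a rough illustration of these simple cases, the sketch below uses a hypothetical Tag class (not from any library) that defines __eq__, __hash__ and __bool__, so that set(), hash() and bool() work on it without declaring any interface:

```python
class Tag:
    """Hypothetical value object used only for illustration."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Objects that compare equal must also hash equal.
        return isinstance(other, Tag) and self.name == other.name

    def __hash__(self):
        return hash(self.name)

    def __bool__(self):
        # An empty name makes the tag "falsy".
        return bool(self.name)


tags = {Tag("python"), Tag("python"), Tag("java")}
unique_count = len(tags)  # duplicates collapse via __hash__ and __eq__
same_hash = hash(Tag("python")) == hash(Tag("python"))
empty_is_truthy = bool(Tag(""))

print(unique_count)     # 2
print(same_hash)        # True
print(empty_is_truthy)  # False
```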
Operator overloading
Special methods also allow you to overload operators and keywords, such as +, += and in. These semantics are called “emulating” in Python. Refer to the following, each paired with a rough Java counterpart:
- Emulating Numeric Types: Number class in Java
- Emulating Callables: Function<T, R> interface in Java
- Emulating Generics: reflections in Java
- Emulating Container Types: Iterable interface in Java
These Java and Python concepts do not map precisely one-to-one. However, the similarities can help you understand how Python works internally.
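The following sketch puts a few of these emulation hooks side by side. Basket is a hypothetical container invented for this example; it overloads +, +=, in, len() and the call syntax:

```python
class Basket:
    """Hypothetical container used only for illustration."""

    def __init__(self, items=None):
        self.items = list(items or [])

    def __add__(self, other):
        # basket + basket builds a new Basket and leaves both operands alone.
        return Basket(self.items + other.items)

    def __iadd__(self, other):
        # basket += basket mutates the left operand in place.
        self.items.extend(other.items)
        return self

    def __contains__(self, item):
        # Enables: item in basket
        return item in self.items

    def __len__(self):
        # Enables: len(basket)
        return len(self.items)

    def __call__(self, prefix):
        # Enables calling the instance like a function: basket("a")
        return [item for item in self.items if item.startswith(prefix)]


fruit = Basket(["apple", "avocado"])
more = Basket(["banana"])

combined = fruit + more
fruit += more

print(len(combined))      # 3
print("banana" in fruit)  # True
print("cherry" in fruit)  # False
print(fruit("a"))         # ['apple', 'avocado']
```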
Python even allows you to customize the with statement through context managers. A context manager comes with a pair of methods: __enter__ and __exit__. This gives you full control over closeable resources within a specialized class, similar to the try (...) { } (try-with-resources) semantics in Java. The following is a mock example:
class SafeDatabaseConnection:
    def __enter__(self):
        # initialize a db connection and return it
        ...
        return self.dbconn

    def __exit__(self, exc_type, exc_value, traceback):
        # close the db connection after context expires.
        self.dbconn.close()

if __name__ == '__main__':
    with SafeDatabaseConnection() as mydbconn:
        # do something
        pass
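Since the mock above elides the actual connection, here is a runnable sketch of the same idea. FakeConnection and ManagedConnection are hypothetical stand-ins, not a real database API. The point it tries to show is that __exit__ runs even when the body of the with block raises:

```python
class FakeConnection:
    """Hypothetical stand-in for a real database connection."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class ManagedConnection:
    """Opens a FakeConnection on entry and always closes it on exit."""

    def __enter__(self):
        self.conn = FakeConnection()
        return self.conn

    def __exit__(self, exc_type, exc_value, traceback):
        self.conn.close()
        # Returning False (or None) lets any exception propagate to the caller.
        return False


manager = ManagedConnection()
try:
    with manager as conn:
        open_inside_block = not conn.closed
        raise RuntimeError("query failed")
except RuntimeError as err:
    error_message = str(err)

print(open_inside_block)    # True
print(manager.conn.closed)  # True
print(error_message)        # query failed
```

Returning a true value from __exit__ would instead suppress the exception, which is rarely what you want for resource cleanup.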
Higher-level design
Overall, these dunder methods are simple but very powerful. They allow you to program in an object-oriented way, as in many other OOP languages. Instead of inheriting base classes or implementing interfaces like XYZable (which you must do in Java), you can simply override dunder methods in your own class, and it immediately works with built-in functions or keywords such as with or in.
Seen from the other direction, these built-in functions are not merely “functions” but genuinely built-in “patterns”. Many Python libraries follow these standard “functions/patterns” to design their software structure. For example, a Keras Layer is a callable in Python.
Understanding and programming with these “built-in” patterns lets your code follow Python’s conventions. Consider the following class definition. If you define a 2-D point with (x, y) coordinates, you naturally want points to be addable. You can then override the __add__ method so that the + operator composes them in a straightforward way.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return Point(self.x + other.x, self.y + other.y)

    def __str__(self):
        return "x_coor:{}, y_coor:{}".format(self.x, self.y)

if __name__ == '__main__':
    p1 = Point(1, 2)
    p2 = Point(3, 4)
    print(p1 + p2)
    # output "x_coor:4, y_coor:6"
These predefined methods encourage programmers to think about the algebraic relations inherent in their data, and about how to express those relations through the built-in functions and paradigms.
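One way to see the “algebra” at work: once + is defined, the built-in sum() can fold a whole list of points, provided it is given a starting value. The sketch below is a variation on the Point class; the __eq__ and __repr__ methods are additions made for this example only:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        return Point(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return "Point({}, {})".format(self.x, self.y)


points = [Point(1, 2), Point(3, 4), Point(5, 6)]
# sum() repeatedly applies +, starting from the identity element Point(0, 0).
total = sum(points, Point(0, 0))

print(total)                  # Point(9, 12)
print(total == Point(9, 12))  # True
```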
Controlled class initialization
Python has very little “black” magic when instantiating a class. There is no implicit initialization order of attributes when you initialize a derived class. If anything from the base class is needed, for example calling the base-class constructor, you have to call super().__init__(...) explicitly.
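A small sketch of what “explicitly” means in practice. The three class names are hypothetical; the only difference between the two children is whether they call super().__init__():

```python
class Base:
    def __init__(self):
        self.base_ready = True


class ForgetfulChild(Base):
    def __init__(self):
        # Base.__init__ is NOT called implicitly, so base_ready is never set.
        self.child_ready = True


class CarefulChild(Base):
    def __init__(self):
        super().__init__()  # explicitly run the base-class initializer
        self.child_ready = True


forgetful_has_base = hasattr(ForgetfulChild(), "base_ready")
careful_has_base = hasattr(CarefulChild(), "base_ready")

print(forgetful_has_base)  # False
print(careful_has_base)    # True
```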
Basics: object lifecycles with __init__, __new__ and __del__
__new__(cls, [...]) is a static method that takes the class of which an instance was requested as its first argument. The remaining arguments are those passed to the object constructor expression, and the return value is usually an instance of cls. It can be something else if you customize the class initialization process (see below).
- If __new__() is invoked during object construction and it returns an instance of cls, then the new instance’s __init__() method is invoked like __init__(self[, ...]), where self is the new instance and the remaining arguments are the same as were passed to the object constructor.
- If __new__() does not return an instance of cls, then the new instance’s __init__() method is not invoked.
__init__(self, [...]) is probably the most popular dunder method across Python projects. It is called after the instance has been created by the __new__ method but before it is returned to the caller. If a base class has an __init__() method, the derived class’s __init__() method, if any, must explicitly call it to ensure proper initialization of the base class part of the instance; for example: super().__init__([args...]).
__del__(self) is called when the instance is about to be destroyed, e.g. collected by the garbage collector. Many folks, especially those with a C++ background, call it a “destructor”, but the recent Python documentation calls it a finalizer and describes “destructor” as an improper name. Languages with GC are a poor fit for RAII (Resource Acquisition Is Initialization) because the moment an instance is deleted is not under the programmer’s control. (See Destructor)
Note:
del x doesn’t directly call x.__del__(): the former decrements the reference count for x by one, and the latter is only called when x’s reference count reaches zero. By design, del is a keyword rather than a built-in function, which helps avoid confusion.
__del__ and __delete__ are different! The latter belongs to the descriptor protocol and is called when an attribute managed by a descriptor is deleted.
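The lifecycle described above can be observed with a short sketch. Tracked and NotTracked are hypothetical classes, and the timing of __del__ shown in the comments assumes CPython’s reference counting; other interpreters may finalize later:

```python
events = []


class Tracked:
    """Hypothetical class that records each lifecycle hook as it runs."""

    def __new__(cls, *args, **kwargs):
        events.append("new")
        return super().__new__(cls)

    def __init__(self, label):
        events.append("init")
        self.label = label

    def __del__(self):
        events.append("del")


class NotTracked:
    """__new__ returns an int, so __init__ below is never invoked."""

    def __new__(cls, *args, **kwargs):
        return 42

    def __init__(self, *args, **kwargs):
        events.append("never reached")


first = Tracked("a")
alias = first  # a second reference to the same object
del first      # only drops one reference; __del__ has not run yet
events_after_first_del = list(events)
del alias      # last reference gone; CPython finalizes immediately
events_after_second_del = list(events)

odd = NotTracked("ignored")

print(events_after_first_del)   # ['new', 'init']
print(events_after_second_del)  # ['new', 'init', 'del'] on CPython
print(odd)                      # 42
```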
Advanced: Customize the class initialization
Caution: Avoid using this feature in your work unless you know the exact effect of your code. Customizing class initialization makes your code deviate from standard Python behavior and usually does more harm than good. However, understanding how it works can help you better understand the internals of Python libraries.
These dunder methods let us revisit and revise standard Python class creation. Two hook methods do most of the work.
__set_name__(self, owner, name)
This method is called automatically at the time the owning class owner is created. The object has been assigned to name in class owner. Here is a pair of examples of how this method is invoked, from Python’s official documentation.
class A:
    x = C()  # A class variable bound to the class itself. Automatically calls: x.__set_name__(A, 'x')

class A:
    pass

c = C()
A.x = c  # The hook is not called, but a class variable is injected into the class.
c.__set_name__(A, 'x')  # Manually invoke the hook
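The snippets above leave C undefined. A minimal runnable version might look like the sketch below, where C is a hypothetical attribute object that simply remembers where it was assigned, and B is a second made-up class used for the late assignment:

```python
class C:
    """Hypothetical attribute object that remembers where it was assigned."""

    def __set_name__(self, owner, name):
        self.owner = owner
        self.name = name


class A:
    x = C()  # hook runs automatically once class A has been created


class B:
    pass


late = C()
B.y = late  # assigned after class creation: the hook is not called
called_before_manual = hasattr(late, "name")
late.__set_name__(B, "y")  # invoke the hook by hand

print(A.x.owner is A, A.x.name)  # True x
print(called_before_manual)      # False
print(late.name)                 # y
```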
__init_subclass__(cls)
This method is called whenever the class that defines it is subclassed; cls is the new subclass.
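A typical use is a subclass registry. The sketch below is one possible shape of it; Plugin, its registry attribute and the key keyword are all hypothetical names chosen for illustration:

```python
class Plugin:
    """Hypothetical base class that keeps a registry of its subclasses."""

    registry = {}

    def __init_subclass__(cls, key=None, **kwargs):
        super().__init_subclass__(**kwargs)
        # cls is the newly created subclass, not Plugin itself.
        Plugin.registry[key or cls.__name__] = cls


class CsvPlugin(Plugin, key="csv"):
    pass


class JsonPlugin(Plugin):
    pass


print(sorted(Plugin.registry))              # ['JsonPlugin', 'csv']
print(Plugin.registry["csv"] is CsvPlugin)  # True
```

Keyword arguments in the class statement, such as key="csv" here, are passed straight to __init_subclass__.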
Metaclass
The two methods above are only elementary ways to make isolated changes to class initialization. To make class initialization more systematic, the metaclass comes into play. It is also probably the most useful tool for “customizing” your own class definition.
Before diving into what a metaclass is, let’s look at what a class is. My favorite definition is from the book Essential Scala:
Class provides a template for creating objects.
In Python, an instance of a class is an object. Moreover, the class definition itself is also an object. The second part is critical if we want to templatize code, because it lets us create another layer of “template” on top of our classes. A metaclass is built to create this second layer. In other words,
Metaclass is a template for creating classes.
To define a metaclass, you need to derive from type instead of object.
class Meta(type):
    def __new__(cls, name, bases, dct):
        x = super().__new__(cls, name, bases, dct)
        x.meta_name = "my_meta"
        return x
Then you’re free to create any number of classes from Meta:
class Class1(metaclass=Meta):
    pass

class Class2(metaclass=Meta):
    pass
After that, the class attributes can be fetched directly, or from instantiated instances.
c1 = Class1()
c2 = Class2()
print(c1.meta_name) # my_meta
print(Class1.meta_name) # my_meta
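To check the claim that a metaclass is a template for classes, the following sketch repeats the Meta definition so that it runs on its own, then inspects the types involved. Class3 is a made-up name, created by calling the metaclass directly instead of using a class statement:

```python
class Meta(type):
    def __new__(cls, name, bases, dct):
        new_class = super().__new__(cls, name, bases, dct)
        new_class.meta_name = "my_meta"
        return new_class


class Class1(metaclass=Meta):
    pass


# Calling the metaclass directly builds a class without a class statement.
# "Class3" is a made-up name for illustration.
Class3 = Meta("Class3", (), {"greeting": "hello"})

print(type(Class1) is Meta)      # True: the class is an instance of Meta
print(type(Class1()) is Class1)  # True: the instance is an instance of Class1
print(type(object) is type)      # True: ordinary classes are instances of type
print(Class3.meta_name, Class3().greeting)  # my_meta hello
```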