18 Is a Metaclass a Pandora's Box or an Aladdin's Lamp

Hello, I am Cai Yuannan, the author of the column “Real-world Large-scale Data Processing” on Geek Time. Today I want to share with you the topic: Metaclass - Pandora’s Box or Aladdin’s Lamp?

Python has many “black magic” features, and metaclass, today’s topic, is one of them. The people I know tend to hold one of two extreme views on such language features.

  • Some people think these language features are very powerful, like an Aladdin’s Lamp with unlimited capabilities. They believe that one must find an opportunity to use them in order to demonstrate their Python skills.
  • Others believe that these language features are too dangerous and can lead to abuse. They think that once opened, these “demons” will be unleashed, making the entire codebase difficult to maintain.

Both views have their merits, but they are also superficial. Today, I will show you whether metaclass is a Pandora’s Box or an Aladdin’s Lamp.

Many Chinese books on the market translate metaclass as “元类” (yuan class). I have always found this translation poor, so I will avoid the term “元类” here. Read literally, “元” means “origin” or “basic”, so “元类” suggests “basic class”. Would that make Python’s metaclass the object base class from Python 2? This is confusing.

In fact, the prefix “meta” in meta-class comes from the Greek word “meta” and has two meanings:

  1. “Beyond”, for example, the technical term “metadata” refers to data that describes other data, and so sits beyond it.
  2. “Change”, for example, the technical term “metamorphosis” refers to a change in form.

As the name suggests, metaclass actually contains the meanings of “beyond class” and “transforming class”. It has nothing to do with the meaning of “basic class”. Therefore, to deeply understand metaclass, we need to focus on its “transcendent transformation” capabilities. Next, I will delve into the transcendent transformation capabilities of metaclass and explain what metaclass is used for, how it is applied, how it is implemented in the Python language design, and the risks of using metaclass.

What Is the Transcendent Transformation Feature of metaclass Good For? #

YAML is a well-known data serialization format, and PyYAML, the Python library used here, makes it easy to serialize and deserialize structured data. One transcendent transformation capability of its YAMLObject class is that every subclass automatically supports serialization and deserialization. For example, consider the following code:

import yaml

class Monster(yaml.YAMLObject):
  yaml_tag = u'!Monster'
  def __init__(self, name, hp, ac, attacks):
    self.name = name
    self.hp = hp
    self.ac = ac
    self.attacks = attacks
  def __repr__(self):
    return "%s(name=%r, hp=%r, ac=%r, attacks=%r)" % (
       self.__class__.__name__, self.name, self.hp, self.ac,
       self.attacks)

# Recent PyYAML versions require an explicit Loader argument.
print(yaml.load("""
--- !Monster
name: Cave spider
hp: [2,6]    # 2d6
ac: 16
attacks: [BITE, HURT]
""", Loader=yaml.Loader))

# Output
# Monster(name='Cave spider', hp=[2, 6], ac=16, attacks=['BITE', 'HURT'])

# default_flow_style=None keeps short lists inline, as older PyYAML did by default.
print(yaml.dump(Monster(
    name='Cave lizard', hp=[3,6], ac=16, attacks=['BITE','HURT']),
    default_flow_style=None))

# Output
# !Monster
# ac: 16
# attacks: [BITE, HURT]
# hp: [3, 6]
# name: Cave lizard

So where does the special ability of YAMLObject show itself?

You see, a single call to yaml.load() can turn any YAML document into Python objects, and a call to yaml.dump() can serialize an instance of any YAMLObject subclass. Callers of load() and dump() do not need to know any type information in advance, which makes highly dynamic, configuration-driven programming possible. In my practical experience, many large-scale projects need this kind of dynamic configuration.

For example, in a large-scale project for a smart voice assistant, we have 10,000 voice dialogue scenarios, each developed by a different team. As a core member of the smart voice assistant team, it is impossible for me to understand the implementation details of each sub-scenario.

When dynamically configuring experiments with different scenarios, it is often the case that today I want to experiment with the configuration of scenarios A and B, and tomorrow with scenarios B and C. The configuration file alone can have tens of thousands of lines of code, which is quite a workload. However, by applying this dynamic configuration concept, I can have the engine dynamically load the Python classes needed based on my text configuration file.

For users of YAML, this is also very convenient. By simply inheriting yaml.YAMLObject, you can give your Python object serialization and deserialization capabilities. Isn’t this a bit “abnormal” and a bit “transcendent” compared to ordinary Python classes?

In fact, I have come across many Python developers at Google and found that perhaps only 10% of them can explain the advantages of YAML-like design patterns. Those who know that this kind of dynamic serialization/deserialization, as in YAML, is implemented with metaclass are even rarer, perhaps only 1%.

How to Use the Transcendent Transformation Feature of metaclass? #

As just mentioned, probably only 1% of Python developers know that the dynamic serialization/deserialization of YAML is implemented with metaclass. Ask further, and perhaps only 0.1% can explain how YAML uses metaclass to implement it.

Due to space limitations, let’s focus on load(). Simply put, we need a global registry so that YAML knows that the !Monster tag in the serialized text must be loaded into the Python type Monster.

A natural idea is to create a global variable called registry and register all YAMLObjects that need deserialization into it. For example, like this:

registry = {}

def add_constructor(target_class):
    registry[target_class.yaml_tag] = target_class

Then, add the following line of code after the definition of the Monster class:

add_constructor(Monster)

However, the disadvantage is obvious. For users of YAML, after defining each YAML serializable class Foo, they need to add the line add_constructor(Foo). This undoubtedly adds trouble for developers and is more error-prone as developers may easily forget about it.
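To make the weakness of the manual approach concrete, here is a minimal, self-contained sketch. It does not use PyYAML: the load() helper, the Dragon class, and the dict-based input are hypothetical stand-ins for a real parser, chosen only to show how the registry lookup works and what happens when a registration is forgotten.

```python
# Toy illustration of a manually maintained registry (not PyYAML code).
registry = {}

def add_constructor(target_class):
    # Map the tag string to the class that should be built for it.
    registry[target_class.yaml_tag] = target_class

def load(tag, fields):
    # Hypothetical stand-in for a parser: look up the tag, build the object.
    return registry[tag](**fields)

class Monster:
    yaml_tag = '!Monster'
    def __init__(self, name, hp):
        self.name = name
        self.hp = hp

class Dragon:
    yaml_tag = '!Dragon'
    def __init__(self, name):
        self.name = name

add_constructor(Monster)   # remembered
# add_constructor(Dragon)  # forgotten

spider = load('!Monster', {'name': 'Cave spider', 'hp': [2, 6]})
print(type(spider).__name__, spider.name)

try:
    load('!Dragon', {'name': 'Smaug'})
except KeyError as err:
    # The missing registration only surfaces at load time.
    print('unregistered tag:', err)
```

The failure appears far from the class definition that caused it, which is exactly the kind of mistake the registration step invites.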

So, what is a better implementation? If you have looked at the source code of YAML, you will find that metaclass solves this problem.

# Common part for Python 2/3
class YAMLObjectMetaclass(type):
    def __init__(cls, name, bases, kwds):
        super(YAMLObjectMetaclass, cls).__init__(name, bases, kwds)
        if 'yaml_tag' in kwds and kwds['yaml_tag'] is not None:
            cls.yaml_loader.add_constructor(cls.yaml_tag, cls.from_yaml)
    # Rest of the definition omitted

# Python 3
class YAMLObject(metaclass=YAMLObjectMetaclass):
    yaml_loader = Loader
    # Rest of the definition omitted

# Python 2
class YAMLObject(object):
    __metaclass__ = YAMLObjectMetaclass
    yaml_loader = Loader
    # Rest of the definition omitted

You can see that YAMLObject declares the metaclass as YAMLObjectMetaclass, although the declaration method is slightly different in Python 2 and 3. In YAMLObjectMetaclass, the following line of code is where the magic happens:

cls.yaml_loader.add_constructor(cls.yaml_tag, cls.from_yaml)

YAML uses the metaclass to intercept the definition of every YAMLObject subclass. In other words, whenever you define a YAMLObject subclass, Python runs the metaclass’s __init__ as part of creating the class, and that method executes the line below. This is the automatic equivalent of the add_constructor(Foo) call we wanted earlier.

cls.yaml_loader.add_constructor(cls.yaml_tag, cls.from_yaml)

Therefore, users of YAML do not need to manually write add_constructor(Foo). Isn’t it actually simple?
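The same idea can be reproduced without PyYAML. The following is a minimal sketch, not PyYAML’s actual code: RegisteringMeta, Serializable, the tag attribute, and the module-level registry are all hypothetical names used for illustration.

```python
# Toy metaclass that registers every tagged subclass automatically.
registry = {}

class RegisteringMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Runs once per class definition, right after the class is created.
        tag = namespace.get('tag')
        if tag is not None:
            registry[tag] = cls

class Serializable(metaclass=RegisteringMeta):
    tag = None  # the base class itself is not registered

class Monster(Serializable):
    tag = '!Monster'

class Dragon(Serializable):
    tag = '!Dragon'

print(sorted(registry))
# → ['!Dragon', '!Monster']
```

Neither Monster nor Dragon mentions the registry; inheriting from Serializable is enough, because subclasses inherit the metaclass of their base.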

By now, we have mastered the usage of metaclass and surpassed 99.9% of Python developers in the world. Furthermore, if you can deeply understand how metaclass is implemented at the language design level in Python, you will be a rare “Python master” in the world.

How is metaclass implemented in the low-level design of Python? #

Earlier, we mentioned that metaclasses can intercept the definition of Python classes. How does it achieve this?

To understand the underlying mechanism of metaclasses, you need to have a deep understanding of Python’s type model. Below, I will explain it in three points.

First, all user-defined classes in Python are instances of the type class. #

It may surprise you, but the fact is that a class itself is just an instance of a class named type. In the world of Python types, the type class is like the god of creation. This can be verified in the code:

# Python 3 (Python 2 behaves similarly)
class MyClass:
  pass

instance = MyClass()

print(type(instance))
# Output:
# <class '__main__.MyClass'>

print(type(MyClass))
# Output:
# <class 'type'>

As you can see, instance is an instance of MyClass, and MyClass is simply an instance of the “god” type.
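A few extra checks, offered as a small sketch, show that the same relationship holds for built-in classes and for type itself:

```python
class MyClass:
    pass

instance = MyClass()

print(isinstance(instance, MyClass))  # → True
print(isinstance(MyClass, type))      # → True
print(type(int) is type)              # → True: built-in classes are instances of type too
print(type(type) is type)             # → True: type is its own type
```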

Second, defining a class is just a call to the __call__ operator overload of the type class. #

When we end a class definition statement, what actually happens is that Python calls the __call__ operator of type. In simple terms, when you define a class, like this:

class MyClass:
  data = 1

What Python actually executes is the following code:

MyClass = type(classname, superclasses, attributedict)

The type(classname, superclasses, attributedict) on the right side of the equal sign is the __call__ operator overload of type, and it further calls:

type.__new__(typeclass, classname, superclasses, attributedict)
type.__init__(MyClass, classname, superclasses, attributedict)

Of course, all of this can be verified with code, like in the following code example:

class MyClass:
  data = 1

instance = MyClass()
print(MyClass, instance)
# Output (Python 3; the address will vary):
# <class '__main__.MyClass'> <__main__.MyClass object at 0x7fe4f0b00ab8>
print(instance.data)
# Output:
# 1

MyClass = type('MyClass', (), {'data': 1})
instance = MyClass()
print(MyClass, instance)
# Output (Python 3; the address will vary):
# <class '__main__.MyClass'> <__main__.MyClass object at 0x7fe4f0aea5d0>
print(instance.data)
# Output:
# 1

As you can see, the normal definition of MyClass is equivalent to calling type(...) by hand with the class name, its base classes, and its attribute dictionary.
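The two-step expansion described above can also be spelled out by hand. This is only a sketch for illustration; in real code you would use the class statement or call type(...) directly.

```python
name, bases, namespace = 'MyClass', (), {'data': 1}

# Step 1: allocate and build the class object.
MyClass = type.__new__(type, name, bases, namespace)
# Step 2: initialize it (for plain type this does almost nothing).
type.__init__(MyClass, name, bases, namespace)

print(MyClass.__name__, type(MyClass) is type)
print(MyClass().data)  # → 1
```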

Third, metaclass is a subclass of type and “transcends” normal classes by replacing the __call__ operator overload mechanism of type. #

In fact, once you set the metaclass of a class MyClass to MyMeta, MyClass is no longer created by the original type; instead, Python calls the __call__ operator overload of MyMeta.

MyClass = type(classname, superclasses, attributedict)
# becomes
MyClass = MyMeta(classname, superclasses, attributedict)

This is why, in the YAML example above, using the __init__ method of YAMLObjectMetaclass, we can secretly execute add_constructor() for all subclasses of YAMLObject.
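A small tracing metaclass makes this visible. The sketch below assumes Python 3 syntax; MyMeta, the calls list, and the Child and Other classes are hypothetical names introduced only to record which hooks run.

```python
calls = []

class MyMeta(type):
    def __new__(mcs, name, bases, namespace):
        calls.append(('__new__', name))
        return super().__new__(mcs, name, bases, namespace)

    def __init__(cls, name, bases, namespace):
        calls.append(('__init__', name))
        super().__init__(name, bases, namespace)

# A class statement with metaclass=MyMeta triggers both hooks.
class MyClass(metaclass=MyMeta):
    data = 1

# Subclasses inherit the metaclass, so they trigger the hooks as well.
class Child(MyClass):
    pass

# Calling the metaclass directly is the equivalent explicit form.
Other = MyMeta('Other', (), {'data': 2})

for call in calls:
    print(call)
print(type(MyClass) is MyMeta, type(Child) is MyMeta, type(Other) is MyMeta)
```

Each of the three classes contributes a '__new__' entry followed by an '__init__' entry, in definition order, without any of them calling the metaclass’s methods explicitly.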

Risks of Using Metaclass #

In the previous sections, I have been discussing the principles and advantages of metaclass. It is indeed true that only when you deeply understand the essence of metaclass can you use it effectively. Unfortunately, as I mentioned earlier, less than 0.1% of Python developers have a deep understanding of metaclass.

However, there are always pros and cons, especially when it comes to something as “game-changing” as metaclass. As you can see, metaclass can “distort” the normal Python type model. Therefore, if used carelessly, the risks it poses to the entire codebase are immeasurable.
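One concrete example of such a risk, offered here as an illustration rather than an exhaustive list, is the metaclass conflict: two class hierarchies that each rely on their own metaclass cannot be combined through multiple inheritance without extra work. MetaA, MetaB, and the classes below are hypothetical names.

```python
class MetaA(type):
    pass

class MetaB(type):
    pass

class A(metaclass=MetaA):
    pass

class B(metaclass=MetaB):
    pass

conflict = None
try:
    # Python cannot pick a single metaclass for C, so class creation fails.
    class C(A, B):
        pass
except TypeError as err:
    conflict = str(err)
    print(conflict)

# One conventional workaround: a metaclass that derives from both.
class MetaAB(MetaA, MetaB):
    pass

class C(A, B, metaclass=MetaAB):
    pass

print(type(C).__name__)  # → MetaAB
```

The workaround functions, but every user who wants to combine the two hierarchies has to know about both metaclasses, which is the kind of hidden coupling that makes a codebase harder to maintain.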

In other words, metaclass is only meant for a small number of Python developers who work on developing Python libraries at the framework level. At the application level, metaclass is often not the best choice.

It is precisely because of this that, to the best of my knowledge, many top Silicon Valley companies require special permission to use Python metaclass.

Summary #

In this lesson, we dissected the source code of YAML, focused on the original design intent of metaclass, namely “transcendent transformation”, and analyzed when and how to use metaclasses. We then went one level deeper into the design of the Python language to understand how metaclasses are implemented.

As the title suggests, metaclass is a language feature of Python that can be considered as “black magic”. Heaven and hell are just a step away - if used properly, metaclass can achieve magical features like YAML; however, if used improperly, it may open Pandora’s box.

Therefore, today’s content serves two purposes: it helps those who need metaclasses to understand them more deeply and apply them well, and it serves as both an introduction and a warning for beginners: do not reach for metaclasses lightly.

Thought Question #

After learning about Python decorators in the previous lesson and metaclasses in this lesson, you now know that both of them can intervene in Python’s normal type mechanism. So, what do you think is the difference between decorators and metaclasses? Please feel free to leave a comment and discuss with me.