Python Learning Series, Part 16: Reading the Official Descriptor Documentation

Preface

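I plan to write a series of articles recording what I learn about Python 3. This chapter walks through the official documentation on descriptors, a core piece of object-oriented programming in Python.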

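When we covered class definitions, we saw that a class can contain @staticmethod methods, @classmethod methods, and instance methods. That leaves one question open: how are the first parameters of class methods and instance methods, cls and self, passed into the method body automatically at call time? The answer is descriptors, which are the focus of this article. The discussion is based mainly on the official Python 3 documentation.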

This is the author's original work; please credit the source when reposting.

What Is a Descriptor

In general, a descriptor is an object attribute with “binding behavior”, one whose attribute access has been overridden by methods in the descriptor protocol. Those methods are __get__(), __set__(), and __delete__(). If any of those methods are defined for an object, it is said to be a descriptor.

That is the official description. In plain terms:

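Put simply, a descriptor is an object attribute with "binding behavior": access to the attribute is overridden by the methods of the descriptor protocol, namely __get__(), __set__(), and __delete__(). If an object defines any one of these methods, it is called a descriptor.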

What does this mean? There are two parts to it:

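  1. Suppose an object $O$ has an attribute $A$ that is itself an object. If $A$ implements any of __get__(), __set__(), or __delete__(), then the attribute $A$ of $O$ is a descriptor (strictly speaking, $A$ must be stored on $O$'s class rather than in $O$'s own instance dictionary for the protocol to take effect);
  2. Accessing $A$ through $O$ is intercepted and routed to the descriptor protocol methods __get__(), __set__(), and __delete__().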
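A minimal sketch of the two points above. The names Announce and Owner are hypothetical: Announce defines only __get__(), so it is a (non-data) descriptor, and storing an instance of it as a class attribute of Owner routes every read of Owner().A or Owner.A through Announce.__get__(). The last lines show that the same kind of object placed in an instance's own dictionary is returned as-is.

```python
class Announce:
    """Hypothetical non-data descriptor: it defines only __get__."""

    def __get__(self, obj, objtype=None):
        # obj: the instance used for access (None when accessed via the class)
        # objtype: the owner class
        return f"intercepted: obj={type(obj).__name__}, objtype={objtype.__name__}"


class Owner:  # hypothetical owner class
    A = Announce()  # attribute A of Owner is a descriptor


o = Owner()
print(o.A)      # intercepted: obj=Owner, objtype=Owner
print(Owner.A)  # intercepted: obj=NoneType, objtype=Owner

# A descriptor stored in the *instance* dictionary is not invoked.
o.__dict__["B"] = Announce()
print(type(o.B).__name__)  # Announce
```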

The default behavior for attribute access is to get, set, or delete the attribute from an object’s dictionary. For instance, a.x has a lookup chain starting with a.__dict__['x'], then type(a).__dict__['x'], and continuing through the base classes of type(a) excluding metaclasses. If the looked-up value is an object defining one of the descriptor methods, then Python may override the default behavior and invoke the descriptor method instead. Where this occurs in the precedence chain depends on which descriptor methods were defined.

This passage also comes from the official docs. In plain terms:

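By default, attribute access gets, sets, or deletes the attribute in the object's dictionary. For example, looking up a.x follows a chain that starts at a.__dict__['x'], then type(a).__dict__['x'], and continues through the base classes of type(a), excluding metaclasses. If the value found along the way (the "looked-up value" in the original, here the attribute x) defines one of the descriptor methods, Python may override the default behavior and call the descriptor method instead. Where this happens in the precedence chain depends on which descriptor methods are defined.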

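What does this mean? The passage summarizes how descriptor methods intercept attribute access. When an object accesses an attribute, Python searches for it along a lookup chain, and there are two cases: if the attribute it finds does not implement any descriptor method, that attribute is returned directly; if it does, the lookup is replaced (intercepted) by the descriptor method, which then produces the result.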
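The lookup chain itself can be observed with plain (non-descriptor) attributes. A small sketch with hypothetical classes Base and Child: the value is first found in the base class, then in type(a).__dict__ once the class defines it, and finally in a.__dict__ once the instance defines it.

```python
class Base:  # hypothetical base class
    x = "from Base.__dict__"


class Child(Base):  # hypothetical subclass
    pass


a = Child()
print(a.x)  # from Base.__dict__  (found by walking the base classes)

Child.x = "from Child.__dict__"
print(a.x)  # from Child.__dict__ (type(a).__dict__ is searched before the bases)

a.x = "from a.__dict__"
print(a.x)  # from a.__dict__     (instance dict wins over plain class attributes)

print(Child.__mro__)  # the order in which classes are searched
```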

Descriptors are a powerful, general purpose protocol. They are the mechanism behind properties, methods, static methods, class methods, and super(). They are used throughout Python itself to implement the new style classes introduced in version 2.2. Descriptors simplify the underlying C-code and offer a flexible set of new tools for everyday Python programs.

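Descriptors are a powerful, general-purpose protocol. They are the mechanism behind properties, methods, static methods, class methods, and super(), and since Python 2.2 they have been used throughout Python to implement new-style classes.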

This passage stresses how important descriptors are and why they matter.

Descriptor Protocol

descr.__get__(self, obj, type=None) --> value

descr.__set__(self, obj, value) --> None

descr.__delete__(self, obj) --> None

So the descriptor protocol consists of the three special methods above (--> marks the return value).
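As a sketch of all three protocol methods working together, here is a hypothetical data descriptor Tracked that logs which method ran and stores its value in the instance dictionary under a fixed, hypothetical key "_tracked" (so it only supports one such attribute per class).

```python
class Tracked:
    """Hypothetical data descriptor implementing __get__, __set__ and __delete__."""

    def __init__(self):
        self.log = []  # records which protocol method ran

    def __get__(self, obj, objtype=None):
        self.log.append("get")
        if obj is None:  # accessed through the class: return the descriptor itself
            return self
        return obj.__dict__.get("_tracked")

    def __set__(self, obj, value):
        self.log.append("set")
        obj.__dict__["_tracked"] = value  # returns None, as the protocol expects

    def __delete__(self, obj):
        self.log.append("delete")
        obj.__dict__.pop("_tracked", None)


class Host:  # hypothetical owner class
    t = Tracked()


h = Host()
h.t = 42    # Tracked.__set__(descr, h, 42)
print(h.t)  # Tracked.__get__(descr, h, Host) -> 42
del h.t     # Tracked.__delete__(descr, h)
print(h.t)  # None: the stored value is gone
print(Host.__dict__["t"].log)  # ['set', 'get', 'delete', 'get']
```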

If an object defines both __get__() and __set__(), it is considered a data descriptor. Descriptors that only define __get__() are called non-data descriptors (they are typically used for methods but other uses are possible).

If an object defines both __get__() and __set__(), it is a data descriptor; if it defines only __get__(), it is a non-data descriptor.

Data and non-data descriptors differ in how overrides are calculated with respect to entries in an instance’s dictionary. If an instance’s dictionary has an entry with the same name as a data descriptor, the data descriptor takes precedence. If an instance’s dictionary has an entry with the same name as a non-data descriptor, the dictionary entry takes precedence.

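Data descriptors and non-data descriptors differ in how they interact with entries in an instance's dictionary. If the instance dictionary has an entry with the same name as a data descriptor, the data descriptor takes precedence; if it has an entry with the same name as a non-data descriptor, the dictionary entry takes precedence.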

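Read literally, this seems ambiguous: a dictionary is essentially a map, and a map holds each key only once, so how can two entries share a name? The answer is that the two same-named entries live in different dictionaries. The descriptor is stored in the class dictionary, type(obj).__dict__['x'], while the competing entry is stored in the instance dictionary, obj.__dict__['x']. The precedence rule decides which of the two obj.x returns.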
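The precedence rule can be checked directly. In this sketch (hypothetical names DataDesc, NonDataDesc, C), both descriptors live in C.__dict__, and same-named entries are written straight into c.__dict__ to bypass __set__.

```python
class DataDesc:
    """Hypothetical data descriptor: defines __get__ and __set__."""

    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        pass  # ignores writes; present only to make this a data descriptor


class NonDataDesc:
    """Hypothetical non-data descriptor: defines only __get__."""

    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class C:
    d = DataDesc()
    n = NonDataDesc()


c = C()
# Same-named entries in the *instance* dictionary (the class dict holds the descriptors).
c.__dict__["d"] = "from instance dict"
c.__dict__["n"] = "from instance dict"

print(c.d)  # from data descriptor  (data descriptor beats the instance dict)
print(c.n)  # from instance dict    (instance dict beats the non-data descriptor)
```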

To make a read-only data descriptor, define both __get__() and __set__() with the __set__() raising an AttributeError when called. Defining the __set__() method with an exception raising placeholder is enough to make it a data descriptor.

To make a read-only data descriptor, define both __get__() and __set__(), and make __set__() raise an AttributeError.
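A minimal read-only data descriptor along those lines, with hypothetical names ReadOnly and Config. Because it is a data descriptor, even a value planted directly in the instance dictionary cannot shadow it.

```python
class ReadOnly:
    """Hypothetical read-only data descriptor: __set__ always raises AttributeError."""

    def __init__(self, value):
        self.value = value

    def __get__(self, obj, objtype=None):
        return self.value

    def __set__(self, obj, value):
        raise AttributeError("read-only attribute")


class Config:  # hypothetical owner class
    version = ReadOnly("1.0")


cfg = Config()
try:
    cfg.version = "2.0"  # routed to ReadOnly.__set__, which raises
except AttributeError as exc:
    print("rejected:", exc)

# Writing to the instance dict directly does not help: the data descriptor still wins.
cfg.__dict__["version"] = "2.0"
print(cfg.version)  # 1.0
```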

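My takeaway from this passage: when defining a descriptor, it is best to make it a data descriptor rather than a non-data descriptor, since a non-data descriptor is not given priority during Python's attribute lookup (a same-named entry in the instance dictionary shadows it).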

Invoking Descriptors

A descriptor can be called directly by its method name. For example, d.__get__(obj).

A descriptor can be called directly by its method name, for example d.__get__(obj).

Alternatively, it is more common for a descriptor to be invoked automatically upon attribute access. For example, obj.d looks up d in the dictionary of obj. If d defines the method __get__(), then d.__get__(obj) is invoked according to the precedence rules listed below.

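More commonly, though, the descriptor method is invoked automatically when the attribute is accessed. For example, obj.d looks up d in obj's dictionary; if d defines __get__(), then d.__get__(obj) is invoked according to the precedence rules below.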
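The two ways of invoking a descriptor side by side, using a hypothetical Doubler descriptor and Num class. The descriptor is fetched from Num.__dict__ so that fetching it does not itself trigger __get__().

```python
class Doubler:
    """Hypothetical non-data descriptor returning twice the owner's `base`."""

    def __get__(self, obj, objtype=None):
        if obj is None:  # accessed through the class
            return self
        return obj.base * 2


class Num:  # hypothetical owner class
    d = Doubler()

    def __init__(self, base):
        self.base = base


obj = Num(21)
descr = Num.__dict__["d"]    # the raw descriptor, no __get__ call
print(descr.__get__(obj))    # 42: explicit call, d.__get__(obj)
print(obj.d)                 # 42: automatic call on attribute access
print(Num.d is descr)        # True: via the class, obj is None and __get__ returns self
```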

The details of invocation depend on whether obj is an object or a class.

How the call is made depends on whether obj is an instance or a class.

For objects, the machinery is in object.__getattribute__() which transforms b.x into type(b).__dict__['x'].__get__(b, type(b)). The implementation works through a precedence chain that gives data descriptors priority over instance variables, instance variables priority over non-data descriptors, and assigns lowest priority to __getattr__() if provided. The full C implementation can be found in PyObject_GenericGetAttr() in Objects/object.c.

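For instances, the machinery lives in object.__getattribute__(), which transforms b.x into type(b).__dict__['x'].__get__(b, type(b)). Internally it follows a precedence chain: data descriptors take priority over instance variables, instance variables take priority over non-data descriptors, and __getattr__() (if defined) has the lowest priority. The full C implementation is PyObject_GenericGetAttr() in Objects/object.c.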
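A quick check of that transformation, with a hypothetical Greeter class: spelling out type(b).__dict__['hello'].__get__(b, type(b)) gives the same bound method as b.hello, and __getattr__() is only reached when the normal lookup fails.

```python
class Greeter:  # hypothetical class
    def hello(self):
        return f"hello from {type(self).__name__}"

    def __getattr__(self, name):
        # Lowest priority: only called when normal lookup raises AttributeError.
        return f"__getattr__ fallback for {name!r}"


b = Greeter()
manual = type(b).__dict__["hello"].__get__(b, type(b))  # the transformation, spelled out
print(manual())     # hello from Greeter
print(b.hello())    # same result through normal attribute access
print(b.missing)    # __getattr__ fallback for 'missing'
```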

For classes, the machinery is in type.__getattribute__() which transforms B.x into B.__dict__['x'].__get__(None, B). In pure Python, it looks like:

For classes, the machinery lives in type.__getattribute__(), which transforms B.x into B.__dict__['x'].__get__(None, B). Emulated in Python (the real logic is written in C), it looks like this:

```python
def __getattribute__(self, key):
    "Emulate type_getattro() in Objects/typeobject.c"
    v = object.__getattribute__(self, key)
    if hasattr(v, '__get__'):
        return v.__get__(None, self)
    return v
```

OK, the code above shows clearly how type.__getattribute__() turns B.x into B.__dict__['x'].__get__(None, B). Let's go through it step by step:

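  1. First, the Python interpreter turns B.x into a call to type.__getattribute__(B, 'x'), and the transformation begins;
  2. Line 3 of the code (v = object.__getattribute__(self, key)) implements the B.__dict__['x'] part: it looks up the attribute x in B's dictionary;
  3. Lines 4 to 6 implement the __get__(None, B) part: if the attribute x implements __get__, the code returns v.__get__(None, self); otherwise it returns x itself.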

Note that the code above is incomplete: it covers only the __get__() logic, but the corresponding __set__() and __delete__() logic can be inferred by analogy.
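For the instance side, here is a simplified pure-Python sketch of object.__getattribute__(), in the spirit of the pure-Python equivalent in the Python 3 Descriptor HowTo. The helper names find_in_mro and object_getattribute are hypothetical; the sketch follows the precedence chain described above but skips the __getattr__() fallback and metaclass subtleties.

```python
def find_in_mro(cls, name, default):
    """Search type(obj) and then its bases in MRO order (hypothetical helper)."""
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return default


def object_getattribute(obj, name):
    """Simplified emulation of object.__getattribute__ (PyObject_GenericGetAttr)."""
    missing = object()
    objtype = type(obj)
    cls_var = find_in_mro(objtype, name, missing)
    descr_get = getattr(type(cls_var), "__get__", None)
    is_data = hasattr(type(cls_var), "__set__") or hasattr(type(cls_var), "__delete__")
    if descr_get is not None and is_data:
        return descr_get(cls_var, obj, objtype)  # 1. data descriptor
    if name in getattr(obj, "__dict__", {}):
        return obj.__dict__[name]                # 2. instance variable
    if descr_get is not None:
        return descr_get(cls_var, obj, objtype)  # 3. non-data descriptor
    if cls_var is not missing:
        return cls_var                           # 4. plain class variable
    raise AttributeError(name)  # the real lookup would now try __getattr__()


class Demo:  # hypothetical class for comparison with the built-in lookup
    plain = "class var"

    def method(self):
        return "method"

    @property
    def prop(self):
        return "property"


demo = Demo()
demo.__dict__["inst"] = "instance var"
demo.__dict__["prop"] = "shadow"  # loses to the property, a data descriptor
for name in ("plain", "inst", "prop"):
    print(name, object_getattribute(demo, name), getattr(demo, name))
```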

The important points to remember are:

  • descriptors are invoked by the __getattribute__() method
  • overriding __getattribute__() prevents automatic descriptor calls
  • object.__getattribute__() and type.__getattribute__() make different calls to __get__().
  • data descriptors always override instance dictionaries.
  • non-data descriptors may be overridden by instance dictionaries.

Points to remember in particular:

  • descriptors are invoked by the __getattribute__() method;
  • overriding __getattribute__() prevents the automatic descriptor calls described above;
  • object.__getattribute__() and type.__getattribute__() call __get__() in different ways (the example above emulates type.__getattribute__() in Python, and object.__getattribute__() can be derived just as easily);
  • data descriptors always override instance dictionaries;
  • non-data descriptors may be overridden by instance dictionaries.
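The second point is easy to demonstrate. In this sketch (hypothetical classes Loud, Normal, and Raw), Raw overrides __getattribute__() to read the class dictionary directly, so the descriptor's __get__() never runs.

```python
class Loud:
    """Hypothetical non-data descriptor."""

    def __get__(self, obj, objtype=None):
        return "descriptor ran"


class Normal:
    x = Loud()


class Raw:
    x = Loud()

    def __getattribute__(self, name):
        # Hypothetical override: read the class dict directly and never call
        # __get__, which switches the descriptor protocol off for this class.
        return type(self).__dict__[name]


print(Normal().x)              # descriptor ran
print(type(Raw().x).__name__)  # Loud: the raw descriptor object comes back
```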

The object returned by super() also has a custom __getattribute__() method for invoking descriptors. The call super(B, obj).m() searches obj.__class__.__mro__ for the base class A immediately following B and then returns A.__dict__['m'].__get__(obj, B). If not a descriptor, m is returned unchanged. If not in the dictionary, m reverts to a search using object.__getattribute__().

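The object returned by super() also has a custom __getattribute__() method for invoking descriptors. The call super(B, obj).m() searches obj.__class__.__mro__ for the class A that immediately follows B, then returns A.__dict__['m'].__get__(obj, B). If m is not a descriptor, it is returned unchanged; if m is not in the dictionary, the search falls back to object.__getattribute__().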
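A sketch of that lookup with hypothetical classes A, B, C: for an instance of C, the class that follows B in the MRO is A, and spelling out A.__dict__['m'].__get__(obj, B) gives the same result as super(B, obj).m().

```python
class A:
    def m(self):
        return f"A.m on {type(self).__name__}"


class B(A):
    def m(self):
        return "B.m"


class C(B):
    pass


obj = C()
print(C.__mro__)                            # C, B, A, object: A follows B
via_super = super(B, obj).m()               # search starts *after* B in obj's MRO
manual = A.__dict__["m"].__get__(obj, B)()  # the transformation described above
print(via_super, manual)                    # A.m on C A.m on C
```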

The implementation details are in super_getattro() in Objects/typeobject.c. and a pure Python equivalent can be found in Guido’s Tutorial.

The details above show that the mechanism for descriptors is embedded in the __getattribute__() methods for object, type, and super(). Classes inherit this machinery when they derive from object or if they have a meta-class providing similar functionality. Likewise, classes can turn-off descriptor invocation by overriding __getattribute__().

Descriptor Example

The following code creates a class whose objects are data descriptors which print a message for each get or set. Overriding __getattribute__() is alternate approach that could do this for every attribute. However, this descriptor is useful for monitoring just a few chosen attributes:

```python
class RevealAccess(object):
    """A data descriptor that sets and returns values
    normally and prints a message logging their access.
    """

    def __init__(self, initval=None, name='var'):
        self.val = initval
        self.name = name

    def __get__(self, obj, objtype=None):
        print('Retrieving', self.name)
        return self.val

    def __set__(self, obj, val):
        print('Updating', self.name)
        self.val = val
```

RevealAccess implements the descriptor protocol methods __get__ and __set__, so by definition RevealAccess is a data descriptor.

```python
class MyClass(object):
    x = RevealAccess(10, 'var "x"')
    y = 5
```

Note that the attribute x of MyClass is a descriptor. Now let's see how MyClass behaves in use:

```python
>>> m = MyClass()
>>> m.x
Retrieving var "x"
10
>>> m.x = 20
Updating var "x"
>>> m.x
Retrieving var "x"
20
>>> m.y
5
```

The session shows clearly that when an accessed attribute is a descriptor, reading and writing it call the descriptor's protocol methods __get__ and __set__, respectively.
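One thing worth noticing: RevealAccess stores the value on the descriptor itself (self.val), and the descriptor lives on the class, so every MyClass instance shares a single x. The sketch below is a hypothetical per-instance variant (RevealAccessPerInstance, MyClass2) that uses the __set_name__() hook (Python 3.6+) to learn the attribute name and keeps each value in the instance's own dictionary under "_" + name.

```python
class RevealAccessPerInstance:
    """Hypothetical variant of RevealAccess that keeps one value per instance."""

    def __init__(self, initval=None):
        self.initval = initval

    def __set_name__(self, owner, name):
        # Called once when the owner class is created (Python 3.6+).
        self.public_name = name
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        print("Retrieving", self.public_name)
        return getattr(obj, self.private_name, self.initval)

    def __set__(self, obj, val):
        print("Updating", self.public_name)
        setattr(obj, self.private_name, val)


class MyClass2:  # hypothetical owner class
    x = RevealAccessPerInstance(10)


m1, m2 = MyClass2(), MyClass2()
m1.x = 20
print(m1.x, m2.x)  # last line printed: 20 10 (each instance keeps its own value)
```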

Properties

TODO
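Until this section is written up, here is a simplified pure-Python sketch of how property can be built on the descriptor protocol. The class name Property (capitalized so the built-in is not shadowed) and the example class Celsius are hypothetical; the real property also provides getter() and deleter(), omitted here.

```python
class Property:
    """Simplified sketch of the built-in property: a data descriptor."""

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget, self.fset, self.fdel = fget, fset, fdel
        self.__doc__ = doc if doc is not None else getattr(fget, "__doc__", None)

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

    def setter(self, fset):
        # Return a new Property so the decorator can rebind the same name.
        return type(self)(self.fget, fset, self.fdel, self.__doc__)


class Celsius:  # hypothetical example class
    def __init__(self, degrees):
        self._degrees = degrees

    @Property
    def degrees(self):
        "Temperature in degrees Celsius."
        return self._degrees

    @degrees.setter
    def degrees(self, value):
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._degrees = value


t = Celsius(25)
t.degrees = 30     # goes through Property.__set__
print(t.degrees)   # 30
```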

Functions and Methods

Python’s object oriented features are built upon a function based environment. Using non-data descriptors, the two are merged seamlessly.

Python's object-oriented features are built on a function-based environment, and non-data descriptors merge the two seamlessly.

Class dictionaries store methods as functions. In a class definition, methods are written using def and lambda, the usual tools for creating functions. The only difference from regular functions is that the first argument is reserved for the object instance. By Python convention, the instance reference is called self but may be called this or any other variable name.

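A class dictionary stores methods as functions. In a class definition, methods are written with def and lambda, the usual tools for creating functions. The only difference from regular functions is that the first parameter is reserved for the object instance; by convention it is named self (a convention, not a keyword), although this or any other variable name would also work.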

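My addition: a plain function is called as an object in its own right, whereas a method is called as an attribute of an object. The bound and unbound methods discussed below correspond to accessing the function through an instance and through the class, respectively.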

To support method calls, functions include the __get__() method for binding methods during attribute access. This means that all functions are non-data descriptors which return bound or unbound methods depending whether they are invoked from an object or a class. In pure python, it works like this:

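To support method calls, functions (that is, function objects) include a __get__() method that binds methods during attribute access. In other words, every function is a non-data descriptor, and it returns a bound or unbound method depending on whether it is accessed through an object or a class. Emulated in Python: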

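```python
import types


class Function(object):
    ...

    def __get__(self, obj, objtype=None):
        "Simulate func_descr_get() in Objects/funcobject.c"
        if obj is None:
            # Accessed through the class: Python 3 returns the plain function.
            return self
        # Accessed through an instance: bind it so obj becomes the first argument.
        return types.MethodType(self, obj)
```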

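My addition: the Function object itself implements __get__(), so it is a non-data descriptor. When a function is accessed as an attribute, __get__() intercepts the access and uses types.MethodType to wrap the function into a method bound to the instance (when accessed through the class, Python 3 returns the function itself, while Python 2 returned an unbound method).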
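The binding step can be performed by hand on a real function. In this sketch (hypothetical class D), the function stored in D.__dict__ has __get__ but no __set__, and calling its __get__() with an instance yields the bound method that d.f would produce.

```python
class D:  # hypothetical class
    def f(self, x):
        return x


d = D()
func = D.__dict__["f"]           # plain function object stored in the class dictionary
print(hasattr(func, "__get__"))  # True: every function is a descriptor...
print(hasattr(func, "__set__"))  # False: ...and a non-data one
bound = func.__get__(d, D)       # what the lookup d.f does behind the scenes
print(bound(5))                  # 5: d was supplied as self automatically
```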

The following example illustrates this behavior:

```python
>>> class D(object):
...     def f(self, x):
...         return x
...
```

The code above defines a method $f$ in class $D$.

```python
>>> d = D()
>>> D.__dict__['f']  # Stored internally as a function
<function f at 0x00C45070>
>>> D.f              # Get from a class becomes an unbound method
<unbound method D.f>
>>> d.f              # Get from an instance becomes a bound method
<bound method D.f of <__main__.D object at 0x00B18C90>>
```

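As the example shows, accessing f through the class returns an unbound method, while accessing it through the instance d returns a bound method. (This output comes from Python 2. Python 3 has no unbound methods: D.f simply returns the plain function, and only d.f produces a bound method.)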
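A minimal Python 3 check of the same behavior, redefining the hypothetical class D so the block is self-contained: through the class you get the very function stored in the class dictionary, and you must pass the instance yourself.

```python
import types


class D:  # hypothetical class, same as in the session above
    def f(self, x):
        return x


d = D()
print(D.f is D.__dict__["f"])             # True: no unbound-method wrapper in Python 3
print(isinstance(d.f, types.MethodType))  # True: instance access gives a bound method
print(D.f(d, 3) == d.f(3))                # True: with the plain function, pass self explicitly
```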

While they could have been implemented that way, the actual C implementation of PyMethod_Type in Objects/classobject.c is a single object with two different representations depending on whether the im_self field is set or is NULL (the C equivalent of None).

At the C level, PyMethod_Type is a single type, and whether a method is bound or unbound depends on whether its im_self field is set or is NULL.

Likewise, the effects of calling a method object depend on the im_self field. If set (meaning bound), the original function (stored in the im_func field) is called as expected with the first argument set to the instance. If unbound, all of the arguments are passed unchanged to the original function. The actual C implementation of instancemethod_call() is only slightly more complex in that it includes some type checking.

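To summarize: these passages describe how instance methods are implemented and how they differ from non-instance methods. Calling a bound method calls the original function (im_func) with the instance as the first argument; calling an unbound method passes all arguments to the function unchanged.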
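In Python 3 the two fields are visible from Python code: Python 2's im_self and im_func are exposed as __self__ and __func__. This sketch (hypothetical class D) confirms that calling a bound method is the same as calling its function with the instance prepended.

```python
class D:  # hypothetical class
    def f(self, x):
        return x * 2


d = D()
m = d.f                                     # bound method object
print(m.__self__ is d)                      # True  (Python 2: im_self)
print(m.__func__ is D.__dict__["f"])        # True  (Python 2: im_func)
print(m(21) == m.__func__(m.__self__, 21))  # True: a bound call prepends the instance
```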

Static Methods and Class Methods

Non-data descriptors provide a simple mechanism for variations on the usual patterns of binding functions into methods.

To recap, functions have a __get__() method so that they can be converted to a method when accessed as attributes. The non-data descriptor transforms an obj.f(*args) call into f(obj, *args). Calling klass.f(*args) becomes f(*args).

To recap: when a function is accessed as an attribute, its __get__() method turns it into a method. The non-data descriptor transforms the call obj.f(*args) into f(obj, *args), and the call klass.f(*args) into f(*args).

My addition: obj.f(*args) is the case where f is accessed as an attribute of an instance; klass.f(*args) is the case where f is accessed as an attribute of the class.

The table below summarizes the binding transformations:

| Transformation | Called from an Object | Called from a Class |
|---|---|---|
| function | f(obj, *args) | f(*args) |
| staticmethod | f(*args) | f(*args) |
| classmethod | f(type(obj), *args) | f(klass, *args) |

How to read the table:

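  1. A plain function accessed as an attribute
    e.g. $obj.f(*args)$, called through an object (a class instance), is transformed into $f(obj, *args)$; $clz.f(*args)$, called through the class, is transformed into $f(*args)$;
  2. A function decorated with @staticmethod accessed as an attribute: both $obj.f(*args)$ and $clz.f(*args)$ become $f(*args)$, with no extra first argument;
  3. A function decorated with @classmethod accessed as an attribute: $obj.f(*args)$ becomes $f(type(obj), *args)$ and $clz.f(*args)$ becomes $f(clz, *args)$, so the class is always passed first.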
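Every cell of the table can be checked with one function that simply returns the arguments it receives. In this sketch the names show and K are hypothetical; the same function is wrapped three ways.

```python
def show(*args):
    """Return exactly the positional arguments received (hypothetical helper)."""
    return args


class K:  # hypothetical class
    function = show               # plain function
    static = staticmethod(show)
    klass = classmethod(show)


obj = K()
print(obj.function(1, 2))  # (obj, 1, 2)  f(obj, *args)
print(K.function(1, 2))    # (1, 2)       f(*args)
print(obj.static(1, 2))    # (1, 2)       f(*args)
print(K.static(1, 2))      # (1, 2)       f(*args)
print(obj.klass(1, 2))     # (K, 1, 2)    f(type(obj), *args)
print(K.klass(1, 2))       # (K, 1, 2)    f(klass, *args)
```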

Static Methods

Good candidates for static methods are methods that do not reference the self variable.

In other words, a method that never references the self variable is a good candidate for a static method.

Here is an example of a static method: in class $E$, a staticmethod object wraps the function $f$ into a static method attribute.

```python
def f(x):
    return x


class E(object):
    f = staticmethod(f)
```

Now the calls:

```python
>>> print(E.f(3))
3
>>> print(E().f(3))
3
```

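So the result is the same whether the method is called through the class or through an instance. How are static methods implemented, and how does attribute lookup know that a method is static? The answer lies in the staticmethod source; here is an emulation written in $Python$: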

```python
class staticmethod(object):
    "Emulate PyStaticMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, objtype=None):
        return self.f
```

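So staticmethod is itself a descriptor. The key is its __get__() method, which returns the wrapped function directly, so the same function comes back whether it is accessed through an instance or through the class.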
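The emulation can be exercised directly. This sketch renames it to StaticMethod (a hypothetical name) so the built-in staticmethod is not shadowed, and checks that no binding happens: both access paths return the original function object.

```python
class StaticMethod:
    """Emulation of staticmethod, renamed so the built-in is not shadowed."""

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, objtype=None):
        # Same function back whether accessed through an instance or the class.
        return self.f


def f(x):
    return x


class E:
    f = StaticMethod(f)


print(E.f(3), E().f(3))   # 3 3
print(E.f is E().f is f)  # True: the original function, unbound
```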

@staticmethod

```python
def f(x):
    return x


class E(object):
    f = staticmethod(f)
```

The @staticmethod decorator shortens the definition above to:

```python
class E(object):
    @staticmethod
    def f(x):
        return x
```

@staticmethod is equivalent to executing f = staticmethod(f): it replaces the attribute $f$ with the descriptor produced by staticmethod(f).

Class Methods

Unlike static methods, class methods prepend the class reference to the argument list before calling the function. This format is the same for whether the caller is an object or a class:

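Unlike a static method, a class method passes a reference to the class as the first argument of the call, and it does so whether the caller is an object or a class. Consider this example: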

```python
def f(klass, x):
    return klass.__name__, x


class E(object):
    f = classmethod(f)
```

As with the static method, a classmethod object here wraps the function $f$, this time into a class method attribute of E. The calls:

```python
>>> print(E.f(3))
('E', 3)
>>> print(E().f(3))
('E', 3)
```
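Because the class is passed in at call time, a class method sees whichever class the call went through, including subclasses. A sketch of the common alternate-constructor pattern, with hypothetical classes Base and Derived:

```python
class Base:  # hypothetical class
    @classmethod
    def create(cls):
        # cls is the class the call went through, not necessarily Base.
        return cls()


class Derived(Base):  # hypothetical subclass
    pass


print(type(Base.create()).__name__)       # Base
print(type(Derived.create()).__name__)    # Derived
print(type(Derived().create()).__name__)  # Derived: via an instance, cls is type(obj)
```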

Next, an emulation of the classmethod object in Python:

```python
class classmethod(object):
    "Emulate PyClassMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, klass=None):
        if klass is None:
            klass = type(obj)
        def newfunc(*args):
            return self.f(klass, *args)
        return newfunc
```

The classmethod object is a descriptor whose __get__() method returns a class method, that is, a function whose first argument is the class.
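A variant of the emulation above, renamed ClassMethod (a hypothetical name) to avoid shadowing the built-in. Instead of a closure, it binds the function to the class with types.MethodType, the same tool functions use to bind to instances; the result is a real method object whose __self__ is the class.

```python
import types


class ClassMethod:
    """Emulation of classmethod, renamed so the built-in is not shadowed."""

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, klass=None):
        if klass is None:
            klass = type(obj)
        # Bind the function to the class, so the class becomes the first argument.
        return types.MethodType(self.f, klass)


def f(klass, x):
    return klass.__name__, x


class E:
    f = ClassMethod(f)


print(E.f(3), E().f(3))  # ('E', 3) ('E', 3)
```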

@classmethod

```python
def f(klass, x):
    return klass.__name__, x


class E(object):
    f = classmethod(f)
```

Likewise, @classmethod simplifies the definition of a class method:

```python
class E(object):
    @classmethod
    def f(klass, x):
        return klass.__name__, x
```

Again, @classmethod simply replaces E's attribute $f$ with the corresponding classmethod descriptor.

My Summary

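Object-oriented features can be implemented in many ways, each with its own strengths. Among dynamic languages, JavaScript uses prototypes (the prototype chain) to implement object orientation, while Python has its own approach. Descriptors are a crucial piece of Python's object model: when we ask how class methods, class variables, static methods, instance methods, and instance attributes are implemented, the answer is the descriptor mechanism covered in this chapter.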

Reference

Python 2: https://docs.python.org/2/howto/descriptor.html

Python 3: https://docs.python.org/3/howto/descriptor.html