Exploring Dynamic Scoping in Python

Experimenting with Code Objects & Bytecode

Introduction

Ruby has anonymous code blocks, Python doesn't.

Anonymous code blocks are (apparently) an important feature in implementing DSLs, much touted by Ruby protagonists.

As far as I can tell, the major difference between code blocks in Ruby and functions in Python is that code blocks are executed in the current scope. You can rebind local variables in the scope in which the code block is executed. Python functions have a lexical scope, with the exception that you can't rebind variables in the enclosing scope.

Note

It turns out that this is wrong. Ruby code blocks are lexically scoped like Python functions. This article is really an exploration of dynamic scoping.

If a function defined inside another function or method uses the variable 'x', the value is loaded from the scope in which the function was defined, not the scope in which it is executed. This is enormously useful, but perhaps not always the desired behaviour. If a function assigns to the variable 'x', the assignment always happens inside the scope of the function itself and does not affect the scope the function was defined in or executed in.
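The two rules in the paragraph above can be seen in a few lines. This is a minimal sketch, assuming Python 3 (the sessions in this article are Python 2); the names make_reader, caller and assigner are made up for the illustration.

```python
x = "global"

def make_reader():
    x = "enclosing"
    def reader():
        # 'x' is looked up where reader was defined (make_reader),
        # not where it is eventually called.
        return x
    return reader

def caller():
    x = "caller"  # ignored by reader: lookup is lexical, not dynamic
    return make_reader()()

def assigner():
    x = "local"   # binds a new local 'x'; the module-level 'x' is untouched
    return x

print(caller())
print(assigner())
print(x)
```

With dynamic scoping, reader would instead have picked up the 'x' belonging to caller.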

I thought it would be fun to try and implement this feature of anonymous code blocks for Python, using code objects. This should be a fun way to learn more about the implementation of Python scoping rules by experimenting with byte-code. If this sounds like it's a hack, then it's only because it is. It is interesting to note however that Aspect Oriented Programming is a well accepted technique in Java, and is mainly implemented at the bytecode level.

This article looks at the byte-code operations used in code objects and experiments with creating new ones. Although the details of the byte-codes are shown, no great technical knowledge should be needed to follow the article.

Code Objects

Python doesn't have code blocks. It does have code objects. These can be executed in the current scope, but they are inconvenient to create inside a program. The code must be stored as a string, compiled and then executed.

>>> x = 3
>>> codeString = "print x\nx = 7\n"
>>> codeObject = compile(codeString, '<CodeString>', 'exec')
>>> exec codeObject
3
>>> print x
7
>>>

Functions store a code object representing the body of the function as the func_code attribute. For a reference on function attributes, see the function type. The byte-code contains instructions telling the interpreter how to load and store values. It is a combination of the function attributes and the byte-code, including code object attributes, that implement the scoping rules.

You can't just execute the code object of a function:

>>> x = 3
>>> def function():
...     print x
...     x = 7
...
>>> codeObject = function.func_code
>>> exec codeObject
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in function
UnboundLocalError: local variable 'x' referenced before assignment
>>>

The co_freevars attribute of the code object contains a tuple of the names of the variables from the enclosing scope used by the code object. There are various other attributes, like co_varnames, which tell the interpreter how to load names. For a reference on code objects, see: Code Objects (Unofficial Reference Wiki).

Code objects are immutable, or at least the interesting attributes are read only, so we can't just change the attributes we are interested in.

We can create new code objects. The documentation doesn't seem to encourage this though:

>>> x = 3
>>> from types import CodeType
>>> print CodeType.__doc__
code(argcount, nlocals, stacksize, flags, codestring, constants, names,
varnames, filename, name, firstlineno, lnotab[, freevars[, cellvars]])

Create a code object. Not for the faint of heart.
>>>
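Both points can be checked quickly. This is a minimal sketch, assuming Python 3.8 or later, where the code object lives on the `__code__` attribute rather than func_code and where code objects have a replace() method that builds a modified copy. The function sample is made up for the illustration.

```python
def sample():
    return 1

code = sample.__code__

# The interesting attributes are read only: assigning to one fails.
try:
    code.co_freevars = ()
except AttributeError as exc:
    assignment_error = exc
else:
    assignment_error = None

# A new code object can still be built from an existing one.
# replace() returns a copy and leaves the original untouched.
renamed = code.replace(co_name="renamed")

print(type(assignment_error).__name__)
print(code.co_name, renamed.co_name)
```

Calling the CodeType constructor directly is also possible, but its argument list has changed between Python versions, so the copy-and-modify route is the less fragile of the two.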

In order to implement code blocks I would like to take the code objects from a function and transform them into ones which can be executed in the current scope.

There is an interesting recipe which transforms bytecodes and creates new code objects in this way: Implementing the make statement by hacking bytecodes.

Luckily there is an easier way.

Byte-Codes

There is a great module called Byteplay. This lets you manipulate byte-codes and create new code objects. Ideal for my purposes.

It is also great for exploring byte-codes. Let's see what the byte-code looks like for some functions. The Python Byte Code Instructions reference comes in handy here.

The following Python creates three code blocks and uses Byteplay to print out the names of the byte-code operations. The three code blocks come from a function which is defined in the global scope, the same code (without the argument 'x') compiled from a string in the global scope, and a function defined inside another function.

from byteplay import Code
from pprint import pprint

z = 1
def testFunction(x):
    y = 1
    print x
    print y
    print z

print 'From Function:'
code = Code.from_code(testFunction.func_code)
byteCode1 = code.code
pprint(byteCode1)

codeObject = compile("""
y = 1
print y
print z""", '<Summink>', 'exec')

print
print 'From current scope:'
code = Code.from_code(codeObject)
byteCode2 = code.code
pprint(byteCode2)

def anotherScope():
    z = 1
    def testFunction(x):
        y = 1
        print x
        print y
        print z
    code = Code.from_code(testFunction.func_code)
    byteCode3 = code.code

    return byteCode3

byteCode3 = anotherScope()

print
print 'Code defined in another scope, using a local rather than a global.'
pprint(byteCode3)

This prints out the following (you don't need to read it all):

From Function:
[(SetLineno, 6), (LOAD_CONST, 1), (STORE_FAST, 'y'),
 (SetLineno, 7), (LOAD_FAST, 'x'), (PRINT_ITEM, None), (PRINT_NEWLINE, None),
 (SetLineno, 8), (LOAD_FAST, 'y'), (PRINT_ITEM, None), (PRINT_NEWLINE, None),
 (SetLineno, 9), (LOAD_GLOBAL, 'z'), (PRINT_ITEM, None), (PRINT_NEWLINE, None),
 (LOAD_CONST, None), (RETURN_VALUE, None)]

From current scope:
[(SetLineno, 2), (LOAD_CONST, 1), (STORE_NAME, 'y'),
 (SetLineno, 3), (LOAD_NAME, 'y'), (PRINT_ITEM, None), (PRINT_NEWLINE, None),
 (SetLineno, 4), (LOAD_NAME, 'z'), (PRINT_ITEM, None), (PRINT_NEWLINE, None),
 (LOAD_CONST, None), (RETURN_VALUE, None)]

Code defined in another scope, using a local rather than a global.
[(SetLineno, 66), (LOAD_CONST, 1), (STORE_FAST, 'y'),
 (SetLineno, 67), (LOAD_FAST, 'x'), (PRINT_ITEM, None), (PRINT_NEWLINE, None),
 (SetLineno, 68), (LOAD_FAST, 'y'), (PRINT_ITEM, None), (PRINT_NEWLINE, None),
 (SetLineno, 69), (LOAD_DEREF, 'z'), (PRINT_ITEM, None), (PRINT_NEWLINE, None),
 (LOAD_CONST, None), (RETURN_VALUE, None)]

In summary, this tells us:

Store a local variable: STORE_FAST
Load an argument: LOAD_FAST
Load a variable local to function: LOAD_FAST
Load a global: LOAD_GLOBAL
Load a value from the enclosing scope: LOAD_DEREF
Load a value from the same scope: LOAD_NAME
Store a value in the same scope: STORE_NAME

So in order to rescope a code block to execute in the current scope, we need to transform LOAD_FAST and LOAD_DEREF into LOAD_NAME, and STORE_FAST and STORE_DEREF (which we haven't seen here) into STORE_NAME.
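Stripped of any bytecode library, the transformation is just a lookup table applied to each instruction. The sketch below works on plain (name, argument) tuples like the ones printed earlier, not on real code objects; the names RESCOPE and rescope are made up for the illustration.

```python
# Opcode names that refer to function-local or enclosing-scope variables,
# mapped to the equivalents that look names up in the current scope.
RESCOPE = {
    "LOAD_FAST": "LOAD_NAME",
    "LOAD_DEREF": "LOAD_NAME",
    "STORE_FAST": "STORE_NAME",
    "STORE_DEREF": "STORE_NAME",
}

def rescope(instructions):
    """Return a new instruction list with scoped loads and stores rewritten."""
    return [(RESCOPE.get(opname, opname), arg) for opname, arg in instructions]

# A shortened version of the listing for the function defined in another scope.
listing = [
    ("LOAD_CONST", 1),
    ("STORE_FAST", "y"),
    ("LOAD_FAST", "x"),
    ("LOAD_DEREF", "z"),
    ("RETURN_VALUE", None),
]

rescoped = rescope(listing)
for instruction in rescoped:
    print(instruction)
```

Opcodes that are not in the table, such as LOAD_CONST and LOAD_GLOBAL, pass through unchanged.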

Transforming Byte-codes

The Byteplay module allows us to iterate over the opcodes. It stores them as a list of tuples. Because lists are mutable we can replace the byte-codes we are interested in.

The Byteplay module also has a dictionary called opmap, which is a mapping of opcode names to their symbolic values.

from byteplay import Code, opmap

LOAD_FAST = opmap['LOAD_FAST']
STORE_FAST = opmap['STORE_FAST']
LOAD_NAME = opmap['LOAD_NAME']
STORE_NAME = opmap['STORE_NAME']
LOAD_DEREF = opmap['LOAD_DEREF']
STORE_DEREF = opmap['STORE_DEREF']

def AnonymousCodeBlock(function):
    code = Code.from_code(function.func_code)
    newBytecode = []
    for opcode, arg in code.code:
        if opcode in (LOAD_FAST, LOAD_DEREF):
            opcode = LOAD_NAME
        elif opcode in (STORE_FAST, STORE_DEREF):
            opcode = STORE_NAME
        newBytecode.append((opcode, arg))

At the start of the function AnonymousCodeBlock we use Code.from_code to turn the function byte-code object into a Byteplay object. By the end, so far, we have a list newBytecode which holds our transformed bytecode.

There is one more step. We need to turn this back into a code object, but one which executes in the current scope. This means that we need to set the freevars attribute to () (empty) and the newlocals attribute to False.

    # Still inside AnonymousCodeBlock, after the loop
    code.code = newBytecode
    code.newlocals = False
    code.freevars = ()
    return code.to_code()

Because we're not interested in functions which take arguments, we ought to check the function we've been passed. inspect.getargspec makes this easy. The full AnonymousCodeBlock looks like this:

import inspect
from byteplay import Code, opmap

LOAD_FAST = opmap['LOAD_FAST']
STORE_FAST = opmap['STORE_FAST']
LOAD_NAME = opmap['LOAD_NAME']
STORE_NAME = opmap['STORE_NAME']
LOAD_DEREF = opmap['LOAD_DEREF']
STORE_DEREF = opmap['STORE_DEREF']

def AnonymousCodeBlock(function):
    argSpec = inspect.getargspec(function)
    if [i for x in argSpec if x is not None for i in x]:
        raise TypeError("Function '%s' takes arguments" % function.func_name)

    code = Code.from_code(function.func_code)
    newBytecode = []
    for opcode, arg in code.code:
        if opcode in (LOAD_FAST, LOAD_DEREF):
            opcode = LOAD_NAME
        elif opcode in (STORE_FAST, STORE_DEREF):
            opcode = STORE_NAME
        newBytecode.append((opcode, arg))
    code.code = newBytecode
    code.newlocals = False
    code.freevars = ()
    return code.to_code()
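The argument check above relies on inspect.getargspec, which was removed in Python 3.11. On Python 3 the same test can be written with inspect.signature. This is a minimal sketch of that check only, with the made-up helper name takes_no_arguments; it does not touch the bytecode.

```python
import inspect

def takes_no_arguments(function):
    """Return True if the function declares no parameters of any kind."""
    return len(inspect.signature(function).parameters) == 0

def thunk():
    pass

def with_positional(a):
    pass

def with_varargs(*args):
    pass

def with_keywords(**kwargs):
    pass

for candidate in (thunk, with_positional, with_varargs, with_keywords):
    print(candidate.__name__, takes_no_arguments(candidate))
```

Like the list comprehension in AnonymousCodeBlock, this rejects \*args and \*\*kwargs as well as ordinary parameters.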

Using AnonymousCodeBlock

To use AnonymousCodeBlock you pass it a function. It returns a code object which represents the body of the function. You can execute this with a call to exec. Local variables used by the code, and names bound by it, will be looked up and bound in the scope in which you execute the code.

def thunk():
    print "In thunk"
    print x
    x = 45

def getInnerThunk():
    x = 1
    z = 3
    def innerThunk():
        print 'In inner thunk'
        print x
        x = 7
        print z
    return innerThunk

def main():
    x = 20
    z = 10

    codeObject = AnonymousCodeBlock(thunk)
    exec codeObject
    print x

    print
    codeObject2 = getInnerThunk()
    exec AnonymousCodeBlock(codeObject2)
    print x

main()
x = 5
z = 6
print
print 'in local'
exec AnonymousCodeBlock(getInnerThunk())
print x

The above code uses two functions which work with the variables 'x' and 'z'. One of the functions (thunk) is used directly. The second (innerThunk) is obtained by calling getInnerThunk. If you run it (I won't spoil the surprise), you'll see that it does what it should. The variable 'x' is printed and then changed: whether the function comes from an inner scope or not, and whichever scope it is executed in.
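The behaviour can be imitated on Python 3 without rewriting any bytecode, which may help if you can't run the Python 2 code above. This is a different technique from AnonymousCodeBlock: the block is compiled from a string rather than taken from a function, and plain dictionaries stand in for the scopes. The names block and run_in are made up for the illustration.

```python
# The block reads 'x', records it, then rebinds it, like the thunks above.
block = compile("seen.append(x)\nx = x + 1\n", "<block>", "exec")

def run_in(scope):
    """Execute the block with 'scope' as its only namespace."""
    exec(block, scope)
    return scope

first_scope = {"x": 20, "seen": []}
second_scope = {"x": 5, "seen": []}

run_in(first_scope)
run_in(second_scope)

print(first_scope["seen"], first_scope["x"])
print(second_scope["seen"], second_scope["x"])
```

The same code object reads and rebinds whichever 'x' belongs to the namespace it is executed in, which is the dynamic scoping this article set out to explore.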

So there we have it, an implementation of anonymous code blocks for Python, sort of.