isternberg February 2016

Python3 multiple assignment and memory address

After reading this and this, which are pretty similar to my question, I still cannot understand the following behaviour:

a = 257
b = 257
print(a is b) #False
a, b = 257, 257
print(a is b) #True

When printing id(a) and id(b) I can see that the variables assigned on separate lines have different ids, whereas with multiple assignment both names have the same id:

a = 257
b = 257
print(id(a)) #139828809414512
print(id(b)) #139828809414224
a, b = 257, 257
print(id(a)) #139828809414416
print(id(b)) #139828809414416

But this behaviour cannot be explained by saying that multiple assignment of the same value always creates references to the same object, since:

a, b = -1000, -1000  
print(id(a)) #139828809414448
print(id(b)) #139828809414288

Is there a clear rule, which explains when the variables get the same id and when not?

edit

Relevant info: the code in this question was run in interactive mode (ipython3).

Answers


chepner February 2016

Any such rule is implementation-specific. CPython, for example, pre-allocates int objects for small integers (-5 through 256) as a performance optimization.

The only general rule is to assume any use of a literal will generate a new object.
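
A quick way to see the cache boundary in an interactive CPython session (other implementations may behave differently):

>>> a = 256
>>> b = 256
>>> a is b   # 256 is in the pre-allocated small-int range, so both names share one object
True
>>> a = 257
>>> b = 257
>>> a is b   # 257 is outside the cache; each statement builds its own object
False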


Kasramvd February 2016

That's because of a Python interpreter optimization when the constant values are loaded for UNPACK_SEQUENCE. When Python unpacks an iterable of identical literals, it doesn't load duplicate objects multiple times; it keeps the first object and binds all of your names to that one object. Thus all your variables end up as references to the same object.

In effect, your unpacking binds both names to the same object, just like the following statement:

a = b = 257
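
Chained assignment always binds both names to a single object, and unpacking two identical literals in one statement gives the same result, which you can check directly:

>>> a = b = 257
>>> a is b
True
>>> a, b = 257, 257
>>> a is b
True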

As for negative numbers: in Python 2.X it doesn't make any difference, but in Python 3.X it seems that for numbers smaller than -5 Python creates new objects during unpacking:

>>> a, b = -6, -6
>>> a is b
False
>>> a, b = -5, -5
>>> 
>>> a is b
True


user2357112 February 2016

This is due to a constant folding optimization in the bytecode compiler. When the bytecode compiler compiles a batch of statements, it uses a dict to keep track of the constants it's seen. This dict automatically merges any equivalent constants.
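
You can observe this merging from Python itself: compile the two assignments as one unit and inspect the constants table (a CPython-specific check; the exact contents of co_consts can vary slightly between versions):

>>> code = compile("a = 257\nb = 257", "<test>", "exec")
>>> code.co_consts   # both 257 literals were merged into a single constant
(257, None)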

Here's the routine responsible for recording and numbering constants (as well as a few related responsibilities):

static int
compiler_add_o(struct compiler *c, PyObject *dict, PyObject *o)
{
    PyObject *t, *v;
    Py_ssize_t arg;

    t = _PyCode_ConstantKey(o);
    if (t == NULL)
        return -1;

    v = PyDict_GetItem(dict, t);
    if (!v) {
        arg = PyDict_Size(dict);
        v = PyInt_FromLong(arg);
        if (!v) {
            Py_DECREF(t);
            return -1;
        }
        if (PyDict_SetItem(dict, t, v) < 0) {
            Py_DECREF(t);
            Py_DECREF(v);
            return -1;
        }
        Py_DECREF(v);
    }
    else
        arg = PyInt_AsLong(v);
    Py_DECREF(t);
    return arg;
}

You can see that it only adds a new entry and assigns a new number if it doesn't find an equivalent constant already present. (The _PyCode_ConstantKey bit makes sure things like 0.0, -0.0, and 0 are considered inequivalent.)
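
For example, 0.0 and -0.0 compare equal but are kept as separate constants, so they come out as distinct objects (a small check, assuming CPython):

>>> x, y = 0.0, -0.0
>>> x == y
True
>>> x is y   # kept inequivalent by the constant key, so two different float objects
False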

In interactive mode, a batch ends every time the interpreter has to actually run your command, so constant folding mostly doesn't happen across commands:

>>> a = 1000
>>> b = 1000
>>> a is b
False
>>> a = 1000; b = 1000 # 1 batch
>>> a is b
True

In a script, all top-level statements are one batch, so more constant folding happens:

a = 257
b = 257
print(a is b)

In a script, this prints True.

A function's code gets compiled as a single unit with its own constants table, so the same merging happens inside a function body regardless of how the surrounding code was entered.
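
For instance, a function body typed at the interactive prompt is still compiled as one unit, so both literals end up sharing a single constant (a small illustrative check, assuming CPython):

>>> def f():
...     a = 257
...     b = 257
...     return a is b
...
>>> f()
True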

Post Status

Asked in February 2016
Viewed 2,722 times
Voted 12
Answered 3 times
