
Why and where does Python intern strings when executing `a = 'python'`, when the source code does not appear to show it?

I am trying to learn the interning mechanism Python uses in the implementation of its string object. But in both `PyObject *PyString_FromString(const char *str)` and `PyObject *PyString_FromStringAndSize(const char *str, Py_ssize_t size)`, Python interns a string only when its size is 0 or 1.

PyObject *
PyString_FromString(const char *str)
{
    fprintf(stdout, "creating %s\n", str);  /* [1] */
    //...
    //creating...
    /* share short strings */
    if (size == 0) {
        PyObject *t = (PyObject *)op;
        PyString_InternInPlace(&t);
        op = (PyStringObject *)t;
        nullstring = op;
        Py_INCREF(op);
    } else if (size == 1) {
        PyObject *t = (PyObject *)op;
        PyString_InternInPlace(&t);
        op = (PyStringObject *)t;
        characters[*str & UCHAR_MAX] = op;
        Py_INCREF(op);
    }
    return (PyObject *) op;
}

But for longer strings such as a = 'python', if I modify string_print to print the address, it is identical to that of another string variable b = 'python'. At the line marked [1] above, I print a log message whenever Python creates a string object. The log shows that several strings are created when executing a = 'python', but 'python' itself is not among them.

>>> a = 'python'
creating stdin
creating stdin
string and size creating (null)
string and size creating a = 'python'
?
creating a
string and size creating (null)
string and size creating (null)
creating __main__
string and size creating (null)
string and size creating (null)
creating <stdin>
string and size creating d
creating __lltrace__
creating stdout
[26691 refs]
creating ps1
creating ps2

So where is the string 'python' created and interned?

Update 1

Please refer to the comment by @Daniel Darabos for a better interpretation. It is a clearer way to ask this question.

Below is PyString_InternInPlace with a log statement added, followed by the output it produces.

void
PyString_InternInPlace(PyObject **p)
{
    register PyStringObject *s = (PyStringObject *)(*p);
    /* added logging: print every string that is passed in for interning */
    fprintf(stdout, "Interning ");
    PyObject_Print((PyObject *)s, stdout, 0);
    fprintf(stdout, "\n");
    //...
}

>>> x = 'python'
Interning 'cp936'
Interning 'x'
Interning 'cp936'
Interning 'x'
Interning 'python'
[26706 refs]
Joey.Z asked Mar 13 '26 17:03


1 Answer

The string literal is turned into a string object by the compiler. The function that does that is PyString_DecodeEscape, at least in Python 2.7; you haven't said which version you are working with.
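To see the same thing from the Python side, here is a minimal sketch. It uses only the built-in compile function (the filename "<demo>" is arbitrary, and the syntax is Python 3) to compile the statement without running it, then lists the string constants already stored on the resulting code object:

```python
# Compile the statement without executing it; the filename is arbitrary.
code = compile("a = 'python'", "<demo>", "exec")

# The literal already exists as a string object among the code object's
# constants before any bytecode runs.
string_consts = [c for c in code.co_consts if isinstance(c, str)]
print(string_consts)
```

This suggests why logging in the runtime string constructors alone may not reveal where a literal comes from: the object is built while the source is being compiled.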

Update:

The compiler interns some strings during compilation, but the rules for when it happens are confusing. The string must consist only of characters that are valid in an identifier (letters, digits, and underscores):

>>> a = 'python'
>>> b = 'python'
>>> a is b
True
>>> a = 'python!'
>>> b = 'python!'
>>> a is b
False
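As a rough illustration of the "identifier-ok" test, the sketch below checks whether a string consists only of ASCII letters, digits, and underscores. The helper name looks_like_identifier is hypothetical, and the exact character set CPython uses is an assumption here; the real check lives in the C source and may differ between versions.

```python
import string

# Assumed set of "identifier-ok" characters: ASCII letters, digits, underscore.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")

def looks_like_identifier(s):
    """Hypothetical helper: True if every character of s is in NAME_CHARS."""
    return all(ch in NAME_CHARS for ch in s)

print(looks_like_identifier("python"))
print(looks_like_identifier("python!"))
```

Under that assumption, 'python' passes the check and 'python!' does not, which matches the identity results in the session above.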

Even in functions, string literals can be interned:

>>> def f():
...   return 'python'
...
>>> def g():
...   return 'python'
...
>>> f() is g()
True

But not if they have funny characters:

>>> def f():
...   return 'python!'
...
>>> def g():
...   return 'python!'
...
>>> f() is g()
False

And if I return a pair of strings, neither of them is interned; I don't know why:

>>> def f():
...   return 'python', 'python!'
...
>>> def g():
...   return 'python', 'python!'
...
>>> a, b = f()
>>> c, d = g()
>>> a is c
False
>>> a == c
True
>>> b is d
False
>>> b == d
True
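One way to investigate is to look at where the literals end up in the function's constants. The sketch below (Python 3 syntax; iter_string_consts is a hypothetical helper) prints the constants of f and collects every string in them, including strings nested inside tuple constants. Whether the pair is stored as a single tuple constant depends on the interpreter version, so treat the printed layout as something to check on your own build rather than a fixed fact.

```python
def f():
    return 'python', 'python!'

def iter_string_consts(consts):
    """Yield every string in a constants tuple, descending into nested tuples."""
    for c in consts:
        if isinstance(c, str):
            yield c
        elif isinstance(c, tuple):
            yield from iter_string_consts(c)

# Show the raw constants, then the strings found anywhere inside them.
print(f.__code__.co_consts)
found = sorted(set(iter_string_consts(f.__code__.co_consts)))
print(found)
```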

Moral of the story: interning is an implementation-dependent optimization that depends on many factors. It can be interesting to understand how it works, but never depend on it working any particular way.
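In practice that means comparing strings with == and, when a single shared object is really wanted, asking for it explicitly. A minimal sketch in Python 3, where the function is sys.intern (in Python 2 the equivalent is the built-in intern):

```python
import sys

# Build two equal strings at run time so they are not compile-time literals.
a = ''.join(['python', '!'])
b = ''.join(['python', '!'])

# Equality compares contents and does not depend on interning.
print(a == b)

# sys.intern returns the canonical copy, so equal strings map to one object.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)
```

Whether a is b holds before the explicit interning is left unchecked on purpose, since that is exactly the implementation detail not to rely on.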

Ned Batchelder answered Mar 15 '26 05:03


