
Why does python's "gc.collect()" not work as expected?

Here is my test code:

#! /usr/bin/python3
import gc
import ctypes

name = "a" * 50
name_id = id(name)
del name
gc.collect()
print(ctypes.cast(name_id, ctypes.py_object).value)

output:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

In my opinion, gc.collect() should clean the variable name and its value,
but why can I still get the value with name_id after gc.collect()?

olivetree123 asked Jan 22 '26 06:01


1 Answer

You shouldn't expect gc.collect() to do anything here. The gc module only controls the cyclic garbage collector, which is an auxiliary collector: CPython uses reference counting as its main memory management strategy. The cyclic garbage collector handles reference cycles. There are no reference cycles here, so gc.collect() won't do anything.
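To see the one case where gc.collect() does matter, here is a minimal sketch of a reference cycle. The Node class and the variable names are hypothetical, made up for this illustration, and the behavior shown is specific to CPython. A weak reference is used to observe whether the object is still alive without keeping it alive.

```python
import gc
import weakref


class Node:
    """Hypothetical helper class; instances can be weakly referenced."""

    def __init__(self):
        self.other = None


gc.disable()  # stop automatic collections from interfering with the demo

# Build a reference cycle: a refers to b, and b refers back to a.
a, b = Node(), Node()
a.other, b.other = b, a
probe = weakref.ref(a)  # observes `a` without adding a strong reference

del a, b
# Both names are gone, but each object is still referenced by the other,
# so reference counting alone never frees them.
alive_after_del = probe() is not None

found = gc.collect()  # the cycle collector finds and frees the cycle
alive_after_collect = probe() is not None

gc.enable()

print(alive_after_del)      # True on CPython
print(alive_after_collect)  # False
print(found)                # number of unreachable objects the collector found
```

The string in the question is not part of any cycle, so this mechanism never comes into play for it.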

"In my opinion, gc.collect() should clean the variable name and its value,"

That is simply not how Python works. The variable ceased to exist with del name, but the object continues to exist, in this case because of a compiler optimization. Python variables are not like C variables: they aren't chunks of memory, they are names that refer to objects in a particular namespace.
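The difference between a name and an object can be sketched in a few lines. The names data and alias are made up for this illustration, and sys.getrefcount() reports CPython's internal reference count, so treat the exact numbers as an implementation detail.

```python
import sys

data = ["payload"]  # one list object, bound to the name `data`
alias = data        # a second name for the very same object
same_object = alias is data

count_before = sys.getrefcount(alias)
del data            # removes the name `data`; the object is untouched
count_after = sys.getrefcount(alias)

# Check that the name really is gone.
try:
    data
except NameError:
    name_exists = False
else:
    name_exists = True

print(same_object)                 # True
print(name_exists)                 # False
print(count_before - count_after)  # one reference fewer on CPython
print(alias)                       # ['payload'], the object is still there
```

In the question, the reference that keeps the string alive is not another variable but the compiled code object.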

In any case, disassembling the code will give you some insight here:

In [1]: import dis

In [2]: dis.dis("""
   ...: import gc
   ...: import ctypes
   ...:
   ...: name = "a" * 50
   ...: name_id = id(name)
   ...: del name
   ...: gc.collect()
   ...: print(ctypes.cast(name_id, ctypes.py_object).value)
   ...: """)
  2           0 LOAD_CONST               0 (0)
              2 LOAD_CONST               1 (None)
              4 IMPORT_NAME              0 (gc)
              6 STORE_NAME               0 (gc)

  3           8 LOAD_CONST               0 (0)
             10 LOAD_CONST               1 (None)
             12 IMPORT_NAME              1 (ctypes)
             14 STORE_NAME               1 (ctypes)

  5          16 LOAD_CONST               2 ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa')
             18 STORE_NAME               2 (name)

  6          20 LOAD_NAME                3 (id)
             22 LOAD_NAME                2 (name)
             24 CALL_FUNCTION            1
             26 STORE_NAME               4 (name_id)

  7          28 DELETE_NAME              2 (name)

  8          30 LOAD_NAME                0 (gc)
             32 LOAD_METHOD              5 (collect)
             34 CALL_METHOD              0
             36 POP_TOP

  9          38 LOAD_NAME                6 (print)
             40 LOAD_NAME                1 (ctypes)
             42 LOAD_METHOD              7 (cast)
             44 LOAD_NAME                4 (name_id)
             46 LOAD_NAME                1 (ctypes)
             48 LOAD_ATTR                8 (py_object)
             50 CALL_METHOD              2
             52 LOAD_ATTR                9 (value)
             54 CALL_FUNCTION            1
             56 POP_TOP
             58 LOAD_CONST               1 (None)
             60 RETURN_VALUE

So, when your code block was compiled, the CPython compiler noticed that "a" * 50 could be turned into a constant, and so it did. A code object keeps its constants alive for as long as the code object itself exists (in this case, until the interpreter exits). Since this code object maintains a reference to the string object, the string exists the entire time.

So, more explicitly:

In [4]: code = compile("""name = "a" * 50""", filename='foo', mode='exec')

In [5]: code
Out[5]: <code object <module> at 0x7ff7c12495d0, file "foo", line 1>

In [6]: code.co_consts
Out[6]: ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', None)
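For contrast, here is a small sketch comparing that compile-time constant with a string that can only be built at run time. Constant folding is a CPython implementation detail, so this assumes a reasonably recent CPython 3; the filenames passed to compile() are arbitrary labels.

```python
# Folded at compile time: both operands are literals.
folded = compile('name = "a" * 50', filename="<folded>", mode="exec")

# Not foldable: the compiler does not track the value of `n`,
# so the multiplication happens when the code runs.
runtime = compile('n = 50\nname = "a" * n', filename="<runtime>", mode="exec")

target = "a" * 50
folded_has_string = target in folded.co_consts
runtime_has_string = target in runtime.co_consts

print(folded_has_string)   # True: the code object holds the 50-character string
print(runtime_has_string)  # False: only the pieces 50 and 'a' are stored
print(runtime.co_consts)
```

In the second case the code object holds no reference to the result, so the string would be freed by reference counting as soon as its last name is deleted.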

Note also that Python memory management is complex and pretty opaque. All objects live on a privately managed heap. Just because an object is "released" doesn't mean that the runtime won't simply re-use that bit of memory for objects of the same type (or other suitable types) as needed. Look at this:

In [1]: class Foo: pass

In [2]: import ctypes

In [3]: foo = Foo()

In [4]: id(foo)
Out[4]: 140559250737552

In [5]: del foo

In [6]: foo2 = Foo()

In [7]: id(foo2)
Out[7]: 140559250737680

In [8]: ctypes.cast(140559250737552, ctypes.py_object).value
Out[8]: <prompt_toolkit.lexers.pygments.RegexSync at 0x7fd68035c990>

In [9]: id(foo2)
Out[9]: 140559250737680

In [10]: del foo2

In [11]: ctypes.cast(140559250737680, ctypes.py_object).value
Out[11]: <prompt_toolkit.lexers.pygments.PygmentsLexer at 0x7fd68035ca10>

Notice how you are able to recover some objects in these cases: the IPython interactive shell is creating objects all the time, and the internal heap is happy to re-use that memory.
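Memory re-use can also be observed without touching freed memory at all. This is a rough sketch that only compares id() values; how often addresses repeat depends on the allocator and the Python version, so no particular count is guaranteed.

```python
class Foo:
    pass


# Each Foo() is a temporary: it is freed right after id() returns,
# so the next instance may be placed at the same address.
addresses = [id(Foo()) for _ in range(1000)]
distinct = len(set(addresses))

print(len(addresses))  # 1000 objects were created
print(distinct)        # typically far fewer distinct addresses on CPython
```

An id() is only unique among objects that are alive at the same time, which is why a stored id says nothing about what lives at that address later.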

Look what happens in a more bare-bones REPL:

(base) juanarrivillaga@50-254-139-253-static% python
Python 3.7.9 (default, Aug 31 2020, 07:22:35)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> class Foo: pass
...
>>> foo = Foo()
>>> i = id(foo)
>>> del foo
>>> ctypes.cast(i, ctypes.py_object).value
zsh: segmentation fault  python

So yeah, this is more what one might expect: we tried to access a part of memory that had been not only reclaimed by the internal heap, but freed by the Python process, and thus we got a segmentation fault.
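If the goal is just to check whether an object has been freed, a weak reference is a safer probe than casting a stale address with ctypes. This is a minimal sketch of the same Foo experiment, assuming CPython's immediate reference-count deallocation. Note that str objects do not support weak references, so this technique cannot be applied directly to the string from the question.

```python
import weakref


class Foo:
    pass


foo = Foo()
probe = weakref.ref(foo)  # does not keep `foo` alive
alive_before = probe() is not None

del foo                   # last strong reference is gone
alive_after = probe() is not None

# Built-in str instances cannot be weakly referenced.
try:
    weakref.ref("a" * 50)
    str_supported = True
except TypeError:
    str_supported = False

print(alive_before)   # True
print(alive_after)    # False on CPython: freed as soon as the refcount hit zero
print(str_supported)  # False
```

Calling a dead weak reference simply returns None, so there is no risk of a segmentation fault.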

juanpa.arrivillaga answered Jan 24 '26 20:01


