In languages with automatic garbage collection like Haskell or Go, how can the garbage collector find out which values stored on the stack are pointers to memory and which are just numbers? If the garbage collector just scans the stack and assumes all addresses to be references to objects, a lot of objects might get incorrectly marked as reachable.
Obviously, one could add a value to the top of each stack frame that described how many of the next values are pointers, but wouldn't that cost a lot of performance?
How is it done in reality?
The garbage collector uses a traversal of the object graph to find the objects that are reachable. Objects that are not reached in this traversal are deemed garbage, even if they are part of a cycle of references.
Sizes of stack and heap memory can be changed by specifying options to JVM. After an object is no longer used by the program, Java garbage collector reclaims the memory for reuse by other programs.
The garbage collector serves as an automatic memory manager. You do not need to know how to allocate and release memory or manage the lifetime of the objects that use that memory. An allocation is made any time you declare an object with a “new” keyword or a value type is boxed. Allocations are typically very fast.
GC. Collect() forces garbage collector to run. This is not recommended but can be used if situations arise.
Some collectors assume everything on the stack is a potential pointer (like Boehm GC). This turns out to be not as bad as one might expect, but is clearly suboptimal. More often in managed languages, some extra tagging information is left with the stack to help the collector figure out where the pointers are.
Remember that in most compiled languages, the layout of a stack frame is the same every time you enter a function, therefore it is not that hard to ensure that you tag your data in the right way.
The "bitmap" approach is one way of doing this. Each bit of the bitmap corresponds to one word on the stack. If the bit is a 1 then the location on the stack is a pointer, and if it is a 0 then the location is just a number from the point of view of the collector (or something along those lines). The exceptionally well written GHC runtime and calling conventions use a one word layout for most functions, such that a few bits communicate the size of the stack frame, with the rest serving as the bitmap. Larger stack frames need a multi word structure, but the idea is the same.
The point is that the overhead is low, since the layout information is computed at compile time, and then included in the stack every time a function is called.
An even simpler approach is "pointer first", where all the pointers are located at the beginning of the stack. You only need to include a length prior to the pointers, or a special "end" word after them, to tell which words are pointers given this layout.
Interestingly, trying to get this management information on to the stack produces a host of problem related to interop with C. For example, it is sub optimal to compile high level languages to C, since even though C is portable, it is hard to carry this kind of information. Optimizing compilers designed for C like languages (GCC,LLVM) may restructure the stack frame, producing problems, so the GHC LLVM backend uses its own "stack" rather than the LLVM stack which costs it some optimizations. Similarly, the boundary between C code, and "managed" code needs to be constructed carefully to keep from confusing the GC.
For this reason, when you create a new thread on the JVM you actually create two stacks (one for Java, one for C).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With