How can a garbage collector find out about object references done from the stack?

In languages with automatic garbage collection like Haskell or Go, how can the garbage collector find out which values stored on the stack are pointers to memory and which are just numbers? If the garbage collector just scans the stack and assumes all addresses to be references to objects, a lot of objects might get incorrectly marked as reachable.

Obviously, one could add a value to the top of each stack frame that describes how many of the following values are pointers, but wouldn't that cost a lot of performance?

How is it done in reality?

742

asked May 22 '12 21:05

fuz

1 Answer

Some collectors assume everything on the stack is a potential pointer (like Boehm GC). This turns out to be not as bad as one might expect, but is clearly suboptimal. More often in managed languages, some extra tagging information is left with the stack to help the collector figure out where the pointers are.
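To make the conservative approach concrete, here is a minimal sketch in Python. It is not how Boehm GC is implemented; the stack is modelled as a list of integers, and `heap_blocks` is a hypothetical table of `(start, size)` allocations invented for illustration. Any stack word that falls inside an allocated block is treated as a possible pointer.

```python
from bisect import bisect_right

def conservative_roots(stack_words, heap_blocks):
    """Return start addresses of heap blocks that some stack word might point into.

    heap_blocks is a hypothetical list of (start, size) tuples standing in
    for the allocator's metadata.
    """
    blocks = sorted(heap_blocks)
    starts = [start for start, _ in blocks]
    roots = set()
    for word in stack_words:
        # Find the last block whose start address is <= word.
        i = bisect_right(starts, word) - 1
        if i >= 0:
            start, size = blocks[i]
            if start <= word < start + size:
                roots.add(start)
    return roots

heap = [(0x1000, 32), (0x2000, 64), (0x3000, 16)]
stack = [42, 0x1000, 0x2010, 7]
live = conservative_roots(stack, heap)

# A plain integer that happens to look like an address keeps a block alive.
stack_with_lookalike = stack + [0x3008]
live_with_lookalike = conservative_roots(stack_with_lookalike, heap)
```

In the second call the block at `0x3000` is retained only because an unrelated integer falls inside its address range, which is the false retention the question worries about.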

Remember that in most compiled languages, the layout of a stack frame is the same every time you enter a function, so it is not hard to ensure that the data is tagged correctly.

The "bitmap" approach is one way of doing this. Each bit of the bitmap corresponds to one word on the stack. If the bit is a 1 then the location on the stack is a pointer, and if it is a 0 then the location is just a number from the point of view of the collector (or something along those lines). The exceptionally well-written GHC runtime and calling conventions use a one-word layout for most functions: a few bits communicate the size of the stack frame, and the rest serve as the bitmap. Larger stack frames need a multi-word structure, but the idea is the same.
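As a rough illustration of a one-word layout, the sketch below packs a frame size and a pointer bitmap into a single integer and uses it to pick the pointer slots out of a frame. The field widths (`SIZE_BITS`, `WORD_BITS`), the function names, and the "1 means pointer" convention follow the description above and are assumptions for illustration; this is not GHC's actual encoding.

```python
WORD_BITS = 64  # assumed machine word size
SIZE_BITS = 6   # assumed: low 6 bits hold the frame size in words
SIZE_MASK = (1 << SIZE_BITS) - 1

def make_layout(is_pointer):
    """Pack one boolean per stack slot into a single layout word.

    In a compiler this would be computed once per function at compile time.
    """
    size = len(is_pointer)
    if size > WORD_BITS - SIZE_BITS:
        raise ValueError("frame too large for a one-word layout")
    bitmap = 0
    for slot, flag in enumerate(is_pointer):
        if flag:
            bitmap |= 1 << slot  # bit set: this slot holds a pointer
    return (bitmap << SIZE_BITS) | size

def pointer_slots(layout, frame):
    """Return the values in the frame that the layout word marks as pointers."""
    size = layout & SIZE_MASK
    bitmap = layout >> SIZE_BITS
    return [frame[slot] for slot in range(size) if (bitmap >> slot) & 1]

layout = make_layout([True, False, False, True])
frame = [0x1000, 17, 99, 0x2000]
roots = pointer_slots(layout, frame)
```

The collector only has to read one word per frame to know both how far to advance and which slots to follow.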

The point is that the overhead is low, since the layout information is computed at compile time, and then included in the stack every time a function is called.

An even simpler approach is "pointer first", where all the pointers are located at the beginning of the stack frame. You only need to include a length prior to the pointers, or a special "end" word after them, to tell which words are pointers given this layout.
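A minimal sketch of the "pointer first" layout with a leading length word might look like this. Frames are modelled as Python lists, and the helper names (`build_frame`, `frame_roots`, `stack_roots`) are hypothetical.

```python
def build_frame(pointers, scalars):
    """Lay out a frame as: pointer count, then the pointers, then everything else."""
    return [len(pointers), *pointers, *scalars]

def frame_roots(frame):
    """Read the leading count and return exactly that many pointer words."""
    count = frame[0]
    return frame[1:1 + count]

def stack_roots(frames):
    """Collect the pointer words from every frame on the (simulated) stack."""
    roots = []
    for frame in frames:
        roots.extend(frame_roots(frame))
    return roots

frames = [
    build_frame([0x1000, 0x2000], [5, 6]),
    build_frame([], [9]),
    build_frame([0x3000], []),
]
all_roots = stack_roots(frames)
```

The scalars after the pointers are never inspected, so an integer that looks like an address cannot be mistaken for a reference.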

Interestingly, trying to get this management information onto the stack produces a host of problems related to interop with C. For example, it is suboptimal to compile high-level languages to C, since even though C is portable, it is hard to carry this kind of information. Optimizing compilers designed for C-like languages (GCC, LLVM) may restructure the stack frame, producing problems, so the GHC LLVM backend uses its own "stack" rather than the LLVM stack, which costs it some optimizations. Similarly, the boundary between C code and "managed" code needs to be constructed carefully to keep from confusing the GC.

For this reason, when you create a new thread on the JVM you actually create two stacks (one for Java, one for C).

178

answered Oct 03 '22 18:10

Philip JF

Tags: garbage-collection, haskell, go
