Ulrich Drepper's paper on thread-local storage outlines the TLS ABI for several different cpu architectures, but I'm finding it insufficient as a basis for implementing TLS for two reasons:
As an example, the only actual ABI requirements for i386 are:
%gs:0 points to a pointer to itself.___tls_get_addr and __tls_get_addr functions must exist with the correct semantics for looking up arbitrary TLS segments.In particular, the existence or layout of a DTV is not part of the ABI, nor is the ordering/layout of TLS segments other than the main program's.
It seems that any arch using "TLS variant II" has roughly the above ABI requirements. But I don't understand the requirements of "TLS variant I" very well at all, and it seems from reading sources (in uClibc and glibc) that there may even be several variants of "variant I".
Are there any better documents I should be looking at, or can somebody familiar with the workings of TLS explain the ABI requirements to me?
The best I can gather so far is:
For either TLS variant, __tls_get_addr or other arch-specific functions must exist and have the correct semantics for looking up any TLS object, and the relative offset between any two TLS segments must be a runtime constant (same offset for each thread).
For TLS variant II (i386, etc.), the "thread pointer register" (which may not actually be a register, but perhaps some mechanism like %gs:0 or even a trap into kernelspace; for simplicity though let's just call it a register) points just past the end of the TLS segment for the main executable, where "just past the end" includes rounding up to the next multiple of the TLS segment's alignment.
For TLS variant I, the "thread pointer register" points to some fixed offset from the beginning of the TLS segment for the main executable. This offset varies by arch. (It has been chosen on some ugly RISC archs to maximize the amount of TLS accessible via signed 16-bit offsets, which strikes me as extremely useless since the compiler has no way of knowing whether the relocated offset will fit in 16 bits and thus must always generate the slower, larger 32-bit-offset code using load-upper/add instructions).
As far as I can tell, nothing about TCBs, DTVs, etc. is part of the ABI, in the sense that applications are not permitted to access these structures, nor is the location of any TLS segment other than the main executable's part of the ABI. In both variants I and II, it makes sense to store implementation-internal information for the thread at a fixed offset from the "thread pointer register", in whichever way safely avoids overlapping the TLS segment.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With