I remember in an assembly class, we learned the m68k processor, and there were 3 kinds of shifts you could do. Linear shift, circular shift, and circular shift with extend.
The last one, circular shift with extend, basically rotates all bits left or right but it puts the outermost one into an extend bit before it would be moved to the beginning again (if you ever shifted by 1 again).
I drew a little pic:

Basically, a 33rd bit is used in the circular shifting, but doesn't show up in the 32-bit word of course. The 33rd bit is the X flag of the processor, which stands for extend. You could just as easily use any status flag, such as the carry flag, but I guess the motorola guys wanted to preserve that flag so it doesn't get overwritten, in case you need the carry flag for its normal duties in the middle of some algorithm that also needs rotate with extend.
Anyway, what is the purpose of rotate with extend? What is it used for? What is it needed for? It seems so strange. Why in the world do you need a 33rd bit?
I've read this and this, two related questions, but they didn't talk about the circular shift with extend.
I'm aware of some uses for normal shifts. Basically to divide by two, or test divisibility, and permutate the bits for randomness. Stuff like that. But I cannot think of why you would need some extended bit inserted into the rotate but doesn't come out in the result.
EDIT: I'm interested in any use of it, modern or old, and doesn't matter if it's on the m68k or not. The m68k is just the first placed I encountered it (and I never even used it there).
Circular shifts are used often in cryptography in order to permute bit sequences. Unfortunately, many programming languages, including C, do not have operators or standard functions for circular shifting, even though virtually all processors have bitwise operation instructions for it (e.g. Intel x86 has ROL and ROR).
1 x: Right Circular Shift the array x times. If an array is a[0], a[1], …., a[n – 1], then after one right circular shift the array will become a[n – 1], a[0], a[1], …., a[n – 2]. 2 y: Left Circular Shift the array y times.
Bit Rotation: A rotation (or circular shift) is an operation similar to shift except that the bits that fall off at one end are put back to the other end. In left rotation, the bits that fall off at left end are put back at right end. In right rotation, the bits that fall off at right end are put back at left end.
The movement of data out of one end of a storage unit or register and reentry of the same data at the other end. 2. In a computer word or a register, a shift, such as a logical shift, that inserts in one end data that have been removed from the other end.
Given three variables x, y, z write a function to circularly shift their values to right. In other words if x = 5, y = 8, z = 10, after circular shift y = 5, z = 8, x = 10. Call the function with variables a, b, c to circularly shift values.
Suppose you want to shift a 32 bit word to the right but all you have are 16 bit registers. To do this you have to shift the two 16 bit parts of the 32 bit word to the right and transfer the bit that was shifted out of the high word into the low word.
If you just have logical shifts, this is cumbersome to do as you have to manually fix up that bit. The rotate-through-carry instruction allows you to keep the bit that needs to be transferred in the carry flag and shift it in in one go. The rotate-through-carry instruction places the bit shifted out in the carry flag so you can easily chain this together to right-shift arbitrarily sized data.
On x86 (and most architectures that have this instruction), the extra bit is the carry flag, and lots of things can set that flag. Rotate-through-carry left or right lets you shift the carry bit back into some other register. Interesting that m68k uses a different flag for extended-rotate.
I'm not very familiar with m68k anymore, so I'll mostly talk about other arches. (But apparently that's what you want :)
Instructions like this are often available on microcontrollers that are much less capable than x86 or m68k. Or with limited opcode space (and decoding complexity), some CPUs only have a rotate-through-carry by 1 and not regular shift instructions. If you want to shift in a zero, make sure the flag is clear first.
8051 is like this: only rotate left/right by 1, and rotate-with-carry left/right by 1, not shift.  See rlc in the ISA ref manual.  When possibly, avoid the clr instruction when you want to shift in a zero by putting rlc after something else that leaves Carry cleared.
I think it's typical for extended circular shift to use the carry flag, rather than its own X bit the way m68k does.
Anyway, extended-precision rotate is kind of traditional / expected for CPUs, but has more uses on more limited CPUs.
For a register, rcl reg, 1 is the same operation as adc reg,reg: shift the old contents left by 1, and set the low bit to CF.  The bit shifted out by the rotate or adc becomes the new value of CF.  So RCL is only a non-redundant part of an instruction set if it's available with a memory operand, or (for weird cases) with a count greater than 1.  (The rotate-right version is not redundant, though.)
IDK why you'd ever use a count > 1. On modern x86, rotate-through-carry is fairly fast with count=1, but definitely slow with a variable count or fixed count>1. IDK which way the chicken/egg problem went: CPU designers didn't make it fast because nobody used it anyway, or people stopped using it because it's slow. But probably the former because I don't remember ever seeing a use mentioned for rotate-through-carry by more than 1 bit.
For extended-precision shifts, x86 has a double-shift instruction (shld / shrd  dst, src, count) that shifts dst, shifting in bits from src instead of zeros or copies of the sign bit.  It doesn't work with 2 memory operands, so an extended-precision shift loop has to load and store registers with separate instructions.  That's more code size than a loop using rcr dword [edi], 1 / sub edi, 4, but code size is rarely a problem on modern x86 and doing the loads/stores with separate instructions isn't slower.)  More importantly, shrd shifts multiple bits at once, so you can loop over an array once to implement a multi-precision shift by 2 or more bits.
Extended rotate can only shift one bit at a time between registers, because it uses a 1-bit storage space (in flags). I think on m68k if you did want to shift multiple bits between registers, you'd probably copy a register and use regular shifts + OR to combine. (Or a rotate and AND/OR to split up the bits.)
On AMD CPUs, shld/shrd is slower than rcl/rcr-by-1, but it's the opposite on Intel CPUs.  (http://agner.org/optimize/).
I can't really think of any use cases other than shifting bits between registers. Maybe if you shift a bit out, then branch on something that might set or clear the X bit, then shift the bit back in, you could use an extended rotate to do something to the low or high bit? But you could usually do the same more easily with AND, OR, or XOR with a constant.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With