XlogicX Blog

Assembly Is Too High-Level - Redundant Machine-Code - Adventures of ModR/M, SIB, and REX

There are many ways in which an assembly instruction can be encoded into machine code. Though I can see this becoming a multipart post, this one will focus on some weird (redundant) effects of the ModR/M and SIB bytes. For this post we will focus on the XOR instruction, but note that the following principles apply to most instructions that use the ModR/M encoding.

 

How 30 c0 can be the same as 32 c0

For reference, here is a screenshot of the opcodes from the Intel Manual

[Screenshots: XOR opcode table from the Intel Manual]

We will focus on two similar encodings of XOR: XOR r/m8,r8 (0x30) and XOR r8,r/m8 (0x32). r8 means a 1-byte register, m8 means a 1-byte memory location (the value in that location), and r/m8 means either. So opcode 0x30 XORs a register into a register or memory location, and 0x32 XORs a register or memory location into a register; in both cases the result is stored in the first operand. The key takeaway is that both encodings (0x30 and 0x32) allow both operands to be 1-byte registers.

ModR/M:

We must specify the operands somehow: whether each one is a register or a pointer, and which one. This is done with the ModR/M byte. This byte is separated into 3 fields: Mod (2 bits), Reg (3 bits), and R/M (3 bits). The Mod field decides how R/M is interpreted: Mod 00 specifies a pointer, a SIB byte, or a static 32-bit displacement (depending on the R/M value; in 64-bit mode that displacement is RIP-relative), Mod 01 and 10 are similar to 00 but add an 8-bit or 32-bit displacement, and Mod 11 specifies a register. The Reg field is assumed to be a register. When a field names a register, then (in general) 000 deals with the A registers (AL, AX, EAX), 001 C, 010 D, 011 B, 100 SP, 101 BP, 110 SI, and 111 DI; for 1-byte registers with no REX prefix, 100 through 111 select AH, CH, DH, and BH instead. Look to 2-6 of Vol. 2A of the Intel Manual for the full table.

So let's break up 0xc0 (our magical operand byte from above). In binary it is 1100 0000; split into the 3 fields, it is 11 000 000. So: Mod 11 (registers only), Reg 000 (A register), R/M 000 (A register). Since we are dealing with 1-byte registers in this case, that means AL for both.
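To make the bit-slicing concrete, here is a minimal Python sketch that pulls the Mod, Reg, and R/M fields out of a ModR/M byte with shifts and masks. The helper name `split_modrm` and the register tables are our own, for illustration only, and the sketch assumes no REX prefix (so for 1-byte registers, codes 100-111 map to AH, CH, DH, BH).

```python
# Register names for the 3-bit Reg and R/M codes 0-7 (assumes no REX prefix).
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11   # bits 7-6
    reg = (byte >> 3) & 0b111  # bits 5-3
    rm = byte & 0b111          # bits 2-0
    return mod, reg, rm


mod, reg, rm = split_modrm(0xC0)
print(f"{0xC0:08b} -> mod={mod:02b} reg={reg:03b} rm={rm:03b}")
print("R/M operand:", REG8[rm], "| Reg operand:", REG8[reg])
```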

Recap:
0x30 is XOR r/m8,r8. With 0xc0, we are choosing registers only (Mod 11), and AL for both fields, so it is XOR AL, AL. 0x32 is XOR r8,r/m8: the same fields with the roles swapped (now the Reg field is the destination instead of the R/M field). With 0xc0, we are again choosing registers only, and both happen to be AL. So 30 c0 is XOR AL, AL, but so is 32 c0.
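Here is a small sketch of that role swap, assuming register-only operands (Mod 11) and no prefixes; `decode_xor8` is a hypothetical helper, not a real disassembler API. 0x30 and 0x32 differ only in bit 1 of the opcode (the classic "direction" bit), which decides whether the R/M or the Reg field is the destination. The loop at the end checks that every 8-bit register pair has a twin encoding, obtained by swapping the Reg and R/M fields and switching between 0x30 and 0x32.

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]  # no REX prefix assumed


def decode_xor8(opcode, modrm):
    """Decode a register-to-register 8-bit XOR (opcode 0x30 or 0x32, Mod 11)."""
    if opcode not in (0x30, 0x32):
        raise ValueError("only XOR r/m8,r8 (0x30) and XOR r8,r/m8 (0x32) handled")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b11:
        raise ValueError("this sketch only handles register operands (Mod 11)")
    if opcode == 0x30:   # XOR r/m8, r8: the R/M field is the destination
        dst, src = rm, reg
    else:                # XOR r8, r/m8: the Reg field is the destination
        dst, src = reg, rm
    return f"xor {REG8[dst]}, {REG8[src]}"


print(decode_xor8(0x30, 0xC0))  # xor al, al
print(decode_xor8(0x32, 0xC0))  # xor al, al

# Every register pair has a twin: swap Reg and R/M, and swap 0x30 for 0x32.
for modrm in range(0xC0, 0x100):
    twin = 0xC0 | ((modrm & 7) << 3) | ((modrm >> 3) & 7)
    assert decode_xor8(0x30, modrm) == decode_xor8(0x32, twin)
print("all 64 register pairs have two encodings")
```

30 c0 and 32 c0 are simply the case where the twin uses the very same ModR/M byte, because Reg and R/M are both 000.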

 

SIB:

The SIB (Scale-Index-Base) byte is what allows us to do instructions like XOR [rcx+rbp*2], al; it allows us to do base + index * scale (+ offset) tricks. Some encodings of the ModR/M byte specify that a SIB byte follows. This byte is divided up like the ModR/M (the 2-bit, 3-bit, 3-bit format): scale, index, base. The first 2 bits (SS) specify the multiplier: 00 is 1, 01 is 2, 10 is 4, and 11 is 8. So the above XOR example would have SS as 01. Without getting too crazy with the SIB byte, the interesting takeaway is that if the index (the middle 3 bits) is 100, there is no index register, so the multiplier has no effect, and the last 3 bits (the base) just specify the register.
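A minimal sketch of the SIB split, assuming 64-bit mode, no REX prefix, and no displacement; `split_sib` and `sib_address` are illustrative names of our own. Base 101 is deliberately rejected because its meaning depends on the Mod field, which this sketch does not see.

```python
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]  # no REX assumed


def split_sib(byte):
    """Split a SIB byte into (scale, index, base); scale is returned as 1/2/4/8."""
    ss = (byte >> 6) & 0b11
    index = (byte >> 3) & 0b111
    base = byte & 0b111
    return 1 << ss, index, base


def sib_address(byte):
    """Describe the memory operand a SIB byte selects (no displacement)."""
    scale, index, base = split_sib(byte)
    if base == 0b101:
        raise ValueError("base 101 depends on Mod (disp32 or rbp); not modelled")
    if index == 0b100:  # index 100 means "no index", so the scale is ignored
        return f"[{REG64[base]}]"
    return f"[{REG64[base]}+{REG64[index]}*{scale}]"


print(sib_address(0x69))  # [rcx+rbp*2]
```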

An interesting XOR with SIB:

Keep in mind there are many variations of this; we are just looking at one example, a PoC. Consider 30 04 60. This is XOR (0x30), and the ModR/M byte of 0x04 (00 000 100) is going to call for a SIB byte: we are in Mod 00, the Reg field 000 is the AL register, and the R/M value 100 is what calls for the SIB byte. Our SIB byte is 0x60. In binary, that is 0110 0000, or 01 100 000 (only a change in spacing for clarity). So this would be a multiplier of 2 (an SS of 01), but then our index field is 100, so don't multiply? (Yes: there is no index register, so the scale is ignored.) We are still using a pointer, though, specified by the base field (000), which is the A register. This translates to XOR [rax], AL. The funny thing is that you don't need a SIB byte to do all that: you can specify an A register pointer and the AL register with just the ModR/M byte (0x00), so the machine code would be 30 00.
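The same idea, sketched as a tiny decoder for `30 /r` with Mod 00, assuming 64-bit mode, no REX prefix, and no displacement (`decode_xor_mem` is a hypothetical helper). Because an index of 100 makes the scale irrelevant, all four SS values give the same instruction, so 30 00 has at least four SIB-based twins.

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_xor_mem(code):
    """Decode `30 /r` with Mod 00 in 64-bit mode (no REX, no displacement)."""
    opcode, modrm = code[0], code[1]
    if opcode != 0x30 or modrm >> 6 != 0b00:
        raise ValueError("this sketch only handles 30 /r with Mod 00")
    reg, rm = (modrm >> 3) & 7, modrm & 7
    if rm == 0b101:
        raise ValueError("R/M 101 with Mod 00 is RIP-relative disp32; not modelled")
    if rm != 0b100:                    # plain register pointer, no SIB byte
        return f"xor [{REG64[rm]}], {REG8[reg]}"
    sib = code[2]                      # R/M 100: a SIB byte follows
    scale, index, base = 1 << (sib >> 6), (sib >> 3) & 7, sib & 7
    if base == 0b101:
        raise ValueError("SIB base 101 with Mod 00 means disp32; not modelled")
    if index == 0b100:                 # no index register: scale is ignored
        return f"xor [{REG64[base]}], {REG8[reg]}"
    return f"xor [{REG64[base]}+{REG64[index]}*{scale}], {REG8[reg]}"


# 30 00, plus 30 04 xx for every SS value with index 100 and base 000
candidates = [bytes([0x30, 0x00])]
candidates += [bytes([0x30, 0x04, (ss << 6) | 0b100_000]) for ss in range(4)]
for code in candidates:
    print(code.hex(" "), "->", decode_xor_mem(code))
```

`bytes.hex` with a separator argument needs Python 3.8 or newer.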

 

REX Prefix:

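You can throw a byte in the range 0x40-0x4f (the REX prefix) in front of most of these instructions to get the super-power of accessing 64-bit registers: REX.W selects a 64-bit operand size, and the R, X, and B bits extend the Reg, SIB index, and R/M (or base) fields so they can reach r8-r15. (0x66, the operand-size override prefix, similarly switches the default 32-bit operand size to 16 bits.) This only works in 64-bit mode; in 32-bit mode, 0x40-0x4f are one-byte INC/DEC instructions of their own, not a prefix. As it turns out, there is tons of redundancy in the registers you can access this way; here's a screenshot of some of that ignorance (awesomeness):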
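One way to see where the REX redundancy comes from is a sketch of `[REX] 30 /r` for register operands. It rests on assumptions worth stating: REX is laid out as 0100WRXB; REX.R and REX.B extend the Reg and R/M fields to reach r8b-r15b; we assume (per our reading of the manual) that REX.W does not change the size of this byte-sized opcode; and REX.X is ignored because there is no SIB byte. Any REX prefix also turns codes 100-111 into SPL, BPL, SIL, DIL instead of AH, CH, DH, BH. `decode_xor8_rex` is a hypothetical helper.

```python
REG8_LEGACY = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG8_REX = ["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil",
            "r8b", "r9b", "r10b", "r11b", "r12b", "r13b", "r14b", "r15b"]


def decode_xor8_rex(rex, modrm):
    """Decode `[REX] 30 /r` with Mod 11 in 64-bit mode; rex is 0x40-0x4F or None."""
    if rex is not None and not 0x40 <= rex <= 0x4F:
        raise ValueError("REX prefix must be in 0x40-0x4F")
    if modrm >> 6 != 0b11:
        raise ValueError("this sketch only handles register operands (Mod 11)")
    reg, rm = (modrm >> 3) & 7, modrm & 7
    if rex is None:
        return f"xor {REG8_LEGACY[rm]}, {REG8_LEGACY[reg]}"
    # Assumption: REX.W (bit 3) does not change this byte operation, and
    # REX.X (bit 1) is ignored because there is no SIB byte.
    r = (rex >> 2) & 1  # REX.R extends the Reg field
    b = rex & 1         # REX.B extends the R/M field
    return f"xor {REG8_REX[(b << 3) | rm]}, {REG8_REX[(r << 3) | reg]}"


plain = decode_xor8_rex(None, 0xC0)
same = [f"{rex:02x} 30 c0" for rex in range(0x40, 0x50)
        if decode_xor8_rex(rex, 0xC0) == plain]
print(plain, "is also encoded as:", same)
```

Under those assumptions, the prefixes 40, 42, 48, and 4a all leave 30 c0 meaning XOR AL, AL.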