XlogicX Blog

Assembly is Too High Level - Why ESP doesn't scale - But EBP can still Base

The main 8 general purpose registers are EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI, in that order. You will see this ordering in a lot of places. I will give some examples below, but they are in no way exhaustive; I just wanted to show some variety.

There are the B0-B7 and B8-BF MOV instructions, where the 2nd hex digit (more precisely, its low three bits) defines which register receives an immediate value. Notice that the registers are in the order described above.
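To make the pattern concrete, here is a minimal Python sketch that maps a B0-BF opcode to the register it loads. The function name mov_imm_target is mine, and it assumes 32-bit mode with no operand-size prefix, where B0-B7 are the 8-bit forms and B8-BF are the 32-bit forms.

```python
# Register numbering 0..7, in the order described above.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
# The 8-bit registers that share the same numbers (used by B0-B7).
REGS8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

def mov_imm_target(opcode):
    """Return the register that a B0-BF MOV opcode loads an immediate into."""
    if not 0xB0 <= opcode <= 0xBF:
        raise ValueError("not a B0-BF MOV opcode")
    reg = opcode & 0b111      # low three bits select the register
    wide = opcode & 0b1000    # bit 3 set means B8-BF, the 32-bit form
    return REGS32[reg] if wide else REGS8[reg]

print(mov_imm_target(0xB8))  # eax
print(mov_imm_target(0xBB))  # ebx
```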

Registers are also encoded like this in the ModR/M byte. If we wanted to XOR EBX with ECX, we would use 0xCB as the ModR/M byte.
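As a quick sanity check, this small Python sketch splits a ModR/M byte into its three fields. The helper name split_modrm is hypothetical. Which field is the source and which is the destination depends on the opcode; with opcode 0x31 (xor r/m32, r32), the bytes 31 CB come out as xor ebx, ecx.

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def split_modrm(byte):
    """Split a ModR/M byte into (mod, reg, rm): 2 bits, 3 bits, 3 bits."""
    mod = byte >> 6
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm

mod, reg, rm = split_modrm(0xCB)
# Mod 3 (binary 11) means the r/m field names a register, not memory.
print(mod, REGS32[reg], REGS32[rm])  # 3 ecx ebx
```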

It's in the SIB encoding that we start running into some interesting exceptions to this rule. In the format of [Base + Index * Scale + Displacement], it appears that the Index can be any of the 8 general purpose registers except ESP (its slot in the ordering means "no index" instead), and the Base can be any general purpose register except EBP.

Based on these exceptions, we should expect 'xor eax, [esp * 2]' to fail. This is true: there is no way to encode this into machine code, and an assembler will give an error. What is interesting is that we CAN do something like this: xor eax, [ebp + eax * 2]. In this case, we are specifying EBP as the Base (not allowed?), EAX as the Index with a Scale of 2, and "no displacement." Let's look at the machine code that NASM chose to go with to make this work (33 44 45 00):

Let's work backwards here. 0x00 is not the SIB (otherwise it would mean [eax + eax]); 0x45 is actually the SIB. You can refer to Chart 1, but it is using the [*] item for the Base and EAX * 2 for the Index * Scale. What [*] means depends entirely on which Mod was used in the ModR/M byte. In the highlighted "NOTES" section of Chart 1, you'll see that [*] can either mean just a 32-bit displacement with no base (Mod 00), or EBP plus an 8-bit (Mod 01) or 32-bit (Mod 10) displacement. Only Mod 01 and 10 allow us to use EBP.

(Chart 1)
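The rules in Chart 1 can be sketched in a few lines of Python. This is only an illustration under my own naming (decode_sib is a hypothetical helper, and None stands for "no register in this slot"); it covers the two exceptions discussed above and nothing else.

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_sib(sib, mod):
    """Return (base, index, scale) for a SIB byte, given the ModR/M Mod field."""
    scale = 1 << (sib >> 6)        # 2-bit field: 1, 2, 4 or 8
    index = (sib >> 3) & 0b111
    base = sib & 0b111
    # Index slot 100 (where ESP would be) means "no index".
    index_name = None if index == 4 else REGS32[index]
    # Base slot 101 is the [*] item: no base with Mod 00, EBP otherwise.
    base_name = None if (base == 5 and mod == 0) else REGS32[base]
    return base_name, index_name, scale

print(decode_sib(0x45, 1))  # ('ebp', 'eax', 2)
print(decode_sib(0x45, 0))  # (None, 'eax', 2)
print(decode_sib(0x00, 0))  # ('eax', 'eax', 1)
```

The same SIB byte 0x45 gives an EBP base under Mod 01 and no base at all under Mod 00, which is exactly why the Mod field matters here.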

This starts to explain why 0x00 showed up at the end of our machine code: we are dealing with an 8-bit displacement. Look at Chart 2 to see where the ModR/M byte (0x44) falls. The [--][--] means we are using a SIB byte, and the row it sits in (Mod 01) selects an 8-bit displacement.

(Chart 2)
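For readers without the chart handy, here is a rough Python sketch of the part of Chart 2 that matters for this post: whether a ModR/M byte calls for a SIB byte, and how many displacement bytes the Mod field asks for. The name describe_modrm is made up for this example, and it deliberately ignores the extra Mod 00 case where the SIB Base is [*] (which also brings a 32-bit displacement).

```python
def describe_modrm(byte):
    """Return (uses_sib, disp_bytes) for a 32-bit-mode ModR/M byte."""
    mod = byte >> 6
    rm = byte & 0b111
    # r/m 100 with a memory Mod is the [--][--] entry: a SIB byte follows.
    uses_sib = mod != 3 and rm == 4
    # Mod 01 -> disp8, Mod 10 -> disp32, Mod 00 and 11 -> none...
    disp_bytes = {0: 0, 1: 1, 2: 4, 3: 0}[mod]
    # ...except Mod 00 with r/m 101, which is a bare disp32.
    if mod == 0 and rm == 5:
        disp_bytes = 4
    return uses_sib, disp_bytes

print(describe_modrm(0x44))  # (True, 1)
print(describe_modrm(0x84))  # (True, 4)
```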

Knowing this, we could just as well have used a 32-bit displacement (Mod 10), giving machine code of:
33 84 45 00 00 00 00
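To check that the two encodings really describe the same operand, here is a small Python decoder sketch. It is not a general disassembler: decode_xor_sib is a hypothetical helper that only handles opcode 0x33 (xor r32, r/m32) followed by a ModR/M byte that selects a SIB byte, and it hides a zero displacement the way some tools do.

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_xor_sib(code):
    """Decode '33 /r' when the ModR/M byte selects a SIB byte."""
    opcode, modrm, sib = code[0], code[1], code[2]
    if opcode != 0x33:
        raise ValueError("only opcode 0x33 is handled")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod == 3 or rm != 4:
        raise ValueError("this ModR/M byte does not use a SIB byte")
    scale, index, base = 1 << (sib >> 6), (sib >> 3) & 7, sib & 7
    no_base = mod == 0 and base == 5          # the [*] case with Mod 00
    disp_size = 4 if no_base else {0: 0, 1: 1, 2: 4}[mod]
    if len(code) != 3 + disp_size:
        raise ValueError("unexpected instruction length")
    # Displacements are little-endian and signed; empty bytes decode to 0.
    disp = int.from_bytes(code[3:], "little", signed=True)
    terms = []
    if not no_base:
        terms.append(REGS32[base])
    if index != 4:                            # slot 100 means "no index"
        terms.append(f"{REGS32[index]}*{scale}")
    if disp != 0 or not terms:                # hide a zero displacement
        terms.append(hex(disp))
    return f"xor {REGS32[reg]}, [{' + '.join(terms)}]"

short_form = bytes.fromhex("33 44 45 00")
long_form = bytes.fromhex("33 84 45 00 00 00 00")
print(decode_xor_sib(short_form))  # xor eax, [ebp + eax*2]
print(decode_xor_sib(long_form))   # xor eax, [ebp + eax*2]
```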

For debuggers and disassemblers that don't show the displacement when it is zero, both forms of this machine code look identical. For example, in Evans Debugger: