XlogicX Blog

Assembly is Too High-Level - AAD-AAM, Even the Math is Too High-Level

Oh boy, I love seeing words like these! Even though this post will focus on the AAD instruction, the same applies to its sibling from the title, AAM.

I particularly love this one because we get to see abstractions being misleading on a few levels (assembly being too abstract, and even a mathematical formula being used too abstractly). I'm about to get all kinds of philosophical up in here!

When approaching a tool or a system, normal people only look at what it is supposed to do; what it is intended for (and even that is assuming too much of a normal person sometimes). As hackers, we aspire to look at a system for what it actually does. Sometimes we have to hack around just to discover this. But sometimes it isn't even hidden; it can be well documented. Even so, normal people only care about the useful abstractions, and still ignore what systems actually do.

Assembly Abstraction: AAD = ASCII Adjust AX Before Division
This instruction is intended to take two unpacked BCD digits (one per 8-bit register) and convert them to one 8-bit 'binary' value. For those that don't know, BCD stands for Binary Coded Decimal. It's a way of representing (only) decimal values in a binary/hex encoding. The decimal number 79 in plain hex looks like 0x4f (just 1 byte). In BCD, it could look like 0x79. In BCD, we ignore the A-F values of hex, even though we are still using standard 4-bit nibbles and 8-bit bytes. This means we are wasting space. Even worse, AAD takes a whole 8-bit value for each digit, so 79 would actually be 0x0709.
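To make the three encodings of 79 concrete, here is a small Python sketch. The helper names `to_packed_bcd` and `to_unpacked_bcd` are made up for this illustration; they are not anything the CPU or an assembler provides.

```python
def to_packed_bcd(n: int) -> int:
    """Pack a decimal number 0..99 into one byte, one digit per nibble."""
    if not 0 <= n <= 99:
        raise ValueError("expects a value in 0..99")
    tens, ones = divmod(n, 10)
    return (tens << 4) | ones


def to_unpacked_bcd(n: int) -> int:
    """Spread a decimal number 0..99 over two bytes, one digit per byte."""
    if not 0 <= n <= 99:
        raise ValueError("expects a value in 0..99")
    tens, ones = divmod(n, 10)
    return (tens << 8) | ones


print(hex(79))                        # plain binary:  0x4f
print(hex(to_packed_bcd(79)))         # packed BCD:    0x79
print(f"{to_unpacked_bcd(79):#06x}")  # unpacked BCD:  0x0709
```

The last form, one digit per byte, is the layout AAD expects in AH:AL.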

For AAD, the first digit (07 in this case) is in AH, and the next digit is in AL (09). After the AAD instruction runs, the result is in AL and AH is cleared to zero. The instruction mentions AX because AX is AH:AL (Accumulator High, Accumulator Low). So if AH had 0x07, and AL had 0x09, and we then ran AAD, AL would then have 0x4f (decimal 79). That's what this instruction is supposed to do.
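The register shuffle above can be modeled in a few lines of Python. This is only a sketch of the documented behavior, not real hardware: the function name `aad` and the idea of passing AX as an integer are assumptions of this example, and the AH-is-cleared detail comes from Intel's description of the instruction rather than from anything shown in this post.

```python
def aad(ax: int, base: int = 10) -> int:
    """Model AAD on a 16-bit AX value and return the new AX."""
    ah = (ax >> 8) & 0xFF          # high byte: first digit
    al = ax & 0xFF                 # low byte: second digit
    al = (al + ah * base) & 0xFF   # result has to fit back into 8-bit AL
    return al                      # AH is cleared, so the new AX is just AL


print(hex(aad(0x0709)))  # 0x4f, which is decimal 79
```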

Let's look at the machine code for AAD (right next to the assembly that created it):

D5 is the actual machine code for AAD. The 0A is put there for us by our assembler; it is the base, 10 (notice we didn't say 'aad 10' in our source file). The plain AAD mnemonic gives us no way to change this value. As the Intel manual states, using another base has to be done in machine code. Let's do a base 2 conversion. AH will be 01, and so will AL. So if AX is 0x0101, AAD (2) should yield 0x03 in AL. This is because 11 in binary is 3 in decimal.
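Here is a rough Python model of what the two encoded bytes mean: an opcode byte D5 followed by one immediate byte that acts as the base. The `run_aad` function is a hypothetical helper for this sketch; it only understands this one two-byte encoding and is no substitute for running the bytes on a real CPU.

```python
def run_aad(code: bytes, ax: int) -> int:
    """Apply a single two-byte AAD encoding (D5 ib) to AX; return new AX."""
    if len(code) != 2 or code[0] != 0xD5:
        raise ValueError("expected D5 followed by one immediate byte")
    base = code[1]                        # the byte the assembler fills in as 0A
    ah, al = (ax >> 8) & 0xFF, ax & 0xFF
    return (al + ah * base) & 0xFF        # AH ends up 0, AL holds the result


print(hex(run_aad(bytes.fromhex("D50A"), 0x0709)))  # default base 10: 0x4f
print(hex(run_aad(bytes.fromhex("D502"), 0x0101)))  # hand-coded base 2: 0x3
```

In an actual source file, one way to get the D5 02 bytes in place would be a raw data directive such as `db 0xD5, 0x02`. That is an assumption about tooling, since the post doesn't show how the bytes were emitted.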

As you can see, AL (seen in EAX register) has the value that we expected.

So by unlocking this 'base' byte, we can convert from arbitrary bases, as Intel states. But we are not at the bottom of this abstraction stack yet. Converting from any base is what it is supposed to be used for at this level. Again, what does it actually do?

Mathematical Abstraction:

So really: AL = AL + (AH * base)
Where you provide the 1-byte base, and the sum is truncated to the 8 bits that fit in AL.

This formula absolutely does what it is supposed to do. So we are done, we slap the "dutifully converts from base" label on this formula and that is what it is. Abstractions make life so easy on our stupid brains. But converting bases isn't really what it does. It can do that, but what it really does is AL = AL + (AH * base). Just so you don't think I'm splitting hairs like usual, consider that the 1-byte base value can be any value from 0x00-0xff. So what does it mean to convert 0x0709 from base 1, or base 0 (just so you know: not a thing)? Even base 2 doesn't make sense here, because 7 and 9 are not valid binary digits. That doesn't stop us from computing AL = 9 + (7 * 2) and getting a value anyway (0x17).
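To see the formula shrug off "bases" that make no sense, here is a quick Python sketch that feeds AH = 0x07 and AL = 0x09 through a handful of base bytes. The `aad` helper is a stand-in for the instruction, written for this example, and the masking to 8 bits is there because the result has to land in AL.

```python
def aad(ah: int, al: int, base: int) -> int:
    """AL = AL + (AH * base), truncated to the 8 bits that fit in AL."""
    return (al + ah * base) & 0xFF


# Bases 0 and 1 aren't real number bases, 2 can't hold the digits 7 and 9,
# and 0xff overflows AL. The formula produces a value for all of them anyway.
for base in (0x00, 0x01, 0x02, 0x0A, 0xFF):
    print(f"base {base:#04x}: AL = {aad(0x07, 0x09, base):#04x}")
```

Only the base 0x0a row is a "conversion" in any meaningful sense; the rest is just multiply-and-add.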