In the realms of assembly obfuscations, this isn't extremely high in complexity. This is me finding an excuse to use the "FIST" instruction. In the context of the PoC, it is being used as a decoder for some encoded shellcode. Before going into the super technical details, below is a video of the PoC in action.
Let's talk about floating point numbers (as it applies to x86). I've said this a ton, but data is meaningless without context. For example 0xfe could be 254 decimal, but if it were a signed integer, it would be -2. The sign is defined by the most significant (left-most) bit, when interpreted as a signed int. Floating point representation has it's own complex structure. In a more standard notation, it's what gives us the ability to represent numbers like 13 * 10^4. However, everything is still represented base 2, so it's more like 1.9837 * 2^16. With this type of notation, we can represent very large and very small numbers (with a loss of precision). In a 32-bit floating point number, we have 1 bit for the sign (and these are not 2's compliment), 8 bits for the multiplier (called the bias), it can represent positive and negative (fractional), and the remaining 23 bits represent the number before modification (multiplying out). If only it were that simple though, this 23 bit number has an implied leading 1. So 1.265 would be stored as 265 and the 1 is assumed. These 23 bits also represent fractional binary.
Let's talk about fractional binary. We know that each increasing bit represents twice as much as the bit before it. Well after the decimal place, each bit represents half as much. Lets look at a non-fraction to illustrate the first point: 1011. The right most digit represents 1, the next 2, then 4, then 8. We have 1, 2, and 8 'selected', so it's 11 decimal. What is 1011.1011. Starting from left to right after the decimal, we have 1*.50 + 0*.25 + 1*.125 + 1*.0625. The fractional part is .6875, so the entire number would be 11.6875 as a decimal number.
The 8-bit bias part of our data rests on 0x7f. larger than 0x7f is a larger exponent, smaller than 0x7f is a fractional exponent.
Let's encode 11.6875 (decimal) into a floating point number. First we must convert it to binary, of which we already had above; it's 1011.1011. Now we have to push it back a few decimal places to get it to be a 1.xxxx... format: 1.0111011. We had to push this 3 places, so our exponent will be 3. We add this 3 to 0x7f getting us 0x82 for the bias bits. The sign bit will be 0 for positive. As far as representing our binary in 23 bits, we are supposed to remove the leading 1 and pad with 0's until it's 23 bits. It helps me to look at the full 32-bit value in binary before making it hex. In order, we have sign bit (0), the bias (1000 0010 / 0x82), and the number (0111 011 0000000000000000). We combine all of that to get: 01000001001110110000000000000000. In hex this is 0x413b0000. So if 0x413b00000 is interpreted as floating point data, it represents 11.6875.
The FPU is a set of 8 additional registers (ST0-ST7). These registers are actually 80-bit each. The FPU is mostly interfaced like a stack; in the sense that we usually push data to it and pop data from it. To push data to the FPU stack, we can use the FLD instruction (floating point load) to take data from a location in memory and place it into ST0. You can then use FST (floating point store) to take the first FPU register (ST0) off of the stack and place it into memory. Both of these instructions assume the data is already encoded as floating point.
One cool thing is that you can have your integer data converted into floating point with the FILD instruction (I for integer). This takes an integer from memory, encodes it as floating point, and pushes it to the FPU stack. Likewise, you can take floating point data from the FPU and convert it back to an integer with FIST (again, I for integer). For practical reasons I use FISTP (P for pop). Interestingly, by default FST/FIST accesses the data on the top of the FPU stack without actually removing it (popping). So if you keep pushing with FLD/FILD and not popping, you'll run into problems.
The idea is to have a bunch of shellcode encoded into memory as floating point data, only to have FIST decode it back into memory and then execute. It is unreasonable expect to be able to convert every 32-bit instruction into something that equates as a 32-big floating point number. It is far easier (for me) to just use the least significant 2 bytes of the integer from a float as the machine code. This means I need to take some shellcode and break it up into 2-byte chunks. Each chunk needs to be encoded as floating point data. I did this manually...There were some steps in between, but for posterity, here's the scratchpad text file I was working around in:
The idea is to manually convert 2 bytes into a 4-byte float that will be decoded by FIST where the least significant 2 bytes will be the original 2 bytes we were encoding. Let's look at 'Int 0x80' (this was the last line in my sloppy work screenshotted above). In machine code that is 0xcd80. Because of some Little Endian side effects, we will need to pre-reverse the bytes, so 0x80cd. In binary this is 1000 0000 1100 1101, We then shift this to 1.000 0000 1100 1101. That is 15 shifts, where 1.000 0000 1100 1101 * 2 ^ 15 would get us back to our 0x80cd. So let's encode. It's positive, so the sign big is 0. It's bias is 15, and we add that to the 0x7f, so that's 0x8e (1000 1110). And for the number, we ignore the leading 1 and pad with 0's to the right to get 23 bits, so 00000001100110100000000. All that binary together is 0100 0111 0000 0000 1100 1101 0000 0000 or 0x4700cd00 represented in hex. So if 0x4700cd00 was in ST0, FIST/FISTp would put 0xcd800000 into our memory location. I only end up using those first 2 bytes in my PoC.
First note that this PoC has been modified where .text is rwx (can be modified on the fly), this was done manually in a hex editor (of which is also demonstrated in the video). It already has the encoded shellcode in memory with assembly dq directives and a handy pointer to it with 'data:'. I initialize a looping counter with ECX, specify where our program will end up starting with EDX, and get our data address into EBX. I then load the first encoded data with FLD, then decode it out into our program memory area with FISTP, I increment the pointers (data and program) and decode the next until I am done. When done, I jump to the start of the decoded program. There is nothing spectacular about the decoded program, just the same execve(/bin/sh) thing I always do.
Below is a screenshot of the source: