(or another perspective on the Invoke vs Call argument)
The video version (for the illiterate) can be found at: https://youtu.be/QyjXBv3sqRY
I’m in the process of re-certifying for GREM (GIAC Reverse Engineering Malware). Although I’m pretty good with assembly language on a handful of architectures (Motorola, x86, Propeller, and ARM), my skills are shit with Windows and its APIs. As far as GREM and static code analysis go, I still have a ways to go; it’s a ‘not seeing the forest for the trees’ issue. I will still likely pass the certification like last time, because I understand most of the concepts in their compartmentalized pieces. My problem is some of the big picture stuff (always has been). I joke about everything being too high level, and honestly, most of the time it really is a joke or an extreme exaggeration. But I sometimes do have a harder time comprehending an abstraction when it abstracts away how things actually work. For most people, it doesn’t matter how the technology works, so long as it does. However, as a hacker, I have technology ‘trust issues’; things don’t always ‘just work.’ And when they don’t, the abstraction likely won’t give you any hints as to why the thing failed; the answers are revealed at a lower layer.
Blah blah blah, I digress. I wanted to learn many of these Windows APIs in a bit more detail. Reverse engineering usually teaches you how to read the code, but my (and probably your) comprehension improves a lot when we actually write code. So in this case, I set out to write a few very simple assembly programs that put the correct arguments on the stack and call a Windows API, just like I see happening when debugging some malware, just how it is supposed to work. As a point of reference, I am using the FLARE VM setup from FireEye. It comes with fasm, so that’s the assembler I will use (I don’t really have religious preferences when it comes to assemblers).
For APIs, the Windows way is a bit different than the Linux way. On 32-bit Linux, generally, you put the syscall number and your arguments in registers and then do an int 0x80 (the software interrupt into the kernel). On Windows, with ‘stdcall’ functions, you push all of your arguments onto the stack, last argument first, and call the Windows API function by name (the corresponding addresses of these functions end up getting linked in). I’m not really opposed to this method; it allows for a large number of arguments by default, since it’s the stack and not a limited number of registers.
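To make the stdcall side concrete without a debugger open, here is a minimal sketch in Python of what the stack looks like when a four-argument API gets called. It is a toy model, not real Windows code: the `Stack` class, the `stdcall` helper, the starting stack pointer, the return address, and both string addresses are made up for illustration. The only real-world facts it leans on are the MessageBoxA argument order (hWnd, lpText, lpCaption, uType) and the values HWND_DESKTOP = 0 and MB_OKCANCEL = 1.

```python
# Toy model of a 32-bit stdcall call. All addresses here are hypothetical.

class Stack:
    def __init__(self, esp=0x0019FF80):
        self.esp = esp      # assumed starting stack pointer
        self.mem = {}       # address -> 32-bit value

    def push(self, value):
        self.esp -= 4       # the x86 stack grows downward
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem.pop(self.esp)
        self.esp += 4
        return value


def stdcall(stack, return_address, *args):
    """Push the arguments last-to-first, then the return address (as CALL does)."""
    for arg in reversed(args):
        stack.push(arg)
    stack.push(return_address)


HWND_DESKTOP = 0
MB_OKCANCEL = 1
MESSAGE_PTR = 0x00401000    # assumed address of the message string
TITLE_PTR = 0x00401010      # assumed address of the title string
RETURN_ADDRESS = 0x00402020 # assumed address of the instruction after the call

stack = Stack()
start_esp = stack.esp

# MessageBoxA(hWnd, lpText, lpCaption, uType)
stdcall(stack, RETURN_ADDRESS, HWND_DESKTOP, MESSAGE_PTR, TITLE_PTR, MB_OKCANCEL)

# What the callee sees on entry: [esp] is the return address,
# [esp+4] is the first argument, [esp+8] the second, and so on.
frame = [stack.mem[stack.esp + 4 * i] for i in range(5)]
print([hex(value) for value in frame])

# With stdcall the callee cleans up: it pops the return address and
# discards its own four arguments (the equivalent of RET 16).
stack.pop()
for _ in range(4):
    stack.pop()
```

Because the last argument is pushed first, the first argument ends up closest to the top of the stack, which is why the pushes in a disassembly read in the reverse order of the function's documented signature.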
As I didn’t know the fasm way of doing things, I looked to the Internet for some examples. I wanted to create a simple dialog box. I expected to see a simple assembly program with a .data section holding the strings and then a .text (.code) section with some instructions pushing the arguments to the stack, followed by a call to the API function. For pretty much every Google result, what I got back was a heavily abstracted version of how this is generally done, and the ironic bonus: NO ASSEMBLY INSTRUCTIONS!
Before I get to that, I will say that I eventually figured out the way to do this with real assembly language in the source file. And it was as straight forward as I would have expected it to be. For reference, here is a screenshot of the source program:
This is what it looks like in the x64dbg debugger:
Note that the disassembly looks awfully similar to the source. This is no mistake. This is exactly what I’m going for here. Remembering that my goal is to try and understand what is actually going on with these API functions, this is the most comprehensible way to go about it. You’ll notice that all the arguments are on the stack and ready to go when I’m about to call the function. And it is extremely clear how they all got onto the stack (the 4 preceding push instructions).
Okay. Now let’s talk about the ‘no assembly required’ way that is recommended to write this. Because the source code is easier to read. Because it’s ‘cleaner code.’ Because assembly language is so ‘hard’ to write that you might as well write assembly programs that don’t use assembly instructions (then just give up and fucking use Python). Anyway, here’s a screenshot of the ‘clean’ way to do this:
It is clearer to read. If there were no comments in my version, then the ‘invoke’ version would be much more obvious in its intentions. But now, here’s a screenshot of how dirty and incomprehensible this is in the debugger:
Before I start ranting and criticizing, I have to be fair and state that the examples I found on the Internet didn’t use a .data section and inlined the strings in the invoke line (cleaner source code). This is the real cause of the mess in the disassembly. Had I used a .data section with this invoke command: ‘invoke MessageBox,HWND_DESKTOP,message,title1,MB_OKCANCEL’, it wouldn’t be so bad. I digress.
So note that even though the source code is ‘clean,’ what’s actually being ‘assembled’ (compiled, really) is anything but. As we are about to make the call, all the right arguments are on the stack. I see two of the pushes needed for two of our arguments (push 1 and push 0). We also need two more arguments: pointers to our strings for the title of the window and the message in the window. How on earth did these get onto the stack, and what the fuck are these confusing instructions doing in our program? Do we really need to do ARPL, INSB, OUTSD, DAA, and IMUL instructions? Well no, that’s not what is happening. What we are actually seeing is the disassembler trying to decode the bytes of our strings as if they were instructions.
See our first call to ‘syscalls.40201B’: it’s jumping past our first string. A call normally knows how to return to where it came from by pushing the address of the next instruction onto the stack. In this case though, our program doesn’t intend to return there at all; it is using that pushed address as a side effect. That address really is the first byte of our string, so it serves as a pointer to it, and it is now conveniently on the stack as an argument. That call jumps us to another call that does the same thing: it skips over the string that follows it, indirectly getting a pointer to it onto the stack. That second call instruction brings us all the way down to the ‘push 0’ instruction right before our API call to MessageBoxA. These abused CALL instructions are how we got the string arguments onto the stack.
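The call-over-a-string trick is easy to model outside the debugger. The sketch below, in Python, lays out two CALL rel32 instructions (opcode E8) each followed by an inline string, then follows them the way the CPU would and records what each CALL pushes. The base address, the strings, and the helper names (`call_over`, `run_calls`, `c_string_at`) are all made up for illustration; this is not the author's program or fasm's actual output, only the same pattern. The small opcode table at the end uses the standard 32-bit one-byte opcodes for the instructions named above, to show which ordinary characters a disassembler would read that way.

```python
import struct

BASE = 0x00402000  # hypothetical address where this code is loaded


def call_over(data: bytes) -> bytes:
    """Encode CALL rel32 (opcode E8) that jumps past `data`, followed by `data`."""
    return b"\xE8" + struct.pack("<i", len(data)) + data


# Made-up strings. The title comes first because arguments are pushed last-to-first.
title = b"Title\x00"
message = b"Click me\x00"
code = call_over(title) + call_over(message)


def run_calls(code: bytes, base: int, count: int):
    """Follow `count` CALL rel32 instructions and record what each one pushes."""
    pushed = []
    eip = base
    for _ in range(count):
        offset = eip - base
        assert code[offset] == 0xE8, "expected a CALL rel32 here"
        rel = struct.unpack_from("<i", code, offset + 1)[0]
        return_address = eip + 5       # CALL rel32 is 5 bytes long
        pushed.append(return_address)  # CALL pushes the address right after itself
        eip = return_address + rel     # then jumps relative to that address
    return pushed, eip


def c_string_at(address: int) -> bytes:
    """Read a NUL-terminated string out of `code` at a virtual address."""
    start = address - BASE
    return code[start:code.index(b"\x00", start)]


pushed, eip = run_calls(code, BASE, 2)
for address in pushed:
    print(hex(address), c_string_at(address))

# First bytes that a 32-bit x86 disassembler decodes as the "confusing" instructions.
OPCODES = {0x63: "ARPL", 0x6C: "INSB", 0x6F: "OUTSD", 0x27: "DAA", 0x69: "IMUL"}
as_characters = {chr(byte): name for byte, name in OPCODES.items()}
print(as_characters)
```

Each pushed "return address" is really a pointer to the string sitting right behind its CALL, and execution ends up just past the last string, which is where the remaining push and the API call would live. The opcode table is why plain lowercase text shows up as things like INSB and OUTSD: those bytes are just letters such as 'l' and 'o'.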
The end result is the same. For somebody who has to read or write the assembly source, using invoke is likely a better way to write and collaborate. However, nothing about it is actual assembly language; it abstracts it away. It’s not like this behavior is uncommon or indefensible. Compilers do this kind of thing all the time, even when they aren’t optimizing much (and when they are optimizing, wow). Joking aside, using invoke is probably the way to go if you’re writing something more serious, although, why not just use C? Writing “assembly” in shortcuts and macros with no actual assembly sounds a lot like a higher level language (like C). This is why I always found HLA (High Level Assembly) so objectionable. Though to be clear, I respect the author of HLA, and he has done other really amazing work.
A lot of arguments about which way is better (with many things) come down to what you’re doing at the moment. In the use case from the paragraph above, invoke away. But to return to my use case: I’m trying to familiarize myself with some simple Windows API calls by playing with different arguments in assembly, calling them, and then watching them perform their actions in a debugger (not all APIs will do something visual; I might have to watch the stack, registers, and memory getting manipulated). Using invoke for this strategy makes the process all the more confusing.
All this said, you might be able to see why I have a little ways to go when it comes to fully reverse engineering Windows binaries. That is not to be confused with targeted reversing. I’m somewhat adequate at looking at particular APIs and pulling out IOCs from the artifacts they leave behind, and at all the other ‘cheater’ dynamic forms of analysis. But if I ever want to see a bigger and fuller picture, I’m going to want to start writing the assembly that I’m reading and put bigger pieces of the puzzle together. At least, that’s the plan.