BrainFuck is an 'esoteric' programming language with only 8 one-character instructions. I've used it here-and-there for well over a decade. I love minimalist languages, so RISCy. A brainfuck environment operates on a large array of data. There's an instruction to move the pointer in this array forwards and backwards and to increment or decrement it's value...that's already half the language. There's also an instruction for input or output of 1 character. Finally, there's an instruction to start and stop a loop. That's the language. There are many interpreters out there, I'm sure there are some compilers too, but I wanted to write a one-liner-esque command with common simple *nix tools. I actually ended up with 4 separate commands, but we will get to all of that.
sed is the main tool I used to compile the BrainFuck code into x86. dd, printf, tr, and cat are helpers with formatting and staging the resulting 512-byte boot image output.
This does mean that the output file is limited to 512 bytes (more like 470-ish bytes after some of the overhead).
Each BrainFuck command takes different amounts of bytes to encode:
>: 1 byte
<: 1 byte
+: 2 bytes
-: 2 bytes
,: 12 bytes
.: 5 bytes
[: 3 bytes
]: 10 bytes
The looping stack (controlled by '[', and ']') has 4k allocated to it. It is unlikely you will come to the nested loop limit with this.
The data stack that you control with <, >, +, and - is about 21.5k (in theory)
dd bs=512 count=1 if=/dev/zero of=out.bin; printf '\x31\xc0\x8e\xd8\x8e\xd0\xbc\x00\x9c\xbb\x00\xac\xb4\xb8\x8e\xc0\xb0\x03\xcd\x10\xb4\x01\xb5\x26\xcd\x10\xb9\xd0\x07\xb8\x20\x00\x31\xff\xf3\xab\x31\xff' | dd of=out.bin bs=1 count=38 conv=notrunc ;printf '\x55\xaa' | dd of=out.bin bs=1 seek=510 count=2 conv=notrunc; cat source.bf | tr '\n' 'x' | sed -e 's/x//' -e 's/>/\x43/g' -e 's/</\x4b/g' -e 's/\+/\xfe\x07/g' -e 's/\-/\xfe\x0f/g' -e 's/,/\xb4\x01\xcd\x16\x74\xfa\x30\xe4\xcd\x16\x88\x07/g' -e 's/\./\x8a\x07\xb4\x0f\xab/g' -e 's/\[/\xe8\x00\x00/g' -e 's/\]/\x5a\x52\x80\x3f\x00\x74\x02\xff\xe2\x5a/g' | dd of=out.bin bs=1 seek=38 conv=notrunc
Let's go through these 4 separate commands in detail:
dd bs=512 count=1 if=/dev/zero of=out.bin
We are using an output file of out.bin. We want to start with a blank 512 byte file. So we use dd, set the block size to 512 bytes and use /dev/zero as the source of nulls to pad this file with.
printf '\x31\xc0\x8e\xd8\x8e\xd0\xbc\x00\x9c\xbb\x00\xac\xb4\xb8\x8e\xc0\xb0\x03\xcd\x10\xb4\x01\xb5\x26\xcd\x10\xb9\xd0\x07\xb8\x20\x00\x31\xff\xf3\xab\x31\xff' | dd of=out.bin bs=1 count=38 conv=notrunc
In this command we are sending machine code to dd. It's 38 bytes of machine code (count=38). At a high level, this machine code sets up the environment so we have our stacks, video memory (for output), a routine to clear the screen, etc... Below is what a disassembly of this would look like (as it shows the machine code too). Even though this is partially an objdump output, I have included comments too
31 c0 xor ax,ax ;set ax to 0
8e d8 mov ds,ax ;DS=0
8e d0 mov ss,ax ;stack starts at 0
bc 00 9c mov sp,0x9c00 ;200h past code start, 4k of stack
bb 00 ac mov bx,0xac00 ;brainfuck stack location
b4 b8 mov ah,0xb8 ;text video memory
8e c0 mov es,ax ;ES=0xB800
b0 03 mov al,0x3 ;Set Video Mode
cd 10 int 0x10 ;syscall
b4 01 mov ah,0x1 ;Set Cursor Shape
b5 26 mov ch,0x26
cd 10 int 0x10 ;syscall
b9 d0 07 mov cx,0x7d0 ;whole screens worth
b8 20 00 mov ax,0x20 ;empty black background
31 ff xor di,di ;upper left corner
f3 ab rep stos WORD PTR es:[di],ax ;clear screen
31 ff xor di,di ;set cursor back to upper left corner
printf '\x55\xaa' | dd of=out.bin bs=1 seek=510 count=2 conv=notrunc
The image file needs to end with 0x55aa to be considered a bootable image. So we use printf and dd to inject these bytes into the image
cat source.bf | tr '\n' 'x' | sed -e 's/x//' -e 's/>/\x43/g' -e 's/</\x4b/g' -e 's/\+/\xfe\x07/g' -e 's/\-/\xfe\x0f/g' -e 's/,/\xb4\x01\xcd\x16\x74\xfa\x30\xe4\xcd\x16\x88\x07/g' -e 's/\./\x8a\x07\xb4\x0f\xab/g' -e 's/\[/\xe8\x00\x00/g' -e 's/\]/\x5a\x52\x80\x3f\x00\x74\x02\xff\xe2\x5a/g' | dd of=out.bin bs=1 seek=38 conv=notrunc
This is the compound command that does the real work!
First we feed the source file to our chain of commands with 'cat'
We then use tr (translate) to replace the ending newline with the x character (because we can't just make it blank / remove it with tr)
Then we have a long line of find/replace commands with sed. The first one is simple, we now can just remove that 'x' that we put in there, so there are no longer any newlines.
The rest of the find/replace (regex) sed arguments are to literally find BrainFuck instructions and replace it with equivalent machine code. Before we explore each instruction, let's make some notes about our environment. I mentioned that we are using a large array in our environment, we will be using the 16 bit 'bx' register to track this; we use it as a pointer that we increment and decrement. I considered not even using a real stack for this resulting program, because it is mostly not needed as a concept, but I ended up needing it for the looping instructions, which I will get to
Instruction: > (increment array pointer)
43 inc bx ;bx is our pointer, increment it
Instruction: < (decrement array pointer)
4b dec bx ;decrement pointer
Instruction: + (increment value in array)
fe 07 inc BYTE PTR [bx] ;increment the value
Instruction: - (decrement value in array)
fe 0f dec BYTE PTR [bx] ;decrement the value
Instruction: , (Take keyboard input)
b4 01 mov ah,0x1 ;check if keyboard input
cd 16 int 0x16 ;interupt for it
74 fa je 0x2c ;keep checking if no input
30 e4 xor ah,ah ;get keypress
cd 16 int 0x16 ;interupt for it
88 07 mov BYTE PTR [bx],al ;put keypress in array
Instruction: . (Print character)
8a 07 mov al,BYTE PTR [bx] ;get character from array
b4 0f mov ah,0xf ;make sure video background is still black (possible this isn't needed)
ab stos WORD PTR es:[di],ax ;display the character
Instruction: [ (Start the loop)
e8 00 00 call 0x40 ;pushes address of next instruction onto the stack
Instruction: ] (End the loop if array at pointer is zero)
5a pop dx ;get loop start
52 push dx ;keep it on the stack
80 3f 00 cmp BYTE PTR [bx],0x0 ;is the value zero yet
74 02 je 0x49 ;if so, the loop is over, jump past it
ff e2 jmp dx ;otherwise, go back to the beginning of the loop
5a pop dx ;return stack to pristine state
Just to show this works, I used the first two examples from Wikipedia's article on BrainFuck
...because reasons