XlogicX Blog

Vm0wd2Qy

This article is about two encodings/interpretations of binary: ASCII and Base64. ASCII is an 8-bit (debatable I know, but go with it) character encoding, and Base64 is a 6-bit character encoding. Conversions can be made between them:
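As a rough sketch of such a conversion, here is a round trip using Python's standard `base64` module instead of the shell's `base64` command. The sample text `Hello` and the variable names are assumptions made for illustration only.

```python
import base64

# Hypothetical sample text; any ASCII string would do.
plain = b"Hello"

# ASCII bytes -> Base64 text
encoded = base64.b64encode(plain)
print(encoded.decode("ascii"))

# Base64 text -> the original ASCII bytes
decoded = base64.b64decode(encoded)
print(decoded.decode("ascii"))
```

The encoded form looks nothing like the input, which is what you normally expect from Base64.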

Not too interesting. This post is more about a string like "Vm0wd2Qy":

[2nd command is just there to add a graceful newline] The decoded version is smaller, but that's it; this transformation was not as dramatic as the first example. How can this be?
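For anyone without a shell handy, the same decode can be sketched in Python with the standard `base64` module. The variable names are illustrative, not from the original commands.

```python
import base64

s = b"Vm0wd2Qy"
decoded = base64.b64decode(s)

# The decoded bytes are themselves printable ASCII.
print(decoded.decode("ascii"))

# Check whether the output is simply the start of the input.
print(s.startswith(decoded))
```

The second line of output is the interesting one: the decoded text is a prefix of the text that was decoded.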

Consider that it is possible for the same data to 'mean' the same thing in two different encodings at the same time.

Let's look at the binary encodings for ASCII and Base64. We won't do any hexadecimal or decimal conversions or any unnecessary stuff like that; we will just look at which binary produces which printable characters in both systems. I am only going to show the binary encodings of A-Za-z0-9+/ for ASCII and for Base64 (which happens to be the full Base64 character set). To be clear, there are more characters (printable and not) in the ASCII character set. For a full reference on both ASCII and Base64, the Wikipedia pages are more than enough.

This table represents both systems, with the printable character in the middle and the binary that represents the character in each encoding. It's not a fancy chart, but it gets the job done:
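A table like this can be regenerated with a few lines of Python. This is only a sketch: the column order (ASCII bits on the left, Base64 bits on the right) and the names `B64_ALPHABET` and `table_rows` are assumptions, and the alphabet is the standard one where `A` is index 0 and `/` is index 63.

```python
import string

# The 64 Base64 characters in index order (standard alphabet).
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def table_rows():
    """Yield (ascii_bits, char, base64_bits) for every Base64 character."""
    for index, char in enumerate(B64_ALPHABET):
        ascii_bits = format(ord(char), "08b")   # 8-bit ASCII code of the character
        base64_bits = format(index, "06b")      # 6-bit Base64 index of the character
        yield ascii_bits, char, base64_bits

for ascii_bits, char, base64_bits in table_rows():
    print(f"{ascii_bits}  {char}  {base64_bits}")
```

For example, the row for `V` pairs the 8 ASCII bits `01010110` with the 6 Base64 bits `010101`.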

 

Some Toy Conversions:

Converting is fairly straightforward, although you have to be mindful that one system is 8-bit and the other is 6-bit. Let's convert the first 4 Base64 characters in the example that this post starts with.

The top (red) is the Base64, and the bottom (blue) is the ASCII. It is the same binary for each. You can look at the conversion table above to see that this binary directly corresponds to the printable characters they are supposed to.
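The same toy conversion can be worked through in code. The sketch below maps each of the 4 Base64 characters to its 6 bits, joins them into one 24-bit string, then reads that string back 8 bits at a time as ASCII. The helper names `base64_to_bits` and `bits_to_ascii` are made up for this example.

```python
import string

# The 64 Base64 characters in index order (standard alphabet).
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def base64_to_bits(text):
    """Concatenate the 6-bit index of each Base64 character."""
    return "".join(format(B64_ALPHABET.index(c), "06b") for c in text)

def bits_to_ascii(bits):
    """Read the bit string 8 bits at a time, dropping any leftover bits."""
    usable = len(bits) - len(bits) % 8
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, usable, 8))

bits = base64_to_bits("Vm0w")
print(bits)                  # 24 bits: four 6-bit groups or three 8-bit groups
print(bits_to_ascii(bits))   # the same bits read as ASCII
```

Nothing about the bits changes between the two print statements; only the grouping does.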

Now, Vm0wd2Qy:

This kind of black magic excites me :).

If you decode the (red) Base64, you get the same data, just fewer characters, because of the 4:3 ratio: every 4 Base64 characters (24 bits) decode to 3 ASCII characters. If you take the ASCII results (which are still valid Base64 characters), treat them as Base64, and decode them, the same pattern emerges (in this case, Vm0wd2 would decode to Vm0w). Eventually, we are reduced to nothing.
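The shrinking can be watched step by step with a small sketch. It uses a hand-rolled decoder rather than `base64.b64decode`, because Python's decoder rejects input whose length is not a multiple of 4 unless it is padded, while the point here is just to regroup bits and throw away the remainder. The names `lenient_decode` and `decode_chain` are hypothetical, and the code assumes every intermediate string contains only Base64 characters, which holds for this input.

```python
import string

# The 64 Base64 characters in index order (standard alphabet).
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def lenient_decode(text):
    """Decode unpadded Base64 text, silently dropping leftover bits."""
    bits = "".join(format(B64_ALPHABET.index(c), "06b") for c in text)
    usable = len(bits) - len(bits) % 8
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, usable, 8))

def decode_chain(text):
    """Keep decoding until nothing is left; return every intermediate string."""
    chain = [text]
    while text:
        text = lenient_decode(text)
        chain.append(text)
    return chain

for step in decode_chain("Vm0wd2Qy"):
    print(repr(step))
```

Each step is a prefix of the one before it, and the last step is the empty string.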

Over 9000 digits of this:

So Vm0wd2Qy is cool, but let's look at an epic 'recursive' Base64 string of 10,000 characters. The same goes for this one: you can keep doing 'base64 -d', and the string doesn't change, it just shrinks. Even in 10,000 characters, there are no infinitely repeating patterns. That said, the entropy is also very low. Out of all these characters, there are 975 'V's (about 10%), 556 'W's, 1 'g', and no 'f's, '7's, '+'s or '/'s. There are so many other interesting patterns that emerge when playing with even the simplest of formal systems, but I feel I've already awarded myself an informal autism diagnosis for this post, so without further delay, over 9,000 of these beautiful characters:

