How Base64 Encoding Works
Base64 encoding is used to convert a sequence of bits to a sequence of alphanumeric characters.
There are 64 possible values that each encoded character can take, hence the name base64 encoding: 26 lowercase letter, 26 uppercase letters, 10 numbers, and the +
and /
characters.
Actually, there is an extra character used for padding (the =
character). So technically, a 65-character set is used for base64 encoding.
How it works
Since the base 64 character set has 64 characters (excluding the padding character), we need 6 bits to represent each character.
Since most computers deal with bytes (8 bits), we take 3 bytes at a time (24 bits) and break them down in a sequence of four 6 bit characters. We can then use a character from the base 64 alphabet to represent each group of 6 bits.
Table 1: The Base 64 AlphabetValue Encoding Value Encoding Value Encoding Value Encoding0 A 17 R 34 i 51 z1 B 18 S 35 j 52 02 C 19 T 36 k 53 13 D 20 U 37 l 54 24 E 21 V 38 m 55 35 F 22 W 39 n 56 46 G 23 X 40 o 57 57 H 24 Y 41 p 58 68 I 25 Z 42 q 59 79 J 26 a 43 r 60 810 K 27 b 44 s 61 911 L 28 c 45 t 62 +12 M 29 d 46 u 63 /13 N 30 e 47 v14 O 31 f 48 w (pad) =15 P 32 g 49 x16 Q 33 h 50 y
Source: https://tools.ietf.org/search/rfc4648
Example
Let's say we want to encode the word "foo".
First, we need to convert this string to a sequence of bits. For simplicity, let's convert each character into the binary representation of its corresponding ASCII value:
foo = 01100110 01101111 01101111
We then take this sequence of 3 bytes and break it down into a sequence of four 6-bit groups:
011001 | 10 0110 | 1111 01 | 101111
We then convert each 6-bit group into the corresponding printable character from the base 64 alphabet. Therefore, the base64 encoded representation of "foo" is: Zm9v
Since base64 encoding works on any sequence of bits, it can be used to encode non-textual data like images as well.
The padding token
When processing a sequence of bits in groups of 3 bytes, it is possible that the last sequence does not contain all 3 bytes. In this case, the sequence is padded with 0s to make the number of bits a multiple of 6.
If the last sequence has only 2 bytes, the base64 encoded output will be 3 characters from the base 64 alphabet followed by the =
character to signify padding.
If the last sequence has only 1 byte, the base64 encoded output will have 2 characters from the base 64 alphabet followed by ==
to signify padding.