How Base64 Encoding Works

Base64 encoding is used to convert a sequence of bits to a sequence of alphanumeric characters.

There are 64 possible values that each encoded character can take, hence the name base64 encoding: 26 lowercase letter, 26 uppercase letters, 10 numbers, and the + and / characters.

Actually, there is an extra character used for padding (the = character). So technically, a 65-character set is used for base64 encoding.

How it works

Since the base 64 character set has 64 characters (excluding the padding character), we need 6 bits to represent each character.

Since most computers deal with bytes (8 bits), we take 3 bytes at a time (24 bits) and break them down in a sequence of four 6 bit characters. We can then use a character from the base 64 alphabet to represent each group of 6 bits.

Table 1: The Base 64 Alphabet
Value Encoding Value Encoding Value Encoding Value Encoding
0 A 17 R 34 i 51 z
1 B 18 S 35 j 52 0
2 C 19 T 36 k 53 1
3 D 20 U 37 l 54 2
4 E 21 V 38 m 55 3
5 F 22 W 39 n 56 4
6 G 23 X 40 o 57 5
7 H 24 Y 41 p 58 6
8 I 25 Z 42 q 59 7
9 J 26 a 43 r 60 8
10 K 27 b 44 s 61 9
11 L 28 c 45 t 62 +
12 M 29 d 46 u 63 /
13 N 30 e 47 v
14 O 31 f 48 w (pad) =
15 P 32 g 49 x
16 Q 33 h 50 y

Source: https://tools.ietf.org/search/rfc4648

Example

Let's say we want to encode the word "foo".

First, we need to convert this string to a sequence of bits. For simplicity, let's convert each character into the binary representation of its corresponding ASCII value:

foo = 01100110 01101111 01101111

We then take this sequence of 3 bytes and break it down into a sequence of four 6-bit groups:

011001 | 10 0110 | 1111 01 | 101111

We then convert each 6-bit group into the corresponding printable character from the base 64 alphabet. Therefore, the base64 encoded representation of "foo" is: Zm9v

Since base64 encoding works on any sequence of bits, it can be used to encode non-textual data like images as well.

The padding token

When processing a sequence of bits in groups of 3 bytes, it is possible that the last sequence does not contain all 3 bytes. In this case, the sequence is padded with 0s to make the number of bits a multiple of 6.

If the last sequence has only 2 bytes, the base64 encoded output will be 3 characters from the base 64 alphabet followed by the = character to signify padding.

If the last sequence has only 1 byte, the base64 encoded output will have 2 characters from the base 64 alphabet followed by == to signify padding.

© 2021 Ravi Suresh Mashru. All rights reserved.