Character Sets
- ASCII - is the most basic character set. It's of length 0 to 127. So 128 characters in total.
- Extended ASCII - It has the same characters as ASCII plus some additional characters. Its length is 0 to 255. So 256 characters in total.
- Unicode - This is the most extensive character set.
Character set vs Encoding
It's extremely important to understand this difference.
Character sets - Allocate a code for each character.
Encoding - How the code can be serialized and de-serialized as bytes.
UTF-8 Encoding
This defines how the Unicode code points are efficiently stored and transmitted. Its that's why also named Unicode Transformation Format. It's implements certain logic which ensures efficient usage of bytes to generate encoding.
why ASCII has no special encoding?
ASCII has one but since it's just 128 characters, the encoding is to directly use 7 bits to encode each value.
Since this has a direct mapping, we feel there is no encoding.