Computer Science : S1 : L7 : Representing Text
When a computer converts text to binary it uses a tool called a character set, with the two main character sets being ASCII (up to 128 characters) and Unicode (over 1.1 million possible characters)
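A character set is just a mapping between characters and numbers. As a minimal sketch (Python is not part of the lesson; its built-in `ord` and `chr` functions are used here purely for illustration), the mapping can be seen directly:

```python
# ord() looks up a character's code in the character set;
# chr() does the reverse lookup from number to character.
code = ord("A")              # character -> number
char = chr(65)               # number -> character
bits = format(code, "08b")   # the same number as 8 binary digits

print(code)   # 65
print(char)   # A
print(bits)   # 01000001
```

The first 128 of these codes are the same in ASCII and Unicode, which is why plain English text looks identical under both.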
Unicode is a standard for encoding text which covers over 1.1 million possible characters across all languages. Unicode uses up to 4 bytes per character, compared to a single byte for ASCII, which gives Unicode the capacity to support a variety of encoding systems. ASCII can only represent the English alphabet, punctuation, and some mathematical symbols. Unicode is needed to unify all the different encoding schemes so that confusion between computers can be limited as much as possible.
When data is encoded using ASCII, each character in the text is translated into its matching ASCII code, and this is then saved as a string of binary digits (0s and 1s). This binary version of the data can be communicated from one computer to another, where it can then be decoded back into the original text.
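The encode-transmit-decode round trip described above can be sketched in Python (again used only for illustration, with a made-up example string):

```python
text = "Hi"

# Encoding: translate each character to its ASCII code,
# then write that code as 8 binary digits.
binary = " ".join(format(ord(c), "08b") for c in text)
print(binary)   # 01001000 01101001

# Decoding: turn each group of binary digits back into a
# number, then back into the original character.
decoded = "".join(chr(int(group, 2)) for group in binary.split())
print(decoded)  # Hi
```

The string of 0s and 1s is what actually travels between computers; as long as both sides agree on the character set, the text survives the trip unchanged.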
However, when we compare this to Unicode: every character in every human language is given a distinct "code point" by Unicode. UTF-8 is one Unicode character encoding technique. This means that UTF-8 converts a particular Unicode character's code point into a string of binary data.
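UTF-8's variable width (1 to 4 bytes per character, as noted above) can be seen by encoding a few sample characters; this is a small Python demonstration, with the specific characters chosen only as examples:

```python
# Each character has one code point, but UTF-8 may need
# anywhere from 1 to 4 bytes to store it.
for ch in ["A", "é", "€", "😀"]:
    encoded = ch.encode("utf-8")
    print(ch, "code point", hex(ord(ch)), "->", len(encoded), "byte(s)")
# A uses 1 byte, é uses 2, € uses 3, and 😀 uses 4.
```

Note that plain ASCII characters like "A" still take just one byte in UTF-8, which is why UTF-8 is backwards-compatible with ASCII.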