Fundamentals of data representation
Property | Denary | Binary | Hexadecimal |
Base | Base 10 | Base 2 | Base 16 |
Uses | Everyday life | Statistics generation, electrical engineering | Computer science (eg: MAC addresses) |
Why | Common approach (simpler since we have 10 fingers) | Computers use switches, which can either be on (1) or off (0) | It is shorter, easier to read, memorise and recognise. |
Digits | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 | 0, 1 | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F |
A binary digit is known as a bit: the smallest unit of data a system can use.
Four bits is known as a nibble.
4 bits (b) | 1 nibble |
8 bits | 1 byte (B) |
1000 bytes (1000 B) | 1 Kilobyte (KB) |
1000 kilobytes (1000 KB) | 1 Megabyte (MB) |
1000 megabytes (1000 MB) | 1 Gigabyte (GB) |
1000 gigabytes (1000 GB) | 1 Terabyte (TB) |
1000 terabytes (1000 TB) | 1 Petabyte (PB) |
The first eight binary place values are 1,2,4,8,16,32,64,128 – placed from right to left.
To convert binary to denary:
Write down the binary number and list the place values (powers of 2) above it, from right to left.
Add up the place values that have a 1 underneath them.
→ Example:
128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
Answer = (0 × 128) + (1 × 64) + (1 × 32) + (1 × 16) + (1 × 8) + (1 × 4) + (0 × 2) + (1 × 1) = 125
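The steps above can be sketched in Python. This is only an illustration; the function name binary_to_denary is an assumption, not part of the notes.

```python
def binary_to_denary(bits: str) -> int:
    """Add up the place values that have a 1 underneath them."""
    total = 0
    place_value = 1  # the rightmost column is worth 1
    for digit in reversed(bits):
        if digit == "1":
            total += place_value
        place_value *= 2  # each column to the left is worth double
    return total


print(binary_to_denary("01111101"))  # 125
```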
To convert denary to binary:
List the first eight binary place values (as given in the previous example).
Place a 1 under the largest place value that fits into the number you want to convert.
Subtract that place value from the number you are converting.
Repeat with the remainder until it reaches 0, then fill the remaining columns with 0s.
→ Example:
Denary value = 156
128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
156 - 128 = 28
28 - 16 = 12
12 - 8 = 4
4 - 4 = 0
Thus, 156 in binary form is 10011100.
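A minimal Python sketch of the subtraction method, assuming an 8-bit value (0 to 255). The function name denary_to_binary is made up for illustration.

```python
def denary_to_binary(number: int) -> str:
    """Convert 0-255 to 8 bits by subtracting place values, largest first."""
    if not 0 <= number <= 255:
        raise ValueError("this sketch only handles 8-bit values (0 to 255)")
    bits = ""
    for place_value in [128, 64, 32, 16, 8, 4, 2, 1]:
        if place_value <= number:
            bits += "1"            # the place value fits, so write a 1
            number -= place_value  # and carry on with the remainder
        else:
            bits += "0"            # it does not fit, so write a 0
    return bits


print(denary_to_binary(156))  # 10011100
```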
To convert binary to hexadecimal:
Place the binary digits in groups of 4, working from right to left (these groups are called nibbles).
Convert each group to denary.
Convert each denary value to its hexadecimal digit, using the table.
Put the hex digits together.
Denary | Binary | Hexadecimal |
0 | 0000 | 0 |
1 | 0001 | 1 |
2 | 0010 | 2 |
3 | 0011 | 3 |
4 | 0100 | 4 |
5 | 0101 | 5 |
6 | 0110 | 6 |
7 | 0111 | 7 |
8 | 1000 | 8 |
9 | 1001 | 9 |
10 | 1010 | A |
11 | 1011 | B |
12 | 1100 | C |
13 | 1101 | D |
14 | 1110 | E |
15 | 1111 | F |
→ Example:
q) Convert 11000011 to hex.
1100 = C & 0011 = 3
Hexadecimal = C3
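The same grouping method can be sketched in Python. The names HEX_DIGITS and binary_to_hex are assumptions for this example; the string HEX_DIGITS plays the role of the lookup table.

```python
HEX_DIGITS = "0123456789ABCDEF"  # position in the string = denary value


def binary_to_hex(bits: str) -> str:
    """Split into nibbles from the right, then look up each nibble."""
    # Pad with leading 0s so the length is a multiple of 4
    padding = (-len(bits)) % 4
    bits = "0" * padding + bits
    hex_string = ""
    for start in range(0, len(bits), 4):
        nibble = bits[start:start + 4]
        denary = int(nibble, 2)            # nibble to denary (0 to 15)
        hex_string += HEX_DIGITS[denary]   # denary to hex digit
    return hex_string


print(binary_to_hex("11000011"))  # C3
```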
To convert hexadecimal to binary:
Split the hex number into individual digits.
Convert each hex digit to denary.
Convert each denary value to a 4-bit binary number.
Combine all the groups to make one binary number.
→ Example:
q) Convert FE to binary.
F = denary 15 = 1111
E = denary 14 = 1110
Result: 11111110
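A short Python sketch of the split, convert and combine steps. The function name hex_to_binary is an assumption for illustration.

```python
def hex_to_binary(hex_string: str) -> str:
    """Convert each hex digit to a 4-bit group and join the groups."""
    bits = ""
    for hex_digit in hex_string.upper():
        denary = "0123456789ABCDEF".index(hex_digit)  # hex digit to denary
        bits += format(denary, "04b")                 # denary to 4 bits
    return bits


print(hex_to_binary("FE"))  # 11111110
```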
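How do we add binary values? Work from right to left, one column at a time, carrying a 1 to the next column whenever a column adds up to 2 or more:
Digit | + Digit | = Result |
0 | 0 | 0 |
0 | 1 | 1 |
1 | 1 | 0 carry 1 |
1 | 1 + carried 1 | 1 carry 1 |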
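The addition rules in the table can be applied column by column, as in this Python sketch. The function name add_binary is an assumption, and the example values are chosen only for illustration.

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary strings column by column, right to left."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # line the columns up
    carry = 0
    result = ""
    for column in range(width - 1, -1, -1):
        total = int(a[column]) + int(b[column]) + carry
        result = str(total % 2) + result   # digit written in this column
        carry = total // 2                 # 1 if the column totalled 2 or 3
    if carry:
        result = "1" + result              # final carry needs an extra column
    return result


print(add_binary("0110", "0111"))  # 1101 (6 + 7 = 13)
```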
To multiply a number in binary, you can shift all its digits to the left and fill in the gaps with 0s. For example, to multiply a binary value by 4, all digits SHIFT two places to the left. Contrastingly, to divide, you shift the digits to the right.
Shifting to the right: divides by 2^(number of places)
Shifting to the left: multiplies by 2^(number of places)
Note: shifting is a simple way to multiply or divide by powers of 2, but if a 1 is shifted out of either end of the number, the result is no longer accurate.
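Python's << and >> operators perform these shifts. The sketch below assumes values are stored in 8 bits, so the left shift is masked with 0xFF to show what happens when a 1 is lost; the function names are made up for this example.

```python
def shift_left_8bit(value: int, places: int) -> int:
    """Shift left and keep only the lowest 8 bits, as an 8-bit store would."""
    return (value << places) & 0xFF


def shift_right(value: int, places: int) -> int:
    """Shift right; any bits pushed off the right-hand end are lost."""
    return value >> places


print(shift_left_8bit(0b00010110, 2))  # 88  (22 x 4)
print(shift_right(0b00010110, 1))      # 11  (22 / 2)
print(shift_right(0b00000111, 1))      # 3   (7 / 2 is 3.5, so a 1 was lost)
print(shift_left_8bit(0b11000000, 1))  # 128 (192 x 2 = 384 does not fit)
```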
A character set is a list of all the characters available in a computer. The two standard character sets are ASCII and Unicode.
ASCII:
7-bit character codes, usually stored in 8 bits (7 for the character and 1 that can be used for error checking)
can represent 2^7 (= 128) characters
33 control codes (0-31 and 127)
33 punctuation and symbol codes (including the space)
26 uppercase letters
26 lowercase letters
10 numeric digits (0-9)
can be considered a subset of Unicode
advantage: less storage needed per character
Unicode:
16-bit or 32-bit characters
can represent 2^16 or 2^32 characters
advantage: can represent many more characters (2^16 = 65,536 codes; 2^32 is over 4 billion codes)
uses the same codes as ASCII for 0 to 127
as in ASCII, upper and lower case letters each have their own codes
can represent visual characters as well, such as emojis.
Tip: character codes are grouped and run in sequence. For example, in ASCII ‘A’ is coded as 65 and ‘B’ as 66, so if you know the value for capital A, you should be able to work out the values for all the other capital letters.
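The tip can be checked with a short Python sketch. The function name code_for_capital is an assumption; the built-in ord and chr functions convert between characters and their codes.

```python
def code_for_capital(letter: str) -> int:
    """Work out a capital letter's code by counting on from 'A' = 65."""
    code_for_a = 65
    position = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".index(letter)  # A is 0, B is 1...
    return code_for_a + position


print(code_for_capital("B"))       # 66
print(code_for_capital("Z"))       # 90
print(ord("Z"))                    # 90, the built-in lookup agrees
print(chr(code_for_capital("C")))  # C
```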
A pixel (picture element) is the smallest possible area in an image. It’s defined by a colour and represented as binary. Each pixel has a position.
Bitmapped images:
An image consists of pixels
Each pixel has a colour and each colour has a unique binary number (bits).
Binary bits are represented in order as a two-dimensional matrix of pixels to form an image.
Image size (in pixels) = width (in pixels) × height (in pixels)
Colour depth:
the number of bits used for a pixel (colour depth) determines the number of colours that can be represented; each extra bit doubles the number of colours
number of colours = 2^(number of bits)
Image file size (in bits)= width in pixels (W) × height in pixels (H) × colour depth in bits (D)
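Both formulas can be tried out in Python. The function names and the 100 × 50 image are assumptions chosen only for illustration.

```python
def number_of_colours(colour_depth: int) -> int:
    """Number of colours = 2 to the power of the colour depth."""
    return 2 ** colour_depth


def image_file_size_bits(width: int, height: int, colour_depth: int) -> int:
    """File size in bits = W x H x D (ignoring any metadata)."""
    return width * height * colour_depth


size_bits = image_file_size_bits(100, 50, 8)
print(number_of_colours(8))  # 256
print(size_bits)             # 40000 bits
print(size_bits // 8)        # 5000 bytes
```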
Sound is analogue. This means it needs to be converted to digital form to be stored and processed by a computer.
Analogue signals are sampled to digitise sound.
sample = a measure of amplitude at a point in time
Sampling rate = the number of samples taken in a second (hertz)
Sample resolution = the number of bits per sample
Sound file size (bits) = sampling rate (Hz) × sample resolution (bits) × length (secs)
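A quick Python sketch of the sound file size formula. The function name and the example figures (44,100 Hz, 16 bits, 10 seconds, one channel) are assumptions for illustration.

```python
def sound_file_size_bits(sampling_rate: int, sample_resolution: int,
                         seconds: int) -> int:
    """File size in bits = sampling rate x sample resolution x length."""
    return sampling_rate * sample_resolution * seconds


size_bits = sound_file_size_bits(44100, 16, 10)
print(size_bits)       # 7056000 bits
print(size_bits // 8)  # 882000 bytes
```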
Compression is a common way of reducing file size.
It can be lossy or lossless.
Lossy compression:
Some of the data is removed to make the file smaller.
Algorithms remove data that is least likely to be noticed.
The original file cannot be restored from the compressed version.
Lossless compression:
None of the information is removed.
Algorithms look for patterns in the data so that repeated data items only need to be stored once, together with information about how to restore them.
The original file can be restored.
Why should we compress files?
file size is reduced
faster transmission
less bandwidth required
lower cost of cloud storage
Huffman coding uses a binary tree to represent data, allocating a binary code to each data element (such as a character).
the more frequently a data element/character occurs, the shorter the binary code representing it, as it sits nearer the top of the tree
often, right-hand branches are labelled 1 and left-hand branches 0
How to calculate the number of bits for a phrase using a binary tree:
Use the Huffman tree to work out how many bits are needed for each character.
For each character, multiply the number of bits by the frequency of the character to get the total number of bits that character needs in the whole phrase.
Add all of these totals for each character together to work out the number of bits for the entire phrase.
How to calculate the number of bits for a phrase before compression (using ASCII)?
Count how many characters there are in the phrase, including spaces.
Multiply this number by 7 (the number of bits in each ASCII character).
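Both calculations can be sketched in Python. The phrase and the codes in HUFFMAN_CODES are hypothetical, read off an assumed Huffman tree for that phrase rather than taken from the notes.

```python
from collections import Counter

# Hypothetical codes from an assumed tree (left branch = 0, right branch = 1).
# 'A' is the most frequent character, so it gets the shortest code.
HUFFMAN_CODES = {"A": "0", "B": "10", "C": "11"}


def huffman_bits(phrase: str, codes: dict) -> int:
    """Sum of (bits per character x frequency) over every character."""
    frequencies = Counter(phrase)
    total = 0
    for character, frequency in frequencies.items():
        total += len(codes[character]) * frequency
    return total


def ascii_bits(phrase: str) -> int:
    """Uncompressed size: 7 bits for every character, including spaces."""
    return len(phrase) * 7


phrase = "AAAABBC"
print(huffman_bits(phrase, HUFFMAN_CODES))  # 10
print(ascii_bits(phrase))                   # 49
```

Here A needs 4 × 1 = 4 bits, B needs 2 × 2 = 4 bits and C needs 1 × 2 = 2 bits, giving 10 bits instead of 49.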
Run-length encoding (RLE) compresses data by specifying how many times a character or pixel repeats, followed by the value of the character or pixel.
→ Example:
The text AAAABBBBBCCCCC is made up of 14 characters.
To store this in ASCII would take 7 × 14 = 84 bits. We can, however, code the same text in RLE as: 4 65 5 66 5 67.
To store the RLE would take 7 × 6 = 42 bits, assuming each count and each character code is stored in 7 bits.
This means we saved 42 bits, which is half the file size.
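The example can be reproduced with a small Python sketch. The function name rle_encode is an assumption, and the sketch assumes every run length fits in 7 bits (at most 127).

```python
def rle_encode(text: str) -> list:
    """Return a flat list of run length, character code, run length, ..."""
    runs = []  # each entry is [count, character code]
    for character in text:
        code = ord(character)
        if runs and runs[-1][1] == code:
            runs[-1][0] += 1        # same character again: extend the run
        else:
            runs.append([1, code])  # different character: start a new run
    return [value for run in runs for value in run]


encoded = rle_encode("AAAABBBBBCCCCC")
print(encoded)           # [4, 65, 5, 66, 5, 67]
print(7 * len(encoded))  # 42 bits, compared with 7 x 14 = 84 uncompressed
```

RLE only saves space when the data contains long runs; text with few repeats can end up larger after encoding.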