Fundamentals of data representation

Introduction:

  • Denary (base 10)

    • Uses: everyday life.

    • Why: the common approach (simpler, since we have 10 fingers).

    • Digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

  • Binary (base 2)

    • Uses: statistics generation, electrical engineering.

    • Why: computers use switches, which can either be on (1) or off (0).

    • Digits: 0, 1

  • Hexadecimal (base 16)

    • Uses: computer science (e.g. MAC addresses).

    • Why: it is shorter, and easier to read, memorise and recognise.

    • Digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F

Units of information:

  • A binary digit is known as a bit: the smallest unit of data a system can use. 

  • Four bits is known as a nibble.

  • 4 bits (b) = 1 nibble

  • 8 bits = 1 byte (B)

  • 1000 bytes (1000 B) = 1 kilobyte (KB)

  • 1000 kilobytes (1000 KB) = 1 megabyte (MB)

  • 1000 megabytes (1000 MB) = 1 gigabyte (GB)

  • 1000 gigabytes (1000 GB) = 1 terabyte (TB)

  • 1000 terabytes (1000 TB) = 1 petabyte (PB)
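The unit ladder above is repeated multiplication by 1000 (after the first step of 8 bits to a byte), so converting a size means dividing until the number is small. A minimal sketch, assuming the 1000-based units used in these notes (some sources use 1024 instead); `bits_to_bytes` and `to_largest_unit` are hypothetical helper names:

```python
# Units from the list above; each one is 1000 times the one before it.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def bits_to_bytes(bits):
    """8 bits make 1 byte."""
    return bits / 8


def to_largest_unit(num_bytes):
    """Divide by 1000 until the value is below 1000 (or we run out of units)."""
    value = num_bytes
    unit = 0
    while value >= 1000 and unit < len(UNITS) - 1:
        value /= 1000
        unit += 1
    return f"{value:g} {UNITS[unit]}"


print(bits_to_bytes(16))           # 2.0
print(to_largest_unit(2_500_000))  # 2.5 MB
```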

Converting bases:

1. Converting binary to denary:

The first eight binary place values, from right to left, are 1, 2, 4, 8, 16, 32, 64, 128 (each one is double the one to its right).

  • Write down the binary number and list the place values (powers of 2) above it, starting with 1 on the right.

  • Add up the place values that have a 1 underneath them.

→ Example:

Place value:    128   64   32   16    8    4    2    1
Binary digit:     0    1    1    1    1    1    0    1

Answer = (0 × 128) + (1 × 64) + (1 × 32) + (1 × 16) + (1 × 8) + (1 × 4) + (0 × 2) + (1 × 1) = 125
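The same "add up the place values under each 1" method can be written as a short loop. This is a sketch; `binary_to_denary` is a hypothetical name, and Python's built-in `int(bits, 2)` gives the same answer:

```python
def binary_to_denary(bits):
    """Add up the place values that have a 1 underneath them."""
    total = 0
    place_value = 1  # the rightmost column is worth 1
    for digit in reversed(bits):  # work from right to left
        if digit == "1":
            total += place_value
        place_value *= 2  # each column to the left is worth double
    return total


print(binary_to_denary("01111101"))  # 125
print(int("01111101", 2))            # 125 (built-in check)
```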

2. Converting Denary to Binary

  • List the first eight binary place values (as in the previous example).

  • Place a 1 under the largest place value that fits into (is less than or equal to) the number you want to convert.

  • Subtract that place value from the number you’re converting.

  • REPEAT with what is left until you reach 0, then put a 0 under every place value that has no 1.

→ Example:

Denary value = 156

Place value:    128   64   32   16    8    4    2    1
Binary digit:     1    0    0    1    1    1    0    0

156 - 128 = 28

28 - 16 = 12

12 - 8 = 4

4 - 4 = 0

Thus, 156 in binary form is 10011100.
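The subtraction method above translates directly into code. A minimal sketch, assuming 8 bits (so only 0 to 255); `denary_to_binary` is a hypothetical name, and `format(n, "08b")` is Python's built-in way to get the same result:

```python
PLACE_VALUES = [128, 64, 32, 16, 8, 4, 2, 1]


def denary_to_binary(number):
    """8-bit conversion using the subtraction method."""
    if not 0 <= number <= 255:
        raise ValueError("8 bits can only hold 0 to 255")
    digits = []
    for value in PLACE_VALUES:
        if value <= number:      # this place value fits
            digits.append("1")
            number -= value      # subtract it and carry on with the rest
        else:
            digits.append("0")
    return "".join(digits)


print(denary_to_binary(156))  # 10011100
print(format(156, "08b"))     # 10011100 (built-in check)
```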

3. Converting Binary to Hexadecimal (and vice versa)

  • Place the binary digits in groups of 4 (from right to left), adding 0s to the left of the last group if it is short; these groups are called nibbles

  • Convert each group to denary.

  • Convert each denary value to hexadecimal, using the table.

  • Put the hex digits together.

Denary   Binary   Hexadecimal
0        0000     0
1        0001     1
2        0010     2
3        0011     3
4        0100     4
5        0101     5
6        0110     6
7        0111     7
8        1000     8
9        1001     9
10       1010     A
11       1011     B
12       1100     C
13       1101     D
14       1110     E
15       1111     F

→ Example:

q) Convert 11000011 to hex.

1100 = C and 0011 = 3

Hexadecimal = C3
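The nibble method can be sketched as follows. This is an illustration, not the only way; `binary_to_hex` is a hypothetical name, and the left-padding step handles numbers whose length is not a multiple of 4:

```python
HEX_DIGITS = "0123456789ABCDEF"  # position in this string = denary value


def binary_to_hex(bits):
    """Split into nibbles (from the right) and convert each to one hex digit."""
    # Pad with 0s on the left so the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    hex_digits = []
    for i in range(0, len(padded), 4):
        nibble = padded[i:i + 4]
        denary = int(nibble, 2)               # e.g. "1100" -> 12
        hex_digits.append(HEX_DIGITS[denary])  # e.g. 12 -> "C"
    return "".join(hex_digits)


print(binary_to_hex("11000011"))  # C3
```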

4. Converting Hexadecimal to Binary (and similar calculations)

  • SPLIT the hex number into individual digits

  • CONVERT each hex digit to decimal

  • CONVERT each decimal value to a 4-bit binary number (keep any leading 0s, e.g. 3 = 0011)

  • COMBINE all the groups to make one binary number.

→ Example:

q) Convert FE to binary. 

  1. F = decimal 15 = 1111

  2. E = decimal 14 = 1110

  3. Result: 11111110 
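Going the other way, each hex digit becomes exactly one 4-bit group. A minimal sketch; `hex_to_binary` is a hypothetical name, and the `"04b"` format keeps leading 0s so every group stays 4 bits long:

```python
def hex_to_binary(hex_string):
    """Convert each hex digit to a 4-bit group, then join the groups."""
    groups = []
    for digit in hex_string.upper():
        denary = int(digit, 16)                # e.g. "F" -> 15
        groups.append(format(denary, "04b"))   # e.g. 15 -> "1111", 3 -> "0011"
    return "".join(groups)


print(hex_to_binary("FE"))  # 11111110
```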

How do we add binary values? The rules for each column are:

Digit    + Digit      = Result
0        + 0          = 0
0        + 1          = 1
1        + 1          = 0 carry 1
1        + 1 + 1      = 1 carry 1 (when a carry comes in from the column to the right)
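The column rules above can be applied right to left with a running carry. A sketch under one assumption: a carry out of the leftmost column is kept as an extra digit here, whereas in a fixed 8-bit exam question it would be an overflow error. `add_binary` is a hypothetical name:

```python
def add_binary(a, b):
    """Add two binary strings column by column, right to left, with a carry."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # line the columns up
    result = []
    carry = 0
    for i in range(width - 1, -1, -1):
        column_total = int(a[i]) + int(b[i]) + carry  # 0, 1, 2 or 3
        result.append(str(column_total % 2))  # digit written in this column
        carry = column_total // 2              # carry 1 if the total was 2 or 3
    if carry:
        result.append("1")  # final carry becomes a new leftmost digit
    return "".join(reversed(result))


print(add_binary("0101", "0011"))  # 1000 (5 + 3 = 8)
```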

Binary shifts

To multiply a binary number by a power of 2, you can shift all its digits to the left and fill the gaps on the right with 0s. For example, to multiply a binary value by 4 (2^2), all digits SHIFT two places to the left. Conversely, to divide by a power of 2, you shift the digits to the right.

Shifting to the right: divides by 2^(number of places)

Shifting to the left: multiplies by 2^(number of places)

Note: shifting is a simple way to multiply or divide, but if a 1 is shifted out of the available bits, the result becomes incorrect (a left shift overflows) or inaccurate (a right shift loses the remainder).
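Python's `<<` and `>>` operators perform these shifts. The sketch below assumes an 8-bit register, so a mask throws away anything pushed past the 8th bit, which shows both ways a 1 can be lost. `shift_left` and `shift_right` are hypothetical helper names:

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0b11111111: keep only the lowest 8 bits


def shift_left(value, places):
    """Multiply by 2**places; bits pushed past the 8th are lost (overflow)."""
    return (value << places) & MASK


def shift_right(value, places):
    """Divide by 2**places; bits pushed off the right are lost (remainder)."""
    return value >> places


print(format(shift_left(0b00000101, 2), "08b"))  # 00010100 (5 x 4 = 20)
print(shift_left(0b11000000, 1))                 # 128, not 384: a 1 was lost
print(shift_right(0b00000111, 1))                # 3, not 3.5: a 1 was lost
```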

Character encoding

A character set is a list of all the characters a computer can represent, each with its own binary code. Two standard character sets are ASCII and Unicode.

ASCII (American Standard Code for Information Interchange):

  • 7-bit codes, usually stored in 8 bits (7 for the character and 1 for error checking, e.g. a parity bit)

  • can represent 2^7 (= 128) characters

    • 33 control codes (0 to 31, and 127)

    • 32 punctuation and symbol codes, plus the space

    • 26 uppercase letters

    • 26 lowercase letters

    • 10 numeric digits (0-9)

  • can be considered a subset of Unicode

  • advantage: less storage needed per character

Unicode:

  • 16-bit or 32-bit characters

  • can represent 2^16 or 2^32 characters

  • advantage: can represent many more characters (2^32 is over 4 billion)

  • uses the same codes as ASCII for 0 to 127

  • as in ASCII, uppercase and lowercase letters have different codes

  • can also represent visual characters, such as emojis.

Tip: character codes are grouped and run in sequence. For example, in ASCII ‘A’ is coded as 65 and ‘B’ as 66, so if you know the value for capital A, you can work out the values for all the other capital letters.
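The tip can be checked in Python: `ord` returns a character's code and `chr` turns a code back into a character (these are Unicode code points, which match ASCII for 0 to 127). `capital_letter_code` is a hypothetical helper that only uses "A = 65" plus the letter's position in the alphabet:

```python
A_CODE = 65  # ASCII code for 'A'
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def capital_letter_code(letter):
    """Capitals run in sequence, so the code is 65 + position in the alphabet."""
    return A_CODE + ALPHABET.index(letter)


print(capital_letter_code("E"))  # 69
print(ord("E"))                  # 69 (built-in agrees)
print(chr(67))                   # C
```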

Representing images

A pixel (picture element) is the smallest possible area in an image. It’s defined by a colour and represented as binary. Each pixel has a position.

Bitmapped images:

  1. An image consists of pixels

  2. Each pixel has a colour and each colour has a unique binary number (bits).

  3. The pixels' binary codes are stored in order, forming a two-dimensional grid (matrix) of pixels that makes up the image.

Image size = width (in pixels) × height (in pixels), i.e. the total number of pixels

Colour depth:

  • colour depth is the number of bits used for each pixel; the more bits, the more colours can be represented

  • number of colours = 2^(colour depth)

Image file size (in bits) = width in pixels (W) × height in pixels (H) × colour depth in bits (D)
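A worked example of the two image formulas, as a sketch. The 10 × 8 image with a 4-bit colour depth is a made-up example, and the function names are hypothetical:

```python
def image_file_size_bits(width, height, colour_depth):
    """W x H x D: pixels across x pixels down x bits per pixel."""
    return width * height * colour_depth


def number_of_colours(colour_depth):
    """2 to the power of the colour depth."""
    return 2 ** colour_depth


# Hypothetical example: a 10 x 8 pixel image with a 4-bit colour depth.
size_bits = image_file_size_bits(10, 8, 4)
print(size_bits)             # 320 bits
print(size_bits // 8)        # 40 bytes
print(number_of_colours(4))  # 16 colours
```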

Representing sound

  • Sound is analogue. This means it needs to be converted to digital form to be stored and processed by a computer.

  • Analogue signals are sampled to digitise sound.

    • sample = a measure of amplitude at a point in time

Sampling rate = the number of samples taken per second (measured in hertz, Hz)

Sample resolution = the number of bits per sample

Sound file size (bits) = sampling rate (Hz) × sample resolution (bits) × length (seconds)
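The sound formula as code, with a made-up example of 1,000 samples per second at 8 bits per sample for 10 seconds; `sound_file_size_bits` is a hypothetical name:

```python
def sound_file_size_bits(sampling_rate_hz, resolution_bits, seconds):
    """Samples per second x bits per sample x number of seconds."""
    return sampling_rate_hz * resolution_bits * seconds


# Hypothetical example: 1,000 Hz, 8-bit samples, 10 seconds long.
size_bits = sound_file_size_bits(1000, 8, 10)
print(size_bits)       # 80000 bits
print(size_bits // 8)  # 10000 bytes (10 KB)
```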

Compression

  • A common way of reducing file size. 

  • Can be lossy or lossless.

  • Lossy compression:

    • Some of the data is removed to make the file smaller.

    • Algorithms remove data that is least likely to be noticed.

    • The original file cannot be restored from the compressed version.

  • Lossless compression:

    • None of the information is removed.

    • Algorithms look for patterns in the data so that repeated data items only need to be stored once, together with information about how to restore them.

    • The original file can be restored.

Why should we compress files?

  1. file size is reduced

  2. faster transmission

  3. less bandwidth required

  4. lower cost of cloud storage

Huffman coding (lossless compression)

  • Huffman coding uses a binary tree to represent data, allocating a binary code to each data element (such as a character).

  • the more often a character appears, the shorter the binary code representing it, because it sits nearer the top of the tree

  • often, right-hand branches are labelled 1 and left-hand branches 0

    How to calculate the number of bits for a phrase using a Huffman tree:

  • Use the Huffman tree to work out how many bits are needed for each character.

  • For each character, multiply the number of bits by the frequency of the character to get the total number of bits that character needs in the whole phrase.

  • Add all of these totals for each character together to work out the number of bits for the entire phrase.

    How to calculate the number of bits for a phrase before compression (using ASCII):

  • Count how many characters there are in the phrase, including spaces.

  • Multiply this number by 7 (each ASCII character uses 7 bits).
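Both bit counts can be computed in Python. This is a sketch of one standard way to build a Huffman tree (repeatedly joining the two least frequent nodes with the `heapq` module); it tracks each character's depth, which is the length of its code. Ties can produce differently shaped trees, but the total number of bits comes out the same. The function names are hypothetical, and giving a one-character phrase a 1-bit code is an assumption:

```python
import heapq
from collections import Counter


def huffman_code_lengths(phrase):
    """Build a Huffman tree; return the number of bits (depth) for each character."""
    freq = Counter(phrase)
    if not freq:
        return {}
    if len(freq) == 1:
        # Assumption: a phrase with only one distinct character gets a 1-bit code.
        return {ch: 1 for ch in freq}
    # Heap entries: (frequency, tie-breaker, {character: depth so far}).
    heap = [(count, i, {ch: 0}) for i, (ch, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Join the two least frequent nodes under a new parent node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Every character in both subtrees moves one level further from the top.
        merged = {ch: depth + 1 for ch, depth in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_total_bits(phrase):
    """Bits per character x frequency, added up over all characters."""
    freq = Counter(phrase)
    lengths = huffman_code_lengths(phrase)
    return sum(lengths[ch] * count for ch, count in freq.items())


def ascii_total_bits(phrase):
    """Uncompressed size: 7 bits for every character, including spaces."""
    return len(phrase) * 7


phrase = "ABRACADABRA"
print(huffman_total_bits(phrase))  # 23
print(ascii_total_bits(phrase))    # 77
```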

Run Length Encoding (lossless compression)

  • RLE compresses data by specifying how many times a character or pixel repeats, followed by the value of the character or pixel.

→ Example:

  • The text AAAABBBBBCCCCC is made up of 14 characters.

  • Storing this in ASCII would take 7 × 14 = 84 bits. However, we can code the same text in RLE as: 4 65 5 66 5 67 (each count followed by the ASCII code of the character).

  • Storing the RLE version (6 values of 7 bits each) would take 7 × 6 = 42 bits.

  • This saves 42 bits, which is half the file size.
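A minimal RLE sketch using `itertools.groupby`, which groups consecutive equal characters. It produces (count, ASCII code) pairs like the example and decodes them back, showing that no information is lost. The function names are hypothetical, and storing every value in 7 bits (so runs of at most 127) is the same assumption the example makes:

```python
from itertools import groupby


def rle_encode(text):
    """Return (run length, ASCII code) pairs, e.g. 'AAAA' -> [(4, 65)]."""
    return [(len(list(group)), ord(ch)) for ch, group in groupby(text)]


def rle_decode(pairs):
    """Rebuild the original text: lossless, so nothing is lost."""
    return "".join(chr(code) * run for run, code in pairs)


encoded = rle_encode("AAAABBBBBCCCCC")
print(encoded)  # [(4, 65), (5, 66), (5, 67)]
values = [v for pair in encoded for v in pair]
print(len(values) * 7)  # 42 bits, assuming 7 bits per stored value
print(rle_decode(encoded))  # AAAABBBBBCCCCC
```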
