1/21
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
What is compression?
Reducing the size of a file so that it takes up less space on secondary storage, while trying to make the compressed file as close to the original as possible.
Uses of compressing data files:
Smaller files take up less storage space on a device
Streaming & downloading files from the Internet is quicker as they take up less bandwidth
It allows web pages to load quicker in web browsers
Email service normally restrict the size of the attachment you can send so compressing the file allows you to send the same content with a smaller file size
Why is data compressed?
To decrease file size and free up storage space
To make the data faster to access/read/write
What are the 2 types of compression?
Lossy and Lossless.
Lossy compression
Permanently removes data from the file in order to limit the number of bits the file needs and so reduces its size. It is irreversible.
When is lossy suitable?
For data where reducing quality is acceptable, like video and sound.
Lossless compression
Makes the file smaller by temporarily removing data to store the file and then restores it to its original state when the file is opened. It is reversible.
Lossy compression advantages:
Greatly reduces file size - so more files can be stored
Takes up less bandwidth so can be downloaded and streamed quicker
Easier to download and display on a web browser quickly
Commonly used - lots of software can read lossy files
Lossy compression disadvantages:
It permanently loses data - the file can’t be turned back into the original
Can’t be used on text or software files, these files need to retain all the info of the original
The quality of lossy files are worse than the original - but loss in quality is normally unnoticeable
Lossless compression advantages:
Data is only removed temporarily - no reduction in quality so the compressed file should look/sound like the og
Lossless files can be decompressed - turned back to og
It can be used on text and software files
Lossless compression disadvantages:
The reduction in file size is small, so lossless files still take up quite a lot of space on your device - e.g. a lossless song may have a file size of 30MB, while the same song with lossy compression may be 5MB
Examples of lossy file types:
MP3 (audio)
AAC (audio)
JPEG (image)
Examples of lossless file types:
FLAC (audio)
TIFF (image)
PNG (image)
Run-length encoding
A form of lossless data compression that looks for consecutive repeating data in a file (called a run) and instead of storing the repeated data separately, it stores the number of times it repeats, and one copy of the data.
What would the text file “AAAABBBCCDAA” be compressed to?
4A 3B 2C 1D 2A
How is RLE used in bitmap images?
To compress sequences of the same colour.
Use RLE to compress the following bit pattern:
0000 0000 0111 1111 1111 1111 1100 0000 0000 0000
(9,0) (17,1) (14, 0)
What is Huffman Coding?
A method of lossless compression primarily used on text based data. It gives each data value a unique binary code but the codes vary in length, it gives shorter binary codes to data values that appear more frequently.
What are huffman trees?
A diagram used to represent the algorithm that is used to compress text based data.
What does a huffman tree consist of?
A set of nodes that are connected by edges, each node can either have 0, 1 or 2 child nodes. Each edge is labelled 0 or 1, usually a right-pointing edge is 1 and a left-pointing edge is 0.
How to calculate the number of bits saved:
Find out the bits per character uncompressed and minus the compressed bits per character from that.
Describe how Huffman coding can be used to compress a text file:
Huffman coding assesses the frequency of each data value in a large document or block of text. From the frequencies, a tree can be formed. The data values that appear more frequently are nearer the top of the tree and will have the shortest binary codes to represent them. A binary code is assigned to each character that appears in the text - including spaces. The code is worked out by tracing through the tree where a 0 means take a left branch and a 1 means take the right branch.