Data Representation and Compression

Data and Binary Numbers

  • Binary Numbers: Computers store all digital data as binary (base-2) numbers, written using only the digits 0 and 1.
  • Bit: The smallest unit of information, either 0 or 1.

Base Conversion

  • Binary to Decimal: Each binary digit stands for a power of 2, starting with 2^0 at the rightmost position. Add the powers of 2 wherever the digit is 1.
    • Example: Binary 1101 = 8 + 4 + 0 + 1 = 13 in decimal.
  • Decimal to Binary: Find the powers of 2 that sum to the decimal number.
    • Start with the largest power of 2 less than or equal to the number, subtract it, and repeat with the remainder until reaching 0.
    • Example: Decimal 200 = 128 + 64 + 8, which is 11001000 in binary.
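
  Both conversions can be sketched in a few lines of Python. The function names below are illustrative choices, not part of the notes, and the second function assumes a non-negative whole number.

```python
def binary_to_decimal(bits: str) -> int:
    """Add the power of 2 for every position that holds a 1."""
    total = 0
    # The rightmost digit is position 0, so walk the string in reverse.
    for position, digit in enumerate(reversed(bits)):
        if digit == "1":
            total += 2 ** position
    return total


def decimal_to_binary(number: int) -> str:
    """Repeatedly subtract the largest power of 2 that still fits."""
    if number == 0:
        return "0"
    # Find the largest power of 2 that is less than or equal to the number.
    power = 1
    while power * 2 <= number:
        power *= 2
    digits = []
    while power >= 1:
        if power <= number:
            digits.append("1")
            number -= power
        else:
            digits.append("0")
        power //= 2
    return "".join(digits)


print(binary_to_decimal("1101"))  # 8 + 4 + 0 + 1 = 13
print(decimal_to_binary(200))     # 128 + 64 + 8 gives 11001000
```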

Digital Images as Bits

  • Digital Images: Images are converted to binary, processed, and displayed.
  • Pixels: A digital image is a grid of pixels, and each pixel's value is stored as a binary number.
  • Black and White Images: Represented using 1 (black/on) and 0 (white/off).
  • Grid Creation: Draw a grid and color squares based on binary values.
  • Metadata: Data about the data. For an image, this includes the size (e.g., a 10 x 10 grid), which is needed to arrange the bits into rows.
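
  A minimal sketch of the grid idea, assuming the image arrives as one flat string of bits and the width is supplied separately as metadata. The function name and the '#' and '.' drawing characters are illustrative choices.

```python
def render_image(bits: str, width: int) -> str:
    """Split a flat bit string into rows of `width` pixels and draw them."""
    if len(bits) % width != 0:
        raise ValueError("bit count must be a multiple of the width")
    rows = [bits[i:i + width] for i in range(0, len(bits), width)]
    # 1 = black/on is drawn as '#', 0 = white/off is drawn as '.'
    return "\n".join(row.replace("1", "#").replace("0", ".") for row in rows)


# A 4 x 4 image; without the width, these 16 bits could not be laid out.
bits = "0110" "1001" "1001" "0110"
print(render_image(bits, width=4))
```

  The same 16 bits drawn with a width of 8 would give a different picture, which is why the size has to travel with the pixel data.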

Binary and Color Representation

  • Color Representation: Computers use binary for colors.
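  • Color Basis: Colors are created by mixing red, green, and blue light, each with its own intensity value.
  • Maximum Color Value: 255 in decimal for each of red, green, and blue, represented as 11111111 in binary (8 bits).
  • Minimum Color Value: 0, represented as 00000000 in binary.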
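
  As an illustration, the sketch below writes a color as three 8-bit binary numbers, one each for red, green, and blue. It assumes 8 bits per channel, which matches the 0 to 255 range above; the function name is hypothetical.

```python
def color_to_bits(red: int, green: int, blue: int) -> str:
    """Format each channel (0 to 255) as an 8-digit binary number."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each channel must be between 0 and 255")
    return " ".join(format(value, "08b") for value in (red, green, blue))


print(color_to_bits(255, 0, 0))      # pure red: 11111111 00000000 00000000
print(color_to_bits(255, 255, 255))  # white: all three channels at maximum
```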

Music as Bits

  • Analog Signal: Continuous in time and range of values.
  • Digital Signal: Sequence of discrete symbols (bits).
  • Sampling: Measuring an analog signal at discrete moments and storing each measurement as a digital value.
  • Noise Resilience: Digital signals are more resilient against noise, because a receiver only has to tell a 0 from a 1.
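
  The following sketch shows sampling on a sine wave that stands in for an analog signal. The frequency, sample rate, and 8-bit sample size are assumed values chosen for illustration, and the function name is hypothetical.

```python
import math


def sample_and_quantize(frequency_hz, sample_rate_hz, num_samples, bits_per_sample=8):
    """Measure a sine wave at regular intervals and round each value to an integer level."""
    levels = 2 ** bits_per_sample
    samples = []
    for n in range(num_samples):
        t = n / sample_rate_hz  # time of the n-th sample, in seconds
        amplitude = math.sin(2 * math.pi * frequency_hz * t)  # continuous value in [-1, 1]
        # Map [-1, 1] onto the integer levels 0 .. levels - 1.
        samples.append(round((amplitude + 1) / 2 * (levels - 1)))
    return samples


# One cycle of a 1 Hz wave, sampled 8 times per second.
samples = sample_and_quantize(frequency_hz=1, sample_rate_hz=8, num_samples=8)
print(samples)
print([format(level, "08b") for level in samples])  # the same samples as bits
```

  A higher sample rate or more bits per sample follows the original wave more closely, at the cost of more data.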

Data Compression

  • Data Compression Usage: Used in MP3, MP4, RAR, ZIP, JPG, PNG files, etc.
  • Importance: Compression matters when backing up and archiving files, and when uploading them over the internet.
  • Two-Way Process: Compression reduces the size of the data; decompression restores it, either exactly or approximately.
  • Usefulness: Saves disk space and reduces bandwidth during data transmission.
  • Function: Converts a sequence of bytes into a shorter sequence of bytes.
  • Lossless Algorithms: Reconstruct the original message exactly.
    • Used for text.
  • Lossy Algorithms: Reconstruct an approximation of the original message.
    • Used for images and sound where slight loss is acceptable.

Lossless Compression

  • Function: Data is compressed and later decompressed with no loss; the decompressed output is identical to the original, bit for bit.
  • Text Compression: Identical reconstruction is essential for text, because even a small difference can change the meaning.
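
  To make exact reconstruction concrete, here is a sketch using run-length encoding. Run-length encoding is not named in these notes; it is used here only as one simple lossless scheme, and the function names are illustrative.

```python
from itertools import groupby


def rle_encode(text: str) -> list:
    """Replace each run of repeated characters with a (character, count) pair."""
    return [(char, len(list(group))) for char, group in groupby(text)]


def rle_decode(pairs: list) -> str:
    """Expand each (character, count) pair back into a run."""
    return "".join(char * count for char, count in pairs)


original = "WWWWWWBBBWWWW"
encoded = rle_encode(original)
print(encoded)  # [('W', 6), ('B', 3), ('W', 4)]

# Lossless: decoding returns exactly the original text.
print(rle_decode(encoded) == original)  # True
```

  This scheme only saves space when the data contains long runs; on text with few repeats the encoded form can be larger than the original.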

Lossy Compression

  • Function: The decompressed data is an approximation of the original, not an exact copy.
  • Characteristics: Achieves high compression at the cost of permanently losing some of the original data (pixels, sound waves, etc.).
  • Meaning of Lossy: Some quantity is discarded for good, such as a frequency component or noise.
  • Examples:
    • Images: High compression loss is noticeable when photos are enlarged.
    • Music: MP3 discards audio detail that high-resolution audio retains.
    • Video: Moving frames can tolerate more pixel loss than still images, because each frame is visible only briefly.
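
  A toy sketch of the lossy idea: keep only the top bits of each 8-bit value and throw the rest away. This is an illustrative assumption, not how JPG or MP3 actually work (those formats are far more sophisticated), and the function names are hypothetical.

```python
def lossy_compress(values, keep_bits=4):
    """Keep only the top `keep_bits` bits of each 8-bit value."""
    shift = 8 - keep_bits
    return [value >> shift for value in values]


def lossy_decompress(codes, keep_bits=4):
    """Shift the kept bits back into place; the discarded bits stay zero."""
    shift = 8 - keep_bits
    return [code << shift for code in codes]


pixels = [200, 201, 203, 17, 18, 255]
restored = lossy_decompress(lossy_compress(pixels))
print(restored)  # [192, 192, 192, 16, 16, 240]
```

  Each value now needs 4 bits instead of 8, but the restored values are only close to the originals, and the small differences between 200, 201, and 203 are gone for good.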

Using Programs with Data

  • Data Increase: Digitization and multiple transactions have led to a surge in data.
  • Data Analysis: Analyzing large data sets helps reveal connections and patterns in the data.
  • Data Extraction: Obtaining data from databases or software for use in other software.
  • Process: Data extraction → transformation (filters/programs) → analysis (graphs, visualization).
  • Steps to Extract and Analyze Data:
    • Analyze data sources (web pages, emails, videos, audio, text, etc.).
    • Determine the purpose of the analysis (trend, effect, cause, quantity).
    • Decide on tools for reading data and repositories for storing data.
    • Clean the data (remove stray whitespace, unwanted symbols, and duplicates).
    • Understand data patterns and text flow using visualization tools.
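
  The cleaning step might look like the sketch below. The sample records are made up, the function name is hypothetical, and lowercasing the text is an added assumption so that duplicates differing only in capitalization are caught.

```python
import re
from collections import Counter


def clean_records(records):
    """Strip symbols and extra whitespace, then drop empty and duplicate records."""
    cleaned = []
    seen = set()
    for record in records:
        text = re.sub(r"[^\w\s]", "", record)  # remove symbols such as , ! #
        text = " ".join(text.split()).lower()  # collapse whitespace, normalize case
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


raw = ["  Hello, World! ", "hello world", "Data\t\tanalysis ###", ""]
cleaned = clean_records(raw)
print(cleaned)  # ['hello world', 'data analysis']

# A simple analysis step: count how often each word appears.
word_counts = Counter(word for line in cleaned for word in line.split())
print(word_counts.most_common(2))
```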

How to Read and Analyze Graphs

  • Graph Definition: Pictorial representation used to depict data relationships.
  • Representation: Data is represented using points, lines, bars, or slices of a pie chart.
  • Types of Graphs:
    • Picture Graphs: Use pictures to represent values.
    • Bar Graphs: Use vertical or horizontal bars to represent values.
    • Line Graphs: Use lines to represent values.
    • Scatter Plots: Use points to show how two quantities relate, often with a best-fit line drawn through them.
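
  As a small illustration of a horizontal bar graph, the sketch below draws one in plain text. The data values are invented, the function name is hypothetical, and it assumes the values are small non-negative whole numbers.

```python
def text_bar_graph(data: dict, symbol: str = "#") -> str:
    """Draw one horizontal bar per entry, with length equal to its value."""
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        lines.append(f"{label.ljust(label_width)} | {symbol * value} {value}")
    return "\n".join(lines)


print(text_bar_graph({"Mon": 3, "Tue": 5, "Wed": 2}))
```

  Reading the graph is then a matter of comparing bar lengths: the longest bar marks the largest value.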