Big Idea 2

Data

Bits Represent Data

  • Definition: A bit is shorthand for a single binary digit, either 0 or 1.

  • Byte: A byte consists of 8 bits.
      - Example: The binary sequence 0110 1111 is 8 bits long, so it is 1 byte.

  • Binary Representation: Binary sequences can represent every kind of digital data, such as numbers, colors, Boolean values, and lists.

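  • Example: A 10 MP (1 MP = one million pixels) picture contains 10,000,000 pixels. In 16-bit mode, each pixel stores 16 bits for each of its 3 color channels (red, green, blue), which is 48 bits or 6 bytes per pixel. The whole picture therefore requires 10,000,000 × 48 = 480,000,000 bits (60,000,000 bytes).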

  • Video Data Usage: Video can require 1,000,000 bits (1 megabit) of data or more for every second of playback.
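The picture arithmetic in the 10 MP example can be checked with a short Python sketch. It assumes, as the 6-bytes-per-pixel figure implies, that "16-bit mode" means 16 bits for each of 3 color channels; `image_size_bits` is a hypothetical helper written for this example.

```python
# Storage needed for an uncompressed picture (assumption: "16-bit mode" means
# 16 bits for each of 3 color channels: red, green, and blue).
BITS_PER_BYTE = 8


def image_size_bits(pixels: int, bits_per_channel: int, channels: int = 3) -> int:
    """Return the number of bits needed to store every pixel of a picture."""
    bits_per_pixel = bits_per_channel * channels  # 16 x 3 = 48 bits = 6 bytes
    return pixels * bits_per_pixel


pixels = 10_000_000  # 10 MP = 10 million pixels
total_bits = image_size_bits(pixels, bits_per_channel=16)
print(total_bits)                   # 480000000
print(total_bits // BITS_PER_BYTE)  # 60000000 bytes, about 60 MB
```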

Abstractions

  • Definition of Abstraction: Groups of bits are interpreted as higher-level abstractions such as numbers, characters, and colors, so programmers can work with those values instead of raw bits.

  • Benefits of Abstractions: They identify common features that generalize a program and minimize repetitive code, thus reducing potential errors.

  • Example of Abstraction in Code:
      - In Java:
    int x = 1234 + 4321;
      - In Python:
    x = 1234 + 4321

  • Without these abstractions, the same addition must be written in machine code as sequences of binary instructions, which are much harder to read, write, and debug.

  • High-level languages are more abstract, making coding and debugging easier.
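To illustrate bits grouped into abstractions, this Python sketch reads the byte 0110 1111 from the Byte example in three ways. The variable names are illustrative; `chr` and `format` are Python built-ins, and the color interpretation assumes 8 bits per color channel.

```python
# The same 8 bits, read as three different abstractions.
byte = 0b0110_1111  # the byte 0110 1111 from the Byte example

as_number = byte             # an unsigned integer
as_character = chr(byte)     # a character code (ASCII / Unicode)
as_rgb_color = (byte, 0, 0)  # the red channel of an 8-bit-per-channel color

print(format(byte, "08b"))  # 01101111
print(as_number)            # 111
print(as_character)         # o
print(as_rgb_color)         # (111, 0, 0)
```

Nothing about the bits changes; only the program's interpretation does.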

Analog vs. Digital Data

  • Analog Signal: Values change smoothly over time. Examples include pitch and volume in music, colors in a painting, and positions in a race.

  • Digital Signal: A signal made of discrete values. An analog signal is digitized by approximating it with discrete steps.
      - Sampling Technique: Measures the value of the analog signal at regular intervals; each measurement is called a sample. Sampling more frequently produces a more accurate digital representation of the analog signal.
      - This process illustrates the abstraction of representing analog data with digital formats.
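A minimal Python sketch of sampling, assuming a hypothetical 1 Hz sine wave stands in for the analog signal. `sample` measures the signal at a fixed rate, and `max_step_error` (also hypothetical) holds each sample until the next one and reports the worst gap from the true signal, so a higher sampling rate should give a smaller error.

```python
import math


def wave(t: float) -> float:
    """Hypothetical 'analog' signal: a smooth 1 Hz sine wave."""
    return math.sin(2 * math.pi * t)


def sample(signal, duration: float, rate: int) -> list:
    """Measure `signal` at `rate` evenly spaced times per second."""
    count = int(duration * rate)
    return [signal(i / rate) for i in range(count)]


def max_step_error(signal, rate: int, duration: float = 1.0, checks: int = 1000) -> float:
    """Hold each sample until the next one; return the worst gap from the true signal."""
    samples = sample(signal, duration, rate)
    worst = 0.0
    for k in range(checks):
        t = k * duration / checks
        held = samples[int(t * rate)]  # most recent sample at time t
        worst = max(worst, abs(signal(t) - held))
    return worst


for rate in (4, 20, 100):
    print(rate, "samples/s -> max error", round(max_step_error(wave, rate), 3))
```

The trade-off is data size: 100 samples per second means 25 times as many values to store as 4 samples per second.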

Consequences of Using Bits to Represent Data

  • Variables: An abstraction within a program that can hold a value. Each variable can represent one value or multiple values (like lists).

  • Data Types: Common data types include:
      - Integers: Whole numbers.
      - Real Numbers: Decimal numbers (e.g., 4.00).
      - Boolean: True/False values.
      - Strings: Text (e.g., "Novack the third").
      - Lists: Collections (e.g., [1, 1, 35, 6]).

  • Limitations in Languages:
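      - In languages like Java, an int is stored in 32 bits, so it has fixed limits (-2,147,483,648 to +2,147,483,647). Exceeding these limits causes an overflow error; in Java, int arithmetic silently wraps around (2,147,483,647 + 1 gives -2,147,483,648).
      - Python integers have no fixed upper limit; they can grow until the computer runs out of memory.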
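Python cannot show Java overflow directly because its integers grow without limit, so this sketch simulates a 32-bit Java `int` by wrapping results into its range with a hypothetical `to_int32` helper. The formula is standard two's-complement wrap-around; it illustrates the behavior rather than reproducing Java's implementation.

```python
# Simulating Java's 32-bit int in Python, whose own integers never overflow.
INT_MIN = -2**31     # -2,147,483,648
INT_MAX = 2**31 - 1  # +2,147,483,647


def to_int32(n: int) -> int:
    """Wrap n into the 32-bit two's-complement range, as Java int arithmetic does."""
    return (n - INT_MIN) % 2**32 + INT_MIN


print(to_int32(INT_MAX + 1))  # -2147483648: the value wrapped around
print(INT_MAX + 1)            # 2147483648: a Python int simply grows
print(len(str(2**100)))       # 31 digits, still no overflow in Python
```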

Number Systems

  • Important bases include:
      - Binary: Base 2.
      - Decimal: Base 10.
      - Hexadecimal: Base 16.

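  • Converting Numbers: The AP exam focuses on converting binary to decimal and decimal to binary:
      - Decimal to Binary Conversion Table (4 bits):
        - 0 → 0000
        - 1 → 0001
        - 2 → 0010
        - 3 → 0011
        - …
        - 15 → 1111
      - Each binary place is worth a power of 2 (8, 4, 2, 1 for 4 bits), so 1111 = 8 + 4 + 2 + 1 = 15.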
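Both conversions can be sketched in Python. `decimal_to_binary` uses repeated division by 2 (each remainder is the next bit, from least significant to most), and `binary_to_decimal` doubles a running total for each bit, which gives the same result as adding up place values. The function names and the 4-bit default width are choices made for this example; Python's built-ins `format(n, "04b")` and `int(bits, 2)` do the same job.

```python
def decimal_to_binary(n: int, width: int = 4) -> str:
    """Convert a non-negative integer to binary using repeated division by 2."""
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # remainder = next bit, least significant first
        n //= 2
    bits = "".join(reversed(digits)) or "0"
    return bits.zfill(width)  # pad with leading zeros, e.g. "11" -> "0011"


def binary_to_decimal(bits: str) -> int:
    """Convert a binary string to decimal; same result as summing bit x place value."""
    total = 0
    for bit in bits:
        total = total * 2 + int(bit)  # shift earlier bits one place left, add new bit
    return total


for n in (0, 1, 2, 3, 15):
    print(n, "→", decimal_to_binary(n))
print(binary_to_decimal("1111"))  # 8 + 4 + 2 + 1 = 15
```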

Lossy and Lossless Data Compression

  • Data Compression Definition: The process of reducing the size of transmitted or stored data without necessarily losing information.

  • Types of Compression:
      - Lossy Compression: Discards some information to achieve a greater reduction in size, so the original file cannot be fully reconstructed and some quality is lost.
      - Lossless Compression: The original file can be fully reconstructed from the compressed data, but the reduction in size is usually smaller than with lossy compression.
      - Context determines which compression method is suitable based on quality versus storage needs.
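A toy Python sketch contrasting the two types. Run-length encoding (RLE) stands in for a lossless method, and bucketing 0-255 values into 16 levels stands in for a lossy method; neither is a specific real-world file format, and all function names are hypothetical.

```python
def rle_encode(text: str) -> list:
    """Lossless: store each run of a repeated character as (character, count)."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))              # start a new run
    return runs


def rle_decode(runs: list) -> str:
    return "".join(ch * count for ch, count in runs)


def lossy_encode(values: list, step: int = 16) -> list:
    """Lossy: keep only which bucket of width `step` each 0-255 value is in."""
    return [v // step for v in values]  # 0-255 becomes 0-15: 4 bits instead of 8


def lossy_decode(levels: list, step: int = 16) -> list:
    return [level * step + step // 2 for level in levels]  # bucket midpoint


text = "AAAAAABBBCCD"
runs = rle_encode(text)
print(runs)                      # [('A', 6), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(runs) == text)  # True: fully reconstructed

pixels = [12, 13, 200, 201]
restored = lossy_decode(lossy_encode(pixels))
print(restored)                  # [8, 8, 200, 200]: close, but not the original
```

RLE only saves space when the data contains long runs of repeated values; on text with no repeats, the encoded form is actually larger.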

Information Extracted from Data

  • Digital data is generated extensively, especially through always-on devices and social media.

  • Definition of Information: The collection of facts and patterns derived from processed data.

  • Analyzing Big Data: Combines statistics, mathematics, and programming to gain insights from data.

  • Risks and Dangers of Data Interpretation:
      - Correlation in data does not imply causation. Misinterpretation can lead to serious errors in judgment and action.

Challenges in Data Management

  • Common challenges include:
      - Need for data cleaning (removing ambiguities and ensuring uniformity).
      - Handling incomplete or invalid data.
      - Combining multiple data sources for comprehensive analysis.

  • Cleaning Data: A systematic approach to ensuring data quality by standardizing entries (e.g., making all abbreviations uniform).
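A minimal cleaning sketch in Python, assuming hypothetical survey entries and a hand-written lookup table (`STANDARD`) of known spellings. Each entry is trimmed and lowercased, then mapped to one standard abbreviation; anything not in the table comes back as `None` so it can be reviewed as invalid or incomplete data.

```python
from typing import Optional

# Hypothetical survey entries where the same state was typed inconsistently.
raw = ["CA", "Calif.", "california", " California ", "NY", "New York", "n.y.", "TX??"]

# Hypothetical lookup table: every known variant -> one standard abbreviation.
STANDARD = {
    "ca": "CA", "calif.": "CA", "california": "CA",
    "ny": "NY", "new york": "NY", "n.y.": "NY",
}


def clean(entry: str) -> Optional[str]:
    """Standardize one entry; return None if it cannot be recognized."""
    key = entry.strip().lower()  # ignore stray spaces and capitalization
    return STANDARD.get(key)


cleaned = [clean(e) for e in raw]
print(cleaned)  # ['CA', 'CA', 'CA', 'CA', 'NY', 'NY', 'NY', None]
```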

Predicting Algorithms

  • Algorithms use big data to influence everyday decisions. Examples include:
      - Credit card fraud detection based on purchasing patterns.
      - Targeted advertising on social media based on user behavior.
      - Recommendations for products on retail sites.
      - Predictive policing utilizing crime trend data.

  • Limitations of Predictions: Past data patterns do not guarantee future trends; unexpected innovations may alter existing dynamics.

Visualization of Data

  • Proper visualization techniques are vital for interpreting complex data.

  • Useful visualization formats include:
      - Column charts, line graphs, pie charts, bar charts, and histograms.

  • Graph Example: Displaying user growth versus profit, showcasing how increases in users can correlate with profit.

  • Visualization helps people understand trends and make informed decisions.

Privacy Concerns

  • Issues surrounding data collection can lead to privacy risks, especially in online activities.

  • Metadata Definition: Data that provides context about primary data (e.g., a photo's location and time).

  • Privacy Risks Include:
      - Data collection through e-commerce and online services can lead to unauthorized sharing of personal information.
      - Users must balance the convenience of these services against potential privacy violations.

Metadata

  • Function of Metadata: Aids in identifying, organizing, and managing data efficiently.

  • Example of Metadata in a Photograph:
      - The filename, location, date taken, and author information can help organize and search for the photograph without altering the original data itself.

  • Metadata makes data more usable by adding descriptive information that search and retrieval tools can use.
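A minimal Python sketch of metadata, assuming hypothetical photos whose pixel bytes are placeholders. Each `Photo` keeps its primary data and its metadata separately, and the hypothetical `find_by` searches only the metadata, so the image data itself is never changed.

```python
from dataclasses import dataclass, field


@dataclass
class Photo:
    pixels: bytes                                 # primary data (placeholder bytes)
    metadata: dict = field(default_factory=dict)  # data about the data


photos = [
    Photo(bytes([10, 20, 30]), {"filename": "beach.jpg", "location": "Monterey",
                                "date": "2023-07-04", "author": "Ana"}),
    Photo(bytes([40, 50, 60]), {"filename": "trail.jpg", "location": "Yosemite",
                                "date": "2023-08-12", "author": "Ben"}),
    Photo(bytes([70, 80, 90]), {"filename": "pier.jpg", "location": "Monterey",
                                "date": "2024-01-02", "author": "Ana"}),
]


def find_by(photos: list, key: str, value: str) -> list:
    """Return filenames whose metadata matches; pixel data is never read or changed."""
    return [p.metadata["filename"] for p in photos if p.metadata.get(key) == value]


print(find_by(photos, "location", "Monterey"))  # ['beach.jpg', 'pier.jpg']
print(find_by(photos, "author", "Ben"))         # ['trail.jpg']
```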