AP CSP Big Idea 2 (Data): Analysis, Metadata, and Preparing Data for Use

Description and Tags

AP Computer Science Principles

Big Idea 2: Data

Data Analysis and Metadata

Last updated 3:08 PM on 3/12/26

25 Terms

Data (AP CSP sense)

Any information stored in a form a computer can process (e.g., numbers, text, images, sounds, locations, clicks, sensor readings).

Program (in data analysis)

A set of precise steps that takes data as input and can transform, summarize, and extract patterns from it.

Transform (data)

Change data’s form or representation (e.g., convert units, create new fields, recode categories).
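
As a minimal sketch of a transformation, the snippet below converts units and stores the result in a new field. The records and the field names (`temp_f`, `temp_c`) are made up for illustration.

```python
# Hypothetical records; the field names are assumptions for this example.
readings = [
    {"city": "Austin", "temp_f": 98.6},
    {"city": "Denver", "temp_f": 32.0},
]

def add_celsius(record):
    """Return a copy of the record with a new temp_c field (unit conversion)."""
    celsius = (record["temp_f"] - 32) * 5 / 9
    return {**record, "temp_c": round(celsius, 1)}

# The original records are left unchanged; the transformed copies gain a field.
transformed = [add_celsius(r) for r in readings]
print(transformed)
```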

Summarize (data)

Compute compact descriptions of a dataset such as counts, totals, averages, minimum/maximum, or distributions.
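
One way to picture these summaries is the short sketch below, which reduces a small made-up list of scores to a count, total, mean, minimum, maximum, and a simple distribution.

```python
from collections import Counter

scores = [88, 92, 75, 92, 60]  # made-up data for illustration

summary = {
    "count": len(scores),
    "total": sum(scores),
    "mean": sum(scores) / len(scores),
    "min": min(scores),
    "max": max(scores),
}
# A distribution here is just how often each value occurs.
distribution = Counter(scores)

print(summary)
print(distribution)
```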

Pattern extraction

Using computation to find relationships, trends, clusters, or unusual values (anomalies) in data.

Iteration (through a dataset)

Looping through many records/values to compute results (e.g., checking every row in a table).

Filtering

Keeping only records that match a condition (e.g., only rows where grade = 12 or value > threshold).
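
A filter might look like the following sketch, which keeps only the rows where grade = 12. The student records are invented for the example.

```python
students = [
    {"name": "Ana", "grade": 12},
    {"name": "Ben", "grade": 11},
    {"name": "Cy", "grade": 12},
]

# Keep only the records that satisfy the condition.
seniors = [s for s in students if s["grade"] == 12]
senior_names = [s["name"] for s in seniors]
print(senior_names)
```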

Aggregation

Grouping and combining data to produce summaries (e.g., totals per category).
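
As a rough illustration of totals per category, the sketch below groups made-up sales by category and adds up the amounts in each group.

```python
# (category, amount) pairs; the data is invented for illustration.
sales = [("books", 12.5), ("games", 30.0), ("books", 7.5)]

totals = {}
for category, amount in sales:
    # Start each new category at 0, then add this row's amount to its group.
    totals[category] = totals.get(category, 0) + amount

print(totals)
```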

Visualization

Presenting results in charts/graphs/maps so humans can interpret patterns and summaries.

Data-processing pipeline

Common sequence of steps: input → parse → clean/validate → transform → analyze → output.
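
The steps above can be sketched end to end on a tiny example. The input text, the `height_in` column, and the choice to drop rows with a missing height are all assumptions made for illustration, not a prescribed method.

```python
# input: raw text as it might arrive from a file (made-up data)
raw = "name,height_in\nAna,64\nBen,\nCy,70\n"

# parse: split the text into rows and columns, skipping the header line
rows = [line.split(",") for line in raw.strip().split("\n")[1:]]

# clean/validate: keep only rows whose height is a whole number
valid = [(name, int(h)) for name, h in rows if h.isdigit()]

# transform: convert inches to centimeters
in_cm = [(name, h * 2.54) for name, h in valid]

# analyze: compute the average height
average_cm = sum(h for _, h in in_cm) / len(in_cm)

# output: report the result
print(f"Average height: {average_cm:.1f} cm")
```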

Parse

Interpret a data format into usable parts (e.g., split a CSV row into columns).
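
A small sketch of parsing with Python's standard `csv` module follows. The sample text is made up; it includes a quoted comma to show why a plain split on "," is not always enough.

```python
import csv
import io

# Made-up CSV text; the city value contains a comma inside quotes.
text = 'id,name,city\n1,Ana,"Austin, TX"\n'

reader = csv.reader(io.StringIO(text))
header = next(reader)  # first row: column names
row = next(reader)     # second row: one record split into columns

record = dict(zip(header, row))
print(record)
```

Note that every parsed value is still a string; converting `"1"` to a number would be a separate transform step.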

Clean/Validate

Detect and handle missing/invalid values, formatting issues, duplicates, and inconsistencies so analysis is reliable.
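
A validation step could look something like this sketch. The helper `clean_age` and the allowed range of 0 to 120 are hypothetical choices for the example; a real dataset would take its rules from its own documentation.

```python
def clean_age(text):
    """Return the age as an int, or None when the entry is missing or invalid."""
    text = text.strip()          # fix simple formatting issues (extra spaces)
    if not text.isdigit():       # rejects blanks, "NA", and negative numbers
        return None
    age = int(text)
    # Assumed plausible range for this example only.
    return age if 0 <= age <= 120 else None

raw_ages = ["17", " 18 ", "", "NA", "-4", "999"]
cleaned = [clean_age(a) for a in raw_ages]
print(cleaned)
```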

Selection (rows)

A process that keeps specific records based on a rule (e.g., appending to a new list only the rows where grade = 12).

Counter (variable)

A variable that increases by 1 for each item that matches a condition (used for counting).
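
For instance, a counter might be used as in the sketch below, which iterates through made-up temperatures and counts the ones above a threshold of 80 (an arbitrary value chosen for the example).

```python
temps = [71, 85, 90, 68, 95]  # made-up readings

hot_days = 0  # the counter starts at zero
for t in temps:
    if t > 80:
        hot_days += 1  # add 1 for each matching item, not the item's value

print(hot_days)
```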

Sum (accumulator)

A running total that adds up the data values themselves (rather than counting them); used to compute totals and averages.

Average (mean)

A summary statistic computed as the sum of the values divided by the number of values; requires both a sum and a count (or the list's LENGTH).
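
The sketch below shows one way to compute a mean with an explicit sum and count, written in Python rather than AP pseudocode. The values are made up, and the guard for an empty list is an added precaution.

```python
values = [4, 8, 15, 16, 23, 42]  # made-up data

total = 0  # sum accumulator: adds the values themselves
count = 0  # counter: adds 1 per value
for v in values:
    total += v
    count += 1

# Guard against dividing by zero when there are no values.
average = total / count if count > 0 else None
print(average)
```

Here `count` ends up equal to `len(values)`, which plays the role of LENGTH.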

Missing values

Data entries not recorded or absent (e.g., blank, NA, null, ?), which can skew or break computations if not handled.
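
To see how missing entries can skew a result, the sketch below compares a mean over the recorded values with a mean that silently treats missing entries as 0. The set of missing-value markers and the data are assumptions for illustration.

```python
MISSING = {"", "NA", "null", "?"}  # markers assumed to mean "no value"

raw = ["12", "NA", "15", "", "9"]  # made-up entries

present = [float(x) for x in raw if x.strip() not in MISSING]
missing_count = len(raw) - len(present)

mean_present = sum(present) / len(present)  # uses only recorded values
mean_if_zero = sum(present) / len(raw)      # treats missing as 0 (skewed low)

print(missing_count, mean_present, mean_if_zero)
```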

Duplicate records

The same person/event recorded multiple times; can distort counts and totals unless duplicates are appropriately handled.
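
One simple way to handle duplicates is sketched below. It assumes each record carries an `id` field that uniquely identifies a person, which is a made-up convention for this example; real data may need a more careful matching rule.

```python
signups = [
    {"id": 101, "name": "Ana"},
    {"id": 102, "name": "Ben"},
    {"id": 101, "name": "Ana"},  # same person recorded twice
]

seen_ids = set()
unique = []
for record in signups:
    if record["id"] not in seen_ids:  # keep only the first occurrence
        seen_ids.add(record["id"])
        unique.append(record)

print(len(signups), len(unique))
```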

Inconsistent categories

Same category represented in different text forms (e.g., "NY", "New York", "newyork"), preventing correct grouping/counting without standardization.
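
Standardization could be sketched as a lookup from known variants to one canonical label. The `ALIASES` table and the `standardize` helper are hypothetical; in practice the list of variants has to come from inspecting the data.

```python
from collections import Counter

# Hypothetical mapping from lowercase variants to one canonical label.
ALIASES = {"ny": "New York", "newyork": "New York", "new york": "New York"}

def standardize(label):
    """Map a known variant to its canonical form; leave unknown labels as-is."""
    cleaned = label.strip()
    return ALIASES.get(cleaned.lower(), cleaned)

raw = ["NY", "New York", "newyork", "Texas"]

before = Counter(raw)                          # three separate "New York" groups
after = Counter(standardize(x) for x in raw)   # one combined group

print(before)
print(after)
```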

Outlier / impossible value

An unusually extreme entry (outlier) or an invalid one (impossible value), e.g., a negative age or a temperature of 999; it may indicate an error or, in the case of an outlier, a rare real event.

Biased sample

Data that does not represent the target population; programs cannot fix this, so results may be misleading.

Correlation vs. causation

A correlation means two values vary together; it does not prove one causes the other.

Metadata

“Data about data”: context that explains meaning, units, quality, and constraints so data can be interpreted correctly.

Data dictionary

A metadata document describing each field/column (name, meaning, data type, allowed values, units, missing-value rules).
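
A data dictionary is usually a document, but its rules can also be written down in a form a program can check. The sketch below is one hypothetical layout; the fields, ranges, and the `check` helper are invented for illustration.

```python
# Hypothetical data dictionary: one entry per field/column.
DATA_DICTIONARY = {
    "grade": {
        "meaning": "school grade level",
        "type": int,
        "allowed": {9, 10, 11, 12},
    },
    "height_cm": {
        "meaning": "student height",
        "type": float,
        "units": "centimeters",
        "min": 50.0,
        "max": 250.0,
    },
}

def check(field, value):
    """Return True when the value follows the rules recorded for the field."""
    rules = DATA_DICTIONARY[field]
    if value is None:  # this sketch treats a missing value as invalid
        return False
    if not isinstance(value, rules["type"]):
        return False
    if "allowed" in rules and value not in rules["allowed"]:
        return False
    if "min" in rules and value < rules["min"]:
        return False
    if "max" in rules and value > rules["max"]:
        return False
    return True

print(check("grade", 12), check("grade", 13), check("height_cm", 999.0))
```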

Imputation

Filling in missing data with an estimated/default value (e.g., group average), which can hide uncertainty and distort results if unjustified.
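
As a minimal sketch of mean imputation, the snippet below fills missing scores with the average of the known ones and keeps a flag for each filled entry so the estimate is not mistaken for a measurement. The data and the use of `None` for missing values are assumptions.

```python
scores = [80, None, 90, None, 100]  # None marks a missing entry (made-up data)

known = [s for s in scores if s is not None]
mean_known = sum(known) / len(known)

# Replace each missing entry with the mean of the known values.
imputed = [s if s is not None else mean_known for s in scores]
# Record which entries were estimated rather than observed.
was_imputed = [s is None for s in scores]

print(imputed)
print(was_imputed)
```

The imputed list has the same mean as the known values, but it looks less spread out than the real data may be, which is one way imputation can distort results.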