Data (AP CSP sense)
Any information stored in a form a computer can process (e.g., numbers, text, images, sounds, locations, clicks, sensor readings).
Program (in data analysis)
A set of precise steps that takes data as input and can transform, summarize, and extract patterns from it.
Transform (data)
Change data’s form or representation (e.g., convert units, create new fields, recode categories).
Summarize (data)
Compute compact descriptions of a dataset such as counts, totals, averages, minimum/maximum, or distributions.
Pattern extraction
Using computation to find relationships, trends, clusters, or unusual values (anomalies) in data.
Iteration (through a dataset)
Looping through many records/values to compute results (e.g., checking every row in a table).
Filtering
Keeping only records that match a condition (e.g., only rows where grade = 12 or value > threshold).
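A minimal sketch of filtering, assuming records are stored as a list of dictionaries. The `students` data and the `filter_records` helper are hypothetical names made up for illustration.

```python
# Hypothetical records; the field names and values are invented for this example.
students = [
    {"name": "Ana", "grade": 12},
    {"name": "Ben", "grade": 11},
    {"name": "Cy", "grade": 12},
]

def filter_records(records, condition):
    """Return a new list holding only the records where condition(record) is True."""
    kept = []
    for record in records:       # iterate through every record
        if condition(record):    # keep the record only if it matches the rule
            kept.append(record)
    return kept

seniors = filter_records(students, lambda r: r["grade"] == 12)
print([s["name"] for s in seniors])
```

The original list is left unchanged; matching rows are appended to a new list.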
Aggregation
Grouping and combining data to produce summaries (e.g., totals per category).
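One way to sketch "totals per category" is a dictionary that maps each category to its running total. The `sales` rows and the `total_per_category` function are assumptions for illustration only.

```python
# Hypothetical (category, amount) rows.
sales = [("fruit", 3), ("dairy", 5), ("fruit", 4)]

def total_per_category(rows):
    """Group rows by category and add up the amounts in each group."""
    totals = {}
    for category, amount in rows:
        # Start a new category at 0, then add this row's amount to it.
        totals[category] = totals.get(category, 0) + amount
    return totals

print(total_per_category(sales))
```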
Visualization
Presenting results in charts/graphs/maps so humans can interpret patterns and summaries.
Data-processing pipeline
Common sequence of steps: input → parse → clean/validate → transform → analyze → output.
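The sequence above can be sketched as a chain of small functions, one per step. This is only an illustration under assumed data: the CSV text, the field names `city` and `temp_f`, and every function name are hypothetical.

```python
# Input: hypothetical CSV text with one missing temperature.
raw_text = "city,temp_f\nAustin,95\nBoston,\nDenver,77\n"

def parse(text):
    """Parse: split the text into rows, and each row into named columns."""
    lines = text.strip().split("\n")
    header = lines[0].split(",")
    return [dict(zip(header, line.split(","))) for line in lines[1:]]

def clean(rows):
    """Clean/validate: drop rows whose temperature is blank."""
    return [row for row in rows if row["temp_f"] != ""]

def transform(rows):
    """Transform: convert Fahrenheit text into a Celsius number."""
    return [
        {"city": r["city"], "temp_c": (float(r["temp_f"]) - 32) * 5 / 9}
        for r in rows
    ]

def analyze(rows):
    """Analyze: find the row with the highest temperature."""
    return max(rows, key=lambda r: r["temp_c"])

# Output: report the result for a human reader.
hottest = analyze(transform(clean(parse(raw_text))))
print(f'{hottest["city"]}: {hottest["temp_c"]:.1f} C')
```

Dropping the blank row is just one possible cleaning choice; other datasets may call for a different rule.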
Parse
Interpret a data format into usable parts (e.g., split a CSV row into columns).
Clean/Validate
Detect and handle missing/invalid values, formatting issues, duplicates, and inconsistencies so analysis is reliable.
Selection (rows)
Keeping specific records based on a rule (e.g., appending to a new list only the rows where grade = 12).
Counter (variable)
A variable that increases by 1 for each item that matches a condition (used for counting).
Sum (accumulator)
A running total that adds the data values themselves rather than counting them (used to compute totals and averages).
Average (mean)
A summary statistic computed as total sum divided by number of items; requires both sum and count (or LENGTH).
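The counter, sum, and average ideas fit together in one loop. The sketch below assumes a plain list of numbers and a hypothetical helper name, `count_sum_average`; it returns `None` for the average when nothing matches, to avoid dividing by zero.

```python
def count_sum_average(values, threshold):
    """Count and total the values above threshold, then compute their mean."""
    count = 0   # counter: goes up by 1 per matching item
    total = 0   # accumulator: adds the matching values themselves
    for v in values:
        if v > threshold:
            count += 1
            total += v
    # The mean needs both the sum and the count; guard against count == 0.
    average = total / count if count > 0 else None
    return count, total, average

scores = [70, 85, 90, 60]   # hypothetical data
count, total, average = count_sum_average(scores, 65)
print(count, total)
```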
Missing values
Data entries not recorded or absent (e.g., blank, NA, null, ?), which can skew or break computations if not handled.
Duplicate records
The same person/event recorded multiple times; can distort counts and totals unless duplicates are appropriately handled.
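A minimal sketch of one way to handle duplicates, assuming each record carries an identifying field. The `id` field, the sample sign-ins, and the `drop_duplicates` helper are hypothetical; keeping the first occurrence is a design choice, not the only valid one.

```python
def drop_duplicates(records, key):
    """Keep the first record seen for each value of the key field."""
    seen = set()
    unique = []
    for record in records:
        identifier = record[key]
        if identifier not in seen:   # first time this identifier appears
            seen.add(identifier)
            unique.append(record)
    return unique

# Hypothetical sign-in log where student 1 was recorded twice.
sign_ins = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}, {"id": 1, "name": "Ana"}]
print(len(sign_ins), len(drop_duplicates(sign_ins, "id")))
```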
Inconsistent categories
Same category represented in different text forms (e.g., "NY", "New York", "newyork"), preventing correct grouping/counting without standardization.
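Standardization can be sketched as mapping every known spelling to one canonical label before counting. The `ALIASES` table and the `standardize` function are assumptions for illustration; a real dataset would need its own alias list.

```python
from collections import Counter

# Hypothetical alias table: normalized spelling -> canonical label.
ALIASES = {"ny": "New York", "newyork": "New York"}

def standardize(label):
    """Lowercase the label, remove spaces, then look up a canonical form."""
    normalized = label.strip().lower().replace(" ", "")
    # Labels with no known alias are returned with only outer spaces trimmed.
    return ALIASES.get(normalized, label.strip())

raw_labels = ["NY", "New York", "newyork", "Boston"]
counts = Counter(standardize(label) for label in raw_labels)
print(counts)
```

Without the standardization step, the three New York spellings would be counted as three separate categories.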
Outlier / impossible value
An unusually extreme or invalid entry (e.g., negative age, temperature of 999) that may indicate error or a rare real event.
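A simple range check is one way to surface such entries for review. The bounds below are hypothetical and would come from domain knowledge; the sketch flags values rather than deleting them, since an extreme value may still be real.

```python
def flag_out_of_range(values, low, high):
    """Return the values that fall outside the plausible range [low, high]."""
    return [v for v in values if v < low or v > high]

# Hypothetical temperature readings in degrees Fahrenheit.
readings = [72, 999, 68, -500]
suspicious = flag_out_of_range(readings, low=-130, high=140)
print(suspicious)
```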
Biased sample
Data that does not represent the target population; programs cannot fix this, so results may be misleading.
Correlation vs. causation
A correlation means two variables tend to change together; it does not prove that one causes the other.
Metadata
“Data about data”: context that explains meaning, units, quality, and constraints so data can be interpreted correctly.
Data dictionary
A metadata document describing each field/column (name, meaning, data type, allowed values, units, missing-value rules).
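A data dictionary can also be written in machine-readable form and used to check rows. The sketch below is illustrative only: the fields `grade` and `height_cm`, the layout of `DATA_DICTIONARY`, and the `check_row` helper are all assumptions.

```python
# Hypothetical data dictionary: one entry per field.
DATA_DICTIONARY = {
    "grade": {"type": int, "allowed": {9, 10, 11, 12}, "meaning": "school grade level"},
    "height_cm": {"type": float, "units": "centimeters", "meaning": "student height"},
}

def check_row(row, dictionary):
    """Return a list of problems found when comparing a row to the dictionary."""
    problems = []
    for field, spec in dictionary.items():
        if field not in row:
            problems.append(f"{field}: missing")
            continue
        value = row[field]
        if not isinstance(value, spec["type"]):
            problems.append(f"{field}: expected {spec['type'].__name__}")
        elif "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"{field}: {value!r} not allowed")
    return problems

print(check_row({"grade": 13}, DATA_DICTIONARY))
```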
Imputation
Filling in missing data with an estimated/default value (e.g., group average), which can hide uncertainty and distort results if unjustified.
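A minimal sketch of mean imputation, assuming values arrive as text and that the markers in `MISSING` denote absent entries. Both the marker set and the `impute_with_mean` name are hypothetical. The function also reports how many values were filled in, so the added uncertainty is not hidden.

```python
# Hypothetical set of text markers treated as "missing".
MISSING = {"", "NA", "null", "?"}

def impute_with_mean(raw_values):
    """Replace missing entries with the mean of the present entries."""
    present = [float(v) for v in raw_values if v not in MISSING]
    if not present:
        raise ValueError("cannot impute: every value is missing")
    mean = sum(present) / len(present)
    filled = [float(v) if v not in MISSING else mean for v in raw_values]
    imputed_count = len(raw_values) - len(present)
    return filled, imputed_count

filled, imputed_count = impute_with_mean(["10", "NA", "20", ""])
print(filled, imputed_count)
```

Here half of the final values are estimates, which is worth stating whenever results based on them are reported.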