A comprehensive set of Q&A flashcards covering data, variables, distributions, and descriptive statistics from Week 1 notes (Ch 1-3).
What is data?
Recorded values (numbers or labels) together with their context, captured and stored.
What is a data warehouse?
A vast digital repository where data are stored.
What is Big Data?
The challenges of collecting, managing, storing, and curating large-scale data.
What must you know before turning data into information?
The decision you want to make or the question data can answer, and how to communicate the answer.
What is data mining?
The process of extracting actionable information from data, often with the aim of improving or predicting future performance.
What is predictive analytics?
Analysis focusing on predicting future performance.
What is business analytics?
The use of data and statistical analysis to inform business decisions.
What does it mean that all data have a context?
Data values are information about a subject and are interpreted within context; data are organized into a data table.
What are the rows of a data table called?
Cases (individuals or units) about whom we record characteristics.
What are the columns of a data table called?
Variables (the characteristics recorded).
What is a categorical (qualitative) variable?
A variable that names categories and answers questions about how cases fall into those categories.
What is a quantitative variable?
A variable whose values are numbers with units, telling how much of something was measured.
What subtypes exist for categorical variables?
Ordinal, nominal, and binary.
What is an ordinal categorical variable?
Values with intrinsic order (e.g., Dissatisfied, Neutral, Satisfied).
What is a nominal categorical variable?
Values without intrinsic order (e.g., locations like South Australia, Victoria, etc.).
What is a binary categorical variable?
A categorical variable with only two possible values (e.g., gender).
Why do quantitative variables have units?
To indicate how values are measured, their scale, and magnitude.
What are the two types of quantitative variables?
Continuous and discrete.
Can a variable be both categorical and quantitative?
Yes, depending on its purpose. For example, age can be quantitative (measured in years) or categorical (child, teen, adult).
What is a histogram?
A graph of a quantitative variable that divides the range of values into bins and shows the count (frequency) of values falling in each bin.
What is a relative frequency histogram?
A histogram showing the percentage of cases in each bin instead of counts.
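A minimal sketch of how the bin counts behind a histogram, and the percentages behind a relative frequency histogram, could be computed. The function names, the bin width, and the `ages` values are made up for illustration and do not come from the notes.

```python
from collections import Counter

def histogram_counts(values, bin_width, start):
    """Count how many values fall in each bin [left, left + bin_width)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // bin_width)   # which bin the value lands in
        left_edge = start + index * bin_width   # label each bin by its left edge
        counts[left_edge] += 1
    return dict(sorted(counts.items()))

def relative_frequencies(counts):
    """Convert bin counts to percentages of all cases."""
    total = sum(counts.values())
    return {edge: 100 * c / total for edge, c in counts.items()}

# Hypothetical data: ages of 10 customers, in years
ages = [21, 23, 25, 31, 34, 35, 38, 42, 47, 58]
counts = histogram_counts(ages, bin_width=10, start=20)
percents = relative_frequencies(counts)

print(counts)    # {20: 3, 30: 4, 40: 2, 50: 1}
print(percents)  # {20: 30.0, 30: 40.0, 40: 20.0, 50: 10.0}
```

Both versions have the same shape; only the vertical scale changes from counts to percentages.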
What are the three things to describe when looking at a distribution?
Shape, center, and spread.
What is a mode?
A peak in a distribution. A distribution with one peak is unimodal, with two peaks bimodal, and with more than two multimodal.
What does symmetry mean in a distribution?
Halves on either side of the center look like mirror images.
What are tails in a distribution?
The thinner ends of the distribution.
What is skewness?
A distribution is skewed when one tail stretches farther than the other; it is said to be skewed to the side of the longer tail.
What is an outlier?
A value that stands away from the body of the distribution; can affect methods and may indicate errors; should be discussed in conclusions.
How do you calculate the mean?
Sum of all values divided by the number of data values (n).
When should you use the median?
When a distribution is skewed, has gaps, or contains outliers; the median is resistant to outliers.
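A small sketch of why the median is called resistant, using Python's standard `statistics` module. The house prices are hypothetical numbers chosen for illustration.

```python
import statistics

# Hypothetical house prices in thousands of dollars
prices = [200, 210, 220, 230, 240]
with_outlier = prices + [900]   # add one extreme value

mean_before = statistics.mean(prices)
median_before = statistics.median(prices)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, median_before)           # both equal 220
print(round(mean_after, 2), median_after)   # 333.33 225.0
```

One outlier pulls the mean up by more than 110, while the median moves by only 5.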
How do the mean and median compare in a symmetric distribution?
If the distribution is roughly symmetric, the mean and median are close to each other.
What is the range?
Max minus min; a simple measure of spread; not resistant to outliers.
What are quartiles and the IQR?
Q1 and Q3 frame the middle 50% of data; IQR = Q3 − Q1 (a robust spread measure).
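A sketch comparing the range with the IQR when one value is extreme. Quartile conventions differ between textbooks and software; this example assumes the default "exclusive" method of Python's `statistics.quantiles`, which may not match the rule used in the course notes. The data values are made up.

```python
import statistics

def value_range(values):
    """Range: maximum minus minimum."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range: Q3 minus Q1 (default 'exclusive' quantile method)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 13]
data_with_outlier = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 130]  # 13 replaced by 130

print(value_range(data), iqr(data))                            # 11 6.0
print(value_range(data_with_outlier), iqr(data_with_outlier))  # 128 6.0
```

The range jumps from 11 to 128, while the IQR stays at 6 because it depends only on the middle half of the data.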
What is the standard deviation and variance?
Variance (s^2) is the sum of squared deviations from the mean divided by n − 1, roughly the average squared deviation; the standard deviation (s) is the square root of the variance.
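A worked sketch of the calculation, assuming s^2 denotes the sample variance, which divides the sum of squared deviations by n − 1 rather than n. The data values are invented for illustration; the result is checked against `statistics.variance`, which uses the same n − 1 convention.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]     # mean is 5
s2 = sample_variance(data)          # squared deviations sum to 32, so s2 = 32 / 7
s = math.sqrt(s2)                   # standard deviation

print(round(s2, 3), round(s, 3))    # 4.571 2.138
print(math.isclose(s2, statistics.variance(data)))  # True
```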
When is standard deviation appropriate?
When the distribution is roughly symmetric and the mean is used as the measure of center; like the mean, it is sensitive to outliers.
What is a five-number summary?
Minimum, Q1, median, Q3, and maximum.
What is a boxplot?
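A plot of the five-number summary. The central box spans Q1 to Q3 (the middle 50% of the data, whose width is the IQR) with a line at the median; the whiskers extend toward the most extreme values that are not outliers, and their relative lengths hint at skewness; outliers are plotted separately.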
What is a z-score?
A standardized value: (value − mean) / standard deviation; tells how many standard deviations a value is from the mean.
How do you determine which data point is more unusual using z-scores?
Compare the absolute values of the z-scores; the larger absolute value is more unusual.
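A minimal sketch of that comparison. The two values, means, and standard deviations below are hypothetical and are not the figures from the real estate example.

```python
def z_score(value, mean, sd):
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / sd

# Hypothetical measurements on two different scales
z_a = z_score(85, mean=70, sd=10)   # 1.5 standard deviations above the mean
z_b = z_score(40, mean=50, sd=4)    # 2.5 standard deviations below the mean

# The sign gives the direction; the absolute value measures how unusual the value is
more_unusual = "A" if abs(z_a) > abs(z_b) else "B"

print(z_a, z_b, more_unusual)   # 1.5 -2.5 B
```

Value B is more unusual even though its z-score is negative, because only the distance from the mean matters for the comparison.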
In the real estate example, which is more unusual: a $340,000 house or a 5000 sq ft house?
The 5000 sq ft house (z ≈ 4.46) is more unusual than the $340,000 house (z = 3.0).
What does a z-score of 2 indicate?
Two standard deviations above the mean.
What is the purpose of the boxplot’s box and whiskers?
Box shows the middle 50% (IQR); whiskers indicate spread and potential skewness; outliers shown separately.
What is the five-number summary used for in boxplots?
To describe a distribution and provide input for a boxplot visualization.
Why should outliers be noted in conclusions?
They can be the most informative part of the data and may indicate data quality issues or true extremes.