These flashcards cover key concepts related to data wrangling and statistical analysis, including data types, statistics, hypothesis testing, and measures of central tendency and dispersion.
What is data wrangling and why is it important?
The process of cleaning, structuring, and enriching raw data into a usable format. It matters because it ensures accuracy and consistency, which makes meaningful analysis possible.
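Example: a minimal sketch of the three steps (clean, structure, enrich) in plain Python. The records, field names, and the `wrangle` helper are hypothetical and only illustrate the idea; real projects often use a dedicated data library instead.

```python
# Hypothetical raw records: inconsistent casing, stray whitespace, a missing value.
raw_rows = [
    {"name": "  Alice ", "city": "boston", "age": "34"},
    {"name": "BOB", "city": " Boston", "age": ""},
    {"name": "carol", "city": "CHICAGO ", "age": "29"},
]


def wrangle(rows):
    """Clean, structure, and enrich raw rows (illustrative helper only)."""
    tidy = []
    for row in rows:
        age_text = row["age"].strip()
        # Structure: convert text to int, keep blanks as None (missing).
        age = int(age_text) if age_text else None
        tidy.append({
            # Clean: trim whitespace and normalize capitalization.
            "name": row["name"].strip().title(),
            "city": row["city"].strip().title(),
            "age": age,
            # Enrich: add a derived field (the 18+ cutoff is an assumption).
            "is_adult": age is not None and age >= 18,
        })
    return tidy


tidy_rows = wrangle(raw_rows)
```

After this step, "boston" and " Boston" are both stored as "Boston", so they count as one category in later analysis.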
What are the types of data (measurement scales)?
Nominal, Ordinal, Interval, and Ratio.
What is a dummy variable?
A binary (0/1) variable that represents categories, typically used for nominal data.
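Example: a minimal sketch of dummy coding for a nominal variable. The `make_dummies` helper and the `is_<category>` column names are hypothetical.

```python
def make_dummies(values):
    """Return one dict of 0/1 indicators per observation (illustrative helper)."""
    categories = sorted(set(values))
    return [
        {f"is_{category}": int(value == category) for category in categories}
        for value in values
    ]


colors = ["red", "blue", "red", "green"]  # made-up nominal data
dummies = make_dummies(colors)
# dummies[0] is {"is_blue": 0, "is_green": 0, "is_red": 1}
```

In regression models, one category is usually left out as the reference group so the indicators are not perfectly redundant.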
For which data types can you compute mode, median, and mean?
Mode: all types; Median: ordinal, interval, ratio; Mean: interval, ratio.
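Example: a small sketch using Python's standard `statistics` module, with made-up data for each scale. Each statistic is applied only where the scale supports it.

```python
import statistics

blood_types = ["A", "O", "O", "B", "O"]   # nominal: labels only, no order
ratings = [1, 2, 2, 3, 5]                 # ordinal: ordered survey ratings
temps_c = [18.0, 21.0, 21.0, 24.0]        # interval: equal spacing, no true zero

mode_nominal = statistics.mode(blood_types)   # most frequent label: "O"
median_ordinal = statistics.median(ratings)   # middle rating: 2
mean_interval = statistics.mean(temps_c)      # arithmetic average: 21.0
```

A mean of the blood types is undefined, and a mean of the ratings would assume equal distances between rating levels, which an ordinal scale does not guarantee.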
What is the difference between inferential and descriptive statistics?
Inferential statistics draw conclusions about a population from a sample; descriptive statistics summarize the data at hand.
What is the difference between one-way and cross-tabulation?
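A one-way tabulation gives the frequency of each value of a single variable; a cross-tabulation gives the frequencies for combinations of values across two or more variables.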
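Example: a sketch of both tabulations with `collections.Counter`. The survey responses and variable names are made up.

```python
from collections import Counter

# Each response is a (gender, answer) pair from a hypothetical survey.
responses = [
    ("F", "yes"), ("M", "no"), ("F", "yes"), ("M", "yes"), ("F", "no"),
]

# One-way tabulation: frequencies of a single variable (the answer).
one_way = Counter(answer for _, answer in responses)

# Cross-tabulation: frequencies of each (gender, answer) combination.
cross_tab = Counter(responses)
```

Here `one_way["yes"]` is 3, while `cross_tab[("F", "yes")]` is 2 and `cross_tab[("M", "yes")]` is 1, which shows how the cross-tabulation splits the one-way count by the second variable.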
What are the three main measures of central tendency?
Mean, Median, and Mode.
What are the three main measures of dispersion?
Range, Variance, and Standard Deviation.
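Example: a sketch computing the three measures with the standard `statistics` module on made-up scores. Population and sample variance are both shown, since the card does not say which one is meant.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with mean 5

value_range = max(scores) - min(scores)           # largest minus smallest: 7
population_variance = statistics.pvariance(scores)  # divides by n: 4
population_sd = statistics.pstdev(scores)           # square root of the above: 2.0
sample_variance = statistics.variance(scores)       # divides by n - 1 instead
sample_sd = statistics.stdev(scores)
```

The sample versions are slightly larger because dividing by n - 1 corrects for estimating the mean from the same data.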
What is a dashboard and why is it valuable?
A real-time visual display of key metrics; valuable for monitoring and quick decisions.
What are the three different concepts of differences?
Statistical, Practical, and Perceptual.
What is an effect size? Give an example.
A measure of the magnitude of an effect. Example: Cohen's d.
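Example: a sketch of Cohen's d for two independent groups, using the pooled sample standard deviation. The `cohens_d` helper and the data are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


treatment = [6, 7, 8, 9, 10]  # made-up scores, mean 8
control = [4, 5, 6, 7, 8]     # made-up scores, mean 6
d = cohens_d(treatment, control)  # about 1.26
```

Cohen's commonly cited rough benchmarks are 0.2 (small), 0.5 (medium), and 0.8 (large), so a d near 1.26 would usually be read as a large effect.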
What is hypothesis testing?
A statistical method for deciding whether sample data provide enough evidence to support a claim about a population.
What are the two types of hypotheses?
Null (no effect) and Alternative (effect exists).
What is a p-value and its role in hypothesis testing?
The probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. By convention, p <= .05 is treated as statistically significant.
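Example: one way to see what a p-value measures is an exact permutation test, sketched below. This is an illustrative technique chosen for this example, not one named in these cards; the `permutation_p_value` helper and the data are hypothetical. It counts how often a random relabeling of the groups gives a mean difference at least as extreme as the observed one.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = 0
    at_least_as_extreme = 0
    # Enumerate every way to assign len(group_a) observations to group A.
    for chosen in combinations(range(len(pooled)), len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


group_a = [10, 11, 12, 13]  # made-up data
group_b = [1, 2, 3, 4]
p_value = permutation_p_value(group_a, group_b)  # 2 of 70 splits, about 0.029
```

Only 2 of the 70 possible splits are as extreme as the observed one, so under the null hypothesis (group labels do not matter) a result like this would be rare.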
What is the difference between effect size and p-value?
Effect size shows the magnitude of a difference; the p-value shows whether the difference is statistically significant.
What are the common inferential statistics and what is each used for?
T-test (compare 2 means), ANOVA (compare 3+ means), Chi-square (association between categorical variables), Correlation (strength of a relationship), Regression (prediction).
What does p <= .05 or p > .05 mean?
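p <= .05: statistically significant, reject the null hypothesis; p > .05: not significant, fail to reject the null hypothesis (this does not prove the null is true).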
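Example: the decision rule as a tiny sketch. The `decide` helper is hypothetical, and the default threshold of .05 is the convention used in these cards, not a universal rule.

```python
def decide(p_value, alpha=0.05):
    """Apply the conventional significance threshold to a p-value."""
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


significant = decide(0.03)      # at or below .05
not_significant = decide(0.20)  # above .05
```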
How do you draw conclusions from statistical output?
Look at p-value, effect size, confidence intervals, and test statistic.