These flashcards cover key concepts related to Exploratory Data Analysis, including definitions and explanations of different data types, central tendency measures, data distributions, and visualization techniques.
Exploratory Data Analysis (EDA)
A crucial first step in the data analysis process that helps uncover patterns, spot anomalies, and check assumptions about the data, typically through summary statistics and visualizations.
Qualitative Data
Categorical data that cannot be measured numerically; examples include types of fruit or marital status.
Quantitative Data
Numerical data that represents counts or measurements, so arithmetic on it is meaningful; examples include height or age.
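A minimal Python sketch of how the two kinds of data are usually summarized: qualitative values by counting how often each category occurs, quantitative values with arithmetic. The sample records are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical records: marital status is qualitative, age is quantitative.
marital_status = ["single", "married", "married", "divorced", "single", "married"]
ages = [23, 35, 41, 52, 29, 38]

# Qualitative data: the natural summary is a frequency count per category.
status_counts = Counter(marital_status)
print(status_counts.most_common())  # [('married', 3), ('single', 2), ('divorced', 1)]

# Quantitative data: arithmetic summaries such as the mean, min, and max make sense.
print(mean(ages), min(ages), max(ages))
```

Averaging the marital-status labels would be meaningless, which is the practical difference between the two types.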
Nominal Data
A type of qualitative data whose categories serve only as labels, with no quantitative value or inherent order; examples include eye color or blood type.
Ordinal Data
A type of qualitative data that has a defined order or ranking among categories, but the differences between categories are not uniformly measurable.
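Nominal categories can only be counted, but ordinal categories can also be sorted, which makes a median category meaningful. A minimal sketch, assuming a hypothetical five-level satisfaction scale whose order is written out explicitly:

```python
# Ordinal data: the order of levels is defined, the distance between them is not.
SATISFACTION_LEVELS = ["very dissatisfied", "dissatisfied", "neutral", "satisfied", "very satisfied"]
rank = {level: i for i, level in enumerate(SATISFACTION_LEVELS)}

responses = ["satisfied", "neutral", "very satisfied", "dissatisfied", "satisfied"]

# Sort by position on the scale, not alphabetically.
ordered = sorted(responses, key=rank.__getitem__)

# With an odd number of responses the median is the single middle element.
median_response = ordered[len(ordered) // 2]
print(median_response)  # satisfied
```

Subtracting ranks (for example "satisfied" minus "neutral") would assume equal spacing between levels, which ordinal data does not guarantee.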
Interval Data
Numerical data with a meaningful order and equal intervals between values, but no true zero point, so ratios are not meaningful; temperature in degrees Celsius is a common example.
Ratio Data
The highest level of measurement: numerical data with all the properties of interval data plus a true zero point, which makes ratios meaningful; examples include height, weight, or temperature in kelvins.
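Temperature illustrates the difference between interval and ratio data. The sketch below (the two readings are arbitrary) shows that a 10-degree difference is the same on both scales, while a ratio is only meaningful in kelvins, where zero means no thermal energy.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert Celsius (interval scale) to kelvins (ratio scale with a true zero)."""
    return c + 273.15

warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on both scales: 10 degrees either way.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful on the ratio scale.
naive_ratio = warm_c / cool_c  # 2.0, yet 20 °C is not "twice as hot" as 10 °C
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)  # roughly 1.035

print(diff_c, round(diff_k, 6), naive_ratio, round(true_ratio, 3))
```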
Mean
A measure of central tendency calculated by adding all values and dividing by the number of values.
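The definition translates directly into code. A minimal sketch (Python's standard `statistics.mean` performs the same calculation):

```python
def mean(values: list[float]) -> float:
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)

print(mean([2, 4, 4, 4, 5, 5, 7, 9]))  # 5.0  (40 / 8)
```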
Median
The middle value of a sorted dataset (or the average of the two middle values when the count is even); it is less affected by outliers than the mean.
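A short sketch of the median's robustness, using hypothetical income figures: adding one extreme value drags the mean far upward, while the median barely moves.

```python
from statistics import mean, median

incomes = [32_000, 35_000, 38_000, 41_000, 44_000]
with_outlier = incomes + [1_000_000]

# Without the outlier, mean and median are both 38,000.
print(mean(incomes), median(incomes))

# With it, the mean jumps to about 198,333, while the median only moves to 39,500
# (the average of the two middle values, 38,000 and 41,000).
print(round(mean(with_outlier)), median(with_outlier))
```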
Mode
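The value that appears most frequently in a dataset; a dataset can have more than one mode, and the mode is the only one of the three central tendency measures that applies to nominal data.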
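Python's standard `statistics` module provides both `mode` and `multimode`. The sketch below (with made-up values) shows that the mode works on category labels and that ties produce several modes.

```python
from statistics import mode, multimode

# Works on qualitative values such as clothing sizes.
sizes = ["M", "L", "M", "S", "M", "L"]
print(mode(sizes))  # M

# When several values tie for most frequent, multimode returns all of them.
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```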
Histogram
A visual tool that displays the distribution of numerical data by grouping values into bins or ranges.
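Under the chart, a histogram is just a count of values per bin. A minimal sketch, assuming equal-width bins and hypothetical age data, that prints a text histogram:

```python
def histogram_counts(values: list[float], start: float, bin_width: float, n_bins: int) -> list[int]:
    """Count values in equal-width bins [start + k*w, start + (k+1)*w)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // bin_width)
        if 0 <= k < n_bins:  # values outside the covered range are ignored
            counts[k] += 1
    return counts

ages = [21, 23, 25, 31, 34, 35, 38, 42, 47, 58]
counts = histogram_counts(ages, start=20, bin_width=10, n_bins=4)

for k, c in enumerate(counts):
    lo = 20 + k * 10
    print(f"[{lo}, {lo + 10}): {'#' * c}")
```

Empty bins are kept as zero counts so gaps in the data stay visible. The choice of bin width changes how the distribution looks, so it is worth trying a few widths.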
Left-skewed Distribution
A distribution where the tail extends to the left, meaning most data points cluster on the right; the mean is usually lower than the median.
Right-skewed Distribution
A distribution where the tail extends to the right, meaning most data points cluster on the left; the mean is usually higher than the median.
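One common way to quantify the direction of skew (not named in the cards above) is the moment coefficient of skewness, g1 = m3 / m2^1.5: it is positive for right-skewed data and negative for left-skewed data. A minimal sketch on small made-up datasets, where the left-skewed set is a mirror image of the right-skewed one:

```python
from statistics import mean, median

def skewness(values: list[float]) -> float:
    """Moment coefficient of skewness g1 = m3 / m2**1.5, using population moments."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]  # long tail toward large values
left_skewed = [-x for x in right_skewed]  # mirror image: long tail toward small values

print(skewness(right_skewed) > 0, mean(right_skewed) > median(right_skewed))  # True True
print(skewness(left_skewed) < 0, mean(left_skewed) < median(left_skewed))     # True True
```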
Normal Distribution
A symmetrical, bell-shaped distribution where the mean, median, and mode are all equal.
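Python's `statistics.NormalDist` can illustrate both properties. The parameters below (mean 170, standard deviation 10, such as heights in cm) are hypothetical.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)

# The three measures of central tendency coincide for a normal distribution.
print(heights.mean, heights.median, heights.mode)

# Symmetry: the probability below mu - 5 equals the probability above mu + 5.
below = heights.cdf(165)
above = 1 - heights.cdf(175)
print(round(below, 4), round(above, 4))

# Roughly 68% of values fall within one standard deviation of the mean.
print(round(heights.cdf(180) - heights.cdf(160), 4))
```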
Outliers
Data points that differ markedly from the majority of observations in a dataset; they may indicate errors in the data or genuine anomalies.
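The cards do not specify how to detect outliers; one widely used convention is Tukey's fences, which flag values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. A minimal sketch using the standard `statistics.quantiles` function and hypothetical response times:

```python
from statistics import quantiles

def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

response_times_ms = [120, 125, 130, 128, 122, 127, 131, 126, 124, 900]
print(iqr_outliers(response_times_ms))  # [900]
```

The multiplier `k = 1.5` is a convention, not a rule; a flagged point still needs a closer look to decide whether it is an error or a real extreme value.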
Central Tendency
Measures that describe the center of a dataset; these include the mean, median, and mode.