Why is data visualization important?
It provides a holistic understanding of data
Which statistic measures central tendency?
Mean
Median is useful when data is:
Skewed or contains outliers
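A small illustration of why the median suits skewed data. The numbers are made up for this sketch and are not from the original cards:

```python
from statistics import mean, median

# Hypothetical sample with one extreme outlier (100).
values = [1, 2, 3, 4, 100]

avg = mean(values)     # pulled upward by the outlier
mid = median(values)   # depends only on the middle value

print("mean:", avg, "median:", mid)
```

The mean lands far from most of the observations, while the median stays with the bulk of the data.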
Anscombe’s Quartet shows:
Datasets with nearly identical summary statistics but very different distributions, highlighting the importance of visualizing data
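The point can be checked numerically. The sketch below uses the published Anscombe values and a hand-written `pearson` helper (a name chosen for this example); it is an illustration, not part of the original cards:

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared by sets I-III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Mean of x, mean of y, and correlation for each of the four datasets.
summary = {
    name: (mean(xs), mean(ys), pearson(xs, ys))
    for name, (xs, ys) in quartet.items()
}

for name, (mx, my, r) in summary.items():
    print(f"{name}: mean_x={mx:.2f} mean_y={my:.2f} r={r:.3f}")
```

All four rows print almost the same numbers, yet a scatterplot of each dataset looks completely different, which is why the statistics alone are not enough.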
Visualization helps detect:
Patterns and anomalies
Histogram is used for:
Distribution of one variable
Density plot is:
Continuous histogram
(A smoothed, continuous version of a histogram: instead of discrete, stepped bars showing frequencies, a density plot uses kernel density estimation (KDE) to draw a smooth curve.)
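A minimal sketch of the KDE idea, assuming a Gaussian kernel and a hand-picked bandwidth. The function name `gaussian_kde`, the sample values, and the bandwidth are all assumptions made for illustration:

```python
from math import exp, pi, sqrt

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at any point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * sqrt(2 * pi))

    def density(x):
        # Each sample contributes one small Gaussian bump centred on itself.
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

samples = [1.0, 1.5, 2.0, 2.2, 5.0]   # hypothetical observations
density = gaussian_kde(samples, bandwidth=0.5)

# Evaluate the smooth curve on a grid from -3.0 to 10.0 in steps of 0.1.
step = 0.1
grid = [i * step for i in range(-30, 101)]
curve = [density(x) for x in grid]

# Like a histogram scaled to density, the area under the curve is about 1.
area = sum(curve) * step
print(round(area, 3))
```

A larger bandwidth gives a smoother curve and a smaller one follows the individual points more closely, much as bin width does for a histogram.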
Scatterplot is used for:
Relationship between two numerical variables (for example, their correlation)
LOESS (Locally Estimated Scatterplot Smoothing) is used to:
Fit a smooth curve (a non-parametric regression method)
Correlation measures:
Strength and direction of the linear relationship between two variables
Outliers may indicate:
Dirty data
Log transformation helps when data is:
Highly skewed
(specifically used to manage right-skewed (positively skewed) data)
Bimodal distribution means:
Two peaks
Unimodal distribution means:
One peak
Missing values appear as:
Zero spikes or gaps
Scatterplot overplotting problem occurs when:
Too many data points overlap, hiding how dense the data is
Hexbin plots help:
Visualize dense data
(used to address the issue of overplotting in scatter plots, which occurs when dealing with large datasets where many points overlap)
Boxplot shows:
Distribution summary
(min, first quartile, median, third quartile, max)
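The five-number summary behind a boxplot can be computed as below. This is a sketch with made-up values; quartile conventions differ between tools, and the `"inclusive"` method of Python's `statistics.quantiles` is just one of them:

```python
from statistics import quantiles

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical data

# n=4 splits the data into quarters and returns the three cut points.
q1, med, q3 = quantiles(values, n=4, method="inclusive")

five_number = (min(values), q1, med, q3, max(values))
print(five_number)
```

The box spans `q1` to `q3`, and the line inside it sits at `med`.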
Whiskers in boxplot represent:
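Range of the data outside the box (commonly drawn out to the most extreme points within 1.5 × IQR of the quartiles; points beyond that are plotted individually as outliers)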
Jitter is used to:
Reduce overlap
(adds a small amount of random noise to the position of data points in a plot to help improve clarity)
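A minimal sketch of jittering, assuming uniform noise. The `jitter` helper, its default `amount`, and the sample ratings are hypothetical choices for illustration:

```python
import random

def jitter(values, amount=0.2, seed=0):
    """Shift each value by uniform random noise in [-amount, amount]."""
    rng = random.Random(seed)   # seeded so the result is reproducible
    return [v + rng.uniform(-amount, amount) for v in values]

# Hypothetical discrete ratings: many points share the same position.
ratings = [1, 1, 1, 2, 2, 3, 3, 3, 3]
jittered = jitter(ratings)

print(len(set(ratings)), "distinct positions before,",
      len(set(jittered)), "after")
```

The noise should stay small relative to the spacing between real values, so points are separated visually without suggesting differences that are not in the data.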
Pairwise plots (grid of scatter plots) show:
Relationships among variables (each plot shows the relationship between one unique pair of numerical variables)
Time series plot (data plotted in chronological order) is used for:
Data over time
Seasonality means:
Patterns that repeat at regular intervals over time
Trend shows:
General direction
Data exploration focuses on:
Understanding data
Data presentation focuses on:
Communicating results
Regression line represents:
Linear relationship
LOESS vs regression:
LOESS handles non-linear relationships; a regression line assumes a linear one
Cloud-like scatterplot means:
Weak relationship
Exponential relationship becomes linear after:
Log transformation
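To see why, take a hypothetical exponential curve y = 2·e^(0.5x), where the constants are chosen only for this sketch. After taking logs, equal steps in x give equal steps in log(y), which is what a straight line looks like:

```python
from math import exp, log

xs = [0, 1, 2, 3, 4]
ys = [2.0 * exp(0.5 * x) for x in xs]   # hypothetical exponential data
log_ys = [log(y) for y in ys]           # natural log of each y

# Differences between consecutive points.
raw_steps = [b - a for a, b in zip(ys, ys[1:])]
log_steps = [b - a for a, b in zip(log_ys, log_ys[1:])]

print("raw steps:", [round(s, 2) for s in raw_steps])   # keep growing
print("log steps:", [round(s, 2) for s in log_steps])   # constant slope
```

The constant step in the log values equals the growth rate (0.5 here), so a regression line fitted to (x, log y) recovers it as the slope.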
Dirty data includes:
Incorrect or missing values
Saturated data means:
Capped or limited values
Histogram bins represent:
Intervals of values; each bar's height is the frequency count for its interval
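A sketch of how values are counted into equal-width bins. The `histogram_counts` helper, the data, and the bin edges are assumptions for illustration; plotting libraries may treat bin edges slightly differently:

```python
def histogram_counts(values, low, high, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v <= high:
            # The last bin is closed on the right so that v == high is kept.
            index = min(int((v - low) / width), n_bins - 1)
            counts[index] += 1
    return counts

values = [1, 2, 2, 3, 5, 7, 8, 8, 9, 10]   # hypothetical data
counts = histogram_counts(values, low=0, high=10, n_bins=5)

# Print a text histogram: one row per bin, one '#' per observation.
for i, c in enumerate(counts):
    print(f"[{i * 2:>2}, {i * 2 + 2:>2}) {'#' * c}")
```

Changing `n_bins` changes the shape of the picture, so it is worth trying a few bin widths before drawing conclusions about the distribution.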
Correlation close to 1 means:
Strong positive relationship
Correlation close to 0 means:
Weak or no linear relationship
Multiple clusters indicate:
Different populations
Data range shows:
Spread of values
Visualization before modeling helps:
Understand the data and spot problems before choosing a model
Boxplot median is shown by:
Line inside box
R function for histogram:
hist()