Statistics: Histograms, Stemplots, Outliers, and Percentiles


41 Terms

1

What is the purpose of 'bins' in a histogram?

Bins are equal-width intervals that group the data values; the number of values in each bin sets the height of its bar.

2

How do you calculate the bin width for a histogram?

The bin width is the range (highest value minus lowest value) divided by the number of bins.
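A minimal Python sketch of this calculation. It assumes each bin includes its left edge and excludes its right edge, with the last bin also holding the maximum. `bin_width`, `histogram_counts`, and the scores are illustrative, not from any library:

```python
def bin_width(data, num_bins):
    """Bin width = (highest value - lowest value) / number of bins."""
    return (max(data) - min(data)) / num_bins


def histogram_counts(data, num_bins):
    """Count the values in each equal-width bin.

    Each bin includes its left edge and excludes its right edge, except the
    last bin, which also includes the maximum so no value is dropped.
    Assumes the data are not all identical (otherwise the width is 0).
    """
    low = min(data)
    width = bin_width(data, num_bins)
    counts = [0] * num_bins
    for x in data:
        index = int((x - low) // width)
        if index == num_bins:  # the maximum lands on the last bin's right edge
            index -= 1
        counts[index] += 1
    return counts


scores = [52, 55, 61, 64, 67, 70, 71, 73, 78, 80, 84, 92]
print(bin_width(scores, 4))         # 10.0
print(histogram_counts(scores, 4))  # [3, 4, 3, 2]
```

With these scores the range is 92 - 52 = 40, so 4 bins are each 10 wide.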

3

What is a stemplot?

a method of illustrating data where the stem represents the first digits and the leaf represents the last digit of the data points.
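A minimal Python sketch that builds a stemplot for non-negative whole numbers. It assumes the stem is every digit except the last and the leaf is the last digit. `stemplot` is a hypothetical helper and the data are made up. Empty stems are kept so gaps stay visible, and a key is added at the end:

```python
from collections import defaultdict


def stemplot(data):
    """Return stemplot rows (stem | leaves) plus a key, for non-negative integers."""
    values = sorted(data)
    leaves = defaultdict(list)
    for x in values:
        stem, leaf = divmod(x, 10)  # e.g. 37 -> stem 3, leaf 7
        leaves[stem].append(leaf)
    lines = []
    # Walk every stem from smallest to largest so empty stems still appear.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves[stem])
        lines.append(f"{stem} | {row}".rstrip())
    first = values[0]
    lines.append(f"Key: {first // 10} | {first % 10} = {first}")
    return lines


for line in stemplot([23, 31, 35, 37, 42, 48, 48, 66]):
    print(line)
```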

4

Why is it important to add a key to a stemplot?

A key is necessary to show how to read the stems and leaves (for example, 3 | 7 = 37).

5

What is cumulative relative frequency?

It is the running total of the relative frequencies up to and including a given value; it is often used to find percentiles.
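A minimal Python sketch that builds a frequency table with relative and cumulative relative frequencies. The data (number of siblings for 10 hypothetical students) and the `frequency_table` helper are illustrative:

```python
from collections import Counter


def frequency_table(data):
    """Rows of (value, frequency, relative frequency, cumulative relative frequency)."""
    n = len(data)
    counts = Counter(data)
    rows = []
    running_count = 0
    for value in sorted(counts):
        running_count += counts[value]
        # Divide the running count by n instead of summing rounded fractions.
        rows.append((value, counts[value], counts[value] / n, running_count / n))
    return rows


siblings = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]  # hypothetical survey of 10 students
for row in frequency_table(siblings):
    print(row)
```

In this table the value 2 has cumulative relative frequency 0.6, so 60% of these observations are at or below 2.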

6

What is an outlier in a data set?

an individual piece of data that falls outside the overall pattern of the distribution.

7

What are two common methods for identifying outliers?

Method A uses the 1.5 × IQR rule, while Method B uses the standard deviation rule (mean ± 2SD).

8

What is the formula for calculating the mean of a sample?

The mean is calculated as x̄ = Σx / n, where Σx is the sum of all observations and n is the number of observations.
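A quick check of the formula in Python, using a hypothetical `sample_mean` helper and made-up data. The standard library's `statistics.mean` should give the same value:

```python
import statistics


def sample_mean(data):
    """x̄ = Σx / n: the sum of the observations divided by how many there are."""
    return sum(data) / len(data)


data = [4, 8, 6, 5, 7]
print(sample_mean(data))      # 6.0
print(statistics.mean(data))  # standard-library check
```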

9

What distinguishes categorical variables from quantitative variables?

Categorical variables represent specific groups, while quantitative variables are numerical and can be measured.

10

What are the two types of quantitative variables?

Quantitative variables can be discrete (countable values, e.g., number of siblings) or continuous (any value in a range, e.g., height, weight).

11

What is the purpose of a boxplot?

visually summarizes the distribution of a data set, highlighting the median, quartiles, and potential outliers.
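One way to compute the five-number summary behind a boxplot, sketched in Python. It assumes the common textbook convention of taking Q1 and Q3 as the medians of the lower and upper halves, leaving out the overall median when n is odd. Other software may use slightly different quartile rules. The function names and data are illustrative:

```python
def median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); needs at least two values."""
    values = sorted(data)
    n = len(values)
    lower_half = values[: n // 2]
    upper_half = values[(n + 1) // 2 :]  # skips the middle value when n is odd
    return (values[0], median(lower_half), median(values), median(upper_half), values[-1])


data = [2, 4, 5, 7, 8, 10, 12, 15, 30]
print(five_number_summary(data))  # (2, 4.5, 8, 13.5, 30)
```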

12

How do you describe the center of a distribution?

The center can be described by the mean or the median; the better choice depends on the distribution's skewness and whether outliers are present.

13

What does a percentile indicate?

indicates the value below which a certain percentage of observations fall.

14

What is the 50th percentile in a normal distribution?

It is the median: 50% of the data fall below it and 50% fall above it. This is true for any distribution, not just a normal one.

15

What is the significance of a 'sideways histogram'?

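A stemplot is essentially a sideways histogram: each stem acts like a bin and its row of leaves acts like the bar, so turning the plot on its side shows the same shape as a histogram.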

16

What is the role of the median in skewed distributions?

the median is preferred for measuring center because it is less affected by extreme values.

17

What is the difference between relative frequency and frequency?

Frequency counts how often a value occurs, while relative frequency is the count divided by the total number of observations.

18

How should you summarize findings after creating a graph?

Describe the shape, center, and spread of the distribution.

19

What is the impact of extreme values on the mean?

Extreme values pull the mean toward them, which can make it an unreliable measure of center.

20

What is the two standard deviation rule for identifying outliers?

identifies outliers as any data points that fall outside the range of mean ± 2SD.

21

What is a back-to-back stemplot?

displays two sets of data side by side, allowing for easy comparison between categories.

22

What does it mean for a distribution to be symmetric?

It has a balanced shape: the left and right sides are approximately mirror images.

23

How do you determine if a variable is discrete or continuous?

It depends on how the data is used; for example, age is often treated as discrete even though it can be continuous.

24

What is the importance of labeling boundaries in a histogram?

helps clarify the intervals represented by each bin in the histogram.

25

What is the purpose of a lifeline in a histogram?

indicates the minimum x-value before the first bin, providing a reference point for the data.

26

What is the standard deviation equation?

sx= √[Σ(x - x̄)² / (n - 1)].
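The formula can be checked in Python with a hypothetical `sample_sd` helper and made-up data. `statistics.stdev` from the standard library also divides by n - 1, so the two should agree:

```python
import math
import statistics


def sample_sd(data):
    """s_x = sqrt( Σ(x - x̄)² / (n - 1) ); needs at least two values."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return math.sqrt(squared_deviations / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data))         # about 2.138
print(statistics.stdev(data))  # same value from the standard library
```

Here x̄ = 5 and Σ(x - x̄)² = 32, so s_x = √(32 / 7).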

27

How do you compare distributions?

Use comparative language such as “is higher than” or “is less than”.

28

1.5 × IQR rule

LB= Q1-1.5(IQR)

UB=Q3 + 1.5(IQR)

Anything below LB, or anything above UB is considered an outlier
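A minimal Python sketch of the 1.5 × IQR rule. It assumes the same quartile convention as a boxplot (medians of the lower and upper halves). Other quartile conventions can shift the bounds slightly. The helper names and data are illustrative:

```python
def median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def iqr_outliers(data):
    """Apply the 1.5 × IQR rule; returns (LB, UB, outliers). Needs at least 2 values."""
    values = sorted(data)
    n = len(values)
    q1 = median(values[: n // 2])        # lower half
    q3 = median(values[(n + 1) // 2 :])  # upper half (skips the median when n is odd)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower_bound or x > upper_bound]
    return lower_bound, upper_bound, outliers


data = [2, 4, 5, 7, 8, 10, 12, 15, 30]
print(iqr_outliers(data))  # (-9.0, 27.0, [30])
```

Here Q1 = 4.5 and Q3 = 13.5, so IQR = 9, LB = -9, UB = 27, and only 30 is flagged.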

29

two standard deviation rule

LB= mean - 2(SD)

UB= mean + 2(SD)

Any value below LB or above UB is an outlier (with whole-number data, the first possible outlier is the next whole number beyond the bound).
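A sketch of the two standard deviation rule in Python. It assumes the SD is the sample standard deviation (`statistics.stdev`, which divides by n - 1). The helper name and data are made up:

```python
import statistics


def two_sd_outliers(data):
    """Flag values outside mean ± 2 SD; returns (LB, UB, outliers)."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # assumption: sample SD (divides by n - 1)
    lower_bound = mean - 2 * sd
    upper_bound = mean + 2 * sd
    outliers = [x for x in data if x < lower_bound or x > upper_bound]
    return lower_bound, upper_bound, outliers


data = [10, 12, 11, 13, 12, 11, 12, 30]
lb, ub, flagged = two_sd_outliers(data)
print(round(lb, 2), round(ub, 2), flagged)
```

The mean here is 13.875 and the upper bound comes out a little above 27, so only 30 is flagged.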

30

bin width formula 

range / √(number of data values) = bin width

ex: range of 20 with 25 data values: 20 / √25 = 20 / 5 = 4
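The same rule as a Python sketch, with a hypothetical helper and 25 made-up values spanning 0 to 20, which reproduces the example:

```python
import math


def sqrt_rule_bin_width(data):
    """Bin width = range / sqrt(number of data values)."""
    data_range = max(data) - min(data)
    return data_range / math.sqrt(len(data))


values = list(range(0, 21)) + [5, 8, 12, 15]  # 25 made-up values from 0 to 20
print(len(values), sqrt_rule_bin_width(values))  # 25 4.0
```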

31

nonresistant

strongly influenced by extreme values

32

resistant

A resistant measure is not strongly affected by extreme values in the data set (for example, the median and the IQR).
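A small Python demonstration with made-up values: adding one extreme value moves the nonresistant mean a lot but barely moves the resistant median.

```python
import statistics

values = [30, 32, 35, 38, 40]  # hypothetical data, no extreme values
with_extreme = values + [400]  # the same data plus one extreme value

print(statistics.mean(values), statistics.median(values))
print(statistics.mean(with_extreme), statistics.median(with_extreme))
```

The mean jumps from 35 to about 95.8, while the median only moves from 35 to 36.5.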

33

Skewed distribution center/spread

use median for center and IQR for variability (spread)

34

Symmetric distribution center/spread

use mean for center and SD for spread

35

What is a percentile?

indicates the value below which a given percentage of observations in a group of observations falls.

36

What does the 70th percentile mean?

A test score in the 70th percentile would mean that 70% of the scores in the data set were below that score.
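A sketch of this idea in Python. It assumes the percentile of a value is the percentage of observations strictly below it; some textbooks count "at or below" instead. `percentile_rank` and the scores are hypothetical:

```python
def percentile_rank(data, value):
    """Percentage of observations strictly below `value`."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


scores = [55, 60, 62, 68, 70, 74, 77, 81, 85, 90]
print(percentile_rank(scores, 81))  # 70.0
```

Seven of the ten scores are below 81, so 81 is at the 70th percentile of this set.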

37

How to describe a distribution’s shape

skewed, symmetric, bimodal, unimodal

38

how to describe center

mean/median

39

how to describe spread

range, SD, IQR

40
[Image of a distribution]

left skewed distribution

41
[Image of a distribution]

right skewed distribution