module 2-1 categorical variables

44 Terms

New cards

The distribution of a categorical variable

Given in the form of a table with:

Each possible category
Frequency (or number) of individuals who fall into each category
Relative frequency (or percentage) of individuals who fall into each category

New cards

The relative frequency for a particular category

The proportion (or percentage) of observations in the data set that fall into that category.

New cards

Formula for relative frequency

Frequency/number of observations

New cards

Percentage of relative frequency

Relative frequency x 100%
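
The two formulas above (frequency / number of observations, then x 100%) can be sketched in Python. This is a minimal illustration, not part of the original cards: the eye-colour data and the function name `frequency_table` are made up for the example.

```python
from collections import Counter

# Hypothetical sample: eye colour recorded for 10 individuals.
observations = ["brown", "blue", "brown", "green", "brown",
                "blue", "brown", "hazel", "blue", "brown"]

def frequency_table(values):
    """Return {category: (frequency, relative frequency, percentage)}."""
    counts = Counter(values)   # frequency of each category
    n = len(values)            # number of observations
    return {cat: (freq, freq / n, freq / n * 100)
            for cat, freq in counts.items()}

table = frequency_table(observations)
for category, (freq, rel, pct) in table.items():
    print(f"{category:<6} {freq:>3} {rel:>6.2f} {pct:>6.1f}%")
```

The relative frequencies always add up to 1 (100%), which is a quick check that no observation was dropped.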

New cards

A frequency distribution table can be displayed as

A bar chart or a pie chart.

New cards

Bar chart

A graph of the distribution of a categorical variable showing the counts for each category next to each other.

Effectively shows the frequency or percentage in each category.

New cards

Pie chart

Shows the relationship between the parts and the whole using slices.

Used only when you want to emphasize each category’s relation to the whole.

New cards

Slice size formula

Category relative frequency x 360 degrees
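
As a rough sketch of the slice size formula, the snippet below turns category counts into slice angles. The counts and the function name `slice_angles` are hypothetical and chosen only for illustration.

```python
def slice_angles(frequencies):
    """Map each category to its pie slice angle in degrees."""
    total = sum(frequencies.values())
    # slice angle = relative frequency x 360 degrees
    return {cat: freq / total * 360 for cat, freq in frequencies.items()}

# Hypothetical category counts.
counts = {"brown": 5, "blue": 3, "green": 1, "hazel": 1}
angles = slice_angles(counts)
```

Here "brown" makes up half of the observations, so its slice spans half the circle (180 degrees), and all the angles together cover the full 360 degrees.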

New cards

Contingency table

Shows how individuals are distributed along each variable, contingent on the value of the other variable.

Exploring the relationship between 2 categorical variables

New cards

Marginal distribution

The frequency distribution of one variable on its own, found in the margins (totals) of a contingency table.
A distribution of either the row or the column variable.

New cards

Conditional distribution

Shows the distribution of one variable for just the observations that satisfy a condition on another variable.

New cards

Dependent variables

If the conditional distribution of one variable is not the same for each category of another.

There is an association between these variables.

New cards

Independent variables

If the conditional distribution of one variable in a contingency table is the same for each category of another.

There is no association between these variables.
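
The contingency-table cards above (marginal distribution, conditional distribution, dependent vs. independent variables) can be illustrated with a small sketch. The counts, category names, and function names below are all hypothetical; the table is stored as a plain nested dictionary rather than with any particular library.

```python
# Hypothetical counts: rows = class year, columns = preferred study method.
table = {
    "first-year":  {"flashcards": 20, "practice tests": 30},
    "second-year": {"flashcards": 10, "practice tests": 40},
}

def row_marginal(table):
    """Marginal distribution (as counts) of the row variable."""
    return {row: sum(cols.values()) for row, cols in table.items()}

def column_marginal(table):
    """Marginal distribution (as counts) of the column variable."""
    totals = {}
    for cols in table.values():
        for col, count in cols.items():
            totals[col] = totals.get(col, 0) + count
    return totals

def conditional_distribution(table, row):
    """Distribution of the column variable for one row category only."""
    row_total = sum(table[row].values())
    return {col: count / row_total for col, count in table[row].items()}

first = conditional_distribution(table, "first-year")    # 0.4 / 0.6
second = conditional_distribution(table, "second-year")  # 0.2 / 0.8
```

In this made-up sample the two conditional distributions differ (40% vs. 20% prefer flashcards), so the variables show an association; if the two rows had produced the same proportions, the sample would show no association.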

New cards

Mean

The mean of a set of numerical observations is the familiar arithmetic average. It is the balance point of the distribution.

Is not resistant to outliers and skewness.
May not represent the “center” of the data very well.

New cards

In a symmetrical distribution

Mean = median = mode

New cards

In a positively (right) skewed distribution

Mean > median > mode

New cards

Negatively (left) skewed distribution

Mean < median < mode
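
The orderings on the three cards above are rules of thumb rather than guarantees. A small made-up data set where they happen to hold, checked with Python's standard `statistics` module, might look like this (the data values are hypothetical):

```python
import statistics

# Hypothetical right-skewed sample: a long tail of large values.
right_skewed = [1, 2, 2, 2, 3, 4, 4, 5, 9, 18]

mean_r = statistics.mean(right_skewed)      # pulled toward the tail
median_r = statistics.median(right_skewed)
mode_r = statistics.mode(right_skewed)
# Here: mean 5 > median 3.5 > mode 2

# Mirror the data to get a left-skewed sample with the reverse ordering.
left_skewed = [20 - x for x in right_skewed]
mean_l = statistics.mean(left_skewed)
median_l = statistics.median(left_skewed)
mode_l = statistics.mode(left_skewed)
# Here: mean 15 < median 16.5 < mode 18
```

The mean moves the furthest toward the long tail because it is not resistant to extreme values, while the median and mode stay closer to where most observations sit.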

New cards

Range

Max - min

New cards

Upper fence

Every measurement above this fence is an outlier.

New cards

Upper fence formula

Q3 + 1.5 x IQR

New cards

Lower fence

Every measurement beneath this fence is an outlier.

New cards

Lower fence formula

Q1 - 1.5 x IQR
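
A minimal sketch of the two fence formulas, taking the quartiles Q1 and Q3 as given inputs and using IQR = Q3 - Q1. The function names and the example numbers are hypothetical.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the two quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(data, q1, q3):
    """Measurements beneath the lower fence or above the upper fence."""
    lower, upper = fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

# Hypothetical quartiles and data.
lower, upper = fences(10, 20)                     # IQR = 10
flagged = outliers([2, 12, 15, 18, 40], 10, 20)
```

With Q1 = 10 and Q3 = 20 the fences are -5 and 35, so only the value 40 is flagged as an outlier.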

New cards

Symmetric distribution

Median line in center of box and whiskers of equal length.

New cards

Skewed right

Median line left of center and long right whisker.

New cards

Skewed left

Median line right of center and long left whisker.

New cards

Numerical summaries

Meaningful numbers that preserve the relevant features of the data set so that you can draw useful conclusions.

New cards

y

The variable for which we have sample data (variable of interest).

New cards

n

Sample size = number of observations of the variable y

New cards

y1

The first sample observation of the variable y

New cards

y2

The second sample observation of the variable y

New cards

yn

The nth sample observation of the variable y.

New cards

Median (M)

The value that divides the ordered sample into two sets of the same size; one half of the data lies below M, and the other half above M.

New cards

Mode

The value that occurs with the highest frequency in a data set. The value where the distribution is the tallest.

New cards

Deviations

The differences between each observation yi and the mean y bar: yi - y bar.

New cards

Variance

Denoted by s^2, the sum of squared deviations from the mean divided by n - 1.

New cards

Standard deviation (s)

The square root of the variance.
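
The deviation, variance, and standard deviation cards above chain together, which a short sketch can make concrete. The function names and the data are hypothetical; Python's built-in `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and can serve as a cross-check.

```python
import math

def sample_variance(y):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(y)
    y_bar = sum(y) / n                          # the mean
    deviations = [yi - y_bar for yi in y]       # yi - y bar
    return sum(d ** 2 for d in deviations) / (n - 1)

def sample_std(y):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(y))

# Hypothetical sample: mean is 3, squared deviations are 4, 1, 0, 1, 4.
y = [1, 2, 3, 4, 5]
s2 = sample_variance(y)   # 10 / 4 = 2.5
s = sample_std(y)
```

The deviations themselves always sum to zero, which is why they are squared before being added up.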

New cards

Interquartile range

The range of the middle half of the data. Like the median, it is resistant to outliers. Based on quartiles, which divide the ordered data into 4 equal sections.

New cards

pth percentile

The value such that p% of the measurements fall below it and (100 - p)% fall above it.

New cards

Lower quartile (Q1)

The 25th percentile, which separates the bottom 25% of the measurements from the top 75%.

New cards

Upper quartile (Q3)

The 75th percentile that separates the top 25% of the measurements from the bottom 75%.

New cards

Interquartile Range (IQR) formula

Q3 - Q1

New cards

Five number summary

For a set of measurements, consists of the minimum, the first quartile, the median, the third quartile, and the maximum. These numbers give a good summary of a distribution of quantitative observations.
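
A possible sketch of the five number summary follows. Textbooks and software use several slightly different quartile conventions; this example assumes one common introductory convention (Q1 and Q3 are the medians of the lower and upper halves of the ordered data, leaving out the middle value when n is odd), which may not match the convention used in the course. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for at least two measurements.

    Assumed convention: Q1 and Q3 are the medians of the lower and
    upper halves, excluding the middle value when n is odd.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half:] if n % 2 == 0 else values[half + 1:]
    return (values[0], statistics.median(lower), statistics.median(values),
            statistics.median(upper), values[-1])

# Hypothetical sample of 9 measurements.
summary = five_number_summary([7, 1, 15, 3, 10, 4, 12, 6, 8])
minimum, q1, median, q3, maximum = summary
iqr = q3 - q1   # Interquartile range: Q3 - Q1
```

For this sample the summary is 1, 3.5, 7, 11, 15, so the IQR is 7.5.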

New cards

Boxplot

The five number summary leads to this visual representation of a data set. A powerful graphical tool for summarizing data. It shows the center, the spread, and the symmetry or skewness at the same time. These diagrams are particularly useful when comparing groups.

New cards

Time plots

A time plot of a variable plots each observation against the time at which it was measured. Time is always on the horizontal axis.