Module 2-1: Categorical Variables

44 Terms

1

The distribution of a categorical variable

Given in the form of a table with:

  • Each possible category

  • The frequency (or number) of individuals who fall into each category

  • The relative frequency (or percentage) of individuals who fall into each category
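A table like the one described above can be sketched in Python, computing relative frequency as frequency divided by the number of observations and percentage as relative frequency × 100%. This is a minimal sketch: the pet data is made up, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

def frequency_table(observations):
    """Return {category: (frequency, relative frequency, percentage)}."""
    n = len(observations)
    counts = Counter(observations)  # frequency of each category
    return {
        category: (count, count / n, count / n * 100)
        for category, count in counts.items()
    }

# Hypothetical sample: favorite pet of 8 students
pets = ["dog", "cat", "dog", "fish", "dog", "cat", "dog", "cat"]
for category, (freq, rel, pct) in frequency_table(pets).items():
    print(f"{category:<5} {freq:>3} {rel:>6.3f} {pct:>6.1f}%")
```

The relative frequencies always add up to 1 (and the percentages to 100%), apart from rounding.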

2

The relative frequency for a particular category

The proportion of all observations in the data set that fall into that category.

3

Formula for relative frequency

Category frequency / total number of observations

4

Relative frequency as a percentage

Relative frequency × 100%

5

A frequency distribution table can be displayed as

A bar chart or a pie chart.

6

Bar chart

A graph of the distribution of a categorical variable that places a bar for each category side by side, with each bar's height showing that category's count.

  • Effectively shows the frequency or percentage in each category

7

Pie chart

Shows the relationship between the parts and the whole, with each category drawn as a slice of the circle.

  • Use it only when you want to emphasize each category’s relation to the whole.

8

Slice size formula

Category relative frequency × 360°
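The slice formula can be checked with a short sketch. The counts are hypothetical and `slice_angles` is an illustrative name, not a standard function.

```python
def slice_angles(frequencies):
    """Map each category to its pie-slice angle in degrees."""
    n = sum(frequencies.values())  # total number of observations
    return {category: count / n * 360 for category, count in frequencies.items()}

# Hypothetical counts for three categories
angles = slice_angles({"dog": 4, "cat": 3, "fish": 1})
print(angles)  # {'dog': 180.0, 'cat': 135.0, 'fish': 45.0}
```

Because the relative frequencies add up to 1, the slices always add up to the full 360°.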

9

Contingency table

Shows how individuals are distributed along each variable, contingent on the value of the other variable.

  • Used for exploring the relationship between two categorical variables
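One way to build a contingency table is to count each (row value, column value) pair. This sketch assumes the data arrive as a list of pairs. The section/outcome data is invented, and `contingency_table` is a hypothetical helper.

```python
from collections import Counter

def contingency_table(pairs):
    """Cross-tabulate (row_value, column_value) pairs into nested-dict counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Missing combinations get a count of 0 (Counter returns 0 for absent keys)
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Hypothetical data: (class section, exam outcome) for 8 students
data = [("day", "pass"), ("day", "pass"), ("day", "fail"), ("night", "pass"),
        ("night", "fail"), ("night", "fail"), ("day", "pass"), ("night", "pass")]
table = contingency_table(data)
for row, cells in table.items():
    print(row, cells)
```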

10

Marginal distribution

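  • The frequency distribution of each variable on its own, ignoring the other variable.

  • The distribution of either the row variable or the column variable, read from the row or column totals in the margins of the table.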
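The marginal distributions are just the row totals and the column totals. A minimal sketch, assuming the table is stored as a nested dict (`marginal_distributions` and the counts are hypothetical):

```python
def marginal_distributions(table):
    """Return (row totals, column totals) of a nested-dict contingency table."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    return row_totals, col_totals

# Hypothetical 2x2 table: rows = class section, columns = exam outcome
table = {"day": {"pass": 3, "fail": 1}, "night": {"pass": 2, "fail": 2}}
rows, cols = marginal_distributions(table)
print(rows)  # {'day': 4, 'night': 4}
print(cols)  # {'pass': 5, 'fail': 3}
```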

11

Conditional distribution

Shows the distribution of one variable for just the observations that satisfy a condition on another variable.
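Here the condition is "row category = a given value", and the counts in that row are divided by the row total. This is a sketch with hypothetical names and counts.

```python
def conditional_distribution(table, row):
    """Distribution of the column variable among observations in one row category."""
    cells = table[row]
    total = sum(cells.values())  # only observations satisfying the condition
    return {col: count / total for col, count in cells.items()}

table = {"day": {"pass": 3, "fail": 1}, "night": {"pass": 2, "fail": 2}}
print(conditional_distribution(table, "day"))    # {'pass': 0.75, 'fail': 0.25}
print(conditional_distribution(table, "night"))  # {'pass': 0.5, 'fail': 0.5}
```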

12

Dependent variables

Two variables are dependent if the conditional distribution of one variable is not the same for each category of the other.

  • There is an association between these variables

13

Independent variables

Two variables are independent if the conditional distribution of one variable in a contingency table is the same for each category of the other.

  • There is no association between these variables
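The dependent/independent definitions can be applied literally by comparing the conditional distributions row by row. This sketch uses exact fractions so that "the same" means exactly equal. With real sample data an exact match is rare, so in practice the comparison is judged approximately. The helper names and counts are hypothetical.

```python
from fractions import Fraction

def conditional_rows(table):
    """Conditional distribution of the column variable for every row category."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: Fraction(count, total) for col, count in cells.items()}
    return result

def are_independent(table):
    """True if every row has the same conditional distribution (no association)."""
    dists = list(conditional_rows(table).values())
    return all(d == dists[0] for d in dists[1:])

associated = {"day": {"pass": 3, "fail": 1}, "night": {"pass": 2, "fail": 2}}
unassociated = {"day": {"pass": 6, "fail": 2}, "night": {"pass": 3, "fail": 1}}
print(are_independent(associated))    # False
print(are_independent(unassociated))  # True
```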

14

Mean

The mean of a set of numerical observations is the familiar arithmetic average: the sum of the observations divided by their number. It is the balance point of the distribution.

  • Is not resistant to outliers or skewness

  • May not represent the “center” of the data very well
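A quick way to see that the mean is not resistant is to add one extreme value and compare it with the median. The income figures are made up for illustration.

```python
from statistics import mean, median

# Hypothetical incomes (in thousands); 400 is an outlier
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [400]

print(mean(incomes), median(incomes))                      # 35 35
print(round(mean(with_outlier), 2), median(with_outlier))  # 95.83 36.5
```

One outlier pulls the mean from 35 to almost 96, while the median barely moves.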

15

In a symmetric, unimodal distribution

Mean = median = mode

16

In a positively (right) skewed distribution

Mean > median > mode

17

In a negatively (left) skewed distribution

Mean < median < mode
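The orderings above can be checked on a small sample. The right-skewed data is hypothetical, and the left-skewed sample is simply its mirror image. These orderings are the typical pattern, not a guarantee for every skewed data set.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: mostly small values, long tail to the right
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]
# Negating every value mirrors it into a left-skewed sample
left_skewed = [-y for y in right_skewed]

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name, mean(data), median(data), mode(data))
```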

18

Range

Max - min

19

Upper fence

Every measurement above this fence is an outlier.

20

Upper fence formula

Q3 + 1.5 × IQR

21

Lower fence

Every measurement beneath this fence is an outlier.

22

Lower fence formula

Q1 - 1.5 × IQR
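The two fence formulas can be combined into a small outlier check. Textbooks and software compute quartiles in slightly different ways, so this sketch assumes one common convention: Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the overall median left out of both halves when n is odd. The function names and sample are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (needs n >= 2)."""
    ys = sorted(data)
    half = len(ys) // 2
    lower = ys[:half]
    # For odd n, skip the middle value so it belongs to neither half
    upper = ys[half + 1:] if len(ys) % 2 else ys[half:]
    return median(lower), median(upper)

def outliers(data):
    """Return (lower fence, upper fence, values outside the fences)."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    flagged = [y for y in data if y < lower_fence or y > upper_fence]
    return lower_fence, upper_fence, flagged

sample = [2, 4, 5, 6, 7, 8, 9, 30]
print(outliers(sample))  # (-1.5, 14.5, [30])
```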

23

Symmetric distribution

On a boxplot, the median line is in the center of the box and the whiskers are of equal length.

24

Skewed right

On a boxplot, the median line is left of center and the right whisker is long.

25

Skewed left

On a boxplot, the median line is right of center and the left whisker is long.

26

Numerical summaries

Meaningful numbers that preserve the relevant features of the data set so that you can draw useful conclusions.

27

y

The variable for which we have sample data (variable of interest).

28

n

Sample size = number of observations of the variable y

29

y1

The first sample observation of the variable y

30

y2

The second sample observation of the variable y

31

yn

The nth sample observation of the variable y.

32

Median (M)

The value that divides the ordered sample into two sets of the same size; one half of the data lies below M, and the other half lies above M.
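Finding M means ordering the sample first. The card does not say what happens when n is even. This sketch follows the usual convention of averaging the two middle values. `sample_median` is a hypothetical name, and Python's `statistics.median` uses the same rule.

```python
def sample_median(data):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ys = sorted(data)
    n = len(ys)
    mid = n // 2
    if n % 2 == 1:
        return ys[mid]
    return (ys[mid - 1] + ys[mid]) / 2

print(sample_median([7, 1, 3]))      # 3
print(sample_median([7, 1, 3, 10]))  # 5.0
```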

33

Mode

The value that occurs with the highest frequency in a data set. The value where the distribution is the tallest.

34

Deviations

The differences yi - ȳ between each observation yi and the mean ȳ ("y bar").

35

Variance

Denoted by s², it is the sum of the squared deviations from the mean divided by n - 1: s² = Σ(yi - ȳ)² / (n - 1).

36

Standard deviation (s)

The square root of the variance.
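The three definitions (deviations, variance, standard deviation) chain together directly. This sketch computes them by hand on a made-up sample and prints Python's `statistics.variance` and `statistics.stdev` as a cross-check. Both of those also divide by n - 1. `sample_variance` is a hypothetical helper.

```python
import math
import statistics

def sample_variance(data):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    y_bar = sum(data) / n
    deviations = [y - y_bar for y in data]  # y_i - y_bar
    return sum(d ** 2 for d in deviations) / (n - 1)

sample = [2, 4, 4, 4, 5, 5, 7, 9]
s2 = sample_variance(sample)
s = math.sqrt(s2)  # standard deviation = square root of the variance
print(s2, s)
print(statistics.variance(sample), statistics.stdev(sample))
```

For this sample ȳ = 5, the squared deviations sum to 32, and s² = 32/7.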

37

Interquartile range

The range of the middle half of the data. Like the median, it is resistant to outliers. It is based on the quartiles, which divide the ordered data into four equal parts.

38

pth percentile

The value such that p% of the measurements fall below it and (100 - p)% fall above it.

39

Lower quartile (Q1)

The 25th percentile; it separates the bottom 25% of the measurements from the top 75%.

40

Upper quartile (Q3)

The 75th percentile that separates the top 25% of the measurements from the bottom 75%.

41

Interquartile Range (IQR) formula

Q3 - Q1

42

Five number summary

The five-number summary of a set of measurements consists of the minimum, the first quartile, the median, the third quartile, and the maximum. These numbers give a good summary of a distribution of quantitative observations.
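The summary can be computed in a few lines, and the range (max - min) and IQR (Q3 - Q1) fall out of it. This sketch assumes the median-of-halves quartile convention, which is one of several in use. `five_number_summary` and the sample are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """(min, Q1, median, Q3, max), with Q1/Q3 as medians of the lower/upper halves."""
    ys = sorted(data)
    half = len(ys) // 2
    lower = ys[:half]
    # For odd n, the overall median belongs to neither half
    upper = ys[half + 1:] if len(ys) % 2 else ys[half:]
    return ys[0], median(lower), median(ys), median(upper), ys[-1]

sample = [1, 3, 4, 6, 8, 9, 12]
lo, q1, m, q3, hi = five_number_summary(sample)
print(lo, q1, m, q3, hi)   # 1 3 6 9 12
print("range =", hi - lo)  # range = 11
print("IQR =", q3 - q1)    # IQR = 6
```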

43

Boxplot

The five-number summary leads to this visual representation of a data set, a powerful graphical tool for summarizing data. It shows the center, the spread, and the symmetry or skewness at the same time. Boxplots are particularly useful when comparing groups.

44

Time plots

A time plot of a variable plots each observation against the time at which it was measured. Time is always on the horizontal axis.