Statistics- 1: Description and Inferences

37 Terms

1

variable=

any characteristic, number or quantity that can be measured and can differ across entities or across time

2

variables examples

  • hair colour, level of trust in the government, age, number of consecutive stairs climbed…

3

Types of variables:

  • variables differ in their level of measurement: the kind of information carried by the values assigned to them

4

Categorical (level of measurement): nominal

  • 2 or more mutually exclusive categories

  • no natural order

  • no arithmetic operations possible

eye colour, marital status, hair colour, political party affiliation…

5

Categorical (level of measurement): ordinal

  • clear ordering of the values (high to low)

  • distances between adjacent values are not necessarily equal across levels

education level, political interest, performance ratings, agreement to a statement…

6

Numerical (level of measurement): continuous

  • can be measured to any level of precision

height, weight…

7

Numerical (level of measurement): Discrete

  • only countable values are possible

    • whole, non-negative numbers

  • can only be measured in discrete steps = whole numbers

pets owned, points in an exam, number of car accidents…

8

Explanatory variables

= presumed cause

often x

Independent variable

9

Response variable

Outcome

often y

Dependent variable

10

Organizing data in a dataset:

  • column= variable

  • row= one record of the data set

  • cell= the value of one variable for one record

11

Frequency distribution:

= how the values of a variable are distributed in relation to one another.

  • display of the pattern of frequencies of a variable

  • how often each value occurs in a data set.
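
A frequency distribution can be sketched as a simple count of how often each value occurs. The example below is only an illustration: the `eye_colours` list is made-up data, and the variable names are assumptions, not part of the original cards.

```python
from collections import Counter

# Hypothetical sample: eye colour recorded for eight people (made-up values).
eye_colours = ["brown", "blue", "brown", "green",
               "brown", "blue", "brown", "green"]

# Absolute frequencies: how often each value occurs.
frequencies = Counter(eye_colours)

# Relative frequencies: each count as a share of all observations.
n = len(eye_colours)
relative = {value: count / n for value, count in frequencies.items()}

# Print one line per value, most frequent first.
for value, count in frequencies.most_common():
    print(f"{value:<6} count={count} share={relative[value]:.2f}")
```

Because eye colour is nominal, counting is the only arithmetic that makes sense here; the same counting approach also works for ordinal and numeric variables.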

12

Skewness:

negative (left) skew: mass concentrated on the right, left tail is longer.

positive (right) skew: mass concentrated on the left, tail longer on the right

13

how can we summarize/ describe distributions of variables?

  1. visualize data

  2. calculate measures

    • measure of central tendency: where the centre of the distribution lies.

    • measure of dispersion: how stretched or squeezed the distribution is.

14

Measures of central tendency/ level of measurement: nominal

mode

15

Measures of central tendency/ level of measurement: ordinal

median + mode

16

Measures of central tendency/ level of measurement: numeric

mean + median + mode

17

Mode=

  • most frequent score in a data set

  • data with 1 mode= unimodal

  • there can be several modes (2 modes= bimodal, more= multimodal)

18

Median=

  • middle score of a set of data that has been arranged in order of magnitude:

    • with an even number of scores: add the two middle scores and divide by 2

19

Mean=

  • arithmetic average of the numbers: sum of all values divided by the number of values.

  • sensitive to extreme values (outliers)

  • = NOT a robust statistic (unlike the median).
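
The three measures of central tendency can be compared on a small example. This is a minimal sketch using Python's standard `statistics` module; the `scores` values are made up for illustration, and the added outlier (46) is an assumption chosen to show how the mean reacts while the median barely moves.

```python
import statistics

# Hypothetical scores (made-up data), already in order of magnitude.
scores = [2, 3, 3, 4, 5, 7]

mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # even count: (3 + 4) / 2
mean = statistics.mean(scores)      # 24 / 6

print(mode, median, mean)           # 3 3.5 4

# Add one extreme value and recompute.
with_outlier = scores + [46]
median_out = statistics.median(with_outlier)  # middle of 7 values
mean_out = statistics.mean(with_outlier)      # 70 / 7

print(median_out, mean_out)         # 4 10

# Data can have several modes; multimode returns all of them.
modes = statistics.multimode([1, 1, 2, 2, 3])
print(modes)                        # [1, 1] is not returned; result is [1, 2]
```

One extreme value shifts the mean from 4 to 10, while the median only moves from 3.5 to 4.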

20

Measures of dispersion/ level of measurement: nominal

not possible

21

Measures of dispersion/ level of measurement: ordinal

range, inter-quartile range

22

Measures of dispersion/ level of measurement: numeric

Range, inter-quartile range, variance/ standard deviation

23

Range=

difference between the lowest and highest values

24

Percentiles=

  • split the ordered data into equal-sized chunks

    • percentiles= 100 chunks

    • deciles= 10

    • quintiles= 5

    • quartiles= 4

25

Inter-quartile range=

= range of the middle 50% of the data:

  • calculate by subtracting the 1st quartile from the 3rd quartile.

  • robust statistic → not affected by outliers, as it covers only the middle 50% of the values

26

Problem with IQR?

  • ‘robust’ → not sensitive to outliers, but:

    • only uses a selection of the data, so the information in the remaining values is ignored
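
The contrast between the range and the inter-quartile range can be sketched as follows. The data are made up, and the helper names `value_range` and `iqr` are hypothetical. Quartiles can be computed under several conventions; this sketch assumes the `method="inclusive"` option of `statistics.quantiles`, so other software may give slightly different quartile values.

```python
import statistics

def value_range(values):
    """Range: difference between the highest and lowest values."""
    return max(values) - min(values)

def iqr(values):
    """Inter-quartile range: 3rd quartile minus 1st quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical data, and the same data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 100]

print(value_range(data), iqr(data))                  # 8 4.0
print(value_range(with_outlier), iqr(with_outlier))  # 99 4.0
```

The outlier inflates the range from 8 to 99, while the IQR stays at 4.0 because it only looks at the middle 50% of the values.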

27

Measures using all data:

deviance= how much each value deviates from the mean.

28

Deviance=

  • calculate all deviances= value- mean

  • and then the sum of them = total deviance

29

Problem with total deviance + solution=

  • positive and negative deviances cancel out, so the total is always 0 → not a useful measure of spread

    • instead → calculate sum of squared errors (SS)

      • square the deviances

      • sum the squared deviances.

30

Problem with squared errors + Solution=

  • SS grows as n increases → NOT useful for comparing data sets of different sizes.

    • Solution= divide sum of squared errors by number of observations (N) minus 1.

      • =VARIANCE

31

calculate variance=

  • divide sum of squared errors by number of observations (N) minus 1.

32

Standard deviation:

  • calculated once we have the variance.

  • Letter s for a sample (sigma, σ, for a population)

    = square root of the variance

  • expressed in the same units as the variable, so it depends on the scale of measurement.
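
The chain from deviances to the standard deviation can be worked through step by step. This is a minimal sketch on made-up `values`; the variable names are assumptions, and the final lines only cross-check the hand calculation against the standard `statistics` module, which also divides by N minus 1.

```python
import math
import statistics

# Hypothetical observations (made-up data); their mean is 25 / 5 = 5.
values = [2, 4, 4, 6, 9]
n = len(values)
mean = sum(values) / n

# Step 1: deviance of each value = value - mean.
deviances = [x - mean for x in values]   # [-3, -1, -1, 1, 4]

# Step 2: total deviance is 0, so it says nothing about spread.
total_deviance = sum(deviances)

# Step 3: sum of squared errors (SS).
ss = sum(d ** 2 for d in deviances)      # 9 + 1 + 1 + 1 + 16 = 28

# Step 4: variance = SS divided by (N - 1).
variance = ss / (n - 1)                  # 28 / 4 = 7

# Step 5: standard deviation = square root of the variance.
sd = math.sqrt(variance)

print(total_deviance, ss, variance, round(sd, 3))

# Cross-check against the standard library's sample statistics.
assert variance == statistics.variance(values)
assert math.isclose(sd, statistics.stdev(values))
```

Squaring puts the variance in squared units; taking the square root brings the standard deviation back to the units of the original variable.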

33

Larger standard deviation=

bigger dispersion around the mean

34

Can you identify mode? (in levels of measurement)

  • nominal= YES

  • ordinal= YES

  • numeric= YES

35

Can you identify median and percentiles? (Level of measurement)

  • nominal= NO

  • ordinal= YES

  • numeric= YES

36

Can you add/ subtract? (Level of measurement)

  • nominal = NO

  • ordinal= NO

  • numeric= YES

37

Can you identify the mean & standard deviation? (level of measurement)

  • nominal = NO

  • ordinal= NO

  • numeric= YES