1

Quantitative Data

Discrete and Continuous

2

Discrete Data

Takes only specific, countable values, such as the number of students in a class

3

Continuous Data

Can take any value within a range (infinitely divisible), such as age, height, time

4

Qualitative Data

Categorical; either ordinal or nominal

5

Ordinal Data

Places categories in order and conveys a ranking, such as clothing sizes (small, medium, large)

6

Nominal Data

Does not convey a ranking, such as ethnicity or gender

7

What type of data is the number of cars a family owns?

Discrete

8

What type of data is the type of accommodation (such as budget, tourist, superior)?

Ordinal - conveys a ranking

9

What type of data is favourite fruit preference at the market?

Nominal, conveys no ranking

10

What type of data is time spent at the market?

Continuous - time can take any value within a range

11

Weekly household spending is divided into these groups: less than $50, $50-$100, $150-$200. What type of variable is this?

Categorical & Ordinal (defines categories and placed in order to convey a ranking)

12

Cross tabulation

Compares categorical with categorical

13

scatter plot

Compares numerical with numerical

14

frequency table

analyses 1 categorical variable. E.g. the fave stall of people at the market

15

Stacked / clustered bar chart

compares categorical with categorical e.g. proportion of M/F choosing fave stall

16

Relative frequency histogram

compares categorical with numerical (e.g. market spend of various occupational groups)

17

If the 2 variables are "favourite stall" and "if visitors are regular or not", a cross tabulation should be used because

both variables are categorical and define a particular category
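
A cross tabulation of two categorical variables can be sketched by counting each pair of categories. The responses below are hypothetical (they are not from the cards), and `crosstab` and `row_totals` are illustrative names only.

```python
from collections import Counter

# Hypothetical survey responses: (favourite stall, type of visitor).
responses = [
    ("bakery", "regular"),
    ("bakery", "occasional"),
    ("coffee", "regular"),
    ("coffee", "regular"),
    ("produce", "occasional"),
]

# Each cell of the cross tabulation is the count of one category pair.
crosstab = Counter(responses)

# Row totals: how many visitors chose each stall, whatever their visitor type.
row_totals = Counter(stall for stall, _ in responses)

for (stall, visitor), count in sorted(crosstab.items()):
    print(stall, visitor, count)
```

The row totals alone are a frequency table for the single variable "favourite stall".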

18

Mean

simple average

19

median

middle value (when the data is ranked in ascending order)

20

mode

most frequent
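
As a quick illustration of mean, median and mode, here is a small sketch using Python's standard `statistics` module. The spending figures are made up for the example.

```python
import statistics

# Hypothetical amounts spent at the market (not from the cards).
spend = [5, 8, 8, 10, 12, 15, 40]

mean_spend = statistics.mean(spend)      # sum of values / number of values
median_spend = statistics.median(spend)  # middle value once sorted
mode_spend = statistics.mode(spend)      # most frequent value

print(mean_spend, median_spend, mode_spend)
```

The single large value (40) pulls the mean above the median, which is why the median is often preferred for skewed data.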

21

trimmed mean

the mean calculated without the most extreme 5% of values

22

Range

maximum - minimum

23

interquartile range

75th percentile minus 25th percentile
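
Range and interquartile range can be sketched as follows. The data set is hypothetical, and the `"inclusive"` quartile method is an assumption: statistical packages use several quartile conventions that can give slightly different answers.

```python
import statistics

# Hypothetical data set, chosen so the quartiles land on actual values.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Range: maximum minus minimum.
data_range = max(values) - min(values)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

# Interquartile range: 75th percentile minus 25th percentile.
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```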

24

variance

represents spread of data around the mean. Standard deviation squared

25

standard deviation

square root of variance; a higher standard deviation means more spread
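
The link between variance and standard deviation can be checked with a short sketch. The data is hypothetical, and the population versions (`pvariance`, `pstdev`) are used here as an assumption; `statistics.variance` and `statistics.stdev` divide by n - 1 instead and are the usual choice for a sample.

```python
import statistics

# Hypothetical data with mean 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(values)  # average squared distance from the mean
std_dev = statistics.pstdev(values)      # square root of the variance

print(variance, std_dev)
```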

26

co-efficient of variation

compares variability across groups with different magnitudes (standard deviation divided by the mean)
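
A minimal sketch of the co-efficient of variation, taken here as standard deviation divided by mean. The function name and both data sets are hypothetical; the second group is the first scaled by 100, to show that the ratio ignores magnitude.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (a unitless ratio)."""
    return statistics.pstdev(values) / statistics.mean(values)

small = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical group with small values
large = [v * 100 for v in small]      # same shape, 100 times the magnitude

cv_small = coefficient_of_variation(small)
cv_large = coefficient_of_variation(large)

print(cv_small, cv_large)
```

The standard deviations differ by a factor of 100, but both groups have the same relative variability.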

27

skewness

positive = right-skewed, negative = left-skewed

28

significantly skewed

the skewness statistic is more than twice its standard error

29

30

mode

median

31

Kurtosis

measures the extent to which observations cluster around the central point

32

What is it called when the kurtosis statistic is zero?

normal distribution

33

data clusters close to centre: positive or negative kurtosis?

positive

34

data clusters further from centre: positive or negative kurtosis?

negative

35

co-variance

measures co-movement between 2 variables

36

co-efficient of correlation

measures the linear relationship between 2 variables
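
Co-variance and correlation can be sketched by hand as below. The sample (n - 1) form is an assumption, the function names are illustrative, and the data is hypothetical: `ys` is exactly twice `xs`, so the linear relationship is perfect.

```python
import math

def covariance(xs, ys):
    """Sample co-variance: average product of deviations, dividing by n - 1."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    products = ((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)

def correlation(xs, ys):
    """Co-variance rescaled by both standard deviations, so it lies in [-1, 1]."""
    sd_x = math.sqrt(covariance(xs, xs))  # co-variance of x with itself is its variance
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

print(covariance(xs, ys), correlation(xs, ys))
```

Co-variance depends on the units of the data, while the correlation is unitless, which is why the correlation is easier to interpret.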

37

What graph would measure the following: comparing time spent at the market with average income?

scatterplot, as it measures numerical by numerical

38

population

whole collection under analysis

39

sample

a portion of the population

40

parameter

summary measure describing a characteristic of a population

41

statistic

summary measure computed to describe a characteristic of a sample

42

primary data

collected yourself

43

secondary data

taken from another source

44

observational data

you observe and record

45

experimental data

data you've obtained through experiments

46

simple random sampling

everyone is equally likely to get chosen from the population. E.g. randomly picking a certain number of students

47

systematic random sampling

using a system when randomly selecting the sample. E.g. randomly selecting a starting point, then taking every k-th member thereafter

48

Stratified random sampling

dividing the population into homogeneous groups (similar characteristics) then taking a random sample from each group, e.g. dividing students by which degree they take then taking a random sample from each degree

49

cluster sampling

dividing the population into several clusters that aren't homogeneous but are each representative of the population, then taking a random sample of clusters
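
Three of the sampling methods above (simple random, systematic and stratified) can be sketched with Python's `random` module. The student IDs, the degree groupings and all function names are hypothetical, and the fixed seed is only there to make the sketch repeatable.

```python
import random

def simple_random_sample(population, k, rng):
    """Every member is equally likely to be chosen."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Random starting point, then every step-th member thereafter."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, k_per_stratum, rng):
    """A separate random sample from each homogeneous group."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

rng = random.Random(0)            # fixed seed, for repeatability only
students = list(range(1, 101))    # hypothetical student IDs 1..100
strata = {                        # hypothetical split by degree
    "commerce": students[:60],
    "science": students[60:90],
    "arts": students[90:],
}

srs = simple_random_sample(students, 10, rng)
systematic = systematic_sample(students, 10, rng)
stratified = stratified_sample(strata, 3, rng)
print(srs, systematic, stratified, sep="\n")
```

Note how the stratified sample always includes the small "arts" group, whereas a simple random sample of 10 could miss it entirely.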

50

You want to sample residential halls but worry that a random sample won't include the small halls. Which sampling method should you use?

Stratified random sampling

51

Non sampling errors

human errors

52

coverage errors

when the sample has targeted the wrong subjects

53

non-response error

when subject chooses to not respond, impacting the data

54

measurement error

caused by bad question and misunderstanding

55

margin of error

quantified measure of sampling error

56

probability

how likely an event is to occur

57

how is probability written

P(event)

58

What is U in probability

union - probability that one event or the other (or both) occurs

59

what is 'n' in probability

intersection - probability that both events occur together

60

collectively exhaustive

when the outcomes given are the only possible outcomes

61

complement

the complement of an event is everything that is not that event, so their probabilities add to 1. E.g. P(A) + P(not A) = 1

62

A Priori Classical

when the probability is already known from prior knowledge of the process (e.g. a fair coin or die)

63

Empirical (relative frequency)

when you choose to work out the probability through experiments rather than information

64

Subjective

when the probability is based on your opinion

65

Conditional Probability

the probability of an event occurring given that another event has already occurred.

66

How is conditional probability written

P(A | B) e.g. P(Student | Female) "what is the probability that someone is a student, given that they're female"

67

how is conditional probability calculated?

P(A | B) = P(A n B) / P(B)
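
The calculation can be sketched from a table of counts: divide the joint probability by the probability of the condition. All counts below are hypothetical and the variable names are illustrative.

```python
# Hypothetical counts from a cross tabulation (not from the cards).
counts = {
    ("student", "female"): 30,
    ("student", "male"): 25,
    ("non-student", "female"): 20,
    ("non-student", "male"): 25,
}
total = sum(counts.values())

# Joint probability: student AND female.
p_student_and_female = counts[("student", "female")] / total

# Probability of the condition: the total of the "female" column.
p_female = sum(n for (_, gender), n in counts.items() if gender == "female") / total

# Conditional probability: student GIVEN female.
p_student_given_female = p_student_and_female / p_female

print(p_student_and_female, p_female, p_student_given_female)
```

Here `p_female` is a column total divided by the grand total, while the conditional probability restricts attention to the female column only.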

68

Marginal probability

total probability of a row or column

69

Probability independence

when the probability of one event does not influence the probability of another event occurring

70

When does co-variance = 0?

when variables are independent

71

Random Variables

variables with multiple possible values and an associated probability of getting each value

72

Discrete Random Variables

can only take on a finite number of values, e.g. the number of 6's rolled on a dice over 2 rolls: there can only be either 0 sixes, 1 six, or 2 sixes.

73

Expected Value defined

the value we expect based on the probabilities that exist.

74

Expected Value formula

E = ∑ [x • P(x)]

75

Variance

measures data spread around the mean

76

Variance formula

V(X) = ∑ [P(xi) • (xi - μ)^2]
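
A small sketch of the expected value and variance of a discrete random variable, using the number of sixes in two rolls of a fair die. The variance weights each squared deviation from the mean by its probability. Exact fractions are used so the results can be checked by hand; the function names are illustrative.

```python
from fractions import Fraction

# Number of sixes in two rolls of a fair die and the probability of each value.
dist = {0: Fraction(25, 36), 1: Fraction(10, 36), 2: Fraction(1, 36)}

def expected_value(dist):
    """Sum of each value times its probability."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Sum of each probability times the squared deviation from the mean."""
    mu = expected_value(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())

print(expected_value(dist), variance(dist))
```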

77

Binomial Distribution

discrete probability distribution with 4 characteristics

78

what are the 4 binomial characteristics

has to be 2 outcomes to every trial (success or fail)

fixed number of trials

probability of success remains the same for every trial

trials are independent (the outcomes don't affect each other)
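
Under these four conditions, the probability of exactly k successes in n trials can be sketched with the standard binomial formula. The function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with the same success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly one six in two rolls of a fair die.
p_one_six = binomial_pmf(1, 2, 1 / 6)
print(p_one_six)
```

The probabilities over k = 0, 1, ..., n always add to 1, which is a handy check on any hand calculation.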

79

Discrete Random Variables

Cannot be divided, whole numbers, e.g. number of phone calls in a day, number of visitors

80

Expected Value

what we expect based on the probabilities. Example: E(x) = (0 x 0.25) + (1 x 0.5) + (2 x 0.25) = 1

81

Variance

spread of the data. Formula is similar to expected value: V(x) = ((0² x 0.25) + (1² x 0.5) + (2² x 0.25)) - 1² = 0.5

82

Poisson Probability Distribution

A discrete probability distribution used to find probabilities of the number of times a certain event occurs in a specified time interval (no fixed number of trials)

83

4 characteristics of Poisson

number of successes in an interval is independent of number of successes in any other interval

probability of success is the same for all equal sized intervals

probability of success in an interval is proportional to the size of the interval

probability of more than one success in an interval approaches zero as it becomes smaller
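
For a process with these characteristics, the probability of exactly k events in an interval can be sketched with the standard Poisson formula, where `lam` is the average number of events per interval. The function name and the rate of 2 are hypothetical.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events in an interval that averages lam events."""
    return exp(-lam) * lam**k / factorial(k)

# Hypothetical example: an average of 2 phone calls per hour.
p_no_calls = poisson_pmf(0, 2)
p_two_calls = poisson_pmf(2, 2)
print(p_no_calls, p_two_calls)
```

Unlike the binomial case there is no fixed number of trials, so k can be any whole number from 0 upwards.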

84

Empirical Rule

68% = within 1 standard dev, 95% = within 2 standard dev, 99.7% (almost 100%) = within 3 standard dev
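
For a normal distribution, roughly 68% of values fall within 1 standard deviation of the mean, 95% within 2 and 99.7% within 3. This can be checked with a short sketch using `statistics.NormalDist`; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```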

85

normal distribution

A function that represents the distribution of variables as a symmetrical bell-shaped graph.

86

Standardized Z-Distribution

mean = 0, standard deviation = 1

87

How to recognise if data is normally distributed

graph is mound shaped and symmetrical

mean = median

empirical rule applies (68% within 1, 95% within 2, 99.7% within 3 standard deviations)

skewness & kurtosis close to 0

88

Graphs to show normally distributed data

histogram

box plot

stem & leaf

Q-Q / P-P plot

89

What does a sample statistic do

lets you make an inference about a population parameter when you can't sample the entire population.

90

A quantitative estimate involves

a mean "what is the mean grade of the students"

91

what are x̅ and μ

x̅ is the mean of a sample (a statistic), and μ is the mean of the whole population (a parameter)

92

A qualitative estimate involves

a proportion "what proportion of the population is from Christchurch"

93

Interval Estimates

estimations of a range of values of a population parameter. E.g. we expect μ to fall within $75-$100, or, we expect P to fall within 0.25-0.50

94

Point Estimates

estimates an exact value of a parameter using a single value. Unlikely to estimate correctly so use interval estimate instead

95

how to calculate confidence intervals

point estimate plus or minus margin of error (critical value for the confidence level x standard error)
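
A minimal sketch of a confidence interval for a mean, assuming the population standard deviation is known so that a z value applies. The function name and all the numbers are hypothetical; 1.96 is the usual z value for 95% confidence.

```python
from math import sqrt

def z_confidence_interval(sample_mean, population_sd, n, z):
    """Point estimate plus or minus z times the standard error of the mean."""
    standard_error = population_sd / sqrt(n)
    margin_of_error = z * standard_error
    return sample_mean - margin_of_error, sample_mean + margin_of_error

# Hypothetical survey: mean spend of $50 from 100 visitors, known sd of $10.
low, high = z_confidence_interval(sample_mean=50, population_sd=10, n=100, z=1.96)
print(low, high)
```

A larger sample shrinks the standard error, so the interval narrows even though the confidence level stays the same.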

96

standard error

the standard deviation of the sample mean/proportion; it represents the accuracy of the sample mean/proportion as an estimate

97

when would you use the z distribution when trying to estimate a confidence interval

when the population standard deviation is known

the sample is normally distributed, or the sample is large

98

When would you use the t distribution when trying to estimate a confidence interval

population standard deviation is unknown

the sample is normally distributed, or the sample is large

99

when would you use the Z distribution when trying to estimate a confidence interval

for proportions, as you'll always know the population standard deviation

100

What are the Z values

99% = 2.576, 95% = 1.96, 90% = 1.645
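
These values can be reproduced from the standard normal distribution: for a two-sided interval, half of the leftover probability sits in each tail. A short sketch, where the helper name `z_critical` is illustrative:

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided z value: leaves (1 - confidence) / 2 in each tail."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.99, 0.95, 0.90):
    print(level, round(z_critical(level), 3))
```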
