Statistics - Unit 1

83 Terms

1

What is data?

Information about individuals or subjects in a population

New cards
2

What is a variable?

Any characteristic, numerical value, or quantity that can be measured or counted

New cards
3

What are two examples of variables?

Eye colour, height

New cards
4

What are the different types of data?

Qualitative (categorical) and quantitative (numerical)

New cards
5

What are the two types of quantitative/numerical data?

Discrete and continuous data

New cards
6

What is discrete data?

Countable data that only results in whole numbers

New cards
7

What is continuous data?

Measured data that can take any value within a range, including fractions and decimals

New cards
8

What is population?

The collection of individuals or subjects that are being studied by the researcher

New cards
9

What is a sample?

A subset of the population, ideally collected without bias so that it is highly representative of the total population

New cards
10

What is a census?

A collection of information from every individual or subject of the population

New cards
11

What is a parameter?

A numerical value or quantity measuring some aspect of the population

New cards
12

What is a statistic?

A numerical value or quantity measuring some aspect of a sample

New cards
13

What is distribution?

The pattern of variation in the data: which values occur and how often they occur

New cards
14

What are outliers?

Data points that are unusually high or unusually low when compared to the other data points

New cards
15

What is a frequency table?

A chart of the number of times each value occurs

New cards
16

What is a bar graph?

A visual display of data in which quantities are represented by bars of equal width

New cards
17

When are bar graphs used?

Bar graphs are used for discrete or categorical data

New cards
18

What is a histogram?

A graph of adjacent bars showing the frequency of data within each range (class interval)

New cards
19

When is a histogram used?

A histogram is used with continuous data

New cards
20

How is range calculated?

Max value - min value

New cards
21

How is the number of classes chosen?

Based upon the grapher’s discretion, usually a minimum of 5 bins and a maximum of 15

New cards
22

How is class interval/bin width calculated?

Range ÷ number of classes
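
As a rough illustration of the range and bin width calculations, here is a minimal Python sketch. The data values and the choice of 5 classes are invented for the example.

```python
# Hypothetical data set, invented for illustration only.
heights = [150, 152, 155, 158, 160, 163, 165, 168, 170, 175]

# Assumed choice, within the usual guideline of 5 to 15 classes.
number_of_classes = 5

data_range = max(heights) - min(heights)    # max value - min value
bin_width = data_range / number_of_classes  # range ÷ number of classes

print(data_range, bin_width)  # 25 5.0
```

In practice the bin width is often rounded up to a convenient number so that the classes cover every data point.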

New cards
23

What is a class interval or bin?

A range of values used to group continuous data

New cards
24

What is a sampling technique?

A method of selecting a sample that will be representative of the overall population

New cards
25

What are the defining characteristics of simple random sampling?

  • Every member of the population has an equal chance of being selected

  • The selection of any particular individual does not affect the chance of any other individual being chosen
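
A simple random sample can be sketched in Python with the standard library. The population of 100 numbered members, the sample size of 10, and the fixed seed are assumptions made only for this example.

```python
import random

# Hypothetical population: 100 members labelled 1 to 100.
population = list(range(1, 101))

# A fixed seed is used only so the sketch is repeatable.
rng = random.Random(42)

# Draw 10 distinct members; every member is equally likely to be picked.
sample = rng.sample(population, 10)

print(sample)
```

`random.Random.sample` draws without replacement, so no member can appear in the sample twice.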

New cards
26

What is the effectiveness of simple random sampling?

  • Reduction of sample bias

  • May not be representative of the population, but these deviations are due only to chance

New cards
27

What are the defining characteristics of stratified sampling?

  • The population is divided into groups of members who share common characteristics such as gender, age, education level, or geographic area; these groups are called strata

  • A stratified sample has the same proportion of members from each stratum as the population does

  • A simple random sample for the members of each stratum is taken
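
The proportional allocation described above might look like this in Python. The strata names, their sizes, and the sample size of 50 are invented for illustration.

```python
import random

# Hypothetical strata (invented): member IDs grouped by grade.
strata = {
    "grade 9": list(range(0, 200)),     # 200 members
    "grade 10": list(range(200, 350)),  # 150 members
    "grade 11": list(range(350, 500)),  # 150 members
}
sample_size = 50
population_size = sum(len(members) for members in strata.values())

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = {}
for name, members in strata.items():
    # Take the same proportion from each stratum as it has in the population.
    k = round(sample_size * len(members) / population_size)
    # A simple random sample is taken within the stratum.
    sample[name] = rng.sample(members, k)

print({name: len(chosen) for name, chosen in sample.items()})
# {'grade 9': 20, 'grade 10': 15, 'grade 11': 15}
```

With less convenient numbers, rounding can make the stratum counts add up to slightly more or less than the intended sample size, so the counts may need a manual adjustment.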

New cards
28

What is the effectiveness of stratified sampling?

  • Ensures each subgroup within the population receives proper representation

  • Cannot be used for every study, because every member of the population must be classifiable into a stratum

New cards
29

What are the defining characteristics of systematic sampling?

  • Used to sample a fixed percent of the population

  • A random starting point is chosen and every nth individual from that point is selected, where:

    n = population size ÷ sample size
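
A minimal Python sketch of this procedure, assuming a hypothetical population of 100 numbered members and a sample size of 10:

```python
import random

# Hypothetical population: 100 members labelled 1 to 100.
population = list(range(1, 101))
sample_size = 10

# n = population size ÷ sample size
n = len(population) // sample_size

# Random starting point within the first n members (fixed seed for repeatability).
rng = random.Random(1)
start = rng.randrange(n)

# Take every nth member from the starting point onward.
sample = population[start::n]

print(n, len(sample))  # 10 10
```

The sketch assumes the population size divides evenly by the sample size; otherwise the number of selected members can differ slightly from the target.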

New cards
30

What is the effectiveness of systematic sampling?

  • It is simple and is therefore popular among researchers

  • Low probability of contaminating data

  • If every nth data point shares a recurring characteristic, the sample may misrepresent the population

New cards
31

What are the defining characteristics of convenience sampling?

  • Made up of a conveniently available pool of respondents

  • Members are chosen based on proximity rather than population representation

New cards
32

What is the effectiveness of convenience sampling?

  • Commonly used as it is prompt, simple, and economical

  • Possibility of bias as some groups will be over-represented while others will be under-represented

  • Since the selection is biased, there will be inaccuracies in the study

New cards
33

What is sampling bias?

Inaccuracies in a study caused by selecting a sample in which some members of the population are more likely to be included than others

New cards
34

What are the defining characteristics of quota sampling?

  • Survey population is divided into mutually exclusive subgroups

  • Subgroups are selected with respect to known (non-random) features, traits, or interests

New cards
35

What is the effectiveness of quota sampling?

  • Inexpensive method of selecting a sample

  • Guarantees the inclusion of people you need

  • Participants are not randomly drawn and may have specific characteristics meaning it is impossible to know how well they represent the groups in a population

New cards
36

What is statistical bias?

Any factor that favours certain outcomes or responses, skewing the results; it can be unintentional or deliberate

New cards
37

What is cumulative frequency?

The running total of frequencies: the frequency of the current class added to the cumulative frequency of the previous class, so that the final value equals the total frequency

New cards
38

What is relative frequency?

The frequency of a class divided by the total frequency
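
Both cumulative and relative frequency can be computed from a frequency table in a few lines. The table below is a made-up example.

```python
from itertools import accumulate

# Hypothetical frequency table (invented values).
values = [1, 2, 3, 4]
frequencies = [3, 5, 8, 4]

total = sum(frequencies)

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Relative frequency: class frequency ÷ total frequency.
relative = [f / total for f in frequencies]

print(cumulative)  # [3, 8, 16, 20]
print(relative)    # [0.15, 0.25, 0.4, 0.2]
```

The last cumulative frequency equals the total frequency, and the relative frequencies add up to 1.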

New cards
39

What is the measure of central tendency?

The measure of the location of the middle of a data set, with the purpose of describing a set of numerical data using a single value

New cards
40

What are the measures of central tendency for ungrouped data?

Mean, median, and mode

New cards
41

What is the mode?

The value(s) that occur(s) the most often; there can be more than one mode depending on the distribution (e.g. bimodal distributions)

New cards
42

What is the mean?

The average of a set of values

New cards
43

How is mean calculated?

x̄ = (x₁ + x₂ + x₃ + … + xₙ) ÷ n

New cards
44

What measure of central tendency is the most common?

The mean

New cards
45

What is the median?

The middle value of a data set when the values are arranged in order

New cards
46

What measure of central tendency do outliers impact?

Mean

New cards
47

What measure of central tendency should be used if outliers are present?

Median
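
To see why the median is preferred when outliers are present, compare the measures on a small invented data set before and after adding one extreme value. This sketch uses Python's `statistics` module (`multimode` needs Python 3.8 or later).

```python
from statistics import mean, median, multimode

# Hypothetical data set, invented for illustration.
data = [2, 3, 3, 4, 5]
with_outlier = data + [100]  # same data plus one extreme value

print(mean(data), median(data), multimode(data))  # 3.4 3 [3]
print(mean(with_outlier), median(with_outlier))   # 19.5 3.5
```

The single outlier pulls the mean from 3.4 to 19.5, while the median only moves from 3 to 3.5.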

New cards
48

What measure of central tendency should be used if data is mostly symmetric?

Mean or median

New cards
49

What measure of central tendency should be used if frequency is important?

Mode

New cards
50

What measure of central tendency should be used if data is qualitative?

Mode

New cards
51

If a constant is added to each value in a data set, what is the impact on mean and standard deviation?

Mean would increase by the added value but standard deviation would not change

New cards
52

If a constant is multiplied by each value in a data set, what is the impact on mean and standard deviation?

Mean would be multiplied by the constant, and standard deviation would be multiplied by the absolute value of the constant
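
The effect of adding or multiplying by a constant can be checked numerically. The data set and the constants 10 and 3 below are invented; `pstdev` is the population standard deviation (dividing by n).

```python
from statistics import mean, pstdev

# Hypothetical data set with mean 5 and standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]

shifted = [x + 10 for x in data]  # add a constant to each value
scaled = [x * 3 for x in data]    # multiply each value by a constant

print(mean(data), pstdev(data))        # mean 5, standard deviation 2.0
print(mean(shifted), pstdev(shifted))  # mean 15, standard deviation still 2.0
print(mean(scaled), pstdev(scaled))    # mean 15, standard deviation 6.0
```

Adding 10 shifts every value equally, so the spread is unchanged; multiplying by 3 stretches the distances from the mean by the same factor.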

New cards
53

What is a weighted mean?

A measure of central tendency that reflects the relative importance of data

New cards
54

What is the formula for weighted mean?

x̄ = (∑ f × x) ÷ n, where n = ∑ f (the total frequency)

New cards
55

What does x represent in the formula for weighted mean?

The mid-interval value for a class interval

New cards
56

When is weighted mean used?

When a central tendency measurement is required for a set of grouped data
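
A short sketch of the weighted mean for grouped data, using the mid-interval value of each class as x. The class limits and frequencies are invented.

```python
# Hypothetical grouped data: (lower limit, upper limit, frequency).
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]

# n is the total frequency.
n = sum(f for _, _, f in classes)

# Sum of f × x, where x is the mid-interval value of each class.
weighted_sum = sum(f * (lower + upper) / 2 for lower, upper, f in classes)

grouped_mean = weighted_sum / n

print(n, weighted_sum, grouped_mean)  # 10 160.0 16.0
```

Because every value in a class is treated as if it sat at the mid-interval value, the result is an estimate of the mean of the original data.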

New cards
57

What is another term for a cumulative frequency graph?

An Ogive

New cards
58

How is a cumulative frequency graph built for ungrouped data?

Cumulative frequency is on the y axis and discrete data is on the x axis

New cards
59

How is a cumulative frequency graph built for grouped data?

Cumulative frequency is on the y axis and the upper class limit is on the x axis

New cards
60

What is a cumulative frequency graph used for?

To study the growth rate of data by showing the accumulation of frequency and to determine estimates of the percentiles and quartiles of the data

New cards
61

What are the features of the Ogive?

  • S-shape used to estimate some values

  • The median can be estimated by reading the x value at half of the final cumulative frequency

New cards
62

What are percentiles?

Values that divide a large ordered data set into 100 equal parts

New cards
63

What are quartiles?

Values that divide a large ordered data set into four equal parts

New cards
64

What is the point showing lower quartile on an Ogive?

The x point when the cumulative frequency is (n + 1) ÷ 4

New cards
65

What is the point showing the median on an Ogive?

The x point when the cumulative frequency is (n + 1) ÷ 2

New cards
66

What is the point showing the upper quartile on an Ogive?

The x point when the cumulative frequency is 3 * (n + 1) ÷ 4

New cards
67

What are the points showing percentiles on an Ogive?

The x point when the cumulative frequency is p * (n + 1) ÷ 100

New cards
68

What is the formula for interquartile range (IQR)?

IQR = Q₃ - Q₁
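
Reading quartiles off an Ogive can be imitated in code by joining the plotted points with straight lines and interpolating. This is only a sketch: a hand-drawn Ogive is a smooth curve, the helper `read_ogive` is a hypothetical name, and the class limits and cumulative frequencies are invented.

```python
def read_ogive(x_points, cumulative, target):
    """Return the x value where the cumulative frequency reaches `target`,
    using straight-line interpolation between plotted points.
    Assumes the cumulative frequencies are strictly increasing."""
    points = list(zip(x_points, cumulative))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1:
            return x0 + (target - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("target is outside the plotted cumulative frequencies")

# Hypothetical Ogive points: the first point is the lower limit of the
# first class (cumulative frequency 0), then each upper class limit.
x_points = [0, 10, 20, 30, 40]
cumulative = [0, 4, 12, 17, 19]
n = cumulative[-1]  # total frequency, 19

q1 = read_ogive(x_points, cumulative, (n + 1) / 4)      # lower quartile
median = read_ogive(x_points, cumulative, (n + 1) / 2)  # median
q3 = read_ogive(x_points, cumulative, 3 * (n + 1) / 4)  # upper quartile
iqr = q3 - q1

print(q1, median, q3, iqr)  # 11.25 17.5 26.0 14.75
```

A percentile is read the same way, with `p * (n + 1) / 100` as the target.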

New cards
69

What is the measure of spread?

A measure of how far the data points are spread out, for example their distances from the mean

New cards
70

Why is the measure of spread important?

It shows how well a mean represents the rest of the data

New cards
71

When is range used as a measure of spread?

When the sample sizes are small

New cards
72

What is variance?

A method of measuring spread by taking the sum of the squares of the difference between each data point and the mean, then dividing by the number of data points

New cards
73

What is the formula for variance (σ²)?

σ² = (∑ (x - x̄)²) ÷ n

New cards
74

What is standard deviation?

The square root of the variance, describing the typical distance of each piece of data from the mean; the smaller the standard deviation, the more compact the data set

New cards
75

What is the formula for standard deviation (σ)?

σ = √((∑ (x - x̄)²) ÷ n)
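
The variance and standard deviation formulas (dividing by n) can be worked through step by step. The data set below is invented.

```python
from math import sqrt

# Hypothetical data set, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
x_bar = sum(data) / n  # mean

# Variance: sum of squared differences from the mean, divided by n.
variance = sum((x - x_bar) ** 2 for x in data) / n

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(x_bar, variance, std_dev)  # 5.0 4.0 2.0
```

The squared differences are 9, 1, 1, 1, 0, 0, 4, and 16, which sum to 32; dividing by n = 8 gives a variance of 4 and a standard deviation of 2.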

New cards
76

Why is standard deviation an approximation for grouped data?

Because when data are grouped and the mid-interval value is used, the spread of observations within each interval is ignored, so the calculated standard deviation can differ from the true value

New cards
77

What is a box and whisker plot?

A plot showing the lower extreme, lower quartile, median, upper quartile, and upper extreme of a data set, with a box spanning the lower to upper quartiles and whiskers extending to the extremes

New cards
78

What do pirates and global warming show?

Correlation ≠ causation (two variables can be correlated without one causing the other)

New cards
79

What is the name for the values of r corresponding to correlations?

Pearson’s product-moment correlation coefficient
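
The cards do not give a formula for r. The sketch below uses the standard definition, r = Sxy ÷ √(Sxx × Syy), on invented data; the names `s_xy`, `s_xx`, and `s_yy` are chosen for this example.

```python
from math import sqrt

# Hypothetical paired data, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of products and squares of the deviations from the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

r = s_xy / sqrt(s_xx * s_yy)

print(round(r, 4))  # 0.7746
```

r always lies between -1 and 1; values near 1 or -1 indicate a strong linear relationship, and values near 0 a weak one.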

New cards
80

How do you draw a regression line?

  1. Plot the mean point (x̄, ȳ)

  2. Place the ruler on the mean point

  3. Align the ruler so that the numbers of data points above and below the line are roughly equal

  4. Draw the line

New cards
81

What is linear regression?

A method of modelling the relationship between two variables with a straight line of best fit

New cards
82

What are the two equations for lines of best fit?

y on x, y = ax + b
x on y, x = cy + d

New cards
83

How do you find a point (p, q) from a regression line?

  1. Find the mean of x

  2. Find the mean of y

  3. Draw two lines of best fit, one for y = ax + b and another for x = cy + d

  4. Find the point at which line y = ax + b intersects with line x = cy + d

  5. The calculated (x, y) coordinate is equal to (p, q), which is the mean point (x̄, ȳ)
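
The steps above can be checked numerically. This sketch assumes both lines are fitted by least squares, which the cards do not state explicitly, and uses invented data; a, b, c, and d follow the equations y = ax + b and x = cy + d.

```python
# Hypothetical paired data, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n  # mean of x
y_bar = sum(y) / n  # mean of y

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# y on x: y = ax + b (least-squares assumption)
a = s_xy / s_xx
b = y_bar - a * x_bar

# x on y: x = cy + d (least-squares assumption)
c = s_xy / s_yy
d = x_bar - c * y_bar

# Intersection: substitute x = cy + d into y = ax + b and solve for y.
# This fails if a * c == 1, which happens when the two lines coincide.
q = (a * d + b) / (1 - a * c)
p = c * q + d

print(round(p, 6), round(q, 6))  # 3.0 4.0
print(x_bar, y_bar)              # 3.0 4.0
```

The intersection (p, q) matches the mean point, up to floating-point rounding.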

New cards
