Statistics - Unit 1

Description and Tags

Math

IB Mathematics: Analysis and Approaches (SL)

Statistics and probability

Last updated 3:26 PM on 10/17/23

83 Terms

What is data?

Information about individuals or subjects in a population

What is a variable?

Any characteristic, numerical value, or quantity that can be measured or counted

What are two examples of variables?

Eye colour, height

What are the different types of data?

Qualitative (categorical) and quantitative (numerical)

What are the two types of quantitative/numerical data?

Discrete and continuous data

What is discrete data?

Countable data that takes only distinct, separate values, usually whole numbers

What is continuous data?

Measured data that can take any value within a range, so recorded values are only as exact as the measurement allows

What is population?

The collection of individuals or subjects that are being studied by the researcher

What is a sample?

A subset of the population; a good sample is selected without bias and is highly representative of the whole population

What is a census?

A collection of information from every individual or subject of the population

What is a parameter?

A numerical value or quantity measuring some aspect of the population

What is a statistic?

A numerical value or quantity measuring some aspect of a sample

What is distribution?

The pattern of variation in the data: which values occur and how often each occurs

What are outliers?

Data points that are unusually high or unusually low compared with the other data points

What is a frequency table?

A chart of the number of times each value occurs

What is a bar graph?

A visual display of data in which quantities are represented by bars of equal width

When are bar graphs used?

Bar graphs are used for discrete or categorical data

What is a histogram?

A graph of adjacent bars in which each bar shows the frequency of data within a range of values (class interval)

When is a histogram used?

A histogram is used with continuous data

How is range calculated?

Max value - min value

How is the number of classes chosen?

Based upon the grapher’s discretion, usually a minimum of 5 bins and a maximum of 15

How is class interval/bin width calculated?

Range/# of classes
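
The range and bin width calculations above can be sketched in Python. The data, the choice of 6 classes, and the decision to round the width up to a whole number are illustrative assumptions, not part of the original cards.

```python
import math

# Hypothetical continuous data (heights in cm), made up for illustration.
heights = [150.2, 153.8, 158.1, 160.4, 161.9, 164.3, 167.7, 170.0, 172.6, 179.8]

data_range = max(heights) - min(heights)   # range = max value - min value
num_classes = 6                            # assumed choice, within the usual 5 to 15

# Bin width = range / number of classes, rounded up here (an assumption)
# so that the classes are tidy and still cover every value.
bin_width = math.ceil(data_range / num_classes)

# Class boundaries, starting from the minimum rounded down.
start = math.floor(min(heights))
bin_edges = [start + i * bin_width for i in range(num_classes + 1)]

print(data_range, bin_width, bin_edges)
```

Rounding the width up is one common convention; an exact width of range ÷ classes works too but gives less convenient boundaries.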

What is a class interval or bin?

A range of values used to group continuous data

What is a sampling technique?

A method of selecting a sample that will be representative of the overall population

What are the defining characteristics of simple random sampling?

Every member of the population has an equal chance of being selected
The selection of any particular individual does not affect the chance of any other individual being chosen
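
A minimal sketch of simple random sampling, assuming a hypothetical population of 100 numbered members and a sample of 10. The fixed seed is only there to make the sketch repeatable.

```python
import random

# Hypothetical population: members numbered 1 to 100.
population = list(range(1, 101))

# A seeded generator so the sketch gives the same result on every run.
rng = random.Random(42)

# random.sample draws without replacement; every member is equally
# likely to end up in the sample.
sample = rng.sample(population, k=10)

print(sorted(sample))
```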

What is the effectiveness of simple random sampling?

Reduction of sample bias
May not be representative of the population, but any such deviations are due only to chance

What are the defining characteristics of stratified sampling?

The population is divided into groups of members who share common characteristics such as gender, age, education level, or geographic area; these groups are called strata
A stratified sample has the same proportion of members from each stratum as the population does
A simple random sample for the members of each stratum is taken
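
The three points above can be sketched as follows. The two strata, their sizes, and the sample size of 10 are made-up values chosen so the proportions come out as whole numbers.

```python
import random

# Hypothetical population, already classified into strata (year groups).
strata = {
    "Year 11": [f"Y11-{i}" for i in range(60)],
    "Year 12": [f"Y12-{i}" for i in range(40)],
}
sample_size = 10
population_size = sum(len(members) for members in strata.values())

rng = random.Random(0)  # seeded only for repeatability
stratified_sample = {}
for name, members in strata.items():
    # Keep the same proportion in the sample as in the population.
    k = round(sample_size * len(members) / population_size)
    # Take a simple random sample within the stratum.
    stratified_sample[name] = rng.sample(members, k)

print({name: len(chosen) for name, chosen in stratified_sample.items()})
```

With less convenient stratum sizes the rounded counts may not add up to the intended sample size, so they would need adjusting by hand.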

What is the effectiveness of stratified sampling?

Ensures each subgroup within the population receives proper representation
Cannot be used for every study, because every member of the population must be classifiable into exactly one stratum

What are the defining characteristics of systematic sampling?

Used to sample a fixed percentage of the population
A random starting point is chosen and every nth individual from that point is selected, where:
n = population size ÷ sample size
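
A minimal sketch of systematic sampling with a hypothetical population of 100 and a sample of 10. Choosing the random start within the first n members is an assumption made here so that the sample has exactly the intended size.

```python
import random

# Hypothetical ordered population: members numbered 1 to 100.
population = list(range(1, 101))
sample_size = 10

# Interval: n = population size / sample size.
n = len(population) // sample_size

rng = random.Random(1)      # seeded only for repeatability
start = rng.randrange(n)    # random starting index within the first interval

# Take every nth member from the starting point.
sample = population[start::n]

print(n, sample)
```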

What is the effectiveness of systematic sampling?

It is simple and is therefore popular among researchers
Low probability of contaminating data
If every nth data point shares a particular characteristic, the sample may misrepresent the population

What are the defining characteristics of convenience sampling?

Made up of a conveniently available pool of respondents
Members are chosen based on proximity rather than population representation

What is the effectiveness of convenience sampling?

Commonly used as it is prompt, simple, and economical
Possibility of bias as some groups will be over-represented while others will be under-represented
Since the selection is biased, there will be inaccuracies in the study

What is sampling bias?

Inconsistencies in studies caused by biased selection of samples

What are the defining characteristics of quota sampling?

Survey population is divided into mutually exclusive subgroups
Subgroups are selected with respect to known (non-random) features, traits, or interests

What is the effectiveness of quota sampling?

Inexpensive method of selecting a sample
Guarantees the inclusion of people you need
Participants are not randomly drawn and may have specific characteristics, so it is impossible to know how well they represent the groups in the population

What is statistical bias?

Any factor that favours certain outcomes or responses and so skews the results; it can be unintentional or deliberate

What is cumulative frequency?

The running total of the frequencies: the frequency of the current class added to the cumulative frequency of the previous class, so that the final value equals the total frequency

What is relative frequency?

The frequency of a class divided by the total frequency
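
Cumulative and relative frequency can be computed from a frequency table as sketched below. The classes and frequencies are invented for illustration.

```python
from itertools import accumulate

# Hypothetical frequency table.
classes = ["0-9", "10-19", "20-29", "30-39"]
frequencies = [3, 7, 6, 4]

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Relative frequency: class frequency divided by the total frequency.
total = cumulative[-1]
relative = [f / total for f in frequencies]

for row in zip(classes, frequencies, cumulative, relative):
    print(row)
```

The last cumulative frequency equals the total frequency, and the relative frequencies add up to 1.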

What is the measure of central tendency?

A measure of the location of the middle of a data set, with the purpose of describing a set of numerical data using a single value

What are the measures of central tendency for ungrouped data?

Mean, median, and mode

What is the mode?

The value(s) that occur(s) the most often, and can be more than one value depending on the distribution (ex. bimodal distributions)

What is the mean?

The average of a set of values

How is mean calculated?

x̄ = (x₁ + x₂ + x₃ + … + xₙ) ÷ n

What measure of central tendency is the most common?

The mean

What is the median?

The middle value in a data distribution
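
The mode, mean, and median for ungrouped data can be sketched with Python's standard `statistics` module. The scores are made up and chosen to be bimodal.

```python
import statistics

# Hypothetical ungrouped data set with two modes.
scores = [4, 7, 7, 8, 9, 9, 12]

# Mean: sum of the values divided by the number of values.
mean = sum(scores) / len(scores)

# Median: middle value of the ordered data.
median = statistics.median(scores)

# Mode(s): multimode returns every value tied for most frequent.
modes = statistics.multimode(scores)

print(mean, median, modes)
```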

What measure of central tendency do outliers impact?

Mean

What measure of central tendency should be used if outliers are present?

Median
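
A small illustration of why the median is preferred when outliers are present. The values are invented; the single large value plays the role of the outlier.

```python
import statistics

# Hypothetical data without and with one outlier.
values = [30, 32, 35, 36, 37]
with_outlier = values + [250]

mean_before = statistics.mean(values)
median_before = statistics.median(values)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# The mean is pulled strongly toward the outlier; the median barely moves.
print(mean_before, mean_after)
print(median_before, median_after)
```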

What measure of central tendency should be used if data is mostly symmetric?

Mean or median

What measure of central tendency should be used if frequency is important?

Mode

What measure of central tendency should be used if data is qualitative?

Mode

If a constant is added to each value in a data set, what is the impact on mean and standard deviation?

Mean would increase by the added value but standard deviation would not change

If a constant is multiplied by each value in a data set, what is the impact on mean and standard deviation?

Mean would be multiplied by the constant, and standard deviation would be multiplied by the absolute value of the constant
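
The effect of adding or multiplying by a constant can be checked numerically. The data set and the constants 10 and 3 are arbitrary choices for illustration; `pstdev` is the population standard deviation (dividing by n).

```python
import statistics

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

shifted = [x + 10 for x in data]   # add a constant to every value
scaled = [x * 3 for x in data]     # multiply every value by a constant

for label, values in [("original", data), ("plus 10", shifted), ("times 3", scaled)]:
    print(label, statistics.mean(values), statistics.pstdev(values))
```

Adding 10 moves the mean from 5 to 15 and leaves the standard deviation at 2, while multiplying by 3 takes the mean to 15 and the standard deviation to 6.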

What is a weighted mean?

A measure of central tendency that reflects the relative importance of data

What is the formula for weighted mean?

x̄ = (∑ f × x) ÷ n, where f is the frequency of each class and n = ∑ f is the total frequency

What does x represent in the formula for weighted mean?

The mid-interval value for a class interval
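
A sketch of the weighted mean for grouped data, using the mid-interval value of each class as x. The class intervals and frequencies are hypothetical.

```python
# Hypothetical grouped data: (lower, upper) class boundaries and frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [3, 7, 6, 4]

# x is the mid-interval value of each class.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# n is the total frequency.
n = sum(frequencies)

# Weighted mean = (sum of f * x) / n.
weighted_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

print(midpoints, n, weighted_mean)
```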

When is weighted mean used?

When a central tendency measurement is required for a set of grouped data

What is another term for a cumulative frequency graph?

An Ogive

How is a cumulative frequency graph built for ungrouped data?

Cumulative frequency is on the y axis and discrete data is on the x axis

How is a cumulative frequency graph built for grouped data?

Cumulative frequency is on the y axis and the upper class limit is on the x axis

What is a cumulative frequency graph used for?

To show how frequency accumulates across the data and to estimate the percentiles and quartiles of the data

What are the features of the Ogive?

An S-shaped curve from which values can be estimated
The median can be estimated by reading the x-value at half of the final cumulative frequency

What are percentiles?

Values that divide a large set of ordered data into 100 equal parts

What are quartiles?

Values that divide a large set of ordered data into four equal parts

What is the point showing lower quartile on an Ogive?

The x-value at which the cumulative frequency is (n + 1) ÷ 4

What is the point showing the median on an Ogive?

The x-value at which the cumulative frequency is (n + 1) ÷ 2

What is the point showing the upper quartile on an Ogive?

The x-value at which the cumulative frequency is 3 × (n + 1) ÷ 4

What are the points showing percentiles on an Ogive?

The x-value at which the cumulative frequency is p × (n + 1) ÷ 100
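
Reading quartiles and percentiles off an ogive can be approximated in code by joining the plotted points with straight lines. The helper `read_ogive`, the class boundaries, and the frequencies are all hypothetical, and straight-line interpolation is an assumption; a hand-drawn smooth curve would give slightly different estimates.

```python
from itertools import accumulate

def read_ogive(target_cf, boundaries, cumulative):
    """Estimate the x-value where the ogive reaches target_cf.

    boundaries: lowest class boundary followed by each upper class limit.
    cumulative: cumulative frequency at each upper class limit.
    """
    points = list(zip(boundaries, [0] + cumulative))
    for (x0, cf0), (x1, cf1) in zip(points, points[1:]):
        if cf0 < target_cf <= cf1:
            # Linear interpolation between two neighbouring plotted points.
            return x0 + (target_cf - cf0) / (cf1 - cf0) * (x1 - x0)
    raise ValueError("target_cf is outside the range of the ogive")

# Hypothetical grouped data.
boundaries = [0, 10, 20, 30, 40]
frequencies = [3, 7, 6, 4]
cumulative = list(accumulate(frequencies))
n = cumulative[-1]

# Positions taken from the cards above.
q1 = read_ogive((n + 1) / 4, boundaries, cumulative)
median = read_ogive((n + 1) / 2, boundaries, cumulative)
q3 = read_ogive(3 * (n + 1) / 4, boundaries, cumulative)
p90 = read_ogive(90 * (n + 1) / 100, boundaries, cumulative)

print(round(q1, 2), round(median, 2), round(q3, 2), round(p90, 2))
```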

What is the formula for interquartile range (IQR)?

IQR = Q₃ - Q₁
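
A short sketch of the IQR for ungrouped data. The data set is made up, and the `"exclusive"` method of `statistics.quantiles` is used on the assumption that it matches the (n + 1) positions used on the cards for this set.

```python
import statistics

# Hypothetical ordered data set with 11 values.
data = [2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 19]

# quantiles with n=4 returns the three cut points Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")

# IQR = Q3 - Q1
iqr = q3 - q1

print(q1, median, q3, iqr)
```

With 11 values, the quartile positions (n + 1) ÷ 4 = 3 and 3 × (n + 1) ÷ 4 = 9 fall exactly on the 3rd and 9th data points.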

What is the measure of spread?

A value describing how spread out the data are, for example how far the data points lie from the mean

Why is the measure of spread important?

It shows how well a mean represents the rest of the data

When is range used as a measure of spread?

When the sample sizes are small

What is variance?

A method of measuring spread by taking the mean of the squares of the differences between each data point and the average

What is the formula for variance (σ²)?

σ² = (∑ (x - x̄)²) ÷ n

What is standard deviation?

The square root of the variance, which measures the typical distance of each piece of data from the mean; the smaller the standard deviation, the more compact the data set

What is the formula for standard deviation (σ)?

σ = √((∑ (x - x̄)²) ÷ n)
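
The variance and standard deviation formulas can be worked through step by step. The data set is hypothetical and chosen so that the results are whole numbers.

```python
import math

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

# Mean of the data.
mean = sum(data) / n

# Variance: sum of squared differences from the mean, divided by n.
squared_differences = [(x - mean) ** 2 for x in data]
variance = sum(squared_differences) / n

# Standard deviation: square root of the variance.
standard_deviation = math.sqrt(variance)

print(mean, variance, standard_deviation)
```

These formulas divide by n, which corresponds to `statistics.pvariance` and `statistics.pstdev` in Python's standard library rather than the sample versions that divide by n - 1.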

Why is standard deviation an approximation?

Because when data are grouped and the midpoint of each interval is used, the spread of observations within the interval is ignored, so the calculated standard deviation can differ from the true value

What is a box and whisker plot?

A plot showing the lower extreme, lower quartile, median, upper quartile, and upper extreme of a data set, with a box spanning the lower to upper quartiles and whiskers extending to the extremes
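
The five values that a box and whisker plot displays can be computed as sketched below. The data are invented, and the `"exclusive"` quartile method is an assumption about which convention is wanted.

```python
import statistics

# Hypothetical ordered data set with 7 values.
data = [12, 15, 17, 18, 20, 22, 25]

lower_extreme = min(data)
upper_extreme = max(data)
lower_quartile, median, upper_quartile = statistics.quantiles(
    data, n=4, method="exclusive"
)

# The box runs from the lower to the upper quartile;
# the whiskers reach out to the two extremes.
five_number_summary = (
    lower_extreme, lower_quartile, median, upper_quartile, upper_extreme
)

print(five_number_summary)
```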
