Statistics
___________ - a set of mathematical procedures for:
1) Organizing + summarizing info → describe
2) Interpreting information (help answer questions) → inferences
Population
____________ - the set of all individuals of interest in a particular study
Ex: Chinese Americans over 65
Sample
__________ - A set of individuals selected from a population intended to represent the population in a research study
Ex: 200 Chinese American participants over 65
Random sample
___________ - a sample where everyone in a population has an equal chance of being picked to better represent a population
Variable
__________ - a characteristic or condition that changes or has different values for different individuals
Data
___________ - measurements or observations
Data Set
____________ - a structured collection of measurements or observations
Ex: A chart of height and weight of kittens
Datum
___________ - a single measurement or observation, commonly called a score or a raw score
Parameter
_________ - a characteristic that describes a population
Ex: The average running speed of 11-year-old girls in Saskatchewan
Statistic
_________ - A characteristic that describes a sample
Ex: The average speed of thirty 11-year-old males in a study
Descriptive statistics
____________ - statistical procedures used to summarize, organize, and simplify data
Ex: Averaging, tables, graphs
Inferential statistics
__________ - consist of techniques that allow us to study samples and then make generalizations about populations from which they were selected
Representative sample
_________ - a sample that reflects the general population, used in inferential statistics to better make generalizations about a population
Sampling error
___________ - the naturally occurring discrepancy that exists between a sample statistic and the corresponding population parameter
Related to margin of error (the margin of error is a bound on how large this discrepancy is likely to be)
When inferring, you do not have every individual accounted for
Ex: A population with an average IQ of 100 but a sample with an average IQ of 107
Why we use n-1 in variance formula
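A minimal sketch of sampling error, assuming a made-up population of IQ-like scores (the values, the seed, and the helper name `sampling_error` are all illustrative assumptions, not from the notes). It draws one random sample and reports how far the sample mean M lands from the population mean μ.

```python
import random
from statistics import mean

# Hypothetical population of 1,000 IQ-like scores (made-up values).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [round(rng.gauss(100, 15)) for _ in range(1000)]

def sampling_error(sample, population):
    """Sample statistic (M) minus the corresponding population parameter (mu)."""
    return mean(sample) - mean(population)

sample = rng.sample(population, 25)  # one random sample, n = 25
error = sampling_error(sample, population)
print(f"mu = {mean(population):.2f}, M = {mean(sample):.2f}, error = {error:+.2f}")
```

A different seed gives a different sample and so a different error, which is the point: the discrepancy occurs naturally whenever a sample stands in for the population.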
Constructs
_________ - internal attributes or characteristics that cannot be directly observed but are useful for describing and explaining behaviour
Ex: Self-esteem or hunger
Operational definition
____________ - identifies a measurement procedure for measuring external behaviour and uses the resulting measurements as a definition and a measurement of a hypothetical construct
Ex: Heart rate while giving a speech to measure anxiety levels
Discrete variable
____________ - consists of separate indivisible categories. No values can exist between two neighbouring categories
Ex: Number of students in a class (there can't be half a student) or majors of students
Continuous variable
___________ - when there are an infinite number of possible values that fall between two observed values.
Ex: Time
Real limits
___________ - the boundaries of intervals for scores that are represented on a continuous number line. Located halfway between scores
Ex: 49.5 and 50.5 are the limits for what counts as a score of 50
Upper ______: the highest to round (50.5)
Lower _______: the lowest to round (49.5)
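A small sketch of the real-limits rule, assuming scores are measured to the nearest whole unit unless stated otherwise (the function name `real_limits` and the `unit` parameter are hypothetical). The limits sit half a unit below and above the score.

```python
def real_limits(score, unit=1.0):
    """Return (lower real limit, upper real limit) for a score
    measured to the nearest `unit`."""
    half = unit / 2
    return (score - half, score + half)

print(real_limits(50))  # (49.5, 50.5)
print(real_limits(8))   # (7.5, 8.5)
```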
Nominal scale
___________ - a set of categories that have different names. Measurements on this scale label categories as observations but do not make any quantitative distinctions between observations
Ex: Hair color
Ordinal scale
___________ - consists of a set of categories that are organized in an ordered sequence. Measurements on an ________ rank observations in terms of size and magnitude
Ex: First, second, third
Interval scale
__________ - consists of ordered categories that are all intervals of exactly the same size. Zero point on an interval scale is arbitrary and does not indicate a 0 amount being measured
Ex: Temperature
Ratio scale
__________ - An interval scale with an absolute 0 point representing nothing/an absolute zero. Ratios of numbers reflect ratios of magnitude
Correlational method
____________ - two different variables are observed to see whether there is a relationship between them
Does NOT determine causation
Frequency distribution
_________- an organized tabulation of the number of individuals located in each category on the scale of measurement
Proportion + percentage
__________ - are two other measures besides frequency distribution that describe distribution of scores and can be incorporated into the table
Proportion
_________- the fraction of the total group associated with each score
Parts of a whole— can be described as fractions, decimals or percentages
p= f/n
Percentage formula
% = p(100), where p = proportion
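A quick sketch of adding proportion and percentage columns to a frequency distribution, using p = f/N and % = p(100). The scores and the function name `proportion_table` are made up for illustration.

```python
from collections import Counter
from fractions import Fraction

scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # hypothetical scores, N = 10

def proportion_table(scores):
    """Map each score X to (f, p, percent), with p = f/N and % = p(100)."""
    n = len(scores)
    freq = Counter(scores)
    return {x: (f, Fraction(f, n), 100 * f / n) for x, f in sorted(freq.items())}

table = proportion_table(scores)
for x, (f, p, pct) in table.items():
    print(f"X = {x}  f = {f}  p = {float(p):.2f}  % = {pct:.0f}%")
```

The proportions always sum to 1 (and the percentages to 100), which is a handy check on a hand-built table.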
Percentile rank
__________ - a particular score is defined as the percentage of individuals in the distribution with scores at or below the particular value
Ex: A rank of 80th percentile means the student scored better than 80% of other test-takers, while only 20% scored higher.
Percentile
_____________- specific value or score— when a score is identified by its percentile rank
Ex: A score of 9 is within the 95th percentile
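A rough sketch of percentile rank and percentile using the "at or below" definition. It works on raw scores and does not interpolate with real limits the way some textbook methods do, so treat it as an assumption-laden illustration; the data and function names are hypothetical.

```python
def percentile_rank(scores, x):
    """Percentage of individuals with scores at or below x."""
    at_or_below = sum(1 for s in scores if s <= x)
    return 100 * at_or_below / len(scores)

def percentile(scores, rank):
    """Smallest score whose percentile rank is at least `rank`."""
    for s in sorted(set(scores)):
        if percentile_rank(scores, s) >= rank:
            return s
    raise ValueError("rank must be between 0 and 100")

scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # hypothetical scores, N = 10
print(percentile_rank(scores, 3))  # 60.0 -> a score of 3 has a percentile rank of 60%
print(percentile(scores, 80))      # 4    -> the 80th percentile is a score of 4
```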
Cumulative frequency
_____________ - the accumulation of individuals as you move up the scale, listed as cf
Sum of all frequency counts of scores up to and including the upper real limit of a given score
Cumulative percentage
____________ - shows the percentage of individuals as you move up the scale, listed as c%
The percentage of all frequency counts of scores in a freq. dist. up to and including the upper real limit of a given score
Cumulative percentage formula
c% = (cf/N)(100%)
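A short sketch of building the cf and c% columns, assuming the usual formula c% = (cf/N)(100%). The scores and the name `cumulative_table` are made up for illustration.

```python
from collections import Counter
from itertools import accumulate

def cumulative_table(scores):
    """Rows of (X, f, cf, c%) from the lowest score up, with c% = (cf/N)(100%)."""
    freq = Counter(scores)
    xs = sorted(freq)
    cfs = accumulate(freq[x] for x in xs)  # running total of frequencies
    n = len(scores)
    return [(x, freq[x], cf, 100 * cf / n) for x, cf in zip(xs, cfs)]

scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]  # hypothetical scores, N = 10
rows = cumulative_table(scores)
for x, f, cf, cpct in rows:
    print(f"X = {x}  f = {f}  cf = {cf}  c% = {cpct:.0f}%")
```

The last row should always show cf = N and c% = 100%.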
Real limits
___________ - Frequency distribution uses ________ when using continuous data because it measures in intervals
Ex: X=8 could be 7.5 - 8.5
Abscissa
____________ - another name for the X axis
Ordinate
__________- another name for the Y axis
Measurement scales
__________ - how variables (x values) are categorized, counted, or measured
NOIR: Nominal, ordinal, interval, ratio
Histograms + polygons
___________ - Two graphing options for data using interval or ratio scale
Characteristics of a histogram
__________ - Made of bars, no spaces between bars
Informal histogram
__________ - Instead of bars the graph consists of stacked blocks.
Characteristics of a polygon
_____________ - Dots representing data are placed above each score and connected by a line; the line is ended by drawing it down to the x axis
Bar graph
_________ - What graph can be used for nominal or ordinal data?
Characteristics of a bar graph
__________ - For nominal scale, there are spaces between bars to indicate separate categories
For ordinal, it is because you can't assume the categories are the same size
Smooth curves
____________ - What graphs can be used for population distribution? (Interval or ratio scale)
Normal curve
_________ - a smooth curve graph with a single peak in the middle that is symmetrical on both sides
Ex: IQ scores
Cannot be bimodal
Symmetrical distribution
____________ - the scores tend to pile up toward the middle parts of the scale and taper off gradually at the other ends
Also applies to bimodal
Positively skewed
__________ - tail moving towards the right, body to the left
Ex: It is positive if you’re drunk and closer to the wall
Negatively skewed
___________ - tail going toward the left, body to the right
Ex: It is negative if you’re not close to the wall
Central tendency
__________ - statistical measure to determine a single score that defines the center of a distribution of scores
Goal is to find the most typical or representative value!
Mean, median, & mode are most used
Mean esp.
Mean
Central tendency
__________ - the sum of scores divided by the number of scores
Represented by— M, Xbar or mu
Associated w. interval or ratio data
Median
Central tendency
_________ - the score corresponding to the point having 50% of the observations below it [and 50% above] when the observations are arranged in numerical order. Does not always change when adding or dropping a score
When is the median preferred?
Central tendency
___________ - Ordinal data or continuous data that is skewed
Mode
Central tendency
__________ - the most commonly occurring score in a sample or population. Does not always change with a new score or a change in score
People in the population/sample MUST have this as a score
When is the mode preferred?
Central tendency
_________ - for nominal scale, discrete variables
The mode is also useful for describing shape when used along with the mean
3 measures of central tendency
Central tendency
_________- Mean, Median, Mode
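A quick comparison of the three measures using Python's standard `statistics` module. The scores are made up, with one deliberate outlier (40) to show why the median is preferred for skewed data.

```python
from statistics import mean, median, multimode

scores = [2, 3, 3, 5, 7, 40]  # hypothetical scores; 40 is an outlier

m = mean(scores)          # sum of scores / number of scores = 60 / 6
mdn = median(scores)      # midpoint of the two middle scores, 3 and 5
modes = multimode(scores) # most frequently occurring score(s)

print("mean:", m)      # 10, pulled toward the outlier
print("median:", mdn)  # 4.0, barely affected by the outlier
print("mode:", modes)  # [3]
```

`multimode` returns a list, so a bimodal distribution shows up as two values rather than an error.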
Examples of mean
Central tendency
__________ - examples include: report card, sports
Examples of median
Central tendency
__________ - examples include: household income, salary
Examples of mode
Central tendency
__________ - examples include: retail sales, election voting
Variability
_________- a group of quantitative measures of the differences between scores + describes the degree to which the scores are spread out or clustered together
Includes: Range, IQR, variance, standard deviation
Range, variance, and standard deviation are GREATLY affected by outliers (IQR less so)
Two purposes of variability
Variability
__________-
Describes the distribution of scores
Helps determine how representative a score is to the entire distribution
Range
Variability
________ - distance covered by scores in a distribution. Determined by only the most extreme high and low scores in a distribution; doesn’t cover all scores, so it is not the most accurate
Crude and unreliable measure of variability
Not really used in formal descriptions cuz of this
Range formula
Variability
________ - For this class, focuses on these two formulas:
For discrete variables:
Xmax - Xmin + 1
For continuous variables:
XmaxURL (upper real limit) - XminLRL (lower real limit)
Simple ______ is Xmax-Xmin
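The three range formulas side by side, as a small sketch. It assumes scores are measured to the nearest whole unit, so real limits are 0.5 above and below; the function names and data are hypothetical.

```python
def range_simple(scores):
    """Xmax - Xmin"""
    return max(scores) - min(scores)

def range_discrete(scores):
    """Xmax - Xmin + 1 (counts both end categories)"""
    return max(scores) - min(scores) + 1

def range_continuous(scores, unit=1.0):
    """Upper real limit of Xmax minus lower real limit of Xmin."""
    url = max(scores) + unit / 2
    lrl = min(scores) - unit / 2
    return url - lrl

scores = [3, 7, 12]  # hypothetical scores
print(range_simple(scores))      # 9
print(range_discrete(scores))    # 10
print(range_continuous(scores))  # 10.0 (12.5 - 2.5)
```

With whole-number scores the discrete and continuous formulas give the same number; they differ only in the reasoning behind the extra unit.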
Interquartile range (IQR)
Variability
________ - the range of scores that make up the middle 50% of a dist. Based on quartiles, a type of percentile rank. Bottom and top 25% of distribution are excluded
For semi ______, simply divide the _____ by 2.
Q1 = 25th percentile (25% of scores fall below it)
Q2 = 50th percentile
Q3 = 75th percentile
Q4 = 100th percentile (highest score)
IQR formula
Variability
__________ - Q3 - Q1
For semi ______, simply divide the _____ by 2
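A sketch of IQR = Q3 - Q1 and the semi-interquartile range using `statistics.quantiles`. Note the assumption: quartile conventions differ between textbooks and software, and this uses the module's default ("exclusive") method, so hand calculations from class may give slightly different quartiles on other data sets. The scores are made up.

```python
from statistics import quantiles

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical scores

# n=4 splits the data into quartiles and returns the cut points [Q1, Q2, Q3]
q1, q2, q3 = quantiles(scores, n=4)

iqr = q3 - q1        # range of the middle 50% of scores
semi_iqr = iqr / 2   # semi-interquartile range

print(q1, q2, q3)  # 3.0 6.0 9.0
print(iqr)         # 6.0
print(semi_iqr)    # 3.0
```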
Deviation
Variability
______- the distance of a score from the mean
For populations, deviation score = X - μ
For samples, Deviation score = X - M
Sum of Squares (SS)
Variability
__________ - the sum of the squared deviation scores
Represented by the symbol SS; SS = Σ(X - μ)²
Variance
Variability
___________ - the average squared [raised to the second power] distance from the mean, represented by σ² or s²— measures how spread out a set of numbers is from its average (mean), calculated as the average of the squared differences from the mean
Population: σ² = SS/N
Sample: s² = SS/(n - 1)
Standard deviation
Variability
_________ - the square root of the variance and provides a measure of the average distance from the mean
Represented by σ or s
Standard deviation formula
Variability
σ = sqrt(Σ(X - μ)² / N) for a population; s = sqrt(Σ(X - M)² / (n - 1)) for a sample
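A worked sketch that chains deviation scores, SS, variance, and standard deviation together. It assumes the usual convention of dividing SS by N for a population and by n - 1 for a sample; the function names and the scores are made up for illustration.

```python
import math

def sum_of_squares(scores):
    """SS = sum of squared deviations from the mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

def variance(scores, sample=False):
    """SS/N for a population, SS/(n - 1) for a sample."""
    n = len(scores)
    return sum_of_squares(scores) / (n - 1 if sample else n)

def std_dev(scores, sample=False):
    """Square root of the variance."""
    return math.sqrt(variance(scores, sample))

scores = [1, 9, 5, 8, 7]  # hypothetical scores, mean = 6
print(sum_of_squares(scores))         # 40.0 (25 + 9 + 1 + 4 + 1)
print(variance(scores))               # 8.0  (population: 40 / 5)
print(variance(scores, sample=True))  # 10.0 (sample: 40 / 4)
print(round(std_dev(scores), 3))      # 2.828
```

The sample version comes out larger because dividing by n - 1 corrects for samples tending to be less variable than the populations they come from.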
Degrees of freedom
___________- The number of scores in the sample that are independent & free to vary
Z-score
__________ - a measure of how many standard deviations you are away from the average/mean. Used to describe a location within a distribution using a single number.
Allows for comparisons between distributions w/ diff means and std. deviations
Comparisons are based on the equivalent magnitude of differences from the mean
Z-score formula
_________- z = (X - μ)/σ
Standardized (z) distribution
__________- a symmetrical distribution composed of scores that have been transformed to create predetermined values for μ & σ.
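A small sketch of z = (X - μ)/σ, first for comparing scores from two different distributions and then for standardizing a whole set of scores. The exam numbers, the data, and the function names are hypothetical; the scores are treated as a population (SS divided by N).

```python
import math

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def standardize(scores):
    """Convert every score in a population to a z-score."""
    n = len(scores)
    mu = sum(scores) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in scores) / n)
    return [z_score(x, mu, sigma) for x in scores]

# Comparing across distributions: which exam result is relatively better?
print(z_score(70, mu=60, sigma=10))  # 1.0
print(z_score(65, mu=50, sigma=5))   # 3.0 -> lower raw score, better standing

scores = [2, 4, 4, 4, 5, 5, 7, 9]    # mean = 5, sigma = 2
z = standardize(scores)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After standardizing, the z-scores have a mean of 0 and a standard deviation of 1, which is what "predetermined values for μ and σ" refers to.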
Probability
___________ - for a situation in which several different outcomes are possible, the __________ for any specific outcome is defined as a fraction or a proportion of all possible outcomes.
If possible outcomes are identified as A, B, C, D and so on then
probability of A = number of outcomes classified as A / total number of possible outcomes
Told in fractions or decimals (not percentages unless asked)
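A sketch of the counting definition of probability using a standard 52-card deck. The helper name `probability` and the way the deck is built are illustrative assumptions; `Fraction` keeps the answer as a fraction, as the notes suggest.

```python
from fractions import Fraction

def probability(outcomes, is_event):
    """p(A) = number of outcomes classified as A / total number of possible outcomes."""
    favorable = sum(1 for o in outcomes if is_event(o))
    return Fraction(favorable, len(outcomes))

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]  # 52 cards

p_heart = probability(deck, lambda card: card[1] == "hearts")
p_ace_of_hearts = probability(deck, lambda card: card == ("A", "hearts"))

print(p_heart)          # 1/4
print(p_ace_of_hearts)  # 1/52
print(float(p_heart))   # 0.25
```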
Random sampling w/ replacement
_________ - requires random sampling (each individual has an equal chance of being selected) and also that the probability of being selected stays constant from one selection to the next if more than one individual is selected
Items are returned to the pool after selection, allowing duplicates and keeping probabilities constant
Can also simply be called random samples or independent random sampling
Ex: Looking for an Ace of Hearts = 1/52
Pulled a Deuce of Spades, keeping that card in. Still probability(Ace of Hearts) = 1/52
Random sampling WITHOUT replacement
___________ - requires random sampling (each individual has an equal chance of being selected), but selected items are removed from the pool, so the probability of being selected is not constant: repeats are prevented and probabilities change w/ each draw
Samples = unique, draws dependent
Ex: Looking for an Ace of Hearts = 1/52
Pulled a Deuce of Spades, taking that card out. Now probability(Ace of Hearts) = 1/51
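A sketch contrasting the two sampling schemes with the card example. The function name `p_specific_card` is hypothetical and the deck is just the numbers 0 to 51; `random.choices` samples with replacement and `random.sample` samples without.

```python
import random
from fractions import Fraction

def p_specific_card(cards_left):
    """Probability of drawing one particular card from the cards still in the deck."""
    return Fraction(1, cards_left)

# A non-target card (the Deuce of Spades) has just been drawn.
p_with = p_specific_card(52)     # card returned to the deck: still 1/52
p_without = p_specific_card(51)  # card kept out of the deck: now 1/51
print(p_with, p_without)

rng = random.Random(0)           # fixed seed so the run is repeatable
deck = list(range(52))
draws_with = rng.choices(deck, k=5)    # with replacement: duplicates possible
draws_without = rng.sample(deck, 5)    # without replacement: always unique
print(draws_with, draws_without)
```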
History
________- in this order:
Florence Nightingale (1820-1910)
Francis Galton (1822-1911)
Karl Pearson (1857-1936)
Sir Ronald Fisher (1890-1962)
Freudians and Behaviourists (1920s)
Trait theorists (1940s)
Decline of behaviourism (1950s)
Statistics and psychology
Quantitative vs qualitative
1) Florence Nightingale
History
___________ - revolutionized statistics by using data visualization, particularly her "rose diagram," to show that sanitation, not battle wounds, caused most soldier deaths in the Crimean War, advocating for public health reforms and establishing data collection standards for comparable hospital statistics, making her a pioneer in data-driven healthcare and a foundational figure in statistical graphics.
2) Francis Galton
History
___________ -
Suffered a breakdown in anticipation of the honors exams which resulted in his graduating without a distinguished degree
First to demonstrate that the "normal distribution" could be applied to human psychological attributes, including intelligence
Coined the term "eugenics" and the phrase "nature versus nurture"
Discovered that fingerprints were an index of personal identity
First to utilize the survey as a method for data collection
3) Karl Pearson
History
___________ -
Argued successfully to have the University regulations changed so that attendance was no longer compulsory at divinity lectures or at the chapel — and then continued to attend
Galton's Natural Inheritance in 1889 prompted him to explore statistical analyses to explain heredity and evolution: regression, correlation, and the chi-square test
Correlation, demonstrating the relation of two variables
Cannot determine cause and effect
Pearson was a co-founder, with Weldon and Galton, of the statistical journal Biometrika
4) Sir Ronald Fisher
History
___________ -
After leaving Cambridge, Fisher had no means of financial support and worked for a few months on a farm in Canada
Pearson offered him the post of chief statistician at the Galton laboratories but he instead became chief statistician at the Rothamsted Agricultural Experiment Station
Experimental method
Random assignment
Independent variable
Dependent variable
Small samples
5) The rest of history
History
___________ -
1920’s Freudians and Behaviorists
1940’s Trait theorists
1950’s Decline of Behaviorism
Statistics as the language of psychology
Quantitative vs. Qualitative
Variable
_________- a characteristic or condition that can or does change between individuals in a sample
Participant + Environmental
IV vs DV
Quasi-independent ______
Discrete and continuous _____
Constant
__________ - a characteristic or condition that is fixed and cannot change between individuals in a sample
Experimental method
__________- determine if one variable causes another variable to change, by manipulating one variable and controlling the research situation
Contains manipulation and control
IV and DV
Control and experimental conditions
Participant + environmental variables
Variables
___________ -
Participant: individual differences — age, gender, intelligence, education, etc.
Environmental: Time of day, weather, temperature, colour of walls, etc.
IV vs DV
Variables
___________ -
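IV: manipulated by the researcher (ex: treatment vs. control group)
Can be quasi-independent when it is a pre-existing grouping that cannot be manipulated (ex: group category, sex)
DV: observed for changes in response to the IV, specifically the experimental condition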
Quasi-experimental/nonexperimental design
__________ - studies where you cannot manipulate the IV or control other extraneous variables
Ex: Comparing depression rates in North Central vs Wascana
Control condition
Experimental method
__________- there is NO experimental treatment
Ex: Placebo
Experimental condition
Experimental method
__________- there is an experimental treatment
Ex: Actual drug
Outliers
_________ - scores that are substantially distant from the other scores in a sample
Ex: Elon Musk’s wealth compared to everyone else
Descriptive research
__________ - involves measuring one or more separate variables with the intent of simply describing the individual variables
Grouping
________ - collapsing of scores into mutually exclusive classes defined by grouping intervals
Less cumbersome, greater comprehension
Info can be lost when categories/data are combined
Categories can be arbitrary
Cumulative proportion
________ - the proportion of all frequency counts of scores in a freq. dist. up to and including the upper real limit of a given score
Kurtosis
_________ - a statistical measure describing the "tailedness" or outliers in a probability distribution, indicating how heavy or light its tails are compared to a normal distribution
How tall a distribution is, gives idea into variability
Weighted mean
M = (∑X1 + ∑X2) / (n1 + n2)
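A sketch of the weighted (overall) mean for two samples of different sizes, computed as the combined sum of scores over the combined n. The samples and the name `weighted_mean` are made up; the comparison with the plain average of the two means shows why the weighting matters.

```python
def weighted_mean(*samples):
    """Overall mean = (sum of all scores) / (total number of scores)."""
    total = sum(sum(s) for s in samples)
    n = sum(len(s) for s in samples)
    return total / n

sample1 = [5, 6, 7]  # n1 = 3, M1 = 6
sample2 = [8, 10]    # n2 = 2, M2 = 9

overall = weighted_mean(sample1, sample2)
print(overall)       # 7.2  -> (18 + 18) / (3 + 2)
print((6 + 9) / 2)   # 7.5  -> unweighted average of the two means, not the same
```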
Research design
________ - Statistics are important but the research design is far more important
Statistics cannot fix a poor research design
Causality is determined by the research design, not by the statistical analysis
Skewness
__________ - measures the asymmetry or lack of symmetry in a data distribution, indicating if data points are clustered more on one side of the mean than the other, unlike a perfect bell-shaped normal distribution where mean, median, and mode align