Data Set
A collection of information about individuals, which can be people, places, objects, or animals.
Individual
In a data set, it refers to the entity being measured, which may not always be a person.
Variable
Characteristics that are evaluated or collected in a data set.
Example of Variable
In a zoo, the individual is one tiger and the variable is the number of pounds of meat consumed in one day.
Categorical Variables
Labels or categories into which data are grouped, often displayed using pie charts and bar graphs.
Quantitative Variables
Variables that take numerical values for which arithmetic, such as averaging, makes sense; height is an example.
Caution in Data Classification
Not all numerical data are quantitative; for example, area codes and social security numbers should not be averaged.
Statistical Data Displays
Organized visual representations of data to enhance understanding.
Pie Chart
Displays the distribution of a categorical variable as slices of a whole, sized by percentage.
Bar Graph
Displays each category of data as a bar, allowing for quick visual comparisons.
Limitations of Graphs
Too many categories can make pie charts and bar graphs confusing and unusable.
Two-Way Table
Also known as a 'contingency table', it displays the counts for two categorical variables at once, with one variable in the rows and the other in the columns.
Level of Concern Table Example
A two-way table showing the responses of 480 adolescents regarding their concern over privacy on social media.
Row Variable
In a two-way table, it represents one of the categorical variables, such as level of concern.
Column Variable
In a two-way table, it represents another categorical variable, such as gender.
Total Responses
In the level of concern table, the total number of adolescents surveyed is 480.
Not Concerned at All
In the table, 26 adolescents (11 females and 15 males) reported not being concerned at all.
A Little Concerned
In the table, 99 adolescents (45 females and 54 males) reported being a little concerned.
Moderately Concerned
In the table, 115 adolescents (65 females and 50 males) reported being moderately concerned.
Somewhat Concerned
In the table, 155 adolescents (90 females and 65 males) reported being somewhat concerned.
Very Concerned
In the table, 85 adolescents (44 females and 41 males) reported being very concerned.
Marginal Distribution
The distribution of one categorical variable alone among all individuals in a two-way table, showing how often each outcome occurred; it is read from the totals in the table's margins.
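As a minimal sketch, the marginal distribution of level of concern can be computed from the counts on the cards above. The variable names are illustrative only; the counts are the ones given in the level of concern table.

```python
# Counts from the level-of-concern cards (females, males), in card order.
levels = [
    "Not concerned at all",
    "A little concerned",
    "Moderately concerned",
    "Somewhat concerned",
    "Very concerned",
]
females = [11, 45, 65, 90, 44]
males = [15, 54, 50, 65, 41]

# Row totals sit in the margin of the two-way table.
row_totals = [f + m for f, m in zip(females, males)]
grand_total = sum(row_totals)

# Marginal distribution: each row total as a share of all respondents.
marginal = {level: total / grand_total for level, total in zip(levels, row_totals)}

for level, share in marginal.items():
    print(f"{level:<22}{share:6.1%}")
```

The shares add up to 100% because every one of the 480 respondents falls into exactly one level of concern.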
Conditional Distributions
Distributions of one variable computed only for the individuals in a specified category of the other variable; comparing them shows how the variables in a data set relate to one another.
Mosaic Plot
A modified segmented bar graph where the width of the bars is proportionate to the size of the group, allowing comparison of both percentages and sizes among groups.
Statistical Relationships
Two variables are associated when knowing the value of one helps predict the value of the other.
Independent Variables
When the distribution of one variable is the same for all categories of another, indicating no association between these variables.
Frequency Distribution Table
A table of how often each value or category occurs; converting the counts to percentages or proportions (relative frequencies) shows how variables relate within a group.
Total Column
The column in a two-way table that shows the total for each row; its bottom entry is the size of the entire sample.
Sample Size
The number of respondents in a survey, which can affect the ability to make equal comparisons using raw data.
Percentage Calculation
The process of converting individual values to a percentage of the whole column to obtain conditional distributions.
No. of Females
The count of female respondents in a survey, used to calculate conditional distributions.
No. of Males
The count of male respondents in a survey, used to calculate conditional distributions.
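The percentage calculation described above can be sketched as follows: each count is divided by its own column total, which gives one conditional distribution per gender. The function name `conditional_distribution` is a hypothetical helper; the counts come from the level of concern table.

```python
# Counts from the level-of-concern table, listed from
# "not concerned at all" up to "very concerned".
females = [11, 45, 65, 90, 44]
males = [15, 54, 50, 65, 41]


def conditional_distribution(counts):
    """Convert one column of counts to proportions of that column's total."""
    total = sum(counts)
    return [count / total for count in counts]


female_dist = conditional_distribution(females)  # column total: 255
male_dist = conditional_distribution(males)      # column total: 225

for f_share, m_share in zip(female_dist, male_dist):
    print(f"females {f_share:6.1%}   males {m_share:6.1%}")
```

Because the groups differ in size (255 females, 225 males), comparing these percentages is fairer than comparing the raw counts.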
Survey Results
Data collected from respondents regarding their privacy concerns about social media use.
Adolescents' Privacy Concerns
The focus of the survey, examining how adolescents feel about privacy in relation to social media.
Column Total
The total number of responses in a specific column of a two-way table.
Row Responses
The individual responses recorded in each row of a two-way table.
Group Comparison
The analysis of how individuals responded within their groups and between two groups.
Statistical Independence
A concept where the outcome of one variable does not affect the outcome of another variable.
Alternative hypothesis
the hypothesis that sample observations are influenced by some nonrandom cause
Association
a relationship between two variables where the values of one variable occur in combination with specific values of the other variable
Bias
consistently overestimating or underestimating the value being measured in a survey
Binomial distribution
the probability distribution of a binomial random variable
Binomial random variable
the number of successes, x, in a fixed number of independent trials of a binomial experiment
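A short sketch of the binomial probability formula, P(X = x) = C(n, x) p^x (1 - p)^(n - x). The values n = 10 and p = 0.5 are assumptions chosen for illustration, and `binomial_pmf` is a hypothetical helper name.

```python
import math


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.5  # illustrative values, not from the cards
distribution = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(binomial_pmf(3, n, p))  # probability of exactly 3 successes
print(sum(distribution))      # probabilities over all outcomes sum to 1
```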
Bivariate data
data that record two variables for each individual; two quantitative variables are often represented using a scatterplot
Blinding
the practice of not telling subjects whether they are receiving a treatment or placebo
Blocking
method of dividing subjects into subgroups called blocks, such that the variability within blocks is less than the variability between blocks
Categorical variable
places an individual into a category or group
Causation
cause-and-effect relationship between or among variables
Census
a survey that collects information from every member of a population
Central limit theorem
when n is large, the sampling distribution of the sample mean is approximately Normal
Chi-square for homogeneity
a test to determine whether two or more categorical distributions are equal
Chi-square test for independence
a test to determine whether there is an association between two categorical variables
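As a rough sketch, the chi-square statistic for the level of concern table can be computed by hand: each expected count is (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected over all cells. This only computes the statistic and its degrees of freedom; it does not check the conditions for the test.

```python
# Observed counts: one (females, males) pair per level of concern.
observed = [
    (11, 15),
    (45, 54),
    (65, 50),
    (90, 65),
    (44, 41),
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for row, row_total in zip(observed, row_totals):
    for count, column_total in zip(row, column_totals):
        expected = row_total * column_total / grand_total
        chi_square += (count - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
degrees_of_freedom = (len(observed) - 1) * (len(column_totals) - 1)

print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}")
```

The statistic is then compared with a chi-square distribution with 4 degrees of freedom; at the 0.05 level the critical value is about 9.49.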
Conditional distribution
the distribution of values of a categorical variable among one specified group of individuals described in a two-way table; each group will have a separate conditional distribution
Confidence interval
range of plausible values for a population parameter, built around a sample estimate and describing the amount of uncertainty associated with that estimate
Confounding variables
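variables related to the explanatory variable that also affect the response variable, so that their effects cannot be separated from the effect of the explanatory variable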
Continuous random variables
random variables with outcomes that can take on any numeric value within the range of values
Convenience sampling
choosing only those individuals for a survey who are easy to access
Critical value
a factor used to compute margin of error
Degrees of freedom
the number of independent observations in a sample less the number of population parameters that must be estimated from sample data
Density curve
a curve that is on or above the horizontal axis; the total area underneath = 1, representing 100% of observations
Discrete random variables
random variables with a countable number of outcomes
Dotplot display
a graphic display that shows each data value as a dot above its location on a number line, allowing visual comparison of frequencies
Event
any outcome or collection of outcomes that is a subset of the sample space
Expected value
the sum of the products of each possible value and the probability that it occurs
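A minimal sketch of this calculation, using a fair six-sided die as an assumed example (the die is not part of the cards). Exact fractions avoid rounding error.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, each face with probability 1/6.
values = [1, 2, 3, 4, 5, 6]
probabilities = [Fraction(1, 6)] * 6

# Expected value: sum of (value * probability) over all possible values.
expected_value = sum(v * p for v, p in zip(values, probabilities))

print(expected_value)         # as an exact fraction
print(float(expected_value))  # as a decimal
```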
Extrapolation
the use of a regression line to predict values that are outside the original interval of the explanatory variable; these predictions are often inaccurate or unrealistic
Geometric random variable
the number of trials, Y, that it takes to get the first success
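A short sketch of the geometric probability formula, P(Y = k) = (1 - p)^(k - 1) p, which says the first k - 1 trials fail and trial k succeeds. The success probability p = 0.25 is an assumption for illustration, and `geometric_pmf` is a hypothetical helper name.

```python
def geometric_pmf(k, p):
    """Probability that the first success occurs on trial k."""
    return (1 - p) ** (k - 1) * p


p = 0.25  # illustrative success probability, not from the cards

print(geometric_pmf(3, p))  # two failures, then a success
print(1 / p)                # mean number of trials until the first success
```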
Independent event
when the occurrence of one event does not change the probability that the other event will happen
Individuals
objects described by a set of data
Inferential statistics
methods that use data from a sample to draw conclusions about the entire population
Law of large numbers
as more and more repetitions of any chance process occur, the proportion of times a specific outcome will happen approaches a single value
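The idea can be illustrated with a small simulation: a sketch that flips a simulated fair coin and records the proportion of heads at a few checkpoints. The seed and the checkpoints are arbitrary choices made for this example.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

checkpoints = {10, 100, 1_000, 10_000, 100_000}
proportions = {}
heads = 0

for n in range(1, 100_001):
    heads += random.random() < 0.5  # True counts as 1 head
    if n in checkpoints:
        proportions[n] = heads / n

for n, proportion in proportions.items():
    print(f"after {n:>7,} flips: proportion of heads = {proportion:.4f}")
```

With more flips, the proportion of heads tends to settle near 0.5, though individual short runs can stray from it.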
Least-squares regression line
the line that makes the sum of the squared residuals as small as possible
Lurking variables
variables other than the independent variable and the dependent variable that may affect experimental outcomes
Margin of error
the maximum expected difference between the true population parameter and a sample estimate of that parameter
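As a hedged sketch for one common case, the margin of error for a sample proportion is the critical value times the standard error, z* × sqrt(p̂(1 - p̂) / n). The sample proportion 0.5, sample size 400, and critical value 1.96 (roughly 95% confidence) are assumptions chosen for illustration.

```python
import math


def margin_of_error(p_hat, n, critical_value):
    """Margin of error for a sample proportion (one-sample z interval)."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return critical_value * standard_error


p_hat, n = 0.5, 400  # illustrative sample proportion and sample size
me = margin_of_error(p_hat, n, critical_value=1.96)

# Confidence interval: point estimate plus or minus the margin of error.
interval = (p_hat - me, p_hat + me)
print(f"margin of error = {me:.3f}, interval = ({interval[0]:.3f}, {interval[1]:.3f})")
```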
Matched pairs design
experimental method where subjects are grouped into pairs based on a blocking variable, then randomly assigned to treatment or control
Mode
the number that occurs most frequently in a set
Nonresponse
when an individual chosen for the sample can't be contacted or refuses to participate
Normal distribution
shown by a Normal density curve with the mean, median, and mode at the center of the curve and described in the format N(µ,σ)
Null hypothesis
the hypothesis that sample observations result purely from chance
Observational study
a study that observes individuals and measures variables of interest but does not attempt to influence the outcome
P-value
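the probability, computed assuming the null hypothesis is true, of getting a result at least as extreme as the one observed; the smaller the P-value, the stronger the evidence against the null hypothesis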
Percentile
the lowest score that is greater than a certain percentage of the scores; for example, the 30th percentile indicates that 30 percent of the data fall below that number
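A minimal sketch of the "percent of the data below" idea. The scores are made up for illustration, and `percent_below` is a hypothetical helper; textbooks and software use slightly different percentile conventions, so this is only one of them.

```python
def percent_below(scores, value):
    """Percentage of scores that are strictly less than value."""
    return 100 * sum(score < value for score in scores) / len(scores)


# Made-up test scores for illustration.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# Three of the ten scores fall below 70, so 70 sits at the 30th percentile.
print(percent_below(scores, 70))
```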
Placebo effect
a subject's positive response to receiving a placebo when no treatment has actually been applied
Point estimate
the statistic itself, such as the sample mean, sample median, or sample proportion given as an estimate of the population parameter of interest
Pooling
the name given to a technique used to obtain a more precise estimate of the standard deviation of a sample statistic by combining the estimates given by two (or more) independent samples
Population
the entire group of individuals about which we want information
Power of the test
the probability a test will reject the null hypothesis at a chosen significance level α when the specified alternative value of the parameter is true
Probability
the likelihood that an event will occur; the mathematics of chance
Quantitative variable
measures a specific numerical value that can be used for analysis
Random sample
a group or set chosen in a random manner that allows for each member of the population to have an equal chance of being selected
Regression line
a line that describes how the response variable changes as the explanatory variable changes; can be used to make predictions about a relationship
Representative sampling
a group or set chosen to replicate characteristics of a larger population
Residual
the difference between the observed value of the response variable and the value predicted by the regression line
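A small sketch tying the regression cards together: it fits a least-squares line with the standard formulas (slope = Sxy / Sxx, intercept = ȳ - slope × x̄) and then computes each residual as observed minus predicted. The five data points are made up for illustration.

```python
# Made-up data: x is the explanatory variable, y the response variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope and intercept.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residual = observed y minus the y predicted by the line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"predicted y = {intercept:.1f} + {slope:.1f} x")
print([round(r, 2) for r in residuals])
```

For a least-squares line the residuals sum to zero (up to rounding), which is a quick check on the arithmetic.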
Robust
procedures work even if there is a violation of the condition of Normality
Sample
part of the population from which information is collected; used to draw conclusions about the entire population
Sampling distribution
the distribution of a statistic (a proportion or a mean, depending on what is being calculated) over all possible samples of the same size from the population
Sampling variability
the way the value of a statistic varies from sample to sample as random sampling is repeated
Simpson's Paradox
a paradox in statistics in which a trend appears in different groups of data but is reversed when those groups are combined
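The reversal can be seen in a small sketch with illustrative counts (not from the cards): treatment A has the higher success rate within each group, yet treatment B has the higher rate once the groups are combined, because the two treatments were applied to very different mixes of groups.

```python
# Illustrative (successes, trials) counts for two treatments in two groups.
results = {
    "A": {"group 1": (81, 87), "group 2": (192, 263)},
    "B": {"group 1": (234, 270), "group 2": (55, 80)},
}


def rate(successes, trials):
    return successes / trials


def combined_rate(groups):
    """Success rate after pooling all groups for one treatment."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return successes / trials


for group in ("group 1", "group 2"):
    a = rate(*results["A"][group])
    b = rate(*results["B"][group])
    print(f"{group}: A = {a:.1%}, B = {b:.1%}")

print(f"combined: A = {combined_rate(results['A']):.1%}, "
      f"B = {combined_rate(results['B']):.1%}")
```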
Simulation
a random process of numerous trials used to estimate probability and imitate chance behavior
Standard deviation
the typical distance of the values from the mean of the data, calculated as the square root of the average squared deviation from the mean
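A brief sketch using Python's standard `statistics` module on a made-up data set. It shows that the standard deviation is built from squared deviations, so it generally differs from the plain average of the absolute distances from the mean.

```python
import statistics

# Made-up data for illustration; the mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(data)

# Population SD divides the sum of squared deviations by n;
# sample SD divides by n - 1.
population_sd = statistics.pstdev(data)
sample_sd = statistics.stdev(data)

# Plain average of the absolute distances, shown for comparison.
mean_abs_deviation = sum(abs(value - mean) for value in data) / len(data)

print(population_sd, round(sample_sd, 3), mean_abs_deviation)
```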
Standard Normal table
a table of areas under the Standard Normal curve; the table entry for each value of z is the area under the curve to the left of that z-score
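A table entry can be reproduced with `statistics.NormalDist` from Python's standard library; this sketch assumes Python 3.8 or later. The z-scores shown are arbitrary examples.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# cdf(z) is the area under the Standard Normal curve to the left of z,
# which is what a table entry reports.
for z in (-1.0, 0.0, 1.0, 1.96):
    print(f"z = {z:5.2f}   area to the left = {standard_normal.cdf(z):.4f}")
```

The area to the right of z is 1 minus the table entry, since the total area under the curve is 1.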