Data
Raw information or facts that become useful when organized meaningfully, and can be qualitative or quantitative.
Data Management
Involves looking after and processing data, including checking and correcting raw data, preparing data for analysis, and documenting and archiving data and metadata.
Importance of Data Management
Ensures high-quality data for analysis, allows future use of data, enables efficient integration of results, and improves processing efficiency and data quality.
Documenting and Archiving Data
Ensures data and metadata are preserved for future reference and analysis.
How Good Data Management Improves Quality
Leads to improved processing efficiency and enhances the meaningfulness of the data.
Census
The procedure of systematically acquiring and recording information about all members of a population.
Why Census is Rarely Used
Due to high costs and the dynamic nature of populations.
Sample Survey
Involves selecting a subset within a population to gain knowledge about the population.
Advantages of Sampling
Includes lower costs, faster data collection, and improved accuracy and quality of data.
Experiment
Performed with controlled variables to study their effects on other observed variables; requires the possibility of replication.
Observational Study
Used when there are no controlled variables and replication is impossible, typically involving surveys.
Well-Designed Survey
A good survey must be representative of the population.
Randomness in Surveys
Surveys should select respondents with a chance mechanism, such as a random number generator, so that selection probabilities are known and results can be analyzed probabilistically.
Neutral Wording
Crucial as it affects the answers subjects provide, leading to different responses based on phrasing.
Sampling Frame
The subset of items available for measurement from the population of concern.
Nonprobability Sampling
A method where some elements of the population have no chance of selection, based on criteria other than randomness.
Issue with Nonprobability Sampling
Can lead to bias due to the exclusion of some elements of the population, making it difficult to estimate sampling errors.
Nonprobability Sample
A sample where not all individuals have a chance of being selected, often due to practical limitations in calculating probabilities.
Example of Nonprobability Sampling
Visiting every household in a street and interviewing the first person to answer the door.
Convenience Sampling
A type of nonprobability sampling where subjects are selected based on their availability, such as customers in a supermarket.
Quota Sampling
A nonprobability sampling method where subjects are selected based on specified proportions, such as sampling 200 females and 300 males.
Nonresponse Effects
Can turn a probability design into a nonprobability design if the characteristics of nonresponse are not well understood.
Probability Sampling
A method where it is possible to determine which sampling units belong to which sample and the probability of each being selected.
Simple Random Sampling (SRS)
A probability sampling method where all samples of a given size have an equal probability of being selected and selections are independent.
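A minimal sketch of simple random sampling in Python, assuming the population is a list of unit IDs. The function name `simple_random_sample`, the `seed` parameter, and the IDs 1 to 100 are illustrative and not part of the original card.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every size-n subset is equally likely."""
    rng = random.Random(seed)  # seeded only to make the illustration repeatable
    return rng.sample(population, n)

population = list(range(1, 101))  # hypothetical unit IDs 1..100
srs = simple_random_sample(population, 10, seed=42)
print(srs)
```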
Drawback of SRS
Can be vulnerable to sampling error, potentially resulting in a sample that does not reflect the population's makeup.
Systematic Sampling
A probability sampling method that involves arranging the population in an ordered list and selecting elements at regular intervals through that list, starting from a randomly chosen point.
Advantage of Systematic Sampling
Helps to spread the sample over the list, making it efficient for sampling from databases.
Vulnerability of Systematic Sampling
Susceptible to periodicities in the list, which can lead to biased samples.
Difference Between Systematic and SRS
In systematic sampling, different samples of the same size can have different selection probabilities.
Stratified Sampling
A probability sampling method where the population is divided into distinct categories (strata) and each stratum is sampled as an independent sub-population.
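One way to sketch stratified sampling in Python: group units by stratum, then draw an independent simple random sample inside each group. The function `stratified_sample`, the `stratum_of` callback, and the region labels are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Sample each stratum independently, as its own sub-population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical units: (id, region) pairs, 20 per region.
units = [(i, "north" if i < 20 else "south") for i in range(40)]
by_region = stratified_sample(units, lambda u: u[1], n_per_stratum=5, seed=0)
print(by_region)
```

Equal allocation per stratum is used here for simplicity; proportional allocation would be an equally reasonable choice.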
Benefit of Stratified Sampling
Allows researchers to draw inferences about specific subgroups that may be lost in a generalized random sample.
Randomization in Systematic Sampling
Randomization of the starting point is essential to ensure it is a type of probability sampling.
Effect of Nonresponse on Sampling Probability
Nonresponse can modify each element's probability of being sampled.
Sample Variance in SRS
A good indicator of the population variance, aiding in estimating the accuracy of results.
Issue with SRS for Subgroup Analysis
Cannot accommodate the needs of researchers interested in specific subgroups.
How Stratified Sampling Solves SRS Issues
Allows independent sampling of distinct subpopulations, giving more accurate subgroup insights.
Example of Systematic Sampling
Selecting every 10th name from a telephone directory, starting from a randomly chosen point.
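The directory example can be sketched in Python as follows, assuming the directory is an ordered list. The names in `directory` are placeholders and `systematic_sample` is an illustrative helper, not something from the original card.

```python
import random

def systematic_sample(items, step, seed=None):
    """Take every step-th item after a random start in 0..step-1."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # the random start makes this probability sampling
    return items[start::step]

directory = [f"name_{i:03d}" for i in range(100)]  # hypothetical directory entries
every_tenth = systematic_sample(directory, step=10, seed=1)
print(every_tenth)
```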
Impact of Periodicities
Can lead to samples that are not representative (e.g., selecting only odd-numbered houses).
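A small Python illustration of the house-number case, assuming a hypothetical street with houses numbered 1 to 20 and a sampling interval of 2. Whatever the starting point, the sample contains only one side of the street.

```python
houses = list(range(1, 21))  # hypothetical street, houses numbered 1..20

# With an interval of 2, the list's odd/even pattern lines up with the step.
samples_by_start = {start: houses[start::2] for start in (0, 1)}

for start, picked in samples_by_start.items():
    parity = "odd" if picked[0] % 2 else "even"
    print(f"start={start}: all {parity} -> {picked}")
```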
Understanding Nonresponse Characteristics
Crucial for maintaining the integrity of a probability sampling design.
Selection Probabilities: Systematic vs. SRS
In systematic sampling, selection probabilities can differ between samples of the same size; in SRS, every sample of a given size is equally likely.
When Stratified Sampling Is Most Effective
When variability within strata is minimized, variability between strata is maximized, and stratification variables are strongly related to the outcome variable.
Drawbacks of Stratified Sampling
Can increase the cost and complexity of the sample selection.
Cluster Sampling
Involves selecting a random sample of areas in the first stage and then a random sample of respondents within those areas in the second stage.
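A rough Python sketch of the two stages, assuming clusters are stored as a mapping from area name to its respondents. The area names, respondent labels, and the function `two_stage_cluster_sample` are hypothetical.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample areas. Stage 2: sample respondents within each chosen area."""
    rng = random.Random(seed)
    chosen_areas = rng.sample(sorted(clusters), n_clusters)
    return {
        area: rng.sample(clusters[area], min(n_per_cluster, len(clusters[area])))
        for area in chosen_areas
    }

# Hypothetical population: 5 areas with 8 respondents each.
clusters = {f"area_{a}": [f"a{a}_r{r}" for r in range(8)] for a in range(5)}
cluster_sample = two_stage_cluster_sample(clusters, n_clusters=2, n_per_cluster=3, seed=7)
print(cluster_sample)
```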
Advantages of Cluster Sampling
Can reduce travel and administrative costs.
Disadvantage of Cluster Sampling
If the chosen clusters are biased, the results about the population will be inaccurate.
Matched Random Sampling
Involves two samples where members are paired or matched explicitly by the researcher, such as identical twins or repeated measurements on the same subject.
Well-Designed Experiment
A good statistical experiment includes stating the purpose of research, including estimates regarding the size of treatment effects, alternative hypotheses, and the estimated experimental variability. Experiments must compare the new treatment with at least one standard treatment, to allow an unbiased estimate of the difference in treatment effects.
Blocking
The design of experiments using blocking is meant to reduce the influence of confounding variables by grouping similar experimental units together.
Random Assignment
Randomly assigning treatments to subjects balances unknown differences between subjects across the groups on average, so that observed differences between groups can be attributed to the treatment.
Secondary Analyses
Examining the data set to suggest new hypotheses for future study.
Documentation and Presentation
Documenting and presenting the results of the study for transparency and reproducibility.
Hawthorne Effect
A change in behavior by the subjects of a study due to their awareness of being observed, as shown in the famous Hawthorne study.
Control Group
The group in an experiment that does not receive the treatment, used for comparison against the treatment group.
Experimental Unit
The individual or object to which a treatment is applied in an experiment.
Replication
Repeating measurements or observations in an experiment to reduce variability and allow findings to be confirmed by other researchers.
Confounding Variable
An extraneous variable that correlates with both the dependent and independent variables, possibly leading to a false conclusion of causality.
Placebo
An imitation treatment (e.g., a sugar pill) that lacks active ingredients but is identical in appearance to the actual treatment.
Placebo Effect
A simulated effect where a patient experiences improvement because they believe they are receiving treatment.
Blinding
A technique where subjects do not know whether they are receiving the actual treatment or the placebo, to prevent bias.
Blocking (in experiments)
The arrangement of experimental units into similar groups (blocks) to control for sources of variability not of primary interest, such as gender.
Completely Randomized Design
An experimental design where all units are randomly assigned to different levels of a primary factor without accounting for other variables.
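A minimal sketch of a completely randomized design in Python: shuffle all units, then deal treatments out in turn. The unit labels, the treatment names, and the helper `completely_randomized` are assumptions made for illustration.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Randomly assign treatments to all units, ignoring any other variables."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Dealing treatments round-robin over the shuffled order keeps group sizes balanced.
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

subjects = [f"subject_{i}" for i in range(12)]  # hypothetical experimental units
crd = completely_randomized(subjects, ["treatment", "control"], seed=3)
print(crd)
```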
Randomized Block Design
A collection of completely randomized experiments conducted within blocks to control for variability.
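A sketch of the same idea in Python, assuming blocks are given as a mapping from block name to its units: the randomization is simply repeated inside each block. The block names, unit labels, and `randomized_block_design` are hypothetical.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Run a separate completely randomized assignment inside each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks, using gender as the blocking variable.
blocks = {"female": ["f1", "f2", "f3", "f4"], "male": ["m1", "m2", "m3", "m4"]}
rbd = randomized_block_design(blocks, ["treatment", "control"], seed=5)
print(rbd)
```

Because assignment happens within blocks, each block ends up with the same mix of treatments, so the blocking variable cannot be confounded with the treatment.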
Matched Pairs Design
A special case of randomized block design where each block consists of two matched units, such as the same subject before and after treatment.
Chi-Square Test
Determines if there is a significant difference between expected and observed frequencies in one or more categories.
Chi-Square Goodness-of-Fit Test
Assesses whether the sample data match a hypothesized population distribution.
Chi-Square Test for Independence
Compares two variables in a contingency table to determine if they are related.
Small Chi-Square Statistic
Indicates that the observed data fit the expected data well, so there is little evidence against the null hypothesis.
Large Chi-Square Statistic
Indicates that the observed data do not fit the expected data well, which is evidence against the null hypothesis.
Assumption 1 of Chi-Square Test
The sample must be a random sample.
Assumption 2 of Chi-Square Test
Observations must be independent, with one observation per subject.
Assumption 3 of Chi-Square Test
There should be no expected counts less than five.
Importance of the Expected-Count Assumption
It concerns the expected counts, not the raw observed counts, which is crucial for the validity of the test.
Chi-Square Formula
𝜒² = Σ((O - E)² / E), where O is the observed frequency and E is the expected frequency.
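The formula translates directly into a few lines of Python. The counts below are made up for illustration (60 hypothetical rolls of a die, with 10 expected per face), and `chi_square_statistic` is an illustrative name.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [8, 12, 9, 11, 10, 10]  # hypothetical counts for faces 1..6
expected = [10] * 6                # a fair die over 60 rolls
stat = chi_square_statistic(observed, expected)
print(round(stat, 3))
```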
Purpose of Goodness-of-Fit Test
Tests whether an experimentally obtained frequency distribution fits an expected frequency distribution based on theoretical probabilities.
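Steps in Goodness-of-Fit Test
State the hypotheses, choose a significance level, find the degrees of freedom and the critical value, compute the chi-square statistic, and apply the decision rule.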
Null Hypothesis in Goodness-of-Fit Test (𝐻₀)
The population frequencies are equal to the expected frequencies.
Alternative Hypothesis (𝐻ₐ)
The null hypothesis is false.
Significance Level (𝛼)
The threshold for determining whether to reject the null hypothesis, commonly set at 0.05.
Degrees of Freedom (df) in Goodness-of-Fit
df = k - 1, where k is the number of categories.
Critical Value in Chi-Square Test
The value from the chi-square distribution table used to determine whether to reject the null hypothesis.
Decision Rule
Reject 𝐻₀ if 𝜒² is larger than the critical value.
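The pieces of a goodness-of-fit test (df = k - 1, the statistic, and the decision rule) can be combined in a short Python sketch. The observed and expected counts are hypothetical; the critical value 4.605 is the table value for 𝛼 = 0.10 with 2 degrees of freedom, and the caller is assumed to look it up for the df returned.

```python
def goodness_of_fit_decision(observed, expected, critical_value):
    """Return (df, chi-square statistic, whether to reject H0)."""
    df = len(observed) - 1  # k - 1 categories
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return df, stat, stat > critical_value

# Hypothetical counts over three categories (k = 3, so df = 2).
gof_df, gof_stat, gof_reject = goodness_of_fit_decision(
    observed=[50, 30, 20],
    expected=[40, 40, 20],
    critical_value=4.605,  # table value for alpha = 0.10, df = 2
)
print(gof_df, gof_stat, gof_reject)
```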
Purpose of Test of Independence
To assess whether two factors are related or independent of each other.
Question Answered by Test of Independence
Whether Variable X is independent of Variable Y.
What the Test of Independence Cannot Do
It can show only whether two variables are related, not how they are related.
Expected Frequency Requirement
Each expected frequency must be at least 5 to ensure validity.
Observed Frequency Organization
Frequencies are written in a table where columns contain outcomes for one variable and rows for the other variable.
Null Hypothesis in Independence Test (𝐻₀)
There is no association between the two categorical variables.
Alternative Hypothesis (𝐻ₐ)
There is an association (the two variables are not independent).
Expected Frequency Formula (𝐸ᵣ,𝚌)
𝐸ᵣ,𝚌 = (Row Total × Column Total) / Grand Total.
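A small Python sketch of this formula, applied to every cell of a contingency table stored as a list of rows. The 2×2 table is hypothetical and `expected_frequencies` is an illustrative name.

```python
def expected_frequencies(table):
    """Expected count per cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

observed_table = [[20, 30],
                  [30, 20]]  # hypothetical observed counts
expected_table = expected_frequencies(observed_table)
print(expected_table)
```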
Degrees of Freedom in Test of Independence
df = (Number of rows - 1) × (Number of columns - 1).
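Putting the expected counts, the statistic, and the degrees of freedom together gives a rough Python sketch of the test of independence. The 2×3 table is hypothetical; it yields df = 2, so the table critical value 4.605 at 𝛼 = 0.10 applies.

```python
def independence_test(table, critical_value):
    """Return (df, chi-square statistic, whether to reject H0 of no association)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return df, stat, stat > critical_value

counts = [[10, 20, 30],
          [20, 20, 20]]  # hypothetical 2 x 3 contingency table
ind_df, ind_stat, ind_reject = independence_test(counts, critical_value=4.605)
print(ind_df, round(ind_stat, 3), ind_reject)
```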
Example Significance Level
𝛼 = 0.10.
Example Critical Value
The critical value is 4.605 at 𝛼 = 0.10 with 2 degrees of freedom.
Example Statistic (1st Case)
The calculated chi-square statistic was approximately 14.07.
Conclusion Based on Critical Value
If the chi-square statistic exceeds the critical value, reject the null hypothesis and conclude a relationship exists.
Example Significance Level (2nd Case)
𝛼 = 0.01.
Critical Value (2nd Case)
16.812 at 𝛼 = 0.01 with 6 degrees of freedom.
Chi-Square Statistic (2nd Case)
Approximately 26.8.
Conclusion (2nd Case)
Enough statistical evidence to reject the null hypothesis; the proportion of births is not the same for each day.
Total Sample in 2nd Example
700 births.
Observed Frequency (Sunday)
65
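The figures from the second case can be checked with a few lines of Python. Only values stated on the cards are used (700 births, 65 on Sunday, statistic of about 26.8, critical value 16.812); the counts for the other days are not given, so the sketch computes Sunday's contribution alone and does not reconstruct the full statistic.

```python
total_births = 700
days = 7

expected_per_day = total_births / days  # equal proportions under H0
df = days - 1                           # k - 1 categories

observed_sunday = 65
sunday_term = (observed_sunday - expected_per_day) ** 2 / expected_per_day

reported_statistic = 26.8  # as given on the card (approximate)
critical_value = 16.812    # alpha = 0.01, df = 6
reject_h0 = reported_statistic > critical_value

print(expected_per_day, df, sunday_term, reject_h0)
```

Sunday alone contributes 12.25 of the reported total, which is nearly half of the statistic.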