Statistics in HC - Final Exam

Description and Tags

Statistics

100 Terms

1
New cards
Sampling bias
Not all members of the population are equally likely to be selected
2
New cards
Sampling error
A misrepresentation of the association between an exposure and the outcome of a study that arises from using too small a sample. The larger the sample size, the more valid the research.
3
New cards
Simple random sampling
A quick and convenient way to select a random sample: give each member of the population a number, then use a chance method (such as a random number generator) to pick the numbers included in the sample.
4
New cards
Cluster sampling
Choosing a random sample by dividing the population into groups, and then selecting a set of groups.
5
New cards
Systematic sampling
A method of choosing a random sample by selecting every nth person until the desired amount has been chosen.
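A minimal sketch of systematic sampling, assuming a hypothetical list of 80 numbered houses and an interval of 8. The function name `systematic_sample` and the seed are illustrative only.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k members, then take every k-th member."""
    rng = random.Random(seed)
    start = rng.randrange(k)      # random starting index: 0 .. k-1
    return population[start::k]   # every k-th member from the start

houses = list(range(1, 81))       # hypothetical house numbers 1..80
sample = systematic_sample(houses, k=8, seed=1)
print(sample)
```

Only the starting point is random; every later pick is fixed by the interval.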
6
New cards
A study was done to determine the age, number of times per week, and duration residents use a local park. The first house in the neighborhood around the park was selected randomly and then every eighth house in the neighborhood around the park was interviewed. The **sampling method** used in this scenario is known as __?
Systematic
7
New cards
A study was done to determine the age, number of times per week, and the duration residents use a local park. The first house in the neighborhood around the park was selected randomly and then every eighth house in the neighborhood around the park was interviewed. The **population** is __?
All the residents in all of the houses in the neighborhood that use the local park.
8
New cards
A study was done to determine the age, number of times per week, and the duration residents use a local park. The first house in the neighborhood around the park was selected randomly and then every eighth house in the neighborhood around the park was interviewed. **Duration** is what type of data?
Quantitative continuous
9
New cards
A study was done to determine the age, number of times per week, and the duration residents use a local park. The first house in the neighborhood around the park was selected randomly and then every eighth house in the neighborhood around the park was interviewed. **Number of times per week** is what type of data?
Quantitative discrete
10
New cards
Define ratio.
A comparison of two or more **unrelated**, independent variables.
11
New cards
Define proportion.
A comparison of two **related** parts of a whole.
12
New cards
How are ratios and proportions similar and different?
They are similar because they are both comparative and describe a situation in terms of numbers and amounts. They are different because ratios compare independent variables while proportions compare one variable to the entire group.
13
New cards
If a table is given with percentages of students who use computers at home, and it is given that the number of students who have and use 1 computer at home is 87% and those who have and use 2 computers at home is 75%, what percent of students do not have any computer at home?
13% (100-87)
14
New cards
How can the flow of statistics be described?
From simple, unclustered, unsupervised to complex, clustered, and supervised.
15
New cards
Is statistics an exact science?
No
16
New cards
Is convenience sampling a method of random sampling?
No
17
New cards
What is the difference between a control group and a placebo group?
A control group is a group that receives no treatment or a standard treatment, while a placebo group receives a fake treatment that has no therapeutic effect. The purpose of a control group is to provide a baseline for comparison, while a placebo group is used to test the effectiveness of a new treatment.
18
New cards
Define lurking variables.
Variables that are not anticipated in an experiment that could have affected the results.
19
New cards
Descriptive statistics refers to a sample and not a population. True/False
True
20
New cards
Mean, median, mode, and percentiles are measures of __.
Central tendency
21
New cards
Standard deviation, range, span, and variance are measures of __.
The spread of the data/variation
22
New cards
What are some non-numerical examples of descriptive statistics?
Box & Whisker plots, quartiles, IQR, outliers, pie charts, bar charts, histograms, pareto charts
23
New cards
What is the difference in uses between bar charts and histograms?
Bar charts are used for fewer n, qualitative categories, and have separated bars. Histograms are used for over 100 n, frequency or relative frequency data, and the bars are connected.
24
New cards
What is a pareto chart?
A bar chart with its bars sorted in order of size, usually from largest to smallest.
25
New cards
Contingency table
A tool used by statisticians to organize data in order to determine frequency of different variables.
26
New cards
Equally likely events
Experiments or events that have the same theoretical probability of occurring.
27
New cards
Mutually exclusive events
Events that cannot occur at the same time; they share no outcomes in the sample space.
28
New cards
Probability
The likelihood of something happening; the extent that something is likely to occur.
29
New cards
Sample space
A set that is concerned with all of the possible, unique outcomes of a probability experiment.
30
New cards
What is the notation for an AND compound probability?
The notation for an AND probability is represented by the symbol "∩".
31
New cards
What is the notation for an OR compound probability?
The notation for an OR compound probability is represented by the symbol "∪".
32
New cards
Given a Venn diagram, to find the set of an OR compound probability, what do you do?
List all of the elements of all circles involved in the OR probability, and use only the elements that are **unique.**
33
New cards
What is the test for independent events?
P(A|B) = P(A)

P(B|A) = P(B)

P(A and B) = P(A) x P(B)

Only one has to be true to be independent.
34
New cards
What is the test for mutually exclusive events?
P(A and B) = 0
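The AND/OR notation and the two tests can be tried on a small example. This sketch assumes a fair six-sided die and two hypothetical events, A (even roll) and B (roll of 4 or less); Python's `&` and `|` on sets play the roles of ∩ and ∪.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # fair die (assumed equally likely outcomes)
A = {2, 4, 6}                       # hypothetical event: roll is even
B = {1, 2, 3, 4}                    # hypothetical event: roll is 4 or less

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

p_and = prob(A & B)                 # P(A AND B), the intersection
p_or = prob(A | B)                  # P(A OR B), the union (unique elements only)

independent = p_and == prob(A) * prob(B)
mutually_exclusive = p_and == 0

print(p_and, p_or, independent, mutually_exclusive)
```

Using `Fraction` keeps the comparisons exact, so the equality checks are not affected by floating-point rounding.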
35
New cards
What is the assumption for events before testing for independence and mutual exclusivity?
Assume the events are **dependent** and are **not** **mutually exclusive.**
36
New cards
List 3 distinct applications for using the normal and/or standard normal distributions.

1. Finding outliers
2. Finding percentiles
3. Determining variance in a data set
37
New cards
The empirical rule accounts for a total of __ % of the area under a bell-shaped curve.
99.7
38
New cards
Unlike a uniform distribution, where we can use geometry to determine the area (probability) of a continuous random variable occurring, we require tables, a calculator, or software to provide areas (probabilities) under bell-shaped normal or standard normal distributions. True/False
True
39
New cards
A positive z-score value tells us how many __ the value x is above (or to the right of) the population mean.
standard deviations
40
New cards
By convention, probabilities calculated from x- or z-scores are those areas located to the LEFT of our value of interest. True/False
True
41
New cards
One good reason for sketching bell-shaped curves is to assess probability direction (and predicted values) relative to our x- or z-value of interest. True/False
True
42
New cards
Calculations of percentiles are an additional application for the central limit theorem, but are easier to calculate via programmable calculator or software. True/False
True
43
New cards
The calculated value of the term σ/√n is called the standard error of the mean. (It’s the population standard deviation divided by the square root of the sample size). True/False
True
44
New cards
Given an adequately large history of sample invoices, n, we could conceivably use the central limit theorem for sums to predict the probability that a lab manager goes on a wild spending spree and exceeds his average monthly budget limit. True/False
True
45
New cards
The population mean lifetime for prosthetic hips is 15.25 years. A medical device company has implemented some improvements in the manufacturing process and believes that the lifetime is now longer. A sample study of 49 new devices reveals a mean lifetime of 18.75 years with a standard deviation of 8.6 years. Identify the relevant variables.
Sample mean (x̄): 18.75

Population mean (μ): 15.25

Sample size (n): 49

Sample standard deviation (s), used here in place of the population standard deviation (σ): 8.6
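As a rough sketch, the values on this card can be turned into a standard error and a test statistic. Two assumptions are not in the card: a significance level of 0.05, and the use of the standard normal distribution as an approximation (a Student's t-distribution with 48 degrees of freedom would also be reasonable here).

```python
from math import sqrt
from statistics import NormalDist

x_bar = 18.75   # sample mean
mu_0 = 15.25    # population mean under the null hypothesis
s = 8.6         # standard deviation reported for the sample
n = 49          # sample size
alpha = 0.05    # assumed significance level (not given in the card)

standard_error = s / sqrt(n)          # standard error of the mean
z = (x_bar - mu_0) / standard_error   # test statistic
p_value = 1 - NormalDist().cdf(z)     # right-tailed: "lifetime is now longer"
reject_null = p_value < alpha

print(round(standard_error, 4), round(z, 4), reject_null)
```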
46
New cards
The central limit theorem states that sample __ and __ form their own approximately normal distributions as the sample size, n, increases.
Means and sums
47
New cards
If, when calculating a probability using z-scores, you need the probability that a value is **greater than or equal to** a given value, you will need to subtract the calculated (left-tail) probability from 1 to get your final answer. True/False
True
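A small sketch of the left-tail convention and the "subtract from 1" step. The mean, standard deviation, and x value are hypothetical numbers chosen for illustration.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0   # hypothetical population mean and standard deviation
x = 130.0                 # hypothetical value of interest

z = (x - mu) / sigma                  # number of standard deviations above the mean
left_area = NormalDist().cdf(z)       # P(X <= x), the area to the LEFT
right_area = 1 - left_area            # P(X >= x), the area to the RIGHT

print(z, round(left_area, 4), round(right_area, 4))
```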
48
New cards
The standard error term for a single x value distribution is only the standard deviation, while the standard error term for a sampling distribution of means is the standard deviation divided by the square root of n. True/False
True
49
New cards
List 3 point estimates AND their corresponding population parameters.
1.) x̄ → μ

2.) s → σ

3.) p′ (also written p̂) → p
50
New cards
List 3 commonly-used confidence levels AND their corresponding statistical significance values.
1.) 0.90 confidence level: α = 0.10

2.) 0.95 confidence level: α = 0.05

3.) 0.99 confidence level: α = 0.01
51
New cards
A hospital is trying to cut down on emergency room waiting times. It is interested in the amount of time patients must wait before being called back to be examined. An investigation committee randomly surveyed 263 patients. The sample mean was 0.75 hours with a sample standard deviation of 0.25 hours. Identify the variables and their corresponding values and the EBM that was used given that the 95% CI is (0.12, 1.38).
x̄: 0.75

s: 0.25

n: 263

EBM: 0.63 (0.75 + EBM = 1.38, so EBM = 1.38 − 0.75)
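The same arithmetic as a short sketch: given only the endpoints of a confidence interval, the point estimate is the midpoint and the EBM is half the width. The variable names are illustrative.

```python
lower, upper = 0.12, 1.38            # 95% CI endpoints from the card

point_estimate = (lower + upper) / 2  # midpoint of the interval
ebm = (upper - lower) / 2             # error bound for the mean: half the width

print(round(point_estimate, 2), round(ebm, 2))
```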
52
New cards
If given the critical value z_(α/2), do you divide that critical value by 2?
No
53
New cards
What are the two common critical values (do not divide by 2)?
1.) 95% CL → 1.96

2.) 99% CL → 2.576
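Critical values can be checked with software rather than memorized. This sketch uses the standard library's `statistics.NormalDist`; the helper name `z_critical` is an illustrative choice. The α/2 split happens on the tail area, not on the critical value itself.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided critical value z_(alpha/2) for a given confidence level."""
    alpha = 1 - confidence_level
    # Split alpha between the two tails, then look up the left-area cutoff.
    return NormalDist().inv_cdf(1 - alpha / 2)

for cl in (0.90, 0.95, 0.99):
    print(cl, round(z_critical(cl), 3))
```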
54
New cards
A hypothesis test of a single population proportion, p, must meet the conditions for __.
A binomial distribution.
55
New cards
Hypothesis testing can be regarded as a 5-step process that can involve all of the following EXCEPT __.
Directly testing the alternative hypothesis statement.
56
New cards
The preset or preconceived α (alpha) is a “significance level” that is also the probability of a Type I error. True/False
True
57
New cards
Hypothesis testing for matched or paired samples requires that the two samples are **independent** of each other. True/False
False
58
New cards
The **decision** step of hypothesis testing depends upon direct comparisons between __.
A calculated **p-value** and the α (alpha) value.
59
New cards
If the p-value is lower than the alpha, do you reject or not reject the null hypothesis?
REJECT
60
New cards
When writing the null and alternative hypotheses, you must include symbols (or written words) that are the __ of each other. Give an example.
Opposite

Ho: x̄ __
61
New cards
When does a Type I error occur?
When the null hypothesis is actually true but rejected.
62
New cards
When does a Type II error occur?
When the null hypothesis is actually false but NOT rejected (or accepted).
63
New cards
A student’s t-distribution is best used when the sample size is __.
Less than 30.
64
New cards
Can a student’s t-distribution be used when the sample size, n, is greater than 30?
Yes
65
New cards
A student’s t-distribution requires a degree of freedom value, which is n-1. True/False
True
66
New cards
In a Student’s t-distribution, as the sample size increases, what happens to the sample standard deviation?
It becomes constant
67
New cards
What is the first step in quality control?

1. Set up two contradictory hypothesis statements.
68
New cards
What is the second step in quality control?

2. Collect and sort sample data (throw out outliers)
69
New cards
What is the third step in quality control?

3. Determine correct distributions to perform hypothesis test.
70
New cards
What is the fourth step in quality control?

4. Analyze sample data: perform probability calculations (p-value) and make a decision based on the p-value and the alpha value.
71
New cards
What is the fifth/last step in quality control?

5. Write a meaningful conclusion.
72
New cards
Homogeneity test
Compared distributions need not be known
73
New cards
Independence test
Expected values are calculated from a contingency table
74
New cards
Variance test
Comparing the squared standard deviation of both a sample and its population
75
New cards
Goodness-of-fit test
Differences between (theoretical or given) expected values and observed (or actual) values
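A minimal sketch of the goodness-of-fit statistic, the sum of (observed − expected)² / expected. The counts are hypothetical (60 rolls of a die assumed fair), and the sketch stops at the statistic and degrees of freedom because the standard library has no chi-square distribution for the p-value.

```python
observed = [8, 12, 9, 11, 10, 10]   # hypothetical counts from 60 die rolls
expected = [10] * 6                 # a fair die: 60 rolls / 6 faces

# Chi-square statistic: sum of squared differences scaled by expected counts.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1

print(round(chi_square, 4), degrees_of_freedom)
```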
76
New cards
Absolute Risk
A subtracted difference between an exposed group and an un-exposed group
77
New cards
Relative Risk
A ratio of ratios
78
New cards
Odds Ratio
Determines an adverse condition as a fold-difference
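The three risk measures can be compared side by side on one 2×2 table. The counts below are hypothetical (100 exposed and 100 unexposed people), and the absolute risk is computed here as the risk difference described in the card above.

```python
from fractions import Fraction

# Hypothetical 2x2 table: counts with and without the adverse condition.
exposed_cases, exposed_total = 30, 100
unexposed_cases, unexposed_total = 10, 100

risk_exposed = Fraction(exposed_cases, exposed_total)
risk_unexposed = Fraction(unexposed_cases, unexposed_total)

absolute_risk_difference = risk_exposed - risk_unexposed   # subtracted difference
relative_risk = risk_exposed / risk_unexposed              # ratio of two risks

odds_exposed = Fraction(exposed_cases, exposed_total - exposed_cases)
odds_unexposed = Fraction(unexposed_cases, unexposed_total - unexposed_cases)
odds_ratio = odds_exposed / odds_unexposed                 # fold-difference in odds

print(absolute_risk_difference, relative_risk, odds_ratio)
```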
79
New cards
Relative Risk, Absolute Risk, and the Odds Ratio are three calculations that similarly consider __.
The numbers of an adverse condition relative to a control group and a ratio or proportion relative to a control group
80
New cards
Which of the following is NOT a Chi-square test?
Test of Two Variances
81
New cards
What are the Chi-square tests?
Goodness-of-Fit, Independence, Single Variance, Homogeneity
82
New cards
Chi-square hypothesis tests necessarily consider __.
degrees of freedom, sums of each data set’s differences, and differences between observed and expected values
83
New cards
The most common and easiest way to determine if a new data set may be appropriate for a linear regression would be to __.
Construct a scatter plot
84
New cards
Linear regression outliers are determined from the differences in **y, the dependent variable.** True/False
True
85
New cards
If a hypothesis test concludes the correlation coefficient, r, is significantly different from zero, we say the correlation coefficient is “significant”. This implies a significant linear relationship between x and y and therefore we may use a regression line to model its linear relation in the population. True/False
True
86
New cards
The value of r is always __.
Between −1 and 1, inclusive (−1 ≤ r ≤ 1)
87
New cards
Is the coefficient of determination the same as r?
No
88
New cards
The coefficient of determination, when expressed as a percent, represents the percent of variation in the independent variable x that can be explained by variation in the dependent variable y using a line-of-best-fit. True/False
False
89
New cards
Given y = 33.27 + 0.152x, where it has been determined that y shall not be a cost beyond $175 and where x is a range between 0 and 900 pages, determining the cost of a textbook with x=755 pages is an example of __.
interpolation and prediction
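A short sketch of the prediction in this card, assuming 0.152 is the slope applied to the page count x. The function name `predict_cost` is hypothetical; it refuses inputs outside the 0 to 900 page range so that the line is only used for interpolation.

```python
def predict_cost(pages):
    """Predicted textbook cost from the card's line, valid for 0 to 900 pages."""
    if not 0 <= pages <= 900:
        raise ValueError("outside the observed x range: this would be extrapolation")
    return 33.27 + 0.152 * pages   # assumes 0.152 is the slope on x

cost = predict_cost(755)
print(round(cost, 2))
```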
90
New cards
A good rule of thumb with regard to graphically and/or numerically determining outliers is by looking for points/values beyond __.
2 standard deviations from a line-of-best-fit
91
New cards
A line-of-best-fit __.
Is known as the Least-Squares Line, necessarily underestimates actual y data values above it and overestimates actual y values below it, and necessarily will have residuals that could be positive or negative
92
New cards
Which Chi-square test does not require you to know the distribution beforehand?
Homogeneity
93
New cards
Which Chi-square tests require contingency tables?
Independence and homogeneity
94
New cards
What is the purpose of a Cohen’s d?
Measuring effect size (the standardized difference between two group means)
95
New cards
Are extrapolation and estimation reliable?
No
96
New cards
What does the correlation value (Pearson) indicate?
The strength and direction of the linear relationship between x and y
97
New cards
What is the formula for the coefficient of determination? How is it expressed?
r^2; as the percent of the variation in y that can be explained by x
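A minimal sketch of computing Pearson's r and the coefficient of determination by hand. The five (x, y) pairs are hypothetical data chosen to keep the arithmetic easy to follow.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of squares and cross-products around the means.
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)

r = s_xy / sqrt(s_xx * s_yy)   # Pearson correlation coefficient
r_squared = r ** 2             # coefficient of determination

print(round(r, 4), f"{r_squared:.0%}")
```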
98
New cards
When is ANOVA best used and why?
When there are 3+ groups to compare, because running several pairwise tests would inflate the overall chance of a Type I error
99
New cards
If the null hypothesis in an ANOVA test is **false**, one or more differences between groups is significant among all data groups tested. True/False
True
100
New cards
With independent samples and using proportions, the population must be ______ times the sample size, and the test assumes a ______ distribution and uses pooled samples.
10-20; normal