Statistical Literacy and Critical Thinking Notes

Parameter and Statistic

  • Citrix Security Survey:

    • Sample: 1,001 adults in the U.S.

    • Finding: 69% believe personal information theft is inevitable.

    • Question: Is 69% a statistic or a parameter?

Quantitative vs. Categorical Data

  • Quantitative Data: Numerical data that can be measured or counted.

  • Categorical Data: Data that can be grouped into categories.

  • Examples:

    • A. Platelet counts (Appendix B): Quantitative.

    • B. Cigarette brands (Appendix B): Categorical.

    • C. M&M colors (Appendix B): Categorical.

    • D. M&M weights (Appendix B): Quantitative.

Discrete vs. Continuous Data

  • Discrete Data: Data that can only take specific values (usually integers).

  • Continuous Data: Data that can take any value within a range.

  • Examples:

    • A. NBA player heights: Continuous.

    • B. Number of people surveyed: Discrete.

    • C. Time spent on smartphones: Continuous.

E-Cigarette Survey

  • Survey Details:

    • Sample: 36,000 adults.

    • Finding: 3.7% regularly use e-cigarettes.

  • Questions:

    • A. Identify the sample and population.

      • Sample: 36,000 adults surveyed

      • Population: all adults

    • B. Is 3.7% a statistic or a parameter?

      • Statistic, because it describes the sample rather than the entire population.

    • C. Level of measurement of 3.7% (nominal, ordinal, interval, ratio).

      • Ratio

    • D. Discrete or continuous numbers of subjects?

      • Discrete
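
As a quick check on the numbers above, the sketch below converts the reported 3.7% back into an approximate head count. The variable names are illustrative, and the result is only approximate because 3.7% is itself a rounded figure.

```python
# Approximate number of surveyed adults who regularly use e-cigarettes.
sample_size = 36_000        # adults surveyed (from the notes)
sample_proportion = 0.037   # 3.7%, a statistic because it describes the sample

# A count of people is discrete, so round to a whole number.
approx_users = round(sample_size * sample_proportion)
print(approx_users)
```

The count of subjects is a whole number (discrete), while the proportion itself is measured at the ratio level because 0% is a true zero.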

Statistic or Parameter Identification

  • Definitions:

    • Statistic: A value that describes a sample.

    • Parameter: A value that describes an entire population.

  • Examples:

    • 5. Lost wallets: 89% turned in (sample) - Statistic.

    • 6. Licensed drivers: 212 million (Federal Highway Admin.) - Parameter.

    • 7. Titanic deaths: 1503 passengers/crew (entire population) - Parameter.

    • 8. Baby birth weight: 3152.0g average (sample) - Statistic.

    • 9. Baby gender: 51% girls (sample) - Statistic.

    • Smartphone ownership: 72% (sample) - Statistic.

    • Super Bowl attendance: 70,081 people (all attendees) - Parameter.

    • Prisoners in the U.S.: 2,227,318 (U.S. Bureau of Justice) - Parameter.

Discrete or Continuous Data Set Determination

  • Examples:

    • Freshman 15 weight gains: Continuous.

    • Fraud detection inter-arrival times: Continuous.

    • House attendance (number of representatives): Discrete.

    • Students passing a course: Discrete.

    • Amazon sales processing times: Continuous.

    • Texting fatalities (number of fatalities): Discrete.

    • Students earning an A: Discrete.

    • Foot lengths: Continuous.

Levels of Measurement

  • Nominal: Categories only, no order (e.g., colors).

  • Ordinal: Categories with a meaningful order (e.g., rankings).

  • Interval: Equal intervals, but no true zero point (e.g., temperature in Celsius).

  • Ratio: Equal intervals with a true zero point (e.g., height, weight).

  • Examples:

    • College majors: Nominal.

    • Medical school rankings: Ordinal.

    • Movie ratings (0-5 stars): Ordinal.

    • Prison sentence lengths: Ratio.

    • Baseball World Series years: Interval.

    • Painting styles: Nominal.

    • State areas (km²): Ratio.

    • Body temperatures: Interval.

Level of Measurement Identification and Calculation Errors

  • Examples:

    • Super Bowl jersey numbers: Nominal (averaging is meaningless).

      • The average of jersey numbers (49.6) is meaningless because jersey numbers are nominal data and do not represent quantifiable values.

    • Social Security last four digits: Nominal (averaging is meaningless).

      • The average of the last four digits of Social Security numbers (4.7) is meaningless for the same reason as above.

    • Temperatures in Fahrenheit: Interval (ratio calculation is incorrect).

      • Fahrenheit temperatures are interval data: differences are meaningful, but there is no true zero point. Therefore, ratios are not meaningful, and it is not correct to say it is twice as warm in France as it is in Anchorage.

    • College ranks: Ordinal (differences may not be equal).

      • College ranks are ordinal data. It is not correct to assume the difference from Harvard to MIT is the same as Stanford to UC Berkeley.
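
The Fahrenheit point above can be illustrated numerically. The sketch below uses two hypothetical readings (80°F and 40°F, not taken from the exercise) and shows that the "twice as warm" ratio changes when the same temperatures are expressed in another scale, while differences convert consistently.

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

def f_to_k(f):
    """Convert degrees Fahrenheit to kelvins (a scale with a true zero)."""
    return f_to_c(f) + 273.15

# Hypothetical readings chosen only for illustration.
warm_f, cold_f = 80.0, 40.0

ratio_f = warm_f / cold_f                   # looks like "twice as warm"
ratio_c = f_to_c(warm_f) / f_to_c(cold_f)   # same temperatures, different ratio
ratio_k = f_to_k(warm_f) / f_to_k(cold_f)   # ratio on a true-zero scale

diff_f = warm_f - cold_f                    # differences are meaningful
diff_c = f_to_c(warm_f) - f_to_c(cold_f)    # and rescale by a constant factor

print(ratio_f, round(ratio_c, 2), round(ratio_k, 2))
print(diff_f, round(diff_c, 2))
```

Because the ratio depends on where the scale places its arbitrary zero, only the kelvin ratio reflects a physical comparison.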

Type of Countable Data

  • Categorization:

    • Discrete (finite values).

    • Discrete (infinite but countable values).

    • Continuous (infinite and not countable values).

  • Examples:

    • A. Exact foot lengths: continuous.

    • B. Shoe sizes: discrete (finite values).

    • C. Number of albums sold: discrete (finite values).

    • D. Monkeys typing lyrics: discrete (infinite but countable).

Directions in Degrees

  • Measurement: Directions measured in degrees (navigation).

  • Scale: North = 0°, East = 90°, South = 180°, West = 270°.

  • Level of Measurement: Interval (can have meaningful differences, but no true zero).

Collecting Sample Data

  • Key Concept: Appropriate data collection is crucial for meaningful analysis.

  • Simple Random Sample: Every sample of size $n$ has the same chance of selection.

Basics of Design of Experiments and Collecting Sample Data

  • Gold Standard: Randomness with placebo/treatment groups.

  • Placebo: Harmless, ineffective treatment.

  • Example: Salk Vaccine Experiment (1954)

    • 401,974 children randomly assigned.

    • Treatment group: 200,745 (Salk vaccine).

    • Placebo group: 201,229 (no drug).

    • Results: 33 in vaccine group, 115 in placebo group developed paralytic polio.
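
Because the two groups differ slightly in size, the results are easier to compare as rates. The sketch below computes paralytic polio cases per 100,000 children from the counts in the notes; the variable names are illustrative.

```python
# Counts from the Salk vaccine experiment as listed in the notes.
vaccine_n, vaccine_cases = 200_745, 33
placebo_n, placebo_cases = 201_229, 115

# Cases of paralytic polio per 100,000 children in each group.
vaccine_rate = vaccine_cases / vaccine_n * 100_000
placebo_rate = placebo_cases / placebo_n * 100_000

print(f"Vaccine group: {vaccine_rate:.1f} per 100,000")
print(f"Placebo group: {placebo_rate:.1f} per 100,000")
```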

Experiments vs. Observational Studies

  • Experiment: Apply treatment, observe effects.

    • Individuals are called experimental units or subjects.

  • Observational Study: Observe and measure characteristics without intervention.

  • Lurking Variable: Affects study variables but is not included.

  • Example: Ice Cream & Drownings

    • Observational study error: Incorrectly concluding ice cream causes drownings.

    • Lurking variable: Temperature (increases both ice cream sales and swimming).
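
The ice cream example can be mimicked with a small simulation. In the sketch below, every number (temperature range, slopes, noise levels) is a made-up assumption; ice cream sales and drownings each depend only on temperature, yet the two still come out strongly correlated.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical daily temperatures (the lurking variable).
temps = [rng.uniform(0, 35) for _ in range(500)]

# Each outcome depends on temperature plus independent noise.
# Neither outcome appears in the formula for the other.
ice_cream = [10 * t + rng.gauss(0, 20) for t in temps]
drownings = [0.2 * t + rng.gauss(0, 1) for t in temps]

r = pearson_r(ice_cream, drownings)
print(round(r, 2))
```

The correlation is high even though the simulation contains no causal link between the two outcomes, which is the mistake an observational study can invite.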

Design of Experiments: Replication, Blinding, Randomness

  • Replication: Repeating experiment on multiple individuals (large sample sizes).

  • Blinding: Subject doesn't know treatment/placebo.

    • Double-blind: Both subject and evaluator are unaware.

  • Randomness: Assigning individuals using random selection.

Sampling Methods

  • Systematic Sampling: Select a starting point, then every $k^{th}$ element.

  • Convenience Sampling: Using easily accessible data.

  • Stratified Sampling: Divide population into subgroups (strata), sample from each.

  • Cluster Sampling: Divide area into sections (clusters), select some clusters, include all members.

  • Multistage Sampling: Combination of sampling methods.

  • Cluster Sampling mnemonic: Cluster = Class, choose All members.

Multistage Sample Design Example

  • U.S. Government Unemployment Statistics:

    1. Partition U.S. into 2,025 primary sampling units.

    2. Group PSUs into 824 strata.

    3. Select one PSU per stratum (probability proportional to population size).

    4. Randomly select ~60,000 households from selected PSUs.

    5. Interview about employment status.

Observational Studies: Cross-Sectional, Retrospective, Prospective

  • Cross-Sectional Study: Data collected at one point in time.

  • Retrospective Study: Data collected from the past.

  • Prospective Study: Data collected in the future from cohorts.

Experiments: Confounding and Control

  • Confounding: Cannot identify the specific factor causing an effect.

  • Control: Completely randomized experimental design (random assignment).

Designs of Experiments

  • Completely Randomized Design: Random assignment to treatment groups.

  • Randomized Block Design:

    1. Form blocks of similar subjects.

    2. Randomly assign treatments within each block.

  • Matched Pairs Design: Compare two treatment groups using matched pairs (before/after, twins).

  • Rigorously Controlled Design: Carefully assign subjects to ensure similarity; difficult to implement.
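
A randomized block design can be sketched in a few lines. The function below is a hypothetical helper, not a standard library routine: it groups subjects into blocks, shuffles each block, and deals out treatments in turn so that each block is split as evenly as possible. The subject IDs and the blocking variable are invented for illustration.

```python
import random

def randomized_block_assignment(subjects, block_of, treatments, seed=None):
    """Randomly assign treatments within each block of similar subjects."""
    rng = random.Random(seed)

    # Step 1: form blocks of similar subjects.
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)

    # Step 2: shuffle within each block, then alternate treatments.
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        for i, subject in enumerate(shuffled):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Twelve hypothetical subjects, blocked on a recorded characteristic.
subjects = [("S%02d" % i, "F" if i % 2 == 0 else "M") for i in range(1, 13)]
assignment = randomized_block_assignment(
    subjects, block_of=lambda s: s[1], treatments=["treatment", "placebo"], seed=1
)

for subject, group in sorted(assignment.items()):
    print(subject, group)
```

Dropping the blocking step and shuffling all subjects together would give a completely randomized design instead.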

Sampling Errors

  • Sampling Error: Discrepancy between sample and population results due to random chance.

  • Non-Sampling Error: Human error (wrong data, biased questions, etc.).

  • Non-Random Sampling Error: Using a non-random sampling method.
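
Sampling error can be seen directly in a simulation. The sketch below builds a hypothetical population in which exactly 30% have some characteristic, draws one simple random sample, and reports the gap between the statistic and the parameter. The population size, proportion, and sample size are all assumptions chosen for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 1 = has the characteristic, 0 = does not.
population = [1] * 30_000 + [0] * 70_000
parameter = sum(population) / len(population)   # describes the whole population

# One simple random sample drawn without replacement.
sample = rng.sample(population, 1_000)
statistic = sum(sample) / len(sample)           # describes the sample only

# Sampling error: the discrepancy that arises from chance alone.
sampling_error = statistic - parameter
print(parameter, statistic, round(sampling_error, 4))
```

Nothing was recorded incorrectly and the sampling method is random, so any gap here is pure sampling error rather than a non-sampling or non-random sampling error.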

Magnet Treatment of Pain Study

Study: Magnets for treating back pain
Methods: Visual analog scale for pain measurement.
Results: Matrix below.

Treatment                 N    Mean ($\bar{x}$)   Standard Deviation (s)
Reduction after Magnet    20   0.49               0.96
Reduction after Sham      20   0.44               1.40

Questions:

  • Is it an experiment or an observational study? Experiment, because a treatment was applied.

  • What does double-blind mean? Neither the subject nor the experimenter knows who is receiving the treatment.

  • Replication: Used 20 subjects for each treatment.

  • What type of sampling was completed? Convenience, because the subjects were recruited from a hospital, and this may affect the quality of results.

Cell Phone Use: Hemispheric Dominance

Study: Association between ear used for cell phone calls/handedness.
Method: Online survey to 5,000 otology group members; 717 responses.

Questions:

  • Sampling method? Convenience sampling.

  • Does the method of sampling appear to adversely affect the quality of the results? Yes, those who belong to the online otolaryngology group may be more likely to have ear problems.

  • Experiment or observational study? Observational study, as data was collected with no intervention.

  • Response rate? $\frac{717}{5000} = 14.34\%$. The response rate appears to be low, and the problem with a low response rate is that the people who respond may have strong opinions or share a common trait. Therefore, they may not be a good representation of the population.

  • Sampling methods: Assume that the population consists of all students currently in your statistics class. Below are ways to select a sample of 6 students using different sampling methods.

    • Simple random sample: Number each student, and randomly choose 6 numbers.

    • Systematic sample: If there are 30 students in the class, choose a random starting point, then select every fifth student.

    • Stratified sample: Divide the class into genders, then randomly select 3 members of each gender.

    • Cluster sample: Break up students into groups, select a single group, and then select all members of the group.

    • Convenience sample: Select the first six students to arrive at class.
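
The five classroom methods above can be sketched in code. The roster of 30 numbered students, the two strata (simply the first and second half of the roster, standing in for any grouping variable), and the five groups of six are all assumptions made for illustration.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable
students = list(range(1, 31))   # hypothetical roster: students numbered 1..30

# Simple random sample: every group of 6 is equally likely.
srs = rng.sample(students, 6)

# Systematic sample: random start, then every k-th student (k = 30 / 6 = 5).
k = len(students) // 6
start = rng.randrange(k)
systematic = students[start::k]

# Stratified sample: split into strata, then sample 3 from each.
stratum_a, stratum_b = students[:15], students[15:]
stratified = rng.sample(stratum_a, 3) + rng.sample(stratum_b, 3)

# Cluster sample: form groups, pick one group, keep all of its members.
clusters = [students[i:i + 6] for i in range(0, len(students), 6)]
cluster = rng.choice(clusters)

# Convenience sample: whoever is easiest to reach (here, first six listed).
convenience = students[:6]

for name, chosen in [("simple random", srs), ("systematic", systematic),
                     ("stratified", stratified), ("cluster", cluster),
                     ("convenience", convenience)]:
    print(f"{name:14s} {sorted(chosen)}")
```

Only the first method gives every possible group of six the same chance of selection; the others restrict which groups can occur.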

Identifying Types of Sampling

  • Cormorant Density: Systematic sampling - data collected at intervals of 20 km.

  • Sexuality of Women: Convenience sampling - 4,500 responses from 100,000 questionnaires means people self-selected.

  • UFO Poll: Random sampling - telephone numbers were randomly generated.

  • Reported and Observed Results: Convenience sampling - easy to interview and observe people in a public location.

  • Books: Simple random sample - randomly selecting pages from a book.

  • Acupuncture Study: Random sampling - assigned to four different treatment groups.

  • Criminology: Stratified sampling - selected felons from each category.

  • Deforestation Rates: Systematic sampling - every one-degree intersection of latitude and longitude.

  • Testing Lipitor: Random sampling - randomly assigned to different treatment groups.

  • Exit Polls: Cluster sampling - polling stations randomly selected, all voters leaving surveyed.

  • Literary Digest Poll: Convenience sampling - people responding to a mailed questionnaire.

  • Highway Strength: Systematic sampling - core samples collected at regular intervals.

Critical Thinking: Identifying Problems in Studies

  • Online News: Observational study with voluntary response bias - biased sample.

  • Physicians' Health Study: Experiment - Good design (random selections, placebo).

  • Drinking and Driving: Experiment - Ethical concerns - subjects driving drunk.

  • School Suspensions: Statistic based on 3 students - Insufficient sample size.

  • Sleep Study: College students only, then expanded - Not representative of all adults.

  • Atkins Weight Loss Program: Researchers called their subjects, who reported their own weight - honesty can always be a concern with self-reported values, which is why doctors physically weigh their subjects.

  • Crime Survey: Sensitive questions - Subjects may be reluctant to answer.

  • Medications: Low response rate - Could lead to biased results.

Types of Observational Studies

  • Nurses' Health Study II: Prospective - study ongoing.

  • Heart Health Study: Retrospective - examining past data.

  • Marijuana Study: Cross-sectional - survey at one point in time.

  • Framingham Heart Study: Prospective - ongoing since 1948.

Experimental Designs

  • Lunesta: Matched pairs design - before and after measurements.

  • Lipitor: Randomized block design - considering gender differences.

  • West Nile vaccine: Completely randomized design - random assignment to vaccine or placebo.

  • HIV vaccine: Matched pairs design - using twins.

Random vs. Simple Random Sample

A) In Major League Baseball, there are 30 teams: this is a random sample but not a simple random sample, because the 25 players are chosen by team, so not every possible group of 25 players has the same chance of selection.

B) For the same Major League Baseball population described in part A: this is both a random sample and a simple random sample of 25 players.

C) Major League Baseball: this is neither, because choosing the 25 youngest players is not random; the sample is biased.

Chapter Quick Quiz

  1. Survey: It makes no sense to compute an average.

  2. The survey is nominal.
    The survey is continuous data.
    The survey is quantitative data.

  3. This survey is nominal.

  4. Birth: That is a statistic.

  5. The sample of birth weights is not a voluntary response sample.

  6. This is an observational study.

  7. The Physicians' Health Study is a study in which the subjects were treated with aspirin or a placebo.

  8. The sampling in the statistical study is simple random sampling.

Review Exercises

  1. Online Medical Info: This is a voluntary response sample, so it is not a sound basis for conclusions about the population.

  2. Paying for First Dates:

    1. Survey bias.

    2. This survey result is a statistic.

    3. The survey constitutes an observational study.

  3. Oxygen treatments: The experiment was both double-blind and randomized.

  4. Divorce and Margarine Study: We cannot conclude that either variable causes the other, because a correlation does not imply causation and both variables may be affected by another variable.

  5. List the sampling methods:

    1. Systematic (Lipitor pills).

    2. Stratified (gender survey).

    3. Simple random sample (Manhattan list).

    4. Convenience (statistics student survey).

    5. Cluster (Major League Baseball random selection).

  6. Defense of Marriage Act:
    Yes, the difference in wording seems to affect the way that people respond, even though the differently worded questions should follow the same logic.

  7. State populations:

    1. Discrete data.

    2. The number of residents in a state is at the ratio level of measurement.

    3. Random sample (2 full-time workers in each of the 50 states).

    4. Cluster sampling (randomly select states).

    5. Convenience Sample.

  8. Percentages:

    1. The claim is wrong because you cannot make the argument that the protein in bars contains 125% fat in chocolate (misrepresentation).

    2. 687 people said that they liked to drive.

    3. 27.99% of respondents stated that driving is a chore.

  9. Types of data:

    1. Albany: Interval level; systematic sampling.

    2. States party: Ratio level; stratified sampling, with 50 voters per state in each of the 50 states.

    3. Pollster office: Ordinal data; convenience sampling.
      Statistical significance: Many of the procedures will produce statistically significant results.
      It has practical significance.

  10. Calculator Warm Up

  11. IQ scores: The calculated mean is 128.33.

  12. Boys streak: 0.000122 (2.1).
    LeBron score: 4.42 (LeBron is high, over 2 or 3).

  13. The body temperature score is -1.65.

  14. Determine sample size: 1.067 (estimate).

  15. Sample data: 72.

  16. Standard Deviation -is 33 = one 10,591.083 (calculate this for data).

  17. The standard temperature is +/-0.40= root of zero.

  18. Six - power -0.00001 (power 0)

  19. 8^^-12=one over X = answer is. 1617

  20. 2^-12 =0.2 exp 12 =answer is 2.4 * (10 ^12).

Technology Project

Find the number of males and females. What is the percentage of each, and how do the numbers compare?
Males are 51.2%; based on this number, it does appear that the sample accurately reflects the population.
c. Data are missing (the 8 A.M. and 12 A.M. values).

  1. Highlight the missing values and delete them; delete all of those rows to remove them.

  2. Sort: Rearrange the data using a particular sort order. The manual deletion is then applied to the sorted data.

Critical Thinking: Students vs. Carpenter

  • Longevity Data: Students have an average age at death of 20.2.

  • Not many of the people you know who are students are aged 50 years or older, whereas the most reasonable age at which someone would retire is around 50. The fundamental difference is that a person is a student only for a limited duration, which may be the cause of the low average age at death.