Chapter 1: Data Collection

1.1 Intro to the Practice of Statistics

  • Statistical Thinking: Understanding large amounts of data and information by applying mathematical formulas.

    • Formulas help understand general tendencies but do not yield precise answers.

    • Statistics deals with approximations, not exact calculations.

  • Four-Part Definition of Statistics:

    1. Collection of information.

    2. Organizing and summarizing that information.

    3. Drawing conclusions about that information.

    4. Providing a measure of confidence in any conclusion (do the numbers reflect reality?).

  • Correlation vs. Causation:

    • Correlation does not imply causation.

      • Example: Cell phone usage and brain tumors. An observed association between the two does not by itself show that phones cause tumors; more research is needed to draw conclusions about causation.

  • Lurking Variables:

    • A variable that was not considered in the study but affects the outcome, skewing the apparent relationship in the data.

      • Example: Classical music during pregnancy and academic benefit for the child. The lurking variable could be other factors like socioeconomic status, parental involvement, etc.

  • Data Collection:

    • When putting statistics into practice, consider the problem of data collection.

      • Example: Determining whether Denton is mostly Democratic or Republican.

    • Collecting a sample from the general population is more practical than asking every single resident.
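
The sampling idea above can be sketched in a few lines of Python. This is only an illustration: the population size, the 55/45 split, and the sample size of 500 are invented assumptions, not real figures for Denton or any other city.

```python
import random

# Hypothetical population of 100,000 residents. The 55% / 45% split
# is an invented assumption, used only to have something to sample from.
random.seed(0)  # fixed seed so the sketch is reproducible
population = ["D"] * 55_000 + ["R"] * 45_000

# True share of "D" in the whole population. In practice this is
# unknown, because asking every resident is impractical.
true_share = population.count("D") / len(population)

# Simple random sample of 500 residents, drawn without replacement.
sample = random.sample(population, k=500)

# Share of "D" observed in the sample, used as an estimate.
sample_share = sample.count("D") / len(sample)

print(f"share in population: {true_share:.3f}")
print(f"share in sample:     {sample_share:.3f}")
```

The sample share will usually be close to, but not exactly equal to, the population share, which is why statistics reports approximations with a measure of confidence rather than exact answers.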

Definitions

  • Population: The entire group of individuals to be studied.

  • Sample: A subset of the population.

  • Individual: A single member of a population.

  • Statistic: A numerical summary of a sample.

  • Parameter: A numerical summary of a population.

  • Descriptive Statistics: The process of organizing and summarizing data using numerical summaries, tables, and graphs.

  • Inferential Statistics: The process of extending the conclusions of the sample to the population with some measure of reliability.

  • Example:

    • A sociologist wants to know the recidivism rate of convicts released from Huntsville prison in 2011.

    • Population: Convicts from Huntsville prison released in 2011.

    • Sample: 20 convicts randomly selected.

    • 12 out of 20 return to prison.

    • Statistic: 12/20 = 60%. Inference: We are 95% confident that the recidivism rate is between 55% and 72%.

  • Variables: Characteristics of the individuals of the population.

    • Examples: height, income, ethnicity, left or right-handed, etc.
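
As a small worked version of the recidivism example, the statistic can be computed directly from the sample counts given in the notes. The variable names are illustrative choices, not part of the original example.

```python
# Counts taken from the recidivism example in these notes.
sample_size = 20   # convicts randomly selected
returned = 12      # how many of them returned to prison

# A statistic is a numerical summary of the sample.
sample_rate = returned / sample_size

print(f"sample recidivism rate: {sample_rate:.0%}")  # prints: sample recidivism rate: 60%
```

The corresponding parameter, the recidivism rate for all convicts released from Huntsville prison in 2011, stays unknown unless every individual in the population is followed up; inferential statistics is what connects the two.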

Types of Variables

  • Qualitative Variables: Allow for classification of individuals based on attributes or characteristics.

  • Quantitative Variables: Numerical measures of individuals. Arithmetic operations such as addition and subtraction can be applied to quantitative variables and give meaningful results.

  • Examples:

    • Determine whether each variable is qualitative or quantitative:

      • (a) weight - quantitative

      • (b) eye color - qualitative

      • (c) income bracket - qualitative

      • (d) gender - qualitative

      • (e) temperature - quantitative

      • (f) commute time - quantitative

      • (g) zip code - qualitative

      • (h) number of customers that use the drive-through at a fast food restaurant - quantitative

  • Focus mainly on quantitative variables.
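
The zip code example is a useful reminder that a variable made of digits is not automatically quantitative. The sketch below, using a few invented records (the zip codes and commute times are hypothetical), shows which summary makes sense for each type.

```python
from collections import Counter
from statistics import mean

# Hypothetical records; every value here is invented for illustration.
people = [
    {"zip_code": "76201", "commute_minutes": 12.5},
    {"zip_code": "76205", "commute_minutes": 30.0},
    {"zip_code": "76201", "commute_minutes": 21.5},
]

# Quantitative: adding and averaging commute times is meaningful.
average_commute = mean(p["commute_minutes"] for p in people)

# Qualitative: zip codes are labels, so the meaningful summary is a
# count per category. Averaging them would produce a number that
# describes nothing.
zip_counts = Counter(p["zip_code"] for p in people)

print(f"average commute: {average_commute:.1f} minutes")
print(f"residents per zip code: {dict(zip_counts)}")
```

Storing the zip code as a string rather than an integer is a deliberate choice here: it makes accidental arithmetic on a label harder to write.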

Types of Quantitative Variables

  • Discrete: Represents information which is countable (one by one).

    • Comes from the Latin "discretus," which means separated.

  • Continuous: Represents information which can be divided indefinitely, meaning uninterrupted change.

  • Examples:

    • Determine whether each variable is discrete or continuous:

      • (a) time - continuous

      • (b) temperature - continuous

      • (c) number of customers - discrete

      • (d) velocity - continuous

      • (e) number of houses sold each year - discrete

      • (f) bacteria population in a culture at any given time - discrete (a count of cells, although very large populations are often modeled as continuous)

      • (g) miles per gallon - continuous

Levels of Measurement of a Variable

  1. Nominal Level of Measurement:

    • Variable used to name, label, or categorize without attention to rank or order.

  2. Ordinal Level of Measurement:

    • Variable has properties of nominal level variables, but rank and order are important.

  3. Interval Level of Measurement:

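    • Variable has properties of ordinal level variables, but differences between values also have meaning, so addition and subtraction can be performed on the variables. A value of zero does not mean absence of the quantity.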

  4. Ratio Level of Measurement:

    • Variable has properties of interval level variables, but ratios and percentages also have meaning. A value of zero means absence of the quantity.

    • Multiplication and division can be performed on the variables.

  • Examples:

    • Name the level of measurement for each variable:

      • (a) Nationality - Nominal

      • (b) Letter grade - Ordinal

      • (c) Time - Ratio

      • (d) Crime rates - Ratio
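
The difference between the levels can be checked numerically. The sketch below is a minimal illustration with invented values: it orders letter grades (ordinal), then shows that a ratio of temperatures changes with the unit (interval) while a ratio of durations does not (ratio). The helper `c_to_f` and all variable names are illustrative.

```python
# Ordinal: letter grades can be ranked, but the gap between two
# grades is not a defined quantity.
grade_rank = {"F": 0, "D": 1, "C": 2, "B": 3, "A": 4}
grades = ["B", "A", "C"]
sorted_grades = sorted(grades, key=grade_rank.get)  # worst to best

def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Interval: temperature in Celsius or Fahrenheit has no true zero,
# so "twice as hot" depends on the unit chosen.
c_low, c_high = 10.0, 20.0
ratio_c = c_high / c_low                    # 2.0 in Celsius
ratio_f = c_to_f(c_high) / c_to_f(c_low)    # 68 / 50 = 1.36 in Fahrenheit

# Ratio: a duration of zero means no time at all, so ratios are
# the same in any unit.
minutes_short, minutes_long = 15.0, 30.0
ratio_minutes = minutes_long / minutes_short
ratio_seconds = (minutes_long * 60) / (minutes_short * 60)

print(sorted_grades)
print(ratio_c, ratio_f)
print(ratio_minutes, ratio_seconds)
```

Because the temperature ratio changes from 2.0 to 1.36 under a unit change, statements such as "20 degrees is twice as warm as 10 degrees" are not meaningful at the interval level, whereas "30 minutes is twice as long as 15 minutes" holds in any unit.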

Practice Problems

  1. The legal profession conducted a study to determine the percentage of cardiologists who had been sued for malpractice in the last ten years. The sample was randomly chosen from a national directory of doctors. Identify the individuals in the study.

    • A) each cardiologist selected from the directory

  2. In a survey conducted in the town of Atherton, 22% of adult respondents reported that they had been involved in at least one car accident in the past ten years.

    • B) statistic

  3. 26.2% of the mayors of cities in a certain state are from minority groups.

    • B) parameter

  4. A study of 1600 college students in the city of Pemblington found that 10% had been victims of violent crimes.

    • B) statistic

  5. The number of seats in a school auditorium

    • A) quantitative

  6. The numbers on the shirts of a boys' football team

    • B) qualitative

  7. The low temperature in degrees Fahrenheit on January 1st in Cheyenne, Wyoming

    • B) continuous

  8. The number of pills in an aspirin bottle

    • B) discrete

  9. The medal received (gold, silver, bronze) by an Olympic gymnast

    • B) ordinal

  10. The musical instrument played by a music student

    • B) nominal

  11. The year of manufacture of a car

    • interval (differences between years are meaningful, but year 0 does not mean absence of time)

  12. Weight of rice bought by a customer

    • A) ratio

Homework

p. 11 #1-50, 57, 60