STATISTICS

  • is the science of conducting studies to
  1. collect
  2. organize
  3. summarize
  4. analyze
  5. draw conclusions from data

Why Study Statistics?

  • To be able to understand statistical studies in your field of expertise
  • To be able to conduct studies and analyze them statistically
  • To become better consumers & citizens – ex. when buying products, to understand governmental spending, etc.

Descriptive and Inferential Statistics

VARIABLE

  • is a characteristic or attribute that can assume different values – diff. age groups, grade levels, ACT scores, etc.

DATA (DATUM IS SINGULAR)

  • The values that a variable can assume

POPULATION

  • consists of all subjects (human or otherwise) that are studied (census).

SAMPLE

  • is a subset of the population (avoid bias - projecting your opinion into the study being conducted).

DESCRIPTIVE STATISTICS

  • consists of the collection, organization, summarization, and presentation of data – ex. describing avg. income based on a census.

INFERENTIAL STATISTICS

  • consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.
  • uses probability, the chance of an event occurring – ex. what is the probability of rolling a 6 on a die?
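
The die example can be checked with a short Python sketch. The names `classical_probability` and `simulate_rolls` are illustrative and not from the notes, and the simulation assumes a fair six-sided die.

```python
import random
from fractions import Fraction


def classical_probability(favorable, total):
    """Probability as favorable outcomes over equally likely outcomes."""
    return Fraction(favorable, total)


def simulate_rolls(n_rolls, seed=0):
    """Estimate P(rolling a 6) by simulating rolls of a fair die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)
    return sixes / n_rolls


p_six = classical_probability(1, 6)  # one favorable face out of six
print(p_six, float(p_six))
print(simulate_rolls(60_000))
```

With many rolls, the simulated proportion should settle close to 1/6 (about 0.167).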

HYPOTHESIS TESTING

  • area of inferential statistics that evaluates claims about a population based on information collected from samples – think the scientific method.
  • Statisticians use statistics to determine relationships among variables – ex. What is the relationship b/w smoking and lung cancer?
  • Statisticians also use statistics to make predictions – ex. a dealer may use past sales of a certain make of car to determine how many of that type to order in the next month.

Variables and Types of Data

BOUNDARIES

  • give the numerical range in which a measurement would have fallen before it was rounded off.
  • For continuous variables, the boundaries are given as one more decimal place beyond the measurement and always end with a 5.

VARIABLE       RECORDED VALUE          BOUNDARIES
Length         15 centimeters (cm)     14.5-15.5 cm
Temperature    86° Fahrenheit (°F)     85.5-86.5 °F
Time           0.43 second (sec)       0.425-0.435 sec
Mass           1.6 grams (g)           1.55-1.65 g
Density        1.24 g/mL               1.235-1.245 g/mL
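
One possible way to compute such boundaries is sketched below in Python. It assumes the recorded value is passed as a string, so that the number of decimal places is preserved; the helper name `boundaries` is hypothetical.

```python
from decimal import Decimal


def boundaries(recorded):
    """Return (lower, upper) boundaries for a recorded measurement.

    `recorded` is a string such as "0.43". The boundaries lie half a unit
    of the last recorded decimal place below and above the value.
    """
    value = Decimal(recorded)
    exponent = value.as_tuple().exponent  # 0 for "15", -2 for "0.43"
    half_unit = Decimal(5) * Decimal(10) ** (exponent - 1)
    return value - half_unit, value + half_unit


for recorded in ["15", "86", "0.43", "1.6", "1.24"]:
    low, high = boundaries(recorded)
    print(f"{recorded}: {low}-{high}")
```

Using `Decimal` rather than `float` keeps results such as 0.425 exact, which matters because the boundaries must end in a 5.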

LEVELS OF MEASUREMENT (MEASUREMENT SCALES)

NOMINAL

  • categorical (names) – no order/ranking – ex. eye color, zip code, political affiliation

ORDINAL

  • nominal, plus can be ranked (order) – places in a contest – 1st, 2nd, 3rd, etc.

INTERVAL

  • ordinal, plus intervals are consistent, no consistent ratio, no real zero – ex. 0 °F doesn’t mean no heat at all – a temp. of 10 °F is not twice as warm as 5 °F

RATIO

  • interval, plus ratios are consistent, true zero – ex. length measurements – 0 cm is a true zero, a length of 2 cm and a length of 4 cm have a ratio of 1:2
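
A small Python sketch can illustrate the difference between the interval and ratio levels. It uses the standard Fahrenheit-to-Celsius and centimeter-to-inch conversions; the function and variable names are illustrative only.

```python
def fahrenheit_to_celsius(temp_f):
    return (temp_f - 32) * 5 / 9


def cm_to_inches(length_cm):
    return length_cm / 2.54


# Interval scale: the ratio of two temperatures depends on the unit,
# so "twice as warm" has no fixed meaning.
ratio_f = 10 / 5
ratio_c = fahrenheit_to_celsius(10) / fahrenheit_to_celsius(5)

# Ratio scale: the ratio of two lengths is the same in any unit,
# because 0 means "no length" in both.
ratio_cm = 4 / 2
ratio_in = cm_to_inches(4) / cm_to_inches(2)

print(ratio_f, round(ratio_c, 3))
print(ratio_cm, round(ratio_in, 3))
```

The temperature ratio changes when the unit changes, while the length ratio stays at 2.
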

Data Collection and Sampling Techniques

TYPES OF SURVEYS

  1. Telephone survey

- not very costly, and people tend to give more honest answers when they are not face-to-face with the interviewer. However, it limits who can be surveyed, since not all people have phones and not all people answer the telephone.

  2. Mailed questionnaire

- can cover a wider geographic area than telephone surveys, is less expensive to conduct, and lets respondents remain anonymous. Drawbacks include a low number of people who actually send the survey back and inappropriate answers to the questions. Furthermore, some people may have difficulty reading the survey.

  3. Personal interviews

- one can get more in-depth answers to the questions, but it is more costly than the other two types because interviewers have to be trained in asking questions and recording responses. Furthermore, the interviewer may be biased in his/her selection of respondents.

SOME SAMPLING TECHNIQUES

RANDOM

  • uses a random number generator – all members of the population have an equal chance of being selected for the study. See Table 1–3 on p. 13.
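
A minimal Python sketch of random sampling, assuming the population can be listed and numbered. The function name `simple_random_sample` and the population of 2000 numbered people are hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that each has an equal chance of selection."""
    rng = random.Random(seed)  # pass a seed only to make a run repeatable
    return rng.sample(population, n)


people = list(range(1, 2001))  # hypothetical population, numbered 1..2000
sample = simple_random_sample(people, 50, seed=1)
print(sorted(sample)[:10])
```

`random.Random.sample` draws without replacement, so no person can be selected twice.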

SYSTEMATIC

  • every kth subject – if you need a sample of 50 for a study and there are 2000 people, then 2000/50 = 40, so k = 40. The first person is selected randomly from the first 40 (let’s say it’s person number 8); the next person would be the one numbered 48, then 88, etc., until you have 50 people.
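
The systematic procedure can be sketched in Python as follows. The function name `systematic_sample` and its parameters are illustrative, and the sketch assumes the population is numbered 1 through N and that N is at least the sample size.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every kth subject after a random start among the first k."""
    k = population_size // sample_size  # 2000 // 50 = 40
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in 1..k
    return [start + i * k for i in range(sample_size)]


chosen = systematic_sample(2000, 50, start=8)
print(chosen[:3], len(chosen))
```

With a start of 8, the selected subjects begin 8, 48, 88, matching the example in the notes.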

STRATIFIED

  • divide population into “layers” – the subjects are divided into groups based on some trait, then samples are taken from each subgroup. Ex. We want to survey high school students to see how they feel about the lunch menu and we want to see if it differs from 9th graders to 10th graders, etc. The population is layered by grade level, then subjects are surveyed from each grade level.
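
A minimal Python sketch of stratified sampling, using the grade-level example. The student IDs, the group sizes, and the name `stratified_sample` are assumptions made for illustration.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a random sample of the same size from every stratum (layer)."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }


# Hypothetical student IDs, layered by grade level.
students_by_grade = {
    "9th": list(range(900, 1000)),
    "10th": list(range(1000, 1100)),
    "11th": list(range(1100, 1200)),
    "12th": list(range(1200, 1300)),
}
surveyed = stratified_sample(students_by_grade, 10, seed=2)
for grade, ids in surveyed.items():
    print(grade, sorted(ids))
```

Every grade level is guaranteed to appear in the sample, which is what allows responses to be compared across grades.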

CLUSTER

  • use intact groups – subjects are grouped based on some trait such as geographic location, and then ALL members of the selected clusters are part of the survey. Ex. – When surveying a large sample of people from New York City regarding hospital care, since there are a lot of hospitals in NYC, the survey might be done only with patients from a few of the hospitals in NYC.
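
A minimal Python sketch of cluster sampling, loosely following the hospital example. The hospital names, patient labels, and the function name `cluster_sample` are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep ALL members of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical hospitals (clusters) and their patients.
patients_by_hospital = {
    "Hospital A": ["a1", "a2", "a3"],
    "Hospital B": ["b1", "b2"],
    "Hospital C": ["c1", "c2", "c3", "c4"],
    "Hospital D": ["d1"],
}
surveyed = cluster_sample(patients_by_hospital, 2, seed=3)
print(surveyed)
```

Unlike stratified sampling, only some groups are used, but everyone in a chosen group is surveyed.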

OTHER SAMPLING METHODS

CONVENIENCE SAMPLE

  • This is the use of subjects that are “convenient” to sample. Ex. – sampling people who come to the mall – this may not represent the population, as it may be done at a certain time of day, so not all customers who shop at the mall have an equal chance of doing the survey.

VOLUNTEER SAMPLE OR SELF-SELECTED SAMPLE

  • Ex. – a radio station is doing a survey and gives you a number to call regarding your opinion on a certain issue. In many cases, only people with strong opinions bother to call. Thus the sample does not represent the general population.

SAMPLING ERROR

  • occurs because samples are NOT perfect representations of the population from which they are selected.
  • For example, suppose you select a sample of students attending LSSU and find that, in your survey, 43% are Native American. You then check the admissions records and find that, in reality, 44% of the students attending LSSU are Native American. There is a sampling error of 1 percentage point.
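
The arithmetic in the LSSU example can be written as a short Python sketch. The function name `sampling_error` is illustrative; the sign convention (sample value minus population value) is an assumption, not something the notes specify.

```python
def sampling_error(sample_proportion, population_proportion):
    """Difference between the sample result and the true population value."""
    return sample_proportion - population_proportion


error = sampling_error(0.43, 0.44)  # survey result vs. admissions records
print(f"{abs(error) * 100:.0f} percentage point(s)")
```

A different sample from the same population would generally give a different error, which is why the error is attributed to sampling rather than to a flaw in the method.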

NON-SAMPLING ERROR

  • occurs when the data has been obtained in such a way that the data is biased.
  • Ex – using a radar gun to get the speed of a moving car but the radar gun has not been calibrated so is actually reading values that are 5 mph higher than the actual speed OR conducting a survey of high school students regarding Homecoming and only sampling the Senior class.

Experimental Design

OBSERVATIONAL STUDY

  • the researcher merely observes and tries to draw conclusions based on the observations.

EXPERIMENTAL STUDY

  • The researcher manipulates the independent (explanatory) variable and tries to determine how the manipulation influences the dependent (outcome) variable.
  • involves random assignment of subjects to groups and of treatments to groups.

QUASI-EXPERIMENTAL STUDY

  • If random assignment of subjects is not possible, sometimes a quasi-experimental study is done – ex. using already intact groups in something like a school setting.

ADVANTAGES OF EXPERIMENTAL STUDIES

  • Researchers can decide how to select subjects & how to assign them to groups
  • Researchers can control or manipulate the independent variable under controlled conditions.

DISADVANTAGES OF EXPERIMENTAL STUDIES

  • They occur in unnatural settings (labs, special rooms, etc.) so what happens there may not apply to what happens in the natural environment
  • Hawthorne effect – this occurs when subjects participating in an experiment purposefully change their behavior in ways that affect the results – named after workers at the Hawthorne plant of Western Electric, who changed their behavior when they knew they were participating in a study & being observed (1924).
  • Another problem with statistical studies is that of a confounding variable or a lurking variable. This influences the dependent variable but cannot be separated from the independent variable.
  • Ex. – people who are placed on a specific exercise program may also change their diet so it’s hard to decide what had the greatest effect. Thus, diet is a confounding variable.
  • The placebo effect is another factor that affects statistical studies. This occurs when the control group responds in a similar way to the exp. group simply because they know they are participating in a study (like mind over matter) or because they are responding to clues from the researchers. This is reduced by blinding or double blinding.
  • Blocking also reduces variability. See p. 20.

OTHER WAYS TO MINIMIZE VARIABILITY

  • Completely randomized design – both subjects and treatments assigned randomly
  • Matched-pair design – subjects assigned based on certain traits like age, weight – ex. with studies of twins, put one twin in the control group and the other in the exp. group.
  • Replication – do the same exp. in a diff. part of the country, or do the exp. with college students one time and repeat it with adults who are not going to college.
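
The two assignment schemes above can be sketched in Python. This is a minimal illustration under simple assumptions (two groups, equal sizes, subjects already paired for the matched design); the function names and subject labels are hypothetical.

```python
import random


def completely_randomized(subjects, seed=None):
    """Shuffle all subjects, then split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "experimental": shuffled[half:]}


def matched_pairs(pairs, seed=None):
    """For each matched pair (e.g. twins), randomly send one member to each group."""
    rng = random.Random(seed)
    control, experimental = [], []
    for first, second in pairs:
        if rng.random() < 0.5:  # coin flip decides who goes where
            first, second = second, first
        control.append(first)
        experimental.append(second)
    return {"control": control, "experimental": experimental}


print(completely_randomized(range(1, 21), seed=2))
print(matched_pairs([("twin A1", "twin A2"), ("twin B1", "twin B2")], seed=5))
```

In both cases chance, not the researcher, decides which group a subject joins, which is what guards against selection bias.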

Suspect Samples

  • Is the sample large enough?
  • How was the sample selected?
  • Is the sample representative of the population?

Ambiguous Averages

  • What particular measure of average was used and why? Mean, median, mode, midrange?
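
To see why the choice of average matters, the sketch below computes all four measures for one small data set using Python's standard `statistics` module. The income figures are made up for illustration.

```python
import statistics

# Hypothetical incomes in thousands of dollars; one value is much larger.
incomes = [30, 30, 35, 40, 45, 50, 200]

avg_mean = statistics.mean(incomes)
avg_median = statistics.median(incomes)
avg_mode = statistics.mode(incomes)
avg_midrange = (min(incomes) + max(incomes)) / 2  # halfway between extremes

print(f"mean     = {avg_mean:.1f}")
print(f"median   = {avg_median}")
print(f"mode     = {avg_mode}")
print(f"midrange = {avg_midrange}")
```

The four "averages" differ widely here because a single large income pulls the mean and midrange upward while leaving the median and mode unchanged.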

Changing the Subject

  • Are different values used to represent the same data?

Detached Statistics

  • One third fewer calories ... than what?

Implied Connections

  • Studies suggest that some people may understand what this statement means.
  • Ex. – Eating fish may help reduce cholesterol. There is no guarantee it will reduce cholesterol.

Misleading Graphs

  • Are the scales for the x-axis and y-axis appropriate for the data? Is the title misleading?

Faulty Survey Questions

  • Do you feel Brimley should build a new football stadium?
  • Do you favor increasing taxes so Brimley can build a new football stadium?