STATISTICS

STATISTICS

is the science of conducting studies to

collect
organize
summarize
analyze
draw conclusions from data
1. Why Study Statistics?

To be able to understand statistical studies in your field of expertise
To be able to conduct studies and analyze them statistically
To become better consumers & citizens – ex. when comparing products or trying to understand governmental spending.

Descriptive and Inferential Statistics

VARIABLE

is a characteristic or attribute that can assume different values – ex. age group, grade level, ACT score, etc.

DATA (DATUM IS SINGULAR)

The values that a variable can assume

POPULATION

consists of all subjects (human or otherwise) that are being studied. A study that collects data from every subject in the population is a census.

SAMPLE

is a subset of the population. It should be selected so as to avoid bias (ex. projecting your opinion into the study being conducted).

DESCRIPTIVE STATISTICS

consists of the collection, organization, summarization, and presentation of data – ex. describing avg. income based on a census.

INFERENTIAL STATISTICS

consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.
uses probability, the chance of an event occurring – ex. what is the probability of rolling a 6 on a die?
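The die question can be worked out by counting outcomes. The sketch below assumes a fair six-sided die, so that all faces are equally likely; the function name classical_probability is illustrative and not from the course text.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Favorable outcomes divided by total outcomes (assumes equally likely outcomes)."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

# Assumption: a fair die with faces 1 through 6.
die = [1, 2, 3, 4, 5, 6]
p_six = classical_probability(lambda face: face == 6, die)
print(p_six)  # 1/6
```

One face out of six is favorable, so the probability is 1/6, or about 0.167.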

HYPOTHESIS TESTING

area of inferential statistics that evaluates claims about a population based on information collected from samples – think the scientific method.
Statisticians use statistics to determine relationships among variables – ex. What is the relationship b/w smoking and lung cancer?
Statisticians also use statistics to make predictions – ex. a dealer may use past sales of a certain make of car to determine how many of that type to order for the next month.
1. Variables and Types of Data

BOUNDARIES

are the limits of the numerical range in which a measurement would fall before rounding off.
For continuous variables, the boundaries are given to one more decimal place than the measurement and always end with a 5.

VARIABLE	RECORDED VALUE	BOUNDARIES
Length	15 centimeters (cm)	14.5-15.5 cm
Temperature	86° Fahrenheit (°F)	85.5-86.5 °F
Time	0.43 second (sec)	0.425-0.435 sec
Mass	1.6 grams (g)	1.55-1.65 g
Density	1.24 g/mL	1.235-1.245 g/mL
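The rule behind the table can be sketched in code: the boundaries lie half of one unit of the last recorded decimal place below and above the recorded value. This is a minimal sketch; the function name boundaries is hypothetical, and the value is passed as a string so that the number of recorded decimal places is preserved.

```python
from decimal import Decimal

def boundaries(recorded):
    """Return (lower, upper) boundaries for a recorded value given as a string."""
    value = Decimal(recorded)
    # One unit in the last recorded decimal place, e.g. 0.01 for '0.43'.
    last_place_unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half_unit = last_place_unit / 2
    return value - half_unit, value + half_unit

for recorded in ["15", "86", "0.43", "1.6", "1.24"]:
    lower, upper = boundaries(recorded)
    print(recorded, "->", lower, "to", upper)
```

Decimal is used instead of float so that results such as 0.425 are exact rather than binary approximations.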

LEVELS OF MEASUREMENT (MEASUREMENT SCALES)

NOMINAL

categorical (names) – no order/ranking – ex. eye color, zip code, political affiliation

ORDINAL

nominal, plus can be ranked (order) – ex. places in a contest – 1st, 2nd, 3rd, etc.

INTERVAL

ordinal, plus intervals are consistent, no consistent ratio, no real zero – ex. 0 °F doesn’t mean no heat at all – a temp. of 10 °F is not twice as warm as 5 °F

RATIO

interval, plus ratios are consistent, true zero – ex. length measurements – 0 cm is a true zero, a length of 2 cm and a length of 4 cm have a ratio of 1:2
1. Data Collection and Sampling Techniques

TYPES OF SURVEYS
1. Telephone survey

- not very costly, and people tend to give more honest answers when they are not face-to-face with the interviewer. However, it limits who can be surveyed, as not all people have phones and not all people answer the telephone.

2. Mailed questionnaire

- can cover a wider geographic area than telephone surveys, is less expensive to conduct, and lets respondents remain anonymous. Drawbacks include a low number of people who actually send the survey back and inappropriate answers to the questions. Furthermore, some people may have difficulty reading the survey.

3. Personal interviews

- one can get more in-depth answers to the questions, but it is more costly than the other 2 types, as interviewers have to be trained in asking questions and recording responses. Furthermore, the interviewer may be biased in his/her selection of respondents.

SOME SAMPLING TECHNIQUES

RANDOM

use a random number generator – all members of the population have an equal chance of being selected for the study. See Table 1-3 on p. 13.
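A simple random sample can be sketched with Python's standard random module. The population of 2000 numbered subjects and the sample size of 50 are illustrative assumptions, as is the function name simple_random_sample.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct subjects; every subject has an equal chance of selection."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, n)

# Assumption: subjects are numbered 1 through 2000.
population = list(range(1, 2001))
sample = simple_random_sample(population, 50, seed=1)
print(sorted(sample)[:5])
```

Sampling is done without replacement, so no subject can be picked twice.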

SYSTEMATIC

select every kth subject – if you need a sample of 50 for a study and there are 2000 people, then 2000/50 = 40, so k = 40. The first person is selected randomly from persons 1 through 40 (let’s say it’s person number 8); the next person would be the one numbered 48, then 88, etc., until you have 50 people.
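The selection rule above can be sketched as follows. The function name systematic_sample is hypothetical, and subjects are assumed to be numbered from 1.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Pick every kth subject number, beginning at a random start between 1 and k."""
    k = population_size // sample_size
    if start is None:
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(sample_size)]

# The example from the notes: 2000 people, 50 needed, first pick is person 8.
picked = systematic_sample(2000, 50, start=8)
print(picked[:3], "...", picked[-1])  # [8, 48, 88] ... 1968
```

With k = 40 and a start of 8, the 50th and last person selected is number 8 + 49 × 40 = 1968.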

STRATIFIED

divide population into “layers” – the subjects are divided into groups based on some trait, then samples are taken from each subgroup. Ex. We want to survey high school students to see how they feel about the lunch menu and we want to see if it differs from 9th graders to 10th graders, etc. The population is layered by grade level, then subjects are surveyed from each grade level.
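The lunch-menu example can be sketched as below. The student roster, the group sizes, and the function name stratified_sample are made up for illustration; the sketch draws the same number of students from each grade, which is one of several possible allocation choices.

```python
import random
from collections import defaultdict

def stratified_sample(subjects, stratum_of, per_stratum, seed=None):
    """Group subjects into strata, then draw a random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical roster: 400 (name, grade) pairs spread evenly over grades 9-12.
students = [(f"student{i}", 9 + i % 4) for i in range(400)]
by_grade = stratified_sample(students, lambda s: s[1], per_stratum=10, seed=1)
for grade in sorted(by_grade):
    print(grade, len(by_grade[grade]))
```

Every grade level is guaranteed to appear in the sample, which is the point of stratifying.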

CLUSTER

use intact groups – subjects are grouped based on some trait such as geographic location, and then ALL members of the selected clusters are part of the survey. Ex. – when surveying a large sample of people from New York City regarding hospital care, there are a lot of hospitals in NYC, so the survey might be done only with patients from a few of the hospitals in NYC.
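A minimal sketch of cluster sampling, using the hospital example: a few whole clusters are chosen at random and every member of each chosen cluster is surveyed. The hospital and patient names, and the function name cluster_sample, are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters intact groups and keep ALL of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical data: 6 hospitals with 5 patients each.
hospitals = {f"hospital_{c}": [f"{c}-patient{i}" for i in range(5)]
             for c in "ABCDEF"}
surveyed = cluster_sample(hospitals, 2, seed=1)
for name, patients in surveyed.items():
    print(name, patients)
```

Unlike stratified sampling, only some groups are represented, but each chosen group is included in full.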

OTHER SAMPLING METHODS

CONVENIENCE SAMPLE

This is the use of subjects that are “convenient” to sample. Ex. – sampling people who come to the mall. This may not represent the population, as it may be done at a certain time of day, so not all customers who shop at the mall have an equal chance of doing the survey.

VOLUNTEER SAMPLE OR SELF-SELECTED SAMPLE

respondents decide for themselves whether to take part – ex. a radio station is doing a survey and gives you a number to call regarding your opinion on a certain issue. In many cases, only people with strong opinions bother to call. Thus the sample does not represent the general population.

SAMPLING ERROR

occurs because samples are NOT perfect representations of the population from which they are selected.
For example, suppose you select a sample of students attending LSSU and find that, in your survey, 43% are Native American. You then check the records at the admissions office and find that, in reality, 44% of the students attending LSSU are Native American. The sampling error is 1 percentage point.

NON-SAMPLING ERROR

occurs when the data are obtained in such a way that they are biased.
Ex. – using a radar gun to get the speed of a moving car when the radar gun has not been calibrated, so it is actually reading values that are 5 mph higher than the actual speed, OR conducting a survey of high school students regarding Homecoming and only sampling the Senior class.
1. Experimental Design

OBSERVATIONAL STUDY

the researcher merely observes and tries to draw conclusions based on the observations.

EXPERIMENTAL STUDY

The researcher manipulates the independent (explanatory) variable and tries to determine how the manipulation influences the dependent (outcome) variable
involves random assignment of subjects & treatments.

QUASI-EXPERIMENTAL STUDY

If random assignment of subjects is not possible, sometimes a quasi-experimental study is done – ex. using already intact groups in something like a school setting.

ADVANTAGE OF EXPERIMENTAL STUDIES

Researchers can decide how to select subjects & how to assign them to groups
Researchers can control or manipulate the independent variable under controlled conditions.

DISADVANTAGE OF EXPERIMENTAL STUDIES

They occur in unnatural settings (labs, special rooms, etc.) so what happens there may not apply to what happens in the natural environment
Hawthorne effect – This occurs when subjects participating in an experiment change their behavior in ways that affect the results. It is named after workers at the Hawthorne plant of Western Electric, who changed their behavior when they knew they were participating in a study & being observed (1924).
Another problem with statistical studies is that of a confounding variable or a lurking variable. This influences the dependent variable but cannot be separated from the independent variable.
Ex. – people who are placed on a specific exercise program may also change their diet so it’s hard to decide what had the greatest effect. Thus, diet is a confounding variable.
The placebo effect is another factor that affects statistical studies. This occurs when the control group responds in a similar way to the exp. group simply because they know they are participating in a study (like mind over matter) or because they are responding to clues from the researchers. It is reduced by blinding or double blinding.
Blocking also reduces variability. See p. 20.

OTHER WAYS TO MINIMIZE VARIABILITY

Completely randomized design – both subjects and treatments assigned randomly
Matched-pair design – subjects are paired based on certain traits like age or weight, and one member of each pair goes to each group – ex. in studies of twins, put one twin in the control group and the other in the exp. group.
Replication – do the same exp. in a diff. part of the country, or do the exp. with college students one time and repeat it with adults who are not going to college.
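The two assignment schemes above can be sketched in code. This is an illustration under simple assumptions (two groups, equal sizes, subjects already paired for the matched design); the function names and subject labels are hypothetical.

```python
import random

def completely_randomized(subjects, seed=None):
    """Shuffle all subjects, then split them into control and experimental groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "experimental": shuffled[half:]}

def matched_pairs(pairs, seed=None):
    """For each matched pair, decide at random which member gets the treatment."""
    rng = random.Random(seed)
    groups = {"control": [], "experimental": []}
    for first, second in pairs:
        if rng.random() < 0.5:
            first, second = second, first
        groups["control"].append(first)
        groups["experimental"].append(second)
    return groups

subjects = [f"subject{i}" for i in range(20)]
twins = [(f"twin{i}a", f"twin{i}b") for i in range(5)]
crd = completely_randomized(subjects, seed=1)
mpd = matched_pairs(twins, seed=1)
print(len(crd["control"]), len(crd["experimental"]))
print(mpd["control"], mpd["experimental"])
```

In the matched-pair version each pair always contributes exactly one member to each group, so the groups stay balanced on the matching trait.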

Suspect Samples

Is the sample large enough?
How was the sample selected?
Is the sample representative of the population?

Ambiguous Averages

What particular measure of average was used and why? Mean, median, mode, midrange?
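To see why the choice of average matters, the sketch below computes all four measures for one small data set using Python's standard statistics module. The salary figures are invented for illustration, and the midrange is taken as the average of the lowest and highest values.

```python
import statistics

def averages(data):
    """Return the four measures of average named above for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.multimode(data),  # list of the most common value(s)
        "midrange": (min(data) + max(data)) / 2,
    }

# Hypothetical salaries in thousands of dollars; one value is an outlier.
salaries = [30, 32, 32, 35, 38, 40, 150]
result = averages(salaries)
print(result)
```

For this data the mean is 51, the median is 35, the mode is 32, and the midrange is 90, so a report could truthfully quote any of four quite different "averages".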

Changing the Subject

Are different values used to represent the same data?

Detached Statistics

One third fewer calories… than what?

Implied Connections

Studies suggest that some people may understand what this statement means.
Ex. – Eating fish may help reduce cholesterol. There is no guarantee it will reduce cholesterol.

Misleading Graphs

Are the scales for the x-axis and y-axis appropriate for the data? Is the title misleading?

Faulty Survey Questions

Do you feel Brimley should build a new football stadium?
Do you favor increasing taxes so Brimley can build a new football stadium?