UNIT 1 | Exploring One Variable Data

What Can We Learn From Data?

A number that describes a sample is called a statistic, whereas a number that describes a population is called a parameter
We collect data from individuals (which can be anything, not just a person)
Variables are any characteristics that can change from individual to individual
- Two types of variables: categorical [takes on values that are category names or group labels; usually described by a word or phrase; ex: eye color, ethnicity, age group, fondness of apples] or quantitative [takes on numerical values that are measured or counted; usually described by a number; ex: weight, how many candies are in a bag]

Representing a categorical variable with tables

Categorical data can be organized by tables that include the various categories of the study, frequency (the number of each category in the sample, ex: 15 trees, 17 trees, 23 trees), and relative frequency (the proportion of each category in the sample, ex: 0.258 of the trees, 0.360 of the trees, 0.191 of the trees)
- Relative frequencies (proportions) are usually better than simple frequencies for comparing samples, especially when the samples are different sizes
Two options for graphing categorical data: bar graphs [which can display either the frequencies or the relative frequencies of a data set] and pie/circle graphs [which display each slice as a proportion of the whole]
Distribution of data is what values the data takes on and how often
- Best way to talk about distribution of data is often to compare two data samples
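
A minimal Python sketch of the frequency and relative frequency table described above. The tree species data and the helper name `frequency_table` are made up for illustration; they are not from the notes.

```python
from collections import Counter

# Hypothetical sample: species of 8 trees (made-up data).
trees = ["oak", "pine", "oak", "maple", "pine", "oak", "oak", "maple"]

def frequency_table(values):
    """Return {category: (frequency, relative frequency)}."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, n / total) for category, n in counts.items()}

table = frequency_table(trees)
for category, (freq, rel_freq) in table.items():
    # Relative frequencies across all categories add up to 1.
    print(f"{category:<6} {freq:>3} {rel_freq:.3f}")
```

Each row of the printout is one category with its count and its proportion of the whole sample.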

Representing a quantitative variable with tables

Two types of quantitative data: discrete [takes on a countable number of values, usually whole numbers; ex: number of goals, number of candies, number of shirts] and continuous [can take on any value in an interval, so the possible values cannot be counted; usually recorded with decimals; ex: weight of a frog, speed of a car, time to finish a puzzle]
Can be organized into a frequency or relative frequency table
- Since there are no categories, the data must be placed into “bins” of intervals that are all equal in size (ex: 10-20, 20-30, 30-40, 40-50, etc.), with a consistent rule for boundary values (ex: 20 always goes in 20-30)
- Basically: how many of our individuals fall within the range of each bin? The “how many” is going to be our frequency
Four types of graph can be made from quantitative data:
- Dot plot
- Stem and leaf plot
- Histogram (usually the preferred type of graph; NOT the same as a bar graph)
- Cumulative graph
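
The binning idea can be sketched in Python as below. The heights are made-up values, `bin_counts` is a hypothetical helper, and the sketch assumes each bin includes its lower edge but not its upper edge. The printed rows work as a rough text histogram.

```python
# Hypothetical tree heights in feet (made-up data).
heights = [12, 18, 25, 27, 31, 34, 38, 41, 45, 49]

def bin_counts(values, start, width, n_bins):
    """Count values in equal-width bins [start, start + width), and so on."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # values outside all bins are ignored
            counts[index] += 1
    return counts

counts = bin_counts(heights, start=10, width=10, n_bins=4)
for i, c in enumerate(counts):
    low = 10 + 10 * i
    # frequency shown as stars, then frequency and relative frequency
    print(f"{low}-{low + 10}: {'*' * c} ({c}, {c / len(heights):.2f})")
```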

Describing the Distribution of a Quantitative Variable

There are four things that have to be mentioned:
- Shape – unimodal, bimodal, gap, clusters, skewed right, skewed left, symmetric, asymmetric
- Center – what the average value is
- Spread – how the data varies
- Outliers – unusual features
Example response: The distribution of tree heights is skewed left and unimodal with a center around 110 feet. The heights are spread from 20 to 140 feet, but the majority of trees are tightly clustered between 120 and 140 feet

Measures of center
- Mean – sum of the data values divided by the number of values there are
  - Nonresistant
- Median – the middle value
  - Put data in numerical order first
  - With an odd number of values, it is the exact middle value; with an even number of values, it is the average of the two middle-most values
  - Resistant
- Roughly symmetric data = roughly equal mean and median
- Skewed left = mean is smaller than median
- Skewed right = mean is larger than median
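
A short Python sketch of why the mean is nonresistant and the median is resistant, using the standard library `statistics` module. The numbers are made up for illustration.

```python
from statistics import mean, median

# Made-up values for illustration only.
symmetric = [2, 3, 4, 5, 6]
skewed_right = [2, 3, 4, 5, 60]  # same data with one large outlier

for name, data in (("symmetric", symmetric), ("skewed right", skewed_right)):
    # The outlier pulls the mean upward but leaves the median alone.
    print(f"{name}: mean = {mean(data)}, median = {median(data)}")

even_count = [1, 2, 3, 4]
print(median(even_count))  # average of the two middle values, 2 and 3
```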
Measures of position
- Percentile – interpreted as the value that contains p% of the data less than or equal to it (ex: 25th percentile = that position in the data + everything less than that)
  - First quartile (Q1) is the 25th percentile or median of the lower half of data
  - Median is 50th percentile
  - Third quartile (Q3) is the 75th percentile or median of the upper half of data
Measures of spread
- Range
  - Max value - min value
  - Easily influenced by outliers
- IQR
  - Q3 - Q1
  - Spread of the middle 50% of the data
  - Not influenced by outliers
- Standard deviation
  - Measures the variability of the distribution and how far typical values are from the mean
  - High SD means most data is spread far from the mean
  - Low SD means most data is near the mean
  - Easily influenced by outliers
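
A minimal sketch comparing the three measures of spread on made-up data. Assumptions not stated in the notes: `quartiles` is a hypothetical helper that leaves the overall median out of both halves when the count is odd, and `stdev` is the sample standard deviation (divides by n - 1).

```python
from statistics import median, stdev

def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the data."""
    ordered = sorted(values)
    half = len(ordered) // 2  # for an odd count, the middle value is skipped
    return median(ordered[:half]), median(ordered[-half:])

def spread(values):
    """Return (range, IQR, standard deviation)."""
    q1, q3 = quartiles(values)
    return max(values) - min(values), q3 - q1, stdev(values)

# Two made-up data sets that differ only in the largest value.
no_outlier = [4, 7, 8, 10, 12, 13, 15, 18, 19]
with_outlier = [4, 7, 8, 10, 12, 13, 15, 18, 40]

for data in (no_outlier, with_outlier):
    data_range, iqr, sd = spread(data)
    print(f"range = {data_range}, IQR = {iqr}, SD = {sd:.2f}")
```

In this example the range and SD both grow when the largest value becomes an outlier, while the IQR stays the same.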
Outliers
- Two methods for determining outliers
  - Fence method: in which an outlier is a value greater than the upper fence or less than the lower fence
    - Upper fence: Q3 + (1.5*IQR)
    - Lower fence: Q1 - (1.5*IQR)
  - 2 Standard Deviation method: an outlier is a value that is located 2 or more standard deviations above or below the mean
    - x̄ + 2 standard deviations (anything above is outlier)
    - x̄ - 2 standard deviations (anything below is outlier)
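
Both outlier methods can be sketched in Python as follows. The data are made up, the function names are hypothetical, and the sketch assumes quartiles are medians of the lower and upper halves and that the SD is the sample standard deviation.

```python
from statistics import mean, median, stdev

def fence_outliers(values):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1, q3 = median(ordered[:half]), median(ordered[-half:])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

def two_sd_outliers(values):
    """Values 2 or more standard deviations above or below the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) >= 2 * s]

# Made-up data set with one unusually large value.
data = [4, 7, 8, 10, 12, 13, 15, 18, 40]
print(fence_outliers(data))
print(two_sd_outliers(data))
```

The two methods can disagree on other data sets, since the 2 standard deviation method depends on the mean and SD, which are themselves pulled by outliers.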

Graphical representation of summary statistics

Five number summary: min, Q1, median, Q3, and max
Can be used to create a box plot to summarize the data

Box plots can also potentially show you the skew of a data set (a longer whisker or a longer half of the box on the right side can indicate right skew, and vice versa)

Comparing Distributions of a Quantitative Variable

Compare shape, center, and spread + interpret them
BE SPECIFIC (don’t just say 35, say 35 trees)
Some sets of data can be modeled with a density curve [used to model a set of data to give insight as to what the actual population the data is representing could possibly look like]
- Ex: normal distribution curve

Empirical rule: in normal distributions, about 68% of the population is within 1 standard deviation of the mean, about 95% is within 2 standard deviations of the mean, and about 99.7% is within 3 standard deviations of the mean
- Almost no data (about 0.3%) falls more than 3 standard deviations from the mean
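
The empirical rule percentages can be checked with a short Python sketch using `statistics.NormalDist` from the standard library. The variable names are illustrative only.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal population within k standard deviations of the mean.
shares = {}
for k in (1, 2, 3):
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD: {shares[k]:.4f}")
```

The printed shares round to roughly 68%, 95%, and 99.7%.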
A z score measures how many SDs above or below the mean a value is (can be negative or positive)
- Formula for z score: Z = (x-μ)/σ
- Allows us to compare data better
- P(z [<, >, or =] z score); ex: P(z<1.11), P(z>1.11), P(-0.56 < z < 1.11), P(z=1.11)
- CALC FUNCTION FOR Z SCORES: 2nd → vars → normalcdf
  - Lower value: either z score or -99
  - Upper value: either z score or 99
  - μ: 0
  - σ: 1
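- If given a percentile (area to the left), you can find the z score or value at that position through the calc function invNorm
  - area: the percentile as a decimal (ex: 0.90 for the 90th percentile)
  - If you already have the z score, plug the known numbers into the z score formula and solve for x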
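
For checking calculator work, here is a rough Python stand-in for normalcdf and invNorm built on `statistics.NormalDist`. The function names `normalcdf`, `inv_norm`, and `z_score` are hypothetical wrappers written for this sketch, and the 130/100/15 values are made up.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """How many standard deviations x is above (+) or below (-) the mean."""
    return (x - mu) / sigma

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area, mu=0.0, sigma=1.0):
    """Value with the given area (as a decimal) to its left."""
    return NormalDist(mu, sigma).inv_cdf(area)

print(z_score(130, mu=100, sigma=15))    # made-up value, mean, and SD
print(round(normalcdf(-99, 1.11), 4))    # P(z < 1.11), about 0.8665
print(round(normalcdf(1.11, 99), 4))     # P(z > 1.11), about 0.1335
print(round(normalcdf(-0.56, 1.11), 4))  # P(-0.56 < z < 1.11)
print(round(inv_norm(0.8665), 2))        # z score with area 0.8665 to its left
```

As on the calculator, -99 and 99 stand in for "no lower bound" and "no upper bound".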
