AP STATS

Charts-

Bar chart (horizontal)- Best for comparing categories (try to use it when there are fewer than 7 categories).

Vertical bar chart- Best for rankings, like election votes ordered from most to least.

Histogram- Shows the distribution of a continuous variable (time, age, weight). Great for understanding things like household income distribution or population distribution.

Each bar is called a class. The beginning of a class (bar) is called the lower limit and the end is called the upper limit. To convert to a line graph, plot the midpoint of each class.
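As a small illustration of class limits and midpoints, here is a sketch in Python. The age classes and the helper name `class_midpoints` are made up for the example.

```python
def class_midpoints(limits):
    """Return the midpoint of each class given (lower, upper) limit pairs."""
    return [(lower + upper) / 2 for lower, upper in limits]

# Hypothetical age classes: 0-10, 10-20, 20-30
classes = [(0, 10), (10, 20), (20, 30)]
midpoints = class_midpoints(classes)
print(midpoints)  # [5.0, 15.0, 25.0]
```

These midpoints are the x-values you would connect when turning the histogram into a line graph.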

Pie chart- Bad at showing many data points. If you have to use one, never use it for more than 5 data points, and order the slices from biggest to smallest, starting with the biggest in the top right-hand corner, with labels.

Scatterplot- Good for correlation, or how two things relate to each other. Shows clusters and trends and helps spot outliers. This one has 2 variables (x, y).

Dot plot- Only one variable (x). Each data value is a dot placed above its position on the axis, and repeated values stack up.

Box plot- Shows the minimum, Q1, Q2, Q3, and maximum (the five-number summary). Q2 is the median. The mean is not shown in this plot.
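The five numbers a box plot needs can be computed by hand or with a short script. This is a sketch that assumes the median-of-halves method for quartiles (calculators and software may use slightly different quartile rules); the data and the name `five_number_summary` are made up.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

summary = five_number_summary([13, 1, 9, 5, 3, 11, 7])
print(summary)  # (1, 3, 7, 11, 13)
```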

Line chart (graph)- Shows how something changes over time. Ex. stock market price or visitors to your website.

Skew- Skew right (high on the left, falling to a long tail on the right, AKA positive skew). Skew left (high on the right, with a long tail on the left, AKA negative skew). The skew is named for the side the tail is on. For a skew right, the mode is at the highest point, then the median, then the mean, going left to right (the mean is usually bigger than the median, which is bigger than the mode). The order is the opposite for a skew left.
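To see the mode, median, mean ordering for a right skew, here is a quick check in Python on made-up data with one large value pulling the tail to the right.

```python
from statistics import mean, median, mode

data = [1, 2, 2, 3, 4, 15]  # made-up right-skewed values
print(mode(data), median(data), mean(data))  # 2 2.5 4.5
```

The large value 15 drags the mean up the most, moves the median a little, and leaves the mode alone.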

Stem-and-leaf plot- Divides numbers into stems and leaves. Ex. 27 has stem 2 and leaf 7.
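A stem-and-leaf plot can be sketched in a few lines of Python. This assumes whole numbers with the tens digit as the stem and the ones digit as the leaf; the data and the name `stem_and_leaf` are made up, and stems with no values are simply skipped.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers into stems (tens) and leaves (ones)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf([12, 15, 21, 27, 27, 34])
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
# 1 | 2 5
# 2 | 1 7 7
# 3 | 4
```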

Definitions and notes-

Variable- A characteristic or condition that can change or take on different values. Most research begins with a question about the relationship between 2 variables.

Population- The entire group of individuals being studied.

  • Sample - A subset of the population selected for analysis to draw conclusions about the entire group.

Types of variables-

Discrete variables- (such as class size) consist of indivisible categories.

Continuous variables- (such as time or weight) are infinitely divisible into whatever units a researcher may choose. For example, time can be measured in hours, minutes, seconds, or fractions of a second.

Real limits- To define the units for a continuous variable, a researcher must use real limits, which are boundaries located exactly halfway between adjacent categories. Ex. a weight recorded as 150 lbs (to the nearest pound) has real limits of 149.5 and 150.5.

Measuring Variables-

The process of measuring a variable requires a set of categories called a scale of measurement and classifies each individual into one category.

4 types of Measurement Scales-

  1. Nominal scale- An unordered set of categories identified only by name. Nominal measurements only allow you to determine whether 2 individuals are the same or different.

  2. Ordinal scale- An ordered set of categories. Ordinal measurements tell you the direction of difference between 2 individuals.

  3. Interval scale- An ordered series of equal-sized categories. Interval measurements identify the direction and magnitude of a difference. The zero point is located arbitrarily on an interval scale.

  4. Ratio scale- An interval scale where a value of zero indicates none of the variable. Ratio measurements identify the direction and magnitude of differences and allow ratio comparisons of measurements.

Correlational studies- The goal of a correlational study is to determine whether there is a relationship between 2 variables and to describe the relationship.

Sample space- All the possible outcomes of an experiment.

  • Coin = 2 outcomes (H, T)

  • Die = 6 outcomes (1, 2, 3, 4, 5, 6)

  • 2 dice = 36 outcomes (6 × 6)
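The sample space sizes above can be checked by listing every outcome. This is a minimal sketch using Python's `itertools.product`; the variable names are just for illustration.

```python
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]
two_dice = list(product(die, repeat=2))  # ordered pairs (first die, second die)

print(len(coin), len(die), len(two_dice))  # 2 6 36
```

Each pair is ordered, so (1, 6) and (6, 1) count as different outcomes, which is why the total is 6 × 6.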

The probability of any event is between 0 and 1. Impossible (0) -> certain (1).

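Combination vs. permutation

In a combination order does not matter; in a permutation order does matter.

Combination = nCr, where n is the total number of elements and r is how many elements are selected

Equation: nCr = n! / ((n - r)! r!)

(A combination count is never larger than the matching permutation count, because nPr = nCr × r!)

Permutation = nPr

Equation: nPr = n! / (n - r)!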
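Both counts are built into Python's `math` module, which makes it easy to check a hand calculation. The values n = 5 and r = 2 are just an example.

```python
from math import comb, factorial, perm

n, r = 5, 2
n_choose_r = comb(n, r)   # n! / ((n - r)! * r!)
n_perm_r = perm(n, r)     # n! / (n - r)!

print(n_choose_r, n_perm_r)  # 10 20
```

Each group of 2 can be arranged in 2! = 2 orders, so there are twice as many permutations as combinations here.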

P(A ∩ B) = P(A) * P(B) -> only when A and B are independent (∩ means "and", so multiply)

In general: P(A ∩ B) = P(A|B) P(B) = P(A) P(B|A)

Independent = whatever happens with one event doesn't affect the other's probability, so P(A|B) = P(A).

Notes and summary-

  1. Independent events (multiplication rule): and / ∩ (upside-down U)

  • Independent: P(A ∩ B) = P(A) P(B)

  • Always true: P(A ∩ B) = P(A|B) P(B) = P(A) P(B|A)

  2. Conditional events: P(A|B) = the probability of event A given that event B has already occurred

  • P(A|B) = P(A) P(B|A) / P(B)

  • P(A|B) = P(A ∩ B) / P(B)

  3. Complement: P(A) + P(not A) = 1

  • P(A) = 1 - P(not A)

  4. General formula (addition rule): or / ∪. Works for any two events. P(A ∩ B) can only be found as P(A) P(B) if the events are independent.

  • P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

  5. Mutually exclusive (disjoint): no intersection

  • P(A ∪ B) = P(A) + P(B), because P(A ∩ B) = 0
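The rules in this summary can be checked on a single fair die. This is a sketch with two made-up events (A = even, B = greater than 3), using exact fractions so nothing is rounded.

```python
from fractions import Fraction

outcomes = set(range(1, 7))   # one fair six-sided die
A = {2, 4, 6}                 # event: roll is even
B = {4, 5, 6}                 # event: roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_a_and_b = prob(A & B)                      # intersection {4, 6}: 1/3
p_a_or_b = prob(A) + prob(B) - p_a_and_b     # addition rule: 2/3
p_a_given_b = p_a_and_b / prob(B)            # conditional: 2/3
p_not_a = 1 - prob(A)                        # complement: 1/2

print(p_a_and_b, p_a_or_b, p_a_given_b, p_not_a)  # 1/3 2/3 2/3 1/2
```

Here P(A) P(B) = 1/4 is not equal to P(A and B) = 1/3, so these two events are not independent and the simple multiplication shortcut would give the wrong answer.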

*1.2

1. Understanding Datasets and Variables
  • Statistics is the study of collecting, analyzing, interpreting, and presenting data.

  • A dataset is a structured collection of data. Each dataset consists of:

    • Individuals (Rows): The entities being studied (e.g., students in a class).

    • Variables (Columns): The characteristics of individuals that can change.

2. Two Main Types of Variables

Variables describe different attributes of individuals in a dataset.

A. Categorical Variables (Qualitative)

  • These variables represent group labels or categories, not numerical values.

  • Examples:

    • Hair color (Black, Brown, Blonde)

    • Blood type (A, B, AB, O)

    • Location (New York, California, Texas)

    • Grade level (Freshman, Sophomore, Junior, Senior)

  • Key Identifiers:

    • Can’t be mathematically measured.

    • Used for grouping or classification.

B. Quantitative Variables (Numerical)

  • These variables represent measurable quantities and have numerical values.

  • Examples:

    • Height (e.g., 5’8”, 170 cm)

    • Test Score (e.g., 85%, 92%)

    • Age (e.g., 16 years, 21 years)

    • Weight (e.g., 150 lbs, 68 kg)

  • Key Identifiers:

    • Can be mathematically analyzed (mean, median, range).

    • Represents actual numerical data.

3. Identifying Variables in a Dataset
  • Categorical Variable Example:

    • Student Name: A label that does not have numerical value.

    • Class Level: A category (Freshman, Sophomore, etc.).

  • Quantitative Variable Example:

    • GPA: A numerical value that can be measured and analyzed.

4. Vocabulary Recap
  • Dataset: Collection of data organized in rows (individuals) and columns (variables).

  • Variable: A characteristic that differs between individuals in a dataset.

  • Categorical Variable: Represents categories or labels (not numerical).

  • Quantitative Variable: Represents numerical values that can be measured.

*1.3

1. Understanding Categorical Data Representation
  • Categorical Variables represent data in group labels rather than numerical values.

  • Data for categorical variables is often summarized using tables.

2. Types of Tables Used
  • Frequency Table:

    • Displays the count (frequency) of observations in each category.

    • Example: A table showing the number of films in different genres.

  • Relative Frequency Table:

    • Displays the proportion (percentage) of observations in each category.

    • Calculated by dividing each category’s frequency by the total number of observations.

    • Useful for comparing proportions rather than absolute counts.

3. Converting Between Tables
  • From Frequency to Relative Frequency:

    • Divide each category's frequency by the total count.

  • From Relative Frequency to Frequency:

    • Multiply each relative frequency by the total count.
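Converting in both directions can be sketched in Python as follows. The film counts are made up for the example.

```python
counts = {"Action": 10, "Drama": 6, "Comedy": 4}  # made-up film counts
total = sum(counts.values())

# Frequency -> relative frequency: divide by the total
relative = {genre: n / total for genre, n in counts.items()}

# Relative frequency -> frequency: multiply by the total
back_to_counts = {genre: round(p * total) for genre, p in relative.items()}

print(relative)  # {'Action': 0.5, 'Drama': 0.3, 'Comedy': 0.2}
```

The relative frequencies always add up to 1 (100%).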

4. Using Tables for Data Interpretation
  • Tables help identify trends and make comparisons.

  • Example: If the relative frequency of premium olive oils is 0.55, it indicates that more than 50% of sales come from premium types.

5. Distribution of a Variable
  • A distribution lists possible values a variable can take and how often they occur.


Key Vocabulary

Frequency Table – A table showing counts of observations in each category.
Relative Frequency Table – A table showing proportions (percentages) of observations in each category.
Distribution – A list of all possible values of a variable and their occurrences.


Breakdown of These Notes

Concept | Definition | Example
Categorical Variables | Group labels (not numerical). | Film genres, laptop brands.
Frequency Table | Shows count of observations per category. | Number of action, drama, or comedy films.
Relative Frequency Table | Shows proportion (%) of observations per category. | Percentage of films in each genre.
Converting Tables | Can switch between frequency and relative frequency using division/multiplication. | Convert film counts to film percentages.
Distribution | How data values are spread across different categories. | Olive oil sales distribution among different grades.

*1.4

1. Understanding Bar Charts (Bar Graphs)
  • Bar charts visually represent categorical data using bars to show frequency (count) or relative frequency (percentage).

  • They help compare different categories effectively.

2. Constructing a Bar Chart
  • Step 1: Choose the axes:

    • The x-axis represents categories (e.g., days of the week, food items).

    • The y-axis represents frequency (count) or relative frequency (percentage).

  • Step 2: Label axes appropriately.

  • Step 3: Draw bars for each category, ensuring heights correspond to their frequencies.

  • Step 4: Compare and analyze the data presented.
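The four steps can be sketched as a plain-text chart in Python. This is only an illustration: the bars are drawn sideways so they fit on text lines, the vote counts are made up, and `text_bar_chart` is a hypothetical helper name.

```python
def text_bar_chart(counts):
    """Return one line per category, with a bar of '#' as long as the count."""
    width = max(len(label) for label in counts)
    return [f"{label.ljust(width)} | {'#' * n} {n}" for label, n in counts.items()]

# Hypothetical volunteer-day preferences
votes = {"Mon": 3, "Thu": 1, "Fri": 5}
chart = text_bar_chart(votes)
print("\n".join(chart))
# Mon | ### 3
# Thu | # 1
# Fri | ##### 5
```

The longest bar (Fri) is the most frequent category, which is what you look for when interpreting the chart.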

3. Interpreting a Bar Chart
  • The tallest bar represents the most frequent category.

  • If no bar exceeds 50% relative frequency, no single category makes up a majority of the data.

  • Example:

    • In a survey about volunteer day preferences, Friday was most preferred, while Thursday was least preferred.

    • In a restaurant order analysis, Fajitas were most popular at Location 1, while Tacos were most popular at Location 2.


Key Vocabulary

Bar Chart (Graph) – A graph that uses bar height/length to display the frequency (or relative frequency) of categorical variables.


Breakdown of Notes

Concept | Definition | Example
Bar Chart | A graphical representation of categorical data using bars. | Graph showing employee preferences for a volunteer day.
Frequency (Count) | The number of times a category appears. | 50 employees chose Friday, 30 chose Monday.
Relative Frequency (%) | The proportion of total observations in each category. | 40% of employees chose Friday.
Axis Representation | X-axis → Categories, Y-axis → Frequency/Percentage. | Days of the week on x-axis, count of employees on y-axis.
Interpretation | Tallest bar = most frequent category. | Friday is the most popular day for the trip.

Notes:

z = (x - μ) / σ, where x is the value of interest, μ is the mean of the dataset, and σ is the standard deviation.

The z-score measures position: how many standard deviations x is above or below the mean. Converting to z-scores turns a normal distribution into the standard normal bell curve.
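A quick worked example of the z-score formula in Python. The score, mean, and standard deviation are made-up numbers.

```python
def z_score(x, mu, sigma):
    """How many standard deviations x is above (+) or below (-) the mean."""
    return (x - mu) / sigma

z = z_score(85, 70, 10)
print(z)  # 1.5
```

A score of 85 is 1.5 standard deviations above a mean of 70 when the standard deviation is 10.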

Percentile = (number of values below / total number of values) × 100

If you are at the 90th percentile, 90 percent of people scored below you.

With invNorm, you enter the percentile as the area to the left (as a decimal, e.g. 0.90) and you get the z-score back.

Every time you convert to z, the distribution is standard normal.
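Outside the calculator, the same percentile-to-z lookup can be sketched with Python's standard library. `NormalDist.inv_cdf` is used here as a stand-in for invNorm; that pairing is an assumption of this example, not something from the calculator manual.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)   # standard normal curve
z_90 = standard.inv_cdf(0.90)          # area to the left = 0.90
print(round(z_90, 2))  # 1.28
```

So the 90th percentile sits about 1.28 standard deviations above the mean.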

Normal- big numbers

  1. Bell-shaped curve

  2. Area under the curve = 1 (100%), and P + Q = 1

  3. Symmetrical

  4. P(X ≤ x): always find the area to the left

  5. To find the area to the right, subtract the left area from 1

  6. Uses real numbers (the actual data values)
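Finding the area to the left and then the area to the right can be sketched in Python. The mean of 70 and standard deviation of 10 are made-up values.

```python
from statistics import NormalDist

scores = NormalDist(mu=70, sigma=10)   # made-up mean and standard deviation
left = scores.cdf(85)                  # P(X <= 85), area to the left
right = 1 - left                       # P(X > 85), area to the right
print(round(left, 4), round(right, 4))  # 0.9332 0.0668
```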

Standard Normal- asks for a z-score, where you have to use the z equation

  1. Standard normal uses μ = 0 and σ = 1

  2. A standard score is a position rather than a raw number

Central Limit Theorem (CLT)

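The distribution of sample means will be approximately normal if the sample size is large enough. This is true regardless of the distribution of the original population.

A skinnier graph is more accurate: a larger sample size gives a narrower sampling distribution.

Single value: z = (x - μ) / σ

Sample mean: z = (x̄ - μ) / (σ / √n), where n is the sample size

Sampling distribution = CLT

The sampling distribution of the mean is approximately normal whether the population is normal or not, as long as the sample is large enough.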
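To see why a bigger sample gives a skinnier curve, here is a sketch of the z-score for a sample mean in Python. All the numbers are made up, and `z_for_sample_mean` is a hypothetical helper name.

```python
from math import sqrt

def z_for_sample_mean(x_bar, mu, sigma, n):
    """z-score of a sample mean: divide by sigma / sqrt(n), not sigma."""
    standard_error = sigma / sqrt(n)
    return (x_bar - mu) / standard_error

z_small = z_for_sample_mean(72, 70, 10, 25)    # standard error = 2
z_large = z_for_sample_mean(72, 70, 10, 100)   # standard error = 1
print(z_small, z_large)  # 1.0 2.0
```

The same sample mean of 72 is more unusual (bigger z) when n is larger, because the sampling distribution is narrower.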

Chapter 6

Confidence Interval (CI): a range built to capture the true population parameter at a particular confidence level (e.g. 95%).

CL = 1 - alpha (alpha is the area outside of the confidence level).

Ex. for a 95% CL: (1 - 0.95) / 2 = 0.025 in each tail (because there are two sides, you divide by 2).

(ONLY USE THE POSITIVE CRITICAL VALUE ON THESE)

3 formulas:

  1. Mean, sigma is given: use z-critical. The general form is x̄ ± E, where E is the margin of error; use it when you don't know the population mean.

    x̄ ± z_c · σ / √n

  2. Mean, sigma is unknown: use t-critical (calculator is dist 4). Degrees of freedom = n - 1.

    x̄ ± t_c · s_x / √n

  3. Proportion (%): use z-critical (calculator is dist 3). q̂ = 1 - p̂.

    p̂ ± z_c · √(p̂ q̂ / n)
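The three interval formulas can be sketched in Python as follows. All sample numbers are made up and the function names are hypothetical. Python's standard library has no t distribution, so the t-critical value 2.262 (95% confidence, 9 degrees of freedom) is typed in from a t table.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Positive z-critical value that leaves alpha / 2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def mean_interval(x_bar, spread, n, critical):
    """x_bar +/- critical * spread / sqrt(n); spread is sigma or s."""
    margin = critical * spread / sqrt(n)
    return x_bar - margin, x_bar + margin

def proportion_interval(p_hat, n, critical):
    """p_hat +/- critical * sqrt(p_hat * q_hat / n), with q_hat = 1 - p_hat."""
    margin = critical * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

zc = z_critical(0.95)                      # about 1.96
z_ci = mean_interval(50, 10, 100, zc)      # sigma known, use z
t_ci = mean_interval(50, 10, 10, 2.262)    # sigma unknown, t from table, df = 9
p_ci = proportion_interval(0.6, 100, zc)   # proportion

for low, high in (z_ci, t_ci, p_ci):
    print(round(low, 3), round(high, 3))
```

With the same center and spread, the t interval comes out wider than the z interval here because the sample is smaller and the t-critical value is larger.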

Each half of alpha (alpha/2 in each tail) is also known as the rejection zone.

Z test statistic: z = (x̄ - μ) / (σ / √n), where n is the sample size

T test statistic: t = (x̄ - μ) / (s / √n), where s is the sample standard deviation
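Both test statistics have the same shape, so one small function covers them. This is a sketch with made-up numbers; `standardized_statistic` is a hypothetical name.

```python
from math import sqrt

def standardized_statistic(x_bar, mu_0, spread, n):
    """(x_bar - mu_0) / (spread / sqrt(n)); spread is sigma for z, s for t."""
    return (x_bar - mu_0) / (spread / sqrt(n))

z_stat = standardized_statistic(52, 50, 8, 64)   # sigma = 8 is known, so this is z
t_stat = standardized_statistic(52, 50, 6, 9)    # s = 6 from the sample, so this is t (df = 8)
print(z_stat, t_stat)  # 2.0 1.0
```

The only difference is which spread you plug in and which critical value (z or t) you compare the result against.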
