Stats

STAT 118 Study Set

Key Terms and Concepts

1. Definitions of Statistics

Statistics: The science of collecting, organizing, and interpreting data.
Individuals: The objects on which data are collected (e.g., students, states, hospitals).
Variables: Characteristics recorded about individuals.

2. Types of Variables

Quantitative Variables: Numeric values with meaningful operations (e.g., height, weight).
Categorical Variables: Groups or categories (e.g., gender, college type).
Identifier Variables: Unique values assigned to individuals (e.g., ID numbers).

3. Data Visualization

Bar Charts & Pie Charts: Represent categorical data.
Histograms: Display quantitative data distributions.
Boxplots: Compare distributions and identify outliers.
Dotplots & Density Plots: Represent distributions and trends.

4. Measures of Center

Mean (x̄): Sum of all values divided by the number of values.
Median (m): The middle value when data is ordered.

5. Measures of Spread

Range: Difference between the largest and smallest values.
Interquartile Range (IQR): The difference between Q3 (75th percentile) and Q1 (25th percentile).
Standard Deviation (S): Measures variation around the mean.

6. Standardization

Z-Score: $Z = \frac{X - \mu}{\sigma}$ (measures how far a value is from the mean, in standard deviations)
68-95-99.7 Rule: Describes Normal Distribution percentages.
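
As a rough illustration of the two ideas above, the sketch below computes one z-score and then checks the 68-95-99.7 percentages with `pnorm()`. The mean, SD, and score are hypothetical values chosen for the example, not numbers from the course.

```r
# Hypothetical values: exam scores with mean 70 and SD 10.
mu <- 70
sigma <- 10
x <- 85

# Z-score: how many standard deviations x lies above the mean.
z <- (x - mu) / sigma
z  # 1.5

# Share of a Normal distribution within 1, 2, and 3 SDs of the mean.
within_sd <- pnorm(1:3) - pnorm(-(1:3))
round(within_sd, 4)  # 0.6827 0.9545 0.9973
```

The rounded results are where the "68-95-99.7" shorthand comes from.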

7. Relationship Between Variables

Explanatory Variable: The variable suspected to influence another.
Response Variable: The variable that is measured as an outcome.

8. Simpson’s Paradox

Occurs when the direction of a relationship between two variables reverses once a lurking variable is taken into account.

Important R Functions

1. Data Visualization

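# bargraph() comes from the mosaic package; histogram() and bwplot() come from lattice
library(mosaic)
library(lattice)

# Bar Chart
bargraph(~Variable, data = Dataset)

# Histogram
histogram(~Variable, data = Dataset)

# Boxplot
bwplot(Variable ~ Category, data = Dataset)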

2. Summary Statistics

# Mean and Median
mean(Dataset$Variable)
median(Dataset$Variable)

# Standard Deviation
sd(Dataset$Variable)

# Interquartile Range
IQR(Dataset$Variable)

3. Normal Distribution

# Probability below a value
pnorm(value, mean = mu, sd = sigma)

# Probability above a value
1 - pnorm(value, mean = mu, sd = sigma)

# Finding percentiles (enter the percentile as a proportion, e.g., 0.90 for the 90th)
qnorm(percentile, mean = mu, sd = sigma)

STAT 118 Exam Study Sheet (Chapters 1-5)

Key Terms and Concepts

1. Types of Variables

Categorical (Qualitative) Variables: Describe qualities or categories (e.g., gender, college type).
Quantitative Variables: Numeric values with meaningful operations (e.g., height, weight).
Nominal Variables: Categories without a meaningful order (e.g., colors, names).
Ordinal Variables: Categories with a meaningful order but no consistent difference (e.g., ranking, education level).
Natural Variables: Ordered with meaningful differences (e.g., temperature, income).

2. Proportion vs. Percent

Proportion: A fraction representing part of a whole (e.g., 0.25 or 1/4).
Percent: A proportion multiplied by 100 (e.g., 0.25 = 25%).

3. Measures of Center & Spread

Mean (x̄): Average of data.
Median (m): Middle value when ordered.
Range: Max - Min.
Interquartile Range (IQR): Q3 - Q1 (middle 50% of data).
Standard Deviation (S): Measures spread around the mean.
Issues with SD for Outliers: SD is sensitive to outliers; extreme values heavily influence it.
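
To make the point about outliers concrete, here is a small sketch that compares the four summaries before and after one extreme value is added. The data and the helper name `spread_summary` are invented for illustration.

```r
# Hypothetical data set, then the same data with one extreme value added.
x <- c(2, 4, 4, 5, 6, 7, 8)
x_out <- c(x, 60)

# Helper (illustrative name) that collects center and spread in one vector.
spread_summary <- function(v) {
  c(mean = mean(v), median = median(v), sd = sd(v), IQR = IQR(v))
}

round(rbind(original = spread_summary(x),
            with_outlier = spread_summary(x_out)), 2)
```

In this example the mean and SD change a lot when 60 is added, while the median and IQR barely move, which is why the median and IQR are usually preferred for data with outliers.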

4. Histograms & Distribution Shapes

Symmetric (Bell-shaped): Mean ≈ Median.
Right-skewed: Mean > Median.
Left-skewed: Mean < Median.
Uniform: Equal frequency across bins.
Bimodal: Two peaks.
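
The mean-versus-median rules of thumb above can be checked on small hand-built samples. This is only a sketch with made-up numbers; real data will not always follow the rule exactly.

```r
# Hypothetical samples built by hand so the shapes are easy to see.
symmetric    <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
right_skewed <- c(1, 1, 2, 2, 2, 3, 3, 4, 12)
left_skewed  <- 13 - right_skewed  # mirror image of the right-skewed sample

# One column per sample, with the mean and median in the rows.
shape_summary <- sapply(
  list(symmetric = symmetric,
       right_skewed = right_skewed,
       left_skewed = left_skewed),
  function(v) c(mean = mean(v), median = median(v))
)
round(shape_summary, 2)
```

The long right tail pulls the mean above the median, and the long left tail pulls it below.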

5. Standardizing, Shifting & Scaling

Standardizing (Z-score): $Z = \frac{X - \mu}{\sigma}$ (tells how many SDs a value is from the mean)
Shifting: Adding/subtracting a constant affects mean but not spread.
Scaling: Multiplying/dividing by a constant affects both center and spread.
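
A quick way to see the difference between shifting and scaling is to apply each to the same values. The temperatures below are hypothetical.

```r
# Hypothetical temperatures in degrees Celsius.
celsius <- c(10, 15, 20, 25, 30)

shifted <- celsius + 5    # shifting: add a constant
scaled  <- celsius * 1.8  # scaling: multiply by a constant

c(mean = mean(celsius), sd = sd(celsius))  # mean 20, sd about 7.91
c(mean = mean(shifted), sd = sd(shifted))  # mean 25, sd unchanged
c(mean = mean(scaled),  sd = sd(scaled))   # mean 36, sd multiplied by 1.8
```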

6. Normal Distribution & SD Bell Curve Percents

68-95-99.7 Rule:
- 68% within 1 SD
- 95% within 2 SDs
- 99.7% within 3 SDs

7. Using pnorm vs. qnorm in R

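pnorm(x, mean, sd): Finds the probability (proportion of the distribution) below the value x.
1 - pnorm(x, mean, sd): Finds the probability above the value x.
qnorm(percentile, mean, sd): Finds the value at a given percentile; enter the percentile as a proportion between 0 and 1 (e.g., 0.90 for the 90th percentile).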
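
The sketch below shows the two functions working in opposite directions. The Normal model (mean 64, SD 2.5) is a hypothetical example, not a value from the course.

```r
# Hypothetical Normal model: heights with mean 64 inches and SD 2.5 inches.
mu <- 64
sigma <- 2.5

# Proportion of values below 66.5 (one SD above the mean).
p_below <- pnorm(66.5, mean = mu, sd = sigma)
round(p_below, 4)  # 0.8413

# Proportion above 66.5 is the complement.
p_above <- 1 - p_below
round(p_above, 4)  # 0.1587

# qnorm() goes the other way: it takes a proportion and returns the value.
value_back <- qnorm(p_below, mean = mu, sd = sigma)  # about 66.5
median_value <- qnorm(0.50, mean = mu, sd = sigma)   # 64, the center of the curve
```

A handy check: if `pnorm()` gives you a proportion, feeding that proportion to `qnorm()` should return the value you started with.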

8. Correlation (R-Value)

Measures the strength and direction of a linear relationship between two quantitative variables.
Ranges from -1 to 1:
- R = 1: Perfect positive correlation.
- R = -1: Perfect negative correlation.
- R = 0: No linear correlation.
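
The following sketch uses base R's `cor()` on small made-up vectors to show each case, including a curved pattern whose correlation is 0 even though the variables are clearly related.

```r
# Hypothetical paired measurements.
hours <- c(1, 2, 3, 4, 5)
score <- c(52, 60, 61, 70, 77)

r_data   <- cor(hours, score)           # about 0.98: strong positive linear relationship
r_up     <- cor(hours, 2 * hours + 1)   # exactly linear and increasing: r = 1
r_down   <- cor(hours, -3 * hours)      # exactly linear and decreasing: r = -1
r_curved <- cor(hours, (hours - 3)^2)   # U-shaped pattern: r = 0

round(c(r_data, r_up, r_down, r_curved), 2)
```

The last case is a reminder that r = 0 only rules out a linear relationship, not every relationship.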

9. Simpson’s Paradox

A trend that appears within each of several groups reverses when the groups are combined, because of a lurking variable.
Example: A hospital appears to have a higher death rate overall, but when split by patient condition, it actually has a lower death rate in each category.
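
The hospital example can be reproduced with a small table. The counts below are hypothetical, chosen only so that the reversal shows up; hospital A treats far more patients in poor condition, and that is the lurking variable.

```r
# Hypothetical counts, invented to illustrate the reversal.
hospitals <- data.frame(
  hospital  = c("A", "A", "B", "B"),
  condition = c("good", "poor", "good", "poor"),
  patients  = c(600, 1500, 600, 200),
  deaths    = c(6, 57, 8, 8)
)

# Death rate within each condition: hospital A is lower in both rows.
hospitals$rate <- hospitals$deaths / hospitals$patients
by_condition <- xtabs(rate ~ condition + hospital, data = hospitals)
round(100 * by_condition, 1)

# Overall death rate: hospital A is now higher.
totals <- aggregate(cbind(deaths, patients) ~ hospital,
                    data = hospitals, FUN = sum)
totals$rate <- totals$deaths / totals$patients
round(100 * totals$rate, 1)  # A: 3, B: 2
```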

Key R Commands

Summary Statistics:

mean(dataset$variable)
median(dataset$variable)
sd(dataset$variable)
IQR(dataset$variable)

Histograms & Boxplots:

histogram(~ variable, data = dataset)
bwplot(variable ~ category, data = dataset)

Normal Distribution Calculations:
```
pnorm(x, mean, sd)   # Probability below x
1 - pnorm(x, mean, sd)   # Probability above x
qnorm(percentile, mean, sd)   # Value at given percentile (entered as a proportion, e.g., 0.90)
```
  Term-Definition Study Set (Exam Topics)
  Statistics and Data Analysis
  1. Mean (Average): The sum of all values in a dataset divided by the number of values.
  2. Median: The middle value in an ordered dataset; if even, the average of the two middle values.
  3. Mode: The most frequently occurring value(s) in a dataset.
  4. Range: The difference between the maximum and minimum values in a dataset.
  5. Interquartile Range (IQR): The range of the middle 50% of data, calculated as Q3 - Q1.
  6. Quartiles: Values that divide a dataset into four equal parts:
    - Q1 (First Quartile): 25th percentile
    - Q2 (Median): 50th percentile
    - Q3 (Third Quartile): 75th percentile
  7. Standard Deviation (SD): A measure of how spread out the data is around the mean. A higher SD indicates more variability.
  8. Five-Number Summary: A set of five values (Min, Q1, Median, Q3, Max) that summarize a dataset.
  Outliers Detection
  1. Outliers: Data points that are significantly higher or lower than the rest of the dataset.
  2. Lower Fence: $Q1 - 1.5 \times IQR$, used to detect low-end outliers.
  3. Upper Fence: $Q3 + 1.5 \times IQR$, used to detect high-end outliers.
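
The fence rule can be sketched in base R as follows. The data are hypothetical, and R's default `quantile()` method may give slightly different quartiles than a by-hand textbook method, so treat the exact fence values as specific to this sketch.

```r
# Hypothetical data with one unusually large value.
x <- c(12, 15, 16, 18, 19, 21, 22, 24, 50)

q1 <- quantile(x, 0.25, names = FALSE)  # 16 with R's default method
q3 <- quantile(x, 0.75, names = FALSE)  # 22 with R's default method
iqr <- q3 - q1                          # same value as IQR(x)

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
c(lower = lower_fence, upper = upper_fence)  # 7 and 31

# Values outside the fences are flagged as outliers.
outliers <- x[x < lower_fence | x > upper_fence]
outliers  # 50
```
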
  Probability and Percentage Calculations
  1. Proportion: A fraction representing a part of a whole, often converted into a percentage.
  2. Percentage: A way to express a proportion out of 100, calculated as $\frac{\text{part}}{\text{whole}} \times 100$.
  3. Conditional Probability: The likelihood of an event occurring given that another event has already occurred (e.g., percentage of Obama supporters who were male).
  Categorical Data Analysis
  1. Frequency Table: A table that lists the number of times different categories occur in a dataset.
  2. Contingency Table: A table that shows the frequency distribution of variables to examine relationships between them.
  3. Gender Gap in Voting: A phenomenon where voting preferences differ significantly between males and females.
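
A contingency table and its conditional percentages can be sketched in base R with `prop.table()`. The survey counts and candidate labels below are invented for illustration only.

```r
# Hypothetical survey counts (rows: gender, columns: candidate).
votes <- matrix(
  c(40, 60,
    55, 45),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("male", "female"),
                  candidate = c("A", "B"))
)

# Conditional on gender: each row sums to 100%.
by_gender <- 100 * prop.table(votes, margin = 1)
round(by_gender, 1)

# Conditional on candidate: each column sums to 100%.
# Example: percentage of candidate A's supporters who were male.
by_candidate <- 100 * prop.table(votes, margin = 2)
round(by_candidate["male", "A"], 1)  # 42.1
```

Which margin you condition on changes the question being answered: "what percent of males chose A" (40%) is not the same as "what percent of A's supporters were male" (42.1%).
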
  Comparing Distributions
  1. Boxplot: A graphical representation of the five-number summary, useful for comparing distributions.
  2. Histogram: A chart of adjacent bars showing how many values of a numerical variable fall in each interval (bin).
  3. Symmetric Distribution: A dataset where the left and right sides of the histogram are roughly mirror images.
  4. Skewed Distribution:
    - Right-Skewed (Positive Skew): Tail is longer on the right side.
    - Left-Skewed (Negative Skew): Tail is longer on the left side.
  5. Spread/Variability: The extent to which data values differ, measured by range, IQR, and standard deviation.
  Data Visualization and Alternative Representations
  1. Bar Chart: A chart that uses bars to represent categorical data.
  2. Scatterplot: A graph of plotted points that show the relationship between two variables.
  3. Alternative Graphical Representations: Other ways to display data, such as side-by-side boxplots for comparing distributions.
  Software-Specific Knowledge (R Programming Basics)
  1. favstats(): A function from the mosaic package that reports summary statistics (min, Q1, median, Q3, max, mean, SD, n, and the number of missing values) for a variable.
  2. histogram(): A function from the lattice package (used alongside mosaic) that draws a histogram to visualize the distribution of a numerical variable.
  3. tally(): A function from the mosaic package that creates frequency tables for categorical data.
  4. bwplot(): A function from the lattice package (used alongside mosaic) that draws boxplots to compare the distribution of a variable across categories.
