Intro to Statistics

44 Terms

1

Dot Plot

A simple graph where each observation is shown as a dot along a number line. Ex) number of rooms in 45 homes displayed with one dot per home.

2

What is statistics?

The art and science of learning from data

3

It deals with collection

classification

4

Left-skewed

Most values are large, with a tail extending toward the smaller values.

5

Types of data collection

Interviews

6

3 main components of statistics

Study design: Planning how to obtain data to help answer the questions of interest (Data Collection).
Description: Exploring and summarizing patterns in the data (Data Analysis).
Inference: Making decisions and predictions based on data/known evidence.

7

Types of statistics

Descriptive: Involves organizing

8

What is data?

Systematically recorded information such as numbers

9

Types of data

Tabular

10

Tabular Data

A collection of objects and their attributes. An object (record or observation) is what is described

11

Types of variables

Numeric and categorical

12

Numerical/Quantitative variable

A variable that records measurable numerical values with units. It represents amounts or degrees of something

13

Discrete (numerical) variable

A type of quantitative variable that takes on a finite or countably infinite set of values. Ex) number of students in a class.

14

Continuous (numerical) variable

A type of quantitative variable that can take on infinitely many values within a range. Ex) age.

15

Categorical/Qualitative variable

A variable that classifies observations into groups or categories

16

Nominal (categorical) variable

Categories that have no natural order. Ex) favorite color

17

Ordinal (categorical) variable

Categories that follow a logical order. Ex) drink sizes (small, medium, large).

18

Transforming numerical into categorical

A numerical variable can be grouped into ranges and treated as categories. Ex) age reported as 18–24
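
A minimal sketch of this grouping in Python. Only the 18–24 range comes from the card; the function name `age_group`, the other cut points, and the sample ages are assumptions made for illustration.

```python
def age_group(age: int) -> str:
    """Map a numerical age to a categorical age range."""
    # Only the 18-24 range is from the card; the other cut points are assumed.
    if age < 18:
        return "under 18"
    if age <= 24:
        return "18-24"
    if age <= 34:
        return "25-34"
    return "35+"


ages = [19, 23, 31, 47]  # made-up values for illustration
groups = [age_group(a) for a in ages]
print(groups)
```

Once grouped, the variable behaves like an ordinal categorical variable: the ranges have a natural order, but the exact ages are no longer available.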

19

Population

The total group of individuals or objects you want to make conclusions about in a statistical study. Ex) all students at Columbia University.

20

Sample

A subset of the population used to draw conclusions

21

Parameter

A summary value calculated from a population. Ex) the average age of all students at Columbia.

22

Statistic

A summary value calculated from a sample. Ex) the average age of students in one statistics class.
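
The difference between a parameter and a statistic can be illustrated with a small Python sketch. The ages below are made up, and the eight-student "population" is only a stand-in for a real one; the same summary (the mean) is a parameter when computed on everyone and a statistic when computed on a sample.

```python
from statistics import mean

# Hypothetical ages for a tiny "population" of 8 students (made-up data).
population_ages = [18, 19, 19, 20, 21, 22, 22, 27]
# Hypothetical sample: the students who happen to be in one class.
sample_ages = [19, 20, 22, 27]

parameter = mean(population_ages)  # summary of the whole population
statistic = mean(sample_ages)      # summary of the sample only
print(parameter, statistic)
```

The two values generally differ, which is why a statistic is treated as an estimate of the parameter rather than as the parameter itself.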

23

Why can’t we usually observe an entire population?

Studying every individual is often impractical because it takes too long

24

A bad sample

A sample that is not representative of the population and can lead to biased results. Ex) using income data from only Manhattan households to represent the entire U.S.

25

Sampling

A method that allows researchers to study a population by investigating a subset instead of every individual. Ex) estimating MLB player salaries by surveying part of the league.

26

Probability sampling

A method where every individual in the population has a known, nonzero chance of being selected.

27

Non-probability sampling

A method where not all individuals have a chance of being selected

28

Sampling Methods

Simple Random Sampling

29

Simple Random Sample (SRS)

A sample where every individual has the same chance of being chosen and every possible sample has the same chance of selection. Ex) giving each MLB player a number and drawing numbers at random.
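
A minimal Python sketch of the number-drawing idea, using the standard library's `random.sample`. The roster of 30 numbered players and the sample size of 5 are assumptions for illustration, and the fixed seed is there only to make the sketch reproducible.

```python
import random

# Hypothetical roster: players numbered 1..30 (the real league is larger).
players = list(range(1, 31))

rng = random.Random(0)  # fixed seed only so the sketch is reproducible
# Draw 5 players without replacement; every 5-player subset is equally likely.
srs = rng.sample(players, k=5)
print(srs)
```

Because the draw is without replacement, no player can appear twice in the sample.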

30

Stratified Sampling

The population is divided into subgroups (strata) with similar characteristics, and a random sample is drawn from each stratum.
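
One way to sketch stratified sampling in Python, assuming the usual procedure of drawing a simple random sample within each stratum. The helper `stratified_sample`, the class-year strata, and the student records are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, per_stratum, rng):
    """Group units into strata, then draw a simple random sample from each."""
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}


# Hypothetical students tagged with a class year (made-up data, 5 per year).
years = ["freshman", "sophomore", "junior", "senior"]
students = [(f"s{i}", year) for i, year in enumerate(years * 5)]

sample = stratified_sample(
    students, stratum_of=lambda s: s[1], per_stratum=2, rng=random.Random(0)
)
print(sample)
```

Sampling within every stratum guarantees that each subgroup is represented, which a simple random sample of the same size does not.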

31

Cluster Sampling

The population is divided into clusters, and entire clusters are then randomly selected for the sample.

32

Systematic Sampling

Individuals are chosen at regular intervals from an ordered list
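
A short Python sketch of picking at regular intervals. The random starting point within the first interval is a common convention but an assumption here, as are the helper name `systematic_sample` and the 20-item roster.

```python
import random


def systematic_sample(ordered, k, rng):
    """Pick a random start in the first k items, then every k-th item after it."""
    start = rng.randrange(k)
    return ordered[start::k]


roster = list(range(1, 21))  # hypothetical ordered list of 20 IDs
picked = systematic_sample(roster, k=5, rng=random.Random(1))
print(picked)
```

With 20 items and an interval of 5, the sample always contains 4 individuals spaced 5 positions apart.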

33

Convenience Sampling

Participants are chosen based on availability or willingness

34

Sampling Frame

A complete list of all individuals or units in the population who are eligible to be selected in a sample. Ex) a university registrar’s list of enrolled students.

35

Bar Plot

A graph that shows the frequency of each category of a categorical variable using bars

36

Proportion (p-hat vs p)

The proportion of cases in a category is cases in category ÷ total cases. The sample proportion is written as p-hat, and the population proportion is written as p.
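
The formula can be written as a small Python function. The function name `sample_proportion` and the eight responses are made up for illustration.

```python
def sample_proportion(observations, category):
    """p-hat: cases in the category divided by total cases in the sample."""
    in_category = sum(1 for x in observations if x == category)
    return in_category / len(observations)


# Made-up sample of 8 responses about having an early class.
responses = ["early", "no early", "early", "no early",
             "no early", "early", "no early", "no early"]
p_hat = sample_proportion(responses, "early")
print(p_hat)
```

Here 3 of the 8 responses are "early", so p-hat is 3 ÷ 8.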

37

Contingency Table

A table that shows the frequency of cases for combinations of two categorical variables. Ex) class year (rows) by early class status (columns).
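
A rough Python sketch of how such a table is tallied, using `collections.Counter` on pairs of values. The six (class year, early class status) cases are made-up data.

```python
from collections import Counter

# Made-up (class year, early class status) pairs, one per case.
cases = [
    ("freshman", "early"), ("freshman", "no early"),
    ("sophomore", "early"), ("sophomore", "early"),
    ("freshman", "early"), ("sophomore", "no early"),
]
counts = Counter(cases)  # frequency of each combination
rows = sorted({year for year, _ in cases})
cols = sorted({status for _, status in cases})

# Print class year as rows and early class status as columns.
print("year".ljust(12) + "".join(c.ljust(10) for c in cols))
for r in rows:
    print(r.ljust(12) + "".join(str(counts[(r, c)]).ljust(10) for c in cols))
```

Each cell holds the number of cases with that combination of the two variables, and the cells together add up to the total number of cases.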

38

Segmented Bar Plot

A graph where each bar represents a group and is split into colored segments that show the distribution of a second categorical variable within that group. Ex) one bar for freshmen divided into early vs no early class segments.

39

Side-by-Side Bar Plot

A graph that compares groups by placing separate bars for each category of a second variable next to each other. Ex) separate bars for early vs no early class shown side by side within each class year.

41

Mosaic Plot

A graph that uses the area of rectangles to show relationships between two or more categorical variables

42

Histogram

A graph that groups numerical data into intervals of equal width and shows how many cases fall in each interval. Ex) living area of homes grouped into 250-sq-ft bins.
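
The binning step behind a histogram can be sketched in Python as follows. The 250-sq-ft bin width comes from the card; the helper `histogram_counts` and the living areas are hypothetical, and bins are assumed to include their left edge and exclude their right edge.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Count how many values fall in each equal-width bin [start, start + width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Made-up living areas in square feet.
areas = [820, 990, 1010, 1240, 1260, 1300, 1490, 1760]
bins = histogram_counts(areas, bin_width=250)
for start, n in bins.items():
    print(f"{start}-{start + 250}: {'*' * n}")
```

This sketch lists only bins that contain at least one case; a drawn histogram would also show the empty 1500 to 1750 bin as a gap.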

43

Symmetric/Bell-Shaped

Data are clustered in the middle, with roughly equal numbers of smaller and larger values on either side. Ex) heights of adults.

44

Symmetric but Not Bell-Shaped

Values on one side of the distribution mirror the other