Stats 10 Midterm- UCLA

0.0(0)
studied byStudied by 0 people
0.0(0)
full-widthCall Kai
learnLearn
examPractice Test
spaced repetitionSpaced Repetition
heart puzzleMatch
flashcardsFlashcards
GameKnowt Play
Card Sorting

1/68

encourage image

There's no tags or description

Looks like no tags are added yet.

Study Analytics
Name
Mastery
Learn
Test
Matching
Spaced

No study sessions yet.

69 Terms

1
New cards

Variable

any characteristics, number, or quantity that can be measured or counted. It is called a ---- because the value may vary between data units in a population, and may change in value over time or observation. (usually put on columns in data table)

2
New cards

Observation

occurrence of a specific variable that is recorded about a data unit. (usually put on rows in data table)

3
New cards

Population

collection of observations of interest. This number is usually very large and nearly impossible to obtain measurements from.

4
New cards

Sample

portion of a population of interest. A sample is usually taken to measure a characteristic about a population.

5
New cards

Numerical

---- Variable (Quantitative) : Temperature, GPA, Height, Distance, ...

6
New cards

Categorical

---- Variable (Qualitative) : Eye color, Gender, Year in school, Major in college, ...

7
New cards

Frequency

natural way to summarize the categorical variables

8
New cards

Confounding Variable

A difference between the two groups that could explain why the outcomes were different. affects the variables of interest but is not known or acknowledged, and thus (potentially) distorts the resulting data

9
New cards

observational study

In an ---- ---, researchers don't assign choices, they simply observe them

10
New cards

experimental study

In an experiment, the experimenter actively and deliberately manipulates the treatment variable and assigns the subjects to those treatments, generally at random.

11
New cards

Large sample size, Random assignment/ bias, Double-blinded experiment, placebo

Four things that are important in an experimental study:

12
New cards

Stem-and-leaf plot

Divides each observation into a "stem" and "leaf". Best for numerical data

<p>Divides each observation into a "stem" and "leaf". Best for numerical data</p>
13
New cards

Dot Plot

Places a dot along an axis for each case in the data. Best for numerical data

<p>Places a dot along an axis for each case in the data. Best for numerical data</p>
14
New cards

Histogram

displays the number of cases in each bin. (Bin widths are a personal choice. Too small a width shows too much detail while too large a width shows too little.)

The horizontal axis is numerical and the vertical axis is the (relative) frequency. Best for numerical data

<p>displays the number of cases in each bin. (Bin widths are a personal choice. Too small a width shows too much detail while too large a width shows too little.)</p><p>The horizontal axis is numerical and the vertical axis is the (relative) frequency. Best for numerical data</p>
15
New cards

relative frequency

how often something happens divided by all the possible outcomes. Used in histogram y-axis

16
New cards

Shape

focuses on number of modes (peaks), Symmetricity, outliers, central tendency

17
New cards

Unimodal

single mode

<p>single mode</p>
18
New cards

Bimodal

two modes

<p>two modes</p>
19
New cards

Uniform

no apparent peaks

20
New cards

Symmetric

the left hand side is roughly the mirror image of the right hand side. (bell-shaped)

21
New cards

Right-Skewed

(positively skewed) : long tail to the right

<p>(positively skewed) : long tail to the right</p>
22
New cards

Left-Skewed

(negatively skewed) : long tail to the left

<p>(negatively skewed) : long tail to the left</p>
23
New cards

Outliers

unusually large or small values in the distribution

24
New cards

typical value

not necessarily center in location. Whether its in the center or not

25
New cards

spread/ variability

how spread out the data is from the center

When the data values are tightly clustered around the center of the distribution, the spread is small

When data values are scattered far from the center, the spread is large

26
New cards

Bar Chart

Displays counts of each category next to each other for easy comparison. May put relative frequency proportions for each category. Best for categorical data

<p>Displays counts of each category next to each other for easy comparison. May put relative frequency proportions for each category. Best for categorical data</p>
27
New cards

Mean

The center as a balancing point. Arithmetic average (calculation)

28
New cards

standard deviation

The average distance of a value from the mean.

It measures how far away the typical observation is from the mean. In symmetric, unimodal distributions, the majority (more than half) of the observations are less than one ---- from the mean.

29
New cards

Empirical Rule

A rough guideline, a rule of thumb, that helps us understand how the standard deviation measure variability. According to the ----, 68% of the data will fall between 85 and 115 (one SD from the mean), 95% of the data will fall between 70 and 130, and 99.7% of the data will fall between 55 and 145.

30
New cards

z-score

measure the distance of each data value from the mean in standard deviations. A negative --- tells us that the data value is below the mean, while a positive --- tells us that the data value is above the mean.

31
New cards

magnitude

of z-score (absolute z-score) implies the'unusualness' of the observation regardless itsdirection.

32
New cards

median

The center as the middle (half-way point)

33
New cards

IQR

tells us how much space the middle 50% of the data occupy.

34
New cards

scatterplot

are the best way to start observing the relationship and the ideal way to picture associations between two numerical variables.you can see patterns, trends, relationships, and even the occasional extraordinary value sitting apart from the others

<p>are the best way to start observing the relationship and the ideal way to picture associations between two numerical variables.you can see patterns, trends, relationships, and even the occasional extraordinary value sitting apart from the others</p>
35
New cards

explanatory variable

The variable in the x-axis (aka independent variable)

36
New cards

response variable

the variable on the y-axis is called the (aka dependent variable)

37
New cards

linear

If the points appear as a cloud or swarm of points stretched out in a generally consistent, straight form, the form of the relationship is

<p>If the points appear as a cloud or swarm of points stretched out in a generally consistent, straight form, the form of the relationship is</p>
38
New cards

correlation coefficient

(r) gives us a numerical measurement of the strength of the linear relationship between the explanatory and response variables. A correlation near zero corresponds to a weak linear association.

39
New cards

regression line

Tool for making predictions about future observed values. Provides us with a useful way of summarizing a linear relationship

40
New cards

predicted value

The estimate made from a linear model. denoted as ^Y

41
New cards

residual

The difference between the observed value and its associated predicted value

42
New cards

line of best fit

the line for which the sum of the squared residuals is smallest. Among possible regression lines, find the line that minimizes the sum or squared residuals.

43
New cards

random

no predictable pattern occurs and no digit is more likely to appear than any other

44
New cards

theoritical probability

the relative frequency at which an event happens after infinitely many repetitions.

45
New cards

trial

Each time you hit shuffle

46
New cards

outcome

The song that plays as a result of hitting shuffle

47
New cards

sample space

the collection of all possible outcomes of a trial

48
New cards

event

A combination of outcomes

49
New cards

independent

the outcome of one trial doesn't influence or change the outcome of another.

50
New cards

two-way table

displays the count of two categorical variables at a time conditional on each other

<p>displays the count of two categorical variables at a time conditional on each other </p>
51
New cards

Treatment group

people who receive the treatment (or have the characteristic of interest)

52
New cards

Control group

people who do not receive the treatment (or do not have the characteristic of interest)

53
New cards

mode

value of the data set that occurs with the greatest frequency

54
New cards

types of bar charts

pareto (greatest to least), side-by-side, segmented(both display two categorical variables graphically), pie chart

55
New cards

boxplot

useful graphical tool for visualizing a distribution, especially when comparing two or more groups of data, five-number summary

56
New cards

scatterplot three patterns

strength (how closely they’re clustered), direction (positive or negative), and shape (linear or non-linear) 

57
New cards

Correlation vs Causation

a high correlation between two variables does NOT imply causation

58
New cards

Measure of Goodness of fit 

If the correlation between x and y is r = 1, or r = -1, then the model would predict y perfectly, and the residuals would all be 0, hence variability in the response variable would be explained by the explanatory variable 

59
New cards

R² coefficient of determination

gives us the percentage of the variance of y accounted for by the regression model or in other words explained by x

60
New cards

Residual Plot Bad examples 

Fan shaped residuals, Curved residuals 

61
New cards

empirical probability 

relative frequencies based on an experiment or on observations of a real life process

62
New cards

theoretical probability example

we may not be able to predict which song we play each time we hit shuffle, however we know in the long run 5 out of 1000 songs will be JB songs

63
New cards

empirical probability example

we listen to 100 songs on shuffle and 4 ofthem are JB songs, the empirical probability is 4/100 = 0.04

64
New cards

Complement rules

the set of outcomes that are not in the event A is called the complement of A denoted Ac

65
New cards

Complement rule example

A: it rains in LA 0.4

Ac : it does not rain in LA 0.6

66
New cards

Disjoint events (Mutually exclusive event)

Events that have no outcomes in common and thus cannot occur together

67
New cards

Combined events

Event A and B happen at the same time, Event A or B either or both happen

68
New cards

Conditional probability

we focus on one group of objects and imagine taking a random sample from that group alone.

69
New cards

Law of Large Numbers

the long run relative frequency of repeated independent events gets closer and closer to the true relative frequency as the number of trials increases.