MCAT Physics and Math - Data-Based and Statistical Reasoning

57 Terms

Measures of central tendency

describe the middle(s) of a sample

mean / (arithmetic) average

calculated by adding up all of the individual values within the data set and dividing the result by the number of values

mean = (x_1 + x_2 + ... + x_n) / n

where x_1 to x_n are the values of all of the data points in the set and n is the number of data points in the set

good indicator of central tendency when all of the values tend to be fairly close to one another

outlier

extremely large or extremely small value compared to the other data values

median

midpoint, where half of data points are greater than the value and half are smaller; data set must first be listed in increasing fashion

median position = (n + 1) / 2

where n is the number of data values; if n is even, this position falls between two values and the median is their mean

least susceptible to outliers, but may not be useful for data sets with very large ranges or multiple modes

mode

the value that appears most often in a set of data; a data set may have one mode, multiple modes, or none; represented graphically as peaks; not typically used as a measure of central tendency, but the number of modes and their distance from one another can be informative
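As a rough illustration of the three measures, the sketch below runs Python's standard `statistics` module on a made-up data set (`scores` is a hypothetical name) that contains one outlier.

```python
import statistics

# Hypothetical data set; 40 is an outlier compared to the other values
scores = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(scores)        # sum of the values divided by their count
median = statistics.median(scores)    # middle value of the sorted data
modes = statistics.multimode(scores)  # every value tied for most frequent

print(mean, median, modes)
```

The single outlier pulls the mean up to 9, while the median stays at 4 and the mode at 3.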

normal distribution / bell curve

mean = median = mode

approximately 68% of the distribution is within one standard deviation of the mean, 95% within two, and 99.7% within three
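These percentages can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `within` is introduced here only for illustration.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

def within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k), 4))
```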

standard distribution

normal distribution with a mean of zero and a standard deviation of one (also called the standard normal distribution); any normal distribution can be converted to it with z = (x − μ) / σ
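A minimal sketch of that conversion, assuming a hypothetical normal distribution with mean 100 and standard deviation 15 (the numbers and the name `z_score` are illustrative, not from the card):

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical distribution: mean 100, standard deviation 15
print(z_score(130, mu=100, sigma=15))  # two standard deviations above the mean
print(z_score(85, mu=100, sigma=15))   # one standard deviation below the mean
```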

skewed distribution

one that contains a tail on one side or the other of the data set; the skew is named for the direction of the tail, so the bulk of the data appears shifted opposite the direction of the skew

negative skew

tail to left

mean < median < mode

positive skew

tail to right

mode < median < mean
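The two orderings can be seen on a small made-up sample. In this sketch `right_skewed` is a hypothetical data set with a tail of large values, and `left_skewed` is simply its mirror image.

```python
import statistics

# Hypothetical sample with a long tail of large values (positive skew)
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# Mirror image: long tail of small values (negative skew)
left_skewed = [-x for x in right_skewed]

def summary(data):
    """Return (mode, median, mean) for a data set."""
    return statistics.mode(data), statistics.median(data), statistics.mean(data)

print(summary(right_skewed))  # mode < median < mean
print(summary(left_skewed))   # mean < median < mode
```

These orderings are typical of skewed data rather than guaranteed for every data set.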

bimodal

distribution containing two peaks with a valley in between; might have only one mode if one peak is slightly higher than the other; can often be analyzed as two separate distributions

range

difference between the largest and smallest values of a data set; does not consider the number of items in the data set, nor the placement of any measures of central tendency; for roughly normal data, the standard deviation can be approximated as one-fourth of the range

range = x_max − x_min

Quartiles

divide data (when placed in ascending order) into groups that comprise one-fourth of the entire set

Calculate quartiles

With the data sorted in ascending order, the position of the first quartile (Q1) is n × ¼, where n is the number of data values.

The second quartile (Q2) is the median.

The position of the third quartile (Q3) is n × ¾.

For Q1 and Q3: if the position is a whole number, the quartile is the mean of the value at that position and the value at the next highest position.
If the position is a decimal, round up to the next whole number and take the value at that position as the quartile.
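The positional rule above can be sketched as follows. The function name `quartile` and the data are hypothetical, and other quartile conventions (including the default of Python's `statistics.quantiles`) can give slightly different values.

```python
import math

def quartile(sorted_data, fraction):
    """Quartile by the positional rule above; positions are counted from 1."""
    position = len(sorted_data) * fraction
    if position.is_integer():
        k = int(position)
        # Whole number: mean of the value at this position and the next highest
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    # Decimal: round up and take the value at that position
    k = math.ceil(position)
    return sorted_data[k - 1]

data = sorted([13, 1, 9, 4, 7, 3, 11, 6])  # hypothetical data, n = 8
q1 = quartile(data, 0.25)  # position 2, so mean of the 2nd and 3rd values
q3 = quartile(data, 0.75)  # position 6, so mean of the 6th and 7th values
print(q1, q3, q3 - q1)
```

For this data set Q1 is 3.5 and Q3 is 10.0, which gives an interquartile range of 6.5.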

Interquartile range

calculated by subtracting the value of the first quartile from the value of the third quartile

IQR = Q₃ – Q₁

outlier

Any value that falls more than 1.5 interquartile ranges below the first quartile or above the third quartile OR that lies more than three standard deviations from the mean

may result from a true statistical anomaly, a measurement error, or a distribution that is not approximated by the normal distribution
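A minimal sketch of the 1.5 × IQR rule, assuming the positional quartile method from the quartile card. The helper `quartile` and the data are hypothetical, and the three-standard-deviation rule is not shown.

```python
import math

def quartile(sorted_data, fraction):
    """Positional quartile rule; positions are counted from 1."""
    position = len(sorted_data) * fraction
    if position.is_integer():
        k = int(position)
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    return sorted_data[math.ceil(position) - 1]

data = sorted([7, 1, 9, 3, 40, 4, 11, 6])  # hypothetical data
q1, q3 = quartile(data, 0.25), quartile(data, 0.75)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr  # values below this are flagged
upper_fence = q3 + 1.5 * iqr  # values above this are flagged
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```

Here the fences are -6.25 and 19.75, so only the value 40 is flagged.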

Standard deviation

calculated by taking the difference between each data point and the mean, squaring this value, dividing the sum of all of these squared values by the number of points in the data set minus one, and then taking the square root of the result

σ = √( Σ (x_i − x̄)² / (n − 1) )

where σ is the standard deviation, x_1 to x_n are the values of all of the data points in the set, x̄ is the mean, and n is the number of data points in the set
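The standard deviation calculation described in this card can be followed step by step in code. This sketch uses made-up data and compares the manual result with `statistics.stdev`, which also divides by n − 1.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]  # squared difference from the mean
variance = sum(squared_diffs) / (len(data) - 1)  # divide by n - 1
sd = math.sqrt(variance)                         # square root of the result

print(round(sd, 3), round(statistics.stdev(data), 3))
```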

independent events

have no effect on one another

Dependent events

have an impact on one another, such that the order changes the probability

Mutually exclusive outcomes

cannot occur at the same time

exhaustive

there are no other possible outcomes

probability of two or more independent events

the product of their individual probabilities

P(A ∩ B) = P(A and B) = P(A) × P(B)

probability of at least one of two independent events

equal to the sum of their initial probabilities, minus the probability that they will both occur.

P(A ∪ B) = P(A or B) = P(A) + P(B) − P(A and B)
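Both probability rules can be worked through with exact fractions. The events here are hypothetical: A is a fair coin landing heads and B is a fair six-sided die showing a six, which are independent.

```python
from fractions import Fraction

p_a = Fraction(1, 2)  # P(A): coin lands heads
p_b = Fraction(1, 6)  # P(B): die shows a six

p_a_and_b = p_a * p_b             # P(A and B) = P(A) x P(B)
p_a_or_b = p_a + p_b - p_a_and_b  # P(A or B) = P(A) + P(B) - P(A and B)

print(p_a_and_b, p_a_or_b)
```

This gives 1/12 for both events occurring and 7/12 for at least one occurring.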

Hypothesis testing

begins with an idea about what may be different between two populations

null hypothesis

always a hypothesis of equivalence; says that two populations are equal, or that a single population can be described by a parameter equal to a given value

alternative hypothesis

a hypothesis contrary to the null hypothesis

nondirectional

alternative hypothesis that the populations are not equal

directional

alternative hypothesis that specifies the direction of the difference, for example that the mean of population A is greater than the mean of population B

z- or t-tests

the most common hypothesis tests; they rely on the standard distribution or the closely related t-distribution

test statistic

value calculated from the sample data and compared to a table to determine the likelihood that the statistic was obtained by random chance

p-value

the probability of obtaining a test statistic at least as extreme as the one observed by random chance alone, under the assumption that the null hypothesis is true

significance level (α)

threshold to which the p-value is compared; 0.05 is commonly used

p-value > α, fail to reject the null hypothesis - not a statistically significant difference between the two populations

p-value < α, reject the null hypothesis - there is a statistically significant difference between the two groups
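The decision rule can be sketched with a one-sample z-test. All of the numbers below are hypothetical, and the population standard deviation is assumed to be known, which is what allows a z-test rather than a t-test.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical numbers, chosen only for illustration
mu_0 = 100         # population mean under the null hypothesis
sigma = 15         # population standard deviation, assumed known
sample_mean = 106
n = 25
alpha = 0.05       # significance level

z = (sample_mean - mu_0) / (sigma / sqrt(n))  # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided (nondirectional) test

reject_null = p_value < alpha
print(round(z, 2), round(p_value, 4), reject_null)
```

With these numbers z is 2.0 and the p-value is about 0.0455, which is below α = 0.05, so the null hypothesis is rejected.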

type I error

reporting a difference between two populations when one does not actually exist (incorrectly rejecting a true null hypothesis); probability is α; false positive

type II error

incorrectly fail to reject the null hypothesis; probability is β; false negative

power

The probability of correctly rejecting a false null hypothesis; equal to 1 − β

confidence

probability of correctly failing to reject a true null hypothesis; equal to 1 − α

Results of Hypothesis Testing

Confidence intervals

essentially the reverse of hypothesis testing; determine a range of values from the sample mean and standard deviation
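One common way to build such a range is sample mean ± z* × (standard deviation / √n). The sketch below assumes that form with hypothetical numbers; using a z critical value instead of a t value for a sample this small is a simplifying assumption.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample summary
sample_mean = 106
sample_sd = 15
n = 25
confidence = 0.95

z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
margin = z_star * sample_sd / sqrt(n)

lower, upper = sample_mean - margin, sample_mean + margin
print(round(lower, 2), round(upper, 2))
```

The interval is roughly 100.12 to 111.88.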

Charts

present information in a visual format and are frequently used for categorical data

Pie / circle charts

used to represent relative amounts of entities and are especially popular in demographics; may be labeled with raw numerical values or with percent values

as the number of represented categories increases, the visual representation loses impact and becomes confusing

Bar charts

used for categorical data, which sort data points based on predetermined categories; may then be sorted by increasing or decreasing bar length; length of a bar is generally proportional to the value it represents

breaks should be avoided in the chart because of the potential to distort scale

Histograms

present numerical data rather than discrete categories; particularly useful for determining the mode of a data set because they are used to display the distribution of a data set

Box plots

used to show the range, median, quartiles and outliers for a set of data

box-and-whisker

a labeled box plot; box is bounded by Q1 and Q3; Q2 is the line in the middle of the box; ends of the whiskers correspond to maximum and minimum values of the data set

Maps

data can be illustrated geographically; relatively easy to comprehend and may show geographic clustering for some data

Linear graphs

show the relationships between two variables; curve may be linear, parabolic, exponential, or logarithmic; axes of a linear graph will be consistent in the sense that each unit will occupy the same amount of space

linear shape graph

Parabolic graph

Exponential graph

Logarithmic graph

Slope (m)

change in the y-direction divided by the change in the x-direction for any two points
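As a small worked example, the slope between two hypothetical points can be computed directly from this definition (the function name `slope` is illustrative):

```python
def slope(point_1, point_2):
    """Change in y divided by change in x between two (x, y) points."""
    (x1, y1), (x2, y2) = point_1, point_2
    return (y2 - y1) / (x2 - x1)

# Hypothetical points on the line y = 3x - 1
print(slope((1, 2), (4, 11)))
```

For a straight line, any two points give the same slope; here it is 3.0.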

Semilog graphs

specialized representation of a logarithmic data set; curved nature of the logarithmic data is made linear by a change in the axis ratio

axis ratio

spacing based on a ratio, usually 10, 100, 1000, and so on
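To see why a ratio-based axis straightens a curve, the sketch below takes hypothetical data that double at each step and compares the raw step sizes with the step sizes after taking log base 10. Exponential data are used here as an assumed example of a curve that a semilog plot makes linear.

```python
import math

xs = [0, 1, 2, 3, 4]
ys = [3 * 2 ** x for x in xs]          # hypothetical data: 3, 6, 12, 24, 48
log_ys = [math.log10(y) for y in ys]   # what a base-10 ratio axis effectively plots

raw_steps = [b - a for a, b in zip(ys, ys[1:])]          # grow at every step (curved)
log_steps = [b - a for a, b in zip(log_ys, log_ys[1:])]  # constant (straight line)

print(raw_steps)
print([round(step, 3) for step in log_steps])
```

Each step on the log scale equals log10(2), about 0.301, so the points fall on a straight line.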

Tables

more likely to contain disjointed information than either charts or graphs because they often contain categorical data or experimental results; significant organization is likely to be relevant; should be able to convert it to a rough graph or to a linear equation

Correlation

a connection (direct relationship, inverse relationship, or otherwise) between data

correlation coefficient

number between –1 and +1 that represents the strength and direction of the relationship

close to +1 = strong positive relationship

close to –1 = strong negative relationship

close to 0 = no apparent linear relationship
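The card does not name a specific coefficient. As an assumption, the sketch below computes the Pearson correlation coefficient by hand on made-up data (`pearson_r` is an illustrative name).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)

xs = [1, 2, 3, 4, 5]
print(round(pearson_r(xs, [2, 4, 5, 4, 5]), 3))   # noisy upward trend
print(pearson_r(xs, [2 * x + 1 for x in xs]))     # exact increasing line
print(pearson_r(xs, [-x for x in xs]))            # exact decreasing line
```

The noisy data give a coefficient of about 0.775, while the exact lines give +1 and –1.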

causation

manipulation of one variable is the reason for an effect in another