bar graphs
summarize categorical data; each bar represents one category, and its height shows the count for that category
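A minimal sketch of the counting behind a bar graph, assuming a hypothetical list of categorical responses: each category gets one bar whose length is its count. The bars are drawn as text with `#` so the example needs only the standard library.

```python
from collections import Counter

# Hypothetical categorical data (e.g., favourite study method).
responses = ["flashcards", "practice test", "flashcards", "rereading",
             "flashcards", "practice test"]

# One bar per category; the bar's length is that category's count.
counts = Counter(responses)

for category, count in counts.most_common():
    print(f"{category:<14} {'#' * count} ({count})")
```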
histograms
depict scale data
histograms x axis, y axis
x axis: the independent variable is broken into intervals on a continuous scale; bars sit next to each other
y axis: always starts at 0; shows the count of values within each interval
width of bar, histogram
represents the size of the interval
area of all bars in a histogram
represents the total frequency: the combined area equals 100% of the data, so it doesn't matter whether you plot absolute or relative frequency; the histogram keeps the same shape.
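A small sketch of why absolute and relative frequencies give the same histogram shape, using hypothetical ages and 10-unit intervals starting at 20 (both assumptions for illustration). Every count is divided by the same total, so the ratios between bars do not change.

```python
# Hypothetical scale data (e.g., ages); values are illustrative only.
data = [21, 23, 25, 27, 31, 32, 34, 35, 38, 41, 44, 52]

def histogram_counts(values, start, width, n_bins):
    """Count values in equal-width intervals [start + k*width, start + (k+1)*width).

    A value on a border goes into the higher interval (an assumed convention).
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

counts = histogram_counts(data, start=20, width=10, n_bins=4)  # absolute frequencies
total = sum(counts)
relative = [c / total for c in counts]                          # relative frequencies

# Every bar is divided by the same total, so the ratios between bars
# (and therefore the shape of the histogram) are unchanged.
for lower, c, r in zip(range(20, 60, 10), counts, relative):
    print(f"[{lower}, {lower + 10}): count={c}, relative={r:.2f}")
```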
unimodal
graph with 1 peak (mode)

bimodal
graph with 2 clear peaks (modes)

multimodal
graph with 3+ peaks (modes)
symmetric graph
mean, median, and mode are all equal, located at the peak
positively skewed graph
mode, median, mean (in that order, from left to right)
negatively skewed graph
mean, median, mode (in that order, from left to right)
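A quick check of the ordering rule on hypothetical data: a small positively skewed data set, and its mirror image (each value replaced by 20 minus the value) as a negatively skewed one.

```python
import statistics

def center_summary(values):
    """Return (mode, median, mean) for a list of numbers."""
    return (statistics.mode(values), statistics.median(values), statistics.mean(values))

# Hypothetical positively skewed data: a long tail of large values on the right.
positive = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12]
# Mirror image (20 - x) gives a negatively skewed data set.
negative = [20 - x for x in positive]

print(center_summary(positive))  # mode < median < mean
print(center_summary(negative))  # mean < median < mode
```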
frequency polygons
visualizes scale data; uses the same axes as a histogram (but the x axis marks the midpoint of each interval rather than the borders of the interval), with a line connecting the midpoints of the tops of the bars
- good for comparing multiple groups
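A sketch of how frequency polygon vertices can be computed from histogram counts, assuming hypothetical counts for two groups over the same intervals. Closing the line at zero one interval beyond each end is a common convention, included here as an option. Because both groups share the same x values, their lines can be drawn on one set of axes for comparison.

```python
def frequency_polygon_points(start, width, counts, close_ends=True):
    """Return (x, y) vertices of a frequency polygon.

    x is the midpoint of each interval and y is that interval's count.
    With close_ends=True, a zero-count point is added one interval width
    beyond each end so the line meets the x axis.
    """
    points = [(start + k * width + width / 2, c) for k, c in enumerate(counts)]
    if close_ends:
        points = ([(points[0][0] - width, 0)] + points
                  + [(points[-1][0] + width, 0)])
    return points

# Hypothetical counts for two groups over the intervals [20, 30), [30, 40), ...
group_a = frequency_polygon_points(20, 10, [4, 5, 2, 1])
group_b = frequency_polygon_points(20, 10, [1, 3, 5, 3])
print(group_a)
print(group_b)
```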

cumulative frequency polygon
- the running total of counts up to and including a given interval
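A minimal sketch of the running total, assuming hypothetical counts for four 10-unit intervals starting at 20. Each point is placed at the interval's upper border, a common convention, since by that border every value in the interval has been counted; the last point always equals the total number of observations.

```python
from itertools import accumulate

def cumulative_points(start, width, counts):
    """Return (upper border, running total) pairs for a cumulative frequency polygon."""
    totals = accumulate(counts)  # running sum: c0, c0 + c1, ...
    return [(start + (k + 1) * width, t) for k, t in enumerate(totals)]

# Hypothetical counts for the intervals [20, 30), [30, 40), [40, 50), [50, 60)
points = cumulative_points(20, 10, [4, 5, 2, 1])
print(points)
```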

one-way scatter plots
- categorical or scale data; uses a single axis to display the relative position of each observation in a group
- advantage: all obs are represented individually
- downside: can become hard to read if a lot of points are close together
- can be vertical or horizontal

boxplots
categorical or scale data
- anything beyond the adjacent values (the most extreme observations within 1.5 times the IQR of the quartiles) is considered an extreme value and plotted as an individual dot
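A sketch of how the parts of a boxplot can be computed, assuming the usual 1.5 × IQR fences and Python's `statistics.quantiles` with `method="inclusive"` for the quartiles. Other software may use a different quartile convention and give slightly different numbers. The data are hypothetical.

```python
import statistics

def boxplot_summary(values):
    """Compute the pieces of a boxplot using 1.5 x IQR fences.

    Quartiles use statistics.quantiles(method="inclusive"); this is one of
    several quartile conventions.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Adjacent values: the most extreme observations still inside the
        # fences; the whiskers end here.
        "lower_adjacent": min(inside),
        "upper_adjacent": max(inside),
        # Extreme values: plotted as individual dots.
        "extreme": [v for v in data if v < lower_fence or v > upper_fence],
    }

summary = boxplot_summary([4, 7, 8, 9, 10, 11, 12, 13, 30])  # hypothetical data
print(summary)
```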

one-way scatter plot or boxplot for single-axis data?
scatter: good for showing every observation, but can be overwhelming with many obs
box: nice summary of the IQR and range, easy to interpret, doesn't show individual obs, whiskers exclude extreme values, used more often
two-way scatter plots
depict the relationship between 2 scale variables
each point marks one observation, placed where its x value and y value meet
- multiple groups can be plotted on one graph to compare them (needs different colours and a legend)
line graphs
represent relationship between 2 scale variables
- each point on the x axis has a corresponding y value
- most commonly the scale of x axis represents time
what constitutes a potential outlier - mean and stdv
values more than 2 stdvs above or below the mean
what constitutes a potential outlier - median and iqr
values more than 1.5 times the IQR below the lower quartile (Q1) or above the upper quartile (Q3)
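A sketch of both outlier rules applied to the same hypothetical data. It assumes the sample standard deviation (`statistics.stdev`, n - 1 denominator) for the mean/stdv rule and the inclusive quartile method for the median/IQR rule; other choices can shift the cut-offs slightly.

```python
import statistics

def sd_outliers(values, k=2.0):
    """Values more than k standard deviations above or below the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    return [v for v in values if abs(v - mean) > k * sd]

def iqr_outliers(values, k=1.5):
    """Values more than k * IQR below Q1 or above Q3 (inclusive quartile method)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # hypothetical; 50 looks suspicious
print(sd_outliers(data))   # uses mean = 9.5 and SD of about 14.5
print(iqr_outliers(data))  # uses Q1 = 3.25, Q3 = 7.75, IQR = 4.5
```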
which outlier equation to use?
- median and IQR = more stable, because outliers can pull the mean (and inflate the stdv)
- mean and stdv = easier to interpret
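To see the stability difference, this sketch adds one extreme value to a hypothetical data set and compares mean/stdv with median/IQR before and after. Quartiles use the inclusive method, an assumed convention.

```python
import statistics

def spread_summary(values):
    """Mean/SD (sensitive to extremes) next to median/IQR (resistant)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }

base = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical data
with_outlier = base + [50]           # same data plus one extreme value

before, after = spread_summary(base), spread_summary(with_outlier)
for key in before:
    print(f"{key:>6}: {before[key]:6.2f} -> {after[key]:6.2f}")
```

Here the single value of 50 moves the mean from 5 to 9.5 and makes the SD more than five times larger, while the median and the IQR each move by only 0.5.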
data entry errors
typos can cause an outlier: double-check the data, correct the error, re-analyze
process error
an issue when collecting the data; if possible, redo the data collection; if not possible, remove the outlier
when to keep outliers
when they appear to be genuinely obtained and might give new insight into a phenomenon, or signal another group that should be accounted for
missing factor
consider whether another factor might affect the outcome and should be taken into account; decide whether to remove the outlier and explain why
random chance
the value is theoretically possible but highly unlikely; decide whether to remove it and explain why
random sampling
random selection is used to choose observations
each member of the population has an equal chance of being included in the sample
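A minimal sketch of simple random sampling with Python's `random` module, assuming a hypothetical sampling frame of member IDs 1 to 100. `random.sample` draws without replacement, so every member has the same chance of selection; the fixed seed only makes the draw reproducible.

```python
import random

# Hypothetical sampling frame: member IDs 1 to 100.
population = list(range(1, 101))

# Draw 10 members without replacement; each member is equally likely to be chosen.
rng = random.Random(2024)
sample = rng.sample(population, k=10)
print(sorted(sample))
```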
non random sampling
the items included in the study are selected for a reason (proximity, feasibility)
may not give you a representative sample of the population
sampling bias
when not every member of the relevant population has an equal chance of ending up in the sample
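A toy illustration of how a non-random sample can produce sampling bias. It assumes a hypothetical population of 100 scores listed in order of proximity to the researcher, where scores happen to rise with distance. A convenience sample of the 20 closest members gives the remaining 80 members zero chance of selection, so its mean is far from the population mean; a random sample of the same size avoids that systematic exclusion.

```python
import random
import statistics

# Hypothetical population: 100 scores, listed closest-first, and the scores
# happen to rise with distance from the researcher.
population = list(range(1, 101))

# Non-random (convenience) sample: the 20 closest members only.
convenience = population[:20]

# Random sample of the same size.
rng = random.Random(7)
random_sample = rng.sample(population, k=20)

print("population mean: ", statistics.mean(population))
print("convenience mean:", statistics.mean(convenience))   # far too low
print("random mean:     ", statistics.mean(random_sample))  # varies by draw
```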
response bias
when participants give answers they believe the researcher wants, or socially acceptable answers
survivorship bias
when individuals leave the study and the researcher continues to measure the remaining participants without considering those who left
recall bias
when participants don’t remember past events properly or omit details, especially when data are collected a long time after the event
representative samples
when your sample reflects the characteristics of the entire population
random sampling benefits and downfalls
can mitigate sampling bias and make representative samples more likely, but can be challenging to carry out
transparency
random sampling is sometimes not possible, so you have to be transparent about your methods so others can spot possible mistakes
central tendency for skewed data
median, because it still takes extreme values into account (by position) but is not greatly affected by how extreme they are