descriptive statistics
- arrange data (e.g. table, graph)
- characterize data (e.g. average, error)
characterizes the sample
summarizes, presents data
Inductive statistics
- generalization: → estimation
→ hypothesis testing
characterizes the statistical population (based on the sample)
interpret, analyze data
observation unit
the elements of a sample are known as sampling points or observation units
sample
set of data collected
population
set of similar items or events which is of interest for some question or experiment, can be a group of existing objects
correct sampling method
define the population of interest
- decide sample size
- how many observation units are necessary to characterize the population?
- representative sample size? (= representative of the scientific question!)
Sampling has to be unbiased (observation units can not be preferred)
Sampling is unbiased (objective) and representative if it contains random elements
sampling
Choosing part of the statistical population in order to characterize the population or
to predict characteristics of the population
sampling with replacement
one member of the population may be chosen more than once
sampling with no replacement
no member of the population can be chosen twice
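A minimal sketch of the two sampling modes using Python's standard `random` module. The five-member population and the fixed seed are assumptions made only for illustration.

```python
import random

rng = random.Random(0)  # fixed seed, only so the run is repeatable
population = ["A", "B", "C", "D", "E"]  # hypothetical population of five members

# With replacement: the same member may appear more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: every drawn member is distinct.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```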
sampling population
from finite population
from infinite population
sampling a finite population with replacement is the same as sampling from an infinite population
simple random sampling
- all members of the population have equal chance to be chosen
- sampling units are chosen independently
e.g. using random numbers
stratified sampling
- if there are sub-populations
- choosing from the sub-population should be random
(ex. sampling a group of students proportional to gender)
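One possible way to sketch proportional stratified sampling: the same fraction is drawn at random from each sub-population. The helper `stratified_sample`, the group sizes, and the 10% fraction are hypothetical and chosen only for illustration.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw the same fraction at random from every sub-population (stratum)."""
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)     # proportional allocation
        sample[name] = rng.sample(members, k)  # random choice within the stratum
    return sample

rng = random.Random(1)
# Hypothetical student group: 60 women and 40 men, identified by number.
strata = {
    "female": [f"F{i}" for i in range(60)],
    "male": [f"M{i}" for i in range(40)],
}
sample = stratified_sample(strata, fraction=0.1, rng=rng)
print({name: len(chosen) for name, chosen in sample.items()})
```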
systematic sampling
select some starting point and then select every kth element in the population
(ex. choosing every 4th person at the supermarket)
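Systematic sampling could be sketched as follows. The random starting point within the first k elements is a common convention assumed here, and the helper `systematic_sample` and the list of 20 shoppers are hypothetical.

```python
import random

def systematic_sample(population, k, rng):
    """Pick a random start among the first k items, then take every k-th item."""
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(2)
shoppers = list(range(1, 21))  # hypothetical: 20 shoppers numbered in arrival order
chosen = systematic_sample(shoppers, k=4, rng=rng)
print(chosen)
```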
qualitative variables
identify or categorize (ex. ID number); not all variables are numbers
quantitative variables
characterizes quantity (numbers), different ways to characterize them.
Nominal
example: country name, passport number
- text, identifying number
- has no numerical value, can not be added
- can not be ranked
Binomial
(special case of nominal)
- example: female/male, yes/no
- text or identifying number
- two possible values only
(participate in the Olympics (yes-no))
Ordinal
- example: grading, clothing sizes
- text or ranking number
- has value, can be ranked
(grades: E < D < C < B < A, T-shirt size: S < M < L < XL)
qualitative variables examples
Prevalence (frequency) can be given (e.g. 8 students are from Hungary, 5 from France)
Ordinal variables also have a cumulative frequency
(e.g. how many people wear size L? - frequency
how many people wear a size smaller than L? - cumulative frequency)
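A small sketch of frequency and cumulative frequency for an ordinal variable. The observed T-shirt sizes are invented for illustration. The running total below counts "this size or smaller" (an assumed convention), so the number of people wearing a size smaller than L is the cumulative value at M.

```python
from collections import Counter
from itertools import accumulate

sizes = ["S", "M", "L", "XL"]  # ordinal scale: S < M < L < XL
# Hypothetical observations of T-shirt sizes.
observed = ["M", "L", "S", "M", "XL", "L", "M", "S", "L", "M"]

counts = Counter(observed)
frequency = [counts[s] for s in sizes]
# Running total: number of people wearing this size or a smaller one.
cumulative = list(accumulate(frequency))

for size, f, c in zip(sizes, frequency, cumulative):
    print(f"{size:>2}  frequency={f}  cumulative={c}")
```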
Quantitative variables - interval scale
example: temperature in degree Celsius
- number in reference to an arbitrary zero
- possible to add, subtract
- ratios can not be calculated (can not divide or multiply)
Quantitative variables - ratio scale / absolute scale
- example: mass in kilogram, number of people
- number in reference to an absolute zero
- possible to add, subtract
- ratios can be calculated (possible to divide or multiply)
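To illustrate why ratios are only meaningful with an absolute zero, the sketch below compares the same two temperatures in degrees Celsius (interval scale) and in kelvin (ratio scale). The Kelvin conversion and the two temperature values are not from the notes above and are used only as an example.

```python
def celsius_to_kelvin(celsius):
    """Shift from the arbitrary Celsius zero to the absolute zero of the Kelvin scale."""
    return celsius + 273.15

cold_c, warm_c = 10.0, 20.0  # hypothetical temperatures in degrees Celsius

# Differences are meaningful on an interval scale.
difference = warm_c - cold_c

# The Celsius ratio suggests "twice as warm", but it depends on the arbitrary zero.
naive_ratio = warm_c / cold_c

# On the Kelvin (ratio) scale the ratio is physically meaningful.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(difference, naive_ratio, round(kelvin_ratio, 3))
```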
types of variables
qualitative (factors) : nominal, ordinal
quantitative (numeric or random variables): discrete, continuous, absolute, interval
presenting qualitative variables - one variable
factor (ordinal)
- frequency table
- bar graph
presenting qualitative variables - one variable
factor (nominal)
- frequency table
- bar graph
presenting quantitative variables
Bar graph
- simple (one variable)
- multiple (many variables)
- stacked (many variables)
Pie chart
Dots and/or lines
- two (or more) variables!
median
middle value of the ordered data (for an even number of values, the average of the two middle values)
mean
the central value of a discrete set of numbers: the sum of the values divided by the number of values
mode (modus)
most frequent value; not all samples have a mode
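The three measures of central tendency can be computed with Python's standard `statistics` module. The sample values are hypothetical. `multimode` returns every most frequent value, so a sample in which no value repeats returns all of its values, which is one way to see that it has no single mode.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample

median = statistics.median(data)   # middle of the ordered values
mean = statistics.mean(data)       # sum divided by count
modes = statistics.multimode(data) # every most frequent value

print(median, mean, modes)
```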
unimodal
a frequency distribution that has only one peak. Unimodality means that a single value in the distribution occurs more frequently than any other value.
range
x(max)-x(min)
interquartile range
Q3 - Q1
lower and upper quartile
median of the lower / upper half of the data
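A sketch of the range, the quartiles, and the interquartile range using the "median of each half" definition. The helper `quartiles` and the sample are hypothetical. For an odd number of values the middle value is left out of both halves here, which is an assumption: other conventions exist and give slightly different quartiles.

```python
import statistics

def quartiles(values):
    """Lower and upper quartile as medians of the lower and upper half.

    Assumption: for an odd number of values the middle value is left out of
    both halves; other conventions exist and give slightly different results.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return statistics.median(lower), statistics.median(upper)

data = [1, 3, 4, 6, 7, 8, 10, 12]  # hypothetical sample
q1, q3 = quartiles(data)
data_range = max(data) - min(data)  # x(max) - x(min)
iqr = q3 - q1                       # upper quartile minus lower quartile

print(data_range, q1, q3, iqr)
```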
Variance
sum of squared deviations from mean,
divided by the degrees of freedom
standard deviation
square root of variance
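The definitions of variance and standard deviation can be followed step by step. This sketch assumes a sample, so the degrees of freedom are n - 1. The data values are hypothetical, and `statistics.variance` is printed only as a cross-check.

```python
import math
import statistics

data = [4, 8, 6, 5, 7]  # hypothetical sample
n = len(data)
mean = sum(data) / n

squared_deviations = sum((x - mean) ** 2 for x in data)
variance = squared_deviations / (n - 1)  # degrees of freedom for a sample: n - 1
sd = math.sqrt(variance)

print(variance, sd)
print(statistics.variance(data))  # library result for comparison
```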
Normalization
make the maximum equal to a set value (usually =1)
Standardization
make the data set's mean = 0 and SD = 1
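Both rescalings can be sketched in a few lines. The data are hypothetical. Normalization is shown as division by the maximum (which assumes a positive maximum), and standardization uses the sample SD.

```python
import statistics

data = [2.0, 4.0, 6.0, 8.0]  # hypothetical data set

# Normalization: rescale so that the maximum equals 1.
maximum = max(data)
normalized = [x / maximum for x in data]

# Standardization: subtract the mean, divide by the standard deviation.
mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample SD (n - 1 in the denominator)
standardized = [(x - mean) / sd for x in data]

print(normalized)
print(standardized)
```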
Estimation statistics
a data analysis framework that uses a combination of effect sizes,
confidence intervals, precision planning, and meta analysis to plan experiments, analyze
data and interpret results
point estimate
population parameters are approximated (estimated) using sample statistics
Example
population standard deviation (SD) ← sample standard deviation (SD)
population mean ← sample mean
standard error formula
SE = SD/√n, where SD = standard deviation and n = sample size
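A short worked example of the standard error formula, using a hypothetical sample of nine values and the sample SD.

```python
import math
import statistics

data = [4, 8, 6, 5, 7, 9, 3, 6, 6]  # hypothetical sample, n = 9
n = len(data)
sd = statistics.stdev(data)  # sample standard deviation
se = sd / math.sqrt(n)       # SE = SD / sqrt(n)

print(sd, se)
```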
confidence interval
Gives the interval which contains the estimated parameter with given probability
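A rough sketch of such an interval for the population mean, built as mean ± z · SE. It assumes the normal approximation, which is reasonable for large samples. Small samples usually call for Student's t, which is not shown here. The helper `mean_confidence_interval` and the data are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(values, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Assumption: the normal approximation is used (reasonable for large
    samples); small samples usually call for Student's t instead.
    """
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * se, mean + z * se

data = [4, 8, 6, 5, 7, 9, 3, 6, 6]  # hypothetical sample
low, high = mean_confidence_interval(data)
print(low, high)
```

Raising `confidence` (for example to 0.99) widens the interval, and a larger sample narrows it through the smaller standard error.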
Density function
describes the relative likelihood for a variable to take on a given value
Distribution function
describes the probability that a variable takes a value smaller than or equal to a given value
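The difference between the two functions can be seen on the standard normal distribution, used here only as an assumed example. `NormalDist` from Python's standard library provides the density as `pdf` and the distribution function as `cdf`.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)  # assumed example distribution

# Density function: height of the curve at a given value (not a probability).
density_at_zero = standard_normal.pdf(0.0)

# Distribution function: probability of a value at or below the given value.
prob_below_zero = standard_normal.cdf(0.0)

# Probability of falling between two values: difference of the distribution function.
prob_in_band = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

print(density_at_zero, prob_below_zero, prob_in_band)
```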