Variable
A characteristic or attribute that can assume different values (e.g., age, height, BMI, hair/eye color)
Data
Values the variables can take/have been observed to assume
2 types of data (the singular of data is datum)
Qualitative And Quantitative
What is quantitative data/variable?
Can be measured or counted, and can be ordered/ranked (e.g., age, temperature, BMI)
2 types of Quantitative data
Discrete and Continuous
What is qualitative data/variables?
Values that can be placed into distinct categories based on some attribute (e.g., nationality, gender, religion, hair color). NO MEASUREMENT!
What are discrete variables?
Assume certain values that are distinct from each other, so the values can be counted (e.g., number of children in a family: you can have 0, 1, 2, 3, but not 2.7)
What are continuous variables?
Can assume an infinite range of values between any 2 specific values (e.g., height of a person: 173 cm, 172.5 cm, ...)
2 types of qualitative measurement scales
Nominal and Ordinal
What is nominal?
Data are categories that can't be ordered: they can be categorised but not ranked (e.g., favourite colour, favourite movie)
What is Ordinal?
Qualitative values/data that can be ordered but have no specific scale (e.g., unit grades, position in a race, disagree/agree questionnaire, military titles)
2 types of quantitative measurement scales?
Interval and Ratio
What is interval?
Data have an order and a scale but no meaningful zero point (e.g., temperature in °C, calendar year)
What is ratio?
Data that have a scale and a meaningful zero point (e.g., height, weight, survival, time to run a race)
What is population in statistics?
Consists of all subjects that are being studied
What is sample in statistics?
A group of subjects selected from a population from which the data is obtained.
What is critical in sampling?
The sample has to be representative of the population; otherwise it would bias our inferences about the population
What are the sampling methods?
Random, Systematic, Stratified, Cluster
What is random sampling?
Number each member of the population then select subjects using random numbers
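A minimal sketch of random sampling, assuming a made-up population of 20 numbered members and a sample size of 5. The fixed seed is only there so the draw is repeatable.

```python
import random

# Hypothetical population: 20 members numbered 1..20.
population = list(range(1, 21))

rng = random.Random(42)  # fixed seed so the draw can be repeated
# Draw 5 distinct members; every member is equally likely to be chosen.
sample = rng.sample(population, k=5)
print(sample)
```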
What is Systematic sampling?
Number each member of the population and then select every kth subject (e.g., with k = 5, every 5th person would come down to get their height measured).
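A minimal sketch of systematic sampling with k = 5 on a made-up population of 20 numbered members. Picking the starting position at random is an assumption added here; the card only says to take every kth subject.

```python
import random

# Hypothetical population: 20 members numbered 1..20.
population = list(range(1, 21))
k = 5  # take every 5th member

rng = random.Random(0)
start = rng.randrange(k)        # assumed: random starting index in 0..k-1
sample = population[start::k]   # every kth member from the start onwards
print(sample)
```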
What is stratified sampling?
Divide the population into groups according to a characteristic, then select subjects randomly from within each group
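A minimal sketch of stratified sampling. The two year-level groups and the choice of 2 subjects per group are made-up values for illustration.

```python
import random

# Hypothetical strata: the population already divided by year level.
strata = {
    "first year": ["A1", "A2", "A3", "A4"],
    "second year": ["B1", "B2", "B3", "B4"],
}

rng = random.Random(1)
# Randomly select 2 subjects from within each group.
sample = {group: rng.sample(members, k=2) for group, members in strata.items()}
print(sample)
```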
What is cluster sampling?
Select all subjects from intact groups (e.g., a tutorial class)
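A minimal sketch of cluster sampling with made-up tutorial classes. Choosing which class to use at random is an assumption added here; the card only says that every subject in the chosen intact group is taken.

```python
import random

# Hypothetical clusters: three intact tutorial classes.
clusters = {
    "tutorial 1": ["Ana", "Ben", "Cy"],
    "tutorial 2": ["Dee", "Eli", "Fay"],
    "tutorial 3": ["Gus", "Hal", "Ivy"],
}

rng = random.Random(2)
chosen = rng.sample(sorted(clusters), k=1)  # assumed: pick one class at random
# Take every subject in each chosen class.
sample = [member for name in chosen for member in clusters[name]]
print(chosen, sample)
```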
How do graphs display data?
Provide a visual summary of how data is distributed.
First few graphical displays?
Histograms, Stem and leaf plots and scatter plots
How do Histograms look/are arranged?
Divide the data into equal-sized intervals; the height of the vertical bar above each interval indicates the number of data values in that interval (i.e., the height of each bar is the frequency of its interval).
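A minimal sketch of how histogram counts come about, using made-up ages and an assumed interval width of 10. Each `#` stands for one data value, so the bar length is the frequency. Intervals with no data would not be printed by this simple version.

```python
# Hypothetical data: ages of 10 people.
ages = [3, 7, 12, 15, 18, 21, 22, 25, 34, 47]
width = 10  # assumed interval width: 0-9, 10-19, 20-29, ...

counts = {}
for age in ages:
    lower = (age // width) * width          # lower edge of the interval
    counts[lower] = counts.get(lower, 0) + 1

for lower in sorted(counts):
    print(f"{lower:>2}-{lower + width - 1:<2} {'#' * counts[lower]}")
```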
What information can be derived from a histogram?
3 things. 1) The middle: we can determine roughly where the middle is. 2) The spread: we can assess the spread or range of the data (we can see the lowest/highest values). 3) The distribution shape: we can determine the shape (e.g., most data is clumped towards the low ages with a single peak).
Distribution shapes
Positively skewed (peak towards the left, long tail to the right), symmetric, negatively skewed (peak towards the right, long tail to the left)
Downside of Histograms?
Don't give us fine resolution or the raw data
Stem and leaf plots downside?
It is not practical for large data sets
What are stem and leaf plots?
Plots with finer resolution than a histogram that display the same distribution characteristics. Each data point has a stem and a leaf: the stem is all digits but the last digit, and the leaf is the last digit.
How are stem and leaf plots set out?
Split each data point into a stem and a leaf: the stem is all digits but the last digit, and the leaf is the last digit. Write all of the stems in a vertical column from smallest to largest. Write each leaf in the row to the right of its stem, from smallest to largest.
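A minimal sketch of the layout steps above, assuming made-up non-negative whole-number data so that the last digit can be split off with `divmod`.

```python
from collections import defaultdict

# Hypothetical data set of two-digit values.
data = [47, 23, 31, 50, 25, 38, 42, 31, 49, 47]

leaves = defaultdict(list)
for value in sorted(data):            # sorting first keeps leaves in order
    stem, leaf = divmod(value, 10)    # stem = all but last digit, leaf = last digit
    leaves[stem].append(leaf)

# Stems in a vertical column, smallest to largest, with leaves to the right.
lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves[stem])}"
    for stem in sorted(leaves)
]
print("\n".join(lines))
```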
What are scatter plots?
Appropriate when you have more than 1 variable; used to illustrate the relationship between 2 variables
How/when can you predict with scatter plots?
A line drawn through the points can be used to predict new values
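A minimal sketch of predicting with a line through scatter-plot points. The card does not say how the line is obtained; a least-squares fit is assumed here, and the height/weight pairs are made up.

```python
# Hypothetical paired data: height (cm) and weight (kg).
heights = [150, 160, 170, 180, 190]
weights = [50, 58, 66, 74, 82]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n

# Assumed method: least-squares slope and intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights)) / sum(
    (x - mean_x) ** 2 for x in heights
)
intercept = mean_y - slope * mean_x

# Use the line to predict the weight for a new height of 175 cm.
predicted = slope * 175 + intercept
print(round(predicted, 1))
```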
What are other types of graphs?
Bar graphs, pie chart, time-series graph
What are bar graphs used for?
Similar to histogram, but for qualitative/categorical data
What are pie charts used for?
A circle with categories represented by wedges
What are time-series graphs?
Represent data collected over time.
What is descriptive statistics?
From a graph we get a sense of the characteristics of a distribution; we can also assign numbers to these characteristics. Numerical quantities are more precise and allow comparisons between data sets. These numbers are called descriptive statistics.
One of the key descriptive statistics is?
Measure of central tendency
3 common measures of middle-ness/central tendency
Mean (arithmetic average), median (middle data point), mode (most common data point)
Which middle value is the best, which one is the best to pick for our analysis?
1) It depends on the situation. 2) If we have quantitative data and the shape of the data is symmetrical, then the mean is the best choice. 3) If we have a significant skew or if the data is ordinal, then the median is the most appropriate. 4) If we only have nominal data, we can only calculate the mode.
How to calculate the mean? What is the symbol for the mean of a sample?
The sample mean is written x̄ (x with a bar on top). x̄ = Σx / n, where Σ (capital sigma) means "sum all of the x values" and n is the number of data values.
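A minimal sketch of the mean on a made-up sample: sum all the values, then divide by how many there are.

```python
# Hypothetical sample of 5 values.
data = [2, 4, 4, 6, 9]

total = sum(data)          # sum all of x
mean = total / len(data)   # divide by the number of values
print(mean)
```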
How to calculate the median?
Denoted MD; it is the midpoint of the sorted data set. 1) Arrange the data in order from lowest to highest. 2) Locate the midpoint.
Important note about calculating the median?
If the number of data values is odd, the median is the middle value. If the number of data values is even, the median is midway between the 2 middle values (calculate their average).
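A minimal sketch of the odd/even rule for the median. The function name `median` and the sample values are made up for illustration.

```python
def median(values):
    """Return the midpoint of the sorted values (assumes a non-empty list)."""
    ordered = sorted(values)               # arrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                         # odd count: the middle value
        return ordered[mid]
    # even count: midway between the 2 middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median([7, 1, 5])
even_result = median([7, 1, 5, 3])
print(odd_result, even_result)
```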
Determining the mode?
The most commonly occurring value in a data set. A data set can be unimodal (1 mode), bimodal (2 modes), multimodal (more than 2 modes), or have no mode (all values occur once).
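A minimal sketch of finding the mode(s) by counting how often each value occurs. The function name `modes` and the sample values are made up; an empty list is returned when every value occurs once.

```python
from collections import Counter


def modes(values):
    """Return the most common value(s), or [] if there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())     # highest frequency in the data
    if top == 1:                   # all values occur once: no mode
        return []
    return sorted(value for value, count in counts.items() if count == top)


unimodal = modes([1, 2, 2, 3])
bimodal = modes([1, 1, 2, 2, 3])
no_mode = modes([1, 2, 3])
print(unimodal, bimodal, no_mode)
```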