quandingle methods ;3
Descriptive Statistics
Describes the characteristics and properties of anything you can gather data from (persons, places, companies). Based on facts or meaningful information. Does not draw conclusions about larger sets of data.
Inferential Statistics
Draws conclusions about a population based on data gathered from samples, using descriptive statistics techniques. Concerned with methods of analyzing smaller groups of data that lead to predictions about larger sets of data. Generalizes about the whole by analyzing a part of it.
Population
Totality of all observations from which the dataset is obtained
parameter
A numerical measure describing the population
Sample
A portion of a population, chosen to include data diverse enough to represent the entire population.
statistic
A numerical measure describing the sample
Variables
Characteristics being studied in statistics that can take different values
Qualitative Variables
nonnumeric data like gender, civil status, and location.
Quantitative Variables
numerical data like force, weight, voltage.
Continuous Data
Measurable but not countable quantities, obtained using measuring tools. Can take infinitely many values within a range.
Discrete Data
Countable quantities with finite values, usually whole numbers. Obtained by counting rather than by measuring.
Independent Variable
A variable that can be altered to see an outcome
Dependent Variable
A variable that is observed after altering an independent variable
Controlled Variable
A variable kept constant so that it does not influence the outcome
Extraneous Variable
An unexpected or unplanned variable, not part of the study design, that can still affect the outcome (ideally only minimally)
Nominal
Assigns names or labels (sometimes numeric codes) to categories, with no order implied. Data are analyzed by counting how many observations fall into each category.
Ordinal Data
Assigns ranks to levels of data. The difference between consecutive ranks is not constant.
Interval
Numeric data with a constant difference between values, so addition and subtraction are applicable. Zero is arbitrary and does not mean “nothing”.
Ratio
Numeric data with constant differences and a true zero, so all arithmetic operations are allowed. Zero means “nothing”.
Sampling
process of taking portions from the population.
Probability Sampling
Every element of the population has a known, nonzero chance of being selected, which reduces selection bias. All possibilities are listed and each is given a chance to be part of the sample.
Non-Probability Sampling
Some elements of the population have no chance of being selected, which increases the risk of bias. Not all of the population is considered for the sample.
Simple Random Sampling
Every element of the population has an equal chance of being selected. Elements are numbered and the sample is drawn using a randomizing method.
Systematic Sampling
The sample is taken by dividing the ordered population into equal groups and taking the kth element in each group, that is, every kth element of the list.
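The two probability methods above can be sketched with Python's standard `random` module. This is a minimal illustration, not a prescribed procedure: the population of 100 numbered elements, the sample size of 10, and the fixed seed are all assumptions made for the example.

```python
import random

# Hypothetical population: 100 elements numbered 1..100 (illustrative only).
population = list(range(1, 101))
rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simple random sampling: every element has the same chance of selection.
simple_sample = rng.sample(population, 10)

# Systematic sampling: random start in the first group, then every kth element.
n = 10                      # desired sample size (assumed)
k = len(population) // n    # sampling interval
start = rng.randrange(k)    # random index within the first group
systematic_sample = population[start::k][:n]
```

The systematic sample always has a constant gap of `k` between consecutive elements, while the simple random sample has no such pattern.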
Stratified Sampling
The population is grouped into strata, and random sampling is performed within each stratum in proportion to that stratum's share of the population.
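Proportional allocation can be sketched as follows. The strata names, their sizes, and the total sample size of 20 are hypothetical values chosen so the shares come out as whole numbers; with other sizes the rounded shares may not add up exactly to the target and would need adjusting.

```python
import random

# Hypothetical strata: label -> member IDs (illustrative data only).
strata = {
    "first_year": list(range(0, 50)),
    "second_year": list(range(50, 80)),
    "third_year": list(range(80, 100)),
}
total = sum(len(members) for members in strata.values())
sample_size = 20            # assumed overall sample size
rng = random.Random(0)      # fixed seed for repeatability

sample = {}
for name, members in strata.items():
    # Allocate in proportion to the stratum's share of the population.
    share = round(sample_size * len(members) / total)
    # Random sampling is then done inside the stratum.
    sample[name] = rng.sample(members, share)
```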
strata
Subgroups of the population whose members share generally similar characteristics (singular: stratum).
Cluster Sampling
Done by identifying groups known as clusters and randomly selecting whole clusters. Clusters must be similar to each other with respect to the parameters being examined.
cluster
a subpopulation with elements as diverse as possible
Convenience Sampling
based primarily on availability of respondents.
Quota Sampling
A desired number of samples is set. Respondents are taken as they volunteer to become part of the experiment until that number is reached.
Purposive Sampling
The sample is obtained based on certain conditions.
Textual Form
Presenting data by describing it in sentences and paragraphs
Tabular Form
presenting data with tables arranged by row and column for various parameters
Graphical Form
Presenting data with graphs, charts, or pictures
Ungrouped Data
Data points are treated individually. These are raw, individual data points that are not organized into groups or classes.
Grouped Data
Data points are grouped according to categories: raw data arranged into classes or intervals.
Frequency Distribution Table
Shows each value or range of values together with how often it appears in a dataset. Used for larger datasets to ease interpretation and to prepare graphs.
Reason to use Frequency Distribution Table
This procedure is used to lessen work by treating the data by group.
Class Limits
Smallest and largest values that fall into a class interval, written with the same number of significant figures as the given data.
range
r = highest value - lowest value
class amount (number of classes)
k = 1 + 3.322 log₁₀(n), where n is the number of observations (Sturges' rule)
class width
cw = r/k
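The three formulas above (range, number of classes, class width) can be chained as in this short sketch. The 20 data values are hypothetical, the logarithm is taken as base 10, and rounding both k and the class width up to whole numbers is an assumed convention; some courses round to the nearest value instead.

```python
import math

# Hypothetical dataset of 20 scores (illustrative only).
data = [52, 55, 58, 60, 61, 63, 65, 66, 68, 70,
        71, 73, 75, 77, 80, 82, 85, 88, 91, 95]

n = len(data)
r = max(data) - min(data)                  # range: highest value - lowest value
k = math.ceil(1 + 3.322 * math.log10(n))   # number of classes, rounded up (assumption)
cw = math.ceil(r / k)                      # class width, rounded up (assumption)

print(r, k, cw)  # 43 6 8
```

Rounding the width up guarantees that k classes of width cw cover the whole range.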
Class Boundaries (true class limits)
precise expression of class interval, usually one significant digit more than the class limit.
Class Boundary formula
(Upper limit of Class A + Lower limit of Class B) / 2, where Class B is the class immediately after Class A
Class Mark
midpoint of a class interval
Class Mark formula
cm = (Lower Class Limit + Upper Class Limit) / 2
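Both midpoint formulas need the sum in parentheses before dividing by 2. A minimal sketch, assuming three hypothetical classes of integer data:

```python
# Hypothetical class limits (lower, upper) for integer data.
class_limits = [(52, 59), (60, 67), (68, 75)]

# Class mark: midpoint of each class's own lower and upper limits.
class_marks = [(lower + upper) / 2 for lower, upper in class_limits]

# Boundary shared by consecutive classes: midpoint of one class's upper
# limit and the next class's lower limit.
shared_boundaries = [
    (class_limits[i][1] + class_limits[i + 1][0]) / 2
    for i in range(len(class_limits) - 1)
]

print(class_marks)        # [55.5, 63.5, 71.5]
print(shared_boundaries)  # [59.5, 67.5]
```

Note that the boundaries carry one more significant digit than the limits, as the class boundary definition describes.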
Cumulative Frequency Distribution
Derived from the frequency distribution by successively adding class frequencies.
Relative Frequency
Proportion of a class's frequency with respect to the total frequency
rf formula
rf = f/∑f
Relative Frequency Distribution (%rf)
Each class's frequency expressed as a percentage of the total frequency
%rf formula
%rf = (f/∑f) × 100
Less than cumulative frequency (<cf)
Distribution giving the number of values lower than the upper class boundary they correspond to
obtaining “<cf”
adding the frequencies from top to bottom
Greater than cumulative frequency (>cf)
Distribution giving the number of values above the lower class boundary they correspond to
obtaining “>cf”
adding the frequencies from bottom to top
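The relative and cumulative frequency columns can be built from a single list of class frequencies. A minimal sketch, assuming four hypothetical classes listed from the top of the table to the bottom:

```python
from itertools import accumulate

# Hypothetical class frequencies, listed top to bottom.
frequencies = [4, 6, 7, 3]
total = sum(frequencies)

rf = [f / total for f in frequencies]            # rf = f / sum(f)
pct_rf = [100 * f / total for f in frequencies]  # %rf = (f / sum(f)) x 100

# <cf: running total from top to bottom.
less_than_cf = list(accumulate(frequencies))

# >cf: running total from bottom to top, shown in the original row order.
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

print(less_than_cf)     # [4, 10, 17, 20]
print(greater_than_cf)  # [20, 16, 10, 3]
```

The last entry of `<cf` and the first entry of `>cf` both equal the total frequency, which is a quick check on the arithmetic.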
Frequency Polygon
points are plotted using the midpoint and frequency
Histogram
Adjacent bars are drawn over each class, using the class boundaries as the bar edges and the frequency as the bar height
Ogive
Points are plotted using the class boundary and cumulative frequency: upper class boundaries for the less-than ogive (<ogive), lower class boundaries for the greater-than ogive (>ogive).
Pareto Chart
Graph representing a frequency distribution for categorical data; frequencies are displayed by the heights of bars, arranged from highest to lowest.
Bar Chart
Graph similar to a histogram. The height of each rectangle represents the frequency of a category; applicable for categorical (nominal-level) data.
Pie Chart (Circle Graph)
circle divided into portions representing relative frequencies (or percentage) of the data.
Scatter Plot
Used to examine possible relationships between two numerical variables. One variable is plotted on the x-axis and the other on the y-axis.
Time Series Graph
Represents data occurring over a specific period under observation. Shows the trend or pattern of increase or decrease over the period.
Pictograph
Appropriate pictures are arranged in rows (sometimes columns) to represent quantities for comparison.
Measure of Central Tendency
Statistical values that describe the center or typical value of a dataset and help summarize an entire set of data with a single representative number. The main measures are the mean, median, and mode. Adding the highest and lowest values and dividing by 2 gives only the midrange, a rough estimate of the center.
The Mean
Most commonly used measure of central tendency for describing ratio data
Arithmetic Mean
The only measure of central tendency for which the sum of the deviations of each value from it is zero. Affected by abnormally large or small values. Calculated as the sum of all values divided by the number of values.
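The two properties named in this card (deviations summing to zero, sensitivity to extreme values) are easy to check numerically. The values below are hypothetical and chosen only to keep the arithmetic simple.

```python
# Hypothetical values (illustrative only).
values = [4, 8, 6, 10, 12]

mean = sum(values) / len(values)          # sum of all values / number of values
deviations = [x - mean for x in values]   # deviation of each value from the mean

# One abnormally large value pulls the mean upward.
with_outlier = values + [100]
mean_with_outlier = sum(with_outlier) / len(with_outlier)

print(mean)             # 8.0
print(sum(deviations))  # 0.0
```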
Geometric Mean
Used to average factors that are multiplied together, such as growth rates or ratios
Geometric Mean formula
GM = sqrt(ab) for two values a and b; in general, GM = (x₁ · x₂ · … · xₙ)^(1/n)
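A minimal sketch of both cases: the square-root form for two values and the nth-root form for several. The growth factors are hypothetical, and `math.prod` assumes Python 3.8 or later.

```python
import math

# Two-value case: GM = sqrt(a * b).
gm_two = math.sqrt(4 * 9)

# General case: nth root of the product of n values.
growth_factors = [1.10, 1.20, 0.90]   # hypothetical yearly growth factors
gm = math.prod(growth_factors) ** (1 / len(growth_factors))

print(gm_two)  # 6.0
```

Applying `gm` as a constant factor for three periods gives the same overall growth as applying the three original factors in turn, which is why the geometric mean suits multiplicative data.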
Trimmed Mean
The arithmetic mean obtained after removing the upper and lower values of the distribution. Calculated by trimming a certain percent of both the largest and the smallest values.
Trimmed Mean formula
TM = ∑%x/%n, where %x are the values that remain after trimming and %n is their count
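One way to read the trimmed mean procedure is sketched below. The helper name `trimmed_mean`, the choice to truncate the number of trimmed values to a whole number, and the scores are all assumptions for illustration.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)   # values removed per tail (truncated)
    kept = ordered[cut:len(ordered) - cut]
    if not kept:
        raise ValueError("proportion trims away every value")
    return sum(kept) / len(kept)

# Hypothetical scores with one very low and one very high value.
scores = [2, 50, 52, 54, 55, 56, 57, 58, 60, 200]

plain = sum(scores) / len(scores)
trimmed = trimmed_mean(scores, 0.10)   # drop 10% from each end

print(plain)    # 64.4
print(trimmed)  # 55.25
```

Dropping the lowest and highest score moves the result much closer to where most of the data lie.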
The Median
Midpoint of the ordered values, with as many values above it as below it. Unaffected by extremely large or small values. Can be computed for ratio-level, interval-level, and ordinal-level data, and for open-ended frequency distributions as long as the median does not fall in an open-ended class.
The Mode
Value of the observation that appears most frequently. The most unreliable of the measures of central tendency, but the only one that can be used for nominal data.
Bi-modal
When distribution has 2 modes
Tri-modal
If distribution has three modes
Multi-modal
The distribution has more than 3 modes
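The median and the modes of a small dataset can be checked with Python's standard `statistics` module. The data are hypothetical; `statistics.multimode` assumes Python 3.8 or later.

```python
import statistics

# Hypothetical ordered data (illustrative only).
data = [3, 5, 5, 7, 9, 9, 11]

median = statistics.median(data)    # middle value of the ordered data
modes = statistics.multimode(data)  # every value tied for the highest frequency

print(median)  # 7
print(modes)   # [5, 9]
```

Two values are tied for most frequent here, so this distribution is bi-modal.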
Measures of Position
Describe the relative standing of a value in a dataset, indicating where a particular data point lies in relation to the rest of the data. The main measures are the quartile (Q), decile (D), percentile (P), and standard score (z).
Quantiles (or Fractiles)
Points taken at regular intervals from the cumulative distribution, dividing the ordered data into equal-sized groups
Quartiles
Division of the ordered dataset into 4 equal groups
Deciles
Division of the ordered dataset into 10 equal groups
Percentiles
Division of the ordered dataset into 100 equal groups
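Quartiles, deciles, and percentiles are the same idea with a different number of groups, which `statistics.quantiles` makes explicit through its `n` argument. The dataset of the integers 1 to 99 is hypothetical. The function's default "exclusive" interpolation method is used here; other textbooks and tools use slightly different conventions and may give slightly different cut points.

```python
import statistics

# Hypothetical dataset: the integers 1..99.
data = list(range(1, 100))

quartiles = statistics.quantiles(data, n=4)      # 3 cut points -> 4 groups
deciles = statistics.quantiles(data, n=10)       # 9 cut points -> 10 groups
percentiles = statistics.quantiles(data, n=100)  # 99 cut points -> 100 groups

print(quartiles)        # [25.0, 50.0, 75.0]
print(deciles[4])       # 50.0 (D5)
print(percentiles[89])  # 90.0 (P90)
```

Q2, D5, and P50 all coincide with the median.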
Measure of Variation (or Dispersion)
Describes how spread out or dispersed the values in a dataset are, that is, how far the data points lie from the center. The measures are the range, standard deviation, variance, quartile deviation, interquartile range, and coefficient of variation.
Range
Difference between the largest and smallest numbers in the set
Variance
Average of the squared deviations from the mean
Standard Deviation (SD)
given as positive square root of population/sample variance
Coefficient of Variation (CV)
Ratio of the standard deviation to the mean, expressed as a percentage
Mean Absolute Deviation (MAD)
Average of the absolute (unsigned) deviations from the mean
Quartile Deviation (QD)
Absolute measure of dispersion equal to half the interquartile range: QD = (Q3 − Q1) / 2
Interquartile Range (IQR)
Spread of the middle 50% of the data
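The measures of variation above can be computed side by side on one small dataset. The values are hypothetical. Two assumptions are made: the population forms of the variance and standard deviation are used for the CV, and the quartiles come from `statistics.quantiles` with its default "exclusive" method, so other quartile conventions may give a different IQR.

```python
import statistics

# Hypothetical dataset (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)

data_range = max(data) - min(data)          # largest - smallest
pop_variance = statistics.pvariance(data)   # average squared deviation (divide by N)
sample_variance = statistics.variance(data) # sample form (divide by n - 1)
pop_sd = statistics.pstdev(data)            # positive square root of pop_variance
cv = pop_sd / mean * 100                    # SD as a percentage of the mean
mad = sum(abs(x - mean) for x in data) / len(data)  # mean absolute deviation

q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1    # spread of the middle 50%
qd = iqr / 2     # quartile deviation

print(data_range, pop_variance, pop_sd)  # 7 4.0 2.0
print(mad, iqr, qd)                      # 1.5 2.5 1.25
```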
Measure of Shape
Describes the distribution pattern of the data, specifically how the values are spread and whether the distribution is symmetric, skewed, or has peaks of a certain sharpness.
Skewness
Degree of asymmetry of a distribution about its mean; a measure of how far the data depart from symmetry. Can be interpreted as symmetric, positively skewed, or negatively skewed.
Kurtosis
Degree of peakedness exhibited by the distribution, computed from the fourth moment about the mean.
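Both shape measures can be sketched from central moments. The cards do not give formulas, so the moment-based population definitions below (third and fourth central moments scaled by powers of the second) are an assumption; sample-adjusted versions also exist. The helper names and datasets are hypothetical.

```python
def central_moment(values, order):
    """Average of (x - mean) ** order over all values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)

def skewness(values):
    # Third central moment divided by the variance to the power 3/2.
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    # Fourth central moment divided by the squared variance.
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric data
right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical data with a long right tail

print(skewness(symmetric))  # 0.0
```

A skewness of zero indicates symmetry, a positive value indicates a longer right tail, and a negative value indicates a longer left tail. Under this definition a normal distribution has a kurtosis of 3.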