categorical data
labels or names used to identify an attribute of each element. (Nominal or ordinal, numeric or nonnumeric)
quantitative data
numeric values that indicate how much or how many of something. (interval or ratio scale)
Frequency distribution calculations:
Frequency
Relative frequency
Percent frequency
bar chart
A graphical device for depicting categorical data that have been summarized in a frequency, relative frequency, or percent frequency distribution. (Column/Bar Chart in Excel)
pie chart
A graphical device for presenting data summaries based on subdivision of a circle into sectors that correspond to the relative frequency for each class.
Summarizing Categorical Data
Summary Table: the objective is to summarize the frequency of each category. Excel function used: COUNTIF
Frequency Distribution ~ Formula: =COUNTIF(range, category)
Relative Frequency Distribution ~ Formula: frequency / n
Percent Frequency Distribution ~ Formula: relative frequency * 100
Bar Chart
Pie Chart
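A minimal Python sketch of the three categorical calculations (frequency, relative frequency, percent frequency). The sample list and the helper name frequency_table are assumptions made for illustration; Counter stands in for Excel's COUNTIF here.

```python
from collections import Counter

# Hypothetical sample of soft drink purchases (illustrative data only).
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Coke",
             "Pepsi", "Coke", "Sprite", "Pepsi", "Coke"]


def frequency_table(values):
    """Return {category: (frequency, relative frequency, percent frequency)}."""
    n = len(values)
    counts = Counter(values)  # one count per category, like COUNTIF(range, category)
    return {cat: (f, f / n, 100 * f / n) for cat, f in counts.items()}


table = frequency_table(purchases)
for cat, (freq, rel, pct) in table.items():
    print(f"{cat:<8}{freq:>4}{rel:>8.2f}{pct:>8.1f}%")
```

The relative frequencies should sum to 1.00 and the percent frequencies to 100, which is a quick check on any table built this way.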
Quantitative data frequency distribution steps
Number of classes
Width of classes
Class limits
class midpoint
value halfway between the lower and upper class limits
Relative frequency formula for a quantitative variable
frequency/n
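A rough Python sketch of the steps for a quantitative frequency distribution (number of classes, class width, class limits), plus class midpoints and relative frequencies. The data, the starting lower limit of 10, and the rule "approximate width = (largest - smallest) / number of classes, rounded up" are assumptions chosen for illustration, not something stated on the cards.

```python
import math

# Hypothetical integer data, e.g. days needed to finish an audit.
data = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
        22, 23, 22, 21, 33, 28, 14, 18, 16, 13]


def class_distribution(values, num_classes, start):
    """Return rows of (lower limit, upper limit, midpoint, frequency, relative frequency)."""
    n = len(values)
    # Assumed rule of thumb for the class width, rounded up to a whole number.
    width = math.ceil((max(values) - min(values)) / num_classes)
    rows = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width - 1  # integer data, so classes do not overlap
        freq = sum(lower <= v <= upper for v in values)
        rows.append((lower, upper, (lower + upper) / 2, freq, freq / n))
    return rows


rows = class_distribution(data, num_classes=5, start=10)
for lower, upper, mid, freq, rel in rows:
    print(f"{lower}-{upper}  midpoint={mid}  f={freq}  rel={rel:.2f}")
```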
dot plot
graphical device that summarizes data by the number of dots above each data value on the horizontal axis
histogram
A graphical display of a frequency distribution, relative frequency distribution, or percent frequency distribution of quantitative data constructed by placing the class intervals on the horizontal axis and the frequencies, relative frequencies, or percent frequencies on the vertical axis.
histogram skewness types
symmetric, moderately skewed left, moderately skewed right, highly skewed right
symmetric
left tail is the mirror image of the right tail (heights of people)
moderately skewed left
longer tail to the left (exam scores)
moderately skewed right
longer tail to the right (housing values)
highly skewed right
very long tail to the right (executive salaries)
cumulative frequency distribution
A tabular summary of quantitative data showing the number of data values that are less than or equal to the upper class limit of each class.
Last entry: total number of observations
cumulative relative frequency distribution
A tabular summary of quantitative data showing the fraction or proportion of data values that are less than or equal to the upper class limit of each class.
Last entry: 1.00
cumulative percent frequency distribution
A tabular summary of quantitative data showing the percentage of data values that are less than or equal to the upper class limit of each class.
Last entry: 100% or 100
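The three cumulative distributions can be sketched in Python as running totals of the class frequencies. The class limits and frequencies below are hypothetical values used only to show the arithmetic.

```python
from itertools import accumulate

# Hypothetical classes: upper class limits and their frequencies.
upper_limits = [14, 19, 24, 29, 34]
frequencies = [4, 8, 5, 2, 1]

n = sum(frequencies)
cum_freq = list(accumulate(frequencies))   # count of values <= each upper limit
cum_rel = [c / n for c in cum_freq]        # proportion of values <= each upper limit
cum_pct = [100 * c / n for c in cum_freq]  # percentage of values <= each upper limit

for limit, cf, cr, cp in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {limit}: {cf:>3} {cr:>6.2f} {cp:>6.1f}%")
```

The last entries come out as n, 1.00, and 100, matching the "Last entry" notes on the cards.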
stem-and-leaf display
A graphical display used to show simultaneously the rank order and shape of a distribution of data.
summarizing quantitative data(1)
Frequency Distribution - has 3 steps. Use Excel's Pivot Tables to construct Frequency Distribution.
summarizing quantitative data(2)
Relative and Percent Frequency Distributions
summarizing quantitative data(3)
Dot Plot - Horizontal axis shows the range of data values. Then each data value is represented by a dot placed above the axis.
summarizing quantitative data(4)
Histogram - Variable of interest is placed on the horizontal axis. A rectangle is drawn above each class interval with its height corresponding to the interval's frequency, relative frequency, or percent frequency. No natural separation between rectangles.
summarizing quantitative data(5)
Cumulative Distributions. Two Types
summarizing quantitative data(6)
Stem-and-Leaf Display - shows both the rank order and shape of the distribution of the data. Similar to a histogram on its side, BUT has the advantage of showing the actual data values. Each line (row) in the display is referred to as a stem, and each digit to the right of the stem is a leaf that represents one data value.
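A small Python sketch of a stem-and-leaf display, assuming non-negative integers where the tens digits form the stem and the units digit is the leaf. The scores and the helper name stem_and_leaf are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digits) to its leaves (units digits) in rank order."""
    display = defaultdict(list)
    for v in sorted(values):       # sorting gives the rank order within each stem
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return dict(display)


# Hypothetical exam scores (illustrative data only).
scores = [68, 72, 75, 81, 84, 84, 92, 97, 73, 69]
display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
# 6 | 8 9
# 7 | 2 3 5
# 8 | 1 4 4
# 9 | 2 7
```

Unlike a histogram, every original value can be read back from the display (for example stem 8 with leaf 4 is 84).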
crosstabulation
A tabular summary of data for two variables. The classes for one variable are represented by the rows; the classes for the other variable are represented by the columns.
crosstabulation(2)
row or column percentages
Can provide additional insight about the relationship between the two variables. Simpson's Paradox - conclusions drawn from aggregated data can be reversed when the same data are examined in unaggregated form.
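A minimal Python sketch of a crosstabulation with row percentages. The two variables (a quality rating and a meal price band), the records, and the helper names are hypothetical and only illustrate the counting.

```python
from collections import Counter

# Hypothetical (row variable, column variable) pairs: quality rating and meal price.
records = [
    ("Good", "$10-19"), ("Good", "$20-29"), ("Very Good", "$20-29"),
    ("Very Good", "$10-19"), ("Good", "$10-19"), ("Excellent", "$20-29"),
    ("Very Good", "$20-29"), ("Excellent", "$20-29"),
]


def crosstab(pairs):
    """Count how many observations fall in each (row class, column class) cell."""
    return Counter(pairs)


def row_percentages(cells):
    """Express each cell as a percentage of its row total."""
    row_totals = Counter()
    for (row, _col), count in cells.items():
        row_totals[row] += count
    return {(row, col): 100 * count / row_totals[row]
            for (row, col), count in cells.items()}


cells = crosstab(records)
pct = row_percentages(cells)
for (row, col), value in sorted(pct.items()):
    print(f"{row:<10}{col:<8}{value:6.1f}%")
```

Column percentages follow the same pattern with totals taken per column instead of per row.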
scatter diagram
A graphical display of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
trendline
A line that provides an approximation of the relationship between two variables.
Scatter diagrams and trendlines: useful in exploring the relationship between 2 quantitative variables.
Scatter diagram relationships
Positive (both variables increase together)
Negative (one variable increases as the other decreases)
No apparent relationship (flat trendline)
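One common way to fit a trendline is a least-squares line; that choice is an assumption here, since the cards only say the line approximates the relationship. The sketch below uses hypothetical data and a hypothetical helper name, and reads the type of relationship from the sign of the slope.

```python
def trendline(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: number of commercials (x) and sales (y).
commercials = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = trendline(commercials, sales)
if slope > 0:
    relationship = "positive"
elif slope < 0:
    relationship = "negative"
else:
    relationship = "no apparent"
print(f"y = {slope:.1f}x + {intercept:.1f} ({relationship} relationship)")
```

With real data a slope close to zero, not only exactly zero, would usually be read as no apparent relationship.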
side by side bar chart
A graphical display for depicting multiple bar charts on the same display.
stacked bar chart
A bar chart in which each bar is broken into rectangular segments of a different color showing the relative frequency of each class in a manner similar to a pie chart.
data visualization
describes the use of graphical displays to summarize and present information about a data set. The goal is to communicate as effectively and clearly as possible the key information about the data.
data dashboard
Widely used data visualization tool that provides timely, summary information in an easy to read and interpret format. Organizes and presents KPIs (Key Performance Indicators) used to monitor an organization or process.
Statistics
the art and science of collecting, analyzing, presenting and interpreting data
data
the facts and figures collected, analyzed, and summarized for presentation and interpretation
data set
all the data collected in a particular study
elements
the entities on which data are collected (ROW)
entity
an object, individual, or unit about which data are collected. (ROW)
variable
a characteristic of interest for the elements (COLUMN)
observation
the set of measurements obtained for a particular element (ROW)
scale
determines the amount of information contained within the data. Indicates the type of appropriate statistical analysis that can be performed.
Scales of measurement categorical
nominal, ordinal
nominal
scale of measurement for a variable when the data are labels or names used to identify an attribute of an element (nonnumeric or numeric).
WTO Status
ordinal
scale of measurement for a variable if the data exhibit the properties of nominal data and the order or rank of the data is meaningful (nonnumeric or numeric).
Class rank of a student
Scale of measurement quantitative
interval, ratio
Interval
scale of measurement for a variable if the data demonstrate the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measurement (always numeric).
SAT SCORES
ratio
scale of measurement for a variable if the data demonstrate all the properties of interval data and the ratio of two values is meaningful (always numeric).
Melissa's college record shows 36 credit hours earned, while Kevin's record shows 72 credit hours earned. Kevin has twice as many credit hours as Melissa.
Data classifications
Categorical, quantitative, cross-sectional, time series data
Cross-sectional data
data collected at the same or approximately the same point in time
Per capita GDP
time series data
data collected over several time periods
U.S. average price per gallon of conventional regular gas between 2012 and 2018
Data variable types
categorical, quantitative
categorical variable
a variable with categorical data: labels or names used to identify an attribute of each element; grouped by category (QUALITATIVE)
quantitative variable
a variable with quantitative data: how many or how much; always numeric
quantitative data
discrete, continuous
Discrete
if measuring how many (exact figures you can count)
continuous
if measuring how much (measurable values representing a range of information)
data sources
existing data sources:
Internal company records
Business database services
Government agencies
Industry associations
Special-interest organizations
Internet: data we create
Statistical Studies
observational studies
no attempt is made to control or influence the variables of interest. Observe what is happening & record data only
Surveys, public opinion polls
experimental studies
variable of interest is first identified, then one or more variables are identified and controlled so that data can be obtained about how they influence the variable of interest
Pharmaceutical companies test new drugs by giving different doses to people and monitoring the effects.
descriptive statistics
Summaries of data presented in a form that is easy for people to understand
descriptive statistics summaries of data may be:
tabular, graphical, and numerical
mean(average)
demonstrates a measure of the central tendency(location) of the data for a variable
statistical processes
Statistical inference (using data from a sample to make estimates and test hypotheses about population characteristics)
Census (survey that collects data on an entire population)
Sample survey (survey that collects data on a sample)
element sizes
population, sample
statistics analysis software
Excel
analytics
scientific process of transforming data into insight for making better decisions
analytics technique categories
descriptive, predictive, prescriptive
descriptive analytics
A set of analytics techniques that describe what has happened in the past
predictive analytics
set of analytical techniques that use models constructed from past data to predict the future or assess the impact of one variable on another
Prescriptive Analytics
set of analytical techniques that yield a best course of action
big data
A set of data that cannot be managed, processed, or analyzed with commonly available software in a reasonable amount of time. (great volume, high velocity, wide variety)
data warehousing
Process of capturing, storing, and maintaining the data
data mining
The process of using procedures from statistics and computer science to extract useful information from extremely large databases.
data mining major applications
made by companies with a large amount of data
data mining model reliability
careful validation of results and extensive testing is important
Ethical guidelines for statistical practice were developed by
the American Statistical Association
REMEMBER THIS
observation = element = entity (each corresponds to one ROW of the data set)
Data values
number of variables x number of elements/observations/entities