Data
Compilations of facts, figures, or other contents, both
numerical and non-numerical.
Statistics
the science that deals with the collection, preparation, analysis, interpretation, and presentation of data
Descriptive statistics
refers to the summary of important aspects
of a data set.
• Includes collecting, organizing, and presenting the data in the
form of charts and tables.
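A minimal sketch of descriptive statistics, using only Python's standard library. The exam scores are made-up values chosen for illustration; they are not from any real data set.

```python
import statistics

# Hypothetical sample: exam scores for eight students (made-up values).
scores = [72, 85, 90, 66, 85, 78, 94, 70]

# Summarize the important aspects of the data set in one small table.
summary = {
    "count": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "min": min(scores),
    "max": max(scores),
}

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```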
Inferential statistics
refers to drawing conclusions about a larger
set of data (population) based on a smaller set of data (sample)
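The idea can be sketched as follows: draw a small random sample and use its mean as an estimate of the population mean. The population here is simulated, and the sample size of 50 is an arbitrary choice for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated values (not real data).
population = [random.gauss(50, 10) for _ in range(10_000)]

# Draw a simple random sample without replacement.
sample = random.sample(population, 50)

# The sample mean serves as an estimate of the population mean.
sample_mean = statistics.mean(sample)
population_mean = statistics.mean(population)

print(f"sample mean (estimate): {sample_mean:.2f}")
print(f"population mean:        {population_mean:.2f}")
```

In practice the population mean is unknown; it is computed here only to show how close the estimate lands.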
Cross-sectional data
refers to data collected by recording a
characteristic of many subjects at the same point in time, or without regard to differences in time.
Time-series data
refers to data collected over several time periods focusing on certain groups of people, specific events, or objects.
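One way to see the difference between the two is in how the records are laid out. The cities and population figures below are hypothetical.

```python
# Cross-sectional: many subjects recorded at the same point in time.
cross_sectional = [
    {"city": "A", "year": 2023, "population": 120_000},
    {"city": "B", "year": 2023, "population": 450_000},
    {"city": "C", "year": 2023, "population": 80_000},
]

# Time series: the same subject recorded over several time periods.
time_series = [
    {"city": "A", "year": 2021, "population": 115_000},
    {"city": "A", "year": 2022, "population": 118_000},
    {"city": "A", "year": 2023, "population": 120_000},
]

years_cs = {row["year"] for row in cross_sectional}   # one time point
subjects_ts = {row["city"] for row in time_series}    # one subject

print("cross-sectional years:", years_cs)
print("time-series subjects: ", subjects_ts)
```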
Structured data
Reside in a pre-defined, row-column format.
Unstructured data
Do not conform to a pre-defined, row-column format.
Big data
A massive volume of structured and unstructured data.
Extremely difficult to manage, process, and analyze using traditional data
processing tools
Volume
immense amount of data compiled from a single or multiple sources
Velocity
data are generated at a rapid speed
Variety
data come in all types, forms, and levels of granularity, both structured and unstructured.
Veracity
credibility and quality of the data
Value
useful insights or measurable improvements due to the use of data
Variable
a characteristic of interest that differs in kind or degree among various observations (records)
Categorical data
Also called qualitative.
• Represent categories.
• We use labels or names to identify the distinguishing
characteristic of each observation.
• Can be defined by two or more categories.
• Coded into numbers for data processing.
• Example: marital status, grade in a course
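Coding categories into numbers can be sketched as below. The labels are hypothetical, and the alphabetical assignment of codes is just one possible convention; the resulting numbers are identifiers only and carry no numerical meaning.

```python
# Hypothetical marital-status values for five observations.
statuses = ["single", "married", "divorced", "married", "single"]

# Assign each distinct label an integer code (alphabetical order here).
codes = {label: i for i, label in enumerate(sorted(set(statuses)))}

# Replace each label with its code for data processing.
encoded = [codes[s] for s in statuses]

print(codes)
print(encoded)
```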
Numerical data
Also called quantitative.
• Represent meaningful numbers.
• We use numbers to identify the distinguishing characteristic of each
observation.
• Either discrete or continuous.
Nominal scale
Least sophisticated.
• Data can only be categorized or grouped.
• Observations differ by label or name.
• Example: marital status
Ordinal scale
Stronger level of measurement.
• We can categorize and rank data with respect to some characteristic.
• Differences between the ranked observations cannot be interpreted; the numbers
assigned are arbitrary.
Interval data
Can be categorized and ranked.
• Differences between the observations are meaningful.
• Zero value is arbitrary and does not reflect absence of characteristic.
• Ratios are not meaningful for interval data.
• Example: Fahrenheit scale for temperatures.
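A short worked example of why ratios are not meaningful on an interval scale: the ratio of two temperatures changes when the same temperatures are expressed in Celsius, because each scale places zero at an arbitrary point. The two temperatures are chosen only for illustration.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

warm_f, cool_f = 80.0, 40.0
warm_c = fahrenheit_to_celsius(warm_f)
cool_c = fahrenheit_to_celsius(cool_f)

# The "ratio" depends on the scale, so it has no physical meaning.
ratio_f = warm_f / cool_f
ratio_c = warm_c / cool_c

print(f"ratio in Fahrenheit: {ratio_f:.1f}")
print(f"ratio in Celsius:    {ratio_c:.1f}")
```

Saying that 80 degrees is "twice as hot" as 40 degrees would therefore be a statement about the scale, not about the temperatures.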
Ratio data
Strongest level of measurement.
• Has all the characteristics of the interval scale as well as a true zero point.
• Zero reflects absence of characteristic.
• Ratios are meaningful.
• Example: profits.
The omission strategy
observations with missing values are
excluded from subsequent analysis
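A minimal sketch of the omission strategy, assuming missing values are represented as None. The records and field names are hypothetical.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},
    {"id": 3, "age": 45, "income": None},
    {"id": 4, "age": 29, "income": 48000},
]

# Keep only observations in which every field is present.
complete = [r for r in records if all(v is not None for v in r.values())]

print([r["id"] for r in complete])
```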
The imputation strategy
missing values are replaced with
some reasonable imputed values
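A minimal sketch of the imputation strategy. Filling with the mean of the observed values is an assumption made here for illustration; it is one common choice, not the only reasonable one. The ages are hypothetical, with None marking a missing value.

```python
import statistics

# Hypothetical ages; None marks a missing value.
ages = [34, None, 45, 29]

# Compute the fill value from the observed (non-missing) entries only.
observed = [a for a in ages if a is not None]
fill_value = statistics.mean(observed)

# Replace each missing entry with the fill value.
imputed = [fill_value if a is None else a for a in ages]

print(imputed)
```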
Subsetting
is the process of extracting the portion of a data
set that is relevant for subsequent statistical analysis. It is also used when the objective of the analysis is to compare two subsets of the data.
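Subsetting can be sketched as filtering records on a condition and then comparing the resulting groups. The departments and salaries are hypothetical.

```python
import statistics

# Hypothetical employee records.
records = [
    {"name": "Ana", "dept": "sales", "salary": 50000},
    {"name": "Ben", "dept": "sales", "salary": 54000},
    {"name": "Cal", "dept": "support", "salary": 42000},
    {"name": "Dee", "dept": "support", "salary": 46000},
    {"name": "Eli", "dept": "support", "salary": 47000},
]

# Extract two subsets of the data set.
sales = [r for r in records if r["dept"] == "sales"]
support = [r for r in records if r["dept"] == "support"]

# Compare the subsets on a statistic of interest.
sales_mean = statistics.mean(r["salary"] for r in sales)
support_mean = statistics.mean(r["salary"] for r in support)

print(f"sales mean salary:   {sales_mean}")
print(f"support mean salary: {support_mean}")
```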