1/14
Name | Mastery | Learn | Test | Matching | Spaced |
|---|
No study sessions yet.
The statistical problem-solving process
Formulate questions - Collect Data - Analyze Data - Interpret Results
Research Question
Broad, ongoing investigation with multiple aspects, e.g. How can a company increase sales?
Statistical Question
Can be answered directly by analyzing a relevant data set, e.g. What is the average salary of people with an MBA?
A statistical question has what characteristics?
Can be answered directly using an appropriate dataset
Multiple analysts using the same dataset would get the same answer (That is, not subjective)
If a statistical question involves two or more variables :
The __ is what we would like to predict
The __ variables are used to calculate these predictions
Response
Explanatory
Example :
Response Variable : amount spent by online customers
Possibly Explanatory Variables : Gender, Age, Geographic location
How are data tables usually organized?
With cases in the rows, and the variables in the columns (structured data)
The rows of a data table correspond to
cases
If we have several measurements for an item, this item is a case and should have it’s own row.
The measurements recorded about items are called ___ and are shown in the __ of the data table
variables, column
The first column is the __, which is used to identify individual cases uniquely and is not considered a variable
Identifier
Quantitative Variable
tells us how much of something was measured and quantifies exactly how far apart individual items are.
It is possible to compute an average value. WE SHOULD BE ABLE TO COMPUTE THE AVERAGE VALUE FOR A QUANTITATIVE VARIABLE.
Categorical Variable
has separate distinct categories. It is not possible to specify exactly how far apart 2 items are, nor do mathematical operations such as compute the average.
Identifier
A unique code assigned to each individual or item, listed in the first column of the data table. It could be a name, or an alpha-numeric code.
Example : Social Security Numbers
Identifiers have similar characteristics as categorical variables in that they can’t be added or averaged, but they are NOT ANALYZED.
Data that consist of the same item measured repeatedly are called
time series
To qualify as time series, we should be able to plot the data with
a line plot with time on the X-Axis
Data that is measured only once is called
cross-sectional - Only measured at one specific point in time, and not ongoing.