1/15
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced |
---|
No study sessions yet.
Cases:
The objects described by a set of data
They can be customers, companies, subjects in a study,
units in an experiment, or other objects.
Variable:
is a special characteristic of a case
Values:
Different cases can have different values of a variable
Label:
is a special variable used in some data sets to
distinguish among the different cases
Categorical Variable:
places each case into one of several groups, or categories
Quantitative Variable:
takes numerical values for which arithmetic operations such as adding and averaging make sense
Key Characteristics of a Data Set:
Every data set is accompanied by important background
information. In a statistical study, always ask the following
questions:
Who? What cases do the data describe? How many
cases does a data set have?
What? How many variables does the data set have? What
are the exact definitions of these variables? What are the
units of measurement for each quantitative variable?
Why? What purpose do the data have? Do the data
contain the information needed to answer the questions of
interest?
Exploratory Data Analysis:
Begin by examining each variable by itself. Then, move
on to study the relationships among the variables
Begin with a graph or graphs. Then, add numerical
summaries of specific aspects of the data.
Variables:
We construct a set of data by first deciding which cases or units we want to study. For each case, we record information about characteristics that we call variables
Characteristics of the individual
Distribution of a Variable:
To examine a single variable, we graphically display its distribution
The distribution of a variable tells us what values it takes and how often it takes these values
Distributions can be displayed using a variety of graphical tools. The proper choice of graph depends on the nature of the variable
Categorical Variable:
Pie chart
Bar Graph
Quantitative Variable:
Histogram
Stemplot
The Distribution of a Categorical Variable:
list the categories and gives the count of the percentage of individuals who fall into each category
Pie charts: show the distribution of a categorical variable as a “pie”. Its slices’ sies reflect the counts or percent’s for the categories
Bar graph: represent categories as bars whose heights show the category counts or percent’s
The Distribution of a Quantitative Variable:
The distribution of a quantitative variable tells us what values the variable takes on and how often it takes those values
Stemplots: separate each observation into a stem and a leaf that are then plotted to display the distribution while maintaining the original values of the variable
Histograms: show the distribution of a quantitative variable by
using bars. The height of a bar represents the number of
individuals whose values fall within the corresponding class
To Construct Stemplots:
1.) Separate each observation into a stem (all but the rightmost digit)
and a leaf (the remaining digit)
2.) Write the stems in a vertical column; draw a vertical line to the right
of the stems
3.) Write each leaf in the row to the right of its stem; order the leaves,
if desired
4.) If there are very few stems (when the data cover only a very small range
of values), then you might want to create more stems by splitting the
original stems
Examining Distributions:
In any graph of data, look for the overall pattern and for striking deviations from that pattern
You can describe the overall pattern by its shape, center, and
spread
An important kind of deviation is an outlier, an individual that falls
outside the overall pattern
Extreme values of a distribution are in a tail of the distribution
A peak in a distribution is called a mode. Bimodal Distribution
A distribution is symmetric if the right and left sides of the graph are approximately mirror images of each other
A distribution is skewed to the right (right-skewed) if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side
It is skewed to the left (left-skewed) if the left side of the graph is
much longer than the right side
Outliers:
An important kind of deviation is an outlier
Outliers are observations that lie outside the overall pattern of a distribution
Always look for outliers and try to explain them
Time Plots:
A time plot shows behavior over time
Time is always on the horizontal axis, and the variable being measured is on the vertical axis
Look for an overall pattern (trend) and deviations from this trend. Connecting the data points by lines may emphasize this trend
Look for patterns that repeat at known regular intervals (seasonal variations)