Data
A column of numbers.
Population
The totality of elements in a well-defined group that is to be studied.
Sample
Any part of the population that is:
Small enough to measure, and, in a good sample,
Representative of the population.
Individual
One element of a population.
Simple Random Sampling
A method of sampling that gives every individual in the population the same chance of being chosen.
With Replacement
To select and measure an individual, then return it to the population, so there is some small chance that it could be selected again.
Without Replacement
To select and measure an individual, then do not return it to the population, so there is no chance that it could be selected again.
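The two sampling schemes above can be sketched with Python's standard `random` module. The population of ten numbered individuals, the sample size of 5, and the fixed seed are assumptions made only for illustration.

```python
import random

# Hypothetical population: ten individuals labelled 1..10.
population = list(range(1, 11))

# Fixed seed (an assumption) so the sketch gives the same draw each run.
rng = random.Random(0)

# With replacement: each draw is made from the full population,
# so the same individual can be selected more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: a selected individual is not returned,
# so no individual can appear twice in the sample.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

In both cases every individual has the same chance of being chosen on a given draw, which is what makes the sample a simple random sample.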
Data Value
A measurement from an individual, such as height, volts, beak length.
Qualitative Data
Consists of qualities or categories.
Discrete Data
Consists of numerical information that has only a few possible values in the population (fewer than about 15 to 20 values).
Continuous Data
Consists of numerical information that has many, many possible values in the population.
Constant
When the data value is fixed to only one possible value by there being only one value in the population.
Variable
When the data values can vary by there being more than one value in the population.
Random variable
When the data values can vary, and the value varies randomly.
Data Set
A specific way to organize data by putting the information from:
Variables into columns, and
Individuals into rows.
Experimental Study
Where the individuals are in a highly controlled environment before measuring, so that physical controls can be used to allow only the variable of interest to have an effect.
Observational Study
Where the individuals are in an uncontrolled environment before measuring, so that statistical controls are needed to cancel out the effects of the non-interesting variables.
Probabilistic Data
Where the next data value is not known (this is the short run), but the pattern of many, many data values is very well known (this is the long run).
Purpose of Statistics
To extract information from columns of sample data, to help make better decisions.
Descriptive Statistics
Statistical methods used to summarize and describe columns of data for the purpose of extracting information about the values of the data.
Summary Numbers
Numerical values used to summarize one characteristic from a column of data in order to communicate the largest amount of information as simply as possible.
Inferential Statistics
Statistical methods that combine sample data with probability to get information about a population.
Distribution of Data
The shape, location, and spread of a column of data values.
Mathematical Distance
The number of mathematical units between two numbers. Found by taking the difference between the two numbers.
Statistical Distance
The number of spread units between two numbers. Found by dividing the mathematical distance by the spread of the data.
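A minimal sketch of the two distances, assuming the spread is measured by the population standard deviation (the card does not say which spread measure is meant). The data values and function names are hypothetical.

```python
import statistics

def mathematical_distance(a, b):
    """Difference between two numbers, in the original units."""
    return a - b

def statistical_distance(a, b, spread):
    """Mathematical distance expressed in units of spread."""
    return mathematical_distance(a, b) / spread

# Hypothetical column of data.
data = [2, 4, 4, 4, 5, 5, 7, 9]
middle = statistics.mean(data)

# Assumption: spread is the population standard deviation.
spread = statistics.pstdev(data)

math_dist = mathematical_distance(9, middle)
stat_dist = statistical_distance(9, middle, spread)
print(math_dist, stat_dist)
```

Here the value 9 is 4 mathematical units above the mean of 5, and because the spread is 2, it is 2 spread units away.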
Concept of Close and Far
Uses the probability distribution of the population, expressed in terms of statistical distance, to determine which values are close to the population mean and which values are far from the population mean.
Relationship
Describes whether, and how, the values of one variable relate to the values of another variable.
Lurking Variable
A known, or unknown, variable whose values affect the values of the variables being studied.
Analytical Thinking
To break a big problem down into smaller parts, solve each part individually, then put the parts back together to get the answer to the big problem.
Synthetic Thinking
To look at the whole problem at once, see what aspect of the problem is the most important, then use this aspect to solve the problem.
Process of Abstraction
To look at a problem, extract only the information relevant to solving it, and ignore all other, unnecessary information.
Distribution of Data
To describe a column of data by giving the shape, location, and spread of all the data values in the column.
Shape
Refers to the pattern the data values make when graphed, usually over a real number line. Shape can be expressed with a bar chart for any data, or with an equation for pretty data.
Location
Gives the middle of the data values, again usually on a real number line. Location is often considered the most representative data value of the column.
Spread
Gives the width of the data values over a real number line, either as how far the minimum data value is from the maximum data value, or as how far the data values are from the middle on average.
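Both ways of measuring spread can be sketched as follows. The data are hypothetical, and using the mean as "the middle" and the average absolute deviation as "how far on average" are assumptions; other choices (such as the standard deviation) are also common.

```python
import statistics

# Hypothetical column of data.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Spread as width: distance from the minimum to the maximum.
data_range = max(data) - min(data)

# Spread as average distance from the middle
# (assumption: the middle is the mean).
middle = statistics.mean(data)
mean_abs_deviation = sum(abs(x - middle) for x in data) / len(data)

print(data_range, mean_abs_deviation)
```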
Summary Number
A single number summarizing information about one characteristic from a column of data.
Parameter
A number summarizing information for a characteristic from a column containing population data.
Its value does not change when repeating the statistical process because the column of data values does not change.
Usually denoted with Greek letters (μ, σ, or ρ).
Statistic
A number summarizing information for a characteristic from a column containing sample data.
Its value changes when repeating the statistical process because the column of data values changes every time a new random sample is chosen.
Usually denoted with Roman letters (x̄, s, or r).
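The contrast between a parameter and a statistic can be sketched by repeating the sampling process. The population of the integers 1 to 100, the sample size of 10, and the seed are assumptions for illustration.

```python
import random
import statistics

# Hypothetical population column: the integers 1..100.
population = list(range(1, 101))

# Parameter: computed from the whole population, so it does not change.
mu = statistics.mean(population)

# Statistic: computed from a new random sample each time, so it varies.
rng = random.Random(1)  # fixed seed, an assumption for repeatability
sample_means = [
    statistics.mean(rng.sample(population, k=10)) for _ in range(5)
]

print("mu =", mu)
print("sample means:", sample_means)
```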
Size
The number of data values in a column of data.
Denoted: N for population, and n for sample.
Degrees of Freedom
The number of units of information contained in a sample statistic.
Efficient Statistics
Summary numbers that extract the most information about a characteristic out of a column of data.
Resistant Statistics
Summary numbers that extract less, but more robust, information about a characteristic out of a column of data.
Used for discrete or continuous data.
Strength is that they are weakly affected by extreme values (resistant).
Weakness is that they contain less information than efficient statistics.
Frequency Table
A table summarizing the shape information in a column of data by listing all possible data values, and recording how often each value occurs in the column of data.
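A frequency table can be sketched with `collections.Counter` from the standard library. The column of colour values is hypothetical.

```python
from collections import Counter

# Hypothetical column of qualitative data.
column = ["red", "blue", "red", "green", "blue", "red"]

# Count how often each value occurs in the column.
frequency_table = Counter(column)

# Print the table, most frequent value first.
for value, count in frequency_table.most_common():
    print(f"{value:<6} {count}")
```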
Bar Chart
A graphical summary of a frequency table for qualitative data giving shape information by showing a non-touching bar for each category, with the height of the bar representing how many data values are in the category.
Histogram
A graphical summary of a frequency table for discrete data giving shape information by showing a touching bar for each category, with the height of the bar representing how many data values are in the category.
Binning
To separate continuous data into bins (or groups) to reduce the number of values (or categories) for use in a frequency table.
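A minimal sketch of binning, assuming equal-width bins of 10 units that include their lower edge. The heights, the bin width, and the helper name `bin_label` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical column of continuous data (heights in cm).
heights = [150.2, 153.8, 161.0, 164.5, 167.3, 171.9, 172.4, 178.0]

# Assumption: equal-width bins, each 10 units wide.
bin_width = 10

def bin_label(value, width):
    """Return the (lower, upper) edges of the bin containing value."""
    lower = math.floor(value / width) * width
    return (lower, lower + width)

# Frequency table over bins instead of over individual values.
binned_table = Counter(bin_label(h, bin_width) for h in heights)

for (lower, upper), count in sorted(binned_table.items()):
    print(f"{lower} to {upper}: {count}")
```

Eight distinct values collapse into three bins, which is small enough to use in a frequency table or histogram.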
Stem-and-Leaf Plot
A graphical summary of continuous data giving shape information by displaying each data value as a stem for the category, and a leaf for the bar.
Boxplot
A graphical summary of continuous data giving shape information by displaying the middle 50% of the data values as a box, the median as a line inside the box, and the upper 25% and lower 25% of the data values as tails on either side of the box.
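The numbers a boxplot is drawn from can be sketched with `statistics.quantiles`. The data are hypothetical, and the quartile values depend on the convention used; this sketch relies on the function's default method, which is only one of several in use.

```python
import statistics

# Hypothetical column of data.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# Quartiles split the sorted data into four parts
# (assumption: the default quantile method of statistics.quantiles).
q1, median, q3 = statistics.quantiles(data, n=4)

# The box spans q1 to q3 (the middle 50% of the data),
# the line inside the box is the median,
# and the tails reach out toward the minimum and maximum.
summary = {
    "min": min(data),
    "q1": q1,
    "median": median,
    "q3": q3,
    "max": max(data),
}
print(summary)
```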
Overall Shape of Data
Overall shape of data will be simply defined with two characteristics:
Modality
Symmetry
Exceptions from the Overall Shape
There are three major exceptions to the overall shape of the data:
Extreme values
Gaps or peaks
Patterns or grouping.
Efficient statistics
Statistics that are designed to extract the most information about a characteristic from a column of data values.
Mean