Parameter: a numerical measurement describing some characteristic of a POPULATION
Statistic: a numerical measurement describing some characteristic of a SAMPLE
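The parameter/statistic distinction above can be illustrated with a small sketch. The population here (10,000 simulated heights) and the sample size of 50 are assumptions made up for the example; in practice the whole population is rarely available, which is why statistics are used to estimate parameters.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of 10,000 people, invented for illustration.
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a sample drawn from that population.
sample = random.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two printed values will usually be close but not equal; the gap is sampling error.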
Quantitative (or numerical) data consists of numbers representing counts or measurements
Categorical (or qualitative / attribute) data consists of names or labels
It is important to include appropriate units of measurement (such as $, ft, m).
Discrete data result when the data values are quantitative and the number of values is finite, or countable (for example, number of tosses of a coin before getting tails).
Continuous (numerical) data result from infinitely many possible quantitative values where the collection of values is not countable (for example, lengths anywhere from 0 cm to 12 cm).
Levels of measurement are important because they tell us which computations and statistical methods are appropriate for that type of data.
Nominal: characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an order (such as low to high). An example is a survey whose only responses are yes, no, and undecided.
Ordinal: when the data can be arranged in some order, but differences (obtained through subtraction) between data values either cannot be determined or are meaningless. Examples include course grades (A, B, C, D, F).
Interval: when the data can be arranged in order, and differences between data values ARE meaningful. Data at this level do not have a natural 0 starting point at which none of the quantity is present. Examples include temperatures in degrees Fahrenheit or Celsius and calendar years.
Ratio: when the data can be arranged in order, differences can be found and are meaningful, and there is a natural 0 starting point. Both differences and ratios are meaningful. Examples include heights of students and lengths of class periods in minutes.
To distinguish between the interval and ratio levels of measurement, ask whether the data have a "true zero" (a value at which none of the quantity is present) and whether "twice" is a meaningful description when one value is double another.
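As a worked check of the "twice" test, the sketch below compares two Fahrenheit temperatures with two heights. The specific values (40 and 80 degrees, 90 and 180 cm) are made up for illustration. Converting to kelvins, a scale that does have a true zero, shows that 80 degrees F is not "twice as hot" as 40 degrees F even though 80 / 40 = 2.

```python
def fahrenheit_to_kelvin(temp_f: float) -> float:
    """Convert degrees Fahrenheit to kelvins (a scale with a true zero)."""
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15

# Interval level: 0 degrees F does not mean "no heat", so this ratio is misleading.
temp_a_f, temp_b_f = 40.0, 80.0
naive_ratio = temp_b_f / temp_a_f
kelvin_ratio = fahrenheit_to_kelvin(temp_b_f) / fahrenheit_to_kelvin(temp_a_f)

# Ratio level: 0 cm means "no height", so the ratio is meaningful.
height_a_cm, height_b_cm = 90.0, 180.0
height_ratio = height_b_cm / height_a_cm

print(f"naive Fahrenheit ratio: {naive_ratio:.2f}")
print(f"ratio in kelvins:       {kelvin_ratio:.2f}")
print(f"height ratio:           {height_ratio:.2f}")
```

The Fahrenheit ratio changes when the scale changes, while the height ratio stays 2 in any unit of length; that is the practical meaning of a natural zero.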
Big data: data sets so large and so complex that their analysis is beyond the capabilities of traditional software tools.
Data science: an area of study that involves applications of statistics, computer science, software engineering, and some other relevant fields.
A data value is missing completely at random if the likelihood of its being missing is independent of its value or any of the other values in the data set.
A data value is missing not at random if the missing value is related to the reason that it is missing.
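The difference between the two kinds of missingness can be seen in a small simulation. Everything here is an assumption chosen for illustration: the simulated incomes, the 30% missing rate, and the rule that values above 75 go missing 80% of the time.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical incomes (in $1000s), invented for illustration.
incomes = [random.gauss(60, 15) for _ in range(5_000)]

# Missing completely at random: every value has the same 30% chance
# of being missing, regardless of what the value is.
mcar_observed = [x for x in incomes if random.random() >= 0.30]

# Missing not at random: values above 75 are dropped 80% of the time,
# so whether a value is missing depends on the value itself.
mnar_observed = [x for x in incomes if not (x > 75 and random.random() < 0.80)]

full_mean = statistics.mean(incomes)
mcar_mean = statistics.mean(mcar_observed)
mnar_mean = statistics.mean(mnar_observed)

print(f"mean of all values:           {full_mean:.2f}")
print(f"mean with values MCAR:        {mcar_mean:.2f}")
print(f"mean with values MNAR (high): {mnar_mean:.2f}")
```

In this sketch the mean stays close to the full-data mean when values are missing completely at random, but is pulled noticeably downward when high values are the ones that tend to be missing.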
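To correct for missing data, two common approaches are to delete the cases that have missing values or to impute (substitute) values for the missing ones.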
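Two common ways of handling missing values, deleting the affected cases and imputing a substitute, can be sketched as follows. The data values are hypothetical, and imputing with the mean of the observed values is only one possible choice, assumed here for simplicity.

```python
import statistics

# Hypothetical measurements; None marks a missing value.
values = [12.0, None, 15.0, 11.0, None, 14.0]

# Option 1: delete the cases that have a missing value.
complete_cases = [v for v in values if v is not None]

# Option 2: impute, here by substituting the mean of the observed values.
observed_mean = statistics.mean(complete_cases)
imputed = [observed_mean if v is None else v for v in values]

print(complete_cases)
print(imputed)
```

Deletion shrinks the data set, while mean imputation keeps its size but understates the variability. Either approach can give misleading results when the values are missing not at random.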