Descriptive Statistics
A branch of statistics that summarises and describes the main features of a dataset using measures such as central tendency, dispersion and skewness.
Central Tendency
A numerical value around which most data points cluster; represented by measures like mean, median and mode.
Variation / Dispersion
The extent to which data values scatter around the central value; described by range, variance and standard deviation.
Skewness
A statistic that measures the asymmetry of a distribution, indicating whether data are stretched to the left (negative) or right (positive).
Population
The complete set of individuals, items or measurements under investigation.
Sample
A subset of the population selected for analysis, ideally mirroring the population’s characteristics.
Population Parameter (μ)
A descriptive measure calculated from an entire population, such as the population mean denoted by the Greek letter mu (μ).
Sample Statistic (x̄)
A descriptive measure calculated from sample data, such as the sample mean denoted by x-bar (x̄).
Ungrouped Data
Raw, unorganised observations presented individually, suitable for small datasets.
Grouped Data
Data organised into class intervals with corresponding frequencies to simplify analysis of large datasets.
Class Interval
A continuous range of values in grouped data, written inclusively as [a, b] or, for continuous data, as a half-open interval [a, b) so that adjacent classes do not overlap.
Class Width (h)
The difference between the upper and lower boundaries of a class interval.
Mid-Value (mi)
The midpoint of a class interval, calculated as (lower limit + upper limit) ÷ 2.
Arithmetic Mean (AM)
The sum of all data values divided by the number of observations; the most common measure of central tendency.
Population Mean Formula
μ = (Σ Xi) / N, where Σ Xi is the sum of all population values and N is the population size.
Sample Mean Formula
x̄ = (Σ Xi) / n, where Σ Xi is the sum of sample values and n is the sample size.
Direct Method (Mean)
Computation of the mean by summing all observed values (or fi xi for grouped data) and dividing by the total number of observations.
Indirect / Shortcut Method (Mean)
Mean calculation using an assumed mean A and deviations di = xi − A: x̄ = A + (Σ fi di) / n, where n = Σ fi.
Step Deviation Method
A shortcut mean formula for grouped data with equal class widths: x̄ = A + [(Σ fi di) / n] × h, where di = (xi − A) / h is the step deviation from the assumed mean A and h is the class width.
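The three mean methods above should agree on the same grouped data. The following is a minimal sketch, assuming a made-up frequency table with equal class width h = 10 and an assumed mean A = 25; the function names are illustrative only.

```python
def direct_mean(midpoints, freqs):
    """Direct method: sum of f_i * x_i divided by n."""
    n = sum(freqs)
    return sum(f * x for f, x in zip(freqs, midpoints)) / n


def shortcut_mean(midpoints, freqs, A):
    """Shortcut method: deviations d_i = x_i - A from an assumed mean A."""
    n = sum(freqs)
    return A + sum(f * (x - A) for f, x in zip(freqs, midpoints)) / n


def step_deviation_mean(midpoints, freqs, A, h):
    """Step deviation method: d_i = (x_i - A) / h, rescaled by h at the end."""
    n = sum(freqs)
    return A + (sum(f * ((x - A) / h) for f, x in zip(freqs, midpoints)) / n) * h


# Hypothetical data: midpoints of classes 0-10, 10-20, ..., 40-50.
midpoints = [5, 15, 25, 35, 45]
freqs = [3, 7, 10, 6, 4]

m_direct = direct_mean(midpoints, freqs)
m_shortcut = shortcut_mean(midpoints, freqs, A=25)
m_step = step_deviation_mean(midpoints, freqs, A=25, h=10)
print(m_direct, m_shortcut, m_step)
```

All three values come out at roughly 25.33 (760 / 30); the choice of A changes the arithmetic, not the result.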
Weighted Arithmetic Mean
Mean that multiplies each value by a weight reflecting its importance: x̄w = Σ wi xi / Σ wi.
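A short sketch of the weighted mean formula, assuming three hypothetical scores with weights 2, 3 and 5:

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical scores and their weights.
scores = [80, 90, 70]
weights = [2, 3, 5]
result = weighted_mean(scores, weights)
print(result)  # 78.0
```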
Median
The middle value that divides an ordered dataset into two equal halves; the 50th percentile (P50) and second quartile (Q2).
Median Class
In grouped data, the class interval containing the (n / 2)th observation after cumulative frequencies are calculated.
Median Formula (Grouped Data)
Median = l + [(n / 2 − cf) / f] × h, where l is the lower boundary of the median class, cf the cumulative frequency of the classes before it, f the frequency of the median class and h its class width.
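The grouped median formula can be sketched as follows, assuming equal class widths and a made-up frequency table; `grouped_median` is an illustrative name, not a library function.

```python
def grouped_median(lower_bounds, freqs, h):
    """Median = l + ((n/2 - cf) / f) * h for the class holding the (n/2)th value."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table is empty")
    target = n / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for l, f in zip(lower_bounds, freqs):
        if cf + f >= target:
            # This is the median class.
            return l + ((target - cf) / f) * h
        cf += f
    raise ValueError("median class not found")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50.
lower_bounds = [0, 10, 20, 30, 40]
freqs = [3, 7, 10, 6, 4]
median = grouped_median(lower_bounds, freqs, h=10)
print(median)  # 25.0
```

Here n = 30, so the 15th observation falls in the class 20-30 (cf = 10, f = 10), giving 20 + (5 / 10) × 10 = 25.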
Mode
The value or class interval with the highest frequency in a dataset.
Modal Class
For grouped data, the class interval with the greatest frequency.
Mode Formula (Grouped Data)
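Mode = l + [(fm − fm−1) / (2 fm − fm−1 − fm+1)] × h, where l is the lower boundary of the modal class, fm its frequency, fm−1 and fm+1 the frequencies of the classes immediately before and after it, and h the class width.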
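A minimal sketch of the grouped mode formula, assuming equal class widths and treating a missing neighbouring class as having frequency 0 (an assumption for the edge case, not something the formula itself specifies):

```python
def grouped_mode(lower_bounds, freqs, h):
    """Mode = l + (fm - f_prev) / (2*fm - f_prev - f_next) * h."""
    # Index of the modal class (first class with the highest frequency).
    m = max(range(len(freqs)), key=freqs.__getitem__)
    fm = freqs[m]
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    denom = 2 * fm - f_prev - f_next
    if denom == 0:
        raise ValueError("formula undefined: neighbouring classes tie with the modal class")
    return lower_bounds[m] + (fm - f_prev) / denom * h


# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50.
lower_bounds = [0, 10, 20, 30, 40]
freqs = [3, 7, 10, 6, 4]
mode = grouped_mode(lower_bounds, freqs, h=10)
print(round(mode, 2))  # 24.29
```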
Unimodal Distribution
A frequency distribution with one mode (single peak).
Bimodal Distribution
A distribution possessing two values of equal highest frequency, resulting in two peaks.
Partition Values
Statistical measures (quartiles, deciles, percentiles) that divide ordered data into equal-sized parts.
Quartiles (Q1, Q2, Q3)
Values that split an ordered dataset into four equal parts, marking 25 %, 50 % and 75 % positions.
Quartile Formula (Grouped Data)
Qi = l + [(i × n / 4 − cf) / f] × h, where i = 1, 2, 3 and l, cf, f and h refer to the class containing the (i × n / 4)th observation.
Deciles (D1–D9)
Nine values dividing ordered data into ten equal parts, each representing 10 % of the observations.
Decile Formula (Grouped Data)
Di = l + [(i × n / 10 − cf) / f] × h, where i = 1 … 9 and l, cf, f and h refer to the class containing the (i × n / 10)th observation.
Percentiles (P1–P99)
Ninety-nine values dividing ordered data into one hundred equal parts; Pk is the value at or below which k % of the observations fall.
Percentile Formula (Grouped Data)
Pi = l + [(i × n / 100 − cf) / f] × h, where i = 1 … 99 and l, cf, f and h refer to the class containing the (i × n / 100)th observation.
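The quartile, decile and percentile formulas share one shape and differ only in the divisor (4, 10 or 100). A sketch under that observation, assuming equal class widths and a made-up table; `partition_value` is a hypothetical helper name:

```python
def partition_value(lower_bounds, freqs, h, i, parts):
    """Return l + ((i*n/parts - cf) / f) * h for the class holding the target rank.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    if not 0 < i < parts:
        raise ValueError("i must lie strictly between 0 and parts")
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table is empty")
    target = i * n / parts
    cf = 0  # cumulative frequency before the current class
    for l, f in zip(lower_bounds, freqs):
        if cf + f >= target:
            return l + ((target - cf) / f) * h
        cf += f
    raise ValueError("target class not found")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50.
lower_bounds = [0, 10, 20, 30, 40]
freqs = [3, 7, 10, 6, 4]

q1 = partition_value(lower_bounds, freqs, 10, i=1, parts=4)
q3 = partition_value(lower_bounds, freqs, 10, i=3, parts=4)
d5 = partition_value(lower_bounds, freqs, 10, i=5, parts=10)
p90 = partition_value(lower_bounds, freqs, 10, i=90, parts=100)
print(round(q1, 2), round(q3, 2), d5, p90)
```

With this table D5 equals the median (25.0), as expected, since the fifth decile, second quartile and 50th percentile all mark the same position.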
Cumulative Frequency
The running total of frequencies up to and including a given class boundary.
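As a small illustration, a running total of hypothetical class frequencies can be built with `itertools.accumulate` from the Python standard library:

```python
from itertools import accumulate

# Hypothetical class frequencies for five consecutive classes.
freqs = [3, 7, 10, 6, 4]

# Each entry is the total of all frequencies up to and including that class.
cum_freqs = list(accumulate(freqs))
print(cum_freqs)  # [3, 10, 20, 26, 30]
```

The last entry always equals the total number of observations n.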
Ogive
A cumulative frequency curve used to estimate median, quartiles, deciles and percentiles graphically.
Symmetrical Distribution
A distribution whose values are evenly balanced around the centre, so that mean = median = mode when there is a single peak.
Positively Skewed Distribution
A distribution with a long right tail where, typically, Mean > Median > Mode.
Negatively Skewed Distribution
A distribution with a long left tail where, typically, Mean < Median < Mode.
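The ordering of mean, median and mode under skew can be checked on a small example. The data below are invented for illustration; the second list is simply the mirror image of the first.

```python
import statistics

# Hypothetical right-skewed data: most values are small, one is large.
right_skewed = [1, 2, 2, 2, 3, 4, 5, 6, 9, 16]
# Mirror image of the same data, which is left-skewed.
left_skewed = [17 - x for x in right_skewed]

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)
    print(name, mean, median, mode)
```

For the right-skewed list the mean (5) exceeds the median (3.5), which exceeds the mode (2); the mirrored list reverses the order (12, 13.5, 15).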
Outlier
An observation markedly distant from other values in the dataset, potentially distorting the mean.
Ideal Measure of Central Tendency
A measure that is rigidly defined, easy to compute, based on all observations, algebraically tractable, minimally variable across samples and resistant to extreme values.
Statistics
Branch of mathematics concerned with collecting, analysing, interpreting, and presenting data.
Descriptive Statistics
Methods that summarise and organise data using measures such as mean, median, mode, and graphs.
Inferential Statistics
Techniques that draw conclusions about a population based on data from a sample, e.g., hypothesis testing.
Functions of Statistics
Sequential activities of data collection, organisation, analysis, and interpretation to support decision-making.
Data Collection
Process of gathering relevant information to meet a study’s objectives.
Direct Data Collection
Primary data gathered firsthand via surveys, interviews, observation, or experiments.
Indirect Data Collection
Secondary data obtained from existing sources such as reports, databases, or historical records.
Tabulation
Systematic arrangement of data in rows and columns for easy comparison and analysis.
Class Interval
Numerical range that groups data values, defined by upper and lower limits.
Frequency (Absolute Frequency)
Number of times a particular observation occurs in a data set; denoted by f.
Cumulative Frequency
Running total of frequencies for all classes up to a specified point in an ordered data set.
Frequency Distribution
Table that shows the number of observations falling into each class interval.
Contingency Table
Cross-tabulation displaying the frequency distribution of two or more categorical variables.
Exploratory Data Analysis (EDA)
Initial investigation of data to uncover patterns, spot anomalies, and test assumptions through visualisation.
Measures of Central Tendency
Single values (mean, median, mode) that describe the centre of a data set.
Scatter Plot
Graph plotting paired numerical data to reveal relationships or correlations between two variables.
Bar Chart
Graphical display of categorical data where bar heights represent frequencies or proportions.
Histogram
Two-dimensional graph of continuous data showing frequencies within adjoining class intervals.
Pie Chart
Circular graph divided into sectors representing proportional parts of a whole.
Ogive (Cumulative Frequency Curve)
Graph plotting cumulative frequency against upper class limits to show data accumulation.
Box Plot
Box-and-whisker diagram summarising median, quartiles, and outliers of numerical data.
Sampling
Technique of selecting a subset of a population to estimate characteristics of the whole.
Population
Entire set of individuals or items about which information is sought.
Sample
Subset of a population selected for study, ideally reflecting population characteristics.
Sampling Frame
Complete list or set of criteria that defines all elements eligible for sampling.
Probability Sampling
Sampling approach where every population member has a known, non-zero chance of selection.
Non-Probability Sampling
Sampling where some population members may have unknown or zero chance of selection.
Simple Random Sampling
Method giving each population element an equal chance of selection, minimising bias.
Systematic Sampling
Selecting every kth element from an ordered population after a random start.
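A minimal sketch of systematic sampling, assuming the population is already held in an ordered list and the random start is drawn from the first k positions; the function name and the `seed` parameter are illustrative.

```python
import random


def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k positions, then take every kth element."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    start = random.Random(seed).randrange(k)
    return population[start::k]


# Hypothetical ordered population of 100 numbered units.
population = list(range(1, 101))
sample = systematic_sample(population, k=10, seed=42)
print(sample)
```

With 100 units and k = 10 the sample always contains 10 units spaced exactly 10 apart; only the starting unit varies.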
Stratified Sampling
Dividing the population into internally homogeneous strata and drawing a random sample from each stratum, often in proportion to its size.
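Proportional allocation can be sketched as below. The strata names and sizes are invented, and the simple rounding used here is an assumption: with awkward proportions the per-stratum counts may not add up exactly to the requested sample size.

```python
import random


def stratified_sample(strata, sample_size, seed=None):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Proportional allocation, rounded to a whole number of units.
        k = round(sample_size * len(units) / total)
        sample[name] = rng.sample(units, k)
    return sample


# Hypothetical population of 100 students split into three strata.
strata = {
    "first_year": list(range(0, 60)),
    "second_year": list(range(60, 90)),
    "third_year": list(range(90, 100)),
}
picked = stratified_sample(strata, sample_size=10, seed=1)
print({name: len(units) for name, units in picked.items()})
# {'first_year': 6, 'second_year': 3, 'third_year': 1}
```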
Cluster Sampling
Dividing population into clusters, randomly selecting clusters, then sampling all or some units within them.
Sample Size
Number of observational units included in a sample.
Sampling Bias
Systematic error caused by non-representative sampling, leading to incorrect conclusions.
Numerical Data
Data expressed as numbers, suitable for arithmetic operations; includes discrete and continuous types.
Categorical Data
Data consisting of labels or categories, analysed by counting frequency of occurrence.
Qualitative Data
Non-numeric data describing qualities, attributes, or opinions.
Quantitative Data
Numeric data representing measured quantities, enabling statistical calculations.
Discrete Data
Numerical data that take only specific, separate values (e.g., number of students).
Continuous Data
Numerical data that can take any value within a range (e.g., height, weight).
Primary Data
Information collected firsthand specifically for the current research purpose.
Secondary Data
Information previously collected for another purpose and reused in new research.
Structured Data
Organised data in predefined formats such as tables or spreadsheets.
Unstructured Data
Data lacking a predefined format, e.g., text, images, videos.
Static Data
Data that remain unchanged over time, typically historical or reference data.
Dynamic Data
Data that change frequently and may be updated in real time.
Sensitive Data
Information requiring special protection due to confidentiality, e.g., medical or financial records.
Non-Sensitive Data
Information that can be shared freely without compromising privacy or security.
Theoretical Distribution
A probability-based mathematical model that predicts how values are expected to behave under ideal conditions (e.g., normal, binomial, Poisson, exponential).
Empirical Distribution
A distribution derived from observed data rather than theoretical probability rules.
Random Experiment
A process of measurement or observation with uncertain outcome but well-defined possible results.
Outcome
A single possible result of a random experiment or trial.
Sample Space (S)
The set of all possible outcomes of a random experiment.
Event
Any subset of outcomes from the sample space; a ‘simple event’ cannot be decomposed further.
Mutually Exclusive Events
Events that cannot occur simultaneously in a single trial.
Collectively Exhaustive Events
A set of events that includes every possible outcome of the experiment.
Independent Events
Events whose occurrence does not affect each other’s probabilities.
Dependent Events
Events where the occurrence of one influences the probability of the other.
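Independence can be tested with the standard product rule P(A and B) = P(A) × P(B), which is not stated in the cards above but is the usual criterion. The sketch below assumes two fair dice and enumerates the sample space; the event functions are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely ordered pairs.
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))


def independent(a, b):
    """True when P(A and B) equals P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)


def first_even(o):
    return o[0] % 2 == 0


def sum_is_7(o):
    return o[0] + o[1] == 7


def sum_is_8(o):
    return o[0] + o[1] == 8


print(independent(first_even, sum_is_7))  # True
print(independent(first_even, sum_is_8))  # False
```

"First die is even" and "sum is 7" are independent (1/12 = 1/2 × 1/6), whereas "first die is even" and "sum is 8" are dependent (1/12 ≠ 1/2 × 5/36).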
Compound Event
An event formed by the simultaneous occurrence of two or more simple events.