Flashcards on Descriptive Data Analysis, Field Techniques, and Data Analysis.
Statistics
The science that studies methods for collecting, organizing, summarizing, and analyzing data, as well as obtaining valid conclusions about the population under study.
Population
The complete set of elements that share a given characteristic.
Parameter
A descriptive property of the population.
Sample
A subset formed by elements of the population.
Statistic
A descriptive property of the sample.
Descriptive Statistics
The set of techniques for numerically describing a set of elements (the sample). The results of the analysis are not intended to go beyond the data set.
Statistical Inference
The set of techniques for drawing valid conclusions about a population from a sample taken from it. The results of the analysis go beyond the collected data set.
Variable
A characteristic of the sample or population that is being observed and that varies among the individuals in the study. It comprises all the values that the characteristic of interest can take.
Qualitative Variable
Expresses a quality. Categories are either nominal (not ordered) or ordinal (ordered).
Quantitative Variable
Expresses a quantity. Can be discrete (countable values) or continuous (any value within an interval).
Discretization
The process of transforming quantitative variables into qualitative ones, which results in loss of information.
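A minimal sketch of discretization, assuming a hypothetical age variable and made-up cut points (18 and 65); neither the variable nor the thresholds come from the cards. It shows the loss of information: distinct ages collapse into the same category.

```python
def discretize(age):
    """Map a quantitative age to an ordinal category (hypothetical cut points)."""
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"

ages = [5, 17, 18, 40, 64, 65, 90]          # illustrative values
labels = [discretize(a) for a in ages]

for age, label in zip(ages, labels):
    print(age, label)
```

After the transformation, ages 18, 40, and 64 are indistinguishable: all three are simply "adult".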
Absolute Frequency (ni)
Number of times the value xi is repeated.
Relative Frequency (fi)
Proportion of times the value xi is repeated (ni / N).
Absolute Cumulative Frequency (Ni)
Sum of the ni of all values less than or equal to xi.
Relative Cumulative Frequency (Fi)
Sum of the fi of all values less than or equal to xi.
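The four frequencies above can be sketched as a small frequency table. The data values are illustrative, and the variable names (`n`, `f`, `N_cum`, `F_cum`) are just stand-ins for ni, fi, Ni, and Fi.

```python
from collections import Counter
from itertools import accumulate

data = [2, 3, 3, 1, 2, 3, 4, 2, 3, 1]       # illustrative values

counts = Counter(data)
N = len(data)                               # total number of observations
values = sorted(counts)                     # distinct values xi, ascending
n = [counts[x] for x in values]             # absolute frequencies ni
f = [ni / N for ni in n]                    # relative frequencies fi = ni / N
N_cum = list(accumulate(n))                 # absolute cumulative frequencies Ni
F_cum = [Ni / N for Ni in N_cum]            # relative cumulative frequencies Fi

print("xi  ni  fi   Ni  Fi")
for row in zip(values, n, f, N_cum, F_cum):
    print(*row)
```

The last cumulative values act as a check: the final Ni equals N and the final Fi equals 1.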
Graphical Representations
Present information quickly and clearly, but can be misleading if not constructed correctly.
Pie Chart
A circle with a sector for each value, with the angle proportional to its frequency; suitable for nominal qualitative variables.
Bar Chart
A rectangle for each value of the variable, with height equal to its frequency; used to compare categories.
Histogram
A rectangle for each interval, with the area equal to the fraction of data within the interval; used for continuous quantitative variables, or discrete ones with many distinct values.
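Because the area, not the height, carries the frequency, the height of each rectangle is the relative frequency divided by the interval width. A rough sketch, assuming illustrative data and hypothetical interval edges of unequal width:

```python
data = [1, 5, 12, 15, 18, 25, 30, 45, 8, 11]   # illustrative values
edges = [0, 10, 20, 50]                        # hypothetical edges, unequal widths

N = len(data)
counts, widths, heights = [], [], []
for lo, hi in zip(edges[:-1], edges[1:]):
    ni = sum(1 for v in data if lo <= v < hi)  # observations in [lo, hi)
    counts.append(ni)
    widths.append(hi - lo)
    heights.append((ni / N) / (hi - lo))       # height = fi / width

areas = [h * w for h, w in zip(heights, widths)]
print(counts, heights, sum(areas))
```

With this convention the areas sum to 1, and the wide last interval gets a low rectangle even though it holds as many observations as the first.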
Map
Shows the spatial distribution of a characteristic of interest; any variable can be represented on it.
Arithmetic Mean
The sum of all values in the distribution divided by the total number of data points. Only applicable to quantitative variables.
Mode
The value that appears most frequently in a dataset.
Median
The value that divides the ordered distribution into two parts of equal frequency: half of the data lie at or below it and half at or above it.
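The three measures of central tendency can be computed with Python's standard `statistics` module. The data set here is illustrative only.

```python
import statistics

data = [2, 3, 3, 1, 2, 3, 4, 2, 3, 1]   # illustrative values

mean = statistics.mean(data)       # sum of the values divided by their count
mode = statistics.mode(data)       # most frequent value
median = statistics.median(data)   # middle of the ordered values

print(mean, mode, median)
```

With an even number of observations, `statistics.median` averages the two central values, so the median need not be an observed value.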
Quantiles
Values that divide the distribution into intervals of equal frequency.
Quartiles
Three values that divide the distribution into four parts of equal frequency.
Deciles
Nine values that divide the distribution into 10 parts of equal frequency.
Percentiles
Ninety-nine values that divide the distribution into 100 parts of equal frequency.
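Quartiles, deciles, and percentiles are the same idea with a different number of parts. A sketch using `statistics.quantiles` (Python 3.8+), on illustrative data; note that several interpolation conventions exist, so the exact cut points may differ from those produced by other software or by hand.

```python
import statistics

data = [2, 3, 3, 1, 2, 3, 4, 2, 3, 1]   # illustrative values

quartiles = statistics.quantiles(data, n=4)      # 3 cut points, 4 parts
deciles = statistics.quantiles(data, n=10)       # 9 cut points, 10 parts
percentiles = statistics.quantiles(data, n=100)  # 99 cut points, 100 parts

print(len(quartiles), len(deciles), len(percentiles))
print("second quartile:", quartiles[1], "median:", statistics.median(data))
```

The second quartile, the fifth decile, and the fiftieth percentile all coincide with the median.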
Range
Difference between the maximum and minimum values; sensitive to outliers.
Interquartile Range (IQR)
Difference between the third and first quartiles; represents the dispersion of the central 50% of the data.
Variance (S^2)
The average of the squared deviations of the data from the arithmetic mean; measures the dispersion of the data around the mean.
Standard Deviation (S)
Square root of the variance; expressed in the same units as the data.
Coefficient of Variation (CV)
The ratio of the standard deviation to the absolute value of the mean. A dimensionless measure of relative dispersion, allowing comparison of distributions with different units.
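The dispersion measures can be sketched together as follows. The data are illustrative, and one assumption is made that the cards do not state: the variance is computed by dividing by N (`statistics.pvariance`) rather than by N - 1 (`statistics.variance`).

```python
import statistics

data = [4.0, 8.0, 6.0, 5.0, 3.0, 10.0]   # illustrative values

data_range = max(data) - min(data)            # range
q1, _, q3 = statistics.quantiles(data, n=4)   # quartiles (convention-dependent)
iqr = q3 - q1                                 # interquartile range
variance = statistics.pvariance(data)         # S^2, dividing by N (assumption)
std_dev = statistics.pstdev(data)             # S, square root of the variance
cv = std_dev / abs(statistics.mean(data))     # coefficient of variation

print(data_range, iqr, variance, std_dev, cv)
```

The range reacts to a single extreme value, while the interquartile range only depends on the central half of the data.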
Bivariate Descriptive Analysis
Studying two variables together to see if there is any relationship between them.
Scatter Plot
Shows the relationship between two quantitative variables using points on a Cartesian plane.
Marginal Distribution of Y
Expresses how many times each value yj is repeated, regardless of the X value.
Marginal Distribution of X
Expresses how many times each value xi is repeated, regardless of the Y value.
Conditional Distribution
Describes how the values of X are distributed for each value of Y (X|Y = yj) or vice versa (Y|X = xi).
Statistical Independence
Two variables are statistically independent when the joint relative frequency is equal to the product of the marginal relative frequencies.
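A sketch of marginal and conditional distributions and of the independence check, assuming made-up paired observations. The function names `is_independent` and `conditional_x_given_y` are hypothetical helpers written for this illustration.

```python
from collections import Counter
from math import isclose

def is_independent(pairs):
    """True if every joint relative frequency equals the product of the marginals."""
    N = len(pairs)
    joint = Counter(pairs)                    # joint absolute frequencies
    marg_x = Counter(x for x, _ in pairs)     # marginal distribution of X
    marg_y = Counter(y for _, y in pairs)     # marginal distribution of Y
    return all(
        isclose(joint[(x, y)] / N, (marg_x[x] / N) * (marg_y[y] / N))
        for x in marg_x
        for y in marg_y
    )

def conditional_x_given_y(pairs, yj):
    """Relative frequencies of X among the observations with Y == yj."""
    xs = [x for x, y in pairs if y == yj]
    return {x: c / len(xs) for x, c in Counter(xs).items()}

balanced = [("a", 0), ("a", 1), ("b", 0), ("b", 1)] * 2   # illustrative data
linked = [("a", 0)] * 3 + [("b", 1)] * 3                  # X determines Y

print(is_independent(balanced), is_independent(linked))
print(conditional_x_given_y(balanced, 0))
```

In the balanced data the conditional distribution of X is the same for every value of Y, which is another way of seeing the independence.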
Covariance (Sxy)
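A measure of the joint variation of two quantitative variables; its sign indicates the direction of their linear association, and its magnitude depends on the units of measurement.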
Statistical Relationship between Variables
Studying the degree of dependence between variables (correlation) and determining the function that best expresses the relationship (regression).
Linear Correlation Coefficient (r)
A measure of the degree of linear dependence between two variables, ranging from -1 to 1.
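The covariance and the linear correlation coefficient can be computed directly from their definitions. This sketch uses illustrative data and assumes the descriptive convention of dividing by N; the divisor cancels in r, so r is the same under either convention.

```python
from math import sqrt

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # illustrative values
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Covariance Sxy: average product of the deviations from the means
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

# Standard deviations Sx and Sy
s_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
s_y = sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)

# Linear correlation coefficient: covariance rescaled to lie in [-1, 1]
r = s_xy / (s_x * s_y)

print(s_xy, r)
```

Dividing by the two standard deviations removes the units, which is why r can be compared across pairs of variables while the covariance cannot.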
Dependent Variable
The variable whose behavior is to be explained or predicted (Y).
Independent or Explanatory Variable
The variable used to try to explain the behavior of the dependent variable (X).
Regression of Y on X
A function that describes or predicts variable Y from each value of X.
Simple Linear Regression
Regression restricted to linear fits, that is, straight lines: y* = f(x) = a + bx.
Coefficient of Determination (r^2)
Indicates the proportion (often expressed as a percentage) of the variability of Y explained by the fitted model; a measure of how well the model fits.
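A sketch of fitting the line y* = a + bx and computing the coefficient of determination. It assumes the usual least-squares criterion for the "best" line, which the cards do not name explicitly, and uses illustrative data.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # illustrative values
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n  # covariance
s_xx = sum((xi - mean_x) ** 2 for xi in x) / n                         # variance of X

# Least-squares slope and intercept (assumed fitting criterion)
b = s_xy / s_xx
a = mean_y - b * mean_x

# Coefficient of determination: 1 - (unexplained variation / total variation)
y_hat = [a + b * xi for xi in x]
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(a, b, r_squared)
```

For a simple linear fit, this value equals the square of the linear correlation coefficient r, which is why it is written r^2.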