Definition: Curve fitting is the construction of a function that represents data provided only at discrete values along a continuum.
Primary Objectives:
* Estimation: To estimate points between existing discrete data values.
* Intermediate Estimates: To fit curves to data specifically to obtain these intermediate values.
* Simplification: To simplify a mathematically complicated function using a simpler function. This is achieved by computing values of the complex function at various discrete points along a range, and then computing a simpler function that fits those discrete values.
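The estimation and simplification objectives can be sketched together in a few lines of Python. This is a minimal illustration, not a prescribed method: the function `complicated`, the range 0 to 2, and the 21 sample points are assumptions chosen only for the example, and joining neighbouring samples with straight lines is just one simple choice of "simpler function".

```python
import math

def tabulate(f, a, b, n):
    """Sample f at n equally spaced points on [a, b]."""
    step = (b - a) / (n - 1)
    xs = [a + i * step for i in range(n)]
    return xs, [f(x) for x in xs]

def linear_estimate(xs, ys, x):
    """Estimate y at x by joining neighbouring samples with straight lines."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the tabulated range")
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])

# Hypothetical "complicated" function, chosen only for illustration.
def complicated(x):
    return math.exp(-x) * math.sin(3 * x)

# Discrete values along the range, then an estimate between two of them.
xs, ys = tabulate(complicated, 0.0, 2.0, 21)
approx = linear_estimate(xs, ys, 0.73)
exact = complicated(0.73)
print(f"approx={approx:.4f} exact={exact:.4f}")
```

Once the table exists, `linear_estimate` never calls `complicated` again, which is the point of the simplification objective.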
General Approaches to Curve Fitting
Least-Squares Regression:
* Used when the data contain significant error or scatter.
* The goal is to identify a single curve that represents the general trend or pattern of the data rather than hitting every point exactly.
Interpolation:
* Used when data are considered very precise.
* The goal is to find a curve that passes directly through every single data point in the set.
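The contrast between the two approaches can be seen by applying both to the same points. The sketch below assumes a straight line for the least-squares fit and a Lagrange polynomial for the interpolation; both are illustrative choices, and the data values are made up for the example.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = a0 + a1*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a0 = (sy - a1 * sx) / n
    return a0, a1

def lagrange(xs, ys, x):
    """Evaluate the polynomial that passes through every (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Made-up sample data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a0, a1 = fit_line(xs, ys)

# Regression follows the trend, so it misses individual points slightly.
line_residuals = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]

# Interpolation reproduces every data point.
interp_residuals = [y - lagrange(xs, ys, x) for x, y in zip(xs, ys)]

print("line residuals:  ", [round(r, 3) for r in line_residuals])
print("interp residuals:", [round(r, 3) for r in interp_residuals])
```

The line leaves small nonzero residuals at the data points, while the interpolating polynomial leaves none.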
Descriptive Statistics and Data Summary
Definition: Descriptive statistics are summary statistics that quantitatively describe or summarize features from a collection of information.
Three Main Types of Descriptive Statistics:
1. Central Tendency: Information concerning the averages of the values.
2. Variability or Dispersion: Information concerning how spread out the values are.
3. Distribution: Information concerning the frequency of each specific value.
Measures of Central Tendency
Arithmetic Mean ($\mu$, $\bar{y}$):
* Calculated as the sum of the individual data points ($y_i$) divided by the number of points ($n$).
* Formula: $\bar{y} = \dfrac{\sum y_i}{n}$
Median:
* The midpoint of a group of data (the 50th percentile).
* Calculation Process:
1. Arrange the data in ascending order.
2. If n is odd, the median is the middle value.
3. If n is even, the median is the arithmetic mean of the two middle values.
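The three steps of the calculation process map directly onto a short function. This is a minimal sketch; the name `median_by_steps` and the empty-input check are additions for the example.

```python
def median_by_steps(values):
    """Median computed by sorting, then picking or averaging the middle."""
    ordered = sorted(values)              # step 1: ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                        # step 2: odd n, middle value
        return ordered[mid]
    # step 3: even n, mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_steps([7, 1, 5]))       # 5
print(median_by_steps([7, 1, 5, 3]))    # 4.0
```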
Mode:
* The value that occurs most frequently within the data set.
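All three measures of central tendency are available in Python's standard `statistics` module. The sketch below uses made-up values purely for illustration. It calls `multimode` rather than `mode` because a data set can have more than one most frequent value.

```python
import statistics

# Made-up measurements, for illustration only.
data = [6.5, 6.6, 6.6, 6.7, 6.6, 6.4, 6.7, 6.5]

mean_value = statistics.mean(data)       # sum of values divided by n
median_value = statistics.median(data)   # midpoint of the sorted data
modes = statistics.multimode(data)       # every most frequent value

print(mean_value, median_value, modes)

# A data set with a tie has two modes.
tied_modes = statistics.multimode([1, 1, 2, 2, 3])
print(tied_modes)
```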
Measures of Variability (Spread)
Range: The difference between the largest value and the smallest value in the data set.
Standard Deviation ($s_y$):
* Measures the typical amount of variability in the data: roughly how far, on average, each data point lies from the mean.
* A larger standard deviation indicates the data set is more variable (spread out widely around the mean).
* A smaller standard deviation indicates data points are grouped tightly around the mean.
* Formula: $s_y = \sqrt{\dfrac{S_t}{n-1}}$
* Variable definition ($S_t$): The total sum of the squares of the residuals between the data points and the mean.
* $S_t = \sum (y_i - \bar{y})^2$
Variance: The square of the standard deviation ($s_y^2$).
* General Formula: $s_y^2 = \dfrac{\sum (y_i - \bar{y})^2}{n-1}$
* Computational Formula (does not require predetermining $\bar{y}$): $s_y^2 = \dfrac{\sum y_i^2 - \left(\sum y_i\right)^2 / n}{n-1}$
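The formulas for the sum of squared residuals, the standard deviation, and the two forms of the variance can be checked against each other numerically. This is a minimal sketch; the function name `spread_stats`, the dictionary keys, and the sample values are assumptions made for the example.

```python
import math

def spread_stats(ys):
    """Return St, sy and the variance computed by both formulas."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two data points")
    y_bar = sum(ys) / n
    s_t = sum((y - y_bar) ** 2 for y in ys)      # sum of squared residuals
    variance = s_t / (n - 1)                     # general formula
    # Computational formula: uses only the running sums, not the mean.
    variance_alt = (sum(y * y for y in ys) - sum(ys) ** 2 / n) / (n - 1)
    return {
        "St": s_t,
        "sy": math.sqrt(variance),
        "variance": variance,
        "variance_alt": variance_alt,
    }

# Made-up values, for illustration only.
stats = spread_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(stats)
```

Both variance formulas agree here. In floating-point arithmetic the computational form can lose precision when the mean is large compared with the spread, because it subtracts two nearly equal numbers.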
Degrees of Freedom:
* The quantity $n-1$ is called the degrees of freedom.
* $S_t$ and $s_y$ are based on $n-1$ degrees of freedom.
* Justification for $n-1$:
* The residuals on which $S_t$ is based always sum to zero: $(y_1-\bar{y}) + (y_2-\bar{y}) + \cdots + (y_n-\bar{y}) = 0$. If $\bar{y}$ and $n-1$ of the $y_i$ values are known, the final value is therefore predetermined. Thus, only $n-1$ values are freely determined.
* The spread of a single data point ($n=1$) does not exist; applying the formula for $n=1$ gives $0/0$, which is undefined.
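Both justifications can be checked numerically. The sketch below uses made-up values. It relies on the standard `statistics.stdev` function, which refuses to compute a sample standard deviation from fewer than two points.

```python
import statistics

# Made-up values, for illustration only.
ys = [6.4, 6.5, 6.6, 6.7, 6.8]
n = len(ys)
y_bar = sum(ys) / n

# The residuals about the mean sum to zero (up to rounding error).
residual_sum = sum(y - y_bar for y in ys)

# Knowing the mean and the first n-1 values fixes the last value.
recovered_last = n * y_bar - sum(ys[:-1])

# A single data point has no spread: the sample formula is undefined.
try:
    statistics.stdev([6.6])
    single_point_ok = True
except statistics.StatisticsError:
    single_point_ok = False

print(residual_sum, recovered_last, single_point_ok)
```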
Coefficient of Variation (c.v.):
* The ratio of the standard deviation to the mean, providing a normalized measure of spread.
* Formula (expressed as percentage): $\text{c.v.} = \dfrac{s_y}{\bar{y}} \times 100\%$
Detailed Simple Statistics Example
Data Summary (n = 24 entries):
* Sum of values ($\sum y_i$): 158.400
* Sum of squared residuals ($\sum (y_i - \bar{y})^2$): 0.21700
* Sum of squares ($\sum y_i^2$): 1045.657
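The summary values above are enough to evaluate each of the statistics defined earlier. The sketch below applies the formulas directly; the variable names are chosen for the example, and only the three sums and n come from the data summary.

```python
import math

# Values taken from the data summary.
n = 24
sum_y = 158.400            # sum of values
sum_sq_resid = 0.21700     # sum of squared residuals (St)
sum_y_sq = 1045.657        # sum of squares

y_bar = sum_y / n                          # arithmetic mean
variance = sum_sq_resid / (n - 1)          # general variance formula
s_y = math.sqrt(variance)                  # standard deviation
cv_percent = s_y / y_bar * 100             # coefficient of variation

# Computational variance formula, which does not need the mean.
variance_alt = (sum_y_sq - sum_y ** 2 / n) / (n - 1)

print(f"mean      = {y_bar:.3f}")
print(f"std dev   = {s_y:.6f}")
print(f"variance  = {variance:.6f}")
print(f"variance (computational) = {variance_alt:.6f}")
print(f"c.v.      = {cv_percent:.2f}%")
```

With these inputs the mean is 6.6, the standard deviation is about 0.0971, the variance is about 0.00943 by either formula, and the coefficient of variation is about 1.47%.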
Data Distribution
Definition: Data distribution describes the shape with which the data are spread around the mean.
Histogram: A graphical representation constructed by sorting measurements into specific intervals or "bins."
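Sorting measurements into bins amounts to counting how many values fall in each interval. The sketch below assumes equal-width bins and made-up data. The function name `histogram` and its parameters are illustrative, and counting the top edge in the last bin is one common convention, not the only one.

```python
def histogram(values, low, width, bins):
    """Count the values falling in each of `bins` equal-width intervals.

    Bin k covers [low + k*width, low + (k+1)*width); the upper edge of
    the whole range is counted in the last bin.
    """
    counts = [0] * bins
    high = low + bins * width
    for v in values:
        if v < low or v > high:
            continue                      # ignore values outside the range
        k = min(int((v - low) / width), bins - 1)
        counts[k] += 1
    return counts

# Made-up measurements, for illustration only.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 8, 10]
counts = histogram(data, low=0, width=2, bins=5)

# Print a simple text version of the histogram.
for k, c in enumerate(counts):
    print(f"{2 * k:>2} to {2 * k + 2:<2} | {'#' * c}")
```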
Normal Distribution:
* Also known as a Gaussian distribution or bell curve.
* Characteristics:
* Data are symmetrically distributed about the mean, with no skew.
* Follows a bell shape when plotted.
* Most values cluster around a central region.
* Values taper off as they move further away from the center.
* The Empirical Rule: Also known as the 68-95-99.7 rule. Approximately 68%, 95%, and 99.7% of the data fall within 1, 2, and 3 standard deviations of the mean, respectively.
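The percentages in the empirical rule can be recovered from the normal cumulative distribution function. This is a minimal sketch using `NormalDist` from Python's standard `statistics` module. The helper name `fraction_within` is an assumption for the example, and a standard normal (mean 0, standard deviation 1) is used because the fractions do not depend on the particular mean or spread.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def fraction_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

fractions = {k: fraction_within(k) for k in (1, 2, 3)}

for k, frac in fractions.items():
    print(f"within {k} standard deviation(s): {frac * 100:.2f}%")
```

The computed fractions are approximately 68.27%, 95.45%, and 99.73%, which round to the values that give the rule its name.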