Page 2
Assessing the Validity of Data
DATA VALIDATION is important because the subsequent discussion of synthesized information, including the formulation of conclusions and recommendations, depends heavily on it.
Page 3
Data Analysis
A systematic process of extracting information from raw data to assist in the interpretation and discussion of that information. It is performed so that the data are organized for the effective derivation of explanations of the information being presented.
Page 4
Data Analysis
Generally, the analysis of data entails the following:
- Comparing data with existing information derived from previous studies.
- Testing the consistency of data via repeated measurements to establish accuracy and precision.
- Assessing the robustness (reproducibility) of the methodologies adopted for data collection.
- Determining the bias or error of the methods used.
- Evaluating how the choice of method (its selectivity and sensitivity) may contribute bias to the results or data generated.
- Examining the limitations and the various possibilities that the extracted information could offer.
- Correlating results representing the variables tested.
Page 5
Data Analysis
The validity of data may be gauged by the extent of the following:
- Scope and coverage of the data-collection process.
- Similarities and inconsistencies of data with results generated from previous studies.
- Environmental and circumstantial concerns.
- Compliance with ethical standards.
- Relevance and originality of data.
- Usability and accessibility of data.
Page 6
STATISTICAL TOOLS VALUABLE TO DATA EXAMINATION
Data have no significance unless the uncertainty associated with each measurement from which the data were obtained is established (Miller, 1988). Statistical methods enable the verification of the uncertainty accompanying each measurement made during data collection.
Page 7
STATISTICAL TOOLS VALUABLE TO DATA EXAMINATION
Replicates - individual samples of the same size that are taken for analysis and treated in the same manner.
Page 9
Median
The middle value in a group of values when all the data are arranged in increasing or decreasing order. It is usually reported instead of the mean when the data set contains an outlier.
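To illustrate why the median is preferred when an outlier is present, here is a minimal Python sketch using the standard `statistics` module. The readings are hypothetical values made up for illustration; the last one is a deliberate outlier.

```python
from statistics import mean, median

# Hypothetical replicate readings (assumed data); 15.0 is a deliberate outlier.
readings = [10.1, 10.3, 10.2, 10.4, 15.0]

# The mean is pulled toward the outlier, while the median stays
# at the middle value of the sorted data.
print("mean:", mean(readings))
print("median:", median(readings))

# With an even number of values, the median is the average of the two middle values.
print("median of 1, 2, 3, 4:", median([1, 2, 3, 4]))
```

Here the mean is about 11.2, while the median remains 10.3, which is closer to where most of the readings lie.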
Page 10
Mode
The value that occurs most often in a data set. It can also be used for non-numerical values, such as colors.
EXAMPLE:
Data set: 2, 9, 3, 12, 2, 4, 2, 5
Mode = most frequent value
The observation 2 occurs three times.
Mode = 2
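The mode example above can be checked with Python's standard `statistics` module. The color list is a hypothetical addition to show the non-numerical case.

```python
from statistics import mode, multimode

# Data set from the slide example.
data = [2, 9, 3, 12, 2, 4, 2, 5]
print(mode(data))  # 2

# Hypothetical non-numerical data (assumed for illustration).
colors = ["red", "blue", "red", "green"]
print(mode(colors))  # red

# multimode returns every value tied for most frequent.
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```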
Page 11
STATISTICAL TOOLS VALUABLE TO DATA EXAMINATION
It is also necessary to detect and quantify the errors that accompany the measurements, which establishes precision and accuracy.
Precision
- characterized by the standard deviation, variance, and coefficient of variation (also called the percent relative standard deviation).
- describes the closeness of analytical data to one another.
Page 12
Standard Deviation
It measures the dispersion or spread of data points around the mean. It indicates how closely the values are clustered together. A smaller standard deviation indicates that the data points are close to the mean and to each other.
Page 13
Standard Deviation
Page 14
Variance is the square of the standard deviation and therefore also measures the spread of the data.
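The three precision measures named earlier (standard deviation, variance, and coefficient of variation) can be sketched in Python as below. This assumes the sample formulas with n - 1 in the denominator, which is what `statistics.stdev` and `statistics.variance` compute; the slides do not say which form is intended. The replicate values are hypothetical.

```python
from statistics import mean, stdev, variance

# Hypothetical replicate measurements (assumed data).
replicates = [10.1, 10.3, 10.2, 10.4, 10.0]

s = stdev(replicates)            # sample standard deviation (n - 1 denominator)
var = variance(replicates)       # sample variance, equal to s squared
cv = s / mean(replicates) * 100  # coefficient of variation, in percent

print(f"standard deviation = {s:.4f}")
print(f"variance = {var:.4f}")
print(f"coefficient of variation = {cv:.2f}%")
```

A small coefficient of variation (here about 1.55%) indicates that the replicates are tightly clustered relative to their mean.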
Page 16
TRUE VALUE
The true value can come from measurements involving a certified reference material (CRM), as its composition has been certified and is indicated in a certificate of analysis. To quantify accuracy, the relative error is computed as follows:
Relative error = |experimental value - true value| / true value x 100%
Statistical tests are applied to help researchers decide what to do when a suspected outlier is encountered. An outlier is a value that appears to be excessively different from the rest of the data set.
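Returning to the relative error formula on this slide, a minimal Python sketch is shown below. The function name and the example values are hypothetical, and the sketch assumes a positive, non-zero true value.

```python
def relative_error_percent(experimental: float, true_value: float) -> float:
    """Relative error in percent: |experimental - true| / true x 100."""
    if true_value == 0:
        raise ValueError("true value must be non-zero")
    return abs(experimental - true_value) / true_value * 100


# Hypothetical example: a CRM certified at 10.0 units is measured as 9.8 units.
error = relative_error_percent(9.8, 10.0)
print(f"relative error = {error:.1f}%")
```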
Page 17
Dixon's Q Test
Designed to identify outliers in a data set. The test provides a statistical method to determine whether a suspected outlier is significantly different from the other data points and should be rejected.
Page 18
Dixon's Q Test
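The formula on this slide did not survive extraction, so the sketch below assumes the commonly used form of the statistic: Q = gap / range, where the gap is the distance between the suspected value and its nearest neighbor, and the range is the largest value minus the smallest. The function name and readings are hypothetical.

```python
def dixon_q(values):
    """Return (suspect, Q) for the most extreme value, using Q = gap / range."""
    data = sorted(values)
    if len(data) < 3:
        raise ValueError("need at least three values")
    data_range = data[-1] - data[0]
    if data_range == 0:
        raise ValueError("all values are identical")
    gap_low = data[1] - data[0]     # gap between the smallest value and its neighbor
    gap_high = data[-1] - data[-2]  # gap between the largest value and its neighbor
    if gap_high >= gap_low:
        return data[-1], gap_high / data_range
    return data[0], gap_low / data_range


# Hypothetical readings (assumed data); 15.0 is the suspected outlier.
readings = [10.1, 10.3, 10.2, 10.4, 15.0]
suspect, q = dixon_q(readings)
print(f"suspect = {suspect}, Q = {q:.3f}")
```

The calculated Q is then compared with the critical Q from a table for the number of observations and the chosen confidence level. A commonly tabulated critical value for five observations at 95% confidence is 0.710; check it against the table used in your course. If the calculated Q exceeds the critical Q, the suspected value may be rejected.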
Page 19
T-test or Independent Sample t-test
Used to compare the means of two independent groups to determine if there is a statistically significant difference between them.
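A minimal sketch of the independent sample t-test is given below. It assumes the pooled-variance (equal variance) form of the statistic, which may differ from the variant used in the lecture. The function name and the two data sets are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: average of the two sample variances, weighted by degrees of freedom.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical results from two independent groups (assumed data).
group_a = [10.1, 10.3, 10.2, 10.4, 10.0]
group_b = [10.6, 10.8, 10.7, 10.9, 10.5]

t, df = pooled_t(group_a, group_b)
print(f"t = {t:.2f}, degrees of freedom = {df}")
```

The absolute value of t is compared with the critical t from a table. For 8 degrees of freedom at the 95% confidence level (two-tailed), the tabulated value is 2.306, so a |t| of about 5 would indicate a statistically significant difference between the two means.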
Page 20
Correlation Analysis vs. Regression Analysis
Correlation analysis measures the strength and direction of the linear relationship between two variables. Regression analysis can also be used to assess the relationship and to predict the dependent variable based on the independent variable.
Page 21
Regression Analysis
It can be used to model and analyze cause-and-effect relationships between variables. It allows researchers to examine how changes in the independent variable(s) affect the dependent variable(s).
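To make the distinction concrete, the sketch below computes both the Pearson correlation coefficient r (correlation analysis) and a least-squares line y = slope * x + intercept (simple linear regression) for the same data. The function name and the concentration and signal values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r_and_line(x, y):
    """Return (r, slope, intercept) for paired data x and y."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)    # strength and direction of the linear relationship
    slope = sxy / sxx            # change in y per unit change in x
    intercept = my - slope * mx  # predicted y when x is zero
    return r, slope, intercept


# Hypothetical calibration data (assumed): concentration vs. instrument signal.
concentration = [1, 2, 3, 4, 5]
signal = [2.1, 3.9, 6.2, 7.8, 10.0]

r, slope, intercept = pearson_r_and_line(concentration, signal)
print(f"r = {r:.4f}")
print(f"signal = {slope:.2f} * concentration + {intercept:.2f}")

# Regression allows prediction of the dependent variable for a new x value.
predicted = slope * 6 + intercept
print(f"predicted signal at concentration 6: {predicted:.2f}")
```

The value of r only summarizes how strongly the two variables move together, while the fitted line can be used for prediction.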
Page 22
Statistical Methods
Page 23
Statistical Methods
HYPOTHESIS - formulated to warrant a deeper investigation of a particular subject matter. Data are gathered to find pieces of evidence that will support the hypothesis.
Page 24
Testing starts with the assumption that two data sets are the same, as stated in the null hypothesis.
Testing includes the following:
- Comparing the mean of the data set to a known or true value.
- Comparing two means via a t-test or paired t-test for the purpose of:
  - Determining whether the difference in the two means is due to the presence of random errors or not.
  - Determining whether two analytical methods give the same results or not.
  - Determining whether two researchers employing the same data collection strategies give the same mean or not.
- Comparing the respective standard deviations of two populations via an F-test.
- Analysis of variance (ANOVA).
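As an illustration of the F-test item in the list above, the sketch below computes F as the ratio of two sample variances, with the larger variance placed in the numerator so that F is at least 1. This convention is an assumption; the slides do not give the formula. The function name and data are hypothetical.

```python
from statistics import variance


def f_ratio(a, b):
    """Return F = larger sample variance / smaller sample variance."""
    var_a, var_b = variance(a), variance(b)
    larger, smaller = max(var_a, var_b), min(var_a, var_b)
    if smaller == 0:
        raise ValueError("a sample variance of zero makes F undefined")
    return larger / smaller


# Hypothetical results from two analytical methods (assumed data).
method_a = [10.1, 10.3, 10.2, 10.4, 10.0]
method_b = [10.0, 10.4, 10.2, 10.6, 9.8]

f_value = f_ratio(method_a, method_b)
print(f"F = {f_value:.2f}")
```

The calculated F is compared with the critical F from a table for the two sets of degrees of freedom. If the calculated F is smaller than the critical F, the null hypothesis that the two standard deviations are the same is retained.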
Page 25
ANOVA
ANOVA can be applied to test the following:
- There is no significant difference in the results of the water samples when two different analytical methods (e.g., Atomic Absorption Spectroscopy or AAS and Inductively Coupled Plasma or ICP) are employed.
- There is no significant difference in the results of the water samples produced by three different researchers using the same analytical techniques.
Page 26
An ANOVA table
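The entries of a one-way ANOVA table (sums of squares, degrees of freedom, mean squares, and F) can be sketched as below. The example follows the three-researcher scenario, but the function name and all numbers are hypothetical, and the table layout used in the lecture may differ.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table as a dictionary."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-groups variation: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups variation: how far each value is from its own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Hypothetical results from three researchers analyzing the same water sample.
researcher_1 = [10.1, 10.3, 10.2]
researcher_2 = [10.2, 10.4, 10.3]
researcher_3 = [10.0, 10.2, 10.1]

table = one_way_anova([researcher_1, researcher_2, researcher_3])
print(f"between groups: SS = {table['ss_between']:.3f}, df = {table['df_between']}")
print(f"within groups:  SS = {table['ss_within']:.3f}, df = {table['df_within']}")
print(f"F = {table['f']:.2f}")
```

With these numbers F is about 3.0. The tabulated critical F for 2 and 6 degrees of freedom at the 0.05 significance level is 5.14, so the null hypothesis of no significant difference among the three researchers would not be rejected.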
Page 27
Presentation of Results
DATA VISUALIZATION TECHNIQUE - the technique employs graphics to check the relationships that could exist between and within the variables tested.
INFERENTIAL STATISTICS - serving as statistics, graphical formats are prepared to gain a deeper understanding of the causal relationships and associations within variables in an illustrative manner.
Page 28
GRAPHS - strongly support or negate the claims you make in the discussion.
Data display techniques may illustrate the following specific relationships:
- Dependence of some factors on time (temporal deviation) or rate of occurrence of a phenomenon
- Ranking among variables
- Distribution or homogeneity
- Extent of deviation from a known standard
- Correlations
- Spatial relationship
- Frequency of occurrence
Page 29
Graphs
Your choice of graphics depends on the nature of the relationship or association you would like to communicate.
Use computer programs such as:
- Microsoft Word
- Microsoft Excel
- Origin
Page 30
Line Graph
This illustrates the changes in a variable as a function of time, or any relationship that would indicate trends or patterns.
Page 31
Bar Graph
This consists of rectangular blocks, the height of each reflecting the frequency that it represents.
Page 32
Histogram
This is a variation of a bar graph in which the rectangular blocks are drawn next to one another, without spaces in between, along the horizontal axis.
Page 33
Pie Chart
This resembles a round cake, each slice of which corresponds to a fraction of the data or its distribution.
Page 34
Scattergram or Scatterplot
This visually demonstrates the changes in one variable as a function of the change in the other variable. Data points are usually represented with dots, and the dots are not connected to one another.
Page 35
Aside from graphs, tables are also a valuable tool for representing data. Data are written in rows and columns, forming a tabular format.
Tables can be classified as:
1. Univariate - table contains only one kind of information, pertinent to only one variable.
2. Bivariate - table gives information about two variables.
3. Multivariate - table contains information about more than two variables.