AAUACSCI 111 Introduction to Statistics for Computer Science
1. Introduction to Computer Science & Statistics
1.1 Meaning of Computer Science
Computer Science is the study of:
Computation
Algorithms
Data
Computer systems
It focuses on how data is:
Collected
Processed
Stored
Transformed into useful information using computers
1.2 Meaning of Statistics
Statistics is the science of:
Collecting
Organizing
Analyzing
Interpreting
Presenting data
Purpose: To support decision-making
1.3 Relationship Between Computer Science and Statistics
Computer Science and Statistics are closely related because:
Computer systems generate large volumes of data
Statistics provides tools to analyze and interpret this data
Many modern CS fields rely heavily on statistical concepts, including:
Artificial Intelligence (AI)
Machine Learning (ML)
Data Science
Software Engineering
Examples of relationships:
Performance evaluation of algorithms
Software reliability and defect prediction
Data mining and machine learning models
1.4 Importance of Statistics in Computing
Supports decision making under uncertainty
Aids in analyzing experimental and observational data
Enables prediction and trend analysis
Improves system optimization and quality control
1.5 Applications of Statistics in Computer Science
Artificial Intelligence and Machine Learning
Data Science and Big Data Analytics
Software Engineering (testing, reliability analysis)
Network traffic analysis
Cybersecurity and fraud detection
2. Fundamentals of Statistics
2.1 Concept of Statistics
Statistics can be viewed in two ways:
As numerical data: Facts and figures obtained through observation
As a discipline: A body of methods used to collect, analyze, and interpret data
2.2 Definitions of Statistics
Statistics is defined as:
“The science of collecting, organizing, presenting, analyzing, and interpreting data to aid decision making.”
2.3 Types of Statistics
Statistics is broadly divided into two main types:
(a) Descriptive Statistics
Deals with methods used to summarize and describe data.
Examples include:
Tables
Charts and graphs
Measures of central tendency (mean, median, mode)
Measures of dispersion (range, variance, standard deviation)
(b) Inferential Statistics
Involves drawing conclusions or making predictions about a population based on sample data.
Examples include:
Estimation
Hypothesis testing
Regression and correlation
3. Statistical Data
3.1 Concept of Data
Data refers to raw facts, figures, observations, or measurements collected for analysis.
3.2 Categories of Data
Data can be categorized into two types:
(a) Qualitative Data
Non-numerical data
Describes attributes or characteristics.
Examples:
Gender
Color
Software type
(b) Quantitative Data
Numerical data
Can be measured or counted.
Examples:
Age
Number of users
Execution time
3.3 Types of Data
Quantitative data can be further classified as discrete or continuous:
(a) Discrete Data
Countable values
Examples:
Number of students
Number of errors
(b) Continuous Data
Measurable values within a range
Examples:
Time
Temperature
Memory usage
4. Data Collection
4.1 Meaning of Data Collection
Data collection is the process of gathering information for analysis and decision-making.
4.2 Sources of Statistical Data
(a) Primary Data
Collected firsthand by the researcher.
Examples:
Surveys
Interviews
Experiments
Observations
(b) Secondary Data
Collected by others and reused.
Examples:
Journals
Government reports
Databases
Online repositories
4.3 Methods of Statistical Data Collection
Questionnaires
Interviews
Direct observation
Experiments
Automated data logging (sensors, software logs)
5. Presentation of Data
5.1 Meaning of Data Presentation
Data presentation is the process of organizing data in a meaningful way so that it can be easily understood and interpreted.
5.2 Techniques of Data Presentation
(a) Tabular Presentation
Data is arranged in rows and columns
Simple and easy to understand
(b) Graphical Presentation
Bar charts
Pie charts
Histograms
Line graphs
(c) Diagrammatic Presentation
Pictograms
Flow diagrams
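As a rough illustration of tabular and graphical presentation, the Python sketch below builds a frequency table from a small list of hypothetical grades (the data is invented for illustration) and prints a simple text bar next to each count.

```python
from collections import Counter

# Hypothetical grades for nine students (illustrative values only).
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B"]

# Tabular presentation: count how often each grade occurs.
frequency = Counter(grades)

print(f"{'Grade':<6}{'Count':>5}  Bar")
for grade in sorted(frequency):
    count = frequency[grade]
    # Graphical presentation in text form: one '#' per observation.
    print(f"{grade:<6}{count:>5}  {'#' * count}")
```

The same counts could be passed to a plotting library to draw a bar chart or pie chart; the text bars are used here only to keep the sketch self-contained.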
5.3 Importance of Data Presentation
Makes data easy to understand
Highlights trends and patterns
Aids comparison
Supports effective decision making
6. Measures of Central Tendency
6.1 Meaning of Central Tendency
Measures of central tendency are statistical values that represent the center or typical value of a dataset.
They help summarize large datasets with a single representative figure.
6.2 Types of Measures of Central Tendency
(a) Mean
The mean is the arithmetic average of a set of values.
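Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_1, \dots, x_n$ are the observed values and $n$ is the number of observations.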
Applications in Computer Science:
Average execution time of algorithms
Average response time of systems
Advantages:
Easy to compute
Uses all observations
Disadvantages:
Affected by extreme values (outliers)
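As a minimal sketch of the mean and its sensitivity to outliers, the Python snippet below averages a set of hypothetical execution times (the values are invented for illustration) using the standard `statistics` module.

```python
import statistics

# Hypothetical execution times in milliseconds (illustrative values only).
times_ms = [12, 15, 11, 14, 13]

# Arithmetic mean: sum of the values divided by how many there are.
mean_time = statistics.mean(times_ms)  # (12 + 15 + 11 + 14 + 13) / 5
print(mean_time)  # 13

# Adding a single extreme value (an outlier) pulls the mean upward.
times_with_outlier = times_ms + [95]
mean_with_outlier = statistics.mean(times_with_outlier)  # 160 / 6, about 26.67
print(mean_with_outlier)
```

One slow run roughly doubles the average here, even though the other five runs are unchanged.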
(b) Median
The median is the middle value when data is arranged in ascending or descending order.
Applications:
Used when data contains extreme values
System performance analysis with skewed data
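The following Python sketch, using hypothetical response times, suggests why the median is often preferred for skewed data: one extreme value moves the mean a long way but leaves the median close to the typical values.

```python
import statistics

# Hypothetical response times in milliseconds; 950 is an extreme value.
response_ms = [20, 22, 21, 23, 950]

# The median is the middle value once the data is sorted: 20, 21, 22, 23, 950.
median_ms = statistics.median(response_ms)
mean_ms = statistics.mean(response_ms)

print(median_ms)  # 22
print(mean_ms)    # 207.2
```

For an even number of values, `statistics.median` returns the average of the two middle values.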
(c) Mode
The mode is the value that occurs most frequently in a dataset.
Applications:
Most frequent error type
Most common user choice in applications
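As an illustrative sketch, the mode of a list of hypothetical error labels (the names are made up for this example) can be found in Python with `statistics.mode`, which also works on non-numerical data.

```python
import statistics
from collections import Counter

# Hypothetical error types recorded in a log (illustrative labels only).
error_types = ["timeout", "null_pointer", "timeout", "io_error", "timeout", "io_error"]

# The mode is the value that occurs most frequently.
most_common_error = statistics.mode(error_types)
print(most_common_error)  # timeout

# Counter gives the full frequency count behind that answer.
counts = Counter(error_types)
print(counts["timeout"])  # 3
```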
7. Measures of Dispersion
7.1 Meaning of Dispersion
Measures of dispersion show how spread out or scattered data values are around the central value.
7.2 Types of Measures of Dispersion
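(a) Range
The range is the difference between the largest and smallest values in a dataset.
Simple to compute, but it uses only the two extreme values and does not consider the rest of the data.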
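The range is conventionally computed as the largest value minus the smallest. A minimal Python sketch with hypothetical latency values:

```python
# Hypothetical network latencies in milliseconds (illustrative values only).
latencies_ms = [18, 25, 21, 40, 19]

# Range = largest value - smallest value; only two observations are used.
data_range = max(latencies_ms) - min(latencies_ms)
print(data_range)  # 22
```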
(b) Variance
Variance measures the average squared deviation from the mean.
Applications:
Performance stability analysis
System reliability assessment
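The sketch below computes the variance of a small hypothetical dataset step by step and then checks the result against the standard `statistics` module. It assumes the data is the whole population (dividing by n); for a sample, the usual convention is to divide by n - 1 instead, which `statistics.variance` does.

```python
import statistics

# Hypothetical measurements (illustrative values only).
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(values)  # 40 / 8 = 5

# Squared deviation of each value from the mean.
squared_deviations = [(x - mean_value) ** 2 for x in values]

# Population variance: average of the squared deviations (divide by n).
population_variance = sum(squared_deviations) / len(values)
print(population_variance)  # 4.0

# Library equivalents: pvariance divides by n, variance divides by n - 1.
library_population_variance = statistics.pvariance(values)
sample_variance = statistics.variance(values)  # 32 / 7, about 4.57
```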
(c) Standard Deviation
Standard deviation is the square root of variance.
Advantages:
Uses all data values
Widely used in computing and data science
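As a hedged illustration of how standard deviation can describe performance stability, the Python sketch below compares two hypothetical servers (names and values invented for this example) that have the same mean response time but very different spread.

```python
import math
import statistics

# Hypothetical response times in milliseconds for two servers.
server_a = [98, 102, 100, 100]   # values stay close to the mean
server_b = [50, 150, 50, 150]    # values swing widely around the mean

# Both servers have the same mean response time.
mean_a = statistics.mean(server_a)  # 100
mean_b = statistics.mean(server_b)  # 100

# Population standard deviation = square root of the population variance.
sd_a = statistics.pstdev(server_a)  # sqrt(2), about 1.41
sd_b = statistics.pstdev(server_b)  # 50.0

print(mean_a, mean_b)
print(sd_a, sd_b)

# The relationship to variance can be checked directly.
assert math.isclose(sd_b, math.sqrt(statistics.pvariance(server_b)))
```

Because standard deviation is in the same unit as the data (milliseconds here), it is usually easier to interpret than variance.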
8. Probability and Its Applications
8.1 Meaning of Probability
Probability is the measure of the likelihood that an event will occur.
Probability values range from 0 to 1.
8.2 Basic Probability Concepts
Experiment: An activity with observable outcomes
Sample Space: Set of all possible outcomes
Event: A subset of the sample space
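These three concepts can be sketched in Python with sets. The example assumes a fair six-sided die, so every outcome is equally likely and the probability of an event is the number of outcomes in the event divided by the size of the sample space; the helper `probability` is a hypothetical name introduced only for this illustration.

```python
from fractions import Fraction

# Experiment: rolling one fair six-sided die (assumed equally likely outcomes).
sample_space = {1, 2, 3, 4, 5, 6}

# Event: a subset of the sample space, here "the roll is even".
even_event = {outcome for outcome in sample_space if outcome % 2 == 0}


def probability(event, space):
    """Probability of an event when all outcomes in space are equally likely."""
    return Fraction(len(event & space), len(space))


p_even = probability(even_event, sample_space)
print(p_even)  # 1/2

# The impossible event has probability 0 and the certain event has probability 1.
p_impossible = probability(set(), sample_space)
p_certain = probability(sample_space, sample_space)
```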
8.3 Applications of Probability in Computer Science
Machine learning models
Fault prediction in software systems
Network reliability analysis
Cybersecurity risk assessment
9. Introduction to Statistical Inference
9.1 Meaning of Statistical Inference
Statistical inference involves drawing conclusions about a population based on sample data.
9.2 Population and Sample
Population: Entire set of interest
Sample: Subset of the population
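As a rough sketch of the idea, the Python snippet below simulates a hypothetical population of response times (generated from an assumed normal distribution, purely for illustration), draws a sample from it, and compares the sample mean with the population mean.

```python
import random
import statistics

# Fixed seed so the simulated data is repeatable.
rng = random.Random(0)

# Population: 10,000 simulated response times (assumed mean 200 ms, sd 30 ms).
population = [rng.gauss(200, 30) for _ in range(10_000)]

# Sample: a subset of 100 elements drawn from the population.
sample = rng.sample(population, 100)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# The sample mean serves as an estimate of the population mean.
print(round(population_mean, 1), round(sample_mean, 1))
```

In practice the population mean is usually unknown; it is computed here only to show how close the estimate from a much smaller sample can be.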
9.3 Sampling Techniques
(a) Random Sampling
Every element has an equal chance of selection.
(b) Systematic Sampling
Elements are selected at regular intervals (for example, every k-th element).
(c) Stratified Sampling
The population is divided into groups (strata), and a sample is drawn from each group.
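The three techniques can be sketched in Python as follows. The population of 20 numbered items, the sample size of 5, and the even/odd strata are all assumptions chosen to keep the illustration small; real strata would normally be meaningful groups such as user type or device class.

```python
import random

# Hypothetical population: items numbered 1 to 20.
population = list(range(1, 21))
sample_size = 5

# Fixed seed so the sketch gives repeatable results.
rng = random.Random(42)

# (a) Random sampling: every element has an equal chance of selection.
random_sample = rng.sample(population, sample_size)

# (b) Systematic sampling: pick a random start, then every k-th element.
k = len(population) // sample_size  # interval of 4
start = rng.randrange(k)
systematic_sample = population[start::k]

# (c) Stratified sampling: split into strata, then sample within each one.
strata = {
    "even": [x for x in population if x % 2 == 0],
    "odd": [x for x in population if x % 2 == 1],
}
stratified_sample = {
    name: rng.sample(members, 2) for name, members in strata.items()
}

print(random_sample)
print(systematic_sample)
print(stratified_sample)
```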
9.4 Importance of Statistical Inference
Reduces cost and time
Enables generalization
Supports decision making in computing research.