AAUACSCI 111 Introduction to Statistics for Computer Science

1. Introduction to Computer Science & Statistics

1.1 Meaning of Computer Science

Computer Science is the study of:
- Computation
- Algorithms
- Data
- Computer systems
It focuses on how data is:
- Collected
- Processed
- Stored
- Transformed into useful information using computers

1.2 Meaning of Statistics

Statistics is the science of:
- Collecting
- Organizing
- Analyzing
- Interpreting
- Presenting data
Purpose: To support decision-making

1.3 Relationship Between Computer Science and Statistics

Computer Science and Statistics are closely related because:
- Computer systems generate large volumes of data
- Statistics provides tools to analyze and interpret this data
- Many modern CS fields rely heavily on statistical concepts, including:
  - Artificial Intelligence (AI)
  - Machine Learning (ML)
  - Data Science
  - Software Engineering
Examples of relationships:
- Performance evaluation of algorithms
- Software reliability and defect prediction
- Data mining and machine learning models

1.4 Importance of Statistics in Computing

- Supports decision-making under uncertainty
- Aids in analyzing experimental and observational data
- Enables prediction and trend analysis
- Improves system optimization and quality control

1.5 Applications of Statistics in Computer Science

- Artificial Intelligence and Machine Learning
- Data Science and Big Data Analytics
- Software Engineering (testing, reliability analysis)
- Network traffic analysis
- Cybersecurity and fraud detection

2. Fundamentals of Statistics

2.1 Concept of Statistics

Statistics can be viewed in two ways:
- As numerical data: Facts and figures obtained through observation
- As a discipline: A body of methods used to collect, analyze, and interpret data

2.2 Definitions of Statistics

Statistics is defined as:

“The science of collecting, organizing, presenting, analyzing, and interpreting data to aid decision making.”

2.3 Types of Statistics

Statistics is broadly divided into two main types:

(a) Descriptive Statistics

- Deals with methods used to summarize and describe data.
- Examples include:
  - Tables
  - Charts and graphs
  - Measures of central tendency (mean, median, mode)
  - Measures of dispersion (range, variance, standard deviation)

(b) Inferential Statistics

- Involves drawing conclusions or making predictions about a population based on sample data.
- Examples include:
  - Estimation
  - Hypothesis testing
  - Regression and correlation

3. Statistical Data

3.1 Concept of Data

Data refers to raw facts, figures, observations, or measurements collected for analysis.

3.2 Categories of Data

Data can be categorized into two types:

(a) Qualitative Data

- Non-numerical data
- Describes attributes or characteristics
- Examples:
  - Gender
  - Color
  - Software type

(b) Quantitative Data

- Numerical data
- Can be measured or counted
- Examples:
  - Age
  - Number of users
  - Execution time

3.3 Types of Data

Quantitative data can be further classified as discrete or continuous.

(a) Discrete Data

Countable values
Examples:
- Number of students
- Number of errors

(b) Continuous Data

Measurable values within a range
Examples:
- Time
- Temperature
- Memory usage

4. Data Collection

4.1 Meaning of Data Collection

Data collection is the process of gathering information for analysis and decision-making.

4.2 Sources of Statistical Data

(a) Primary Data

Collected firsthand by the researcher.
Examples:
- Surveys
- Interviews
- Experiments
- Observations

(b) Secondary Data

Collected by others and reused.
Examples:
- Journals
- Government reports
- Databases
- Online repositories

4.3 Methods of Statistical Data Collection

- Questionnaires
- Interviews
- Direct observation
- Experiments
- Automated data logging (sensors, software logs)

5. Presentation of Data

5.1 Meaning of Data Presentation

Data presentation is the process of organizing data in a meaningful way so that it can be easily understood and interpreted.

5.2 Techniques of Data Presentation

(a) Tabular Presentation

- Data is arranged in rows and columns
- Simple and easy to understand

(b) Graphical Presentation

- Bar charts
- Pie charts
- Histograms
- Line graphs

(c) Diagrammatic Presentation

- Pictograms
- Flow diagrams
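
As a small illustration of tabular and bar-chart style presentation, the sketch below builds a frequency table from raw observations and prints it in rows and columns with a text bar next to each count. The `error_log` values are hypothetical sample data, not taken from any real system.

```python
from collections import Counter

# Hypothetical raw data: error types recorded in a software log.
error_log = ["timeout", "null_pointer", "timeout", "io_error", "timeout", "null_pointer"]

# Count how often each error type occurs.
frequencies = Counter(error_log)

# Rows sorted from most to least frequent.
rows = frequencies.most_common()

# Tabular presentation (rows and columns) with a simple text bar per row.
print(f"{'Error type':<14}{'Count':>5}  Chart")
for error_type, count in rows:
    print(f"{error_type:<14}{count:>5}  {'#' * count}")
```

Sorting the rows by count makes the most frequent category stand out, which is the same comparison a bar chart is meant to support.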

5.3 Importance of Data Presentation

- Makes data easy to understand
- Highlights trends and patterns
- Aids comparison
- Supports effective decision-making

6. Measures of Central Tendency

6.1 Meaning of Central Tendency

Measures of central tendency are statistical values that represent the center or typical value of a dataset.
They help summarize large datasets with a single representative figure.

6.2 Types of Measures of Central Tendency

(a) Mean

The mean is the arithmetic average of a set of values.
Formula: $\text{Mean} = \frac{\text{Sum of all observations}}{\text{Number of observations}}$
Applications in Computer Science:
- Average execution time of algorithms
- Average response time of systems
Advantages:
- Easy to compute
- Uses all observations
Disadvantages:
- Affected by extreme values (outliers)

(b) Median

The median is the middle value when data is arranged in ascending or descending order. With an even number of observations, it is the average of the two middle values.
Applications:
- Used when data contains extreme values
- System performance analysis with skewed data

(c) Mode

The mode is the value that occurs most frequently in a dataset. A dataset can have more than one mode.
Applications:
- Most frequent error type
- Most common user choice in applications
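
The three measures can be compared on one small dataset. The sketch below uses Python's standard `statistics` module. The execution times are made-up values, with one deliberate outlier (95 ms) to show how each measure reacts to it.

```python
import statistics

# Hypothetical execution times in milliseconds; 95 is an outlier.
execution_times_ms = [12, 15, 11, 14, 95, 13, 12]

# Mean: sum of all observations divided by the number of observations.
mean_time = statistics.mean(execution_times_ms)

# Median: middle value of the sorted data.
median_time = statistics.median(execution_times_ms)

# Mode: the most frequently occurring value.
mode_time = statistics.mode(execution_times_ms)

print(f"mean={mean_time:.2f} median={median_time} mode={mode_time}")
```

Here the single outlier pulls the mean up to about 24.57 ms, while the median (13) and the mode (12) stay close to the typical run. This is why the median is often preferred for skewed performance data.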

7. Measures of Dispersion

7.1 Meaning of Dispersion

Measures of dispersion show how spread out or scattered data values are around the central value.

7.2 Types of Measures of Dispersion

(a) Range

$\text{Range} = \text{Maximum value} - \text{Minimum value}$
Simple to compute, but it uses only the two extreme values and ignores the rest of the data.

(b) Variance

Variance measures the average squared deviation from the mean.
Applications:
- Performance stability analysis
- System reliability assessment

(c) Standard Deviation

Standard deviation is the square root of variance, so it is expressed in the same units as the data.
Advantages:
- Uses all data values
- Widely used in computing and data science
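
A minimal sketch of the three measures on hypothetical response times follows. It assumes the population form of variance (dividing by the number of observations), which matches the "average squared deviation" definition above. When the data is a sample, the variance is usually computed by dividing by n - 1 instead, which `statistics.variance` and `statistics.stdev` do.

```python
import statistics

# Hypothetical system response times in milliseconds.
response_times_ms = [20, 22, 19, 21, 18]

# Range: maximum value minus minimum value.
data_range = max(response_times_ms) - min(response_times_ms)

# Population variance: average squared deviation from the mean.
variance = statistics.pvariance(response_times_ms)

# Population standard deviation: square root of the variance.
std_dev = statistics.pstdev(response_times_ms)

print(f"range={data_range} variance={variance} std_dev={std_dev:.3f}")
```

For this data the mean is 20, the squared deviations are 0, 4, 1, 1 and 4, and their average gives a variance of 2.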

8. Probability and Its Applications

8.1 Meaning of Probability

Probability is the measure of the likelihood that an event will occur.
Probability values range from 0 (the event cannot occur) to 1 (the event is certain to occur).

8.2 Basic Probability Concepts

- Experiment: An activity with observable outcomes
- Sample Space: Set of all possible outcomes
- Event: A subset of the sample space
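
These three concepts map naturally onto sets. The sketch below uses rolling a fair six-sided die as an illustrative experiment. It assumes all outcomes are equally likely, so the probability of an event is the number of outcomes in the event divided by the number of outcomes in the sample space. The helper name `probability` is hypothetical.

```python
from fractions import Fraction

# Experiment: rolling one fair six-sided die (illustrative example).
# Sample space: the set of all possible outcomes.
sample_space = {1, 2, 3, 4, 5, 6}

# Event: a subset of the sample space, here "the roll is even".
event_even = {outcome for outcome in sample_space if outcome % 2 == 0}


def probability(event, space):
    """Probability of an event, assuming equally likely outcomes."""
    if not event <= space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(space))


p_even = probability(event_even, sample_space)
print(p_even)  # 1/2
```

Using `Fraction` keeps the result exact, and the subset check enforces the definition of an event given above.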

8.3 Applications of Probability in Computer Science

- Machine learning models
- Fault prediction in software systems
- Network reliability analysis
- Cybersecurity risk assessment

9. Introduction to Statistical Inference

9.1 Meaning of Statistical Inference

Statistical inference involves drawing conclusions about a population based on sample data.

9.2 Population and Sample

- Population: The entire set of items or individuals of interest
- Sample: A subset of the population selected for study

9.3 Sampling Techniques

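(a) Random Sampling

Every element of the population has an equal chance of selection.

(b) Systematic Sampling

Elements are selected at regular intervals (for example, every k-th element) from an ordered list.

(c) Stratified Sampling

The population is divided into groups (strata), and a sample is drawn from each group.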
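
The three techniques can be sketched with Python's standard `random` module. The function names, the user IDs, and the "free"/"paid" strata are all hypothetical. The stratified version assumes proportional allocation (the same fraction is taken from every stratum), which is one common choice rather than the only one.

```python
import random


def simple_random_sample(population, n, rng):
    """Random sampling: every element has an equal chance of selection."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Systematic sampling: every k-th element after a random start."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    k = len(population) // n   # sampling interval
    start = rng.randrange(k)   # random start within the first interval
    return [population[start + i * k] for i in range(n)]


def stratified_sample(strata, fraction, rng):
    """Stratified sampling: draw the same fraction from each stratum."""
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


rng = random.Random(42)        # fixed seed so the run is repeatable
users = list(range(1, 101))    # hypothetical user IDs 1..100
strata = {"free": users[:80], "paid": users[80:]}

random_users = simple_random_sample(users, 10, rng)
systematic_users = systematic_sample(users, 10, rng)
stratified_users = stratified_sample(strata, 0.1, rng)
```

With these inputs each method returns 10 users. The stratified sample contains 8 "free" and 2 "paid" users, so the smaller group is still represented.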

9.4 Importance of Statistical Inference

- Reduces cost and time
- Enables generalization
- Supports decision-making in computing research