AAUACSCI 111 Introduction to Statistics for Computer Science

1. Introduction to Computer Science & Statistics

1.1 Meaning of Computer Science
  • Computer Science is the study of:

    • Computation

    • Algorithms

    • Data

    • Computer systems

  • Focuses on how data is:

    • Collected

    • Processed

    • Stored

    • Transformed into useful information using computers

1.2 Meaning of Statistics
  • Statistics is the science of:

    • Collecting

    • Organizing

    • Analyzing

    • Interpreting

    • Presenting data

  • Purpose: To support decision-making

1.3 Relationship Between Computer Science and Statistics
  • Computer Science and Statistics are closely related because:

    • Computer systems generate large volumes of data

    • Statistics provides tools to analyze and interpret this data

    • Many modern CS fields rely heavily on statistical concepts, including:

      • Artificial Intelligence (AI)

      • Machine Learning (ML)

      • Data Science

      • Software Engineering

  • Examples of relationships:

    • Performance evaluation of algorithms

    • Software reliability and defect prediction

    • Data mining and machine learning models

1.4 Importance of Statistics in Computing
  • Supports decision making under uncertainty

  • Aids in analyzing experimental and observational data

  • Enables prediction and trend analysis

  • Improves system optimization and quality control

1.5 Applications of Statistics in Computer Science
  • Artificial Intelligence and Machine Learning

  • Data Science and Big Data Analytics

  • Software Engineering (testing, reliability analysis)

  • Network traffic analysis

  • Cybersecurity and fraud detection

2. Fundamentals of Statistics

2.1 Concept of Statistics
  • Statistics can be viewed in two ways:

    • As numerical data: Facts and figures obtained through observation

    • As a discipline: A body of methods used to collect, analyze, and interpret data

2.2 Definitions of Statistics
  • Statistics is defined as:

“The science of collecting, organizing, presenting, analyzing, and interpreting data to aid decision making.”

2.3 Types of Statistics
  • Statistics is broadly divided into two main types:

(a) Descriptive Statistics

  • Deals with methods used to summarize and describe data.

  • Examples include:

    • Tables

    • Charts and graphs

    • Measures of central tendency (mean, median, mode)

    • Measures of dispersion (range, variance, standard deviation)

(b) Inferential Statistics

  • Involves drawing conclusions or making predictions about a population based on sample data.

  • Examples include:

    • Estimation

    • Hypothesis testing

    • Regression and correlation

3. Statistical Data

3.1 Concept of Data
  • Data refers to raw facts, figures, observations, or measurements collected for analysis

3.2 Categories of Data
  • Data can be categorized into two types:

(a) Qualitative Data

  • Non-numerical data

  • Describes attributes or characteristics.

  • Examples:

    • Gender

    • Color

    • Software type

(b) Quantitative Data

  • Numerical data

  • Can be measured or counted.

  • Examples:

    • Age

    • Number of users

    • Execution time

3.3 Types of Quantitative Data

(a) Discrete Data

  • Countable values

  • Examples:

    • Number of students

    • Number of errors

(b) Continuous Data

  • Measurable values within a range

  • Examples:

    • Time

    • Temperature

    • Memory usage

4. Data Collection

4.1 Meaning of Data Collection
  • Data collection is the process of gathering information for analysis and decision-making.

4.2 Sources of Statistical Data

(a) Primary Data

  • Collected firsthand by the researcher.

  • Examples:

    • Surveys

    • Interviews

    • Experiments

    • Observations

(b) Secondary Data

  • Collected by others and reused.

  • Examples:

    • Journals

    • Government reports

    • Databases

    • Online repositories

4.3 Methods of Statistical Data Collection
  • Questionnaires

  • Interviews

  • Direct observation

  • Experiments

  • Automated data logging (sensors, software logs)

5. Presentation of Data

5.1 Meaning of Data Presentation
  • Data presentation is the process of organizing data in a meaningful way so that it can be easily understood and interpreted.

5.2 Techniques of Data Presentation

(a) Tabular Presentation

  • Data is arranged in rows and columns

  • Simple and easy to understand

(b) Graphical Presentation

  • Bar charts

  • Pie charts

  • Histograms

  • Line graphs

(c) Diagrammatic Presentation

  • Pictograms

  • Flow diagrams
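
As a rough illustration of tabular and graphical presentation, the sketch below builds a frequency table and a simple text bar chart. The error-type data and the helper names `frequency_table` and `format_table` are assumptions made up for this example, not part of the course material.

```python
from collections import Counter

# Hypothetical sample: error types recorded in a software log.
errors = ["timeout", "syntax", "timeout", "null", "syntax", "timeout"]

def frequency_table(values):
    """Return (value, count) pairs sorted by descending count, then by name."""
    counts = Counter(values)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def format_table(rows):
    """Render rows as a plain-text table with one text bar per row."""
    lines = [f"{'Error type':<12}{'Count':>5}  Bar"]
    for name, count in rows:
        lines.append(f"{name:<12}{count:>5}  {'#' * count}")
    return "\n".join(lines)

table = frequency_table(errors)
print(format_table(table))
```

The rows-and-columns layout corresponds to tabular presentation, while the `#` bars are a minimal stand-in for a bar chart.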

5.3 Importance of Data Presentation
  • Makes data easy to understand

  • Highlights trends and patterns

  • Aids comparison

  • Supports effective decision making

6. Measures of Central Tendency

6.1 Meaning of Central Tendency
  • Measures of central tendency are statistical values that represent the center or typical value of a dataset.

  • They help summarize large datasets with a single representative figure.

6.2 Types of Measures of Central Tendency

(a) Mean

  • The mean is the arithmetic average of a set of values.

  • Formula: $\text{Mean} = \frac{\text{Sum of all observations}}{\text{Number of observations}}$

  • Applications in Computer Science:

    • Average execution time of algorithms

    • Average response time of systems

  • Advantages:

    • Easy to compute

    • Uses all observations

  • Disadvantages:

    • Affected by extreme values (outliers)

(b) Median

  • The median is the middle value when data is arranged in ascending or descending order. With an even number of observations, it is the average of the two middle values.

  • Applications:

    • Used when data contains extreme values

    • System performance analysis with skewed data

(c) Mode

  • The mode is the value that occurs most frequently in a dataset.

  • Applications:

    • Most frequent error type

    • Most common user choice in applications
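
The three measures above can be compared on a small dataset. The following is a minimal sketch using Python's standard `statistics` module; the response times are invented for illustration, with one deliberately extreme value to show how an outlier pulls the mean but not the median or mode.

```python
import statistics

# Hypothetical response times in milliseconds; 950 is an outlier.
response_times = [120, 130, 125, 130, 950]

mean_time = statistics.mean(response_times)      # sum of values / number of values
median_time = statistics.median(response_times)  # middle value of the sorted data
mode_time = statistics.mode(response_times)      # most frequently occurring value

print("mean:", mean_time)      # 1455 / 5 = 291
print("median:", median_time)  # sorted: 120, 125, 130, 130, 950 -> 130
print("mode:", mode_time)      # 130 occurs twice
```

Here the mean (291) is far from any typical observation, while the median and mode (both 130) stay close to the bulk of the data.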

7. Measures of Dispersion

7.1 Meaning of Dispersion
  • Measures of dispersion show how spread out or scattered data values are around the central value.

7.2 Types of Measures of Dispersion

(a) Range

  • $\text{Range} = \text{Maximum value} - \text{Minimum value}$

  • Simple but does not consider all data values.

(b) Variance

  • Variance measures the average squared deviation from the mean.

  • Applications:

    • Performance stability analysis

    • System reliability assessment

(c) Standard Deviation

  • Standard deviation is the square root of variance.

  • Advantages:

    • Uses all data values

    • Widely used in computing and data science
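
To make the three measures concrete, the sketch below computes them for two invented sets of execution times that share the same mean but differ in spread. It assumes the population form of variance (dividing by the number of observations), which matches the "average squared deviation" description; the helper name `describe_spread` is hypothetical.

```python
import statistics

# Hypothetical execution times in milliseconds; both lists have a mean of 100.
stable = [100, 102, 98, 101, 99]
unstable = [60, 140, 95, 150, 55]

def describe_spread(values):
    """Return (range, population variance, population standard deviation)."""
    value_range = max(values) - min(values)
    variance = statistics.pvariance(values)  # mean of squared deviations from the mean
    std_dev = statistics.pstdev(values)      # square root of the population variance
    return value_range, variance, std_dev

for label, data in (("stable", stable), ("unstable", unstable)):
    value_range, variance, std_dev = describe_spread(data)
    print(f"{label}: range={value_range}, variance={variance}, std dev={std_dev:.2f}")
```

When the data is a sample rather than the whole population, `statistics.variance` and `statistics.stdev` divide by one less than the number of observations instead.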

8. Probability and Its Applications

8.1 Meaning of Probability
  • Probability is the measure of the likelihood that an event will occur.

  • Probability values range from 0 (the event cannot occur) to 1 (the event is certain to occur).

8.2 Basic Probability Concepts
  • Experiment: An activity with observable outcomes

  • Sample Space: Set of all possible outcomes

  • Event: A subset of the sample space
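
These concepts can be sketched with Python sets. The example below assumes a fair six-sided die as the experiment and equally likely outcomes; the function name `probability` is chosen for illustration only.

```python
from fractions import Fraction

# Experiment: rolling one fair six-sided die (assumed for illustration).
sample_space = {1, 2, 3, 4, 5, 6}

# Event: the roll is even. An event is a subset of the sample space.
even_roll = {outcome for outcome in sample_space if outcome % 2 == 0}

def probability(event, space):
    """Probability of an event when all outcomes are equally likely."""
    if not event <= space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(space))

p_even = probability(even_roll, sample_space)
print(p_even)  # 1/2
```

Using `Fraction` keeps the result exact, and the value always falls between 0 (empty event) and 1 (the whole sample space).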

8.3 Applications of Probability in Computer Science
  • Machine learning models

  • Fault prediction in software systems

  • Network reliability analysis

  • Cybersecurity risk assessment

9. Introduction to Statistical Inference

9.1 Meaning of Statistical Inference
  • Statistical inference involves drawing conclusions about a population based on sample data.

9.2 Population and Sample
  • Population: Entire set of interest

  • Sample: Subset of the population
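
A small simulation can show how a sample stands in for a population. The sketch below uses an invented population of 10,000 values and a sample of 500; the sizes and the fixed seed are arbitrary assumptions for repeatability.

```python
import random
import statistics

# Hypothetical population: response times (ms) for 10,000 requests.
population = list(range(1, 10001))

rng = random.Random(42)               # fixed seed so the run is repeatable
sample = rng.sample(population, 500)  # subset drawn without replacement

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print("population mean:", population_mean)
print("sample mean:", sample_mean)
```

The sample mean is an estimate of the population mean; it will usually be close but not identical, and inference is about quantifying that gap.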

9.3 Sampling Techniques

(a) Random Sampling

  • Every element has an equal chance of selection.

(b) Systematic Sampling

  • Elements are selected at regular intervals.

(c) Stratified Sampling

  • The population is divided into groups (strata), and a sample is drawn from each group.
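
The three techniques can be sketched with Python's standard `random` module. The function names, the user IDs, and the "free"/"paid" strata are hypothetical, and the stratified version assumes the same number of items is drawn from each stratum, which is only one possible allocation.

```python
import random

def simple_random_sample(items, k, rng):
    """Pick k items so that every item has an equal chance of selection."""
    return rng.sample(items, k)

def systematic_sample(items, step, rng):
    """Pick every step-th item after a random starting offset."""
    start = rng.randrange(step)
    return items[start::step]

def stratified_sample(strata, k_per_stratum, rng):
    """Pick k items at random from each stratum (group)."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

rng = random.Random(0)      # fixed seed so the run is repeatable
users = list(range(1, 21))  # hypothetical user IDs 1..20
strata = {"free": users[:12], "paid": users[12:]}

print(simple_random_sample(users, 5, rng))
print(systematic_sample(users, 4, rng))
print(stratified_sample(strata, 2, rng))
```

Stratified sampling guarantees that every group is represented, which simple random sampling does not.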

9.4 Importance of Statistical Inference
  • Reduces cost and time

  • Enables generalization

  • Supports decision making in computing research.