Data Structures Notes

CSF102: Data Structures Session: 2025

B. Tech 2nd Semester
Course Coordinator: Dr. Madhu Sharma
Course Instructor: Mohd Khalid

Unit 1: Introduction to Algorithms & Data Structure

Introduction:
- Data types, Abstraction, Abstract Data Type (ADT), Concept of data structure, Types of data structures, Operations on Data Structures
- Introduction to Algorithms, Writing Pseudocodes, Algorithm analysis, Complexity of algorithms and Time space trade-off
- Searching: Linear and Binary Search Techniques and their complexity analysis.

Data Structure

A logical and mathematical model for storing and organizing data in a particular way to facilitate algorithm design and implementation.

Basic Terminologies

Data: An elementary value or a collection of values (e.g., Employee's name and ID).
Data Items: A single unit of value.
Record: A collection of various data items.
File: A collection of records of one type.
Elementary Items: Data items that cannot be further divided (e.g., the ID of an Employee).
Information: Data with attributes, i.e., meaningful data.

Introduction

Data structure affects the design of both structural & functional aspects of a program.
Program = Algorithm + Data Structure
Algorithm: A step-by-step procedure to solve a particular function.

Classification of Data Structures

Data structures are normally divided into two broad categories:
- Primitive Data Structure
- Non-Primitive Data Structure

Primitive Data Structures

Basic structures directly operated upon by machine instructions.
Examples:
- INTEGER
- CHARACTER
- FLOAT
- POINTERS
Can hold a single value in a specific location.

Non-Primitive Data Structures

ARRAYS
- Single Dimension
- Two Dimension
- Multi Dimension
LISTS
- Linear
  - Stack
  - Queue
  - Linked List
- Non-Linear
  - Trees
  - Graphs

Operations on Primitive Data Structures

Create
Selection
Updating
Destroy or Delete

Non-Primitive Data Structure characteristics

Can hold multiple values either contiguously or randomly.
Defined by the programmer.

Linear Data Structures

Have homogeneous elements.
Elements are in a sequence, forming a linear series.
Easy to implement because computer memory is organized linearly.
Examples:
- Stack
- Queue
- Linked Lists

Non-Linear Data Structures

Data items are connected to several other data items.
Exhibit hierarchical or parent-child relationships.
Data elements are not arranged sequentially.
Examples:
- Trees
- Graphs

Operations on Non-Primitive Data Structures

Traversal
Insertion
Selection
Searching
Sorting
Merging
Destroy or Delete

Primitive vs. Non-primitive Data Structures

Primitive:
- Stores data of only one type.
- Examples: integer, character, float.
- Cannot be NULL.
- Size depends on the data type.
- Starts with a lowercase character.
- Cannot be used to call methods directly.
Non-primitive:
- Stores data of more than one type.
- Examples: Array, Linked list, stack.
- Can consist of a NULL value.
- Size is not fixed.
- Starts with an uppercase character.
  *cannot be used to call methods.

Linear vs. Non-linear Data Structures

Linear:
- Data elements are arranged linearly; each element is attached to its previous and next adjacent.
- Single level is involved.
- Easier implementation.
- Elements can be traversed in a single run.
- Memory may not be utilized efficiently.
- Examples: array, stack, queue, linked list.
- Applications in application software development.
- Useful for simple data storage and manipulation.
- Good performance for adding/removing at ends, slower for searching/removing in the middle.
Non-linear:
- Data elements are attached hierarchically.
- Multiple levels are involved.
- Complex implementation.
- Elements cannot be traversed in a single run.
- Memory is utilized efficiently.
- Examples: trees and graphs.
- Applications in Artificial Intelligence and image processing.
- Useful for representing complex relationships and data hierarchies.
- Performance varies, can be optimized for specific operations.

Operations on Data Structures (Detailed)

Traversing: Visiting each element stored in the data structure.
Searching: Finding a particular element; successful if the element is found.
Insertion: Adding an element to the data structure.
Deletion: Removing an element from the data structure.
- In stack, it's called "Pop."
- In queue, it's called "Dequeue."
Merging: Combining lists of two sorted data items to form a single sorted list.

Abstract Data Type (ADT)

A mathematical model for data types defined by its behavior (semantics) from a user perspective.
Defined in terms of:
- Possible values
- Possible operations on data of this type
- Behavior of these operations
Contrasts with data structures, which are concrete representations of data from an implementer's view.
Only specifies what operations are to be performed, not how they will be implemented.
Gives an implementation-independent view, providing only essentials and hiding the details (abstraction).
Can be defined as a set of data elements along with the operations on the data.

ADT Implementation

Contains a storage structure to store data elements and algorithms for fundamental operations.
Examples: array, linked list, queue, stack.
Separates a data structure from its implementation information.

List-ADT

Contains elements of the same type arranged sequentially.
Operations:
- get(): Return an element at a given position.
- insert(): Insert an element at any position.
- remove(): Remove the first occurrence of an element.
- removeAt(): Remove the element at a specified location.
- replace(): Replace an element at any position.
- size(): Return the number of elements.
- isEmpty(): Return true if empty, otherwise false.
- isFull(): Return true if full, otherwise false.

ADT Model

Consists of public and private functions.
Involves:
- Encapsulation: Wrapping all data in a single unit (ADT).
- Abstraction: Showing operations that can be performed and the data structures used.

Real-World Example (Smartphone)

Data (Specifications):
- 4 GB RAM
- Snapdragon 2.2ghz processor
- 5 inch LCD screen
- Dual camera
- Android 8.0
Operations:
- call(): Make a call.
- text(): Send a text message.
- photo(): Click a photo.
- video(): Make a video.

Applications of Data Structures

Organization of data in computer's memory.
Representing information in databases.
Implementation of algorithms for searching data (e.g., search engine).
Implementation of algorithms to manipulate data (e.g., word processors).
Implementation of algorithms to analyze data (e.g., data miners).
Support for algorithms to generate data (e.g., a random number generator).
Support for algorithms to compress and decompress data (e.g., a zip utility).
Implementation of algorithms to encrypt and decrypt data (e.g., a security system).

Algorithm

A process or set of rules required to perform calculations or problem-solving operations, especially by a computer.
Formal definition: A finite set of instructions carried out in a specific order to perform a specific task.
Not a complete program but a solution (logic) of a problem, represented as a flowchart or pseudocode.
An algorithm is a finite set of Instructions for solving a particular problem in a finite amount of time using a finite amount of data.

Characteristics Required of an Algorithm

Each instruction should be precise and unambiguous.
Each instruction should be performable in a finite time.
One or more instructions should not be repeated infinitely (ensuring termination).
The desired results must be obtained after the algorithm terminates.
Algorithm can be represented as programs, flowcharts and pseudo-codes.

Characteristics of an Algorithm

Input: Zero or more input values.
Output: One or more output values.
Unambiguity: Instructions should be clear and simple.
Finiteness: Should contain a limited (countable) number of instructions.
Effectiveness: Each instruction affects the overall process.
Language independence: Can be implemented in any language with the same output.

Dataflow of an Algorithm

Problem: A real-world problem or instance.
Algorithm: Step-by-step procedure designed for the problem.
Input: Required and desired inputs provided to the algorithm.
Processing unit: Processes the input to produce the desired output.
Output: The outcome or result of the program.

Why Algorithms

Scalability: Helps to scale down a large real-world problem into smaller steps for easy analysis.
Performance: Indicates if the problem can be easily broken into smaller steps, making it feasible.

Sample Algorithm (Example 1)

Problem: Write an algorithm to find the sum of two numbers.
Algorithm:
- Step 1: Read two numbers in variable num1 and num2
- Step 2: Add num1 and num2 and sum=num1+num2
- Step 3: Display result sum

Algorithm Example: Even or Odd

Step 1: Start
Step 2: Read a number to N
Step 3: Divide the number by 2 and store the remainder in R.
Step 4: If R = 0 go to Step 6
Step 5: Print “N is odd” go to step 7
Step 6: Print “N is even”
Step 7: Stop

Factors of an Algorithm

Modularity: Ability to break down a problem into small modules or steps.
Correctness: The algorithm produces the desired output for given inputs.
Maintainability: Should be designed simply for easy redefinition without major changes.
Functionality: Considers various logical steps to solve the real-world problem.
Robustness: How clearly an algorithm can define a problem.
User-friendly: Easy to explain to a programmer.
Simplicity: Easy to understand.
Extensibility: Others can use and extend the algorithm.

Issues of Algorithms

How to design algorithms (step-by-step procedure).
How to analyze algorithm efficiency.

Categories of Algorithms

Sort: For sorting items in a certain order.
Search: For searching items inside a data structure.
Delete: For deleting existing elements from the data structure.
Insert: For inserting items inside a data structure.
Update: For updating existing elements inside a data structure.

Pseudocode

An informal high-level description of the operating principle of a computer program or algorithm.
Uses structural conventions of a standard programming language for human reading.
More structured than prose but less formal than a programming language.
Methodology that allows programmers to represent the implementation of an algorithm.
Expressions:
- Use standard mathematical symbols for numeric and boolean expressions
- Use equal keyword for assignment ( = in C)
- Use = for equality relationship (== in C)
Method Declarations:
- Algorithm name(param1, param2)

Good vs Bad Ways of Writing Pseudocode

Good Example
- FOR each index of string
- IF string(index) is equal to hash THEN
- RETURN "Hash found"
- END IF
- END FOR
Bad Example
- FOR i = 0 to 20
- IF string[i] = '#'
- RETURN "Hash found"
- END IF
- END FOR
Worst Example
- FOR each index of string
- if string[i] = '#'
- return "Hash found"
- END if
- END FOR

Advantages of Pseudocode

Easily transformed into actual programming language.
Easily understood by non-programmers.
Easily modifiable compared to flowcharts.
Beneficial for structured, designed elements.
Errors can be detected before coding.

Disadvantages of Pseudocode

Lacks a standardized style or format.
Higher error possibility when transforming into code.
May require a tool for extracting and drawing flowcharts.
Does not depict the design.

Algorithm vs. Pseudocode

Algorithm: A problem-solving process, a series of instructions to solve a problem.
Pseudocode: A way of writing an algorithm using informal, simple language without strict syntax.

Problem Example: Calculating Absentees

Problem: Suppose there are 60 students in the class. How will you calculate the number of absentees in the class?
Pseudo Approach:
- Initialize a variable called as Count to zero, absent to zero, total to 60
- FOR EACH Student PRESENT DO the following:
  - Increase the Count by One
- Then Subtract Count from total and store the result in absent
- Display the number of absent students
Algorithmic Approach:
- Count <- 0, absent <- 0, total <- 60
- REPEAT till all students counted
  - Count <- Count + 1
  - absent <- total - Count
- Print "Number absent is:" , absent

Algorithm Analysis

Can be analyzed before (priori) and after (posterior) creation.
1. Priori Analysis: Theoretical analysis before implementation; factors like processor speed are not considered.
2. Posterior Analysis: Practical analysis after implementation using a programming language; evaluates running time and space taken by the algorithm.

Algorithm Complexity

Performance measured by:
- Time complexity: Amount of time required to complete execution, denoted by Big O notation.
- Space complexity: Amount of space required to solve a problem and produce an output, also expressed in Big O notation.

Analysis of Algorithms

Worst-case, Average-case, Best-case
- Worst-case: Upper bound on the running time for any input.
- Average-case: Estimate of the running time for an 'average' input, assuming all inputs are equally likely.
- Best-case: Analysis under optimal conditions (rarely used for decision-making).

Asymptotic Analysis and Notations

Evaluates algorithm performance in terms of input size, not actual running time.
Measures how time (or space) taken by an algorithm increases with input size.
Uses machine-independent constants.
Asymptotic notations are mathematical tools to represent the time complexity.

BIG O Notation

Defines an upper bound of an algorithm.
O(g(n)) = { f(n): \text{ there exist positive constants } c \text{ and } n0 \text{ such that } 0 <= f(n) <= c*g(n) \text{ for all } n >= n0 }

Categories of Algorithms (Based on Big O Notation)

Constant time algorithm: running time complexity given as $O(1)$
Linear time algorithm: running time complexity given as $O(n)$
Logarithmic time algorithm: running time complexity given as $O(log n)$
Polynomial time algorithm: running time complexity given as $O(n^k)$ , where k > 1
Exponential time algorithm: running time complexity given as $O(2^n)$
O(1) < O(log n) < O(n) < O(n^2) < O(2^n) < O(n!)

BIG Omega Notation

Notation provides an asymptotic lower bound.
\Omega(g(n)) = {f(n): \text{ there exist positive constants } c \text{ and } n0 \text{ such that } 0 <= c*g(n) <= f(n) \text{ for all } n >= n0}

BIG Theta Notation

Bounds a function from above and below, defining exact asymptotic behavior.
Drop low order terms and ignore leading constants.
Example: $3n^3 + 6n^2 + 6000 = \Theta(n^3)$
(g(n)) = {f(n) \text{ there exist positive constants } c1, c2 \text{ and } n0 \text{ such that } 0 <= c1g(n) <= f(n) <= c2g(n) \text{ for all } n >= n0}

Theta Notation (Θ-Notation)

Encloses the function from above and below.
Represents the upper and lower bound of the running time of an algorithm.
Used for analyzing the average-case complexity of an algorithm.
Theta (Average Case): Add the running times for each possible input combination and take the average.

Linear Search

Also called sequential search algorithm.
Simplest searching algorithm.
Traverse the list completely and match each element with the item to be found.
Returns the location of the item if found; otherwise, returns NULL.

Linear Search Algorithm

Linear_Search(a, n, val) // 'a' is the given array, 'n' is the size of given array, 'val' is the value to search
- Step 1: set pos = -1
- Step 2: set i = 1
- Step 3: repeat step 4 while i <= n
- Step 4: if a[i] == val
  - set pos = i
  - print pos
  - go to step 6
  - set i = i + 1
- Step 5: if pos = -1
  - print "value is not present in the array "
- Step 6: exit

Linear Search - Time Complexity

Best Case: $O(1)$
Average Case: $O(n)$
Worst Case: $O(n)$
Best Case: Element is at the first position of the array.
Average Case: The average time it takes to find element.
Worst Case: Element is at the end of the array, or not present in the array.

Binary Search

Works efficiently on sorted lists.
Ensures the list is sorted before searching.
Follows the divide and conquer approach.
Divides the list into two halves, compares the item with the middle element.
If a match is found, the location of the middle element is returned.
Otherwise, search either of the halves, depending on the result produced through the match.

Binary Search Algorithm

Search a sorted array by repeatedly dividing the search interval in half.
Begin with an interval covering the whole array.
If the value of the search key is less than the item in the middle of the interval, narrow the interval to the lower half.
Otherwise narrow it to the upper half. Repeatedly check until the value is found or the interval is empty.

Binary Search - Iterative Pseudocode

BinarySearchIT(A[0..N-1], value)
- low = 0
- high = N-1
- while (low <= high)
  - mid=(low +(high-low)/2)
  - if (A[mid] > value)
    - high = mid-1
  - else if (A[mid] < value)
    - low = mid + 1
  - else
    - return mid
- return -1

Binary Search - Recursive Pseudocode

BinarySearchREC(A[0..N-1], value, low, high)
- if (high>=low)
  - mid = low +((high-low)/2)
  - if (A[mid] > value)
    - return BinarySearchREC(A, value, low, mid-1)
  - else if (A[mid] <value)
    - return BinarySearchREC(A, value, mid+1, high)
  - else
    - return mid
- return -1

Binary Search - Time Complexity

Best Case Complexity - In Binary search, best case occurs when the element to search is found in first comparison, i.e., when the first middle element itself is the element to be searched. The best-case time complexity of Binary search is O(1).
Average Case Complexity - The average case time complexity of Binary search is $O(log n)$ .
Worst Case Complexity - In Binary search, the worst case occurs, when we have to keep reducing the search space till it has only one element. The worst-case time complexity of Binary search is $O(log n)$ .

Time-Space Trade-Off

The best algorithm requires less memory space and takes less time to complete its execution.
Designing such an ideal algorithm is not a trivial task.
More than one algorithm can solve a particular problem.
It is not uncommon to sacrifice one thing for the other.
There exists a time–space trade-off among algorithms.
If space is a big constraint, then choose a program that takes less space at the cost of more CPU time.
On the contrary, if time is a major constraint, then choose a program that takes minimum time to execute at the cost of more space.

References

https://www.geeksforgeeks.org/
Thareja, R. (2014). Data structures using C. Oxford University.
Lipschutz, S. (2003). Schaum's outline of theory and problems of data structures. McGraw-Hill.