Data Structures Notes

CSF102: Data Structures Session: 2025

  • B. Tech 2nd Semester

  • Course Coordinator: Dr. Madhu Sharma

  • Course Instructor: Mohd Khalid

Unit 1: Introduction to Algorithms & Data Structure

  • Introduction:

    • Data types, Abstraction, Abstract Data Type (ADT), Concept of data structure, Types of data structures, Operations on Data Structures

    • Introduction to Algorithms, Writing Pseudocodes, Algorithm analysis, Complexity of algorithms and Time space trade-off

    • Searching: Linear and Binary Search Techniques and their complexity analysis.

Data Structure

  • A logical and mathematical model for storing and organizing data in a particular way to facilitate algorithm design and implementation.

Basic Terminologies

  • Data: An elementary value or a collection of values (e.g., Employee's name and ID).

  • Data Items: A single unit of value.

  • Record: A collection of various data items.

  • File: A collection of records of one type.

  • Elementary Items: Data items that cannot be further divided (e.g., the ID of an Employee).

  • Information: Data with attributes, i.e., meaningful data.

Introduction

  • Data structure affects the design of both structural & functional aspects of a program.

  • Program = Algorithm + Data Structure

  • Algorithm: A step-by-step procedure to solve a particular function.

Classification of Data Structures

  • Data structures are normally divided into two broad categories:

    • Primitive Data Structure

    • Non-Primitive Data Structure

Primitive Data Structures

  • Basic structures directly operated upon by machine instructions.

  • Examples:

    • INTEGER

    • CHARACTER

    • FLOAT

    • POINTERS

  • Can hold a single value in a specific location.

Non-Primitive Data Structures

  • ARRAYS

    • Single Dimension

    • Two Dimension

    • Multi Dimension

  • LISTS

    • Linear

      • Stack

      • Queue

      • Linked List

    • Non-Linear

      • Trees

      • Graphs

Operations on Primitive Data Structures

  • Create

  • Selection

  • Updating

  • Destroy or Delete

Non-Primitive Data Structure characteristics

  • Can hold multiple values either contiguously or randomly.

  • Defined by the programmer.

Linear Data Structures

  • Have homogeneous elements.

  • Elements are in a sequence, forming a linear series.

  • Easy to implement because computer memory is organized linearly.

  • Examples:

    • Stack

    • Queue

    • Linked Lists

Non-Linear Data Structures

  • Data items are connected to several other data items.

  • Exhibit hierarchical or parent-child relationships.

  • Data elements are not arranged sequentially.

  • Examples:

    • Trees

    • Graphs

Operations on Non-Primitive Data Structures

  • Traversal

  • Insertion

  • Selection

  • Searching

  • Sorting

  • Merging

  • Destroy or Delete

Primitive vs. Non-primitive Data Structures

  • Primitive:

    • Stores data of only one type.

    • Examples: integer, character, float.

    • Cannot be NULL.

    • Size depends on the data type.

    • Starts with a lowercase character.

    • Cannot be used to call methods directly.

  • Non-primitive:

    • Stores data of more than one type.

    • Examples: Array, Linked list, stack.

    • Can consist of a NULL value.

    • Size is not fixed.

    • Starts with an uppercase character.
      *cannot be used to call methods.

Linear vs. Non-linear Data Structures

  • Linear:

    • Data elements are arranged linearly; each element is attached to its previous and next adjacent.

    • Single level is involved.

    • Easier implementation.

    • Elements can be traversed in a single run.

    • Memory may not be utilized efficiently.

    • Examples: array, stack, queue, linked list.

    • Applications in application software development.

    • Useful for simple data storage and manipulation.

    • Good performance for adding/removing at ends, slower for searching/removing in the middle.

  • Non-linear:

    • Data elements are attached hierarchically.

    • Multiple levels are involved.

    • Complex implementation.

    • Elements cannot be traversed in a single run.

    • Memory is utilized efficiently.

    • Examples: trees and graphs.

    • Applications in Artificial Intelligence and image processing.

    • Useful for representing complex relationships and data hierarchies.

    • Performance varies, can be optimized for specific operations.

Operations on Data Structures (Detailed)

  • Traversing: Visiting each element stored in the data structure.

  • Searching: Finding a particular element; successful if the element is found.

  • Insertion: Adding an element to the data structure.

  • Deletion: Removing an element from the data structure.

    • In stack, it's called "Pop."

    • In queue, it's called "Dequeue."

  • Merging: Combining lists of two sorted data items to form a single sorted list.

Abstract Data Type (ADT)

  • A mathematical model for data types defined by its behavior (semantics) from a user perspective.

  • Defined in terms of:

    • Possible values

    • Possible operations on data of this type

    • Behavior of these operations

  • Contrasts with data structures, which are concrete representations of data from an implementer's view.

  • Only specifies what operations are to be performed, not how they will be implemented.

  • Gives an implementation-independent view, providing only essentials and hiding the details (abstraction).

  • Can be defined as a set of data elements along with the operations on the data.

ADT Implementation

  • Contains a storage structure to store data elements and algorithms for fundamental operations.

  • Examples: array, linked list, queue, stack.

  • Separates a data structure from its implementation information.

List-ADT

  • Contains elements of the same type arranged sequentially.

  • Operations:

    • get(): Return an element at a given position.

    • insert(): Insert an element at any position.

    • remove(): Remove the first occurrence of an element.

    • removeAt(): Remove the element at a specified location.

    • replace(): Replace an element at any position.

    • size(): Return the number of elements.

    • isEmpty(): Return true if empty, otherwise false.

    • isFull(): Return true if full, otherwise false.

ADT Model

  • Consists of public and private functions.

  • Involves:

    • Encapsulation: Wrapping all data in a single unit (ADT).

    • Abstraction: Showing operations that can be performed and the data structures used.

Real-World Example (Smartphone)

  • Data (Specifications):

    • 4 GB RAM

    • Snapdragon 2.2ghz processor

    • 5 inch LCD screen

    • Dual camera

    • Android 8.0

  • Operations:

    • call(): Make a call.

    • text(): Send a text message.

    • photo(): Click a photo.

    • video(): Make a video.

Applications of Data Structures

  • Organization of data in computer's memory.

  • Representing information in databases.

  • Implementation of algorithms for searching data (e.g., search engine).

  • Implementation of algorithms to manipulate data (e.g., word processors).

  • Implementation of algorithms to analyze data (e.g., data miners).

  • Support for algorithms to generate data (e.g., a random number generator).

  • Support for algorithms to compress and decompress data (e.g., a zip utility).

  • Implementation of algorithms to encrypt and decrypt data (e.g., a security system).

Algorithm

  • A process or set of rules required to perform calculations or problem-solving operations, especially by a computer.

  • Formal definition: A finite set of instructions carried out in a specific order to perform a specific task.

  • Not a complete program but a solution (logic) of a problem, represented as a flowchart or pseudocode.

  • An algorithm is a finite set of Instructions for solving a particular problem in a finite amount of time using a finite amount of data.

Characteristics Required of an Algorithm

  • Each instruction should be precise and unambiguous.

  • Each instruction should be performable in a finite time.

  • One or more instructions should not be repeated infinitely (ensuring termination).

  • The desired results must be obtained after the algorithm terminates.

  • Algorithm can be represented as programs, flowcharts and pseudo-codes.

Characteristics of an Algorithm

  • Input: Zero or more input values.

  • Output: One or more output values.

  • Unambiguity: Instructions should be clear and simple.

  • Finiteness: Should contain a limited (countable) number of instructions.

  • Effectiveness: Each instruction affects the overall process.

  • Language independence: Can be implemented in any language with the same output.

Dataflow of an Algorithm

  • Problem: A real-world problem or instance.

  • Algorithm: Step-by-step procedure designed for the problem.

  • Input: Required and desired inputs provided to the algorithm.

  • Processing unit: Processes the input to produce the desired output.

  • Output: The outcome or result of the program.

Why Algorithms

  • Scalability: Helps to scale down a large real-world problem into smaller steps for easy analysis.

  • Performance: Indicates if the problem can be easily broken into smaller steps, making it feasible.

Sample Algorithm (Example 1)

  • Problem: Write an algorithm to find the sum of two numbers.

  • Algorithm:

    • Step 1: Read two numbers in variable num1 and num2

    • Step 2: Add num1 and num2 and sum=num1+num2

    • Step 3: Display result sum

Algorithm Example: Even or Odd

  • Step 1: Start

  • Step 2: Read a number to N

  • Step 3: Divide the number by 2 and store the remainder in R.

  • Step 4: If R = 0 go to Step 6

  • Step 5: Print “N is odd” go to step 7

  • Step 6: Print “N is even”

  • Step 7: Stop

Factors of an Algorithm

  • Modularity: Ability to break down a problem into small modules or steps.

  • Correctness: The algorithm produces the desired output for given inputs.

  • Maintainability: Should be designed simply for easy redefinition without major changes.

  • Functionality: Considers various logical steps to solve the real-world problem.

  • Robustness: How clearly an algorithm can define a problem.

  • User-friendly: Easy to explain to a programmer.

  • Simplicity: Easy to understand.

  • Extensibility: Others can use and extend the algorithm.

Issues of Algorithms

  • How to design algorithms (step-by-step procedure).

  • How to analyze algorithm efficiency.

Categories of Algorithms

  • Sort: For sorting items in a certain order.

  • Search: For searching items inside a data structure.

  • Delete: For deleting existing elements from the data structure.

  • Insert: For inserting items inside a data structure.

  • Update: For updating existing elements inside a data structure.

Pseudocode

  • An informal high-level description of the operating principle of a computer program or algorithm.

  • Uses structural conventions of a standard programming language for human reading.

  • More structured than prose but less formal than a programming language.

  • Methodology that allows programmers to represent the implementation of an algorithm.

  • Expressions:

    • Use standard mathematical symbols for numeric and boolean expressions

    • Use equal keyword for assignment ( = in C)

    • Use = for equality relationship (== in C)

  • Method Declarations:

    • Algorithm name(param1, param2)

Good vs Bad Ways of Writing Pseudocode

  • Good Example

    • FOR each index of string

    • IF string(index) is equal to hash THEN

    • RETURN "Hash found"

    • END IF

    • END FOR

  • Bad Example

    • FOR i = 0 to 20

    • IF string[i] = '#'

    • RETURN "Hash found"

    • END IF

    • END FOR

  • Worst Example

    • FOR each index of string

    • if string[i] = '#'

    • return "Hash found"

    • END if

    • END FOR

Advantages of Pseudocode

  • Easily transformed into actual programming language.

  • Easily understood by non-programmers.

  • Easily modifiable compared to flowcharts.

  • Beneficial for structured, designed elements.

  • Errors can be detected before coding.

Disadvantages of Pseudocode

  • Lacks a standardized style or format.

  • Higher error possibility when transforming into code.

  • May require a tool for extracting and drawing flowcharts.

  • Does not depict the design.

Algorithm vs. Pseudocode

  • Algorithm: A problem-solving process, a series of instructions to solve a problem.

  • Pseudocode: A way of writing an algorithm using informal, simple language without strict syntax.

Problem Example: Calculating Absentees

  • Problem: Suppose there are 60 students in the class. How will you calculate the number of absentees in the class?

  • Pseudo Approach:

    • Initialize a variable called as Count to zero, absent to zero, total to 60

    • FOR EACH Student PRESENT DO the following:

      • Increase the Count by One

    • Then Subtract Count from total and store the result in absent

    • Display the number of absent students

  • Algorithmic Approach:

    • Count <- 0, absent <- 0, total <- 60

    • REPEAT till all students counted

      • Count <- Count + 1

      • absent <- total - Count

    • Print "Number absent is:" , absent

Algorithm Analysis

  • Can be analyzed before (priori) and after (posterior) creation.

    1. Priori Analysis: Theoretical analysis before implementation; factors like processor speed are not considered.

    2. Posterior Analysis: Practical analysis after implementation using a programming language; evaluates running time and space taken by the algorithm.

Algorithm Complexity

  • Performance measured by:

    • Time complexity: Amount of time required to complete execution, denoted by Big O notation.

    • Space complexity: Amount of space required to solve a problem and produce an output, also expressed in Big O notation.

Analysis of Algorithms

  • Worst-case, Average-case, Best-case

    • Worst-case: Upper bound on the running time for any input.

    • Average-case: Estimate of the running time for an 'average' input, assuming all inputs are equally likely.

    • Best-case: Analysis under optimal conditions (rarely used for decision-making).

Asymptotic Analysis and Notations

  • Evaluates algorithm performance in terms of input size, not actual running time.

  • Measures how time (or space) taken by an algorithm increases with input size.

  • Uses machine-independent constants.

  • Asymptotic notations are mathematical tools to represent the time complexity.

BIG O Notation

  • Defines an upper bound of an algorithm.

  • O(g(n)) = { f(n): \text{ there exist positive constants } c \text{ and } n0 \text{ such that } 0 <= f(n) <= c*g(n) \text{ for all } n >= n0 }

Categories of Algorithms (Based on Big O Notation)

  • Constant time algorithm: running time complexity given as O(1)O(1)

  • Linear time algorithm: running time complexity given as O(n)O(n)

  • Logarithmic time algorithm: running time complexity given as O(logn)O(log n)

  • Polynomial time algorithm: running time complexity given as O(nk)O(n^k), where k > 1

  • Exponential time algorithm: running time complexity given as O(2n)O(2^n)

  • O(1) < O(log n) < O(n) < O(n^2) < O(2^n) < O(n!)

BIG Omega Notation

  • Notation provides an asymptotic lower bound.

  • \Omega(g(n)) = {f(n): \text{ there exist positive constants } c \text{ and } n0 \text{ such that } 0 <= c*g(n) <= f(n) \text{ for all } n >= n0}

BIG Theta Notation

  • Bounds a function from above and below, defining exact asymptotic behavior.

  • Drop low order terms and ignore leading constants.

  • Example: 3n3+6n2+6000=Θ(n3)3n^3 + 6n^2 + 6000 = \Theta(n^3)

  • (g(n)) = {f(n) \text{ there exist positive constants } c1, c2 \text{ and } n0 \text{ such that } 0 <= c1g(n) <= f(n) <= c2g(n) \text{ for all } n >= n0}

Theta Notation (Θ-Notation)

  • Encloses the function from above and below.

  • Represents the upper and lower bound of the running time of an algorithm.

  • Used for analyzing the average-case complexity of an algorithm.

  • Theta (Average Case): Add the running times for each possible input combination and take the average.

Linear Search

  • Also called sequential search algorithm.

  • Simplest searching algorithm.

  • Traverse the list completely and match each element with the item to be found.

  • Returns the location of the item if found; otherwise, returns NULL.

Linear Search Algorithm

  • Linear_Search(a, n, val) // 'a' is the given array, 'n' is the size of given array, 'val' is the value to search

    • Step 1: set pos = -1

    • Step 2: set i = 1

    • Step 3: repeat step 4 while i <= n

    • Step 4: if a[i] == val

      • set pos = i

      • print pos

      • go to step 6

      • set i = i + 1

    • Step 5: if pos = -1

      • print "value is not present in the array "

    • Step 6: exit

Linear Search - Time Complexity

  • Best Case: O(1)O(1)

  • Average Case: O(n)O(n)

  • Worst Case: O(n)O(n)

  • Best Case: Element is at the first position of the array.

  • Average Case: The average time it takes to find element.

  • Worst Case: Element is at the end of the array, or not present in the array.

Binary Search

  • Works efficiently on sorted lists.

  • Ensures the list is sorted before searching.

  • Follows the divide and conquer approach.

  • Divides the list into two halves, compares the item with the middle element.

  • If a match is found, the location of the middle element is returned.

  • Otherwise, search either of the halves, depending on the result produced through the match.

Binary Search Algorithm

  • Search a sorted array by repeatedly dividing the search interval in half.

  • Begin with an interval covering the whole array.

  • If the value of the search key is less than the item in the middle of the interval, narrow the interval to the lower half.

  • Otherwise narrow it to the upper half. Repeatedly check until the value is found or the interval is empty.

Binary Search - Iterative Pseudocode

  • BinarySearchIT(A[0..N-1], value)

    • low = 0

    • high = N-1

    • while (low <= high)

      • mid=(low +(high-low)/2)

      • if (A[mid] > value)

        • high = mid-1

      • else if (A[mid] < value)

        • low = mid + 1

      • else

        • return mid

    • return -1

Binary Search - Recursive Pseudocode

  • BinarySearchREC(A[0..N-1], value, low, high)

    • if (high>=low)

      • mid = low +((high-low)/2)

      • if (A[mid] > value)

        • return BinarySearchREC(A, value, low, mid-1)

      • else if (A[mid] <value)

        • return BinarySearchREC(A, value, mid+1, high)

      • else

        • return mid

    • return -1

Binary Search - Time Complexity

  • Best Case Complexity - In Binary search, best case occurs when the element to search is found in first comparison, i.e., when the first middle element itself is the element to be searched. The best-case time complexity of Binary search is O(1).

  • Average Case Complexity - The average case time complexity of Binary search is O(logn)O(log n).

  • Worst Case Complexity - In Binary search, the worst case occurs, when we have to keep reducing the search space till it has only one element. The worst-case time complexity of Binary search is O(logn)O(log n).

Time-Space Trade-Off

  • The best algorithm requires less memory space and takes less time to complete its execution.

  • Designing such an ideal algorithm is not a trivial task.

  • More than one algorithm can solve a particular problem.

  • It is not uncommon to sacrifice one thing for the other.

  • There exists a time–space trade-off among algorithms.

  • If space is a big constraint, then choose a program that takes less space at the cost of more CPU time.

  • On the contrary, if time is a major constraint, then choose a program that takes minimum time to execute at the cost of more space.

References

  1. https://www.geeksforgeeks.org/

  2. Thareja, R. (2014). Data structures using C. Oxford University.

  3. Lipschutz, S. (2003). Schaum's outline of theory and problems of data structures. McGraw-Hill.