Module 1 Notes: Data Structures and Algorithms (CS214)

Module 1: Data Structure

Learning Outcomes

Define data structure, basic terminology, algorithm, basic concept of function and pointers, and dynamic memory allocation in data structure.

A. Engage Quotation

Figure 1.1: Linus Benedict Torvalds quote referenced (described as a famous quote). The exact quote is not reproduced in the transcript; the emphasis is on the idea that choosing appropriate data structures affects program efficiency and energy/time usage.

B. Explore Video

Video: Data Structures – Week 1
YouTube link: https://www.youtube.com/watch?timecontinue=7&v=18V8Avz2OH8&feature=emblogo
Module Video Filename: Data Structures – Week 1

C. Explain (Torvalds’ perspective)

Using concrete data structures leads to better programmer efficiency and effectiveness.
CPU cycles saved translate to reduced time and energy, enabling better use of resources for other instructions.
A program built with improper data structures tends to be inefficient or unnecessarily complex.
The study covers description, implementation, and quantitative performance analysis of data structures.

D. Elaborate Introduction

Data and information are essential in the modern world; data can be stored in various formats.
Data item: a single value or a set of values, possibly broken into sub-items.
- Example: a student name can be divided into first name, middle name, last name (three sub-items).
- A student ID is typically a single item.
A data structure is a specialized format for organizing, processing, retrieving, and storing data.
Data structures are designed to arrange data to suit a specific purpose so that it can be accessed and manipulated efficiently.
A data structure is a scheme for organizing data in computer memory.
Common data structures include: lists, arrays, stacks, queues, heaps, trees, graphs, tries, and hash tables.
The arrangement of data affects program performance for different tasks.
In programming, a data structure may be chosen or designed to store data for processing by algorithms; each structure contains:
- information about the data values,
- relationships between the data,
- functions that can be applied to the data.

Characteristics of Data Structures

Data structures are classified by:
1) Linear or non-linear: whether items are arranged in a chronological sequence (e.g., arrays) or in an unordered fashion (e.g., graphs).
2) Homogeneous or non-homogeneous: whether all items are of the same type or of various types.
3) Static or dynamic: whether sizes/structures/memory locations are fixed at compile time (static) or can grow/shrink at run time (dynamic).

Types of Data Structures

Data structure types are determined by the operations or algorithms applied to them. Types include:

1. Arrays: store items at adjoining memory locations; items of the same type are stored together; can be fixed or flexible in length.
2. Stacks: store items in a linear order with LIFO (last in, first out) semantics or FIFO in certain variants.
3. Queues: store items similar to stacks but strictly follow FIFO order.
4. Linked lists: store items in a linear sequence; each node contains a data item and a link to the next item.
5. Trees: hierarchical structure with nodes connected in a parent-children relation.
6. Graphs: non-linear structure consisting of vertices and edges, useful for modeling networks and related systems.
7. Tries: a keyword tree that stores strings as data items.
8. Hash tables: associative arrays that map keys to values using a hash function to determine bucket indices.
Primitive data structures: integers, floats, Booleans, characters.
Figure 1.2: Data Structure Hierarchy (referenced)

Uses of Data Structures

Data structures are used to implement the physical forms of abstract data types.
Examples: representing a relational database as a binary tree; organizing code and information in digital space (e.g., Python lists/dictionaries, JavaScript arrays/objects).

Importance of Data Structures

Essential for managing large data sets (databases, indexing services) efficiently.
Help with memory allocation, data interrelationships, and data processing.
Choosing the proper data structure for a task is crucial; a poor choice can lead to slow runtimes or unresponsive code.
Factors in selecting a data structure include:
- What information will be stored,
- Where existing data should be placed,
- How data should be sorted,
- How much memory should be reserved.

Algorithm

Definition: A finite sequence of steps to accomplish a computational task.
Characteristics of a good algorithm:
- Steps are simple and definite (precision).
- Must terminate after finitely many steps.
- Must be effective (produce correct answers).
An algorithm is a computational procedure: takes input values, processes them, and produces output.
An algorithm can be described as:
- A recipe (illustrative example), or
- In multiple representations such as human language, pseudo code, or flow charts.
Phases of algorithmic problem solving:
1) Derivation of an algorithm that solves the problem.
2) Conversion of the algorithm into code.

Important Notes about Algorithms

1) An algorithm is a sequence of steps, not a program.
2) The same algorithm can be implemented in different programs or languages; it is abstracted from implementation details.

Ways to Express Algorithms

a) Human language
b) Pseudo code
c) Flow chart

Example: Largest element in a list

Problem: Given a list of positive numbers, return the largest number in the list.
Inputs: $L$ , a list of positive numbers with at least one element: $|L| gt; 0$ .
Outputs: $n$ , the largest number in $L$ , i.e., n = igmax Lig.
Algorithm:
1) Set max to 0.
2) For each number $x$ in the list $L$ , compare to the current max; if x > ext{max} then set $ext{max} = x$ .
3) After processing all elements, $ext{max}$ holds the largest value in the list.

Characteristics of an Algorithm (detailed)

1) Precision: each step must be exact and unambiguous.
2) Termination: the algorithm must finish after a finite number of steps (no infinite loops).
3) Effectiveness: each step must be executable and yield correct results.
4) Generality: solves every instance of the problem it is intended to solve.
5) Uniqueness: results of each step are uniquely defined given inputs and prior steps.
6) Finiteness: total number of steps is finite.
7) Output: the algorithm produces a defined output.

Pseudo Code

Pseudo code: an informal, high-level description of an algorithm, using the notation and structure of programming languages but intended for human readability.
It omits details not essential to understanding the core algorithm (e.g., variable declarations, system-specific code, some subroutines).
Purpose: provide an environment-independent description to sketch the structure before actual coding.

Flowchart

Flow chart: a diagram that represents an algorithm, workflow, or process using boxes and arrows.
Uses:
- Analysis, design, documentation, and management of processes or programs.
- Helpful for visualizing the sequence of operations and decision points.

Functions in C/C++

A function is a set of statements that take inputs, perform a computation, and produce output.
Rationale: to group commonly performed tasks into a function so code can be reused and maintained easily.
General form:
- returntype functionname([argtype argname, …]) {
- // code
- }
Example (conceptual description): a function max(x, y) returns the greater of x and y; used in main by passing two numbers, which demonstrates basic function usage and return value.
In practice, there are C and C++ implementations that define a max function and call it from a main routine to illustrate modular programming and code reuse.

Why do we need functions?

Reduce code redundancy: avoid repeating the same logic in multiple places.
Improve maintenance: change the functionality in one place, and all call sites benefit.
Improve modularity: break large programs into manageable, readable units.
Enable abstraction: use library or built-in functions without requiring knowledge of internal workings.

Pointers

Pointers are variables that store the memory address of another value.
Dereferencing: obtaining the value stored at the memory location referenced by a pointer.
Pointers enable dynamic data structures and efficient data manipulation, reducing unnecessary data copying.
Call-by-value vs Call-by-reference:
- Call-by-value: arguments' values are copied for each operation, which can be inefficient.
- Call-by-reference: functions receive memory addresses, allowing direct updates to the original data.

Control Program Flow with Pointers

Pointers can be used to control program flow via control tables that store entry points to subroutines, enabling sequential or recursive traversal of procedures.
They are essential for recursive algorithms and for traversing structures like linked lists where addresses of successive elements are linked.
Example structure (conceptual):
- struct Node { int data; struct Node* next; };
Pointers are also used in managing the next memory locations in a linked structure, enabling dynamic growth.

Dynamic Memory Allocation

Many languages allocate memory for run-time variables on a heap (free store) rather than the stack.
Pointers store addresses of dynamically allocated blocks or arrays.
A linked list tail is commonly marked by a NULL pointer to indicate there are no further elements.
Conceptual example (not showing full code): allocate memory for nodes using heap, link nodes via their next pointers, and terminate with a NULL at the end of the list.

Pointer Demo (Conceptual) and Output

Conceptual demonstration shows creating pointers to an integer variable and to a pointer variable, then dereferencing to access the value, and printing addresses and values.
The demonstration emphasizes:
- address retrieval via &
- value access via *
- multiple levels of indirection (e.g., **ptr2)
The transcript references an illustrative output labeled as Figure 1.3, indicating the successful display of pointer addresses and dereferenced values.

Concluding Statement for This Module

In this module, you learn about algorithms, their properties, and the different representations (pseudo code, flowchart, function, pointers, dynamic memory allocation) and how these concepts apply to C/C++ programming.

E. Evaluate

ASSESSMENT: Instructions

You may write your answers on the Answer Sheet (AS) provided in this module.

CONTENT FOR ASSESSMENT: Identification (2 points each)

1) A collection of facts and figures or you can say data are values or a set of values that are in a particular format.
2) It is a specialized format for organizing, processing, retrieving and storing data.
3) It means a finite sequence of steps for accomplishing some computational task.
4) Is an informal high-level description of the operating principle of a computer program or other algorithm.
5) A type of diagram that represents an algorithm, workflow or process.
6) It is a set of statements that take inputs, do some specific computation and produces output.
7) These are the variables that are used to store the location of value present in the memory.
8) This characteristics describes wheter the data items are arranged in chronological sequence, such as with an array.
9) This characteristic describes whether all data items in a given repository are of various types or of the same type.
10) This characteristic describes how the data structures are compiled.

References

Introduction to Data Structure and Algorithms – https://www.w3schools.in/data-structures-tutorial/intro/
Lecture Notes on Data Structure and Algorithms – https://searchsqlserver.techtarget.com/definition/data-structure
https://www.geeksforgeeks.org/functions-in-c/
https://www.learnpick.in/prime/documents/ppts/details/1708/introduction-to-data-structure
https://www.educba.com/pointers-in-data-structure/
Data Structure and Algorithms – Harrison Njoroj, The African Virtual University Headquarters Cape Office Park Ring Road Kilimani PO Box 25405-00603 Nairobi, Kenya

Facilitated By: Name : MS Teams Account (email) : Smart Phone Number