WGU 684: Intro to Comp Sci
Data Fundamentals and Information Representation
Data vs. Information:
Data: Raw facts and figures (unstructured and lacking context) that can be processed to produce meaningful information.
Information: Data that has been organized or processed in a meaningful way to be useful in solving problems.
Binary Data Storage: All data in modern computers is stored as binary digits (s and s). One bit represents two states; bits represent possible combinations.
Data Compression: The process of reducing the size of data to save storage or transmission time.
Compression Ratio: The ratio of the compressed data size to the original data size (). A ratio closer to indicates tighter compression.
Lossless Compression: No information is lost; the original data can be perfectly reconstructed.
Lossy Compression: Permanently eliminates some information to reduce size, which may affect quality.
Compression Techniques:
Keyword Encoding: Replacing frequently occurring words with a single character (e.g., "the" replaced by "~").
Run-Length Encoding (Recurrence Coding): Replacing a sequence of repeated characters with a flag character, the character itself, and a count (e.g., "AAAAAAA" becomes "*A7").
Huffman Encoding: Uses variable-length bit strings based on frequency. Frequently occurring characters get shorter codes to minimize overall file size. No code is a prefix of another.
Analog vs. Digital Data:
Analog Data: A continuous representation analogous to the information it represents (e.g., mercury thermometer).
Digital Data: A discrete representation that breaks information into separate elements (binary zeroes and ones).
Pulse-Code Modulation (PCM): The behavior of digital signals jumping sharply between two extremes.
Reclocking: The process of periodically refreshing digital timing signals to prevent degradation and loss of information.
Numeric Representation:
Signed-Magnitude Representation: Uses one bit for the sign (typically for positive, for negative) and the remaining bits for magnitude. It suffers from having two representations of zero (plus and minus zero).
Ten's Complement: A decimal method where negative numbers are represented by ( = number of digits, = the number).
Two's Complement: The binary equivalent of ten's complement. To calculate , invert the bits of and add .
Overflow: Occurs when a calculation exceeds the maximum value representable by the allocated number of bits (e.g., adding in an -bit signed system resulting in ).
Real Numbers (Floating Point):
Represented using a Mantissa (digits), an Exponent (shifting the radix point), and a Sign.
Formula: .
Scientific Notation: A floating-point form where the decimal point is kept to the right of the leftmost whole digit (e.g., ).
Character Sets:
ASCII (American Standard Code for Information Interchange): Originally -bit ( characters); evolved to Latin-1 Extended ASCII (-bit, characters).
Unicode: A universal -bit set (superset of ASCII) representing virtually all writing systems globally.
Variables, Logic, and Arithmetic
Variables: Named storage locations in memory whose contents can change during program execution.
Declaration: Specifying a variable's data type, identifier, and optionally an initial value.
Initialization: Assigning a starting value to a variable.
Garbage: Unknown/unused data residing in an uninitialized variable.
Data Types in Programming:
Numeric: Includes integers (whole numbers) and floating-point (real numbers with fractional parts).
String: Sequences of characters (alphanumeric). Cannot be used in arithmetic.
Boolean: Logical values that are either True or False.
Naming Conventions:
Camel Casing:
myVariableName(Common in Java, C#).Pascal Casing:
MyVariableName(Common in Visual Basic).Hungarian Notation: Type prefix within the name, e.g.,
strNameornumRate.Snake Casing:
my_variable_name(Common in Python, C++, Ruby).Kebob Case:
my-variable-name(Common in Lisp).
Constants:
Literal (Unnamed) Constant: A fixed value written directly into code (e.g., ).
Named Constant: An identifier assigned a value that cannot change (conventionally all caps:
SALES_TAX_RATE). Helps eliminate Magic Numbers.
Assignment & Operators:
Assignment Operator (=): Assigns the value on the right to the identifier on the left (lvalue). It has right-to-left associativity.
Binary Operators: Require two operands (e.g., ).
Rules of Precedence (Order of Operations):
Parentheses (innermost first).
Multiplication () and Division () (left-to-right).
Addition () and Subtraction () (left-to-right).
Algorithms and Data Structures
Algorithm: A step-by-step set of unambiguous instructions to solve a problem in finite time using finite data.
Pseudocode: A human-readable, shorthand language used to express algorithm logic without strict syntax rules.
Selection Constructs:
Single-alternative IF: Only executes if the condition is true.
Dual-alternative (IF-THEN-ELSE): Chooses between two paths.
Nested Logic: Placing control structures inside others.
Repetition (Loops):
Count-Controlled: Repeats a specific number of times using a loop control variable (initialization, test, increment).
Event-Controlled: Executes until a specific event/condition occurs (e.g., reading a sentinel value).
Pretest Loop (WHILE): Evaluates the condition before executing the body.
Data Structures:
Arrays: Homogeneous collection where items are accessed via an Index (usually starting at ).
Records: Heterogeneous group of items (fields) accessed by name.
Abstract Data Types (ADTs): Containers with independent implementation.
Stack: LIFO (Last-In, First-Out) structure using Push (insert) and Pop (remove) on the top end.
Queue: FIFO (First-In, First-Out) structure using Enqueue (insert at rear) and Dequeue (remove from front).
List: Linear, homogeneous collection with varying length.
Linked Structure: Composed of Nodes (data + link/pointer to the next node).
Searching and Sorting:
Sequential Search (Linear): Checks each item one by one.
Binary Search: Divide-and-conquer strategy on a sorted array; repeatedly halves the search interval.
Selection Sort: Finding the smallest element and swapping it into the first unsorted position.
Bubble Sort: Adjacent elements are compared and swapped; the smallest "bubbles" up to the top.
Insertion Sort: Builds a sorted segment one item at a time by inserting correctly relative to existing items.
Object-Oriented Methodology and Programming Paradigms
Object-Oriented Design (OOD): Focuses on self-contained entities (Objects) composed of data (Fields) and operations (Methods).
Class: A blueprint for objects; defines common properties and behaviors.
Instantiation: Creating a concrete instance (object) of a class using the
newoperator.
Design Stages:
Brainstorming: Spontaneous contribution of class ideas.
Filtering: Combining or removing redundant or irrelevant classes.
Scenarios: Assigning responsibilities (knowledge and behavior) to classes, often recorded on CRC Cards (Class, Responsibility, Collaboration).
Responsibility Algorithms: Writing logic for the methods.
Core OOP Ingredients:
Encapsulation: Bundling data and methods while hiding internal details (Information Hiding). Access is controlled through
publicandprivatekeywords.Inheritance: Creating a Derived Class (subclass) from a Superclass to inherit properties and methods ("is-a" relationship).
Polymorphism: The ability for different classes to have methods with the same name but different implementations.
Programming Paradigms:
Imperative: Describes how to solve a problem (sequential, variables, assignment). Includes Procedural and Object-Oriented.
Declarative: Describes the results without detailing the steps. Includes Functional (evaluation of mathematical functions like Lisp/Scheme) and Logic (symbolic logic like Prolog).
Translation Process:
Compiler: Translates high-level source code into machine code or Bytecode entirely before execution.
Interpreter: Translates and executes statements one at a time.
Java Virtual Machine (JVM): A virtual machine that executes Java Bytecode, providing high portability.
Software Development Lifecycle and Ethics
George Pólya's Problem Solving:
Understand the problem (Ask Questions).
Devise a plan (Divide and Conquer).
Carry out the plan (Algorithm Development).
Look back (Evaluate results).
Software Development Lifecycle (SDLC):
Understand the Problem: Interact with End Users to determine requirements and document specifics.
Plan the Logic: Use flowcharts and pseudocode to develop an algorithm.
Code the Program: Write source code in a high-level language.
Translate: Use a compiler/interpreter; fix Syntax Errors.
Test the Program: Execute with sample data to find Logical Errors (Debugging).
Production: Switch the organization over to the new system (Conversion).
Maintenance: Updating code for tax rates, bug fixes, or new info.
Ethics and Public Interest:
IEEE and ACM: Professional organizations providing codes of conduct. IEEE focuses on hardware/engineering; ACM on software/computing.
Intellectual Property (IP): Creations of the mind.
Copyright: Exclusive rights to original work (song, book, webpage). Duration: years after death.
Trademark: Recognizable mark (symbol/slogan like "So Yummy!") registered with the government.
Patent: Government license for a device or process (must be novel, useful, non-obvious).
IP Licensing:
GNU GPL: Free software "copyleft" license allowing sharing and modification.
Creative Commons (CC): Allows free use/distribution (variations include BY for attribution, NC for non-commercial).
MIT License: Permissive license requiring only original terms/notices be kept.
Fair Use: Legal exception for news, education, and research based on purpose, nature, amount used, and market effect.
Cybercrime: Illegal acts using computer/networks.
Types: Identity theft, Data breaches, Fraud, Trafficking, Extortion (Ransomware), Malware.
Cyberbullying: Willful use of tech to repeatedly threaten or abuse others.
Operating Systems and Memory Management
Operating System (OS): The core system software that manages hardware resources and provides a user interface.
Multiprogramming: Keeping multiple programs in main memory concurrently to share the CPU.
Process: An instance of a program in execution.
Process Control Block (PCB): Data structure storing a process's state (PC, registers, priority).
Context Switch: Saving the state of the current CPU process and loading another.
Process States: New, Ready, Running, Waiting (blocked for I/O), Terminated.
Memory Management Techniques:
Address Binding: Mapping logical addresses (relative to program start) to physical addresses (actual hardware locations).
Single Contiguous: OS in one part, only one program in the rest.
Partition Management:
Fixed: Memory set into static sizes at boot.
Dynamic: Partitions created based on program needs.
Selection Strategies: First Fit, Best Fit (smallest gap), Worst Fit (largest gap).
Compaction: Shuffling programs to create one large free space.
Paged Memory Management: Dividng memory into Frames and programs into Pages.
Demand Paging: Pages are only brought to memory when needed.
Virtual Memory: Illusion of unlimited memory using a disk area.
Thrashing: Performance-degrading excessive page swapping.
CPU Scheduling:
Nonpreemptive: Currently running process must terminate or wait (FCFS, SJN).
Preemptive: OS interrupts a process (Round Robin).
First-Come, First-Served (FCFS): Processes handled in arrival order.
Shortest Job Next (SJN): Provably optimal turnaround time; needs estimated service times.
Round Robin: Uses a Time Slice (Quantum) to rotate through ready processes fairly.
Asynchronous (Event-Driven) Processing: Handling inputs like mouse clicks that occur outside the program's sequential flow.
File Systems and Disk Management
Secondary Memory: Nonvolatile storage (e.g., hard drives, SSDs).
File Types:
Text File: Data organized as characters (ASCII/Unicode).
Binary File: Requires specific interpretation (executables, images like JPEG/GIF, formatted word docs).
File Operations: Create, Delete, Open, Close, Read, Write, Rename, Truncate, Append.
Access Methods:
Sequential Access: Data processed in order from beginning to end.
Direct Access: Accessing logical records by number in any order.
File Protection: Controls access via categories like Owner, Group, and World with permissions like Read, Write, and Execute.
Directory Management:
Directory Tree: Hierarchical structure starting from the Root Directory.
Absolute Path: From the root (e.g.,
C:\Windows\System).Relative Path: From current working directory (e.g.,
..\landscape.jpg).
Disk Scheduling: Managing I/O requests for the magnetic disk to minimize Seek Time (head movement to cylinder) and Latency (rotation delay).
FCFS: Requests in order received.
SSTF (Shortest-Seek-Time-First): Closest cylinder next. Risk of Starvation.
SCAN (Elevator): Moves in and out, servicing requests in current direction.
Computer Architecture and the von Neumann Model
von Neumann Architecture:
Stored-Program Concept: Data and instructions are treated the same and stored in the same place.
Five Major Components:
Memory Unit: Addressable cells in bytes or words.
ALU (Arithmetic/Logic Unit): Performs arithmetic and logical (AND/OR) operations using specialized Registers.
Control Unit: Stage manager of the fetch-execute cycle.
Input Unit: Channels into the computer (mouse, keyboard).
Output Unit: Channels out (printers, displays).
Central Processing Unit (CPU): Composed of the ALU and Control Unit.
Registers in the Control Unit:
Instruction Register (IR): Holds the current instruction being executed.
Program Counter (PC): Holds the address of the next instruction.
The Fetch-Execute Cycle:
Fetch: Copy instruction from PC address to IR; increment PC.
Decode: Logic circuitry determines the operation.
Get Data: Fetch operands from memory if needed.
Execute: Signal the ALU to carry out the task.
System Specifications:
Hertz (Hz): Cycles per second of the system Clock.
Bus: Collection of wires (Address, Data, Control) connecting components. Bus Width determines bits moved at once.
Motherboard: Main circuit board with connections for all devices.
Storage Technologies:
RAM: Volatile; random access.
ROM: Nonvolatile; permanent instructions (e.g., for Booting).
HDD (Hard Disk Drive): Magnetic platters; cylinders, tracks, sectors.
SSD (Solid-State Drive): No moving parts; faster, nonvolatile chip-based storage.
Optical (CD/DVD): Uses lasers; capacities from to .
Touch Screen Technologies: Resistive (layers), Capacitive (electrical flow to finger), Infrared (light beams), SAW (sound waves).
Network Architecture and Internet Technologies
Network Definitions:
Node (Host): Any device on a network.
Protocol: A set of rules for interaction.
Data Transfer Rate (Bandwidth): Speed of data movement.
Client/Server vs. P2P:
Client/Server: Client requests (Browser); Server responds (Web server, File server).
P2P (Peer-to-Peer): Decentralized; nodes share resources directly.
Network Categories:
LAN: Small geographic area (room/building).
WAN: Connects multiple LANs over large distances (The Internet).
MAN: Metropolitan area (city or campus).
Topologies:
Ring: Closed loop, one direction.
Star: Central node connects all others.
Bus: Single communication line shared by all nodes.
Network Communication:
Packet Switching: Messages divided into fixed-size Packets, routed independently by Routers, and reassembled at destination.
Gateway: Node handling traffic between LANs and other networks.
Firewall: Security gateway enforcing access control policies by filtering ports.
Internet Protocols:
OSI Reference Model: abstraction layers.
TCP/IP Suite:
IP (Internet Protocol): Routing packets based on addresses.
TCP (Transmission Control Protocol): Reliable, ordered assembly of packets.
UDP (User Datagram Protocol): Faster, less reliable alternative.
High-Level Protocols: SMTP (Email), FTP (Files), Telnet (Remote login), HTTP (Web).
IP Addressing:
IPv4: -bit; bytes (e.g.,
205.39.155.18). Limit: billion addresses.IPv6: -bit; written in hexadecimal. Successor to IPv4.
Domain Name System (DNS): Distributed database translating Hostnames (readable e.g.
matisse.csc.villanova.edu) to numeric IP addresses.TLD (Top-Level Domain):
.com,.edu,.org,.ninja, etc. Managed by ICANN.
Modern Computing Trends and the Internet of Things (IoT)
Hardware Trends:
Multicore Processors: Multiple CPUs (cores) on a single chip.
GPU: Specialized for graphics and parallel processing.
Server Farms: Thousands of servers in huge specialized buildings (often near cheap power).
Embedded Systems: Computers within devices (cars, microwaves) often written in assembly or C and stored in ROM.
Internet of Things (IoT): A network of interconnected devices communicating without human interaction.
Four Pillars of IoT:
The Device: Mini-computer/Chip with logic.
Input: Sensory data (temperature, movement) not from humans.
Connectivity: Network/Internet access.
Action: Fetching data, sending notifications, or signaling other devices.
Network Neutrality: The principle that ISPs should treat all internet traffic equally without favoritism or throttle-down for non-paying users.