CS50 Lecture Notes: Information Representation, Algorithms, and Scratch Basics
Representing Information and the CS50 Foundations
Intro and personal framing
- CS50 is Harvard’s introduction to the intellectual enterprises of Computer Science and the Art of Programming, taught by David Malin.
- Personal journey: many students start out nervous about CS; CS50 aims to empower through problem solving and hands-on practice.
- Emphasis: CS is broadly applicable across arts, humanities, social/natural sciences, etc., not just a narrow technical field.
- Real-world vision: by term’s end you should be able to teach yourself new ideas beyond the class with the help of TAs and fellows.
- Traditions and community: CS50 Puzzle Day, Hackathon, CS50 Fair; the goal is to build problem-solving skills and collaboration, not just language syntax.
- MIT hack analogy: education can feel like drinking from a fire hose—lots of information; with time and effort, the returns are high.
- Core message: focus on where you started versus where you end up; what matters is your delta, not relative standing to peers.
- Inspirational anecdote: first CS50 homework (hello world in C) once earned a negative score, illustrating that mistakes are part of learning.
What is computer science?
- Definition: the study of information and the processing of information.
- Central idea: computational thinking—the application of CS ideas to problems of interest in class and beyond.
- Core model of problem solving: input -> processing -> output.
- Emphasis on representation: how to represent inputs/outputs in standardized forms that computers can manipulate.
Binary fundamentals and the birth of bits
- Unary vs binary foundations:
- Unary (base-1) uses a single state per object (e.g., counting on fingers). A single hand can count up to 5 naively, but clever patterns can extend that.
- Binary (base-2) uses two states per digit (0 or 1). In practice, these states map to physical on/off switches (transistors).
- A single bit stores two possibilities: 0 or 1.
- With multiple bits you can count higher:
- With 3 bits you can count 0 through 7 (8 patterns total).
- General rule: with k bits you can count up to 2^k-1, and there are 2^k distinct patterns from all zeros to all ones.
- Examples from the talk:
- 3 bits: patterns range from 000 to 111, corresponding to decimal values 0–7.
- 3-bit counting demonstrates why 7 is the max with three bits and how 8 patterns require an extra bit.
- Why binary is preferred in hardware:
- Voltage levels are easiest to distinguish at extremes (0 vs high voltage).
- Tolerances for “in-between” levels are error-prone; binary keeps discrimination clean.
- Base comparison recap:
- Decimal (base 10) uses digits 0–9; weight per place is powers of 10: d2 10^2 + d1 10^1 + d_0 10^0 for a three-digit number.
- Binary uses digits 0 and 1; weight per place is powers of 2: d2 2^2 + d1 2^1 + d_0 2^0 for a three-bit number.
From bits to letters: ASCII and text representation
- Representing characters with bits requires a mapping from binary patterns to symbols.
- ASCII (American Standard Code for Information Interchange) maps letters, digits, and symbols to numeric codes.
- Key ASCII examples:
- Uppercase A through Z occupy decimals 65–90 (for example, 65 = 'A', 66 = 'B', …).
- Lowercase a through z occupy 97–122 (e.g., 97 = 'a', 98 = 'b').
- The examples used in the lecture: 65 = 'A', 66 = 'B', 72 = 'H', 73 = 'I', 33 = '!' (illustrating ASCII values for a simple string such as "HI!").
- Note on the bit width:
- ASCII originally used 7 bits, allowing 128 characters.
- Modern systems commonly use 8 bits per character (extended ASCII), allowing 256 characters.
- Multi-byte sequences:
- Three bytes (24 bits) can encode three characters (as in a three-byte sequence for a simple text message).
- Practical demonstration:
- A text message composed of three ASCII bytes (e.g., 72, 73, 33) corresponds to the string "HI!".
- Limits of ASCII and motivations for Unicode:
- ASCII is insufficient for all human languages; Unicode expands character representation to cover many more languages and symbols.
- Unicode can use 16, 24, or 32 bits per character (often variable-length encodings) to cover tens of thousands to billions of characters.
Emoji, languages, and the rainbow of Unicode
- Emoji are characters encoded in Unicode; their rendered appearance is font- and platform-dependent.
- Unicode’s goal: be backwards compatible with ASCII while enabling much larger character sets (including pictograms).
- Emoji longevity and updates:
- New emoji are released regularly by the Unicode Consortium, requiring software updates to display newly added glyphs correctly.
- Example: the most popular emoji as of a recent period is a face with tears of joy, represented in Unicode code points that map to a colorful glyph when rendered by fonts.
How color and images are stored as bytes
- Colors on screens are represented using RGB: red, green, blue channels.
- Standard color depth: 3 channels, each 8 bits, totaling 24 bits per pixel (often written as 24-bit color).
- Example: a single pixel color can be described by three bytes, e.g., R=72,\, G=73,\, B=33, which visually corresponds to a shade (described in the talk as a yellow-ish color in that context).
- The relation between pixel data and images: a digital image is a grid of pixels; each pixel contributes 3 bytes (for RGB) or more if a different color model is used.
- Since images contain many pixels, images become large: megabytes to gigabytes for photos and videos.
- Note on compression: modern images often use compression to reduce storage and bandwidth without losing essential quality.
How video and sound fit into the bit world
- Video basics:
- A video is a sequence of still images called frames, typically 24–30 frames per second (fps) in common media formats.
- Each frame is conceptually a still image represented by millions of color pixels (each pixel as 3 bytes for RGB).
- The large size of video data explains why videos are often gigabytes in size before compression.
- Sound basics:
- Audio is represented by samples that encode sound waves, typically with multiple channels and a sample rate (e.g., 44.1 kHz).
- In the lecture’s simplified model, sound is represented by triplets (or more) of numbers corresponding to pitch, volume, duration, and potentially instrument. In a pixel/data sense, you can think of a sequence of numbers encoded in bytes that produce audible notes when played back.
Algorithms and the power of abstraction
- What is an algorithm?
- A precise description of how to do something; it must be unambiguous and finite.
- A classic real-world example: searching a phone book.
- Naive sequential search (check one page after another) is slow: with n pages it can take up to n steps.
- A faster approach (jumping two pages at a time) is still linear but faster in practice; still not optimal.
- The optimal strategy demonstrated is binary search: repeatedly halve the search space by comparing the target with the middle element.
- Performance and time complexity:
- Three archetypal patterns shown:
- Linear search: O(n) time, roughly proportional to the size of the input.
- Skipping two at a time: O(n/2) ≈ O(n) but with a constant factor improvement.
- Binary search (divide and conquer): O(log2 n) time; dramatically better for large n.
- Graphical intuition: the linear line, a shallower line for the “two-at-a-time” approach, and a logarithmic curve for binary search.
- Why this matters:
- As data grows (e.g., Google-scale datasets), efficient algorithms become crucial for performance, usability, and cost.
- From idea to code: pseudocode as a bridge
- Pseudocode is English-like yet precise; it uses verbs (functions) and conditionals to outline algorithms without language-specific syntax.
- Example pseudocode features:
- Functions (verbs) like pick up, open, call, etc.
- Conditionals (if/else if/else) driven by Boolean expressions.
- Indentation signals scope and blocks of code; loops like go back to line X introduce repetition.
- Booleans and Boolean expressions produce true/false decisions.
- Common coding constructs and their roles:
- Functions: encapsulate a bite-sized task.
- Conditionals: forks in the road depending on test results.
- Expressions/Booleans: questions that yield true/false outcomes.
- Loops: allow repeated execution until a condition is met (or forever in some examples).
- The danger of infinite loops and the need for termination conditions.
- The practical mindset: code is designed not only to work but to be maintainable, extensible, and less error-prone (e.g., avoid duplicating code; prefer abstractions).
From pseudocode to AI and modern computing
- Artificial intelligence overview (brief):
- AI relies on large data sets and statistical models to predict responses, often via neural networks.
- Large Language Models (LLMs) are trained on vast data; they output the most probable answer rather than following a fixed rule-based set of conditionals.
- The supervisor role of developers shifts toward guiding, shaping, and constraining outputs rather than enumerating every possible response.
- Rubber duck debugging and teaching aids:
- The rubber duck concept helps programmers verbalize the problem to themselves to discover mistakes.
- CS50 uses a physical (and virtual) duck to illustrate this debugging technique and to provide a consistent support persona (the CS50 AI tool also functions as a teaching aid).
- CS50’s stance on AI tools:
- External tools like ChatGPT are policy-disallowed for coursework; students are encouraged to use CS50’s own AI tools (cs50.ai) integrated into the coding environment (Visual Studio Code, cloud-based) to assist learning without outsourcing solutions.
- The role of AI in programming education:
- AI tools can act as guided tutoring, not as a replacement for student problem solving and learning through practice.
Scratch and the first programming experiences
- Scratch as an introduction to programming concepts without heavy syntax:
- Visual blocks categorize by function: Motion (blue), Looks (purple), Sound (pink), and more.
- A Sprite (character) and a world with coordinates (x, y) to position sprites on the screen.
- Basic program structure starts with an event (e.g., When green flag clicked) to kick things off.
- A simple program: Hello World using a say block to display text on the screen.
- Inputs and outputs in Scratch:
- Inputs can be provided to functions (blocks) such as say, ask, or join (concatenate) strings.
- Outputs can be side effects (text displayed or sound played) or return values passed back to the program (via blocks like answer).
- Return values and variables:
- Some blocks return a value (e.g., Ask block returns the user’s input stored in a variable like answer).
- You can join strings (join block) to create dynamic output like "Hello, " + answer.
- Controlling timing and flow:
- Use wait blocks to slow down output so humans can read it; otherwise, actions may complete too quickly to observe.
- Use loops (repeat, forever) to repeat actions; define custom blocks (functions) to reuse code and hide implementation details.
- Extending Scratch with new functionality:
- Text-to-speech extension allows the cat to speak audible phrases.
- Sound blocks let you play, loop, or sequence sounds.
- Extensions and cloud-based features enable more interactive and data-driven behavior.
- Refactoring and design principles shown in Scratch:
- Create a custom function Meow to encapsulate the behavior of meowing.
- Parameterize with an input n to control how many times the cat meows (meow n times).
- Use a single point of truth to adjust timing or repetition by changing one input rather than duplicating blocks.
- Introduce and manage variables to keep track of state (e.g., a score in Oscar Time).
- The importance of iterative development:
- Build in baby steps: start with a minimal version, then add features, test, and refine.
- Expect bugs and view debugging as part of the learning process, not a failure.
- Example project: Oscar Time (a Scratch game)
- Components: Oscar (trash can), trash sprites, a score variable.
- Features added step by step:
- Drag items into Oscar’s trash can to earn points.
- Randomize trash spawn positions at the top, with trash falling downward.
- Collision detection between trash and Oscar increments the score and resets trash at the top.
- The developmental mindset: implement one tiny feature first, then more features over time; avoid trying to implement everything at once.
- Another example: Ivy’s Hardest Game (multi-level maze)
- Harvard sprite moves with arrow keys; Yale, MIT, and Princeton sprites act as adversaries/obstacles.
- Levels escalate in difficulty, fewer walls, more adversaries, and faster enemies.
- The goal is to reach the endpoint while avoiding obstacles; the logic evolves from a simple movement to a more complex scene with multiple moving parts.
- Final CS50 tradition and wrap-up:
- CS50’s focus on hands-on programming and problem solving culminates in projects, a final hackathon, and a campus-wide fair.
- The course stresses that learning to program is a long process with incremental milestones, not a single breakthrough.
Connecting to real-world relevance and ethical considerations
- The lecture connects foundational concepts (bits, bytes, encoding, and algorithms) to practical problems in computing and digital media.
- It emphasizes the widespread applicability of CS ideas to fields beyond traditional computer science.
- The discussion of AI includes ethical and policy considerations: in an educational setting, tools are regulated to preserve learning while offering supportive AI-assisted learning paths.
- The rubber duck debugging practice invites reflective practice: articulating problems aloud often clarifies thinking, reducing cryptic errors and improving problem-solving skills.
- The broader real-world relevance: understanding data representations helps in areas like data science, software engineering, cybersecurity, and human-computer interaction.
Quick reference: key numbers, terms, and formulas to remember
- Base concepts
- Base 1 (unary): counting with a single state per symbol (e.g., fingers).
- Base 2 (binary): digits {0,1}, weights 2^0, 2^1, 2^2, …
- Bit, byte, and capacity
- A bit is a binary digit: ext{bit} o ext{0 or 1}
- A byte is 8 bits:
- Maximum value with 1 byte: 2^8-1 = 255, giving 0–255 as possible values.
- If you count 0 through 255, there are 256 distinct values per byte.
- ASCII and character encoding
- ASCII uses 7 bits originally; extended ASCII commonly uses 8 bits per character.
- Uppercase letters: 65 \le\text{code} \le 90 (e.g., 65 = 'A', 66 = 'B', 72 = 'H', 73 = 'I').
- Lowercase letters: 97 \le\text{code} \le 122 (e.g., 97 = 'a').
- A sample: 72 = 'H', 73 = 'I', 33 = '!', which formed "HI!" in the example.
- Unicode and emoji
- Unicode expands beyond ASCII to cover many languages and symbols; a modern system may use 16, 24, or 32 bits per character.
- With 32 bits, up to 2^{32} = 4,294,967,296 code points are possible (a rough upper bound for “how many characters” can be represented).
- Emoji glyphs are rendered via fonts and platform-specific rendering; updates to the Unicode standard require software updates to render new glyphs.
- Colors and images
- RGB color model uses 3 channels per pixel: red, green, blue.
- Typical depth: 8 bits per channel, so each pixel uses 8+8+8 = 24 bits.
- A single pixel color is represented by a triple: (R,G,B) with each component in [0,255].
- Video basics
- A video is a sequence of frames; typical frame rates are \approx 24-30 frames per second.
- Sound basics
- Sound can be represented by parameters such as pitch, volume, duration; in programming, these can be encoded as numbers and combined into streams.
- Algorithm complexity (intuitively)
- Linear search: O(n) time.
- Binary search (divide and conquer): O(\log_2 n) time.
- Notation and formulas in pseudocode and practice
- Place-value reasoning, base conversions, and the general idea of encoding inputs/outputs using binary representations.
Final takeaway
- CS50 teaches that computers operate on zeros and ones, but through abstraction we work with higher-level concepts: numbers, characters, colors, images, sound, and interactive programs.
- The real power lies in designing algorithms, building abstractions (functions, blocks, variables, custom blocks), and progressively refining code to solve problems efficiently and elegantly.
- The learning path emphasizes collaboration, iteration, and the habit of explaining problems aloud (rubber duck debugging) to improve understanding and outcomes.
Title: CS50 Lecture Notes on Information Representation, Algorithms, and Scratch-based Learning