CS50 Lecture Notes: Information Representation, Algorithms, and Scratch Basics

Representing Information and the CS50 Foundations

  • Intro and personal framing

    • CS50 is Harvard’s introduction to the intellectual enterprises of Computer Science and the Art of Programming, taught by David Malin.
    • Personal journey: many students start out nervous about CS; CS50 aims to empower through problem solving and hands-on practice.
    • Emphasis: CS is broadly applicable across arts, humanities, social/natural sciences, etc., not just a narrow technical field.
    • Real-world vision: by term’s end you should be able to teach yourself new ideas beyond the class with the help of TAs and fellows.
    • Traditions and community: CS50 Puzzle Day, Hackathon, CS50 Fair; the goal is to build problem-solving skills and collaboration, not just language syntax.
    • MIT hack analogy: education can feel like drinking from a fire hose—lots of information; with time and effort, the returns are high.
    • Core message: focus on where you started versus where you end up; what matters is your delta, not relative standing to peers.
    • Inspirational anecdote: first CS50 homework (hello world in C) once earned a negative score, illustrating that mistakes are part of learning.
  • What is computer science?

    • Definition: the study of information and the processing of information.
    • Central idea: computational thinking—the application of CS ideas to problems of interest in class and beyond.
    • Core model of problem solving: input -> processing -> output.
    • Emphasis on representation: how to represent inputs/outputs in standardized forms that computers can manipulate.
  • Binary fundamentals and the birth of bits

    • Unary vs binary foundations:
    • Unary (base-1) uses a single state per object (e.g., counting on fingers). A single hand can count up to 5 naively, but clever patterns can extend that.
    • Binary (base-2) uses two states per digit (0 or 1). In practice, these states map to physical on/off switches (transistors).
    • A single bit stores two possibilities: 0 or 1.
    • With multiple bits you can count higher:
    • With 3 bits you can count 0 through 7 (8 patterns total).
    • General rule: with k bits you can count up to 2^k-1, and there are 2^k distinct patterns from all zeros to all ones.
    • Examples from the talk:
    • 3 bits: patterns range from 000 to 111, corresponding to decimal values 0–7.
    • 3-bit counting demonstrates why 7 is the max with three bits and how 8 patterns require an extra bit.
    • Why binary is preferred in hardware:
    • Voltage levels are easiest to distinguish at extremes (0 vs high voltage).
    • Tolerances for “in-between” levels are error-prone; binary keeps discrimination clean.
    • Base comparison recap:
    • Decimal (base 10) uses digits 0–9; weight per place is powers of 10: d2 10^2 + d1 10^1 + d_0 10^0 for a three-digit number.
    • Binary uses digits 0 and 1; weight per place is powers of 2: d2 2^2 + d1 2^1 + d_0 2^0 for a three-bit number.
  • From bits to letters: ASCII and text representation

    • Representing characters with bits requires a mapping from binary patterns to symbols.
    • ASCII (American Standard Code for Information Interchange) maps letters, digits, and symbols to numeric codes.
    • Key ASCII examples:
    • Uppercase A through Z occupy decimals 65–90 (for example, 65 = 'A', 66 = 'B', …).
    • Lowercase a through z occupy 97–122 (e.g., 97 = 'a', 98 = 'b').
    • The examples used in the lecture: 65 = 'A', 66 = 'B', 72 = 'H', 73 = 'I', 33 = '!' (illustrating ASCII values for a simple string such as "HI!").
    • Note on the bit width:
    • ASCII originally used 7 bits, allowing 128 characters.
    • Modern systems commonly use 8 bits per character (extended ASCII), allowing 256 characters.
    • Multi-byte sequences:
    • Three bytes (24 bits) can encode three characters (as in a three-byte sequence for a simple text message).
    • Practical demonstration:
    • A text message composed of three ASCII bytes (e.g., 72, 73, 33) corresponds to the string "HI!".
    • Limits of ASCII and motivations for Unicode:
    • ASCII is insufficient for all human languages; Unicode expands character representation to cover many more languages and symbols.
    • Unicode can use 16, 24, or 32 bits per character (often variable-length encodings) to cover tens of thousands to billions of characters.
  • Emoji, languages, and the rainbow of Unicode

    • Emoji are characters encoded in Unicode; their rendered appearance is font- and platform-dependent.
    • Unicode’s goal: be backwards compatible with ASCII while enabling much larger character sets (including pictograms).
    • Emoji longevity and updates:
    • New emoji are released regularly by the Unicode Consortium, requiring software updates to display newly added glyphs correctly.
    • Example: the most popular emoji as of a recent period is a face with tears of joy, represented in Unicode code points that map to a colorful glyph when rendered by fonts.
  • How color and images are stored as bytes

    • Colors on screens are represented using RGB: red, green, blue channels.
    • Standard color depth: 3 channels, each 8 bits, totaling 24 bits per pixel (often written as 24-bit color).
    • Example: a single pixel color can be described by three bytes, e.g., R=72,\, G=73,\, B=33, which visually corresponds to a shade (described in the talk as a yellow-ish color in that context).
    • The relation between pixel data and images: a digital image is a grid of pixels; each pixel contributes 3 bytes (for RGB) or more if a different color model is used.
    • Since images contain many pixels, images become large: megabytes to gigabytes for photos and videos.
    • Note on compression: modern images often use compression to reduce storage and bandwidth without losing essential quality.
  • How video and sound fit into the bit world

    • Video basics:
    • A video is a sequence of still images called frames, typically 24–30 frames per second (fps) in common media formats.
    • Each frame is conceptually a still image represented by millions of color pixels (each pixel as 3 bytes for RGB).
    • The large size of video data explains why videos are often gigabytes in size before compression.
    • Sound basics:
    • Audio is represented by samples that encode sound waves, typically with multiple channels and a sample rate (e.g., 44.1 kHz).
    • In the lecture’s simplified model, sound is represented by triplets (or more) of numbers corresponding to pitch, volume, duration, and potentially instrument. In a pixel/data sense, you can think of a sequence of numbers encoded in bytes that produce audible notes when played back.
  • Algorithms and the power of abstraction

    • What is an algorithm?
    • A precise description of how to do something; it must be unambiguous and finite.
    • A classic real-world example: searching a phone book.
    • Naive sequential search (check one page after another) is slow: with n pages it can take up to n steps.
    • A faster approach (jumping two pages at a time) is still linear but faster in practice; still not optimal.
    • The optimal strategy demonstrated is binary search: repeatedly halve the search space by comparing the target with the middle element.
    • Performance and time complexity:
    • Three archetypal patterns shown:
      • Linear search: O(n) time, roughly proportional to the size of the input.
      • Skipping two at a time: O(n/2) ≈ O(n) but with a constant factor improvement.
      • Binary search (divide and conquer): O(log2 n) time; dramatically better for large n.
    • Graphical intuition: the linear line, a shallower line for the “two-at-a-time” approach, and a logarithmic curve for binary search.
    • Why this matters:
    • As data grows (e.g., Google-scale datasets), efficient algorithms become crucial for performance, usability, and cost.
    • From idea to code: pseudocode as a bridge
    • Pseudocode is English-like yet precise; it uses verbs (functions) and conditionals to outline algorithms without language-specific syntax.
    • Example pseudocode features:
      • Functions (verbs) like pick up, open, call, etc.
      • Conditionals (if/else if/else) driven by Boolean expressions.
      • Indentation signals scope and blocks of code; loops like go back to line X introduce repetition.
      • Booleans and Boolean expressions produce true/false decisions.
    • Common coding constructs and their roles:
    • Functions: encapsulate a bite-sized task.
    • Conditionals: forks in the road depending on test results.
    • Expressions/Booleans: questions that yield true/false outcomes.
    • Loops: allow repeated execution until a condition is met (or forever in some examples).
    • The danger of infinite loops and the need for termination conditions.
    • The practical mindset: code is designed not only to work but to be maintainable, extensible, and less error-prone (e.g., avoid duplicating code; prefer abstractions).
  • From pseudocode to AI and modern computing

    • Artificial intelligence overview (brief):
    • AI relies on large data sets and statistical models to predict responses, often via neural networks.
    • Large Language Models (LLMs) are trained on vast data; they output the most probable answer rather than following a fixed rule-based set of conditionals.
    • The supervisor role of developers shifts toward guiding, shaping, and constraining outputs rather than enumerating every possible response.
    • Rubber duck debugging and teaching aids:
    • The rubber duck concept helps programmers verbalize the problem to themselves to discover mistakes.
    • CS50 uses a physical (and virtual) duck to illustrate this debugging technique and to provide a consistent support persona (the CS50 AI tool also functions as a teaching aid).
    • CS50’s stance on AI tools:
    • External tools like ChatGPT are policy-disallowed for coursework; students are encouraged to use CS50’s own AI tools (cs50.ai) integrated into the coding environment (Visual Studio Code, cloud-based) to assist learning without outsourcing solutions.
    • The role of AI in programming education:
    • AI tools can act as guided tutoring, not as a replacement for student problem solving and learning through practice.
  • Scratch and the first programming experiences

    • Scratch as an introduction to programming concepts without heavy syntax:
    • Visual blocks categorize by function: Motion (blue), Looks (purple), Sound (pink), and more.
    • A Sprite (character) and a world with coordinates (x, y) to position sprites on the screen.
    • Basic program structure starts with an event (e.g., When green flag clicked) to kick things off.
    • A simple program: Hello World using a say block to display text on the screen.
    • Inputs and outputs in Scratch:
      • Inputs can be provided to functions (blocks) such as say, ask, or join (concatenate) strings.
      • Outputs can be side effects (text displayed or sound played) or return values passed back to the program (via blocks like answer).
    • Return values and variables:
    • Some blocks return a value (e.g., Ask block returns the user’s input stored in a variable like answer).
    • You can join strings (join block) to create dynamic output like "Hello, " + answer.
    • Controlling timing and flow:
    • Use wait blocks to slow down output so humans can read it; otherwise, actions may complete too quickly to observe.
    • Use loops (repeat, forever) to repeat actions; define custom blocks (functions) to reuse code and hide implementation details.
    • Extending Scratch with new functionality:
    • Text-to-speech extension allows the cat to speak audible phrases.
    • Sound blocks let you play, loop, or sequence sounds.
    • Extensions and cloud-based features enable more interactive and data-driven behavior.
    • Refactoring and design principles shown in Scratch:
    • Create a custom function Meow to encapsulate the behavior of meowing.
    • Parameterize with an input n to control how many times the cat meows (meow n times).
    • Use a single point of truth to adjust timing or repetition by changing one input rather than duplicating blocks.
    • Introduce and manage variables to keep track of state (e.g., a score in Oscar Time).
    • The importance of iterative development:
    • Build in baby steps: start with a minimal version, then add features, test, and refine.
    • Expect bugs and view debugging as part of the learning process, not a failure.
    • Example project: Oscar Time (a Scratch game)
    • Components: Oscar (trash can), trash sprites, a score variable.
    • Features added step by step:
      • Drag items into Oscar’s trash can to earn points.
      • Randomize trash spawn positions at the top, with trash falling downward.
      • Collision detection between trash and Oscar increments the score and resets trash at the top.
    • The developmental mindset: implement one tiny feature first, then more features over time; avoid trying to implement everything at once.
    • Another example: Ivy’s Hardest Game (multi-level maze)
    • Harvard sprite moves with arrow keys; Yale, MIT, and Princeton sprites act as adversaries/obstacles.
    • Levels escalate in difficulty, fewer walls, more adversaries, and faster enemies.
    • The goal is to reach the endpoint while avoiding obstacles; the logic evolves from a simple movement to a more complex scene with multiple moving parts.
    • Final CS50 tradition and wrap-up:
    • CS50’s focus on hands-on programming and problem solving culminates in projects, a final hackathon, and a campus-wide fair.
    • The course stresses that learning to program is a long process with incremental milestones, not a single breakthrough.
  • Connecting to real-world relevance and ethical considerations

    • The lecture connects foundational concepts (bits, bytes, encoding, and algorithms) to practical problems in computing and digital media.
    • It emphasizes the widespread applicability of CS ideas to fields beyond traditional computer science.
    • The discussion of AI includes ethical and policy considerations: in an educational setting, tools are regulated to preserve learning while offering supportive AI-assisted learning paths.
    • The rubber duck debugging practice invites reflective practice: articulating problems aloud often clarifies thinking, reducing cryptic errors and improving problem-solving skills.
    • The broader real-world relevance: understanding data representations helps in areas like data science, software engineering, cybersecurity, and human-computer interaction.
  • Quick reference: key numbers, terms, and formulas to remember

    • Base concepts
    • Base 1 (unary): counting with a single state per symbol (e.g., fingers).
    • Base 2 (binary): digits {0,1}, weights 2^0, 2^1, 2^2, …
    • Bit, byte, and capacity
    • A bit is a binary digit: ext{bit} o ext{0 or 1}
    • A byte is 8 bits:
      • Maximum value with 1 byte: 2^8-1 = 255, giving 0–255 as possible values.
      • If you count 0 through 255, there are 256 distinct values per byte.
    • ASCII and character encoding
    • ASCII uses 7 bits originally; extended ASCII commonly uses 8 bits per character.
    • Uppercase letters: 65 \le\text{code} \le 90 (e.g., 65 = 'A', 66 = 'B', 72 = 'H', 73 = 'I').
    • Lowercase letters: 97 \le\text{code} \le 122 (e.g., 97 = 'a').
    • A sample: 72 = 'H', 73 = 'I', 33 = '!', which formed "HI!" in the example.
    • Unicode and emoji
    • Unicode expands beyond ASCII to cover many languages and symbols; a modern system may use 16, 24, or 32 bits per character.
    • With 32 bits, up to 2^{32} = 4,294,967,296 code points are possible (a rough upper bound for “how many characters” can be represented).
    • Emoji glyphs are rendered via fonts and platform-specific rendering; updates to the Unicode standard require software updates to render new glyphs.
    • Colors and images
    • RGB color model uses 3 channels per pixel: red, green, blue.
    • Typical depth: 8 bits per channel, so each pixel uses 8+8+8 = 24 bits.
    • A single pixel color is represented by a triple: (R,G,B) with each component in [0,255].
    • Video basics
    • A video is a sequence of frames; typical frame rates are \approx 24-30 frames per second.
    • Sound basics
    • Sound can be represented by parameters such as pitch, volume, duration; in programming, these can be encoded as numbers and combined into streams.
    • Algorithm complexity (intuitively)
    • Linear search: O(n) time.
    • Binary search (divide and conquer): O(\log_2 n) time.
    • Notation and formulas in pseudocode and practice
    • Place-value reasoning, base conversions, and the general idea of encoding inputs/outputs using binary representations.
  • Final takeaway

    • CS50 teaches that computers operate on zeros and ones, but through abstraction we work with higher-level concepts: numbers, characters, colors, images, sound, and interactive programs.
    • The real power lies in designing algorithms, building abstractions (functions, blocks, variables, custom blocks), and progressively refining code to solve problems efficiently and elegantly.
    • The learning path emphasizes collaboration, iteration, and the habit of explaining problems aloud (rubber duck debugging) to improve understanding and outcomes.
  • Title: CS50 Lecture Notes on Information Representation, Algorithms, and Scratch-based Learning