Computer Science Principles - Big Idea 2
Binary Numbers
- Binary Number System: A number system that uses two digits, 0 and 1.
- Binary Digit (Bit): The smallest unit of data in computing; each bit is either 0 or 1.
- Byte: A group of 8 bits.
- Representing Electrical Signals:
- 1 (ON): Represents an electrical signal in the computer being on.
- 0 (OFF): Represents an electrical signal in the computer being off.
- Transistors: Circuits in a computer's processor are made up of billions of transistors.
- Binary and Electrical Signals: The digits 1 and 0 used in binary reflect the electrical signal being on and off.
- Information Storage: All software, music, documents, and any other information processed by a computer is stored in sequences of binary 0s and 1s.
- Interpretation of Binary Sequences: How a binary sequence is interpreted depends on how it will be used.
- A byte of information, like 0100 1001, can be used to represent instructions to the computer.
- Representation of Other Data: The same 1s and 0s can represent anything else as well, including pictures, letters, and videos.
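To illustrate that interpretation depends on use, the short Python sketch below reads the byte 0100 1001 from the notes in two ways: as an unsigned integer and as an ASCII character code. The variable names are illustrative only.

```python
# The byte from the notes, written without the space between the two halves.
bits = "01001001"

# Interpretation 1: an unsigned integer (base 2 converted to base 10).
as_number = int(bits, 2)

# Interpretation 2: the ASCII character that has that code.
as_character = chr(as_number)

print(as_number)     # 73
print(as_character)  # I
```

The stored bits never change; only the rule used to read them does.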
Digital Data and Abstractions
- Abstraction: Reduces complexity and allows focusing on the main idea or larger problem.
- Example:
- ASCII representation of "Hi!"
- Binary: 01001000 01101001 00100001
- Decimal: 72 105 33
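The ASCII example can be checked with a few lines of Python. This is a minimal sketch; the helper names to_ascii_codes and to_binary_bytes are made up for illustration.

```python
def to_ascii_codes(text):
    """Return the decimal ASCII code of each character."""
    return [ord(ch) for ch in text]

def to_binary_bytes(text):
    """Return each character's code as an 8-bit binary string."""
    return [format(ord(ch), "08b") for ch in text]

print(to_ascii_codes("Hi!"))   # [72, 105, 33]
print(to_binary_bytes("Hi!"))  # ['01001000', '01101001', '00100001']
```

Working with characters instead of raw bits is the abstraction: the programmer thinks in letters while the computer stores bytes.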
- Analog Data: Data represented by the measurement of a continuous physical variable, so its values change smoothly rather than in discrete steps.
- Abstraction Example: The use of digital data to approximate real-world analog data is an example of abstraction.
- Sampling: Measuring the values of the analog signal at regular intervals.
- The samples are used to determine the bits needed to store an approximation of the analog data in digital form.
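Sampling can be sketched in Python as follows. The 1 Hz sine wave, the sample rate of 8 samples per second, and the 4-bit resolution are all assumptions chosen to keep the example small; real audio uses far higher values.

```python
import math

def sample_signal(signal, duration, sample_rate):
    """Measure signal(t) at regular intervals over [0, duration)."""
    count = int(duration * sample_rate)
    return [signal(i / sample_rate) for i in range(count)]

def quantize(value, bits, low=-1.0, high=1.0):
    """Map a value in [low, high] to one of 2**bits integer levels."""
    top_level = 2 ** bits - 1
    clamped = min(max(value, low), high)
    return round((clamped - low) / (high - low) * top_level)

def wave(t):
    """A stand-in analog signal: a 1 Hz sine wave."""
    return math.sin(2 * math.pi * t)

samples = sample_signal(wave, duration=1.0, sample_rate=8)
digital = [quantize(s, bits=4) for s in samples]
print(digital)
```

Each entry in digital fits in 4 bits, so the smooth wave is approximated by a short sequence of small integers. More samples or more bits per sample give a closer approximation at the cost of more storage.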
- Essential Knowledge:
- DAT-1.A.1: Data values can be stored in variables, lists of items, or standalone constants and can be passed as input to (or output from) procedures (return value).
- DAT-1.A.2: Computing devices represent data digitally, meaning that the lowest-level components of any value are bits.
- DAT-1.A.3: Bit is shorthand for binary digit and is either 0 or 1.
- DAT-1.A.4: A byte is 8 bits.
Representing Integers with Fixed Number of Bits
- All data is represented by 1s and 0s arranged in groups called bytes.
- This includes integers (positive and negative whole numbers, including 0).
- Integers are represented in computers by a fixed number of bits.
- Example: Some programming languages store integer values in 32 bits (4 bytes).
- 4 bytes can represent 2^32 different values, which is a little over 4 billion (4,294,967,296) different values in total.
- DAT-1.B.1: In many programming languages, integers are represented by a fixed number of bits, which limits the range of integer values and mathematical operations on those values. This limitation can result in overflow or other errors.
- DAT-1.B.2: Other programming languages provide an abstraction through which the size of representable integers is limited only by the size of the computer's memory; this is the case for the language defined in the exam reference sheet.
- DAT-1.B.3: In programming languages, the fixed number of bits used to represent real numbers limits the range and mathematical operations on these values; this limitation can result in round-off and other errors. Some real numbers are represented as approximations in computer storage.
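The effect of a fixed number of bits can be explored with a short Python sketch. Python is used here only as an example of a language whose integers are limited by available memory rather than by a fixed width; the helper count_values is illustrative.

```python
def count_values(bits):
    """Number of distinct values that a given number of bits can represent."""
    return 2 ** bits

print(count_values(8))   # 256
print(count_values(32))  # 4294967296

# Python integers are not fixed width, so this does not overflow.
big = 2 ** 100
print(big + 1)
```

In a language with fixed 32-bit integers, a value as large as 2^100 could not be stored in a single integer variable.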
Overflow Error
- If a program encounters a calculation whose result is larger than the fixed number of bits set aside for the value can represent, this can result in an overflow error.
Round-off Error
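- Programming languages can have problems with real numbers like pi: the digits never end, so the value is stored as an approximation in a fixed number of bits, and calculations with it can be slightly off.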
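A quick way to see round-off error is to add two decimal fractions in Python, which stores real numbers as fixed-size binary floating-point values. This is only an illustration; other languages that use the same floating-point format behave similarly.

```python
import math

total = 0.1 + 0.2

# 0.1 and 0.2 have no exact binary representation, so the sum is slightly off.
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True

# pi is stored as an approximation with a limited number of digits.
print(math.pi)                   # 3.141592653589793
```

Comparing with a tolerance, as math.isclose does, is the usual way to work around round-off when testing real numbers for equality.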
Computer's Available Memory
- Ideal situation: the range of numbers a computer can work with would only be limited by the computer's available memory.
- Real world: this is not always possible.
- If a number stretches towards infinity, it would require an infinite amount of computer memory in order to store and calculate, which is not possible.
4-Bit Computer Example
- Computer uses only 4 bits to represent integers.
- First bit = sign of integer (positive or negative).
- Other 3 bits for the absolute value.
- Largest number this system could represent:
- Binary: 0111
- Positive number 7, since 2^2 + 2^1 + 2^0 = 4 + 2 + 1 = 7
- What would happen if we ran a program like this on the 4-bit computer, where the largest positive integer is 7?
x <- 7
y <- x + 1
- Overflow error or number too large.
- Alternatively, the value could wrap around like an odometer that has passed its maximum, so 8 would be stored as 0.
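The two possible outcomes can be simulated in Python. This is a sketch under stated assumptions: add_checked models a machine that reports an overflow error, and add_wrapped models one that keeps only the 3 magnitude bits (for non-negative inputs). Neither function describes a specific real computer.

```python
MAGNITUDE_BITS = 3                   # 4 bits total: 1 sign bit + 3 value bits
MAX_VALUE = 2 ** MAGNITUDE_BITS - 1  # 7

def add_checked(a, b):
    """Add two integers, raising an error if the result needs more than 3 bits."""
    result = a + b
    if abs(result) > MAX_VALUE:
        raise OverflowError(f"{result} does not fit in 4 bits")
    return result

def add_wrapped(a, b):
    """Add two non-negative integers, keeping only the 3 low-order bits."""
    return (a + b) % (MAX_VALUE + 1)

x = 7
try:
    y = add_checked(x, 1)
except OverflowError as error:
    print("Overflow error:", error)

print(add_wrapped(x, 1))  # 0
```

Either way the program does not get the mathematically correct answer of 8, which is why fixed-width integers limit the range of values a program can rely on.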
Binary Numbers: Base 2 and Base 10 Conversions
- DAT-1.C: For binary numbers:
- Calculate the binary (base 2) equivalent of a positive integer (base 10) and vice versa.
- Compare and order binary numbers.
- DAT-1.C.1: Number bases, including binary and decimal, are used to represent data.
- DAT-1.C.2: Binary (base 2) uses only combinations of the digits zero and one.
- DAT-1.C.3: Decimal (base 10) uses only combinations of the digits 0-9.
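Comparing and ordering binary numbers comes down to comparing the values they stand for. A minimal Python sketch, using made-up sample values:

```python
def binary_to_int(bits):
    """Convert a string of 0s and 1s to its integer value."""
    return int(bits, 2)

values = ["1011", "0110", "1110", "0011"]

# Order the binary strings by the numbers they represent.
ordered = sorted(values, key=binary_to_int)
print(ordered)  # ['0011', '0110', '1011', '1110']

# A longer binary number without leading zeros is always larger.
print(binary_to_int("1000") > binary_to_int("111"))  # True
```

When two binary numbers have the same length, the one with a 1 in the leftmost position where they differ is the larger.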
Decimal Number System Place Values
- Example: 5012
- Place Values: … Powers of 10
- 5 * 10^3 + 0 * 10^2 + 1 * 10^1 + 2 * 10^0 = 5000 + 0 + 10 + 2 = 5012
Binary Number System Place Values
- Example: 1101
- Place Values: … Powers of 2
- 1 * 2^3 + 1 * 2^2 + 0 * 2^1 + 1 * 2^0 = 8 + 4 + 0 + 1 = 13
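The same place-value rule works for both bases, which the following Python sketch makes explicit. The function names are illustrative.

```python
def place_values(digits, base):
    """Return (digit, power, contribution) for each position, left to right."""
    terms = []
    for position, ch in enumerate(digits):
        power = len(digits) - 1 - position
        terms.append((int(ch), power, int(ch) * base ** power))
    return terms

def value_of(digits, base):
    """Add up the contribution of every position."""
    return sum(contribution for _, _, contribution in place_values(digits, base))

print(place_values("1101", 2))  # [(1, 3, 8), (1, 2, 4), (0, 1, 0), (1, 0, 1)]
print(value_of("1101", 2))      # 13
print(value_of("5012", 10))     # 5012
```

Only the base changes between the two systems; the positions, powers, and final sum are computed in exactly the same way.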
Converting Binary to Decimal
- Example: 00101001
- Deconstructing a binary number means adding up the powers of 2 that are "turned on."
- 2^0 + 2^3 + 2^5 = 1 + 8 + 32 = 41
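The "turned on" idea translates directly into Python. A minimal sketch with illustrative function names:

```python
def turned_on_powers(bits):
    """List the exponents of 2 whose bit is 1, starting from the rightmost bit."""
    return [power for power, bit in enumerate(reversed(bits)) if bit == "1"]

def binary_to_decimal(bits):
    """Add up the powers of 2 that are turned on."""
    return sum(2 ** power for power in turned_on_powers(bits))

print(turned_on_powers("00101001"))   # [0, 3, 5]
print(binary_to_decimal("00101001"))  # 41
```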
Constructing a Binary Number
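- Constructing a binary number means figuring out which powers of 2 add up to the number you want.
- Example: 41 = 32 + 8 + 1 = 2^5 + 2^3 + 2^0, so 41 in binary is 00101001.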
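One way to find the powers of 2 is to work from the largest power down, turning a bit on whenever that power still fits. The Python sketch below assumes an 8-bit result by default; the width parameter is an illustrative choice.

```python
def decimal_to_binary(number, width=8):
    """Build a binary string by subtracting powers of 2, largest first."""
    if not 0 <= number < 2 ** width:
        raise ValueError(f"{number} does not fit in {width} bits")
    bits = ""
    for power in range(width - 1, -1, -1):
        if 2 ** power <= number:
            bits += "1"           # this power of 2 is part of the number
            number -= 2 ** power
        else:
            bits += "0"           # this power of 2 is too big to use
    return bits

print(decimal_to_binary(41))  # 00101001
print(decimal_to_binary(13, width=4))  # 1101
```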
Data Compression
- Data compression is a reduction in the number of bits needed to represent data.
- Data compression is used to save transmission time and storage space.
How Compression Works
- Compression looks for repeated patterns and predictability in the data.
- In general, the larger the data file, the more patterns can be found and pulled out.
Text Compression
- Replace repeated characters or patterns with a single character or symbol that stands in their place.
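Run-length encoding is one simple scheme that follows this idea; it is used here as an illustration, not as the method the notes have in mind. The sketch assumes the input text contains no digits, since digits are used to store the run lengths.

```python
def rle_encode(text):
    """Replace each run of a repeated character with its count and the character."""
    if not text:
        return ""
    pieces = []
    run_char, run_length = text[0], 1
    for ch in text[1:]:
        if ch == run_char:
            run_length += 1
        else:
            pieces.append(f"{run_length}{run_char}")
            run_char, run_length = ch, 1
    pieces.append(f"{run_length}{run_char}")
    return "".join(pieces)

def rle_decode(encoded):
    """Rebuild the original text from count/character pairs."""
    result = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch
        else:
            result.append(ch * int(count))
            count = ""
    return "".join(result)

packed = rle_encode("AAAAAABBBBCC")
print(packed)              # 6A4B2C
print(rle_decode(packed))  # AAAAAABBBBCC
```

The scheme only saves space when the text has long runs; text with few repeats can come out longer than the original.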
Data Compression Methods: Lossless vs Lossy
- Lossless: Reduces the number of bits stored or transmitted while guaranteeing complete reconstruction of the original data.
- Lossy: Significantly reduces the number of bits stored or transmitted but only allows reconstruction of an approximation of the original data.
Lossy vs Lossless
- Lossless:
- The typical approach where the loss of words or numbers would change the information.
- Examples: Executable files, text, spreadsheet files.
- Lossy:
- The typical approach where the removal of some data has little or no discernible effect on the representation of the content since the data removed are redundant, unimportant, or imperceptible.
- Examples: Graphics, audio, video, images.
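The difference between the two methods can be demonstrated in Python. The lossless half uses the standard zlib module; the lossy half simply rounds some made-up sensor readings, which is a stand-in for real lossy algorithms rather than an example of one.

```python
import zlib

# Lossless: the original bytes are reconstructed exactly.
original = b"to be or not to be, " * 50
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), len(compressed))
print(restored == original)  # True

# Lossy (illustrative): rounding keeps an approximation and discards detail.
readings = [0.12, 0.48, 0.51, 0.87]
approximate = [round(value, 1) for value in readings]
print(approximate)
```

After rounding there is no way to recover the original readings, which is the defining property of lossy compression.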
Lossy Example
- The .jpg compression algorithm is used on images.
- Divides the picture up into square blocks.
- Uses approximation to average out the pixel color data.
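The idea of averaging pixel data within blocks can be sketched in Python. This is NOT the actual .jpg algorithm, which is considerably more involved; it is a simplified illustration on a tiny made-up grayscale image, with an assumed block size of 2.

```python
def average_blocks(pixels, block):
    """Replace every block x block tile of a grayscale grid with its average."""
    height, width = len(pixels), len(pixels[0])
    result = [row[:] for row in pixels]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            tile = [pixels[r][c] for r in rows for c in cols]
            mean = round(sum(tile) / len(tile))
            for r in rows:
                for c in cols:
                    result[r][c] = mean
    return result

# Hypothetical 4 x 4 image; each number is a brightness from 0 to 255.
image = [
    [10, 12, 200, 204],
    [14, 16, 208, 212],
    [90, 94, 50, 54],
    [98, 102, 58, 62],
]
smoothed = average_blocks(image, 2)
for row in smoothed:
    print(row)
```

Each tile now holds a single repeated value, so it can be stored once instead of four times, but the small differences between neighboring pixels are gone for good.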
- DAT-2.C.6: The size of a data set affects the amount of information that can be extracted from it.
- DAT-2.C.7: Large data sets are difficult to process using a single computer and may require parallel systems.
- DAT-2.C.8: Scalability of systems is an important consideration when working with data sets, as the computational capacity of a system affects how data sets can be processed and stored.
Where to Start with Data
- Collecting Data
- Issues to consider:
- Source: Do you need more sources?
- Potential Bias:
- Intentional: Who collected the data? Do they have an agenda?
- Unintentional: How is the data collected? Who collected the data?
Data Cleaning
- Identifying incomplete, corrupt, duplicate, or inaccurate records.
- Replacing, modifying, or deleting the "dirty" data.
- Be careful about modifying or deleting!
- Be sure there is a mistake!
- Keep records of what data is modified/deleted and WHY.
- Invalid data may need to be modified so that the format stays consistent.
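The cleaning steps above can be sketched in Python. The record layout (a name and an age), the valid age range, and the sample data are all hypothetical; the point is that every change is logged with a reason.

```python
def clean_records(records):
    """Return (cleaned, log); log lists each changed record and why."""
    cleaned, log, seen = [], [], set()
    for record in records:
        name = (record.get("name") or "").strip().title()
        age = record.get("age")
        if not name or age is None:
            log.append((record, "deleted: incomplete"))
            continue
        if not isinstance(age, int) or not 0 <= age <= 120:
            log.append((record, "deleted: invalid age"))
            continue
        if (name, age) in seen:
            log.append((record, "deleted: duplicate"))
            continue
        seen.add((name, age))
        fixed = {"name": name, "age": age}
        if fixed != record:
            log.append((record, "modified: name format made consistent"))
        cleaned.append(fixed)
    return cleaned, log

# Hypothetical raw data with typical problems.
raw = [
    {"name": "ada lovelace", "age": 36},
    {"name": "Ada Lovelace", "age": 36},
    {"name": "", "age": 20},
    {"name": "Alan Turing", "age": -5},
    {"name": "Grace Hopper", "age": 85},
]
cleaned, log = clean_records(raw)
print(cleaned)
for record, reason in log:
    print(reason, record)
```

The original list is left untouched, so any deletion or modification can be reviewed against the log and reversed if it turns out not to have been a mistake.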
Metadata
- Prefix meta: behind, among, between
- Metadata: Data about data
- Some data has information about itself.
- Why?
- Data - Photo/Image
- Date
- Time
- Location
- Height
- Width
- Pixels
Using Programs with Data
- DAT-2.E: Explain how programs can be used to gain insight and knowledge from data.
- Using programs, the data can be stored in types of lists to be processed.
- After filtering and cleaning the data, users can utilize the program to interact with the data to gain insight and knowledge.
- Users can interact with the data by filtering, sorting, combining, transforming, clustering or classifying.
- Each iteration leads to more knowledge and insight!
- Spreadsheets are very powerful with Selection and Iteration.
- Lists in programming languages give the programmer the flexibility to process data in almost any way.
- Lists are covered in Big Idea 3.
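A few of the interactions named above (filtering, sorting, transforming, and classifying) can be sketched with Python lists. The temperature data and the 70-degree threshold are made up for illustration.

```python
# Hypothetical data set: (day, high temperature in Fahrenheit).
temps_f = [("Mon", 68), ("Tue", 75), ("Wed", 59), ("Thu", 82), ("Fri", 71)]

# Filtering: keep only the days at or above 70 degrees.
warm_days = [day for day, temp in temps_f if temp >= 70]

# Sorting: order the days from hottest to coolest.
by_temp = sorted(temps_f, key=lambda pair: pair[1], reverse=True)

# Transforming: convert every reading to Celsius.
temps_c = [(day, round((temp - 32) * 5 / 9, 1)) for day, temp in temps_f]

# Classifying: label each day as warm or cool.
labels = {day: ("warm" if temp >= 70 else "cool") for day, temp in temps_f}

print(warm_days)   # ['Tue', 'Thu', 'Fri']
print(by_temp[0])  # ('Thu', 82)
print(temps_c)
print(labels)
```

Each step answers a different question about the same data, which is how repeated iteration builds up insight.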