AP Computer Science Principles Exam Review - Collaboration to Safe Computing

Collaboration

A computing innovation includes a program as an integral part of its function.
Computing innovations can be:
- Physical: self-driving car
- Non-physical software: picture editing software
- Non-physical computing concept: e-commerce
Effective collaboration produces a computing innovation that reflects diversity.
Collaboration with diverse perspectives helps avoid bias.
Consultation and communication with users are important.
Information from potential users helps understand a program's purpose and incorporate diverse perspectives.
Online tools support collaboration by allowing sharing and feedback.
Effective collaborative teams practice interpersonal skills: communication, consensus building, conflict resolution, and negotiation.

Program Function and Purpose

Purpose of computing innovations: to solve problems or pursue interests.
Understanding the purpose improves development.
A program is a collection of program statements that performs a specific task.
A program is also referred to as software.
A code segment is a part of a program (collection of program statements).
A program needs to work for a variety of inputs and situations.
Program behavior: how a program functions during execution, described by user interaction.
Program description: broadly by what it does, or in detail by both what it does and how.
Program inputs: data sent to a computer for processing.
- Forms: tactile, audio, visual, or text.
An event is associated with an action and supplies input data to a program.
- Generated when a key is pressed, a mouse is clicked, a program is started, or any other defined action occurs.
Input affects the program output.
Event-driven programming: program statements are executed when triggered rather than sequentially.
Input can come from a user or other programs.
Program outputs: data sent from a program to a device.
- Forms: tactile, audio, visual, or text.
Program output is usually based on input or prior state.

Program Design and Development

Development process: ordered/intentional or exploratory.
Multiple development processes exist.
Common phases:
- Investigating and reflecting
- Designing
- Prototyping
- Testing
Iterative development: requires refinement and revision based on feedback, testing, or reflection.
Incremental development: breaks the problem into smaller pieces, ensuring each works before adding to the whole.
Design incorporates investigation to determine requirements.
Investigation: used for understanding constraints and user concerns.
- Ways to perform: collecting data through surveys, user testing, interviews, and direct observations.
Program requirements describe function and user interactions.
Program specification defines the requirements.
Design phase: outlines how to accomplish the program specification.
- Includes: brainstorming, planning and storyboarding, organizing into modules, creating UI diagrams, and developing a testing strategy.
Program documentation: written description of the function of code, event, procedure, or program.
Comments: program documentation written into the program to be read by people; they do not affect how a program runs.
Programmers should document throughout development.
Documentation helps in developing and maintaining correct programs.
Acknowledge any code segments developed collaboratively or by another source.
- Includes: origin or original author's name.

Identifying and Correcting Errors

Logic error: mistake in algorithm/program that causes incorrect behavior.
Syntax error: mistake where the rules of the programming language are not followed.
Runtime error: mistake that occurs during execution.
- Programming languages define their own runtime errors.
Overflow error: occurs when a computer attempts to handle a number outside of the defined range of values.
Effective ways to find and correct errors:
- Test cases
- Hand tracing
- Visualizations
- Debuggers
- Adding extra output statements
Testing uses defined inputs to ensure expected outcomes.
Programmers use results from testing to revise algorithms/programs.
Defined inputs should demonstrate different expected outcomes at or beyond the extremes of input data.
Program requirements are needed to identify appropriate defined inputs for testing.
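As a rough illustration of testing with defined inputs, the sketch below checks a hypothetical procedure at and beyond the extremes of its input range. The procedure name `classify_score`, the 0 to 100 range, and the passing cutoff of 60 are all assumptions made up for this example.

```python
def classify_score(score):
    """Hypothetical procedure under test: classify a score from 0 to 100."""
    if score < 0 or score > 100:
        return "invalid"
    if score >= 60:
        return "pass"
    return "fail"

# Defined inputs paired with expected outcomes, chosen at and beyond the extremes.
test_cases = [
    (-1, "invalid"),   # just below the valid range
    (0, "fail"),       # lowest valid input
    (59, "fail"),      # just below the cutoff
    (60, "pass"),      # exactly at the cutoff
    (100, "pass"),     # highest valid input
    (101, "invalid"),  # just above the valid range
]

for score, expected in test_cases:
    actual = classify_score(score)
    status = "OK" if actual == expected else "MISMATCH"
    print(score, actual, status)
```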

Binary Numbers

Data values can be stored in variables, lists, or constants.
Computing devices represent data digitally using bits.
Bit: binary digit (0 or 1).
Byte: 8 bits.
Abstraction: reducing complexity by focusing on the main idea, hiding irrelevant details and bringing together related details.
Bits are grouped to represent abstractions: numbers, characters, and color.
Same sequence of bits may represent different types of data in different contexts.
Analog data: values change smoothly rather than in discrete intervals.
- Examples: pitch/volume of music, colors of a painting, position of a sprinter during a race.
Digital data approximates real-world analog data (example of abstraction).
Sampling technique: measuring values of the analog signal at regular intervals.
Integers represented by a fixed number of bits may result in overflow errors.
Some languages provide an abstraction where the size of integers is limited only by memory size.
Fixed number of bits for real numbers leads to round-off errors.
Some real numbers are represented as approximations in computer storage.
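A small Python sketch of both ideas. Python is one of the languages whose integers are limited only by memory, so the large integer below does not overflow, while real numbers are still stored in a fixed number of bits and show round-off.

```python
import math

# Python integers grow as needed, so this large value does not overflow.
big = 2 ** 100
print(big)

# 0.1 and 0.2 are stored as approximations, so their sum is not exactly 0.3.
total = 0.1 + 0.2
print(total == 0.3)              # False because of round-off
print(math.isclose(total, 0.3))  # True when compared with a tolerance
```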
Number bases: binary (base 2), decimal (base 10).
Binary uses 0 and 1; decimal uses 0-9.
A digit's position determines its numeric value: the digit multiplied by its place value.
Place value is the base raised to the power of the position (positions are numbered from 0, starting at the rightmost digit).
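The place-value rule can be sketched as a short conversion routine. The function name `binary_to_decimal` is just an illustrative choice.

```python
def binary_to_decimal(bits):
    """Convert a string of 0s and 1s to its decimal value using place values."""
    total = 0
    # Position 0 is the rightmost digit, so walk the string from right to left.
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

print(binary_to_decimal("1011"))  # 1*8 + 0*4 + 1*2 + 1*1 = 11
```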

Data Compression

Data compression reduces the size of transmitted/stored data.
Fewer bits does not necessarily mean less information.
Reduction depends on redundancy and compression algorithm.
Lossless compression: guarantees complete reconstruction of original data.
Lossy compression: allows reconstruction of approximation of the original data; greater reduction than lossless.
Lossless is chosen when quality is maximally important.
Lossy is chosen when minimizing data size/transmission time is maximally important.
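One simple lossless technique, used here only as an illustration, is run-length encoding: runs of a repeated character are stored as a character and a count. The helper names below are assumptions for this sketch. The savings depend on how much redundancy the data contains.

```python
def rle_encode(text):
    """Collapse runs of repeated characters into [character, count] pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([ch, 1])  # start a new run
    return runs

def rle_decode(runs):
    """Rebuild the original text exactly from the [character, count] pairs."""
    return "".join(ch * count for ch, count in runs)

original = "AAAABBBCCD"
encoded = rle_encode(original)
print(encoded)
print(rle_decode(encoded) == original)  # lossless: the original is fully recovered
```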

Extracting Information from Data

Information: facts and patterns extracted from data.
Data provides opportunities to identify trends, make connections, and identify problems.
Digitally processed data may show correlation between variables.
Correlation does not indicate causation; additional research is needed.
Single source may not contain enough data; combine data from different sources.
Metadata: data about data.
- Example: image metadata includes date of creation, file size.
Changes to metadata do not change the primary data.
Metadata is used for finding, organizing, and managing information.
Ability to process data depends on users and their tools.
Datasets pose challenges regardless of size (the need to clean data, incomplete data, invalid data, and the need to combine data sources).
Data may not be uniform due to collection methods.
Cleaning data: making data uniform without changing meaning.
- Example: replacing all equivalent abbreviations and spellings with the same word.
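A minimal sketch of cleaning, assuming some made-up survey responses in which the same state was entered several different ways. The data and the mapping are hypothetical.

```python
# Hypothetical responses where the same value was recorded in different forms.
responses = ["CA", "California", "calif.", "NY", "New York"]

# Assumed mapping from each lowercase variant to one uniform spelling.
uniform = {
    "ca": "California",
    "calif.": "California",
    "california": "California",
    "ny": "New York",
    "new york": "New York",
}

# The meaning of each response is unchanged; only its form is made uniform.
cleaned = [uniform[r.lower()] for r in responses]
print(cleaned)
```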
Bias is often created by the type/source of data collected.
Collecting more data does not eliminate bias.
Size of dataset affects the amount of extractable information.
Large datasets may require parallel systems due to processing limitations.
Scalability is important: computational capacity affects processing and storage.

Using Programs with Data

Programs process data to acquire information.
Tables, diagrams, text, and visual tools communicate insights.
Search tools efficiently find information.
Data filtering systems recognize patterns.
Spreadsheets efficiently organize and find trends.
Processes to extract/modify information:
- Transforming elements (e.g., doubling numbers).
- Filtering (e.g., keeping positive numbers).
- Combining/comparing data (e.g., finding highest GPA).
- Visualizing datasets.
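The first three processes can be sketched in a few lines of Python. The numbers and the GPA table are invented for the example.

```python
numbers = [4, -2, 7, -9, 1]

# Transforming: double every element.
doubled = [n * 2 for n in numbers]

# Filtering: keep only the positive numbers.
positives = [n for n in numbers if n > 0]

# Combining/comparing: find the student with the highest GPA (hypothetical data).
gpas = {"Ana": 3.6, "Ben": 3.9, "Cam": 3.2}
top_student = max(gpas, key=gpas.get)

print(doubled)
print(positives)
print(top_student)
```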
Programs can be used iteratively and interactively to gain insight.
Programmers filter/clean data to gain knowledge.
Combining, clustering, and classifying data are part of gaining insights.
Insights can be obtained from transforming digitally represented information.
Patterns can emerge when data is transformed using programs.

Variables and Assignments

Variable: an abstraction inside a program that can hold a value.
Each variable has associated data storage that represents one value at a time.
Meaningful names improve readability.
Programming languages provide types: numbers, booleans, lists, and strings.
The assignment operator allows a program to change the value represented by the variable.
The value stored in a variable will be the most recent value assigned.
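A brief sketch of assignment in Python, where `=` is the assignment operator. The variable name is illustrative.

```python
score = 10          # score holds 10
score = score + 5   # the old value is used to compute the new one
score = 3           # the most recent assignment wins
print(score)
```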

Data Abstraction

List: an ordered sequence of elements.
Element: an individual value in a list that is assigned a unique index.
Index: a common method for referencing the elements in a list or string using natural numbers.
String: an ordered sequence of characters.
Data abstraction: separation between abstract properties and concrete details.
Data abstractions manage complexity by giving a name without referencing specific details.
Data abstractions can be created using lists.
Developing data abstraction results in easier development and maintenance.
Data abstractions often contain different types of elements.
Lists allow multiple related items to be treated as a single value.

Mathematical Expressions

Algorithm: a finite set of instructions that accomplish a specific task.
Algorithms can be expressed in natural language, diagrams, and pseudocode.
Algorithms are implemented using programming languages.
Every algorithm can be constructed using combinations of sequencing, selection, and iteration.
Sequencing: applying steps in the order given.
Code statement: part of program code that expresses an action.
Expression: value, variable, operator, or procedure call that returns a value.
Expressions are evaluated to produce a single value.
Evaluation follows order of operations defined by the programming language.
Sequential statements execute in order.
Clarity and readability are important considerations.
Arithmetic operators: addition, subtraction, multiplication, division, and modulus operators.
The order of operations used in mathematics applies when evaluating expressions.
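A short sketch of expression evaluation in Python. Other languages may spell the modulus operator differently.

```python
result = 3 + 4 * 2      # multiplication happens first
grouped = (3 + 4) * 2   # parentheses change the order
remainder = 17 % 5      # modulus: the remainder after dividing 17 by 5

print(result, grouped, remainder)
```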

Strings

String concatenation joins together two or more strings, end to end to make a new string.
A substring is part of an existing string.
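In Python, concatenation uses `+` and a substring can be taken with a slice. This is only one language's notation, and the strings are arbitrary examples.

```python
first = "data"
second = "base"

joined = first + second   # concatenation makes a new string
part = joined[0:4]        # substring: characters at indexes 0 through 3

print(joined)
print(part)
```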

Boolean Expressions

Boolean value: true or false.
Relational operators: equal to, not equal to, greater than, less than, greater than or equal to, and less than or equal to.
Logical operators express relationships between Boolean values; expressions that use them evaluate to a Boolean value.
Logical operators: not, and, and or.
Operand is either a Boolean expression or a single Boolean value.
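A small sketch combining relational and logical operators in Python. The variable names and values are made up.

```python
temperature = 72
is_raining = False

is_warm = temperature >= 70                    # relational operator
good_for_picnic = is_warm and not is_raining   # and, not
stay_inside = is_raining or temperature < 40   # or

print(is_warm, good_for_picnic, stay_inside)
```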

Conditional

Selection determines which parts of an algorithm are executed based on a condition being true or false.
Conditional statements or if statements affect the sequential flow of control by executing different statements based on the value of a Boolean expression.
Nested conditionals consist of conditional statements within conditional statements.
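A sketch of a nested conditional, using a hypothetical ticket-pricing rule. The prices and age cutoff are assumptions for this example.

```python
def ticket_price(age, is_student):
    """Hypothetical pricing rule that needs a conditional inside a conditional."""
    if age < 12:
        price = 5
    else:
        # This inner conditional is only reached when the outer condition is false.
        if is_student:
            price = 8
        else:
            price = 12
    return price

print(ticket_price(10, False))
print(ticket_price(20, True))
print(ticket_price(20, False))
```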

Iteration

Iteration is a repeating portion of an algorithm.
Iteration repeats a specified number of times or until a given condition is met.
Iteration statements change the sequential flow of control by repeating a set of statements zero or more times until a stopping condition is met.
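The three cases above (a fixed number of repetitions, repeating until a condition is met, and repeating zero times) can be sketched in Python as follows. The values are arbitrary.

```python
# Repeat a specified number of times: add the numbers 1 through 5.
total = 0
for count in range(1, 6):
    total += count

# Repeat until a condition is met: halve a value until it is 10 or less.
value = 100
halvings = 0
while value > 10:
    value = value // 2
    halvings += 1

# Repeat zero times: the condition is false at the start, so the body never runs.
remaining = 0
runs = 0
while remaining > 0:
    remaining -= 1
    runs += 1

print(total, value, halvings, runs)
```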

Developing Algorithms

Algorithms can be written differently and still accomplish the same tasks.
Similar algorithms can yield different side effects or results.
Conditional statements can be written as equivalent Boolean expressions.
Boolean expressions can be written as equivalent conditional statements.
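As an illustration of that equivalence, the two hypothetical procedures below always return the same result: one uses a conditional statement, the other a single Boolean expression.

```python
def is_even_conditional(n):
    # Conditional form.
    if n % 2 == 0:
        return True
    else:
        return False

def is_even_boolean(n):
    # Equivalent Boolean-expression form.
    return n % 2 == 0

for n in [0, 3, 8]:
    print(n, is_even_conditional(n), is_even_boolean(n))
```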
Different algorithms can be developed to solve the same problem.
Algorithms can be created by combining or modifying existing algorithms.
Knowledge of existing algorithms helps in constructing new ones.
Using existing correct algorithms as building blocks has benefits.

Lists

Basic operations on lists: accessing an element by index, assigning a value of an element of a list to a variable, assigning a value to an element of a list, inserting elements at a given index, adding elements to the end of a list, removing elements, and determining the length of a list.
List procedures are implemented in accordance with syntax rules of the programming language.
Traversing a list can be a complete traversal where all elements in the list are accessed or a partial traversal where only a portion of elements are accessed.
Iteration statements can be used to traverse a list.
Knowledge of existing algorithms that use iteration can help in constructing new algorithms.
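The basic list operations and both kinds of traversal, sketched in Python. Note that Python indexes from 0, whereas the AP exam reference sheet indexes lists from 1. The scores are made-up data.

```python
scores = [70, 85, 90]

first = scores[0]        # access an element by index and assign it to a variable
scores[1] = 88           # assign a value to an element
scores.insert(1, 75)     # insert at a given index
scores.append(100)       # add to the end
scores.remove(70)        # remove an element
length = len(scores)     # determine the length

# Complete traversal: every element is accessed.
total = 0
for s in scores:
    total += s

# Partial traversal: stop as soon as a score above 80 is found.
first_above_80 = None
for s in scores:
    if s > 80:
        first_above_80 = s
        break

print(scores, length, total, first_above_80)
```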

Linear and Binary Search

Linear search, or sequential search, algorithms check each element of a list in order until the desired value is found or all elements in the list have been checked.
Binary search starts at the middle of a sorted data set of numbers and eliminates half of the data. This process repeats until the desired value is found or all elements have been eliminated.
Data must be in sorted order to use the binary search algorithm.
Binary search is often more efficient than sequential/linear search when applied to sorted data.
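A minimal sketch of both searches in Python. Returning the index, or -1 when the value is absent, is a convention chosen for this example.

```python
def linear_search(items, target):
    """Check each element in order until the target is found or the list ends."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1

def binary_search(sorted_items, target):
    """Repeatedly check the middle of the remaining range and discard half of it."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1    # target can only be in the upper half
        else:
            high = mid - 1   # target can only be in the lower half
    return -1

data = [2, 5, 8, 12, 16, 23, 38]   # must already be sorted for binary search
print(linear_search(data, 23))
print(binary_search(data, 23))
print(binary_search(data, 7))
```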

Calling procedures

Procedure: a named group of programming instructions that may have parameters and return values.
Procedures are referred to by different names, such as a method or function depending on the programming language.
Parameters are input variables of a procedure.
Arguments specify the values of the parameters when a procedure is called.
A procedure call interrupts the sequential execution of statements, causing the program to execute the statements within the procedure before continuing. Once the last statement in the procedure or a return statement has executed, flow of control is returned to the point immediately following where the procedure was called.
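A sketch of a procedure call in Python, where procedures are called functions. The `events` list is only there to make the flow of control visible, and the procedure name is illustrative.

```python
events = []

def area_of_rectangle(width, height):   # width and height are parameters
    events.append("inside procedure")
    return width * height

events.append("before call")
area = area_of_rectangle(3, 4)          # 3 and 4 are arguments
events.append("after call")             # execution resumes right after the call

print(events)
print(area)
```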

Developing Procedures

Procedural abstraction provides a name for a process and allows a process to be used only knowing what it does, not how it does it.
Procedural abstraction allows a solution to a large problem to be based on the solutions of smaller sub problems.
This is accomplished by creating procedures to solve each of the sub problems.
The subdivision of a computer program into separate subprograms is called modularity.
A procedural abstraction may extract shared features to generalize functionality instead of duplicating code. This allows for program code reuse, which helps manage complexity.
Using parameters allows procedures to be generalized, enabling the procedures to be used with a range of input values or arguments.
Using procedural abstraction helps improve code readability.
Using procedural abstraction in a program allows programmers to change the internals of the procedure to make it faster, more efficient, use less storage, etcetera, without needing to notify users of the change as long as what the procedure does is preserved.
The return statement may appear at any point inside the procedure and causes an immediate return from the procedure back to the calling statement.
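Two small hypothetical procedures illustrate these points. `average` generalizes a calculation with a parameter so it can be reused on any list, and `first_negative` uses a return statement in the middle of the procedure.

```python
def average(values):
    """Reusable procedure: callers only need to know what it does, not how."""
    return sum(values) / len(values)

def first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            return n      # immediate return; the rest of the loop is skipped
    return None

quiz_average = average([80, 90, 100])   # same procedure, different arguments
lab_average = average([70, 75])

print(quiz_average, lab_average)
print(first_negative([4, -2, -7]))
print(first_negative([1, 2, 3]))
```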

Libraries

A software library contains procedures that may be used in creating new programs.
Existing code segments can come from internal or external sources, such as libraries or previously written code.
The use of libraries simplifies the task of creating complex programs.
Application program interfaces, also known as APIs, are specifications for how the procedures in a library behave and can be used.
Documentation for an API or library is necessary in understanding the behaviors provided by the API or library and how to use them.

Random Values

Using random number generation in a program means each execution may produce a different result.
RANDOM(a, b) generates and returns a random integer from a to b, inclusive. Each result is equally likely to occur.
For example, RANDOM(1, 3) could return 1, 2, or 3.
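Python's standard library offers a comparable procedure, `random.randint(a, b)`, which also includes both endpoints. A brief sketch:

```python
import random

roll = random.randint(1, 3)   # 1, 2, or 3, each equally likely
print(roll)                   # may differ on every execution
```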

Simulations

Simulations are abstractions of more complex objects or phenomena for a specific purpose.
A simulation is a representation that uses varying sets of values to reflect the changing states of a phenomenon.
Simulations often mimic real world events with the purpose of drawing inferences, allowing investigation of a phenomenon without the constraints of the real world.
The process of developing an abstract simulation involves removing specific details or simplifying functionality.
Simulations can contain bias derived from the choices of real world elements that were included or excluded.
Simulations are most useful when real world events are impractical for experiments. For example, they're too big, too small, too fast, too slow, too expensive, or too dangerous.
Simulations facilitate the formulation and refinement of hypotheses related to the objects or phenomena under consideration.
Random number generators can be used to simulate the variability that exists in the real world.
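A minimal simulation sketch, assuming a fair coin modeled by a random integer of 0 or 1. The function name and trial count are illustrative, and details of a real coin flip are deliberately left out.

```python
import random

def simulate_coin_flips(trials):
    """Return the fraction of simulated flips that came up heads."""
    heads = 0
    for _ in range(trials):
        if random.randint(0, 1) == 1:   # treat 1 as heads
            heads += 1
    return heads / trials

fraction = simulate_coin_flips(1000)
print(fraction)   # varies between runs, usually close to 0.5
```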

Algorithmic Efficiency

A problem is a general description of a task that can or cannot be solved algorithmically.
An instance of a problem also includes specific input. For example, sorting is a problem. Sorting the list (2, 3, 1, 7) is an instance of the problem.
A decision problem is a problem with a yes or no answer. For example, is there a path from a to b?
An optimization problem is a problem with the goal of finding the best solution among many possible solutions. For example, what is the shortest path from a to b?
Efficiency is an estimation of the amount of computational resources used by an algorithm.
Efficiency is typically expressed as a function of the size of the input.
An algorithm's efficiency is determined through formal or mathematical reasoning.
The algorithm's efficiency can be informally measured by determining the number of times a statement or group of statements executes.
Different correct algorithms for the same problem can have different efficiencies.
Algorithms with a polynomial efficiency or lower (constant, linear, square, cube, etc.) are said to run in a reasonable amount of time.
Algorithms with exponential or factorial efficiencies are examples of algorithms that run in an unreasonable amount of time.
Some problems cannot be solved in a reasonable amount of time because there is no efficient algorithm for solving them. In these cases, approximate solutions are sought.
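To get a feel for the difference, the sketch below tabulates how many steps hypothetical linear, square, and exponential algorithms would take as the input size grows. The helper name and the chosen sizes are arbitrary.

```python
def growth_row(n):
    """Step counts for an input of size n under three different efficiencies."""
    return {"n": n, "linear": n, "square": n ** 2, "exponential": 2 ** n}

for n in [10, 20, 40]:
    print(growth_row(n))
```

The exponential column quickly dwarfs the polynomial ones, which is why such algorithms are considered unreasonable for large inputs.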
A heuristic is an approach to a problem that produces a solution that is not guaranteed to be optimal, but may be used when techniques that are guaranteed to always find an optimal solution are impractical.

Undecidable Problems

A decidable problem is a decision problem for which an algorithm can be written to produce a correct output for all inputs. For example, is a number even?
An undecidable problem is one for which no algorithm can be constructed that is always capable of providing a correct yes or no answer.
An undecidable problem may have some instances that have an algorithmic solution, but there is no algorithmic solution that could solve all instances of the problem.

The Internet

A computing device is a physical artifact that can run a program. Some examples include computers, tablets, servers, routers, and smart sensors.
A computing system is a group of computing devices and programs working together for a common purpose.
A computer network is a group of interconnected computing devices capable of sending or receiving data. A computer network is a type of computing system.
A path between two computing devices on a computer network, a sender and a receiver, is a sequence of directly connected computing devices that begins at the sender and ends at the receiver.
Routing is the process of finding a path from sender to the receiver.
The bandwidth of a computer network is the maximum amount of data that can be sent in a fixed amount of time. Bandwidth is usually measured in bits per second.
The Internet is a computer network consisting of interconnected networks that use standardized, open, nonproprietary communication protocols.
Access to the Internet depends on the ability to connect a computing device to an Internet connected device.
A protocol is an agreed-upon set of rules that specify the behavior of a system.
The protocols used in the Internet are open, which allows users to easily connect additional computing devices to the Internet.
Routing on the Internet is usually dynamic. It is not specified in advance.
The scalability of a system is the capacity for the system to change in size and scale to meet new demands. The Internet was designed to be scalable.
Information is passed through the Internet as a data stream. Data streams contain chunks of data, which are encapsulated in packets.
Packets contain a chunk of data and metadata used for routing the packet between the origin and the destination on the Internet, as well as for data reassembly.
Packets may arrive at the destination in order, out of order, or not at all.
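A toy sketch of the idea, not a real network protocol: a message is split into chunks, each chunk is wrapped with a sequence number as metadata, and the receiver uses that metadata to reassemble the data even when packets arrive out of order. The field names `seq` and `data` and the chunk size are assumptions for this example.

```python
message = "HELLO WORLD"
chunk_size = 4

# Sender: encapsulate each chunk of data in a packet with a sequence number.
packets = [
    {"seq": i // chunk_size, "data": message[i:i + chunk_size]}
    for i in range(0, len(message), chunk_size)
]

# Network: packets may arrive out of order.
arrived = [packets[2], packets[0], packets[1]]

# Receiver: use the metadata to put the chunks back in order.
reassembled = "".join(p["data"] for p in sorted(arrived, key=lambda p: p["seq"]))
print(reassembled)
```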
IP, TCP, and UDP are common protocols used on the Internet.
The World Wide Web is a system of linked pages, programs, and files. HTTP is a protocol used by the World Wide Web. The World Wide Web uses the Internet.

Fault Tolerance

The Internet has been engineered to be fault tolerant with abstractions for routing and transmitting data.
Redundancy is the inclusion of extra components that can be used to mitigate failure of a system if other components fail.
One way to accomplish network redundancy is by having more than one path between any two connected devices. If a particular device or connection on the Internet fails, subsequent data will be sent via a different route if possible.
When a system can support failures and still continue to function, it is called fault tolerant. This is important because elements of complex systems fail at unexpected times, often in groups, and fault tolerance allows users to continue to use the network.
Redundancy within a system often requires additional resources, but can provide the benefit of fault tolerance. The redundancy of routing options between two points increases the reliability of the Internet and helps it scale to more devices and more people.

Parallel and Distributed Computing

Sequential computing is a computational model in which operations are performed in order one at a time.
Parallel computing is a computational model where the program is broken into multiple smaller sequential computing operations, some of which are performed simultaneously.
Distributed computing is a computational model in which multiple devices are used to run a program.
Comparing efficiency of solutions can be done by comparing the time it takes them to perform the same task. A sequential solution takes as long as the sum of all of its steps.
A parallel computing solution takes as long as its sequential tasks plus the longest of its parallel tasks. The speedup of a parallel solution is measured as the time it took to complete the task sequentially divided by the time it took to complete the task when done in parallel.
Parallel computing consists of a parallel portion and a sequential portion. Solutions that use parallel computing can scale more effectively than solutions that use sequential computing. Distributed computing allows problems to be solved that could not be solved on a single computer because of either the processing time or storage needs involved.
Distributed computing allows much larger problems to be solved quicker than they could be solved using a single computer. When increasing the use of parallel computing in a solution, the efficiency of the solution is still limited by the sequential portion. This means that at some point, adding parallel portions will no longer meaningfully increase efficiency.
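A worked example of these timing rules, using made-up task times and assuming each parallel task runs on its own processor:

```python
sequential_portion = 10         # seconds that cannot be parallelized
parallel_tasks = [30, 40, 50]   # seconds for each independent task

# Sequential solution: the sum of all steps.
sequential_time = sequential_portion + sum(parallel_tasks)

# Parallel solution: sequential portion plus the longest parallel task.
parallel_time = sequential_portion + max(parallel_tasks)

speedup = sequential_time / parallel_time
print(sequential_time, parallel_time, round(speedup, 2))
```

Adding more processors cannot reduce the time below the sequential portion plus the longest parallel task, so the speedup levels off.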

Beneficial and Harmful Effects

People create computing innovations.
The way people complete tasks often changes to incorporate new computing innovations. Not every effect of a computing innovation is anticipated in advance. A single effect can be viewed as both beneficial and harmful by different people or even by the same person.
Advances in computing have generated and increased creativity in other fields, such as medicine, engineering, communications, and the arts. Computing innovations can be used in ways that their creators had not originally intended.
The World Wide Web was originally intended only for rapid and easy exchange of information within the scientific community. Targeted advertising is used to help businesses, but it can be misused at both individual and aggregate levels. Machine learning and data mining have enabled innovation in medicine, business, and science, but information discovered in this way has also been used to discriminate against groups of individuals.
Some of the ways computing innovations can be used may have a harmful impact on society, the economy, or culture. Responsible programmers try to consider the unintended ways their computing innovations can be used and the potential beneficial and harmful effects of these new uses.
It is not possible for a programmer to consider all the ways a computing innovation can be used. Computing innovations have often had unintended beneficial effects by leading to advances in other fields. Rapid sharing of a program or running a program with a large number of users can result in significant impacts beyond the intended purpose or control of the programmer.

Digital Divide

Internet access varies between socioeconomic, geographic, and demographic characteristics, as well as between countries. The digital divide refers to differing access to computing devices and the Internet based on socioeconomic, geographic, or demographic characteristics. The digital divide can affect both groups and individuals.
The digital divide raises issues of equity, access, and influence, both globally and locally. The digital divide is affected by the actions of individuals, organizations, and government.

Computing Bias

Computing innovations can reflect existing human biases because of biases written into algorithms or biases in the data used by the innovation. Programmers should take action to reduce bias in algorithms used for computing innovations as a way of combating existing human biases. Biases can be embedded at all levels of software development.

Crowdsourcing

Widespread access to information and public data facilitates the identification of problems, development of solutions, and dissemination of results. Science has been affected by using distributed and citizen science to solve scientific problems. Citizen science is scientific research conducted in whole or part by distributed individuals, many of whom may not be scientists, who contribute relevant data to research using their own computing devices.
Crowdsourcing is the practice of obtaining input or information from a large number of people via the Internet. Human capabilities can be enhanced by collaboration via computing. Crowdsourcing offers new models for collaboration, such as connecting businesses or social causes with funding.

Legal and Ethical Concerns

Material created on a computer is the intellectual property of the creator or an organization. Ease of access and distribution of digitized information raises intellectual property concerns regarding ownership, value, and use. Measures should be taken to safeguard intellectual property. The use of material created by someone else without permission and presented as one's own is plagiarism and may have legal consequences.
Some legal ways to use materials created by someone else:
- Creative Commons: a public copyright license that enables the free distribution of an otherwise copyrighted work. It is used when the content creator wants to give others the right to share, use, and build upon the work they have created.
- Open source: programs that are made freely available and may be redistributed and modified.
- Open access: online research output free of any and all restrictions on access and free of many restrictions on use, such as copyright or license restrictions.
The use of material created by someone other than you should always be cited. Creative Commons, open source, and open access have enabled broad access to digital information. As with any technology or medium, using computing to harm individuals or groups of people raises legal and ethical concerns. Computing can play a role in social and political issues, which in turn often raises legal and ethical concerns. The digital divide raises ethical concerns around computing. Computing innovations can raise legal and ethical concerns. Some examples of these include the development of software that allows access to digital media downloads and streaming, the development of algorithms that include bias, and the existence of computing devices that collect and analyze data by continuously monitoring activities.

Safe Computing

Personally identifiable information, known as PII, is information about an individual that identifies, links, relates, or describes them. Examples of PII include Social Security number, age, race, phone numbers, medical information, and financial information. Search engines can record and maintain a history of searches made by users. Websites can record and maintain a history of individuals who have viewed their pages. Devices, websites, and networks can collect information about users' location.
Technology enables the collection, use, and exploitation of information about, by, and for individuals, groups, and institutions. Search engines can use search history to suggest websites or for targeted marketing. Disparate personal information, such as geolocation, cookies, and browsing history can be aggregated to create knowledge about an individual. PII and other information placed online can be used to enhance a user's online experiences. PII stored online can be used to simplify making online purchases. Commercial and governmental curation of information may be exploited if privacy and other protections are ignored.
Information placed online can be used in ways that were not intended and that may have a harmful impact. For example, an email message may be forwarded, tweets can be retweeted, and social media posts can be viewed by potential employers. PII can be used to stalk or steal the identity of a person or to aid in the planning of other criminal acts. Once information is placed online, it is difficult to delete. Programs can collect your location and record where you have been, how you got there, and how long you were at a given location. Information posted to social media services can be used by others. Combining information posted on social media and other sources can be used to deduce private information about you.
Certificate authorities issue digital certificates that validate the ownership of encryption keys used in secure communications and are based on a trust model. Computer virus and malware scanning software can help protect a computing system against infection. A computer virus is a malicious program that can copy itself and gain access to a computer in an unauthorized way. Computer viruses often attach themselves to legitimate programs and start running independently on a computer. Malware is software intended to damage a computing system or to take partial control over its operation. All real world systems have errors or design flaws that can be exploited to compromise them.
Regular software updates help fix errors that could compromise a computing system. Users can control the permissions programs have for collecting user information. Users should review the permission settings of programs to protect their privacy. Phishing is a technique that attempts to trick a user into providing personal information. The personal information can then be used to access sensitive online resources, such as bank accounts and emails. Keylogging is the use of a program to record every keystroke made by a computer user in order to gain fraudulent access to passwords and other confidential information.
Data sent over public networks can be intercepted, analyzed, and modified. One way this can happen is through a rogue access point. A rogue access point is a wireless access point that gives unauthorized access to secure networks. A malicious link can be disguised on a web page or in an email message. Unsolicited emails, attachments, links, and forms in emails can be used to compromise the security of a computing system. These can come from unknown senders or from known senders whose security has been compromised. Untrustworthy, often free downloads from freeware or shareware sites can contain malware.