AP Computer Science Principles Midterm

Computers

  • Computing Device:  a machine that can run a program, including computers, tablets, servers, routers, and smart sensors

  • Computing System: a group of computing devices and programs working together for a common purpose

  • Computing Network:  a group of interconnected computing devices capable of sending or receiving data.

 

Connecting These Devices

  • Router: A type of computer that forwards data across a network

  • Path: the series of connections between computing devices on a network starting with a sender and ending with a receiver.

  • Redundancy:  the inclusion of extra components so that a system can continue to work even if individual components fail, for example by  having more than one path between any two connected devices in a network.

  • Fault Tolerant:  Can continue to function even in the event of individual component failures. This is important because elements of complex systems like a computer network fail at unexpected times, often in groups.

  • Bandwidth: the maximum amount of data that can be sent in a fixed amount of time, usually measured in bits per second. 

  • Packet:  A chunk of data sent over a network. Larger messages are divided into packets that may arrive at the destination in order, out-of-order, or not at all.

  • Packet Metadata: Data added to all packets to help route them through the network and potentially reassemble the original message.

  • Datastream: Information passed through the internet in packets.

  • Scalability: the capacity for the system to change in size and scale to meet new demands
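The Packet and Packet Metadata definitions above can be illustrated with a small sketch. This is not how any real protocol formats its packets; the `seq` and `total` fields and both function names are made up for illustration. It shows a message being divided into packets, arriving out of order, and being reassembled using the metadata.

```python
import random

def split_into_packets(message, size):
    """Divide a message into packets, each tagged with metadata."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [
        # "seq" and "total" are hypothetical metadata fields; "data" is the payload
        {"seq": n, "total": len(chunks), "data": chunk}
        for n, chunk in enumerate(chunks)
    ]

def reassemble(packets):
    """Rebuild the message from packets that may be out of order."""
    if not packets:
        return None
    total = packets[0]["total"]
    by_seq = {p["seq"]: p["data"] for p in packets}
    if len(by_seq) < total:
        return None  # at least one packet never arrived
    return "".join(by_seq[n] for n in range(total))

packets = split_into_packets("Hello, Internet!", 4)
random.shuffle(packets)          # packets may arrive out of order
print(reassemble(packets))       # Hello, Internet!
print(reassemble(packets[1:]))   # None: one packet was lost
```

A protocol like TCP would ask for the missing packet again, while a protocol like UDP would not.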

 

Protocols

  • Protocol:  An agreed-upon set of rules that specify the behavior of some system

  • Internet Protocol (IP): a protocol for sending data across the Internet that assigns unique numbers (IP addresses) to each connected device

  • User Datagram Protocol (UDP):  A protocol for sending packets quickly with minimal error-checking and no resending of dropped packets

  • Transmission Control Protocol (TCP):  A protocol for sending packets that does error-checking to ensure all packets are received and properly ordered

  • Hypertext Transfer Protocol (HTTP): a protocol for computers to request and share the pages that make up the World Wide Web on the Internet

  • Domain Name System (DNS): the system responsible for translating domain names like example.com into IP addresses

 

Digital Divide

  • Differing access to computing devices and the Internet, based on socioeconomic, geographic, or demographic characteristics. 

  • Can affect both individuals and groups. 

  • Raises ethical concerns of equity, access, and influence globally and locally. 

  • Affected by the actions of individuals, organizations, and governments.

 

 User Interface:  the inputs and outputs that allow a user to interact with a piece of software. User interfaces can include a variety of forms such as buttons, menus, images, text, and graphics.

Input:  data that are sent to a computer for processing by a program. Can come in a variety of forms, such as tactile interaction, audio, visuals, or text.

Output:  any data that are sent from a program to a device. Can come in a variety of forms, such as tactile interaction, audio, visuals, or text.

Program Statement: a command or instruction. Sometimes also referred to as a code statement.

Program: a collection of program statements. Programs run (or “execute”) one command at a time.

Sequential Programming: program statements run in order, from top to bottom.

  • No user interaction

  • Code runs the same way every time

Event Driven Programming: some program statements run when triggered by an event, like a mouse click or a key press

  • Programs run differently each time depending on user interactions
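To contrast the two styles, here is a minimal sketch with no real user interface. The `on_event` and `trigger` helpers are hypothetical stand-ins for whatever a programming environment provides; the list of event names stands in for what a user happens to do.

```python
# Sequential: statements run top to bottom, the same way every time.
def run_sequential():
    log = []
    log.append("step 1")
    log.append("step 2")
    log.append("step 3")
    return log

# Event driven: handlers are registered first and run only when triggered.
handlers = {}

def on_event(name, handler):
    """Remember which function to run when the named event happens."""
    handlers[name] = handler

def trigger(name):
    """Run the handler for an event, if one was registered."""
    if name in handlers:
        return handlers[name]()
    return None

on_event("click", lambda: "button clicked")
on_event("keypress", lambda: "key pressed")

print(run_sequential())
# A different sequence of user actions would produce different output.
for event in ["keypress", "click", "click"]:
    print(trigger(event))
```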

Debugging Strategies

  • Keep your code clean

  • Run your code

  • Use classmates and resources

Documentation: a written description of how a command or piece of code works or was developed.

Comment: a form of program documentation written into the program to be read by people; comments do not affect how a program runs.

Pair Programming: a collaborative programming style in which two programmers switch between the roles of writing code and tracking or planning high level progress

 Correlation does not equal Causation


Metadata:  data about data

Visualizations can help us:

  • Answer questions

  • Look at lots of data at once

  • See patterns that are "invisible" if you just look at the table


Cleaning and Filtering

When does data need to be cleaned?

  • Data is incomplete

  • Data is invalid

  • Multiple tables are combined into one


What leads to "messy" data?

  • Users enter in different types of data ("two", 2)

  • Users use different abbreviations to represent the same information ("February", "Feb", "Febr")

  • Data may have different spellings ("color", "colour") or inconsistent capitalization ("spring", "Spring")


Filtering data allows the user to look at a subset of the data.
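As a rough sketch of cleaning and then filtering, the example below uses made-up rows and hypothetical lookup tables (`MONTHS`, `NUMBER_WORDS`). It standardizes abbreviations and mixed data types, drops incomplete rows, and then filters down to a subset.

```python
# Hypothetical lookup tables for this example only
MONTHS = {"february": "February", "feb": "February", "febr": "February"}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}

def clean_month(value):
    """Map different abbreviations and capitalizations to one spelling."""
    key = value.strip().lower().rstrip(".")
    return MONTHS.get(key, value.strip().capitalize())

def clean_count(value):
    """Turn "two", "2", or 2 into the number 2; return None if unusable."""
    if isinstance(value, int):
        return value
    text = str(value).strip().lower()
    if text in NUMBER_WORDS:
        return NUMBER_WORDS[text]
    return int(text) if text.isdigit() else None

rows = [
    {"month": "Feb", "count": "two"},
    {"month": "february", "count": 2},
    {"month": "Febr", "count": ""},      # incomplete: no count entered
    {"month": "March", "count": "5"},
]

cleaned = [
    {"month": clean_month(r["month"]), "count": clean_count(r["count"])}
    for r in rows
]
complete = [r for r in cleaned if r["count"] is not None]      # drop incomplete rows
february = [r for r in complete if r["month"] == "February"]   # filter to a subset
print(february)
```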

Types of Charts

Bar Chart: Count how many times each value in the column appears and make a bar at that height.


Information we can get out of bar charts:

  • What value(s) are most common in this column?

  • What value(s) are least common in this column?

  • What is the unique list of values in this column?
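The counting behind a bar chart can be sketched in a few lines. The `colors` column is made-up data, and the text "bars" are only a stand-in for a real chart.

```python
from collections import Counter

def bar_chart_counts(column):
    """Count how many times each value appears in a column."""
    return Counter(column)

colors = ["red", "blue", "red", "green", "red", "blue"]  # made-up column
counts = bar_chart_counts(colors)

# Draw one text bar per value, tallest first.
for value, count in counts.most_common():
    print(f"{value:<6}{'#' * count}")

most_common = counts.most_common(1)[0][0]     # value with the tallest bar
least_common = min(counts, key=counts.get)    # value with the shortest bar
unique_values = sorted(counts)                # every value that appears
print(most_common, least_common, unique_values)
```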


Histogram: Similar to a bar chart, but first all numbers in a range or "bucket" are grouped together.  For example, with a bucket size of 20, the numbers 41, 48, and 53 would all be placed in the same bucket between 40 and 60.


Information we can get out of histograms:

  • What range of value(s) are most common in this column?

  • What range of value(s) are least common in this column?

  • What ranges of values do or do not appear in this column?


Histograms can only be created with numeric data but can be useful when a normal bar chart may be difficult to read.
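Here is a small sketch of how numbers might be grouped into buckets before counting. The list of values is made up, and the choice that a bucket includes its lower edge but not its upper edge (so 40 falls in 40-60 and 60 falls in 60-80) is an assumption; tools differ on this.

```python
from collections import Counter

def bucket_start(value, bucket_size):
    """Return the lower edge of the bucket that a value falls into."""
    return (value // bucket_size) * bucket_size

def histogram_counts(values, bucket_size):
    """Count how many values land in each bucket."""
    return Counter(bucket_start(v, bucket_size) for v in values)

values = [41, 48, 53, 12, 75, 61]  # made-up data
counts = histogram_counts(values, 20)

for start in sorted(counts):
    print(f"{start}-{start + 20}: {counts[start]}")
```

With a bucket size of 20, the values 41, 48, and 53 all land in the bucket that starts at 40.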



Cross Tab: Counts how often pairs of values in two columns appear.


Information we can get out of cross tab charts:


  • Finding the most / least common combinations of values in two columns

  • Finding patterns across two columns

  • Exploring two columns when one or both are strings.

Not useful if either column has too many values because the chart would be enormous
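A cross tab boils down to counting pairs. The sketch below uses a made-up `pets` table and a hypothetical `cross_tab` helper to count how often each combination of values appears in two columns.

```python
from collections import Counter

def cross_tab(rows, col_a, col_b):
    """Count how often each pair of values from two columns appears."""
    return Counter((row[col_a], row[col_b]) for row in rows)

pets = [  # made-up data
    {"animal": "dog", "size": "large"},
    {"animal": "dog", "size": "small"},
    {"animal": "cat", "size": "small"},
    {"animal": "dog", "size": "large"},
]

table = cross_tab(pets, "animal", "size")
print(table.most_common(1))      # the most common combination and its count
print(table[("cat", "large")])   # a combination that never appears counts as 0
```

Every distinct pair needs its own cell, which is why the chart becomes enormous when either column has many different values.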


Scatter Plot: Shows combinations of values from two columns


Information we can get out of scatter plots:

  • Seeing patterns and trends between two values

  • Exploring numeric data with lots of different values

Not useful for lots of repeated values

Open Data 

  • "sharing data with others so they can analyze it"

  • Open data is publicly available data shared by governments, organizations, and others

  • Making data open helps spread useful knowledge or creates opportunities for others to use it to solve problems

Citizen Science and Crowdsourcing

  • "collecting data from others so you can analyze it"

  • Crowdsourcing is the practice of obtaining input or information from a large number of people via the Internet.

  • Citizen science is research where some of the data collection is done by members of the public using their own computing devices, which helps solve scientific problems

  • Crowdsourcing offers new models for collaboration, such as connecting businesses or social causes with funding

  • Both are examples of how human capabilities can be enhanced by collaboration via computing

Big data 

  • "Collect huge amounts of data so we can learn even more from it"

  • The size of the datasets we analyze affects how much information can be extracted

  • As a result, in business, science, and many other contexts people are working with increasingly big data sets

  • When data gets too big it can no longer be processed on one computer. Cloud computing or parallel systems are sometimes used to help process all that information.

  • In general scalability of your system is important to consider when working with big data. You want your system to be able to work even as you're using more and more data.
