# Chapter 3: Data

**Abstractions**

**Bits** are grouped to represent abstractions. These abstractions include, but are not limited to, numbers, characters, and colors.

**Abstractions** identify the features that different cases have in common, so that a program can handle them in a general way.

**Analog vs. Digital Data**

An **analog signal** has values that change smoothly over time rather than in discrete steps. Analog signals are continuous, while digital signals take discrete values at discrete points in time.

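A **digital signal** can be created from an analog signal by breaking it up into discrete steps: the signal is measured (sampled) at regular intervals, and each measurement is rounded to one of a fixed set of values.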
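The idea of digitizing can be sketched in Python. This is only an illustration: it assumes a sine wave as the analog signal, 4 samples per second, and 5 possible levels, and the function name `digitize` is hypothetical.

```python
import math

def digitize(signal, duration, num_samples, levels):
    """Sample a continuous signal (values between -1 and 1) at evenly spaced
    times and round each sample to one of `levels` discrete steps (0 .. levels-1)."""
    step = 2 / (levels - 1)                    # size of one step between -1 and 1
    result = []
    for i in range(num_samples):
        t = duration * i / num_samples         # time of this sample
        value = signal(t)                      # continuous (analog) value
        level = round((value + 1) / step)      # nearest discrete step
        result.append(min(max(level, 0), levels - 1))  # keep within range
    return result

def wave(t):
    """Assumed analog signal: one cycle of a sine wave per second."""
    return math.sin(2 * math.pi * t)

print(digitize(wave, duration=1.0, num_samples=4, levels=5))  # [2, 4, 2, 0]
```

With only 4 samples and 5 levels, most of the smooth curve is lost; more samples and more levels give a closer approximation at the cost of more bits.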

**Consequences of Using Bits to Represent Data**

A **variable** is an abstraction inside a program that can hold a value. Each variable has associated data storage that represents one value at a time.

However, that value can be a list or other collection that, in turn, contains multiple values.

Common data types include integers, real numbers, Booleans, strings, and lists.
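As an illustration in Python (one possible language, not one the chapter prescribes), each of these data types corresponds to a built-in type. The variable names are made up for the example.

```python
count = 3                  # integer
count = 4                  # reassigning replaces the old value: one value at a time
price = 2.5                # real number (a float in Python)
is_open = True             # Boolean
name = "Ada"               # string
scores = [90, 85, 77]      # list: a single value that holds several values

for value in (count, price, is_open, name, scores):
    print(type(value).__name__, value)
```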

**Number Systems**

Number bases, including binary, decimal, and hexadecimal, are used to represent and investigate digital data.

| DECIMAL | BINARY |
|---|---|
| 0 | 0000 |
| 1 | 0001 |
| 2 | 0010 |
| 3 | 0011 |
| 4 | 0100 |
| 5 | 0101 |
| 6 | 0110 |
| 7 | 0111 |
| 8 | 1000 |
| 9 | 1001 |
| 10 | 1010 |
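The table can be regenerated with Python's built-in `format()` function. The hexadecimal column and the helper name `number_table` are additions for illustration only.

```python
def number_table(limit):
    """Return (decimal, 4-digit binary, hexadecimal) rows for 0 through limit."""
    return [(n, format(n, "04b"), format(n, "X")) for n in range(limit + 1)]

for dec, binary, hexa in number_table(10):
    print(dec, binary, hexa)
```

The format code `"04b"` pads the binary form to four digits, matching the table; larger numbers simply get more digits.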

**Converting Numbers into Different Bases**

**Convert a binary (BIN) number to a decimal (DEC) number.**

11011 (BIN) = ? (DEC)

STEPS:

Step 1. A five-column table is needed because 11011 has five digits. Start by putting a 1 into the upper-right box of the five-column table.

Step 2. Fill in the rest of the first row by repeatedly multiplying by the base. Because the original number is in binary, each column to the left is 2 times the column to its right: 1, 2, 4, 8, 16.

Step 3. Place the digits of the number being converted into the second row.

| 16 | 8 | 4 | 2 | 1 |
|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 1 |

Step 4. Multiply each number in row 1 by the digit below it in row 2, then add the results: 16 + 8 + 0 + 2 + 1 = 27.

Answer: 27 (DEC)
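The same place-value method can be sketched in Python. The function name `binary_to_decimal` is hypothetical; Python's built-in `int(text, 2)` is shown alongside it as a check.

```python
def binary_to_decimal(bits):
    """Convert a string of 0s and 1s to an integer using place values."""
    total = 0
    place_value = 1                  # the rightmost column is worth 1
    for digit in reversed(bits):     # walk the columns from right to left
        total += int(digit) * place_value
        place_value *= 2             # each column to the left is worth twice as much
    return total

print(binary_to_decimal("11011"))    # 27
print(int("11011", 2))               # 27, using Python's built-in conversion
```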

**Convert a decimal (DEC) number to a binary (BIN) number.**

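30 (DEC) = ? (BIN)

Step 1. Create a table whose upper row holds the powers of 2 (1, 2, 4, 8, 16, ...), adding columns until the number in the upper row is just bigger than the number you are converting. For 30, the columns are 32, 16, 8, 4, 2, 1.

Step 2. Start with the largest number in the upper row that is less than or equal to the number being converted. Put a 1 under it and subtract it from the original number: 30 − 16 = 14.

Step 3. Repeat with the remainder. 8 fits into 14, so put a 1 under 8: 14 − 8 = 6.

Step 4. Put a 1 under 4: 6 − 4 = 2.

Step 5. Put a 1 under 2: 2 − 2 = 0.

Step 6. Nothing remains, so put a 0 under 1 (and under 32, which is larger than 30).

| 32 | 16 | 8 | 4 | 2 | 1 |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 0 |

Answer: 11110 (BIN)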
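The subtraction steps above translate into a short loop. This is a sketch with a hypothetical function name, checked against Python's built-in `format(n, "b")`.

```python
def decimal_to_binary(n):
    """Convert a non-negative integer to a binary string by subtracting
    place values (..., 16, 8, 4, 2, 1) from largest to smallest."""
    if n == 0:
        return "0"
    place_value = 1
    while place_value * 2 <= n:      # find the largest power of 2 that fits
        place_value *= 2
    digits = []
    while place_value >= 1:
        if place_value <= n:         # this column fits: write 1 and subtract
            digits.append("1")
            n -= place_value
        else:                        # this column is too big: write 0
            digits.append("0")
        place_value //= 2
    return "".join(digits)

print(decimal_to_binary(30))         # 11110
print(format(30, "b"))               # 11110, using Python's built-in conversion
```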

**Various Errors**

**Overflow Errors**

An **overflow error** occurs when the result of a computation is too large to fit in the storage space available for it (the fixed number of bits set aside to hold it). The bits that do not fit are cut off, so data are lost and the stored result is wrong.

Overflow errors can occur in almost any programming language and can be very difficult to debug.
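Python's own integers grow as needed and do not overflow, so the sketch below simulates a fixed-size storage location. The 8-bit size and the function name `add_fixed_width` are assumptions chosen for illustration.

```python
BITS = 8                      # assumed storage size for this illustration
MAX_VALUE = 2**BITS - 1       # largest value 8 bits can hold: 255

def add_fixed_width(a, b, bits=BITS):
    """Add two non-negative integers as if the result were stored in `bits` bits.
    Any carry beyond the highest bit is discarded, which is how overflow loses data."""
    return (a + b) % (2**bits)

print(add_fixed_width(100, 100))  # 200, fits in 8 bits
print(add_fixed_width(200, 100))  # 44, not 300: the 9th bit was cut off
```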

**Roundoff Errors**

A **roundoff error** occurs when a real number cannot be stored exactly in the available bits and must be rounded. For example, one computer might store 1/3 as 0.333333, while another might store 1/3 as 0.3333333333.

In this case, 1/3 on one computer is not equal to 1/3 on a second computer.
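The example can be simulated in Python. Rounding to 6 and 10 decimal places stands in for two hypothetical computers with different precision; the `0.1 + 0.2` case shows real roundoff in standard 64-bit floating-point numbers.

```python
import math

one_third_a = round(1 / 3, 6)      # simulated computer keeping 6 decimal digits
one_third_b = round(1 / 3, 10)     # simulated computer keeping 10 decimal digits
print(one_third_a == one_third_b)  # False: the rounded values differ

# Even with 64-bit floats, many decimals cannot be stored exactly
print(0.1 + 0.2 == 0.3)            # False
print(0.1 + 0.2)                   # 0.30000000000000004
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare with a tolerance instead
```

Comparing real numbers with a tolerance, as `math.isclose` does, is a common way to avoid being misled by roundoff.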

**Lossy and Lossless Data Compression**

**Data compression** reduces the size (number of bits) of transmitted or stored data. **Digital data compression** often involves a trade-off between quality and storage requirements. **Lossy compression** can significantly reduce file size, but it permanently discards some data, which lowers quality (for example, image resolution). Lossy compression is typically used to reduce file size for storage and transmission, such as sending photos by email.

With **lossless data compression**, no data are lost. After decompression, the original file can be reproduced exactly.
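The difference can be sketched in Python. Lossless compression uses the standard-library `zlib` module; the "lossy" part is a toy illustration (dropping digits), not a real lossy format such as JPEG.

```python
import zlib

original = b"AAAAAAAAAABBBBBBBBBBCCCCCCCCCC" * 10  # repetitive data compresses well

# Lossless: zlib restores every byte exactly
packed = zlib.compress(original)
restored = zlib.decompress(packed)
print(len(original), len(packed))   # the compressed version is much smaller
print(restored == original)         # True

# Lossy (toy illustration): keep only one decimal place of each reading
readings = [3.14159, 2.71828, 1.41421]
lossy = [round(x, 1) for x in readings]
print(lossy)                        # [3.1, 2.7, 1.4]; the dropped digits cannot be recovered
```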

**Information Extracted From Data**

People can use computer programs to process information as well as to gain insight and knowledge.

Information is the collection of facts and patterns extracted from data.

Depending on how the data were collected, the information may not be uniform.

For example, if users entered data into an open field, the way they chose to abbreviate, spell, or capitalize something may vary from user to user.

Cleaning data is a process that makes the data uniform without changing their meaning.
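A minimal data-cleaning sketch in Python: the survey entries and the lookup table `CANONICAL` are made up for illustration, and real cleaning rules depend on the data set.

```python
# Hypothetical answers typed into an open text field
raw_entries = ["California", "california ", "CA", "Calif.", "new york", "NY"]

# Assumed lookup table mapping known variants to one standard form
CANONICAL = {
    "california": "California",
    "ca": "California",
    "calif.": "California",
    "new york": "New York",
    "ny": "New York",
}

def clean(entry):
    """Trim spaces, ignore capitalization, and map abbreviations to one form."""
    key = entry.strip().lower()
    return CANONICAL.get(key, entry.strip())  # unknown entries are only trimmed

cleaned = [clean(e) for e in raw_entries]
print(cleaned)
```

Every entry keeps its meaning (a state name); only its spelling, capitalization, and abbreviation become uniform.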

**Predicting Algorithms**

**Predicting algorithms** use information collected from big data to influence our daily lives. For example:

- A credit card company can use purchasing patterns to identify when to extend credit or flag a purchase for possible fraud.
- Social media sites can use patterns to target advertising based on viewing habits.
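As a toy sketch of flagging a purchase based on purchasing patterns, the rule below flags any amount far above a customer's usual spending. The threshold rule and the name `is_unusual` are assumptions for illustration; real fraud detection uses far more data and more sophisticated models.

```python
from statistics import mean, stdev

def is_unusual(history, amount, threshold=3.0):
    """Flag `amount` if it is more than `threshold` standard deviations
    above the mean of past purchases (a toy rule, not a real fraud model).
    `history` must contain at least two purchases."""
    average = mean(history)
    spread = stdev(history)
    return amount > average + threshold * spread

past_purchases = [20, 25, 30, 22, 28]      # made-up purchase amounts
print(is_unusual(past_purchases, 30))      # False: a typical purchase
print(is_unusual(past_purchases, 500))     # True: far outside the usual pattern
```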

**Visualization of Data**

Using appropriate visualizations when presenting digitally processed data can help one gain insight and knowledge.

Although big data is a powerful tool, the data will lose their value if they cannot be presented in a way that can be interpreted.

**Visualization tools** can communicate information about data. Column charts, line graphs, pie charts, bar charts, XY charts, radar charts, histograms, and waterfall charts can make complex data easier to interpret.
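Even without a charting library, a simple bar chart can be drawn as text. This Python sketch uses made-up vote counts and a hypothetical `text_bar_chart` helper; in practice a dedicated plotting tool would usually be used.

```python
def text_bar_chart(counts, width=20):
    """Return one line per category, with a bar of '#' scaled to the largest count."""
    largest = max(counts.values())
    lines = []
    for label, count in counts.items():
        bar = "#" * round(count / largest * width)
        lines.append(f"{label:<8} {bar} {count}")
    return lines

votes = {"Red": 12, "Blue": 30, "Green": 18}   # made-up example data
for line in text_bar_chart(votes):
    print(line)
```

The bars make it easy to see at a glance that Blue received the most votes, which is harder to spot in a list of numbers.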

**Privacy Concerns**

**Privacy concerns** arise from the mass collection of data. Because the data may contain personal information, their content can affect how the data are stored and transmitted.

**Geolocation**, when used within a program, finds the approximate geographic location of an IP address, along with other information such as the ISP, time zone, area code, and state.

**Metadata**

**Metadata** are data that describe other data. For example, a picture of you standing in front of a waterfall is data; the location and time the picture was taken are metadata.

Metadata are used for finding, organizing, and managing information.

Metadata can increase the effective use of data or data sets by providing additional information about various aspects of that data.
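A small Python sketch of the photo example: the `Photo` class, its fields, and the metadata keys are hypothetical, and the image bytes are placeholders. It shows how metadata let a program find data without examining the data itself.

```python
from dataclasses import dataclass, field

@dataclass
class Photo:
    pixels: bytes                                  # the data itself (placeholder image bytes)
    metadata: dict = field(default_factory=dict)   # data about the data

photos = [
    Photo(b"\x00\x01", {"location": "Niagara Falls", "taken": "2023-07-04"}),
    Photo(b"\x02\x03", {"location": "Yosemite", "taken": "2022-05-19"}),
]

# Metadata make it possible to find photos without looking at the pixels
found = [p for p in photos if p.metadata.get("location") == "Yosemite"]
print(len(found))                  # 1
print(found[0].metadata["taken"])  # 2022-05-19
```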
