Data 200 Lect. 2, 3, 4
Data is represented in Binary
strings of 1s and 0s
smallest unit of info = bit
Binary Basics
0 → 0
1 → 1
2 → 10
3 → 11
4 → 100
Base of 2
To convert decimal to binary: repeatedly divide by 2 and read the remainders in reverse order
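The divide-by-2 rule can be sketched in Python as below. This is only an illustration; the function name to_binary is made up for these notes and is not from the lecture.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        # divmod gives the quotient (carried forward) and the remainder (next bit)
        n, remainder = divmod(n, 2)
        digits.append(str(remainder))
    # Remainders come out lowest bit first, so reverse them
    return "".join(reversed(digits))


for value in range(5):
    print(value, "->", to_binary(value))
```

Python's built-in bin() does the same job (bin(4) gives '0b100'); the loop is written out here only to show where each digit comes from.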
Characters and More
ASCII table - maps numeric codes (written in decimal or hexadecimal) to characters and control commands
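A quick way to look up ASCII codes is Python's built-in ord() and chr(). The helper name ascii_codes below is hypothetical, chosen just for this sketch.

```python
def ascii_codes(text: str) -> list[int]:
    """Return the numeric code of each character in text."""
    return [ord(ch) for ch in text]


for ch in "Hi!":
    code = ord(ch)
    # Show the same code as decimal, hexadecimal, and 8-bit binary
    print(repr(ch), code, hex(code), format(code, "08b"))

# chr() goes the other way: code -> character
print(chr(65))          # the letter A
print(ascii_codes("\n"))  # newline is a control command, code 10
```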
Colors
How to represent them?
RGB (###, ###, ###)
Hexadecimal (#RRGGBB)
each two-digit color code is in the range 00…FF hexadecimal (0…255 decimal)
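The two color notations hold the same three numbers, so converting between them is mechanical. A minimal sketch, with function names (rgb_to_hex, hex_to_rgb) invented for these notes:

```python
def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Format three 0-255 channel values as #RRGGBB."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be in 0..255")
    # 02X = two hex digits, zero-padded, uppercase
    return f"#{r:02X}{g:02X}{b:02X}"


def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Parse #RRGGBB back into (r, g, b)."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected six hex digits")
    # Read each two-digit pair as a base-16 number
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


print(rgb_to_hex(255, 165, 0))
print(hex_to_rgb("#FFA500"))
```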
Data Groups
Structured Data
everything has a name and a type
relationship is defined between value and name
eg. spreadsheets
Unstructured
no implied relationship between values
Semi-structured Data
some portions are structured and some are not
Files
CSV (Comma Separated Values)
text values are in quotes, numbers are not
values are separated from each other by commas
XML (eXtensible Markup Language)
XML uses HTML-style tags to provide structure for exchanging non-document information
eg. webpage
TSV (Tab Separated Values)
same as CSV but separated by tabs
JSON (JavaScript Object Notation)
RSS (Really Simple Syndication)
dialect of XML
There are a lot of different ways data can be stored. In class we mainly use CSV when bringing in data.
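Since CSV is the main format for the class, here is a small sketch of reading one with Python's standard csv module. The sample rows and column names are made up for illustration; a real file would be opened with open(...) instead of io.StringIO.

```python
import csv
import io

# Hypothetical CSV content: text values quoted, numbers not
raw = 'name,city,age\n"Ada","London",36\n"Grace","New York",45\n'

# DictReader uses the first row as column names
rows = list(csv.DictReader(io.StringIO(raw)))

for row in rows:
    # Every value comes back as a string, so convert numbers explicitly
    print(row["name"], row["city"], int(row["age"]))

# The same reader handles TSV if the delimiter is a tab
tsv_raw = "name\tage\nAda\t36\n"
tsv_rows = list(csv.DictReader(io.StringIO(tsv_raw), delimiter="\t"))
print(tsv_rows)
```

Note that the reader strips the quotes but does not turn "36" into a number; the type has to be applied after loading.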
How to find dataset?
data.gov
datasetsearch.research.google.com (Google Dataset Search)
Lect. 3
Center of data
Mean
Median
Mode
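The three measures of center can be computed with Python's standard statistics module. The numbers in data are invented for this example.

```python
import statistics

# Hypothetical sample, not from the lecture
data = [2, 4, 4, 5, 7, 9, 11]

center_mean = statistics.mean(data)      # sum divided by count
center_median = statistics.median(data)  # middle value once sorted
center_mode = statistics.mode(data)      # most frequent value

print("mean:", center_mean)
print("median:", center_median)
print("mode:", center_mode)
```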
Spread
Standard Deviation
Range: difference between max and min
Range in standard deviations
range / SD
Z-score
(observed value - mean) / SD
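The spread measures and the z-score can be checked on a small example. This sketch assumes the population standard deviation (statistics.pstdev); the notes do not say whether the class uses the population or sample version, and the data and the z_score helper are made up.

```python
import statistics

# Hypothetical data chosen so the numbers come out clean
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
sd = statistics.pstdev(data)          # population SD (assumption, see above)
data_range = max(data) - min(data)    # range = max - min
range_in_sd = data_range / sd         # how many SDs the range spans


def z_score(observed: float) -> float:
    """Distance of observed from the mean, in units of SD."""
    return (observed - mean) / sd


print("mean:", mean, "SD:", sd)
print("range:", data_range, "range/SD:", range_in_sd)
print("z-score of 9:", z_score(9))
```

A z-score of 0 means the value sits at the mean; positive values are above it and negative values below.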
Visualizing Data/Finding relationships
Histogram
if data is continuous, bars touch
Density plot
smoothed curve; can give more info about the shape of the data than a histogram
Bar plot
bars don’t touch
can arrange in any order
Scatter plot