Big Data
extremely large and ever-growing amounts of data, both structured and unstructured, that are too large or complex to be dealt with by traditional data-processing application software
structured data
data that resides in a fixed field within a spreadsheet or relational database; data that has been intentionally collected in an organized manner
unstructured data
data that is not stored in a predefined way; it is not structured or categorized
The Five V’s of Big Data
volume, velocity, variety, veracity, value
data center
a physical room, building, or facility that houses the equipment necessary for storing data and hosting applications and services
data compression
a reduction in the number of bits needed to represent data
lossy compression
a non-reversible process that reduces the number of bits in a digital file by discarding some of the data and information
lossless compression
a reversible process that reduces the number of bits in a digital file without losing any data
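To illustrate the difference between the two kinds of compression, here is a minimal Python sketch. The sample data, the quantization step of 16, and all variable names are assumptions made up for this example; zlib is the lossless compressor in Python's standard library, and the "lossy" half is only a toy stand-in for what real audio or image codecs do.

```python
import zlib

# Lossless: zlib round-trips the data exactly.
original = b"AAAAABBBBBCCCCC" * 100      # made-up, highly repetitive sample
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(compressed) < len(original))   # fewer bytes are stored
print(restored == original)              # every byte comes back

# Lossy: keep only a coarse version of each value (hypothetical step of 16).
samples = [3, 18, 37, 52, 200]           # made-up 8-bit values (0-255)
STEP = 16
quantized = [s // STEP for s in samples]      # each value now fits in 4 bits
approximated = [q * STEP for q in quantized]  # best possible reconstruction
print(approximated == samples)           # the discarded detail cannot be recovered
```

The first two checks print True and the last prints False: the lossless path is reversible, while the lossy path saves bits by throwing detail away.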
byte
a group of 8 bits
kilobyte
one thousand bytes
megabyte
one million bytes (or one thousand kilobytes)
gigabyte
one billion bytes (or one thousand megabytes)
terabyte
one trillion bytes (or one thousand gigabytes)
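The unit definitions above are plain powers of one thousand, so conversions are simple arithmetic. The following Python sketch shows one way to write them down; the names UNITS, to_bytes, and to_bits are hypothetical helpers invented for this example.

```python
BITS_PER_BYTE = 8

# Decimal units as defined above: each one is 1,000 times the previous.
UNITS = {
    "byte": 1,
    "kilobyte": 10**3,
    "megabyte": 10**6,
    "gigabyte": 10**9,
    "terabyte": 10**12,
}

def to_bytes(amount, unit):
    """Convert an amount in the given unit to bytes."""
    return amount * UNITS[unit]

def to_bits(amount, unit):
    """Convert an amount in the given unit to bits."""
    return to_bytes(amount, unit) * BITS_PER_BYTE

print(to_bytes(1, "terabyte") // to_bytes(1, "gigabyte"))  # gigabytes per terabyte
print(to_bits(1, "kilobyte"))                              # bits in one kilobyte
```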
cleaning data
a process that makes data uniform without changing its meaning
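As a small example of cleaning, suppose the same value was typed in several different ways. This Python sketch makes the entries uniform without changing what they mean; the sample entries and the clean_state function are assumptions made up for illustration.

```python
def clean_state(raw):
    """Return a uniform form of a state abbreviation: no spaces, no periods, uppercase."""
    return raw.strip().upper().replace(".", "")

raw_entries = ["NY", " ny", "N.Y.", "ny "]   # made-up inconsistent entries
cleaned = [clean_state(entry) for entry in raw_entries]
print(cleaned)
```

After cleaning, all four entries are the same string, so they can be counted or grouped together.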
bias
prejudice in favor of or against one thing, person, or group compared with another, usually in a way considered to be unfair
COPPA
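Children’s Online Privacy Protection Act; a U.S. federal law that restricts the online collection of personal information from children under 13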
data
raw facts and statistics collected together for reference or analysis
information
data that has been processed, interpreted, analyzed, organized, or structured to make it more meaningful or useful
extraction
retrieving data, processing it, and placing it in a structure that can be analyzed
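A rough sketch of extraction in Python: raw text is retrieved, each line is processed, and the result is placed in a structure (a list of dictionaries) that is ready for analysis. The raw text and field names are made up for this example; csv and io are standard-library modules.

```python
import csv
import io

raw_text = "name,age\nAda,36\nGrace,45\n"   # made-up raw data

rows = []
for record in csv.DictReader(io.StringIO(raw_text)):
    # Process each record: convert the age from text to a number.
    rows.append({"name": record["name"], "age": int(record["age"])})

print(rows)
```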
filtering
temporarily removing or hiding unwanted data
sorting
arranging data in alphabetical or numerical order
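Filtering and sorting can both be sketched in a few lines of Python. The student records and the passing score of 80 are assumptions made up for this example. Note that neither operation changes the original list, which matches the idea that filtering only hides data temporarily.

```python
students = [                       # made-up records
    {"name": "Maya", "score": 88},
    {"name": "Alex", "score": 72},
    {"name": "Zoe", "score": 95},
]

# Filtering: hide the records that do not meet a condition.
passing = [s for s in students if s["score"] >= 80]

# Sorting: alphabetical order by name, then numerical order by score.
by_name = sorted(students, key=lambda s: s["name"])
by_score = sorted(students, key=lambda s: s["score"], reverse=True)

print([s["name"] for s in passing])
print([s["name"] for s in by_name])
print([s["name"] for s in by_score])
```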
data mining
the practice of analyzing large databases in order to generate new information
trend analysis
the practice of attempting to spot a pattern in data
pattern
consistencies/repetitions in data; useful for predicting future behavior or events
visualization
the presentation of data in a pictorial or graphical form; allows decision makers to grasp a difficult concept or identify new patterns
causation
a relationship in which one event or condition directly leads to another event or condition
correlation
how closely related two or more events or conditions are; correlation does not prove causation: just because trend graphs look similar does not mean that one trend caused the other
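One common way to put a number on correlation is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand in Python; the function name pearson and the ice cream and sunburn figures are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

ice_cream_sales = [1, 2, 3, 4, 5]    # made-up weekly figures
sunburn_cases = [2, 4, 6, 8, 10]     # made-up weekly figures
r = pearson(ice_cream_sales, sunburn_cases)
print(r)
```

A value near 1 means the two series rise together, but it says nothing about why. In this invented example, hot weather could plausibly drive both numbers, which is a correlation without either one causing the other.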
model
a physical or virtual representation of an object, concept, or idea
simulation
a method for testing a hypothesis about a situation by running a model of it
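As a minimal example of a model and a simulation working together, the Python sketch below models a pair of fair six-sided dice and simulates many rolls to test the hypothesis that a total of 7 comes up about one time in six. The function names, the trial count, and the seed are assumptions chosen for this example.

```python
import random

def roll_two_dice(rng):
    """Model: the total shown by two fair six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)

def estimate_probability_of_seven(trials, seed=0):
    """Simulation: run the model many times and count how often the total is 7."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if roll_two_dice(rng) == 7)
    return hits / trials

estimate = estimate_probability_of_seven(100_000)
print(estimate)        # simulated estimate
print(1 / 6)           # value predicted by the hypothesis
```

With this many trials the estimate should land close to 1/6, which supports the hypothesis without having to roll real dice.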