big data
large amounts of structured and unstructured data that can potentially be mined, examined, and used by organizations.
data processing
converting information into a form that can be understood by a computer.
useable data
data that is capable of being used - i.e., data that has been processed so that it can be analyzed or used in its current form.
useful data
data that someone can use to make predictions, describe a process, or solve a problem.
data collection
gathering and measuring information on targeted variables in order to answer questions and evaluate outcomes.
collaboration
working together to facilitate the application of multiple perspectives and diverse talents and skills.
unstructured data
raw data with no detected connections or relationships among its elements - requires more storage space.
structured data
data that is organized in some fashion - utilizes less storage space.
data set
a collection of numbers or values that relate to a particular subject, usually portrayed as a table in a relational database. Example: column headers and row contents recording the test scores for each student.
knowledge extraction
knowledge created from structured relational databases.
relational database
a collection of data organized into tables that can be related to one another and retrieved in various ways.
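As a minimal sketch of how data can be related across database tables, the example below uses Python's built-in sqlite3 module. The table names, column names, and rows (students, scores, the sample names and marks) are invented for illustration and do not come from any real data set.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE scores (
        student_id INTEGER REFERENCES students(id),
        test TEXT,
        score INTEGER
    );
    INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO scores VALUES (1, 'Quiz 1', 90), (1, 'Quiz 2', 80), (2, 'Quiz 1', 70);
""")

# Join the two tables on the shared student id and average each student's scores.
rows = conn.execute("""
    SELECT s.name, AVG(sc.score)
    FROM students AS s
    JOIN scores AS sc ON sc.student_id = s.id
    GROUP BY s.id
    ORDER BY s.name
""").fetchall()
conn.close()

print(rows)
```

The same two tables could be queried in other ways (for example, by test instead of by student), which is the sense in which the data is "retrieved in various ways".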
data vs. information
data are figures and facts while information is data that is processed, interpreted and organized to become meaningful.
data storage
the retention and retrieval of data.
extraction
retrieving or processing data from unstructured data sources for further data processing, storage and/or analysis.
spiderbot
a virtual robot (program) that visits web sites and reads information to create entries for a search engine index.
screen scraping
extracting information that is formatted for human use and converting it into a format for computer use (example: a scanner or PDF converter).
generation loss
the loss of quality between successive copies of data, usually in analog formats (copies of copies) - unlike digital data, where copies are identical as long as the format and size remain the same.
browser
a computer program used to navigate and search the World Wide Web and display HTML files in a graphical format (examples: Google Chrome, Internet Explorer, Mozilla Firefox).
data persistence
information that is not often accessed and is rarely modified.
data storage
static storage of various capacities and speeds, such as CDs, DVDs, flash memory, main memory, cache memory, magnetic tape, etc.
indexing
the specific organization and method of keeping track of data.
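One common way to keep track of data is an inverted index, which maps each word to the documents that contain it. The sketch below is only an illustration of that idea; the function name build_index and the three sample documents are assumptions, not part of the definition above.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


# Hypothetical documents, keyed by an arbitrary id.
documents = {
    1: "big data storage",
    2: "data persistence",
    3: "cache memory",
}

index = build_index(documents)

# Looking up a word no longer requires scanning every document.
matches = sorted(index["data"])
print(matches)
```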
filter bubble
limiting a user's perspective by having an algorithm selectively determine what type of information a user would like to see based on past search history and behavior.
privacy concerns
digitization of personal data means your data is now easier to reproduce, share, sell and access.
utility
the measurement of usefulness - example: sharing personal digital data in order to receive something of value in return.
cache
a memory location to store active data temporarily to shorten data access times and reduce latency.
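The effect of a cache can be sketched with Python's functools.lru_cache, which keeps recent results in memory so that a repeated request skips the slow work. The function slow_lookup and the counter calls are hypothetical stand-ins for a slow data source.

```python
from functools import lru_cache

calls = 0  # counts how often the slow path actually runs


@lru_cache(maxsize=128)
def slow_lookup(key):
    """Stand-in for a slow data source (disk, network, database)."""
    global calls
    calls += 1
    return key.upper()


first = slow_lookup("record")   # cache miss: the function body runs
second = slow_lookup("record")  # cache hit: the stored result is returned
info = slow_lookup.cache_info()

print(calls, info.hits, info.misses)
```

The second call returns the same value without running the function body again, which is where the shorter access time comes from.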
reCAPTCHA
a digital tool used to deter automated form-filling and exploitation of web-based registration systems.
crowdsourcing
obtaining information from a large number of people, either paid or unpaid, voluntary or involuntary.
human computation
using human cognition to provide computational data via techniques such as crowdsourcing.
descriptive analytics
information about collected data, expressed through statistics (mean, median, mode, range) that describe circumstances.
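The four statistics named above can be computed with Python's standard statistics module. The list of test scores is made up for illustration.

```python
import statistics

# Hypothetical test scores, for illustration only.
scores = [70, 80, 80, 90, 100]

mean = statistics.mean(scores)           # arithmetic average
median = statistics.median(scores)       # middle value of the sorted data
mode = statistics.mode(scores)           # most frequent value
value_range = max(scores) - min(scores)  # spread between the extremes

print(mean, median, mode, value_range)
```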
predictive analytics
information about future events based on collected and analyzed data.
analytics
information resulting from the systematic analysis of data or statistics.
automated summarization
summarizing data to a simpler state by removing redundant or less significant details.
visualization
the representation of information using a chart, diagram, image, etc.
regression analysis
the forecasting of change through statistical analysis of the strength of the relationship between one dependent variable and other changing independent variables.
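As a minimal sketch, the simplest case of regression analysis fits a straight line to one independent variable using ordinary least squares and then uses the line to forecast. The helper fit_line and the study-hours data are assumptions made for illustration; real analyses often involve several independent variables and noisy data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums determine the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied (independent) and test score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)

# Forecast the score for a value outside the collected data.
predicted = intercept + slope * 6
print(slope, intercept, predicted)
```

Here the slope measures the strength of the relationship: each additional hour is associated with a change of slope points in the score.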
metadata
descriptive data about an image, a web page, or another complex object (data about data).
curation of information
gathering information pertaining to a specific topic.
models
physical or virtual representations of an object.
simulations
testing a hypothesis about a situation using a model.
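A simulation can be sketched in a few lines: the model below is a pair of virtual dice, and the hypothesis being tested is that the two dice sum to 7 about one time in six. The function name, trial count, and random seed are arbitrary choices made for illustration.

```python
import random


def simulate_sevens(trials, seed=0):
    """Estimate how often two six-sided dice sum to 7."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sevens = 0
    for _ in range(trials):
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        if roll == 7:
            sevens += 1
    return sevens / trials


estimate = simulate_sevens(100_000)
expected = 1 / 6  # 6 of the 36 equally likely outcomes sum to 7

print(estimate, expected)
```

If the estimate stays close to the expected value as the number of trials grows, the simulation supports the hypothesis.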