usable data
data that is capable of being used, i.e., data that has been processed so that it can be analyzed or used in its current form.
useful data
can someone use the data to make predictions, describe some process, or solve a problem?
big data
large amounts of structured and unstructured data that can potentially be mined, examined, and used by organizations.
data processing
converting information into a form that can be understood by a computer.
is all data collected utilized?
no!
unpredictable data=
useless
where does this data come from?
RFID (radio-frequency identification), among other sources
how does this data get created?
data footprints
where do these large data sets get stored?
data center
what are the steps for understanding data?
collection, extraction, storage, analysis
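A minimal sketch of the four steps as a tiny Python pipeline, assuming made-up text readings as the collected data. The function names `collect`, `extract`, `store`, and `analyze` are hypothetical, and an in-memory SQLite table stands in for a data center.

```python
import sqlite3
from statistics import mean

def collect():
    # Collection: hypothetical raw readings (assumed data, e.g. from a sensor)
    return ["temp=21.5", "temp=22.0", "garbage", "temp=20.5"]

def extract(raw_lines):
    # Extraction: keep only lines shaped like "temp=<number>"
    values = []
    for line in raw_lines:
        key, _, value = line.partition("=")
        if key == "temp":
            values.append(float(value))
    return values

def store(values):
    # Storage: save the extracted values in an in-memory SQLite table
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (temp REAL)")
    conn.executemany("INSERT INTO readings VALUES (?)", [(v,) for v in values])
    return conn

def analyze(conn):
    # Analysis: average the stored readings
    temps = [row[0] for row in conn.execute("SELECT temp FROM readings")]
    return mean(temps)

result = analyze(store(extract(collect())))
print(round(result, 2))  # 21.33
```

The malformed "garbage" line is dropped during extraction, so only clean values reach storage and analysis.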
what are 2 types of data collection?
traditional (scales, stopwatches, rulers) and technology-based (Google searches, GPS, number of visits to a website)
Which of the following is NOT a benefit of making digital information and scientific databases openly available across the internet?
Innovations in medicine, business and science can be developed from the increased knowledge gained from large data sets.
Data scientists can discover previously unnoticed trends and patterns hidden within large data sets.
Inaccurate and misleading data can be more easily disseminated to scientific researchers.
Scientific researchers can more easily share data and collaborate on related research projects.
Inaccurate and misleading data can be more easily disseminated to scientific researchers.
Which of the following tasks best shows an example where the searching and sorting techniques of big data may be involved?
There are TWO correct answers.
Keeping track of all employees’ email use to see how many personal or work-related emails are sent during work time to check for productivity
Recording the amount of time it takes a student to travel from one class to another class in order to find the average
Creating a seating chart for a classroom based on an alphabetized list of student names
Tallying how many pencils and pens you use throughout a school year so you know how many to buy for the start of the next school year to ensure you will have enough
Creating a seating chart for a classroom based on an alphabetized list of student names
AND
Keeping track of all employees’ email use to see how many personal or work-related emails are sent during work time to check for productivity
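Searching and sorting in miniature: the sketch below alphabetizes a hypothetical roster (sorting) and flags personal emails by keyword (searching). The names, emails, and keyword list are invented for illustration, and keyword matching is only a crude stand-in for how a real system might classify email.

```python
from bisect import bisect_left

# Hypothetical class roster (made-up names)
students = ["Maya", "Eli", "Ana", "Jordan", "Chris"]

# Sorting: alphabetize the roster to build a seating chart
seating_chart = sorted(students)

def find_seat(chart, name):
    """Binary search in the sorted chart; returns the seat index or -1."""
    i = bisect_left(chart, name)
    if i < len(chart) and chart[i] == name:
        return i
    return -1

# Searching: flag hypothetical emails whose subject contains a "personal" keyword
emails = [
    {"sender": "ana", "subject": "Quarterly report"},
    {"sender": "eli", "subject": "Weekend plans"},
    {"sender": "ana", "subject": "Lunch?"},
]
personal_keywords = ("weekend", "lunch")  # assumed keyword list
personal = [e for e in emails
            if any(k in e["subject"].lower() for k in personal_keywords)]

print(seating_chart)                       # ['Ana', 'Chris', 'Eli', 'Jordan', 'Maya']
print(find_seat(seating_chart, "Jordan"))  # 3
print(len(personal))                       # 2
```

Binary search only works because the list was sorted first, which is why the two techniques often appear together.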
There are many computer applications designed to help people search through large data sets to find patterns. However, not all questions require a search for a hidden pattern. Answering which of the following questions is least likely to require an investment in software?
Which contestants took the top three prizes in a talent show at a neighborhood block party?
Which magazines are customers more likely to subscribe to, if they already subscribe to a particular magazine?
In which zip codes of Manhattan is the demand for office space highest?
In which areas of a city are crimes more likely to take place?
Which contestants took the top three prizes in a talent show at a neighborhood block party?
Google has access to a lot of data. One way to make use of the data collected by Google is to examine the relative popularity of search terms using the Google Trends feature. This tool allows users to identify trends across geographies and time, and within categories like Real Estate, Sports, Shopping, Pets & Animals, Books & Literature and Arts & Entertainment. Google Trends would be most helpful for determining which of the following?
Showtimes for the latest movies released.
The scores of games your favorite sports team participated in last season.
Which breed of dog shows the most affection.
Which week is the best week to send advertisements to parents who want Fidget Spinners or other popular toys for their children around the holidays.
Which week is the best week to send advertisements to parents who want Fidget Spinners or other popular toys for their children around the holidays.
data’s usability is defined by
whether the means to process and analyze the data exist (regardless of whether the data is informative or useful).
what makes data useful?
data is useful if somebody would want to use it, in essence making it valuable for some purpose or another.
data collection
gathering and measuring information on targeted variables to answer questions and evaluate outcomes.
collaboration
working together to facilitate the application of multiple perspectives and diverse talents and skills.
unstructured data
raw data with no connections and/or relationships among data detected - requires more storage space.
-stored as it’s collected
-not easily processed
-nothing is deleted or edited
-useful for finding a particular piece of data (tedious but useful)
ex. security cam footage that records and contains an overwhelming amount of information
structured data
data that is organized in some fashion - utilizes less storage space.
-more easily processed
-more useful and usable
-what might I lose if I structure data in a certain way?
-downside: structure and order applied after the data is collected change the data, so care is needed to make sure nothing important is deleted; sometimes data can never be returned to its raw state
ex. making a list of different amounts of gas pumped by a customer
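A rough sketch of structuring the gas-pump example, assuming a hypothetical free-text pump log. A regular expression keeps only the pump number and gallons, so the time and payment method are lost, which is the "what might I lose?" trade-off from the structured data card.

```python
import re

# Hypothetical raw pump log, kept exactly as collected (unstructured text)
raw_log = """
Pump 3 started 08:02, customer paid cash, 12.4 gal
08:15 pump 1 -> 9.8 gal (card)
pump 3: 15.0 gal at 08:40
"""

# Structuring: capture the pump number and the gallons on each line.
# Times and payment methods are discarded by this choice of structure.
pattern = re.compile(r"pump (\d+).*?(\d+\.\d+) gal", re.IGNORECASE)

records = []
for line in raw_log.strip().splitlines():
    match = pattern.search(line)
    if match:
        records.append({"pump": int(match.group(1)),
                        "gallons": float(match.group(2))})

print(records)
```

Once structured, questions like "total gallons at pump 3" become a one-line sum, but the discarded fields cannot be recovered from `records`.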
data set
a collection of numbers or values that relate to a particular subject, usually portrayed in a relational database table. Example: column headers and row contents for test scores for each student.
knowledge extraction
knowledge created from structured relational databases.
relational database
a collection of data organized and retrieved in various ways between database tables.
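A small illustration using Python's built-in `sqlite3` module: two hypothetical tables (students and test scores, like the data set example) are related through a `student_id` column and retrieved together with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE scores (student_id INTEGER REFERENCES students(id),
                     test TEXT, score INTEGER);
INSERT INTO students VALUES (1, 'Ana'), (2, 'Eli');
INSERT INTO scores VALUES (1, 'Quiz 1', 90), (1, 'Quiz 2', 80), (2, 'Quiz 1', 75);
""")

# Relate the two tables through student_id and average each student's scores
rows = conn.execute("""
SELECT students.name, AVG(scores.score)
FROM students JOIN scores ON scores.student_id = students.id
GROUP BY students.id
ORDER BY students.name
""").fetchall()
print(rows)  # [('Ana', 85.0), ('Eli', 75.0)]
```

Keeping names in one table and scores in another avoids repeating each name on every score row; the join reassembles them on demand.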
data
figures and facts
information
information is data that is processed, interpreted and organized to become meaningful.
data storage
the retention and retrieval of data.
screen scraping
extracting information that is formatted for human use and converting it into a format for computer use (example: scanner or PDF converter)
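Screen scraping in miniature with Python's standard `html.parser`, assuming a hypothetical HTML table meant for people to read in a browser. The scraper pulls out the cell text and turns it into a dictionary a program can use; real pages are messier and usually need more robust parsing.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, formatted for human readers
page = """
<table>
  <tr><td>Ana</td><td>90</td></tr>
  <tr><td>Eli</td><td>75</td></tr>
</table>
"""

class CellScraper(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])   # start a new row
        elif tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:           # ignore whitespace between tags
            self.rows[-1].append(data.strip())

scraper = CellScraper()
scraper.feed(page)
# Convert to a computer-friendly structure: name -> score
scores = {name: int(score) for name, score in scraper.rows}
print(scores)  # {'Ana': 90, 'Eli': 75}
```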
curation of information
gathering information pertaining to a specific topic.
analog (natural) data
information that contains continuous values.
-not truly unstructured because it’s collected over time
ex. analog data can be restructured into binary (structured data)
digital data
information that has discrete values.
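To connect analog and digital data: the sketch treats a sine wave as a stand-in analog signal, samples it at discrete times, and rounds each sample to one of a fixed number of levels. The function name `digitize` and the parameter values are assumptions for illustration.

```python
import math

def analog_signal(t):
    # Stand-in for a continuous (analog) quantity, e.g. a sound wave
    return math.sin(2 * math.pi * t)

def digitize(signal, sample_rate, duration, levels):
    """Sample at discrete times, then round each value to one of `levels` steps."""
    samples = []
    n = int(sample_rate * duration)
    for i in range(n):
        t = i / sample_rate                 # discrete sample times
        value = signal(t)                   # any real number in [-1, 1]
        step = round((value + 1) / 2 * (levels - 1))  # integer 0..levels-1
        samples.append(step)
    return samples

samples = digitize(analog_signal, sample_rate=4, duration=1, levels=8)
print(samples)
```

More samples per second and more levels make the digital version closer to the original, at the cost of more storage.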
data
pieces of information that are observable and/or measurable
Extraction
retrieving or processing data from unstructured data sources for further data processing, storage and/or analysis.
Collection → Extraction (www)
Spiderbot
A virtual robot (program) that visits web sites and reads information to create entries for a search engine index.
-visit a web page
-gather all the links on each page visited
-add the links to its list of pages to visit in the future
-repeat until it has visited all pages in its list
***iterates until all pages are visited on its list***
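The spiderbot loop can be sketched as a breadth-first crawl. This assumes a hypothetical in-memory `WEB` dictionary instead of real HTTP requests, plus a `seen` set so the bot does not loop forever on pages that link back to each other.

```python
from collections import deque

# Hypothetical miniature web: each page maps to the links found on it.
# A real crawler would download each page and parse the links out of its HTML.
WEB = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html", "d.html"],
    "c.html": ["a.html"],
    "d.html": [],
}

def crawl(start):
    """Visit pages, gather their links, queue unseen ones, repeat until the list is empty."""
    to_visit = deque([start])
    visited = []
    seen = {start}
    while to_visit:
        page = to_visit.popleft()        # visit a web page
        visited.append(page)             # record it (stand-in for indexing)
        for link in WEB.get(page, []):   # gather all links on the page
            if link not in seen:         # skip pages already queued or visited
                seen.add(link)
                to_visit.append(link)    # add to the list of pages to visit
    return visited

print(crawl("a.html"))  # ['a.html', 'b.html', 'c.html', 'd.html']
```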
Screen scraping (data extraction technique)
Extracting information that is formatted for human use and converting it into a format for computer use (example: scanner or PDF converter)
generation loss:
the loss of quality between copies of data, usually in analog formats (copies of copies), unlike digital data, where copies are identical as long as the format and size remain the same.
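A toy simulation of generation loss, assuming each analog copy adds a small amount of Gaussian noise (a simplification of real analog degradation) while each digital copy duplicates the values exactly.

```python
import random

def analog_copy(signal, rng, noise=0.05):
    # Each analog copy picks up a little random noise (assumed Gaussian here)
    return [x + rng.gauss(0, noise) for x in signal]

def digital_copy(bits):
    # A digital copy reproduces the same discrete values exactly
    return list(bits)

rng = random.Random(42)              # fixed seed so the run is repeatable
original = [0.0, 0.5, 1.0, 0.5, 0.0]
bits = [0, 1, 1, 0, 1]

analog, digital = original, bits
for _ in range(10):                  # copies of copies
    analog = analog_copy(analog, rng)
    digital = digital_copy(digital)

drift = max(abs(a - o) for a, o in zip(analog, original))
print(f"analog drift after 10 copies: {drift:.3f}")
print("digital copy identical:", digital == bits)  # True
```

The noise accumulates because each analog copy starts from the previous, already degraded copy rather than from the original.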
browser:
a computer program used to navigate and search the World Wide Web and display HTML files in a graphical format (example: Google Chrome, Internet Explorer, Mozilla Firefox)
data vs. information:
data are figures and facts while information is data that is processed, interpreted and organized to become meaningful.
data persistence:
information that is not often accessed and rarely modified; also, data that remains stored after a user has deleted it.
index
a data structure that improves the speed of data retrieval operations in databases and other systems
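A simplified picture of why an index helps: without one, a lookup scans every record; with one, it jumps straight to the match. This sketch uses a Python `dict` (a hash map) as the index over hypothetical records; real databases commonly use B-trees instead, but the speed-up idea is the same.

```python
# Hypothetical records; in a database these would be table rows
records = [
    {"id": 101, "name": "Ana"},
    {"id": 205, "name": "Eli"},
    {"id": 317, "name": "Maya"},
]

def find_by_scan(rows, name):
    """Without an index: check every row until a match is found (O(n))."""
    for row in rows:
        if row["name"] == name:
            return row
    return None

# Build the index once: a hash map from name to row
name_index = {row["name"]: row for row in records}

def find_by_index(index, name):
    """With an index: jump straight to the row (average O(1) for a dict)."""
    return index.get(name)

print(find_by_scan(records, "Maya"))        # {'id': 317, 'name': 'Maya'}
print(find_by_index(name_index, "Maya"))    # {'id': 317, 'name': 'Maya'}
```

The trade-off: the index takes extra storage and must be updated whenever records change, in exchange for faster reads.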