Structured data
data easily searchable by algorithms; clearly defined
Exs: Facts (name, order number, quantity, location, time), Questionnaire / Survey data (1-5, coded), Bibliographic info about unstructured data, Online behavior (clicks, views, links, game moves), Constructed data (sentiment, mood, intent), Metadata (data about data)
unstructured data
data that is difficult for algorithms to search and doesn't fit neatly into relational databases; it has internal structure, but that structure is not defined by a predefined data model or schema
Ex: Emails, Voice messages, Texts, tweets, Social media, Video / Audio / Photo
Which is easier for businesses to process, structured or unstructured data, and why?
Structured data is easier for businesses to process because it is clearly defined and fits a predefined schema, so it can be stored in a relational database and searched easily by algorithms
What does "Big Data" mean?
The name given to the increasingly HUGE collection of data captured from the world.
Includes structured Data and Unstructured Data
Comes from all sorts of inputs
What are the "5 V's" that describe big data?
Velocity, Variety, Volume, Veracity, Value
Velocity
the speed at which new data are gathered and stored
Variety
the variety of the kinds of new data
Volume
the sheer quantity of data being gathered and stored
Veracity
the quality (accuracy, credibility) of the data
Value
what you can do with the data
What are a few of the challenges that Big Data presents to businesses?
The vast amount of growing data is a bit of a challenge in itself, but the unstructured portion of Big Data makes it difficult to manage. We still don't have good techniques for indexing and analyzing it.
Why should we, in a business, avoid using spreadsheets to store important data and instead use a database management system? Know at least 3 reasons
Data redundancy: Unnecessary duplication
data inconsistency
Data isolation: Difficult to do efficient data retrieval and search
Data insecurity: Too easy to get access to it
Data errors: Easy to make errors
What are the characteristics of high-quality information?
-accurate, complete, consistent, timely, and accessible.
-Know that businesses spend lots of money to ensure their data has these qualities
Accurate
the degree to which information is correct and free from error
- is there an incorrect value in the information?
Complete
- is there a value missing from the information?
Consistent
the degree to which information is compatible with previous information
- is summary information in agreement with the detailed information?
Timely
the degree to which information is available in time to perform the task at hand
- is the information current with respect to business needs?
Accessible
is it able to be found? organized?
What are the qualities of poor data management?
Data redundancy, data inconsistency, data isolation, and data insecurity.
Data redundancy
occurs when unnecessary duplicate information exists in a database
Ex: customer information is stored in multiple places and updated differently, leading to data inconsistency
Data inconsistency
A condition in which different versions of the same data yield different results.
Ex: purchase order info in email, on customer account, in shipping not synched
Data isolation
Difficult to do efficient data retrieval and search
Ex: data in marketing is not available to people in finance
Data insecurity
Too easy to get access to data
Ex: Spreadsheet on laptop, easy to hack passwords, open public workstations
In designing a database, what is the name of the activity or technique that clients and IT staff do together to create an understanding of the data requirements for the database?
data modeling
Know what these mean with respect to storage and memory, and which is bigger than the other: KB, MB, GB, TB, PB
Kilobyte (KB): one thousand bytes 10^3
Megabyte (MB): one million bytes 10^6
Gigabyte (GB): one billion bytes 10^9
Terabyte (TB): one trillion bytes 10^12
Petabyte (PB): one quadrillion bytes 10^15
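A quick way to check which unit is bigger is to convert between them. The sketch below uses the decimal (power-of-ten) definitions from this card; the function names `to_bytes` and `convert` are made up for illustration.

```python
# Decimal storage units from the card above, as powers of ten.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}

def to_bytes(amount, unit):
    """Convert an amount in the given unit to a number of bytes."""
    return amount * UNITS[unit]

def convert(amount, from_unit, to_unit):
    """Express an amount given in one unit in another unit."""
    return to_bytes(amount, from_unit) / UNITS[to_unit]

print(convert(2, "TB", "GB"))  # 2000.0
print(convert(1, "PB", "TB"))  # 1000.0
```

Each unit is 1,000 times the one before it, so the order from smallest to largest is KB, MB, GB, TB, PB.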
Database
-Stores data about various types of objects (inventory), events (transactions), people (employees), and places (warehouses).
-A database file is passive: it doesn't change on its own. Only software such as a DBMS, typically through a language like SQL, can CRUD (create, read, update, delete) the data in a database file.
Database Management System (DBMS)
-includes the database AND also the apps that let people USE the database.
-creates, reads, updates, and deletes (CRUD) data in a database while controlling access and security.
-The processes that work the information.
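To make CRUD concrete, here is a minimal sketch using Python's built-in `sqlite3` module with a throwaway in-memory database. The `inventory` table and its columns are assumptions chosen for illustration, not part of the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database that lives only in memory
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER)")

# Create: add a new record
conn.execute("INSERT INTO inventory VALUES (?, ?)", ("widget", 10))

# Read: look the record up
quantity_after_create = conn.execute(
    "SELECT quantity FROM inventory WHERE item = ?", ("widget",)
).fetchone()[0]

# Update: change a value in the existing record
conn.execute("UPDATE inventory SET quantity = ? WHERE item = ?", (7, "widget"))
quantity_after_update = conn.execute(
    "SELECT quantity FROM inventory WHERE item = ?", ("widget",)
).fetchone()[0]

# Delete: remove the record
conn.execute("DELETE FROM inventory WHERE item = ?", ("widget",))
remaining_rows = conn.execute("SELECT COUNT(*) FROM inventory").fetchone()[0]

print(quantity_after_create, quantity_after_update, remaining_rows)  # 10 7 0
```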
What are the parts of a database management system?
Hardware, Software, Networks, Data, Procedures, Media and People.
Why are these advantages of a database management system: data are located centrally, data quality is controlled, data are accessible, and data are easier to maintain?
Data are easier to find, easier to CRUD, and easier to ensure high-quality
relational database
A database that represents data as a collection of tables in which all data relationships are represented by common values in related tables
Relational Database: Data value
An actual piece of information, at the smallest level
Example: "Mickey" for first name, "Mouse" for last name
Relational Database: Field
The smallest meaningful type of data (columns in a table)
Example: Mickey's first name, Mickey's zip code, price, product name
Relational Database: Record
Set of fields containing all info known about one entity (rows in a table)
Each record contains the same fields in the same sequence
Example: all name and address info about one customer - Mickey
Relational Database: File/Table
Collection of related records (like customer info, financial info, inventory info)
Example: a complete set of names and addresses of all customers
Relational Database: Database
Collection of files/tables
Relational Database: SQL
Lorenzo Shipping Company
Five relational files/tables: Customer, shipment, truck, driver, city
What do we mean by "populating" a database?
it means we add data to it; you create a record
What is a primary key in a relational database? How do we use it?
-A field (or combination of fields) that uniquely identifies a given record in a table
-Should contain some value that is highly unlikely ever to be null and MUST be unique across all records in that file.
Ex: RedID
What is a foreign key? How do we use it?
a primary key of one table that appears as a field in another file and serves as a logical link between the two files
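The sketch below shows primary and foreign keys at work, using Python's built-in `sqlite3` module. The `customer` and `shipment` table names echo the Lorenzo Shipping example, but the columns and values are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE shipment (
           shipment_id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
           weight_kg REAL
       )"""
)

conn.execute("INSERT INTO customer VALUES (1, 'Mickey Mouse')")
conn.execute("INSERT INTO shipment VALUES (100, 1, 12.5)")

# The foreign key is the logical link that lets us join the two tables.
rows = conn.execute(
    "SELECT s.shipment_id, c.name FROM shipment s "
    "JOIN customer c ON c.customer_id = s.customer_id"
).fetchall()

# A primary key must be unique: a second customer with id 1 is rejected.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A foreign key must point at an existing record: customer 999 does not exist.
try:
    conn.execute("INSERT INTO shipment VALUES (101, 999, 3.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(rows, duplicate_rejected, orphan_rejected)
```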
What is a derived attribute and how do we get the value when we run a query?
An attribute whose value is not stored but is computed from the values of other attributes when the query runs
Ex: if we have an attribute for birth date then age is derivable.
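The birth date example can be sketched in a few lines. The function name `age_on` is hypothetical, and the "today" date is passed in explicitly so the result is repeatable.

```python
from datetime import date

def age_on(birth_date, today):
    """Derive age in whole years from a stored birth date."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)

print(age_on(date(2000, 6, 15), date(2024, 6, 14)))  # 23
print(age_on(date(2000, 6, 15), date(2024, 6, 15)))  # 24
```

Only the birth date is stored. Age is recalculated each time, so it never goes stale.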
Be able to look at a picture of a data base schema (like the Lorenzo Shipping example we used in class) and name the different parts.
column, row, primary key, foreign key, file etc.
Know the names of the commercially available databases from these vendors Oracle, IBM, Microsoft, and open source
Oracle Database 18c
IBM DB2
Microsoft SQL Server
Amazon Web Services
Open source: MySQL
Sybase
structured decision
A decision that is routine and repetitive and often has well-defined procedures for making the decision.
Ex: Solving a math problem for which there is one right answer
unstructured decision
A decision that is novel and therefore has no agreed-upon, well-understood procedure for making the decision.
We don't know what info we need and we don't know what procedure to use.
Ex: Which products should our company design that will make at least $1 billion?
semi-structured decision
A decision for which some parts are structured and some parts are unstructured.
Some info is known, some is not. A known process will answer some of it, but not all of it.
Ex: What price should we give our new product? (unknown: market elasticity)
decision
a conclusion or resolution reached after consideration.
Outcome: choose one thing over another.
problem
A matter or situation regarded as unwelcome or harmful and needing to be dealt with and overcome. Quality of urgency.
Ex: "We're losing sales and we need to change that"
opportunity
A set of circumstances that makes it possible to do something different. Not so urgent.
Ex: "they are making lots of money selling X, can I?"
paradox
a seemingly absurd or self-contradictory statement or proposition that when investigated or explained may prove to be well founded or true.
Ex: "if you use fewer words you sound more intelligent"
dilemma
a situation in which a difficult choice has to be made between two or more alternatives, especially equally undesirable ones.
Ex: "If I don't pay a bribe, I might not get the business. But if I do pay a bribe, I might get fired"
The Steps of the Rational Decision-Making process are...
1. Identify and define/describe the problem
2. Define the requirements and goals of the decision.
3. Identify alternative solutions
4. Define decision criteria
5. Select appropriate decision-making process and tools.
6. Evaluate alternative solutions using criteria
7. Check that the solution solves the problem
What is the Sensemaking approach and when do we use it?
-A process of creating meaning when there is no single meaning available.
-Useful when you don't know what's going on.
-Useful for making unstructured problems more structured
What is DSS and what kind of app is it?
Decision Support System
Software that helps professionals make and justify decisions in many industries.
decision matrix
A tool for systematically ranking alternatives according to a set of criteria.
Weighted Decision Matrix
Not all criteria are of equal importance. To make this, assign each criterion a weight relative to its importance, then multiply each score by its weight and add up the results for each alternative. One way to set the weights is to distribute 100% across the criteria.
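A weighted decision matrix comes down to multiplying and adding. In the sketch below, the suppliers, criteria, weights, and 1-5 scores are all invented for illustration; the weights are percentages that add up to 100.

```python
# Weights are percentages distributed across the criteria (they sum to 100).
weights = {"price": 50, "quality": 30, "delivery": 20}

# Each alternative is scored 1-5 on every criterion (hypothetical scores).
scores = {
    "Supplier A": {"price": 4, "quality": 3, "delivery": 5},
    "Supplier B": {"price": 3, "quality": 5, "delivery": 4},
}

def weighted_score(option_scores, weights):
    """Multiply each score by its criterion's weight and add up the results."""
    points = sum(weights[c] * option_scores[c] for c in weights)
    return points / 100  # back to the 1-5 scale

totals = {name: weighted_score(s, weights) for name, s in scores.items()}
best = max(totals, key=totals.get)

print(totals)  # {'Supplier A': 3.9, 'Supplier B': 3.8}
print(best)    # Supplier A
```

Supplier B wins on quality, but price carries half the weight here, so Supplier A ranks first.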
OLAP (Online Analytical Processing)
A process for gathering data from different sources in the organization and storing it so we can run decision analysis on them
How do OLAP and DSS tools work together?
(equate "cube" with "database")
OLAP gathers and organizes the data (the cube), and DSS tools draw on that data to run analyses and models. Together they turn stored data into support for a decision
What is a business decision making model?
A simplified representation of a business situation that takes data as inputs and produces results used to make a decision
what-if model
checks the impact of a change in an assumption on the proposed solution
Ex: What happens to sales if we set the price at 10% higher?
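The pricing question can be sketched as a tiny what-if model. The demand formula below (1,000 units minus 10 units per dollar of price) is a made-up assumption, chosen only to show how changing one assumption changes the outcome.

```python
def units_sold(price):
    """Hypothetical demand curve: every extra dollar of price loses 10 units."""
    return 1000 - 10 * price

def revenue(price):
    return price * units_sold(price)

base_price = 50
new_price = round(base_price * 1.10, 2)  # what if the price is 10% higher?

print(revenue(base_price))  # 25000
print(revenue(new_price))   # 24750.0
```

Under this assumed demand curve, a 10% price increase lowers revenue slightly, because the lost units outweigh the higher price per unit.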
sensitivity model
The study of the impact that changes in one (or more) parts of the model have on other parts of the model
Ex: At what point in price increases will sales drop by 10%?
goal-seeking model
Finds the inputs necessary to achieve a goal such as a desired level of output
Ex: "I only want to pay $400/month, what do I need to do to make that happen"
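Goal seeking runs the model backward: fix the output and search for the input. The sketch below reads the $400/month example as a loan payment, which is an assumption, and uses the standard loan payment formula with a hypothetical 6% annual rate over 60 months. The function names are made up for illustration.

```python
def monthly_payment(principal, annual_rate, months):
    """Standard fixed-rate loan payment formula."""
    r = annual_rate / 12
    return principal * r / (1 - (1 + r) ** -months)

def goal_seek_principal(target_payment, annual_rate, months, low=0.0, high=1_000_000.0):
    """Find the largest loan whose payment stays at or under the target (bisection)."""
    for _ in range(100):
        mid = (low + high) / 2
        if monthly_payment(mid, annual_rate, months) > target_payment:
            high = mid  # payment too high, so borrow less
        else:
            low = mid   # payment fits, so try borrowing more
    return low

affordable = goal_seek_principal(400, 0.06, 60)
print(round(affordable, 2))
print(round(monthly_payment(affordable, 0.06, 60), 2))
```

Spreadsheet tools such as Excel's Goal Seek do the same kind of search behind the scenes.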
optimization model
Extension of goal-seeking analysis; finds the optimum (best) value for a target variable by repeatedly changing other variables to see what scenario produces that optimum value.
Ex: "We want to maximize profit so what are all the variables we need to do that"
What makes a good model vs a bad model?
A good model is one that makes good predictions when given high-quality inputs; a bad model is one in which even correct inputs don't result in good predictions
What are data visualization tools and when are they most useful?
Visual representations of data with the goal of clearly communicating or better understanding the meaning of the data. They are most useful for uncovering trends and relationships in data that might be less apparent when viewing the data in tables. Examples: graphs, charts, heat maps
What is a definition of Artificial Intelligence?
Any system that perceives its environment and takes actions that maximize its chance of achieving its goal
Computer programs that mimic - and improve on - human cognition
expert system
A programmer writes a program that uses the same rules as human experts (after interviews and observations)
Ex: doctor diagnosis, oil drilling locations, financial investments
intelligent agent
An application that does specific tasks on behalf of its users
Ex: shopping, stock picking, or spamming
Ex: Siri, Bixby, Alexa, Youper, etc.
supervised learning
We give the Machine Learning algorithm known and labeled data to learn from.
Start with data that is known, then using that knowledge, it looks at new, unknown data and uses what it learned to identify it.
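As a rough illustration of learning from labeled data, the sketch below uses a nearest-neighbor rule: a new point gets the label of the most similar known example. The data points and labels are invented, and real machine learning systems use far more data and more sophisticated algorithms.

```python
# Known, labeled examples: (features, label). Values are made up.
labeled = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 8.0), "large"),
    ((9.0, 7.5), "large"),
]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(point):
    """Label a new, unknown point with the label of its closest known example."""
    nearest = min(labeled, key=lambda example: squared_distance(example[0], point))
    return nearest[1]

print(predict((2.0, 1.0)))  # small
print(predict((7.0, 9.0)))  # large
```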
Unsupervised Learning
We give the Machine Learning algorithm unknown and unlabeled data to learn from.
Start with unknown data, then it picks out patterns on its own, remembers what it sees, and then uses that to identify new, unknown data.
How does a generative AI application like ChatGPT work?
It uses neural networks trained on large amounts of existing data to learn the patterns and structures in that data, then generates new content that follows those patterns.
LAN (Local Area Network)
Connects a group of computers in close proximity, such as in an office building, school, store, coffee shop, or home
Typically, all hardware & software is owned by a single company
Intranets are a type of LAN
WAN (Wide Area Network)
Spans a large geographic area such as a state, province, or country
Hardware and software are owned by several companies
The Internet is a type of WAN
ISP (Internet Service Provider)
A company that provides access to the Internet.
Protocol for WiFi
IEEE 802.11 (the standard behind Wi-Fi; Wi-Fi 6 is the name for its 802.11ax generation)
Protocols for mobile/cell phones
4G and 5G
How do we measure data transmission speed and how is that different from how we measure data storage capacity?
Transmission speed is measured in bits (b) per second, NOT Bytes (B); storage capacity is measured in Bytes. 1 Byte = 8 bits
What does bandwidth mean and how do we measure it?
It's the amount of data you can move in a second, measured using bps (bits per second)
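Because file sizes are in Bytes and bandwidth is in bits per second, estimating a download time means multiplying by 8 first. The sketch below assumes decimal units (1 GB = 10^9 Bytes, 1 Mbps = 10^6 bits per second) and ignores real-world overhead; the function name is made up.

```python
def download_seconds(file_size_bytes, bandwidth_bps):
    """Ideal transfer time: convert Bytes to bits, then divide by bits per second."""
    return file_size_bytes * 8 / bandwidth_bps

one_gigabyte = 10**9          # Bytes
hundred_mbps = 100 * 10**6    # bits per second

print(download_seconds(one_gigabyte, hundred_mbps))  # 80.0
```

A 100 Mbps connection moves at most 12.5 megabytes per second, which is why a 1 GB file takes about 80 seconds rather than 10.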
When you bought internet access through a company like Cox or Spectrum (or if you were going to) how much download bandwidth would be reasonable at this time?
Depends on what you need the internet for:
-1-5 Mbps for email and web browsing.
-15-25 Mbps for streaming HD video.
-40-100 Mbps for streaming 4K video and light online gaming.
-200+ Mbps for streaming 4K video, online gaming, and downloading large files.
Which has lower/higher bandwidth among these: Bluetooth, WiFi, Ethernet?
Lowest = Bluetooth at ~24 Mbps
Middle = WiFi at ~600 Mbps
Highest = Ethernet at ~1000 Mbps
TCP (Transmission Control Protocol)
provides reliable, ordered, and error-checked delivery of a stream of packets on the internet.
Breaks down data and puts it back together
IP (Internet Protocol)
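Handles addressing and routing: it wraps data in packets (like envelopes), labels them with source and destination addresses, and sends them toward the destination. IP alone does not guarantee delivery; TCP makes sure all packets arrive.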
The TCP/IP protocol is the...
foundation of the Internet
What is the network protocol for the Web?
HTTP (Hypertext Transfer Protocol)
Internet
A collection of computers around the world that agree to speak the same "language"
Intranet
Internal networks, designed to be open AND secure, that use Internet technologies and are accessible through web browsers. Used inside an organization (e.g., employees, campus community).
Typically, behind a firewall that protects it from outside evildoers.
Extranet
Connects some of a company's resources with external organizations such as customers, suppliers, and consultants.
Typically creates a Virtual Private Network (VPN) using the Internet as its backbone and relying on firewalls for security.
What do we mean by "packet switching"? What is a packet? What routes do individual packets travel on the Internet?
The path of the signal is digital and is neither dedicated nor exclusive. A file is broken into smaller blocks, called packets, which travel from router to router on the Internet. Individual packets may take different routes and are put back in order at the destination.
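The idea of breaking a file into packets and putting it back together can be sketched as follows. This is a simplified illustration, not how TCP/IP is actually implemented: the packet size, the tuple layout, and the function names are assumptions, and shuffling stands in for packets arriving out of order after taking different routes.

```python
import random

def to_packets(data, size):
    """Break data into numbered blocks so they can be reordered later."""
    return [(seq, data[i:i + size]) for seq, i in enumerate(range(0, len(data), size))]

def reassemble(packets):
    """Sort packets by sequence number and join their contents."""
    return b"".join(chunk for _, chunk in sorted(packets))

message = b"A file is broken into smaller blocks called packets."
packets = to_packets(message, 8)

random.shuffle(packets)  # simulate out-of-order arrival
restored = reassemble(packets)

print(restored == message)  # True
```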
Circuit Switching
an older technology that was used for telephone calls. Plain old telephone service (POTS) and most wired telephone calls are transmitted, at least in part, over a dedicated circuit.
Why does the Internet use packet switching instead of circuit switching?
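Packet switching lets many users share the same links, because no path is reserved for a single conversation, and packets can be routed around congested or failed links. Circuit switching ties up one dedicated path for the entire connection.
Overall, packet switching is more efficient and more resilient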
Which is a super / peer level / sub system to the others: Internet, FTP, email, web
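Internet: Supersystem
FTP: Subsystem of the Internet (peer level with email and web)
Email: Subsystem of the Internet (peer level with FTP and web)
Web: Subsystem of the Internet (peer level with FTP and email)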
Why are we talking about moving from IPv4 to IPv6? Why do we need to move to IPv6?
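We are running out of IPv4 addresses. IPv4 addresses are 32 bits long, which allows only about 4.3 billion unique addresses; IPv6 addresses are 128 bits long, which allows vastly more unique addresses to accommodate more people and devices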
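The size difference between the two address spaces is easy to compute. The sketch below uses Python's standard `ipaddress` module; the two sample addresses come from ranges reserved for documentation.

```python
import ipaddress

# max_prefixlen is the number of bits in an address of that version.
ipv4_bits = ipaddress.ip_address("192.0.2.1").max_prefixlen    # 32
ipv6_bits = ipaddress.ip_address("2001:db8::1").max_prefixlen  # 128

ipv4_count = 2 ** ipv4_bits
ipv6_count = 2 ** ipv6_bits

print(ipv4_count)                # 4294967296
print(ipv6_count > ipv4_count)   # True
```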
Who "owns" the Internet?
No one "owns" the internet; it is shared by everyone
URL (Uniform Resource Locator)
A location or address identifying where documents can be found on the Internet; a Web address
IP Address
The unique number assigned to each device on the Internet.
DNS (Domain Name System)
The Internet's system for converting alphabetic names into numeric IP addresses.
What are .com or .net. or .edu called?
top level domains
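To see where the top level domain sits inside a URL, the sketch below pulls it out with Python's standard `urllib.parse`. The URL is a made-up example, and the function is deliberately naive: it just takes the last dot-separated part of the host name.

```python
from urllib.parse import urlparse

def top_level_domain(url):
    """Return the last label of the host name, e.g. '.edu' (naive approach)."""
    host = urlparse(url).hostname
    return "." + host.rsplit(".", 1)[-1]

url = "https://www.example.edu/courses/index.html"
print(urlparse(url).hostname)   # www.example.edu
print(top_level_domain(url))    # .edu
```

DNS is what turns the host name part of the URL into a numeric IP address.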
How is the Internet of Things different from just the Internet?
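The Internet of Things refers to the growing network of Internet-enabled physical devices (such as sensors, appliances, and vehicles) that exchange data with little human involvement, while the Internet in general is the global network that connects computers and the people who use them.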
What is a firewall in a network?
Security software or hardware that allows inside people to use the outside Internet and prevents outsiders from getting in
IETF (Internet Engineering Task Force)
An organization that sets standards for how systems communicate over the Internet
Guides TCP/IP standards
W3C (World Wide Web Consortium)
The organization responsible for managing standards for the WWW.