Difficulties in managing data: Scattering
•Data increases exponentially with time and gets scattered through organizations
–Collected by many individuals, using different servers, locations, databases & formats
Difficulties in managing data: Sources
•Multiple sources of data
–Internal Sources: Corporate databases, company documents
–Personal Sources: Personal thoughts, opinions, experiences
–External Sources: Commercial dbs, reports, Web data, Sensors
Difficulties in managing data: Redundancy
Information systems don’t communicate with each other, resulting in duplicate data
Difficulties in managing data: Information Changes
•Data inconsistency: information changes
–Ex’s: customers move, change their contact info, companies get bought out, employee turnover
–Various copies of the data may no longer agree
Difficulties in managing data: Data rot or degradation
•Physical storage media issues
–Media wear out over time; impacted by temperature, humidity, exposure to light
–Legacy storage devices make it hard to play back old media
•Ex’s: 8-track players; floppy drives
Difficulties in managing data: Data security, quality, and integrity
Data is vulnerable because its security, quality, and integrity can easily be jeopardized
Data Governance
An approach to managing information across an entire organization.
Involves a set of business processes designed to ensure data is handled in a certain way
Goal is to make data available, transparent & useful for people authorized to access it
Database Management Systems (DBMS)
Set of programs that provide users with tools to add, delete, access and analyze data stored in one location
Interface between applications & a database. Ex’s: Oracle, Microsoft SQL Server
The things DBMS minimize
Data redundancy
same data stored in multiple locations
Data inconsistency
Various copies of the data don’t agree
Data isolation
applications cannot access data associated with other applications
The things DBMS maximize
Data Security
With all the data in one place, there’s a high risk of losing everything all at once
DBMS have high security measures to minimize mistakes & deter attacks
Data Integrity
Data must meet certain constraints
Ex: A student’s GPA cannot be negative
Ex: no alphabetic characters in a Social Security number
Data Independence
Applications & data are independent of one another
All applications can use the data as it’s not tied to just one system
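The data integrity examples above (no negative GPA, no letters in a Social Security number) can be sketched as constraints that the DBMS itself enforces. This is a minimal illustration using Python's built-in sqlite3 module; the table layout, the column names, and the try_insert helper are assumptions made for the example, not part of the original notes.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical students table; the CHECK clauses encode the two integrity rules.
conn.execute(
    """
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        gpa        REAL CHECK (gpa >= 0),
        ssn        TEXT CHECK (ssn NOT GLOB '*[A-Za-z]*')
    )
    """
)


def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO students (name, gpa, ssn) VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


ok = try_insert(("Jimbo Brown", 3.21, "123-45-6789"))        # valid row
bad_gpa = try_insert(("Test Student", -1.0, "123-45-6789"))  # negative GPA
bad_ssn = try_insert(("Test Student", 3.0, "12A-45-6789"))   # letter in SSN
row_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
```

Only the first row is stored; the other two are rejected by the database rather than by application code, which is the point of keeping integrity rules inside the DBMS.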
Database Management System: Hierarchy
Flat File Database →
Relational Database: Hierarchical Database; RDBMS →
NoSQL: Key-value; Column-oriented; Document-oriented; Graph DB
Relational Database Model
Most used DB architecture
Based on concept of two-dimensional tables, Rows & Columns
Data organized into one or more tables
Tables related to one another by means of a common field
Disadvantage
–Large-scale DB’s can have many interrelated tables making the overall design complex, slowing search & access times
Relational Model: Data Hierarchy
Field: words describing an item (master data)
Ex’s: student ID, student name, GPA
Record: grouping of related fields representing an item (transactional data)
Ex: fields grouped together to represent a student
Table or Data File: grouping of related records representing an entity
Ex: group of records for all students
Database: grouping of related tables or files
Ex: Students table & courses table
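One way to picture the hierarchy above (field, record, table, database) is with plain Python data structures. This is only an illustrative sketch: a real DBMS stores data very differently, and every value other than the Jimbo Brown record is hypothetical.

```python
# Record: related fields grouped together to represent one student.
# Each key/value pair is a field (student_id, name, gpa).
record = {"student_id": 789546, "name": "Jimbo Brown", "gpa": 3.21}

# Table: related records representing one entity (all students).
students = [
    record,
    {"student_id": 789547, "name": "Hypothetical Student", "gpa": 3.80},
]

# A second table for a different entity (courses); contents are made up.
courses = [
    {"course_id": "HYP101", "title": "Hypothetical Course"},
]

# Database: a grouping of related tables.
database = {"students": students, "courses": courses}

# Walking down the hierarchy: database -> table -> record -> field.
first_student_name = database["students"][0]["name"]
field_names = sorted(record)
```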
The data model
a diagram that represents the entities in the database and their relationships
Data Model Components
Entity: person, place or thing about which information is retained
Like a table or data file
Ex’s: Student, parking permit, class, professor
Attribute: each characteristic of an entity
Like a field
Ex’s: Student name, id, address
Instance: a specific representation of the entity
Like a record
Ex: Jimbo Brown, 789546, 3.21
Data Model Identifiers
Entities have identifiers, which are attributes that can uniquely specify an instance.
These are called:
Primary Key (PK)
Uniquely identifies a record or entity instance
Ex’s for a student: Student ID #, email address, or Social Security #
Ex’s for a parking permit: Parking permit #, license plate #
Foreign Key (FK)
Attribute that carries identifying info but doesn’t uniquely identify a record in its own table.
It matches the primary key of a related table, so it uniquely identifies a record there
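The primary key and foreign key roles described above can be sketched with two related tables. This is a minimal example using Python's built-in sqlite3 module; the students and parking_permits tables, their columns, and the permit numbers are assumptions made for illustration. Note that SQLite only enforces foreign keys after the PRAGMA shown below is turned on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript(
    """
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,   -- PK: unique per student
        name       TEXT NOT NULL
    );
    CREATE TABLE parking_permits (
        permit_no  INTEGER PRIMARY KEY,   -- PK: unique per permit
        student_id INTEGER NOT NULL
                   REFERENCES students(student_id)  -- FK: points at one student
    );
    """
)

conn.execute("INSERT INTO students VALUES (789546, 'Jimbo Brown')")
conn.execute("INSERT INTO parking_permits VALUES (1001, 789546)")

# A duplicate primary key is rejected: a PK must uniquely identify a record.
try:
    conn.execute("INSERT INTO students VALUES (789546, 'Someone Else')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A foreign key must match an existing record in the related table.
try:
    conn.execute("INSERT INTO parking_permits VALUES (1002, 999999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The FK lets us look up the one student who owns a given permit.
owner = conn.execute(
    """
    SELECT s.name
    FROM parking_permits AS p
    JOIN students AS s ON s.student_id = p.student_id
    WHERE p.permit_no = 1001
    """
).fetchone()[0]
```

The same student_id could appear on several permits, which is why it is not a key for the permits table, yet each value still points at exactly one student.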
Retrieving information
Retrieving Data is the most common DB operation
Structured Query Language (SQL) - allows users to perform complicated searches by using relatively simple statements or keywords.
Typical Keywords
SELECT – specify the wanted attributes
FROM – specify the table to be used
WHERE – specify conditions to apply in the query
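The three keywords above combine into a single query. The sketch below runs one through Python's built-in sqlite3 module; the students table and all rows except Jimbo Brown are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, gpa REAL)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        (789546, "Jimbo Brown", 3.21),
        (789547, "Hypothetical Student A", 3.80),
        (789548, "Hypothetical Student B", 2.50),
    ],
)

# SELECT names the wanted attributes, FROM names the table,
# and WHERE keeps only the records that satisfy the condition.
rows = conn.execute(
    "SELECT name, gpa FROM students WHERE gpa >= 3.0"
).fetchall()
```

Here rows holds the name and GPA of the two students at or above 3.0; without an ORDER BY clause the order of the results is not guaranteed.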
Big Data
Diverse, high-volume set of information that requires new forms of processing to enable enhanced capabilities of information systems, such as decision making, insight discovery, and process optimization.
Can be utilized in a reasonable amount of time only by sophisticated information systems
Consists of unstructured data:
Doesn’t fit into the rows & columns of a table the way traditional, structured data fits into relational databases
Characteristics of big data
Volume: Creates data management problems, but also means it’s incredibly valuable
Ex: airplane engine creates 10TB in 30 minutes; 25,000 flights/day
Velocity: The rate at which data flows is rapidly increasing
Ex: The Internet connects customers quickly; sites capture your clicks & recommend interests to you, generating data fast
Variety: data is non-traditional & comes in many different unstructured formats:
Ex: satellite imagery, audio streams, digital music files, web content, documents, comments by users
Managing Big Data
Drivers of Big Data:
Cloud Computing for powerful and scalable IT resources
Open-Source software which makes Big Data affordable for most organizations to process
NoSQL databases are used instead of Relational DBs because they can process unstructured & structured data
NoSQL DBs: Neo4j, MongoDB, CouchDB
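To see why document-oriented NoSQL stores suit data that doesn't fit rows and columns, here is a rough sketch in plain Python. It does not use Neo4j, MongoDB, or CouchDB; the collection list and the find helper are hypothetical stand-ins that only mimic the idea of storing differently shaped documents side by side.

```python
import json

# A "collection" of documents. Unlike rows in a relational table,
# each document can have its own set of fields and nested values.
collection = [
    {"_id": 1, "type": "comment", "user": "hypothetical_user", "text": "Great flight!"},
    {"_id": 2, "type": "sensor", "engine": "left", "readings": [612.5, 613.1, 611.8]},
    {"_id": 3, "type": "web_click", "url": "/hypothetical/page", "tags": ["promo", "mobile"]},
]


def find(docs, **criteria):
    """Return documents whose fields equal every given criterion (hypothetical helper)."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


sensor_docs = find(collection, type="sensor")

# No single fixed set of columns covers every document.
all_keys = sorted({key for doc in collection for key in doc})

# Documents are commonly exchanged as JSON text.
serialized = json.dumps(collection[1])
```

Forcing these three documents into one relational table would leave most columns empty for any given row, which is the mismatch the notes describe.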