MN

Databases for an Information Rich, Connected World — Vocabulary Flashcards

The Unseen Services

  • We observe and interact with the visible components, but we seldom think about the unseen components that provide comfort and stability.
  • Examples of unseen services:
    • Login and use Facebook
    • Make a phone call or send a text message
    • Read your email
    • Go shopping (groceries, clothes, gasoline)
    • Check a book out of the library
    • Withdraw money from an ATM
    • Have a physical exam with your doctor
  • Point: We usually notice the database only when there is an outage or security breach.

The Ubiquitous Information Provider

  • People use information systems to resolve real-world needs by accessing data stored in databases.
  • Common scenarios:
    • Searching to buy a specific item (appliance, service, clothing, book)
    • Researching more in-depth before purchasing a major item (e.g., a car)
    • Planning a vacation or trip
    • Learning about a hobby or sport (e.g., skiing, boating, coin collecting)
    • Researching a company before investing
    • Researching a medical condition or problem
    • Solving a technical problem or learning a new technical skill (e.g., administering a Linux server)
    • Writing a research paper or researching for a school project
  • Example: Wikipedia as a widely used information resource

Interesting Public Databases

  • Notable public datasets and catalogs:
    • Annotated Human Genome Data
    • Federal Reserve Economic Data (FRED)
    • US Labor, Economic, and Census Data
    • Freebase Dataset
    • Historical Weather Datasets
    • Twitter Archive at the Library of Congress

Information Structuring in Databases

  • Key concepts:
    • Schema: structure and organization of the database
    • Attribute: relevant piece of information about the things of interest
    • Relationship: connection between things of interest
  • Examples of database schemas and contents:
    • Customer (person) database:
    • Attributes: name, address, telephone, account balance, credit rating, credit card number, …
    • Relationships: details of purchases, details of payments, …
    • Movie star (person) database:
    • Attributes: name, birthday, age, real name, spouse name, …
    • Relationships: details of movie roles, details of TV roles, previous marriages, gallery of photos, …
    • University professor (person) database:
    • Attributes: name, address, telephone, office number, office phone number, years of work, title, department, college, salary, …
    • Relationships: details of publications, details of university committee, awards, current classes, previous classes, student ratings, …
    • Student (person) database:
    • Attributes: name, address, telephone, total hours completed, current registered hours, total GPA, overall standing, …
    • Relationships: details of courses and grades, current registered courses, library materials, campus credit balance, …

Database Structures (Page 7 content overview)

  • The following tables and fields are part of a sample database schema (illustrative list):
    • idbTypeIncident: IncidentTypeID, Incident_Type, Active, Order Level
    • idbActivityType: Activity Type_ID, Activity Type, Active
    • idb_Injury Type: Injury Type ID, Injury Type, Active
    • idbIllnessType: IllnessTypeID, Illness_type, Active
    • idbContributingFactor: Contributing Factor_ID, Contributing Factor, Active
    • idbMedicalCare: MedicalCareID, Medical Care, Active, Order_Level
    • idbInjury Location: Injury LocationID, Anatomical Location, Active
    • idbIncident Injury: Incident InjuryID, IncidentID, InjuryType_ID
    • idbIncident Illness: Incident IllnessID, IncidentID, IllnessType_ID
    • idbIncidentFactor: Incident FactorID, IncidentID, Contributing Factor_ID, Priority
    • idbIncidentLocation: Incident_ID, Country, State, Area, Location
    • idbDamageType: Damage Type_ID, Damage Type, Active
    • idbEvacuationType: Evacuation TypeID, Evacuation Type, Active, OrderLevel
    • idbOrganization: Organization ID, Organization, User ID, Password, SecureID
    • idbPersonStatus: PersonStatusID, PersonStatus, Active, OrderLevel
    • idb Incident: Incident ID, IncidentTypeID, Severity Level, OrganizationID, SecureID, ProgramTypeID, CourseName, Course Number, PersonStatusID, Age, Gender, IncidentEvent, Incident Date, Incident Time, Start Date, TotalDays, ActiveHours, Day Occurred, Lost Day, DaysLost, LeaveField, LeaveDate, Evacuation TypeID, MedicalVisit, MedicalCareID, ReturnField, ReturnDate, Property Damage, ActivityType_ID, Narrative
    • idbIncident Condition: Incident ConditionID, IncidentID, Surface ConditionID
    • idbIncident Environment: Incident EnvironmentID, Incident ID, EnvironmentTypeID
    • idbSeverityLevel: SeverityID, SeverityLevel, Severity Type, Severity Description
    • idbSurface Condition: Surface ConditionID, Surface Condition, Active, Order_Level
    • idbEnvironmentType: EnvironmentTypeID, Environment Type, Active, Information
    • (continued) Sample fields for various entities such as Injury, Illness, Property Damage, Environmental Damage, Emotional Damage, etc.
    • (continued) Administrative metadata: Assessment, Analysis, Comments, Preparer, Preparer Position, Submit_Date, Reviewer, Reviewer Position, Reviewer Comment, Review Date, Date Updated

The Database Approach

  • An information system that uses a database management system (DBMS) to manage its information has three components:
    • DBMS
    • Data
    • Application software

DBMS Functions

  • Data Definition: providing a way to define and build the database
  • Data Manipulation: providing a way to insert and update data in the database
  • Query Execution: answering questions about the data in the database
  • Data Integrity: ensuring that data stored is well formed
  • Data Security: enforcing restrictions about who is able to access what data

Data Portability and Reliability Features

  • Data Portability: providing a means for backup and restore
  • Data Recovery: protecting data from loss in case of catastrophic hardware or software failure
  • Provenance: logging capabilities to provide an audit trail for data changes
  • Performance: providing a means to tune and optimize operation
  • Multiuser Concurrency: supporting activities of many users at the same time
  • Automatic Processing: providing a way to define rules to execute business logic (e.g., stored procedures and triggers)

Database Architectures: Mainframe

  • Mainframe Computer
  • Application
  • DBMS
  • ◄ Keyboard Input
  • Displayed Results
  • Data Terminal
  • Personal Computer running terminal emulation software
  • Network

Database Architectures: Desktop

  • Application
  • DBMS
  • Data
  • Single Personal Computer

Database Architectures: File Server

  • Application
  • DBMS
  • File Server
  • Data
  • Network
  • DBMS
  • DBMS
  • Application
  • Application
  • Personal Computer

Database Architectures: Client Server

  • Database Server
  • DBMS
  • Data
  • ◄ Request from Application to DBMS
  • Response ▸
  • Application
  • Application
  • Network
  • Personal Computer

Database Architectures: Web

  • Database Server
  • Web Server
  • DBMS
  • Chrome (Application)
  • Safari (Application)
  • Data
  • Firefox (Application)
  • Personal Computer
  • Internet
  • Running web browser

Databases and Business Systems

  • Information System: Purpose and Examples
  • Database Requirements
  • Transaction Processing System (TPS):
    • Support the daily activities
    • Capture transactions
    • Present detailed information to users
  • Management Information System (MIS):
    • Manage and verify the efficiency of daily activities
    • Data for MIS comes from TPS
    • Summarized into a set of management reports
    • Record historical data to forecast future trends
  • Decision Support System (DSS):
    • Make strategic decisions
    • Contains both internal and external data
    • Modeling and analysis tools

OLTP vs OLAP

  • OLTP Database:
    • Purpose: Day-to-day business processes
    • Source of data: Daily transactions
    • Organization: Structured and normalized
    • Queries: Standard SQL
    • Inserts/updates: Frequent, limited data, must be rapid and real-time
    • Space: Intentionally limited for rapid response
    • Time Frame: Usually weeks or months
  • OLAP Database:
    • Purpose: Strategic decision making
    • Source of data: Internal and external data
    • Organization: Not normalized; star or snowflake
    • Queries: Complex queries
    • Inserts/updates: Very few, extensive and often transformed data, batches
    • Space: Can be very large
    • Time Frame: Long time series, often years

Data Mining and Data Warehouses

  • Data mining: A technique to analyze and extract information from massive databases
  • Data warehouse: A database whose primary objectives are analyzing, querying, and reporting
  • Data warehouses are often made up of several data marts
  • ETL: Extract, Transform, Load
    • Transformation reformats and restructures input data to required warehouse formats

Source Data

  • Internal
    • Operational Data
    • Direct data
    • Indirect data
  • External
    • Purchased Data
  • Raw data
  • ETL
  • Data Warehouse
  • Data Mart

Technology Careers and Database Skills

  • Technology Careers Requiring Database Skills:
    • Software engineer and developer
    • Business/computer systems analyst
    • Web developer
    • Computer programmer
    • Quality assurance engineer
    • Server administrator
    • Database administrator
    • Data warehouse engineer
    • Project manager
  • Knowledge-based Careers Requiring DB Skills:
    • Financial planner
    • Market research analyst
    • Advertising manager
    • Human resource manager
    • Accountant
    • Economist
    • Sales manager
    • Sociologist
    • Management consultant
    • Public Administrator

Questions?

  • End of topics covered in this transcript. If you want deeper elaboration on any section, I can expand with more examples, diagrams, or practice questions.