Info tech Chapt 3 Database systems
Chapter 3: Database Systems and Big Data
Introduction to Data Fundamentals
Without proper data handling, organizations would struggle to perform essential business activities.
Data: Raw facts that require organization to transform into useful information.
Data management is pivotal for various business functions and organizational structures.
Information Systems and Data Storage
Information systems are integral to businesses for storing data across various functions:
Sales, Marketing, Production, Finance, Accounting, Human Resources
Data storage varies:
Operational Level: Stores daily operational data.
Middle Management: Reviews summary data for performance assessment.
Executive Management: Uses higher-level summaries to evaluate overall organizational performance.
Overview of Databases
A Database: Organized collection of data, while a Database Management System (DBMS) allows for data manipulation and user interaction.
Databases consist of integrated related files, forming a hierarchy of data encompassing bits, characters, fields, records, and files.(tables think of excel)
Analogy of Databases to Filing Cabinets
A database functions like a filing cabinet containing related files (e.g., customers and suppliers).
For example, a CUSTOMER database features multiple tables corresponding to various aspects like ORDERS and INVOICES.
Each Entity/Table can be seen as a drawer within the filing cabinet that houses records about specific categories (e.g., Customer Orders). (Each tab in excel is a table, the sheet is a database)
Understanding Records and Attributes
Instance/Record: Represents specific information about an individual entity.
Comparable to a piece of paper noting shipment details. (Think back to drawer and cabinet example, a piece of paper in a folder inside a filing cabinet is a record)
Attributes/Fields: Characteristics of an entity, akin to labels on forms (e.g., shipment date, items shipped).
Data Item/Value: The dynamic part of forms that changes (e.g., differing shipment details on various orders).
Data Entities, Attributes, and Keys
Entity/Table: Refers to subjects (persons, place or things) of data collection (e.g., customers, shipments).
a specific example would be a row in excel
Primary Key: Unique identifier for records within a table, essential for ensuring data integrity.(like labeling customers as they come with numbers. Each row is different it’s like our drivers license number, social security number, our T# , it u iquely identifies and does not repeat )
Attributes define the various characteristics of data entities, while values denote specific data entries. (Also known as fields. Can include first and last name, etc)
Data Modeling Considerations
Building a database requires careful consideration of:
Content: Necessary data collection and associated costs.
Access: User permissions for data access.
Physical Organization: Optimal storage locations for data.
Archiving and Security: Data longevity and protection measures.
Logical Structure: Logical arrangement of data.
•Rules of Thumb
oBreak the data into tables where each table is for a single person, a single place, or a single thing
•E.g. Student, Employee, Building, Classroom, Car, etc.
oDon’t Repeat Data!
•Your name is in the university database one time and only one time
•Even though it shows up on lots of things like: schedules, transcripts, invoices, etc.
•If your name changes, it only has to be changed on one place
oIf you can calculate the data from other data, then you do not store it. You just calculate it when you need it (e.g., order total)
oUsually want the primary key to be a single system generated value not a composite key (e.g., row is unique based on name (Joe Smith), address (123 A St.), and phone (5015555555)
oReduce typing as much as possible so that we mostly pick from a list
Normalization in Database Design
Normalization: Process to minimize redundancy in database designs and optimize data efficiency:
First Normal Form: Requires columns to have single-valued attributes and unique row identifiers.
Second Normal Form: Eliminates partial dependencies ensuring fields are fully dependent on the primary key.
Third Normal Form: Prevents transitive dependencies, ensuring non-key columns do not influence others.
Relationship Structures in Databases
Databases utilize various relationship types:
One-to-One, One-to-Many, Many-to-Many: These relationships help in structuring tables effectively for data handling.
Many-to-Many relationships can be resolved by introducing an Associative Table that links two original entities. (What is most likely used)
Relational Database Model
In a Relational Database, data is organized into two-dimensional tables.
Each row symbolizes an entity, and each column corresponds to attributes of that entity.
This model emphasizes utility and relational integrity, allowing data connections across various tables (e.g., shared department numbers).
SQL Databases Overview
SQL (Structured Query Language): Central programming language for manipulating relational database data. (“Go get me this data, put this data there”)
Adheres to ACID properties: Atomicity, Consistency, Isolation, and Durability to ensure reliable transactions.
Adopted as the standard language for relational databases in 1986. (most commonly used)
Popular Database Management Systems (DBMS)
Overview of notable DBMS in use today, including:
Open-Source DBMS: MySQL, PostgreSQL, MariaDB.
Commercial DBMS: Oracle, IBM DB2, Microsoft SQL Server.
Variety of systems catering to individuals, workgroups, and enterprise-level needs.
Comprehensive Data Management Strategy
Data Management: Integrated functions ensuring data accessibility, reliability, and compliance with organizational needs.
Data Governance: Role definitions and responsibilities to ensure data trustworthiness across organizations.
Data Lifecycle Management (DLM): Policy-based methods for managing data flow from creation to eventual deletion, ensuring efficient data handling throughout its lifecycle.
Data security is crucial for safeguarding sensitive information in organizations. It encompasses strategies and measures that protect data from unauthorized access, corruption, or theft. Key components include:
Encryption: Transforming data into a secure format that can only be read or processed after decryption.
Access Controls: Implementing policies that define who can access or manipulate data.
Backup Solutions: Regularly creating copies of data to prevent loss due to breaches or disasters.
User Training: Educating staff on best practices for data protection and recognizing potential threats, such as phishing attacks.