Database
An organized collection of data
Database management system (DBMS)
A group of programs that manipulate the database and provide an interface between the database and its users and other application programs
bit
represents a circuit that is either on or off
byte
is made up of eight bits
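As a small illustration of bits and bytes, the sketch below prints the eight on/off bits that make up one byte. The helper name to_bits is hypothetical, and using the character "A" (byte value 65) is just an assumption chosen for the example.

```python
def to_bits(value: int) -> str:
    """Return the eight on/off bits of a one-byte value as a string of 1s and 0s."""
    if not 0 <= value <= 255:
        raise ValueError("a single byte holds values from 0 to 255")
    return format(value, "08b")


# The character "A" is stored as the byte value 65.
bits_for_a = to_bits(ord("A"))
print(bits_for_a)
```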
Field
A name, number, or combination of characters that describes an aspect of a business object or activity
Record
A collection of related data fields
File
a collection of related records
Hierarchy of data
bits, characters, fields, records, files, and databases
Entity
a person, place, or thing for which data is collected, stored, and maintained
Attribute
a characteristic of an entity
Data item
the specific value of an attribute
Primary key
a field or set of fields that uniquely identifies the record
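A minimal sketch of how a primary key uniquely identifies a record, using Python's built-in sqlite3 module. The employee table, its columns, and the sample names are assumptions made up for illustration.

```python
import sqlite3

# An in-memory SQLite database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Ana')")

# A second record with the same primary key value is rejected,
# because the key must identify exactly one record.
try:
    conn.execute("INSERT INTO employee VALUES (1, 'Ben')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```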
Content
what data should be collected, and at what cost?
Access
what data should be provided to which users and when?
Logical structure
how should data be arranged so that it makes sense?
Physical organization
where should data be physically located?
Archiving
how long must data be stored?
Security
how can data be protected?
Data model
a diagram of data entities and their relationships
Enterprise data modeling
data modeling done at the level of the entire enterprise
Entity-relationship diagrams
data models that use basic graphical symbols to show the organization of and relationships between data
Relational model
a simple but highly useful way to organize data into collections of two-dimensional tables called relations
Domain
range of allowable values for a data attribute
Selecting
eliminating rows according to certain criteria
Projecting
eliminating columns in a table
Joining
combining two or more tables
Linking
combining two or more tables through common data attributes to form a new table with only the unique data attributes
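The relational operations above (selecting, projecting, and joining or linking) can be sketched with plain Python lists of dictionaries standing in for tables. The employees and departments tables and their column names are hypothetical sample data, not part of the original notes.

```python
employees = [
    {"emp_id": 1, "name": "Ana", "dept_id": 10},
    {"emp_id": 2, "name": "Ben", "dept_id": 20},
    {"emp_id": 3, "name": "Cy", "dept_id": 10},
]
departments = [
    {"dept_id": 10, "dept_name": "Sales"},
    {"dept_id": 20, "dept_name": "Finance"},
]

# Selecting: eliminate rows that do not meet a criterion (dept_id == 10).
selected = [row for row in employees if row["dept_id"] == 10]

# Projecting: eliminate columns, keeping only "name".
projected = [{"name": row["name"]} for row in employees]

# Joining / linking: combine the two tables through the common attribute
# dept_id. Merging the dictionaries keeps dept_id only once, so the new
# table contains only the unique attributes.
joined = [
    {**emp, **dept}
    for emp in employees
    for dept in departments
    if emp["dept_id"] == dept["dept_id"]
]

print(selected)
print(projected)
print(joined)
```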
Data Cleansing
The process of detecting and then correcting or deleting incomplete, incorrect, inaccurate, or irrelevant records that reside in a database
Data validation
The identification of “bad data” and its rejection at the time of data entry
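To contrast the two ideas, here is a rough sketch in which validation rejects a bad record at the time of entry, while cleansing removes bad records that are already stored. The rule itself (a non-empty name and an age from 0 to 120) and the function names is_valid, enter, and cleanse are assumptions for illustration only.

```python
def is_valid(record: dict) -> bool:
    """Assumed rule: name must be a non-empty string, age an integer from 0 to 120."""
    name = record.get("name")
    age = record.get("age")
    return (
        isinstance(name, str)
        and name.strip() != ""
        and isinstance(age, int)
        and 0 <= age <= 120
    )


table = []


def enter(record: dict) -> bool:
    """Data validation: reject bad data before it is stored."""
    if not is_valid(record):
        return False
    table.append(record)
    return True


def cleanse(records: list) -> list:
    """Data cleansing: detect and delete bad records that are already stored."""
    return [r for r in records if is_valid(r)]


accepted = enter({"name": "Ana", "age": 30})
rejected = enter({"name": "", "age": 30})

# Records that were stored earlier without validation.
legacy = [{"name": "Ben", "age": 41}, {"name": "Cy", "age": 999}]
cleaned = cleanse(legacy)

print(accepted, rejected, cleaned)
```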
Relational Database Management Systems
Creating and implementing the right database system ensures that the database will support both business activities and goals
SQL Databases
SQL is a special-purpose programming language for accessing and manipulating data stored in a relational database
Schema
a description of the entire database
Data definition language
A collection of instructions and commands used to define and describe data and relationships in a specific database
Allows the database’s creator to describe data and relationships that are to be contained in the schema
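A minimal sketch of data definition statements, run through Python's built-in sqlite3 module. The department and employee tables are hypothetical, and SQLite's dialect is only one example of a DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE TABLE statements define the data and the relationship
# (employee.dept_id refers to department.dept_id).
conn.executescript(
    """
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    """
)

# SQLite records the defined tables in its sqlite_master catalog.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(table_names)
```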
Data dictionary
a detailed description of all the data used in the database
Can also include a description of data flows, information about the way records are organized, and the data-processing requirements
Query by Example (QBE)
is a visual approach to developing database queries or requests
Data manipulation language (DML)
a specific language, provided with a DBMS
Allows users to access and modify the data, to make queries, and to generate reports
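As a sketch of data manipulation, the example below adds, modifies, deletes, and queries rows with SQL statements issued through Python's sqlite3 module. The product table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)

# INSERT adds new rows.
conn.executemany(
    "INSERT INTO product (name, price) VALUES (?, ?)",
    [("pen", 1.5), ("notebook", 4.0), ("stapler", 9.0)],
)

# UPDATE modifies existing data; DELETE removes it.
conn.execute("UPDATE product SET price = 5.0 WHERE name = ?", ("notebook",))
conn.execute("DELETE FROM product WHERE name = ?", ("stapler",))

# SELECT queries the data, here as a simple report sorted by name.
report = conn.execute("SELECT name, price FROM product ORDER BY name").fetchall()
print(report)
```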
Database administrators (DBAs)
skilled and trained IS professionals who:
o Work with users to define their data needs
o Apply database programming languages to craft a set of databases to meet those needs
o Test and evaluate databases
o Implement changes to improve their databases’ performance
o Ensure that data is secure from unauthorized access
Data administrator
a nontechnical position responsible for defining and implementing consistent principles for a variety of data issues
Database as a Service (DaaS)
The database is stored on a service provider’s servers
o The database is accessed by the client over a network, typically the Internet
o Database administration is handled by the service provider
Three characteristics of big data
Volume, velocity, and variety
Data management
An integrated set of functions that defines the processes by which data is obtained, certified fit for use, stored, secured, and processed in such a way as to ensure that the accessibility, reliability, and timeliness of the data meet the needs of the data users within an organization
Data governance
Defines the roles, responsibilities, and processes for ensuring that data can be trusted and used by an entire organization
Data lifecycle management (DLM)
A policy-based approach to managing the flow of an enterprise’s data
Data warehouse
a large database that collects business information from many sources in the enterprise in support of management decision making
ETL process
Extract, transform, and load
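The three ETL steps can be sketched as three small functions. Everything here is an assumption for illustration: the RAW_CSV sample data, the orders table, and the transformation rules (trim whitespace, lowercase the region, skip rows with no amount). An in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data with inconsistent formatting.
RAW_CSV = "order_id,amount,region\n1, 19.99 ,north\n2,5.00,SOUTH\n3,,north\n"


def extract(text: str) -> list:
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> list:
    """Transform: clean values and drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        cleaned.append(
            (int(row["order_id"]), float(amount), row["region"].strip().lower())
        )
    return cleaned


def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write the transformed rows into the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, region TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute(
    "SELECT order_id, amount, region FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```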
Data mart
a subset of a data warehouse that is used by small- and medium-sized businesses and departments within large companies to support decision making
Data lake
takes a “store everything” approach to big data, saving all the data in its raw and unaltered form
Hadoop
An open-source software framework that includes several software modules that provide a means for storing and processing extremely large data sets
Two primary components of Hadoop
o A data processing component (MapReduce)
o A distributed file system (Hadoop Distributed File System, HDFS)
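To give a feel for the MapReduce idea, here is a word-count sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names map_phase, shuffle, and reduce_phase and the sample documents are assumptions, and a real cluster would run these phases in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document: str) -> list:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: list) -> dict:
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict) -> dict:
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


documents = ["big data big value", "data lake"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts)
```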
In-memory database (IMDB)
A database management system that stores the entire database in random access memory (RAM)
o Provides access to data at rates much faster than storing data on some form of secondary storage
o Enables the analysis of big data and other challenging data-processing applications
o Performs best on multiple multicore CPUs