Informatics Lecture 9

database

A well-designed, organized, and carefully managed collection of data.

database management system (DBMS)

a group of programs that: (1) Manipulate the database. (2) Provide an interface between the database and its users and other application programs.

database administrator (DBA)

Skilled IS professional who directs all activities related to an organization’s database.

Byte

made up of eight bits

character

basic building block of information.

field

a name, number, or combination of characters that describes a business object or activity.

record

a collection of related data fields.

file

a collection of related records.

entity

a person, place, or thing for which data is collected, stored, and maintained.

attribute

a characteristic of an entity.

data item

the specific value of an attribute.

primary key

a field or set of fields that uniquely identifies the record.
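
A primary key can be seen in action with a small sketch. It assumes SQLite through Python's built-in sqlite3 module; the employee table and its columns are hypothetical names chosen for illustration. A second record that reuses an existing key is rejected.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Amal')")

# A second record with the same primary key value violates uniqueness.
try:
    conn.execute("INSERT INTO employee VALUES (1, 'Badr')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```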

data model

a diagram of data entities and their relationships.

enterprise data modelling

Data modelling that starts by investigating the general data and information needs of the organization at the strategic level.

entity-relationship (ER) diagrams

data models that use basic graphical symbols to show the organization of entities and the relationships between data.

relational model

a simple but highly useful way to organize data into collections of two-dimensional tables called relations.

domain

range of allowable values for a data attribute.
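
One way a DBMS can enforce a domain is with a constraint on the column. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module and a hypothetical pay_rate attribute whose domain is non-negative numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint restricts pay_rate to its allowable values (>= 0).
conn.execute(
    "CREATE TABLE employee (name TEXT, pay_rate REAL CHECK (pay_rate >= 0))"
)
conn.execute("INSERT INTO employee VALUES ('Amal', 25.0)")

# A value outside the domain is refused by the database.
try:
    conn.execute("INSERT INTO employee VALUES ('Badr', -5.0)")
    out_of_domain_rejected = False
except sqlite3.IntegrityError:
    out_of_domain_rejected = True

stored_rates = [row[0] for row in conn.execute("SELECT pay_rate FROM employee")]
```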

data cleansing / cleanup

(1) The process of detecting and then correcting or deleting incomplete, incorrect, inaccurate, or irrelevant records that reside in a database.

(2) Eliminate redundancies and anomalies
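
The two parts of the definition can be sketched in plain Python. This is an illustration under stated assumptions, not a production cleansing tool: the cleanse function, the record layout, and the rule that text is compared after trimming and lowercasing are all hypothetical choices.

```python
def cleanse(records, required_fields):
    """Drop incomplete records and remove redundant (duplicate) ones."""
    seen = set()
    cleaned = []
    for record in records:
        # Incomplete: a required field is missing or empty.
        if any(record.get(field) in (None, "") for field in required_fields):
            continue
        # Normalise text so trivially different duplicates compare equal.
        normalised = {
            key: value.strip().lower() if isinstance(value, str) else value
            for key, value in record.items()
        }
        fingerprint = tuple(sorted(normalised.items()))
        if fingerprint in seen:
            continue  # redundant copy of a record already kept
        seen.add(fingerprint)
        cleaned.append(normalised)
    return cleaned


raw = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 1, "email": " A@Example.com "},  # duplicate after normalising
    {"customer_id": 2, "email": ""},                 # incomplete
]
cleaned = cleanse(raw, ["customer_id", "email"])
```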

data center

Climate-controlled building or set of buildings that houses database servers and the systems that deliver mission-critical information and services.

traditional data centers

Consist of warehouses filled with row upon row of server racks and powerful cooling systems.

flat file database

Simple database program whose records have no relationship to one another.

single user database

Only one person can use a database at a time. Ex: Access.

multiple user database

Allow dozens or hundreds of people to access the same database system at the same time. Ex: SQL Server and Oracle

schema

A description of the entire database. A schema can be part of the database or a separate schema file.

DBMS in user view

Can reference a schema to find where to access the requested data in relation to another piece of data.

data definition language (DDL)

(1) A collection of instructions and commands used to define and describe data and relationships in a specific database.

(2) Allows the database’s creator to describe data and relationships that are to be contained in the schema.
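
In SQL-based systems, the CREATE TABLE statement plays the role of a DDL command. A minimal sketch, assuming SQLite through Python's sqlite3 module; the department and employee tables are hypothetical. After the definitions run, the stored schema can be read back from the database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the data (columns and types) and a relationship (dept_id).
conn.execute("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        dept_id     INTEGER REFERENCES department (dept_id)
    )
""")

# SQLite keeps the schema in the sqlite_master table.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
```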

data dictionary

(1) A detailed description of all the data used in the database

(2) Can also include a description of data flows, information about the way records are organized, and the data-processing requirements.

concurrency control

deals with the situation in which two or more users or applications need to access the same record at the same time.
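
The problem can be illustrated by analogy with threads in a single program. This sketch is not how any particular DBMS implements concurrency control; it only assumes that a lock is one possible mechanism, and the Record class is hypothetical. Four threads update the same record, and the lock makes each read-modify-write step complete before the next one starts.

```python
import threading


class Record:
    """A single shared record protected by a lock."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount):
        # Only one thread at a time may read and then write the balance.
        with self._lock:
            current = self.balance
            self.balance = current + amount


def worker(record, times):
    for _ in range(times):
        record.deposit(1)


record = Record(0)
threads = [threading.Thread(target=worker, args=(record, 1000)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

final_balance = record.balance
```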

data manipulation language (DML)

A specific language, provided with a DBMS, that allows users to access and modify the data, make queries, and generate reports.
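
In SQL, the INSERT, UPDATE, DELETE, and SELECT statements cover these tasks. A minimal sketch, assuming SQLite through Python's sqlite3 module; the product table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)

# Add data.
conn.executemany(
    "INSERT INTO product (product_id, name, price) VALUES (?, ?, ?)",
    [(1, "pen", 1.5), (2, "notebook", 4.0), (3, "stapler", 9.0)],
)
# Modify data.
conn.execute("UPDATE product SET price = ? WHERE product_id = ?", (2.0, 1))
conn.execute("DELETE FROM product WHERE product_id = ?", (3,))

# Query data.
rows = conn.execute(
    "SELECT name, price FROM product ORDER BY product_id"
).fetchall()
# A simple report: number of products and their total price.
report = conn.execute("SELECT COUNT(*), SUM(price) FROM product").fetchone()
```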

structured query language (SQL)

Adopted by the American National Standards Institute (ANSI) as the standard query language for relational databases.

big data

Big Data is the current term for the enormous datasets generated by Web and mobile applications such as search tools (for example, Google and Bing), Web 2.0 social networks (for example, Facebook, LinkedIn and Twitter), and scientific data collection tools.

Big data typically refers to data volumes measured in terabytes and petabytes.

data management

An integrated set of functions that defines the processes by which data is obtained, certified fit for use, stored, secured, and processed in such a way as to ensure that the accessibility, reliability, and timeliness of the data meet the needs of the data users within an organization.

data governance

Defines the roles, responsibilities, and processes for ensuring that data can be trusted and used by an entire organization.

data lifecycle management (DLM)

A policy-based approach for managing the flow of an enterprise’s data.

data warehouse

A type of data management system: a large database that collects business information from many sources in the enterprise in support of management decision making.

ETL

ETL stands for extract, transform, load. It is a process used in data warehousing to extract data from various sources, transform it into a format suitable for the data warehouse, and then load it into the warehouse.
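
The three steps can be sketched as three small functions. This is an illustration only: the CSV text stands in for a source system, an in-memory SQLite database stands in for the warehouse, and the orders layout and function names are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a source system export (note the untidy values).
SOURCE_CSV = "order_id,amount,currency\n1, 10.50 ,usd\n2,7,USD\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and standardise the currency code."""
    return [
        (int(row["order_id"]), float(row["amount"]), row["currency"].strip().upper())
        for row in rows
    ]


def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE orders (order_id INTEGER, amount REAL, currency TEXT)"
)
load(warehouse, transform(extract(SOURCE_CSV)))

loaded = warehouse.execute(
    "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
).fetchall()
```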

data mart

(1) a subset of a data warehouse that is used by small- and medium-sized businesses and departments within large companies to support decision making.

(2) A specific area in the data mart might contain more detailed data than the data warehouse.

non-relational databases (NoSQL)

Provides a means to store and retrieve data that is modeled using some means other than the simple two-dimensional tabular relations used in relational databases.

key-value NoSQL

Databases that are similar to SQL databases but have only two columns (“key” and “value”), with more complex information sometimes stored within the “value” column.
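
The idea can be sketched with a Python dictionary standing in for the store. This is not the API of any real key-value database; the put and get helpers and the "user:42" key format are assumptions for illustration. The complex information is serialised as JSON inside the value.

```python
import json

# The whole "database": each key maps to one opaque value.
store = {}


def put(key, value):
    # Complex information is packed into the single value column as JSON text.
    store[key] = json.dumps(value)


def get(key):
    raw = store.get(key)
    return None if raw is None else json.loads(raw)


put("user:42", {"name": "Amal", "roles": ["admin", "editor"]})
user = get("user:42")
missing = get("user:99")
```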

document NoSQL

databases are used to store, retrieve, and manage document-oriented information, such as social media posts and multimedia, also known as semi-structured data.

graph NoSQL

databases are used to understand the relationships among events, people, transactions, locations, and sensor readings and are well suited for analyzing interconnections such as when extracting data from social media.

column NoSQL

databases store data in columns, rather than in rows, and are able to deliver fast response times for large volumes of data.

Hadoop

An open-source software framework that includes several software modules that provide a means for storing and processing extremely large data sets.

Short essay questions:

  1. what is the function of data management?

    • Without data and the ability to process the data, an organization could not successfully complete most business activities.

    • To transform data into useful information

  2. what are the approaches to data management?

    Traditional approach to data management:

    Each distinct operational system used data files dedicated to that system.

    Database approach to data management:

    (1) Information systems share a pool of related data. (2) Offers the ability to share data and information resources. (3) A database management system (DBMS) is required.

  3. what are the characteristics of the traditional approach to data management?

    Each distinct operational system used data files dedicated to that system.

  4. what are the characteristics of the database approach to data management?

    (1) Information systems share a pool of related data. (2) Offers the ability to share data and information resources. (3) A database management system (DBMS) is required.

  5. what are the considerations to have in mind when building a database?

    content, access, logical structure, physical organization, and security.

  6. what does the relational model contain?

    Each row in the table represents a data entity (record).

    Each column represents an attribute of that entity (field).

  7. how is data manipulated?

    (1) Selecting: eliminating rows according to certain criteria. (2) Projecting: eliminating columns in a table. (3) Joining: combining two or more tables. (4) Linking: combining two or more tables through common data attributes to form a new table with only the unique data attributes.
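
    The first three operations can be sketched in plain Python, with each table held as a list of rows. The employees and departments tables and the helper function names are hypothetical, and a real DBMS would perform these operations through its query language.

```python
employees = [
    {"emp_id": 1, "name": "Amal", "dept_id": 10},
    {"emp_id": 2, "name": "Badr", "dept_id": 20},
    {"emp_id": 3, "name": "Chen", "dept_id": 10},
]
departments = [
    {"dept_id": 10, "dept_name": "Sales"},
    {"dept_id": 20, "dept_name": "Finance"},
]


def select(table, predicate):
    """Selecting: keep only the rows that meet the criteria."""
    return [row for row in table if predicate(row)]


def project(table, columns):
    """Projecting: keep only the named columns."""
    return [{column: row[column] for column in columns} for row in table]


def join(left, right, key):
    """Joining: combine rows from two tables that share the same key value."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


sales_staff = select(employees, lambda row: row["dept_id"] == 10)
names_only = project(employees, ["name"])
joined = join(employees, departments, "dept_id")
```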

  8. what are the types of databases? provide examples.

    • Flat file: Simple database program whose records have no relationship to one another.

    • Single user: Only one person can use a database at a time.

      Ex: Access

    • Multiple users: Allow dozens or hundreds of people to access the same database system at the same time.

      Ex: SQL Server and Oracle

  9. how to provide a user view in databases?

    • Schema: A description of the entire database. A schema can be part of the database or a separate schema file.

    • DBMS: Can reference a schema to find where to access the requested data in relation to another piece of data.

  10. how is data stored and retrieved?

    When an application program needs data, it requests the data through the DBMS.

  11. provide an example of a popular database management system.

    Database as a Service (DaaS)

  12. what are the characteristics of Database as a Service (DaaS)? provide an example.

    • The database is stored on a service provider’s servers.

    • The database is accessed by the client over a network, typically the Internet.

    • Database administration is handled by the service provider.

    Example of DaaS: Amazon Relational Database Service (Amazon RDS).

  13. can databases be used with other software?

    DBMSs can act as front-end or back-end applications:

    • Front-end applications interact directly with people.

    • Back-end applications interact with other programs or applications

  14. what are big data and what are their characteristics?

    Volume: The size of the data. Analyzing very large volumes of data to extract valuable information is one of the important challenges of big data.

    Velocity: The speed at which data arrives. Data floods in at very high speed and has to be dealt with in an appropriate time.

    Variety: The data is very diverse and comes from different sources with different structures, such as social data, audio, video, unstructured data, and email.

    Value: Another challenge is to convert the data into something of value, that is, to understand it and discover the hidden value in it.

    Veracity: Data veracity, in general, is how accurate or truthful a data set may be. More specifically, when it comes to the accuracy of big data, it’s not just the quality of the data itself but how trustworthy the data source, type, and processing of it are.

  15. what are the challenges of big data?

    • Big data faces many challenges, such as data capture, storage, visualization, analysis, and updating data securely.

    • Analyzing very large volumes of data to extract valuable information is one of the important challenges of big data.

    • Extracting or mining valuable information from huge amounts of data is the task of data mining methods.

  16. what are the characteristics of ETL?

    The ETL process is an iterative process that is repeated as new data is added to the warehouse. The process is important because it ensures that the data in the data warehouse is accurate, complete, and up to date.

  17. what are the elements of a data warehouse?

    • A relational database to store and manage data.

    • An extraction, loading, and transformation (ELT) solution for preparing the data for analysis.

    • Analysis tools, reporting, and data mining capabilities.

    • Client analysis tools for visualizing and presenting data to business users.

  18. what is the relation between Big Data and NoSQL databases?

  19. what are the advantages of NoSQL database?

    • Ability to spread data over multiple servers so that each server contains only a subset of the total data.

    • Do not require a predefined schema.
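
    Both advantages can be sketched together in plain Python. This is an illustration, not the design of any particular NoSQL product: the three dictionaries stand in for servers, and choosing a server by a CRC32 hash of the key is one assumed way of spreading the data. The stored documents have different fields because no schema is declared in advance.

```python
import zlib

NUM_SERVERS = 3
# Each dictionary stands in for one server holding a subset of the data.
servers = [dict() for _ in range(NUM_SERVERS)]


def server_for(key):
    # A stable hash of the key decides which server owns it.
    return zlib.crc32(key.encode("utf-8")) % NUM_SERVERS


def put(key, document):
    servers[server_for(key)][key] = document


def get(key):
    return servers[server_for(key)].get(key)


# No predefined schema: each document carries its own fields.
put("user:1", {"name": "Amal"})
put("user:2", {"name": "Badr", "tags": ["vip"]})
put("order:9", {"total": 12.5, "items": 3})

total_stored = sum(len(server) for server in servers)
```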

  20. what are the two primary components of Hadoop?

    • A data processing component (MapReduce).

    • A distributed file system (Hadoop Distributed File System, HDFS).
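
    The MapReduce idea can be sketched with a word count in plain Python. This does not use the Hadoop API and runs on one machine; the function names and sample documents are assumptions for illustration. In Hadoop, the map and reduce steps would run in parallel across many nodes over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


documents = ["big data big", "data lake"]
mapped = [pair for document in documents for pair in map_phase(document)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```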