DSS 330 Final Exam

44 Terms

Big data is defined as

an extremely large set of nontraditional data used to gain meaningful business insights or predict significant business events


The Three V’s of Big Data are

Volume, Variety, Velocity


Volume refers to the

amount of data from many sources


Variety refers to the

two types of data (in multiple formats):

Traditional Data (text, numbers, pictures, video, sound)
Behavioral Data (clicks and pauses)


Velocity refers to the

speed at which data is created


Hadoop is defined as

an open-source software framework used for storing and processing big data in a distributed computing environment (yellow elephant logo)

It provides data redundancy


Nodes in Hadoop are

Devices (Example: Servers)


A Cluster in Hadoop is

a group of nodes


Unstructured Data is

not organized or easily interpreted, and its shape is hard to predict (often stored in nonrelational database systems)


Structured Data is

stored and retrieved in the traditional way, in a DBMS (we know what to expect)


Commodity Hardware is

hardware that is readily available, inexpensive, and amassed in large quantities. Its benefit is reduced cost


Hadoop Distributed File System is defined as

a distributed file system designed to run on commodity hardware. It is fault-tolerant (reliable), which allows it to degrade gracefully

It is the storage component of Hadoop


Graceful degradation is

the ability of a machine or network to maintain limited functionality even when a large portion of it has been rendered inoperative


The Architecture for the Hadoop Distributed File System is

master/slave architecture


In the master/slave architecture

the master node (NameNode) controls the cluster and knows which slave node (a server acting as a DataNode) holds which data
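
The master/slave arrangement above can be pictured with a small toy model. This is only a sketch under stated assumptions: the `NameNode` class, the node names `dn1` to `dn3`, the block ids, and the round-robin placement are all hypothetical and are not how real HDFS chooses replicas. It shows the master keeping metadata only, and a block staying readable after one data node fails.

```python
class NameNode:
    """Toy master node: tracks which data nodes hold each block (metadata only)."""

    def __init__(self, data_nodes, replication=2):
        self.data_nodes = list(data_nodes)
        self.replication = replication
        self.block_map = {}   # block id -> names of data nodes holding a copy
        self.failed = set()   # data nodes currently marked as down

    def store(self, block_id):
        # Assumption: simple round-robin placement, chosen for illustration.
        start = len(self.block_map) % len(self.data_nodes)
        replicas = [
            self.data_nodes[(start + i) % len(self.data_nodes)]
            for i in range(self.replication)
        ]
        self.block_map[block_id] = replicas
        return replicas

    def locate(self, block_id):
        """Return the live data nodes that still hold the block."""
        return [n for n in self.block_map[block_id] if n not in self.failed]


name_node = NameNode(["dn1", "dn2", "dn3"], replication=2)
name_node.store("file.txt#0")
name_node.store("file.txt#1")
name_node.failed.add("dn2")   # simulate one data node going down
```

Because every block was copied to two data nodes, `name_node.locate(...)` still finds a live copy of each block after `dn2` fails, which is the redundancy that lets the cluster degrade gracefully.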


The purpose of the FAT (File Allocation Table) is to

keep track of where files are stored on a disk and how much space is available for new files
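
As a rough illustration of that bookkeeping, the sketch below models a tiny disk as a list. The 8-cluster size, the `notes.txt` file, and the `END` / `FREE` markers are assumptions made for the example, not the on-disk FAT format.

```python
END = -1      # marks the last cluster of a file
FREE = None   # marks an unused cluster

# Hypothetical 8-cluster disk: fat[i] is the cluster that follows cluster i.
fat = [FREE, 4, FREE, FREE, 6, FREE, END, FREE]
directory = {"notes.txt": 1}   # file name -> first cluster of the file


def clusters_of(name):
    """Follow the chain in the table to list a file's clusters in order."""
    chain, current = [], directory[name]
    while current != END:
        chain.append(current)
        current = fat[current]
    return chain


# Space available for new files: every cluster still marked FREE.
free_clusters = [i for i, entry in enumerate(fat) if entry is FREE]
```

The table answers both questions on the card: `clusters_of` finds where a file is stored, and `free_clusters` shows how much space is left.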


Hadoop MapReduce

Map: process and map the input data, producing a local solution (per node)

Reduce: process the data that comes out of Map, merging duplicate entries into an aggregate solution (per cluster)

(Reduce produces a new set of output, which is stored in HDFS)
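
A word count is the usual way to picture the two steps. The sketch below runs in a single Python process and only imitates the flow; the function names, the `shuffle` helper, and the sample strings are assumptions for illustration, not Hadoop's API.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one node's chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Group the values emitted by all mappers by key."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's values into one aggregate result."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input: each string stands in for the data held on one node.
node_chunks = ["big data big", "data velocity", "big volume"]
local_results = [map_phase(chunk) for chunk in node_chunks]   # per node
word_counts = reduce_phase(shuffle(local_results))            # per cluster
```

Each entry in `local_results` is a local solution, and `word_counts` is the aggregate solution in which repeated words have been merged into a single total.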


MapReduce is

batch-oriented, meaning it processes large amounts of data together as a batch or group. The whole batch of related data must be available before the job runs


Not all data is

relational (non-relational data includes movies, text, music, photos, social media)


Hadoop YARN (Yet Another Resource Negotiator) is

Able to support real-time streaming workloads
Opportunistic, meaning it runs work when node resources are available
Able to work with MapReduce
Distributed across the cluster nodes


NoSQL

is a non-relational DBMS concept that is distributed and open source (geared for Big Data that is unstructured or semi-structured)


Big data is scaled

horizontally (Note: Hadoop grows horizontally)
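
Scaling horizontally means adding more machines and spreading the data across them. A minimal sketch, assuming records are placed by a stable hash of their key; the use of `zlib.crc32`, the key names, and the node counts are illustrative choices, not how Hadoop places data.

```python
import zlib


def assign_node(key, node_count):
    """Pick a node for a key with a stable hash (crc32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(keys, node_count):
    """Spread the keys over node_count nodes."""
    nodes = [[] for _ in range(node_count)]
    for key in keys:
        nodes[assign_node(key, node_count)].append(key)
    return nodes


keys = [f"record-{i}" for i in range(1000)]
three_nodes = distribute(keys, 3)
six_nodes = distribute(keys, 6)   # scale out: more machines, not a bigger one
```

The same 1000 records fit on either cluster; adding nodes simply leaves each node with a smaller share, whereas vertical scaling would replace one machine with a more powerful one.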


SQL DBs are scaled

vertically


NoSQL Document puts

multi-attribute data in a single “Document”
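
To make the idea concrete, here is a sketch of one such document as a nested Python dictionary. The field names, the `c-1001` id, and the dictionary used as a "collection" are hypothetical and do not belong to any particular NoSQL product.

```python
import json

# Hypothetical customer document: all attributes live in one nested record.
customer_doc = {
    "_id": "c-1001",
    "name": "Ann Lee",
    "emails": ["ann@example.com"],
    "orders": [
        {"order_id": 1, "items": ["laptop"], "total": 900},
        {"order_id": 2, "items": ["mouse", "pad"], "total": 35},
    ],
}

# Toy collection: documents keyed by id; documents need not share a schema.
collection = {customer_doc["_id"]: customer_doc}

# One lookup returns the customer together with every order.
doc = collection["c-1001"]
lifetime_total = sum(order["total"] for order in doc["orders"])
as_json = json.dumps(doc)   # documents are commonly serialized as JSON
```

In a relational design the customer, emails, and orders would typically sit in separate tables joined by keys; here a single read retrieves them all.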


NoSQL Rows vs Columns…

Rows: data is stored row by row, as in a table
Columns: data is stored in blocks (more storage = more blocks = longer addresses = more bits)


Columnar Storage is when

data is stored by column rather than by row (better performance for single-attribute operations)
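
The difference shows up when an operation needs only one attribute. A minimal in-memory sketch, with a made-up three-record table; real column stores work with disk blocks and compression, which this ignores.

```python
# Hypothetical table with three attributes per record.
rows = [
    {"id": 1, "name": "Ann", "sales": 120},
    {"id": 2, "name": "Bob", "sales": 80},
    {"id": 3, "name": "Cy", "sales": 200},
]

# Column layout: one list per attribute, aligned by position.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Row layout: every whole record is visited to read one attribute.
total_from_rows = sum(row["sales"] for row in rows)

# Column layout: only the "sales" column is read.
total_from_columns = sum(columns["sales"])
```

Both totals are the same, but the columnar version never touches `id` or `name`, which is why single-attribute operations such as sums and averages favor it.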


An Enterprise Application is defined as

software that supports enterprise-level tasks (powerful, complex, sophisticated, expensive)

Enterprise applications are essentially databases


Data Warehousing is defined as

a logically centralized large database (physically centralized or distributed)

Supports powerful enterprise-wide querying applications


Enterprise Resource Planning (ERP) is

a category of software tools used to manage the data of an enterprise and to coordinate its different departments


ETL stands for

Extract, Transform, Load

(Three database functions combined into one tool that retrieves data from one database and places it into another database)
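
The three steps can be sketched with two in-memory SQLite databases. The table names, columns, and sample rows are invented for the example, and the cleaning rules are just one plausible choice of transform.

```python
import sqlite3

# Hypothetical source and target databases (in-memory for the sketch).
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, " east ", "10.50"), (2, "WEST", "4.25"), (3, "east", None)],
)
target.execute("CREATE TABLE clean_orders (id INTEGER, region TEXT, amount REAL)")

# Extract: read the raw rows from the source database.
raw_rows = source.execute("SELECT id, region, amount FROM orders").fetchall()

# Transform: drop rows with no amount, tidy the region, convert text to numbers.
clean_rows = [
    (order_id, region.strip().upper(), float(amount))
    for order_id, region, amount in raw_rows
    if amount is not None
]

# Load: write the cleaned rows into the target database.
target.executemany("INSERT INTO clean_orders VALUES (?, ?, ?)", clean_rows)
target.commit()

loaded = target.execute(
    "SELECT id, region, amount FROM clean_orders ORDER BY id"
).fetchall()
```

Only the two complete orders reach `clean_orders`, with regions normalized to upper case and amounts stored as numbers.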


Data Mart is defined as

a subset of a data warehouse (in other words, a simple form of data warehouse focused on a single subject or line of business)


Data Mart characteristics include being

Topic-Oriented (e.g., region, product, business unit)
Focused (e.g., summary or full data, including other data marts)


Data Cube is defined as a

multi-dimensional data structure designed to make data query and analysis more efficient (Data mart or not)

For example, a hierarchy (which makes up a single dimension of the cube) for location data might have three levels: states within regions within countries
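
The location hierarchy in that example can be sketched as a roll-up over a small set of facts. The sample rows, the `roll_up` helper, and the second dimension (product) are assumptions for illustration; real OLAP tools precompute and index these aggregates.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, region, state, product, sales).
facts = [
    ("USA", "West", "CA", "laptop", 100),
    ("USA", "West", "WA", "laptop", 50),
    ("USA", "East", "NY", "phone", 70),
    ("USA", "West", "CA", "phone", 30),
]


def roll_up(facts, level):
    """Aggregate sales by (location at the given hierarchy level, product)."""
    depth = {"country": 1, "region": 2, "state": 3}[level]
    cube = defaultdict(int)
    for *location, product, sales in facts:
        cube[(tuple(location[:depth]), product)] += sales
    return dict(cube)


by_region = roll_up(facts, "region")
by_country = roll_up(facts, "country")
```

Moving from `by_region` to `by_country` climbs one level of the location dimension while the product dimension stays fixed, so the same facts answer questions at different levels of detail.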


Data Mining is defined as

the practice of uncovering new knowledge by identifying patterns or relationships (querying, by contrast, requires previous knowledge)


Online Analytical Processing (OLAP) is defined as

software that reviews, manipulates, and queries large amounts of data in real time (used during data mining; may use data cubes and data marts)


A Federated Database (FD) is defined as

a type of distributed DBMS that integrates data from different sources, providing a single interface for all users.

FDs are heterogeneous, meaning each member database has a different schema, data model, and format, making it hard to integrate them into one single local database
FDs are autonomous, meaning each member database controls its own data and has its own local users; together they form a virtual database.
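
A small sketch of the single-interface idea, assuming two made-up sources: an in-memory SQLite table and a list of dictionaries. The `FederatedCustomers` class and all field names are hypothetical.

```python
import sqlite3

# Source 1 (hypothetical): a relational database with its own schema.
sales_db = sqlite3.connect(":memory:")
sales_db.execute("CREATE TABLE customers (cust_id INTEGER, full_name TEXT)")
sales_db.execute("INSERT INTO customers VALUES (1, 'Ann Lee')")

# Source 2 (hypothetical): a document-style store with different field names.
support_store = [{"id": 2, "first": "Bob", "last": "Ray"}]


class FederatedCustomers:
    """Single interface; each source keeps and manages its own data."""

    def all_customers(self):
        # Translate each local schema into one shared shape at query time.
        for cust_id, full_name in sales_db.execute(
            "SELECT cust_id, full_name FROM customers"
        ):
            yield {"id": cust_id, "name": full_name, "source": "sales_db"}
        for doc in support_store:
            yield {
                "id": doc["id"],
                "name": f"{doc['first']} {doc['last']}",
                "source": "support_store",
            }


customers = list(FederatedCustomers().all_customers())
```

Nothing is copied into a central database: the sources stay heterogeneous and autonomous, and the combined view exists only when it is queried.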


(KAHOOT QUESTION) An enterprise application is generally not

a. Open-sourced

b. Powerful

c. Complex

d. Single-user

d. Single-user


(KAHOOT QUESTION) Most, if not all, enterprise apps are, essentially

a. ERP

b. SQL

c. Databases

d. 3NF

c. Databases


(KAHOOT QUESTION) An enterprise data warehouse is not

a. Cheap

b. Large

c. Powerful

d. Complex

a. Cheap


(KAHOOT QUESTION) Selecting, cleaning, and storing the data for an EDW is known as

a. GTL

b. NFL

c. ETL

c. ETL (Extract, Transform, Load)


(KAHOOT QUESTION) True or False: a data cube is an actual cube

a. True

b. False

b. False


(KAHOOT QUESTION) Enterprise Resource Planning is

a. SAP

b. A category of software

c. A data warehouse

d. FTW

b. A category of software


(KAHOOT QUESTION) Heterogeneous DB environment is

a. Federated

b. Normalized

c. Distributed

d. Stimulated

a. Federated


(KAHOOT QUESTION) Not an enterprise app

a. MySQL

b. Oracle

c. MS Excel

d. SAP

c. MS Excel
