DSS 330 Final Exam


44 Terms

1

Big data is defined as

extremely large set of nontraditional data used to gain meaningful business insights or predict significant business events

2

The Three V’s of Big Data are

Volume, Variety, Velocity

3

Volume refers to the

amount of data from many sources

4

Variety refers to the

two types of data, in multiple formats:

  1. Traditional Data (text, numbers, pictures, video, sound)

  2. Behavioral Data (clicks and pauses)

5

Velocity refers to the

speed at which data is created

6

Hadoop is defined as

open-source software framework that is used for storing and processing big data in a distributed computing environment (yellow elephant logo)

  • Data redundancy

7

Nodes in Hadoop are

Devices (Example: Servers)

8

A Cluster in Hadoop is

a group of nodes

9

Unstructured Data is

not organized or easily interpreted, and its form is hard to predict (often stored in nonrelational database systems)

10

Structured Data is

traditional in how it is stored in and retrieved from a DBMS (we know what to expect)

11

Commodity Hardware is

hardware that is readily available, inexpensive, and amassed in large quantities. Its benefit is reduced cost

12

Hadoop Distributed File System is defined as

a distributed file system designed to run on commodity hardware. It is fault-tolerant (reliable), which allows it to degrade gracefully

  • Storage component of Hadoop

13

Graceful degradation is

the ability of a machine or network to maintain limited functionality even when a large portion of it has been rendered inoperative

14

The Architecture for the Hadoop Distributed File System is

master/slave architecture

15

In the master/slave architecture

the master node (name-node) controls the cluster and knows which slave node (server or data node) has what
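
A minimal sketch of this idea, assuming a toy in-memory model rather than the real HDFS API: the class name NameNode, the node names, the block ids, and the round-robin placement with two replicas are all hypothetical choices made for illustration. It shows the master keeping only the lookup table of which data node holds which block, and how redundancy lets a block stay reachable when one node fails.

```python
class NameNode:
    """Toy master node: stores metadata only (block id -> data nodes)."""

    def __init__(self, data_nodes, replication=2):
        self.data_nodes = list(data_nodes)
        self.replication = replication
        self.block_map = {}   # block id -> list of data node names
        self._next = 0        # index of the next node for round-robin placement

    def place_block(self, block_id):
        # Put each replica on a different node (hypothetical round-robin policy).
        count = len(self.data_nodes)
        holders = [self.data_nodes[(self._next + i) % count]
                   for i in range(self.replication)]
        self._next = (self._next + 1) % count
        self.block_map[block_id] = holders
        return holders

    def locate(self, block_id, failed=()):
        # Return the nodes that hold the block and are still available.
        return [node for node in self.block_map[block_id] if node not in failed]


name_node = NameNode(["node-a", "node-b", "node-c"])
for block in ["file1-blk0", "file1-blk1", "file1-blk2"]:
    name_node.place_block(block)

locations = name_node.locate("file1-blk0")
survivors = name_node.locate("file1-blk0", failed={"node-a"})
print(locations)   # both replicas
print(survivors)   # the replica that remains when node-a is down
```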

16

The purpose of a FAT (File Allocation Table) is to

keep track of where files are stored on a disk and how much space is available for new files

17

Hadoop MapReduce

Map: processes and maps the input data, producing a local solution (per node)

Reduce: processes the output of the map step and removes duplicate data, producing an aggregate solution (per cluster)

(Produces a new set of output, which will be stored in the HDFS)
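
The two phases can be sketched in plain Python, assuming a word-count task and three hypothetical text chunks standing in for the data held on three nodes. This is not Hadoop code; the function names map_phase and reduce_phase are made up for the example. Each map call produces a local result for one chunk, and the reduce step combines all local results into one aggregate result.

```python
from collections import defaultdict


def map_phase(chunk):
    """Local step (per node): emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_phase(pairs):
    """Aggregate step (per cluster): merge repeated keys by summing their counts."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical input: one chunk of text per node.
node_chunks = ["big data big insights", "data moves fast", "big clusters"]

mapped = [map_phase(chunk) for chunk in node_chunks]       # one list per node
all_pairs = [pair for node_output in mapped for pair in node_output]
word_counts = reduce_phase(all_pairs)                      # single combined output
print(word_counts)
```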

18

MapReduce is

batch-oriented, meaning it processes large amounts of data in a batch or group. Need all data that is relational

19

Not all data is

relational (non-relational data includes movies, text, music, photos, social media)

20

Hadoop YARN (Yet Another Resource Negotiator) is

  • Real-time streaming

  • Opportunistic, meaning it runs when node resources are available

  • Works with MapReduce

  • Distributed through cluster nodes

21

NoSQL

a non-relational DBMS concept that is distributed and open source (geared toward Big Data that is unstructured or semi-structured)

22

Big data is scaled

horizontally (Note: Hadoop grows horizontally)

23

SQL DBs are scaled

vertically

24

NoSQL Document puts

multi-attribute data in a single “Document”
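
As a rough illustration, assuming a hypothetical customer record (every field name and value below is invented), a document can be pictured as one nested, JSON-like object that carries all of an item's attributes together instead of spreading them across several tables.

```python
import json

# One "document": every attribute of the customer, including nested and
# repeated attributes, is stored in a single object.
customer_document = {
    "_id": "cust-001",
    "name": "Ada Example",
    "emails": ["ada@example.com", "ada.work@example.com"],
    "orders": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
}

# Documents are commonly exchanged as JSON text; a round trip keeps the structure.
serialized = json.dumps(customer_document)
restored = json.loads(serialized)

total_items = sum(order["qty"] for order in restored["orders"])
print(total_items)
```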

25

NoSQL Rows vs Columns…

  • Rows: storing data row by row through a table

  • Columns: storing data in blocks (more storage = more blocks = longer addresses = more bits)

26

Columnar Storage is when

data is stored in columns, not rows (better performance for single-attribute operations)
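
A small sketch of the difference, assuming three hypothetical sales records: the row layout keeps each record together, while the column layout keeps each attribute together. Summing one attribute in the column layout reads a single list, whereas the row layout has to visit every full record.

```python
# Row-oriented layout: one complete record per entry.
rows = [
    {"id": 1, "region": "East", "sales": 120},
    {"id": 2, "region": "West", "sales": 80},
    {"id": 3, "region": "East", "sales": 200},
]

# Column-oriented layout: one list per attribute, built from the same data.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Single-attribute operation: total sales.
total_row_store = sum(row["sales"] for row in rows)   # visits every full record
total_column_store = sum(columns["sales"])            # reads only the "sales" column

print(columns["sales"], total_column_store)
```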

27

Enterprise Applications are defined as

software that supports enterprise-level tasks (powerful, complex, sophisticated, expensive)

  • Essentially databases

28

Data Warehousing is defined as

  • logically centralized large database (physically centralized or distributed)

  • Powerful enterprise-wide querying applications

29

Enterprise Resource Planning is

a category of software tools used to manage the data of an enterprise and to help coordinate its different departments

30

E.T.L stands for

Extract, Transform, Load

(Three database functions that are combined into one tool to retrieve data from one database and place it into another database)
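
The three steps can be sketched with Python's built-in sqlite3 module, assuming two in-memory databases that stand in for a source system and a warehouse. The table names (orders, fact_orders), the sample rows, and the cleaning rules are hypothetical and chosen only to make each step visible.

```python
import sqlite3

# Hypothetical source database with messy data.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, " east ", "120.50"), (2, "WEST", "80"), (3, "East", None)],
)

# Hypothetical target database (the warehouse side).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (id INTEGER, region TEXT, amount REAL)")


def extract(conn):
    """Extract: read the raw rows out of the source database."""
    return conn.execute("SELECT id, region, amount FROM orders").fetchall()


def transform(rows):
    """Transform: drop rows with no amount, tidy region names, convert amounts."""
    cleaned = []
    for order_id, region, amount in rows:
        if amount is None:
            continue
        cleaned.append((order_id, region.strip().title(), float(amount)))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the target database."""
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


load(warehouse, transform(extract(source)))
loaded = warehouse.execute(
    "SELECT id, region, amount FROM fact_orders ORDER BY id"
).fetchall()
print(loaded)
```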

31

Data Mart is defined as

a subset of a data warehouse (in other words, a simple form of data warehouse focused on a single subject or line of business)

32

Data Mart characteristics include being

  • Topic-Oriented (e.g., region, product, business unit)

  • Focused (e.g., summary or full data, including other data marts)

33

Data Cube is defined as a

multi-dimensional data structure designed to make data query and analysis more efficient (whether or not it is part of a data mart)

  • For example, a hierarchy (which makes up a single dimension of the cube) for location data might have three levels: states within regions within countries
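
The location hierarchy can be sketched as a roll-up over a small fact list, assuming hypothetical sales figures keyed by country, region, and state. The function name roll_up and all of the data are invented for the example; a real cube would hold several dimensions, and this shows only one.

```python
from collections import defaultdict

# Hypothetical facts: (country, region, state, sales).
facts = [
    ("USA", "Northeast", "NY", 100),
    ("USA", "Northeast", "PA", 50),
    ("USA", "West", "CA", 70),
    ("Canada", "West", "BC", 40),
]


def roll_up(facts, level):
    """Sum sales at a hierarchy level: 1 = country, 2 = region, 3 = state."""
    totals = defaultdict(int)
    for country, region, state, sales in facts:
        key = (country, region, state)[:level]
        totals[key] += sales
    return dict(totals)


by_country = roll_up(facts, 1)
by_region = roll_up(facts, 2)
print(by_country)
print(by_region)
```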

34

Data Mining is defined as

the practice of uncovering new knowledge by identifying patterns or relationships (by contrast, querying requires previous knowledge)

35

Online Analytical Processing (OLAP) is defined as

software that reviews, manipulates, and queries large amounts of data in real time (used during data mining; may use data cubes and data marts)

36

A Federated Database is defined as

a type of distributed DBMS that integrates data from different sources, providing a single interface for all users.

  • FDs are heterogeneous, meaning each member database has its own schema, data model, and formats, which makes them hard to integrate into one single local database

  • FDs are autonomous, meaning each member database keeps control over its own data and has its own local users; together they form a virtual database.
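
A minimal sketch of the single-interface idea, assuming two hypothetical sources with different data models: a relational table (via sqlite3) and a list of nested documents. The class names SqlSource, DocumentSource, and FederatedView are invented for the example. Each source keeps its own format, and the federated view only translates results into one common shape.

```python
import sqlite3


class SqlSource:
    """Autonomous relational source with its own schema."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE customers (cust_id INTEGER, full_name TEXT)")
        self.conn.execute("INSERT INTO customers VALUES (1, 'Ada Example')")

    def customers(self):
        rows = self.conn.execute("SELECT cust_id, full_name FROM customers")
        return [{"id": row[0], "name": row[1]} for row in rows]


class DocumentSource:
    """Autonomous document-style source with a different data model."""

    def __init__(self):
        self.docs = [{"_id": 2, "profile": {"name": "Bo Sample"}}]

    def customers(self):
        return [{"id": doc["_id"], "name": doc["profile"]["name"]} for doc in self.docs]


class FederatedView:
    """Single interface over heterogeneous sources; it stores no data itself."""

    def __init__(self, sources):
        self.sources = sources

    def customers(self):
        merged = []
        for source in self.sources:
            merged.extend(source.customers())
        return sorted(merged, key=lambda customer: customer["id"])


view = FederatedView([SqlSource(), DocumentSource()])
all_customers = view.customers()
print(all_customers)
```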

37

(KAHOOT QUESTION) An enterprise application is generally not

a. Open-sourced

b. Powerful

c. Complex

d. Single-user

d. Single-user

38

(KAHOOT QUESTION) Most, if not all, enterprise apps are, essentially

a. ERP

b. SQL

c. Databases

d. 3NF

c. Databases

39

(KAHOOT QUESTION) An enterprise data warehouse is not

a. Cheap

b. Large

c. Powerful

d. Complex

a. Cheap

40

(KAHOOT QUESTION) Selecting, cleaning, and storing the data for an EDW is known as

a. GTL

b. NFL

c. ETL

c. ETL (Extract, Transform, Load)

41

(KAHOOT QUESTION) True or False, A data cube is an actual cube

a. True

b. False

b. False

42

(KAHOOT QUESTION) Enterprise Resource Planning is

a. SAP

b. A category of software

c. A data warehouse

d. FTW

b. A category of software

43

(KAHOOT QUESTION) Heterogeneous DB environment is

a. Federated

b. Normalized

c. Distributed

d. Stimulated

a. Federated

44

(KAHOOT QUESTION) Not an enterprise app

a. MySQL

b. Oracle

c. MS Excel

d. SAP

c. MS Excel
