Big Data Analytics Lecture Notes

Vocabulary flashcards summarizing key terms, roles, phases, and technologies from the Big Data Analytics lecture.

66 Terms

1

Big Data

Extremely large and complex data sets that require innovative processing to extract insights and support decision-making.

2

3Vs of Big Data

The core characteristics—Volume, Variety, and Velocity—that define most Big Data challenges.

3

Volume

The huge amount of data generated and stored, often measured in terabytes, petabytes, or exabytes.

4

Variety

The many different data types and formats—structured, semi-structured, quasi-structured, and unstructured—found in Big Data.

5

Velocity

The speed at which new data is generated, collected, and must be processed for timely insights.

6

Structured Data

Highly organized data in rows and columns (e.g., relational tables) that is easy to query and analyze.

7

Semi-Structured Data

Data with partial organization (e.g., JSON, XML, CSV, HTML) that includes tags or markers but lacks a strict schema.
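
As an illustration of why semi-structured data needs extra handling, the sketch below loads two JSON records whose keys do not match and normalizes them into uniform rows. The records and field names are made up for the example.

```python
import json

# Two hypothetical records: the second lacks "email" and adds "tags".
raw = (
    '[{"id": 1, "name": "Ana", "email": "ana@example.com"},'
    ' {"id": 2, "name": "Ben", "tags": ["vip"]}]'
)

records = json.loads(raw)

# Collect every key seen in any record to build one uniform column list.
columns = sorted({key for record in records for key in record})

# Fill missing keys with None so every row has the same shape.
rows = [[record.get(col) for col in columns] for record in records]

print(columns)
for row in rows:
    print(row)
```

The keys act as the "tags or markers", but nothing forces every record to carry the same ones, so the column list has to be discovered from the data.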

8

Quasi-Structured Data

Loosely organized data such as clickstream or chat logs that requires additional parsing to analyze.

9

Unstructured Data

Data without a predefined model or structure, including audio, video, images, and free text.

10

Spreadsheet

A low-volume data repository (e.g., Excel) providing flexible but potentially siloed data analysis.

11

Data Warehouse

A centralized, structured repository designed for business intelligence reporting and analytics.

12

Analytic Sandbox

An analyst-controlled environment that gathers diverse data sources for flexible, high-performance exploration.

13

Business Intelligence (BI)

Technologies and processes for collecting, integrating, and reporting structured data to support day-to-day decision-making.

14

Data Science

A multidisciplinary field leveraging scientific methods, algorithms, and systems to extract knowledge from data, both structured and unstructured.

15

Predictive Modeling

The use of statistical or machine-learning algorithms to forecast future outcomes based on historical data.
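
A minimal sketch of the idea, assuming a simple least-squares line is an acceptable stand-in for a predictive model: it fits historical (x, y) pairs and forecasts the next value. The data and the `fit_line` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up history: period number and units sold.
periods = [1, 2, 3, 4]
units = [3, 5, 7, 9]

slope, intercept = fit_line(periods, units)

# Forecast the next, unseen period from the fitted line.
forecast = slope * 5 + intercept
print(slope, intercept, forecast)
```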

16

Time Series Analysis

Analytical techniques that examine data points sequenced over time to identify trends and seasonal patterns.
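
One common way to expose a trend is a moving average, sketched here on a made-up monthly series with a repeating three-month pattern. Averaging over one full cycle smooths out the seasonal swing and leaves the underlying rise.

```python
def moving_average(values, window):
    """Return the mean of each consecutive run of `window` values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical monthly figures: a 3-month seasonal pattern on a rising trend.
monthly = [10, 12, 14, 11, 13, 15, 12, 14, 16]

trend = moving_average(monthly, 3)
print(trend)
```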

17

Exploratory Analytics

Open-ended data examination aimed at discovering patterns or relationships without predefined hypotheses.

18

Explanatory Analytics

Analysis focused on explaining why observed events occurred, often linking cause and effect.

19

Enterprise Data Warehouse (EDW)

A large, centralized data warehouse supporting enterprise-wide reporting, backups, and security.

20

Data Mart

A departmental subset of a data warehouse tailored for a specific business function’s analytics needs.

21

Data Extract

A copy of data removed from a main repository for analysis in external tools such as R or Excel.

22

ETL (Extract, Transform, Load)

The process of pulling data from sources, cleaning/transforming it, and loading it into a target system.
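
The three steps can be sketched with the Python standard library. This is a toy pipeline under stated assumptions: the CSV text, the `sales` table, and the cleaning rules are invented for the example, and an in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical source data with messy names and one unparseable amount.
SOURCE_CSV = "customer,amount\n alice ,10.50\nBOB,n/a\ncarol,7\n"

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names and drop rows with invalid amounts."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows that cannot be parsed
        cleaned.append((row["customer"].strip().lower(), amount))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)

total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(row_count, total)
```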

23

NoSQL

Non-relational database technologies designed for flexible schemas and large-scale, distributed data storage.

24

Hadoop

An open-source framework that stores and processes large data sets across clusters of commodity hardware.

25

Big Data Driver

A factor—such as medical imaging, IoT sensors, or social media—that accelerates data growth and necessitates new analytics.

26

Sensor Net

A network of data-emitting devices (e.g., smartphones, smart meters) continuously generating real-time data streams.

27

Data Collector

An entity or system that gathers raw data from devices, applications, or networks for storage and preprocessing.

28

Data Aggregator

An organization that combines data from multiple collectors, organizes it, and sells or distributes it to users.

29

Data User/Buyer

A company or group that purchases or accesses aggregated data to inform business decisions or services.

30

Deep Analytical Talent

Highly technical professionals (e.g., data scientists) skilled in advanced analytics and machine learning on messy data.

31

Data-Savvy Professional

Business-focused individuals who understand data concepts well enough to frame and interpret analytical questions.

32

Technology & Data Enabler

IT professionals who design, build, and maintain the systems that store and process Big Data.

33

Data Scientist

A specialist who converts business problems into analytical tasks, builds models on large data sets, and translates results into actionable insights.

34

Quantitative Skills

Strong mathematical and statistical capabilities essential for rigorous data analysis.

35

Critical Thinking

The practice of questioning assumptions, validating results, and evaluating data from multiple perspectives.

36

Data Analytics Lifecycle

A six-phase, iterative process guiding data science projects from discovery through operationalization.

37

Discovery Phase

Lifecycle step focused on understanding business problems, resources, data sources, and initial hypotheses.

38

Data Preparation Phase

Lifecycle step where data is cleaned, transformed, and loaded into an analytic sandbox for use.

39

Model Planning Phase

Lifecycle step selecting analytical techniques, identifying variables, and drafting the modeling approach.

40

Model Building Phase

Lifecycle step where actual statistical or machine-learning models are created, trained, and tested.

41

Communicate Results Phase

Lifecycle step in which findings are presented to stakeholders through stories, visuals, and metrics.

42

Operationalize Phase

Lifecycle step delivering final reports, code, or pilots so models can be put into everyday business use.

43

Project Sponsor

The individual who funds the analytics project, sets goals, and judges its business value.

44

Business User

Domain expert or stakeholder who benefits from the analysis and advises on practical implementation.

45

Project Manager

Person responsible for ensuring analytics milestones, timelines, and quality standards are met.

46

BI Analyst

Professional who develops dashboards and reports, providing business context and data lineage knowledge.

47

Database Administrator (DBA)

Specialist who manages database performance, security, and data access for analytics teams.

48

Data Engineer

Developer who builds data pipelines, cleans data, and prepares analytic environments for data science work.

49

CRISP-DM

The Cross-Industry Standard Process for Data Mining, a well-known methodology for data mining projects that influenced the design of the Data Analytics Lifecycle.

50

MAD Skills

An approach to Big Data analysis whose name stands for Magnetic, Agile, and Deep: draw in many data sources, iterate quickly, and run sophisticated analytics at scale.

51

Operational Data Store (ODS)

An intermediate data repository integrating data from multiple sources for operational reporting.

52

BI vs. Data Science

A contrast in focus: BI answers what, when, and where questions using structured data, while Data Science tackles how and why questions using varied data and predictive methods.

53

Compliance Analytics

Use of data analysis to ensure adherence to laws and regulations like AML or Sarbanes-Oxley.

54

Customer Churn

The rate at which customers stop doing business with a company, often predicted via Big Data models.
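
As a worked example, one simple way to compute churn for a period is customers lost divided by customers at the start of that period. The function name and figures below are illustrative, and other definitions of churn exist.

```python
def churn_rate(customers_at_start, customers_lost):
    """Fraction of the starting customer base lost during the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start

# Hypothetical quarter: 200 customers at the start, 10 of them left.
rate = churn_rate(200, 10)
print(f"{rate:.1%}")
```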

55

Upselling

Encouraging customers to purchase higher-end or additional products, often guided by analytics insights.

56

Cross-Selling

Offering complementary products to existing customers, frequently targeted through predictive analytics.

57

EDW Challenge: Accessibility

The difficulty data scientists face in obtaining data from enterprise warehouses, whose resources are prioritized for operational workloads.

58

EDW Challenge: Sampling

The need to use smaller data subsets in tools like R or Excel, potentially reducing model accuracy.
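
The effect of sampling can be sketched by comparing a statistic computed on a full data set with the same statistic on a small random subset. The population here is synthetic and the sizes are arbitrary assumptions, so the printed numbers only illustrate that the two values differ.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Synthetic "full" data set standing in for a warehouse table.
population = [random.gauss(100, 15) for _ in range(100_000)]

# Small subset of the kind that fits comfortably in a desktop tool.
sample = random.sample(population, 500)

full_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
sampling_error = abs(full_mean - sample_mean)

print(full_mean, sample_mean, sampling_error)
```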

59

Shadow File System

Uncontrolled data copies created outside central IT oversight, often increasing risk and cost.

60

Iterative Process

A cyclical workflow where insights lead teams to revisit and refine earlier project phases.

61

Pilot Deployment

A limited rollout of a model or analytics solution to validate performance in a real environment before full launch.

62

Exabyte

A unit of digital information equal to 1,000 petabytes (10^18 bytes); illustrates modern Big Data scale.
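
The arithmetic behind these units can be checked with a small converter. It uses decimal (SI) prefixes, consistent with the 1,000-petabyte definition above; the `convert` helper is hypothetical.

```python
# Bytes per unit, using decimal (SI) prefixes.
UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18}

def convert(value, src, dst):
    """Convert `value` from unit `src` to unit `dst`."""
    return value * UNITS[src] / UNITS[dst]

print(convert(1, "EB", "PB"))  # petabytes in one exabyte
print(convert(1, "EB", "TB"))  # terabytes in one exabyte
```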

63

Data Governance

Policies and procedures ensuring data quality, security, and proper usage across an organization.

64

Machine Learning

Field of study enabling systems to learn patterns from data and improve predictions automatically.

65

Scenario Optimization

Analytical technique that determines the best decision or strategy under given constraints and objectives.

66

Failover

A backup operational mode that automatically switches to a standby system if the primary system fails.
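
A minimal sketch of the pattern at the application level, assuming the primary and standby systems are exposed as plain callables and that a failure surfaces as a `ConnectionError`. Real failover usually happens in infrastructure rather than in a wrapper like this, and all names here are made up.

```python
def call_with_failover(primary, standby):
    """Try the primary system; switch to the standby if it is unreachable."""
    try:
        return primary()
    except ConnectionError:
        return standby()

def primary_db():
    # Simulates a primary system that has failed.
    raise ConnectionError("primary is down")

def standby_db():
    return "served by standby"

result = call_with_failover(primary_db, standby_db)
print(result)
```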