Vocabulary flashcards summarizing key terms, roles, phases, and technologies from the Big Data Analytics lecture.
Big Data
Extremely large and complex data sets that require innovative processing to extract insights and support decision-making.
3Vs of Big Data
The core characteristics—Volume, Variety, and Velocity—that define most Big Data challenges.
Volume
The huge amount of data generated and stored, often measured in terabytes, petabytes, or exabytes.
Variety
The many different data types and formats—structured, semi-structured, quasi-structured, and unstructured—found in Big Data.
Velocity
The speed at which new data is generated, collected, and must be processed for timely insights.
Structured Data
Highly organized data in rows and columns (e.g., relational tables) that is easy to query and analyze.
Semi-Structured Data
Data with partial organization (e.g., JSON, XML, CSV, HTML) that includes tags or markers but lacks a strict schema.
Quasi-Structured Data
Loosely organized data such as clickstream or chat logs that requires additional parsing to analyze.
Unstructured Data
Data without a predefined model or structure, including audio, video, images, and free text.
Spreadsheet
A low-volume data repository (e.g., Excel) providing flexible but potentially siloed data analysis.
Data Warehouse
A centralized, structured repository designed for business intelligence reporting and analytics.
Analytic Sandbox
An analyst-controlled environment that gathers diverse data sources for flexible, high-performance exploration.
Business Intelligence (BI)
Technologies and processes for collecting, integrating, and reporting structured data to support day-to-day decision-making.
Data Science
A multidisciplinary field leveraging scientific methods, algorithms, and systems to extract knowledge from data, both structured and unstructured.
Predictive Modeling
The use of statistical or machine-learning algorithms to forecast future outcomes based on historical data.
Time Series Analysis
Analytical techniques that examine data points sequenced over time to identify trends and seasonal patterns.
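As a minimal sketch of one such technique, a simple moving average smooths short-term fluctuations so that an underlying trend is easier to see. The `monthly_sales` figures and the window size are made-up illustration values, not data from the lecture.

```python
def moving_average(values, window):
    """Return the simple moving average of `values` over a sliding window."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical monthly sales figures (illustration only).
monthly_sales = [10, 12, 14, 13, 15, 17]

# Each output point averages three consecutive months.
trend = moving_average(monthly_sales, 3)
print(trend)
```

A steadily rising smoothed series suggests an upward trend even when individual months dip.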
Exploratory Analytics
Open-ended data examination aimed at discovering patterns or relationships without predefined hypotheses.
Explanatory Analytics
Analysis focused on explaining why observed events occurred, often linking cause and effect.
Enterprise Data Warehouse (EDW)
A large, centralized data warehouse supporting enterprise-wide reporting, backups, and security.
Data Mart
A departmental subset of a data warehouse tailored for a specific business function’s analytics needs.
Data Extract
A copy of data removed from a main repository for analysis in external tools such as R or Excel.
ETL (Extract, Transform, Load)
The process of pulling data from sources, cleaning/transforming it, and loading it into a target system.
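The three ETL steps can be sketched with the Python standard library. This is an illustration under stated assumptions, not a production pipeline: the `RAW_CSV` source, the `sales` table, and the cleaning rule (drop rows with no amount) are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw source: a small CSV export with messy values.
RAW_CSV = """customer_id,amount
1, 19.99
2,
3,5.00
"""

def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no amount and convert field types."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # assumed cleaning rule: skip incomplete records
        cleaned.append((int(row["customer_id"]), float(amount)))
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

# An in-memory SQLite database stands in for the target system.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded_rows = conn.execute(
    "SELECT customer_id, amount FROM sales ORDER BY customer_id"
).fetchall()
print(loaded_rows)
```

Keeping each step in its own function makes it easy to swap the source or the target without touching the cleaning logic.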
NoSQL
Non-relational database technologies designed for flexible schemas and large-scale, distributed data storage.
Hadoop
An open-source framework that stores and processes large data sets across clusters of commodity hardware.
Big Data Driver
A factor—such as medical imaging, IoT sensors, or social media—that accelerates data growth and necessitates new analytics.
Sensor Net
A network of data-emitting devices (e.g., smartphones, smart meters) continuously generating real-time data streams.
Data Collector
An entity or system that gathers raw data from devices, applications, or networks for storage and preprocessing.
Data Aggregator
An organization that combines data from multiple collectors, organizes it, and sells or distributes it to users.
Data User/Buyer
A company or group that purchases or accesses aggregated data to inform business decisions or services.
Deep Analytical Talent
Highly technical professionals (e.g., data scientists) skilled in advanced analytics and machine learning on messy data.
Data-Savvy Professional
Business-focused individuals who understand data concepts well enough to frame and interpret analytical questions.
Technology & Data Enabler
IT professionals who design, build, and maintain the systems that store and process Big Data.
Data Scientist
A specialist who converts business problems into analytical tasks, builds models on large data sets, and translates results into actionable insights.
Quantitative Skills
Strong mathematical and statistical capabilities essential for rigorous data analysis.
Critical Thinking
The practice of questioning assumptions, validating results, and evaluating data from multiple perspectives.
Data Analytics Lifecycle
A six-phase, iterative process guiding data science projects from discovery through operationalization.
Discovery Phase
Lifecycle step focused on understanding business problems, resources, data sources, and initial hypotheses.
Data Preparation Phase
Lifecycle step where data is cleaned, transformed, and loaded into an analytic sandbox for use.
Model Planning Phase
Lifecycle step selecting analytical techniques, identifying variables, and drafting the modeling approach.
Model Building Phase
Lifecycle step where actual statistical or machine-learning models are created, trained, and tested.
Communicate Results Phase
Lifecycle step in which findings are presented to stakeholders through stories, visuals, and metrics.
Operationalize Phase
Lifecycle step delivering final reports, code, or pilots so models can be put into everyday business use.
Project Sponsor
The individual who funds the analytics project, sets goals, and judges its business value.
Business User
Domain expert or stakeholder who benefits from the analysis and advises on practical implementation.
Project Manager
Person responsible for ensuring analytics milestones, timelines, and quality standards are met.
BI Analyst
Professional who develops dashboards and reports, providing business context and data lineage knowledge.
Database Administrator (DBA)
Specialist who manages database performance, security, and data access for analytics teams.
Data Engineer
Developer who builds data pipelines, cleans data, and prepares analytic environments for data science work.
CRISP-DM
Cross-Industry Standard Process for Data Mining, a widely used methodology for data mining projects that influenced the design of the Data Analytics Lifecycle.
MAD Skills
An approach to Big Data analysis characterized as Magnetic, Agile, and Deep (MAD); its techniques informed the middle phases of the Data Analytics Lifecycle.
Operational Data Store (ODS)
An intermediate data repository integrating data from multiple sources for operational reporting.
BI vs. Data Science
BI answers what, when, and where questions using structured data, while Data Science addresses how and why questions using varied data and predictive methods.
Compliance Analytics
Use of data analysis to ensure adherence to laws and regulations, such as anti-money-laundering (AML) rules or the Sarbanes-Oxley Act.
Customer Churn
The rate at which customers stop doing business with a company, often predicted via Big Data models.
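One common way to compute churn for a period is customers lost divided by customers at the start of that period. The sketch below assumes that definition, and the counts are invented for illustration; other definitions exist.

```python
def churn_rate(customers_at_start, customers_lost):
    """Churn rate for one period: lost customers / customers at period start."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start

# Hypothetical quarter: 200 customers at the start, 10 of them left.
rate = churn_rate(200, 10)
print(f"{rate:.1%}")
```

A predictive churn model goes further by estimating which individual customers are likely to leave, rather than only measuring the aggregate rate.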
Upselling
Encouraging customers to purchase higher-end or additional products, often guided by analytics insights.
Cross-Selling
Offering complementary products to existing customers, frequently targeted through predictive analytics.
EDW Challenge: Accessibility
Difficulty data scientists face in obtaining data from enterprise warehouses, which prioritize operational reporting over ad hoc analysis.
EDW Challenge: Sampling
The need to use smaller data subsets in tools like R or Excel, potentially reducing model accuracy.
Shadow File System
Uncontrolled data copies created outside central IT oversight, often increasing risk and cost.
Iterative Process
A cyclical workflow where insights lead teams to revisit and refine earlier project phases.
Pilot Deployment
A limited rollout of a model or analytics solution to validate performance in a real environment before full launch.
Exabyte
A unit of digital information equal to 1,000 petabytes (10^18 bytes), illustrating the scale of modern Big Data.
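The relationship between these storage units can be checked with a small conversion helper. This sketch assumes decimal (SI) units, where each step up is a factor of 1,000; the `UNITS` table and `convert` function are illustrative names.

```python
# Bytes per unit, using decimal (SI) prefixes.
UNITS = {
    "TB": 10**12,  # terabyte
    "PB": 10**15,  # petabyte
    "EB": 10**18,  # exabyte
}

def convert(value, from_unit, to_unit):
    """Convert a quantity between the storage units listed in UNITS."""
    return value * UNITS[from_unit] / UNITS[to_unit]

exabyte_in_petabytes = convert(1, "EB", "PB")
exabyte_in_terabytes = convert(1, "EB", "TB")
print(exabyte_in_petabytes, exabyte_in_terabytes)
```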
Data Governance
Policies and procedures ensuring data quality, security, and proper usage across an organization.
Machine Learning
Field of study enabling systems to learn patterns from data and improve predictions automatically.
Scenario Optimization
Analytical technique that determines the best decision or strategy under given constraints and objectives.
Failover
A backup operational mode that automatically switches to a standby system if the primary system fails.
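The switching behavior can be sketched in a few lines. This is a simplified illustration, not how any particular system implements failover: `primary`, `standby`, and the use of `ConnectionError` as the failure signal are assumptions made for the example.

```python
def call_with_failover(primary, standby):
    """Try the primary system; switch to the standby if the primary fails."""
    try:
        return primary()
    except ConnectionError:
        # Assumed failure signal: the primary raises ConnectionError.
        return standby()

def failing_primary():
    """Hypothetical primary system that is currently unavailable."""
    raise ConnectionError("primary unavailable")

def healthy_standby():
    """Hypothetical standby system that answers normally."""
    return "served by standby"

result = call_with_failover(failing_primary, healthy_standby)
print(result)
```

Real failover setups usually add health checks and a way to switch back once the primary recovers.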