Vocabulary flashcards covering analytics, scalability concepts, and the analytics tools mentioned in the lecture notes.
Analytics
The scientific process of discovering and communicating meaningful patterns in data to turn raw data into insights for better decisions; relies on statistics, programming, and operations research.
Data Scalability
The ability of a system to adapt over time to changes in data volume or processing demands, scaling performance and cost up or down as needed.
Data Scaling
A technique for managing data overflow by increasing system capacity, which enables scalable data platforms and accommodates growth.
Scalable Data Platform
A data platform that can rapidly adjust to data growth or traffic by adding hardware or software, preparing for future needs.
Scaling Up (Vertical Scaling)
Adding faster processors and more memory to a single server; uses less network hardware and power than adding servers, but is often only a short-term fix for growth.
Scaling Out (Horizontal Scaling)
Adding more servers to perform parallel computing; a long-term solution that grows with demand by expanding the cluster.
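Example (illustrative only): the idea of scaling out can be sketched in Python by splitting one job across several workers and combining their partial results. Threads stand in for servers here, and the names `partition` and `scale_out_sum` are hypothetical, not from the lecture notes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_workers):
    """Split items into n_workers roughly equal slices (round-robin)."""
    return [items[i::n_workers] for i in range(n_workers)]


def scale_out_sum(items, n_workers):
    """Sum items by giving each worker one slice, then combining partial sums.

    Each worker plays the role of one server in a cluster; raising
    n_workers mimics adding servers to the cluster.
    """
    slices = partition(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, slices))
    return sum(partial_sums)


if __name__ == "__main__":
    data = list(range(100))
    # The answer is the same however many workers share the job.
    print(scale_out_sum(data, 2), scale_out_sum(data, 4))
```

The result does not depend on the number of workers; only the share of work each one handles changes, which is why a cluster can keep growing with demand.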
Big Data
Extremely large and complex datasets that require advanced processing, storage, and analytics beyond traditional systems.
High CPU Usage
A key performance issue where CPU utilization is near capacity, slowing processing tasks.
Low Memory
A performance bottleneck caused by insufficient RAM, leading to slower data processing and possible swapping.
High Disk Usage
Excessive disk I/O and storage utilization causing slower data access and processing.
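Example (illustrative only): the three performance issues above (high CPU usage, low memory, high disk usage) can be checked against simple thresholds. This is a minimal sketch; the 90% and 10% thresholds and the function names are assumptions, not values from the lecture notes, and it covers disk storage utilization only, not disk I/O.

```python
import shutil


def disk_percent_used(path="/"):
    """Return the percentage of storage in use on the disk holding path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def find_bottlenecks(cpu_pct, mem_free_pct, disk_pct,
                     cpu_limit=90.0, mem_free_floor=10.0, disk_limit=90.0):
    """Label which of the three issues apply to the given readings.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    issues = []
    if cpu_pct >= cpu_limit:
        issues.append("high CPU usage")
    if mem_free_pct <= mem_free_floor:
        issues.append("low memory")
    if disk_pct >= disk_limit:
        issues.append("high disk usage")
    return issues


if __name__ == "__main__":
    # CPU and memory readings are made-up inputs; disk is read from the system.
    print(find_bottlenecks(cpu_pct=95.0, mem_free_pct=40.0,
                           disk_pct=disk_percent_used(".")))
```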
Tableau
A data visualization tool used to create interactive dashboards and visual analytics.
Python
A widely used programming language for data analysis, scripting, and machine learning.
Microsoft Excel
Spreadsheet software used for data analysis, calculations, and charts.
SPSS
Statistical Package for the Social Sciences; software for statistical analysis.
SAS
Statistical Analysis System; analytics software for data management and analysis.
Apache Spark
Open-source distributed data processing engine for large-scale analytics.
Hadoop
Open-source framework for distributed storage and processing of big data across clusters.
RapidMiner
Data science platform for data preparation, modeling, and deployment.
Serverless Computing
Cloud model where the provider manages server provisioning and scaling, allowing code execution without managing servers.
Artificial Intelligence
Field of computing focused on creating systems that perform tasks requiring human-like intelligence.
Cloud Computing
Delivery of computing services over the internet, including storage, processing, and analytics.
Blockchain
Distributed ledger technology enabling secure, transparent recording of transactions.
Syllabus
Document outlining course topics, requirements, assessments, and grading.
Data-driven Decision Making
Decision making guided by data insights rather than intuition alone.