Data Analytics - Week 3

Quiz Questions and Answers Overview

The first quiz question involves identifying a key feature or benefit not usually associated with Relational Database Management Systems (RDBMS).
- Correct Answer: Analytics hardware can scale out horizontally to support big data.
- Explanation: Traditional relational databases are designed primarily for vertical scaling: capacity grows by adding resources (CPU, RAM, storage) to a single server. They are not built to distribute data across multiple independent servers (horizontal scaling) the way modern distributed systems or analytics hardware do for big data workloads. Some relational products and architectures do offer sharding, but it is not a native or defining characteristic of a conventional RDBMS, which typically relies on a centralized architecture.
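
To make the idea of horizontal scaling concrete, here is a minimal sketch of hash-based sharding, one common way to spread rows across several servers. The function names (shard_for, distribute), the "customer-N" keys, and the choice of four shards are all hypothetical and chosen only for illustration; real systems usually add rebalancing and replication on top of this.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    # sha256 gives the same result on every run and every machine, unlike
    # Python's built-in hash(), which is randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard that would store them."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


# Assumed example data: 1000 customer IDs spread over 4 shards (servers).
customer_ids = [f"customer-{n}" for n in range(1000)]
placement = distribute(customer_ids, 4)

for shard, keys in placement.items():
    print(f"shard {shard}: {len(keys)} keys")
```

Each key always maps to the same shard, so a lookup only has to contact one server. The cost is that queries spanning many keys (joins, aggregates) must touch several shards, which is part of the added complexity the explanation refers to.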

Key Features of Relational Databases

Schema on Write: Relational databases enforce a rigid, predefined schema that dictates the structure of data, data types, and relationships. This schema must be defined before any data is written, and every write is validated against it, ensuring high data integrity and consistency.
Data Normalization: A core principle in RDBMS design, data normalization is the process of organizing the columns and tables of a relational database to minimize data redundancy and improve data integrity. It keeps attribute values atomic and the relationships between entities well-defined, reducing the likelihood of data anomalies (e.g., insertion, update, and deletion anomalies).
ACID Transactions: RDBMSs are built to guarantee ACID properties (Atomicity, Consistency, Isolation, and Durability), which are crucial for ensuring the reliability of database transactions. These properties protect data integrity during concurrent operations and system failures.
Security and Access Control: Relational databases typically offer robust security models, including fine-grained access controls, user authentication, authorization mechanisms, and encryption capabilities (both at rest and in transit) to protect sensitive data.
Horizontal Scaling: As noted, this is generally not a native strength of relational databases. They are primarily designed for vertical scaling (scaling up), where performance is improved by adding more power to a single machine. Achieving horizontal scalability (scaling out) in an RDBMS often requires techniques such as sharding or replication, which add significant architectural complexity.
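
Two of the features above, schema on write and ACID transactions, can be sketched with Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in for an RDBMS; the accounts table, its columns, and the transfer helper are hypothetical and not taken from the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema on write: the structure and constraints exist before any data does.
conn.execute(
    """
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        owner   TEXT    NOT NULL,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
    """
)
conn.execute("INSERT INTO accounts (id, owner, balance) VALUES (1, 'alice', 100)")
conn.execute("INSERT INTO accounts (id, owner, balance) VALUES (2, 'bob', 50)")
conn.commit()

# A row that violates the schema (missing the required owner) is rejected.
schema_rejected = False
try:
    conn.execute("INSERT INTO accounts (id, balance) VALUES (3, 10)")
except sqlite3.IntegrityError:
    schema_rejected = True
conn.rollback()


def transfer(conn, src, dst, amount):
    """Move money between accounts atomically (hypothetical helper)."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


# Overdraft: the credit succeeds, the debit violates CHECK, both are undone.
failed = transfer(conn, 1, 2, 500)
balances_after_failure = balances(conn)

# A valid transfer commits both updates together.
ok = transfer(conn, 1, 2, 30)
balances_after_success = balances(conn)

print(schema_rejected, failed, balances_after_failure, ok, balances_after_success)
```

The failed transfer illustrates atomicity and consistency: the credit to the second account is undone when the debit breaks the CHECK constraint, so no partial update is ever visible.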

Distributed Database Characteristics

Node Failure Resilience: A critical advantage of distributed databases is that they keep operating even when one or more individual nodes fail. This resilience comes from replicating and distributing data across multiple nodes, which provides high availability and fault tolerance.
Data Consistency and Availability: Distributed databases operate under trade-offs between consistency, availability, and partition tolerance (as described by the CAP theorem): during a network partition, a system must give up either consistency or availability. Consequently, data read by analytics applications does not necessarily reflect the latest state across all nodes. Systems that prioritize availability typically offer eventual consistency, where data may be temporarily stale on some nodes.
Ease of Deployment and Maintenance: Deploying, managing, and maintaining a distributed database is inherently more complex than doing so for a traditional centralized relational database. This complexity arises from managing data distribution, replication, and consistency models, and from coordinating operations across many independent nodes.
Database Topology: Applications interacting with distributed databases typically do not need to be aware of the exact underlying topology or data distribution. This is often facilitated by