NoSQL & SQL Databases Overview

NoSQL

  • Definition: NoSQL refers to a class of non-relational database management systems (DBMS) that do not require a fixed schema, generally avoid joins, and are designed to scale out easily.
  • Use Case: Best suited for distributed data stores that hold very large volumes of data; commonly used for Big Data and real-time web applications (e.g., at Twitter and Facebook).
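
A minimal sketch of the "no fixed schema, no joins" idea, using plain Python dicts as stand-ins for documents. The `users` collection and the `insert` and `find` helpers are hypothetical names for illustration only and do not correspond to any real NoSQL product's API.

```python
import json

# Hypothetical in-memory "collection": a list of documents (plain dicts).
# No schema is declared up front, so documents may carry different fields.
users = []

def insert(collection, document):
    """Store a deep copy of the document as-is, without validating its fields."""
    collection.append(json.loads(json.dumps(document)))

def find(collection, **criteria):
    """Return documents whose top-level fields match all given criteria."""
    return [doc for doc in collection
            if all(doc.get(key) == value for key, value in criteria.items())]

insert(users, {"name": "Ada", "email": "ada@example.com"})
# The second document has different fields and embeds its orders directly,
# instead of keeping them in a separate table that would need a join.
insert(users, {"name": "Lin", "city": "Oslo",
               "orders": [{"item": "book", "qty": 2}]})

lin = find(users, name="Lin")[0]
total_qty = sum(order["qty"] for order in lin["orders"])
```

Embedding related data inside one document is the usual way such stores sidestep joins, at the cost of possible duplication across documents.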

SQL

  • Definition: Structured Query Language (SQL) is the standard language for interacting with relational databases, which store data in tables and relate them to one another through keys.
  • Use Case: Effective for inserting, searching, updating, and deleting records in databases; relational databases are primarily used for Online Transaction Processing (OLTP), though SQL is also widely used for analytical (OLAP) queries.
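
The four operations named above (insert, search, update, delete) can be sketched with Python's built-in sqlite3 module. The `employees` table and its rows are made-up sample data, and SQLite merely stands in for any relational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT NOT NULL)"
)

# Insert
conn.executemany(
    "INSERT INTO employees (name, dept) VALUES (?, ?)",
    [("Ada", "Engineering"), ("Lin", "Sales"), ("Sam", "Sales")],
)

# Search
sales = conn.execute(
    "SELECT name FROM employees WHERE dept = ? ORDER BY name", ("Sales",)
).fetchall()

# Update
conn.execute("UPDATE employees SET dept = ? WHERE name = ?", ("Engineering", "Sam"))

# Delete
conn.execute("DELETE FROM employees WHERE name = ?", ("Lin",))

remaining = conn.execute(
    "SELECT name, dept FROM employees ORDER BY name"
).fetchall()
conn.close()
```

The `?` placeholders let the driver bind values safely rather than building SQL strings by hand.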

Comparison of SQL and NoSQL

Parameters:
  • Design for:

    • SQL: Relational databases -> RDBMS
    • NoSQL: Non-relational databases (distributed database technologies)
  • Query Language Type:

    • SQL: Uses declarative SQL syntax
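    • NoSQL: No single standard query language; each database has its own API or query language (e.g., MongoDB's query API, Cassandra's CQL, Neo4j's Cypher)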
  • Schema:

    • SQL: Predefined schema
    • NoSQL: Dynamic schema that accommodates semi-structured and unstructured data
  • Ability to Scale:

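    • SQL: Primarily vertically scalable (scale up by adding CPU, RAM, or storage to a single server)
    • NoSQL: Horizontally scalable (scale out by adding more servers and distributing data across them)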
  • Examples:

    • SQL: Oracle, Postgres, MS-SQL
    • NoSQL: MongoDB, Redis, Neo4j, Cassandra, HBase
  • Complex Queries:

    • SQL: Ideal for complex, query-intensive environments
    • NoSQL: Generally less suited to complex queries such as multi-table joins; better for hierarchical data storage
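
To make "horizontally scalable" concrete, here is a toy sketch of hash-based sharding, one common way of spreading keys across several nodes. `ShardedStore` and its methods are hypothetical; the "nodes" are just in-memory dicts, and real systems add replication, rebalancing, and failure handling that are omitted here.

```python
import hashlib

class ShardedStore:
    """Toy key-value store that spreads keys over several in-memory 'nodes'."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Use a stable hash so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key).get(key)

store = ShardedStore(node_count=3)
for i in range(100):
    store.put(f"user:{i}", {"id": i})

# Number of keys that ended up on each node.
sizes = [len(node) for node in store.nodes]
```

With plain modulo placement, changing the node count remaps most keys; systems such as Cassandra use consistent hashing to limit that movement.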
Development Timeline:
  • SQL: Developed in the 1970s
  • NoSQL: Emerged in the late 2000s to address the scaling and schema-flexibility limitations of relational databases
Open Source and Consistency:
  • SQL: Includes open-source solutions (e.g., Postgres, MySQL) and proprietary solutions (e.g., Oracle).
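  • NoSQL: Many are open-source, though proprietary managed services also exist; consistency guarantees vary by system and configuration (e.g., MongoDB provides strong consistency for reads from the primary by default, while Cassandra defaults to eventual consistency with tunable consistency levels).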
Best-Fit Use Cases:
  • SQL: Use when data validity is crucial; ideal for workloads that need ACID transactions.
  • NoSQL: Suitable when fast data availability matters more than strict consistency, for example with rapidly changing data shapes or workloads that must scale out.
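
A minimal sketch of why ACID transactions protect data validity, again using Python's built-in sqlite3 module. The `accounts` table, the `transfer` helper, and the balances are made-up illustrations: a transfer that would overdraw an account is rolled back as a whole, so no partial update survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit to
        # the destination account has been rolled back as well.
        return False

ok = transfer(conn, "alice", "bob", 30)         # succeeds
rejected = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The credit is deliberately applied before the debit, so the failing case shows the rollback undoing a change that had already been made.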

RDBMS vs Hadoop

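  • Schema: RDBMS uses 'Schema on Write' (data is validated against the schema when it is stored); Hadoop employs 'Schema on Read' (raw data is stored as-is and a schema is applied when it is read).
  • Data Type: RDBMS is designed for structured data; Hadoop accommodates structured, semi-structured, and unstructured data.
  • Speed: RDBMS reads are fast because data is already validated and structured at write time; Hadoop writes are fast because raw data is stored without upfront validation.
  • Cost: Commercial RDBMS products typically involve licensing costs (open-source options such as Postgres and MySQL do not); Hadoop is an open-source framework.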
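
The difference between 'Schema on Write' and 'Schema on Read' can be sketched in a few lines of Python. `REQUIRED_FIELDS`, `validate`, `write_validated`, and `read_with_schema` are hypothetical names for illustration; neither an RDBMS nor Hadoop exposes these functions.

```python
import json

REQUIRED_FIELDS = {"user_id": int, "event": str}  # hypothetical schema

def validate(record):
    """Raise ValueError unless the record has every required field with the right type."""
    if not isinstance(record, dict):
        raise ValueError("record is not an object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), expected_type):
            raise ValueError(f"bad or missing field: {field}")
    return record

def write_validated(table, record):
    """Schema on write: validate before storing, so stored rows are always clean."""
    table.append(validate(record))

def read_with_schema(raw_lines):
    """Schema on read: raw lines were stored untouched; interpret them only now."""
    rows = []
    for line in raw_lines:
        try:
            rows.append(validate(json.loads(line)))
        except ValueError:  # also covers json.JSONDecodeError, a ValueError subclass
            continue  # skip lines that do not fit the schema applied at read time
    return rows

table = []
write_validated(table, {"user_id": 1, "event": "login"})
try:
    write_validated(table, {"user_id": "oops"})
    write_rejected = False
except ValueError:
    write_rejected = True

raw_lines = ['{"user_id": 2, "event": "click"}', "not json at all", '{"event": "view"}']
parsed = read_with_schema(raw_lines)
```

In the first path the cost of validation is paid once per write; in the second, writes are cheap but every read has to cope with malformed input.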
Distributed Computing Challenges:
  • Key Challenges:
    • Transparency
    • Concurrency
    • Openness