Big Data Chapter 4


Types of NoSQL

Key-Value Store - Stores data as a collection of key-value pairs, where each key serves as a unique identifier. Highly efficient for lookups, insertions, and deletions by key.

Document Store - Stores data in documents (typically JSON or BSON) that can contain nested structures. Ideal for storing, retrieving, and managing document-oriented information.

Wide-Column Store - Stores data in tables and rows with dynamic columns, so each row can have a different set of columns. Efficient for querying large datasets and suitable for distributed computing.

Graph Database - Stores data as nodes and edges, representing entities and their relationships. Ideal for analyzing interconnected data and running complex relationship queries.

Time Series - Optimized for handling time-stamped data. Ideal for analytics over time-series data such as financial data and IoT sensor data.

Multi-Model - Supports multiple data models on a single, integrated backend. This can include document, graph, key-value, in-memory, and search models.

Key-Value: Redis, DynamoDB

Document: MongoDB, CouchDB

Wide Column: Cassandra, HBase

Graph: Neo4j, OrientDB

Time Series: InfluxDB, TimescaleDB

Multi-Model: FaunaDB, ArangoDB

Redis

Redis (Remote Dictionary Server) is an in-memory data structure store.

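In CAP terms, Redis is usually described as prioritizing Consistency and Partition Tolerance (CP) when configured in a distributed setup. Because replication is asynchronous, this is not strong consistency: a failover can lose writes that were already acknowledged.

Key Features of Redis: Speed, Persistence Options, Scalability, Wide Use Cases, Advanced Features, Atomic Operations, and Data Structures such as Strings, Hashes, Lists, Sets, and Sorted Sets.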

 

The Redis server is the heart of the Redis system, handling all data storage, processing, and management tasks. A Redis database can be deployed in one of four configurations:

  • A simple database, i.e., a single primary shard.

  • A highly available (HA) database, i.e., a pair of primary and replica shards.

  • A clustered database, i.e., multiple primary shards, each managing a subset of the dataset.

  • An HA clustered database, i.e., multiple pairs of primary/replica shards.

 

Sharding: Splitting data across multiple Redis instances to distribute load and data volume. Each piece is called a shard. It's like breaking a big dataset into smaller, manageable pieces.

Cluster: A group of Redis nodes that share data. Provides a way to run Redis where data is automatically sharded across nodes.
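How a key ends up on a particular shard can be sketched in a few lines. Redis Cluster maps every key to one of 16384 hash slots using CRC16(key) mod 16384, and each primary shard owns a range of slots. The sketch below follows that rule, including hash tags (only the text inside the first non-empty {...} is hashed). The owner() helper and its even split of slots across shards are assumptions for illustration only; a real cluster keeps an explicit slot-to-node table.

```python
import binascii

NUM_SLOTS = 16384  # Redis Cluster divides the key space into 16384 hash slots


def hash_slot(key: str) -> int:
    """Return the Redis Cluster hash slot for a key (CRC16 mod 16384)."""
    data = key.encode("utf-8")
    # Hash tags: if the key contains a non-empty "{...}" section,
    # only the text inside the first such braces is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    # binascii.crc_hqx with initial value 0 is the CRC16-CCITT (XMODEM) variant
    return binascii.crc_hqx(data, 0) % NUM_SLOTS


def owner(slot: int, num_shards: int) -> int:
    """Hypothetical even split of the slot range across primary shards."""
    return slot * num_shards // NUM_SLOTS


# "{user:1}:cart" hashes only "user:1", so it lands in the same slot as "user:1".
for key in ["user:1", "user:2", "{user:1}:cart"]:
    slot = hash_slot(key)
    print(key, slot, owner(slot, num_shards=3))
```

Hash tags are what make multi-key operations possible in a cluster: keys that share a tag are guaranteed to live on the same shard.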

Replication: Copying data from one Redis server to another for redundancy and read scalability. The primary server's data is replicated to one or more secondary (replica) servers.

Transactions: Grouping commands to be executed as a single isolated operation, ensuring atomicity.

Atomicity - The most important reason to use transactions is that they guarantee all commands will be executed together without any other client's commands interrupting them.

Consistency in reads - The commands in a transaction are queued and then run back-to-back when EXEC is called, so no other client can change the data between them. Each command sees the effects of the commands that ran before it in the same transaction.

Batch operations - Transactions group multiple commands into one unit. When the client sends the whole MULTI ... EXEC block in a single request (as a pipeline), network round trips are reduced, which improves performance.

Optimistic locking with WATCH - When combined with the WATCH command, transactions provide a way to ensure data hasn't changed since you last read it.
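The check-and-retry idea behind WATCH can be modeled in plain Python. The sketch below is a toy in-memory model, not a Redis client: MiniStore, its version counters, and add_to_balance are hypothetical names made up for illustration. It only aims to show the behavior described above: EXEC is aborted if a watched key was written after WATCH, and the caller retries with fresh data.

```python
class MiniStore:
    """Toy in-memory model of WATCH / MULTI / EXEC (not a Redis client)."""

    def __init__(self):
        self.data = {}
        self.versions = {}  # key -> number of writes so far

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value
        self.versions[key] = self.versions.get(key, 0) + 1

    def watch(self, key):
        """Remember the key's current version, like WATCH."""
        return (key, self.versions.get(key, 0))

    def exec(self, watched, writes):
        """Apply all writes unless the watched key changed, like EXEC."""
        key, seen_version = watched
        if self.versions.get(key, 0) != seen_version:
            return None  # transaction aborted; the caller should retry
        for k, v in writes:
            self.set(k, v)
        return ["OK"] * len(writes)


def add_to_balance(store, key, amount, interfere=None):
    """Read-modify-write with retry, the usual WATCH pattern."""
    while True:
        watched = store.watch(key)
        current = store.get(key) or 0
        if interfere is not None:
            interfere()       # simulates another client writing in between
            interfere = None  # only interfere on the first attempt
        if store.exec(watched, [(key, current + amount)]) is not None:
            return store.get(key)


store = MiniStore()
store.set("balance", 100)
# Another client sets the balance to 500 after our read; the first EXEC
# aborts and the retry works from the fresh value.
result = add_to_balance(store, "balance", 10,
                        interfere=lambda: store.set("balance", 500))
print(result)  # 510
```

Without the version check, the first attempt would have written 110 and silently overwritten the other client's update.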

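Pipeline: Bundling multiple commands into one network round trip to reduce request/response latency. The client sends the commands without waiting for each reply and reads all the replies at the end. Unlike a transaction, a pipeline on its own does not guarantee atomicity.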

Persistence: Saving data to disk for durability. Redis offers RDB (snapshotting) and AOF (logging every write operation).

RDB (Redis Database)

RDB creates point-in-time snapshots of your dataset at specified intervals. It is generally faster for large datasets because it does not write every change to disk, which reduces I/O overhead. The trade-off is that writes made after the last snapshot are lost if the server crashes.

AOF (Append Only File)

Durability: Records every write operation received by the server. You can configure the fsync policy to balance between durability and performance.

Data Loss Risk: Less risk of data loss than RDB. Redis can be configured to sync the AOF to disk on every write or once per second.

Recovery Speed: Slower restarts compared to RDB because Redis replays the entire AOF to rebuild the state.
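The difference between the two approaches can be illustrated with a small simulation. This is a sketch under simple assumptions: the dataset is a Python dict, only three made-up command types are supported, and the snapshot point (after the third write) is chosen arbitrarily. It is not how Redis stores either file on disk.

```python
import copy


def apply(state, command):
    """Apply one write command (a tuple) to a dict-based dataset."""
    op, key, *args = command
    if op == "SET":
        state[key] = args[0]
    elif op == "INCR":
        state[key] = int(state.get(key, 0)) + 1
    elif op == "DEL":
        state.pop(key, None)
    else:
        raise ValueError(f"unsupported command: {op}")


live = {}
aof_log = []          # AOF-style: every write is appended
rdb_snapshot = {}     # RDB-style: a copy taken now and then

writes = [("SET", "user", "ana"), ("INCR", "visits"), ("INCR", "visits"),
          ("SET", "user", "bo"), ("INCR", "visits")]

for i, cmd in enumerate(writes, start=1):
    apply(live, cmd)
    aof_log.append(cmd)
    if i == 3:  # hypothetical snapshot point after the third write
        rdb_snapshot = copy.deepcopy(live)

# Simulated crash: rebuild the dataset from each persistence artifact.
from_rdb = copy.deepcopy(rdb_snapshot)
from_aof = {}
for cmd in aof_log:
    apply(from_aof, cmd)

print(from_rdb)  # {'user': 'ana', 'visits': 2}
print(from_aof)  # {'user': 'bo', 'visits': 3}
```

Restoring from the snapshot is a single copy but misses the last two writes. Replaying the log recovers everything but has to process every command, which is why AOF restarts are slower.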

Multi-Model Database

 

SETNX: Sets the value of a key only if the key does not exist.

SETEX: Sets the value of a key with an expiration time.

MSET: Sets multiple keys to multiple values in a single atomic operation.

MGET: Gets the values of all the given keys.

INCR: Increments the integer value of a key by one.

DECR: Decrements the integer value of a key by one.

INCRBY: Increments the integer value of a key by the given amount.

DECRBY: Decrements the integer value of a key by the given number.

INCRBYFLOAT: Increments the float value of a key by the given amount.

GETSET: Sets a new value and returns the old value.

MSETNX: Sets multiple keys to multiple values only if none exist.

PSETEX: Similar to SETEX, but with an expiration time in milliseconds.
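The semantics of a few of these commands are easy to get wrong, so here is a minimal sketch that models them on a plain Python dict. The function names mirror the Redis commands, but this is an illustration of the documented behavior (return values, treatment of missing keys), not client code that talks to a server.

```python
store = {}  # stands in for the Redis keyspace; values are stored as strings


def setnx(key, value):
    """SETNX: set only if the key is absent. Returns 1 if set, else 0."""
    if key in store:
        return 0
    store[key] = str(value)
    return 1


def incrby(key, amount=1):
    """INCRBY: a missing key counts as 0; the new value is returned."""
    new_value = int(store.get(key, "0")) + amount
    store[key] = str(new_value)
    return new_value


def getset(key, value):
    """GETSET: store the new value and return the previous one (or None)."""
    old = store.get(key)
    store[key] = str(value)
    return old


def msetnx(mapping):
    """MSETNX: all-or-nothing; sets every key only if none of them exist."""
    if any(key in store for key in mapping):
        return 0
    store.update({k: str(v) for k, v in mapping.items()})
    return 1


print(setnx("lock", "owner-1"))           # 1
print(setnx("lock", "owner-2"))           # 0
print(incrby("hits"), incrby("hits", 5))  # 1 6
print(getset("hits", 0))                  # 6
print(msetnx({"a": 1, "lock": "x"}))      # 0
print("a" in store)                       # False
```

The last two lines show the all-or-nothing behavior of MSETNX: because "lock" already exists, "a" is not set either.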

  • LPUSH adds a new element to the head of a list; RPUSH adds to the tail.

  • LPOP removes and returns an element from the head of a list; RPOP does the same from the tail.

  • LLEN returns the length of a list.

  • LMOVE atomically moves an element from one list to another.

  • LTRIM reduces a list to the specified range of elements.

FIFO: First In, First Out (a queue, e.g., LPUSH to add and RPOP to remove)

LIFO: Last In, First Out (a stack, e.g., LPUSH to add and LPOP to remove)
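Whether a Redis list behaves as a queue or a stack depends only on which ends you push to and pop from. The sketch below models the list with Python's collections.deque. The lpush/rpush/lpop/rpop helpers are stand-ins named after the Redis commands, not calls to a server.

```python
from collections import deque

items = deque()


def lpush(value):
    items.appendleft(value)   # head of the list


def rpush(value):
    items.append(value)       # tail of the list


def lpop():
    return items.popleft()


def rpop():
    return items.pop()


# FIFO queue: push at the head, pop from the tail.
for job in ["job1", "job2", "job3"]:
    lpush(job)
fifo_order = [rpop() for _ in range(3)]
print(fifo_order)  # ['job1', 'job2', 'job3']

# LIFO stack: push and pop at the same end.
for page in ["page1", "page2", "page3"]:
    lpush(page)
lifo_order = [lpop() for _ in range(3)]
print(lifo_order)  # ['page3', 'page2', 'page1']
```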

SORT with the ALPHA modifier sorts elements lexicographically (as strings) instead of numerically.

 

A Redis set is an unordered collection of unique strings (members).

 

Redis hashes are record types structured as collections of field-value pairs. You can use hashes to represent basic objects and to store groupings of counters, among other things.

  • HSET sets the value of one or more fields on a hash.

  • HGET returns the value at a given field.

  • HMGET returns the values at one or more given fields.

  • HINCRBY increments the value at a given field by the integer provided.
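The hash commands above can be pictured as operations on a dict of dicts. The sketch below is a plain-Python model whose helper names mirror the commands. The key "bike:1" and its fields are example data, and values are kept as strings because Redis hash values are strings.

```python
hashes = {}  # key -> {field: value}; values kept as strings, as in Redis


def hset(key, **fields):
    """HSET: set one or more fields; returns how many fields were new."""
    h = hashes.setdefault(key, {})
    added = sum(1 for f in fields if f not in h)
    h.update({f: str(v) for f, v in fields.items()})
    return added


def hget(key, field):
    """HGET: value of one field, or None if the key or field is missing."""
    return hashes.get(key, {}).get(field)


def hmget(key, *fields):
    """HMGET: values in the order requested; None for missing fields."""
    return [hget(key, f) for f in fields]


def hincrby(key, field, amount):
    """HINCRBY: a missing field counts as 0; returns the new value."""
    h = hashes.setdefault(key, {})
    h[field] = str(int(h.get(field, "0")) + amount)
    return int(h[field])


print(hset("bike:1", model="Deimos", price=4972))  # 2
print(hmget("bike:1", "model", "price", "color"))  # ['Deimos', '4972', None]
print(hincrby("bike:1", "price", 100))             # 5072
```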

EXPIRE sets a key's time to live (TTL). The key is automatically deleted from Redis once the specified duration (in seconds) has elapsed.
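Key expiry can be sketched as a deadline stored next to each key. The model below is a toy with hypothetical names (ExpiringStore and the fake clock are made up for illustration). It uses an injectable clock so the example runs instantly, and it only expires keys lazily on access, whereas Redis also removes expired keys in the background. The return values of expire() and ttl() follow the documented EXPIRE and TTL conventions.

```python
class ExpiringStore:
    """Toy key store with per-key TTL, using an injectable clock for testing."""

    def __init__(self, clock):
        self.clock = clock      # callable returning the current time in seconds
        self.data = {}
        self.expires_at = {}

    def set(self, key, value):
        self.data[key] = value
        self.expires_at.pop(key, None)   # a plain SET clears any existing TTL

    def expire(self, key, seconds):
        """Like EXPIRE: returns 1 if the timeout was set, 0 if the key is missing."""
        if self.get(key) is None:
            return 0
        self.expires_at[key] = self.clock() + seconds
        return 1

    def ttl(self, key):
        """Like TTL: -2 if the key is missing, -1 if it has no expiry."""
        if self.get(key) is None:
            return -2
        if key not in self.expires_at:
            return -1
        return int(self.expires_at[key] - self.clock())

    def get(self, key):
        # Lazy expiry: delete the key on access once its deadline has passed.
        deadline = self.expires_at.get(key)
        if deadline is not None and self.clock() >= deadline:
            del self.data[key]
            del self.expires_at[key]
        return self.data.get(key)


now = [0]                          # fake clock we can advance by hand
store = ExpiringStore(clock=lambda: now[0])
store.set("session", "abc")
store.expire("session", 30)
now[0] = 10
print(store.ttl("session"))        # 20
now[0] = 30
print(store.get("session"))        # None
print(store.ttl("session"))        # -2
```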

 

 

Redis Pub/Sub (Publish/Subscribe) is a messaging paradigm within Redis that allows for message broadcasting through channels. This feature enables the development of real-time messaging applications by allowing publishers to send messages to an unspecified number of subscribers asynchronously.
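The channel-based fan-out described above can be sketched with a small in-memory broker. Broker and the inbox lists are hypothetical names for illustration, and delivery here is a direct function call rather than a network push. The point is the delivery model: a message goes to whoever is subscribed at that moment and is not stored for later subscribers.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory model of channel-based publish/subscribe."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver to current subscribers; returns how many received it."""
        receivers = self.subscribers.get(channel, [])
        for callback in receivers:
            callback(message)
        return len(receivers)


broker = Broker()
inbox_a, inbox_b = [], []

print(broker.publish("news", "too early"))   # 0 (nobody subscribed yet)
broker.subscribe("news", inbox_a.append)
broker.subscribe("news", inbox_b.append)
print(broker.publish("news", "hello"))       # 2
print(inbox_a, inbox_b)                      # ['hello'] ['hello']
```

The first message is lost because no one was listening. That fire-and-forget behavior is why Pub/Sub suits live notifications better than work that must not be dropped.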

 

Proximity Searches: These find items close to a given point, such as the nearest restaurants to a user's location.

Radius Queries: These queries retrieve items within a specific distance from a point. They are useful for services like delivery area checks or local event discovery.

Distance Calculation: Calculate the distance between two geo points.

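Redis geospatial commands:

  • GEOADD adds a point (longitude, latitude, member name) to a geospatial index.

  • GEODIST returns the distance between two points.

  • GEOSEARCH finds all points within a radius (or a bounding box) of a location.

GEOSEARCH (available since Redis 6.2) does the same job as the older GEORADIUS, which is now deprecated, with a cleaner syntax.

Geohashing is a method of encoding geographic coordinates (latitude and longitude) into a compact string of letters and digits.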
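Both ideas, geohash encoding and distance calculation, fit in a short sketch. The geohash function below implements the standard base32 geohash (repeatedly halving the longitude and latitude ranges and interleaving the resulting bits). Redis stores an integer variant of this internally, so treat it as an illustration of the encoding idea. The distance function is the haversine formula. The default Earth radius is an assumption taken to match the value used by Redis, and the function names are made up for this example.

```python
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash(lat, lon, length=9):
    """Encode latitude/longitude as a standard base32 geohash string."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, use_lon = [], True          # bits alternate, longitude first
    while len(bits) < length * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid              # keep the upper half
        else:
            bits.append(0)
            rng[1] = mid              # keep the lower half
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        index = int("".join(map(str, bits[i:i + 5])), 2)
        chars.append(BASE32[index])
    return "".join(chars)


def haversine_m(lon1, lat1, lon2, lat2, radius_m=6372797.560856):
    """Great-circle distance in meters between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


print(geohash(57.64911, 10.40744, length=5))  # u4pru

# Palermo and Catania, roughly 166 km apart.
palermo = (13.361389, 38.115556)
catania = (15.087269, 37.502669)
print(round(haversine_m(*palermo, *catania) / 1000), "km")
```

Nearby points share a geohash prefix, which is what lets a sorted index answer radius queries by scanning a few prefix ranges instead of every point.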

 

Redis Stack extends the core features of Redis OSS and provides a complete developer experience for debugging and more.

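  • JSON.SET sets the JSON value at a path in a key.

  • JSON.GET returns the value at one or more paths.

  • JSON.DEL deletes the value at a path.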

 

Redis Search (or RediSearch) is a full-text search and secondary indexing engine for Redis. It allows for performing complex searches and filtering over the data stored in Redis without needing a relational database. This powerful module enables advanced querying capabilities like full-text search, filtering, aggregation, and auto-complete.

Key Features of Redis Search: Full-Text Search, Secondary Indexing, Complex Querying, Autocomplete, Faceted Search, Aggregation

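FT.CREATE creates an index with a schema that describes the fields to index.

FT.SEARCH runs a query against an index and returns the matching documents.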
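The general idea behind a full-text index can be sketched with a tiny inverted index: building it corresponds loosely to what happens after FT.CREATE, and querying it corresponds loosely to FT.SEARCH. This is a deliberately simplified illustration with made-up names (TinyIndex) and sample documents. The real engine adds scoring, stemming, field types, and much more.

```python
from collections import defaultdict
import re


class TinyIndex:
    """Toy inverted index: maps each word to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query word (AND)."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return []
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return sorted(result)


index = TinyIndex()
index.add("bike:1", "Lightweight mountain bike for kids")
index.add("bike:2", "Road bike with carbon frame")
index.add("bike:3", "Mountain bike with full suspension")

print(index.search("mountain bike"))  # ['bike:1', 'bike:3']
print(index.search("carbon"))         # ['bike:2']
print(index.search("unicycle"))       # []
```

Because the lookup goes from word to documents, a query touches only the postings for its own words instead of scanning every stored value.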

 

RDB (Redis Database): This method takes snapshots of your database at specified intervals. It's efficient for saving a compact, point-in-time snapshot of your dataset. The RDB file is a binary file that Redis can use to restore its state.

SAVE

Synchronous save of the dataset to disk. When Redis starts the save operation, it blocks all clients until the save is complete. Not recommended in production.

BGSAVE

Asynchronous save of the dataset to disk. Redis forks a child process that writes the snapshot in the background while the server keeps serving clients.

 

 

AOF (Append Only File): This method logs every write operation the server receives, appending each operation to a file. This allows more granular persistence and better durability than RDB, because you can configure Redis to sync the file on every write operation, at the cost of performance. The AOF file can be replayed to reconstruct the state of the data.