A-Level Computer Science

NoSQL

133 Terms

Why NoSQL?

•Relational databases have been a successful technology for twenty years, providing persistence, concurrency control, and an integration mechanism.

•Application developers have been frustrated with the impedance mismatch between the relational model and the in-memory data structures.

•The vital factor for a change in data storage was the need to support large volumes of data by running on clusters.

•Relational databases are not designed to run efficiently on clusters.

New cards

Common characteristics of a NoSQL database

•No relational model

•Suited to clusters

•Open-source

•Suits unstructured data - Schema less

New cards

Current NoSQL trends

customer shift continues online
the internet is connecting everything
big data is getting better
applications are moving to the cloud
the world has gone mobile

New cards

Different NoSQL database models

key value
document
column family
graph (aggregate ignorant)

New cards

Aggregate orientated

Makes it easier for the database to manage data storage over clusters

New cards

Aggregate

A collection of data that can be interrogated as a unit

New cards

Aggregate orientated works well when

most of the time the same aggregate is needed

New cards

aggregate orientated does not work well when

users regularly change how they interrogate the data

New cards

update consistency

write-write conflicts (lost updates, values overwritten)

New cards

read consistency

read-write conflict (reading in the middle of someone else’s write)

New cards

relaxed consistency

consistent DB is possible, but at what performance impact? Tradeoff may be necessary and the domain may tolerate some inconsistency

New cards

The CAP Theorem (Consistency, Availability, Partition Tolerance)

Given the three properties, you can only get two

New cards

Consistency

When data is queried, the user receives the most up-to-date version of the data

New cards

Availability

When data is queried, the user always receives a response, even if it is not the most up-to-date version

New cards

Partition Tolerance

Where a database is distributed over a clustered network, should part of the network fail, the rest of the clustered network can continue to operate (partitions on the network can be tolerated)

New cards

if the network is working normally

all nodes are operating normally, reading, writing, and syncing with each other (Consistency AND availability)

New cards

if the network becomes partitioned

parts of the network have failed, which partitions the network so that nodes can't communicate with each other normally (consistency OR availability)

New cards

Write-write conflicts occur when

two clients try to write the same data at the same time
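
A minimal sketch of the lost update that a write-write conflict produces. The `store` dictionary and the stock figures are made up for illustration; no real database is involved.

```python
# Hypothetical shared record; both clients read it before either writes.
store = {"stock": 10}

client_a_view = store["stock"]  # client A reads 10
client_b_view = store["stock"]  # client B also reads 10

store["stock"] = client_a_view - 3  # A sells 3 and writes 7
store["stock"] = client_b_view - 2  # B sells 2 and writes 8, overwriting A's write

lost_update_result = store["stock"]  # A's update has been lost
expected_if_serial = 10 - 3 - 2      # value if the writes had run one after the other
```

The final value reflects only client B's sale, which is the "values overwritten" case named on the update consistency card.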

New cards

read-write conflicts occur when

one client reads inconsistent data in the middle of another client’s write

New cards

pessimistic approaches lock data records to

prevent conflicts

New cards

optimistic approaches detect conflicts and

fix them
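
One common way to detect a conflict optimistically is a version check on write. The sketch below assumes a hypothetical in-memory `VersionedStore`; the class, method names and retry count are illustrative and not taken from any particular product.

```python
class VersionConflict(Exception):
    """Raised when a record changed after the client read it."""


class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        return self._data.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            raise VersionConflict(key)  # conflict detected, nothing is overwritten
        self._data[key] = (current_version + 1, value)
        return current_version + 1


def update_with_retry(store, key, change, attempts=3):
    """Fix a detected conflict by re-reading and re-applying the change."""
    for _ in range(attempts):
        version, value = store.read(key)
        try:
            return store.write(key, version, change(value))
        except VersionConflict:
            continue
    raise VersionConflict(key)


store = VersionedStore()
store.write("stock", 0, 10)
version_a, value_a = store.read("stock")
version_b, value_b = store.read("stock")
store.write("stock", version_a, value_a - 3)  # client A wins the race
try:
    store.write("stock", version_b, value_b - 2)  # client B holds a stale version
    conflict_detected = False
except VersionConflict:
    conflict_detected = True
update_with_retry(store, "stock", lambda v: v - 2)  # B retries on fresh data
```

A pessimistic approach would instead take a lock before the read, so the second client would wait rather than fail and retry.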

New cards

distributed systems see read-write conflicts due to

some nodes having received updates while other nodes have not

New cards

eventual consistency

at some point the system will become consistent once all the writes have propagated to all the nodes
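
A toy simulation of eventual consistency. The `Cluster` and `Replica` classes and the `propagate` step are assumptions made for this sketch; real systems propagate writes in the background over the network.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}


class Cluster:
    def __init__(self, size):
        self.replicas = [Replica() for _ in range(size)]
        self.pending = deque()  # queued (replica_index, key, value) updates

    def write(self, key, value, at=0):
        # The write lands on one replica and is queued for the others.
        self.replicas[at].data[key] = value
        for index in range(len(self.replicas)):
            if index != at:
                self.pending.append((index, key, value))

    def read(self, key, at):
        return self.replicas[at].data.get(key)

    def propagate(self):
        # Deliver every queued update; afterwards all replicas agree.
        while self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index].data[key] = value


cluster = Cluster(3)
cluster.write("room_42", "booked", at=0)
stale_read = cluster.read("room_42", at=2)  # replica 2 has not seen the write yet
cluster.propagate()
fresh_read = cluster.read("room_42", at=2)  # now consistent with replica 0
```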

New cards

the CAP theorem states that if you get a network partition, you have to

trade off availability of data versus consistency
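
The trade-off can be illustrated with two hypothetical replica nodes that are cut off from the rest of the cluster. The "CP" and "AP" mode labels and the `ReplicaNode` class are assumptions of this sketch, not features of a specific database.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk stale data."""


class ReplicaNode:
    def __init__(self, mode, value):
        self.mode = mode  # "CP" favours consistency, "AP" favours availability
        self.value = value
        self.partitioned = False

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm the latest value during a partition")
        return self.value  # may be stale if the node is partitioned


# Both nodes still hold "v1"; a newer value exists elsewhere but cannot reach them.
cp_node = ReplicaNode("CP", "v1")
ap_node = ReplicaNode("AP", "v1")
cp_node.partitioned = True
ap_node.partitioned = True

ap_answer = ap_node.read()  # responds, possibly with out-of-date data
try:
    cp_node.read()
    cp_answered = True
except Unavailable:
    cp_answered = False  # stays consistent by giving up availability
```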

New cards

document data model two options

Embedded or Normalised

New cards

Embedded Data Model

Capture relationships between data by storing related data in a single document structure

allow applications to retrieve and manipulate related data in a single database operation

New cards

Normalised Data Model

references store the relationships between data by including links from one document to another
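
The two document data models can be compared with plain Python dictionaries standing in for documents. The field names (`_id`, `customer_id`, `items`) and the sample data are hypothetical.

```python
# Embedded: the related customer data lives inside the order document.
embedded_order = {
    "_id": "order-1",
    "customer": {"name": "Ada", "city": "Leeds"},
    "items": [{"sku": "kb-01", "qty": 1}, {"sku": "ms-02", "qty": 2}],
}

# Normalised: the order holds a reference to a separate customer document.
customers = {"cust-1": {"name": "Ada", "city": "Leeds"}}
normalised_order = {
    "_id": "order-1",
    "customer_id": "cust-1",
    "items": [{"sku": "kb-01", "qty": 1}, {"sku": "ms-02", "qty": 2}],
}


def customer_city_embedded(order):
    # One lookup: everything needed is in the single document.
    return order["customer"]["city"]


def customer_city_normalised(order, customer_collection):
    # Two lookups: follow the reference to the customer document.
    return customer_collection[order["customer_id"]]["city"]
```

The embedded form needs one retrieval per order; the normalised form avoids duplicating the customer in every order at the cost of a second lookup.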

New cards

Distributed database

a logically interrelated collection of shared data, physically distributed over a computer network

New cards

Distributed DBMS

software system that permits the management of the distributed database and makes the distribution transparent to users

New cards

Types of DDBMS

Homogeneous and Heterogeneous

New cards

Homogeneous DDBMS

All sites use same DBMS product, much easier to design and manage

approach provides incremental growth and allows increased performance

New cards

Heterogeneous DDBMS

Sites may run different DBMS products with possibly different underlying data models

occurs when sites have implemented their own databases and integration is considered later

New cards

heterogeneous DDBMS require translations to allow for

different hardware and different DBMS products

New cards

functions of a DDBMS

functionality of a DBMS, extended communication services, data dictionary

concurrency control and recovery services, as well as distributed query processing

New cards

three key issues of distributed database design

fragmentation, allocation and replication

New cards

Fragmentation

Relation may be divided into a number of sub-relations, which are then distributed

New cards

Allocation

Each fragment is stored at site with optimal distribution

New cards

Replication

copy of fragments may be maintained at several sites

New cards

definition and allocation of fragments carried out strategically to achieve

locality of reference

improved reliability and availability

improved performance

balanced storage capacities and costs

minimal communications costs

New cards

quantitative information for fragmentation may include

frequency with which an application is run

site from which an application is run

performance criteria for transactions and applications

New cards

qualitative information may include

transactions that are executed by application

type of access

predicates of read operations

New cards

four alternative strategies regarding placement of data for data allocation:

centralized

partitioned (fragmented)

complete replication

selective replication

New cards

Centralized (Data Allocation)

Consists of single database and DBMS stored at one site with users distributed across the network

New cards

Partitioned (data allocation)

database partitioned into disjoint fragments

each fragment assigned to one site

New cards

complete replication (data allocation)

consists of maintaining complete copy of database at each site

New cards

selective replication (data allocation)

combination of partitioning, replication, and centralization

New cards

Why fragment?

applications work with views rather than entire relations
data is stored close to where it is most frequently used
data that is not needed by local applications is not stored
with fragments as unit of distribution, transaction can be divided into several subqueries that operate on fragments
data not required by local applications is not stored and so not available to unauthorized users

New cards

disadvantages of fragmenting

performance, integrity

New cards

types of fragmentation

horizontal, vertical, mixed, derived
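
A small sketch of horizontal and vertical fragmentation, using a list of dictionaries as a stand-in for a relation. The `staff` relation, its columns and the helper functions are invented for illustration.

```python
staff = [
    {"staff_no": 1, "name": "Ann", "branch": "B1", "salary": 30000},
    {"staff_no": 2, "name": "Bob", "branch": "B2", "salary": 28000},
    {"staff_no": 3, "name": "Cat", "branch": "B1", "salary": 35000},
]


def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy a predicate (a subset of tuples)."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, key, columns):
    """Keep the key plus selected columns of every row (a subset of attributes)."""
    return [{key: row[key], **{c: row[c] for c in columns}} for row in rows]


# Horizontal: one fragment per branch, each could be stored at that branch's site.
branch_b1 = horizontal_fragment(staff, lambda r: r["branch"] == "B1")
branch_b2 = horizontal_fragment(staff, lambda r: r["branch"] == "B2")

# Vertical: payroll and directory data split apart, both keeping the key.
payroll = vertical_fragment(staff, "staff_no", ["salary"])
directory = vertical_fragment(staff, "staff_no", ["name", "branch"])

# Reconstruction: union for horizontal fragments, join on the key for vertical ones.
rebuilt_from_horizontal = sorted(branch_b1 + branch_b2, key=lambda r: r["staff_no"])
directory_by_key = {row["staff_no"]: row for row in directory}
rebuilt_from_vertical = [{**directory_by_key[p["staff_no"]], **p} for p in payroll]
```

Mixed fragmentation applies one kind of split to the result of the other.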

New cards

Transparencies in a DDBMS

distribution

fragmentation

location

replication

local mapping

naming

transaction

concurrency

failure

performance

DBMS

New cards

distribution transparency

allows user to perceive database as single, logical entity

New cards

if DDBMS exhibits distribution transparency, user does not need to know

that the data is fragmented (fragmentation transparency) or where data items are located (location transparency); if the user must know both, the system offers only local mapping transparency

New cards

naming transparency

each item in a DDB must have a unique name

DDBMS must ensure that no two sites create a database object with same name

One solution is to create central name server, however this results in loss of local autonomy

central site may become a bottleneck

low availability

New cards

transaction transparency

ensures that all distributed transactions maintain distributed database’s integrity and consistency

distributed transaction accesses data stored at more than one location

each transaction is divided into number of subtransactions

one for each site that has to be accessed

New cards

concurrency transparency

all transactions must execute independently and be logically consistent with the results obtained if the transactions were executed one at a time, in some arbitrary serial order; the same fundamental principles apply as for a centralized DBMS

New cards

Failure transparency

DDBMS must ensure atomicity and durability of the global transaction, which means ensuring that the subtransactions of a global transaction either all commit or all abort
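
The card does not name a protocol, but two-phase commit is a widely used way to get the all-commit-or-all-abort behaviour. The sketch below is a simplified in-memory version; the `Site` class and its `can_commit` flag are assumptions, and failures of the coordinator itself are ignored.

```python
class Site:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # hypothetical flag: can the local subtransaction succeed?
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote on whether the local subtransaction can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def run_global_transaction(sites):
    # Phase 1: collect a vote from every site (a list, so no site is skipped).
    votes = [site.prepare() for site in sites]
    # Phase 2: commit everywhere only if every vote was yes.
    if all(votes):
        for site in sites:
            site.commit()
        return "committed"
    for site in sites:
        site.abort()
    return "aborted"


healthy = [Site("london"), Site("paris")]
outcome_ok = run_global_transaction(healthy)

one_failing = [Site("london"), Site("paris", can_commit=False)]
outcome_failed = run_global_transaction(one_failing)
```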

New cards

performance transparency

DDBMS must perform as if it were a centralized DBMS

DDBMS should not suffer any performance degradation due to the distributed architecture

DDBMS should determine most cost-effective strategy to execute a request

New cards

12 rules for a DDBMS

Local Autonomy
No Reliance on a Central Site
Continuous Operation
Location Independence
Fragmentation Independence
Replication Independence
Distributed Query Processing
Distributed Transaction Processing
Hardware Independence
Operating System Independence
Network Independence
Database Independence

New cards

aggregation suits what databases

key-value, document, column-family

New cards

aggregates are a natural unit for

replication and sharding

New cards

aggregates easier for developers to work with as they

naturally manipulate data through aggregate structures

New cards

paths for distributing the DB

replication, sharding, single server

New cards

replication

same data copied to multiple nodes

New cards

sharding

different data stored on different nodes

New cards

single server - if the driver to use NoSQL is not running the DB on a cluster

no distribution of the DB is needed

New cards

Benefits of single server

eliminates all the complexities that the other distribution options introduce
easy for operations people to manage
easy for application developers to reason about

New cards

What database works best in a single server configuration

graph

New cards

each shard (data)

reads and writes its own data

New cards

with sharding, each node has

different data

New cards

with sharding, ideally each user accesses

one node each

New cards

how to decide the allocation of shards

aggregate orientation obvious unit of distribution

store aggregates together if they are normally read in sequence

use application logic to decide

some NoSQL applications will offer auto-sharding

New cards

how to improve performance of sharding

locality of reference, distribute aggregates evenly across nodes
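
One simple way to spread aggregates evenly is to hash the aggregate's key and take the result modulo the number of shards. This is only a sketch: the `ShardedStore` class and `shard_for` function are hypothetical, and hashing trades away the locality that grouping related aggregates on one node would give.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index with a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    def __init__(self, shard_count):
        # Each dictionary stands in for one node holding its own data.
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, aggregate):
        self.shards[shard_for(key, len(self.shards))][key] = aggregate

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


sharded = ShardedStore(4)
for number in range(100):
    sharded.put(f"order-{number}", {"order_no": number})
```

Each read or write touches exactly one shard, which is why sharding can improve both read and write performance.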

New cards

sharding benefits

improves read and write performance

New cards

sharding drawbacks

may affect resilience

a node failure makes that shard’s data unavailable in the same way as a non-distributed model

only the user of that data on that shard will suffer

clusters may use less reliable machines making node failure more likely

New cards

master-slave replication

all changes are made to the master

changes propagate to slaves

reads can be done from either
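
A minimal sketch of master-slave replication with in-memory nodes. The `Node` and `Master` classes are made up for this example, and propagation is done synchronously inside `write` for simplicity; real systems usually propagate asynchronously.

```python
class Node:
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Master(Node):
    def __init__(self, slaves):
        super().__init__()
        self.slaves = slaves

    def write(self, key, value):
        # All changes are made at the master first...
        self.data[key] = value
        # ...and then propagated to every slave.
        for slave in self.slaves:
            slave.data[key] = value


slave_one, slave_two = Node(), Node()
master = Master([slave_one, slave_two])
master.write("profile:ada", {"theme": "dark"})

read_from_master = master.read("profile:ada")
read_from_slave = slave_two.read("profile:ada")
```

Because only `Master` has a `write` method, adding more slaves scales reads but not writes.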

New cards

master-slave replication details

data replicated across multiple nodes

one node is appointed (automatically or manually) as the master; it is the authoritative source for the data and responsible for its updates

the other nodes are slaves

New cards

benefits of master-slave replication

good for scaling out if the dataset is read intensive (add more slaves to handle the read load - will mean the master has to synchronise to more slaves when writing)

read resilience - if the master should fail, the slaves can still handle read requests; writes have to wait until the master recovers or is replaced

New cards

drawbacks of master-slave replication

not good for datasets with heavy write traffic, may cause inconsistency

New cards

peer-to-peer replication

no master node, all the replicas have equal standing, all nodes can accept writes

New cards

benefits of ptp replication

a failed node doesn't mean no writes are possible, adding more nodes improves performance

New cards

drawbacks of ptp replication

consistency - write-write conflicts are forever

New cards

how to mitigate w-w conflicts in ptp replication

replicas co-ordinate to avoid conflicts (favouring consistency), or inconsistent writes are allowed and merged later (favouring availability) - the two options sit at opposite ends of the consistency/availability trade-off

New cards

combining sharding and replication

multiple masters

each data item only has a single master

a node might be a master for some data and a slave for others, or nodes may be dedicated for master or slave duties

ptp and sharding is a common strategy for column family databases

New cards

key-value model

simplest NoSQL DB model

uses the concept of key-value pairs

aggregate oriented but sees no structure in the aggregate

scaling is achieved with sharding

removing aggregates is performed using a key only

New cards

key value model in theory

the developer cannot search on fields within the aggregate, or retrieve parts of the aggregate
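
The restriction can be seen in a sketch of a key-value store that treats every value as an opaque blob. The `KeyValueStore` class and the session data are hypothetical; JSON is used only as one possible way for the application to encode the aggregate.

```python
import json


class KeyValueStore:
    """Stores opaque bytes under a key; it never looks inside the value."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, blob):
        self._blobs[key] = blob

    def get(self, key):
        return self._blobs[key]

    def delete(self, key):
        del self._blobs[key]


kv = KeyValueStore()
session_blob = json.dumps({"user": "ada", "cart": ["kb-01"]}).encode("utf-8")
kv.put("session:42", session_blob)

# The whole aggregate comes back; the application decodes it and picks out fields.
session = json.loads(kv.get("session:42").decode("utf-8"))
cart = session["cart"]

kv.delete("session:42")
still_present = "session:42" in kv._blobs
```

There is no operation such as "find sessions where user is ada"; any structure inside the blob is visible only to the application.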

New cards

key value model in practice

some databases which would classify as key-value may still allow some structure on the data beyond a big blob of data, so the distinction between key-value DBs and document DBs has a grey area

New cards

document model NoSQL database

aggregate oriented

recognises structure in the aggregate (structures and types result in the limits on what can be stored but allows the developer flexibility in retrievals)

often developers add an ID field in a document database to do a key-value style lookup

New cards

key-value model applications

session information, user profiles/preferences, shopping cart data

New cards

document model scaling

achieved through a combination of sharding and replication, this allows for high availability and can be a challenge for consistency

New cards

document model applications

event logging

content management systems

web analytics

e-commerce applications

New cards

column family model NoSQL databases

based on a column oriented model

data in cells grouped in columns rather than rows of data

use a concept of a keyspace which shows the structure of the column family (like tables as they contain rows, each row contains columns)

use of a query language to query data

ptp replication
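
The keyspace, column family, row and column nesting can be pictured as nested dictionaries. This is a rough mental model only; the `keyspace` contents and the `get_column` helper are assumptions, and real column-family stores add features such as timestamps and a query language.

```python
# keyspace -> column family -> row key -> column name -> value
keyspace = {
    "users": {
        "user:1": {"name": "Ada", "city": "Leeds"},
        "user:2": {"name": "Bob"},  # rows need not have the same columns
    },
    "events": {
        "user:1": {"2024-01-01T09:00": "login", "2024-01-01T09:05": "search"},
    },
}


def get_column(space, family, row_key, column, default=None):
    """Look up one cell by column family, row key and column name."""
    return space.get(family, {}).get(row_key, {}).get(column, default)


ada_city = get_column(keyspace, "users", "user:1", "city")
bob_city = get_column(keyspace, "users", "user:2", "city", default="unknown")
```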

New cards

column family scaling

achieved through adding more nodes to the cluster

New cards

column family applications

event logging

content management systems

New cards

graph model NoSQL databases

have different drivers from the aggregate-oriented models, with an opposite model

small records with complex connections

graph data structure of nodes connected by edges/arcs

once the database is populated with nodes and edges the developer can query it

queries can exploit the complex relationships in the graph much more easily than in an RDBMS

the queries can be chopped and changed regularly
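
A sketch of a graph held as an adjacency list, with a breadth-first query for everyone within a given number of hops. The names, the friendship edges and the `within_hops` function are invented for illustration and do not use any graph database's query language.

```python
from collections import deque

# Hypothetical friendship edges between small records (people).
edges = [("ada", "bob"), ("bob", "cat"), ("cat", "dan"), ("ada", "eve")]


def build_graph(edge_list):
    """Build an undirected adjacency list from (a, b) pairs."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def within_hops(graph, start, max_hops):
    """Return every node reachable from start in at most max_hops edges."""
    seen = {start}
    found = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                found.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return found


graph = build_graph(edges)
friends = within_hops(graph, "ada", 1)
friends_of_friends = within_hops(graph, "ada", 2)
```

Changing the question (for example from one hop to two) only changes the traversal, not how the data is stored.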

New cards

graph model scaling

can more likely run as a single server DB

no distribution is required so data is highly consistent

scaling is a challenge because traversing between nodes stored on different machines is a performance concern

New cards

graph model applications

social networks

product preferences

eligibility rules

routing

dispatch

location based services

recommendation engines

New cards

each database model can be compared by

features

scaling

availability

consistency

data retrieval

New cards

secure systems require

what are the assets to be secured?

what are the threats to which those assets are vulnerable?

what services should be put in place to address those threats?

what are the current technological mechanisms that can support the services required?

New cards

what are the assets to be secured?

network resources, DB servers, application servers

New cards

what are the threats to which those assets are vulnerable?

information leakage - disclosure to unauthorised parties

integrity violation - data loss or corruption

denial of service - unavailability of system/service/network

illegitimate use - use of resource by unauthorised person or in unauthorised way

New cards
