NoSQL

133 Terms

1

Why NoSQL?

•Relational databases have been a successful technology for twenty years, providing persistence, concurrency control, and an integration mechanism.

•Application developers have been frustrated with the impedance mismatch between the relational model and the in-memory data structures.

•The vital factor for a change in data storage was the need to support large volumes of data by running on clusters.

•Relational databases are not designed to run efficiently on clusters.

2

Common characteristics of a NoSQL database

•No relational model

•Suited to clusters

•Open-source

•Suits unstructured data (schemaless)

3

Current NoSQL trends

  • customer shift continues online

  • the internet is connecting everything

  • big data is getting bigger

  • applications are moving to the cloud

  • the world has gone mobile

4

Different NoSQL database models

  • key value

  • document

  • column family

  • graph (aggregate ignorant)

5

Aggregate orientated

Makes it easier for the database to manage data storage over clusters

6

Aggregate

A collection of data that can be interrogated as a unit

7

Aggregate orientated works well when

most of the time the same aggregate is needed

8

aggregate orientated does not work well when

users regularly change how they interrogate the data

9

update consistency

write-write conflicts (lost updates, values overwritten)

10

read consistency

read-write conflict (reading in the middle of someone else’s write)

11

relaxed consistency

a fully consistent DB is possible, but at what performance cost? A trade-off may be necessary, and the domain may tolerate some inconsistency

12

The CAP Theorem (Consistency, Availability, Partition Tolerance)

Given the three properties, you can only get two

13

Consistency

When data is queried, the user receives the most up-to-date version of the data

14

Availability

When data is queried, the user always receives a response, even if it is not the most up-to-date version of the data

15

Partition Tolerance

Where a database is distributed over a clustered network, should part of the network fail, the rest of the clustered network can continue to operate (partitions on the network can be tolerated)

16

if the network is working normally

all nodes are operating normally, reading, writing, and syncing with each other (Consistency AND availability)

17

if the network becomes partitioned

part of the network has failed, partitioning it so that nodes can't communicate with each other normally (consistency OR availability)

18

Write-write conflicts occur when

two clients try to write the same data at the same time
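
A minimal sketch of a write-write conflict (lost update), assuming a single in-memory record and two clients that each read, modify, and write back the value with no concurrency control. `Store`, `read`, and `write` are hypothetical names, not a real database API.

```python
class Store:
    """Hypothetical single-record store with no concurrency control."""

    def __init__(self, value):
        self.value = value

    def read(self):
        return self.value

    def write(self, value):
        self.value = value


store = Store(100)

# Both clients read the same starting value before either one writes.
seen_by_a = store.read()
seen_by_b = store.read()

# Client A adds 10 and client B adds 20, each based on its own read.
store.write(seen_by_a + 10)
store.write(seen_by_b + 20)

# B's write silently overwrote A's: the result is 120, not 130.
print(store.value)  # 120
```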

19

read-write conflicts occur when

one client reads inconsistent data in the middle of another client’s write

20

pessimistic approaches lock data records to

prevent conflicts
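
A sketch of the pessimistic approach, assuming the record lives in one process and Python's `threading.Lock` stands in for a database record lock; `LockedRecord` and `increment` are hypothetical names. Holding the lock across the whole read-modify-write means the second client cannot read until the first has written, so the lost update above cannot happen.

```python
import threading


class LockedRecord:
    """Hypothetical record guarded by a lock (pessimistic concurrency control)."""

    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def increment(self, amount):
        # Lock for the entire read-modify-write so no other client
        # can read or write the record in between.
        with self._lock:
            current = self.value
            self.value = current + amount


record = LockedRecord(100)
threads = [threading.Thread(target=record.increment, args=(n,)) for n in (10, 20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(record.value)  # 130
```

The cost of this safety is that clients wait on each other, which is the performance impact mentioned under relaxed consistency.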

21

optimistic approaches detect conflicts and

fix them
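
A sketch of the optimistic approach, assuming each record carries a version number that is checked on write (a conditional update); `VersionedRecord`, `conditional_write`, and `add_with_retry` are hypothetical. No lock is held: a stale write is detected and rejected, and the client fixes it by re-reading and retrying.

```python
class VersionedRecord:
    """Hypothetical record using optimistic concurrency control via a version number."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        # Return the value together with the version it was read at.
        return self.value, self.version

    def conditional_write(self, new_value, expected_version):
        # Detect the conflict: reject the write if anyone else wrote since our read.
        if self.version != expected_version:
            return False
        self.value = new_value
        self.version += 1
        return True


def add_with_retry(record, amount):
    # Fix the conflict by re-reading the latest value and trying again.
    while True:
        value, version = record.read()
        if record.conditional_write(value + amount, version):
            return


record = VersionedRecord(100)
value_a, version_a = record.read()
value_b, version_b = record.read()

ok_a = record.conditional_write(value_a + 10, version_a)  # True: first writer wins
ok_b = record.conditional_write(value_b + 20, version_b)  # False: B's read is stale
add_with_retry(record, 20)                                 # B retries on fresh data
print(record.value)  # 130
```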

22

distributed systems see read-write conflicts due to

some nodes having received updates while other nodes have not

23

eventual consistency

at some point the system will become consistent once all the writes have propagated to all the nodes
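
A toy model of eventual consistency, assuming a write is applied to one replica immediately and queued for later delivery to the others; the class and method names are hypothetical. Between the write and `propagate()` a read from another node returns stale data (the distributed read-write conflict above); once all writes have propagated, every replica agrees.

```python
class EventuallyConsistentCluster:
    """Hypothetical cluster: a write lands on one replica immediately and is
    queued for asynchronous propagation to the others."""

    def __init__(self, node_names):
        self.replicas = {name: {} for name in node_names}
        self.pending = []  # (target node, key, value) updates not yet delivered

    def write(self, node, key, value):
        self.replicas[node][key] = value
        for other in self.replicas:
            if other != node:
                self.pending.append((other, key, value))

    def read(self, node, key):
        return self.replicas[node].get(key)

    def propagate(self):
        # Deliver every queued update; once the queue drains, the replicas agree.
        for target, key, value in self.pending:
            self.replicas[target][key] = value
        self.pending.clear()

    def is_consistent(self):
        snapshots = list(self.replicas.values())
        return all(snapshot == snapshots[0] for snapshot in snapshots)


cluster = EventuallyConsistentCluster(["n1", "n2", "n3"])
cluster.write("n1", "price", 42)
before = cluster.read("n2", "price")          # None: n2 has not seen the write yet
consistent_before = cluster.is_consistent()   # False
cluster.propagate()
print(before, consistent_before, cluster.read("n2", "price"), cluster.is_consistent())
# None False 42 True
```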

24

the CAP theorem states that if you get a network partition, you have to

trade off availability of data versus consistency
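
A deliberately simplified two-node illustration of that trade-off, assuming clients write to node `a`, which must replicate to node `b`; `ReplicatedRegister` and its `"CP"`/`"AP"` modes are hypothetical. Without a partition both nodes are updated; during a partition the CP choice refuses the write (consistent, not available) and the AP choice accepts it locally (available, but `b` is stale).

```python
class ReplicatedRegister:
    """Hypothetical two-node register (nodes a and b) illustrating the CAP trade-off."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = None  # value held on node a (where clients write)
        self.b = None  # value held on replica node b
        self.partitioned = False

    def write(self, value):
        if not self.partitioned:
            # Normal operation: both nodes updated, so consistent AND available.
            self.a = self.b = value
            return True
        if self.mode == "CP":
            # Keep consistency: refuse the write, sacrificing availability.
            return False
        # Keep availability: accept the write on a only; b is now stale.
        self.a = value
        return True


cp = ReplicatedRegister("CP")
ap = ReplicatedRegister("AP")
for system in (cp, ap):
    system.write(1)
    system.partitioned = True

print(cp.write(2), cp.a, cp.b)  # False 1 1: consistent but unavailable for writes
print(ap.write(2), ap.a, ap.b)  # True 2 1: available but b returns stale data
```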

25

document data model two options

Embedded or Normalised

26

Embedded Data Model

Capture relationships between data by storing related data in a single document structure

allow applications to retrieve and manipulate related data in a single database operation

27

Normalised Data Model

references store the relationships between data by including links from one document to another
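
The two document models side by side, using plain Python dicts as stand-ins for JSON documents; the field names (`_id`, `order_ids`, `customer_id`) and the helper `customer_with_orders` are hypothetical, and no database library is used. The embedded form answers "customer plus orders" in one lookup, while the normalised form needs one lookup per reference.

```python
# Embedded: the orders live inside the customer document,
# so a single read returns the whole aggregate.
embedded_customer = {
    "_id": "c1",
    "name": "Ann",
    "orders": [
        {"order_id": "o1", "items": ["book", "pen"]},
    ],
}

# Normalised: documents hold references (ids) to each other.
customers = {"c1": {"_id": "c1", "name": "Ann", "order_ids": ["o1"]}}
orders = {"o1": {"_id": "o1", "customer_id": "c1", "items": ["book", "pen"]}}


def customer_with_orders(customer_id):
    """Follow the references to rebuild the embedded shape (a manual 'join')."""
    customer = customers[customer_id]
    return {
        "_id": customer["_id"],
        "name": customer["name"],
        "orders": [
            {"order_id": oid, "items": orders[oid]["items"]}
            for oid in customer["order_ids"]
        ],
    }


print(customer_with_orders("c1") == embedded_customer)  # True
```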

28

Distributed database

a logically interrelated collection of shared data, physically distributed over a computer network

29

Distributed DBMS

software system that permits the management of the distributed database and makes the distribution transparent to users

30

Types of DDBMS

Homogeneous and Heterogeneous

31

Homogeneous DDBMS

All sites use the same DBMS product, which makes the system much easier to design and manage

approach provides incremental growth and allows increased performance

32

Heterogeneous DDBMS

Sites may run different DBMS products with possibly different underlying data models

occurs when sites have implemented their own databases and integration is considered later

33

heterogeneous DDBMS require translations to allow for

different hardware and different DBMS products

34

functions of a DDBMS

functionality of a DBMS, extended communication services, data dictionary

concurrency control and recovery services, as well as distributed query processing

35

three key issues of distributed database design

fragmentation, allocation and replication

36

Fragmentation

Relation may be divided into a number of sub-relations, which are then distributed

37

Allocation

Each fragment is stored at the site with optimal distribution

38

Replication

copies of fragments may be maintained at several sites

39

definition and allocation of fragments carried out strategically to achieve

locality of reference

improved reliability and availability

improved performance

balanced storage capacities and costs

minimal communications costs

40

quantitative information for fragmentation may include

frequency with which an application is run

site from which an application is run

performance criteria for transactions and applications

41

qualitative information may include

transactions that are executed by application

type of access

predicates of read operations

42

four alternative strategies regarding placement of data for data allocation:

centralized

partitioned (fragmented)

complete replication

selective replication

43

Centralized (Data Allocation)

Consists of single database and DBMS stored at one site with users distributed across the network

44

Partitioned (data allocation)

database partitioned into disjoint fragments

each fragment assigned to one site

45

complete replication (data allocation)

consists of maintaining complete copy of database at each site

46

selective replication (data allocation)

combination of partitioning, replication, and centralization

47

Why fragment?

  • applications work with views rather than entire relations

  • data is stored close to where it is most frequently used

  • data that is not needed by local applications is not stored

  • with fragments as unit of distribution, transaction can be divided into several subqueries that operate on fragments

  • data not required by local applications is not stored and so not available to unauthorized users

48

disadvantages of fragmenting

performance, integrity

49

types of fragmentation

horizontal, vertical, mixed, derived
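
A small sketch of the two basic fragmentation types on an in-memory relation, assuming rows are Python dicts; the `staff` data and helper names are made up for illustration. Horizontal fragmentation is a selection (whole rows chosen by a predicate) and is reconstructed by union; vertical fragmentation is a projection (columns, always keeping the key) and is reconstructed by joining on the key. Mixed fragmentation applies one after the other.

```python
staff = [
    {"staff_no": "S1", "name": "Ann", "branch": "B3", "salary": 30000},
    {"staff_no": "S2", "name": "Bob", "branch": "B5", "salary": 24000},
    {"staff_no": "S3", "name": "Cat", "branch": "B3", "salary": 18000},
]


def horizontal_fragment(rows, predicate):
    """Selection: keep whole rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, attributes):
    """Projection: keep some attributes of every row (including the key)."""
    return [{a: row[a] for a in attributes} for row in rows]


# Horizontal: one fragment per branch, each could be allocated to that branch's site.
staff_b3 = horizontal_fragment(staff, lambda r: r["branch"] == "B3")
staff_b5 = horizontal_fragment(staff, lambda r: r["branch"] == "B5")

# Vertical: payroll columns separated from general staff details.
staff_pay = vertical_fragment(staff, ["staff_no", "salary"])
staff_info = vertical_fragment(staff, ["staff_no", "name", "branch"])

# Reconstruction: union of horizontal fragments, join of vertical fragments on the key.
union = staff_b3 + staff_b5
pay_by_key = {r["staff_no"]: r for r in staff_pay}
joined = [{**r, **pay_by_key[r["staff_no"]]} for r in staff_info]
```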

50

Transparencies in a DDBMS

distribution

fragmentation

location

replication

local mapping

naming

transaction

concurrency

failure

performance

DBMS

51

distribution transparency

allows user to perceive database as single, logical entity

52

if DDBMS exhibits distribution transparency, user does not need to know

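that the data is fragmented (fragmentation transparency) or where data items are located (location transparency); if the user must specify both fragment names and their locations, the system provides only local mapping transparency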

53

naming transparency

each item in a DDB must have a unique name

DDBMS must ensure that no two sites create a database object with same name

One solution is to create a central name server; however, this results in loss of local autonomy

central site may become a bottleneck

low availability

54

transaction transparency

ensures that all distributed transactions maintain distributed database’s integrity and consistency

distributed transaction accesses data stored at more than one location

each transaction is divided into number of subtransactions

one for each site that has to be accessed

55

concurrency transparency

all transactions must execute independently and be logically consistent with the results obtained if the transactions were executed one at a time, in some arbitrary serial order; the same fundamental principles apply as for a centralized DBMS

56

Failure transparency

DDBMS must ensure atomicity and durability of global transaction, means ensuring that subtransactions of global transaction either all commit or all abort
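
The card does not name a protocol, but a common way to get this all-or-nothing behaviour is two-phase commit (2PC). The sketch below is a heavily simplified, single-process version: `Participant` and `two_phase_commit` are hypothetical, each participant stands for one site's subtransaction, and logging, timeouts, and coordinator failure are all omitted.

```python
class Participant:
    """Hypothetical site executing one subtransaction of a global transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1 vote. A real site would force its log to stable storage first.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: ask every site to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere only if every vote was yes; otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


ok = [Participant("London"), Participant("Glasgow")]
bad = [Participant("London"), Participant("Glasgow", can_commit=False)]
print(two_phase_commit(ok), [p.state for p in ok])    # committed ['committed', 'committed']
print(two_phase_commit(bad), [p.state for p in bad])  # aborted ['aborted', 'aborted']
```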

57

performance transparency

DDBMS must perform as if it were a centralized DBMS

DDBMS should not suffer any performance degradation due to the distributed architecture

DDBMS should determine the most cost-effective strategy to execute a request

58

12 rules for a DDBMS

  1. Local Autonomy

  2. No Reliance on a Central Site

  3. Continuous Operation

  4. Location Independence

  5. Fragmentation Independence

  6. Replication Independence

  7. Distributed Query Processing

  8. Distributed Transaction Processing

  9. Hardware Independence

  10. Operating System Independence

  11. Network Independence

  12. Database Independence

59

aggregate orientation suits which databases

key-value, document, column-family

60

aggregates are a natural unit for

replication and sharding

61

aggregates easier for developers to work with as they

naturally manipulate data through aggregate structures

62

paths for distributing the DB

replication, sharding, single server

63

replication

same data copied to multiple nodes

64

sharding

different data placed on different nodes

65

single server - if the driver to use NoSQL is not running the DB on a cluster

no distribution of the DB is needed

66

Benefits of single server

  • eliminates all the complexities that the other distribution options introduce

  • easy for operations people to manage

  • easy for application developers to reason about

67

What database works best in a single server configuration

graph

68

each shard (data)

reads and writes its own data

69

with sharding, each node has

different data

70

with sharding, ideally each user accesses

one node each

71

how to decide the allocation of shards

aggregates are the obvious unit of distribution

store aggregates together if they are normally read in sequence

use application logic to decide

some NoSQL databases offer auto-sharding
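
A minimal sketch of key-based sharding, assuming whole aggregates are placed on a shard chosen by hashing the aggregate's key; `ShardedStore` and the `customer:N` keys are hypothetical. Each lookup touches exactly one shard, and a reasonable hash spreads aggregates fairly evenly across nodes.

```python
import hashlib


class ShardedStore:
    """Hypothetical store that places each aggregate on one shard chosen by its key."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, key):
        # A stable hash (unlike Python's per-process salted hash()) picks the same shard every run.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, aggregate):
        self.shards[self.shard_for(key)][key] = aggregate

    def get(self, key):
        # Only one shard is consulted per lookup.
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"customer:{i}", {"id": i, "orders": []})

print(store.get("customer:42"))          # {'id': 42, 'orders': []}
print([len(s) for s in store.shards])   # roughly even counts that sum to 100
```

Plain modulo hashing moves most keys when the number of shards changes; databases that auto-shard often use key ranges or consistent hashing instead.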

72

how to improve performance of sharding

locality of reference, distribute aggregates evenly across nodes

73

sharding benefits

improves read and write performance

74

sharding drawbacks

may affect resilience

a node failure makes that shard’s data unavailable in the same way as a non-distributed model

only the users of the data on that shard will suffer

clusters may use less reliable machines making node failure more likely

75

master-slave replication

all changes are made to the master

changes propagate to slaves

reads can be done from either
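
A toy model of master-slave replication, assuming one master and a list of slaves held in memory; `MasterSlaveCluster` and its methods are hypothetical. It shows the read resilience described in the next cards: when the master is down, slaves still answer reads, but writes are rejected until the master recovers or is replaced.

```python
class MasterSlaveCluster:
    """Hypothetical master-slave replication: all writes go to the master,
    which pushes each change to every slave; reads may use any node."""

    def __init__(self, num_slaves):
        self.master = {}
        self.slaves = [{} for _ in range(num_slaves)]
        self.master_up = True

    def write(self, key, value):
        if not self.master_up:
            # Writes must wait until the master recovers or is replaced.
            raise RuntimeError("master unavailable; write rejected")
        self.master[key] = value
        for slave in self.slaves:
            # Propagated synchronously here for simplicity; many systems propagate
            # asynchronously, which is where read inconsistency creeps in.
            slave[key] = value

    def read(self, key, node=0):
        # node 0 is the master; nodes 1..n are the slaves.
        if node == 0:
            if not self.master_up:
                raise RuntimeError("master unavailable")
            return self.master.get(key)
        return self.slaves[node - 1].get(key)


cluster = MasterSlaveCluster(num_slaves=2)
cluster.write("x", 1)
cluster.master_up = False  # simulate a master failure

print(cluster.read("x", node=2))  # 1: slaves still serve reads (read resilience)
try:
    cluster.write("x", 2)
    write_ok = True
except RuntimeError:
    write_ok = False
print(write_ok)  # False: no writes until the master is back
```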

76

master-slave replication details

data replicated across multiple nodes

one node is appointed (automatically or manually) as the master; it is the authoritative source for the data and is responsible for its updates

the other nodes are slaves

77

benefits of master-slave replication

good for scaling out if the dataset is read intensive (add more slaves to handle the read load - will mean the master has to synchronise to more slaves when writing)

read resilience - if the master fails, the slaves can still handle read requests; writes have to wait until the master recovers or is replaced

78

drawbacks of master-slave replication

not good for datasets with heavy write traffic, may cause inconsistency

79

peer-to-peer replication

no master node, all the replicas have equal standing, all nodes can accept writes

80

benefits of ptp replication

a failed node doesn't mean no writes are possible; adding more nodes improves performance

81

drawbacks of ptp replication

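consistency - two nodes can accept conflicting writes to the same record (write-write conflicts); unlike read inconsistencies, which are transient, inconsistent writes are forever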

82

how to mitigate w-w conflicts in ptp replication

either the replicas coordinate on each write to avoid conflicts, or the system allows inconsistent writes and copes with them afterwards - these are two ends of a spectrum that trades consistency against availability
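
One way to "allow inconsistent writes" and still notice them is to tag each value with a version vector; this is a swapped-in technique, not something the cards specify. In this sketch (`Peer`, `dominates`, and `compare` are hypothetical), each peer accepts writes independently, and comparing the vectors reveals when two writes were concurrent, so the conflict can be resolved later.

```python
class Peer:
    """Hypothetical peer that accepts writes and tags each value with a version vector."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (value, version vector as {peer name: counter})

    def write(self, key, value):
        _, vv = self.store.get(key, (None, {}))
        vv = dict(vv)  # copy so a replicated vector is never mutated in place
        vv[self.name] = vv.get(self.name, 0) + 1
        self.store[key] = (value, vv)


def dominates(a, b):
    """True if version vector a has seen every write that b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    if dominates(a, b) and dominates(b, a):
        return "equal"
    if dominates(a, b):
        return "newer"
    if dominates(b, a):
        return "older"
    return "conflict"  # concurrent writes: neither saw the other


p1, p2 = Peer("p1"), Peer("p2")
p1.write("cart", ["book"])
p2.store["cart"] = p1.store["cart"]   # replicate p1's write to p2
p1.write("cart", ["book", "pen"])     # both peers now accept writes independently
p2.write("cart", ["book", "mug"])

print(compare(p1.store["cart"][1], p2.store["cart"][1]))  # conflict
```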

83

combining sharding and replication

multiple masters

each data item only has a single master

a node might be a master for some data and a slave for others, or nodes may be dedicated for master or slave duties

ptp and sharding is a common strategy for column family databases

84

key-value model

simplest NoSQL DB model

uses the concept of key-value pairs

aggregate oriented but sees no structure in the aggregate

scaling is achieved with sharding

removing aggregates is performed using a key only
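
A minimal key-value store sketch, assuming values are serialised to opaque bytes so the store itself sees no structure in the aggregate; `KeyValueStore` and the `session:` key are hypothetical. Every operation (put, get, delete) takes only a key, and there is no way to query by a field inside the value.

```python
import json


class KeyValueStore:
    """Hypothetical key-value store: values are opaque blobs; only the key is visible."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Serialise the aggregate; the store never looks inside the bytes.
        self._data[key] = json.dumps(value).encode("utf-8")

    def get(self, key):
        blob = self._data.get(key)
        return None if blob is None else json.loads(blob)

    def delete(self, key):
        # Removal is by key only.
        self._data.pop(key, None)


sessions = KeyValueStore()
sessions.put("session:abc123", {"user": "ann", "cart": ["book"]})
loaded = sessions.get("session:abc123")
print(loaded)                          # {'user': 'ann', 'cart': ['book']}
sessions.delete("session:abc123")
print(sessions.get("session:abc123"))  # None
```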

85

key value model in theory

the developer cannot search on fields within the aggregate, or retrieve parts of the aggregate

86

key value model in practice

some databases which would classify as key-value may still allow some structure on the data beyond a big blob of data, so the distinction between key-value DBs and document DBs has a grey area

87

document model NoSQL database

aggregate oriented

recognises structure in the aggregate (structures and types result in the limits on what can be stored but allows the developer flexibility in retrievals)

often developers add an ID field in a document database to do a key-value style lookup
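
By contrast with the key-value sketch, a document store can see inside the aggregate. This sketch uses plain dicts as documents and hypothetical `find`/`project` helpers (not a real driver API) to search on a field, retrieve only part of each document, and still do a key-value style lookup through the `id` field.

```python
documents = {
    "1": {"id": "1", "name": "Ann", "city": "Glasgow", "orders": 3},
    "2": {"id": "2", "name": "Bob", "city": "Leeds", "orders": 1},
    "3": {"id": "3", "name": "Cat", "city": "Glasgow"},  # no "orders" field: schemaless
}


def find(collection, **criteria):
    """Hypothetical query: documents whose fields equal all the given values."""
    return [doc for doc in collection.values()
            if all(doc.get(field) == value for field, value in criteria.items())]


def project(docs, *fields):
    """Retrieve only part of each aggregate."""
    return [{f: d[f] for f in fields if f in d} for d in docs]


glasgow = find(documents, city="Glasgow")
print(project(glasgow, "name"))  # [{'name': 'Ann'}, {'name': 'Cat'}]
print(documents["2"]["name"])    # Bob: key-value style lookup by the id field
```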

88

key-value model applications

session information, user profiles/preferences, shopping cart data

89

document model scaling

achieved through a combination of sharding and replication, this allows for high availability and can be a challenge for consistency

90

document model applications

event logging

content management systems

web analytics

e-commerce applications

91

column family model NoSQL databases

based on a column oriented model

data in cells grouped in columns rather than rows of data

use a concept of a keyspace which shows the structure of the column family (like tables as they contain rows, each row contains columns)

use of a query language to query data

ptp replication
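
The keyspace structure can be pictured as nested maps, sketched here with plain dicts: keyspace, then column family, then row key, then columns. This is only a mental model (the names `keyspace`, `insert`, and `get_columns` are hypothetical, and it is not any database's storage format); it shows that rows in one family need not share columns and that a read can fetch selected columns of a row.

```python
# keyspace: column family name -> row key -> {column name: value}
keyspace = {"users": {}, "page_views": {}}


def insert(column_family, row_key, columns):
    """Add or overwrite columns in one row; rows in a family need not share columns."""
    keyspace[column_family].setdefault(row_key, {}).update(columns)


def get_columns(column_family, row_key, *names):
    """Read selected columns from one row without touching the rest of it."""
    row = keyspace[column_family].get(row_key, {})
    return {name: row[name] for name in names if name in row}


insert("users", "ann", {"name": "Ann", "city": "Glasgow"})
insert("users", "bob", {"name": "Bob"})  # no "city" column for this row
insert("page_views", "ann", {"2024-01-01": 5, "2024-01-02": 7})  # dates as column names

print(get_columns("users", "ann", "city"))  # {'city': 'Glasgow'}
print(get_columns("users", "bob", "city"))  # {}
print(keyspace["page_views"]["ann"])         # {'2024-01-01': 5, '2024-01-02': 7}
```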

92

column family scaling

achieved through adding more nodes to the cluster

93

column family applications

event logging

content management systems

94

graph model NoSQL databases

are motivated by different drivers from the aggregate-oriented databases, and so have an opposite model

small records with complex connections

graph data structure of nodes connected by edges/arcs

once the database is populated with nodes and edges the developer can query it

queries can exploit the complex relationships in the graph much more easily than in an RDBMS

the queries can be chopped and changed regularly
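
A small social-network graph and a "friends within N hops" query, sketched with an adjacency map and breadth-first search; the people, edges, and `within_hops` helper are made up for illustration and do not reflect any graph database's query language. Changing the query (a different start node or hop count) needs no change to how the data is stored, whereas in an RDBMS each extra hop typically means another self-join.

```python
from collections import deque

edges = [("Ann", "Bob"), ("Bob", "Cat"), ("Cat", "Dan"), ("Ann", "Eve")]

# Undirected adjacency map: node -> set of neighbouring nodes.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def within_hops(start, max_hops):
    """Breadth-first traversal: everyone reachable from start in at most max_hops edges."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return {n for n, d in distance.items() if 0 < d <= max_hops}


print(sorted(within_hops("Ann", 2)))  # ['Bob', 'Cat', 'Eve']
```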

95

graph model scaling

can more likely run as a single server DB

no distribution is required so data is highly consistent

scaling is a challenge because traversing edges between nodes stored on different machines is a performance concern

96

graph model applications

social networks

product preferences

eligibility rules

routing

dispatch

location based services

recommendation engines

97

each database model can be compared by

features

scaling

availability

consistency

data retrieval

98

secure systems require

what are the assets to be secured?

what are the threats to which those assets are vulnerable?

what services should be put in place to address those threats?

what are the current technological mechanisms that can support the services required?

99

what are the assets to be secured?

network resources, DB servers, application servers

100

what are the threats to which those assets are vulnerable?

information leakage - disclosure to unauthorised parties

integrity violation - data loss or corruption

denial of service - unavailability of system/service/network

illegitimate use - use of resource by unauthorised person or in unauthorised way
