NoSQL


Last updated 11:52 AM on 12/12/24

133 Terms

1
New cards

Why NoSQL?

•Relational databases have been a successful technology for twenty years, providing persistence, concurrency control, and an integration mechanism.

•Application developers have been frustrated with the impedance mismatch between the relational model and the in-memory data structures.

•The vital factor for a change in data storage was the need to support large volumes of data by running on clusters.

•Relational databases are not designed to run efficiently on clusters.

2
New cards

Common characteristics of a NoSQL database

•No relational model

•Suited to clusters

•Open-source

•Suits unstructured data - schemaless

3
New cards

Current NoSQL trends

  • customer shift continues online

  • the internet is connecting everything

  • big data is getting better

  • applications are moving to the cloud

  • the world has gone mobile

4
New cards

Different NoSQL database models

  • key value

  • document

  • column family

  • graph (aggregate ignorant)

5
New cards

Aggregate oriented

Makes it easier for the database to manage data storage over clusters

6
New cards

Aggregate

A collection of data that can be interrogated as a unit
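For example, an order together with its line items and shipping address can form one aggregate. A minimal sketch using a Python dict (the field names here are illustrative, not from the source):

```python
# A hypothetical "order" aggregate: the order, its line items, and the
# shipping address travel together as one unit of storage and retrieval.
order = {
    "id": "order-1001",
    "customer": "Ada",
    "shipping_address": {"city": "London", "postcode": "EC1A 1BB"},
    "line_items": [
        {"product": "keyboard", "qty": 1, "price": 45.00},
        {"product": "mouse", "qty": 2, "price": 15.00},
    ],
}

# The whole aggregate is interrogated as a unit, e.g. computing its total:
total = sum(item["qty"] * item["price"] for item in order["line_items"])
```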

7
New cards

Aggregate oriented works well when

most of the time the same aggregate is needed

8
New cards

aggregate oriented does not work well when

users regularly change how they interrogate the data

9
New cards

update consistency

write-write conflicts (lost updates, values overwritten)

10
New cards

read consistency

read-write conflict (reading in the middle of someone else’s write)

11
New cards

relaxed consistency

consistent DB is possible, but at what performance impact? Tradeoff may be necessary and the domain may tolerate some inconsistency

12
New cards

The CAP Theorem (Consistency, Availability, Partition Tolerance)

Given the three properties, you can only get two

13
New cards

Consistency

When data is queried, the user receives the most up-to-date version of the data

14
New cards

Availability

When data is queried, the user always receives a response, even if it is not the most up-to-date version

15
New cards

Partition Tolerance

Where a database is distributed over a clustered network, should part of the network fail, the rest of the clustered network can continue to operate (partitions on the network can be tolerated)

16
New cards

if the network is working normally

all nodes are operating normally, reading, writing, and syncing with each other (Consistency AND availability)

17
New cards

if the network becomes partitioned

part of the network has failed, which partitions the network so that nodes cannot communicate with each other normally (consistency OR availability)

18
New cards

Write-write conflicts occur when

two clients try to write the same data at the same time

19
New cards

read-write conflicts occur when

one client reads inconsistent data in the middle of another client’s write

20
New cards

pessimistic approaches lock data records to

prevent conflicts

21
New cards

optimistic approaches detect conflicts and

fix them
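One common optimistic approach is to attach a version number to each record and reject writes based on a stale version. A minimal sketch, assuming a hypothetical in-memory store (not a specific database's API):

```python
# Optimistic concurrency control via version numbers: each write must
# present the version it read; a mismatch means a write-write conflict
# was detected and the caller has to retry.
class OptimisticStore:
    def __init__(self):
        self._data = {}  # key -> (value, version)

    def read(self, key):
        return self._data.get(key, (None, 0))

    def write(self, key, value, expected_version):
        _, current = self._data.get(key, (None, 0))
        if current != expected_version:
            return False  # conflict detected: someone wrote in between
        self._data[key] = (value, current + 1)
        return True

store = OptimisticStore()
_, v = store.read("stock")
first_ok = store.write("stock", 10, v)   # first writer succeeds
conflict = store.write("stock", 7, v)    # second writer used a stale version
```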

22
New cards

distributed systems see read-write conflicts due to

some nodes having received updates while other nodes have not

23
New cards

eventual consistency

at some point the system will become consistent once all the writes have propagated to all the nodes
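A toy illustration of the idea, with replicas modelled as plain dicts (purely a sketch, no real replication protocol):

```python
# Eventual consistency in miniature: a write lands on one replica first,
# so reads elsewhere are stale until propagation completes.
replicas = [{"x": 1}, {"x": 1}, {"x": 1}]

# A client writes to replica 0; the update has not yet propagated.
replicas[0]["x"] = 2
stale_reads = [r["x"] for r in replicas[1:]]   # still the old value

# Once all the writes have propagated to all the nodes,
# the system becomes consistent.
for r in replicas[1:]:
    r["x"] = replicas[0]["x"]
consistent = all(r["x"] == 2 for r in replicas)
```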

24
New cards

the CAP theorem states that if you get a network partition, you have to

trade off availability of data versus consistency

25
New cards

document data model two options

Embedded or Normalised

26
New cards

Embedded Data Model

Capture relationships between data by storing related data in a single document structure

allow applications to retrieve and manipulate related data in a single database operation

27
New cards

Normalised Data Model

references store the relationships between data by including links from one document to another
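The two options can be sketched side by side with a hypothetical blog post and its comments (field names are illustrative):

```python
# Embedded: comments live inside the post document, so one database
# operation retrieves everything related to the post.
post_embedded = {
    "_id": "post-1",
    "title": "Why NoSQL?",
    "comments": [
        {"author": "Bob", "text": "Nice post"},
        {"author": "Eve", "text": "Agreed"},
    ],
}

# Normalised: comments are separate documents that reference the post
# by id, much like a foreign key.
post_normalised = {"_id": "post-1", "title": "Why NoSQL?"}
comments = [
    {"_id": "c1", "post_id": "post-1", "author": "Bob", "text": "Nice post"},
    {"_id": "c2", "post_id": "post-1", "author": "Eve", "text": "Agreed"},
]

# With the normalised model, fetching a post's comments needs a second lookup:
post_comments = [c for c in comments if c["post_id"] == "post-1"]
```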

28
New cards

Distributed database

a logically interrelated collection of shared data, physically distributed over a computer network

29
New cards

Distributed DBMS

software system that permits the management of the distributed database and makes the distribution transparent to users

30
New cards

Types of DDBMS

Homogeneous and Heterogeneous

31
New cards

Homogeneous DDBMS

All sites use same DBMS product, much easier to design and manage

approach provides incremental growth and allows increased performance

32
New cards

Heterogeneous DDBMS

Sites may run different DBMS products with possibly different underlying data models

occurs when sites have implemented their own databases and integration is considered later

33
New cards

heterogeneous DDBMS require translations to allow for

different hardware and different DBMS products

34
New cards

functions of a DDBMS

functionality of a DBMS, extended communication services, data dictionary

concurrency control and recovery services, as well as distributed query processing

35
New cards

three key issues of distributed database design

fragmentation, allocation and replication

36
New cards

Fragmentation

A relation may be divided into a number of sub-relations, which are then distributed

37
New cards

Allocation

Each fragment is stored at site with optimal distribution

38
New cards

Replication

copy of fragments may be maintained at several sites

39
New cards

definition and allocation of fragments carried out strategically to achieve

locality of reference

improved reliability and availability

improved performance

balanced storage capacities and costs

minimal communications costs

40
New cards

quantitative information for fragmentation may include

frequency with which an application is run

site from which an application is run

performance criteria for transactions and applications

41
New cards

qualitative information may include

transactions that are executed by application

type of access

predicates of read operations

42
New cards

four alternative strategies regarding placement of data for data allocation:

centralized

partitioned (fragmented)

complete replication

selective replication

43
New cards

Centralized (Data Allocation)

Consists of single database and DBMS stored at one site with users distributed across the network

44
New cards

Partitioned (data allocation)

database partitioned into disjoint fragments

each fragment assigned to one site

45
New cards

complete replication (data allocation)

consists of maintaining complete copy of database at each site

46
New cards

selective replication (data allocation)

combination of partitioning, replication, and centralization

47
New cards

Why fragment?

  • applications work with views rather than entire relations

  • data is stored close to where it is most frequently used

  • data that is not needed by local applications is not stored

  • with fragments as unit of distribution, transaction can be divided into several subqueries that operate on fragments

  • data not required by local applications is not stored and so not available to unauthorized users

48
New cards

disadvantages of fragmenting

performance, integrity

49
New cards

types of fragmentation

horizontal, vertical, mixed, derived
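The first two types can be sketched on a toy relation held as a list of dicts (the Staff relation and its attributes are illustrative):

```python
# Horizontal vs vertical fragmentation of a relation, in miniature.
staff = [
    {"id": 1, "name": "Ada",  "branch": "London", "salary": 30000},
    {"id": 2, "name": "Bob",  "branch": "Paris",  "salary": 28000},
    {"id": 3, "name": "Cara", "branch": "London", "salary": 35000},
]

# Horizontal: subsets of rows, e.g. one fragment per branch site,
# so each site stores the staff it actually works with.
london_fragment = [row for row in staff if row["branch"] == "London"]

# Vertical: subsets of columns; each fragment keeps the key
# so the original relation can be reconstructed by joining on it.
payroll_fragment = [{"id": r["id"], "salary": r["salary"]} for r in staff]
```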

50
New cards

Transparencies in a DDBMS

distribution

fragmentation

location

replication

local mapping

naming

transaction

concurrency

failure

performance

DBMS

51
New cards

distribution transparency

allows the user to perceive the database as a single, logical entity

52
New cards

if DDBMS exhibits distribution transparency, user does not need to know

that data is fragmented or the location of data items; otherwise it is only local mapping transparency

53
New cards

naming transparency

each item in a DDB must have a unique name

DDBMS must ensure that no two sites create a database object with same name

One solution is to create central name server, however this results in loss of local autonomy

central site may become a bottleneck

low availability

54
New cards

transaction transparency

ensures that all distributed transactions maintain distributed database’s integrity and consistency

distributed transaction accesses data stored at more than one location

each transaction is divided into number of subtransactions

one for each site that has to be accessed

55
New cards

concurrency transparency

all transactions must execute independently and be logically consistent with the results obtained if the transactions were executed one at a time, in some arbitrary serial order - the same fundamental principles as for a centralized DBMS

56
New cards

Failure transparency

DDBMS must ensure atomicity and durability of the global transaction, which means ensuring that the subtransactions of the global transaction either all commit or all abort

57
New cards

performance transparency

DDBMS must perform as if it were a centralized DBMS

DDBMS should not suffer any performance degradation due to the distributed architecture

DDBMS should determine the most cost-effective strategy to execute a request

58
New cards

12 rules for a DDBMS

  1. Local Autonomy

  2. No Reliance on a Central Site

  3. Continuous Operation

  4. Location Independence

  5. Fragmentation Independence

  6. Replication Independence

  7. Distributed Query Processing

  8. Distributed Transaction Processing

  9. Hardware Independence

  10. Operating System Independence

  11. Network Independence

  12. Database Independence

59
New cards

aggregation suits what databases

key-value, document, column-family

60
New cards

aggregates are a natural unit for

replication and sharding

61
New cards

aggregates easier for developers to work with as they

naturally manipulate data through aggregate structures

62
New cards

paths for distributing the DB

replication, sharding, single server

63
New cards

replication

same data copied to multiple nodes

64
New cards

sharding

different data placed on different nodes
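A minimal sketch of one way to assign data to shards, using a hash of the key (zlib.crc32 is used here only as a stable, deterministic hash; node names are illustrative):

```python
import zlib

# Hash-based sharding: each aggregate key maps to exactly one node,
# so different data lives on different nodes.
NODES = ["node-a", "node-b", "node-c"]

def shard_for(key: str) -> str:
    return NODES[zlib.crc32(key.encode()) % len(NODES)]

placement = {k: shard_for(k) for k in ["user:1", "user:2", "user:3", "user:4"]}
```

Real systems often prefer consistent hashing or range-based shards so that adding a node does not move most of the keys.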

65
New cards

single server - if the driver for adopting NoSQL is not the need to run the DB on a cluster

no distribution of the DB is needed

66
New cards

Benefits of single server

  • eliminates all the complexities that the other distribution options introduce

  • easy for operations people to manage

    • easy for application developers to reason about

67
New cards

What database works best in a single server configuration

graph

68
New cards

each shard (data)

read and writes its own data

69
New cards

with sharding, each node has

different data

70
New cards

with sharding, ideally each user accesses

one node each

71
New cards

how to decide the allocation of shards

aggregate orientation obvious unit of distribution

store aggregates together if they are normally read in sequence

use application logic to decide

some NoSQL applications will offer auto-sharding

72
New cards

how to improve performance of sharding

locality of reference, distribute aggregates evenly across nodes

73
New cards

sharding benefits

improves read and write performance

74
New cards

sharding drawbacks

may affect resilience

a node failure makes that shard’s data unavailable in the same way as a non-distributed model

only the user of that data on that shard will suffer

clusters may use less reliable machines making node failure more likely

75
New cards

master-slave replication

all changes are made to the master

changes propagate to slaves

reads can be done from either
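The flow above can be sketched with toy in-memory nodes (synchronous propagation here for simplicity; real systems often propagate asynchronously, which is where read inconsistency comes from):

```python
# Master-slave replication in miniature: all changes go to the master
# and are propagated to the slaves; reads may hit any node.
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

class MasterSlave:
    def __init__(self, n_slaves=2):
        self.master = Node("master")
        self.slaves = [Node(f"slave-{i}") for i in range(n_slaves)]

    def write(self, key, value):
        self.master.data[key] = value   # changes are made to the master
        for slave in self.slaves:       # then propagated to the slaves
            slave.data[key] = value

    def read(self, key, node=None):
        node = node or self.slaves[0]   # reads can be served by a slave
        return node.data.get(key)

cluster = MasterSlave()
cluster.write("x", 42)
```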

76
New cards

master-slave replication details

data replicated across multiple nodes

one node is appointed (automatically or manually) as the master; it is the authoritative source for the data and responsible for its updates

the other nodes are slaves

77
New cards

benefits of master-slave replication

good for scaling out if the dataset is read intensive (add more slaves to handle the read load - will mean the master has to synchronise to more slaves when writing)

read resilience - if the master should fail, the slaves can still handle read requests; writes have to wait until the master recovers or is replaced

78
New cards

drawbacks of master-slave replication

not good for datasets with heavy write traffic, may cause inconsistency

79
New cards

peer-to-peer replication

no master node, all the replicas have equal standing, all nodes can accept writes

80
New cards

benefits of ptp replication

a failed node doesn't mean no writes are possible, and adding more nodes improves performance

81
New cards

drawbacks of ptp replication

consistency - write-write conflicts are always possible

82
New cards

how to mitigate w-w conflicts in ptp replication

replica nodes coordinate writes to avoid conflicts, or the system allows inconsistent writes and resolves them later - either way there is a trade-off between consistency and availability

83
New cards

combining sharding and replication

multiple masters

each data item only has a single master

a node might be a master for some data and a slave for others, or nodes may be dedicated for master or slave duties

ptp and sharding is a common strategy for column family databases

84
New cards

key-value model

simplest NoSQL DB model

uses the concept of key-value pairs

aggregate oriented but sees no structure in the aggregate

scaling is achieved with sharding

retrieving and removing aggregates is performed using the key only
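The characteristics above can be sketched with a toy store in which each aggregate is an opaque blob addressed only by its key (purely illustrative, not a specific product's API):

```python
import json

# A minimal key-value store: the database sees each aggregate as an
# opaque serialized blob -- no queries on the fields inside it.
class KeyValueStore:
    def __init__(self):
        self._blobs = {}  # key -> serialized aggregate (opaque to the store)

    def put(self, key, aggregate):
        self._blobs[key] = json.dumps(aggregate)

    def get(self, key):
        blob = self._blobs.get(key)
        return json.loads(blob) if blob is not None else None

    def delete(self, key):
        self._blobs.pop(key, None)  # removal also needs only the key

kv = KeyValueStore()
kv.put("user:1", {"name": "Ada", "cart": ["keyboard", "mouse"]})
user = kv.get("user:1")
```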

85
New cards

key value model in theory

the developer cannot search on fields within the aggregate, or retrieve parts of the aggregate

86
New cards

key value model in practice

some databases which would classify as key-value may still allow some structure on the data beyond a big blob of data, so the distinction between key-value DBs and document DBs has a grey area

87
New cards

document model NoSQL database

aggregate oriented

recognises structure in the aggregate (structures and types result in the limits on what can be stored but allows the developer flexibility in retrievals)

often developers add an ID field in a document database to do a key-value style lookup

88
New cards

key-value model applications

session information, user profiles/preferences, shopping cart data

89
New cards

document model scaling

achieved through a combination of sharding and replication, this allows for high availability and can be a challenge for consistency

90
New cards

document model applications

event logging

content management systems

web analytics

e-commerce applications

91
New cards

column family model NoSQL databases

based on a column oriented model

data in cells grouped in columns rather than rows of data

use the concept of a keyspace, which shows the structure of the column family (column families are like tables: they contain rows, and each row contains columns)

use of a query language to query data

ptp replication
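The keyspace / column-family / row structure described above can be sketched as nested dicts (names and column prefixes are illustrative):

```python
# A column-family layout in miniature: a keyspace holds column families;
# each row key maps to a group of named columns.
keyspace = {
    "users": {                       # a column family (like a table)
        "user:1": {                  # a row key
            "profile:name": "Ada",   # columns, grouped by a family prefix
            "profile:email": "ada@example.com",
            "activity:last_login": "2024-12-01",
        },
        "user:2": {
            "profile:name": "Bob",   # rows need not share the same columns
        },
    },
}

row = keyspace["users"]["user:1"]
name = row["profile:name"]
```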

92
New cards

column family scaling

achieved through adding more nodes to the cluster

93
New cards

column family applications

event logging

content management systems

94
New cards

graph model NoSQL databases

motivated by different drivers, with a model opposite to aggregate orientation

small records with complex connections

graph data structure of nodes connected by edges/arcs

once the database is populated with nodes and edges the developer can query it

queries can exploit the complex relationships in the graph much easier than a RDBMS

the queries can be chopped and changed regularly
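A traversal over nodes and edges is the kind of query that is awkward in an RDBMS but natural here. A minimal sketch with an adjacency-list graph (names are illustrative; a real graph database indexes these relationships):

```python
from collections import deque

# Small records (nodes) with complex connections (edges).
edges = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}

def reachable(start):
    """Breadth-first traversal: everyone `start` is connected to,
    directly or indirectly."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in edges.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

friends_of_friends = reachable("alice")
```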

95
New cards

graph model scaling

can more likely run as a single server DB

no distribution is required so data is highly consistent

scaling is a challenge, as placing connected nodes on different machines is a performance concern

96
New cards

graph model applications

social networks

product preferences

eligibility rules

routing

dispatch

location based services

recommendation engines

97
New cards

each database model can be compared by

features

scaling

availability

consistency

data retrieval

98
New cards

secure systems require

what are the assets to be secured?

what are the threats to which those assets are vulnerable?

what services should be put in place to address those threats?

what are the current technological mechanisms that can support the services required?

99
New cards

what are the assets to be secured?

network resources, DB servers, application servers

100
New cards

what are the threats to which those assets are vulnerable?

information leakage - disclosure to unauthorised parties

integrity violation - data loss or corruption

denial of service - unavailability of system/service/network

illegitimate use - use of resource by unauthorised person or in unauthorised way