DynamoDB Architecture
NoSQL public database-as-a-service (DBaaS) product- wide column key/value and document. No self-managed servers or infrastructure. Supports a range of scaling options- manual/automatic provisioned performance IN/OUT, or on demand. Can also be highly resilient across AZs and optionally global. It's really fast. Supports backups, point-in-time recovery, and encryption at rest. Supports event-driven integration (do things when data changes)
Tables: the base entity of DynamoDB. A table is a grouping of items that share the same primary key. No limit to the number of items in a table. A primary key can be simple (partition key only) or composite (partition and sort key). Each item must have a unique primary key value. Items can have none, all, a mixture, or completely different attributes (no rigid schema). Item max size is 400KB (the limit is about speed, not space)
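A minimal boto3 sketch of creating a table with a composite primary key (the table and attribute names are made up):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Hypothetical table: partition key "artist", sort key "song_title".
dynamodb.create_table(
    TableName="MusicLibrary",
    AttributeDefinitions=[
        {"AttributeName": "artist", "AttributeType": "S"},
        {"AttributeName": "song_title", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "artist", "KeyType": "HASH"},       # partition key
        {"AttributeName": "song_title", "KeyType": "RANGE"},  # sort key
    ],
    BillingMode="PAY_PER_REQUEST",  # on-demand capacity
)
```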
DynamoDB backups
On demand: a full copy of the table is retained until removed. Can be used for same-region or cross-region restoration. When restoring, you can adjust encryption and choose to restore with or without indexes
You're responsible for performing backups and removing old ones
Point-in-time recovery: not enabled by default. When enabled, it results in a continuous stream of backups over a 35-day window. You can restore to any point in that window with one-second granularity
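A hedged boto3 sketch (table name made up) of enabling PITR and restoring to a specific second; restores always create a new table:

```python
import boto3
from datetime import datetime, timezone

dynamodb = boto3.client("dynamodb")

# Enable PITR (it is off by default).
dynamodb.update_continuous_backups(
    TableName="MusicLibrary",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)

# Restore to one specific second within the 35-day window, into a NEW table.
dynamodb.restore_table_to_point_in_time(
    SourceTableName="MusicLibrary",
    TargetTableName="MusicLibrary-restored",
    RestoreDateTime=datetime(2024, 1, 1, 12, 30, 15, tzinfo=timezone.utc),  # example time
)
```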
DynamoDB key points
NoSQL == DynamoDB (NEVER relational data)
Key/value == prefer DynamoDB
Accessed via console, CLI, API, NEVER SQL
Billing is based on RCU, WCU, storage and features
DynamoDB- Reading and writing:
On demand: for unknown or unpredictable workloads, or when you want low admin overhead for the table. No need to set specific capacity settings. You pay per request for read or write units (typically more expensive)
Provisioned: you set RCU and WCU on a per-table basis.
> every operation consumes at least 1 RCU or WCU
> 1 RCU is 1 x 4KB read operation per second
> 1 WCU is 1 x 1KB write operation per second
> Every table has an RCU and WCU burst pool (300 seconds)
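A small worked example of those rules (assuming the 4KB/1KB round-up above and half price for eventually consistent reads, covered later):

```python
import math

def rcu_for_read(item_size_kb: float, strongly_consistent: bool = True) -> float:
    """1 RCU = one strongly consistent 4KB read per second; eventually consistent is half."""
    units = max(1, math.ceil(item_size_kb / 4))
    return units if strongly_consistent else units / 2

def wcu_for_write(item_size_kb: float) -> int:
    """1 WCU = one 1KB write per second; every write consumes at least 1 WCU."""
    return max(1, math.ceil(item_size_kb))

print(rcu_for_read(10))         # 3 RCU   (10KB / 4KB, rounded up)
print(rcu_for_read(10, False))  # 1.5 RCU (eventually consistent = half)
print(wcu_for_write(2.5))       # 3 WCU   (2.5KB / 1KB, rounded up)
```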
DynamoDB- Operations: Query
Query: a way to retrieve data from the table
You need to pick a partition key value. Query accepts a single PK value and optionally an SK value or range. Capacity consumed is the size of all returned items. Further filtering discards data- capacity is still consumed!! Can ONLY query on PK, or PK and SK
Best to combine operations into a single operation (for example, if you split one PK==1 query into two separate operations, each would round up to at least 1 RCU, totaling 2 RCU, where a single combined query could consume just 1)
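A boto3 sketch of a Query against a hypothetical MusicLibrary table: one PK value plus an SK condition, with a filter that still bills for the full read:

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("MusicLibrary")  # hypothetical table

resp = table.query(
    # Query only accepts a single PK value, optionally with an SK condition/range.
    KeyConditionExpression=Key("artist").eq("Queen") & Key("song_title").begins_with("B"),
    # FilterExpression runs AFTER the read, so filtered-out items still consume RCU.
    FilterExpression=Attr("year").gte(1980),
)
print(resp["Items"])
```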
DynamoDB- Operations: Scan
Scan: the least efficient way to get data, but the most flexible. It moves through the table consuming the capacity of every ITEM. You have control over what data is selected; any attributes can be used and filters applied, but Scan consumes capacity for every item scanned through.
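A boto3 Scan sketch against the same hypothetical table, including the pagination loop you normally need:

```python
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("MusicLibrary")  # hypothetical table

# Scan reads (and bills for) every item; the filter only trims what is returned.
resp = table.scan(FilterExpression=Attr("genre").eq("rock"))
items = resp["Items"]

# Results are paginated- keep scanning until LastEvaluatedKey disappears.
while "LastEvaluatedKey" in resp:
    resp = table.scan(
        FilterExpression=Attr("genre").eq("rock"),
        ExclusiveStartKey=resp["LastEvaluatedKey"],
    )
    items.extend(resp["Items"])
```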
DynamoDB- Operations: Consistency model:
How, when data is updated or new data is written to the database and then immediately read, that read returns the same data immediately or only eventually.
In DynamoDB all data is replicated across separate AZs. Each holds a "storage node", and one of them is the leader node. Writes are always directed to the leader node, which is always "consistent". The leader node then starts the process of replicating the data to the other nodes
Eventually consistent read: the read is directed to one of the nodes at random, so the data may not yet be consistent. You pay less for the read (half the price)
Strongly consistent read: the read is directed to the most up-to-date copy (the leader node)
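A short boto3 sketch of choosing between the two read types (key values are made up):

```python
import boto3

table = boto3.resource("dynamodb").Table("MusicLibrary")  # hypothetical table
key = {"artist": "Queen", "song_title": "Bohemian Rhapsody"}

# Eventually consistent (the default): any storage node, half the RCU cost.
table.get_item(Key=key)

# Strongly consistent: served from the leader node, full RCU cost.
table.get_item(Key=key, ConsistentRead=True)
```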
DynamoDB- operations cost issue
Indexes: improve the efficiency of querying data.
Query is the most efficient operation in DDB, but it can only work on 1 PK value at a time (and optionally a single SK value or range). Indexes are alternative views on the table. You can get a view using an alternative SK (LSI) or a different PK and SK (GSI). When creating either type of index you have the ability to choose which attributes are projected into it (some/all).
DynamoDB- Local secondary indexes (LSI):
An alternative view for a table. It MUST be created with the table; it cannot be added after the table is made. You can have 5 LSIs per base table. It has the SAME PK as the table but an alternative SK. It shares RCU and WCU with the table. When picking projected attributes, you can choose ALL, KEYS_ONLY, or INCLUDE.
If you want ONLY a specific attribute, that attribute can be used as the SK.
Capacity shared with the table
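A hedged create_table sketch with an LSI declared alongside the table: same PK, alternative SK, projection chosen up front (names and numbers are made up):

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="MusicLibrary",
    AttributeDefinitions=[
        {"AttributeName": "artist", "AttributeType": "S"},
        {"AttributeName": "song_title", "AttributeType": "S"},
        {"AttributeName": "year", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "artist", "KeyType": "HASH"},
        {"AttributeName": "song_title", "KeyType": "RANGE"},
    ],
    LocalSecondaryIndexes=[{
        "IndexName": "artist-year-index",
        "KeySchema": [
            {"AttributeName": "artist", "KeyType": "HASH"},  # same PK as the table
            {"AttributeName": "year", "KeyType": "RANGE"},   # alternative SK
        ],
        "Projection": {"ProjectionType": "KEYS_ONLY"},  # ALL / KEYS_ONLY / INCLUDE
    }],
    # The LSI has no capacity settings of its own- it shares the table's RCU/WCU.
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```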
DynamoDB- Global secondary index (GSI):
Can be created at any time after the table's creation. Default limit of 20 per base table. You can choose both an alternative PK and SK. GSIs have their own RCU and WCU allocations. You can choose which attributes are projected (same as LSI)
Always eventually consistent; replication between the base table and the GSI is asynchronous
Own capacity allocation
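A hedged update_table sketch adding a GSI to an existing table, with its own keys and its own capacity (names and numbers are illustrative):

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.update_table(
    TableName="MusicLibrary",
    AttributeDefinitions=[
        {"AttributeName": "genre", "AttributeType": "S"},
        {"AttributeName": "year", "AttributeType": "N"},
    ],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "genre-year-index",
            "KeySchema": [
                {"AttributeName": "genre", "KeyType": "HASH"},  # alternative PK
                {"AttributeName": "year", "KeyType": "RANGE"},  # alternative SK
            ],
            "Projection": {
                "ProjectionType": "INCLUDE",
                "NonKeyAttributes": ["song_title"],
            },
            # The GSI's own RCU/WCU (only used when the table is in provisioned mode).
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        }
    }],
)
```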
DynamoDB- LSI and GSI exam points
Careful with projections (ALL, KEYS_ONLY, INCLUDE)- you pay for the capacity the projected attributes consume
Queries on attributes not projected are expensive
Use GSIs as the default over LSIs (an LSI is the choice when strong consistency is required)
Use indexes for alternative access patterns
DynamoDB Streams
A time-ordered list of item changes in a table, kept in a 24-hour rolling window. You need to enable it on a per-table basis. Records cover inserts, updates, and deletes. Different view types influence what is in the stream
Streams can be configured with the following view types:
Keys Only: the stream only records the PK (and, if applicable, the SK) of changed items
New image: stores the entire item AFTER the change
Old image: stores the entire item PRIOR to the change
New and old images: full visibility- both the pre-change and post-change item
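Enabling a stream and picking a view type is a per-table setting; a minimal boto3 sketch (table name made up):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# View types: KEYS_ONLY | NEW_IMAGE | OLD_IMAGE | NEW_AND_OLD_IMAGES
dynamodb.update_table(
    TableName="MusicLibrary",
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
)
```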
DynamoDB Trigger
Allows actions to take place in the event of a change to data in a table.
The event contains the data which changed. An action is taken using that data. In AWS this = Streams + Lambda (trigger)
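A sketch of what a stream-triggered Lambda handler could look like; the field names follow the DynamoDB Streams event structure, and the processing itself is hypothetical:

```python
# Invoked by a DynamoDB Streams event source mapping on the Lambda function.
def lambda_handler(event, context):
    for record in event["Records"]:
        action = record["eventName"]          # INSERT | MODIFY | REMOVE
        keys = record["dynamodb"]["Keys"]
        # NewImage / OldImage are only present for the matching view types.
        new_image = record["dynamodb"].get("NewImage")
        old_image = record["dynamodb"].get("OldImage")
        print(action, keys, old_image, new_image)
```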
DynamoDB global tables: provide multi-master cross-region replication (reads and writes in all regions). Tables are made in multiple regions and added to the same global table (becoming replica tables). Last-writer-wins is used for conflict resolution (the most recent write overwrites). Reads and writes can occur in any region and there is sub-second replication between regions. Strongly consistent reads are only possible in the same region as the writes (other regions are eventually consistent)
Provides global HA and Global DR/BC
DynamoDB accelerator (DAX):
An in-memory cache for DynamoDB- integrated with DynamoDB.
Traditional cache vs DAX
Traditional: the application checks the cache; on a miss it must go to the DB, retrieve the data, and add it to the cache. The cache is then updated and the retrieved data is a hit next time
DAX: removes that admin overhead. The app makes a single call to DAX; on a miss, DAX does all the work of retrieving the data from the DB and returning it to the application.
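A generic cache-aside sketch of that "traditional" flow (the in-process dict and key shape are stand-ins for a real cache and table); with DAX, the application makes one call and this read-through logic happens inside DAX:

```python
import boto3

cache = {}  # stand-in for a separate cache such as ElastiCache
table = boto3.resource("dynamodb").Table("MusicLibrary")  # hypothetical table

def get_item_cached(key: dict):
    cache_key = str(sorted(key.items()))
    if cache_key in cache:                       # hit: served from memory
        return cache[cache_key]
    item = table.get_item(Key=key).get("Item")   # miss: read the database
    cache[cache_key] = item                      # write back so the next read hits
    return item
```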
DAX is a cluster service
Nodes are placed in multiple AZs, one being the primary and the others being read replicas. The item cache holds the results of (Batch)GetItem calls.
DAX Exam points
Primary node (supports writes), replicas (read)
Nodes are highly available; if the primary fails, it's replaced
In-memory cache- reads are much faster, at lower cost
Can scale UP and scale OUT (bigger nodes or more nodes)
Supports write-through (data written to the table is stored in the cache too)
DAX deployed within a VPC
Good for read-heavy workloads that need low response times
Not ideal for applications that need strong consistency
DynamoDB TTL
TTL lets you define a timestamp for the automatic deletion of items. You specify a date and time; once it passes, the item is set to 'expired'. You configure TTL on a specific attribute.
A per-partition process runs periodically, comparing the current time (in seconds since epoch) to the value in the TTL attribute; items whose time has passed are set to 'expired'. Another per-partition process then runs and actually deletes the items that are set to expired
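A boto3 sketch, with made-up table and attribute names, of enabling TTL and writing an item that expires in 24 hours:

```python
import time
import boto3

dynamodb = boto3.client("dynamodb")

# Enable TTL on a chosen attribute ("expires_at" here is arbitrary).
dynamodb.update_time_to_live(
    TableName="SessionData",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Items carry their expiry as seconds since epoch; this one expires in 24 hours.
dynamodb.put_item(
    TableName="SessionData",
    Item={
        "session_id": {"S": "abc123"},
        "expires_at": {"N": str(int(time.time()) + 24 * 3600)},
    },
)
```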
Athena
A serverless interactive query service. Allows you to perform ad-hoc queries on data- pay only for the data consumed. Athena uses schema-on-read-> the data stored on S3 never changes; the schema translates the data into a table-like structure (relational-like) when read. Output can be sent to other AWS services.
You have the source data, then you define the schema (the tables). The schema describes how to take the source data and convert it into tables.
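A hedged boto3 sketch of running an ad-hoc query; the database, table, and results bucket are made up, and the schema-on-read table definition is assumed to already exist in the catalog:

```python
import time
import boto3

athena = boto3.client("athena")

resp = athena.start_query_execution(
    QueryString="SELECT srcaddr, dstaddr, action FROM vpc_flow_logs LIMIT 10",
    QueryExecutionContext={"Database": "logs_db"},                      # made-up database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # made-up bucket
)
qid = resp["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state not in ("QUEUED", "RUNNING"):
        break
    time.sleep(1)

rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
```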
Athena key points
Athena has no infrastructure- no need to load data in advance. Best if you don't want to load/transform the data.
Best for occasional queries on data in S3
Great if cost conscious- and for serverless querying scenarios
Best for querying AWS logs (VPC Flow Logs, CloudTrail, ELB logs, cost reports, etc.)
Can also query data in the AWS Glue Data Catalog and web server logs
Feature: Athena Federated Query- a data source connector (code that translates between Athena and a data source that isn't S3)
ElastiCache
An in-memory database for apps that need high performance. ElastiCache delivers managed Redis or Memcached as a service. Can be used to cache data for read-heavy workloads with low latency requirements, reducing (expensive) database workloads. Can also be used to store session data (for stateless servers). Using ElastiCache means you need to make changes to application code! The app must know to check/write to the cache (NOT FREE).
ElastiCache- session state data
When a user connects to an instance, session state is written to ElastiCache rather than the instance itself. ElastiCache keeps the session state up to date, so if the connection is moved to another instance, the session data is maintained (stateless!!)
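A sketch of session storage with the redis client library; the endpoint is a placeholder for an ElastiCache for Redis node:

```python
import json
import redis

# Placeholder endpoint- sessions live in the cache, not on any one web server.
r = redis.Redis(host="my-sessions.example.cache.amazonaws.com", port=6379)

def save_session(session_id: str, data: dict, ttl_seconds: int = 1800) -> None:
    r.setex(f"session:{session_id}", ttl_seconds, json.dumps(data))

def load_session(session_id: str):
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None
```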
ElastiCache- Engines
both offer sub-millisecond access to data, both support many programming languages
Memcached:
simple data structures
No replication
Multiple nodes (sharding)
No backups
Multi threaded (better performance)
Redis:
Advanced structures (can help store ordered data)
Multi AZ
Replication (scale reads)
Backup and restore
Transactions (all operations succeed or none do)
Amazon Redshift architecture:
A petabyte-scale data warehouse (you can pump data from databases across your business into it for analysis). It's OLAP (column-based), not OLTP (row/transaction-based). Redshift is pay-as-you-go, similar to RDS.
Can be used to query S3 using Redshift Spectrum
Can directly query other DBs using federated query
Integrates with AWS tooling such as QuickSight
SQL-like interface via JDBC/ODBC connections
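A hedged sketch using the Redshift Data API from boto3 (no JDBC/ODBC driver to manage); cluster, database, user, and table names are made up:

```python
import boto3

rs = boto3.client("redshift-data")

resp = rs.execute_statement(
    ClusterIdentifier="analytics-cluster",   # made-up cluster
    Database="warehouse",                    # made-up database
    DbUser="analyst",                        # made-up DB user
    Sql="SELECT product, SUM(amount) FROM sales GROUP BY product LIMIT 10",
)

# The statement runs asynchronously; results are fetched later with:
# rs.get_statement_result(Id=resp["Id"])
```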
Amazon Redshift architecture
Server based, not serverless (unlike Athena)
NOT used ad hoc like Athena since it needs provisioning
A Redshift cluster runs its nodes privately, in one AZ. There is a leader node which handles query input, planning and aggregation; compute nodes perform the querying of data.
Since it is a VPC service, you can manage it as such: VPC security, IAM permissions, KMS at-rest encryption, CloudWatch monitoring
Feature: Redshift enhanced VPC routing- VPC networking
By default traffic uses public routing, but with enhanced VPC routing it uses specific VPC routing- customizable networking
Redshift resilience and recovery
We know Redshift runs in one AZ. There are some recovery features:
Can take backups. Automatic incremental backups occur every ~8 hours (anything changed is added to S3, retention 1-35 days); can also take manual backups (you manage deletion). Because snapshots are stored in S3 they are resilient across AZs (multi-AZ), and they can also be used to restore the data in another region if needed