Transaction
Logical unit of work; from a list of things you want to happen, either everything happens or nothing happens
Properties that relational databases support
ACID - atomicity, consistency, isolation, durability
Atomicity
All operations of a transaction happen or none do
Consistency
Consistent before and after; data is maintained in a correct state once the transaction finishes
Isolation
Transaction executes as if it were by itself
Durability
Once a transaction commits, its changes are permanent, even if the system crashes afterward
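A minimal sketch of atomicity, assuming Python's built-in sqlite3 module and a hypothetical account table (the table, the names, and the transfer helper are illustrative, not from the cards). The transfer either applies both updates or neither:

```python
import sqlite3

# Hypothetical in-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money as one unit of work: both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint when src has too little money.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))
```

Calling transfer(conn, "alice", "bob", 500) fails on the second update, so the credit already given to bob is rolled back and both balances stay as they were.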
Session
Connection + username, password, db name, hostname
Result set
Rows returned from a query to the driver, as a list of lists
Dynamic web pages
Load information that is stored somewhere else (e.g., databases) and is continually updated
Static Web Pages
Information on the pages does not change
Stateless
The web server (WS) remembers nothing between requests, so it asks for your information on every request
Cookies
Little pieces of information that WS sends to browser for identification purposes
SQL Injection Attack
Injection of a separate SQL query via input data from client to application
Prepared statement
Database pre-compiles the SQL code separately from the data; input values are bound afterward and are never parsed as SQL
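A small sketch contrasting string-built SQL with a parameterized query, assuming Python's sqlite3 module. The users table and the helper names find_unsafe and find_safe are hypothetical, and this injection only widens a WHERE clause rather than running a separate query:

```python
import sqlite3

# Hypothetical table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def find_unsafe(conn, name):
    # Vulnerable: the input is pasted into the SQL text and parsed as SQL.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_safe(conn, name):
    # Parameterized: the ? placeholder is bound as data, never parsed as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


# Input crafted to turn the WHERE clause into a condition that is always true.
malicious = "nobody' OR '1'='1"
```

With the malicious input, find_unsafe returns every row in the table, while find_safe looks for a user literally named that string and returns nothing.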
Object-oriented database
Stores data as objects; users follow references between objects to find information
ORM (Object Relational Mapping)
Aligns code with database structures, simplifies interaction between relational databases and OOP languages
4 pieces of information needed to connect to the database
1. hostname of server
2. username
3. password
4. name of database
COALESCE
Returns the first non-null value in a list (e.g., COALESCE(x, 0) turns NULLs into 0)
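A quick illustration of COALESCE, assuming SQLite through Python's sqlite3 module; the orders table and its discount column are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, discount INTEGER)")
# Order 2 has no discount recorded, so its value is NULL (None in Python).
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 5), (2, None), (3, 0)])

# Replace NULL discounts with 0 in the result set.
rows = conn.execute(
    "SELECT id, COALESCE(discount, 0) FROM orders ORDER BY id"
).fetchall()

# COALESCE scans its arguments left to right and returns the first non-null one.
first = conn.execute("SELECT COALESCE(NULL, NULL, 'x', 'y')").fetchone()[0]
```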
Tier 2 Architecture
Architecture in which the business logic lives in a combined webserver and application that connects to a database
Tier 3 Architecture
Architecture in which the webserver does not connect to the database directly; webserver, application (business logic), and database are separate tiers
Process of sending signal to server
1. App to WIFI router
2. WIFI router to ISP; the ISP analyzes the request and, if it can't deliver it directly, passes it along
3. ISP to SIP Trunk
4. SIP Trunk to backbone
5. Backbone to ISP closer to App server
6. ISP to App server
Man in the middle attack
Hacker positions themselves between the user and the application to intercept or alter their internet conversation
Cross Site Scripting Attack (XSS)
Hacker injects malicious executable scripts into a trusted web page (e.g., via an unsecured link), which then run in the victim's browser
HTTPS
Hypertext Transfer Protocol Secure
Normalization
Organizing a database so that each piece of data is stored only once, reducing redundant data
Bad smell
A sign that code or a design is "off" in certain areas and hints at a deeper problem
Redundant data
The same piece of information stored, without need, in different areas of the database
Steps to determine amount of redundant data
1. Identify functional dependencies
2. Calculate closure
3. Categorize closure
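The closure step can be sketched in a few lines of Python. The function names, the category labels, and the example relation R(A, B, C, D) with A -> B and B -> C are assumptions made for illustration:

```python
def closure(attrs, fds):
    """Return X+: every attribute determined by `attrs` under the dependencies `fds`.

    `fds` is a list of (left side, right side) pairs, each an iterable of attributes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we also get the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def categorize(attrs, fds, all_attrs):
    """Label a closure as 'trivial', 'key', or 'other' (hypothetical labels)."""
    result = closure(attrs, fds)
    if result == set(attrs):
        return "trivial"  # X+ adds nothing beyond X itself
    if result == set(all_attrs):
        return "key"      # X determines every attribute
    return "other"


# Hypothetical relation R(A, B, C, D) with A -> B and B -> C.
fds = [("A", "B"), ("B", "C")]
all_attrs = "ABCD"
```

Here closure("A", fds) is {A, B, C}, which is neither trivial nor a key, while the closure of {A, D} covers every attribute and so is a key.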
1st Normal Form
Data attributes are of atomic type (values compared only with = or !=); does not eliminate redundant data
2nd Normal Form
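No non-key attribute depends on only part of a candidate key; "useless" in practice because 3NF already covers it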
3rd Normal Form
1. Is closure trivial?
2. Is closure key?
3. If X+ = Y and X is a subset of Y, is every attribute in Y - X part of a candidate key?
Candidate key closure
Smallest set of attributes whose closure is a key closure (contains every attribute)
Boyce-Codd Normal Form
3NF, and X must be a superkey for every nontrivial X -> Y
Superkey
An attribute or attributes that uniquely identify each entity in a table.
4th Normal Form
Guaranteed when the relation is in BCNF and a simple candidate key exists; focus is on multivalued dependencies
Simple candidate key
Candidate key made of a single attribute (a set of 1) whose closure gets every attribute
5th Normal Form
Guaranteed when the relation is in 3NF and all candidate keys are simple; focus is on join dependencies
Domain Key Normal Form
The "ultimate" NF: every constraint follows from domain constraints and key constraints
ER Diagrams
Entity Relationship Diagram; model/design databases to display to the customer
Entities
Objects or things in our enterprise; they have attributes
Relationships
measure of interaction between entities
Cardinalities
maximum # of times entities can relate to other entities
Primary key
Field that uniquely identifies a given entity in a table, represented by ____
Multi-value attribute
multiple values for specific attribute, represented by [ ]
Derived attribute
An attribute whose values can be calculated from related attribute values, represented by ( )
Composite attribute
An attribute that can be further subdivided into additional attributes, represented by an indent
Generalization
2 or more entities that share commonalities are grouped together; common attributes go to the superclass
Specialization
Entity divided into sub-entities based on its characteristics
Total
Every entity in the superclass must also be in a subclass (the superclass is abstract)
Partial
Superclass can contain entities that belong to none of the subclasses
Disjoint
Either/or: an entity belongs to one subclass or another, never both
Overlapping
An entity can belong to more than one subclass at once
Weak entity
Depends upon another entity to exist in database
Discriminator
(dotted underline) attribute that tells instances of a weak entity apart
Aggregation
Created when we want a relationship that involves another relationship; the inner relationship is treated as a single entity
Notes
Any clarification materials and primary keys for combined entities
Data mining
Analyzing large databases in order to discover patterns (AI)
kNN
Make predictions about a data point based on k closest data points
Classification problem
Know some info about a data point but not its category; predict the category from its neighbors (e.g., by majority vote)
Regression problem
Predict an unknown numeric value by taking the average of the closest neighbors' values
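Both kNN uses can be sketched with the standard library, assuming Euclidean distance and training data stored as (point, target) pairs. The function names and the sample data are hypothetical:

```python
import math
from collections import Counter


def nearest(train, point, k):
    """Return the targets of the k training points closest to `point`."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], point))
    return [target for _, target in by_distance[:k]]


def knn_classify(train, point, k):
    """Classification: majority vote among the k nearest labels."""
    return Counter(nearest(train, point, k)).most_common(1)[0][0]


def knn_regress(train, point, k):
    """Regression: average of the k nearest numeric values."""
    values = nearest(train, point, k)
    return sum(values) / len(values)


# Hypothetical training data: (coordinates, label) and (coordinates, value).
labeled = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
           ((5, 5), "blue"), ((6, 5), "blue")]
priced = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]
```

For example, knn_classify(labeled, (1.5, 1.5), 3) votes among three red neighbors, and knn_regress(priced, (2.2,), 2) averages the values at 2 and 3.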
Leave-One-Out Cross Validation (LOOCV)
Purposely leave out one data point, train the model on the rest, then test on the point left out; repeat for every data point
Mean squared error
The average of the squared differences between the forecasted and observed values
Root mean squared error (RMSE)
Square root of the mean squared error; gives an indication of how good the prediction is for a given k and n
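A sketch that ties LOOCV, MSE, and RMSE together for a one-dimensional kNN regressor. The helper names and the four sample points are assumptions made for illustration:

```python
import math


def knn_regress(train, x, k):
    """Average the y values of the k training points whose x is closest."""
    closest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in closest) / len(closest)


def loocv_rmse(data, k):
    """Leave each point out once, predict it from the rest, and return the RMSE."""
    squared_errors = []
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # every point except the one left out
        squared_errors.append((knn_regress(rest, x, k) - y) ** 2)
    mse = sum(squared_errors) / len(squared_errors)  # mean squared error
    return math.sqrt(mse)                            # root mean squared error


# Hypothetical (x, y) observations lying on the line y = 2x.
data = [(1, 2.0), (2, 4.0), (3, 6.0), (4, 8.0)]
```

Comparing loocv_rmse(data, 1) with loocv_rmse(data, 2) is one way to choose k: the smaller RMSE suggests the better setting for this data.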
Clustering
Cluster points that belong together
k-means process
1. pick k centers (randomly)
2. assign each data point to its nearest center
3. compute each new center (average of all points assigned to it)
repeat 2./3. until nothing changes
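The steps above can be sketched as follows, assuming Euclidean distance and points given as tuples. The optional centers argument and the seed are additions made so a run can be reproduced; they are not part of the process as stated:

```python
import math
import random


def k_means(points, k, centers=None, seed=0):
    """Return (centers, clusters) once the centers stop changing."""
    if centers is None:
        # Step 1: pick k centers at random from the data.
        centers = random.Random(seed).sample(points, k)
    centers = [tuple(c) for c in centers]
    while True:
        # Step 2: assign each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Step 3: move each center to the average of its assigned points.
        new_centers = []
        for center, cluster in zip(centers, clusters):
            if cluster:
                new_centers.append(
                    tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                )
            else:
                new_centers.append(center)  # an empty cluster keeps its center
        if new_centers == centers:  # nothing changed, so stop
            return centers, clusters
        centers = new_centers


# Hypothetical data with two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = k_means(points, 2, centers=[(1, 1), (8, 8)])
```

Because the starting centers are normally random, different runs can settle on different clusterings; fixing them here keeps the example repeatable.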
Association rules
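1. support - fraction of transactions that contain the item set; must reach a minimum level for the occurrence to be valid
2. confidence - # transactions with the full combination / # transactions with the antecedent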
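A short sketch of both measures, assuming each transaction is a set of items; the function names and the shopping baskets are made up for the example:

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also
    contain the consequent."""
    combination = set(antecedent) | set(consequent)
    return support(transactions, combination) / support(transactions, antecedent)


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
```

For the rule bread -> milk, two of the four baskets contain both items (support 0.5) and two of the three baskets with bread also contain milk (confidence 2/3).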
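Apriori (a priori)
Algorithm that limits the # of candidate large item sets: an item set can only be large if all of its subsets are large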
2 factors for ranking documents by keyword relevance
1. Term frequency - how many times the keyword appears in the document
2. Inverse document frequency - weight measuring how rare or common a term is across a collection of documents; rarer terms get more weight
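One common way to combine the two factors is TF-IDF. The sketch below assumes whitespace tokenization and the log(N / df) form of inverse document frequency, which is only one of several variants; the function names and sample documents are hypothetical:

```python
import math


def tf(term, doc):
    """Term frequency: share of the document's words that are `term`."""
    words = doc.lower().split()
    return words.count(term) / len(words)


def idf(term, docs):
    """Inverse document frequency: log(N / number of docs containing `term`)."""
    containing = sum(term in d.lower().split() for d in docs)
    return math.log(len(docs) / containing) if containing else 0.0


def tf_idf(term, doc, docs):
    """Relevance weight of `term` for `doc` within the collection `docs`."""
    return tf(term, doc) * idf(term, docs)


# Hypothetical document collection.
docs = [
    "the database stores the data",
    "the query reads the index",
    "the index speeds up the query",
]
```

A word such as "the" appears in every document, so its weight drops to zero, while "database" appears in only one document and scores highest there.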
Page rank algorithm
Provides a ranking to web pages that should be returned from a web search. Based in large part on how often other web pages link to a given page. (more links = better rank)
Random walks
Model of a surfer who follows links at random; determines the probability of landing on each site
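A minimal sketch of page rank as a random walk, computed by repeated updates. The damping factor of 0.85, the fixed iteration count, and the four-page link graph are assumptions; the cards do not specify them:

```python
def page_rank(links, damping=0.85, iterations=50):
    """Return {page: probability that a random surfer is on that page}.

    `links` maps each page to the list of pages it links to; every linked
    page is assumed to appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}  # start with equal probability
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outgoing in links.items():
            targets = outgoing or pages  # a page with no links jumps anywhere
            for target in targets:
                # Otherwise the surfer follows one of the page's links.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical web of four pages; three of them link to "c".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = page_rank(links)
```

Page "c" ends up with the highest rank because the most pages link to it, and "a" comes second because its single inbound link comes from the highly ranked "c".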
Precision
How accurate retrieval is, # relevant docs found / # docs found
Recall
Found vs. relevant, # relevant docs found / # relevant docs
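Both ratios can be computed with set operations. The document IDs below are invented for the example:

```python
def precision(retrieved, relevant):
    """Share of the retrieved docs that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Share of the relevant docs that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant)


# Hypothetical search: 4 docs returned, 5 docs actually relevant, 2 in common.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d3", "d5", "d6", "d7"}
```

Here precision is 2/4 and recall is 2/5: half of what was returned is useful, but most of the useful documents were missed.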
Search engine optimization (SEO)
Tuning a website so that search engines rank it higher; companies invest in it to get priority in search results
Big Data properties
1. Volume
2. Velocity
3. Variety
Big Table properties
1. Sparse - only stores values that exist; NULL values are not stored
2. Dynamic Schema - each row can have its own set of attributes
Map Reduce
Uses a parallel, distributed algorithm to process large amounts of data: a map step emits key/value pairs and a reduce step combines the values for each key
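The classic word count shows the shape of the idea. This sketch runs the phases one after another in a single process, whereas a real framework would spread them across machines; all function names are illustrative:

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key, here by summing them."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))
```

Each document can be mapped independently and each key can be reduced independently, which is what makes the work easy to run in parallel.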
Spark
An open-source, distributed processing system for big data workloads
Resilient distributed dataset (RDD)
Collection of data elements that are partitioned across nodes in a cluster
Dataframe
Distributed collection of data organized into named columns, built on top of RDDs; can't be changed (immutable)
Why use a database instead of Spark?
Spark does not support ACID principles (no FK, PK, etc.)