SeqScan
A full table scan, used when most of the table is needed in results.
Index Scan
Scans the index and looks up the matching tuples in the table, using index values directly from memory where possible. Used when only a small part of the table is needed in the results.
Bitmap Scan
A hybrid approach that sorts the row locations identified by the index into physical order before reading them to minimize the cost.
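The scan strategies above can be seen in miniature with SQLite's EXPLAIN QUERY PLAN (a sketch, not PostgreSQL: SQLite reports "SCAN" and "SEARCH ... USING INDEX" rather than SeqScan/Index Scan, and the table and column names here are invented):

```python
import sqlite3

# Sketch: contrast a full table scan with an index search using SQLite's
# EXPLAIN QUERY PLAN. Table/column names (items, name) are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1000)])

# No index on name: the planner must read the whole table (like a SeqScan).
plan1 = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM items WHERE name = 'item5'"
).fetchone()[-1]

# With an index, a selective predicate becomes an index search instead.
conn.execute("CREATE INDEX idx_name ON items (name)")
plan2 = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM items WHERE name = 'item5'"
).fetchone()[-1]

print(plan1)  # a SCAN over the table
print(plan2)  # a SEARCH using idx_name
```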
VACUUM
Reclaims storage occupied by dead tuples, which are not physically removed after a DELETE or UPDATE. VACUUM should be run periodically.
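A minimal sketch of the same idea using SQLite's VACUUM, since it runs without a server; PostgreSQL's VACUUM is analogous but reclaims dead tuples left behind by MVCC. The table name and row sizes here are invented:

```python
import os
import sqlite3
import tempfile

# Sketch: deleted rows leave unused space in the database file until
# VACUUM rewrites it. Table name and payload sizes are invented.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE t (payload TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("x" * 500,) for _ in range(2000)])

conn.execute("DELETE FROM t")  # rows are gone, file space is not
before = os.path.getsize(path)
conn.execute("VACUUM")         # rewrite the file, returning free pages
after = os.path.getsize(path)
print(before, after)           # file shrinks after VACUUM
```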
Logical Backups
Contain copies of the database schema and data exported as a SQL script file. They are architecture-independent and more portable, but slower (e.g., pg_dump in PostgreSQL).
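As a self-contained sketch of a logical backup, SQLite's iterdump() plays the role pg_dump plays for PostgreSQL: it emits the schema and data as a portable SQL script. The table and rows here are invented:

```python
import sqlite3

# Sketch: a logical backup is just a SQL script of schema plus data,
# which can be replayed on any platform. Table and rows are invented.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE users (id INTEGER, name TEXT)")
src.execute("INSERT INTO users VALUES (1, 'alice')")
src.commit()

script = "\n".join(src.iterdump())  # architecture-independent SQL text

# Replaying the script restores schema and data elsewhere.
dst = sqlite3.connect(":memory:")
dst.executescript(script)
restored = dst.execute("SELECT name FROM users").fetchone()[0]
print(restored)  # alice
```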
Physical Backups
Copies of the actual data files. They are faster, but architecture-dependent and less portable.
Cold (offline) backup
Database is offline; easiest to implement.
Hot (online) backup
Database remains online; crucial for applications that require high availability (e.g., pg_basebackup in PostgreSQL using Point-in-Time Recovery (PITR)).
Privilege Management
Databases implement access control by setting access privileges on objects for individual users and/or groups of users.
GRANT and REVOKE
PostgreSQL commands to assign and remove privileges (e.g., SELECT, INSERT) on various object types.
Advantages of Distributed Databases
Data Redundancy, High Availability (HA), Scalability, Monitoring & Automation.
Data Redundancy
Multiple nodes work together to store and process data, providing data redundancy; all nodes are kept synchronized through replication.
High Availability (HA)
If a server shuts down, the data/database remains available; more nodes provide higher availability.
Scalability
Adding more nodes scales the system horizontally (more storage and processing/serving power); load balancing facilitates this.
Monitoring & Automation
Indispensable for administering clusters; often managed by a designated machine running automated scripts.
Homogeneous
All the sites use the same software, are aware of each other, and agree to cooperate; the system appears to the user as a single system.
Heterogeneous
Different sites may use different schemas and software, creating major problems for query processing and transaction processing; sites may not be aware of each other.
Business Intelligence (BI)
A combination of strategies that use data, software, and company information to give business owners an overview of past and present business operations so they can adjust and make business decisions.
Business Analytics (BA)
The process by which companies use historical data combined with software technologies to make predictions and support business decisions.
data warehouse
A specialized data store designed for analytical purposes, built on OLAP technology. It can be used for consolidated reporting, finding relationships and correlations, and data mining.
OLAP (On-line Analytical Processing)
An analytical processing method used for data analysis, employing various databases, complex queries, and de-normalized table structures to support planning and decision-making.
MOLAP (Multidimensional OLAP)
The earliest OLAP systems; they used multidimensional arrays in memory to store data cubes.
ROLAP (Relational OLAP)
OLAP facilities integrated into relational systems, with the data stored in a relational database.
HOLAP (Hybrid OLAP)
Stores some summaries in memory and keeps the base data and other summaries in a relational database.
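A toy roll-up over an invented fact table illustrates what OLAP aggregation along one dimension of a data cube looks like (a sketch in plain Python, not any OLAP engine's API; the dimensions region/year and the sales measure are made up):

```python
from collections import defaultdict

# Sketch: a tiny fact table with two dimensions (region, year) and one
# measure (sales). All names and numbers are invented.
facts = [
    ("north", 2023, 100), ("north", 2024, 150),
    ("south", 2023, 80),  ("south", 2024, 120),
]

def rollup(facts, dim):
    """Aggregate the sales measure along one dimension of the cube."""
    totals = defaultdict(int)
    for region, year, sales in facts:
        key = region if dim == "region" else year
        totals[key] += sales
    return dict(totals)

by_region = rollup(facts, "region")
by_year = rollup(facts, "year")
print(by_region)  # {'north': 250, 'south': 200}
print(by_year)    # {2023: 180, 2024: 270}
```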
Backend
Any relational database can be used as the backend or data source for an OLAP implementation.
OLTP (On-line Transaction Processing)
A fast, efficient, and standardized system for managing transaction-oriented workloads, focused on handling large numbers of short transactions while minimizing space requirements.
Data Mining
A semiautomatic process of analyzing large databases to identify patterns, support business decisions, and make predictions; often linked to machine learning.