1/72
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced | Call with Kai |
|---|
No analytics yet
Send a link to your students to track their progress
Why are key constraints important in SQL tables?
Without a key constraint, the rows in a table cannot be uniquely identified, and the table doesn’t qualify as a set.
Is there a particular order of the rows in a table?
Unless specified in the query, there isn’t a particular row present.
What is a predicate?
A property or an expression that either holds or doesn’t — it’s either true or false. It represents a condition that helps in setting rules for the output.
Example of a predicate?
Setting a constraint on the Salary column in the Employees table, where the salary must be greater than 0. This won’t allow adding employees with a salary of 0.
What is a relation in SQL?
A relation is a set of related data described by attributes (columns). It is not a relationship between multiple tables.
What is a proposition in SQL?
A proposition is an assertion or statement that is either true or false. Any sentence that states something about particular data can be true, in which case it manifests as a row, or false, in which case it won’t be added to the output.
How is data integrity achieved in SQL?
Data integrity is achieved through constraints that are defined in the data model.
What are some examples of enforcing data integrity?
Assigning an attribute type and nullability (e.g. Salary INT NOT NULL)
Adding candidate and foreign keys
What is a candidate key?
A candidate key is one or more columns whose combined values uniquely identify each row in a table.
A candidate key acts as a “flag” to tell rows apart, but formally, it’s any column or set of columns that makes each row unique.
What are primary and alternate keys?
In case where multiple candidate keys can exist, one of them can be chosen as the primary key. In this case, the rest of the candidate keys become alternate keys.
What is a foreign key?
A foreign key is defined on one or more attributes (columns) of a relation (known as the referencing relation) and references a candidate key in another (or the same) relation. Foreign keys are used to enforce referential integrity — constraints like this restrict the values in the referencing relation’s foreign-key attributes to the values that appear in the referenced relation’s candidate-key attributes.
A foreign key is a column (or set of columns) in one table that points to a candidate key (usually the primary key) in another table.
This means that the values for the foreign key are restricted to values that already exist in the referenced candidate key.
Deletes and updates are also restricted and require additional keywords.
Do we have to define foreign keys to join tables in SQL?
We don’t have to, but we should — they make the relationship official and ensure data consistency.
What is normalization?
It’s a formal mathematical process used to guarantee that each entity will be represented by a single relation.
This means that normalization ensures that each real-world concept (entity) is stored in one table only.
What are the advantages of using normalized databases?
In normalized databases we avoid anomalies during data modification and keep redundancy to a minimum, without sacrificing completeness:
Insertion anomaly
Update anomaly
Deletion anomaly
What is the Entity Relationship Modeling (ERM)? Why is it important?
ERM is the design step before creating the database. With it, we define:
What our entities are (what each table represents)
Their attributes (columns)
Relationships between them
If ERM is done properly, the database will be already normalized by design.
How many normalization rules (normal forms) are there?
There are 3 normal forms, known as 1NF, 2NF and 3NF.
What does the first normal form (1NF) state? How do we enforce it?
1NF states that tuples (rows) in the relation (table) must be unique, and attributes (columns) should be atomic.
We enforce the uniqueness of rows by defining a primary key or a unique constraint in the table. We can operate on attributes only with operations that are defined as a part of the attribute’s type.
Atomicity of the attributes is subjective and depends on the application.
What does the second normal form (2NF) state? How do we enforce it?
2NF involves two rules:
Data must meet the first normal form.
For every candidate key, every non-key attribute has to be fully functionally dependent on the entire candidate key. This means that every piece of information in the table must depend on the whole identifier (the set key,) and not just a part of it (in case where the key includes multiple columns).
What does the third normal form (3NF) state? How do we enforce it?
3NF has two rules:
Data must meet the second normal form.
All non-key attributes must be dependent on candidate keys non-transitively. Informally, this means that all non-key attributes must be mutually independent.
What is a database workload?
A workload is the type of tasks or operations a database is used for.
What kind of database workloads are there? What do they do?
There are 2 types of database workloads:
OLTP (Online Transaction Processing) — handles lots of small, fast operations like inserting, updating, or deleting individual rows (like banking transactions, or e-commerce).
OLAP (Online Analytical Processing) — handles complex analytical queries that summarize or aggregate large volumes of data (like reports, dashboards, trends)
What is a data warehouse (DW)?
A data warehouse is an environment designed for data-retrieval and reporting purposes. The model has intentional redundancy, fewer tables, and simpler relationships, ultimately resulting in simpler and more efficient queries than an OLTP environment.
What is characteristic for data in a data warehouse?
Data in a warehouse is typically pre-aggregated to a certain level of granularity (such as a day), unlike data in an OLTP environment, which is usually recorded at the transaction level.
What is a star schema?
A star schema is the simplest data-warehouse design.
What does a star schema include? What does it look like?
It includes several dimension tables, and a fact table. Each dimension represents a subject by which data can be analyzed (e.g. customers, products, employees, time can all be dimensions in a system with orders and sales).
Each dimension is implemented as a single table with redundant data. If a dimension table is normalized, we get a snowflake dimension and a snowflake schema.
The fact table holds the facts and measures, such as quantity and value, for each relevant combination of dimension keys.
What is ETL?
The process that pulls data from source systems, manipulates it, and loads it into the data warehouse is called ETL, and it stands for Extract, Transform, Load.
How can the SQL Server architecture be split?
It can be split into 2 categories: on-premises and cloud.
What does the on-premise SQL Server architecture offer?
The customer is responsible for everything - hardware, software, security, availability, etc.
The customer can install multiple instances and can write queries that interact with multiple databases, as well as switch connections between them (unless it’s a contained database).
What does the cloud SQL Server architecture offer?
Private cloud offers virtualized servers of the organization in question, and is self-managed.
Public cloud, hosted by Microsoft:
IaaS: developers manage SQL Server, Microsoft manages the hardware.
PaaS: Microsoft manages almost everything (developers only tune queries).
Azure SQL: SQL Server on Azure VM (IaaS) + Azure SQL Database + Azure SQL Managed Instance (PaaS).
What is an SQL instance?
An instance of SQL Server is an installation of the SQL Server database engine or service.
How many SQL Server instances can be installed on the same computer?
There can be multiple instances installed on one computer.
Do two SQL Server instances on the same computer have anything in common?
All instances are completely independent from each other in terms of security and the data they manage, and in all other respects.
The only thing that all instances on the same computer share is the server’s physical resources, such as CPU, memory, and disk.
In case of multiple SQL Server instances, is there a default one?
If multiple instances are present, then one of them can be set as the default, in which case all others must be named instances. The user determines whether an instance is the default or a named one upon installation — this decision cannot be changed later.
What is a database?
A database is a container of schemas, which contain objects such as tables, views, stored procedures, and other objects.
What are system databases?
System databases are created with the installation of SQL Server, and they hold system data for internal purposes:
master — holds instance-wide metadata information, the server configuration, information about all databases in the instance, and initialization information.
model — used as a template for new databases — every new database is a copy of this one; if we want certain objects to appear in all new databases, we have to create them in the model database. The new changes don’t affect old databases, only new ones.
tempdb — used for storing temporary data such as work tables, sort and hash table data, and our own temporary tables. This database is destroyed and recreated as a copy of the model database every time the SQL Server instance is restarted.
msdb — used mainly by SQL Server Agent (in charge of automation, including jobs, schedules, and alerts) to store its data, as well as info for other related SQL Server features, such as Database Mail, Service Brocker, backups, and more.
Resource — hidden, read-only database that holds the definitions of all system objects.
What is collation?
Collation is a database property that determines the default language support, case sensitivity, and sort order for character data in that database. If the collation is not specified when the database is created, the default one will be used (chosen upon installation).
What kind of authentication logins are there in SQL Server?
Microsoft Windows Authentication Login — tied to the Microsoft Windows credentials - the username and password are provided with the login to Windows.
SQL Server Authentication Login — the user must provide the username and password in order to connect to the instance.
What are contained databases?
Contained databases are databases that break the connection between a database user and an instance-level login — the user can only connect to the database they’re contained to, and must specify the database when trying to login, no matter what type of login they’re using. The user cannot switch to other databases that they’re not contained to.
What kind of files are there in a database upon its creation?
There are 3 types of files when a database is created — data files, transaction log files, and optionally, checkpoint files containing memory-optimized data (in-memory OLTP).
When a database is created, various properties for data and log files can be defined, including file name, location, initial size, maximum size, and auto-growth increment. Each database must have at least one data file and at least one transaction log file.
Can SQL Server write to multiple database files in parallel?
SQL Server can write to multiple data files in parallel, but it can only write to one log file at a time.
What is a filegroup?
A filegroup is a logical / virtual container inside the SQL server that groups one or more physical data files (.mdf or .ndf files that actually exist in storage).
Does a filegroup exist on the disk?
No, it doesn’t. Since it's virtual, it only exists in the SQL Server’s metadata.
Where is the info for tables and indexes stored?
In files that belong to a chosen filegroup.
What is a PRIMARY filegroup? What does it contain?
Every database has a PRIMARY filegroup, which contains:
The primary data file (.mdf — master data file, not to be confused with master database)
The system catalog (.ldf — log data file)
Can additional filegroups apart from the PRIMARY one be created?
Yes, other user-defined filegroups containing secondary data files (.ndf) can be created.
What is In-Memory OLTP?
It’s a special memory-optimized engine inside SQL Server used for extremely fast OLTP performance.
How can In-Memory OLTP be used?
In order for In-Memory OLTP to be used, a special filegroup needs to be created, marked for memory-optimized data. In this filegroup, SQL Server stores checkpoint (stream) files, used to rebuild the memory-optimized tables after a restart.
Is the In-Memory OLTP enabled by default?
No, it’s not, because:
It requires a special filegroup and setup.
The memory-optimized tables consume RAM permanently.
Not every workload needs extreme OLTP performance.
What are some of the benefits that the memory-optimized tables offer?
Much faster reads/writes
Lock-free
Reducing blocking and deadlocks
Optimized for high concurrency
Can use natively compiled procedures
What do databases contain?
Databases contain schemas.
What is a schema?
A schema is a container for objects, such as tables, views, stored procedures, etc.
Can permissions be granted at the schema level?
Yes, they can. For example, if a user has a SELECT permission on a schema, they can query the data from that schema.
How can a schema be used as a namespace?
Schemas can be used as namespaces when they are added as prefixes to the database objects. For example, if the Schema is Sales, and the table is Orders, then the schema-qualified object name is Sales.Orders. Database name and instance name can also be added as prefixes if needed.
What happens when a schema name is omitted when referring to an object?
SQL Server starts a process to resolve the schema name:
It checks if the object exists in the user’s default schema
If it doesn’t, it checks if it exists in the dbo schema
Microsoft recommends always using the schema name.
What is declarative data integrity?
Data integrity that’s being enforced as a part of the model (as a part of the table definitions). Examples are data types and nullability choices.
What is procedural data integrity?
Data integrity enforced with code, such as stored procedures or triggers.
How can table constraints be defined?
They can be defined when creating the table, or after table creation, by altering it.
What are composite constraints?
Constraints that are based on more than one attribute (column).
What is a primary key constraint? How does it affect the data?
This constraint enforces row uniqueness and disallows NULL values in the constraint attributes — each unique combination of values in the constraint attributes can appear only once in the table — only one row.
Can a table have more than one primary key?
No. A table can only have one primary key.
Can a primary key constraint be applied to a column that allows NULL values?
No, it cannot.
What happens behind the scenes when a primary key constraint is applied?
SQL Server creates a unique index that is a physical object. Indexes in general are also used to speed up queries by avoiding sorting and unnecessary full table scans (similar to indexes in books).
What kind of constraint is the unique constraint?
It’s a constraint that enforces the uniqueness of rows, allowing us to implement the concept of alternate keys — we can define multiple unique constraints, and they are not restricted to columns that are not NULL.
What does a foreign key constraint enforce? What is its purpose?
It enforces referential integrity, and is defined on one or more attributes in the referencing table, and points to a candidate key (primary key or unique constraint) in the referenced table.
The purpose of this constraint is to restrict the values allowed in the foreign key column to those that exist in the referenced columns.
What are no action foreign keys?
Foreign keys that enforce a referential action — this means that attempts to delete rows from the referenced table or update the referenced candidate key attributes will be rejected if related rows exist in the referencing table.
What does ON DELETE CASCADE allow us when it comes to foreign keys?
It allows us to create foreign keys with actions that will compensate for otherwise prohibited attempts. ON DELETE CASCADE means that when we delete a row from the referenced table, all related rows from the referencing table will be deleted as well.
What are check constraints?
They are constraints used to define a predicate that a row must meet to be entered into the table or to be modified. Rows are only accepted if this predicate evaluates to TRUE or UNKNOWN.
What are default constraints?
Default constraints are associated with a particular attribute. It’s an expression used as a default value when an explicit value is not specified for the attribute when inserting a row:
ALTER TABLE dbo.Orders
ADD CONSTRAINT DFT_Orders_orderts
DEFAULT(SYSDATETIME()) FOR orderts; What is the purpose of the SELECT statement?
The purpose of the SELECT statement is to query tables, apply some logical manipulation, and return results:
SELECT
empid AS EmployeeID
,YEAR(orderdate) AS OrderYear
,COUNT(*) AS NumOrders
FROM
Sales.Orders
WHERE custid = 71
GROUP BY empid, YEAR(orderdate)
HAVING COUNT(*) > 1
ORDER BY empid, OrderYear;What is the logical query processing order?
SELECT
empid AS EmployeeID
,YEAR(orderdate) AS OrderYear
,COUNT(*) AS NumOrders
FROM
Sales.Orders
WHERE custid = 71
GROUP BY empid, YEAR(orderdate)
HAVING COUNT(*) > 1
ORDER BY empid, OrderYear;FROM — queries the rows from the Sales.Orders table
WHERE — filters only orders where the customer ID is equal to 71
GROUP BY — groups the orders by employee ID and order year
HAVING — filters only groups (employee ID and order year) having more than one order
SELECT — selects (returns) for each group the employee ID, order year, and number of orders
Expressions
DISTINCT
ORDER BY — orders (sorts) the rows in the output by employee ID and order year
TOP / OFFSET-FETCH
What do we specify in the WHERE clause?
A predicate or a logical expression to filter the rows returned from the FROM clause:
FROM
Sales.Orders
WHERE custid = 71To what should the predicate/expression in the WHERE clause evaluate so that rows can be returned from it?
It must evaluate to TRUE. If the predicate evaluates to FALSE or UNKNOWN the rows are not returned.