AM

Rules of Normalization

Rules of Normalization

Normalization is a practice involving a set of rules for designing database tables to minimize the risk of data anomalies.

  • It's important and widely adopted but not always essential. Sometimes, we intentionally break normalization rules.

Data Anomalies

Data anomalies are errors or inconsistencies in data that we want to avoid.

  • Update Anomaly: Occurs when redundant data is only partially updated.
    • Example: A database of books and authors, tracking author addresses with each book entry. If an author has 10 books, there would be 10 copies of the address. If the address is updated but only five are changed, there will be inconsistencies.
  • Insertion Anomaly: Occurs when data cannot be added to the database due to the absence of other data.
    • Example: To add a new book, the author's address is required, but if the address is unavailable, the book cannot be added.
  • Deletion Anomaly: Occurs when data is unintentionally lost because other data is deleted.
    • Example: Losing an author's address because the author's only book in the database was deleted. The intention might have been to delete the book but retain the author's information.

Normalization Rules

There are many normalization rules, but the first three are the most important and widely used:

  • First Normal Form (1NF): Each value in a column is an atomic unit.
    • It should be a particular number, a unit string, a Boolean, or some scalar value that can't be broken down further.
  • Second Normal Form (2NF): Any attribute or column in a table is dependent on the key.
    • Everything in the table is related to that particular key.
  • Third Normal Form (3NF): There are no transitive dependencies.
    • Some pieces are dependent on something in the table but not the key. Example: An author's address is dependent on the author, and the author depends on the book. This violates 3NF.

Visualizing Normalized Databases

Normalized databases are visualized with diagrams showing rectangles representing tables and lines connecting the tables, representing relationships.

  • Relationships: These indicate the type of relationship (e.g., one-to-many).
    • Example: An order table (primary table) and an order items table (secondary table with detail). If an order has 10 different books, there's one order in the order's table and 10 order items in the order items table, representing a one-to-many (1 to N) relationship.
    • Relationships can be one-to-one, one-to-zero or one, or one-to-zero or many. All are allowed under normalization rules.

OLTP Systems

OLTP (Online Transaction Processing) systems involve many reads and writes and are constantly being updated. Data is written once but may be updated frequently by many different processes.

  • Example: E-commerce application with users updating many orders simultaneously.
  • OLTP systems are often normalized to the third normal form. Sometimes, for performance reasons, they are slightly de-normalized but are typically still considered normalized.

Analytical Databases

Analytical databases are used for data analysis.

  • They involve many reads by many processes.
  • Unlike OLTP, updates to single customer addresses or orders are less frequent. Reads span many rows but involve fewer columns.
  • There are many writes with batch processing, where bulk updates are done all at once, or streaming data, where data is ingested in near real-time.
  • Analytical databases are often de-normalized.