0% found this document useful (0 votes)

15 views7 pages

Data Normalization in Order Schema

The document discusses data normalization, outlining the process of organizing data attributes to reduce redundancy in relational databases. It explains the three common forms of normalization (1NF, 2NF, and 3NF) with examples, highlighting the importance of primary and foreign keys. Additionally, it addresses the benefits of normalization, potential denormalization for performance reasons, and provides visual representations of the data schemas at each normalization level.

Uploaded by

Yelena Bytenskaya

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

15 views7 pages

Data Normalization in Order Schema

Uploaded by

Yelena Bytenskaya

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Dr.

Igwe

Figure 1: An Initial Data Schema for Order (UML Notation).

Order 0NF
OrderId: integer <<PK>>
DateOrdered: Date
DateFulfilled: Date
Payment1Amount: Currency
Payment1Type: Char(4)
Payment1Description: Char(40)
Payment2Amount: Currency
Payment2Type: Char(4)
Payment2Description: Char(40)
Taxdeferal: Currency
TaxState: Currency
TaxLocal: Currency
SubtotalBeforeTax: Currency
ShipToName: char(45)
ShipToStreet: char(40)
ShipToCity: char(20)
ShipToState: char(20)
ShipToCountry: char(20)
ShipToZipCode: char(20)
ShipToPhone: char(20)
BillToName: char(45)
BillToStreet char(40)
BillToCity: char(20)
BillToState: Char(20)
BillToCountry: char(20)
BillToZipcode: char(20)
BillToPhone: char(20)
ItemName1: char (40)
ItemNumber1: integer
NumberOrdered1: integer
InitialItemPrice1: currency
TotalPriceExtended1: currency
ItemName2: char (40)
ItemNumber2: integer
NumberOrdered2: integer
InitialItemPrice2: currency
TotalPriceExtended2: currency
.
ItemName9: char (40)
ItemNumber9: integer
NumberOrdered: integer
InitialItemPrice: currency
TotalPriceExtended: currency

Data normalization is a process in which data attributes within a data model are organized to
increase the cohesion of entity types. In other words, the goal of data normalization is to reduce
and even eliminate data redundancy, an important consideration for application developers
because it is incredibly difficult to stores objects in a relational database that maintains the same
information in several places. Table 1 summarizes the three most common forms of
normalization ( First normal form (1NF), Second normal form (2NF), and Third normal
form (3NF)) describing how to put entity types into a series of increasing levels of
normalization. Higher levels of data normalization are beyond the scope of this article. With
respect to terminology, a data schema is considered to be at the level of normalization of its
least normalized entity type. For example, if all of your entity types are at second normal form
(2NF) or higher then we say that your data schema is at 2NF.

Dr. Igwe
Table 1: Data Normalization Rules.
Level

Rule

First normal form (1NF)

Second normal form (2NF)

An entity type is in 1NF when it contains no repeating groups of data.

An entity type is in 2NF when it is in 1NF and when all of its non-key attributes are fully dependent on its
primary key.
An entity type is in 3NF when it is in 2NF and when all of its attributes are directly dependent on the primary
key.

Third normal form (3NF)

1. First Normal Form (1NF)

Lets consider an example; an entity type is in first normal form (1NF) when it contains no
repeating groups of data. For example, in Figure 1 you see that there are several repeating
attributes in the data Order0NF table the ordered item information repeats nine times and the
contact information is repeated twice, once for shipping information and once for billing
information. Although this initial version of orders could work, what happens when an order has
more than nine order items? Do you create additional order records for them? What about the
vast majority of orders that only have one or two items? Do we really want to waste all that
storage space in the database for the empty fields? Likely not. Furthermore, do you want to
write the code required to process the nine copies of item information, even if it is only to
marshal it back and forth between the appropriate numbers of objects? Once again, likely not.
Figure 2 presents a reworked data schema where the order schema is put in first normal form.
The introduction of the OrderItem1NF table enables us to have as many, or as few, order items
associated with an order, increasing the flexibility of our schema while reducing storage
requirements for small orders (the majority of our business). The ContactInformation1NF table
offers a similar benefit, when an order is shipped and billed to the same person (once again the
majority of cases) we could use the same contact information record in the database to reduce
data redundancy.

OrderPayment1NF was introduced to enable customers to make several payments against an

order Order0NF could accept up to two payments, the type being something like MC and the
description MasterCard Payment, although with the new approach far more than two payments
could be supported. Multiple payments are accepted only when the total of an order is large
enough that a customer must pay via more than one approach, perhaps paying some by check
and some by credit card.

Dr. Igwe

Figure 2: An Order Data Schema in 1NF (UML Notation).

An important thing to notice is the application of primary and foreign keys in the new solution.
Order1NF has kept OrderID, the original key of Order0NF, as its primary key. To maintain
the relationship back to Order1NF, the OrderItem1NF table includes the OrderID column within
its schema, which is why it has the stereotype of FK. When a new table is introduced into a
schema, in this case OrderItem1NF, as the result of first normalization efforts it is common to
use the primary key of the original table (Order0NF) as part of the primary key of the new table.
Because OrderID is not unique for order items, you can have several order items on an order, the
column ItemSequence was added to form a composite primary key for the OrderItem1NF table.
A different approach to keys was taken with the ContactInformation1NF table. The column
ContactID, a surrogate key that has no business meaning, was made the primary key.

2. Second Normal Form (2NF)

Although the solution presented in Figure 2 is improved over that of Figure 1, it can be
normalized further. Figure 3 presents the data schema of Figure 2 in second normal form
(2NF). an entity type is in second normal form (2NF) when it is in 1NF and when every non-key
attribute, any attribute that is not part of the primary key, is fully dependent on the primary key.
This was definitely not the case with the OrderItem1NF table, therefore we need to introduce the
new table Item2NF. The problem with OrderItem1NF is that item information, such as the name
and price of an item, do not depend upon an order for that item. For example, if Hal Jordan
orders three widgets and Oliver Queen orders five widgets, the facts that the item is called a
widget and that the unit price is $19.95 is constant. This information depends on the concept
of an item, not the concept of an order for an item, and therefore should not be stored in the order
items table therefore the Item2NF table was introduced. OrderItem2NF retained the
TotalPriceExtended column, a calculated value that is the number of items ordered multiplied by

Dr. Igwe

the price of the item. The value of the SubtotalBeforeTax column within the Order2NF table is
the total of the values of the total price extended for each of its order items.
Figure 3. An Order in 2NF (UML Notation).

3. Third Normal Form (3NF)

An entity type is in third normal form (3NF) when it is in 2NF and when all of its attributes are
directly dependent on the primary key. A better way to word this rule might be that the attributes
of an entity type must depend on all portions of the primary key. In this case there is a problem
with the OrderPayment2NF table, the payment type description (such as Mastercard or
Check) depends only on the payment type, not on the combination of the order id and the
payment type. To resolve this problem the PaymentType3NF table was introduced in Figure 4,
containing a description of the payment type as well as a unique identifier for each payment type.

Dr. Igwe

Figure 4: An Order in 3NF (UML Notation).

4. Beyond 3NF
The data schema of Figure 4 can still be improved upon, at least from the point of view of data
redundancy, by removing attributes that can be calculated/derived from other ones. In this case
we could remove the SubtotalBeforeTax column within the Order3NF table and the
TotalPriceExtended column of OrderItem3NF, as you see in Figure 5.

Dr. Igwe

Figure 5. An Order Without Calculated Values (UML Notation).

5. Why Data Normalization?

The advantage of having a highly normalized data schema is that information is stored in one
place and one place only, reducing the possibility of inconsistent data. Furthermore, highlynormalized data schemas in general are closer conceptually to object-oriented schemas because
the object-oriented goals of promoting high cohesion and loose coupling between classes results
in similar solutions (at least from a data point of view). This generally makes it easier to map
your objects to your data schema.

6. Denormalization
From a purist point of view you want to normalize your data structures as much as possible, but
from a practical point of view you will find that you need to 'back out" of some of your
normalizations for performance reasons. This is called "denormalization". For example, with
the data schema of Figure 1 all the data for a single order is stored in one row (assuming orders
of up to nine order items), making it very easy to access. With the data schema of Figure 1 you
could quickly determine the total amount of an order by reading the single row from the
Order0NF table. To do so with the data schema of Figure 5 you would need to read data from a
row in the Order table, data from all the rows from the OrderItem table for that order and data
from the corresponding rows in the Item table for each order item. For this query, the data
schema of Figure 1 very likely provides better performance.

Dr. Igwe

Common questions

The Order0NF schema presents challenges due to its use of repeating groups for order items and contact information, leading to data redundancy and inefficiency in storage. It is inflexible for orders with varying numbers of items, potentially leading to data wastage when too many fields are allocated for orders with fewer items. Normalization addresses these issues by restructuring the schema to remove these repeating groups. In 1NF, separate tables such as OrderItem1NF and ContactInformation1NF are introduced to eliminate redundancy, allowing for a flexible number of related records without empty fields .

3NF might not be sufficient when transitive dependencies or multi-valued dependencies exist, which can still result in anomalies during insertions, deletions, or updates. Business cases requiring more rigorous data integrity might demand normalization beyond 3NF to forms like Boyce-Codd Normal Form (BCNF) or Fourth Normal Form (4NF), which handle these more complex dependency structures to ensure data integrity and minimize redundancy in advanced database systems .

Data normalization to 3NF aligns with object-oriented design principles by fostering high cohesion and loose coupling. Normalization ensures each table stores distinct information related to a specific concept, akin to how object-oriented design encourages encapsulating related behaviors and data within a single class. This similarity means normalized designs can be more seamlessly mapped to object-oriented data models, facilitating easier interaction between database and application logic .

Normalization to 2NF ensures that all non-key attributes are fully dependent on the primary key by removing partial dependencies. Any attribute that does not fully rely on the primary key for its identification is moved to a new table. For instance, item information such as name and price, which are independent of orders, are moved to a separate Item2NF table instead of being part of OrderItem1NF. This ensures that the remaining non-key attributes are fully dependent on the composite key of the related table .

Separating calculated values from storage reduces redundancy and simplifies updates, as changes to the underlying data require recalculation only at query time, not updating stored calculated fields. This practice minimizes storage space and the potential for errors because it avoids maintaining consistency across calculations and base data separately. This approach is more efficient for storage and ensures the most current values are used in calculations, enhancing data integrity and system performance .

Denormalization involves reversing some normalization steps to improve performance but introduces trade-offs such as increased data redundancy and potential inconsistencies. By consolidating data into fewer tables or rows, queries can execute faster because they require accessing less data. However, this can lead to issues in maintaining data integrity across different instances where the same information is stored, heightening the risk of errors and increasing the complexity of data updates .

Organizations might choose to maintain non-normalized databases to optimize performance and simplify complex queries. Non-normalized databases keep related data in single locations, reducing the number of joins needed in queries, which can enhance speed for read-heavy operations. This can be particularly beneficial in systems where performance takes precedence over storage efficiency and data integrity, such as in environments with high transaction volumes or where real-time processing is crucial .

Composite primary keys are used in normalized schemas when uniquely identifying records requires more than one attribute, especially in many-to-many relationships or when integrating components of deprecated rows. For instance, the OrderItem1NF table uses a composite key of OrderID and ItemSequence to uniquely identify each item in an order. This ensures a robust key structure capable of differentiating between multiple instances of a relationship within the same context, promoting data integrity and supporting the logical design of related entities .

The OrderPayment2NF table requires normalization to 3NF because certain attributes, like the payment type description, are dependent only on a portion of the composite primary key and not the entire primary key. These partial dependencies violate the 3NF rule that attributes must depend on all parts of the primary key, prompting the separation of dependent attributes into a new table, such as the PaymentType3NF table, which ensures each attribute is directly dependent on the primary key .

The introduction of surrogate keys in data normalization improves database design by providing unique, non-descriptive identifiers that are unrelated to business logic, facilitating easier and more efficient management of relationships. Surrogate keys avoid potential complications and changes that might arise from using business-related keys. For example, the ContactInformation1NF table uses a surrogate key (ContactID) instead of multipart or business-related primary keys, thus simplifying one-to-many relationships and ensuring consistency in referencing records .

3 Normal Forms Tutorial
No ratings yet
3 Normal Forms Tutorial
12 pages
A3 Task
No ratings yet
A3 Task
11 pages
Understanding Database Normalization
No ratings yet
Understanding Database Normalization
5 pages
Normalization: Enhancing Data Integrity
No ratings yet
Normalization: Enhancing Data Integrity
9 pages
Customer Order Normalization Process
No ratings yet
Customer Order Normalization Process
4 pages
Database Design and Normalization Guide
No ratings yet
Database Design and Normalization Guide
21 pages
Database Normalization: 1NF to 3NF Guide
No ratings yet
Database Normalization: 1NF to 3NF Guide
20 pages
Data Normalization Techniques Explained
No ratings yet
Data Normalization Techniques Explained
29 pages
Lab3 - .Anomaly Detection and Normalization
No ratings yet
Lab3 - .Anomaly Detection and Normalization
6 pages
Understanding Database Normalization
No ratings yet
Understanding Database Normalization
11 pages
Database Normalization Exercises Guide
No ratings yet
Database Normalization Exercises Guide
9 pages
Understanding Database Normalization
No ratings yet
Understanding Database Normalization
50 pages
Data Normalization Techniques Explained
No ratings yet
Data Normalization Techniques Explained
7 pages
Data Normalization in Database Design
No ratings yet
Data Normalization in Database Design
7 pages
Understanding Database Normalization Steps
No ratings yet
Understanding Database Normalization Steps
13 pages
Database Normalization
100% (3)
Database Normalization
19 pages
Database Normalization Explained
No ratings yet
Database Normalization Explained
4 pages
Understanding Database Schema Design
No ratings yet
Understanding Database Schema Design
6 pages
Understanding Data Normalisation Techniques
No ratings yet
Understanding Data Normalisation Techniques
28 pages
Understanding Database Normalization
No ratings yet
Understanding Database Normalization
8 pages
Understanding Rule 31204 in Normalization
No ratings yet
Understanding Rule 31204 in Normalization
37 pages
Understanding Database Normalization
No ratings yet
Understanding Database Normalization
2 pages
IT221 - Chapter 2 - Advanced Data Modeling
No ratings yet
IT221 - Chapter 2 - Advanced Data Modeling
59 pages
Database Design and Normalization Guide
No ratings yet
Database Design and Normalization Guide
43 pages
Normalization vs. Denormalization in Databases
No ratings yet
Normalization vs. Denormalization in Databases
21 pages
Database Normalization Techniques Guide
No ratings yet
Database Normalization Techniques Guide
30 pages
Database Normalization Explained
100% (1)
Database Normalization Explained
59 pages
Database Normalization Exercise Guide
No ratings yet
Database Normalization Exercise Guide
8 pages
Understanding Database Normalization
No ratings yet
Understanding Database Normalization
9 pages
Understanding Database Normalization Forms
No ratings yet
Understanding Database Normalization Forms
1 page
Database Normalization and Design Principles
No ratings yet
Database Normalization and Design Principles
38 pages
Logical Database Design Overview
No ratings yet
Logical Database Design Overview
19 pages
Understanding Database Architecture
No ratings yet
Understanding Database Architecture
10 pages
Database Design: Normalization & ER Modeling
No ratings yet
Database Design: Normalization & ER Modeling
30 pages
Data Processing: Modeling and Normalization
No ratings yet
Data Processing: Modeling and Normalization
10 pages
2.3 Normalisation
No ratings yet
2.3 Normalisation
14 pages
Understanding Database Normalization
No ratings yet
Understanding Database Normalization
11 pages
Software Design Principles and Practices
No ratings yet
Software Design Principles and Practices
14 pages
Understanding Database Normalization Techniques
No ratings yet
Understanding Database Normalization Techniques
4 pages
Normalization and Data Warehousing Guide
No ratings yet
Normalization and Data Warehousing Guide
11 pages
Understanding Database Normalization
No ratings yet
Understanding Database Normalization
1 page
Relational Database Design Essentials
No ratings yet
Relational Database Design Essentials
25 pages
Database Normalization Stages Explained
No ratings yet
Database Normalization Stages Explained
44 pages
Understanding Database Normalization Steps
No ratings yet
Understanding Database Normalization Steps
61 pages
ERD Normalization Explained
No ratings yet
ERD Normalization Explained
10 pages
E Commerce
No ratings yet
E Commerce
13 pages
Understanding Database Normalization
No ratings yet
Understanding Database Normalization
2 pages
Database Normalization Exercises Guide
No ratings yet
Database Normalization Exercises Guide
7 pages
Object-Oriented Data Modeling & Normalization
No ratings yet
Object-Oriented Data Modeling & Normalization
4 pages
Database Normalization Exercises Guide
No ratings yet
Database Normalization Exercises Guide
9 pages
Database Normalization Overview
No ratings yet
Database Normalization Overview
11 pages
Database Normalization Exercise Guide
No ratings yet
Database Normalization Exercise Guide
9 pages
Understanding Data Normalization Steps
No ratings yet
Understanding Data Normalization Steps
5 pages
Database Design for Kids Creative Retail
No ratings yet
Database Design for Kids Creative Retail
6 pages
Database Normalization Explained
No ratings yet
Database Normalization Explained
42 pages
Oracle Database 11g SQL and PLSQL New Features
No ratings yet
Oracle Database 11g SQL and PLSQL New Features
232 pages
Oracle Label Security Overview
No ratings yet
Oracle Label Security Overview
59 pages
Oracle Database 11g Database Architecture and ASM
No ratings yet
Oracle Database 11g Database Architecture and ASM
44 pages
SQL Workshop Exercise Solutions
100% (1)
SQL Workshop Exercise Solutions
47 pages
Oracle Database 11g Using DDL Views Sequences Indexes and Synonyms
No ratings yet
Oracle Database 11g Using DDL Views Sequences Indexes and Synonyms
54 pages
RAC Instance Recovery Overview
No ratings yet
RAC Instance Recovery Overview
52 pages
Oracle TDE Setup and Management Guide
100% (1)
Oracle TDE Setup and Management Guide
40 pages
Oracle SQL Practice Questions 14-18
No ratings yet
Oracle SQL Practice Questions 14-18
11 pages
SQL Queries for Course Enrollment Analysis
No ratings yet
SQL Queries for Course Enrollment Analysis
3 pages
SQL Date and Conversion Functions Guide
100% (1)
SQL Date and Conversion Functions Guide
1 page
SQL Assignment: DML, DDL, Views
No ratings yet
SQL Assignment: DML, DDL, Views
2 pages
SQL Subqueries and Joins Assignment
100% (3)
SQL Subqueries and Joins Assignment
3 pages
Normalization Example Document
No ratings yet
Normalization Example Document
7 pages
SQL Functions and Student Data Analysis
No ratings yet
SQL Functions and Student Data Analysis
2 pages
Structural Independence in Databases
No ratings yet
Structural Independence in Databases
3 pages
Data Mining Course Overview PDF
No ratings yet
Data Mining Course Overview PDF
38 pages
IT405 E-Portals Project Guidelines
No ratings yet
IT405 E-Portals Project Guidelines
15 pages
Database & File Concepts Key Terms
No ratings yet
Database & File Concepts Key Terms
23 pages
Document Management Systems Overview
No ratings yet
Document Management Systems Overview
5 pages
CARIS HPD Product Editor Training
No ratings yet
CARIS HPD Product Editor Training
2 pages
Automated RAID Level Selection Techniques
No ratings yet
Automated RAID Level Selection Techniques
13 pages
Ext4 Handout
No ratings yet
Ext4 Handout
6 pages
House Rental Management System Overview
No ratings yet
House Rental Management System Overview
31 pages
Overview of Secondary Memory Devices
100% (1)
Overview of Secondary Memory Devices
26 pages
SQL Programming Course Outline STY2211
No ratings yet
SQL Programming Course Outline STY2211
10 pages
Pathlib Documentation Overview
No ratings yet
Pathlib Documentation Overview
30 pages
DS File Ops Kit Overview
No ratings yet
DS File Ops Kit Overview
5 pages
Database Normalization Techniques Explained
No ratings yet
Database Normalization Techniques Explained
41 pages
SQL Practice Questions and Solutions
No ratings yet
SQL Practice Questions and Solutions
3 pages
Detecting Anomalies in Excel Data
No ratings yet
Detecting Anomalies in Excel Data
12 pages
Data Services Repository Migration Steps
No ratings yet
Data Services Repository Migration Steps
2 pages
Dell Boomi Associate Developer Course
No ratings yet
Dell Boomi Associate Developer Course
4 pages
SQL Queries for Movie Ratings Analysis
No ratings yet
SQL Queries for Movie Ratings Analysis
4 pages
SQL to Pandas: Data Manipulation Guide
No ratings yet
SQL to Pandas: Data Manipulation Guide
38 pages
Crud Laravel Tutorial
No ratings yet
Crud Laravel Tutorial
5 pages
Healthcare Data Analyst Profile
No ratings yet
Healthcare Data Analyst Profile
4 pages
Understanding Database Triggers
No ratings yet
Understanding Database Triggers
17 pages
Percona Server Installation Guide
No ratings yet
Percona Server Installation Guide
3 pages
Business Analyst Role in Financial Data Migration
No ratings yet
Business Analyst Role in Financial Data Migration
6 pages
Creating Cartesian Joins in MySQL
No ratings yet
Creating Cartesian Joins in MySQL
17 pages
SQL Queries and Normalization Guide
No ratings yet
SQL Queries and Normalization Guide
184 pages
Class 12 Computer Science Practical File
No ratings yet
Class 12 Computer Science Practical File
71 pages
Understanding SQL Self Joins
No ratings yet
Understanding SQL Self Joins
22 pages
Hadoop Installation and HDFS Commands Guide
No ratings yet
Hadoop Installation and HDFS Commands Guide
56 pages
Resolving ORA-01403 Error in CMs
No ratings yet
Resolving ORA-01403 Error in CMs
2 pages

Data Normalization in Order Schema

Uploaded by

Data Normalization in Order Schema

Uploaded by

Dr.

Figure 1: An Initial Data Schema for Order (UML Notation).

First normal form (1NF)

An entity type is in 1NF when it contains no repeating groups of data.

Third normal form (3NF)

1. First Normal Form (1NF)

OrderPayment1NF was introduced to enable customers to make several payments against an

Figure 2: An Order Data Schema in 1NF (UML Notation).

2. Second Normal Form (2NF)

3. Third Normal Form (3NF)

Figure 4: An Order in 3NF (UML Notation).

Figure 5. An Order Without Calculated Values (UML Notation).

5. Why Data Normalization?

Common questions

What challenges come with storing order items and contact information in the Order0NF schema, and how does normalization address these issues?

In what scenarios might 3NF not be the endpoint for normalization, and further steps such as BCNF or 4NF be required?

How does data normalization to 3NF align with object-oriented design principles?

How does normalization ensure that all non-key attributes are fully dependent on the primary key in 2NF?

How does separating calculated values from storage, as seen in 3NF and beyond, benefit database efficiency?

What trade-offs might be encountered when denormalizing a data schema that has been highly normalized?

Why might organizations choose to maintain non-normalized databases despite the advantages of normalization?

What design considerations might lead to the use of composite primary keys in normalized database schemas?

What problems in the OrderPayment2NF table necessitate further normalization to 3NF?

In what ways does the introduction of surrogate keys in data normalization improve database design?

You might also like