Monday, April 10, 2006

Data Normalization

In the name of ALLAH

Data Normalization

Abstract:

After gathering required information from end-users who are going to use your
application, you move to design database and you should spend sometime to design
it well because any mistake in your design would cost you a lot of effort, money
and time if your application built on it…

Introduction:

Data Normalization is the process of organizing information in your database
to should be able to develop professional code to update and retrieve data efficiently
and your code comes from a series of rules of standards called NORMAL FORMS

FIRST NORMAL FORM:

Let's execute that code [SELECT * FROM student] so the output is



It's definitely wrong as what about if I want to sort by Student's Last Name
I have no way except writing functions to do this job and that's wrong

After modification solution is:

Let's execute that code [SELECT * FROM student] so the output is


So now it's very easy to make all operations I need to sort by any attribute
and update and retrieve needed data

Let's execute that code [SELECT * FROM student] so the output is


Unquestionably very wrong to do that for many reasons first of them I can not
count all students who take Microsoft SQL Server 2005 course second what about
if a student intend to take another course you need to modify the schema or
the design of the relation and never ever put yourself in a case according to
it you may change your database schema, believe me if you did, you would be
fired, assume well to design well…

After modification solution is:



Now I can count all students who take any specific course and also update data
smoothly without any modification in database schema

So we can summarize FIRST NORMAL FORM as:

1- Each attribute in a relation should contain only one piece of information
that you can care about.

2- And also each piece of in formation that you can care about should be contained
in just ONE attribute.

SECOND NORMAL FORM:

It focuses on relation itself and its concepts are:

1- Database must be in FIRST NORMAL FORM.

2- Every attribute should be functionally dependant on the entire primary key.

3- Every relation should model just one entity, object or event.

Let's execute that code [SELECT * FROM student] so the output is



According to SECOND NORMAL FORM concept we see that relation violate SECOND
NORMAL FORM as CourseName dependant on to primary keys they are StudentID and
CourseID

After modification solution is to have two tables one for
StudentCourse and one for Course respectively as shown





Here, we divided first table to two tables to avoid violation
of SECOND NORMAL FORM as in last table CourseName became just dependant on CourseID.

THIRD NORMAL FORM:

Let us show what the concepts of the THIRD NORMAL FORM

1- Database must be in SECOND NORMAL FORM.

2- The attributes in addition to be dependant on entire primary key it should
not be dependant on another attributes.

3- Don't store any calculated or derived values, only store the raw data and
use queries to perform calculations and looks up as needed.

Let's execute that code [SELECT * FROM Item] so the output is


Here, all attributes dependant on ItemID as it's the primary key of this table
however the problem is in TotalCost as you see TotalCost is a calculated value
result FROM multiplication of ItemCost by ItemQuantity and that's a violation
of THIRD NORMAL FORM.

Conclusion:

As we saw applying normalization to our design is very easy and logical but
also not applying it will cost us a lot and a lot, and while designing large
database you will find some difficulty to achieve the equation that you must
attain performance, normalization, efficiency, all end-users requirements and
so forth so they are cases you should DENORMALIZE your design i.e. what about
if end-user asked you to store calculated or derived values in this case if
you didn't find solution you would be forced to say "Ok".

In the article I just mentioned 3 forms of NORMALIZATION and there are other
forms like Boyce-Codd normal form – BCNF, Forth normal form - 4NF and
fifth normal form - 5NF, may next article will mention them in details.

I appreciate your reading.

Thank you,
Ramy Mahrous

0 comments: