This topic is a bit digressing from a DW point of view. But it is better that we know about Normalization. Once
we understand Normalization & DW facts & dimensions, the schema concepts would be clearer. It would help us understand
why reporting is easier & faster from a DW. There are 5 types of Normalization. But for now it’s enough to understand
3 types of Normalization. Normalization helps in reducing data redundancy. As we move towards higher normalization
1NF: This type of normalization states that there must not be any duplicates in the tables that we use.
In other words, all the tables used must have a primary key defined.
2NF: This type of normalization states that data redundancy can be reduced if attributes those are dependent
on one of the keys of a composite primary key are isolated to a separate table. Not only does this reduces data redundancy
but also helps in increasing data retention when a delete is done. For example, consider a table that has the following columns:
Part Id, State, City, and Country. Here, assume Part Id & Country form the composite primary key. The attributes state
& city depend only on the country. 2NF states that if such is the case then split the table into 2 tables. One with Part
Id & country as the columns. Other with Country, state & city as the columns. In the 1st table if a delete
is made to all the rows with Part Id = ‘X’ then we would lose country related data too. But in the 2nd
case this would not happen.
3NF: This type of normalization states that if a dependency exists on certain attributes other than the primary key then
the table split depending on the dependency has to be done. Consider the same example above. In the present case consider
that Part Id is the only primary key. Now state, city depend only on country & not on Part Id. This table is already in
1NF & 2NF. But to achieve 3NF we would do the same split as above.