database normalization tutorial for beginners

database normalization tutorial for beginners


Good morning Good afternoon Good evening from whichever location you are seeing this video. In this beginners tutorial we will be learning Data Normalization. This tutorial is Database Normalization for beginners. It is a 30 minutes of tutorial where we will first walk through the basics of First normal form, second normal form and third normal form. While we are doing so will understand the different Database design problems we come across and how these normal forms helps us. also about the difference between duplicate and redundancy. Many people think both the words are same They are same but different as well. Also we will be talking about Partial Dependency, Transitional dependency what is Atomic & repeating groups. This is going to be a 30 minutes of tutorial which will make us understand Database Normalization for beginners. Database is like a fort. One of the main goals of this fort is Redundant Data, Bad Data, Duplicate Data should not enter the system. Second the end user should also be able to trust the data. There should be data integrity, the data should be useful and so on. For that the first important thing is how is the redundancy done? What kind of database design principles should we follow? Before go into first normal form, second normal form and third normal form will first try to understand the basic database design mistakes people make. Once we understand that it is very easy to understand the Normal Forms. The first mistake which developer makes they do not put proper Primary Keys, Candidate Keys and Unique Keys. Here we have tblCustomer table In this tblCustomer Customer Code should not be duplicate. If we put some duplicate records it does allow to enter duplicate records customer code should be unique. Very basic thing we should do is we should have a Primary Key, put proper Unique keys on fields which are needed. In this case we should have a Unique Key or a Primary Key on the Customer Code. We can resolve this issue by putting a proper Primary Key or Unique Key. Go to the design With this we cannot put duplicate records in the Cutomer Code. The first thing we need to follow is put proper Unique Keys & Primary Keys. Now we are getting an error here saying The second Database design mistake people do is this is not a mistake but more of a lethargic approach. People try to put multiple values into single columns. We need to understand this is a column, its a single value it is not a wall it is column. A pillar has only single thing. shoes, tie, pen. Then there are smart people who do not use commas(,), they use Pipes (|) or weird separators. if we try to store data in this format we can have problems. This is like a free text. Over here the first rule we need to follow is one column should store single value/atomic value means value which cannot be divided any further. In order to resolve this people use another lethargic approach, they create two columns If they need three values/product to be stored they design that table and name it as Product1, Product2, Product3. The first one is Shoes, second one is Socks and third one is Pen. again this is lethargic approach. Enter one more record The problem here is Redundancy. Before we proceed ahead it is important to understand the difference between Duplicate data and Redundant data. Lot of people who are especially coming from Vernacular medium for them Redundant and Duplicate same vocabulary but they are not same. These two sentences will probably tell you the difference between Duplicate & Redundant. Look at the first sentence Now look at the second sentence Duplicate means exactly same like when we do a kind of a photocopy or keep a backup exactly same as it is. Redundant means something extra, it is not same but something extra for backup purpose. Duplicate & Redundant are two different things. The fields Product1, Product2 & Product3 are Redundant fields. Sometimes back when we entered here 1001, both these are duplicate records and the rests are redundant extra fields. In the first row we have Shoes, Socks & Pen. In the second row we have Pen and do not have any other products for the second customer so we have put NULL. In the third row we have two products and the third column is NULL. For the second and the third records there are some columns NULL. They are redundant, extra. they are of no use. Because of the bad design we have entered into Redundant Problem. This problem is also technically termed as Repeating Groups. Repeating Groups means we have created columns which are exactly same like Product1, Product2 and Product3. These columns are created in a very very unorganized manner. Why is these fields Product1, product2 and Product3 no one knows The sequence makes no sense as such because they are redundant fields. If we want to add Product4 we have to change the database design. This repeating groups brings strict design, its not flexible as such. To remove the repeating group problems we need to create one more table called as a Product table. We are going to split this table into two tables. One table will have the customer records and the other table will have the product records. The splitting of tables into two is termed as Normalization. Create a Product table here. It will have ProductId and ModelId. In this table make both of these as a Composite Key. We will have here is ProductName, ModelName, ManufacturerName who manufacture these Product and Model. We are creating a new table here called as tblProduct. In this will make all the entries and this table will have a foreign key relationship with the tblCustomer Table. From the tblCustomer table remove all these fields because these are Repeating groups. Over here put a ProductId_fk. Henceforth, in this tutorial if we have foreign key name it as _fk so we know this is a foreign key. Also put a ModelId_fk. The primary table has a composite key so we created a composite key out here. Go to the database diagram and define the relationships. These two are foreign keys. In the product table we have entered two products 101 and model number , 1002 and model number, product name. We have a separate table where everything is defined. Link that to the tblCustomer. The first customer buys 101 and 908. The second customer buys 102 and 909. This design is not what we are expecting. One customer can buy multiple products. We need to create one more intermediate table which will have reference of the CustomerId from this table. It will have the reference of the ProductId and ModelId from this table. In other words we need to create a table which is having many to many relationships. By putting the columns in this customer table what we have done is we can enter just one customer to one product which is not we wanted to do. We wanted one customer can buy multiple products and multiple products can be bought by multiple customers. We are not expecting this design. Remove both the fields from here. Create one table which will have many to many relationship. We can name the table as tblProductCustomer. This will also has the reference of the CutomerId from the Customer table. This customer has bought this product of this model. _fk means foreign key. We can name it as tblProductCustomer – the many to many table. Shiv has bought 101 and ModelId 908. Again Shiv has bought the second product which is 102 and 909. This is flexible, if we look at the repeating groups where we were adding column wise now we are adding row wise. This is not even redundant in other words if there is no second product we can delete it, it does not occupy space. We talked about three things Problem 1 – Duplicate data issue is removed by creating properly primary keys, unique keys and candidate keys. Problem 2 – In a single column we should have single value or atomic value. We should not have multivalue rows, we cannot have Product1, Product2 as we did previously. Problem 3 – We should not have Redundant Columns or Repeating Groups. When we satisfy these three rules that means we have implemented first normal form. First Normal Form says all of these three rules. The fourth design mistake which lot of developers do is the non-key columns means ProductName, ProductCost, ModelName, ModelBrand and so on which are not primary key do not depend fully on the primary key. At this moment we have composite primary key as ProductId & ModelId. ProductName and ProductCost is dependent on the ProductId field. The ModelName, ModelBrand is dependent on the ModelId. This Model has nothing to do with the Product directly as such. The Product has nothing to do with the Model directly as such. The non-key columns are partially dependent on the composite key. This is a bad design because a table is logical unit if we put fields which are not related to that unit then we have problems Tomorrow if we add or remove fields then it becomes very difficult to change the structure because in one table we put lot of entities. Thank you very much.

Danny Hutson

3 thoughts on “database normalization tutorial for beginners

Leave a Reply

Your email address will not be published. Required fields are marked *