Jasper22.NET: Revisiting Normalization and Denormalization

Revisiting Normalization and Denormalization

Posted by jasper22 at 11:18 | Monday, November 29, 2010

In this blog I have done at many articles on Normalization and Denormalization, but I have never put all of the arguments together in one place, so that is what I would like to do today.

There are links to related essays on normalization and denormalization at the bottom of this post.

This blog has two tables of contents, the Topical Table of Contents and the list of Database Skills.

The What and Why of Normalization

Normalization is the process of designing tables so that each fact is stored in exactly one place. A "fact" in this case is any detail that we have to keep track of, such as a product's description, a product's price, an employee's social security number, and so forth.

The process is all about figuring out what tables you need and what columns each table will have. If we are talking about an employee's social security number, then we can guess right from the start that will have a table of EMPLOYEES, and that one of the columns will be SSN. As we get more details, we add more tables and columns.

The advantage of normalization comes when your application writes data to the database. In the simplest terms, when the application needs to store some fact, it only has to go to one place to do it. Writing this kind of code is very easy. Easy to write, easy to debug, easy to maintain and improve.

When the database is not normalized, you end up spending more time writing more complicated application code that is harder to debug. The chances of bad data in your production database go way up. When a shop first experiences bad data in production, it starts to become tempting to "lock down" access to the database, either by forcing updates to go through stored procedures or by trying to enforce access to certain tables through certain codepaths. Both of these strategies: stored procedures and code paths, are the actually the same strategy implemented in different tiers, they both try to prevent bugs by routing access through some bit of code that "knows what to do." But if the database is normalized, you do not need any magic code that "knows what to do."

So that, in brief, is what normalization is and why we do it. Let's move on now to denormalization.

Denormalization is Harder to Talk About

Normalization is easy to explain because there is a clearly stated end-goal: correct data. Moreover, there are well-defined methods for reaching the goal, which we call the normal forms, First Normal Form, Second Normal Form, and higher forms. By contrast, denormalization is much harder to talk about because there is no agreed-upon end goal. To make matters worse, denormalization violates the original theory of Relational Databases, so you still have plenty of people screaming not to do it all, making things even more confusing. What we have now in our industry is different shops denormalizing in different ways for different reasons.

Jasper22.NET

Revisiting Normalization and Denormalization

Archive

Random sites

Followers

Search This Blog

Jasper22.NET

Revisiting Normalization and Denormalization

Archive

Random sites

Followers

Search This Blog

Subscribe To