Database Normalization: How bad can it get in a non-normalized database!

Database decisions are right if they positively affect your application. Normalization (or the lack of it) is one decision every programmer makes while designing his application database. Looking at the big picture is crucial in such a circumstance.  The positives associated with normalization are:
1.    Lesser coding required.
2.    Coding becomes easier.
3.    Fewer errors.

What it really means:

Programmers often spend hours reading papers on normalization without really understanding what the need is. At its very basic form normalization ensures that each nugget of data or fact is stored in exactly one place. So now when you need that data, you know where exactly to look for it. Now lets look at the other side of the coin. You have the same data stored in lets say 3 to 4 locations. Firstly you have increased redundancy. Now this in itself is not such a bad thing. If you end up increasing the overall speed of the application while maintaining its consistency, you have won the battle! But its this second part that becomes tricky and is usually responsible for panic-stricken calls by a client, usually when you are getting ready to call it a day.

When facts are repeated in the database, the application programmer has an added responsibility to make sure they are all consistent. Failure to meet this requirement will lead to an inconsistent database and erratic application behavior. Once the programmer understands this very simple goal, each rule for normalizing suddenly makes more sense, and the application programmer has his ‘Eureka’ moment!

While dealing with a non-normalised database, a programmer is likely to run into some anomalies and inconsistencies (a euphemism for saying that the database holds inaccurate data). Lets then spell out what these anomalies and inconsistencies are and how they affect the application.

Case: A newbie programmer who has had a more than healthy dose of logs stating how relational databases are the worst possible kind of curse and hence have not tried to even understand the concept.
Consider a simple (actually very very simple) e-commerce application. A programmer creates a simple table called employees. This table lists the employees of the company along with their employee no. and the customers the employee handles. Now one fine day the employee id of a particular employee is changed. This will give rise to 3 irregularities.

Update irregularity:
If a user goes to this employee-customer table and changes an employee’s id on only one row, and no provision is made to change the others, then the database now has inconsistent values for the employee’s email. This irregularity occurs when a fact is stored in multiple locations and a user is able to change one without changing them all.

Insert irregularity:
This occurs when it is not actually possible to record a fact. A new employee not assigned to any customers will not be recorded by the system at all!

Delete irregularity:
This occurs when deletion of one fact eliminates some other fact. If an employee is absent causing a removal of his assignments, then we will have lost his employee-id in the process!

Thus a non-normalized database will require additional care by the programmed to ensure that these issues are fixed. This situation gets worse and worse as the program get more and more complicated. This leads to more and more complaints and you will spend all your time making amends for it. And you can’t make progress with a program when you are busy all day fixings bugs and pacifying irate customers.


Leave a Reply