When developers and architects design enterprise databases, they usually follow the first three normal forms (1NF through 3NF) and often treat third normal form (3NF) as a silver bullet. Many developers assume normalization is the only way to design a database schema, and with this mindset they sometimes hit roadblocks as their projects move ahead.
If you are planning a database design and are new to normalization, first understand how the three normal forms work and how to apply them step by step. Normalization rules are important guidelines, but treating them as an absolute benchmark can cause many problems. Here we discuss some other practical rules to keep in mind on top of 3NF.
1. What is your application’s nature: OLTP or OLAP?
When you start a database design, one of the first things to analyze is the nature of your application and what you are designing it for. The two basic types are transactional and analytical. Many developers apply normalization rules without considering their application’s nature and later run into performance problems.
- Transactional – In these enterprise applications, end users mostly create, read, update, and delete records. Databases serving them are commonly called OLTP (online transaction processing) databases.
- Analytical – In these applications, end users mostly analyze data; reporting and forecasting are the main objectives. Such databases typically see relatively few inserts and updates, and their main goal is to fetch and analyze data as quickly as possible. They are commonly known as OLAP (online analytical processing) databases.
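To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (customers, orders, sales_by_region) are hypothetical, and a real OLAP workload would usually live in a separate warehouse rather than in the same database. The point is only to contrast normalized tables tuned for frequent small writes with a denormalized, pre-aggregated table tuned for fast reads.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- OLTP side: normalized tables suited to small, frequent writes.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        region      TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
""")

# Typical OLTP workload: many small inserts and updates.
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Asha", "North"), (2, "Ben", "South"), (3, "Chen", "North")])
conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                 [(1, 120.0), (2, 80.0), (3, 50.0), (1, 30.0)])
conn.execute("UPDATE orders SET amount = 90.0 WHERE order_id = 2")

# OLAP side: a denormalized summary table, rebuilt periodically,
# so reporting queries do not need joins at read time.
conn.execute("""
    CREATE TABLE sales_by_region AS
    SELECT c.region AS region,
           SUM(o.amount) AS total_sales,
           COUNT(*) AS order_count
    FROM orders o JOIN customers c ON c.customer_id = o.customer_id
    GROUP BY c.region
""")

report = conn.execute(
    "SELECT region, total_sales, order_count FROM sales_by_region ORDER BY region"
).fetchall()
print(report)  # [('North', 200.0, 3), ('South', 90.0, 1)]
```

Strict normalization suits the write path; the summary table deliberately duplicates data so that analytical reads stay cheap.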
2. Split your data into logical pieces
This is the first rule on the road to 3NF: first normal form asks for atomic values. A common sign that the rule is being violated is queries full of string-parsing functions such as SUBSTRING, CHARINDEX, and so on. For example, if a table stores full names in a single field, you cannot predict which queries you will need later (by first name, by last name, and so on). In such cases, the best approach is to break the field into logical pieces, which keeps queries cleaner and easier to optimize. For help splitting databases into logical units of data, you can also turn to providers such as RemoteDBA.com.
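A minimal sketch of the full-name example with Python's sqlite3 module. SQLite's substr and instr stand in for SUBSTRING and CHARINDEX; the people_combined and people tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical "before" design: the whole name lives in one field.
conn.execute("CREATE TABLE people_combined (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.executemany("INSERT INTO people_combined (full_name) VALUES (?)",
                 [("Maria Lopez",), ("John Smith",)])

# Every query on part of the name must parse the string
# (and this parsing breaks for names without a space).
first_names_parsed = [row[0] for row in conn.execute(
    "SELECT substr(full_name, 1, instr(full_name, ' ') - 1) "
    "FROM people_combined ORDER BY id")]

# Hypothetical "after" design: each logical piece gets its own column.
conn.execute("""CREATE TABLE people (
    id         INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL)""")
conn.executemany("INSERT INTO people (first_name, last_name) VALUES (?, ?)",
                 [("Maria", "Lopez"), ("John", "Smith")])

# The same questions become plain column lookups that can use an index.
conn.execute("CREATE INDEX idx_people_last_name ON people(last_name)")
first_names = [row[0] for row in conn.execute(
    "SELECT first_name FROM people ORDER BY id")]
smiths = conn.execute(
    "SELECT first_name FROM people WHERE last_name = ?", ("Smith",)).fetchall()

print(first_names_parsed, first_names, smiths)
```

Both designs return the same first names here, but only the split design can filter on last_name through an index without parsing strings.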
3. Do not overdo the rule above
Developers can be stubborn creatures: tell them to do something a specific way, and they will keep doing it until they overdo it, sometimes to the point of creating unwanted complications. This applies to the decomposition rule discussed above as well.
When decomposing tables, think carefully about how far to go, and decompose only where it serves a logical purpose. For example, you will rarely need to work with the ISD (international dialing) code of a phone number on its own, so it is usually wiser to keep the phone number as a single field than to split it and add complexity.
4. Treat duplicate data as a threat
During schema design, look for duplicate data and refactor it. Duplicate data wastes disk space and creates a lot of confusion. For example, ‘Fifth’ and ‘5th’ mean the same thing as far as the system is concerned, but allowing both is poor data-entry practice that defeats validation. A report derived later will show them as two different entries, which may confuse end users relying on it for analytics. The solution is to move such values into master tables and refer to them through the corresponding keys; for instance, a master table of standards can be linked to the tables that use it through a simple foreign key.
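A minimal sketch of the master-table approach using Python's sqlite3 module. The students and standards tables are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Free-text design: nothing stops "Fifth" and "5th" from coexisting.
conn.execute("CREATE TABLE students_free_text (roll_no INTEGER PRIMARY KEY, standard TEXT)")
conn.executemany("INSERT INTO students_free_text VALUES (?, ?)",
                 [(1, "Fifth"), (2, "5th"), (3, "Fifth")])
free_text_report = conn.execute(
    "SELECT standard, COUNT(*) FROM students_free_text "
    "GROUP BY standard ORDER BY standard").fetchall()

# Master-table design: each standard is stored once and referenced by key.
conn.executescript("""
    CREATE TABLE standards (
        standard_id INTEGER PRIMARY KEY,
        label       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE students (
        roll_no     INTEGER PRIMARY KEY,
        standard_id INTEGER NOT NULL REFERENCES standards(standard_id)
    );
    INSERT INTO standards VALUES (5, '5th');
    INSERT INTO students VALUES (1, 5), (2, 5), (3, 5);
""")
master_report = conn.execute("""
    SELECT s.label, COUNT(*) FROM students st
    JOIN standards s ON s.standard_id = st.standard_id
    GROUP BY s.label
""").fetchall()

# A key missing from the master table is rejected outright.
try:
    conn.execute("INSERT INTO students VALUES (4, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(free_text_report)  # [('5th', 1), ('Fifth', 2)]: one standard, two groups
print(master_report)     # [('5th', 3)]
print(rejected)          # True
```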
5. Watch for data packed together with separators
First normal form stresses avoiding repeating groups. For example, if you cram a whole list of syllabus subjects into one field, too much data gets stuffed into it; such fields are known as repeating groups. Querying this data becomes more complex, and performance can suffer. Repeating groups and columns like these are often packed with separator characters (commas, pipes, and so on). They need special attention, and the better approach is to move the values into separate tables and link them by key to the related table.
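A minimal sketch with Python's sqlite3 module, assuming a hypothetical syllabus stored as comma-separated subjects. It shows why pattern matching on a packed field misleads, and how one row per subject fixes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Repeating group: several subjects packed into one comma-separated field.
conn.execute("CREATE TABLE syllabus_packed (standard TEXT PRIMARY KEY, subjects TEXT)")
conn.executemany("INSERT INTO syllabus_packed VALUES (?, ?)",
                 [("5th", "Math,Science,English"), ("6th", "Applied Math,History")])

# Pattern matching on the packed field is fragile:
# '%Math%' also matches 'Applied Math'.
packed_hits = [row[0] for row in conn.execute(
    "SELECT standard FROM syllabus_packed WHERE subjects LIKE '%Math%' ORDER BY standard")]

# 1NF fix: one row per (standard, subject), linked back by key.
conn.executescript("""
    CREATE TABLE standards (standard TEXT PRIMARY KEY);
    CREATE TABLE standard_subjects (
        standard TEXT NOT NULL REFERENCES standards(standard),
        subject  TEXT NOT NULL,
        PRIMARY KEY (standard, subject)
    );
""")
rows = conn.execute("SELECT standard, subjects FROM syllabus_packed").fetchall()
for standard, subjects in rows:
    conn.execute("INSERT INTO standards VALUES (?)", (standard,))
    conn.executemany("INSERT INTO standard_subjects VALUES (?, ?)",
                     [(standard, s.strip()) for s in subjects.split(",")])

# Exact matches are now simple equality lookups.
exact_hits = [row[0] for row in conn.execute(
    "SELECT standard FROM standard_subjects WHERE subject = ? ORDER BY standard",
    ("Math",))]

print(packed_hits, exact_hits)  # ['5th', '6th'] ['5th']
```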
6. Watch for all partial dependencies
You also need to watch for fields that depend on only part of a composite primary key. For example, in a table whose primary key combines the student’s roll number and standard, the syllabus field mentioned above depends on the standard, not on the roll number: the syllabus belongs to the standard in which the student is studying. If the syllabus changes tomorrow, you would have to update it for every student, which is illogical and painstaking. It makes more sense to move such fields out and associate them with a separate standard table.
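A minimal sketch of removing this partial dependency (moving to second normal form) with Python's sqlite3 module. The enrolment and standards tables are hypothetical; the comparison counts how many rows a syllabus change touches in each design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Before: composite key (roll_no, standard), but syllabus depends only on standard.
conn.execute("""CREATE TABLE enrolments_wide (
    roll_no INTEGER, standard TEXT, syllabus TEXT,
    PRIMARY KEY (roll_no, standard))""")
conn.executemany("INSERT INTO enrolments_wide VALUES (?, ?, ?)",
                 [(1, "5th", "2023 plan"), (2, "5th", "2023 plan"), (3, "5th", "2023 plan")])
# Changing the syllabus touches one row per student.
cur = conn.execute(
    "UPDATE enrolments_wide SET syllabus = '2024 plan' WHERE standard = '5th'")
rows_touched_before = cur.rowcount

# After: syllabus moves to a table keyed by standard alone.
conn.executescript("""
    CREATE TABLE standards (standard TEXT PRIMARY KEY, syllabus TEXT NOT NULL);
    CREATE TABLE enrolments (
        roll_no  INTEGER,
        standard TEXT REFERENCES standards(standard),
        PRIMARY KEY (roll_no, standard)
    );
    INSERT INTO standards VALUES ('5th', '2023 plan');
    INSERT INTO enrolments VALUES (1, '5th'), (2, '5th'), (3, '5th');
""")
cur = conn.execute(
    "UPDATE standards SET syllabus = '2024 plan' WHERE standard = '5th'")
rows_touched_after = cur.rowcount

# Every student still sees the new syllabus through a join.
syllabi = {row[0] for row in conn.execute(
    "SELECT s.syllabus FROM enrolments e JOIN standards s ON s.standard = e.standard")}

print(rows_touched_before, rows_touched_after, syllabi)  # 3 1 {'2024 plan'}
```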
Beyond these rules, a few more practices are worth considering: choose derived columns judiciously; do not be overly strict about avoiding redundancy; treat multidimensional data models as a different beast; centralize name-value data in a single, well-designed table; and use self-references for unlimited hierarchical data.
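The self-reference idea can be sketched with Python's sqlite3 module. SQLite supports recursive common table expressions (`WITH RECURSIVE`); the categories table is hypothetical. Each row points at its parent in the same table, so the hierarchy can grow to any depth without new tables or columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Self-referencing table: parent_id points back into the same table.
conn.executescript("""
    CREATE TABLE categories (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES categories(id)  -- NULL marks a root
    );
    INSERT INTO categories VALUES
        (1, 'Science', NULL),
        (2, 'Physics', 1),
        (3, 'Mechanics', 2),
        (4, 'Kinematics', 3),
        (5, 'Arts', NULL);
""")

# A recursive CTE walks from a chosen root down to every descendant.
descendants = conn.execute("""
    WITH RECURSIVE subtree(id, name, depth) AS (
        SELECT id, name, 0 FROM categories WHERE id = ?
        UNION ALL
        SELECT c.id, c.name, s.depth + 1
        FROM categories c JOIN subtree s ON c.parent_id = s.id
    )
    SELECT name, depth FROM subtree ORDER BY depth
""", (1,)).fetchall()

print(descendants)
# [('Science', 0), ('Physics', 1), ('Mechanics', 2), ('Kinematics', 3)]
```

Adding a deeper level is just another INSERT with the right parent_id; the query does not change.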
None of this is a reason to skip the three normal forms; rather, you should not follow them blindly. Consider your project’s actual nature and the type of data you are dealing with when applying design best practices.