DP-300 Microsoft Azure Database – Optimize Query Performance Part 2

  • By
  • July 13, 2023
0 Comment

3. 67. identify normal form of database tables – Normal Forms 1 to 3

In the previous video we had a not optimal table structure and we’re going to use the first normal form to get it into better shape. But what is the first normal form? Well, the first normal form was proposed by Edgar F. Cod. He’s the person who first proposed the relational model. Now, his requirements for the first normal form are twofold. Firstly, the values in each column must be atomic indivisible. So what does that mean? Well, maybe we’ve got somebody’s name. So John Smith.

Well, that’s not indivisible. You are able to divide that between first name and last name and maybe even middle name. And each value must contain only a single value. So in our example, we’ve got multiple values. Not good. So in our example here, we have got multiple values for class. Yes, we’ve divided it up into separate columns, but really that is exactly the same as this. While we have them all into one, we have multiple values of classes. So let’s now put this into first normal form.

So in first normal form we just take each of these and put them onto a separate row. So if I insert new columns, certain rows, and I say that we now have a class for physics, class for computing and a class for art. So we still have the same information. We still have John with Mr. Y and age 52 for the teacher. But now our class is now physics, computing and art. And it’d be the same process if you’ve got the information like this. So again, we just add in extra rows and we divide them at the comma. So now we have art and now we have computing separate.

Either way, we arrive at exactly the same table. So this is now in the first normal form, but there’s still a fair bit more we can do. The requirements for the second normal form is that first of all, we’ve already done everything in the first normal form. So we have everything that is atomic indivisible. Each value contains only a single value. The next is to reduce repeated information. So if there are any repetitive data, then we might need to create separate tables. And we led the tables with a foreign key. So here you can see that we have repeated information.

We have John, whose teacher is Mr. Y, who’s age 52. The only reason we got that repeated is because he’s taking three classes with Mr. Y. And in fact, in this school, all you’ve got is one teacher for each individual pupil. In this school, John can’t take a class with Mrs. G for instance. So in the second normal form we can separate these classes out to another table. So in the second normal form we would have these classes in a separate table to the pupils names. So now it just looks like we’ve just duplicated everything. So this is our pupil teacher table and this is our pupil class. Table. But what we can now do is get rid of the repeated information. So we only need to know that John is with Mr. Y once. So now we know that John is with Mr. Y who’s age 52 for physics, john is with Mr.

Y who’s aged 52 for computing and the same for art. So this is now a second normal form. And these are linked together using a primary key here of pupil name to a foreign key of pupil name in the table pupil class. So a primary key cannot repeat a foreign key. Certainly can and usually does repeat it’s a onetomany relationship. So this is getting better. But what’s the next form? So if this is the second normal form, what’s the third?

Well, the third form, firstly, everything has to be in the second normal form. But secondly, anything which is not dependent from the primary key is to be removed from that particular table. So what do we mean by that? Well, going back to our second normal form, we see that John has Mr. Y who is 52, and Albert has Mr. Y who is 52. The fact that the teachers are aged 52 is not dependent on the primary key pupil name. Instead, teacher age is dependent on teacher. So what we need to do to get this to work properly is another table. So this table is going to be a table for teachers and we’re just going to have the teachers and the teacher ages. So now we can get rid of teacher ages in their entirety from the table pupil teacher. So now in our table teacher, we just have two teachers who happen to be 52 and 36. And we know that John is going to Mr. Y. And Mr. Y is 52.

So we can connect the dots. And now we have a primary key of teacher going to a foreign key of a teacher here. So in our teacher table, we only have one teacher for each teacher, one row, whereas each teacher can teach more than one pupil. So now let’s go through the problems that we saw in the previous video. So if we delete Margaret yes, we delete that Margaret is being told by Mrs. G, but Mrs. G remains aged 36 in the teacher table. So that’s no longer a problem. If it’s Mr. Wise birthday, then we just have one raw to increase. So Mr. Wise now 53 and we don’t have any inconsistencies of him being 52 and 53. And if I’m looking for everybody who does chemistry or computing, then I just need to say where class equals computing. No need to do where class like computing or where class one equals computing or class two equals computing and so on.

So that makes the third normal form very useful. And I generally, when I’m using SQL Server, go to the third normal form, except when I’m using something like a data warehouse with SQL Server analysis services. And when that happens, I generally go to the first normal form. And the reason for this is because it’s a lot quicker for the computer to read this than it is to do all of the joins that are here. And speed is more important for the data warehouses. And in a data warehouse, I’m not actually updating the data, it’s really read only. So I’m not going to be deleting data or updating data or anything like that. However, that’s only the case if you’re doing a data warehouse. For every other instance, I generally go to third normal form. So these are the most important ones.

Unnormalized means anything goes, including really bad design. If you’ve got multiple items in the one column, like for instance somebody’s name being first name, middle name, last name, all at one, then first normal form would say you have to separate them out into unique columns, better columns, better rows, and then take anything that is duplicate in the rows out. So now we have the pupils classes and the pupils names with teachers. So now we only have Mr. Y in once and then anything which is not related to the primary key but is related to another column should again be taken out into another table. So these are the most important normal forms and I suggest you have a look at your own databases, your own table constructions and see is there a way to go through these free normal forms and improve them? Improve them for consistency, improve them for updates and deletions, and improve them for your queries.

4. 67. identify normal form of database tables – Normal Forms 4 to 5

Microsoft’s website says the fourth normal four and fifth do exist, but are rarely considered in practical design. Disregarding these rules may result in less than perfect design but shouldn’t affect functionality. Quite frankly, I don’t use the fourth and the fifth, and I don’t think I need to. To be honest, I find these start getting more and more complicated designs of than less and less.

So I always go to the third normal form. So let’s have a look at what the fourth normal form is. So the requirements are that it should already be in third normal form. And that for me is good, but it should have no multivalued dependencies. So what does that mean? So here we have an example of a table. So we have pupils one, three and five studying math with Mr A, and two and four studying science with Mrs B. So it could be that we should separate the instructor out, because each instructor is not based on the pupil, but on the course, and that would follow the third normal form.

However, there is a problem when pupil number two studies math with Miss C. So it’s no longer Mr A who teaches a math, it’s also Miss C. So now we can no longer separate this because the instructor is no longer dependent on the course. Because now with math, we no longer have an idea whether it’s Mr A or Miss C. So what? Fourth, no more form would be would be to separate out the relationship between pupil and course and pupil and instructor. So that would result in two separate tables, one pupil and course, one pupil and instructor.

Now, for me, at the moment, we have a composite primary key. So we have pupil and course, because pupil is no longer a primary key by itself, because we now have number two, science and number two, math. So I would have another ID. So this is just a table ID and running 123456. And I would repeat that inside both of these tables. So dial would be our primary key, instead of the pupil and cos composite key. That, for me, I’m okay with I can deal with. I’m not a great fan of this because I think pupil course, instructor with themselves in total, they are the primary key.

It’s not just a pupil course for me now, it’s a pupil course instructor, because it’s possible that pupil two was having masks with Mr A or with Miss C. So all of those is the primary key. In reality, I would have this extra column, but I wouldn’t separate it out into these two. I’ll be much happier if they’re all in one particular table. Of course, I wouldn’t be using Mr A and Mrs B as my instructor. I’d probably be using inst one, inst two, and then I’ll be relating this instructor ID to an instructor name elsewhere, so we can have title surname, so that’s Mrs Ms ABC.

So this is how I would actually be doing my table, rather than going down this fourth normal form route. And I’ll probably also give the course an idea as well. So maybe these are the same math course, or maybe these are different math course. I have no idea. Why would they be taught by the different people if they’re the same? It’s possible, but it’s possible that it’s completely different. So this is how I would do it. Rather than using this as my result, the fifth normal form goes one stage further. And there are some people who say, well, if it’s auditing fourth, it’s very rare for there to be any violation of the fifth normal form. So it says have no join dependencies.

So here we have an example. Mrs A sells fridges and freezers from companies one and two. Mr B only sells fridges from company two. The fifth normal form would mean that I would be separating out the person in company, person and item, and company and item. And the advantage of this is, at the moment, I have got duplication, I’ve got company one in twice, company two in three times, I’ve got fridge in three times and freezer in twice. So, five rows in total, whereas each of these tables is no more than four rows. So it does reduce the amount of information. However, for me, yes, you can deduce that Mr B would be selling fridges from company two, and company two does sell fridges. However, this is going way too far for me. I would much prefer it to be in a form closer than this. Again, I would use company ID and item IDs rather than actually listing the items.

But this is what the fifth form is all about. It’s reducing any joint complexities. So we have a company and a person and an item and a person and an item and a company. So there are your three joins. Now, this could be useful for asking the question, hang on, mr B sells products from company two, mr B sells fridges. Company two sells fridges and freezers. Why is Mr B not selling freezers from company two? And that is something that can also be deduced in this fifth normal form.

So, having gone through the fourth normal form and the fifth normal form, my advice is forget it, ignore it. Except to the requirements that you should know what fourth and fifth normal form are. If you are creating your own databases, I would just go down to the third normal form. If you see opportunities for improvement along the lines of the fourth normal form and fifth normal form, and you think it’s a good idea, then by all means keep going. But Microsoft recommends that you stick with the third normal form because it’s just adding complexity without necessarily creating those massive improvements. And in fact, for me, it actually adds complexity.

Comments
* The most recent comment are at the top

Interesting posts

The Growing Demand for IT Certifications in the Fintech Industry

The fintech industry is experiencing an unprecedented boom, driven by the relentless pace of technological innovation and the increasing integration of financial services with digital platforms. As the lines between finance and technology blur, the need for highly skilled professionals who can navigate both worlds is greater than ever. One of the most effective ways… Read More »

CompTIA Security+ vs. CEH: Entry-Level Cybersecurity Certifications Compared

In today’s digital world, cybersecurity is no longer just a technical concern; it’s a critical business priority. With cyber threats evolving rapidly, organizations of all sizes are seeking skilled professionals to protect their digital assets. For those looking to break into the cybersecurity field, earning a certification is a great way to validate your skills… Read More »

The Evolving Role of ITIL: What’s New in ITIL 4 Managing Professional Transition Exam?

If you’ve been in the IT service management (ITSM) world for a while, you’ve probably heard of ITIL – the framework that’s been guiding IT professionals in delivering high-quality services for decades. The Information Technology Infrastructure Library (ITIL) has evolved significantly over the years, and its latest iteration, ITIL 4, marks a substantial shift in… Read More »

SASE and Zero Trust: How New Security Architectures are Shaping Cisco’s CyberOps Certification

As cybersecurity threats become increasingly sophisticated and pervasive, traditional security models are proving inadequate for today’s complex digital environments. To address these challenges, modern security frameworks such as SASE (Secure Access Service Edge) and Zero Trust are revolutionizing how organizations protect their networks and data. Recognizing the shift towards these advanced security architectures, Cisco has… Read More »

CompTIA’s CASP+ (CAS-004) Gets Tougher: What’s New in Advanced Security Practitioner Certification?

The cybersecurity landscape is constantly evolving, and with it, the certifications that validate the expertise of security professionals must adapt to address new challenges and technologies. CompTIA’s CASP+ (CompTIA Advanced Security Practitioner) certification has long been a hallmark of advanced knowledge in cybersecurity, distinguishing those who are capable of designing, implementing, and managing enterprise-level security… Read More »

Azure DevOps Engineer Expert Certification: What’s Changed in the New AZ-400 Exam Blueprint?

The cloud landscape is evolving at a breakneck pace, and with it, the certifications that validate an IT professional’s skills. One such certification is the Microsoft Certified: DevOps Engineer Expert, which is validated through the AZ-400 exam. This exam has undergone significant changes to reflect the latest trends, tools, and methodologies in the DevOps world.… Read More »

img