DP-300 Microsoft Azure Database – Evaluate performance improvements Part 3
4. 64. recommend query construct modifications based on resource usage
In a previous video, we ran this query. So we created the sales order detail copy and sales order header copy tables. But the important thing about these is that they’ve got no indexes whatsoever. So I created this query and run it and you can see it uses table scans because there’s an old wire clause. So you’ll be using a scan and it uses a hash match to create the results. Okay, so far so good. Now the computer is not at the moment complaining about the lack of indexes.
And there’s a reason for this. If you have a look at how many rows we’ve got, we’ve only got 32 rows, so really it doesn’t make that much difference. So when would it use the indexes? Well, it uses it when getting the join here between Sales order ID. Now, what if this wasn’t 32 rows? What if this was a lot more rows? So let’s double the number of rows. So I’m inserting into this table the table so that doubles the number of rows. So now there are 64 rows.
So if I execute this, we’re still fine, but let’s execute this a few more times. So 128, 256, 512. So now up to 1024 rows. So let’s execute this. Look at the execution plan, we’re still fine. Now let’s do it a few more times. So 20, 48, 40, 96, 81, 92, taking a bit longer and 16,384. So now let’s execute our original query. And if we now have a look at the execution plan, you can see that there is a missing index and a suggestion of what we create. Create nonclusive index had a name on and it gives what we should be having.
So the sales order ID and the customer ID as an include. So the include columns would be in a separate part. So in other words, you would have the main index. And then when you get to individual rows, there would then be a link to this separate part. So it’s a lot more efficient to be able to include include columns rather than just having just one column and then having to get the information elsewhere. So if I click on the dot, you can see this is the query text and if I write and click we can see missing index details and there is my code that I would need.
So let’s run that code and have a look at this query again. And now we can see it uses an index scan no clustered of this sales order header copy. Right, LAN, let’s just drop that index. So I’ll get rid of the words create and no clustered temporarily. So just drop this index. I didn’t actually give it a proper name, I’m just using the default name. So now it’s dropped on undo that. So it is still dropped. I just did the typing. So run this again and we’ll have that problem again.
So this is one way of detecting missing indexes but is there a better way to do it for an entire database? And the answer is we can use a DMV and this DMV is sys DM underscore DB underscore missing index details plural. So if we have a look at this because we are missing get this current index, it comes up so you can see it is in database ID five object ID 133-010-3779 and you can see that we’ve got an equality column of sales order ID no inequality columns and then some included columns. What are equality and inequality? Well, if we have a look at this index, it doesn’t actually give us any clues whatsoever.
So go back to the query and you can see that the query has got this equals. So that is the equality. So if somewhere like in the where clause we would have not equals to or greater than or anything apart from equality, then that would be in the inequality. When you are creating an index, you put equality first and then the inequality columns, both of these should be in the key and then you’ve got the included columns in the include section of the index. Now you can go through like this and work out what you need to do and why you need to do it and so forth.
So you can see it is this particular table, it says equality, inequality and not include columns. There are available on the internet and in the PDF that’s attached as a resource much earlier on in this course. Examples of the full thing that you can do with these sorts of things. So this is a much wider query and it uses three different DMVs but it gives basically the same results. It’s just that you now have the create index statement there so you just need to copy and paste it.
So there’s my entire index statement including a good name for the index. Going further on, we’ve got things like average user impact so that’s percentage average benefit that user queries could experience if this missing index group was implemented. You’ve got average total user cost so that’s the average cost that could be reduced by the index. So you have to go okay, this might be a big saving 98%, but if we’re saving not that much is it really worth it. But this is when this a query like this could help because it actually orders by those indexes which are most beneficial. So a quick look at what all of these different DMVs are. So they all start Dmdb missing index. So details give you the details groups that returns information about indexes that are missing from a specific index group.
And then we have group stats. So this is summary information about groups of missing index and you’ve also got Dmdb missing index group stats query which returns information about queries that are missing in index. So this is probably the biggest query construct modification I can think of actually create indexes as and when they are needed. Of course, don’t create too many because as soon as you insert additional information, update or delete or merge, then the indexes need to be updated. So that could grind your system to a halt in terms of actual changes that could be needed to the query, where we’ve already gone through everything that we need to be Sargable.
So we need to ensure, for instance, we’re not using functions when we can avoid it. So don’t use the year function when you can use between two sets of dates, don’t use left when you can use like and don’t use the is null function. So if a certain field is null, then give me this unless you’ve got a specific need for it. Instead you could say if my field is null or my field is equal to whatever. Yes, it adds a bit more to the words, but it makes it actually savable, which means that you can use any particular index that are available.
So you don’t necessarily need to know all of the specifics of how to change a query to make a Sargable query, you just need to know, for instance, you shouldn’t use functions when you can avoid it if there is a better alternative that can use an index. So the greatest thing you can do to speed up your queries is to have appropriate indexes and you can do that by using sysdmdb, missing index details and similar DMVs.
5. 65. assess the use of hints for query performance
In this video we’re going to have a look at the use of hints for query performance. So what I’m going to do is we currently have this table, which has expanded to 16,384 rows. So I’m going to drop this table and then recreate it from the original. So we’re back to 32 rows, so our queries will be a lot quicker. So, as a reminder, we currently have an execution plan with a hash match. But suppose we didn’t want a hash match, suppose we wanted another type of match, say a merge join. Well, we can do that through the use of query hints and they’re fairly easy to use. We write option and then in brackets the hint. So in this case, merge join.
Now, notice what we currently have, two scans and the hash match. So let’s run this merge join and have a look at the execution plan. So you can now see we’ve got this scans and these sorts. Now, remember what I said about sorts, they take a lot of time, they’re very costly, so you do need to be aware of that. But because of these sorts, we now have this merge join. Now, was this a good thing to have done? What I’m going to do is I’m going to duplicate this query. So now we’ve got one using a merge join and one using the previous hash match. And you can see that the one with the hash match is about twice as quick as the one with the merge join.
So the first one only uses 35% of the total batch of these two statements, whereas the second users 65%. Now, if you have a look at the Microsoft documentation regarding hints, because SQL Server query optimizer traditionally or typically sets the best execution plan for query, this should be a last resort for experienced developers and DBAs. So we have got an awful lot of query hints. Now we’ll be looking at some of them later on in this course when we have a look at intelligence query processing. But I just want to have a look at some of the main query hints rather than go to all of them because you won’t need all of them, you just need to know the main ones.
So we’ve already seen that you can have option merge join. Well, we can also have other joins, we can have a loop join. So let’s put that in and see how good that is. And you can see even worse the loop join, but it is possible to do so. We can see this table spool. A lazy spool is an additional thing that has to happen. If we wanted to force the hash join. We could of course just write hash join. We’ve also got similar things for groups and unions. So we’ve got hash group, order, group merge, union, hash union and concatunion.
Different ways of doing groups and union. But again, don’t do it unless you absolutely have to now what else do we have now instantly if you are putting in more than one option, you can do that. So you can have a comma in between options. Now keep fixed plan. Now this query won’t be recompiled when its statistics change, it will only be recompiled if the schema of the underlying tables change. What’s the schema we’re talking about the columns or if you actively run it procedure called recompile that would also change the plan. So once it’s done, it’s fixed. Sounds too much. Well what about key plan that recompiles less often when statistics change?
So statistics being for instance, you’ve got 10,000 rows for where a particular sales order is equal to one, but you’ve only got 100 rows where it’s equal to 100 and you’ve only got two rows where it’s equal to 1000. So that’s statistics. But suppose I delete all of where sales order ID equals one, where there’s no longer this 1000 or 1 million rows that we’ve got there. So those are statistics. We can also optimize for unknown. So this uses the average selection of the where rather than a specific thing. So I might run this query where the parameter is sales order ID equals one and so we have got these million rows. But I would say I just want to do the average number of rows that it would be.
And then another one that I want to talk about is robust plan. So this creates a plan that works for the maximum potential row signs. So in this case using the million rows, if you are running a query that doesn’t contain this million rows then performance may be impaired. So we’ve got keep fixed plan which doesn’t recompile at all unless something major happens. Keep plan which recompiles left often optimized for unknown which uses the average selectivity the average number of roles and robust plan which uses the maximum number of roles. And then if I was creating a store procedure so create a stored procedure, so my procedure and I have an input for the sales order ID, and I use it here.
Then I could say optimize for. So I can say create this plan where a particular parameter is a particular value. Or I could equally say where it is unknown. If you don’t do that, then the store procedure will be optimized in its first running and will keep that. So if I call this with a sales order ID of one, then it will be optimized for a million rows and then when I call it again with sales order ID 1000, well there’s only 1000 rows, 100 rows, it will still be using the same plan as if sales order ID was optimized for number one. That should have an equal sign there.
That’s why I’ve got these squiggles. So optimize for unknown quite useful if you want this door procedure to just optimize for a particular parameter with unknown values. So these are table hints so we can use them, but only when absolutely necessary, I would suggest. So we’ve got merge join, loop join, hash join. Those are probably the more widely used hints but we’ve got hash group and order group. We’ve got various different types of union and then we’ve got keep fixed plan so the plan remains the plan until, say the scheme of the table changes.
We’ve got keep plan so that re optimizes or recompiles left often we’ve got optimize four so we can optimize for known or optimize for a particular value more often used in store procedures, I would suggest. And then we’ve got robust plan creates a plan that works for the maximum potential raw size. But it the major thing is don’t use it unless you absolutely have to because quite often SQL Server query optimizer does a pretty good job.
Interesting posts
The Growing Demand for IT Certifications in the Fintech Industry
The fintech industry is experiencing an unprecedented boom, driven by the relentless pace of technological innovation and the increasing integration of financial services with digital platforms. As the lines between finance and technology blur, the need for highly skilled professionals who can navigate both worlds is greater than ever. One of the most effective ways… Read More »
CompTIA Security+ vs. CEH: Entry-Level Cybersecurity Certifications Compared
In today’s digital world, cybersecurity is no longer just a technical concern; it’s a critical business priority. With cyber threats evolving rapidly, organizations of all sizes are seeking skilled professionals to protect their digital assets. For those looking to break into the cybersecurity field, earning a certification is a great way to validate your skills… Read More »
The Evolving Role of ITIL: What’s New in ITIL 4 Managing Professional Transition Exam?
If you’ve been in the IT service management (ITSM) world for a while, you’ve probably heard of ITIL – the framework that’s been guiding IT professionals in delivering high-quality services for decades. The Information Technology Infrastructure Library (ITIL) has evolved significantly over the years, and its latest iteration, ITIL 4, marks a substantial shift in… Read More »
SASE and Zero Trust: How New Security Architectures are Shaping Cisco’s CyberOps Certification
As cybersecurity threats become increasingly sophisticated and pervasive, traditional security models are proving inadequate for today’s complex digital environments. To address these challenges, modern security frameworks such as SASE (Secure Access Service Edge) and Zero Trust are revolutionizing how organizations protect their networks and data. Recognizing the shift towards these advanced security architectures, Cisco has… Read More »
CompTIA’s CASP+ (CAS-004) Gets Tougher: What’s New in Advanced Security Practitioner Certification?
The cybersecurity landscape is constantly evolving, and with it, the certifications that validate the expertise of security professionals must adapt to address new challenges and technologies. CompTIA’s CASP+ (CompTIA Advanced Security Practitioner) certification has long been a hallmark of advanced knowledge in cybersecurity, distinguishing those who are capable of designing, implementing, and managing enterprise-level security… Read More »
Azure DevOps Engineer Expert Certification: What’s Changed in the New AZ-400 Exam Blueprint?
The cloud landscape is evolving at a breakneck pace, and with it, the certifications that validate an IT professional’s skills. One such certification is the Microsoft Certified: DevOps Engineer Expert, which is validated through the AZ-400 exam. This exam has undergone significant changes to reflect the latest trends, tools, and methodologies in the DevOps world.… Read More »