DP-300 Microsoft Azure Database – Optimize Query Performance Part 3
5. 58. determine the appropriate type of execution plan
Now in this video we’re going to go a bit behind the scenes and have a look at execution plans. So what I’ve got here is a crossjoin. So I’m joining this table to itself. So if this table is giving me 450 rows then joining it to itself with no idea about how they actually join. I just want every single row compared against every single row will result in huge number of rows, 450 squared. Now I’m not so much interested in the results, you can see it’s taking a bit of a while but I want to show you three different types of execution plan. So the first one is an estimated plan. So the computer has a look at the select query and goes okay, I think it should take XYZ and here we go. If I click on this particular icon here so it says display estimated execution plan.
We don’t actually see the result, we see what is happening. So we have a clustered index scan, in fact a couple of them with a nested loop joining them together and then a select statement at the end. That’s the actual close at the end. So it’s doing the from first and then the select close. Now execution plans run from the right to the left and the arrow thickness represents the number of rows and you can hover over these arrows for more details.
So you can see if it’s thicker then it’s more rows. More estimated rows means more memory reserved. If it’s a particularly long running query and the computer thinks that it can divide share table up into two and then join together at the end, then it can introduce what’s called parallelism and we’ll be looking at that later on in other videos. Now I should point out that if you right and click you can also see the execution plan as an XML.
You can analyze the actual execution plan and you can zoom in and out and you can also see properties which come in on the right hand side. So that’s the estimated execution plan, it happens immediately without running the query. So the next thing I want to show you is the actual execution plan. So again if I run a simpler version of this then we get a third tab and you can see we have got the scam and it goes through to the select clause. So this is the actual as opposed to the estimated. So it includes additional runtime statistics and of course is what actually happened rather than computer thinking.
Well, I think there’s this number of rows, the amount of rows that are actually in each table in each view may have changed since the last query, it may not be 100% accurate. When you have the actual execution plan it is what is actually there. Now there is an alternate way to include these actual execution plan and that is to say set shore plan all on. So if I do that and run, then you can see that the result is no longer the actual table, but information about each of these stages. So you’ve got it in table form. So I’ll say off now and an alternative is set Show Plan, text on. So now you can see it with separate queries showing execution information.
So neither of these versions set show plan, text or show plan. All will actually execute the statement, but it will show execution information. So let’s go back to our previous version, which has a separate tab for the actual execution plan. Now, that’s fine when it’s a short query, but what if it’s a long query like this? We don’t get the actual execution plan until it ends.
So instead what we can do is have live query statistics. I say instead we can also have it in addition to the actual execution plan. So if I were to click on this icon as well, so both the actual execution plan and live query statistics were being shown. Then here you can see the execution information as it’s actually happening. So you can see the number of seconds things are taking.
You can see the data flowing. Needless to say, it does take more time for the computer to process this, so I wouldn’t be doing this on every single query. You can also see how far the query has progressed. When it’s finished, then those dust moving lines become solid lines. And you can see it’s similar to the sort of information that we have with the actual execution plan. So these are the three different types of execution plans. Estimated execution Plan, actual execution plan and live query statistics. In the next video, we’ll have a look at how these tables can be joined together.
6. Different types of loops used in execution plans, and Scan and Seek
In this video we’re going to have a look at various ways of joining tables or views together and also the difference between a scan and a seek. Let’s have a look at this query. Select star from the address table, 450 rows. So the execution plan you can see that I’ve got include actual execution plan is a scan. So what does that mean? Well, it’s like if you take book and you start at page one and go all the way through to the end. So I’m now going to introduce a where clause, where the city’s buffalo. And let’s look. And there are only four items here, four rows. So ideally the computer knows where all of the four rows where the city is Buffal and it goes oh, this one, that one and these two. However, look, it’s still a scan, it’s not the case. And the reason is because we don’t have an index for City. So we’ve got index for other things. We’ve got index for state province.
For instance, we’ve got an index for address line one and City is in there somewhere. But it’s so far down the list the computer can’t use it. Now, suppose we added a new index. So this is a non clustered index. A clustered index would mean that the table would get reshaped for that particular index. All the rows would be sorted in that order. nonclassed index is just maintained separately. So I’m going to create a new nonclustered index. And if I refresh, you can see we now have this new index. So if I now do this select, we can see that the execution plan is changed from a seek to a scan. The computer knows where all of these four items are and it’s going just those.
Next I’m going to add the query we were looking at in the last video. And if I execute that and wait for some 24 seconds, you’ll see that it is using a nested loop. Nested loop joins, they’re very efficient. They’re used when you’ve got two inputs, input one and input two. Input one is at the top. So input one is generally small, input two is potentially large. But most importantly, it is indexed on whatever join is being used. Now, we don’t have such an index here, this is a cross join. But the computer has worked out that the nested loop is sufficient for a cross join as well. Advantages of nested loops, it uses the least I or that’s input output and the fewest comparisons. So what happens is it takes the input one, the top input, and takes just one row of that and then matches any rows in the bottom input for it.
So in this case, all of the table. And then it takes the next row and matches, next row and matches and so on. So that’s a nested loop join. Next is a merge join. So this is used when input one and input two are not small, but also input one and input two are sorted on their join. So in this case, where the sales order ID are the same. So the advantage of merge joins, well, they can be very fast and so the computer is going to look for nested loop joins when it can and merge joins when it can’t. Now, what happens if that particular index, the sorting, is not there? So what I’m going to do is I’m going to take this sales order detail table and I’m going to create a new one. So that’s what this into thing is. It creates a new table called sales order detail copy.
So very simple statement, so it’s identical to the previous one. I’ll just refresh that with just one exception. Here are the indexes of sales order detail. There are no indexes for sales order detail copy, so I will do that again, but this time I’m going to create a new header, sales Raw header again without any indexes, indexes being the plural of index as opposed to indices in SQL Server. So now if I run exactly the same query, except this time it’s on the copy, we don’t have those input one and input two sorted on their joins, so now it’s got to use a hash match. So this is the least favorite join. It’s used for large, unsorted non indexed inputs. It can also be used in the middle of complex queries as intermediate results, often not indexed or suitably sorted. So we’ve got three main types of joins nested loop joins, merge joins and hash joins in that order. We’ve also got seek and scan. Scanner is like going through a book from page one onwards. And Seek is like going to an index at the back of the book and going, oh, I need to go to pages 515 and 25.
Now, I should point out, in SQL Server 2017 and onwards there was a new batch mode adaptive join introduced. So you could think of this as a fourth join, but it isn’t really. It converts into a hash join or a nested loops join once the first input has been scanned, when it uses batch mod. So it’s like a question mark mod which then gets converted. But we’ll be looking at the batch mod adaptive join when we have a look at intelligent query processing or IQP. So three types of joins, nested loops, merge joins and hash joins and then scans and seeks. So with that information, that will help us when we have to identify problem areas in execution plans, which is the next video?
7. 59. identify problem areas in execution plans
In this video, we’re going to identify problem areas in execution plans. So we’re going to start off with this. And we saw that there were problem initially if you didn’t have an index. But there’s another problem. Now I’m using select Star. Is that really necessary? Do you really need all of the columns? Can you narrow it down? If so, then you probably have more chance of using indexes. So here we’re using an index scan, but it’d be much better if you could use an index seek. For instance, so if you got a scan when you’re using a wear, then you may require an index. If you haven’t got any indexes, if you’re using a heap, then maybe you need a clustered index. Maybe it’s the problem with the wearer. Could it be Sargable? So SARG, basically it means can you use an index? Like, for instance, let’s have a look at this select star. We have here a modified date. Suppose my where was where the year of modified date. It’s going through a function now equals at 2006.
I can’t with a function more easily use an index in terms of a sequence of having to scan. Now, to be honest, select star will probably be using a clustered index. Canister a clustered index because you’re having to retrieve all of the fields. What if we changed it to select city? Well, you can see that we are still using an index, and in this case, it’s still a clustered index scan. Okay, what happens if we create a non clustered index just using modified date and city? So in other words, it’s exactly the index that we want. So, name of index. So I’ll call it IX address, modified date on sales lt address and in the brackets say modified date and City.
So this gives us exactly the index that would be useful if we were looking at modified date and city. So hopefully it will now do a seek. It will look out those particular roles. Unfortunately, when we have a look at it, no, it is still an index scan because the year function is not. Soggable now what I can do instead is modify this. So it says where modified date between 2006 or 101 and 2000 and 612 31 and all the rest. So getting 1 second before 2007. If I look at that and go to the execution plan, then you can see it is now an index seek using this non clustered index. And the seek, so that it just goes to a particular set of rows is much quicker than a scan, where it’s got to go through the entirety of the input or table.
Now, let’s have another example. Maybe I was looking at the beginning of line one. So let’s have a look and select this where address line one. The first character is equal to eight. So if I did that again, I’ll be looking at a scan. If instead of that I said where address line one like 8% so any number of characters after eight then I can use an index seek now it’s similar with the is null function so I could say is null two words instead of using the function is null one word now, have we got a key lockup? If so, could we use with the index the word include. So here I’ve created an index, but I could have afterwards include and a certain set of columns that I wanted to not be included in the index, but written in a separate part of the key away from the index.
So it means I don’t have to go into the table. It’s much quicker, but it doesn’t slow down the index so much. Other field types too wide so we previously had a look and see that we got an NVAR char 60 so do we really need an Nvarchar 60? Let’s see what the length of address? One address line one from sales Lt dot address is and I’m going to order by this field and you can see we’ve got 39 as the maximum so do we really need 60? Maybe we can narrow it down to 50 if field types are too wide then it will increase the raw size so it’s increasing time to retrieve the data have you got a sort? So let’s say I wanted to order by this modified date well, sorts can be quite expensive do you really need it? If so, do you have an index which is already sorted on this column?
Have you got a store procedure where you’re using parameters? So let’s say this was a store procedure and it had parameters or correct procedure. Name a procedure and it had certain parameters. So year, for instance. So if you got this and its performance is not that good then maybe you can use the words with recompile. So if I did this, then it would force the procedure to look at the parameter and not used a cached version of the execution plan each time it would recompile. Now can do this for individual queries by saying at the end option recompile but you should only do that if you have got such different statistics.
For instance, maybe you have got a million rows where the modified date is 2006 but you’ve only got 100 for 2007 we’ve had a look at loops are you using a hash join? When with some changes, maybe an index you could be using a merge join or nested loop and are you using a cursor? So, in other words, there are certain operations where you go through and take one row at a time and the next and the next. If so, you might want to look at a set based operation instead.
I rarely use cursors. Sometimes when the computer gets overwhelmed, that’s when I use it. But quite often, if I’ve got the choice, I will use use a set based operation instead. So these are some problem areas in execution plans. Select Star having scans when a seek would be better with an index sorts parameters you could use the hint recompile are using hash, joins when you can improve that and are using cursors. So those are some problem areas in execution plans.
Interesting posts
The Growing Demand for IT Certifications in the Fintech Industry
The fintech industry is experiencing an unprecedented boom, driven by the relentless pace of technological innovation and the increasing integration of financial services with digital platforms. As the lines between finance and technology blur, the need for highly skilled professionals who can navigate both worlds is greater than ever. One of the most effective ways… Read More »
CompTIA Security+ vs. CEH: Entry-Level Cybersecurity Certifications Compared
In today’s digital world, cybersecurity is no longer just a technical concern; it’s a critical business priority. With cyber threats evolving rapidly, organizations of all sizes are seeking skilled professionals to protect their digital assets. For those looking to break into the cybersecurity field, earning a certification is a great way to validate your skills… Read More »
The Evolving Role of ITIL: What’s New in ITIL 4 Managing Professional Transition Exam?
If you’ve been in the IT service management (ITSM) world for a while, you’ve probably heard of ITIL – the framework that’s been guiding IT professionals in delivering high-quality services for decades. The Information Technology Infrastructure Library (ITIL) has evolved significantly over the years, and its latest iteration, ITIL 4, marks a substantial shift in… Read More »
SASE and Zero Trust: How New Security Architectures are Shaping Cisco’s CyberOps Certification
As cybersecurity threats become increasingly sophisticated and pervasive, traditional security models are proving inadequate for today’s complex digital environments. To address these challenges, modern security frameworks such as SASE (Secure Access Service Edge) and Zero Trust are revolutionizing how organizations protect their networks and data. Recognizing the shift towards these advanced security architectures, Cisco has… Read More »
CompTIA’s CASP+ (CAS-004) Gets Tougher: What’s New in Advanced Security Practitioner Certification?
The cybersecurity landscape is constantly evolving, and with it, the certifications that validate the expertise of security professionals must adapt to address new challenges and technologies. CompTIA’s CASP+ (CompTIA Advanced Security Practitioner) certification has long been a hallmark of advanced knowledge in cybersecurity, distinguishing those who are capable of designing, implementing, and managing enterprise-level security… Read More »
Azure DevOps Engineer Expert Certification: What’s Changed in the New AZ-400 Exam Blueprint?
The cloud landscape is evolving at a breakneck pace, and with it, the certifications that validate an IT professional’s skills. One such certification is the Microsoft Certified: DevOps Engineer Expert, which is validated through the AZ-400 exam. This exam has undergone significant changes to reflect the latest trends, tools, and methodologies in the DevOps world.… Read More »