DP-300 Microsoft Azure Database – Configure Azure SQL Database resource for scale and performance Part 4

  • By
  • July 7, 2023
0 Comment

8. 72. evaluate the use of compression for tables and indexes

In this video, we’re going to have a look at compressing, data tables and indexes. Now, first of all, why should you compress? Well, you get reduced to space, so that’s useful for data which is infrequently used. The downside is, however, it takes extra time and compute power both to compress and retrieve the data. Now, there are three main types of components compression, one of which is of course, none, no compression whatsoever.

The second way is raw compression, so individual rows. So for instance, suppose we had a char 60 as opposed to a varchar. So a Char 60 contains 60 spaces of data, 60 characters. So char and enchar will be compressed. You can get savings of around up to 50% in English, German, Hindi, Turkish, around up to 40% in Vietnamese, but only about 15% in Japanese.

So it really does depend on what your language is. Vaja and Nvaja types were they already have a compression technique, so they wouldn’t actually benefit for compression. Now, for numbers like INTs, everything apart from tiny int storage can be reduced maybe down to one byte.

Tiny int itself already takes one bite, so that can’t actually be reduced. Certain types of dates, like date time, date time two and daytime offset will be compressed. Others won’t actually benefit from any compression. So that’s raw compression. It looks at what it can compress for each of the individual items, but it’s highly dependent on the field types.

So varchars, Nvalchars and tiny INTs won’t be compressed at all. And only date time, date time to and date time offset will be compressed in the date types. So you’ll have most of the savings in char and Nchar. Now, you can also have page compression.

Now, this consists of three operations. First of all, raw compression, which we’ve already talked about, and then prefix compression. So I want to show you what that is. Suppose we have got this table, so you can imagine there is a fair bit which can be compressed. So we start off with garden equipment or house equipment with all of these. So what I can do is put garden equipment up to the top and house equipment up here as well, so I can say prefix one and prefix two. And I’ll also include the space at the end as well.

So in this bit, what I can do is say garden equipment space change to prefix one and house equipment space change to prefix two. Now, of course, it won’t actually be called prefix one, prefix two, it won’t take that many bites. So you can see hopefully how much more compact it can be. So all of this is put into the page header into an area called the compression information structure. It’s actually just after the page header, so references are made to the value in the prefix. And there is one more additional item which I didn’t put in how many characters there will be. So let’s say there’s 17 characters there. So this will probably say, give me the first 17 characters of this.

So that’s useful if we have garden things. So if that’s the case, we can’t say garden equipment, but we can say give me the prefix one, but just the first seven characters. So that’s prefix compression. Dictionary compression is done after prefix compression. So prefix compression is done by looking at one particular column. Dictionary compression is looked after all the columns. So if the values after prefix compression in any column are the same, then this can be optimized. So let’s say we had another column and for some reason we had prefix one four. Well, if it made any sense, then that could be put off into the compression information structure.

And there’s my prefix three. And so this gets replaced with prefix three. So you can see the idea it’s looking for things that it can move up into the compression information structure. So we got row compression first. Again, that’s dependent on the file types or column types. Then we have prefix compression, which is about the same column, and then dictionary compression which is about all the columns. Now, there’s probably one more bit that you need to know, page compression. What is pages? Well, a page is a set of rows and it’s up to 8192 characters. That is called a page. So pages are uncompressed at first. Page compression is only used when additional rows can be fitted on a full page. So let’s say that we go back to the beginning and let’s say that this is sufficient for an entire page. And I want to put an extra raw just in here. Well, it won’t fit.

So the question now is, well, how can we make it fit? Well, we can use all of these compression techniques. Now, I should point out, compression doesn’t affect backup and restore. So this is available in Azure SQL Database, azure SQL managed instance and on some SQL Server on virtual machines. It’s available from SQL Server 2016 service Pack one in all editions. Before that it was only available in the very expensive Enterprise edition, the top of the line. So what can you compress? You can compress tables. So tables either stored with a clustered index or without.

So a table without a clustered index is called a heap. Now, you can’t use data compression with tables which have sparse columns. So if you set up a particular column to be called sparse, meaning there’s not much data in there and you want it to have a certain amount of space reduction, then you can’t use data compression on those tables. If you want to change the compression option, say from raw to page or raw to off in a clustered index, then you would need to drop the clustered index and rebuild the table. You can also compress indexes. So you can compressed in non clustered index. By default, they are not compressed. You can also compress a complete indexed view.

Now you can’t compress system tables. Now, in the previous video we were talking about partitions. If you have got a table with lots of different partitions, then different partitions can be compressed using different settings. So let’s see how you can enable compression. Well, you can go to the table or anything else and go to Storage Manage Compression. So you can see that option with indexes storage manage Compression. So if you do that, then you’ll be given this screen here.

Welcome to the data compression wizard. And what sort of compression do you want? Non raw or page. So you can see how many rows there would be. And if you had lots of different partitions, then they would be shown that you can also use the same compression type for all partitions. Now, if this wasn’t an Azure SQL Database, you could also calculate the amount of current space in the compressed space. However, in Azure SQL Database that functionality doesn’t work.

So click next and you can either run immediately or you can create a script. So what I’m going to do is create a script to a new query window. And here you can see the SQL code. So it is alter table, name of table, reboad partition equals so it could be one, for instance two or all. And then with data compression equals page raw or none in brackets. And it’s a similar thing for an index. So reboad index name of index on name of table. So let’s say we got this index on a particular table and then reboot partition equals all with data compression equals page, raw or none. Now what I’m going to do now is connect to my on premises SQL Server.

So this could be like for instance a virtual machine. And you will see if I go down to a table, if I go into storage and manage compression. And you can see we can create partitions there. As I said, partitions aren’t available part from the primary partition in Azure SQL Database and say I want this to go into a row or page. You can see that we do actually have a calculation of current space and the requested compressed space as well. Now we can also estimate it in TSQL, but again, we can’t do this energsql database. So if I create a new query, I can say exec SP estimate data compression savings.

So then I would need to put in the schema name or in this case DBR the document or object name. So in this case TBL employee and other things. If I wanted to, if it was an index and I’ll put the index number, I could put the partition number and I can say what type of compression I am looking at. And you can see at the moment we go from 88, but as I say, that doesn’t work in Azure SQL Database. So I’ll just close that. Now disconnect. Now, this sort of table, which is called a raw store table, so it stores it in pages with lots of different rows, isn’t the only kind of object.

There’s also a column store table as well. So unlike the row store which stores all of these rows, and there’s one page and then stores another set of rows as another page and so on, then the column store concentrates on individual columns and then mashes them together whenever needed. It was initially used for data warehouses, but then expanded in a column store table or any other objects. Columns are always compressed. So this is column store compression. Now, indexes work best when you scan large amounts of data like fact tables in data warehouses, column store indexes, and they’re generally clustered. Non clustered column store indexes are only used when you have a data type not supported by a clustered index like XML, Text and Image.

So the major thing about column store objects is they’re already compressed. Now they can be compressed further from the column store compression by using column store archival compression. So what that does, it saves even more space. We don’t need to go into the ins and outs of how this works. But the major thing you need to know about column store archival compression is it’s best to be used when the data is not often read, but you need the data to be retained for regulatory or business reasons. Yes, it does save space, yes, it does save input output.

However, there is a very high compute cost in uncompressing it, which far outweighs any saving that you might have for the I or the input output. So don’t use it unless it’s for data which is not often read and you want to save the space. The data needs to be retained for regulatory or business reasons. And of course, on the web, saving space means saving money. So for column store tables, they are already compressed using column store compression. We can compress it further using column store archival compression, but for raw store tables. So the standard tables then you can choose from Numb, you can choose from Raw and you can choose from Page, which has three elements the raw compression, the page compression and then dictionary compression which looks at all of the columns on a particular page.

Comments
* The most recent comment are at the top

Interesting posts

The Growing Demand for IT Certifications in the Fintech Industry

The fintech industry is experiencing an unprecedented boom, driven by the relentless pace of technological innovation and the increasing integration of financial services with digital platforms. As the lines between finance and technology blur, the need for highly skilled professionals who can navigate both worlds is greater than ever. One of the most effective ways… Read More »

CompTIA Security+ vs. CEH: Entry-Level Cybersecurity Certifications Compared

In today’s digital world, cybersecurity is no longer just a technical concern; it’s a critical business priority. With cyber threats evolving rapidly, organizations of all sizes are seeking skilled professionals to protect their digital assets. For those looking to break into the cybersecurity field, earning a certification is a great way to validate your skills… Read More »

The Evolving Role of ITIL: What’s New in ITIL 4 Managing Professional Transition Exam?

If you’ve been in the IT service management (ITSM) world for a while, you’ve probably heard of ITIL – the framework that’s been guiding IT professionals in delivering high-quality services for decades. The Information Technology Infrastructure Library (ITIL) has evolved significantly over the years, and its latest iteration, ITIL 4, marks a substantial shift in… Read More »

SASE and Zero Trust: How New Security Architectures are Shaping Cisco’s CyberOps Certification

As cybersecurity threats become increasingly sophisticated and pervasive, traditional security models are proving inadequate for today’s complex digital environments. To address these challenges, modern security frameworks such as SASE (Secure Access Service Edge) and Zero Trust are revolutionizing how organizations protect their networks and data. Recognizing the shift towards these advanced security architectures, Cisco has… Read More »

CompTIA’s CASP+ (CAS-004) Gets Tougher: What’s New in Advanced Security Practitioner Certification?

The cybersecurity landscape is constantly evolving, and with it, the certifications that validate the expertise of security professionals must adapt to address new challenges and technologies. CompTIA’s CASP+ (CompTIA Advanced Security Practitioner) certification has long been a hallmark of advanced knowledge in cybersecurity, distinguishing those who are capable of designing, implementing, and managing enterprise-level security… Read More »

Azure DevOps Engineer Expert Certification: What’s Changed in the New AZ-400 Exam Blueprint?

The cloud landscape is evolving at a breakneck pace, and with it, the certifications that validate an IT professional’s skills. One such certification is the Microsoft Certified: DevOps Engineer Expert, which is validated through the AZ-400 exam. This exam has undergone significant changes to reflect the latest trends, tools, and methodologies in the DevOps world.… Read More »

img