DP-203 Data Engineering on Microsoft Azure – Monitor and optimize data storage and data processing Part 3
7. Azure Synapse – Workload Management
Now in this chapter, I just want to go through workload management, which is available as part of your dedicated SQL pool. This is something that we had actually implemented earlier on. When we were looking at loading data into a SQL pool, we had created a dedicated user known as user_load, and we had created some workload group settings to ensure that this user could effectively load the data into a table in our dedicated SQL pool. Now, the main reason for having workload management in the dedicated SQL pool is that you can have different types of workloads running on the SQL pool. You might have one set of users that is loading data into the SQL pool.
You might have another set of users that are performing analysis on the data. Most of the time you need to manage these workloads so that resources are allocated accordingly. You want to ensure that the users loading data into the SQL pool have enough resources at hand. At the same time, you want to ensure that the processes run by these users do not affect the processes of the users who are performing analysis on the data.
So for that, you can implement workload management. And this is something we had seen earlier on. The first thing we had done was to create something known as a workload group; this forms the basis of workload management. Here you define the resource boundaries for the workload group.
After you define the workload group, you then define the workload classifier, which ensures that the right users are mapped onto the workload group. Now, all of this was actually done via T-SQL statements, but the same functionality is also available from the Azure portal.
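As a refresher, here is a minimal T-SQL sketch of the sort of statements we ran earlier. The group name, classifier name and resource percentages below are illustrative values rather than the exact ones we used; user_load is the user we created for loading data.

```sql
-- Reserve a slice of the pool's resources for data-loading requests.
-- The percentages are example values, not recommendations.
CREATE WORKLOAD GROUP wgDataLoads
WITH (
    MIN_PERCENTAGE_RESOURCE            = 25, -- guaranteed share for this group
    CAP_PERCENTAGE_RESOURCE            = 50, -- hard ceiling for this group
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5   -- resources granted per request
);

-- Route requests made by user_load into the group above.
CREATE WORKLOAD CLASSIFIER wcDataLoads
WITH (
    WORKLOAD_GROUP = 'wgDataLoads',
    MEMBERNAME     = 'user_load',
    IMPORTANCE     = HIGH
);
```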
So if I go onto Azure and onto my Synapse workspace, I can't see anything here when it comes to the workload group itself; I actually have to go onto the dedicated SQL pool. When you create a SQL pool, there will be a new resource at hand, so here I can see my new pool and I'll click on it. Here we have something known as Workload management. If I hide this pane, I can see my data warehousing units, and here you can see the workload group that we had created earlier on. You can also create a new workload group here. There are different predefined workload groups available, depending upon the type of workload usage.
You can also create a custom workload group. Here in the context menu, if I go onto Settings, you can define the importance of the requests that are made from this workload group. You can also define the query execution timeout and the maximum resource percentage per request. So depending upon the type of users and the requests that are being made onto the dedicated SQL pool, you can define the workload group accordingly.
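If you prefer to check these settings with T-SQL instead of the portal, the dedicated SQL pool exposes catalog views for workload management; a quick sketch, run against the dedicated SQL pool itself:

```sql
-- Inspect the configured workload groups, including importance,
-- query execution timeout and per-request resource grants.
SELECT * FROM sys.workload_management_workload_groups;

-- Inspect the classifiers that map users onto those groups.
SELECT * FROM sys.workload_management_workload_classifiers;
```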
8. Azure Synapse – Restore points
Now, in this chapter, I just want to give a quick note when it comes to restore points for your Azure dedicated SQL pool. Regular backups are taken for your dedicated SQL pool; these are snapshots of the data warehouse that are taken throughout the day. These restore points are then available for a duration of seven days.
You can restore your data warehouse in the primary region from any one of the snapshots that have been taken in the past seven days. You can also create your own user-defined restore points as well. So here, if I go onto my dedicated SQL pool, you can see you have a Restore option in place. If I hit Restore, I can look at the available automatic restore points. As I said, the backups are taken by the service itself at different points in time.
So here I can look at all of my previous days and choose a time for my restore point. If I feel something is wrong with the data in my dedicated SQL pool, I can use these restore points to restore my pool to a previous point in time. Now, you can also create your own user-defined restore points; here you can create a new restore point. So let's say that you are going to be making a big change to the data in your dedicated SQL pool.
You can first create a new restore point; it's like taking a backup of your entire data warehouse. Then, once the restore point is in place, you can perform the required operations on your data warehouse. And if anything goes wrong, you can recover back onto that restore point. So in this chapter, I just wanted to give a quick note on the restore points that are available as part of your dedicated SQL pool.
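If you want to script this rather than click through the portal, user-defined restore points can also be created with Azure PowerShell. A minimal sketch, assuming the Az.Synapse module is installed and you are signed in; the resource group, workspace and pool names are placeholders:

```powershell
# Create a user-defined restore point before making a big change.
# All names below are illustrative placeholders.
New-AzSynapseSqlPoolRestorePoint `
    -ResourceGroupName "my-rg" `
    -WorkspaceName "my-synapse-workspace" `
    -Name "newpool" `
    -RestorePointLabel "before-big-data-load"

# List the restore points currently available for the pool.
Get-AzSynapseSqlPoolRestorePoint `
    -ResourceGroupName "my-rg" `
    -WorkspaceName "my-synapse-workspace" `
    -Name "newpool"
```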
9. Lab – Azure Data Factory – Monitoring
Hi and welcome back. Now in this chapter, I just want to go through the monitoring aspect which is available for your Azure Data Factory pipelines. If you go onto the pipeline runs, you can see, for any pipeline, which activities passed and, if the pipeline failed, which activity failed. For each activity, you can look at the details. For an activity based on a mapping data flow, you can see all of the diagnostic information for each of the steps, such as the amount of time it took to start up the cluster. Going back onto the activity runs, you can also look at the details of a copy-based activity.
Here you can see the number of bytes that were read from the source and the number of bytes that were written onto the destination. You can see the throughput, and you can see other information as well. Now, when it comes to the pipeline runs and their metrics, this information is only retained for a period of 45 days; you can only see the pipeline runs from the past 45 days. If you want to persist all of this data, then you need to create something known as a Log Analytics workspace. A Log Analytics workspace is a central place where you can store logs from various Azure resources.
And you can do the same thing when it comes to Azure Data Factory, so I'll quickly show you how you can do this. Let me quickly go onto our Azure Data Factory resource; our factory resource is in the North Europe location. I'll open up All resources in a new tab and let's create a Log Analytics workspace. Here I'll hit Create and search for Log Analytics. On the Log Analytics workspace option, I'll hide this and hit Create. I'll choose my resource group, choose my location as North Europe, and give a name for the workspace. I'll go onto Next; it's a pay-as-you-go pricing model. I'll go onto Next again, then onto Review + Create, and hit Create. It will just take a minute or two for the Log Analytics workspace to be in place.
Once we have the workspace in place, I'll go onto the resource and leave everything as it is. Now I'll go onto the Azure Data Factory resource in Azure, and here we need to go onto Diagnostic settings. This diagnostic settings option is available for a lot of Azure resources, and here you can send information such as your activity runs, your pipeline runs, your metrics, your trigger runs, et cetera, onto that Log Analytics workspace. You can then retain the data in the Log Analytics workspace for an extended duration of time. So I'll add a diagnostic setting, and here I can choose my activity runs, my pipeline runs and my trigger runs and send them onto a Log Analytics workspace.
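As a side note, the same diagnostic setting can also be scripted. A rough sketch using the older Set-AzDiagnosticSetting cmdlet from the Az.Monitor module (newer module versions replace it with New-AzDiagnosticSetting); both resource IDs below are placeholders:

```powershell
# Send ADF activity, pipeline and trigger run logs to a Log Analytics workspace.
# Both resource IDs are illustrative placeholders.
$factoryId   = "/subscriptions/<sub-id>/resourceGroups/my-rg/providers/Microsoft.DataFactory/factories/my-factory"
$workspaceId = "/subscriptions/<sub-id>/resourceGroups/my-rg/providers/Microsoft.OperationalInsights/workspaces/factory-workspace"

Set-AzDiagnosticSetting -Name "adf-to-law" `
    -ResourceId $factoryId `
    -WorkspaceId $workspaceId `
    -Enabled $true `
    -Category ActivityRuns, PipelineRuns, TriggerRuns
```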
Back in the portal, I'm choosing my factory workspace as the destination; you can also send this onto other destinations. I'll just give a name for the setting and click Save. Now it might take around half an hour for the log information to start showing up in our Log Analytics workspace, so let's come back after some time. After waiting for around half an hour, if I now go onto the Logs section in the Log Analytics workspace, let me hide and close these panes. And here, if we expand this, we can now see tables which map onto our activity runs and our pipeline runs. Here we can see all of the data, and in the activity run table you can also add some clauses, as in the sketch below.
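For example, a minimal query over the activity run table. I'm assuming here that the diagnostic setting was created in resource-specific mode, where the data lands in tables named ADFActivityRun and ADFPipelineRun; in the older Azure diagnostics mode it lands in the shared AzureDiagnostics table instead, and the column names would differ.

```kusto
// Show the most recent activity runs with a few useful columns.
ADFActivityRun
| project TimeGenerated, PipelineName, ActivityName, ActivityType, Status
| order by TimeGenerated desc
| take 50
```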
Let me close this and run the query as is. And here you can see all of the data. For each row you can see different information about the activity run, like the activity name and the activity type. You can also write queries in the Kusto Query Language (KQL), which is the query language understood by the Log Analytics workspace. So, for example, in the pipeline run table you can look at those operations which contain the keyword "Failed".
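A sketch of such a query, again assuming the resource-specific ADFPipelineRun table; the columns I project here are assumptions, so adjust them to whatever your workspace actually exposes.

```kusto
// Find pipeline runs whose status contains the keyword "Failed".
ADFPipelineRun
| where Status contains "Failed"
| project TimeGenerated, PipelineName, RunId, Status
| order by TimeGenerated desc
```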
So if you are trying to find the operations that have actually failed, you can run a query like this. At the moment there are no such operations, but you can filter using the many other operators that KQL provides. So in this chapter, I just wanted to show you the feature wherein you can persist the logs of Azure Data Factory onto a Log Analytics workspace, which is in turn part of the Azure monitoring solution.