DP-203 Data Engineering on Microsoft Azure – Monitor and optimize data storage and data processing Part 4
10. Azure Data Factory – Monitoring – Alerts and Metrics
Now, in this chapter, I just want to go through another aspect that is available in the monitoring section of Azure Data Factory, and that's alerts and metrics. First of all, if you go on to the dashboards here, you can see a representation of the number of pipeline runs that succeeded and those that failed. If you scroll down, you can see the number of activity runs that succeeded and those that failed, and the same for any trigger runs.
Next, go on to Alerts and metrics. If you click on the metrics option, you will be redirected to the Azure Portal. Here, in the Metrics section, you choose the scope: select your resource group, choose your Data Factory resource, and hit on Apply. If I just hide this for the moment, you can see a lot of metrics in place. So, for example, if you want to look at the failed pipeline runs metric, you can actually see that metric here.
And then you could also pin this onto your dashboard. So if I go on to Dashboard, this is the default dashboard that you have. If you want to have another tile over here that shows you the metrics for your pipelines, you can pin those metrics onto your dashboard.
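If you want to pull the same numbers without going through the portal, the following is a minimal sketch using the azure-mgmt-monitor Python SDK. The subscription, resource group, and factory names are placeholders rather than values from the course; PipelineFailedRuns is the platform metric behind the failed pipeline runs chart, and ActivityFailedRuns and TriggerFailedRuns can be queried the same way.

```python
# A minimal sketch (not the course's exact setup): pulling Data Factory
# metrics with the azure-mgmt-monitor package. The names below are placeholders.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

subscription_id = "<subscription-id>"     # placeholder
resource_group = "data-rg"                # hypothetical resource group name
factory_name = "my-data-factory"          # hypothetical Data Factory name

# The metric scope is the full ARM resource ID of the Data Factory.
factory_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    f"/providers/Microsoft.DataFactory/factories/{factory_name}"
)

client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

# Query the failed pipeline runs metric for the last hour at a 5-minute grain.
response = client.metrics.list(
    factory_id,
    timespan=f"{start.isoformat()}/{end.isoformat()}",
    interval="PT5M",
    metricnames="PipelineFailedRuns",
    aggregation="Total",
)

for metric in response.value:
    for series in metric.timeseries:
        for point in series.data:
            print(point.time_stamp, point.total)
```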
Apart from that, you can also create alerts. So if I click on New alert rule here, you can choose the severity for this particular alert and you can add the alert criteria. There are many criteria in place, and these are based on the metrics. So let's say you want to look at the failed activity runs metric; hit on Continue, and here you can select the dimension values. For example, you can target a particular activity type, such as a data flow or a copy activity, or simply select all activity types, all pipelines, and all failure types.
So I'm selecting everything. And here you can say that if the number of failed activity runs is greater than two over, let's say, the last five minutes, evaluated at a frequency of every 1 minute, you can add this as the criteria. Then you can configure a notification. For this you create something known as an action group, and in the action group you add a notification, for example an email notification so that, let's say, an administrator is actually notified. You add the notification, add the action group, and then create the alert rule. So in addition to looking at the metrics for Azure Data Factory, you can also create alert rules.
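For completeness, here is a hedged sketch of setting up the same kind of alert programmatically with the azure-mgmt-monitor SDK: an action group with an email receiver, plus a metric alert that fires when ActivityFailedRuns exceeds two over a five-minute window, evaluated every minute. All resource names and the email address are placeholders, not values from the course.

```python
# A minimal sketch, under assumed placeholder names, of an action group and a
# metric alert on the Data Factory's ActivityFailedRuns metric.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import (
    ActionGroupResource,
    EmailReceiver,
    MetricAlertAction,
    MetricAlertResource,
    MetricAlertSingleResourceMultipleMetricCriteria,
    MetricCriteria,
)

subscription_id = "<subscription-id>"     # placeholder
resource_group = "data-rg"                # hypothetical resource group name
factory_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    "/providers/Microsoft.DataFactory/factories/my-data-factory"  # hypothetical
)

client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

# 1. Action group that emails an administrator (hypothetical address).
action_group = client.action_groups.create_or_update(
    resource_group,
    "adf-admins",
    ActionGroupResource(
        location="Global",
        group_short_name="adfadmins",
        enabled=True,
        email_receivers=[
            EmailReceiver(name="admin", email_address="admin@example.com")
        ],
    ),
)

# 2. Metric alert: fire when failed activity runs exceed 2 in a 5-minute
#    window, evaluated every minute.
client.metric_alerts.create_or_update(
    resource_group,
    "failed-activity-runs-alert",
    MetricAlertResource(
        location="global",
        description="Too many failed activity runs",
        severity=2,
        enabled=True,
        scopes=[factory_id],
        evaluation_frequency=timedelta(minutes=1),
        window_size=timedelta(minutes=5),
        criteria=MetricAlertSingleResourceMultipleMetricCriteria(
            all_of=[
                MetricCriteria(
                    name="FailedActivityRuns",
                    metric_name="ActivityFailedRuns",
                    operator="GreaterThan",
                    threshold=2,
                    time_aggregation="Total",
                )
            ]
        ),
        actions=[MetricAlertAction(action_group_id=action_group.id)],
    ),
)
```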
11. Lab – Azure Data Factory – Annotations
In this chapter, I briefly want to go through the annotations that are available for your pipelines. If you want to add any sort of metadata to your pipeline to indicate its purpose, you can go ahead and add an annotation. So, for example, I have a pipeline over here called databricks. If I go on to Properties here, you can see something known as Annotations. So let me create a new annotation and say that this pipeline is meant for loading data. Let me then hit on Publish over here.
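If you prefer to set the same annotation outside the authoring UI, here is a minimal sketch using the azure-mgmt-datafactory Python SDK, assuming placeholder subscription, resource group, and factory names. Annotations are just a list of free-form strings on the pipeline resource, which is exactly what the Properties pane edits.

```python
# A minimal sketch, with placeholder names, of adding the same annotation
# through the azure-mgmt-datafactory package instead of the authoring UI.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"     # placeholder
resource_group = "data-rg"                # hypothetical resource group name
factory_name = "my-data-factory"          # hypothetical Data Factory name

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Fetch the existing pipeline definition, append the annotation, push it back.
pipeline = client.pipelines.get(resource_group, factory_name, "databricks")
pipeline.annotations = (pipeline.annotations or []) + ["Loading data"]
client.pipelines.create_or_update(
    resource_group, factory_name, "databricks", pipeline
)
```

Keep in mind that this writes straight to the live factory; if your factory is Git-configured, you would normally make the change in the collaboration branch and publish, just as in the UI.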
Now let me trigger this pipeline, wait until it is complete, and then go on to the Monitor section. After waiting for a couple of minutes, in the Monitor section I can see two invocations of the databricks pipeline: one that I invoked earlier on, and this most recent invocation. First of all, let me filter on the pipelines, so I'll just filter on the databricks pipeline. And here, if I scroll to the right, you can see an extra column of annotations. Notice that the annotation has been added only for the new run.
So I added this kind of label, or annotation, to my pipeline. Once you start running the pipeline, the annotation appears as an additional column in the pipeline runs. If you then want to filter based on the annotation, say you only want the pipeline runs that are meant for load processing, you can filter on just those runs. So it's like adding an extra label to your pipeline so that you can filter its runs. Please note that annotations only become applicable for runs that start after you've applied them to the pipeline itself.
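The annotation filter shown above is a feature of the Monitor UI; programmatically, you can query pipeline runs with the azure-mgmt-datafactory SDK, for example filtering on the pipeline name as in the sketch below. The resource names are placeholders, and this is a minimal illustration rather than a full equivalent of the portal's annotation filter.

```python
# A minimal sketch, with placeholder names, of querying pipeline runs through
# the azure-mgmt-datafactory package, filtered on the pipeline name.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters, RunQueryFilter

subscription_id = "<subscription-id>"     # placeholder
resource_group = "data-rg"                # hypothetical resource group name
factory_name = "my-data-factory"          # hypothetical Data Factory name

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

now = datetime.now(timezone.utc)
runs = client.pipeline_runs.query_by_factory(
    resource_group,
    factory_name,
    RunFilterParameters(
        last_updated_after=now - timedelta(days=1),
        last_updated_before=now,
        filters=[
            RunQueryFilter(
                operand="PipelineName",
                operator="Equals",
                values=["databricks"],
            )
        ],
    ),
)

for run in runs.value:
    print(run.run_id, run.pipeline_name, run.status)
```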
12. Azure Data Factory – Integration Runtime – Note
Now, in this chapter, I just want to go through a couple of notes when it comes to the integration runtimes that are available in Azure Data Factory. We have looked at the default Azure integration runtime; that is the underlying compute infrastructure used for running your pipelines in Azure Data Factory. We had also seen how we could host our own self-hosted integration runtime. So if we have our own virtual machine and we need to, let's say, copy data from that VM onto a service in Azure, we can make use of the self-hosted integration runtime. Now, let me go on to the Manage section here.
So if I go on to my integration runtimes, let me just hide this. Currently I can see that the status of my self-hosted integration runtime is unavailable, and that's because I have stopped that virtual machine; I had only created it for that particular lab. If you don't need the self-hosted integration runtime anymore, you can go ahead and hit the Delete option here. Now, if you try to delete this self-hosted integration runtime, you'll first see in the Related section that it is related to a linked service; I can see I have the linked service called Demo VM service. And in order to delete that linked service, you would probably also need to delete the dataset that uses it. So you need to go through a set of steps, dataset first, then the linked service, then the integration runtime, to ensure that you can delete that integration runtime, as sketched below.
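Since that clean-up order matters, here is a minimal sketch of the same sequence using the azure-mgmt-datafactory Python SDK; the subscription, resource group, factory, dataset, linked service, and runtime names are all placeholders for whatever you used in the lab.

```python
# A minimal sketch, with placeholder names, of deleting from the bottom of the
# dependency chain upwards: dataset, then linked service, then the runtime.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<subscription-id>"     # placeholder
resource_group = "data-rg"                # hypothetical resource group name
factory_name = "my-data-factory"          # hypothetical Data Factory name

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

client.datasets.delete(resource_group, factory_name, "DemoVMDataset")          # hypothetical dataset
client.linked_services.delete(resource_group, factory_name, "DemoVMService")   # hypothetical linked service
client.integration_runtimes.delete(resource_group, factory_name, "SelfHostedIR")  # hypothetical IR
```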
But the important points I want to bring across when it comes to the integration runtimes are these. By default we have the Azure integration runtime, and it is given the name AutoResolveIntegrationRuntime. We can also create our own integration runtimes. If I hit on New: if you have any SQL Server Integration Services (SSIS) packages and you want to make use of those packages in Azure Data Factory, then you can make use of the Azure-SSIS integration runtime that is available here. SSIS packages are ETL packages; they are used for extracting data, transforming it, and then loading it into a destination data store. So if you want to reuse those existing packages in Azure Data Factory, you can make use of the Azure-SSIS integration runtime. Apart from that, you might also want to create a new Azure integration runtime, so let me hit on Continue here.
So we've seen the self-hosted option, and we also have Azure. Now, if I click on Azure and hit on Continue, I can create a new integration runtime here. But what would be one of the primary benefits of creating a new Azure integration runtime? One reason could be the region. By default, the region is set to Auto Resolve. What does this mean? Let's say we are copying data from an Azure storage account that is located, in our case, in the North Europe location, and we are copying it onto a dedicated SQL pool in the same location. Azure Data Factory then understands that our data resides in the North Europe location.
So it will create the underlying compute infrastructure, which is determined by the Azure integration runtime, in the same location. This matters because there is separate pricing for data transfer: if you transfer data from a service in one region to a different region, say the storage account is in the North Europe location and your dedicated SQL pool is in the West Europe location, you actually pay a price for the data transfer from one region to the other, whereas data transfer within the same region is free.
So when it comes to the underlying compute infrastructure, the Auto Resolve setting also ensures that the data transfer cost is kept to a minimum. There is another reason why you might want to change the region here. Instead of Auto Resolve, let's say I set the location to North Europe and then hit on Create. Some companies have the restriction that data should never be transferred to a different location, which could be from a security perspective.
Now the integration runtime will always be created in the North Europe location. The assumption here is that the company has all of its resources in the North Europe location and wants to ensure that the Azure Data Factory integration runtime is also located there, so that nothing leaves that particular region. That could be another use case for creating an additional integration runtime. So in this chapter, I just wanted to give some other important points when it comes to the integration runtime.
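As a rough illustration of that last point, the sketch below creates an Azure integration runtime pinned to North Europe with the azure-mgmt-datafactory Python SDK instead of the UI; the subscription, resource group, factory, and runtime names are placeholders.

```python
# A minimal sketch, with placeholder names, of creating an Azure integration
# runtime pinned to the North Europe region.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeComputeProperties,
    IntegrationRuntimeResource,
    ManagedIntegrationRuntime,
)

subscription_id = "<subscription-id>"     # placeholder
resource_group = "data-rg"                # hypothetical resource group name
factory_name = "my-data-factory"          # hypothetical Data Factory name

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# An Azure integration runtime is a managed runtime; setting the compute
# location keeps the data movement compute in that region instead of
# letting it auto-resolve.
ir = IntegrationRuntimeResource(
    properties=ManagedIntegrationRuntime(
        description="Azure IR pinned to North Europe",
        compute_properties=IntegrationRuntimeComputeProperties(
            location="North Europe"
        ),
    )
)

client.integration_runtimes.create_or_update(
    resource_group, factory_name, "NorthEuropeIR", ir  # hypothetical IR name
)
```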