DP-203 Data Engineering on Microsoft Azure – Monitor and optimize data storage and data processing Part 6
18. Azure Stream Analytics – The importance of time
So in the previous chapter, we had looked at the metric of the watermark delay. In this chapter, I just want to stress how important time is when it comes to Azure Stream Analytics, because all of the queries that you write in it will be based on time. The entire idea is that you want to get a set of metrics from your streaming data based on a particular time window, especially when you are using a windowing function such as, let's say, the tumbling window.
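Just as a rough illustration of what such a window-based query looks like in the Stream Analytics query language (the input and output names here are hypothetical placeholders, not from our job):

```sql
-- A minimal sketch: count events per 5-minute tumbling window.
-- The input [dbhub] and output [sqloutput] names are hypothetical.
SELECT
    System.Timestamp() AS WindowEnd,
    COUNT(*) AS EventCount
INTO
    [sqloutput]
FROM
    [dbhub]
GROUP BY
    TumblingWindow(minute, 5)
```

Every window boundary here is decided purely by time, which is why the job has to be so careful about what "time" means for each event.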
At that point in time, it is important for the Stream Analytics job to understand the timing of the events that are coming in. And also remember, when it came to Azure Event Hubs, when we were getting data there were additional properties: the event processed UTC time and the event enqueued UTC time. So when it comes to time, there are various factors to consider.
Now, in our particular use case scenario, first of all, the events are being sent by the diagnostic setting that is available on our Azure SQL database. Then each event is sent on to Azure Event Hubs, and then the event is consumed by our Stream Analytics job. So you can see that each event has its own timeline.
First is the application time: when was the event generated? So in our Azure SQL database, on the server, there must have been a time when the event was generated. Even if you have a device, or let's say multiple devices, that are sending events on to Azure Event Hubs, the time at which the event is first created by the device itself is the application time, the time at which the event was generated.
Next, the event flows on to Azure Event Hubs. That is known as the arrival time: when the event actually reaches Azure Event Hubs. And then finally we have the time at which the event was consumed by Azure Stream Analytics. So in your query, you might be basing the time either on the application time or on the time when the event arrived in Azure Event Hubs. Azure Stream Analytics needs to understand the timeline of your events, because it needs to understand the flow of your events.
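In query terms, this choice is made with the TIMESTAMP BY clause. A hedged sketch, assuming a hypothetical payload column:

```sql
-- By default, windowing uses arrival time (EventEnqueuedUtcTime for an
-- Event Hub input). TIMESTAMP BY switches the job to application time.
-- "eventGeneratedAt" is a hypothetical column carried in the event payload.
SELECT
    COUNT(*) AS EventCount
FROM
    [dbhub] TIMESTAMP BY eventGeneratedAt
GROUP BY
    TumblingWindow(minute, 1)
```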
So at different points in time, it will try to calculate: OK, I have received ten events so far; in the next minute, I might receive another ten events. It needs some sort of internal algorithm to understand the flow of events. As I said, everything is based on time, and that's why it creates the watermark: to understand when it last received an event. And as I said, there are different reasons why you would see this watermark delay in place. Remember, in our case, it was because no events were reaching our Stream Analytics job.
Sometimes it can also be down to clock skew. So what does this mean? Let's say you have an application running on a system in your on-premises environment, and it follows a particular date and time. Then in Azure, we have another clock. Now, you can't be sure that the clocks are exactly the same across the board, across all of your systems. There could be a very minor difference, and that's known as clock skew.
So this also needs to be taken into account by Azure Stream Analytics. It's not only about the difference between the application time, the arrival time, et cetera; it also depends on these other factors as well. And next is latency: there is time taken for the event to reach each of its destinations. So how does the Stream Analytics job actually handle these events? And then you could also have late-arriving or early-arriving events. So let's say that in a particular time window, say from 10:00 to 10:10, all of the events with timestamps in that window have reached the Stream Analytics job accordingly.
But let's say that there was one event that was actually generated at, say, 9:57 and it reached late. So all of the other events were processed during this time frame, but this one event, maybe due to network latency, et cetera, reached Azure Stream Analytics at a later point in time, even after the window had already elapsed and been processed. So what does Azure Stream Analytics do about this late-arriving event? Because what if it includes this event in the next time window?
And let's say you're performing aggregates on the metrics that are being collected in that time window; this will not give an accurate measure. And I'm only talking about one event; there could be a flurry of events that were delayed. So again, this is something that you can actually configure in the Azure Stream Analytics job: it depends on how you want to process these late-arriving events. As I said, all of these things are something that the Azure Stream Analytics job needs to consider when it comes to processing your events. I just wanted to shed more light on this, to let you know what goes on in the background when Azure Stream Analytics processes your events.
19. Azure Stream Analytics – More on the time aspect
Now, in this chapter, I just want to go through some other aspects of time for your events in Azure Stream Analytics. So, when Azure Stream Analytics actually receives an event, let's say it has a timestamp of 10:00 and it arrives at 10:00. Here, remember, the watermark gets generated by Azure Stream Analytics to understand the time aspect of the events that are coming into the job. The watermark is the largest event time minus the out-of-order tolerance window size, and you can define this out-of-order tolerance window size in the settings of the Stream Analytics job. If there is no incoming event, then the watermark is the current estimated arrival time minus the late arrival tolerance window.
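To put rough numbers on those two formulas (the values here are purely illustrative):

```
Events flowing in:
  watermark = largest event time - out-of-order tolerance
            = 10:00:30 - 5 seconds = 10:00:25

No incoming events:
  watermark = estimated arrival time - late arrival tolerance
            = 10:01:00 - 15 seconds = 10:00:45
```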
So, if I just quickly go on to the Stream Analytics job: if I scroll down, there is something known as Event ordering. And here you can decide, if events do arrive late, to accept late events with a timestamp in the following range, and I'll explain what this setting actually does. I've attached a screenshot of the Microsoft documentation that explains what happens when you have this setting in place, and also what to do with out-of-order events, events that don't come in perfect timestamp order. When it comes to handling such events, you can either adjust the timestamp of the event or drop the event altogether.
So this is where you can set these additional settings: what to do if events arrive late, and what to do with out-of-order events. You can decide to drop them, or you can decide to adjust the timestamp. Again, this depends on the accuracy that you want when, let's say, you are performing aggregates on the data that is flowing into the Azure Stream Analytics job.
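For reference, the same Event ordering settings can also be set outside the portal, for example on the job resource in an ARM template; the property values below are only illustrative, not our job's actual configuration:

```json
{
  "properties": {
    "eventsOutOfOrderPolicy": "Adjust",
    "eventsOutOfOrderMaxDelayInSeconds": 5,
    "eventsLateArrivalMaxDelayInSeconds": 16
  }
}
```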
Now, if I scroll down, I have a screenshot here from the Microsoft documentation. It lets you know about the different aspects of late-arriving events. So, let's say that you have an event that was generated at 10:00 and it arrived at Azure Stream Analytics at 10:40. And then you have something known as the system timestamp. This is the current system time: when you are running a query at that point in time, what is the timestamp of the underlying system?
So the event time, remember, is different: this is the time that the event was generated. The arrival time is different: this is the time at which the event arrived at Azure Stream Analytics. And the system timestamp is the current timestamp of the system itself. So when you are running the query, it can actually make use of the system timestamp as well. So if you have a late arrival policy of 15 seconds, you can see that the arrival time minus 15 seconds will then become the system timestamp of the event.
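Working through that documentation example with the numbers above (assuming the Adjust policy is in effect):

```
event time (generated)      = 10:00:00
arrival time at the input   = 10:40:00
late arrival tolerance      = 15 seconds

adjusted System.Timestamp   = MAX(event time, arrival time - tolerance)
                            = MAX(10:00:00, 10:39:45)
                            = 10:39:45
```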
So there are different aspects that Azure Stream Analytics can take care of when it comes to the timing of your events. Remember, for events that are flowing in from Azure Event Hubs, we have the EventProcessedUtcTime: this is the date and time that the event was processed by the Stream Analytics job. And then you have the EventEnqueuedUtcTime: this is the date and time that the event was received by Event Hubs. Now, if you want to take the system time, this is how you can take it: System.Timestamp. Remember, this is the current timestamp assigned by the system itself.
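The query being run in the video isn't reproduced on the page, but a sketch along these lines would pull all three timestamps side by side (the input name [dbhub] is a placeholder):

```sql
-- Compare the three timestamps for each event from an Event Hub input.
SELECT
    EventProcessedUtcTime,           -- when Stream Analytics processed the event
    EventEnqueuedUtcTime,            -- when Event Hubs received the event
    System.Timestamp() AS SysTime    -- the timestamp the job assigns to the event
FROM
    [dbhub]
```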
So if I just take this query and go on to my Stream Analytics job, I'll go on to the Query section. Let me just hide this, let me paste the query here, and let me go on to DB Multi Hub to ensure that I have some data in place. So I have some data in place; let me test my query. Here, if I scroll to the right, we have the times. This is the application generate time: the time at which the event was actually sent from the diagnostic setting of the database on to Azure Event Hubs. Then I have the event time: this is the time the event was actually processed by the Azure Stream Analytics job.
So, going down: this is the event time at 5:47, and the system time right now is also 5:47; 5:39 was when the event was generated by the diagnostic setting of the Azure SQL Database. So you can see there are various ways that you can actually work with time when it comes to the events that are flowing into your Azure Stream Analytics job.