DP-203 Data Engineering on Microsoft Azure – Monitor and optimize data storage and data processing Part 7

20. Azure Event Hubs and Stream Analytics – Partitions

Now in this chapter I want to talk about partitions. When it comes to Azure Event Hubs, you can have partitions in place, and your Azure Stream Analytics job can then pick up data from these various partitions. So in Azure Event Hubs, you can define the number of partitions for your event hub. This helps you get better throughput by ingesting more data at a time.

So if you have a lot of events coming into Azure Event Hubs, they can be streamed into multiple partitions at the same time. You could also have multiple consumers, each targeting a partition and processing the data. In Azure Event Hubs, it's not required to process the events one by one in any strict order; here you are trying to process the events based on a particular time interval. So you can have multiple consumers that process the events in these different partitions. And as I said, you specify the number of partitions when you create the event hub.

You can't change the number of partitions on an existing Azure event hub. So if I go on to an existing hub that I have, let's say DB Hub, and go on to the Overview, I can see the partition count is one, but there is no way of changing it. It's only when you create a new event hub that you can set the number of partitions.

Now, when you are sending data onto Azure Event Hubs, you can also use one of the attributes in your data to serve as something known as the partition key, so that Azure Event Hubs can use this attribute to distribute the events across the different partitions. Now, when it comes to our example, we have been streaming the diagnostic data from our Azure SQL database.

There we don't have any sort of control over how the events are sent onto Azure Event Hubs; we cannot mark anything as the partition key. But if you have devices, or your own applications that are sending data onto Azure Event Hubs, then you have more say in the data that is being sent, and you can decide what the partition key should be. So when you are creating the event, you can specify which attribute should be the partition key.

So let's say you have devices that are sending data onto Azure Event Hubs. You can have the user identity, or let's say the device identity attribute, serve as the partition key. As I said, these events can then be spread across multiple partitions, and Azure Stream Analytics can then get the data from multiple partitions, work on it in parallel, and send it across multiple partitions onto your output. So if your output also supports taking data in multiple partitions, this can be done by the Stream Analytics job as well.
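To make this concrete, here is a minimal sketch of a Stream Analytics query that works partition by partition. The input alias [device-hub], the output alias [device-counts], and the DeviceId column are assumptions for illustration; PartitionId and EventEnqueuedUtcTime are built-in fields that Stream Analytics exposes for Event Hub inputs.

    -- Aggregate device events per partition over a one-minute window
    SELECT
        DeviceId,
        COUNT(*) AS EventCount
    INTO
        [device-counts]
    FROM
        [device-hub] TIMESTAMP BY EventEnqueuedUtcTime
    PARTITION BY PartitionId
    GROUP BY
        DeviceId,
        PartitionId,
        TumblingWindow(minute, 1)

Because each partition is aggregated independently, the job can run one instance of this query per partition in parallel.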

So here you can see there is a lot of efficiency, because you can have your processing done in parallel. And if, for example, your application is specifying the partition key, then when you define the input details of your Stream Analytics job, you can specify that partition key there as well. Now, in our case, when we defined DB Hub, we did not specify any sort of partition key, because, as I said, all of the data is being sent by the Azure SQL database diagnostic setting, and there we don't have control over what the partition key is. Now I'll show you an example that I've implemented when it comes to having multiple partitions.

21. Azure Stream Analytics – An example on multiple partitions

So in this chapter, I just want to give an example when it comes to multiple partitions. What I have done is I've created a new event hub, and here I've set the partition count to four. So when I created the event hub, I specified four partitions. Then I have ensured that I have four databases in place. Now, I'm not trying to do a one-to-one mapping of a database onto a partition, because that's something I can't do. The diagnostic setting will automatically send the events onto the Azure event hub, and Azure Event Hubs will then partition the events accordingly. But just to ensure that I have enough events to be sent across multiple partitions, I made sure to have four SQL databases in place. And then for each database, I have implemented the diagnostic setting.

So here, if I go on to the setting, I am streaming it onto the multi-hub event hub, and here is the basic metric. Now, when you want to stream diagnostic settings for multiple databases onto the same event hub, you need to have a different policy name in place, a different policy, each time. For this, I went on to my namespace, and under the shared access policies, I created four different policies, one for each database. So this is something that is required: you need a different shared access policy for each diagnostic setting.

So now all of these databases are sending their diagnostic data onto this Azure event hub, which has four partitions. Now, for our Azure Stream Analytics job: first of all, I already have a job, and it's in the stopped state; that's fine. Let me go on to my inputs, and here let me create a stream input of the Event Hub type. So here I'll give the name, and then I'll scroll down.

Here I'll choose my DB Multi Hub, I'll scroll down, and I'll use connection string as the authentication mode. And here I won't mention any sort of partition key, because, as I mentioned before, I don't have control over the partition key when the events are being streamed onto Azure Event Hubs. I'll click on Save, and now it's saving the input. I'm going to be using one of the same queries we had seen earlier to send all of this data onto our DB log table. So firstly, in SQL Server Management Studio, let me delete whatever I have in the DB log table.

So this is done. Now, if I do a select star from the DB log table, I have nothing in place. Let me take the query, go on to my Stream Analytics job, go on to the query section, replace all of this, and save the query. So it's taking the data from DB Multi Hub, which is over here, into DB log. Now let me go on to the overview and start the Stream Analytics job. And again, let me choose a custom start time of at least an hour back so that it has some data in place, and let me click on Start.
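For reference, the pass-through query being used here looks like this, where [db-multi-hub] and [db-log] are the input and output aliases defined in the job (your alias names may differ):

    -- Copy every event from the event hub input into the SQL output
    SELECT
        *
    INTO
        [db-log]
    FROM
        [db-multi-hub]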

Let's wait till the Stream Analytics job is in the running state. Now, once my Stream Analytics job is running, let me do a select star, and I can see we do have some records in place. I'll do a select count star and hit Execute, and I can see we do have some rows in place. Now let me go on to the query section quickly. Here, if I choose DB Multi Hub, I can see some data in place if I just reduce the zoom. So here I can see that some data is coming in from partition ID number zero.
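Collected together, the checks being run in SQL Server Management Studio look something like this; the table name dbo.dblog is an assumption, so substitute the name of your own output table:

    -- Clear out any rows from earlier runs
    DELETE FROM dbo.dblog;

    -- Confirm the table is empty before starting the job
    SELECT * FROM dbo.dblog;

    -- Once the job is running, check how many rows have arrived
    SELECT COUNT(*) FROM dbo.dblog;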

Some data is coming in from partition ID one. So in Azure Event Hubs, based on how it is picking up the events, it is spreading the events from the different databases onto different partitions. And the Azure Stream Analytics job can, by default, work on this data in parallel based on these different partitions. So you can actually partition your query: you can use a PARTITION BY clause, and by default you can partition by the partition ID itself.

This is something that you can do. Again, this is already built in with this version of the Stream Analytics job. But if you had a particular partition key, as I said, you can partition by that partition key in your Stream Analytics job. For now, I just want to show you that you have your data coming in from different partitions.
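As a sketch, making the partitioning explicit in the same pass-through query would look like this (same assumed aliases as before). On recent compatibility levels this parallelism is applied automatically, so the PARTITION BY clause mainly matters on older jobs or when you want to partition by your own key:

    -- Process each event hub partition as an independent stream
    SELECT
        *
    INTO
        [db-log]
    FROM
        [db-multi-hub]
    PARTITION BY PartitionId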

Now, if I go on to my job diagram, here also I can see that I have four partitions available, not only in DB Multi Hub but also in my query step, because this is being done automatically. And even though you can't see it here, you would have four partitions in DB log as well. So in this chapter, I just wanted to go through this idea of partitions with an example.
