DP-203 Data Engineering on Microsoft Azure – Design and Develop Data Processing – Azure Databricks Part 6
19. Lab – Getting data from Azure Data Lake – Implementation
So in the previous chapter we had created our key vault and created a secret in the key vault. Now it’s time to create something known as a Databricks secret scope. For that I need to go on to my Databricks resource. So here I’ll go on to my Azure Databricks workspace and take the workspace URL. I’ll copy it onto the clipboard and then complete the URL to reach the secret scope creation page. Since I’m creating a scope for the secrets, I need to refer back to my Azure Key Vault. Here I need to give a name for the scope, so I’ll just give a scope name. Next I have to give the DNS name of the Azure Key Vault. If I go on to the overview of the key vault, here I have the Vault URI.
I need to copy this and place it here. Next is the resource ID, so I have to go back on to my Azure Key Vault and go on to the properties. Here, let me just hide this. And here I need to take the resource ID. Note that the Vault URI is also available over here. So I’ll copy the resource ID, place it here and hit on Create. Now that it has added the scope, I’ll hit on OK. Next, let me create a new notebook. Again, I’ll choose Scala, attach it on to my app cluster and hit on Create. Here I have some code in place to fetch data from Azure Data Lake. I’ll just copy this and place it here.
So here now I want to connect on to our Data Lake Gen2 storage account, and I need to replace this with the name of my Data Lake account. I’ll copy this and place it here. Here I am setting a configuration on the Spark context, mentioning my Data Lake storage account name. Then comes the name of the scope that we created earlier on. Remember, we just created a Databricks secret scope, and this is the name we gave it. Next, what is the actual secret in the key vault? If I go on to my key vault and go on to Secrets, note that even though we are working with the storage access key, don’t confuse it with the keys listed under Keys.
Those are encryption keys which you create in the Azure Key Vault. Secrets are like passwords; for example, if you have a database password, you can store it as a secret in the key vault. So what is the name of our secret? It’s datalake2000, so I’ll place it here. And what is the file that I want to access? It’s my Log.csv file in the data container. I’ll replace this. So let’s confirm by going on to my containers.
I’ll go on to my data container. So it’s in my raw folder, right? I have my Log.csv file there in the raw folder. Now let me run the cell. And here we are, getting the output as desired. So again, this is from a security point of view: how do you access your Azure Data Lake Gen2 accounts in a more secure way.
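For reference, the cell used in this lab looks roughly like the following Scala sketch. The storage account name datalakegen2xxx and the scope name datalakescope are placeholders for the values in your own setup; the secret name datalake2000 and the data/raw/Log.csv path are the ones mentioned above.

```scala
// A minimal sketch of the cell used in this lab. The storage account name
// (datalakegen2xxx) and the scope name (datalakescope) are placeholders;
// the secret datalake2000 holds the storage account access key.
spark.conf.set(
  "fs.azure.account.key.datalakegen2xxx.dfs.core.windows.net",
  dbutils.secrets.get(scope = "datalakescope", key = "datalake2000")
)

// Read the Log.csv file from the raw folder of the data container
val logDF = spark.read
  .format("csv")
  .option("header", "true")
  .load("abfss://data@datalakegen2xxx.dfs.core.windows.net/raw/Log.csv")

display(logDF)
```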
20. Lab – Writing data to Azure Synapse SQL Dedicated Pool
So in the last chapter we had seen how we could read data from our Data Lake Gen2 storage account. Now let’s take this a step further and write this data into a dedicated SQL pool. So here, let me now add some code. This code can be used to read data from our Data Lake storage account and then write it on to our dedicated SQL pool table. Now, when it comes to the data frame itself, let me just copy this and replace it here. Here I’m ensuring to infer the schema. Again, let me make sure that I change the name of my storage account and also the name of the container; the file is in the raw directory. Then I am creating another data frame wherein I’m just selecting the columns of interest. Here I am selecting all of the columns, but if you want to select only certain columns from your initial data frame, this is something that you can do, as in the sketch below.
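```scala
// A sketch of the read portion, assuming the same placeholder storage account
// and the Log.csv file in the raw directory of the data container.
val logDF = spark.read
  .format("csv")
  .option("header", "true")
  .option("inferSchema", "true")   // let Spark infer the column types
  .load("abfss://data@datalakegen2xxx.dfs.core.windows.net/raw/Log.csv")

// Create another data frame with just the columns of interest.
// Here all columns are kept; a subset could be listed instead,
// e.g. logDF.select("Operationname", "Status") -- column names are illustrative.
val selectedDF = logDF.select("*")
```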
At this point in time, though, I’m selecting all of the columns. I’ll scroll down, and here is where we need to start writing data into our table in the dedicated SQL pool. So let’s first of all go back on to my dedicated SQL pool in my databases. I’ll go on to newpool, right-click and choose New query. Here I’ll delete from my logdata table if any information is already available, so now all of the rows are deleted. Going back on to our notebook: I said that you also need a temporary staging area, and I can create that temporary staging area in my Data Lake Gen2 storage account. So let me go back and create a new container. I’ll give this container the name of tempdir, so that is in place.
So here my table name is logdata, and here is my temporary container. I can also create another folder known as log, so I’ll go on to the temporary container, create a new directory and click on Save. Now here I’m creating my connection string; internally it’s using Java-based (JDBC) drivers. Here I need to give the name of my workspace, so if I go on to my Synapse workspace, I’ll copy the workspace name and replace it here. Then comes the name of my pool, my username and the password. Next I’m using the write method that is available as part of your data frame. Here I’m giving the mode as append, so I’m saying that if there are any new rows coming in, they should be appended on to the table. This is very useful when you’re streaming events from Azure Event Hubs.
So whenever there are new events, they will be added on to the table. Here the format is that of a SQL data warehouse; here is my connection string, here is the temp directory, and here I’m just saying to go ahead and forward whatever Azure storage credentials we have. Here I’m giving the table name, and the save method is then used to write on to that table. Let me also copy the import statement, and now let me run the cell. Now it’s running the command, and it seems this is done. Now let me do a SELECT *, and here we can see our data in place. So in this chapter, I wanted to show you how you can take data from a file in your Azure Data Lake Gen2 storage account and write it directly from your notebook on to a table in your dedicated SQL pool.
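For reference, the write portion of this cell looks roughly like the sketch below. The workspace name, user name and password are placeholders; the pool name newpool, the table logdata and the tempdir container follow the lab above, and the format and option names are those of the Databricks Azure Synapse connector.

```scala
// A sketch of the write portion. Replace the placeholders in angle brackets
// with your own workspace name, user name and password.
import org.apache.spark.sql.SaveMode

val sqlConnection = "jdbc:sqlserver://<workspace-name>.sql.azuresynapse.net:1433;" +
  "database=newpool;user=<user-name>;password=<password>;" +
  "encrypt=true;trustServerCertificate=false;loginTimeout=30;"

val tempDir = "abfss://tempdir@datalakegen2xxx.dfs.core.windows.net/log"

selectedDF.write
  .format("com.databricks.spark.sqldw")                   // Azure Synapse (SQL DW) connector
  .option("url", sqlConnection)                           // JDBC connection string to the dedicated pool
  .option("tempDir", tempDir)                             // temporary staging area in the Data Lake
  .option("forwardSparkAzureStorageCredentials", "true")  // reuse the storage credentials set earlier
  .option("dbTable", "logdata")                           // destination table in the pool
  .mode(SaveMode.Append)                                  // append any new rows to the table
  .save()
```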
21. Lab – Stream and write to Azure Synapse SQL Dedicated Pool
So now I want to complete the notebook which I’d used for taking events from the Azure Event Hub, and I want to stream that data into a table in a dedicated SQL pool. Earlier on we also had a table in place; this was the dblog table. Now, if you don’t have the table in place, we can go ahead and recreate it. So first let me drop the table if it does exist; then here in the code I have the statement to recreate the table. Here I’m representing all of the numbers as long, that is, big integers. So let me run it. We don’t have any data in place yet, and that’s fine.
Now I have gone ahead and made changes to the notebook in which we were taking the streaming data from Azure Event Hubs. What are the changes I have made? Firstly, I have added that Spark configuration to our notebook so that the notebook can authorize itself on to our Data Lake Gen2 storage account. Next, if I scroll down, all of the logic remains the same; here we’re now defining my data schema to keep in line with the schema that I have in my table.
Again, when it comes to the numbers, I have defined them as the long type over here; since my columns now have the type of big integer (bigint), I need to map them accordingly. Earlier it was the integer type, but I want to fit larger numbers into these columns if required. Now I’ll scroll down. Here I have my table name, which I have changed; it’s now dblog. And then we have our temp directory and our connection string. Now, when it comes to our data frame, instead of using the write function, we are now using writeStream. Since we are now taking a stream of data, we have to ensure that we use this particular method. Here I am also mentioning something known as a checkpoint location, and this is very important. Remember, we are reading data from an Azure Event Hub. So let’s say one sequence of events has been read; in order to ensure that the notebook does not read the same sequence of events
again, it will add a checkpoint to know exactly where it left off. It also helps if this notebook or the query crashes, because then it knows where to resume from. And then we are now using the start method to start the streaming process; before, we were using the save method to directly save the data on to our table. So these are the changes I have made. Now let me run this cell. It’s starting the streaming process, and the Spark job has started.
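For reference, the modified write portion looks roughly like this sketch, where streamDF stands for the data frame built from the Event Hub stream earlier in the notebook, the connection string and temp directory are the same placeholders as before, and the checkpoint folder name is purely illustrative.

```scala
// A sketch of the streaming write to the dedicated SQL pool.
val streamingQuery = streamDF.writeStream
  .format("com.databricks.spark.sqldw")
  .option("url", sqlConnection)
  .option("tempDir", tempDir)
  .option("forwardSparkAzureStorageCredentials", "true")
  .option("dbTable", "dblog")                             // the dblog table in the dedicated pool
  .option("checkpointLocation",
    "abfss://tempdir@datalakegen2xxx.dfs.core.windows.net/checkpoint")  // tracks how far the stream has been read
  .outputMode("append")                                   // append new events to the table
  .start()                                                // start the streaming query
```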
Let me go on to the table and see if we have any data. And here we can see we have data in place. If we do a COUNT(*) and hit on Execute, this is the number of events that I currently have. Let me just wait for a couple of minutes and run the statement again. Now you can see we have a change in the number of rows, so our data is being continuously streamed on to the table. I’ll now go ahead and cancel the Spark job. So in this chapter, I wanted to show you how you can stream your data directly from Azure Event Hubs on to a table in your dedicated SQL pool.