PL-300 – Section 20: Part 2 Level 8 – Get Other Types of Data
157. Introduction to SQL Server
Hello. In this section, concluding our look at the Power Query part of Power BI, we’re going to connect to various different data sources.
Now, there are obviously a huge number of data sources, and we can’t possibly go through them all. So, what we’re going to do is look at the requirements of the official Microsoft exam. What we’ve got is: connect to files and folders, which we did previously when we had a look at January and February and then merged in March’s and April’s details. We’ve been importing from Excel for most of this course, so that leaves us with databases, and then SQL Azure, Big Data, and SSAS. So, we’re going to start off with connecting to a database.
Now, for this video and the next, I am not expecting that you actually have SQL Server loaded onto your computer, and it would probably be a waste of time, to be honest, because it takes a while to install it, then you’ve got to load a database, and so forth. So, this is more of a follow-along with what’s happening on screen, rather than something to do yourself. In SQL Server, we can create a database, which holds lots of tables. You can imagine it as analogous to a big Excel workbook containing lots of spreadsheets.
So, before I connect, there is something you might need to know, and that is authentication: how to prove that it’s you. Otherwise, anybody could just connect to your precious data. There are two main ways of connecting to SQL Server. The first is Windows authentication, where you use your Windows username and your Windows password. That’s generally the preferred option where possible, because the computer itself has had to log you into Windows and go through lots of checks to ensure that your password is still current and, if you’re in a corporate environment, that it meets complexity requirements and all the rest of it.
The second is SQL Server authentication, which requires a separate login and password, and it is generally thought to be a little less secure than Windows authentication. With Windows authentication, I have to be connected to the corporate network, or be using a computer that has been joined to it. With SQL Server authentication, I could be anywhere on the internet, on any computer, and just connect using a login and password. So, I’m going to connect to this server. The server could have a really complex name, with things like a backslash in it, but in this particular case my server is just going to be called dot (.), which indicates the local default instance on my computer. So, I’ll connect, and there we have an entry into SQL Server.
Within that, there are lots of databases; these are the workbooks, if you will. And within each database, there are the individual tables. So, I’m going to go into AdventureWorks2014. Of course, just because I’m in the front door doesn’t mean I’ve actually got rights to use any of these particular databases; it just so happens that I do. Authentication, and whether you can get into a SQL Server database, really is a matter for your database administrator. So, you can see all of the tables that are part of this database, and I’m going to concentrate on three. Take Production.Product: the last bit after the dot is the table name, and the first bit is the schema name. So, in the Production schema, I’m looking at the Product table, the ProductSubcategory table, and the ProductCategory table. What I could do in SQL Server is write a little piece of code to get everything from Production.Product, and this bit is relevant to what we’re doing in Power BI; I’m not trying to teach you how to use SQL Server. There are two parts here: SELECT, where the star (the asterisk) means all of the fields, and FROM, followed by the table name. So, here we have lots and lots of columns, and it could be a huge number of rows. In this particular case, it’s only 504 rows, but just imagine a production database: you could have millions, tens of millions, hundreds of millions, even billions of rows. And if you didn’t want all of those columns, then you would specify which columns you wanted.
So, for example, maybe I want Name, and maybe I want to call that column ProductName, and maybe I want Size. If I run that, you can see we have got a much smaller result. And then maybe I want ProductSubcategoryID as well. Generally, of course, the less information you retrieve from the database, the quicker the query will be and the less load it puts on SQL Server.
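The difference between SELECT * and picking named columns can be sketched with a tiny example. This uses Python’s built-in sqlite3 module and a couple of made-up product rows rather than the real AdventureWorks database, but the SQL idea is the same: the star returns every column, while naming columns (optionally with an AS alias) returns only what you asked for.

```python
import sqlite3

# A tiny stand-in for AdventureWorks' Production.Product table
# (illustrative data; the real table has 504 rows and many more columns).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Product (
        Name TEXT,
        Size TEXT,
        ProductSubcategoryID INTEGER
    )
""")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [("Mountain-100", "42", 1), ("Road-150", "44", 2)],
)

# SELECT * returns every column...
all_cols = conn.execute("SELECT * FROM Product").fetchall()

# ...whereas naming columns, with an AS alias, returns only those.
some_cols = conn.execute(
    "SELECT Name AS ProductName, Size FROM Product"
).fetchall()

print(all_cols)   # three values per row
print(some_cols)  # two values per row
```

The second query transfers less data, which is exactly why trimming columns reduces the load on the server.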
158. Importing Database Data into Power BI, and Query Folding
Let’s now go back into Power BI and choose SQL Server database.
So, I’ll connect using my connector. The server name is dot. If you have a complicated server name and you have access to SSMS (SQL Server Management Studio), you can go to Connect > Database Engine, highlight that complicated name, and press Ctrl+C to copy it. The database is optional, but I’m going to connect to AdventureWorks2014; again, you could right-click on the database, go to Rename, and copy the name.
Now, we’ve got two different types of data connectivity mode: Import and DirectQuery. Import is the one that you’re more familiar with: the tables and columns are imported into Power BI Desktop. It takes a snapshot, so if the data changes, you will have to refresh your snapshot. With DirectQuery, on the other hand, no data is imported or copied into Power BI. It gets the metadata, the table and column names, and then when you actually run a query, that is when it asks the database for the data.
Now, there are advanced options as well. If you have a particular SQL statement to run, you can enter it here. There is also an option for SQL Server failover support; in other words, a database connected to a second database, so that if the first database crashes, the second one takes over instantly and there is no downtime. You’ll be told whether you’ve got that. More important, however, is “Include relationship columns,” so remember that; currently it’s checked. We can also “Navigate using full hierarchy”; I don’t recommend that, and I’ll show you the difference between the two. So, leave all of this as is and click OK, as long as “Include relationship columns” is checked. Then a dialogue box comes up asking, “Okay, what table are you interested in?”
Now, you might think, “Well, I’m interested in this table Product, and this table ProductCategory, and so forth,” but if you do that, you’ll get a separate query for each table. So, if we’re trying to combine Product, ProductSubcategory, and ProductCategory into one, we don’t want to select three separate queries. We just select one table, and then transform the data. So, we go back into Power Query, and here we have the source steps: the database, and the navigation to a particular table. And you can see we’ve got, in this case, all of the data, though it could be just the top 1,000 rows, with all of the columns.
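To see why one table plus relationship columns is enough, it helps to know what the equivalent SQL looks like: a join across the three tables. Here is a minimal sketch using Python’s sqlite3 module, with made-up rows and a deliberately simplified schema (the real AdventureWorks tables have many more columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductCategory (ProductCategoryID INTEGER, Name TEXT);
    CREATE TABLE ProductSubcategory (
        ProductSubcategoryID INTEGER, ProductCategoryID INTEGER, Name TEXT);
    CREATE TABLE Product (Name TEXT, ProductSubcategoryID INTEGER);

    INSERT INTO ProductCategory VALUES (1, 'Bikes');
    INSERT INTO ProductSubcategory VALUES (10, 1, 'Mountain Bikes');
    INSERT INTO Product VALUES ('Mountain-100', 10);
""")

# Joining product -> subcategory -> category gives one combined result,
# which is what expanding relationship columns achieves in Power Query.
rows = conn.execute("""
    SELECT p.Name, s.Name AS Subcategory, c.Name AS Category
    FROM Product p
    JOIN ProductSubcategory s
      ON p.ProductSubcategoryID = s.ProductSubcategoryID
    JOIN ProductCategory c
      ON s.ProductCategoryID = c.ProductCategoryID
""").fetchall()

print(rows)
```

Starting from the Product table alone, expanding its relationship columns walks these same joins for you, which is why selecting the one table is enough.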
Now, what I’m going to do is have a little look at the native query. If I right-click on whatever step I’m on and choose View Native Query, you can see that we have a SELECT statement with all of these columns, and then FROM the table. Each column is listed individually, for example Size as Size. Now, suppose I say, “I’m just really interested in the name.” So, I right-click and choose Remove Other Columns.
Now, let’s have a look at the native query again. You can see it has really been reduced. So, what’s happening? Is the computer going into the database, retrieving all of this information, and then, in the next step, discarding all of the columns apart from one?
The answer is no; that would be a huge waste of the computer’s time. Instead, what it’s doing is something called query folding, which means it goes through your steps and produces, in computing terms, the simplest query it can use to get the data. So, it’s not going to fetch all of this data unless it has to. At the moment, it has only got the data for the preview, the first 1,000 rows. If this were a database with 1,000,000 rows, it wouldn’t have loaded all 1,000,000 rows for all of the columns, and it’s not going to, because when I click Close & Apply, it knows that all it’s got to do is load one column. Equally, if you then say, “Well, I only want to keep the top fifty rows,” the query reflects this: SELECT TOP 50 Name FROM this table. So, the computer will simplify the query to ensure that the minimum amount of data it needs to do what you want is transferred.
So, that’s query folding, and it’s done automatically. There are some things that will stop it from happening, but those are fairly rare; for instance, doing something that cannot easily be expressed in SQL Server’s language.
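To make the idea concrete, here is a toy sketch of what folding does conceptually: rather than executing each applied step against downloaded data, the steps are collapsed into a single SQL statement before anything is fetched. The fold_query helper and step names here are entirely hypothetical; Power Query’s real implementation is far more sophisticated.

```python
# A toy sketch of query folding (hypothetical helper, not Power Query's
# actual mechanism): applied steps are collapsed into one SQL statement,
# so only the minimum data is requested from the server.
def fold_query(table, steps):
    columns = "*"   # default: all columns
    top = None      # default: all rows
    for step, arg in steps:
        if step == "keep_columns":      # like "Remove Other Columns"
            columns = ", ".join(arg)
        elif step == "keep_top_rows":   # like "Keep Top Rows"
            top = arg
    top_clause = f"TOP {top} " if top is not None else ""
    return f"SELECT {top_clause}{columns} FROM {table}"

steps = [
    ("keep_columns", ["Name"]),
    ("keep_top_rows", 50),
]
print(fold_query("Production.Product", steps))
# SELECT TOP 50 Name FROM Production.Product
```

The point is that the two steps never run as separate downloads; they fold into one statement that asks the server for exactly fifty rows of one column.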
159. Select a Storage Mode
Now, I just want to have a further look at this data connectivity mode.
So, as I said in the previous video, Import downloads all of the data at the beginning, and DirectQuery gets the data only when it’s needed.
Now, the DA-100 certification exam doesn’t call this the data connectivity mode. Instead, it calls it a storage mode. So, you can think about this as storage: do you want all of your data to be stored on your computer?
Now, let’s just have a think about why you might not want that. First of all, you would have to get all of the data as a snapshot, as I said in the previous video. Secondly, could that be a bit of a security problem? If you have all of your data stored on your computer, and, let’s say, you were connecting to this SQL Server database, maybe the security changes on the SQL Server so that you should no longer have the right to access it. Well, you’ve already got the data; it’s already imported onto your computer. So, there’s nothing that the DBA, the database administrator on the SQL Server, would be able to do about that. But with DirectQuery, the security is always up to date, because you are always subject to the latest security requirements.
Additionally, your data is up to date. Let’s suppose it’s nine o’clock in the morning, and I grab all of the data. I would have the data as of nine o’clock, and then I’d have to refresh to get the latest data. With DirectQuery, I’ve always got the latest data, whether it’s nine o’clock, ten o’clock, or eleven o’clock.
Now, the disadvantage of that is that I am constantly having to go back to the source. The advantage, however, is that I’m not downloading all of the data. Suppose I use a SELECT statement and query folding kicks in, so it’s actually quite a small, contained data set. That would work for Import as well, but with DirectQuery my visualisations might get an even smaller set: maybe I’m applying filters, maybe I’m doing something else. DirectQuery just takes the data that you need, when you need it, but there may be a bit of a lag, because you’re having to ask SQL Server for it each time.
So it really is a question of which is better. Do you get everything at the beginning, saving the data locally? That’s Import. You can also use things like Q&A and Quick Insights, because you’ve got the data, but you have to do all of your refreshing manually. Or do you just get the data when you need it, with up-to-date data and up-to-date security? So, you’ve got these two different modes: Import and DirectQuery.
Now, for some data sources, there is a third mode, called Dual (D-U-A-L), or composite mode. What that means is, “Okay, computer, I want you to choose.” So, some data can be imported, while other data is queried directly.
Now, just to let you know, you can check your data connectivity mode. I’ve just loaded a table at random, and if I hover over it, you can see the storage mode. If I go to the Model view and click on that table, you will see that over here we’ve got Properties, and if you go down to Advanced, you can see the storage mode: DirectQuery, Dual, or Import.
Now, you’ll notice that DirectQuery and Dual are greyed out, because I have actually imported this table. Going from DirectQuery or Dual to Import is a one-way route: you cannot change it from Import back to DirectQuery or Dual. So, if you get the data initially using DirectQuery or Dual, you can change it to Import; if, however, your storage mode is Import, you cannot change it back. The only thing you can do then is to delete the query and start again.
So, storage modes. Import mode creates a copy of your data on your local computer, and it is the default. DirectQuery is for when you don’t want to save local copies of your data, maybe for security reasons, maybe because you find it quicker with a huge data source; it always ensures that you have the most up-to-date data and that all security requirements from the source are met. And then there’s Dual, or composite, mode, which allows some data to be imported and other data to be queried directly. So, it’s one foot in both data connectivity modes, or storage modes.