PL-300 – Section 14: Part 2 – Getting Multiple Files

  • By
  • May 3, 2023
0 Comment

112. Merge Queries and Expand Table

In this video we’re going to have a look at merge queries, the first of the combined. And what we’re going to do is I’m going to get rid of almost everything that we did in last few sections. Just have it so that we have our raw data. So, these are our file names. This is the path which contains the album.

Now, you can categorise these albums. This one for instance is a sampler. It’s got music from all over the place. Further down, we have an audiobook. Further down we have opera and that sort of thing popular music maybe. So, what I’m going to do is I have in my Power BI spreadsheet a spreadsheet called CD categorization. And that categorises the various paths the albums into these CD types. So, what we’ll want is an extra column in this query which has that type. S,o in Excel this will be the equivalent of a vertical lookup.

So, what I’m going to do first of all is import to get the data of the CD categorization as a second query. So, let’s get from a recent source and this time we’re getting this CD categorization. Click OK and it loads no problems. So how do we combine these two together? Or first of all I want to promote this first row’s headers. So how do we get these two queries together? We use merge queries and we’ve got two options. Merge one query into an existing query or merge them into a third query. So, I’m going to merge from CDs. I’m going to get in CD categorization. So, I’m going to merge queries. So, this opens a dialogue box with my existing query at the top and a second query at the bottom. So, I’m going to call the second query the CD categorization. And it says okay. How do you want me to match them? Where are your columns which are going to be the same? Well, it’s the path in this column and is the path in this query.

Now, there are various join types if you are used to SQL Server then you will understand immediately what left join, right join, full joint, inner join means. Alternatively, have a look at what’s in the brackets. So left outer join says it’s going to take all of the rows from the first table and any matching rows in the second. So where they don’t match when I’ve accidentally deleted one of the categorizations. I haven’t categorised it and it’s completely blank in both paths and CD type. We’re still going to keep our existing table. So, you can see all from the second table and matching for the first all from both. That would mean that it would have to be either in the first table or the second table. Inner, it had to be in the first and the second. Left ante and right ante it’s where there is a match. You get rid of it essentially. So, if is used for trying to exclude certain data. So, maybe, you had something that said grand total and you wanted to exclude the grand totals. If none of this is making much sense then just know the main ones we’ll be using are left outer while you’re keeping your original data always inner while you’re only keeping your original data way, matches something with the second data and full outer while you’re keeping both sets of data regardless of how they match. So, let’s look at left outer. And here you can see that the two queries have emerged.

Now, they’ll be merged together in a table so because each match doesn’t just have one column. It has multiple columns.

Now, we can expand that quite easily by clicking on this expansion button. This is also here in the transform structured column expand. So, if I click on that it asks you what do you want to expand it to? What columns do you want to keep? So, this is the equivalent of choose columns. So, I could keep the path but I’ve already got the path. So, I don’t need the path again. So, I’m just going to keep the CD type.

Now, if it says use original column name as prefix then the column would be CD categorization dot CD type. Well, I don’t need that in this particular case so I’m going to de-tick that.

Now, if these contained numerical values and you wanted to sum them up then you could use the aggregation instead. So, you could sum the columns instead of having them as separate columns. So, let’s click OK and our CD categorization is now replaced by the CD type. So, here we have at the sampler audiobooks opera and so forth. So, obviously what you can now do is Close and Apply that. And because we have this new column called categorization, CD type. We can put CD type in there and now we can see the difference between sampler and Christmas and classical and so forth.

113. Merge Queries with Group By, and Different Types of Joins

Now, what else could we do with merging queries.

Well, as I said, you can also merge the queries as new. You remember a few videos ago we had this ‘group by’. So, I created a new query based on this original query. So, I am managing, and in this case I’ll duplicate, and then I’m going to group by, and I’m going to group by the CD type, and I’m going to count how many rows are in each. So similar sort of thing to what we’ve just seen in our visualisation. What I’m going to do now is introduce this count into our original query by merging it. So, we merge this query, and I’ll merge it as new just to show you that you can have a new query at the end of it.

So, it’s merging this CDs with CDs two. Obviously, I could change the name, and I’m doing a let out to join. I’m selecting which particular columns I want. Expand the table, I just want the count. So, we can see that there were 351 samplers, 68 opera and so forth. So, if you wanted to use that information to say, well what proportion, what percentage of samplers is this one particular row? Then you can see it’s one divided by 351 multiplied by 100%.

Now, I’m just going to remove this. And I’m going to change this CD categorization from CD categorization to CD categorization short, which only contains two lines of detail. And the reason I’m doing this, is just to demonstrate the various different types of joints. So, if I change this, the navigation, we’re going to change from CD categorization to CD categorization short. So, there’s my end table. So, at the moment we have lots of knows. So, we only have not known for these very first two items, the path and the CD type. And I’m just going to select path and CD type, and I’m just going to remove all other columns, just so we can see what’s there. So, this is a left join. So, it’s taking all from the first table and just those that exist in the second table. A right join well it’ll take all of those which exist in the first table, and all of only those which match in the second table. So, you can see my first table is now down to 35 rows because it’s only these two paths which exist in my first table. So, this explanation might just be a bit simpler, if we call these tables A and B. So, a left outer join, is all of those in table A and why they match in B, which happens to be all of the albums, so there’s 52 albums there. B, a right outer join, is all those in B and all those in A, which happen to match. So, what I’m going to do actually, is put in a third album. So, this one is a non-existent album, and who knows what the CD type is? So, I’ll just save that, and I’m going to refresh my data. So my right join contains all three of my roles in CD type, and only the data which exists in A. Now, so, in other words, it is this section here.

Now, a full outer join is both, pits all rows from both. So, we got the entirety of table A and if I scroll to the very bottom, it’s still within this preview. We will also have this, who knows from B. So, that’s a full outer join. So, that’s why you don’t want any particular data at all to be deleted. An inner join is just where they match. So, you can see this matches 35 roles from table A and just two rows from table B. So, it won’t match the who knows because that exists in table B, but it doesn’t exist in table A. And then we have a look at the anti-joins, and this removes what’s in table B. So, this will exclude the 35 rows that is initially there. So, anything in table B has now been removed. As I say, the vast majority of the time, you will be using a left outer join, which contains your original data, plus anything that matches in the new data. You’ll be using a full outer join, while you don’t want any of the data to be lost, and you’ll be using an inner, which just gets the bit in the middle where A and B overlap.

So, merge queries. We have created a new query, we have merged it, or we’ve imported it from another data source, or we’ve used the group by to create a new query. We have merged it together, by selecting the columns that we want to merge it with, and we’ve used the various type of joins, so that we get the information back that we want. And then we expanded the tables to get us just the columns from the new table, table B that we want. So, it might sound a bit complicated, but just know merge queries gets information into a query from another query.

114. Appending Two Queries Together

Now, in this video, we’re going to look at append queries. So merge queries, which we’ve just looked at was about adding extra columns from different queries. Append queries is about getting additional rows. So, combining them. It’s like a union in SQL Server.

So, what I’m going to do first of all is I’m going to delete my two existing queries and start again. So, I’m going to delete my B table and you’ll find that I can’t. My B table is being referenced by the A table. It’s because of this join here, this Merged Queries. So, I’m going to now just delete A and delete B. So, the data that I’m going to be using is in another folder called Monthly csv files. And if you’re following along with the resources, that’s very near the beginning of this video series, then create a new folder called Monthly csv files and you’ve got January and February. I’m also going to create another folder outside of this that I’m just going to call Spare. So, I’m going to have January and February in this Monthly .csv files and I’m going to move March and April into this Spare folder. So, all we’ve got are these two Monthly csv files. You’ll see why a few videos later. So, now I’m going to get new queries and you can see all the different ways that I’ve got of just getting new queries from Text or CSV. These are CSV files. So, here is my January data. And you can see the computer has interpreted it correctly. It is comma-based. It could be tab based, those are the two main types. It’s trying to find what sort of data types each column is based on the first 200 rows. The reason for this is so that let’s say you had a file of a million rows. It wouldn’t need in this preview setting to download the entire dataset if the data types were fairly consistent throughout. On the other hand, if halfway down, you’ve got some raw data that would change the dataset, then you would need to base it on the entire dataset. Generally, the first 200 rows are fine.

However, if you’ve got header rows, for instance, like we had in our previous example that weren’t part of the data type, then you could say do not detect the data types. So, based on the first 200 rows, you can see that it has promoted the first row to the headers. If I don’t detect data types, then it’s not going to allow it, I have to do that myself. So, let’s see what happens when we import January. So, you can see it’s got the source, its source is from here. It’s promoted the headers. And it’s changed the types. So, we’ve got date and text and financial money. Right, let’s get a second query, again from this folder. We’ll have it as February and we’ll just import it in. So, it only contains one row. So, what we have is two tables with the same sort of information. Date, Subtype, Out. Four rows and one row.

We want to merge these two queries together to choose an overall and that we can do using Append Queries. Again, we can append one query into another or we can append two queries into one. So, I’m going to do that. And here you can see we have got some options. Are we going to have two queries, two tables or are we going to have three or more? So, let’s just have these two tables. January and February. We start off with January, then at the bottom, we’re going to put February. Click OK and we get a third query which is the amalgam of these two. Now, if I did it the other way around, and had February first and then January, I think you can guess the results. We’ll have February at the top and then all of the January ones at the bottom.

Comments
* The most recent comment are at the top

Interesting posts

The Growing Demand for IT Certifications in the Fintech Industry

The fintech industry is experiencing an unprecedented boom, driven by the relentless pace of technological innovation and the increasing integration of financial services with digital platforms. As the lines between finance and technology blur, the need for highly skilled professionals who can navigate both worlds is greater than ever. One of the most effective ways… Read More »

CompTIA Security+ vs. CEH: Entry-Level Cybersecurity Certifications Compared

In today’s digital world, cybersecurity is no longer just a technical concern; it’s a critical business priority. With cyber threats evolving rapidly, organizations of all sizes are seeking skilled professionals to protect their digital assets. For those looking to break into the cybersecurity field, earning a certification is a great way to validate your skills… Read More »

The Evolving Role of ITIL: What’s New in ITIL 4 Managing Professional Transition Exam?

If you’ve been in the IT service management (ITSM) world for a while, you’ve probably heard of ITIL – the framework that’s been guiding IT professionals in delivering high-quality services for decades. The Information Technology Infrastructure Library (ITIL) has evolved significantly over the years, and its latest iteration, ITIL 4, marks a substantial shift in… Read More »

SASE and Zero Trust: How New Security Architectures are Shaping Cisco’s CyberOps Certification

As cybersecurity threats become increasingly sophisticated and pervasive, traditional security models are proving inadequate for today’s complex digital environments. To address these challenges, modern security frameworks such as SASE (Secure Access Service Edge) and Zero Trust are revolutionizing how organizations protect their networks and data. Recognizing the shift towards these advanced security architectures, Cisco has… Read More »

CompTIA’s CASP+ (CAS-004) Gets Tougher: What’s New in Advanced Security Practitioner Certification?

The cybersecurity landscape is constantly evolving, and with it, the certifications that validate the expertise of security professionals must adapt to address new challenges and technologies. CompTIA’s CASP+ (CompTIA Advanced Security Practitioner) certification has long been a hallmark of advanced knowledge in cybersecurity, distinguishing those who are capable of designing, implementing, and managing enterprise-level security… Read More »

Azure DevOps Engineer Expert Certification: What’s Changed in the New AZ-400 Exam Blueprint?

The cloud landscape is evolving at a breakneck pace, and with it, the certifications that validate an IT professional’s skills. One such certification is the Microsoft Certified: DevOps Engineer Expert, which is validated through the AZ-400 exam. This exam has undergone significant changes to reflect the latest trends, tools, and methodologies in the DevOps world.… Read More »

img