PL-300 – Section 22: End of Part 2

  • By
  • May 11, 2023
0 Comment

172. End of Part 2

Well, congratulations. We are halfway through the course. You’ve now got to the end of part two. So, let’s just have a look at some of the requirements of the PL-300 exam.

So in this part, you’ll have seen how you can get data from different data sources. So, you could connect to various data sources. You can use a shared dataset or a local dataset. You can select a storage mode. You could use Microsoft Dataverse and changing values in a parameter. You’ve also seen how you can clean, transform and load the data. So most of this part has been about shape and transform the tables. Where things like pivots, unpivot, promoting the first row to a header. We’ve been looking at combining queries, with merge and append. We’ve been identifying appropriate data types and the appropriate columns to join two tables together. We’ve also had a look at inconsistencies and data quality issues.

So, there was a lot in this part and if you’re going forward to the PL-300 exam, then I hope that you will get to grips with the M language so you’ll be able to answer any questions about that.

Well, I hope you enjoyed this part two, and I’ll see you in part three of this course.

173. Welcome to Part 3 – Modeling and DAX Functions

Hello and welcome to Part 3 of this course which is all about Modelling the Data. So, what is this part all about? Well, we have got the data, we have transformed it, but there may be additional bits that you need to tweak. For example, in the Power Query editor, you may have been able to add a column to every single row with a calculation.

However, what if you wanted to add a measure? That’s not necessarily adding something to every column and then adding it up. If it were, there could be some really bad performance, because let’s say you’ve got a billion rows. You then have to add a billion values into a new column and then do a calculation on those billion values. Those billion values would have to be still permanently in memory.

But for measure, all you have to do is the calculation and the computer can speed that up, because it may look at it and go, “Well, I see there’s a quicker way of doing this.” So, what we are doing in this part, we first of all, we’ll be designing a data model. So, we’ve got all of these different queries that need connecting together. So, one table might need to be joined to another table. So, yes, you could do that in a Power Query editor. You could merge, you can append, and that’s good, if you just wanted one table.

But sometimes it may be better to have it in a few tables and join them together only when you need to. Sometimes you might just want a visualisation, based on one table, and not use the others.

So, we’ll be looking at all of the relationships, all of the joining between the tables, so that it becomes a Model, we’ll be designing it, we’ll be developing it. And then we’ll be looking in some detail at the DAX language to create measures. So DAX, is a language which has its roots in Excel, but is an expanded version of that. So first of all, we will be creating a Data Model. We’ll be having an introduction to DAX Functions, including the Logical Functions, IF, AND, OR, NOT, SWITCH and more.

We’ll be looking at the Functions in various categories: Statistical, Mathematical, Text and Information. Then we’ll be looking at some real DAX language that can join things together, such as RELATED, RELATEDTABLE, COUNTROWS and Time Intelligence Functions such as DATESINPERIOD, SAMEPERIODLASTYEAR and PARALLELPERIOD.

And then we’ll have some things that just don’t fit into the previous sections. So, let’s start Part 3 which is all about Modelling Data and using DAX Functions.

174. Small Changes to the Menus

Hello, and welcome to Part 3 of this course. Microsoft have changed some of the menus in the Model and Data tabs. Basically, they have got rid of the Modeling tab. Most of the functionality has now been moved to the Home tab. Thank you for reading this, and keep learning!

175. Get Multiple Data Sets, and Connecting Them Together

Hello and welcome to part three of this cast. So, part one was all about visualising data, creating all of our visualisations. Part two was about get data, and all of the transformations you can do, including the M formulas. Well, part three is about adding modelling.

So, we’ve got the data, but we need to make some adjustments. All we need to, correlate multiple data sets together. And some of this will include using calculated columns and calculated measures. And for that we’re going to use, yet another language called DAX, D-A-X. So, previously we had M formulas for get data, which was like, text dot something. Well, DAX formulas uses its completely own language, unfortunately.

Now, there is a bit of overlap between the two. Both of them are transforming data. So, for instance, in the get data, you could use a calculated column to add some measure, but you can also do that and sometimes more effectively in the modelling section using DAX. So, with regards to the certification, we have done in part two, the consuming and transforming data by using Power BI, so it connects to various data sources We’ve performed transformations and we cleansed the data. In part one we did the visualising data. So, we were looking at things like creating hierarchies and creating and formatting interactive visualisations, and using KBIs gauges and cards.

Well, this part is all about this first section. Creating and optimising data models and creating calculated columns, calculated tables and measures, including DAX formulas. So, in this section we’re not going to be continuing our look at the M language, we’re going to assume that the data that we have got, we have used as much of the get data transformations as we want.

Now, we’re going to be using one data set for this particular part, or rather, may, I should say four data sets. So, let’s get those data sets now. So, get data from Excel. It’s our source, the Power BI data that you can download from way back at the beginning of this course. And, what we’ll be using are these four with Dim and Fact at the beginning. So, Fact is the same as measures and Dims are a series of descriptions about those measures. So, I’m going to click on each of those. I’m not going to transform the data, I’m going to assume that what we’ve got here is basically what we need, and click on Load. So, all of that data is coming into Power BI. And you’ll see it on the right-hand side with all of these fields.

Now, there is some connections or there are some connections already within the data, for instance I could get the category name and get the sub-category name. And you’ll see here, for instance, we’ve got the categories and the sub-categories successfully joined together. I’ll show you how the computer’s done that automatically in just a moment. However, there is a bit of discontinuity as well. If I go to Factinternetsales and I click on the sales amount, you can see the computer has no idea how to connect Factinternetsales with these two DimProduct categories and DimProduct sub-categories. So, now let’s take a quick look at the data that we’ve got and we’ll go to the data icon on the left-hand side. So, here is the DimProduct. So, this tells information about a product. So, here, for instance, is the product name. Dim category, very short table, just translating a product, category key into a category name. Similar, for the sub-category and then internet sales, this shows the nitty-gritty of what people bought.

Now, if you go into the model, this shows the four tables, and you can see at the bottom, we’ve got zooms, so we can zoom in and out, and you can see the disconnect. DimProducts is joined to Factinternetsales, DimProduct category is joined to DimProduct sub-category. It’s successfully worked those out. And you can see that there is a number one and an infinity, a star sign. What this means is that there is one DimProduct, which is used multiple times, could be zero times, could be one time, could be more than one time in Factinternetsales. And, similarly, there is a category, one category is used multiple times. So, if I go to categories, there is one category called Bikes, with a product key of one, which is used multiple times in sub-categories. So, here we can see that it is used for these first three items. So, that’s what it means from going one to infinity. So, we need a link between DimProduct and DimProduct sub-category. So, make sure both of those are on the screen and the link is the product sub-category key from DimProduct needs to go to the product sub-category key of DimProduct sub-category. So, to make this link between the two, go from one, drag to the other and there is our link.

Now, it looks a bit of a mess, so let’s get this in order with Factinternetsales on the left-hand side, DimProduct next, DimProduct sub-category next, and DimProduct category at the end. So, now, if I was to go into the data, and if I was to add, again, the sales amount, so I do that, you can see it is calculated correctly. So, previously, it was just telling me all of these categories had each 29 million, now it’s telling me the right amounts. That’s because our model is connected together.

Now, in the next video, what we’re going to do is have a look at this particular arrow and see why it is so important and how it can give you the wrong information, the wrong analysis.

176. The Problems with Direction of Relationships Between Data Sets

So, what is this arrow between two different data sources?

Well, this shows where the data flows, the analysis flows. So, in this case, it’s going from the DimProduct to FactInternetSales, and it always goes from the one to the infinite. Highlighting it also highlights the keys that are involved. So, you’ll notice the one is placed nowhere near the product key, neither is the infinity sign.

If I double-click on it, you can alter, edit, the relationship, and indeed if you wanted to add a new relationship, you would get a similar dialogue box if you go to Modelling, Manage Relationships, and then you can click on Edit, or you can click on New. Let’s have a look at this particular one. So, it goes from FactInternetSales, and product key, we’ve highlighted product key, and DimProduct, and we’ve highlighted product key there. And it’s a many to one cardinality, in other words, on the one side it is guaranteed that there is only one version, so there is only one product number two, but we might sell it multiple times.

So, for instance, product ID 528, only exists one time in DimProduct, but we sell it multiple times. Make this relationship active. In DAX model, you can only have one active relationship between a pair of tables. Between A and B, you can only have one active relationship. And this relationship can only be on one set of columns. So, I can’t highlight four columns, for instance, and have a relationship like I can in SQL server. So, if you do need to have a multiple, a compound, key, then you would need to create a calculated column that combines it together and then use that. It’s this that I really want to talk about in this video. The cross filter direction. It is defaulting to single. And the reason it defaults to single, i.e. it flows from the DimProduct to the FactInternetSales, is because that is a lot more efficient in terms of the actual calculations of the visualisations. And most of the time, you won’t need it to go anything else, you won’t need it to say both. But let’s have a look at an example when you do.

We’ll look at two examples, one that works and one that doesn’t work.

So, let’s go into the report, and we’ll create a new visualisation with DimProduct category and DimProduct subcategory. So, I’m going to click on the English product name, and I will focus on this particular visualisation. Let’s go down all the way to FactInternetSales. And I’m going to ask for the total amount, so the sales amount, so click on that. And you can see it works. And the reason that it works is because the data is flowing the correct direction. We start off in category, and we’re flowing down, look at where the arrow’s going, towards the FactInternetSales, and then doing the calculations based off FactInternetSales. Of course, we didn’t need to flow all the way to the end, we could have done calculations based in subcategory or DimProduct. So, for instance, I could go into DimProduct, dim means dimension by the way, ask for the total weight, if that was a measure that was being used. So that’s not the total weight, that’s each individual weight. I’m just going to count it. And you can see it works. It’s not given me the actual weight; it’s just given me how many products essentially there are. But, again, the reason it works is because we are flowing down.

So, we start with anywhere, we could start with subcategory, but as long as we are flowing in the right direction, this will work. So rather than do a counterweight, what I’m going to do instead is account of the number of product IDs. So, if I just click on ProductKey, you can see it’s given me each individual product key, but if I click on this drop-down and go to Count, it’s now counting the number of product keys there are, 606 in total, of which 209 don’t have a category, 35 are accessories, 125 in bikes, and so on. Right, so let’s clear this visualisation and start again. So, we’re going to have a visualisation that starts off in FactInternetSales, which is all the way on the left.

So, let’s take something about FactInternetSales, and let’s take for instance the order date. So, I’m going to add the order date into the fields, I’ll just click on it, there we go. And then I’m going to go from FactInternetSales to DimProduct, and I’m going to get the product key. And here we can see, for instance, on the 1st July 2005, we sold product key 310, product key 336, product key 346. So, so far, so good. But what I’m going to do now is count the number of product keys, so there are three on the first day. I click on the drop-down, go to Count, and you will see that, actually, there are 606 on every single day, and there is a total of 606, that does not make sense. And it’s as if there is no connection between these two. And, in reality, there is a connection, it’s just that it’s going the wrong way. We are trying to go from product key, or from FactInternetSales, and do an analysis of DimProduct. And that is not allowed with the direction flowing the opposite way. So, how to get round this?

Well, there are two ways. The first way is to change this cross filter direction from single to both. And now if we go back, now we can see that on the 1st July 2005, there were three different product keys. However, this is a major thing. Changing the directional relationship to both, and now you can see the arrows going in both directions, can really have negative consequences for the performance of your model. So, if you find that there is… it’s going slowly, and you don’t know why, have a look and see whether there are any cross filter directions which are on the both side. And ideally, what you should do, is change your model accordingly. For instance, in this version, I could, instead of using product key in DimProduct, I could use product key in FactInternetSales, it is in fact a particular field. So, let’s change this back to single. Here we have the wrong analysis, but you can see very easily it’s the wrong analysis. And we’ll change that to product key within the FactInternetSales. And now you can that it is working. And you’ll also notice instead that the answer is a bit different. We had three product keys, but five over here in FactInternetSales.

So, what is happening is, if you have a look at the data, no, that doesn’t work, that just gets me literally where we are, we’re in a table already. If we go into the data tab for FactInternetSales, and we have a look at that one particular date, you will be able to see, when I go over to the product key, that there are five items that were ordered, but, of them, only three were unique. It’s the far left-hand side, so here we have 346 being ordered by three different people, and then one for 310, and one 336. DimProduct will give us an answer of three, because there are three products, three links, as it were, going from here, because it’s product 346, maybe, three times, but according to DimProduct, there’s only one product called 346. But five items were sold. This is an interesting example to show that if you have the multiple fields in multiple columns, multiple tables, be very careful about interpreting the answer. If you wanted to see, oh, aye, hang on, there were only three different items, then you can change the count from Count to Count Distinct, and that will give us the three answer. However, if you want to count how many actual items there were, then you can say there were five.

There is a second way of getting the answer from a related table, and that’s using a DAX function called cross filter, which is fairly advanced, but later on we’ll be looking at filter and value functions in this part of this course data flows from one table to another in a particular direction.

And you need to bear that in mind when you’re building your model, and only override it when you absolutely need to. An advantage of using filter functions is that it will only be necessary for that one particular column. So, if I was to change this to both, then it does it both directions for all of the columns, and it really can degrade performance enormously. So, design your model so that the data flows one particular direction, where possible.

Comments
* The most recent comment are at the top

Interesting posts

The Growing Demand for IT Certifications in the Fintech Industry

The fintech industry is experiencing an unprecedented boom, driven by the relentless pace of technological innovation and the increasing integration of financial services with digital platforms. As the lines between finance and technology blur, the need for highly skilled professionals who can navigate both worlds is greater than ever. One of the most effective ways… Read More »

CompTIA Security+ vs. CEH: Entry-Level Cybersecurity Certifications Compared

In today’s digital world, cybersecurity is no longer just a technical concern; it’s a critical business priority. With cyber threats evolving rapidly, organizations of all sizes are seeking skilled professionals to protect their digital assets. For those looking to break into the cybersecurity field, earning a certification is a great way to validate your skills… Read More »

The Evolving Role of ITIL: What’s New in ITIL 4 Managing Professional Transition Exam?

If you’ve been in the IT service management (ITSM) world for a while, you’ve probably heard of ITIL – the framework that’s been guiding IT professionals in delivering high-quality services for decades. The Information Technology Infrastructure Library (ITIL) has evolved significantly over the years, and its latest iteration, ITIL 4, marks a substantial shift in… Read More »

SASE and Zero Trust: How New Security Architectures are Shaping Cisco’s CyberOps Certification

As cybersecurity threats become increasingly sophisticated and pervasive, traditional security models are proving inadequate for today’s complex digital environments. To address these challenges, modern security frameworks such as SASE (Secure Access Service Edge) and Zero Trust are revolutionizing how organizations protect their networks and data. Recognizing the shift towards these advanced security architectures, Cisco has… Read More »

CompTIA’s CASP+ (CAS-004) Gets Tougher: What’s New in Advanced Security Practitioner Certification?

The cybersecurity landscape is constantly evolving, and with it, the certifications that validate the expertise of security professionals must adapt to address new challenges and technologies. CompTIA’s CASP+ (CompTIA Advanced Security Practitioner) certification has long been a hallmark of advanced knowledge in cybersecurity, distinguishing those who are capable of designing, implementing, and managing enterprise-level security… Read More »

Azure DevOps Engineer Expert Certification: What’s Changed in the New AZ-400 Exam Blueprint?

The cloud landscape is evolving at a breakneck pace, and with it, the certifications that validate an IT professional’s skills. One such certification is the Microsoft Certified: DevOps Engineer Expert, which is validated through the AZ-400 exam. This exam has undergone significant changes to reflect the latest trends, tools, and methodologies in the DevOps world.… Read More »

img