PL-300 – Section 31: Part 3 Level 9 – Other Modeling and DAX Topics for the Exam Part 2

May 17, 2023

239. Resolve Many-To-Many Relationships – Different Types of Granularity

In this video, we’re going to have a look at another reason why you might have a many-to-many relationship. And that is detailed data on both sides of the relationship.

So, if we go into our data and have a look at the other two spreadsheets, which are marked MtoM (Actual and Target), you can see what we've got. In the Target, we have a company which operates out of Western Europe, and it's anticipating the amount of sales it will get in three different countries, split by whether it is in-store purchases in a physical location, or whether it's internet or some other way, through the post. Maybe people phone in their orders. So, we've got that for England, France and Germany.

So, we've got two lines for each country. In the Actual, we have got the breakdown by store location: England-London, England-Birmingham, England-Manchester, so we've got three England bases. We don't know if they're physical stores or internet; they could be a combination of the two. In France we've got Paris, and in Italy we've got Milan and Rome. So it's Italy here, but we had Germany in the Target. So, we've got a mismatch of the countries: England, France and Germany on one side, and England, France and Italy on the other. So, let's load that data and have a look at what happens when we join the model. We have got these two tables, and we can relate them using a many-to-many relationship on the country, again with the Cross filter direction set to Both.

So, let's see what we've got now. If I create a table, I'll just make the text size a bit higher, but I can't actually do that until I've added some data. So, I'm going to add the country from the Target; now I can make this a bit bigger. So, we have got three countries in the Target, and we have the amount that they were expecting to get and the amount that they actually got. We have 15,000, 10,000 and 12,000, making 37,000. That's all fine. And then in the Actual, we have got 23,000 and 4,000, making 43,000. What's missing is Italy. Italy is not there: it's in the Actual, but it's not in the Target, and we have put the country from the Target table in the visual. If I was to de-select that and put in the country from the Actual, then we wouldn't have Germany. And it'd be even worse if I put in both countries: we wouldn't have either Germany or Italy, we'd just have England and France. So, you can see that doing it this way, although it looks a similar solution to when we did the bank accounts, we're actually missing data: data that exists in one table but doesn't exist in the other.

So, what can we do? Well, we need to create a third table. This is called a bridging table: it bridges the two, just imagine a bridge going from one table to another. What I'm going to do is combine these two tables together using an Append Query, so I'll append these queries as new. Now we have lots of additional columns which we don't want; we just want the country. So, I'm going to keep the country and remove all the other columns. And then I'm going to de-duplicate, that is Remove Rows, Remove Duplicates. Now the duplicates are removed, and we just have England, France, Italy and Germany. So, I'm going to call this my Bridge. Then Close and Apply, and go back into the model. We get rid of the relationship that's there, and you can see the computer has already put in another relationship. It's dotted, meaning it's been made inactive, but I want it to be active. The reason it was inactive is that we had a relationship from Actual to Target, which is sort of the same as going Actual-Bridge-Target, and we couldn't have both of those at once. Now that we've got rid of the other relationship, I can make this an active relationship.
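As a sketch, the same bridge could alternatively be built as a DAX calculated table instead of an Append Query. The table and column names here (Actual[Country], Target[Country]) are assumptions based on this example's data:

```dax
-- Hypothetical DAX alternative to the Append + Remove Duplicates approach;
-- assumes tables called Actual and Target, each with a Country column.
Bridge =
DISTINCT (
    UNION (
        DISTINCT ( Actual[Country] ),
        DISTINCT ( Target[Country] )
    )
)
```

Because UNION keeps duplicates, the outer DISTINCT removes them, giving one row per country; like the Power Query version, this regenerates automatically when the underlying data changes.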

So, now we have this Bridge going to Actual and going to Target. Let's go back into the report: it can't display the visual, it doesn't know what's happening, so I'm going to get rid of everything. Now, in terms of the countries, I'm going to put in the country from the Bridge. And here we can see the answer we really want: all of the countries are there, England, France, Germany and Italy, and we've got gaps for the Actual and the Target, but that's absolutely fine. We want the gaps, we want to see missing data, and, equally importantly, the totals now all add up. So, what I'll do now to help the end user is hide the country from Target and Actual. We've now only got one country field; we're not going to have three country fields when we don't need them. So, you can see this is another reason for having a many-to-many relationship: we have got different granularities of data in the two tables.

We can connect them with a many-to-many relationship, but the problem with that is we are then missing countries from one table, and therefore the best way is to have a bridge. And the great thing about how we've done this bridge is that we've not just typed in a bunch of countries, we've actually made it dynamic. So, if the data in Actual or Target changes, if we move into Norway for instance, then this bridge will automatically regenerate and add in Norway. So, this is a neat way to resolve many-to-many relationships: use a bridge containing the common factor, in this case country.

240. Improve Cardinality Levels Through Summarization and by Changing Data Type

In the past two videos, we've had a look at how to deal with many-to-many relationships: either you create a relationship with a cardinality of many-to-many, or you create a bridge that goes in between the two. But what other ways could there be?

Well, let’s have a look at the data itself. So, if we go into the actual data, we have got the country, the location and the actual. What if you didn’t need the location at all? Well, you could just summarise it.

Now, one way you could do that is here in Get and Transform, the Power Query Editor. We can use Group By: we're going to group this table by country and take the sum of the actual. So, this is my actual by country. There we have fewer rows to deal with, and country is now unique. So, if I go back into my data, everything gets refreshed. We no longer need the bridge because country, which I'm going to unhide, is now unique, and I can relate this to the target as a one-to-many.

Now, this would be perfect were it not for the fact that we don't actually have all of the countries in the actual that there are in the target. So, if I just delete that, start afresh, and put in the country, the actual and the target, we have a different outcome. This time we do have all of the countries, albeit with a blank for Germany, because Germany is not in the actual. So, this is a way of improving cardinality, though you do lose some of the existing data. But depending on what you want to drill down into, that might be perfect.

Now, let's add a new table, a target table. We do that by going to Table tools, New table, and I want this to be the target rolled up. So, here's my target. From this target table, I'm going to use the function SUMMARIZECOLUMNS, which creates a summary table. I want to group by the country, and then I don't want any more grouping; instead I want to add a new column. This is a fairly complicated formula with a few different types of syntax, and I'm going for this syntax here: I want to create a new column called total target, and that is going to be the sum of the target. So that gives us our target as a single summarised table.
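The rolled-up table described above would look something like this. The exact table and column names (Target[Country], Target[Target]) are assumptions from this example:

```dax
-- A sketch of the summary table; names are assumed from the example.
Target Rolled Up =
SUMMARIZECOLUMNS (
    Target[Country],                         -- group by country
    "Total Target", SUM ( Target[Target] )   -- new column: sum of the target
)
```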

So, just to look at that again: I'm summarising a table, and I'm grouping by the country. After all of the group-bys, and I might be grouping by lots of different things, all separated by commas, I'm introducing a new column, which happens to be the SUMX of the target. And there I can have my relationship between target and actual, which in this case it thinks is a one-to-one, and then I could create a visualisation based on that. One-to-one is right, but I'm always careful with one-to-ones, because we're still missing a particular country in one of them: whichever table you take as the principal, we're missing a country in the other. So, here we have the total target, and here we have the actual, and again we're missing something: we're missing Italy. So, there are two ways to do the summarisation: you can use Group By in the Power Query Editor, which cuts your table down, or you can create a new summarised table using SUMMARIZECOLUMNS.

Now, what other ways could there be? Well, you could change data types. For instance, let's say we had a primary key of AB123456. We could split that into two parts, AB and 123456, which would be helpful in business terms if the prefix actually means something, because this is one string, whereas we are now dividing it into a small string and a number. And if the number part can be used as our primary key, if the prefix is meaningless, if it's always AB, then we could change this column so that it becomes an integer.

An advantage of this is, firstly, suppose we had another table that just had 123456 as the primary key, or as the foreign key on the other side of the relationship. Well, then we can more easily relate like with like: if the first bit is meaningless, we could truncate the string and just have an integer-to-integer relationship.
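As an illustrative sketch, splitting a key like AB123456 could be done with calculated columns along these lines; the table name Orders and the column OrderKey are hypothetical:

```dax
-- Hypothetical calculated columns splitting a key such as "AB123456".
Key Prefix = LEFT ( Orders[OrderKey], 2 )
Key Number = VALUE ( MID ( Orders[OrderKey], 3, LEN ( Orders[OrderKey] ) - 2 ) )
```

Key Number is a whole number, so it can be related to an integer key in another table and stored far more compactly than the original string. (In practice the split could equally be done upstream in Power Query.)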

Secondly, it saves memory. If we go from a wide string to something which is very narrow, an integer just takes a few bytes, whereas a string could take a huge number of bytes, and that could actually increase performance: the smaller the data type, the better.

And then finally, this topic is about improving cardinality levels. What are the different levels? We've seen all of the relationship types, one-to-one, many-to-one, many-to-many, but what are the actual levels? Well, this is an unofficial list, but basically you've got three major types. First of all, you have high cardinality. This is where you have no duplication, so these are your primary keys.

These are your one side of a relationship. So, you're looking at things that highly specify that particular row: it could be an ID, or it could be a combination of columns. High cardinality means the values are not duplicated, or only a little duplicated; the very highest cardinality is where there are no duplicates at all. The very lowest cardinality is when you've only got a limited number of values. So, true/false: you've only got three values, true, false and unknown (or null, or nothing). Or the age of school children: that's only going to be within a narrow range, five to 18, or five to 16. And then for everything else, names for instance, or street addresses, you have your normal cardinality.

So, just imagine having something with low cardinality on one side or the other of a relationship: that's going to involve a huge number of rows. Let's say you've got a million rows in a table with low cardinality; you're probably going to have half a million rows for each value. Whereas with high cardinality, you're only going to have one row per unique value, and so when you do a relationship, it's one row to however many rows on the other side. But if it's low cardinality, it's many rows, maybe the entire table, and this might impact the methodology that the engine behind Power BI has to use to actually do your analysis.
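To get a feel for a column's cardinality, you can compare its distinct values to the row count. This measure is a sketch, assuming a table called Actual with a Country column:

```dax
-- Ratio close to 1 = high cardinality (near-unique values);
-- ratio close to 0 = low cardinality (few distinct values).
Country Cardinality Ratio =
DIVIDE (
    DISTINCTCOUNT ( Actual[Country] ),
    COUNTROWS ( Actual )
)
```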

In the next video, we're going to have a look at how to identify poorly performing measures and relationships (remember the cardinality), and whether your visuals are taking a long time.

241. Identify Poorly Performing Measures, Relationships, and Visuals

In this video, we’re going to have a look at how to identify poorly performing measures, relationships, and visuals. I’m going to start largely with the measures and visuals.

First of all, this doesn't necessarily include loading the data. If you go to Edit Queries, we had a section in part two of this course that looked at the Diagnostics, which shows each step when something is loaded and how long each of those steps takes. But here we're talking about measures, which we add in the model, and visuals, which we covered in part one of this course.

What is causing a problem? Well, if we go into View, there is the Performance analyzer pane. So, I'm just going to get rid of the Filters pane, minimise the others, and look at the Performance analyzer. See what it says: "Assess your report's performance and identify areas for improvement."

First of all, I'm going to create a blank page. I'm going to go into the Performance analyzer and Start recording, and now I interact with my report to see how well it performs. What I'm going to do is go into the page with the visualisations; that's why I started with a blank page. It'll give me a list of all of the different visualisations, in this case just the one, and if I expand it you can see that it is divided up into Visual display, and there may be other lines for DAX query and Other. Visual display is the time taken to actually render the visual, to put it onto the screen. DAX query, if that appears, is a request that goes to the tabular engine. And if I copy this query, go to Notepad and paste it in, you'll see the query that was sent. This is actually a fairly good way to learn fairly advanced DAX.

So, once you've got the material which is in this course, you can go even further and say, "Oh, that's how I can use ROLLUPADDISSUBTOTAL, for instance, and SUMMARIZECOLUMNS; this is how I can use them."
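For illustration only, a query copied from Performance Analyzer often has a shape roughly like the following; the real query Power BI generates is longer, and the table names here (Bridge, Actual, Target) are assumptions from this section's example:

```dax
-- A simplified sketch of a Performance Analyzer query, not the exact text
-- Power BI would generate.
EVALUATE
SUMMARIZECOLUMNS (
    ROLLUPADDISSUBTOTAL ( Bridge[Country], "IsGrandTotalRowTotal" ),
    "Actual", SUM ( Actual[Actual] ),
    "Target", SUM ( Target[Target] )
)
```

ROLLUPADDISSUBTOTAL adds a grand-total row, flagged by the extra true/false column, which is how a table visual gets its totals in a single query.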

Now, there is the Other category. Other is basically time spent waiting: bottlenecks and so forth. I would personally not focus too much on the Other, because there may be bottlenecks for many reasons. It could be something else running on your computer, one of the other visuals taking a long time, or the engine and its DAX queries taking a long time. Or it could be something else entirely: you may be getting data from a SQL Server database, and it might be the network, or it might be the SQL Server database itself; there could be a lot of things. So that's not necessarily helpful; you can think of it as time spent waiting in the queue, or for the system to finish other processing.

So, I'm going to refresh the visuals: just because it's performed badly once doesn't mean it's going to happen every time. And we can see in this refresh it took even longer: 5 milliseconds for the DAX query and 32 for Other. But by doing this you get an overall idea of what is causing the problem. If it's a DAX query, then maybe there is a simpler or better way to express your query. So, why not Google some of the keywords in combination with each other and see whether there are better, quicker ways to get exactly the same outcome. Visual display, well, unfortunately every visual has got a display cost.

Now, are you using visuals in the best way? For example, have you got a scatter chart with hundreds of thousands of points plotted, where maybe you could reduce duplication or change the granularity level, so it's at a different level of detail but still gives roughly the same outcome? Or maybe you've got too many visuals, maybe you've got all of the visualisations on one page and it's just a bit too much. You could, for instance, add a drillthrough target: you could have visualisations on other pages that feed off the central page, so you don't have to have everything on the one particular page. And then maybe there is a problem with your relationships. Here we've got relationships which are fairly straightforward, but maybe you've got multiple relationships going from one table to another, to another, to another. When you've got those going in all sorts of directions, that is called a snowflake schema. Maybe some of those tables can be merged into one bigger table, and that would be closer to a star schema, which is probably better in terms of relationships.

And another way of improving performance is to change the cardinality types if any of them are many-to-many. Why is this? Well, let's say you have got five rows in one table relating to five rows in another table as part of a many-to-many relationship. You've then got 25 combinations that the computer has got to generate and hold in its memory, and that will slow it down. Whereas if we have got this sort of Bridge, that is a more logical way of doing it: the computer has in its memory all of the countries, and the two tables aren't directly joined, they're joined to the Bridge. In the same vein, if you have a one-to-one, ask why you've got the data in two separate tables at all. Unless there are reasons, like security reasons, combine them, merge them together, so it's one less thing for the computer to have to do.

So, these are some ways of identifying poorly performing relationships: look at your relationships in the model. Have you got too many relationships? Can you reduce the number of tables and still get the meaningful analysis you want? And for measures and visuals, go to View, Performance analyzer, and Start recording.

242. End of Part 3

Well, congratulations. We’re now three quarters of the way through the course. And let’s have a look at what we’ve been doing.

So, we've been designing and developing a data model: defining the tables and their properties, any role-playing dimensions, cardinality, star schema versus snowflake schema, and common date tables. We then developed the data model, creating calculated tables and columns using DAX. With just a few items here, you can see that we have done an enormous amount of work on DAX, because it includes some decidedly non-trivial functions, such as time intelligence and CALCULATE.

Now, hopefully, you'll look at all of these items and say, "I feel pretty confident about being able to do a huge number of these items, if not all of them." Alright. We only have one part left, part four, and that's all about the Power BI service. I'll see you there.

