PL-300 – Section 29: Part 3 Level 7 – Filter and Value Functions
211. RELATED – Flatten Out a Parent-Child Hierarchy
Now, this section is all about the filter functions and these are the ones which have absolutely no comparison with Excel functions. Now, these first two sort of do because we previously had a look at vertical lookup tables or something that look like a vertical lookup table which was the LOOKUPVALUE function. But you may remember, it’s a bit complicated. The RELATED function, which we are looking in this particular video, is a lot simpler and the reason for that is it’s using the relationships that we’ve already got built in and in this relationship function, we are going against the arrows. So, we’re going from the many to the one so we’re flowing the other way to the way that the arrow suggests.
So, you remember, in an earlier video, right neat the start of this part, I said that there was a problem you flow again the arrow. Well, this is how to get round this problem. So, for instance, this FactInternetSales table is part of the many where the DimProduct is the one. So, what we can do using the RELATED function is to look up something going upstream. So in the FactInternetSales, we can look something up in the DimProduct, in the DimProduct subcategory and the DimProductCategory. So, let’s see how we can do that. So, I’m going to add a new column which is called ProductName.
Now, what will be great is if I could just put in, well, it’s in DimProduct and it’s the EnglishDescription. So, it’d be great if I could just say give me the EnglishDescription and that’s it. But I can’t quite do that. Instead, all I’ve got to do is surround it with the word RELATED. Is if I could just type DimProduct EnglishProductName but I can’t because if I try to do that, there’s no autocompletion that is sensible. So, what I need to do is surround it with the name RELATE, with the function RELATE. So, now I can put in EnglishProductName in the brackets and there I get my EnglishProductName so we’ve got Mountain Tyres, Tubes and then further down, we’ve got Bikes and so forth.
So, let’s have another new column. This is going to be for the subcategory and that is equal to RELATED and I’ll just put in subcategory and there we have the EnglishProductSubCategoryName and then another new column and this is going to be the Category. And again, just put in word RELATED and I’ll just start typing in the word Category and there we go. Isn’t that easy to do? And yet how powerful this could be if you wanted everything to be in one particular table.
Now, of course, when we’re doing visualisations, it’s not necessarily that important because we could get something from one table and drag something out from another table and there’s no problem with that. However, suppose, you wanted to hide tables. Suppose you wanted everything just to be in one table. Well, that is possible if you bring them in.
Now, you might lose a bit of flexibility when you do that because you can no longer use the other things in the tables, like for instance, the FrenchCategoryName or the SpanishCategoryName but it could be useful and it could be really useful when debugging a model or if you want to concatenate two items from separate tables. So, suppose, I’m going to have another column which is CategorySubcategory and that is equal to RELATED category, then an ampersand and a space and the RELATED Subcategory.
So again, very easy to do but now you might be able to see how powerful this could be. So RELATED. It allows you in the many table to go upstream against the arrows and pull something in from other tables.
212. Design a Data Model that Users a Star Schema
In this video, we’re going to look at how we can design a data model that uses a star schema as opposed to a snowflake schema. But what exactly are these? Well, let’s have a look at our tables.
We have got FactInternetSales going to DimProduct going to DimProductSubcategory going to DimProductCategory. So that’s quite a chain. We start off at a table, which I’m calling a fact table, where we’ve got all things that we can aggregate, so all of the facts. And then we have got these other tables, which I’m calling dimension tables. So, these are things you can model. So, you can model products, there’s a physical item of a product.
Now, you can model subcategories of product. Now, it isn’t a physical item but it is a group of items. And you can also model categories of products.
Now, why have we got these in separate tables? Well, when we are doing data entry, what we want to have is as few repetitions as possible, as few inconsistencies as possible, and this is called normalising the data. Now, you don’t have to worry about how you normalise data for the exam requirements, but let’s have a brief look at the advantages of doing so. Now, if you go into the FactInternetSales table, and have a look at ProductKey 214, we can see that there have been quite a few of these ordered.
Now, let’s have a look at DimProduct table for ProductKey 214, and we can see that it is for a Sports helmet, red and we have a column for the colour, a column for the price and other things. What I’ve not had to do with this FactInternetSales for all of these 2,230 rows is each time that we have ordered ProductKey 214, I’ve not actually had to say what it is. I’ve not had to say that it is a sports helmet, red. I’ve not had to say what the price is. All of these are in the DimProduct table. And that’s the advantage of having it in a separate table. Firstly, for each of these rows, I can’t get these things wrong. If I say it is ProductKey 214, then I don’t have to have in a separate column sports 100 helmet, red. I can’t make a typographical error and say that it is sports 100 helmet, blue or something like that. As long as I get the ProductKey right, everything else that is in here will follow.
Now, similarly, this product is in ProductSubcategoryKey 31. So, if I were to go to DimProductSubcategory and scroll down to number 31, I can see that it is a helmet and this is part of Category 4. So going to the Category 4, and you can see that this is categorised as accessories. I’ve not had to say any of this when I go to ProductKey 214. All I have to do is put in the ProductSubcategoryKey.
Now, if I were to do this, then it could lead to some problems. For example, firstly, I would be saying helmet quite a number of times. That could lead to typographical errors. I might accidentally say pseudonyms. So, I might say helmet or helmets on different rows. There could be a few reasons why if I had to type it into each individual row, it might not be exactly the same each time. So that’s why we are using these relationships between these tables. So, we’ve got a relationship linking these two tables, DimProduct and DimProductSubcategory. In this case, the column ProductSubcategoryKey is linking them together and that’s great when you’re entering data. I only have to enter data into one table and not repeatedly add the same information each time, which is in DimProduct or DimProductSubcategoryKey. The only thing I’ve had to add is ProductKey 214.
So, this is great when I’m entering data and this is an example of normalisation. The disadvantage of this, if I ask Power BI for a visualisation, which includes category and sales amount from the fact table, Power BI has to use each of these tables, Category, Subcategory, DimProduct, all the way down to FactInternetSales to get the information from both FactInternetSales and from Category. It takes time for the computer to calculate it, depending on how complex the model is and the number of rows. In our particular model, it’s not a big deal. Power BI will calculate it fairly quickly. However, suppose, these four tables were only part of my model. So suppose we have our DimProduct table. We have our fact table here, we have say DimGeography, we have other tables and each of these can have additional tables linking from them. You can see we could have an enormous number of relationships. Let’s say that I create a visualisation, which requires Power BI to go from the category table all the way down here through the geography table to let’s say a country table. That’s an awful lot of relationships to consider. Now, this is called a snowflake schema because if I remove the boxes, then the resulting structure can look a bit like a snowflake.
Now, note, I’m not saying it has to look like a snowflake. I’m not saying that it has to have essential table linking to exactly five other tables, which link to three other tables but that’s the basic idea. It spreads out. It can look like a snowflake. So how can we avoid this? Well, we saw in the previous video that we can use the RELATED function. So, we can formulaically add in our DimProduct table all of our descriptive columns from the other dimension tables. And we wouldn’t need to have to put in keys, for instance, just the names. So, instead of having three dimension tables, which contain information about the what we are measuring, we can use a related table to combine them just into one. Then we can hide the other two dimension tables. This would lead just to one FactInternet table, which has the answer to the question how many or when and just one, in our example, dimension table, which has everything related to it. So, the answer to the question what is it in words. So sports helmet, red instead of ProductKey 214.
Now, quite often you wouldn’t just have a fact table linking to just one-dimension table.
Now, quite often, you wouldn’t have a fact table linking to just one-dimension table. So, here we’ve got a fact table linking to five different tables. So, we’ve got a product table, sales territory, currency, customer and a date table. All of these will be linked using relationships that you can set up in Power BI.
Now, this means that your dimension tables, because there are fewer, will have a lot of columns, and that there’ll be a lot of repetition in the rows. For example, taking our DimProduct table example, then when these tables are combined, there’s going to be multiple iterations of the same subcategory, multiple iterations of the same category in this DimProduct table. But this table is not likely to have a huge number of rows. It’s just going to have one row per product and that’s very manageable for Power BI, even with all of the extra space required for the repetition. Whereas our FactInternetSales is going to have a row for every product, for every order.
Now, that’s quite common with face tables that they’re a lot larger than the dimension tables. So whereas in our snowflake schema we’ve got tables linking to tables linking to tables, in our star schema, we have a central fact table linking to other tables, which show what we are measuring, the details of this. Because there are fewer relationships and fewer tables that Power BI needs to work through, this can greatly improve performance in more complex models. Though at the cost of repetition in your dimension tables.
Why’s it called a star schema? Well, it could be that if you have a table in the middle going to tables going from it that the resulting schema could look like a star. Again, this is a star that’s not limited to just five spokes. You could have 10 dimension tables; you could have 20. However, many is needed to make a model that you can use. So to design a data model that uses a star schema, you would have a central fact table that contains the things that could be aggregated, measured, in this case, transactions, and then potentially multiple tables linked directly off this fact table, which contains columns like descriptions, colours, categories, dates, customer information. So, there should be one table per type of description. One table for products, one table for customers, one table for a lookup data table. So, in order to do this, in a dimension table, you would add an extra column, which takes the information from other snowflake tables, which you can then hide. So, let’s say I wanted the EnglishProductSubcategoryName. So, I would use the RELATED function and put within it the EnglishProductSubcategoryName with the name of the table.
So, you can see helmets repeated in each of these rows. Once you’ve done that for all of the columns that you need, so you don’t need all of these key columns, which link the tables together, you just needed the other columns, then you can either hide the individual fields or once you’ve done the entire table, you can hide the individual tables. So that just leaves us with the fact and the dimension tables that we need.
So, this is how you can design a data model that uses a star schema.
Interesting posts
The Growing Demand for IT Certifications in the Fintech Industry
The fintech industry is experiencing an unprecedented boom, driven by the relentless pace of technological innovation and the increasing integration of financial services with digital platforms. As the lines between finance and technology blur, the need for highly skilled professionals who can navigate both worlds is greater than ever. One of the most effective ways… Read More »
CompTIA Security+ vs. CEH: Entry-Level Cybersecurity Certifications Compared
In today’s digital world, cybersecurity is no longer just a technical concern; it’s a critical business priority. With cyber threats evolving rapidly, organizations of all sizes are seeking skilled professionals to protect their digital assets. For those looking to break into the cybersecurity field, earning a certification is a great way to validate your skills… Read More »
The Evolving Role of ITIL: What’s New in ITIL 4 Managing Professional Transition Exam?
If you’ve been in the IT service management (ITSM) world for a while, you’ve probably heard of ITIL – the framework that’s been guiding IT professionals in delivering high-quality services for decades. The Information Technology Infrastructure Library (ITIL) has evolved significantly over the years, and its latest iteration, ITIL 4, marks a substantial shift in… Read More »
SASE and Zero Trust: How New Security Architectures are Shaping Cisco’s CyberOps Certification
As cybersecurity threats become increasingly sophisticated and pervasive, traditional security models are proving inadequate for today’s complex digital environments. To address these challenges, modern security frameworks such as SASE (Secure Access Service Edge) and Zero Trust are revolutionizing how organizations protect their networks and data. Recognizing the shift towards these advanced security architectures, Cisco has… Read More »
CompTIA’s CASP+ (CAS-004) Gets Tougher: What’s New in Advanced Security Practitioner Certification?
The cybersecurity landscape is constantly evolving, and with it, the certifications that validate the expertise of security professionals must adapt to address new challenges and technologies. CompTIA’s CASP+ (CompTIA Advanced Security Practitioner) certification has long been a hallmark of advanced knowledge in cybersecurity, distinguishing those who are capable of designing, implementing, and managing enterprise-level security… Read More »
Azure DevOps Engineer Expert Certification: What’s Changed in the New AZ-400 Exam Blueprint?
The cloud landscape is evolving at a breakneck pace, and with it, the certifications that validate an IT professional’s skills. One such certification is the Microsoft Certified: DevOps Engineer Expert, which is validated through the AZ-400 exam. This exam has undergone significant changes to reflect the latest trends, tools, and methodologies in the DevOps world.… Read More »