PL-300 – Section 25: Part 3 Level 3 – Statistical Functions
188. Measures – an Introduction, with Standard Aggregations including Countblank
In this part of the course, what we’re going to do is have a look at not new columns, though, we will be using what we’ve learnt in the previous session in the future part of this course, but also new measures.
Now, we’ve used measures quite a lot in part one of this course. Whenever you drag anything through to visualisation, it’s quite often if it’s numerical going to be a measure. So previously, we had the sales territory and sales amount. And we are summing it and that is a measure. And measure is something that you can sum, you can average, you can mean, you can max, it also happens to be something that you can count.
So, you can also count things that aren’t measures for instance. I can count the colours. I can count how many distinct colours there are in the rainbow. So, six or seven of them. But there are some limitations as to what we can do with measures and what we’re going to do in this video, is have a look at the standard measures plus one or two more as they are represented in the DAX language. So, here are the standard functions that you probably are very familiar with: sum, average, count, max and mean. Sum adds together than numbers. Average is an arithmetical mean of numbers. So, it adds all the numbers together and divide by the total number of numbers. Count counts the number of numbers. Max gives the largest numeric value and mean gives the smallest numeric value.
However, there are also these other functions with an A after it. And what these do is not just use numbers, but also logical values. And on occasions of a values count A, for instance, counts everything this is not empty. So, it could be text and date. But in reality, what we’ve got in Power BI is a very tight version. we can only have one a particular type of field, contents in a particular column, we can’t mix up numbers and strings as we saw in the version, in the previous section. So, a lot of this is actually just going to repeat what we’ve got over here
But let’s find one that doesn’t repeat what we’ve got as the standards. And that is count blank. Count blank counts the number of blank cells in a column. And in our previous version, in our previous section, we had quantity 4.99, which calculates… Well, in this case anything that’s under five or over 10 are equal to seven we expanded it but this includes a lot of blanks.
So, let’s have a look at the quantity 4.99 and count what we’ve got. So, if I go to count you’ll see that we have got a total of 60,000 rows which are not blank in sales amount. And that’s because sales amount, if I scroll down to sales amount, always has a figure, when somebody is placing an order. It always has value. But we also have this quantity 4.99, which doesn’t always contain a value. So, let’s count quantity 4.99. And you can see there are 52,000 values which are numeric.
But how many values are not numeric? Well, this is when we use the count blank function. So, let’s create a new measure.
Now, you can create new measures based on sum and averages and counts and so forth. But how user friendly is that really going to be? Well, maybe, a little user friendly, you might want to set half a value for total sales amount. But quite frankly, it just adds to the confusion adds to number field you got unless you’ve done hiding sales amount. So, let’s concentrate on the newest stuff. So, I want to use count blank. And I’m going to call this quantity blank or quantity 4.99 blank equals and it is count blank. And you can see the computer auto completing and which column are using, we’re using the quantity 4.99.
So, this is how we create a new measure. We put in the measure name and an equal sign and then the formula. So, we check that we commit it. And nothing happens. I can’t see it anyway here in my fields, but it is actually there. It’s just that we’re in the fact internet sales, or I’m looking at the fact internet sales, but it’s actually added it into the dim product. So, why is that?
Well, maybe, I had dim product highlighted at one point, I don’t know. What I do know is it’s definitely in the wrong place. So, a better way of creating a new measure is not to click on this new measure right up there. But to actually go into the relevant table, in this case, the fact internet sales, which is where I wanted to be, and go to the dot-dot-dot (…) and go to new measure. So, that’s my measure which I’ve previously done. Click check to commit it. And there it is now in the right place.
Another way of making sure it goes in the right table is by doing it in the data. Because there you only have one table highlighted at once. And then if you click new measure, we know which table you’re talking about.
Now, notice difference in the icons, we have here a calculator, as opposed to something that looks more like a spreadsheet or column. Also, notice that when I go into the data, it is not there as a separate column. Because it does not operate as a column it operates as a summation of a column. So, now if I move this into the values, you can now see that the number of blank rows is 7,403. And when you add these two together, you get the total number. Now, because it’s already a measure, you can’t change the summation. So like, you can change the summation here, the aggregation, you can’t do that in quantity 4.99 blank because it’s already a formula.
Now, as I say you couldn’t use sum or average on any of these existing ones, any of these existing fields count columns, but it just adds to the amount of clutter that there might be. And also reduces the flexibility because suppose I was putting in order quantity. I would be forcing it to sum order quantity I can’t count it, I can’t average it, I can’t mean it, I can’t max it.
Of course, if that’s what you want, great. If you want to restrict the user in just doing a summation, then you can create a new measure. And this would be total order quantity equals sum of order quantity.
Now, I just want to point out one small of a difference. You remember when we were doing new column and I started typing in a field, it started to auto complete it. So, if I’ve tried to put in order quantity into the field, into the calculation, it’s going uhm… Dan what you talking about? Because you have to use a formula first, you can’t just say order quantity plus one or something like that. So, we have to go sum of order quantity. So, there is a new field. And then if you so choose, you can then hide order the quantity so that it is not in display for the end user.
Now, we’ll be using sum in the next video and finding out some of the limitations of sum but discovering a whole new set of formulas to get round this limitation.
189. Aggregation of calculations
In the previous video, we used the sum function, to sum the order quantity. And what we’re going to do is just get rid of everything we’ve got apart from sales territory, and add in the total quantity order. And also adding that the original one, which defaults to sum, just so you can see we got it working.
Now, suppose, I didn’t want this particular value this particular measure to be the sum of the order quantity itself. Suppose, instead, I wanted to have a bit of a goal. So, every time somebody orders something, it could be one it could be several of them. I want to goal of them ordering another one. So, where it says one, I want this to be two.
Now, we did something similar in the quantity 4.99 plus one. So, we said, where a certain criterion happens, then give me one on top. So, for every single row, I want an additional item and additional one quantity.
So, let’s change this total order quantity from sum of order quantity to sum of order quantity plus one. And we can see that there is a problem, the sum function only accepts a column reference as an argument. So, in other words, we can’t just put in an expression, is only a single column. So, we can’t do that with sum. So, what we could do, of course, is use a helper calm as we’ve done previously. But a better way is to use an extended form of these aggregation and these extended functions have an X at the end. So instead of sum, it’s sum X average X, count X, count all X, which is count AXE, max X. So max X has two Xs at the end. Median X and mean X. And each of these, it takes two arguments, two expressions that we have to put in the first is the table. And then it goes down, in this case in territory of the table, and does a calculation per row.
So, let’s see how this works. So, instead of sum, we putting sum X. So, we need at the table with the table is going to be the fact internet sales.
Now, we can actually do a calculation to bring a calculated table and we’ll do that much later. So, you will see we don’t have to actually put the entirety of a table in. But we will do so just in this particular instance, so the table is the fact internet sales, and then the expression where we want the expression of the order quantity plus one. So, this is our standalone, what we want to do, but we are referencing this for every single role within this table. S,o no squiggly underlines, I will just change so that it is our goal. And you can see that we have in fact doubled the values because each order quantity was one, it’s now two.
Now, this is similar in Excel to array formulas. So, if I put in the values one, two, three, four, we can just sum that up very easily.
Now, if I wanted each of these values to have one first I could use a helper function. So, here we have two, three, four, five. And again, sum that up, make 14. But what I can’t do in Excel is say, have all of these values and add one each time. The computer goes, I don’t know what you’re talking about. But we can do that in Excel using an array formula, where it takes each of these as a separate value, and does something to it in this case add one before passing it to the sum. And to do this in Excel, instead of pressing Enter or Return, you press Ctrl+Shift and Enter. And the computer and these brackets you don’t do that yourself. But it gives exactly the same result without the need of this helper column. And that’s what these extended functions do.
So, the X pertains to an extension of the original function where we have a table, and then an expression, which is done for every single row within the table. So, you could do a new column, and then you’ve got the possibility of doing a sum on that column. But it does take up additional memory, because the computer has to maintain for each row the answer to this. Or you can use one of these X functions, these extended aggregations.
It’s a lot simpler in terms of the amount of work that the computer has to do, and it takes up a lot less memory inside the computer. So, I would prefer using these X functions rather than a helper column. Unless I had a specific reason for using the helper column for instance, I wanted to check the answer row by row just to see whether it actually makes sense. So, these are the extended functions.
190. Other Statistical Functions
Now, a lot of the other statistical functions in DAX are fairly advanced and I don’t want to overwhelm you with this number of function,s especially since some of them are to do with statistics or the inverse of the left-tailed probability of the chi-squared distribution, but don’t forget if any of these look of particular interest, and we do have hyperlinks in the spreadsheet that will take you to the official Microsoft documentation, and it’ll show you things like the full syntax and how you can use it.
So, ADDCOLUMNS for instance, can be quite useful later on when you’re manipulating tables in memory, and you want to add a calculated column.
COUNTROWS, counts the number of rows in the specified column or in a table expression. So, you can see a lot of these are about manipulating tables in memory.
DISTINCTCOUNT, that counts the number of different cells in a column, so for instance, suppose we had a column with a lot of ones, lot of twos, lot of threes. Well, DISTINCTCOUNT would then give you the number three being the number of different types, different results. However, you can do that in Power BI itself, so it’s more useful as a part of another function I think.
PERCENTILES, so this can be useful if you need to know say what the 50th percentile of a particular range is, and you’ll notice some of these are PERCENTILEX functions, so these are probably more useful say if you wanted to find out the most common PERCENTILE.
RANKX, again an extended function, more useful in measures, but very complicated as you can see from the number of arguments that it takes.
ROW Function, not what you might think. It doesn’t give you the row number. It returns tables with a single row, and as you can see, probably quite advanced.
SAMPLE Function gives you a sample number of rows. It’s similar to TOP N except TOP N gives you the top say 10 number of rows, whereas as sample will give you a random sample.
Now, we’ve also got some trigonometric functions like SIN or sine and SINH, but you can see the majority of these functions are fairly advanced, so if you do want to find out more about them, please, use the hyperlink.
In ending this particular part of the section, I do want to talk about one more function and that is the RANK.EQ function. So, this returns the ranking of a number in a list of numbers. So, let’s say we want to find out in this SalesAmount, where does 4.99 fit into everything? Well, we can introduce a new calculated column, is it top, is it bottom? So, here we have the SalesAmountRank equals RANK.EQ. EQ means equal, so when more than one value has the same rank, the top rank will be returned as opposed to the average rank. So, what value are we testing?
Well, we want to test the SalesAmount. What is the column name that we are testing that particular value in? Well, it also happens to be the SalesAmount. So, the fact that these first two are identical, don’t treat them as actually referring to the same thing. We’re looking for the SalesAmount on this particular row, and we’re looking at the SalesAmount in the entirety of the column, and then the order, is it ascending or descending? So, we’re going to say descending, so that the biggest amount gives a number one. So, if I get that, you’ll see that all these four pounds 99 amounts are ranked 46,000 and first whereas if we look at the very top amount, we will see that that is for 3,578 pound 27.
So, RANK.EQ, as I say probably one of the more useful functions that we haven’t looked at in the statistical functions. It gives you the ranking of a number in a list of number, so the first one is the value in that particular row, and then the second argument is the column name and quite often they will be the same.
And finally, just one word about simplification. We are in the FactInternetSales table. So, we have got a function here, which refers to the tab FactInternetSales and then the column SalesAmount, but we’re already in the FactInternetSales. So, the computer gives this by default, but if you prefer to just simplify the formula, if you’re in a table and you’re referring to the same tables, you can get rid of the reference to a table. So, if you think that looks simpler, then feel free to omit the table reference.
Now, personally, I generally keep the table reference unless it’s really getting a very complicated formula or it’s fairly obvious what I’m referring to. So, the choice is yours. If you’re in a table and you’re referring to another column in that table, you can get rid of the table name.
So, statistical functions, there’s quite a lot of them. They’re fairly advanced. The remaining one that we’re looking at, but if you want to know more about any particular function then just click on the hyperlink and it will take you to the documentation.
Interesting posts
The Growing Demand for IT Certifications in the Fintech Industry
The fintech industry is experiencing an unprecedented boom, driven by the relentless pace of technological innovation and the increasing integration of financial services with digital platforms. As the lines between finance and technology blur, the need for highly skilled professionals who can navigate both worlds is greater than ever. One of the most effective ways… Read More »
CompTIA Security+ vs. CEH: Entry-Level Cybersecurity Certifications Compared
In today’s digital world, cybersecurity is no longer just a technical concern; it’s a critical business priority. With cyber threats evolving rapidly, organizations of all sizes are seeking skilled professionals to protect their digital assets. For those looking to break into the cybersecurity field, earning a certification is a great way to validate your skills… Read More »
The Evolving Role of ITIL: What’s New in ITIL 4 Managing Professional Transition Exam?
If you’ve been in the IT service management (ITSM) world for a while, you’ve probably heard of ITIL – the framework that’s been guiding IT professionals in delivering high-quality services for decades. The Information Technology Infrastructure Library (ITIL) has evolved significantly over the years, and its latest iteration, ITIL 4, marks a substantial shift in… Read More »
SASE and Zero Trust: How New Security Architectures are Shaping Cisco’s CyberOps Certification
As cybersecurity threats become increasingly sophisticated and pervasive, traditional security models are proving inadequate for today’s complex digital environments. To address these challenges, modern security frameworks such as SASE (Secure Access Service Edge) and Zero Trust are revolutionizing how organizations protect their networks and data. Recognizing the shift towards these advanced security architectures, Cisco has… Read More »
CompTIA’s CASP+ (CAS-004) Gets Tougher: What’s New in Advanced Security Practitioner Certification?
The cybersecurity landscape is constantly evolving, and with it, the certifications that validate the expertise of security professionals must adapt to address new challenges and technologies. CompTIA’s CASP+ (CompTIA Advanced Security Practitioner) certification has long been a hallmark of advanced knowledge in cybersecurity, distinguishing those who are capable of designing, implementing, and managing enterprise-level security… Read More »
Azure DevOps Engineer Expert Certification: What’s Changed in the New AZ-400 Exam Blueprint?
The cloud landscape is evolving at a breakneck pace, and with it, the certifications that validate an IT professional’s skills. One such certification is the Microsoft Certified: DevOps Engineer Expert, which is validated through the AZ-400 exam. This exam has undergone significant changes to reflect the latest trends, tools, and methodologies in the DevOps world.… Read More »