PL-300 – Section 16: Part 2 Level 4 – Transform – Text and Numbers Part 2
129. Transform/Add Column – Text – Extract
Now, in this video, we’re going to have a look at the Extract functions.
Now, there may be a bit of similarity with Split Column, because we are able to split a column by delimiter. So, if we take this region, for example, and split this column by the delimiter of a space bar at each occurrence, you can see that we got East of England, for instance, in three separate columns. We could also say, well, we don’t want to split all three, we just want the first instance. So, we want the left-hand most instance, in other words, the first word. East, and then of England.
Now, what we are able to do in Extract with delimiters is have text before a particular delimiter. So, this doesn’t split it into two or more columns, this will, in the case of Transform, overwrite your existing column. In the case of Add Column, it would create a new column with this information. So suppose I wanted the first word, and I didn’t want to have to split by delimiter, and then delete this second column, which is, of course, perfectly possible to do. Instead, I could just extract all of the text before the delimiter. So, I can say, “I want all the text before the space mark.”
Now, you could say, “I want to run from the end,” so in other words, instead of looking for the first space, I want to look for the last space. So, if I just do this first one first, so you can see, now we’ve just got the first word. If I go into the settings, I can change that, so we’re still looking for space, but we’re now looking from the end of the input, and so that is giving me everything but the last word. So, this one here, for instance, is just one word, England. Everything but the last word is a blank.
Now, I’m personally not too keen on blanks. What I would do then is replace the values, replace a blank with a null. Or you could say, “Well, that’s fine giving me the first word, “but what I actually want is the second word.” And so, we go into the advanced options and say we’re going to skip one space. So, this will actually give me the first and second words. So, Inner London, North East, East Midlands, and then I could have another extract, which says I want all the text after the first space. So, that would give me then the second word in total. So, we have East, West, London, that was Inner London or Outer London, and east of England becomes of. So, what we’ve done is we’ve taken the first two words and then taken the second word of that.
Now, there is another way of doing that, and that is to say, I want to extract text between delimiters, so between a space and a space. And again, you can say where you want to start and stop. So, maybe, you don’t want to start at the first space, you want to start at the second space, in which case, that would be a number one. So with this, if I’m looking at text from a space to a space, that again gets me the second word.
Now, Extract can also give you things that you are probably quite familiar with in Excel. We’ve got length, for instance. So that just gives me the length of the text, so Text.length. Or you can say I want the first few characters. So, I could say I want the first three characters, how many starting characters are you going to keep? So East of England becomes, EAS, Inner London becomes INN, and so forth. And that’s Text.start. So, we’re not talking about left and right and mid as we’re doing in Excel; all of these start with Text, capital T, and then a dot, and here we have the start. So, you won’t be too surprised to learn what the formula is for the last few characters, it’s Text.end. There is, however, one potentially surprising thing.
Let’s go for Extract Range, and, let’s say I want from the third character onwards, for four characters, okay? So, I’m going to say the starting index is three and number of characters four. So, if you have a look at the top example, East Midlands, what I want is the third character, which is here, and then I want four characters. So that should get me ST space M. Let’s see whether that’s correct. Click OK. And you can see it doesn’t give me ST space M, it gives me T space MI. And the reason for that is because of this one. We are expecting in Excel that we have one, two, three, four, five, six, seven, eight, nine, and so forth. However, that’s not the case in Power BI. The first character is not index number one. Because Power BI, like other things like C#, and other programming languages are what’s called a zero-based index. And so the first character is not one, but zero. And therefore, if I wanted the third character, I would need to go for index number two. So, to get the third character, this text middle, three comma four, we need to change that to two comma four. So, now we have got, starting from the third character, and going forward for four characters by using two, four. So, you don’t have to worry about that with Text.start or Text.end, you just have to worry about that for Text.middle. So, this is a list of commands like len, left, right, and mid, which are translated as Text.length, Text.start, Text.end, and Text.middle. And it’s only the Text.middle that you have to remember to deduct one from your starting character.
130. Transform/Add Column – Text – Parse
In this video we’re going to have a look at parse. XML Parse, which I’ll just remind you is in the Transform and in the Add Column. Generally you’d probably want to transform them. But first of all, what are XML and JSON, J-S-O-N? Well, they are both things, largely used on the internet, so if I start with XML, which is the older one of the two, that stands for Extendable Markup Language. And I’ll show you an example of an XML file that I’ve got in, Notepad. And this relates to a shopping trip, and I’m not going to go into all of the details of the XML but suffice to say that, it is a fairly understandable, yes it’s got a very strict syntax but this course isn’t about building XML or JSON files it’s about understanding or how to get the data from, XML or JSON files. So, I had a shopping trip, the shopper name was me, the weather was nice, and in fact I had two shopping trips. The first one I got bananas for five dollars, apples for four and cherries for three. And then the second time I got emeralds, diamonds, and furniture, I do my shopping in alphabetical order roughly. And you’ll notice there’s no actual cost for any of those, it’s null. So, this is a perfectly well formed XML file. Now, you don’t just have to have one XML.
Now, I’m going to happen to use just one example but we could have example two and example three and so forth further down. So, all of this is contained in just one cell in my Excel.
Alternatively, you might be importing it from text documents, or down from the web directly, there might be lots of things you’ll be doing with these XML files if you have cause to use them. So, what I’m going to do now is load that, XML file and we’ll see how it’s presented. So click okay. And we have a new query, and as I say, all of this is in the one field, one row, one row, so it’s literally is one piece of information.
Now, so far it is a text file and it’s very difficult for me to drill down, for instance I can’t tell you the cost of four apples, or apples being four dollars. I simply can’t tell you that. So, what I can do is I can drill down. And I do this by going to Transform, Parse, XML. And my XML is now changed into a table.
Now, we’ve previously had a look, at how to get tables out. We go to Transform and Structured Column, Expand. Or alternatively we can press this little button which also does the same job. So, whatever you do, click on that, and our table then will disappear in favour of, the actual parts of it. So, if we just have a look at our example again, we’ll see that we have a shopping trip and, or the two shopping trips we’ve got, and we have the shopping name and the weather. So, let’s expand and we can use the original column name as the prefix, I’m not going to, I just want to see the name shopping trip and so forth. And shopping trip is still a table because you can see there are two shopping trips. So, I’ll expand that further, and we can see we’ve got an item and the shopping trip ID, so L1 and L2, and then the actual contents of it as well.
So, let’s look, again I’m not going to use the original column name, as a prefix, so here we have shopping trip L1 and shopping trip L2, so already our one row has expanded to two rows because there were two shopping trips. So, just to remind you, we had one row with item number one, it contained two shopping trips, now we have two rows, both with an item number one. So now I can expand each of these shopping trips and in fact if I wanted to click on the table, and temporarily see what’s underneath, I can do, so you can see bananas, apples, and cherries. So, I’ll just undo that, or I could use the arrow keys, and you can see it at the bottom. So, now if I expand this further, you can see, element and attribute, so the element being what it is, I got the bananas, apples, cherries, the attribute being, the thing which is within these brackets, within these less than and greater than, but you can see it’s the cost. Really you don’t need to focus too much on what elements are, what attributes are, all you need to know is you can keep expanding, and expanding, and expanding, until you get to the actual data. So now we’ve got six rows of data, which is completely in line with what we’ve got in our XML file, and you can see bananas cost five dollars, apples cost four, cherries cost three, and emeralds, diamonds, and furniture were done in shopping trip L2 with a null cost, an unknown cost.
Now, it’s at this stage, that really all we’ve done is clicked expand, expand, expand, expand. That’s all we’ve done to get from here, apart from going to transform XML, and we’ve just clicked expand, and expand, and expand, and expand, but it’s at here that you can be grateful that this is a macro, this is a series of instructions for the computer because when the XML changes, assuming that it’s in the same structure, then you don’t have to do any of this, the computer’s done it for you. So, you load the data, you refresh the data, and the analysis changes automatically.
So, imagine this was not one item, imagine this was a hundred items, a hundred rows initially right at the beginning. Well, it’s exactly the same principle, all I would do is exactly what I’ve done here, I would transform, I would, parse the XML and then I would click on the, expand, expand, expand. Either in the Structured Column or to be honest, I’ll just click this little icon here. So that is XML. There is, however, another, slightly newer version. And that’s called JSON.
Now, JSON stands for JavaScript Object Notification. And it’s used quite often in webpages. And you can see I can describe the same sort of thing, it’s a different format, we’ll use these little brackets, these little squiggly brackets, but it’s still all understandable I think. So, if I open up a JSON document. Again, the computer has it all in one cell in Excel. So, if I go to, JSON, they’re all in one cell. And again, what I can do, is just transform it, I will parse it, so that says to the computer, this is JSON, read it as such. So, it’s a bit like changing the data type to JSON except there is no JSON data type that the computer knows, and now you can see that it is a record. So, it’s contained, the person’s name, in this case, my name. Because that was the first thing in the JSON, and then it contains another record within it, the shopping trip. So, if I expand the record, you can see the shopping trip, there is one shopping trip, I’ll expand this. Within the shopping trip, there are items. Each of those items is, part of a record, within a list. So, I need to expand the list, and that will expand it onto new rows, and then expand the records. So list, essentially going down, adding more rows, records, adding more columns.
So, now you can see, just by clicking the same buttons as before, the computer has understood, now, because I’ve told it it’s a JSON document, that we have bananas at five dollars, apples at four dollars, cherries at three dollars.
Now, I just want to go back, a step here, to the list. You’ll notice it said expand to new rows, but the alternative is extract values. So, if I extract values with a comma between them, we can see, that we get an error because it’s a bit too complicated for the computer to do but if it wasn’t containing records, it was just containing numbers, then you could see, or fruit, it would say apples, comma, bananas, comma, cherries. But generally what I like to do anyway, is expand to new records, and then expand further, and then I’ve got everything back where I want. I want each column to represent a different unique thing. So, this is the person who went shopping, this is their shopping number ID, this is what they bought, this is how much they cost. Whereas if you were to, concatenate them, into, a single value, into a single piece of information, then I wouldn’t have that, I wouldn’t have apples on one row, column, cherries on another row, bananas another, I would have apples, comma, bananas, commas, cherries. Not usually what I want. So, if you get, some JSON or some XML it’s very easy for you to be able to use them in Power Query, Power BI, all you’ve got to do, is import it, and then parse it, with JSON or XML.
131. Transform/Add Column – Number Column – Statistics and Standard
In this video, we’re going to have a look at the Transform or Add Column, number functions. And there aren’t too many surprises here.
So, first of all ,we have Statistics. So Sum, Min, Max, Median, Average, Standard Deviation, Count Values and Count Distinct Values. So, what that will do is transform what you’ve got into just that, so here we have the sum of the AveragePrice.
Now, the formulas that they use is not that surprising, it’s List.Sum, Min, Max, Median, Average, StandardDeviation, NonNullCount.
Now, Count Distinct Values will look at a particular column and you can see this can actually work for text columns as well. And see how many values there are, which are unique. That’s to say, they can repeat, but ignore all the repetitions. So, in this case the answer will be 12 because there are 12 regions in our data.
Now, going to the Standard we have Add, Multiply, Subtract, Divide, Integer-Divide. So, what these first four do is they change the column, in this case, or add an extra column, which shows what happens if you divide a particular number or multiply a particular number. So, let’s say this AveragePrice is in Pound sterling and we want to multiply it by 1.3 let’s say to get an answer which is in dollars. Well, I could add a column, multiply and it says “what do you want to multiply by?” Well, in this case 1.3, click OK. And there we have our new price. But suppose we had two columns so here we have the price in dollars and the price in pounds for instance, and we wanted to divide each of them. So, I want it to go “what is dollars divided by pounds?” And this is something that you can do as well from here. So again, Standard > Divide, but instead of dividing by 1.3 you want to divide by pounds or divide by another column. And that’s where we click on this ABC123 which is a fixed figure, like 1.3 or hello or something like that. And instead I will say “use a column”.
And I want to use the column pounds. It’s only giving me the columns that make sense, I can’t divide dollars by Region. 59,000 divided by East Midlands doesn’t make sense. So, when I click OK, then we get the answer, unsurprisingly of 1.3, with a slight rounding error in some cases. And you would probably not be too surprised to learn the sign is exactly the same as it is in Excel, plus, minus, multiply, divide. Divide Integer, that’s the same as the Divide except it gives a whole number. So, for instance if I was to divide these dollars by two let’s say I wanted to get it into another currency, but I just wanted a whole figure in the end I don’t want all these decimal points then that’s what integer division can do. And also notice, we’re not using the divide sign this time, we’re using Number.IntegerDivide.
Modulo is the next one. And what Modulo does is it takes the remainder. So, let’s say I wanted to divide this by 10, so no problem I can divide that by 10. Or, maybe, a thousand. Let’s say I wanted this figure to be expressed in thousands, that’s fine, but I actually want it to be in a whole number. Well, I’m going to change that from a Division to a Division Integer. So, let’s divide that again or I could obviously just change the formula. So, that gets me the number of thousands. And to be honest I wouldn’t do the renaming that way, I would actually just edit the formula. That just showed that a lot of stuff can be done with just using the GUI, with using the interface that we’ve got. But then I want the remainder after divided by thousands. So that’s the Modulo.
So, I want to find the remainder and again you can use another column if you want. And so this number is 59,000 plus 208.5. These last two functions, Percentage and PercentageOf are merely versions of things that we’ve seen. So, for instance, I want 50 percent of this pounds, well you can just multiple by 50 and divide by 100 and that’s what Percentage does. So, I want 50 percent of that figure. There we are, 22,772, which is multiply by 50, divide by 100. And PercentageOf is similar, what percentage of the pounds, Dollars, what’s the percentage relationship between them. So, percentage of Dollars, of the pounds will come up to 130 because it’s a 1.3 relationship. So again, all of this, just very similar to what we’ve already done. Just expressed in a convenient form, if you so wish.
132. Transform/Add Column – Other Number Column Functions
The scientific and trigonometric values of functions.
Absolute value so that takes away the minus sign at the beginning. Power so we can square it, whenever it’s multiply by itself, we can cube it, multiply it again. So three times, or as many times as we want. So, this is Number.Abs, Number.Power, square root Number.SQRT. Slightly different functions than you might be expecting. So that’s the sequence of power to the half. So four squared is 16. The square root of 16 is four and then exponent logarithm. You know if you need them, whether you need to use E or logarithms, if you don’t, you’ll never need to know that it’s Exp, or log 10 or Ln. And then factorial five factorial is one times two, times three, times four, times five. So, I would say these first three much more often used than these last three. Trigonometry, sine, cosine, tangent, arc sine arc cosines and arc tangents. I’ve never had to use them in real life. But if you need to use sin, cos, tan, Asin, Acos and Atan then they’re all number dot functions. Rounding you can Roundup, you can round down, or you can round to a multiple. So, if I wanted to round up, that just rounds up to the nearest whole number, rounding down will round down to the nearest whole number. But round dot-dot-dot (…) how many decimal places you want to round to. So, let’s say I multiply this by 1.1 let’s go for our earlier example 1.3 to get the dollars. So, I’ve added a new column, but then I’m going to transform this column so I’m not going to create yet another new column and I could just round up. So, let’s take 57145.4 becomes six, so rounds up or round down. So that just truncates it in this case, if they’re negative numbers, then you go down to the down to next one, or I could just round. So, because zero decimal places. And in fact I’ll just add a column to do this particular example. With zero decimal places. You can see that if it is .5, or greater than .5, it rounds up. If it’s lower than .5 it rounds down, if it is exactly equal to .5. Well, it looks like it’s rounding down. And this looks like it’s rounding down as well. This one here, but if I go down a bit further, this one rounds up. So, what actually happens is it rounds to the nearest is multiple of two. So, you’ll get an even number at the end. So above .5 rounds up below .5 rounds down .5 exactly it rounds to the nearest even number.
Now, you could, if you wanted, not go to zero decimal places, you could go up to one decimal place, which in this particular example would keep us exactly where we were. All you could go to minus one decimal places. So, that gets us the nearest tens. Or got the minus three decimal places, that will get us in nearest thousands. So, the functions there are Number.Round up dot round down, or dot round. And then finally, and not often that you’ll need this at all, is the information. So, we have a question of “Is it even?”, “Is it odd?” And what’s the sign? That I suspect is used more than is even or is odd. I quite often use it in terms of using it with the absolute value I want to know if it’s negative or positive or zero. So that gets me a Boolean, a true or false value.
So, this is a quick whistle-stop tour through the number there’s nothing really earth-shattering in all of this. So, the standards will allow for mathematical functions. Scientific will allow functions like absolute power and square root, trigonometric all to do with angles. Rounding gets us to nearest whole number or the one immediately above one immediately below or to nearest tens or nearest tenth and information is even, is odd and sign.
Interesting posts
The Growing Demand for IT Certifications in the Fintech Industry
The fintech industry is experiencing an unprecedented boom, driven by the relentless pace of technological innovation and the increasing integration of financial services with digital platforms. As the lines between finance and technology blur, the need for highly skilled professionals who can navigate both worlds is greater than ever. One of the most effective ways… Read More »
CompTIA Security+ vs. CEH: Entry-Level Cybersecurity Certifications Compared
In today’s digital world, cybersecurity is no longer just a technical concern; it’s a critical business priority. With cyber threats evolving rapidly, organizations of all sizes are seeking skilled professionals to protect their digital assets. For those looking to break into the cybersecurity field, earning a certification is a great way to validate your skills… Read More »
The Evolving Role of ITIL: What’s New in ITIL 4 Managing Professional Transition Exam?
If you’ve been in the IT service management (ITSM) world for a while, you’ve probably heard of ITIL – the framework that’s been guiding IT professionals in delivering high-quality services for decades. The Information Technology Infrastructure Library (ITIL) has evolved significantly over the years, and its latest iteration, ITIL 4, marks a substantial shift in… Read More »
SASE and Zero Trust: How New Security Architectures are Shaping Cisco’s CyberOps Certification
As cybersecurity threats become increasingly sophisticated and pervasive, traditional security models are proving inadequate for today’s complex digital environments. To address these challenges, modern security frameworks such as SASE (Secure Access Service Edge) and Zero Trust are revolutionizing how organizations protect their networks and data. Recognizing the shift towards these advanced security architectures, Cisco has… Read More »
CompTIA’s CASP+ (CAS-004) Gets Tougher: What’s New in Advanced Security Practitioner Certification?
The cybersecurity landscape is constantly evolving, and with it, the certifications that validate the expertise of security professionals must adapt to address new challenges and technologies. CompTIA’s CASP+ (CompTIA Advanced Security Practitioner) certification has long been a hallmark of advanced knowledge in cybersecurity, distinguishing those who are capable of designing, implementing, and managing enterprise-level security… Read More »
Azure DevOps Engineer Expert Certification: What’s Changed in the New AZ-400 Exam Blueprint?
The cloud landscape is evolving at a breakneck pace, and with it, the certifications that validate an IT professional’s skills. One such certification is the Microsoft Certified: DevOps Engineer Expert, which is validated through the AZ-400 exam. This exam has undergone significant changes to reflect the latest trends, tools, and methodologies in the DevOps world.… Read More »