PL-300 – Section 13: Part 2 Level 1 – Get Data – Home

  • By
  • May 2, 2023
0 Comment

107. Sort and Filter

Now, it’s possible that you don’t have the data in the order that you want especially if you’re going to reduce rows by removing the top few rows or the bottom few rows. So, maybe, you didn’t want the very first ones where the Filename starts with the very start of the alphabet at zero-zero, zero-one, zero-two, or maybe, you wanted this column, this number column, to be sorted in the other order, in descending order, and then you can remove the rows that you don’t want, or you can do that with a Sort. And there are two sorts of Sorts, ascending and descending, fairly obvious, so I’ll put that, and notice I’ve highlighted a step that I actually wanted as an end step. So, just highlight the last step and click on it, Z to A, so that sorted the rows by Order, Descending. Alternatively, I could sort them by Ascending. So, I can use the Sort here, or I can use the Sort over here with these little arrows.

Now, while I’m talking about Sorting, I can also talk about filtering. Now, filtering, we’ve had a look at removing certain rows, that’s what filtering does, but in a more versatile way. Suppose, I didn’t want any of the files which have got .m4b, or maybe I didn’t want any of those which are .wma. I can just go down to this drop-down list and deselect it just like I can in Excel. So, here you can see it says filtered the number of rows, and now we’re down to a much smaller amount of rows, 190 it looks like.

Now, unlike Excel, I can’t then easily unfilter the rows. It may look like I’m able to do so, so here I’m just filtering and getting 190 rows, and then I can unfilter, but that’s only because I’m still on this same filtered rows step. So using Table.SelectRows. If I go to something else, like for instance, I Sort Descending and now go back into this filtered rows, our .wma option is no longer there. We’ve moved beyond that. We have done something else. So whilst you can unfilter immediately as in, in the sense of correcting your current filter you can’t then go back in a later step and unfilter what you have filtered. So notice also how this is displayed in M, Table.SelectRows, the previous step and then each, so for each row, where the File extension is not equal to .wma. So, if I was just to highlight m4b, then it would say where the File extension is equal to m4b. So, this allows you to do a little bit maybe of formula manipulation.

You might want to say, “Where it is, where number is between 100 and 200?” So, if I just get rid of these filtered rows and these sorted rows, let’s say I wanted a number filter between 100 and 200. Okay, that’s fine. We can do that, and you’ll also notice there’s an Advanced option where you could say is greater than 100 and is less than 100 and where another column is equal to something else, And or Or. So, it’s quite a versatile filter, but notice how this is being displayed. Where the number is greater than or equal to 100 and where it’s less than or equal to 200, so you could extend it and where is not equal to 101. Oh, I’ve got an end bracket at the end. That shouldn’t be there, so let’s just get rid of that. There we go, so we can use a basic filter and then develop it further if we so wish in the M functions.

So, it gives the same sort of functionality that’s here, but we’re not just stuck to equals, greater than, less than. We can use other M formulas that you might me learning about either in this part of course or by reading the manual or something like that.

A quick look at other types of filters for strings. You’ve got Beings with, Ends with and Contains, or the opposite, and for dates, you’ve got particular ranges most of which are probably more useful if you’re looking at current data. So, when we get to the date functions in transform, which we’ll be using a data source, which uses current date data, and so, we will be able to use these filters.

So, sorting and filtering, just like what you can do in Excel, but bear in mind, that once you’ve filtered and you’ve gone onto another step you can’t then unfilter.

108. Split Column

Now, sometimes the data that you get is not exactly in the format you might want it. For example, let’s have a look at this folder path. And let’s move it to the end. So, let’s just drag it all the way across.

Now, this folder path has got all of the paths a particular CD is on. So the v drive, Music, Album, A Decade of Excellence and so forth. But what happens if I don’t want them all to be in one big column? What if I wanted them to be in lots of separate columns? So, I’m just going to open this back up. So, we can see there’s lots of information there. Well, I can split it, but how should I split it? Well, I might want to split it based on a delimiter. So, for instance, every time we have this slash, we want everything else to go into a new column. We can do this by using the transform Split Column. So, I can split By Delimiter. So, I will split by the slash, at each instance of this slash. So, I could just say, I want the left-most one, or I want the right-most one. But here, I want every single one. Going to split it up into columns and you can see it’s going to be generating five columns. So, let’s click okay and see what happens. So now you can see folder path has been split into folder path .1, .2, .3, .4, and .5. So, what use can I make out of this? Well, suppose I wanted a report to be based on folder path four, which contains the CD name. So, let’s rename this, CD Name. I just double-clicked on it. Click Close & Apply. And once it’s applied, let’s just do a table to start with, with CD name, and the number of bytes, say. So, we’ve got size. So, here we can see, all of the CD names, with the size. Which we could obviously easily change into a line chart or something like that, so you can see Christmas Hits 2007 right at the top. So how else could we do this? Well, suppose we were aggregating, sorry, suppose we were splitting I’d say a space. Maybe, even just the first space, or the last space. How would that be useful? Well, suppose we got somebody’s name. Phillip space Burton.

Now, that’s fine in one field, but you might want to have a full name, or full names and surname. Or full name, middle name, surname. So, you’ll be splitting up based on the delimiter of a space. You notice that we’re using the formula Table.SplitColumn. And with all of these, as I say, you can let the computer do the hard work in working out which formula it is, and then if you want to edit it later, fine tune it, then you can do so. What other split columns can we do? Well, we can split by a number of characters. So in this case, suppose you didn’t have a text file that was split at various delimiters, it was split at various points. So the 15th character was split, and then the 20th character. Well, if they’re regular, you can split by the number of characters. So 15th character, say. So, I do that to folder path.5, you can see we now have the first 15 characters, and then all the rest.

Alternatively, maybe, you have one which is 15 characters and then you’ve got five characters, and then you’ve got another 10 characters. which is perfectly possible as well. Well, you can do the by positions. So 15, 20, 25. And there we have it split into these locations. However, also notice what didn’t happen. It’s hasn’t given us the first 14 characters. We have here, 12 Concerto for Strings Il Riposo. That has been changed to stri, five characters of a space, ngs, no apostrophe, another five characters, and Il Riposo. So, if you want the first set of characters as well to appear, then you need 1 comma, 15, 20, 25. In fact, you need zero, because is it zero based.

So, you can see, we’re missing the very first character there. So, if I change that to a zero, then we are getting the very first character and then 15 characters later, we get the next few and so forth. So, the split columns allows us to fine tune why we split. There are a few more instances. We have got split where it changes from lowercase to uppercase, from uppercase to lowercase, from a digit to a non-digit, or non-digit to a digit. So, when might these be useful?

Well, suppose, you have got a product called, which is all numbers, and then the product itself all merged into just one column. So 3, 4, 9, 2, 6, 4, 8, stapler. 1, 2 , 6, glass. They’re not necessarily of the same length. And they might not have a space in the exactly why you as a delimiter.

So, maybe, you want to split it so that when it goes from digits to non-digit, then you make a new column. Bit rarer, out of thought. So, the main two out of thought would be by limiter, and by positions which gets it into fragments. So, by delimiter, when you’ve got things like spaces, or backslashes, or anything that can be used to identify why the column stop and end, and then by positions. You can say well, the first one’s going to be 15 characters and the next one’s going to be five, next one’s going to be five, but do remember, you need to start at the very first character being zero. So, if you wanted the beginning 15 characters, then you would with zero comma 15.

109. Other Transform Activities

In this video we’re going to have a look at the remainder of the transform section of the home tab. So group by, suppose we have the CD name that we have created by splicing this path down. And we want to see how many titles there were for each particular CD name.

Now, you might be saying to me well actually what you should be doing is something similar to this. Actually, give it as an end visualisation, so let’s put say open in which is always going to be that. And need to apply the changes, drag that into Access it should be in value and so we can see here is the results that we want. And, utterly, I would generally tend to agree with you. This sort of summarization that the group by offers generally should be done as the final step. What we’ll try to do here in the Power Query editor is get our data into a format where we can use it well. So, what we’re going to do in a few videos time is have a look at a more practical example of this group by. For instance, I might say okay out of the number of bytes which are available in CD name, what percentage is occupied by each individual one.

So, for instance for all of these 12 items, we have a number of bytes, what percentage is that over the entirety of this particular CD? And that might be a useful thing that we can do at a line item role level.

So, let’s just have a look at this in isolation and in a future video we will be able to combine this with other functions that we’re looking at. So, I’m going to group by this CD name, so let’s just click on the group by and it says group by CD name, obviously this is where you can change it here and what do I want to do? Well, I want the total number of bytes. So, there’s my size so I’ll call this total bytes and click OK. And what happens is that our table is now replaced with this grouped table so remember just like filtering I can’t go back a step and still have it. So, I can delete this existing step, I can go back to previous steps to see what happened by our end data now is this grouped data.

Now, let’s just have a look at the dialogue box this is basic we do have an advanced version because it might be that you want additional groupings, so do it by type of music by CD name for instance, so, maybe, you want all the classicals to be grouped together and within that the CD names. Maybe, you want additional aggregations, so you can see what aggregations we’ve got sum average median min max count rows count distinct rows that means that it removes in the count duplicates, so, the count of AAABB is five, the count distinct is two because there’s only A and B. And then finally there’s count the number of rows that there are. Notice that this little dot-dot-dot (…) that appears, you can see what it does you can move things down and up when you’ve got multiple columns. So that’s the grouped rows. Data type so this is saying what a particular column is, decimal, whole number, percentage, something to do with date and time, text, true or false, also called Boolean, and some binary data which we don’t need to use in this version of the data that we have.

Now, the important thing about this is if you have the wrong data type to the column, the computer will be interpreting it wrongly, so that’s wrong. We’re going to be looking at an example where the computer is going to import date data from a text file, but it’s going to be in the wrong format. And it’s going to interpret it wrongly and so what we need to do is instead of having the computer auto convert it into a date, we need to convert it into a text and then we can use date functions to get it to interpret correctly.

So usually what the computer decides is fairly good, but we saw much earlier on how when we had the wrong number of well zero rows have moved that it was getting bad data types. If I just show you it again from the very beginning and open up the CD’s query, and allow the computer to do its thing, from the very beginning you will see that when we get to this number it’s actually giving it as a whole number so this one two three and text, and I don’t really know that that’s the best solution for the computer.

If we go into the transform, you can see that the number column we can’t use any of these functions in the number column but we can use all of these in the text. So while it’s saying ABC one two three is in whole number and detect field, it is really treating it as text. And if you look at the M language for this change type it is saying type any, in other words it doesn’t know what the type is and it’s just saying it could be one of many things to give you a clue, now let’s go back to our real query and you can see that the computer has intelligently put particular types.

It’s given this number this column as a whole number type and difficult to transform but you can see that the whole array of number column functions is available, it said that this is text, over here it said that this is a date time field. And because of that very little is available in the text column, there’s some but not others, but most importantly the date and time column functions are available. So, if the computer has got the wrong date time you should correct it at the earliest opportunity even if that means deleting some steps.

Now, replace values that as you can see replaces existing values with new values, so suppose we go back to our folder path and we’ve gotten some V column, maybe, we’ve got some others there as well, well I suppose it’s no longer my V drive, suppose it’s my U drive. Well, I can replace the values with find V column and replace with U column. So, you can match the entire cell contents and if you wish you can also use special characters just like you can do in the advanced version in Microsoft Word, so if I do this you’ll see that all of the V columns have been replaced by U columns. This could be quite useful if for instance we had some CD’s in one computer, some CD’s in another computer and it was getting information from two different places and we wanted to unite it into the one set of data with the one set of real data so U column for instance, we wanted to map it to the exact same place.

Now, one final thing about data types, make sure that you have a valid data type, suppose I try to change this size from a number to date time the computer is not going to have a really good time of doing this and you can see lots of errors, sometimes these errors might be good the computer is saying, you know I can’t do it and, maybe, these are data that isn’t really what you want. Maybe, you have got a huge report which is divided up into sections with headings and so forth, and you just wanted to keep the numbers. Well, what you could do is remove the rows that contain errors in this particular column and then you would have the right set of data. So, errors aren’t always bad it’s just indicating that there was a problem with probably a conversion earlier on.

In this video we had to look at group by, how to replace your existing table with an analysis again not generally recommended in the Power Query editor though we will be seeing an instance of how we can use it later on. We also saw how you can replace data types and how you can replace values.

Comments
* The most recent comment are at the top

Interesting posts

The Growing Demand for IT Certifications in the Fintech Industry

The fintech industry is experiencing an unprecedented boom, driven by the relentless pace of technological innovation and the increasing integration of financial services with digital platforms. As the lines between finance and technology blur, the need for highly skilled professionals who can navigate both worlds is greater than ever. One of the most effective ways… Read More »

CompTIA Security+ vs. CEH: Entry-Level Cybersecurity Certifications Compared

In today’s digital world, cybersecurity is no longer just a technical concern; it’s a critical business priority. With cyber threats evolving rapidly, organizations of all sizes are seeking skilled professionals to protect their digital assets. For those looking to break into the cybersecurity field, earning a certification is a great way to validate your skills… Read More »

The Evolving Role of ITIL: What’s New in ITIL 4 Managing Professional Transition Exam?

If you’ve been in the IT service management (ITSM) world for a while, you’ve probably heard of ITIL – the framework that’s been guiding IT professionals in delivering high-quality services for decades. The Information Technology Infrastructure Library (ITIL) has evolved significantly over the years, and its latest iteration, ITIL 4, marks a substantial shift in… Read More »

SASE and Zero Trust: How New Security Architectures are Shaping Cisco’s CyberOps Certification

As cybersecurity threats become increasingly sophisticated and pervasive, traditional security models are proving inadequate for today’s complex digital environments. To address these challenges, modern security frameworks such as SASE (Secure Access Service Edge) and Zero Trust are revolutionizing how organizations protect their networks and data. Recognizing the shift towards these advanced security architectures, Cisco has… Read More »

CompTIA’s CASP+ (CAS-004) Gets Tougher: What’s New in Advanced Security Practitioner Certification?

The cybersecurity landscape is constantly evolving, and with it, the certifications that validate the expertise of security professionals must adapt to address new challenges and technologies. CompTIA’s CASP+ (CompTIA Advanced Security Practitioner) certification has long been a hallmark of advanced knowledge in cybersecurity, distinguishing those who are capable of designing, implementing, and managing enterprise-level security… Read More »

Azure DevOps Engineer Expert Certification: What’s Changed in the New AZ-400 Exam Blueprint?

The cloud landscape is evolving at a breakneck pace, and with it, the certifications that validate an IT professional’s skills. One such certification is the Microsoft Certified: DevOps Engineer Expert, which is validated through the AZ-400 exam. This exam has undergone significant changes to reflect the latest trends, tools, and methodologies in the DevOps world.… Read More »

img