PL-300 – Section 36: Part 4 Section 5 – Manage Datasets Part 2
271. Configure a Dataset Scheduled Refresh
In the previous video, we installed a gateway connection, which will allow Power BI Service to connect to my computer. In this video, we’ll have a look at how we can refresh the data automatically.
Now, your data source cannot be refreshed because your credentials are invalid. Okay, let’s edit our credentials. So, we can either use windows without impersonation, or we can use, other categories that it gives me. So, I’m going to click no privacy setting for this click sign and there we go. So next we have to say, when do we want it to be refreshed. So, here’s our scheduled refresh, and you can see that the scheduled refresh, keep your data up to date, is currently off.
So, I’ll switch it on. So, we can say we want this to happen every day or can say we want it to happen specific days of the week. It says weekly, but that’s not quite true. We want do it just one day a week. This is when you can say, well, I want it to happen on Tuesdays, Thursdays and Saturdays. So, I want this daily and let’s say, I want this to happen at 12:30 pm.
Now, it’s usually best of course, to have a time when a, the data will have changed and b, when it’s not too busy on your network, perhaps. So, I’m just going to say 2:30 am and let’s pretend that the data was updated at 1:00 am each day, takes about an hour, so, I’m going to also do it at three o’clock in the morning.
Now, how many of these can I actually do? Well, I can do, eight refreshes a day for Power BI Pro. If you’re on Power BI Premium, that you can do 48 times a day. That’s one every half hour. And you notice I was only able to select the hour, or the half hour.
Now, because I said want to start at 12:30, doesn’t mean that it will necessarily start at 12:30 because I’m not the only person to say can Power BI Service do this at 12:30? There might be some capacity issues. So, it could take up to an hour to start.
Now, it doesn’t want very long drawn out database refreshes. So, if your dataset takes longer than two hours to refresh or five hours, if you’re in Power BI Premium, then you might want to consider incremental refresh settings and we’ll have a look at that in another video.
Now, what’s happens if it doesn’t work? Well, the refresh will be paused if there was a failure four times in a row. So, fails once, twice, three times, and then the fourth time, and you can’t change that number four, that will happen. You can’t say I want it to stop if there’s eight time failures in row. Or if it’s pointless, if there were no reports, that’s uses this dataset, that have been viewed for the last two months. So, you may have a report, but if nobody’s using it, why should Power BI Service use all of its compute capacity on something that doesn’t matter? So, what happens, if the refresh fails? Well, there are two options, which you can see here in Power BI Service. I would strongly suggest checking, send refresh failure notifications to the database owner and put in additional people just in case the database owner happens to be on holiday. So, let’s click apply.
Now, it should be noted that this is for data refreshing. It is not for model refreshing. So, if I go into my model and I changed the model. Say I got rid of this last days that appended the query onto itself and that by doubled the amount of sales volume. Then that’s the change of the model. That won’t be updated. It’s when the data changes. So, the data which comes from the source here. So, this is from an Excel workbook, for instance. So, let’s go into that workbook and change the data. So, here is that workbook, and we’re just using the sales amount and I’m going to change this very slightly from $3,578 to one with a huge number of notes. So, that is is 100 billion. So, let’s save that data. And so when the computer gets the data again through the gateway, then the data will be refreshed on the Power BI Service. Why not do this now? So, let’s go back into the Power BI Service.
Let’s just make sure that it has actually kept all of this auto refresh. So, I’ll go into my workspace and into settings. Lots of ways you can get into it. So, there we can see it’s still going to happen and you can see the next refresh it’s going to happen at 12:30. Well, I want to manually refresh it. So, let’s go into our dataset and I’m going to click on refresh. So currently we’re used to getting things like this dashboard. Why the amount is 58 million. We’re expecting this to go up by 100 million if I’ve gone my last correct. So, let’s go back into the dataset and I’m going to refresh it. There are no limitations as to how many times you can manually refreshing There is a limit of 8 times for scheduled freshes, but you can manually refresh when you want. So, I’m going to click refresh now. Notice we’ve got columns for refreshed and next refreshed. And it doesn’t take long because there’s only four XL data sources that it’s going to grab. So, you can see it’s finished.
Now, it might just take a few minutes for it to work through the system. It’s not necessarily instantaneous. So, I’m just going to pause the video and just wait a few minutes. All right having done that, we’ll go into our dashboard. Remember this was saying 58 million, well it’s now saying 200 billion, so it doubled the amount afterwards. But you can see, it has made a serious impact on the data. So, what I’m going to do next is open up the data again, go into this sales amount, get rid of a couple of zeros, and now, I’m not going to do anything. I’m going to let the computer auto refresh because it’s going to have a scheduled refresh at 12:30. So, I’m going to pause the video and come back without having done anything. Well, it’s taken Power BI quite a long time to refresh, but it was meant to refresh at 12:30, it’s tower two o’clock. However, you can see it has finally refreshed. So, the sales amount has gone down from 200 billion to 2 billion.
So, in this video, we have had a look at scheduling refreshes. We’ve also seen how to do it manually and we’ve also seen the limitations, how many times you can do it and who gets notified if it fails.
272. Configuring Incremental Refresh Settings – Step 1
Now, generally in this course, we’ve been working with fairly small data sets. However, you can have potentially huge data sets, which are gigabytes, terabytes, and they may not fit in your Power BI desktop model. So how are you actually going to do that? How are you actually going to create a model that you can then upload into the Power BI service? Well, you can do this with something called incremental refresh.
So, what this means is you don’t necessarily give Power BI service the entirety of the data, you give it a small section. And then when it refreshes, it will expand that section to all the relevant data. And then when it refreshes again, it will just get the latest data. So instead of having to download gigabytes into your Power BI desktop, you’re downloading megabytes, then you’re uploading megabytes. Then you’re uploading all of the data gigabytes, but then on each refresh, you’re just uploading the most recent data, so megabytes. So, this could be things like for instance the internet of things, you might have a reading from a sensor every six seconds, and you’ve got 100,000 sensors when you can imagine that that is going to be huge very quickly. So, how can we configure incremental refresh? Well, first of all, you might want to know that this is a relatively new addition to Power BI Pro, and this is the time we’re recording this. It was a preview for quite a while on Power BI Premium, but it became mainstreamed, what’s called general availability in Power BI Premium in February, 2020. And at the same time, it got added into Power BI pro. So, you’ll understand from this that if you’re using the free version, you won’t have access to the incremental refresh. So, it is a two stage process. The first stage is that we need to reduce the amount of data that we’ve got in these tables. And we can do this by editing details.
Now, you either do this right when you’ve loaded the data initially. The advantage of doing this if you’ve got say an SQL server data set or something like that data source, is that you can take advantage of native query folding. So, the query folding that you can do, you can have a look at the native query and see if query folding is part of that. And what that means, if you remember back to part two, is that it might take a lot of steps, and put them into just one query for the end database to get. So instead of having to download all the data and then filter it, it just retrieves a smaller section. So, it’s quicker to get. And then the second advantage is you can manipulate it in Power BI desktop, actually build a model both for neutralisation without having a huge amount of data. So, I’m going to put a new step in here, but before I do that, I need two parameters. So, I’m going to go to home and manage parameters, new parameters, and these parameters have to have the exact name I’m going to give you, which is range start and the other one’s called range end. So English words, if you’re using a foreign language, it is range start no spaces. These are what’s called reserved parameter names and the type has to be date time. And what the computer then wants is a suggested start. So, what is your data range that you are going to personally be working with? So, I’m going to say it is going to be the 1st of January, 2006, and I’m going to have a number one range. And again, it’s got to be date type, date time and type, and I’m just going to give it a figure. And so this is just my initial values. And if you want, you can say 23, 59, 59, etc, you can see the computer has automatically inserted the time when I haven’t put this in. So, these are your parameters which will appear here on the left hand side. Don’t mind if you get these dates wrong, there are no wrong values. I mean, I suppose theoretically, if you say putting in 2032 date simply aren’t in your data set, that could be a wrong value. Right, so now I want to partition this table down. So, what I’m going to be using is my older date. And again, don’t worry if you get the wrong field, this is just for your initial data upload, and then you’ll have all of the rest of the data. But it would be helpful if you choose something which is going to be in chronological order. So, if I get a new value, it’s not going to insert itself somewhere in the middle. It’s going to be right at the end or very close to the end of the existing data. So, what I need to do is now filter this down. So, it’s going to go right at the end because before that we haven’t got the right type and this again needs to be a date type. At the moment it’s a date, I’m going to change that to a date type. So, now let’s add a photo. So, I’m going to use a custom photo. And this custom photo is going to say whether the date is after or equal to the parameter range start and is before the parameter range end.
Now, please note one of these has to be or equal to otherwise you’re going to have gaps. And if you have both of those have be equal to, then you’re going to have duplicated data and you don’t want that. So, I would stick with is after equal to and is before. Right, so that reduces the amount of data that we’ve got. So that’s the first steps, close and apply. And it’s going to update my query. Needless to say, I’m not using something that is terabytes in size, so, it’s going to work very quickly. So that’s step one, our initial data.
273. Configuring Incremental Refresh Settings – Step 2
Step two, is to define a policy. So, I’m going to go into incremental refresh, and this new dialogue box comes up. So right at the top, I don’t know that this dataset this data source that you have got, whether it can be query folded. If you can’t, it’s not recommended to use incremental refresh. Why’s that? Well, let’s say I’ve got a spreadsheet with a billion roles, not possible in real life, I think. But let’s say I’ve got one with a billion roles. And I can’t use query folding. So to do a filter, I’ve got to load the billion roles and then filter it down. So, the local machine has to do that work. Instead of saying, okay, database just gives me this particular partition that I’m looking for. So that’s fine. So yeah, okay, I’m not using a clue folded. I’m going to ignore that. This is just an example. I could use SQL Server and get it from there, but I’m well aware not everybody has SQL Server and will not be able to follow along. So, I’m going to do an incremental refresh.
Now, notice this other thing, once you deploy this table to the Power BI service, you will be able to download it back. Why is that? Well, Power BI Microsoft, anticipating that you’re going to have billions, trillions of roles, and it doesn’t particularly want to download a file so that you can’t actually use it on your desktop computer. So, I want the number of roles to be in this date range. So, I don’t necessarily want everything, I might have data going back 30 years, but I know what the last 10 years well, I can say I want only the last 15 years to be stored on the Power BI service.
Now, when I do a refresh, I want you on incremental refresh, I want you to just refresh the last number of say Five days. So, I’m not refreshing a lot and we learnt in the previous video that you can refresh up to eight times a day for Power BI Pro, 48 times a day for Power BI Premium. So having, say five days, 10 days, that’s actually going to work quite nicely. You can restrict it further by saying, I only want you to update and last five days, if this due date column or the date column or ship date column, this particular column, the maximum value has changed. In other words, it’s not like nothing has happened. So, the last order was at 12 midnight on one particular day, and I come to do an incremental refresh and it’s still 12 midnight of that particular day, there have been no new orders, I think, therefore, don’t refresh the data. So that is another way of reducing the amount of data that’s quite transferred. If there’s nothing new, don’t bother but it looks at the maximum value of a particular date time column as in is something new. And then finally, you can choose only to refresh complete days. So, let’s say it’s 1pm on a Tuesday, while it won’t download anything from Tuesday if you check this box, because Tuesday is not yet complete, so we’ll do Monday, Sunday, Saturday and Friday, the last five days, and it won’t do Thursday from 1pm onwards, because that is not a complete day. So, let’s apply all and now I’m going to save this. So, remember, you can’t re-download it. So, I’ll save it as a new Power BI model. And then I’m going to publish it. And that’s all there is. So, there we go. I was wondering when that dialogue box is going to come up. So incremental refresh, there are two aspects. First of all, you need a smaller section of the data so you can work with it in Power BI Desktop. And then once you’ve done that using the parameters, range start and range end. And by the way, you will need those parameters. Because if you try to do incremental refresh on a table that hasn’t got those parameters, you can’t do it. Once you’ve done that, then you go into incremental refresh. And you set the date range for what you want to store and the date range for what you want to refresh. So, now let’s see if it works. So, I’m going to add in a bar chart, which is going to have the order date that’s the one that I’ve used as a filter and the sales amount. There’s my values, and I’m just going to draw down and you can see we have successfully got 2006 January, 2006 December. So, the photo works. So, I’m going to re-save this and I’m going to re-publish this. So yes, I do want this report to be updated. So, that’s success. That’s bringing in our Power BI service window. And let’s have a look at how report should be identical to what we’ve previously got January to December.
So, now let’s see if we can refresh it going to settings. I want to use the data gateway, personal gateway that we set up previously. Yep, that’s all set up fine so now, let’s refresh. So preparing for refresh. So at the moment, we have a reduced data set, it should be getting the totality of the data now of the last 15 years anyway. So, it’s still working through it. As you can see. So, this is when it’s now downloading everything and not just the 2006 that I’ve previously voted upon. Right, so it’s finished. So, let’s go into our workspace refresh. And now you can see it has got everything in from 2005 through to 2008. This big figure, obviously, is because I’ve manipulated the dataset a few times. But the important thing here is that it has successfully grabbed all of the data If I was to put in more data in the last five days and then refresh, it would then get just that very latest data.
So, in this video, we’ve had a look at incremental refresh. We found out how to set it up, we can publish it with the policies on the Power BI Desktop, and then we can refresh the data to get all of the data that wasn’t included in the initial data upload. And then if you so wish, we can then go into settings, go into schedule, refresh, and keep the data up to date, but only by downloading uploading as little data as is needed.
Interesting posts
The Growing Demand for IT Certifications in the Fintech Industry
The fintech industry is experiencing an unprecedented boom, driven by the relentless pace of technological innovation and the increasing integration of financial services with digital platforms. As the lines between finance and technology blur, the need for highly skilled professionals who can navigate both worlds is greater than ever. One of the most effective ways… Read More »
CompTIA Security+ vs. CEH: Entry-Level Cybersecurity Certifications Compared
In today’s digital world, cybersecurity is no longer just a technical concern; it’s a critical business priority. With cyber threats evolving rapidly, organizations of all sizes are seeking skilled professionals to protect their digital assets. For those looking to break into the cybersecurity field, earning a certification is a great way to validate your skills… Read More »
The Evolving Role of ITIL: What’s New in ITIL 4 Managing Professional Transition Exam?
If you’ve been in the IT service management (ITSM) world for a while, you’ve probably heard of ITIL – the framework that’s been guiding IT professionals in delivering high-quality services for decades. The Information Technology Infrastructure Library (ITIL) has evolved significantly over the years, and its latest iteration, ITIL 4, marks a substantial shift in… Read More »
SASE and Zero Trust: How New Security Architectures are Shaping Cisco’s CyberOps Certification
As cybersecurity threats become increasingly sophisticated and pervasive, traditional security models are proving inadequate for today’s complex digital environments. To address these challenges, modern security frameworks such as SASE (Secure Access Service Edge) and Zero Trust are revolutionizing how organizations protect their networks and data. Recognizing the shift towards these advanced security architectures, Cisco has… Read More »
CompTIA’s CASP+ (CAS-004) Gets Tougher: What’s New in Advanced Security Practitioner Certification?
The cybersecurity landscape is constantly evolving, and with it, the certifications that validate the expertise of security professionals must adapt to address new challenges and technologies. CompTIA’s CASP+ (CompTIA Advanced Security Practitioner) certification has long been a hallmark of advanced knowledge in cybersecurity, distinguishing those who are capable of designing, implementing, and managing enterprise-level security… Read More »
Azure DevOps Engineer Expert Certification: What’s Changed in the New AZ-400 Exam Blueprint?
The cloud landscape is evolving at a breakneck pace, and with it, the certifications that validate an IT professional’s skills. One such certification is the Microsoft Certified: DevOps Engineer Expert, which is validated through the AZ-400 exam. This exam has undergone significant changes to reflect the latest trends, tools, and methodologies in the DevOps world.… Read More »