AZ-304 Microsoft Azure Architect Design – Design a Site Recovery Strategy

  • January 17, 2023

1. Design Business Continuity (10-15%)

The fourth section of the exam, Design Business Continuity, is worth 10 to 15% of the exam score. Underneath the topic of Business Continuity are two subtopics. One is Backup and Recovery and the other is High Availability. Now, we should deal with these together because, obviously, the goal is not going down. Having an application that has high availability is probably the best defense against a system that’s unstable, crashing, or in need of restoration. And so we’re going to talk about High Availability.

What do you need to do to ensure that your applications stay up, whether you’re using infrastructure as a service or platform as a service? You might not be able to have multiple instances of your application running, in which case you need a solid backup and recovery strategy so that if something does happen, you’re able to restore from backup.

With that strategy, you know exactly what your downtime is going to be, how much data loss you’re going to have, and when the application is going to be available for users again. That’s the Backup and Recovery section. So that’s what we’re going to talk about in this section of the course. Hope you’re enjoying it so far. Be sure to leave your questions and comments in the Q&A section of the course. If you’ve got ideas for improvement, I’m always open to improving this course for you and for other students. Thanks. Let’s keep going.
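To make the downtime and data loss questions concrete, here is a minimal Python sketch. The terms recovery time objective (RTO) and recovery point objective (RPO) are the usual industry names for those two numbers, not something this section defines, and the class, the function, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryObjectives:
    rto_minutes: int  # longest downtime the business will accept
    rpo_minutes: int  # longest window of lost data the business will accept


def meets_objectives(objectives, expected_downtime_minutes, backup_interval_minutes):
    """Return True when a backup strategy fits the stated objectives.

    Worst case, everything written since the last backup is lost,
    so the backup interval is used as the bound on data loss.
    """
    return (
        expected_downtime_minutes <= objectives.rto_minutes
        and backup_interval_minutes <= objectives.rpo_minutes
    )


# Hypothetical targets: back within an hour, lose at most 15 minutes of data.
targets = RecoveryObjectives(rto_minutes=60, rpo_minutes=15)

# Hourly backups restore quickly enough but can lose up to 60 minutes of data.
print(meets_objectives(targets, expected_downtime_minutes=30, backup_interval_minutes=60))  # False
```

Writing the two numbers down like this is mostly a way of forcing the conversation about what the application can actually tolerate before picking a strategy.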

2. Introduction to Azure Site Recovery (ASR)

So let’s talk about the concept of business continuity and disaster recovery. What is our strategy for protecting our entire website and web applications from a disaster? Now, we have to admit that no cloud provider is perfect. Whether it’s Amazon, Microsoft, Google, or anyone else, there are times when they’re having a particularly bad day. A natural disaster strikes, a hurricane or a flood or an earthquake, and those data centers are knocked offline. Or it’s a man-made issue, like a bad software deployment: the Azure team updates the Azure firmware within a region, they tested it but missed something, and that region goes down.

So let’s say the scenario is you have a perfectly beautiful software solution running in the East US region. Customers are happy, you’re happy, everything is going great. And then Azure has a problem. Suddenly the East US region is knocked offline and nobody can access it. You go into your portal, you look at Azure Service Health, and it says the engineers are investigating. They don’t know yet what happened with the East US region. So what do you do as an engineer, as an architect, as a team member? Do you wait and hope that they fix it quickly? That’s option number one: just sit there for 30 minutes, an hour, 2 hours, 3 hours until the solution comes back online. Do you scramble and say, you know what, we can go to the West US region, create those resources, redeploy the VMs, redeploy the code, and restore the databases from our backup? Do you manually rebuild it?

Or maybe you’re smart enough that you planned this in advance and you have that standby region, like we saw in an earlier diagram. You’ve been paying money all along for this insurance policy, essentially, and now you fail over to the already running code. Or do you have everything in standby mode, so you’re not actually using the resources, but you’ve got your finger on the button that will automatically get you up and running, with no manual rebuilding required? These are pretty much the options we have in the case of a disaster. So enter ASR, Azure Site Recovery. Azure Site Recovery is a service designed to help us with our business continuity issues. Now, this is something you have to set up in advance. If you’re watching this video and East US just happens to be down, you have to take one of those other options. You can’t use Azure Site Recovery because the site is unavailable. But if you plan this in advance, you can create a Recovery Services vault and keep copies of your VMs and all the things that you need in that vault.

Make sure your Recovery Services vault is in a region other than your primary region, because if East US goes down and your Recovery Services vault is in East US, well, you’re still stuck. Here is an example of what Azure Site Recovery can help you with. In this case, in the East US region, we have our storage accounts, our virtual machines, our virtual networks, availability sets, and subnets, and everything’s great. If East US goes down, we’d want to move to Central US. But notice in the diagram there are no VMs running in Central US. There’s no storage account provisioned. It’s an empty set of boxes. We can have the VNet created, which is free. We can have the subnets created, which are free. We can have an empty storage account with no data in it, which is free. And so ASR will help us set up this ghost copy in the Central US region that doesn’t really cost us anything.
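The vault placement rule is easy to get wrong, so here is a small Python sketch of a sanity check you could run over a plan before building anything. `ReplicationPlan` and `plan_problems` are hypothetical names for illustration only; nothing here calls Azure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPlan:
    primary_region: str
    target_region: str
    vault_region: str


def plan_problems(plan):
    """Return a list of human-readable problems; an empty list means the plan looks sane."""
    def norm(region):
        return region.strip().lower()

    problems = []
    if norm(plan.vault_region) == norm(plan.primary_region):
        problems.append("vault is in the primary region and would go down with it")
    if norm(plan.target_region) == norm(plan.primary_region):
        problems.append("target region is the same as the primary region")
    return problems


bad = ReplicationPlan(primary_region="East US", target_region="Central US", vault_region="East US")
good = ReplicationPlan(primary_region="East US", target_region="Central US", vault_region="Central US")

print(plan_problems(bad))   # one problem: the vault shares a region with the primary
print(plan_problems(good))  # []
```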

Now, once you have ASR set up, the Site Recovery extension has to be installed into your virtual machines, and your data is going to be cached in another storage account. So you need another storage account that operates as the temporary holding area for data that’s going to get copied over to Central US. Now, in order to have that data available to us, we do need to keep a copy of the data in both places. So we have East US, which is our primary active region, and then we have Central US, which will have our data but doesn’t currently have any running virtual machines. This is what ASR will help us to set up, and we have it there in case of emergency. We’re paying for two storage accounts, but we’re not paying for a double set of virtual machines at this time.
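The cost argument here is just arithmetic, and a rough Python sketch makes it visible. The function name and every price below are made-up placeholders, not Azure prices, and the sketch only counts VMs and storage in the secondary region; it ignores any other charges.

```python
def monthly_standby_cost(vm_cost, vm_count, storage_cost, keep_vms_running):
    """Monthly cost of the secondary region only (hypothetical currency units)."""
    vm_total = vm_cost * vm_count if keep_vms_running else 0.0
    return vm_total + storage_cost


# Placeholder numbers purely for illustration.
hot_standby = monthly_standby_cost(vm_cost=100.0, vm_count=2, storage_cost=20.0, keep_vms_running=True)
data_only = monthly_standby_cost(vm_cost=100.0, vm_count=2, storage_cost=20.0, keep_vms_running=False)

print(hot_standby)  # 220.0
print(data_only)    # 20.0
```

The gap between the two numbers is the insurance premium you avoid by keeping only the data, and not the running VMs, in the second region.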

And then, bam, the source environment, East US, goes down. Using ASR, we can trigger a failover and say, okay, we want to switch everything to Central US. ASR will take the replicated copies of the virtual machines from the Recovery Services vault, restore them, and get them up and running within Central US. We already have the data sitting there from the synchronization that’s been going on, and within 30 minutes or whatever, we are back up and running. It takes five or ten minutes for the virtual machines to get spun up.

Then those machines have to run whatever startup tasks you have going on. So ASR is not perfect, but this would be sort of like plan B or plan C if the East US region were to go down. You’ve already been using ASR, so you manually flip the switch, get things running in Central US, and you only suffer 30 minutes or an hour of downtime, whereas East US could be down for 7 hours. You’ve basically saved yourself a lot of hassle in the process. No manual rebuilding: the rest has all been taken care of for you by Azure Site Recovery.
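As a quick worked example of that comparison, this Python snippet uses the illustrative figures from the talk (a 7 hour regional outage against a 30 minute or 1 hour failover). The function name is hypothetical, and real failover times will vary.

```python
def downtime_saved_minutes(outage_minutes, failover_minutes):
    """Minutes of downtime avoided by failing over instead of waiting out the outage."""
    return max(outage_minutes - failover_minutes, 0)


outage = 7 * 60  # the 7 hour outage from the example

print(downtime_saved_minutes(outage, 30))  # 390
print(downtime_saved_minutes(outage, 60))  # 360
```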

3. Testing Failover and Initiating Failover

So, continuing to talk about business continuity and specifically the Azure Site Recovery strategy that we talked about in the last video, let’s talk about site failover and failback. To remind you, we have this ASR site recovery plan set up where we have a source environment, in this case in East US. It has a couple of virtual machines and a couple of storage accounts, and the Site Recovery system within Azure is keeping a copy of the data and basically synchronizing that data with another environment. Now, there are no VMs set up there, but the site is ready to go. As soon as you initiate a failover, you would expect the two VMs to be created there, and you can start directing traffic to that location. So it’s an emergency failover site. But how do you test that? Testing is important with any backup plan or business continuity plan when we’re talking about real, high-stakes consequences. If your application going down is going to be a big financial hit when you can’t get back up and running within minutes, then you’d better test this.

You’d better actually run through it. Have your team go through it and prepare, just like a fire alarm test or the evacuation tests they do for hurricanes. You’re going to want to test your site recovery plan. This is not something you just set up and never try. And you can do that. You go into the Recovery Services vault and you’ll see the source and target set up in a recovery plan. You can click the Test Failover button, and it’ll ask you for some configuration: which recovery point to use and where you want to test this failover.

And it will actually try to create that application. It will get the data up, it’ll get the virtual machines up in the right configuration, and you can go and see whether it actually worked. It doesn’t cause any data loss, and it doesn’t cause any downtime in your source environment. So this is something you can do without affecting production. That’s one type of test. You’re going to get some confidence that the site recovery plan works when you do that kind of test. But it’s not full confidence yet. It’s not an actual disaster, and you’re not actually shutting down the source environment to get up and running in the new environment.

You’re leaving the origin alone. Now, when you do a test failover, it fails over into a virtual network. After the fact, you can play with it, test it, make sure it works, and then clean up after the test. Another type of test is to actually initiate a failover. This is closer to reality. This is you going in and saying, we have a disaster, the source environment is down, or we don’t trust it anymore, or something happened, and I want the target environment to be live. So this is a real failover. You go into the More option of your Recovery Services vault and click the Failover button. You can choose the direction, you can choose the recovery point, and then you can shut down the virtual machines in the source environment afterwards.

Now, that’s assuming they’re accessible. So you click Failover, and the failover really does happen. This is not a test. This is an actual failover. Again, you can shut down the virtual machines, and if that works, it’s the zero data loss option: by shutting down the VMs, you’re ensuring that no new data gets written once you start the failover process. That’s more of an in-an-emergency, I-need-this-to-happen option. There is also the concept of the planned failover. This is more for when you are actually moving to the new environment. So you want to get out of East US and into Central US.

You’re replicating that environment, everything’s set up, and then you want to make the move. So you can use planned failover to actually move your application and everything you need into a new environment. On that menu, right above the Failover option, is the Planned Failover button. That’s again a no data loss option, but it’s going to shut down your source environment. When you hit that button, it’s going to say, okay, we’re shutting down the VMs and making sure the data gets replicated to the last byte, and the application is going to be down during that time. So a planned failover is a disruptive event, since it obviously shuts down the VMs.
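The three options in this section are easy to mix up, so here they are side by side in a short Python sketch. It only encodes the traits as described in this section; the enum, the dataclass, and `pick_failover` are hypothetical names and not part of any Azure SDK.

```python
from dataclasses import dataclass
from enum import Enum


class FailoverKind(Enum):
    TEST = "test failover"
    FAILOVER = "failover"
    PLANNED = "planned failover"


@dataclass(frozen=True)
class FailoverTraits:
    source_shutdown: str       # "never", "optional", or "always"
    disrupts_production: bool  # does the source environment see an impact?
    typical_use: str


TRAITS = {
    FailoverKind.TEST: FailoverTraits(
        "never", False, "drill into a virtual network, cleaned up afterwards"),
    FailoverKind.FAILOVER: FailoverTraits(
        "optional", True, "emergency: the source is down or no longer trusted"),
    FailoverKind.PLANNED: FailoverTraits(
        "always", True, "deliberate move to the new environment, no data loss"),
}


def pick_failover(is_drill, is_planned_move):
    """Choose which of the three options fits the situation."""
    if is_drill:
        return FailoverKind.TEST
    if is_planned_move:
        return FailoverKind.PLANNED
    return FailoverKind.FAILOVER


choice = pick_failover(is_drill=False, is_planned_move=True)
print(choice.value, "->", TRAITS[choice].typical_use)
```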

4. ASR Supported Workloads

Now, one thing I haven’t mentioned so far about Azure Site Recovery is that it operates in a number of environments. This is not just for replicating an existing Azure network with applications into another Azure region. It supports what we just talked about, which is Azure virtual machines, Windows or Linux, from one region to another. It also supports a lot of on-premises backup and recovery strategies. So say you’ve got VMware virtual machines set up within your own environment: you’ve got a VMware box with a number of VMs running on top of that hypervisor.

You can use ASR to set up a backup strategy going from your on-premises environment into Azure. So imagine using the cloud as the backup site, with your own site as the primary. Disaster strikes, you initiate the failover, and suddenly you have an Azure region running the virtual machines that were originally running on VMware. Some of you will think right away: can’t we use this as a copy or migration technique? Of course. If you have a VMware setup with a number of VMs running on top of it, and you want to move that permanently into Azure, you can use ASR: set it up, test the failover, make sure the applications are all working, and then initiate the failover.

And then suddenly you’ve got your workloads running within Azure. It works not only for VMware, but for Hyper-V and also physical servers. So if you just have a Windows server that’s not virtualized in any way, you can migrate that into Azure or have Azure as the emergency backup. And finally, maybe most surprisingly, you can have ASR, in the cloud, replicate and migrate between two on-premises sites.

So if you have a network with some physical servers running on your premises, and you want an emergency failover into another network that you have, ASR can work on that workload too. Here’s a diagram from the VMware side. On the left it shows a source set of VMware VMs and a bunch of physical servers as well. Azure Site Recovery lives in the middle. And then there is a secondary on-premises location that is also VMware vSphere. So here Azure Site Recovery is replicating from on-premises to on-premises, using a combination of physical servers and VMware, and you can see the sort of processes that are required to make that work.
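As a study aid, the source and target combinations named in this section can be collected into a small lookup, sketched below in Python. It lists only what this section mentions and is not a complete or current support matrix; the labels and the function name are hypothetical.

```python
# (source workload, replication target) pairs mentioned in this section.
MENTIONED_SCENARIOS = {
    ("azure vm", "azure region"),
    ("vmware vm", "azure region"),
    ("hyper-v vm", "azure region"),
    ("physical server", "azure region"),
    ("vmware vm", "secondary on-premises site"),
    ("physical server", "secondary on-premises site"),
}


def is_mentioned_scenario(source, target):
    """True if this section names the given source/target combination."""
    return (source.strip().lower(), target.strip().lower()) in MENTIONED_SCENARIOS


print(is_mentioned_scenario("VMware VM", "Azure region"))                    # True
print(is_mentioned_scenario("Physical server", "Secondary on-premises site"))  # True
print(is_mentioned_scenario("Azure VM", "Secondary on-premises site"))       # False
```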

5. ASR Geographies and Paired Regions

So let’s talk about the effect of geography on your ASR site recovery or business continuity strategy. Different people are going to be using ASR for different reasons. For some people, like we saw in an earlier example, it’s purely for disaster recovery. You have your existing application in East US, and in case of emergency you’ll want it running in Central US. There’s a reason for staying so close: you’re not moving your app to Europe, you just want the app to still be running within the United States while that particular region is having downtime. Microsoft has this concept of paired regions: it typically takes two or more regions within a particular geography and pairs them together.

We can see from the official documentation that for any particular geography, whether it’s the United States, Canada, North America, or Australia, there are two regions that are paired. It’s not that one is the primary and one is the backup. It’s more that between East Asia and Southeast Asia, say, you get the quickest connection out of all the regions. So if you have a workload running in East Asia and the backup running in Southeast Asia, the data synchronization is going to be the quickest. The moment data is written to East Asia, it’ll get to Southeast Asia quicker than to any other region, which reduces data loss.
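A paired-region lookup is a natural fit for a dictionary. The Python sketch below holds only a small sample of pairs for the geographies mentioned above; the official paired regions list is the authority, and the names `PAIRED_REGION`, `paired_region`, and `is_pair` are hypothetical.

```python
# A small sample only; consult the official paired regions list for the full set.
PAIRED_REGION = {
    "East Asia": "Southeast Asia",
    "Southeast Asia": "East Asia",
    "Canada Central": "Canada East",
    "Canada East": "Canada Central",
    "Australia East": "Australia Southeast",
    "Australia Southeast": "Australia East",
}


def paired_region(region):
    """Return the paired region, or None if the region is not in this sample."""
    return PAIRED_REGION.get(region)


def is_pair(region_a, region_b):
    """True when the sample lists region_b as the pair of region_a."""
    return PAIRED_REGION.get(region_a) == region_b


print(paired_region("East Asia"))             # Southeast Asia
print(is_pair("Canada Central", "East Asia"))  # False
```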

Now, it might be that your purpose is not disaster recovery with the quickest backup, but a migration strategy or a way of copying an application. Do keep in mind that there’s pricing based on your geography: you are charged for the data that transfers between regions. If you pick two regions within America, that’s the cheapest and simplest charge. Of course, you can pick one region in East US and another in Canada and copy your app that way. Do keep in mind that you cannot mix the North American Azure public cloud with the US government cloud. The US government cloud runs as its own geography, so generally you’re not going to be able to do Azure Site Recovery between those two.

And that kind of makes sense, I guess. Now, if you wanted to move your site from the US to Europe, there’s a slightly higher charge for the bandwidth: a higher per-gigabyte charge for data traveling between the US and Europe. South Africa, for some reason, is counted in Europe. I guess there’s only the one African location, so counting it as its own geography isn’t quite ready yet. One day. So that’s Europe. As for the other geographic clusters, we saw them list Asia.
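The geography rules described here can be written down as a tiny classifier. This Python sketch groups a handful of regions the way this section describes them (including South Africa counted under Europe) and returns a plain-language note. The grouping is a partial illustration, the function name is hypothetical, and it says nothing about actual prices.

```python
# Partial, illustrative grouping that follows the description in this section.
GEOGRAPHY = {
    "East US": "America",
    "Central US": "America",
    "Canada Central": "America",
    "North Europe": "Europe",
    "West Europe": "Europe",
    "South Africa North": "Europe",
    "East Asia": "Asia",
    "Southeast Asia": "Asia",
    "US Gov Virginia": "US Government",
}


def replication_note(source, target):
    """Describe a source/target region choice using the rules from this section."""
    src, dst = GEOGRAPHY[source], GEOGRAPHY[target]
    # Exactly one side being in the government cloud means the clouds are mixed.
    if (src == "US Government") != (dst == "US Government"):
        return "blocked: public and US government clouds cannot be mixed"
    if src == dst:
        return "same geography: lowest data transfer charge"
    return "cross geography: higher per-gigabyte transfer charge"


print(replication_note("East US", "Canada Central"))
print(replication_note("East US", "West Europe"))
print(replication_note("East US", "US Gov Virginia"))
```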

So, you know, India and Asia, Australia, and China. For China, you need a special agreement with the Chinese company that runs Azure for them. And Brazil: there’s actually only one data center in Brazil, so it can’t even do a paired region with another Brazilian one. It has to be paired with the US. So again, there are pricing implications of going to further geographies, but it may be that you want to copy your app, or you want a second backup or a permanent failover. You’ve realized you don’t want to be hosted in one location. And there’s.
