AZ-304 Microsoft Azure Architect Design – Design a Data Protection Strategy
1. Data Geo-Replication
So one of the big advantages of cloud computing is its global nature. When you create a virtual machine, you can create it in any region of the world. In fact, you can create multiple virtual machines in multiple regions of the world, and really get global distribution for your applications, your storage files, and other things using cloud computing. It's very difficult to achieve that kind of scale if you're running your own data center within your own company, unless you're a global multinational with that kind of money. Even if you work with hosting providers, you need to coordinate across multiple data centers and multiple providers, and there's certainly some effort required for that. Now, one of the challenges of being a global application is having your data, your databases, located globally and replicated globally. This is the concept of geographic data storage.
Now, even storage accounts, which hold files rather than structured data, give you the option of having your files globally replicated. The advantage of this is that if there's a regional outage, let's say a massive storm knocks out electricity, power, and Internet across a significant part of the United States, and you're using geo-redundant storage, your files are located in another region of the world which is hopefully not affected by that localized storm and power outage. Then, presumably, you can get your applications back up and running in the new region while you're waiting for the eastern United States to come back online.
The same geo-redundant ability that's available for storage accounts is available for Azure SQL Database and Cosmos DB as well. With Azure SQL Database, you can choose a second, third, fourth, or even more regions of the world where Azure will keep a full copy of your database, and Azure will replicate that data from your primary location to those backup locations. Same thing with Cosmos DB: you choose your primary location, you choose where to put the secondary and subsequent locations, and Azure takes care of keeping them synchronized with each other. You can even do this in what they call a multi-master configuration, where the North American Cosmos DB region accepts inserts into the table and the European Cosmos DB region does as well.
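As a rough sketch of what this looks like from the application side, a client built with the azure-cosmos Python SDK can declare which replicated regions it prefers to read from. The endpoint, key, and names below are placeholders, and the exact keyword arguments can vary between SDK versions:

```python
from azure.cosmos import CosmosClient

# Hypothetical account endpoint and key -- replace with your own.
ENDPOINT = "https://myaccount.documents.azure.com:443/"
KEY = "<primary-key>"

client = CosmosClient(
    ENDPOINT,
    credential=KEY,
    # Try these replicated regions first when reading.
    preferred_locations=["East US", "North Europe"],
    # On a multi-master account, allow writes to the nearest region too.
    multiple_write_locations=True,
)

database = client.get_database_client("appdb")
container = database.get_container_client("customers")
```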
And Azure takes care of making sure that all data is replicated everywhere, and takes care of handling conflicts if they happen. We can see on screen an application design that shows this type of configuration. Now, this is an active-standby configuration. At the top of the screen, we have an active region that contains a web service: a web app, a queue, a function app, a Redis cache, and also SQL Database and Cosmos DB. That's all located in one region. Like we said, if we have a regional outage, some sort of massive power outage that takes out that whole region, or some other bad deployment, then we want to have a standby region, which is handled through Traffic Manager on the front end, where incoming customers are going to get directed to the standby region if the active region is no longer accessible.
And then there are the databases. Now, the App Service plan and the function app are pretty easy, because you're not changing that code very often, and when you do, you can just deploy to both locations. Databases, however, are normally changing constantly, and keeping them replicated is not something you can do yourself. You need to set up geo-replication within SQL Database or within Cosmos DB. Once you've configured that, you have your active region and your standby region, and when the active region fails, the data has already been replicated up to the moment of failure.
Now, with Traffic Manager, it does take a few minutes for it to recognize the failure and to switch people over to the standby region. So you may experience five or ten minutes of downtime, but hopefully you're not losing any data just because your active region is no longer available. There's going to be a very low amount of downtime and lost data. So you can fail over from your active region to your standby region, and the standby region just picks up and continues on, with a SQL Database that's up to date, a Cosmos DB that's up to date, et cetera.
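One way to keep the database connection simple during such a failover is to connect through an Azure SQL failover group listener rather than a specific server. A minimal sketch with pyodbc, assuming a hypothetical failover group called myfog (server name and credentials are placeholders):

```python
import pyodbc

# The failover-group listener always resolves to whichever region currently
# holds the writable primary, so this connection string does not need to
# change after a failover.
conn_str = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myfog.database.windows.net,1433;"
    "Database=appdb;"
    "Uid=appuser;Pwd=<password>;"
    "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
)
conn = pyodbc.connect(conn_str)
```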
2. Data Encryption
So we all know how important security is in this day and age, especially when it comes to data. In this section of the course, we're going to be talking about data protection strategy. One of the first things that comes to mind when we talk about data protection is how we're encrypting the data, so let's start with encryption strategy. Now, Microsoft already does a pretty good job with Azure SQL Database and Cosmos DB in that the data is already encrypted on the disk. This is what's called transparent data encryption, or TDE. What that means is that when you do an insert statement into an Azure SQL database, you send that data over the wire, I would say unencrypted, even though you'll do it over an HTTPS connection; the statement itself is plain text. Azure SQL Database will then store that data in its data files.
And those data files are encrypted as they sit on the disk, natively. So if someone were ever to grab a copy of that hard drive, you know, find one in a dumpster or take a copy of it, it's not going to do them much good, because all of those files are encrypted. They would need a copy of the decryption key, which they don't have; Microsoft keeps that decryption key separate. Now, there is a way for you to control the encryption key yourself. For a higher level of security, you're probably going to want to set up your SQL databases, Cosmos DB, and even your storage accounts such that you control the encryption key. You put that key into Azure Key Vault, and that way the key is under your control.
Not even Microsoft can decrypt your files; only you can, with your Azure Key Vault. So that's data at rest. For data in transit, you want to be using HTTPS always. You're going to want to set a setting saying that we only ever connect to Azure SQL Database, Cosmos DB, and storage accounts using an SSL connection; we will not even allow a non-SSL connection. That way, the data traveling between Azure and your own client network, your own application, is encrypted and can't be intercepted along the way. So you've got your data at rest encrypted using transparent data encryption, and anyone who interacts with the data has it encrypted between Azure and their client.
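You can enforce this from your own connection strings too. A minimal pyodbc sketch (server name and credentials are placeholders):

```python
import pyodbc

conn_str = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=appdb;Uid=appuser;Pwd=<password>;"
    "Encrypt=yes;"                 # require TLS encryption on the wire
    "TrustServerCertificate=no;"   # and actually validate the server cert
)
conn = pyodbc.connect(conn_str)
```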
Now, for another whole level of paranoia, there is this thing called Always Encrypted. Always Encrypted requires a special client to read the data from SQL Database, for instance, but in that setup, the data is never unencrypted until it gets onto your computer. So if you're using the Always Encrypted setting, data goes from your computer to Azure encrypted, and into the database encrypted. If you want to read it, it comes back from the database over the wire still encrypted; it never sits in memory on the server or travels over the wire in any kind of unencrypted state. Like I said, it does require a special client that supports it.
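As a sketch of what that special client looks like in practice, an Always Encrypted-capable ODBC driver can be asked to do the decryption client-side through the connection string. This assumes the client also has access to the column master key (for example in Azure Key Vault), which is omitted here; names are placeholders:

```python
import pyodbc

conn_str = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=appdb;Uid=appuser;Pwd=<password>;"
    "Encrypt=yes;"
    "ColumnEncryption=Enabled;"  # decrypt protected columns client-side
)
conn = pyodbc.connect(conn_str)

# The plaintext only ever exists here, on the client.
row = conn.execute("SELECT TOP 1 Email FROM dbo.Customers").fetchone()
```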
Now, another layer of security for databases is called dynamic data masking. If there are certain fields, say you're looking at a data table and you think, well, that's the credit card number, we never want to show that except for administration purposes or some very special circumstances, you can set up dynamic data masking on that field.
Now, I'm going to step back and say: never, ever store a credit card number in a data table in plain text. That's a bad practice in any way, shape, or form. So let's make it an email address instead. Do you want email addresses to be exposed whenever someone does a select statement? Or should they always be masked with dynamic data masking, and only unmasked in certain circumstances? So for sensitive fields (again, never store credit cards this way, but for other genuinely sensitive fields) you may want to enable dynamic data masking at the data table level, so that certain fields are hardly ever exposed, even to people running queries in the query window or to the applications.
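Setting that up is a one-time piece of T-SQL against the column. A minimal sketch, executed here through pyodbc; the table and column names are hypothetical:

```python
import pyodbc

conn = pyodbc.connect("...")  # an admin connection, as in earlier examples

# The built-in email() masking function leaves only the first letter and
# the domain suffix visible to users without the UNMASK permission.
conn.execute(
    "ALTER TABLE dbo.Customers "
    "ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');"
)
conn.commit()
```

A non-privileged user selecting that column would then see something like aXXX@XXXX.com; grant the UNMASK permission only to the few accounts that truly need the real value.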
3. Data Scaling
So, next up, we want to talk about scaling. How do you scale databases like SQL Database and Cosmos DB? Now, there's a graphic taken from the Azure website, and it's kind of inaccurate, because there's a lot of overlap between the Standard and Premium tiers; really, those lines should run side by side rather than one after the other. But you can go all the way from a tiny Basic 5-DTU database up to, as the graphic shows, 4000 DTUs, which is 800 times more powerful than the Basic 5-DTU database. Now, as I said previously, DTU is not the only pricing model for SQL Database.
You can also buy vCores and basically pay for the actual hardware. But we can see that SQL Database is designed to be scaled. So if you're sitting at one of these levels, say the P4 500-DTU level, and you want to get to the next level, you go to P6, which is 1000 DTUs, so it should be roughly double the performance. Now, this is a bit of a disruptive operation: when you go from 500 to 1000, there's going to be a small amount of downtime while the server switches over. It's working at 500 DTUs at one moment and then, after the switchover, at 1000. So it is a little bit of a disruptive operation.
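For what it's worth, that plan change can be requested in T-SQL as well as in the portal. A sketch via pyodbc, assuming a hypothetical database called appdb (the operation is asynchronous, and existing connections can briefly drop when it completes):

```python
import pyodbc

# Connect to the logical server's master database to change the tier.
master = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=master;Uid=adminuser;Pwd=<password>;Encrypt=yes;",
    autocommit=True,  # ALTER DATABASE cannot run inside a transaction
)
master.execute("ALTER DATABASE appdb MODIFY (SERVICE_OBJECTIVE = 'P6');")
```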
But it is possible to scale Azure SQL Database just by switching to a different plan. Now, this is manual scaling, not automatic scaling: you actually have to go and make the decision to move from 500 to 1000. It's not like an App Service, which can detect CPU utilization and automatically grow and shrink. Now, another popular way of scaling a database is not to grow to a bigger plan, but to use the concept of read scale-out. What that means is that you've got a copy of the database; we were just talking about geo-redundancy, so say you have a geo-redundant version of the SQL database, and you use that copy whenever you know you only need to read from the database. You concentrate your writes, updates, and inserts on the primary database, and you focus your reads on the secondary database.
And in this way you can take some of the pressure off the primary database. Your reporting, your display of values on the home page and other web pages, any time you just need to do a select statement, which in a lot of applications is quite frequent, you go to the secondary data store, and you leave the primary for writes. This is something you need to do within your application: you're going to need to store both URLs, and you're going to need to make the determination that this is my read server, and this is my write server.
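A minimal sketch of that routing decision in Python with pyodbc; the server names and credentials are placeholders, and ApplicationIntent=ReadOnly is the connection-string flag that declares a read-only connection:

```python
import pyodbc

# Primary server: all writes, updates, and inserts go here.
WRITE_CONN = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver-primary.database.windows.net,1433;"
    "Database=appdb;Uid=appuser;Pwd=<password>;Encrypt=yes;"
)

# Readable secondary: selects only.
READ_CONN = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver-secondary.database.windows.net,1433;"
    "Database=appdb;Uid=appuser;Pwd=<password>;Encrypt=yes;"
    "ApplicationIntent=ReadOnly;"
)

def get_connection(read_only: bool) -> pyodbc.Connection:
    """Route read-only work to the secondary, everything else to the primary."""
    return pyodbc.connect(READ_CONN if read_only else WRITE_CONN)
```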
Another option for scaling databases is what's called sharding. The concept of sharding is that you make a logical division in the data and store some of it in one database and another set of it in another database. For instance, you can store all of your North American customers in a North American database and all of your European customers in a European database, et cetera. The advantage of that is you've got multiple servers, each storing, hopefully, an even share of the data. If your application is intelligent and knows it's dealing with a North American user, it goes to the North American database, or it knows to go to the other one. The other approach, of course, is having some type of mapping, where you keep your customer numbers in a lookup table that tells you, from that point forward, where the rest of each customer's data is stored. So breaking up your database into multiple databases, and then storing those in other geographic regions or even just in a secondary data store, is another way of extending the performance of your database without having to scale it up.
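A minimal sketch of that routing logic in Python; the shard map, region codes, and connection strings are all hypothetical:

```python
# Application-side shard map: each region code maps to the database
# that owns those customers' data.
SHARD_MAP = {
    "NA": "Server=tcp:na-server.database.windows.net,1433;Database=customers_na;...",
    "EU": "Server=tcp:eu-server.database.windows.net,1433;Database=customers_eu;...",
}

def shard_for(region_code: str) -> str:
    """Return the connection string of the shard owning this customer."""
    return SHARD_MAP[region_code]

# The lookup-table variant replaces this static map with a small "shard
# directory" database queried by customer number instead of region.
```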
4. Data Security
Now, it's all well and good to have your data protected, encrypted, and scaled, but how are you going to access it securely? How are you, working on client applications, going to allow your own application to get access while protecting the data from other applications that aren't authorized, and keeping it hidden from hackers, et cetera? One option is to realize that you don't need to have your data on the Internet at all. Even though Azure SQL Database has a public endpoint by default, if you don't anticipate ever needing that public endpoint, it's actually a security risk to leave it exposed. So you might want to investigate the concept called virtual network service endpoints, where you're basically restricting your Azure SQL Database or Cosmos DB to a specific virtual network.
You're basically adding your database as a resource on that network, and with NSG security settings, only resources on that network, or traffic allowed to cross the NSG threshold, can get access to that SQL database. So if you have a web app and you have data in an Azure SQL database, both of those things can be attached to a virtual network, and you can ensure they have secure access to each other while no one outside the network has access. That's one scenario. Another way to protect data: SQL Database uses a firewall concept, both at the server level and at the database level. You can basically blacklist all access from anywhere in the world and only whitelist certain IP ranges. So you can say, well, we're in this office, our traffic always comes from here, and that's the only IP range allowed to access this.
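A sketch of what that looks like at the server level, using the sp_set_firewall_rule procedure in the logical server's master database; the rule name and IP range are placeholders for your office's egress addresses:

```python
import pyodbc

master = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=master;Uid=adminuser;Pwd=<password>;Encrypt=yes;",
    autocommit=True,
)

# Whitelist a single office range; everything else stays blocked.
master.execute(
    "EXECUTE sp_set_firewall_rule @name = N'HeadOffice', "
    "@start_ip_address = '203.0.113.0', @end_ip_address = '203.0.113.255';"
)
```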
That's not perfect security, but blocking access to your SQL database from 99.99% of the world does significantly improve security. Now, when we're dealing with SQL, we know from the SQL Server world that there's a difference between SQL authentication and Windows authentication.
Within Azure, you don't necessarily have Windows authentication, but you do have Azure AD. So you can create an application that has a managed service identity, in effect a service principal within Azure AD, and that identity is what gets authenticated to use the database. There are no user IDs and passwords at all; it's just the managed service identity. With SQL authentication, there's a user ID and password, the traditional logging in with user X and password Y that gets stored in a connection string, et cetera. So your choice of authentication method can also affect the security of your data.
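As a sketch of the managed identity route, newer ODBC drivers accept an Authentication keyword so there is no password in the connection string at all. This assumes the app runs on an Azure resource whose managed identity has been added as a database user; the names are placeholders:

```python
import pyodbc

# No Uid/Pwd: the driver authenticates as this machine's managed identity,
# which must first be created as a user in the database, e.g.
#   CREATE USER [my-app] FROM EXTERNAL PROVIDER;
conn_str = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=appdb;Encrypt=yes;"
    "Authentication=ActiveDirectoryMsi;"
)
conn = pyodbc.connect(conn_str)
```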
SQL Server and SQL Database also have a concept of row-level security, which is pretty granular. So you're looking at a data table and you're saying, well, user Joe only has access to customers in Ontario, and you write a rule that says the province code must be 'ON'; user Joe is given read access to those rows and denied access to all others. That's row-level security. I haven't seen it a lot in practice. It's available to you, but in terms of designing your applications, are you really letting users straight into your database and relying on row-level security to block their access? To me, that's a bit of an odd setup. You probably want to have an API that sits in the middle, and that API makes the decisions about who sees the data.
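For reference, the Ontario example above would look roughly like this in T-SQL (executed here through pyodbc); the table, column, function, and user names are all hypothetical:

```python
import pyodbc

conn = pyodbc.connect("...")  # an admin connection, as in earlier examples

# Predicate function: Joe only matches rows where the province is 'ON';
# every other user matches all rows.
conn.execute("""
    CREATE FUNCTION dbo.fn_OntarioOnly(@Province AS nvarchar(2))
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN SELECT 1 AS allowed
           WHERE @Province = N'ON' OR USER_NAME() <> N'Joe';
""")

# Security policy: apply the predicate as a filter on the customers table.
conn.execute("""
    CREATE SECURITY POLICY dbo.CustomerFilter
    ADD FILTER PREDICATE dbo.fn_OntarioOnly(ProvinceCode) ON dbo.Customers
    WITH (STATE = ON);
""")
conn.commit()
```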
A lot of people don't know this, but Azure has a fancy security offering for databases called Advanced Threat Protection, or ATP. Now, this is an additional option; you have to pay for it, it's not free, and it's available in a bunch of their products, SQL Database being one of them. What it does is check the traffic coming through against known patterns of bad behavior. For instance, one example of what ATP can do: let's say there's a list of known hackers, a group that has been attempting to break into databases around the world, and their IP addresses become known to Microsoft. Then Microsoft will block access from those IP addresses to your SQL database.
That's one way Advanced Threat Protection can protect you. Again, there's pattern matching and some intelligence to this, but basically, if you want Microsoft's help stopping the bad guys from getting access to your databases, ATP could be one of the options. Now, all of these databases, SQL Database and Cosmos DB, do push out logs, and Azure Monitor integrates with them so that you can actually run reports on these things. You can put up charts and alerts, so if you get five incorrect passwords in the past hour, it sends you a text message, something like that, so you can see something weird going on. So go to Azure Monitor.
Azure Monitor has a SQL Database hookup, and you can actually look at the diagnostic logs and event logs coming off of it. As we said in the last video, be sure to force SSL connections for all of your database and data options. Really, there's no excuse in 2019 or 2020 to allow insecure connections; there are very few places where HTTPS traffic would not be allowed across a router or out to the Internet.
So turn on SSL; that will stop man-in-the-middle attacks, or stop people along the way from logging your traffic to see what you're passing back and forth. And we talked in the last video about using dynamic data masking to protect sensitive fields. The advantage of this is, if you have a low-level account, let's say you're just a standard user and you do have the right to run a report against the data table, some of those columns will be asterisked out for you. You won't be able to see the email address, you won't be able to see the phone number; maybe you can see the first name but not the last name. So data masking gives some anonymity to the data that's not required for the person to do their job.
5. Data Loss Prevention (DLP)
So we'll wrap up this section talking about strategies to protect your data. The AZ-301 requirements call this Data Loss Prevention, to which I'll respond, a little sarcastically: well, yes, these are policies designed to prevent data loss, of course. One tip to prevent data loss is to look at your data catalog across your organization and identify what the sensitive data is. If there's an anonymized log that just contains the times people logged in and logged out, with no identifying information, that might not be sensitive at all. Whereas if you have a table of customers' names, emails, addresses, phone numbers, and a historical list of everything they've ever ordered, well, that could be extremely sensitive. Knowing what's sensitive and what's not will help you determine what steps you have to take to protect that data. Now, once you've identified it, you may want to look at certain standards that exist.
So there's personally identifiable information, PII, which is basically anything that identifies a person. If you look at a customer's email address, well, that would identify them; if you know my email address, then you know who I am. Whereas a postal code by itself is not personally identifying at all. So we can look at different pieces of information and ask: will they identify a person or not? The other standard is the Payment Card Industry Data Security Standard, PCI DSS. Any time you're dealing with credit cards, you should be looking at the PCI standards for the handling of credit cards. I have a personal rule when I'm designing systems: I abhor the idea of storing a credit card. And I think through my career I have successfully never had to store a credit card number, even working with retail sites, even working with some really big brands, even working with Visa, which I worked with for a couple of years.
So don't store what you don't need, right? Even if you're going to have access to a lot of information, you want to be realistic about what your application needs and what you may reasonably need in the future. There's some data that can have short expiration dates, or that doesn't even need to be stored. Passwords, I mean; this has been the embarrassment of many system designers over the past ten or twenty years whenever we get a major leak.
When you hear the news that some big company got its database leaked, the first thing people ask is: were the passwords in plain text? It's a very sensitive subject. If you got your database leaked, that's bad enough. But if your passwords are properly encrypted and salted, it's going to be very difficult, if not impossible, to reverse engineer the passwords out of that data. Good for you; you get small bonus points for having a practically uncrackable password scheme. So, at worst, people's email addresses and other information got released.
But the passwords, no one will ever know those. Whatever you do, don't keep the security key for your password encryption in your code, or in your config file, or on the server somewhere. If you're going to encrypt something, that's great, but set it up so that the key to decrypt it is not stored alongside the data. In fact, when you're dealing with something like passwords, it might be even better to hash them instead of encrypting them. Hashing is a one-way function: it can turn a string into another string, but it's not possible to go back from what it produced. So with a password, you hash it and store the hash, and then when someone tries to log in, you hash what they typed and compare the two hashes.
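A minimal sketch of that pattern using only the Python standard library; the iteration count is just an illustrative choice:

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes | None = None) -> tuple[bytes, bytes]:
    """Salted one-way hash; store the salt and digest, never the password."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest

def verify_password(attempt: str, salt: bytes, stored_digest: bytes) -> bool:
    """Hash the login attempt and compare the two digests in constant time."""
    _, digest = hash_password(attempt, salt)
    return hmac.compare_digest(digest, stored_digest)
```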
So hashing is a good strategy. Security is best done on what they call a need-to-know basis: the minimization of permissions, the principle of least privilege. Don't give people too many permissions. We've said this before in this course, but minimize permissions on a person-by-person basis, using the access review function within Azure AD to make sure that every person who has access to a resource actually needs it. And it might be better to enable temporary access: somebody needs to get into such-and-such a system, but they only need it for the afternoon. Well, don't give them full permissions that never expire; give them seven days of access to that resource and have it automatically expire after that. So take advantage of temporary permissions rather than a full-on escalation every time someone needs one-time access to something.

Azure also has a technology called Azure Information Protection, which is sort of like a DRM, a digital rights management system, for your information, your documents, and your data. If you set it up to disable email forwarding, then within Outlook 365 the recipient won't be able to take that document and forward it on to another person. If you set it to prevent printing, it can prevent printing. So Azure Information Protection is basically a digital rights management system that can be attached to documents.
You can look at that as one other way of protecting your information: if you make it difficult to do something, then only truly malicious people will work around the protection. Now, the GDPR has been in effect for a couple of years, and we actually saw in the news the other day that the first cases are starting to come to court over GDPR, basically over not taking the protection of data seriously. So you need to have a data controller, there need to be data protection impact assessments, and there are reporting and disclosure obligations. This is law, and Europe is getting more serious about how it handles people's data.