CompTIA CYSA+ CS0-002 – Technical Data and Privacy Controls Part 2
4. Data Loss Prevention (OBJ 5.1)
Data loss prevention. In this lesson, we are going to talk about DLP, which is data loss prevention. Now, DLP is a software solution that detects and prevents sensitive information from being stored on unauthorized systems or being transmitted over unauthorized networks. Essentially, we’re trying to protect our data from leaving our network and leaving our control. Now, when you start setting up DLP, there are three main components that you have to have. The first is a policy server. This is used to configure classification, confidentiality, and privacy rule sets, and to determine how you’re going to log your incidents and compile your reports. The second thing we have is an endpoint agent, and these are used to enforce policy on client computers even when they’re not connected to a network.
So if I have a laptop and I’ve disconnected it from the corporate network, but I still have DLP installed as an endpoint agent, it’ll still stop me from copying those files to an external hard drive. And then I have network agents. And these are essentially network appliances that will sit at the network boundary and interface with different web and messaging services to scan the messages going through them and protect things from leaving your network. Now, DLP agents can scan both structured and unstructured formats. And we’ve talked about these two formats before. When I’m dealing with structured formats, that would be things like data messages in a particular format, like a JSON format, or maybe a CSV file that has a particular set of data in each position. Now, when I talk about unstructured, these are things like chat and email and Word documents and other things like that.
Now, as you start transferring information around, you want to be able to block it if it doesn’t conform to a predetermined policy. And that’s what DLP systems are there to do. Now, as you look at DLP, there are lots of different systems out there, but essentially you’re going to get some sort of a dashboard that looks like this. This will tell you what policy matches you’ve had and what type of false positive rates you’ve been having, and you can search through or create new policies. This is an example from Microsoft’s DLP that comes as part of Office 365. Now, DLP systems will act whenever a policy violation is detected. And based on your policy, it will take one of four actions. The first is alert. If it’s set to alert only, it’s going to allow the copying to happen. So let’s say I had a file on the shared drive and I want to copy it to a USB drive. I plug my USB thumb drive into my laptop.
I drag and drop that file over, and it might be that it flags and alerts and tells the administrator that I copied it, but it’s still going to let me do it. This just makes a note that I copied it, and then the administrator can report me if he wants to. Now, the second thing you can do is be a little more active, and you can actually block it. In this case, the user is going to be prevented from copying the original file, but they would still have access to it. So it notices that I’m trying to copy this file off the shared drive, and it’s going to block that action. But I can still read that file, and I can still access it from the corporate network because I’m not blocked from using it. I’m just blocked from taking it with me. Then the third type is even more stringent than this. It’s called quarantine. This means that access to the original file will now be denied to that user or possibly any user.
Essentially, as I tried to copy that file, it flagged that I was trying to take it, and it goes, oh, no, somebody’s trying to steal our stuff. Let’s lock it down and not let him see it anymore. That’s the idea of quarantine. And oftentimes what will happen with quarantining is the system will just encrypt that file, and that way you can’t access it or read it because now it’s scrambled up. And then the fourth thing we have is what’s known as tombstoning. And this one might be new to you, but essentially, with tombstoning, the original file on the shared drive is now not only quarantined, but it’s also replaced with a different file that says a policy violation has occurred. So if I am the user and I go back to try to copy it again or read that file, it’s going to say, this file has been removed because you violated DLP policy.
To get the file reinstated, take the following actions, and then it would tell me what I need to do. That’s the idea of tombstoning. Again, it’s a little bit more severe. So we’re going from alert to block to quarantine to tombstone. Now, these four actions we just covered are all forms of DLP remediation. And these can occur in multiple places. They can occur on the client side, using your DLP agent, or on the server side if your server has a DLP agent installed, or it might be done at the network boundary if you’re using a network appliance. So there are lots of different ways to do DLP remediation, and it’s important for you to understand how you’ve configured your system. And for the exam, you’re not going to be asked to configure a DLP system. You just need to be able to understand why you would use it and how it’s going to work.
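The four remediation actions above can be sketched in a few lines of Python. This is only a toy model of what the user experiences under each action, not how any real DLP product is implemented; the names `Action`, `Outcome`, and `remediate` are made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    ALERT = "alert"            # note the event, but allow the copy
    BLOCK = "block"            # stop the copy; the original stays readable
    QUARANTINE = "quarantine"  # stop the copy and deny access to the original
    TOMBSTONE = "tombstone"    # quarantine, and leave a notice in its place


@dataclass
class Outcome:
    copy_allowed: bool
    original_accessible: bool
    replacement_notice: Optional[str] = None


def remediate(action: Action, filename: str) -> Outcome:
    """Return what the user sees for a policy violation (hypothetical model)."""
    if action is Action.ALERT:
        return Outcome(copy_allowed=True, original_accessible=True)
    if action is Action.BLOCK:
        return Outcome(copy_allowed=False, original_accessible=True)
    if action is Action.QUARANTINE:
        return Outcome(copy_allowed=False, original_accessible=False)
    # TOMBSTONE: the original is replaced by a file explaining what happened.
    notice = f"{filename} has been removed because of a DLP policy violation."
    return Outcome(False, False, notice)
```

Reading the enum from top to bottom gives the same escalation as in the lesson: alert, block, quarantine, tombstone.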
5. DLP Discovery and Classification (OBJ 3.2)
DLP discovery and classification. Now, in the last lesson, we talked about DLP at large, but DLP is going to define data that should be protected using different mechanisms, and there are really six of them. Now, as we go through, we’re going to talk about these six, which include classification, dictionary, policy template, exact data match or EDM, document matching, and statistical or lexicon. Now, we don’t have to know these in depth, but we basically have to know a definition for each one of those. And that’s what we’re going to cover in this lesson. Now, when we talk about classification, this is a rule based on confidentiality tags or labels that are attached to the data. So we talked about data classification in the last section of the course, right? And we said, as you have this data, you say this is unclassified or secret or top secret or whatever classifications or tags you use.
Well, if DLP sees that, for instance, this is a secret file and you’re trying to send it to somebody who is not authorized because you’re only able to send things out at the unclassified level, then DLP can flag that, alert on it, or block it. And this is all done based on these labels and classifications. Now the next one we have is dictionary, and a dictionary is essentially just a set of patterns that should be matched. Now, these can be actual words or they can be phrases or regular expressions. It really depends on you and how you configure this. For instance, let’s say my company was starting to do some kind of a new project and we codenamed it Tiger. Well, anytime we see something that matches the word Tiger trying to be emailed out, we would block it because we don’t want anybody getting information on our super cool new project called Tiger. That would be the idea of a dictionary.
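As a rough illustration of the dictionary idea, here is a minimal Python sketch that checks an outgoing message against a small set of patterns. The code name Tiger comes from the example above; the phrase and the `TGR-` reference format are invented for this sketch, and a real DLP product would have its own way of defining dictionaries.

```python
import re

# Hypothetical dictionary: a code name, a phrase, and a regular expression.
DICTIONARY = [
    re.compile(r"\btiger\b", re.IGNORECASE),             # the project code name
    re.compile(r"\bproject\s+roadmap\b", re.IGNORECASE),  # an assumed sensitive phrase
    re.compile(r"\bTGR-\d{4}\b"),                         # an assumed internal reference format
]


def dictionary_hits(message: str) -> list:
    """Return every piece of text in the message that matches the dictionary."""
    hits = []
    for pattern in DICTIONARY:
        hits.extend(match.group(0) for match in pattern.finditer(message))
    return hits


# An outbound email that mentions the code name would be flagged.
flagged = dictionary_hits("Attaching the Tiger budget, ref TGR-0042")
```

If `dictionary_hits` returns anything at all, the policy would then decide whether to alert or block.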
Now the next one we need to talk about is a policy template. Now a policy template is essentially a dictionary, but it’s a very specialized dictionary. This is a template that contains dictionaries that are optimized for data points in a regulatory or legislative schema. So if you fall under PCI DSS, there’s a policy template you can download that would say, hey, put this in your DLP to make sure anything that matches this format doesn’t leave the network. The same thing goes for things like HIPAA or GDPR. There are different templates out there that can help match individual taxpayer identification numbers, Social Security numbers, passport numbers, or whatever else you want based on that particular template. The next one we want to talk about is EDM, which is exact data match. Now, EDM is a structured database of string values that we want to search for and match.
Now the difference here is that these particular strings are actually hashed to create fingerprints. And then the policy engines search based on those hashed strings. This way we’re not compromising confidentiality or privacy, but we are still matching the exact thing. So if I had a list of all of my customers’ credit card numbers, for instance, and wanted to make sure they didn’t leave the network, I wouldn’t want to load up all their credit card numbers in my DLP, because if I did that, then somebody could get their credit cards from my DLP. So instead, I would hash those credit card numbers individually and store them in this structured database. And that’s what I’m searching for. So as I see something that looks like a credit card number going out in email, I can hash that, compare that hash against my database, and if it matches, that’s an exact data match. I would then flag that email. Next, we have document matching.
Now, this is matching based on an entire or partial document, again based on hashing. So I have my new top secret Tiger program and I have my PowerPoint explaining everything we’re going to do in it. I can create a full document match against that by hashing it. And if I see a file with that hash trying to leave the network, I would then block it. Now, in addition to that, I could do partial document matching where I’m looking for certain slides in it or certain pictures or certain words. And all of those would be based on smaller pieces that were then hashed, and we could check for those as well. And then the final one we have is statistical/lexicon. Now, statistical/lexicon is a further refinement of partial document matching, which uses machine learning to also analyze a range of data sources. So we’re not just using the standard document match based on what I fed the system, but I’m also using some machine learning to make it more intelligent and do a better job.
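The exact data match and full document match ideas described above can be sketched in Python as follows. This assumes plain SHA-256 and a simple 16-digit pattern purely for illustration; the function names are made up, and an unsalted hash of a card number is easy to brute force, so a production system would need something stronger than this.

```python
import hashlib
import re

# Anything that looks like 16 digits, optionally separated by spaces or dashes.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){15}\d\b")


def fingerprint(value: str) -> str:
    """Hash a card number after stripping spaces and dashes."""
    digits = re.sub(r"\D", "", value)
    return hashlib.sha256(digits.encode("utf-8")).hexdigest()


def build_index(card_numbers) -> set:
    """Store only the hashes, never the card numbers themselves."""
    return {fingerprint(number) for number in card_numbers}


def exact_matches(text: str, index: set) -> list:
    """Return candidates in the text whose hash is in the index."""
    return [c for c in CANDIDATE.findall(text) if fingerprint(c) in index]


def document_fingerprint(data: bytes) -> str:
    """Full document match: one hash over the whole file."""
    return hashlib.sha256(data).hexdigest()


# A well-known test card number stands in for real customer data.
index = build_index(["4111111111111111"])
hits = exact_matches("Card: 4111 1111 1111 1111 exp 12/30", index)
```

Note that a full document hash changes if even one byte of the file changes, which is why partial matching on smaller pieces is needed as well.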
6. Deidentification Controls (OBJ 5.1)
Deidentification. Now, in a previous lesson, I mentioned the concept of deidentification when we were talking about privacy. In this lesson, we’re going to dig a little bit deeper into it. When I’m talking about deidentification, these are the methods and technologies that remove identifying information from data before we distribute that data. Now, the real benefit of deidentification here is to be able to take data that may be protected by privacy requirements, and once we do the deidentification, that data now becomes usable by us again for other purposes. Now, this doesn’t violate anybody’s privacy because we are deidentifying the data. Oftentimes your deidentification is going to be implemented as part of your database design. Now, there are lots of different things we have to talk about when we talk about deidentification. This includes things like data masking, tokenization, aggregation and banding, and reidentification.
Now, when we talk about data masking, this is a deidentification method where a generic or placeholder label is substituted in for real data while preserving the structure or format of the original data. So let’s say you’re going to give me all your credit cards. I take all your credit cards, and I take away all of the information from your 16 digits, and I put XXX in place of all those 16 digits. That would mask the data. Nobody would be able to identify that credit card anymore as yours because we don’t have the credit card. We just have XXX. That’s a form of data masking. So really, when we talk about data masking, we are covering up the data. Or maybe I have a database of all my customers, and for some reason we collected Social Security numbers. We would never do that, but let’s say we did. Well, that’s a nine digit number.
Instead of having your unique Social Security number, I might go back through the database and change all your Social Security numbers to all ones. And by doing that, I have now genericized it across all my students to have the same number. It keeps the same format, it keeps the same structure, but it doesn’t actually hold any personal information about you because I’ve erased that Social Security number. The next one we have is what’s known as tokenization. Now, this is a deidentification method where a unique token is substituted in for real data. Now, when you do tokenization, one of the things you have to worry about is whether it is reversible. And usually with tokenization, it is. So again, let’s say I had your Social Security numbers. Instead of changing them all to ones, I assign a random number to each of my students that’s now their student ID.
That student ID is now substituted in for that Social Security number field. But I might have a master list in my safe that says this student ID matches this Social Security number. That’s what we’re talking about with tokenization. We’re using another number to represent the information. So if any of my staff go into the database and look at your Social Security number, they would just see the made up student number. They wouldn’t get your real Social Security number because that’s stored in my vault. But if I had some real business case where I needed it, I could then do the matching and reidentify you that way. So tokenization is a little bit more dangerous to do. The next one we want to talk about is aggregation and banding. Now, aggregation and banding is where you deidentify people by gathering the data and generalizing it to protect the individuals involved.
So if we are using aggregation and banding, we might take all of our subjects in a medical trial and instead of identifying them as the person or the subject number, we would say out of the 100 people who participated in this trial, 90% of them didn’t have side effects. Now, that doesn’t mean any of those 90 uniquely identifies you. It just means somebody didn’t have side effects. It’s one of those 90. And if we knew that you didn’t have side effects, well, you’re just one of 90. We don’t know you individually, and that’s where we’re able to protect your privacy. Now, let me give you another example of the dangers of some of these things and how you have to think about deidentification in terms of when somebody tries to reidentify people.
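Before that example, here is a quick Python sketch of the three methods covered so far: masking, tokenization, and banding. The class and function names are made up for illustration, the `STU-` token format is an assumption, and the in-memory dictionary only stands in for the master list kept in the safe.

```python
import secrets


def mask_ssn(ssn: str) -> str:
    """Masking: replace every digit with X but keep the original layout."""
    return "".join("X" if ch.isdigit() else ch for ch in ssn)


class TokenVault:
    """Tokenization: swap real values for random tokens, reversibly."""

    def __init__(self):
        self._token_to_value = {}  # the "master list in the safe"
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "STU-" + secrets.token_hex(4)  # hypothetical student ID format
        while token in self._token_to_value:   # avoid the rare collision
            token = "STU-" + secrets.token_hex(4)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Only someone with access to the vault can reverse a token."""
        return self._token_to_value[token]


def age_band(age: int, width: int = 10) -> str:
    """Banding: generalize an exact age into a range."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


vault = TokenVault()
student_id = vault.tokenize("123-45-6789")
```

Masking throws the real value away, while the vault is exactly what makes tokenization reversible, so the vault has to be protected at least as well as the original data.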
So let’s say that I went and did a corporate survey of my company. We went ahead and we sent out a survey to everybody and we said, don’t tell us your name because we don’t want to identify you. We want you to feel comfortable giving us your honest feedback. And we ask them a whole bunch of questions about the company. How do you like it here? Is the pay competitive? Do you enjoy your job? Do you like helping the students? All that kind of stuff. But then on the final question, we ask something like, what is your age? What is your sex? Are you married or not? And we get that kind of information. So, okay, that seems innocuous enough because we didn’t ask for things like your Social Security number or your employee ID or your name. So we still shouldn’t be able to identify you. So we take all the results of the survey, we shuffle them all together and we start reading through them.
This one’s a five star. This one’s a five star. This is a four and a half star. This one’s a one. Well, now I’m upset. I want to know who this one is, right? Can I reidentify them? Well, let’s say I look at them and I read through their comments and I get to the last page. It says this is a woman. This is somebody who is between the ages of 30 and 40. This is somebody who is married. Well, based on that and my small staff, I know that’s only one person in my company, and so I know the person who thinks Jason is the worst boss ever. And lo and behold, it’s my wife Tamara went and filled out the survey and leaves me a one star review. Thanks, honey. This is the kind of stuff that happens. But again, if you have this where you can reidentify somebody, then all that anonymization doesn’t really help. Now, why does this happen? Well, because we’re a small company.
We only have ten people. And so if we ask a question like that on the last page, and we don’t, to be honest, but if we did, it would be very easy for me to identify people, because we only have a handful of employees. We have ten people. And so if I ask things like ten-year age bands, like are you between 20 and 30, 30 and 40, 40 and 50, and if you’re a male or female, and if you’re married or not, that tells me pretty much everything I need to narrow everybody down based on that result. And so that would take away the benefit of having that deidentification. So this is the concept of reidentification, right? Reidentification is an attack that combines deidentified data sets with other data sources, things that you know, to discover how secure the deidentification method is. And so if we used that system in our company, that would not be secure.
Now, if I used that same system in my last job where I worked with 400 other people, it would have been very secure, because there were a lot more people who might have been a married woman between 30 and 40 years old. And so it’d be very easy for them to hide in the bulk. Out of those 400 people at that company, that probably describes about 50 or 60 people. And so I wouldn’t be able to identify you individually by asking those questions. So when you’re building out surveys, when you’re building out systems to have deidentification in place, you need to think these things through, because sometimes something that seems like it would work because it works at a large company won’t work at a small company, or vice versa.
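One way to check for this risk before sending out a survey is to count how many people share each combination of answers to the demographic questions. The Python sketch below does that; the sample records are invented, and the idea of requiring every group to have at least some minimum size is usually called k-anonymity, which goes a little beyond what this lesson covers.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers) -> Counter:
    """Count how many records share each combination of identifying answers."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_group(records, quasi_identifiers) -> int:
    """Size of the smallest group; 1 means somebody can be singled out."""
    return min(group_sizes(records, quasi_identifiers).values(), default=0)


def reidentifiable(records, quasi_identifiers) -> list:
    """Records whose combination of answers is unique in the data set."""
    sizes = group_sizes(records, quasi_identifiers)
    return [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] == 1]


# Invented survey results from a very small company.
survey = [
    {"sex": "F", "age_band": "30-39", "married": True, "stars": 1},
    {"sex": "M", "age_band": "30-39", "married": True, "stars": 5},
    {"sex": "M", "age_band": "30-39", "married": True, "stars": 5},
    {"sex": "M", "age_band": "20-29", "married": False, "stars": 4},
]
questions = ["sex", "age_band", "married"]
exposed = reidentifiable(survey, questions)
```

Here two of the four respondents are unique on those three questions, including the one star review, which is the small company problem described above. At a company of 400, the same questions would typically produce much larger groups.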
7. DRM and Watermarking (OBJ 5.1)
DRM and watermarking. In this lesson, we are going to talk about digital rights management and watermarking, and why these are important. Now, in a lot of cases, we want to be able to protect our data, and DLP systems are great for that. They can help protect things from leaving our network. But in some companies, our product has to leave the network, and our product is digital. For instance, right now you’re watching this course. This product is a digital product, and it has to use DRM, digital rights management, to protect it, because otherwise the product itself is very easy to copy. If you look at any kind of movie or music, you can find torrents online that illegally allow you to download them. That’s what people are trying to fight when you start talking about DRM and watermarking.
Now, when we deal with digital rights management or DRM, these are copyright protection technologies for digital media, which attempt to mitigate the risk of unauthorized copies being distributed. Essentially, you’re a movie studio, and you don’t want the latest copy of Batman to be released to everybody in the world. You want to sell it in theaters so you can make money off of it, because you invested a lot of money building that thing. And so that’s what DRM is all about. Now, DRM can be implemented in two different ways. You can use a hardware approach or a software approach. If you’re using a hardware approach, this might require you to have an authorized player. For instance, I can’t take a PlayStation game and play it on something that’s not a PlayStation. Those are required to be played on that player. The same thing happened in the early days of DVDs.
They were region coded. If I bought one for America, I couldn’t play it in Asia. If I bought one in Asia, I couldn’t play it in America. Those had to be played on an authorized piece of hardware. Now, in terms of software, a lot of times this will be done based on a software viewer. If you bought the digital textbook for this course from CompTIA, you have to look at it through their software viewer, which happens to be a website. This lets them prevent people from copying it and sending it to all different devices across the internet, so they can sell that product and keep making money, because that’s part of their business model. They invested tens of thousands or hundreds of thousands of dollars in making this textbook.
And they don’t want to just give it away for free. They need to be able to recoup the money they spent and be able to pay their authors and things like that. So there are different security mechanisms in place, whether they’re hardware or software, and DRM will use both of those. Now, the other way we can protect our content is by watermarking. Now, watermarking is a method and technology that applies a unique antitamper signature or message to a copy of a document. For instance, when I go and make these courses, we go out and try to find images that would look good for you to be able to see the points we’re trying to make. So I might get a picture like this. Now, notice there’s a big watermark across the center: Shutterstock.
That’s who we buy it from. Now, if you don’t buy the image, you have this big watermark, and it’s illegal to use. Now, in this case, I’ve bought this image, so I’m allowed to use it. Now, when I buy the image, I’m allowed to download it without the watermark, and it looks like this. So it’s the exact same image, just without that big word right across the center of it. One of the companies I work with, Excelos, does the same thing with their textbooks. We, as partners who make courses for them, get advance copies of the textbook. When they send that to us as a PDF, there’s a big thing across the center that says, not for redistribution, for partner use only. And it has my name on every single page of the 400 pages of the textbook. So if that got out online, they can then say, ah, Jason gave away our textbook for free. Let’s sue Jason. That’s the idea here when you start dealing with watermarking. Now, it doesn’t have to be in your face like this, though. It can also be a forensic watermark.
Now, a forensic watermark is a digital watermark that can defeat attempts at removal by cropping pages or images in the file. Essentially, it’s a small hidden thing that you can’t really see. Or it can just be a couple of bytes of code that are in the file itself. So you might have a PDF, and you take that PDF and you don’t see anything that’s on a watermark. And so it looks like a real official copy of a textbook. But when you send that out, it automatically embeds your IP into it, or it embeds some kind of signature that identifies you or it identifies the person who made it, and so they can tell who owns that file originally. There’s lots of different ways that you can do this when you’re dealing with forensic watermarks. But just keep in mind that this is a valid solution for some of your corporate documents if you’re trying to protect them and you want to go beyond just using DLP.
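To make the idea of a hidden identifier a little more concrete, here is a toy Python sketch that hides a recipient's name in a piece of text using invisible zero-width characters. This is only an illustration of embedding an identifier you can't see; it is not how commercial forensic watermarking works, it would not survive cropping or retyping, and the function names are made up.

```python
ZERO = "\u200b"  # zero-width space, used here to encode a 0 bit
ONE = "\u200c"   # zero-width non-joiner, used here to encode a 1 bit


def embed(text: str, recipient_id: str) -> str:
    """Hide the recipient ID as invisible characters after the first word."""
    bits = "".join(f"{byte:08b}" for byte in recipient_id.encode("utf-8"))
    mark = "".join(ONE if bit == "1" else ZERO for bit in bits)
    head, sep, tail = text.partition(" ")
    return head + mark + sep + tail


def extract(text: str) -> str:
    """Recover the hidden ID, or an empty string if none is present."""
    bits = "".join("1" if ch == ONE else "0"
                   for ch in text if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


original = "Not for redistribution, for partner use only."
marked = embed(original, "jason")
owner = extract(marked)
```

The marked copy looks the same on screen as the original, but each recipient's copy carries a different hidden value, which is what lets the owner trace a leaked copy back to one person.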
8. Analyzing Share Permissions (OBJ 5.1)
Analyzing share permissions. In this lesson, I’m going to show you how we can analyze the share permissions for our different file servers. Now, to do this, we’re first going to go into our Server Manager. Once we’re in there, we’re going to click on Manage and then Add Servers. Then we’re going to type MS1 and click Find. Now, MS1 is a Windows Server 2016 machine in my lab environment. Once I have that, I’m going to click it and then hit the right arrow to add it to the right pane and then click OK. This will add that file server into my Server Manager dashboard. Now that it’s in my Server Manager dashboard, I can go to File and Storage Services, then click on Shares and then click on Audit. From here we’re going to right click on Audit and select Properties. Once we’re there, we’ll select Permissions. And this is where we can start looking at the permissions for those shares.
Now, as you can see here, the SEC Glo Audit group, which we added in a previous lesson, has permissions of full control, not read only. So let’s go ahead and change that. Next, let’s click Customize Permissions. And note here that SEC Glo Audit has been granted full control using NTFS file system permissions. These have been applied locally. Now, these NTFS file system permissions do propagate to child objects underneath and to containers, as you’re seeing here. Or they can also be set separately. Now, if we go and click on the Share tab, we can also look at the permissions for that shared drive. These are simple share permissions here. It allows full control to authenticated users. This permission is going to apply to anyone who accesses this shared drive over the network, whereas the NTFS permissions also apply when anybody accesses those files locally on the server.
That’s the difference. Now, it is considered a standard practice in Windows administration to allow very permissive share permissions over the network, such as full control here, and then more restrictive NTFS file system permissions on the local drives. So from here, we’re going to go ahead and click Cancel. And this way we’re not going to apply any changes because we didn’t really make any changes. So now if we want to go ahead and fix some of these share permissions to the way we want them to be, we can do that by running a command inside of PowerShell. So let’s go into PowerShell and we’re going to go ahead and type in cmd /c icacls, followed by the path to the lab files folder on MS1. That’s my share access. Then /grant:r, which replaces the existing full control grant instead of adding to it. Then, in quotes, the SEC Glo Audit group, a colon, and (OI)(CI)R, where the R at the end is read access.
So that command is going to go ahead and apply those read permissions to the security group that we are using, SEC Glo Audit. Now, once we do that, we want to verify our permissions. And again, we can do this using the icacls command. So we’ll type in cmd /c icacls, followed by that same path to the lab files folder on MS1. So again, we’re going to see what the ACLs look like for this particular folder on that shared drive. Once we do that, we can check the output and we can verify that we now have read permissions, not full control, because /grant:r replaced the old grant and the R at the end gives read access. And we used the (OI) and (CI) flags, which means the permission is inherited by objects, which are files, and by containers, which are subfolders within that share.
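The pieces of that icacls command can be laid out with a small Python sketch that only builds the argument lists and doesn't run anything. The server path and group name below are placeholders, not the actual values from the lab. In icacls, the `:r` on `/grant` means replace any previously granted explicit permissions, and the trailing `R` is the read right.

```python
INHERITANCE_FLAGS = {
    "(OI)": "object inherit: files in the folder inherit the entry",
    "(CI)": "container inherit: subfolders inherit the entry",
}


def build_grant_read(path: str, principal: str) -> list:
    """Arguments for: icacls <path> /grant:r "<principal>:(OI)(CI)R"."""
    return ["icacls", path, "/grant:r", f"{principal}:(OI)(CI)R"]


def build_verify(path: str) -> list:
    """Arguments for listing the ACL on the same path afterwards."""
    return ["icacls", path]


# Placeholder values; substitute the real share path and group name.
share_path = r"\\SERVER\Share\Folder"
group = "EXAMPLE\\Audit-Group"

grant_cmd = build_grant_read(share_path, group)
verify_cmd = build_verify(share_path)
```

On a Windows machine, either list could be passed to `subprocess.run` to execute it. Keeping the arguments as a list avoids having to quote the group name by hand.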