CompTIA CySA+ CS0-002 – Detection and Containment Part 2
5. Impact Analysis (OBJ 3.1)
Impact analysis. When we talk about impact analysis, this is a really important concept as part of our triage function. Now, when we talk about triage, what we’re really focused on is how you look at an event and decide how severe it is and how much priority you need to give it. After all, you only have so many resources. Now, triage is a concept that originally started with the military. Military doctors have limited resources when they’re on the battlefield, so if ten injured patients come in to them and they only have enough resources to help three people, they need to figure out very quickly who is the most severe case, who needs assistance right now, and who can wait a couple of hours, days, or weeks.
That’s the idea of triage, and hospitals do this all the time. If you go to the emergency room right now, they’re going to look at you when you first check in. If you have a big gaping wound that’s gushing blood, you’re going to get seen right away. If you’re having a major heart attack, they’re going to help you right away. But if you go in with a high fever and think you might have the flu, well, you might be sitting there for several hours until a doctor has time to see you, because your case is a lower priority than the person who needs urgent medical assistance for a gushing wound or a heart attack. That’s the idea of triage. Now, as we go through and start thinking about triage, we’re going to consider different categories.
For example, is there damage to data integrity? Are there unauthorized changes? Is there a theft of data or resources? Is there a disclosure of confidential data? Is there going to be an interruption of services, or will there be system downtime? All of these are things we have to consider as we do our triage. Now, as I went through that list, you might have seen that last one about system downtime and thought, that doesn’t seem so bad. What’s the big deal if the system is down for a few minutes or a few hours? Well, in reality, system downtime is very expensive and costs our companies a lot of money. Back in 2019, there was a survey that looked at the average cost per hour of downtime. As you can see on the chart, the most common answer, given by about 25% of respondents, was somewhere around $300,000 to $400,000.
Now, that is a lot of money for one hour of downtime. Some companies, though, were as high as $5 million or more for just one hour of downtime, with 15% of the respondents saying that. And at the low end of the chart, some respondents said one hour of downtime cost them $10,000 or less. So you can see that downtime is a costly factor, and as incident responders, we have to keep that in mind. Because if we have a system that is completely down and it stops us from being able to take in orders, that might be something we need to prioritize resources toward quickly to get that system back online and into a safe configuration. Now, triage and categorization are going to be done in one of two ways: either impact-based or taxonomy-based.
These two approaches are different ways for us to consider how we want to prioritize things. If we’re using an impact-based approach, categorization is going to focus on the severity of an incident, such as emergency, significant, moderate, or low priority. If we’re using a taxonomy-based approach, we’re going to define the incident categories at the top level. For example, we might call them a worm outbreak, a phishing attempt, a DDoS, an external host or account compromise, or internal privilege abuse. Each of those categories would then have its own procedures and its own priorities. Now, when you’re using an impact-based approach, you’re going to categorize the incident based on the scope and the cost, and this is usually the preferred way of doing it inside the industry.
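To make the taxonomy-based idea a little more concrete, here is a minimal sketch of how those top-level categories might map to procedures and priorities. The category names come from the examples above, but the priority levels and playbook identifiers are invented purely for illustration.

```python
# Hypothetical taxonomy-based triage table; the priorities and playbook IDs
# below are illustrative examples, not an official classification scheme.
TAXONOMY = {
    "worm outbreak":            {"priority": "emergency",   "playbook": "PB-MALWARE-01"},
    "phishing attempt":         {"priority": "moderate",    "playbook": "PB-PHISH-01"},
    "ddos":                     {"priority": "significant", "playbook": "PB-DDOS-01"},
    "external host compromise": {"priority": "significant", "playbook": "PB-HOST-01"},
    "account compromise":       {"priority": "significant", "playbook": "PB-ACCT-01"},
    "internal privilege abuse": {"priority": "emergency",   "playbook": "PB-INSIDER-01"},
}

def triage(category: str) -> dict:
    """Look up the response priority and procedure for an incident category."""
    return TAXONOMY.get(category.lower(), {"priority": "low", "playbook": "PB-GENERIC"})

print(triage("Phishing attempt"))  # {'priority': 'moderate', 'playbook': 'PB-PHISH-01'}
```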
If it’s a large scope and a very large cost, it’s going to have a higher priority. If it’s a smaller scope with a smaller cost, then we might delay our response a little bit because we’re going to deal with other things that are more important first. Now, in my real-world experience, most organizations seem to use the impact-based approach, and when they do, they’re going to use four main categories of impact: the organizational impact, the localized impact, the immediate impact, and the total impact. Let’s take a look at these four areas. First, organizational impact. This occurs when you have an incident that affects mission essential functions, and therefore the organization can’t operate as intended.
When you’re considering an organizational impact, this is something that affects a wide range of users across your organization. For example, if my email server went down, that affects pretty much everybody in the organization, so it has an organizational impact. That’s a big deal. Then we have localized impact. This is an incident that is limited in scope to a single department, a small user group, or just a few systems. For example, if my customer service manager’s computer has been hacked, that is one system, and the impact is localized to her. So while it’s a bad thing for us and we need to solve it, it’s not nearly as bad as something as large as all of the email across the company being down. Now, there’s one thing you have to be careful of here.
Don’t always assume that a localized impact is necessarily less important or less costly than an organizational impact. The reason I say this is that if I have a single computer or a single server that’s affected, that may be classified as a localized impact, but it may actually matter more than a broader organizational impact. A good example of this might be our payroll system. We have one computer that runs our payroll, and every Friday we send out paychecks for all of our employees. Well, if that system was hacked on Thursday, then even though it’s a localized impact, with one function being affected, one small part of the business, in our case payroll, it would be a bigger issue for us than the entire email server being down right now, because we have to get those paychecks out on Friday.
Those are the kinds of things you have to think about, and why you need a thinking human being in place to make these decisions. It’s not as easy as saying this affects ten computers, that affects one computer, therefore ten is more than one and must be the bigger issue. It’s not necessarily that simple. It’s always more important to think about what functions those machines and systems serve inside the organization or inside that local area. The next thing we want to talk about is immediate impact. Immediate impact is when an incident is measured based on the direct costs incurred because of the incident, such as downtime, asset damage, penalties, and fees. Based on these things, you’re going to have some kind of dollar value, and that allows you to quantitatively assess how important this thing is to get fixed and how much time you can delay.
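As a rough illustration of that quantitative assessment, here’s a minimal sketch of an immediate-impact calculation. The formula simply sums the direct costs mentioned above; all of the rates and inputs in the example are hypothetical, not survey data.

```python
# Minimal sketch: expressing immediate impact as a dollar figure.
# Every number here is a hypothetical example.
def immediate_impact(downtime_hours: float,
                     cost_per_hour: float,
                     asset_damage: float = 0.0,
                     penalties_and_fees: float = 0.0) -> float:
    """Sum the direct costs incurred because of the incident."""
    return downtime_hours * cost_per_hour + asset_damage + penalties_and_fees

# e.g., 3 hours of downtime at $350,000/hour plus $50,000 in penalties
print(f"${immediate_impact(3, 350_000, penalties_and_fees=50_000):,.0f}")  # $1,100,000
```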
The next one we’re going to talk about is total impact. Total impact measures the incident based on the costs that arise both during and following the incident, including the damage to the company’s reputation. So we take all of that immediate impact and add all the long-term costs as well. Let me give you a good example of this. A lot of companies have had data breaches, and that’s bad, and there’s going to be a long-term impact for them, but how severe it is depends on what industry they’re in. My company is an online education company, and if we look at most online education companies out there, a data breach probably wouldn’t be that bad for them.
They would say they’re sorry to their customers, pay some fines, do some remediation, work on it, and move on, and there would be no big issue. But if my company had a big data breach, that would have longer-lasting effects and a higher total impact. Why? Because we teach cybersecurity, and if a cybersecurity training company gets hacked, that’s going to look really bad for us and tarnish our reputation. So it’s going to have longer total impact effects for us than for a similar company that teaches something like how to draw or how to play piano. You have to think about your industry and what that total impact will be, and that is going to affect your categorization as well.
6. Incident Classification (OBJ 4.2)
Incident classification. So now that we’ve prioritized things, we need to classify our incidents. Some organizations are going to add additional layers of incident classification depending on their particular needs. There are lots of different ways to classify your incident, but some of the most common are things like data integrity, system process criticality, downtime, economic, data correlation, reverse engineering, recovery time, and detection time. Let’s take a look at each of those in this lesson. First, we have data integrity. This is any incident where the data is modified or loses integrity. I like to think about this as my checking account. If my checking account has $1,000 in it and somebody went in and modified it to say that I had $10 in it, that would be a violation of the data integrity of my checking account balance.
That would make me upset because I just lost a bunch of money, right? And so that is the issue here: when you’re dealing with data integrity, if you can’t trust the data, that is going to be an incident of this type. Second, we have system process criticality. This is an incident that disrupts or threatens a mission essential business function. In my company, one of our essential business functions is being able to deliver training to you, our students. If somebody took down our video server, that doesn’t necessarily cost us money initially, but the longer it’s down, the more money it’s going to cost us, and the more upset customers we’ll have. That is system process criticality: it disrupts or threatens our mission essential business function, which for us is delivering training to you.
Next, we have downtime. Downtime is an incident that degrades or interrupts the availability of an asset, system, or business process. Now, there are lots of ways to create downtime. It can be a technical thing, such as your system being overrun by a denial of service attack, or it can be a nontechnical thing. For example, let’s say you call up an airline at the same time a lot of other people start calling that airline. That can create a very long queue before you reach a customer service agent, and if there’s an excessively long queue, like four or five hours, that essentially is downtime: the system is interrupted because there aren’t enough available agents to support your call. This would be going against the business process as opposed to going against a particular asset or system.
Next, we have economic. When we look at our categories, we also have to consider the economics of things. This is when an incident creates a short-term or long-term cost. As I mentioned in the last lesson, downtime on a system can cost hundreds of thousands of dollars per hour. So when you have downtime, there is the physical downtime itself, but there’s also an economic cost associated with it. If I hacked your system and destroyed the data on a server, that might cause downtime. But even if it didn’t, it would still have an economic cost to you, because you have to recover that data, and that requires contractors or employees spending valuable time to bring things back to the way they were.
Next, we have data correlation. Data correlation is an incident that’s linked to a specific TTP, or tactics, techniques, and procedures, of known adversary groups with extensive capabilities. So if I’m looking at a particular indicator of compromise inside my SIEM based on an alert that fired, and I see that that indicator has been known to be associated with APT28, which is a nation-state actor, that would have the hair on the back of my neck standing up a little bit more. That’s going to get a higher priority because that is a well-known actor with a lot of capabilities, and I need to really be thinking about how I’m going to get them out of my network. The next one we have is reverse engineering. This is any incident in which the capabilities of the malware are discovered to be linked to an adversary group.
So, again, if you have some kind of malware in your system and, as you’re reverse engineering it, you’re able to determine it goes with APT XYZ, that tells you this is an advanced persistent threat, a nation-state or a large criminal organization, and it’s going to be more significant than somebody who simply stumbled onto your system because they’re a script kiddie. These are the kinds of things you want to think about as you’re doing your categorization. Next, we have recovery time. Recovery time covers an incident that requires extensive recovery time due to its scope or severity. A higher recovery time means a higher priority, because it means we’re going to have longer downtime, right? And more cost, because the longer it takes me to recover and get back to normal, the more cost and time is going to be associated with it.
And the final category we have is detection time. If you have a very high detection time, meaning it takes a long time to discover something, the incident is more dangerous, because if there’s a bad guy in my system and they’re there for a long time before I find them, they have a lot more chances to do bad things. As incident responders, we are often racing against the clock when dealing with intrusions, incidents, and possible data breaches. In fact, according to one study, only 10% of data breaches were discovered within the first hour, meaning that 90% of the time the attacker was in your system for at least an hour before being discovered. As we go further into that study, it shows that 20% of the incidents took days to discover, and 40% took months to discover.
So a very large share of these incidents can go on for months or even years at a time before being discovered, which gives the attacker a lot of time to do things on your system. Now, why is all this important? Because nearly 40% of adversaries had successfully exfiltrated data within minutes of starting their attack. And, as I said, only 10% of breaches were found within an hour, which means 90% went longer than an hour. So in a large portion of cases, the adversary is going to have your information before you even know they’re there. Detection time is really important because the adversaries are beating the defenders a lot of the time, and we have to continue working on getting our detection times down. Therefore, if you have a high detection time, you have a higher priority incident, because the attacker is going to have a chance to do a lot more damage to you.
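To tie these classification factors together, here’s a purely illustrative sketch of how a SOC might score an incident across them to set a priority. The factor names come from this lesson; the 0–5 scale and the bucket thresholds are invented for the example.

```python
# Illustrative only: scoring an incident across this lesson's classification
# factors. The 0-5 scale and the bucket thresholds are made-up examples.
FACTORS = [
    "data_integrity", "system_process_criticality", "downtime", "economic",
    "data_correlation", "reverse_engineering", "recovery_time", "detection_time",
]

def classify(incident: dict) -> str:
    """Sum the analyst-assigned 0-5 severity per factor, then bucket it."""
    score = sum(incident.get(factor, 0) for factor in FACTORS)
    if score >= 25:
        return "emergency"
    if score >= 15:
        return "significant"
    if score >= 8:
        return "moderate"
    return "low"

# A slow-to-detect intrusion tied to a known APT already scores fairly high
print(classify({"detection_time": 5, "data_correlation": 5, "downtime": 3}))  # moderate
```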
7. Containment (OBJ 4.2)
Containment. In this lesson, we are going to talk about containment, which is our next step in the incident response phases. Rapid containment in an incident response is really important, because if we don’t contain things quickly, the adversary can get into our network, start pivoting around and spreading laterally across the network, and cause more damage. When we talk about containment, our job is to limit the scope and magnitude of the incident by securing data and limiting the impact to our business operations and our customers. Now, there are five key steps for conducting containment. First, we want to ensure the safety and security of all of our personnel. This is always going to be the first concern for management and executives, because we can’t replace people, but we can replace data and technology.
Second, we want to prevent an ongoing intrusion or data breach. Now that we know our people are safe, we want to protect our information and stop any ongoing intrusions to prevent further exfiltration of data. The third thing we want to do is identify whether the intrusion is the primary or a secondary attack. Sometimes when you detect something, you think you’ve found the main cause, but you haven’t; you’ve found a secondary attack. Then you have to go back and find out how they initially got in and what else they’re doing. For example, if you see that they’ve gotten into your network and they’re exfiltrating unimportant data, is that really the attack they were going for? Or were they using it as a way to hide their true intention of going after some important data store instead?
Fourth, we want to avoid alerting the attacker that the attack has been discovered. This is important because some attackers, when they’re discovered, will actually destroy your systems. There is one APT that is known for doing what we call burning down the house: if they’re discovered, they will go and start formatting all your systems and destroying everything. We don’t want them to do that to our systems, so we don’t want to tip our hand and let the adversary know we’ve detected them. Instead, we want to make sure we’ve cut off their ability to do any harm before they figure out that we know they’re there. And then fifth, we want to preserve any forensic evidence of the intrusion and attack, because that evidence could be useful for law enforcement and for other purposes.
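One simple way to capture these five steps, for example in a runbook script, is as an ordered checklist. This is just an illustrative sketch; the wording is condensed from the steps above.

```python
# Illustrative only: the five containment steps from this lesson, encoded in
# priority order so a responder or runbook tool can walk them top to bottom.
CONTAINMENT_STEPS = [
    "Ensure the safety and security of all personnel",
    "Prevent the ongoing intrusion or data breach",
    "Identify whether the intrusion is the primary or a secondary attack",
    "Avoid alerting the attacker that the attack has been discovered",
    "Preserve forensic evidence of the intrusion and attack",
]

for rank, step in enumerate(CONTAINMENT_STEPS, start=1):
    print(f"{rank}. {step}")
```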
Now, notice that these five steps are in priority order. We want to make sure we are safe, we stop the ongoing breach, we identify the primary or secondary attack, and we avoid alerting the attacker; preserving evidence comes last. As we go forward and start thinking about containment, there are really two categories of containment we can use: isolation and segmentation. When we talk about isolation, this is a mitigation strategy that involves removing an affected component from whatever larger environment it’s a part of. So if I have a server and I think it’s been compromised, I can physically take it out of the environment and cut it off. That way the attacker can’t access it, but neither can my employees.
Now, when you’re doing this, you need to ensure that there is no longer an interface between the affected component and your production network or the Internet. One of the most common ways of doing this is by creating an air gap. You can do this by turning off a switch port or unplugging the cable directly from the switch port, because this will make sure nobody can talk to that device anymore. Creating an air gap is the least stealthy option, though, and it will reduce your opportunities to analyze the attack or the malware, because you are cutting off the connection completely. So, should you use isolation? Well, if you have a high enough priority incident that can really do some damage, you may want to isolate immediately to prevent that damage from spreading.
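To show what the switch-port version of an air gap might look like in practice, here is a hedged sketch using the Netmiko library to administratively shut down a port. The switch address, credentials, and interface name are all hypothetical placeholders; in a real response, you would confirm exactly which port the compromised host is plugged into first.

```python
# Hypothetical sketch: air-gapping a compromised host by disabling its switch
# port with Netmiko. Host, credentials, and interface are placeholders.
from netmiko import ConnectHandler

switch = ConnectHandler(
    device_type="cisco_ios",
    host="10.0.0.2",        # hypothetical access-switch address
    username="netadmin",    # hypothetical credentials
    password="********",
)

# Administratively disable the port, cutting the host off from the network
switch.send_config_set(["interface GigabitEthernet1/0/24", "shutdown"])
switch.disconnect()
```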
Now, another way to do isolation besides unplugging the network cable is to take away permissions from an account or a service. For example, if somebody has compromised your administrator account, you can go in and disable that account. That, again, is an isolation mechanism. The other approach we can use is segmentation. Segmentation is a mitigation strategy that achieves isolation of a host or a group of hosts using network technologies and architecture. Segmentation uses VLANs, routing, subnets, and firewall ACLs to prevent communications outside the protected segment. This is also sometimes termed sandboxing. Sandboxing is a security mechanism for separating a system from other critical system resources and programs.
This often can be used to test malware or other harmful applications without subjecting the rest of the network to the attack. Again, this is a form of segmentation. It can be accomplished during an incident response by redirecting all of your attacker’s activity to a workstation that can then be used to gather evidence and information about the attacker’s methods. So when we do this, we can use something like a honeypot. If we think there is some attacker from the Internet trying to access us, we can use a proxy to redirect their traffic into a LAN honeypot or a DMZ honeypot, and that way we can research what this person is doing. This type of segmentation is often used by security researchers as they create honeypots and honeynets that are designed to be attacked by hackers.
This way they can capture the attackers’ techniques and create indicators of compromise for us to add to our detection systems. If you plan to use sandboxing as a technique in this way during an incident response, though, you need to consult your organization’s legal counsel to ensure you’re not breaking any laws when doing so, because laws do vary from place to place. Some jurisdictions consider setting up these types of honeypots, in a law enforcement context, to be entrapment, and it wouldn’t be allowed. So you have to consider that and work with your legal team. Now, when you do this segmentation or sandboxing, you can reroute the adversary traffic as part of a deception defensive capability, something like the honeypot we were just talking about.
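As a minimal sketch of that rerouting idea, the snippet below uses iptables (driven from Python) on a hypothetical Linux gateway to send a suspected attacker’s traffic to a honeypot instead of production, and to log the redirected connections as evidence. The IP addresses are placeholders, and as just discussed, you would clear this with legal counsel before deploying it.

```python
# Illustrative sketch: segmenting a suspected attacker by DNAT-ing their
# traffic to a honeypot on a Linux gateway. All addresses are placeholders.
import subprocess

ATTACKER_IP = "203.0.113.50"   # placeholder address (TEST-NET-3 range)
HONEYPOT_IP = "10.20.30.40"    # hypothetical honeypot host

def iptables(args: list[str]) -> None:
    """Apply a single iptables rule, raising if the command fails."""
    subprocess.run(["iptables", *args], check=True)

# Rewrite the attacker's destination so everything lands on the honeypot
iptables(["-t", "nat", "-A", "PREROUTING", "-s", ATTACKER_IP,
          "-j", "DNAT", "--to-destination", HONEYPOT_IP])

# Log the redirected connections to support evidence preservation
iptables(["-A", "FORWARD", "-s", ATTACKER_IP,
          "-j", "LOG", "--log-prefix", "HONEYPOT-REDIRECT: "])
```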
Now again, I want you to consult with your senior leadership whenever you have plans for using isolation or segmentation as part of your containment strategy. One of these two is what you’re going to have to do, but which one is going to depend on senior leadership, because there are a lot of factors involved, and we are only scratching the surface here. You always want to consider the impact to the business and the risk you’re taking by choosing one strategy over another for your overall containment strategy during the incident response. This will have larger business impacts as well, so always make sure you’re consulting senior leadership and providing them with your recommendations based on your technical expertise; they can then make a decision based on their business experience.