ISACA CISM – Domain 04 – Information Security Incident Management Part 8
45. Lesson 8: Developing an Incident Response Plan
Now, in this lesson, we’re going to talk about developing an incident response plan. We’ll talk about the elements of the IRP, or incident response plan, which will also include a discussion of gap analysis, the business impact analysis, an escalation process, help desk processes, and how to organize, train, and equip the staff who are going to be involved in all aspects of security, which includes their ability to notify us of an incident. We’ll also look at some of the other challenges that we might find when it comes to incident management, and what we should think about those challenges as we’re coming up with this incident response plan.
46. Elements of an Incident Response Plan
All right, so first, of course, as I said, are the elements of the IRP, and what we’re going to do is take a look at some guidelines. And by the way, some of these guidelines go back as far as 1990. We’re looking at a six-phase model of incident response, which starts off with preparation. The idea of preparation is, as it says, to prepare the organization to be able to develop this basic response plan. It’s kind of like if you were a chef at a restaurant: you know that you don’t chop up all the foods and get them all ready as soon as the customer orders. That’s all done with what they call food prep in the morning. It’s preparing. It’s planning and developing this plan prior to an incident happening.
So we’re not doing all this work at the time an incident occurs. Hate to use food as an example, but maybe I need to go eat lunch or something like that. Preparation is designed to help us establish an approach to how we want to handle incidents as they occur. It may even help us, through planning, come up with policies, maybe even warning banners, and decide how to set up lines of communication. And again, we’re just setting all this up in preparation for the plan. We’re looking at criteria for reporting, and at what’s important for us to make sure we have good documentation on. Is there a process to activate the team, and what would that process be? How are we going to send that out? I mean, in the old days, and I go back, unfortunately, a while, we used to have pagers.
We didn’t have cell phones in the old days, and we had to carry our pagers with us everywhere. Something went off, then we had to go find a landline to call. Preparation may even mean making sure you have adequate equipment. It’s kind of hard to develop a plan if you don’t have a guarantee that you’ll have certain equipment that you’re going to use throughout this response. From there, the next step is identification. This phase is basically trying to verify whether an incident has happened: what are the criteria for determining that an incident has occurred? Not all reports we get are going to be valid incidents. They could be false alarms, which I think are better to get than to have somebody say, oh, it’s probably a false alarm, when the incident is really there.
So I’d rather have the false positive, I guess is what we call it. One of the things to remember, though, is that we have people on our team with different skill sets who are tasked with certain actions. And so we have to decide who’s going to have ownership of an incident or a potential incident, and verify, I guess you could say, that the reports or events qualify. Like I said, it could be a false positive, but that’s fine. I’d rather deal with that. Keeps you busy. Oh, and if there are going to be evidentiary issues, you have to have some way of establishing the chain of custody, which is basically a way of preserving evidence that you may be collecting. And the other part of identification is to know when to escalate.
That’s another thing that we’ll talk about as we continue to move on. Another part of the plan is containment. Well, first of all, once you have confirmed that you do have an actual incident, the next thing we have to do is decide who to contact. I’m going to imagine that’s the incident management team, the IMT, that we’re going to activate. They’re going to go into action, and part of that containment is to determine how to limit the exposure, as we’ve said before. Besides the IMT, when we say who to contact, that might be where we look at, well, a term we haven’t used very often: stakeholders.
A stakeholder, by the way, is just about anybody who has an interest in that business operating and making money. Of course, the other part of containment is knowing that we have certain actions that we need to be able to take, and that’s going to depend on the threats in the actual event that’s involved. Even through containment, we still may have to consider that we’re going to be gathering evidence. So that’s probably another part of what we’re going to be looking at in this response plan: we don’t want to destroy potential evidence during the containment process, if at all possible. And remember that we have to be able to record what’s going on for later documentation. The fourth of the elements is eradication.
Now, eradication should come logically after containment, right? We’re trying to contain the threat and limit the loss, and then once we’ve got it under control, the goal is to eradicate it. In order to eradicate it, you need to know what the root cause was. If you don’t know what the root cause was, it’s going to be very difficult for you to eliminate it. Say I found out that a bunch of the workstations in my building were part of a botnet, so I’ve got all these workstations sitting out here doing the bidding of some bot manager. Let’s just call it a remote access trojan, or RAT. Maybe they have their connections in there. All right, well, containment could be easy enough: just block the IP address of the person that’s in control.
I can do that through the firewall. But the problem is that doesn’t eradicate it, because a lot of this software is very tricky and will find another path of communication to get out to that controller. And so that’s where we want to find out what the root cause is and figure out how to eliminate it, so there’s no chance of these workstations even trying to make that contact anymore. Hopefully that makes some sense as an example of eradication. So again, that’s where you find the signs and the cause, and locate maybe backups or alternative solutions. Maybe here you’re going to just reimage these workstations so that whatever the offending software is would be gone, as long as you have a good image.
That might actually be a part of what I would call recovery, which is trying to restore the systems or services back into production. Some people might say trying to just bring them back to normal, whatever normal is. Baseline would be what I would call that normal part. And you also need to find a way to validate at this point that your recovery was successful. Validation may mean there should be a testing process that you go through to make sure everything looks to be running the way that it’s supposed to be. And then finally the documentation, right, the reporting and discussions, not just amongst the team, but with the stakeholders, with the management.
And some of you might call that an incident report. I know writing papers isn’t for everybody, but you know what? You’re going to learn from it. We can use that incident report to help analyze what was wrong before. It gives us the ability, like I said, to make sure that there’s no repeat of that particular incident. And stakeholders are going to be looking at this, because they’re not going to be on the team with you actively trying to contain and eradicate, but they are going to want to keep up to date with what’s happening.
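To see the order of those six phases written down, here is a minimal Python sketch. It only models the sequence described in this lesson. The names `Phase`, `next_phase`, and `IncidentRecord` are illustrative assumptions, not part of any standard or tool.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Phase(IntEnum):
    """The six phases in the order the lesson walks through them."""
    PREPARATION = 1
    IDENTIFICATION = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    DOCUMENTATION = 6  # the incident report and what we learn from it


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase after `current`, or None once documentation is done."""
    if current is Phase.DOCUMENTATION:
        return None
    return Phase(current + 1)


@dataclass
class IncidentRecord:
    """Hypothetical record that keeps notes for the final incident report."""
    summary: str
    phase: Phase = Phase.IDENTIFICATION
    notes: List[str] = field(default_factory=list)

    def advance(self, note: str) -> None:
        """Log what was done in the current phase, then move to the next one."""
        self.notes.append(f"{self.phase.name}: {note}")
        following = next_phase(self.phase)
        if following is not None:
            self.phase = following


if __name__ == "__main__":
    incident = IncidentRecord("workstations contacting a botnet controller")
    incident.advance("report verified, not a false positive")
    incident.advance("controller IP blocked at the firewall")
    print(incident.phase.name)
    print(incident.notes)
```

The record starts at identification because preparation happens before any incident exists. Keeping a note at every step is one simple way to have material ready for the incident report at the end.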
47. Gap Analysis
So gap analysis has a pretty straightforward little definition. I remember as a kid, when we were watching pirate movies, there was always this pirate map with a fancy-looking X, and we always said X marks the spot. In this case that’s where the treasure is, right where I want to be, where I want to get to. And over here, usually, there was a little red dot, kind of the little thing saying you are here. In today’s world, I guess it sounds like some of the cartoons my kids watch. Nonetheless, what we need to do with a gap analysis is gather some information comparing the current response capabilities, right where we are right now, with the desired level. The desired level is where we want to go.
And so that means that we have all of these questions in between, and that’s where we’re going to be gathering the information. Part of that process is that we’re going to see what needs to be improved, and hopefully we can find a way to be efficient and effective in what we’re doing as we work through this gap, following the little trail on the map or however it is we’re getting there. We’re going to identify maybe technologies that we need, whether hardware, software, or some other solution, depending again on what we’re trying to get to, along with other resources that are needed along the way. I’m still sticking with my treasure map because I’m having fun with it. Say there’s this little patch of quicksand, right? Maybe the tool I need is a bridge to get over the top of that thing. Hopefully that kind of makes some sense.
Basically we’re doing this report for planning purposes, to know what steps we need to take to close that gap, knowing that maybe we can’t foreseeably get there because some of these technologies, as we know, might cost some money. But at least you can look at it this way: if you don’t know where you want to go, you’re never going to get there. And I keep throwing these analogies out at you as we’re going through this, but I’m trying to be vendor neutral. If we were just to talk about figuring out how to secure your services through fire prevention, I mean, what does that mean? Sprinkler technology, more fire hydrants outside the building, more fire extinguishers, using noncombustible materials in the walls? So that’s why I’m using more of an analogy, because we could talk about this gap analysis with almost any type of threat that could be out there.
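The "you are here" dot and the X on the map can be sketched as two sets of capability levels. This is only a rough illustration: the 0 to 5 scale, the capability names, and the function name `gap_analysis` are all assumptions made up for the example.

```python
from typing import Dict, List, Tuple


def gap_analysis(current: Dict[str, int], desired: Dict[str, int]) -> List[Tuple[str, int]]:
    """Compare current capability levels with desired ones.

    Returns (capability, shortfall) pairs, largest shortfall first.
    A capability missing from `current` is treated as level 0.
    """
    gaps = []
    for capability, target in desired.items():
        have = current.get(capability, 0)
        if have < target:
            gaps.append((capability, target - have))
    # Largest gaps first, so planning can start with the biggest shortfalls.
    return sorted(gaps, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical levels on an assumed 0-5 scale.
    where_we_are = {"detection": 3, "escalation": 1, "evidence_handling": 2}
    where_we_want_to_be = {
        "detection": 4,
        "escalation": 4,
        "evidence_handling": 2,
        "staff_training": 3,
    }
    for capability, shortfall in gap_analysis(where_we_are, where_we_want_to_be):
        print(f"{capability}: {shortfall} level(s) short")
```

The output is just the list of questions in between. Deciding which technologies or resources close each gap, and what they cost, is still a planning exercise.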
48. BIA Part 1
The next part that we look at is the business impact assessment. Some people also call it a business impact analysis; in fact, you’ll see that term being used over and over again. It’s basically a way of reporting the impact an incident could have. Now here’s the thing. You’ve got to start off, I think, with criticality or prioritization. What’s the most important thing at your company? And keep in mind we are still focused on the company’s needs. What’s the most important thing that keeps this company going? Say I’m a bank, and I have all of my customer accounts in a database that’s got all of their balances and how much money they have, and it’s that information that all these little ATM machines out there are using to dispense money to people.
Or maybe they’re at a bank branch. In the old days, we didn’t have this. If I made a deposit, it would be a week before I saw the deposit, because they’d write it on a ledger and then reconcile it at some central location and update records that way. And interestingly enough, if I went to another branch of that same bank, they’d have to get on the phone to see how much money I had and whether or not they’d let me have it today. Now, as we know, it’s all electronic. And how important is that to us? Would you say that on the prioritization or criticality list, the information in that database would, for you as a bank, be at the top, as opposed to somebody calling in to complain because one ATM lost its network connection and isn’t working?
All right, well, at that point you might say, hey, that’s too bad; it’s so low on the list, just tell them we’re sorry and to go use another one across the street. There are ways of looking at it, but that’s the first part, and one of the first goals anyway, of doing this business impact assessment. And again, the higher the impact, the higher the priority. From that point, we also need to look at downtime estimation. I’ve used these terms a few times. One, like I said, is the maximum tolerable downtime. So let’s say that something caused this poor database server to crash, and I asked you, what’s the maximum tolerable downtime? In other words, I’m over here going into the bank, or I’m using the ATM that does work, or at least is supposed to work, and I go to withdraw money and it says can’t connect. I can’t get my money.
I go into the bank branch and the bank branch people say the computers are down, or whatever the case may be. Am I willing to wait ten minutes for the computers to come up? Maybe. Potentially. It depends on how much I need the money, I guess, and other factors. Would I wait a day to be able to access the money in my bank? Probably not. After a day of not being able to get funds that belong to me as a customer, I may decide, once you do get the machines up and I do get my money back, to go to bank B and hope that they have a better system. And so when we think of maximum tolerable downtime, it’s basically the longest outage that a business can tolerate and still remain in business.
That’s the longest unavailability, I should say, of a critical process, service, or information asset; if it goes beyond that, we’re doomed. We may never recover. And so that’s an important part of what we need to look at in the business impact assessment. Now that we know that, we’ve got to figure out what we can do if an incident occurs so that we stay within that maximum tolerable outage and survive. And so that’s where we’re going to ask, what are the resource requirements for these critical processes? What’s the most time-sensitive, highest-impact process that needs the most resources? You know what it might be? It might be to say, because of that potential problem, let’s have a duplicate or mirrored database server.
Let’s use something simple like log shipping, and have a network that is resilient enough that my bank branches or even my ATMs can follow a different path to still get that information, and have it look transparent, even to me. That’s why I talked about that one bank that had three different data centers, on both sides of the country and in the middle. They had recognized that, as expensive as it was, that was a resource they had to pay for. Because if they didn’t, and they hit that maximum tolerable outage, then we’re talking about thousands of branches just basically dying and the bank going away.
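The two BIA goals covered so far, prioritization and maximum tolerable downtime, can be put side by side in a small Python sketch. Everything specific here is an assumption for illustration: the `BusinessProcess` fields, the criticality scale where 1 is most critical, and the hour figures, which are made up and only loosely follow the bank example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BusinessProcess:
    name: str
    criticality: int       # 1 = most critical; the scale is an assumption
    mtd_hours: float       # maximum tolerable downtime for this process
    recovery_hours: float  # how long recovery is expected to take today


def by_priority(processes: List[BusinessProcess]) -> List[BusinessProcess]:
    """Most critical first, which is the first goal of the BIA."""
    return sorted(processes, key=lambda p: p.criticality)


def at_risk(processes: List[BusinessProcess]) -> List[str]:
    """Names of processes whose expected recovery time exceeds their MTD."""
    return [p.name for p in processes if p.recovery_hours > p.mtd_hours]


if __name__ == "__main__":
    # Made-up figures, not real banking data.
    processes = [
        BusinessProcess("single ATM network link", 3, 72.0, 24.0),
        BusinessProcess("customer account database", 1, 4.0, 12.0),
    ]
    for process in by_priority(processes):
        print(process.name, process.mtd_hours)
    print(at_risk(processes))
```

Anything that shows up in `at_risk` is where the resource question comes in: a mirrored server or a second data center is one way to bring the recovery time back under the maximum tolerable downtime.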
49. BIA Part 2
So we talked about the goals of the business impact assessment. Now let’s take a look at the activities. One of the first things we have to do is gather assessment material. You might even call that prioritization. It’s the initial step, and like I said, it’s this prioritization information that we want to gather. We want to know what’s critical to the organization. So what we’re doing is drilling down to the critical tasks, and then what we need to do is analyze the information that has been compiled. Now again, that may be a series of documents, maybe documenting the process we went through to get to this point, to decide what all the possible disruptions are, at least the ones that we can think of. Now here’s a big reason why we want to have a team involved: a disruption to a firewall person may be different from a disruption to a server-side operator, and different again to the person guarding the front door of your business.
They may all have their input on what the possible disruptions are and how those could affect the business. By knowing what those disruptions are, you should then be able to work out what the threats are. And sometimes we get into this qualitative or quantitative discussion, where qualitative is the potential, the likelihood, and the best opinion of how bad it could be. With quantitative, we can attach real numbers, hard dollar figures, to the damage. The other things we’ll analyze are what we could do to restore: what types of activities, what are the solutions? Is it having a backup server? Is it having virtualization? Is it having a hot site, a mirrored site? Is it just having good backups? I mean, there are lots of different things we could throw out there.
And by the way, those were all alternatives for restoration that I was just talking about. Then, of course, we should have a way of giving a rationale for the threats. If you were going to tell me that a meteor might come out of the sky and hit the building and destroy my mainframe, okay, give me the rationale: how often has that happened, and have you been talking to NASA or something like that? From there we’re going to document the results of this analysis and present them as recommendations. And remember that we’re going to use this as a way of beginning the incident management process.
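The difference between the qualitative and quantitative discussion can be shown with two small functions. This is a sketch under stated assumptions: the lesson does not give a formula, so the three-point scale and the loss-per-event times events-per-year calculation are commonly taught conventions chosen here for illustration, and the dollar figures are made up.

```python
LEVELS = {"low": 1, "medium": 2, "high": 3}  # assumed three-point scale


def qualitative_score(likelihood: str, impact: str) -> int:
    """Best-opinion rating: likelihood level multiplied by impact level."""
    return LEVELS[likelihood] * LEVELS[impact]


def annualized_loss(loss_per_event: float, events_per_year: float) -> float:
    """Quantitative estimate: dollars lost per event times events per year."""
    return loss_per_event * events_per_year


if __name__ == "__main__":
    # Hypothetical threat: a database outage judged unlikely but severe.
    print(qualitative_score("low", "high"))
    # Hypothetical figures: a 200,000 dollar loss expected once every two years.
    print(annualized_loss(200_000.0, 0.5))
```

The events-per-year input is where the rationale matters. If nobody can say how often the threat has actually happened, the dollar figure that comes out is not worth much in the recommendations.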