Amazon AWS Certified SysOps Administrator Associate – Monitoring, Auditing and Performance Part 6
11. Service Quotas Overview
So here is a very helpful service named Service Quotas. Service Quotas tells you the quotas that you have within your account and how close you are to reaching them. The cool thing about it is that you can create a CloudWatch alarm directly from the Service Quotas console. For example, you can say, hey, I want to monitor how many Lambda concurrent executions I have.
And then you set up the alarm threshold, and in case you reach that threshold, you’re going to get a notification. So Service Quotas can monitor any kind of quota available within your account, for example the Lambda concurrent executions quota. If the limit is 1000 by default and you go over 900, then that should trigger a CloudWatch alarm.
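To make the numbers concrete, here is a minimal sketch of that threshold logic in plain Python. The function names are made up for illustration, and it only mimics the comparison an alarm would perform; it does not call any AWS API.

```python
def quota_utilization(current_usage: float, quota: float) -> float:
    """Return usage as a fraction of the quota (0.9 means 90% used)."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    return current_usage / quota


def should_alarm(current_usage: float, threshold: float) -> bool:
    """True once usage has gone over the alarm threshold."""
    return current_usage > threshold


# Values from the lecture: default quota of 1000, alarm threshold of 900.
QUOTA = 1000
THRESHOLD = 900

for usage in (450, 900, 950):
    print(usage, f"{quota_utilization(usage, QUOTA):.0%}", should_alarm(usage, THRESHOLD))
```

Only the last value, 950, is over the threshold, so that is the only one where the alarm would fire.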
And this will tell us as administrators that we should probably request a quota increase for Lambda concurrent executions, because that means that we’re maybe about to get some kind of throttling. And so this is super helpful, not just for Lambda concurrent executions but for any kind of quota where going over may result in errors or throttling. The alternative is to use Trusted Advisor with CloudWatch alarms. But Trusted Advisor is a less complete solution because there is only a limited number of service limit checks in Trusted Advisor, about 50.
And so that means that Trusted Advisor can only monitor a few limits, though the results will still be available in CloudWatch. So if you wanted to have a look at some service limits, you can link Trusted Advisor to CloudWatch and then again trigger a notification. But I would recommend using CloudWatch alarms mostly on the Service Quotas service, because it is dedicated to monitoring all of your quotas within your account. So I will see you in the next lecture for some practice.
12. Service Quotas Hands On
So let’s have a look at the Service Quotas service. In here you can look at all the quotas within your account, and they’re grouped by the services that you have already used in your account. So if we go back to our example and look at Lambda, we can look at concurrent executions, as I said. This is the maximum number of events that functions can process simultaneously in the current region, and the default value is 1000. Okay? But as we can see, we can adjust it and request a quota increase if we need to, directly from within the Service Quotas UI. But you can also create an alarm on top of this quota. For this there is a Monitoring section, which shows you the actual value of concurrent executions over time.
Okay? And at the bottom you have Amazon CloudWatch alarms. So you can create an alarm on this, saying, hey, with an alarm threshold of 900, so this is the alarm for concurrent Lambda executions, please activate this alarm. Now the alarm is created and you can configure actions directly from the alarm. So the alarm is right here, and you can edit the actions from the top right corner: click on Edit, then scroll down and click on Next. Then for actions, maybe send a notification when you are in the alarm state: choose an existing SNS topic, here my demo topic, and then I would receive a notification by email when this alarm is triggered.
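The same alarm could also be defined in code. Here is a minimal sketch that only builds the parameters for the CloudWatch `put_metric_alarm` call. The alarm name, the SNS topic ARN and the choice of the `AWS/Lambda` `ConcurrentExecutions` metric are assumptions for illustration; the alarm that the Service Quotas console creates may be defined differently.

```python
def build_concurrency_alarm(threshold: int, sns_topic_arn: str) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": "lambda-concurrent-executions-high",  # hypothetical name
        "Namespace": "AWS/Lambda",               # assumed metric namespace
        "MetricName": "ConcurrentExecutions",    # assumed metric to watch
        "Statistic": "Maximum",
        "Period": 60,                            # evaluate one-minute datapoints
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],         # notify this topic in ALARM state
    }


# Hypothetical ARN standing in for the demo topic used in the console.
params = build_concurrency_alarm(
    900, "arn:aws:sns:us-east-1:123456789012:my-demo-topic"
)
print(params)

# To actually create the alarm (needs boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameters in a plain function like this makes them easy to review before anything is sent to AWS.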
So this is quite handy. As you can see, from within this Service Quotas UI we can monitor all those services within AWS and all the limits available within them, which is very handy. And for many limits you can request a quota increase and create a CloudWatch alarm on top of them. Okay? And finally, if you wanted to do the same in Trusted Advisor, I will just quickly show you how it works.
And so in Trusted Advisor, you can click on Service limits and this will give you the list of all the service limits available from within Trusted Advisor. But obviously not all the limits within your account are available here, and the checks are already preconfigured. So as you can see, this one is a check that will check whether you are using more than 80% of your Auto Scaling group limit. Okay, so this is a preconfigured check and you could create an alarm on top of it if you wanted to. But this is less flexible and less complete than the Service Quotas service. So that’s it for this lecture, I hope you liked it and I will see you in the next lecture.
13. [CCP/SAA/DVA] CloudTrail
Now let’s talk about CloudTrail. So CloudTrail is a way to get governance, compliance and audit for your AWS account. And CloudTrail is enabled by default. It gives you a history of all the events and API calls made within your AWS account by the console, the SDK, the CLI, and other AWS services, and all these logs appear in CloudTrail. You can also put these logs from CloudTrail into CloudWatch Logs or Amazon S3. And you can create a trail that applies to all regions or to a single region, for example if you wanted to accumulate the history of events across all the regions into one specific S3 bucket.
And when do we use CloudTrail? Well, say someone went ahead and deleted something in AWS, for example, say that an EC2 instance was terminated and you want to figure out who did it. Well, the answer is to look into CloudTrail, because CloudTrail will have that API call in it, and we’ll be able to get to the bottom of it and understand who did what and when. So to summarize, CloudTrail is in the middle, and the actions of the SDK, the CLI or the console, or even IAM users and IAM roles or other services, will be in the CloudTrail console. We can look in it to inspect and audit what happened. And if we want to keep the events for more than 90 days, then we can send them into CloudWatch Logs or into an S3 bucket.
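To illustrate the "who did what and when" idea, here is a small sketch that pulls those three answers out of one CloudTrail event record. The sample record is made up for illustration and heavily trimmed; the field names follow the CloudTrail record format, and the helper name is hypothetical.

```python
# Made-up, trimmed record for a terminated EC2 instance.
SAMPLE_RECORD = {
    "eventTime": "2024-01-15T10:32:07Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "TerminateInstances",
    "userIdentity": {
        "type": "IAMUser",
        "arn": "arn:aws:iam::123456789012:user/alice",
    },
    "requestParameters": {
        "instancesSet": {"items": [{"instanceId": "i-0123456789abcdef0"}]}
    },
}


def summarize_event(record: dict) -> dict:
    """Extract who made the call, which call it was, and when."""
    identity = record.get("userIdentity") or {}
    # requestParameters can be null in real records, hence the "or {}".
    params = record.get("requestParameters") or {}
    items = params.get("instancesSet", {}).get("items", [])
    return {
        "who": identity.get("arn", "unknown"),
        "what": record.get("eventName"),
        "when": record.get("eventTime"),
        "targets": [i["instanceId"] for i in items if "instanceId" in i],
    }


print(summarize_event(SAMPLE_RECORD))

# Real records could be fetched with boto3 (needs AWS credentials):
#   import boto3, json
#   resp = boto3.client("cloudtrail").lookup_events(
#       LookupAttributes=[{"AttributeKey": "EventName",
#                          "AttributeValue": "TerminateInstances"}])
#   records = [json.loads(e["CloudTrailEvent"]) for e in resp["Events"]]
```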
So let me dive a little bit deeper into CloudTrail. We have three kinds of events that you can see in CloudTrail. The first one is called the management event, and these represent operations that are performed on resources in your AWS account. For example, whenever someone configures security, they will use the IAM API call named AttachRolePolicy, and this will appear in CloudTrail. If you create a subnet, this will appear as well. If you set up logging, this will appear too. Anything that modifies your resources or your AWS account will appear in CloudTrail.
And by default, trails are configured to log management events no matter what. You can separate two kinds of management events. You have the read events that don’t modify resources, for example someone listing all the users in IAM or listing all the EC2 instances in EC2, these kinds of things. And you can separate them from write events that may modify resources, for example someone deletes or tries to delete a DynamoDB table.
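As a rough sketch of that read/write split, CloudTrail records carry a `readOnly` flag that can be used to separate the two groups. The records below are made up and trimmed, the helper name is hypothetical, and treating a record without the flag as a write is an assumption chosen here to stay on the cautious side.

```python
from typing import List, Tuple

# Made-up, trimmed management event records.
EVENTS = [
    {"eventName": "ListUsers", "eventSource": "iam.amazonaws.com", "readOnly": True},
    {"eventName": "DescribeInstances", "eventSource": "ec2.amazonaws.com", "readOnly": True},
    {"eventName": "DeleteTable", "eventSource": "dynamodb.amazonaws.com", "readOnly": False},
]


def split_read_write(records: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (read_events, write_events) based on the readOnly flag."""
    reads, writes = [], []
    for record in records:
        # Assumption: a missing flag is treated as a write.
        if record.get("readOnly") is True:
            reads.append(record)
        else:
            writes.append(record)
    return reads, writes


reads, writes = split_read_write(EVENTS)
print("read:", [r["eventName"] for r in reads])
print("write:", [w["eventName"] for w in writes])
```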
And obviously the write events probably have a lot more importance because they can do damage to your AWS infrastructure, whereas the read events are just there to get information, which is still of importance but maybe less destructive. Then you have data events. They’re separate, and by default data events are not logged because they’re high-volume operations. So what are data events? Well, you have Amazon S3 object-level activity, for example GetObject, DeleteObject, PutObject. As you can see, these can happen a lot on your S3 buckets, and so this is why they’re not logged by default. And you have the option to separate, again, read and write events.
So a read event would be a GetObject, whereas a write event would be a DeleteObject or a PutObject. Another kind of data event you can have in CloudTrail is AWS Lambda function execution activity, so whenever someone uses the Invoke API. That way you can get insights about how many times your Lambda functions are being invoked. And again, this could be really high volume if your Lambda functions are executed a lot. And the third kind of event in CloudTrail is called a CloudTrail Insights event. I will talk to you about CloudTrail Insights in a bit more detail in the next slide.
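The management and data event options described above map onto a trail's event selectors. Here is a minimal sketch that builds the `EventSelectors` structure accepted by the CloudTrail `put_event_selectors` call. The helper name, bucket, function and trail names are all hypothetical.

```python
from typing import List, Optional

VALID_READ_WRITE_TYPES = ("All", "ReadOnly", "WriteOnly")


def build_event_selectors(
    read_write_type: str = "All",
    s3_object_prefixes: Optional[List[str]] = None,
    lambda_function_arns: Optional[List[str]] = None,
) -> List[dict]:
    """Build one event selector; data events are only added on request."""
    if read_write_type not in VALID_READ_WRITE_TYPES:
        raise ValueError(f"read_write_type must be one of {VALID_READ_WRITE_TYPES}")
    data_resources = []
    if s3_object_prefixes:
        # S3 object-level activity (GetObject, PutObject, DeleteObject).
        data_resources.append(
            {"Type": "AWS::S3::Object", "Values": list(s3_object_prefixes)}
        )
    if lambda_function_arns:
        # Lambda Invoke activity.
        data_resources.append(
            {"Type": "AWS::Lambda::Function", "Values": list(lambda_function_arns)}
        )
    return [{
        "ReadWriteType": read_write_type,   # applies to the whole selector
        "IncludeManagementEvents": True,
        "DataResources": data_resources,
    }]


# Hypothetical bucket and function: log write events only.
selectors = build_event_selectors(
    "WriteOnly",
    s3_object_prefixes=["arn:aws:s3:::my-demo-bucket/"],
    lambda_function_arns=["arn:aws:lambda:us-east-1:123456789012:function:my-demo-fn"],
)
print(selectors)

# To apply it to a trail (needs boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudtrail").put_event_selectors(
#       TrailName="my-demo-trail", EventSelectors=selectors)
```

Calling `build_event_selectors()` with no arguments gives a selector with management events only, which matches the default behavior described in the lecture.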
So now let’s talk about CloudTrail Insights. When we have so many management events across all types of services and so many API calls happening very quickly in your account, it can be quite difficult to understand what looks odd, what looks unusual and what doesn’t. And so this is where CloudTrail Insights comes in. With CloudTrail Insights, which you have to enable and pay for, CloudTrail will analyze your events and try to detect unusual activity in your account.
For example, inaccurate resource provisioning, hitting service limits, bursts of AWS IAM actions, or gaps in periodic maintenance activity. So the way it works is that CloudTrail will analyze what normal management activity looks like to create a baseline, and then it will continuously analyze anything that is a write type of event, so whenever something is changed or someone tries to change it, to detect unusual patterns.
So very simply, the management events are going to be continuously analyzed by CloudTrail Insights, which will generate Insights events in case something is detected. These anomalies, so these Insights events, will appear in the CloudTrail console. They will also be sent to Amazon S3 if you want, and an EventBridge event (formerly CloudWatch Events) is going to be generated in case you need to automate on top of CloudTrail Insights, for example to send an email. So this is the idea behind CloudTrail Insights.
Finally, let’s talk about CloudTrail event retention. Events are stored by default for 90 days in CloudTrail, and afterwards they’re deleted.
But sometimes you may want to keep events for longer, in case you want to go back to something that happened maybe a year ago for audit purposes. And so to keep events beyond this period, what you have to do is log them to S3, so send them to S3. And then you would use Athena to analyze them.
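As a sketch of what that Athena analysis could look like, the helper below builds a SQL query over CloudTrail logs stored in S3. The table name `cloudtrail_logs` is hypothetical, and the column names assume a table created with the usual CloudTrail layout (`eventtime`, `eventname`, `eventsource`, `useridentity`). Inputs are validated before being placed into the SQL text.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_EVENT_NAME = re.compile(r"[A-Za-z0-9]+")
_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def build_cloudtrail_query(table: str, event_name: str, start: str, end: str) -> str:
    """Build a query for one event name within [start, end)."""
    checks = (
        (_IDENTIFIER, table),
        (_EVENT_NAME, event_name),
        (_TIMESTAMP, start),
        (_TIMESTAMP, end),
    )
    for pattern, value in checks:
        # Reject anything unexpected instead of interpolating it into SQL.
        if not pattern.fullmatch(value):
            raise ValueError(f"unexpected value: {value!r}")
    return (
        "SELECT eventtime, useridentity.arn, eventname, eventsource\n"
        f"FROM {table}\n"
        f"WHERE eventname = '{event_name}'\n"
        f"  AND eventtime >= '{start}'\n"
        f"  AND eventtime < '{end}'\n"
        "ORDER BY eventtime"
    )


query = build_cloudtrail_query(
    "cloudtrail_logs",            # hypothetical table name
    "TerminateInstances",
    "2023-01-01T00:00:00Z",
    "2023-02-01T00:00:00Z",
)
print(query)

# To run it (needs boto3, AWS credentials and an existing Athena table):
#   import boto3
#   boto3.client("athena").start_query_execution(
#       QueryString=query,
#       QueryExecutionContext={"Database": "default"},
#       ResultConfiguration={"OutputLocation": "s3://my-demo-results-bucket/"})
```

The timestamps are compared as strings, which works here because ISO 8601 timestamps in the same format sort in chronological order.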
So, very simply, all your management events, your data events and your Insights events are going to go into CloudTrail for a 90-day retention period. And then you would log those into your S3 bucket for long-term retention. And when you’re ready to analyze them, you would use Athena, which is a serverless service to query data in S3, to find the events that you’re interested in and learn more about them. Okay, so that’s it. I hope you liked it, and I will see you in the next lecture.