Amazon AWS Certified SysOps Administrator Associate – S3 Storage and Data Management – For SysOps Part 4
10. S3 Inventory
So let’s have a look at S3 Inventory. So the idea is that using an S3 Inventory operation you can list all the objects as well as their corresponding metadata in your S3 bucket, which is a better way than using an S3 List API operation to list all your objects and get all the associated metadata. Some usage examples for S3 Inventory include creating audits and reports on the replication and encryption status of all your objects.
So using this, for example, you can identify which objects are not encrypted, you can get the number of objects in an S3 bucket, or you can identify the total storage of all the previous object versions in your bucket, because S3 Inventory can list all object versions. The output file format is CSV, ORC, or Apache Parquet, and this inventory can be generated on a daily or weekly basis. All the data can then be queried using the familiar tools you have in AWS such as Amazon Athena, Amazon Redshift, Presto, Hive, and Spark, noting that Presto, Hive, and Spark are not within AWS; they can be run from outside.
And you can generate a filtered report using S3 Select and feed that report into S3 Batch Operations. The use cases for inventory are going to be business, compliance, and regulatory needs. So let’s go hands-on to see how it works. So let’s set up S3 Inventory. And for this I’m going to create a bucket, and the bucket name is going to be stephane-inventory-demo, something like this, and then I will go ahead and just create this bucket. So my bucket is now created and I can view the details. And now I need to set up inventory on another bucket of mine. So I picked one of my buckets, the demo bucket from before, which has some data in it; that’s all that matters to me. And I’m going to click on Management and then scroll down, and at the very bottom I have Inventory configurations. So I can create an inventory configuration right here.
I’ll call it demo-inventory. You can set a prefix if you want to apply it to a specific prefix only, but we won’t. And this will include current versions only, but you can include all object versions if you wanted to. Then the report details: it should go into this account, and you need to specify a destination for it. So I choose my inventory demo bucket, which I just created before, and then the following statement will be added to this destination bucket’s policy, to allow Amazon S3 to place data in that bucket. So here is the bucket policy that will be applied.
Then, what do we want the frequency to be? Daily, where the first report will be delivered within 48 hours, or weekly. As well as the output format: CSV, ORC, or Parquet. CSV is the format you need to choose if you plan to use S3 Batch Operations, or if you want to analyze the S3 Inventory using Excel. If you use ORC or Parquet, then you can, for example, analyze it using a tool like Athena. This configuration is going to be enabled, and we’ll select CSV. Then whether or not we want server-side encryption; I’ll disable it for now.
And if you want additional fields for the report, you can have size, last modified, storage class, ETag, multipart upload, replication status, encryption status, bucket key status, the Intelligent-Tiering access tier, and the Object Lock configurations. So we’re good to go. Except this bucket has not been found in the eu-west-1 region, so I need to recreate that bucket, but in the same region that I’m in right now. So let’s go ahead and do this. I’ll delete this inventory bucket. And it’s always good to have errors, so that’s why I always keep them on video. So we’ll delete it, and as you can see, it shows that inventory has to be set up within the same region. So I will recreate this bucket, but this time in eu-west-1, and then create my bucket. And this is not going to work right now because the old bucket is still being deleted. So let me pause for a second. Or I will just rename it demo-2, and this should work just fine. My bucket is now created, and I will just put demo-2 in here and we should be good to go. Let’s create this. And now my inventory configuration is created.
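By the way, if you’d rather script this configuration than click through the console, here is a minimal boto3 sketch of the same setup. The bucket names are placeholders standing in for the ones in this demo, and note that, unlike the console, the API will not add the destination bucket policy for you:

```python
import boto3

s3 = boto3.client("s3")

# Placeholder names standing in for the buckets used in this demo
SOURCE_BUCKET = "my-source-bucket"        # the bucket to inventory
DEST_BUCKET = "my-inventory-destination"  # must be in the same region

s3.put_bucket_inventory_configuration(
    Bucket=SOURCE_BUCKET,
    Id="demo-inventory",
    InventoryConfiguration={
        "Id": "demo-inventory",
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",  # or "All" for every version
        "Schedule": {"Frequency": "Daily"},   # or "Weekly"
        "Destination": {
            "S3BucketDestination": {
                "Bucket": f"arn:aws:s3:::{DEST_BUCKET}",
                "Format": "CSV",              # "ORC" or "Parquet" also work
                "Prefix": "demo2",
            }
        },
        # Optional metadata columns to include in the report
        "OptionalFields": [
            "Size", "LastModifiedDate", "StorageClass",
            "ETag", "ReplicationStatus", "EncryptionStatus",
        ],
    },
)
```

There is also a matching delete_bucket_inventory_configuration call, which is handy for the clean-up at the end of this hands-on.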
Either way, what I need to do now is to wait a little bit for this inventory to be active, and this could take up to 48 hours. So we’ll get back to this in 48 hours. Okay, so I’ve left the inventory on for a few days now, and if you go into the demo inventory bucket, you can see that a folder has been created for every day. So if I take any of these days, as you can see, we get a manifest.checksum and a manifest.json, and this contains metadata around the data itself. So if we look at the manifest.json file and download it, that file is going to show us the source bucket, the destination bucket, the version of the manifest file, the file format, which is CSV, and the file schema.
That is, all the columns belonging to my CSV file, as well as the files that correspond to this manifest. As you can see, one file is in it; it’s this one right here, under the demo-inventory data prefix, and then this key right here. So let’s go ahead and find that file right now. Back in my folder, I go back to the demo inventory bucket, scroll down, and find the data folder. And in here I find all the files that correspond to every single day. July 4 is this one, so I will take this file, but they should all be the same because my bucket content has not changed. I’m going to save this file and extract it. And if we have a look at the content of this CSV file, we can find that the first column corresponds to the bucket name and the second column corresponds to the object keys in it.
The third column is going to correspond to the version ID of the files. Then we get more information for each file: the encryption status, the file size, dates, and so on, as well as the last column, which is going to be the type of encryption that we have. Okay, so this file really contains all the metadata around all your S3 objects, which is very helpful if you want to get a listing, or an encryption status, and so on; there’s a small sketch of parsing it programmatically below. And when you are done with the hands-on, please make sure to go to Management and then delete or disable this inventory configuration, because otherwise it will keep on running every single day. So that’s it for this lecture. I hope you liked it, and I will see you in the next lecture.
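If you’d rather process the inventory programmatically than in Excel, here is a small sketch, assuming a CSV-format inventory with the EncryptionStatus optional field included; the bucket and manifest key are made-up placeholders:

```python
import csv
import gzip
import io
import json

import boto3

s3 = boto3.client("s3")

# Placeholders: your destination bucket and the manifest.json key for one day
DEST_BUCKET = "my-inventory-destination"
MANIFEST_KEY = "demo2/my-source-bucket/demo-inventory/2021-07-04T00-00Z/manifest.json"

# 1. Read the manifest to discover the column schema and the data file keys
manifest = json.loads(
    s3.get_object(Bucket=DEST_BUCKET, Key=MANIFEST_KEY)["Body"].read()
)
columns = [c.strip() for c in manifest["fileSchema"].split(",")]

# 2. Each referenced data file is a gzipped CSV with no header row
for entry in manifest["files"]:
    body = s3.get_object(Bucket=DEST_BUCKET, Key=entry["key"])["Body"].read()
    with gzip.open(io.BytesIO(body), mode="rt") as f:
        for row in csv.reader(f):
            record = dict(zip(columns, row))
            # Example: flag unencrypted objects
            if record.get("EncryptionStatus") == "NOT-SSE":
                print("Unencrypted:", record["Bucket"], record["Key"])
```

With ORC or Parquet output you would skip this entirely and point Amazon Athena at the files instead.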
11. [SAA/DVA] S3 Storage Classes + Glacier
Going into the exam, you need to know about all the S3 storage classes and understand which one is the most adapted to which use case. So in this lecture, which is going to be quite long, I want to describe to you all the different storage classes. The first one is the one we’ve been using so far, which is Amazon S3 Standard, which is for general purpose, but there are some more optimized ones depending on your workload. The first one is S3 Standard-Infrequent Access, also called S3 IA, and this one is for when your files are going to be infrequently accessed; we’ll have a deep dive, by the way, on all of them. There’s going to be S3 One Zone-IA, for when we can recreate data. There’s going to be S3 Intelligent-Tiering, which is going to move data between your storage classes intelligently; Amazon Glacier for archives; and Amazon Glacier Deep Archive for the archives you don’t need right away. Finally, there is one last class called Amazon S3 Reduced Redundancy Storage, which is deprecated, and therefore I will not be describing it in detail in this lesson. Okay, so S3 Standard, general purpose: we have very high durability, it’s called eleven 9s, so 99.999999999% of objects, across multiple AZs.
So if you store 10 million objects with Amazon S3 Standard, you can on average expect to incur a loss of a single object once every 10,000 years. Bottom line is you should not lose any objects on S3 Standard. There’s a 99.99% availability percentage over a given year. And all these numbers, by the way, you don’t have to remember; they’re just indicative to give you some knowledge. You don’t need to remember the exact numbers going into the exam, as long as you understand the general idea behind a storage class. And it can sustain two concurrent facility failures, so it’s really resistant to AZ disasters. The use cases for General Purpose are going to be big data analytics, mobile and gaming applications, and content distribution.
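To see where that “one object every 10,000 years” figure comes from, here is the quick arithmetic behind the eleven 9s claim:

```python
# Eleven 9s means 99.999999999% of objects are expected to survive each year
durability = 0.99999999999
objects_stored = 10_000_000

expected_losses_per_year = objects_stored * (1 - durability)  # ~0.0001 objects/year
years_per_single_loss = 1 / expected_losses_per_year          # ~10,000 years
print(expected_losses_per_year, years_per_single_loss)
```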
These use cases are basically anything we’ve been using so far. Now we have S3 Standard-Infrequent Access, or IA, and this is suitable for data that, as the name indicates, is less frequently accessed, but requires rapid access when needed. So we get the same durability across multiple AZs, but we have one less nine of availability, and it is lower cost compared to Amazon S3 Standard. The idea is that if you access your objects less, you won’t need to pay as much. It can sustain two concurrent facility failures, and the use cases for this are going to be a data store for disaster recovery, backups, or any files that you expect to access far less frequently. Now we have S3 One Zone-IA, or One Zone-Infrequent Access, and this is the same as IA.
But now the data is stored in a single Availability Zone, whereas before it was stored in multiple Availability Zones, which allowed us to make sure the data was still available in case an AZ went down. So we have the same durability within the single AZ, but if that AZ is somehow destroyed, imagine an explosion or something like this, then you would lose your data. You have less availability, so 99.5% availability, and you still have the low latency and high throughput performance you would expect from S3. It supports SSL for encryption, and it’s going to be lower cost compared to Infrequent Access by about 20%. So the use case for One Zone-IA is going to be to store secondary backup copies of on-premises data, or storing any type of data we can recreate.
So what type of data can we recreate? Well, for example, we can recreate thumbnails from an image. So we can store the image on S3 General Purpose, and we can store the thumbnail on S3 One Zone-Infrequent Access, and if we need to recreate that thumbnail over time, we can easily do that from the main image. Then we have S3 Intelligent-Tiering, and it has the same low latency and high throughput as S3 Standard, but there is a small monthly monitoring and auto-tiering fee. What this will do is automatically move objects between the access tiers based on the access patterns. So it will move objects between S3 General Purpose and S3 IA.
And so it will choose for you whether your object is less frequently accessed or not, and you’re going to pay a fee to S3 to do that level of monitoring. So the durability is the same, the eleven 9s, and it’s designed for 99.9% availability, and it can resist an event that impacts an entire Availability Zone. So it’s highly available. Okay, so that’s it for the general purpose S3 storage tiers, and then we have Amazon Glacier. So Glacier is going to be more around archives. Glacier is cold, so think cold archive. It’s a low-cost object storage meant really for archiving and backups.
And the data needs to be retained for a very long time; we’re talking about tens of years to retain the data in Glacier. It’s a big alternative to on-premises magnetic tape storage, where you would store data on magnetic tapes and put these tapes away, and if you wanted to retrieve the data from those tapes, you would have to find the tape manually, put it somewhere, and then restore the data from it. So we still have the eleven 9s of durability, so we don’t lose objects, and the cost per storage is really, really low:
$0.004 per gigabyte per month, plus a retrieval cost, and we’ll see that cost in a second. Each item in Glacier is not called an object, it’s called an archive, and each archive can be a file up to 40 terabytes. And archives are stored not in buckets, they’re stored in vaults, but this is a very similar concept. So there are two tiers within Amazon Glacier we need to know about. The first one is Amazon Glacier, the basic one, and we have three retrieval options, and they’re very important to understand: Expedited, which is one to five minutes, so you request your file and between one and five minutes you will get it back; Standard, which is three to five hours, so you wait a much longer time;
and Bulk, when you want multiple files back at the same time, which takes between five and twelve hours to give you back your files. So as we can see here, Amazon Glacier is really to retrieve files without some kind of urgency around it. If you’re very much in a rush, you can go and use Expedited, but it’s going to be a lot more expensive than using Standard or Bulk. And the minimum storage duration for Glacier is going to be 90 days, so again, files that are going into Glacier are there for the longer term. And we have an even deeper storage tier for Glacier called Deep Archive, and this is for super long-term storage and it’s going to be even cheaper. But this time the retrieval options are Standard, 12 hours, so you cannot retrieve a file in less than 12 hours, and Bulk, if you have multiple files and you can wait up to 48 hours; it’s going to be even cheaper.
So Deep Archive is going to be for files that you really don’t need to retrieve urgently, even if they’re archived, and the minimum storage duration for Deep Archive is going to be 180 days. Now, you have to remember these numbers at a high level, because going into the exam, there will be questions asking you to understand which option to pick between Glacier and Glacier Deep Archive. For example, if the file is going to be stored for less than 180 days, then you have to use Glacier; if you need to retrieve the file very quickly, between three and five hours, it’s going to be Glacier. But if it’s a file that can be retrieved in 72 hours and it’s going to stay for one year in your vault, then maybe Deep Archive is going to provide you with the best cost savings.
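To make these retrieval options concrete, here is a small sketch of what requesting a restore looks like with boto3; the bucket and key are made-up placeholders for an object already in the GLACIER storage class:

```python
import boto3

s3 = boto3.client("s3")

# Kick off a restore of an archived object (placeholder bucket/key)
s3.restore_object(
    Bucket="my-archive-bucket",
    Key="archives/big-backup.zip",
    RestoreRequest={
        "Days": 7,  # how long the restored copy stays available in S3
        "GlacierJobParameters": {
            # "Expedited" (1-5 min), "Standard" (3-5 h) or "Bulk" (5-12 h);
            # Deep Archive only supports "Standard" (12 h) and "Bulk" (48 h)
            "Tier": "Standard",
        },
    },
)

# Poll the restore status through the object's metadata
head = s3.head_object(Bucket="my-archive-bucket", Key="archives/big-backup.zip")
print(head.get("Restore"))  # e.g. 'ongoing-request="true"' while in progress
```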
So let’s compare everything that we’ve seen: S3 Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, Glacier, and Glacier Deep Archive. For durability they’re all eleven 9s, so that means you don’t lose any objects. For availability, the one we want to look at is S3 IA: because it’s infrequently accessed, we have a little bit less availability, and if it’s One Zone-IA, then it’s going to be even less availability, because we only have one Availability Zone, so that makes sense. As for the SLA, this is what Amazon will guarantee to reimburse you against, and it’s not something you need to know, but I’ll just put it in this chart in case you need it in real life. Now, the number of AZs your data is stored on is going to be three everywhere except in One Zone-IA, because as the name indicates, it’s only for one zone, so you’re going to have one. Then there is a minimum capacity charge per object: when you have normal S3 or Intelligent-Tiering, you’re fine, but when you’re using IA, you need to have rather large objects of at least 128 kilobytes, and for Glacier, 40 kilobytes. The minimum storage duration is going to be 30 days for Standard-IA and 30 days for One Zone-IA; for Glacier, 90 days; and for Glacier Deep Archive, 180 days. And then finally, is there a retrieval fee for the first two?
No, there is not. But when you have Standard-IA, because it’s infrequently accessed, you’re going to be charged a fee anytime you retrieve the data. And then for Glacier and Glacier Deep Archive, there’s going to be a fee based on the number of gigabytes you retrieve and the speed you want to retrieve them at. So you don’t need to know all the numbers in it, but the numbers should make sense given what each storage tier really means. And for those who like numbers, here’s just a chart that you can look at on your own time. What it shows is that the cost of S3 Standard is $0.023 per gigabyte per month, which is high, and if we go all the way to the right, to Glacier Deep Archive, we have $0.00099 per gigabyte per month, which is a lot cheaper.
And if you want the data fast enough, Intelligent-Tiering is going to be between $0.023 and $0.0125, Standard-IA is going to be that lower number, and One Zone-IA is going to be even cheaper, and so on. It also shows the retrieval cost: if we want an Expedited retrieval from Glacier, it’s going to cost us $10 per 1,000 requests, whereas if we use Standard or Bulk, it’s going to cost us a lot less. Same for Glacier Deep Archive. Okay, so that’s it. And finally, for S3 Intelligent-Tiering, there is a cost to monitor objects, because it’s going to move them between S3 Standard and Standard-IA on demand, and that cost is quite small, at $0.0025 per 1,000 objects monitored per month. Okay, well, that’s it. Let’s go into the hands-on to see how we can use these tiers.
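As a quick preview of the hands-on, here is a minimal boto3 sketch of how a storage class is chosen when you write an object; the bucket and keys are placeholders, not names from the lecture:

```python
import boto3

s3 = boto3.client("s3")

# The storage class is a per-object property, set when the object is written
s3.put_object(
    Bucket="my-demo-bucket",          # placeholder bucket name
    Key="backups/secondary-copy.bin",
    Body=b"some recreatable data",
    StorageClass="ONEZONE_IA",        # or STANDARD, STANDARD_IA,
                                      # INTELLIGENT_TIERING, GLACIER,
                                      # DEEP_ARCHIVE
)

# You can change it later by copying the object onto itself
s3.copy_object(
    Bucket="my-demo-bucket",
    Key="backups/secondary-copy.bin",
    CopySource={"Bucket": "my-demo-bucket", "Key": "backups/secondary-copy.bin"},
    StorageClass="GLACIER",
)
```

In practice you would usually let lifecycle rules do these transitions for you rather than copying objects by hand.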