Amazon AWS Certified SysOps Administrator Associate – S3 Storage and Data Management – For SysOps Part 10


29. S3 Batch Operations

Now let’s talk about S3 Batch Operations. They allow you to perform bulk operations on existing S3 objects with a single request. That means you can modify object metadata and properties all at once for all the objects in your bucket. You can copy objects between S3 buckets, replace object tag sets, modify ACLs, restore objects directly from S3 Glacier, invoke a Lambda function to perform a custom action on each object, and so on. The creativity is really endless. A job consists of a list of objects, the action you want to perform on each and every object, and optional parameters. The cool thing about S3 Batch Operations is that it internally manages all the retries, and it can track the progress of your batch operation.

It can send you completion notifications and generate reports when it’s done. You can use S3 Inventory, for example, to list all the objects, and then S3 Select to filter them and pass the selection of filtered objects directly into the batch operation. So: S3 Inventory first, then optionally you filter it with S3 Select if you have a big inventory, for example to retrieve only the unencrypted objects. Then you pass the filtered list to an S3 Batch Operations job, along with the operation you want and its parameters, and S3 Batch Operations will make sure to process all the objects you have passed to it, which I think is pretty cool. So I will see you right now for the hands-on.
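For reference, here is a minimal sketch of what such a job definition could look like with boto3 and the `create_job` API. The account ID, bucket names, role ARN, and manifest ETag below are placeholders, not values from the lecture.

```python
import uuid
import boto3

# Sketch only: account ID, bucket names, role ARN and manifest ETag are placeholders.
s3control = boto3.client("s3control")
account_id = "123456789012"

response = s3control.create_job(
    AccountId=account_id,
    ConfirmationRequired=True,  # the job waits for an explicit "Run job"
    Priority=10,
    RoleArn=f"arn:aws:iam::{account_id}:role/demo-s3-batch-role",
    ClientRequestToken=str(uuid.uuid4()),
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::s3-batch-demo-stephane/s3-batch.csv",
            "ETag": "<etag-of-the-manifest-object>",
        },
    },
    Operation={
        "S3PutObjectCopy": {
            "TargetResource": "arn:aws:s3:::s3-batch-demo-stephane",
            "TargetKeyPrefix": "encrypted",
            "NewObjectMetadata": {"SSEAlgorithm": "AES256"},  # SSE-S3 on the copies
        }
    },
    Report={
        "Bucket": "arn:aws:s3:::s3-batch-reporting-stephane",
        "Prefix": "reporting",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "ReportScope": "AllTasks",
    },
)
print("Created job:", response["JobId"])
```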

30. S3 Batch Operations Hands On

Okay, so we are going to practice using Amazon S3 Batch Operations, which is on the left-hand side. But first, we need to create a bucket. I’ll just call it s3-batch-demo-stephane and create it in this region. Then I will go ahead and click on Create bucket. Okay, this is good. And next, I’m going to create another bucket for the reporting.

This one will be called s3-batch-reporting-stephane, and we’re good to go. Okay, we now have two buckets created, and I’m going to open them in new tabs, one and two. Next, we’re going to create a batch operation. But first, I’m going to upload a few files into my S3 bucket, and I’m going to make sure that I upload them unencrypted. So I’ll upload my index.html, error.html, coffee.jpg, and beach.jpg.

So four files are going to be uploaded into this bucket, and I’m going to perform the upload right now. Okay, the upload has succeeded. If we look at one of these files, for example beach.jpg, and we look at the encryption settings, as we can see, there is no encryption applied to this file. So the goal is going to be to run a batch operation on S3 to encrypt all four of these files. To do so, I need to set up a batch operation. Just know that a batch operation will cost you about $0.25; it is not free to run. So if you don’t want to incur any cost, please do not do what I’m doing. But I’m going to show you how to create a job right now. So, Create job, and you have to choose the region where your files are, which is the same as the region of this bucket.
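As an aside, if you wanted to verify from code which objects are unencrypted before setting up the job, a quick boto3 sketch could look like this; the bucket name is a placeholder for the demo bucket.

```python
import boto3

s3 = boto3.client("s3")
bucket = "s3-batch-demo-stephane"  # placeholder name for the demo bucket

# Print the server-side encryption reported for every object in the bucket.
for obj in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
    head = s3.head_object(Bucket=bucket, Key=obj["Key"])
    print(obj["Key"], "->", head.get("ServerSideEncryption", "no SSE reported"))
```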

Back in the job creation, we’re good to go. Next, we need to choose a manifest. A manifest is a way to reference the files in your bucket and to tell S3 which ones are going to be considered by the batch operation. There are two ways of doing it. You can use an S3 Inventory report: by enabling S3 Inventory, you get a manifest.json file, and by referencing that manifest.json file you’re good to go. Or you can create your own CSV with two or three columns in the following order: bucket name, object key, and optionally version ID. To keep things simple, I’m going to create a CSV, but the more industrial way of doing things is obviously to use an S3 Inventory report as the manifest.

So let’s go ahead and create the CSV. I’m going to create a file named s3-batch.csv, and it has to contain two columns: the bucket name and the object key. Super easy. The bucket name is what I have right here, s3-batch-demo-stephane, and the object key is just the key of each file. So I will have s3-batch-demo-stephane and then beach.jpg, and then the other files: coffee.jpg, index.html, and error.html.

So here we go, I have now referenced all my files within my manifest. Perfect. Next, I’m going to upload this CSV manifest directly into my bucket as well. So let’s upload this file: Add files, and then I will navigate to it. My s3-batch.csv is right here, I will upload it, and it’s successful. So now I can reference the manifest object by giving the full path: s3://, then the bucket name, and then s3-batch.csv.
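For reference, building and uploading an equivalent manifest with boto3 could look like this sketch; the bucket and object key names are placeholders matching the demo.

```python
import boto3

s3 = boto3.client("s3")
bucket = "s3-batch-demo-stephane"  # placeholder name for the demo bucket
keys = ["beach.jpg", "coffee.jpg", "index.html", "error.html"]

# One "bucket,key" line per object, with no header row.
manifest = "".join(f"{bucket},{key}\n" for key in keys)
with open("s3-batch.csv", "w") as f:
    f.write(manifest)

# Upload the manifest so the batch job can reference it at
# s3://<bucket-name>/s3-batch.csv
s3.upload_file("s3-batch.csv", bucket, "s3-batch.csv")
```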

Back in the console, you can click on View to check that it works, and yes, my file is found. Next, I’m going to click on Next, and then I’m going to choose an operation type. We have Copy, Invoke AWS Lambda function, Replace all object tags, Delete all object tags, Replace access control list, Restore (if you want to restore some objects from S3 Glacier), Object Lock retention, or Object Lock legal hold.

I’m going to choose Copy, to perform a copy operation on all objects listed in the manifest. The copy destination could be a new bucket, for example, and it must be in the EU (Frankfurt) region. But you can use the same bucket we had from before, so we can, for example, use the same bucket. We’re going to enter s3://, then the bucket name, and then /encrypted. If you add a slash, the prefix is going to appear as an extra folder in the S3 console.

Now, if we use the same bucket and we don’t enable versioning, what’s going to happen is that the existing objects will be overwritten. So if I didn’t have the /encrypted folder, it would have encrypted my objects in place and replaced my objects in here. It’s a warning that we know about, but we’re good to go, so we’ll say yes, I acknowledge this. But if you wanted to encrypt the objects in place, you should enable versioning to really make sure that you’re not overwriting your own objects. Next, storage class: on top of copying the objects somewhere, we can specify a different storage class, but I’ll keep it as Standard. And here we can enable server-side encryption, so we will enable it using an Amazon S3 key.

That’s Amazon S3 SSE-S3. Okay, we are going to copy the existing tags and copy the existing metadata; we can see a lot of options and we could change the ACLs, but for now we will not do it. Then click on Next. So this is the description of the operation, this is the priority, and then we want to generate a completion report for all tasks, and this completion report should go into a bucket.

It’s going to be s3-batch-reporting-stephane, so we enter that, and then I can just have a folder named reporting. Okay, next we need permissions. This S3 Batch job is going to have to operate on my files in this bucket to get them and then encrypt them, and it’s also going to have to send a report to the reporting bucket. Therefore, I need to create an IAM role for it. There is an IAM role template, an IAM trust policy, that we can use to make sure this is going to work. From my experience, I will make the permissions as permissive as possible right now, because I’ve had issues in the past with this template. So what I’m going to do is just go into IAM. I’m not applying least privilege, because this is not something I want to do right now, but this template right here gives you a lead on how to apply least privilege.

But I want to make sure that my operations are going to work, so I’m going to give the role a lot of privilege. This is going to be a trusted entity of type S3, and specifically S3 Batch Operations. By specifying this, we are effectively applying the IAM trust policy. Next, we need to create a policy for it in JSON form and paste the template. Let’s have a look at how we can modify it. We have s3:PutObject and so on for this bucket, and I will just put s3:* in here to allow anything to be done on my s3-batch-demo bucket by this role. The source bucket in here should be the same as the s3-batch-demo bucket, so I do not need that block anymore. And this last statement is there to get the s3-batch.csv manifest object.
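For reference, here is a hedged sketch of creating an equivalent role with boto3; the role name, policy name, and bucket names are placeholders, and the permissions are deliberately broad rather than least privilege, just like in the lecture.

```python
import json
import boto3

iam = boto3.client("iam")
demo_bucket = "s3-batch-demo-stephane"          # placeholder
report_bucket = "s3-batch-reporting-stephane"   # placeholder

# Trust policy: only the S3 Batch Operations service can assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "batchoperations.s3.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Deliberately permissive permissions: full access to the demo bucket,
# plus report delivery to the reporting bucket.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{demo_bucket}",
                f"arn:aws:s3:::{demo_bucket}/*",
            ],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetBucketLocation"],
            "Resource": [
                f"arn:aws:s3:::{report_bucket}",
                f"arn:aws:s3:::{report_bucket}/*",
            ],
        },
    ],
}

iam.create_role(
    RoleName="demo-s3-batch-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.put_role_policy(
    RoleName="demo-s3-batch-role",
    PolicyName="s3-batch-demo-stephane",
    PolicyDocument=json.dumps(permissions_policy),
)
```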

Back in the console, that last statement is good enough; I will keep it, even though it’s already covered by the very permissive s3:* statement. And finally, for the reporting bucket, PutObject and GetBucketLocation are good enough as well. Click Next, Tags, Next, and the policy is going to be called s3-batch-demo-stephane. We create this policy, and now we need to attach it to the role. So I’ll search for s3-batch, refresh, and the s3-batch-demo-stephane policy is right here. I click Next, Tags, Next, Review, and I call it demo-s3-batch-role. I create the role, and now the role exists. Back in the job creation, I will choose from an existing IAM role and look for demo-s3-batch-role. Here we go, Next. And now we can review the job. This is the path to the manifest file with the few files to operate on, and the operation we’re going to do is a copy into the encrypted folder. So we’re not doing it in place, we’re copying to a new folder.

And we have server-side encryption that’s going to be enabled for my files. Remember, this costs $0.25 if you run it, just so you know. So I have created the job right here and, as you can see, it’s in status New. What we need to do is click on it and validate this job; I will show you what I mean right now. If you refresh, you see the status is awaiting your confirmation to run.

Once it’s ready, it says that it has found four objects in the manifest, and it needs our confirmation to go ahead. So I will scroll all the way down, oops, I will go up and click on Run job. Now the job is going to run and the copy-and-encrypt operation is going to happen on my files. We get a status, a percentage complete, a percentage succeeded, and a percentage failed, so we can really track the progress of our S3 Batch job. The cool thing about S3 Batch Operations here is that it gives us a lot of information about what is happening to our files.
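If you were driving the job from code instead of the console, confirming it and tracking its progress could look roughly like this sketch; the account ID and job ID are placeholders.

```python
import boto3

s3control = boto3.client("s3control")
account_id = "123456789012"          # placeholder
job_id = "<job-id-from-create-job>"  # placeholder

# Equivalent of clicking "Run job" once the job is awaiting confirmation.
s3control.update_job_status(
    AccountId=account_id,
    JobId=job_id,
    RequestedJobStatus="Ready",
)

# Poll the job for its status and per-task progress.
job = s3control.describe_job(AccountId=account_id, JobId=job_id)["Job"]
summary = job.get("ProgressSummary", {})
print(job["Status"],
      "succeeded:", summary.get("NumberOfTasksSucceeded"),
      "failed:", summary.get("NumberOfTasksFailed"),
      "total:", summary.get("TotalNumberOfTasks"))
```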

Back in the console, somehow the job is not running yet, so let’s do Run job: I need to scroll all the way down now, here we go, and then click on Run job. This is why I wanted to scroll all the way down. Okay, now the job is running and I will wait until it is done. I just refreshed my page: the status is Completing, the percentage complete is 100%, the total succeeded is four, and the total failed is zero. Very cool. If I go to my bucket now and refresh, as we can see there is an encrypted folder right here, and within it we can find our four files.

But if we click on one of these files and have a look at the encryption settings, as we can see, server-side encryption is enabled for this file with Amazon SSE-S3, which is really nice and exactly what we wanted. And if we go into the batch reporting bucket and refresh it, we can find a reporting folder with the job ID, and in it we have a manifest file and a results file.

This results CSV is going to give us information about all the files that have been processed and whether each one was successful or not. As you can see by opening the CSV file, we get the bucket name, the object key, whether or not it succeeded, the status code, and the result message; in case you had some error messages, they would show up here as well. A small sketch of reading this report programmatically is shown below. So that’s it for S3 Batch Operations. As we can see, we can find all the jobs here and this one is completed. Hopefully you see the power of it, especially when you start running S3 Batch Operations on a lot of files at a time. I hope you liked this lecture and I will see you in the next lecture.
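For reference, reading that completion report programmatically could look roughly like this sketch; the report key is a placeholder, since the real key contains the job ID and a generated file name.

```python
import csv
import io
import boto3

s3 = boto3.client("s3")
report_bucket = "s3-batch-reporting-stephane"             # placeholder
report_key = "reporting/job-<job-id>/results/<file>.csv"  # placeholder

body = s3.get_object(Bucket=report_bucket, Key=report_key)["Body"].read().decode("utf-8")
for row in csv.reader(io.StringIO(body)):
    # Each row describes one object: bucket, key, task status, status code
    # and a result message (error details show up here on failures).
    print(row)
```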

31. S3 Multi Part Upload Deep Dive

Okay, so now let’s do a little bit of a deeper dive into how multipart upload works. It allows you to upload a large object in parts, in any order. It’s recommended when you have a file that is over 100 MB, and it must be used for files that are over 5 GB. This helps you parallelize the upload, so it speeds up transfers and is also safer, because in case of a failure of one part, you can retry just that part. The maximum number of parts you can have in an upload is 10,000.

The idea is that you have your Amazon S3 bucket and your big file: you split the file into parts, up to 10,000 of them. Then you upload all these parts in parallel, so you can speed things up by maximizing your network bandwidth, and you can also retry a specific part upload in case it has failed. The parts are uploaded into Amazon S3, and once they’re all there, you finish the upload with a complete request. This complete request concatenates all the parts back into your bigger file in Amazon S3.
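For reference, a minimal boto3 sketch of the create / upload parts / complete flow might look like this; the bucket, key, and file path are placeholders, and a real implementation would upload the parts in parallel.

```python
import boto3

s3 = boto3.client("s3")
bucket, key, path = "my-demo-bucket", "big-file.bin", "big-file.bin"  # placeholders
part_size = 100 * 1024 * 1024  # 100 MB parts

mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
upload_id = mpu["UploadId"]
parts = []
try:
    with open(path, "rb") as f:
        part_number = 1
        while True:
            data = f.read(part_size)
            if not data:
                break
            # Each part can be retried independently if it fails.
            resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=part_number,
                                  UploadId=upload_id, Body=data)
            parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
            part_number += 1
    # The complete request stitches all the parts back into one object.
    s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                 MultipartUpload={"Parts": parts})
except Exception:
    # Abort so orphaned parts don't keep accumulating storage charges.
    s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
    raise
```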

So with failures, you can restart uploading only the failed parts, which improves performance and reduces the time spent on retries. And if you want to automatically delete the old parts, you can use a lifecycle policy to delete unfinished uploads after X number of days, for example in case you had a network outage or an application shutdown, that kind of thing. You can upload using the CLI or the SDK to take advantage of multipart upload. Now, if you go into Management, I just want to show you the applicable lifecycle rule. You can create a lifecycle rule, I’ll call it demo-multipart, and you can apply it to all objects in the bucket, that’s fine.

Then the lifecycle rule action would be to delete expired object delete markers or incomplete multipart uploads. As you can see, you can have a lifecycle rule to deal with incomplete multipart uploads, and now you just need to say, hey, I want to delete incomplete multipart uploads after three days, to clean up whatever is left over. This can come up in the exam. Then you create this rule and you’re good to go; a small SDK sketch of the equivalent configuration is shown below. So that’s it, I hope you liked this lecture and I will see you in the next lecture.
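For reference, the equivalent lifecycle configuration through the SDK could look roughly like this sketch; the bucket name is a placeholder.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-demo-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "demo-multipart",
                "Status": "Enabled",
                "Filter": {},  # apply the rule to all objects in the bucket
                # Abort and clean up incomplete multipart uploads after 3 days.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 3},
            }
        ]
    },
)
```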
