Amazon AWS Certified Machine Learning Specialty – ML Implementation and Operations Part 3


6. SageMaker Resource Management: Instance Types and Spot Training

Let’s delve into the world of resource management with SageMaker, making sure that you’re using just the right amount of computing power for your given algorithms. We covered a lot of this under the modeling domain, because it just made sense to cover it all together at that time, even though, technically, choosing what kind of instance type to use for a specific algorithm is really an operations concern. That’s part of why this domain is such a short section in this course: much of it was already covered under modeling.

When we talked about a given algorithm, like BlazingText or Object Detection, we called out the specific instance types you should use, and it didn’t really make sense to split that up. Still, there are some general rules of thumb to keep in mind for the exam. Generally speaking, algorithms that rely on deep learning benefit from GPU instances, and the GPU instance types include the P2 and P3 families for training. So if you have an algorithm that you know uses deep learning, like BlazingText or DeepAR, and you’re asked what kind of instance to choose for training it, a GPU instance would be a good choice. Inference, however, is usually less demanding once you have a pretrained model.

Even if you have a neural network, running data through it once to make a prediction isn’t really that demanding, so you can often get away with compute-optimized instances for inference, no matter what kind of algorithm it is. Something like a C4 or C5 instance is often appropriate to start with. If you’re deploying a deep learning model and speed is all you care about, a GPU instance will be faster; but if you’re more concerned with saving money, a compute instance may be enough, because GPU instances can be really pricey. So that’s the general guidance: for training, if you know it’s a deep learning algorithm, GPU is probably your best bet, otherwise you can get away with a CPU. For inference, a compute-optimized instance will work in most cases, but for deep learning algorithms a GPU can help. There are also quirks of specific algorithms, where things you’d expect not to benefit from a GPU do, or vice versa, so review the individual built-in algorithms within SageMaker for specific guidance. A rough sketch of specifying instance types is shown below.
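To make that concrete, here is a minimal sketch of how you might specify instance types with the SageMaker Python SDK. The container image, S3 paths, and IAM role are placeholders, and the specific sizes are just examples of the GPU-for-training, CPU-for-inference guidance above, not a recommendation.

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder IAM role

# Deep learning algorithm: train on a GPU instance (P2/P3 family)
estimator = Estimator(
    image_uri="<training-image-uri>",      # placeholder for a built-in algorithm container
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",         # GPU for training
    output_path=f"s3://{session.default_bucket()}/output",
    sagemaker_session=session,
)
estimator.fit({"train": "s3://my-bucket/train/"})

# Inference is usually less demanding: a compute-optimized instance often suffices
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",          # CPU instance to save cost at inference time
)
```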

Let’s also talk about managed spot training. One cool thing you can do to reduce cost is to use EC2 Spot instances for your training, which can save up to 90% over on-demand instances. So if you really care about saving money and you have a very large, expensive training job, maybe you’re training a machine translation model, for example, you could use Spot instances for that training and potentially save a lot of money. The catch, and of course there is a catch, is that your Spot instances can be interrupted at any time. So you need to make sure you’re checkpointing to S3 so that your training can pick up where it left off if your training instances are interrupted. That makes things a little more complicated, but it can save a lot of money, so it can be time well spent for particularly involved training jobs. It can also come at the cost of increased training time.

You may need to wait for a Spot instance to become available, and that could be a very long wait; you have to sit around waiting for that capacity before your training can actually commence. But if saving money is what you care about most, this can save a lot of it, at the expense of a little more complexity in setting up S3 checkpointing, and at the expense of time while you wait for those Spot instances to become available. A minimal sketch of enabling managed spot training is shown below.
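As a hedged illustration, here is what enabling managed spot training with S3 checkpointing might look like using the SageMaker Python SDK; the container image, role, bucket names, and time limits are placeholders.

```python
from sagemaker.estimator import Estimator

# Managed spot training: the instance can be interrupted, so checkpoints are
# written to S3 and training resumes from the last checkpoint after interruption.
spot_estimator = Estimator(
    image_uri="<training-image-uri>",                       # placeholder container
    role="arn:aws:iam::123456789012:role/SageMakerRole",    # placeholder role
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    use_spot_instances=True,          # request Spot capacity instead of on-demand
    max_run=3600,                     # maximum training time, in seconds
    max_wait=7200,                    # maximum total time, including waiting for Spot capacity
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # where checkpoints are persisted
)
spot_estimator.fit({"train": "s3://my-bucket/train/"})
```

Note that max_wait must be at least as large as max_run, since it includes the time spent waiting for Spot capacity to become available.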

7. SageMaker Resource Management: Elastic Inference, Automatic Scaling, AZ’s

Another way to save money with SageMaker is by using Elastic Inference, and this is something you definitely need to know about. It’s a way of accelerating deep learning models at the inference stage: when you’re actually vending back results from your deep learning model, Elastic Inference accelerates that inference at a fraction of the cost of using a dedicated GPU instance. The way it works is that you add an Elastic Inference accelerator alongside a CPU instance. So when you’re deploying your inference model to the world, you deploy it to a CPU instance but also specify an Elastic Inference accelerator alongside it. The cost of that CPU instance together with an EI accelerator is much less than deploying a GPU instance, and it still performs pretty well. There are different accelerator types you can use, such as ml.eia1.medium, ml.eia1.large, or ml.eia1.xlarge, depending on your needs; they cost more the bigger they are, but they’re still a lot cheaper than, say, a P3 instance. You can also attach EI accelerators to notebooks if you want to accelerate those as well.

So if you just want a speedier experience while testing things out in your notebook, you can attach an accelerator to it to speed that up. However, remember that Elastic Inference only works with deep learning frameworks, so it’s only going to work with the TensorFlow or MXNet prebuilt containers. You can use ONNX to export existing models to MXNet and make them compatible with Elastic Inference as well, so you have a little more flexibility there than you might think. You can also use it with custom containers built with the EI-enabled TensorFlow or MXNet libraries that SageMaker makes available to you.

As long as you’re developing your custom code using those packages, you can use it with Elastic Inference, which again can save a lot of money and gain you a lot of performance at inference time. It also works with the built-in image classification and object detection algorithms, so if you’re using those out of the box in SageMaker, you can just attach Elastic Inference and it will work. A rough sketch of attaching an accelerator at deployment time is shown below.
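Here is a minimal sketch of attaching an Elastic Inference accelerator when deploying a TensorFlow model with the SageMaker Python SDK; the model artifact path, role, framework version, and accelerator size are assumptions for illustration.

```python
from sagemaker.tensorflow import TensorFlowModel

# Attach an Elastic Inference accelerator to a CPU endpoint instead of
# paying for a full GPU instance. Paths and role are placeholders.
model = TensorFlowModel(
    model_data="s3://my-bucket/model/model.tar.gz",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    framework_version="2.3",            # assumes an EI-enabled TensorFlow version
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.large",        # inexpensive CPU instance
    accelerator_type="ml.eia2.medium",  # Elastic Inference accelerator attached to it
)
```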

You can also use automatic scaling, which is really cool. Within SageMaker, when you’re deploying your inference model to production, you can set up a scaling policy that defines the target metrics you care about, the minimum and maximum capacity you want to allocate, cool-down periods, and so on, and it will automatically add or remove inference nodes from your deployment as needed. Obviously that can save a lot of money too. It works alongside CloudWatch to monitor the performance of your inference nodes and scale them as needed, dynamically adjusting the number of instances for a production variant based on that data. So it’s not just looking at the model as a whole; it’s looking at individual production variants.

As you change the amount of traffic that goes to different production variants at runtime, it will take that into account and scale each variant accordingly. One good best practice is to load test your configuration before you actually rely on it. You want to make sure that whatever scaling policy you set up behaves the way you expect; if you get it wrong, you could end up without enough capacity, or with far more than you need. So make sure it works well in a test environment before you deploy automatic scaling into production. A rough sketch of setting up a scaling policy is shown below.
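Scaling policies for SageMaker endpoints are configured through the Application Auto Scaling service. A rough sketch with boto3 might look like this, assuming a placeholder endpoint name, variant name, capacity limits, and target value.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# The scalable dimension is the instance count of a specific production variant.
# Endpoint and variant names are placeholders.
resource_id = "endpoint/my-endpoint/variant/AllTraffic"

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

autoscaling.put_scaling_policy(
    PolicyName="my-invocations-policy",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        # Target invocations per instance, tracked via CloudWatch
        "TargetValue": 1000.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,   # seconds to wait before removing instances
        "ScaleOutCooldown": 60,   # seconds to wait before adding more instances
    },
)
```

SageMakerVariantInvocationsPerInstance is the predefined CloudWatch metric commonly used for target tracking on endpoint variants.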

One last note on resource usage is the use of Availability Zones with SageMaker. It’s good to know that SageMaker will automatically attempt to distribute your instances across different Availability Zones for better resiliency, but obviously you need more than one instance for that to work; SageMaker can’t distribute your inference across multiple Availability Zones if it only has one instance to work with. That’s a good reason to deploy multiple instances for every production endpoint, even if you think you only need one. With more than one instance, SageMaker can spread them out and make your application as a whole much more resilient to failure. Also, if you’re using any custom VPCs with SageMaker, they should be configured with at least two subnets, each in a different Availability Zone, so that things can be deployed across Availability Zones and, if there’s a catastrophic failure in one place, your application keeps running regardless. A small sketch of both practices is shown below.
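A small sketch of both ideas, deploying two instances and supplying a VPC configuration with subnets in two Availability Zones, might look like this with the SageMaker Python SDK; the image, model artifact, role, subnet IDs, and security group are placeholders.

```python
from sagemaker.model import Model

# Placeholder image, model data, role, subnets, and security group.
model = Model(
    image_uri="<inference-image-uri>",
    model_data="s3://my-bucket/model/model.tar.gz",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    vpc_config={
        # Two subnets, each in a different Availability Zone
        "Subnets": ["subnet-aaaa1111", "subnet-bbbb2222"],
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
)

# Two instances so SageMaker can spread them across Availability Zones
predictor = model.deploy(
    initial_instance_count=2,
    instance_type="ml.c5.xlarge",
)
```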

8. SageMaker Inference Pipelines

Let’s talk about inference pipelines. Again, this is the sort of thing where you just need to understand what it is and what it does for the purposes of the exam. We’ve talked about deploying a Docker image to an inference node: by default, you take a prepackaged image from ECR and use it with your trained model to do inference at runtime. However, you can also use more than one container and string them together using inference pipelines. You can take any combination of pretrained built-in algorithms or your own algorithms hosted in Docker containers and hook them together in a linear sequence of between two and five containers that work in concert. So you could imagine combining your preprocessing, your predictions, and the post-processing of those predictions in different containers that are chained together in an inference pipeline.

You can also use containers from SparkML or scikit-learn as part of that pipeline. With SparkML, you can run that with Glue or EMR, and those SparkML containers are serialized into MLeap format, which is a little piece of trivia worth remembering. Inference pipelines can be used both for real-time inference and for batch transform, so they apply to either mode. We haven’t talked a lot about batch transform; we’ve been focusing on real-time inference because that’s the more complicated case, but inference pipelines work with either usage mode, whether you’re doing real-time inference through a web service or batch transforms of large amounts of data you want to make inferences on all at once. That’s inference pipelines in a nutshell: a way of chaining together multiple inference containers into one pipeline. A minimal sketch of building one is shown below.
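A minimal sketch of an inference pipeline with the SageMaker Python SDK might look like the following, assuming placeholder model artifacts, container image, and role; it chains a SparkML preprocessing container with a prediction container and shows both deployment modes.

```python
from sagemaker.pipeline import PipelineModel
from sagemaker.sparkml import SparkMLModel
from sagemaker.model import Model

role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder

# Container 1: SparkML preprocessing model, serialized in MLeap format
preprocessor = SparkMLModel(
    model_data="s3://my-bucket/sparkml/model.tar.gz",
    role=role,
)

# Container 2: the trained prediction model (placeholder image and artifacts)
predictor_model = Model(
    image_uri="<inference-image-uri>",
    model_data="s3://my-bucket/predictor/model.tar.gz",
    role=role,
)

# Chain the containers (between two and five) into a single inference pipeline
pipeline = PipelineModel(
    name="my-inference-pipeline",
    role=role,
    models=[preprocessor, predictor_model],
)

# The same pipeline can back a real-time endpoint...
pipeline.deploy(initial_instance_count=1, instance_type="ml.c5.xlarge")

# ...or a batch transform job over data sitting in S3
transformer = pipeline.transformer(instance_count=1, instance_type="ml.c5.xlarge")
transformer.transform(data="s3://my-bucket/batch-input/", content_type="text/csv")
```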
