Amazon AWS Certified Machine Learning Specialty – Modeling Part 8


20. Object Detection in SageMaker

Up next is a fun one: object detection. Everyone loves computer vision, and I’m no exception. So it does what it says it does: it detects objects in an image. And not only does it detect what objects are in the image, it will also give you bounding boxes that tell you where those objects are. So this has obvious applications in things like self-driving cars, or really any computer vision application. It uses a single deep neural network to do this.

And basically, every class that it gives you back is accompanied by a confidence score. So you can see just how sure this thing is that that’s actually a bowl there, and a laptop, and a chair, and a bag. And you can either train this system from scratch using your own training data, maybe labeled using something like Ground Truth, or you can use a pretrained model based on ImageNet. We’ll get there, but you can actually even extend that ImageNet model and train it with additional data as well through transfer learning. The input that it expects is either RecordIO or image format.

So it can take straight-up JPEG or PNG image files if you are supplying raw images. In that case, you need to supply a JSON file together with each image that has the annotation data for that image while you’re training. So while you’re training, you need to give it not only an image, but also data about what objects are in that image and where they are. That JSON file lets you say, okay, here’s an image, here are the things in that image, and here are the bounding boxes that define where those things are, and it can learn what those things look like. And once it’s done, it can take images that it hasn’t seen before and say, okay, here’s what’s in it, here’s where they are.
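To make that concrete, here’s a minimal sketch of what one of those annotation files might contain. The file names, labels, and box coordinates are all hypothetical, and you should check the current SageMaker docs for the exact schema, but the general shape is an image reference, a list of bounding boxes with class IDs, and a category map:

```python
import json

# Hypothetical annotation for one training image (bounding boxes in pixels).
# This mirrors the general shape of SageMaker's object detection JSON format;
# verify the exact schema against the current documentation.
annotation = {
    "file": "images/kitchen_001.jpg",
    "image_size": [{"width": 640, "height": 480, "depth": 3}],
    "annotations": [
        # One entry per object: a class_id plus a pixel-space bounding box.
        {"class_id": 0, "left": 100, "top": 120, "width": 80, "height": 60},
        {"class_id": 1, "left": 300, "top": 200, "width": 220, "height": 140},
    ],
    "categories": [
        {"class_id": 0, "name": "bowl"},
        {"class_id": 1, "name": "laptop"},
    ],
}

with open("kitchen_001.json", "w") as f:
    json.dump(annotation, f)
```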

So yeah, it just takes an image as input for inference and outputs all the instances of objects that are in the image with their associated categories and confidence scores. An object in the image could have more than one category if the model isn’t sure about it, but you’ll be able to sort that out by confidence score and interpret it however you want to. Under the hood it’s using a convolutional neural network, no big surprise there, with something called the Single Shot MultiBox Detector, or SSD, which is kind of cool. The CNN that it starts off with is either the VGG-16 or the ResNet-50 model.
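Here’s a hedged sketch of parsing that output. I’m assuming a response shape of one row per detection, [class index, confidence, xmin, ymin, xmax, ymax], with coordinates normalized to the 0-1 range; the class names and threshold below are made up:

```python
# Hypothetical response from an object detection endpoint.
response = {
    "prediction": [
        [0.0, 0.93, 0.12, 0.25, 0.40, 0.60],  # confident detection of class 0
        [1.0, 0.41, 0.55, 0.10, 0.90, 0.80],  # low-confidence detection of class 1
    ]
}

CLASS_NAMES = ["bowl", "laptop"]  # hypothetical label map
THRESHOLD = 0.5                   # discard detections below this confidence

for class_idx, score, xmin, ymin, xmax, ymax in response["prediction"]:
    if score >= THRESHOLD:
        print(f"{CLASS_NAMES[int(class_idx)]}: {score:.2f} at "
              f"({xmin:.2f}, {ymin:.2f}) to ({xmax:.2f}, {ymax:.2f})")
```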

These are CNN topologies that people have already figured out work well, so you can just reuse them rather than designing a CNN from scratch. The other cool thing that we touched on earlier is transfer learning mode, or incremental training. So you can actually take one of those pretrained models for the base network, trained on ImageNet, that already knows about common objects in the world. And if you want, you can continue to train it further on specific objects that you know about and ImageNet doesn’t.

So the way that works is that you use that pretrained model for the base network weights instead of random initial weights when you’re training. You take the existing model with all the pretrained weights for all the objects that it knows about, and you just keep on training it with new types of objects, and that refines the weights inside the model even further. So basically the idea with transfer learning is that you take an existing pretrained model and just keep on training it with new data. Pretty simple stuff. Internally, it uses tricks like flipping, rescaling, and jittering to avoid overfitting based on where specific objects happen to be in a specific image. That way it can generalize a little better to objects that might be oriented differently, facing a different way, or in different parts of the image. And the hyperparameters are the usual ones you’d expect in a CNN or any deep learning system: the batch size, the learning rate, and your choice of optimizer. Those are the main dials that will affect the performance of object detection. No big surprise that it will benefit from GPU instances, since we’re talking about CNNs here. They are very demanding algorithms that benefit greatly from GPUs.
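As a sketch of how those dials come together, here’s what kicking off a training job with the built-in algorithm might look like using the SageMaker Python SDK. The role ARN, bucket, and every hyperparameter value are placeholders, not recommendations:

```python
from sagemaker import Session, image_uris
from sagemaker.estimator import Estimator

session = Session()

# Look up the built-in Object Detection container for this region.
container = image_uris.retrieve("object-detection", session.boto_region_name)

estimator = Estimator(
    image_uri=container,
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # hypothetical role
    instance_count=1,
    instance_type="ml.p3.2xlarge",           # GPU instance for CNN training
    output_path="s3://my-bucket/od-output",  # hypothetical bucket
    sagemaker_session=session,
)

# The main dials: base network, transfer learning, and the usual
# deep learning knobs (batch size, learning rate, optimizer).
estimator.set_hyperparameters(
    base_network="resnet-50",   # or "vgg-16"
    use_pretrained_model=1,     # start from ImageNet weights (transfer learning)
    num_classes=2,              # hypothetical label count
    num_training_samples=1000,  # hypothetical dataset size
    mini_batch_size=32,
    learning_rate=0.001,
    optimizer="sgd",
    epochs=30,
)
```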

And you can actually use multiple GPUs on one machine, and multiple machines as well, so this scales up really nicely. You can even have a fleet of multi-GPU machines if you want to, if you’re doing a really intensive training job on this thing. But for inference, you don’t need so much. You just need to run through that neural network once to figure out what comes out the other end. And for that you can use a C5 or an M5 if you want. If you need more performance, a P2 or a P3 would be appropriate as well.
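Continuing that sketch, deployment for inference might look like this; the instance type is just one of the CPU options mentioned above, and I’m assuming the estimator from the previous example:

```python
# Deploy the trained object detection model behind a real-time endpoint.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",  # CPU is often enough for single-pass inference
)

with open("test_image.jpg", "rb") as f:  # hypothetical test image
    payload = f.read()

# One forward pass through the network returns the detected objects.
result = predictor.predict(
    payload,
    initial_args={"ContentType": "image/jpeg"},
)
```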

21. Image Classification in SageMaker

Sticking with the computer vision theme, our next algorithm is image classification, which is a lot like object detection, but a little bit simpler. It just assigns labels to an image. So as opposed to object detection, which tells you the bounding boxes of where the objects are in your image, it’s just telling you what’s in the image without really saying where in the image it is. So if you give it a picture of a cat, it will say, well, that’s probably a cat. That’s it. And there might be other things in the image that it would pick up on too; in this case, not much more, except maybe a table. But that’s all image classification is: it’s telling you what objects are in the image. If you want to train one from scratch, it can take MXNet RecordIO format.

This is not protobuf-wrapped, which is a little bit unusual, so it can actually interoperate with other deep learning frameworks that don’t specifically take the protobuf format. Or you can feed in raw JPEG or PNG images. If you do feed in raw images, you also need to feed in an LST file that says, okay, what class label is actually in this image, and where is that image located? So it’s just telling you, here’s an image, here’s what’s in that image, go learn from that. It also accepts the Augmented Manifest image format, and if you use that, you can use Pipe mode, which, as we talked about earlier, allows it to stream that data in from S3 as opposed to copying everything over. And when we’re talking about a lot of big images, that can be a big performance win, right? So definitely a good thing to use Pipe mode if you can.
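As a sketch, wiring up a Pipe-mode channel over an augmented manifest with the SageMaker Python SDK might look like this; the bucket, manifest path, and attribute names are hypothetical:

```python
from sagemaker.inputs import TrainingInput

# Stream records from S3 during training instead of copying the whole
# dataset to the training instance first.
train_input = TrainingInput(
    s3_data="s3://my-bucket/train.manifest",  # hypothetical manifest location
    s3_data_type="AugmentedManifestFile",
    attribute_names=["source-ref", "class"],  # which JSON fields to stream
    input_mode="Pipe",
    record_wrapping="RecordIO",
)
```

You would then pass that in as a training channel, something like `estimator.fit({"train": train_input})`.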

Here’s an example of what that list file might look like, the one that accompanies the raw images for training. Basically it’s an image index, a class label, and the path to the image. Pretty basic stuff. And below that is an example of the Augmented Manifest image format.
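The original slide isn’t reproduced here, so this is a hedged reconstruction of both formats with made-up file names and labels. An .lst file is tab-separated (image index, class label, relative path), and an augmented manifest is one JSON object per line:

```python
import json

# .lst file: tab-separated lines of image index, class label, and image path.
lst_lines = [
    "1\t0\ttrain/cat_001.jpg",  # image 1 is class 0 (cat)
    "2\t1\ttrain/dog_001.jpg",  # image 2 is class 1 (dog)
]
with open("train.lst", "w") as f:
    f.write("\n".join(lst_lines) + "\n")

# Augmented manifest: each line pairs an S3 image reference with its label.
manifest_records = [
    {"source-ref": "s3://my-bucket/train/cat_001.jpg", "class": "0"},
    {"source-ref": "s3://my-bucket/train/dog_001.jpg", "class": "1"},
]
with open("train.manifest", "w") as f:
    for record in manifest_records:
        f.write(json.dumps(record) + "\n")
```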

Under the hood, image classification is using the ResNet CNN, which is again just a CNN architecture that’s already been optimized for this sort of task. When you’re doing full training from scratch, we initialize the network with random weights, but transfer learning takes sort of a different approach, and it’s pretty cool. We initialize the network with all the pretrained weights from ImageNet or what have you, and then, so it can learn to identify new kinds of images, the top fully connected layer is initialized with random weights. Then you fine-tune that network with your new training data. So basically the lower layers of the network are left as-is, because they’ve learned to identify the underlying shapes and features within objects. The actual final identification of what an object is usually happens in the top layer, right? The basic task of recognizing shapes and features in an image lives in the lower layers of the neural network; it’s the top layer that gets really specific about what a thing is. So the idea in transfer learning for image classification is that we’re only going to wipe out the weights on that top layer and relearn them from the new information we’re giving it.

So it is important to remember how that works. The default image size is 3-channel (one for red, one for green, one for blue), 224 by 224 pixels. You can change that if you want to, but that’s the default because that’s the size that ImageNet, which it’s trained on, expects. And again, the usual suspects for deep learning are the main hyperparameters you would tune for this algorithm, including batch size, learning rate, and the choice of optimizer.

And there are a bunch of parameters that are specific to given optimizers, such as weight decay, beta1, beta2, epsilon, and gamma. So these are the main knobs and dials that you might want to tune when improving the training of either a raw image classification network that you’re training from scratch, or one that’s being built upon using transfer learning. Pretty cool stuff. And for the instance types, again, we shouldn’t be too surprised that it benefits from GPUs because we’re using CNNs under the hood; a P2 or a P3 is recommended. And again, you can use multiple GPUs on one machine or even multiple machines, each with multiple GPUs. Throw as much money at it as you want. For inference, CPUs or GPUs are okay: a C4, or if you need more performance, a P2 or a P3.
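Putting that together, here’s a minimal sketch of a training setup for the built-in image classification algorithm; the role, bucket, and every hyperparameter value are placeholders rather than recommendations:

```python
from sagemaker import Session, image_uris
from sagemaker.estimator import Estimator

session = Session()
container = image_uris.retrieve("image-classification", session.boto_region_name)

estimator = Estimator(
    image_uri=container,
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # hypothetical role
    instance_count=1,
    instance_type="ml.p3.2xlarge",           # GPU recommended for CNN training
    output_path="s3://my-bucket/ic-output",  # hypothetical bucket
    sagemaker_session=session,
)

estimator.set_hyperparameters(
    num_layers=50,              # depth of the ResNet
    use_pretrained_model=1,     # transfer learning: only the top layer is re-initialized
    image_shape="3,224,224",    # the 3-channel, 224x224 ImageNet default
    num_classes=10,             # hypothetical label count
    num_training_samples=5000,  # hypothetical dataset size
    mini_batch_size=32,
    learning_rate=0.01,
    optimizer="sgd",            # optimizer-specific knobs like weight decay also exist
)
```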

22. Semantic Segmentation in SageMaker

Still going further with our computer vision stuff, let’s talk about semantic segmentation. You might recall that image classification only detects what objects are in an image, and object detection went further and told you the bounding boxes of where those objects are within the image. Semantic segmentation takes it one step further: it’s pixel-level object classification. So what you get back is a mask that says, for every single pixel in the image, what do I think this is an image of? What object does that pixel belong to? So in this example, you can see there’s probably some sort of a stuffed animal holding a rose, with a bucket in front of it, right? For every single pixel, it can tell you: this is a stuffed animal, this is a rose, this is a bucket. And obviously that comes into play in things like self-driving cars, where you really need that kind of granularity of where things are. So again, we’re not just assigning labels to whole images, or even bounding boxes; we’re actually going down to the pixel level here.

And some other applications besides self-driving vehicles are medical imaging diagnostics, robot sensing, things like that. We call this mask a segmentation mask; it maps individual pixels to labels, or classifications if you will. When you’re training it, it expects either JPEG or PNG files with annotations associated with them, and it uses both training and validation data. It also has label maps to describe what those annotations are in plain English. And it can accept the Augmented Manifest image format as well if you want to use Pipe mode, which again can be a big performance boost by allowing you to stream the data in from S3 instead of copying everything over. For inference, it will accept a JPEG image to say, okay, here’s a picture, tell me everything that’s in it at a pixel level.

Under the hood, it’s built on Gluon and GluonCV, which in turn are built on top of Apache MXNet. And it gives you a choice of three different algorithms to use: a Fully Convolutional Network, or FCN; an algorithm called Pyramid Scene Parsing, or PSP; and finally DeepLabV3. The details of how those work are not important for the exam, so I’m not going to clutter up your brain with them. You also have your choice of a couple of flavors of ResNet for the underlying architecture of the neural network: ResNet-50 or ResNet-101. Both of those are going to be trained on the ImageNet database, which knows about most common everyday objects. And like the other computer vision techniques we’ve talked about, you can do incremental training with it as well, or train it from scratch, whatever you want to do. You can train it on an entirely new set of objects, or start off with ImageNet and build upon that. Again, the usual suspects for deep learning hyperparameters here: epochs, learning rate, batch size, and optimizer.

Also, your choice of algorithm, which we just talked about, is obviously important when you’re tuning things, and the choice of backbone is an important parameter as well when you’re doing hyperparameter tuning of semantic segmentation. Now, the instance types here are a little bit more restrictive. This is a very intensive algorithm, even more so than your typical CNN. So only GPU nodes are supported for training: you have to use a P2 or a P3, and furthermore it only supports a single machine. This can’t be parallelized across multiple machines. So you’re probably going to end up using a fairly beefy GPU node for training this thing, which is not cheap.
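To illustrate where those choices plug in, here’s a hedged sketch of a training setup; the algorithm and backbone values come straight from the options above, while the role, bucket, and numbers are made up:

```python
from sagemaker import Session, image_uris
from sagemaker.estimator import Estimator

session = Session()
container = image_uris.retrieve("semantic-segmentation", session.boto_region_name)

estimator = Estimator(
    image_uri=container,
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # hypothetical role
    instance_count=1,                        # single machine only: no multi-machine training
    instance_type="ml.p3.2xlarge",           # GPU is required for training
    output_path="s3://my-bucket/ss-output",  # hypothetical bucket
    sagemaker_session=session,
)

estimator.set_hyperparameters(
    algorithm="fcn",              # or "psp" / "deeplab"
    backbone="resnet-50",         # or "resnet-101"
    use_pretrained_model="True",  # incremental training from ImageNet weights
    num_classes=21,               # hypothetical label count
    num_training_samples=1000,    # hypothetical dataset size
    epochs=10,
    learning_rate=0.0001,
    optimizer="adam",
    mini_batch_size=16,
)
```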

If you can use incremental training or a pretrained model, obviously that will cut your training costs down significantly. For inference, you can use either a CPU or a GPU. Once you’ve trained that model, the hard work is done; you just have to feed an image through the resulting trained network and see what comes out the other side. Obviously a GPU will give you a little better performance, but as we’ll talk about later, there are actually ways to accelerate CPU instances as well, using Elastic Inference. But I’m getting ahead of myself. The bottom line is that you need to use a single GPU machine for training, and for inference you can use a CPU or a GPU with semantic segmentation.
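And here’s what that inference step might look like, continuing the sketch above. I’m assuming the endpoint can return the segmentation mask as a PNG when asked, so verify the content and accept types against the current docs:

```python
# Deploy to a CPU instance; a GPU (or Elastic Inference) would be faster.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",
)

with open("scene.jpg", "rb") as f:  # hypothetical test image
    payload = f.read()

# Send a JPEG in, get a pixel-level class mask back.
mask_png = predictor.predict(
    payload,
    initial_args={"ContentType": "image/jpeg", "Accept": "image/png"},
)
with open("mask.png", "wb") as f:
    f.write(mask_png)
```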
