Amazon AWS Certified Machine Learning Specialty – Modeling Part 5


11. Precision, Recall, F1, AUC, and more

Let’s talk about some metrics that you can derive from a confusion matrix. This stuff is super important on the exam, guys, so pay attention. So let’s revisit our friend, the confusion matrix. In this particular example, we have actual values going down the columns and predicted values across the rows, although that layout can differ. In this format, we have the number of true positives in the upper left corner, the number of true negatives in the lower right corner, the number of false negatives in the lower left corner, and the number of false positives in the upper right corner.

Okay? So make sure you understand where your true positives and negatives are, and where your false positives and negatives are, when you’re starting to look at a confusion matrix. And again, that can vary based on the layout of the confusion matrix itself. Let’s start with recall. Recall is computed as the true positives over the true positives plus the false negatives. You should seriously memorize this. You need to know this. It goes by other names as well, just to make things more confusing.

It’s also known as sensitivity, true positive rate, and completeness; “completeness” harkens back to its original use in the world of information retrieval. You can think of recall as the percent of actual positives that were correctly predicted. So it’s a good choice of metric when you care a lot about false negatives.

Okay? So fraud detection is a great example of where you might be focusing on recall, because a false negative in the world of fraud means that something was fraudulent, but you failed to identify it as fraud. You had a fraudulent transaction that was flagged as being perfectly okay. That is the worst possible outcome in a system that’s supposed to be detecting fraud, right? You want to be erring on the side of false positives, not false negatives, in that case. So recall, a good choice of metric when you care about false negatives (fraud detection being an example of that), is true positives over true positives plus false negatives. Write that down.

Make yourself a little cheat sheet and cram on this stuff before the exam. Okay? Let’s make it real with an example here. In this particular example of a confusion matrix, recall is true positives over true positives plus false negatives, and we just pluck the values out of the confusion matrix. In this particular layout, the true positives are 5 and the false negatives are 10. So we have 5 over 5 plus 10, which is 5 over 15, or one third, or 33.33%. That’s recall.

Recall’s partner in crime is precision, and precision is computed as true positives over true positives plus false positives. This goes by other names as well, including the correct positive rate or the percent of relevant results, so it’s a measure of relevancy in the world of information retrieval. When should you care about precision? Well, it’s an important metric when you care about false positives. Some examples would be medical screening or drug testing. You don’t want to say somebody’s on cocaine or something when they’re not; that would have really bad effects on their life and career, right? So use precision when you care about false positives more so than false negatives, drug testing being a classic example of that. Again, it is computed as true positives over true positives plus false positives. Diving into an example with this particular confusion matrix, the true positives are 5 and the false positives are 20, so the precision is calculated as 5 over 25, which is 20%.

There are other metrics as well, for example specificity, which is true negatives over true negatives plus false positives, also known as the true negative rate. The F1 score is also a very commonly used metric, and again, you should memorize this formula too, guys; write it down somewhere. It is two times the true positives over two times the true positives plus false positives plus false negatives.
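To make the arithmetic concrete, here’s a minimal Python sketch. The true positive, false negative, and false positive counts come from the example above; the true negative count, used only for the specificity line, is a made-up placeholder.

```python
# Metrics from the confusion matrix example above.
# TP, FN, FP come from the example; TN is a hypothetical placeholder
# used only to illustrate specificity.
tp, fn, fp, tn = 5, 10, 20, 100

recall = tp / (tp + fn)                       # 5 / 15 = 0.3333
precision = tp / (tp + fp)                    # 5 / 25 = 0.20
specificity = tn / (tn + fp)                  # true negative rate
f1 = 2 * tp / (2 * tp + fp + fn)              # 10 / 40 = 0.25
f1_alt = 2 * precision * recall / (precision + recall)  # same value

print(f"recall={recall:.4f} precision={precision:.4f} "
      f"specificity={specificity:.4f} f1={f1:.4f} f1_alt={f1_alt:.4f}")
```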

You can also compute it as two times the precision times the recall, over the precision plus the recall. Either way works mathematically. It is the harmonic mean of precision and sensitivity (remember, recall and sensitivity are the same thing). So if you care about both precision and recall, the F1 score is a metric that balances the two. If you know that your model doesn’t just care about accuracy alone and you want to capture both precision and recall, the F1 score can be a way of doing that. But in the real world you’re probably going to care about precision or recall more than the other, so it really pays to think about which one you care about more. Using the F1 score, in my opinion, is a bit of a shortcut, a little bit of laziness, but the exam may expect you to know what it is and how to compute it. Also, RMSE is often used as a metric. It’s just a straight-up measure of accuracy, and it’s exactly what it sounds like: the root mean squared error. You take the squared error of each prediction from its actual true value, take the mean of those squared errors, and then take the square root of that mean. That’s it.
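Here’s a minimal RMSE sketch in NumPy, just to show the square-the-errors, take-the-mean, take-the-root recipe. The arrays are made-up toy values.

```python
import numpy as np

# Toy values, purely illustrative
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Root mean squared error: mean of the squared errors, then the square root
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
print(rmse)
```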

So it only cares about right and wrong answers; it doesn’t get into the nuances of precision and recall. If all you care about is accuracy, RMSE is a common metric used for that. Another way of evaluating your models is the ROC curve, which stands for Receiver Operating Characteristic curve. What it does is plot your true positive rate (your recall) versus your false positive rate at various threshold settings in your model. So as you choose different thresholds for deciding between true and false, what does that curve look like? Basically, the way to interpret a ROC curve is that you want it to be above the diagonal line. The ideal curve would just be a single point at the upper left-hand corner, a right angle where the whole thing sits on the upper left-hand side of the graph, to the left of that diagonal line. So the more a ROC curve bends toward that upper left corner, the better. That’s how you interpret these things. We can also talk about the area under the curve (AUC), which is the area under the ROC curve, exactly what it sounds like.

You can interpret that value as the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one. An AUC of 0.5 is what you’d expect to see if you were right at that diagonal line: the area underneath the diagonal, where things are no better than random, works out to 0.5, so that makes sense. So if you see an AUC of 0.5 or below, that’s useless, or worse than useless. A perfect classifier would have an AUC of 1.0; that would again be the perfect case where the curve is just a right angle with a point up in the upper left-hand corner, which includes the entire area of the graph, and that works out to 1.0. So AUC can be a useful metric for comparing different classifiers, where the higher the value, the better. So there you have it.
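If you want to see the ROC curve and AUC in practice, here’s a small sketch using scikit-learn’s metrics module. The labels and scores are made up purely for illustration; the key inputs are the true labels and the model’s predicted scores, evaluated at every threshold.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Made-up labels and predicted probabilities for illustration
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

# False positive rate and true positive rate at each threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under the ROC curve; 0.5 is no better than random, 1.0 is perfect
auc = roc_auc_score(y_true, y_score)
print(auc)
```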

Some common metrics for evaluating classifiers: precision, recall, F1 score, ROC, and AUC are the important ones to remember. Again, make yourself a cheat sheet, guys. This stuff is important.

12. Ensemble Methods: Bagging and Boosting

Let’s talk about the world of ensemble learning, and specifically bagging and boosting. Any machine learning expert should know this stuff, and therefore the exam might expect you to know it too. What is an ensemble method? Well, it’s best illustrated by an example: if you’ve heard of random forests, that’s one example of an ensemble method. Basically, at a high level, an ensemble method takes multiple models, which might just be variations of the same model, and lets them all vote on a final result. So if I have a decision tree, maybe I’ll make a little variant of that decision tree that was created using different thresholds or a different set of training data, and I can create many decision trees for the same problem and let them all vote on the final result. It turns out that this often ends up with a better result than just using a single decision tree.

So that’s how random forests work. So how do those trees differ? Well, that’s where we get into the nuances between Bagging and Boosting. So Bagging would generate multiple training sets by random sampling with replacement. So basically, we create a bunch of different models that just have different training sets that are generated this way that are resampled from the original set of training data. The advantage of this is that you can train all of those models in parallel, right, and then just let them all vote on a final result. So model one might say this is classification one. Model two might say it’s also one. Model three might say it’s two.

So in the end, they all vote on what the actual classification is, and together they end up being more robust than a single model would have been. Boosting, in contrast, works in a more serial manner. With boosting, we assign weights to each observation in our data set. We start off assuming that every data point has equal weight, but as we run the model and see where it makes mistakes, we keep refining the weights of the underlying observations, giving more weight to the ones the model is getting wrong. And this yields a model that’s better at the end of the day.

So that runs in a more sequential manner, where we start off running our model with equal weights on each observation. At each stage, we reweight the data, run the model again with those new weights, and keep iterating on that until we get a better and better result. That’s how boosting works. How do you choose one versus the other? Well, I’ve got to say, boosting is pretty hot these days. XGBoost, which is a big part of SageMaker as well, is a very hot algorithm right now; if you look at Kaggle, a lot of Kaggle challenges are being won by XGBoost. Boosting really works well, so if you care about accuracy, boosting is probably a good thing to try out; its strength is accuracy. However, bagging is a good way to avoid overfitting, because it’s spreading out your data and subsampling it, which prevents any individual model from overfitting, and they all end up voting on a final result.

And even if any one model in a bagging ensemble were overfitting, the other models should balance that out and average out whatever overfitting might be happening in any one specific model. So it’s a good way to smooth out overfitting by having multiple models that all vote on the result; even if they are overfitting, they’re probably overfitting in different ways, right? So that cancels itself out. Bagging is also going to be a lot easier to parallelize, as we said. Boosting is more of a serial process, while bagging can happen in parallel, so bagging can obviously run faster if you have parallel resources to work with. So what should you use? It depends on your goal. If you care about accuracy above all else, boosting is worth looking at. If you care about preventing overfitting, having a regularization effect, and having a more parallelizable architecture, bagging is probably the thing you want to be looking at instead. So that’s bagging and boosting, the difference between the two, and when you’d use one over the other.
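If you want to play with the two approaches side by side, here’s a minimal sketch using scikit-learn’s off-the-shelf BaggingClassifier and GradientBoostingClassifier on a synthetic dataset. The dataset and hyperparameters are arbitrary, just for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, purely for illustration
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bagging: many decision trees (the default base estimator) trained on
# bootstrap resamples, with their votes aggregated. n_jobs=-1 trains the
# trees in parallel, which is bagging's big practical advantage.
bagging = BaggingClassifier(n_estimators=100, n_jobs=-1, random_state=42)

# Boosting: trees built sequentially, each one focusing on the errors
# made by the ensemble so far.
boosting = GradientBoostingClassifier(n_estimators=100, random_state=42)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))
```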

13. Introducing Amazon SageMaker

By far the service that comes up the most on the exam is Amazon SageMaker. SageMaker is basically the heart of Amazon’s machine learning offering, so we’re going to spend a lot of time on it. Let’s start at a high level. What’s SageMaker for? Well, it’s intended to manage the entire machine learning workflow. So ideally, this is what your process looks like in the real world: you start off by fetching, cleaning, and preparing some training data and doing all your feature engineering on it. Then you feed that into a model that is trained on that data, and you evaluate that model and say, okay, it looks good. Let’s go ahead and deploy that model and actually use it in production to make inferences on observations that we haven’t seen before. Now, once that’s in production, we can learn from those results, see how well it’s actually doing, gather even more information, use that to fetch, clean, and prepare even more data, and maybe use those learnings to do a better job of feature engineering. Then the cycle begins again.

We’ll take that new data, train our model again, deploy it again, take in more data, train the model again, deploy it again, and hopefully things just keep getting better and better over time. SageMaker allows you to manage all of this stuff. It will spin up training instances to make your training happen at large scale, it gives you notebooks where you can do your data preparation, and it will spin up EC2 instances to actually deploy your model and sit there as an endpoint waiting to make inferences in production. Architecturally, this is the idea. Let’s start at the bottom here. When we’re doing our training, the training data that we’ve already prepared will be sitting in an S3 bucket somewhere, and SageMaker’s job is to go out there and provision a bunch of training hosts to actually do that training on.

Now, the code that it uses, the actual model itself, comes from a Docker image that’s registered in Elastic Container Registry (ECR). So it will take that training code from a Docker image, deploy it out to a fleet of hosts to do the training, and get the training data from S3. When it’s done, it will save that trained model, and any artifacts from it, to S3 as well. At this point, we’re ready to deploy that model and actually put it out there in production, right? So at this point, we’re also going to have some Docker image in ECR that’s the inference code. It’s potentially a lot simpler; its only job is to take in incoming requests and use that saved model to make inferences based on those requests. So it pulls in our inference code from ECR, it will spin up as many hosts as it needs to serve those incoming requests, and it will spin up endpoints as well that we can use to communicate with the outside world. So now we might have some client application that’s sending in requests to our model, and that endpoint will then very quickly make those predictions and send them back. For example, maybe we have a client that’s taking pictures and we want to know what’s in each picture. It might say, hey endpoint, here’s a picture, tell me what’s in it.

It would then refer to that inference code and the trained model artifacts that we have to say, okay, I think it’s a picture of a cat, and send that back to the client. That’s just one of many examples. There are a couple of ways to work with SageMaker. Probably the most common is by using a SageMaker notebook, which is just a notebook instance running on an EC2 instance type that you specify. You spin these up from the console, and it’s very easy to use, as you’ll see. Your SageMaker notebook has access to S3, so it can access its training and validation data there, or whatever else you need.
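From the client side, the request/response interaction described above might look roughly like this, using the low-level boto3 runtime API. The endpoint name and payload format are hypothetical; the real format depends entirely on what your inference container expects.

```python
import boto3

# Client for the SageMaker runtime (inference) API
runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-image-classifier-endpoint",  # hypothetical endpoint name
    ContentType="text/csv",                       # example payload format
    Body="0.5,1.2,3.4",
)

# The response body is a stream containing whatever the inference code returns
prediction = response["Body"].read().decode("utf-8")
print(prediction)
```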

You can do things like use scikit-learn, PySpark, or TensorFlow within them if you want to, and they have access to a wide variety of built-in models. So there are pre-built Docker images out there that contain a wide variety of models you can just use out of the box, and we’re going to spend a lot of time talking about those. You can also spin up training instances from within your notebook. So within your very notebook, you can say, go spin up a whole fleet of servers that are dedicated, specialized machine learning hosts to execute that training on. And when your training is done and saved to S3, you can also say from the notebook, okay, deploy that model to a whole fleet of endpoints and allow me to make predictions at large scale. You can even say from the notebook, go ahead and run an automated hyperparameter tuning job to try different parameters on my model and find the ideal set of parameters to make that model work as well as possible. All this can be done from within a notebook.
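Kicking off an automated hyperparameter tuning job from a notebook might look roughly like this with the SageMaker Python SDK. The image URI, IAM role, S3 paths, hyperparameter names, and objective metric below are all placeholders; the real values depend on the algorithm you’re tuning.

```python
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

# Placeholder estimator; the image URI, role, and output path are assumptions
estimator = Estimator(
    image_uri="<ecr-image-uri-for-your-algorithm>",
    role="<your-sagemaker-execution-role>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/output",
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",   # placeholder metric name
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),  # placeholder hyperparameters
        "max_depth": IntegerParameter(3, 10),
    },
    max_jobs=20,          # total training jobs in the tuning job
    max_parallel_jobs=2,  # how many run at once
)

# Channels point at already-prepared data in S3 (placeholder paths)
tuner.fit({"train": "s3://my-bucket/train", "validation": "s3://my-bucket/validation"})
```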

You can also do a lot of this from the SageMaker console. The notebook obviously gives you more flexibility because you can actually write code there, but sometimes you’ll use them together. A pretty common approach is to kick off a training job or a hyperparameter tuning job from within your notebook, then switch back to the console and keep an eye on it, see how well it’s doing. Let’s talk about the data preparation stage and how that interacts with SageMaker. Again, SageMaker expects your data to come from S3 somewhere, so we kind of assume that you’ve already prepared it using some other means if you need to. The format it expects will vary with the algorithm, with the actual training code that you’re deploying from ECR. For the built-in algorithms, that’s often RecordIO-protobuf format, which is just a data format that’s very well suited as the input to deep learning and other machine learning models.
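Converting a NumPy array to RecordIO-protobuf and uploading it to S3 might look something like this with the SageMaker Python SDK. The data here is random and the bucket and key names are placeholders.

```python
import io
import boto3
import numpy as np
import sagemaker.amazon.common as smac

# Made-up feature matrix and labels, purely for illustration
X = np.random.rand(1000, 10).astype("float32")
y = np.random.randint(0, 2, size=1000).astype("float32")

# Serialize to the RecordIO-protobuf format the built-in algorithms prefer
buf = io.BytesIO()
smac.write_numpy_to_dense_tensor(buf, X, y)
buf.seek(0)

# Upload to S3 (bucket and key are placeholders)
boto3.resource("s3").Bucket("my-bucket").Object("train/data.recordio").upload_fileobj(buf)
```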

That said, usually these algorithms will also just take straight-up CSV data or whatever you might have, but RecordIO-protobuf will usually be a lot more efficient if you can get your data into that format. You can do that preprocessing within your SageMaker notebook if you want to; that’s fine. You can also integrate Spark with SageMaker, which is pretty cool. So if you want to use Apache Spark to preprocess your data at massive scale, you can actually use SageMaker within Spark, and we’ll see an example of that later on in the course. You also have the usual tools at your disposal within your Jupyter notebooks: scikit-learn, NumPy, Pandas. If you want to use those to slice and dice and manipulate your data before you actually feed it into your training job, that’s totally fine. Once you’re ready to train, you’ll just create a training job either from the console or from your notebook. All it needs is the URL to S3 where your training data lives, already prepared.

You need to specify what machine learning compute resources you want to use for that. That’s the specific EC2 instances that you’re going to use to do this training on; those could be compute nodes, or they could be GPU nodes like P2s or P3s, whatever is appropriate to the training algorithm that you’re using. You’ll also need the URL to S3 where you want to put your trained model artifacts, and finally, you need a path to ECR to tell it where to get the training code that will run on those ML compute resources. That’s all you need.

There are different ways of doing training. Like I said, there’s a huge variety of built-in algorithms that we’re going to talk about a lot, so you just need to know where in ECR those live in order to use them. You can also use Spark MLLib for training, you can use your own custom code that just lives in your own Docker image, and you can have your own custom code written on top of TensorFlow or MXNet; that’s also easy to do, and we’ll have a lab about that later on. There are also algorithms you can purchase from the AWS Marketplace, where you can buy access to a Docker image that contains a SageMaker training algorithm if you want to as well.

Once your model is trained, you need to deploy it, and again, this can just be done from a notebook. You’ll save the trained model to S3 somewhere, and at that point, there are two things you can do with it. One is you can ask SageMaker to spin up a fleet of persistent endpoints to make individual inferences and predictions on demand from some sort of external application. You can also do batch transforms if you just have an existing set of observations that you want to make predictions for en masse.
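Put together, the train-then-deploy flow from a notebook might look roughly like this with the SageMaker Python SDK. Every URI, role, and instance type below is a placeholder, and the data is assumed to already be prepared in S3.

```python
from sagemaker.estimator import Estimator

# All of the URIs and names below are placeholders for illustration.
estimator = Estimator(
    image_uri="<ecr-image-uri-for-the-training-code>",  # where the training code lives
    role="<your-sagemaker-execution-role>",
    instance_count=1,
    instance_type="ml.p3.2xlarge",                 # GPU node; pick what suits the algorithm
    output_path="s3://my-bucket/model-artifacts",  # where the trained model gets written
)

# Training data that has already been prepared and uploaded to S3
estimator.fit({"train": "s3://my-bucket/training-data"})

# Option 1: deploy the trained model behind a persistent endpoint
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")

# Option 2: run a batch transform over an existing set of observations instead
transformer = estimator.transformer(instance_count=1, instance_type="ml.m5.large")
transformer.transform("s3://my-bucket/batch-input", content_type="text/csv")
```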

Batch transforms can work too, and there are a lot of cool options when you’re deploying a trained model. There’s something called inference pipelines, if you need to chain different steps together as you’re doing your inferences. There’s something called SageMaker Neo, which allows you to deploy your trained models all the way to edge devices. There’s something called Elastic Inference, which you can use to accelerate how quickly that deployed model returns predictions, using dedicated acceleration hardware made just for that. And it also has automatic scaling capabilities, where it can automatically scale the number of instances behind your endpoints up and down as needed. We’ll talk about all of that in more detail later, but for now, let’s start digging into some of those built-in algorithms to give you a better feel for what SageMaker can do.
