Amazon AWS Certified Machine Learning Specialty – Modeling Part 12
32. Automatic Model Tuning
Let’s talk about automatic model tuning within SageMaker, which is a very exciting capability of the SageMaker system. So hyperparameter tuning is kind of a really big problem in the world of machine learning. For all these algorithms we’ve talked about, we’ve seen the different hyperparameters that they expose, and there are a lot of them, right? How do you find the optimal values to set for these things? Well, we have some guidance on some of these things. I mean, we’ve talked about the effect of different learning rates and batch sizes and depths. Some of these cause you to find local minima that aren’t the right answer.
Some of them can cause you to overfit your model, things like that. But to find the absolute best value of these things, it’s tough. I mean, these are very complicated systems and there’s really no better way that we’ve come up with yet than just trying different values and seeing which one works the best. So often you just have to experiment with different values of these parameters to end up with a model that’s as optimal as it can be. It’s kind of a machine learning dirty little secret that we don’t fully understand what’s going on inside there. And a lot of it’s just trial and error to see what works. And this problem blows up very quickly when you have many different parameters that you want to optimize at once.
So if I have ten different values of learning rate that I want to try to drill in on, well, that’s fine. I can just train my model and test it ten times and figure out which learning rate worked the best. But if I have ten different learning rates and ten different batch sizes that I want to try out, well, now I have ten times ten different possibilities to try. If I want to throw in different depths of the network as well, I just blew it up by another order of magnitude. So as you add more and more hyperparameters that you want to tune at once, this problem just grows exponentially. You have to try every combination of every possible value, and every time you have to train a model and evaluate that model. And as you can see, this gets really, really expensive both in terms of time and money very quickly. So that’s what automatic model tuning in SageMaker tries to help with. Basically, you just define the hyperparameters that you care about, the ranges of values that you want to try on those hyperparameters, and what metric you’re optimizing for.
SageMaker can then spin up what we call a hyperparameter tuning job that will train as many of those combinations as you allow. So you can set an upper bound on how many training jobs you want to run to control your costs, and it will try to work within that bound. And as it goes, it will actually spin up training instances to run as much in parallel as it can, potentially quite a few of them, and just try to plow through all those different combinations of parameters as quickly as it can. It can involve quite a bit of computing power, but at least we can use the parallel capabilities of SageMaker, and its ability to spin up entirely separate instances to do this for you, to make that as quick as possible. Once you’re done, the set of hyperparameters that produced the best results can be turned around and deployed as a highly tuned model that uses the best parameters you could find. But here’s where it gets really cool. The thing that’s special about automatic model tuning in SageMaker is that it learns as it goes, so it doesn’t actually try every possible combination. It can learn over time that going in this direction on one parameter is having a positive effect while another direction is having a negative effect, and it can use that to be more intelligent about the actual parameters that it tries out. So it doesn’t necessarily try every possible combination of those parameters; it learns as it goes to figure out intelligently which ones make the most sense to try out next. And by doing that, it can save a lot in terms of the resources required to do your hyperparameter tuning.
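To make that concrete, here’s a minimal sketch of what a tuning job can look like with the SageMaker Python SDK, assuming you already have a built-in XGBoost estimator named xgb and training/validation inputs in S3 (train_input, validation_input); the parameter names, ranges, limits, and instance type are illustrative rather than prescriptive.

```python
# Minimal sketch of a SageMaker hyperparameter tuning job (illustrative values).
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

# 1. The hyperparameters you care about and the ranges you want explored.
hyperparameter_ranges = {
    "eta": ContinuousParameter(0.01, 0.5),
    "max_depth": IntegerParameter(3, 10),
}

# 2. The metric you're optimizing for, plus an upper bound on training jobs for cost control.
tuner = HyperparameterTuner(
    estimator=xgb,                          # assumed: a pre-configured XGBoost estimator
    objective_metric_name="validation:auc",
    objective_type="Maximize",
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=20,                            # the upper bound on how many training jobs to run
)

# 3. Launch the tuning job, then deploy the best model it found.
tuner.fit({"train": train_input, "validation": validation_input})  # assumed S3 inputs
predictor = tuner.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```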
Now, there are some best practices you should follow when doing automatic model tuning in SageMaker, and it’s important to remember this stuff. First of all, don’t try to optimize too many hyperparameters at once. Like we talked about, this explodes very quickly: as you add more hyperparameters, that’s basically another dimension of parameter space that you need to explore, and it just blows up exponentially. So try to focus on the hyperparameters that you think will have the most impact on the accuracy of your model, or whatever metric you’re optimizing for. Start with those first. You can always do more tuning on other parameters as a second pass later on. Also, make sure you limit your ranges to as small a range as possible. If you have some guidance as to what parameter values might work, don’t explore crazy values outside of that, because that will just generate work that doesn’t need to be done. Another key one is using logarithmic scales when appropriate. So whenever you do an automatic model tuning job, you tell it not only the range, but also the scale on which you want to explore this range. Linear would just go through it in a linear manner. But if you have a hyperparameter whose values span several orders of magnitude (say, somewhere between 0.0001 and 0.1), you probably want to try a logarithmic scale for that instead, right? If you did a linear scale, you’d be there all day, but logarithmic would explore that range much more quickly.
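In the SageMaker Python SDK, that scale choice is just a flag on the parameter range. A quick sketch (the parameter name and bounds here are only illustrative):

```python
from sagemaker.tuner import ContinuousParameter

# Explore a range spanning several orders of magnitude on a log scale
# rather than stepping through it linearly.
learning_rate_range = ContinuousParameter(0.0001, 0.1, scaling_type="Logarithmic")
```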
Also, do not run too many training jobs concurrently. Like we talked about, SageMaker’s parameter tuning learns as it goes, and it can’t do that learning if it’s doing everything in parallel. It works much better if you just run one or two training jobs at once, allow SageMaker to learn from those results, and then run the next set of training jobs. So don’t run too many training jobs concurrently with parameter tuning; that can limit how well the process can learn, which is really the key to SageMaker’s efficiency in doing hyperparameter tuning.
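In SDK terms, that mostly means keeping max_parallel_jobs small. Here’s a sketch that reuses the xgb estimator and hyperparameter_ranges assumed in the earlier example:

```python
from sagemaker.tuner import HyperparameterTuner

tuner = HyperparameterTuner(
    estimator=xgb,                                # assumed from the earlier sketch
    objective_metric_name="validation:auc",
    hyperparameter_ranges=hyperparameter_ranges,  # assumed from the earlier sketch
    strategy="Bayesian",    # the default strategy: it models the objective as results come in
    max_jobs=30,            # total budget of training jobs
    max_parallel_jobs=2,    # kept small on purpose so later jobs can learn from earlier ones
)
```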
Also, finally, if you have a training job that’s running across multiple instances, you have to take care to make sure that the correct objective metric is reported at the end, from the combined results of all those training instances. So if you’re writing your own training job code, that can be a little bit tricky. You want to make sure it plays nicely with hyperparameter tuning by reporting, at the end, when all those instances come back together, the objective that you’re trying to optimize on. But the key ones to remember here are: use a small range if you can; don’t do too many hyperparameters at once; don’t run too many training jobs concurrently, because the learning relies on that sequential learning over time; and whenever appropriate, use logarithmic scales for exploring your parameter space.
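For your own training code, one common pattern (just a sketch, not the only approach; the image URI, role, metric name, and log format below are all placeholders) is to print the final, aggregated objective once at the end of training and give the estimator a regex so SageMaker can pull it out of the logs:

```python
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<your-training-image>",      # placeholder
    role="<your-sagemaker-role-arn>",       # placeholder
    instance_count=2,                       # multi-instance training
    instance_type="ml.m5.xlarge",
    metric_definitions=[
        # Your script should print a single line like "final_validation:auc=0.912"
        # after the results from all instances have been aggregated.
        {"Name": "validation:auc", "Regex": "final_validation:auc=([0-9\\.]+)"}
    ],
)
```

The tuner then uses that reported metric as its objective when comparing training jobs.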
33. Apache Spark with SageMaker
So let’s talk about the intersection of SageMaker and Apache Spark. Apache Spark is a very popular framework for preprocessing data, and it also has a very powerful MLlib library that can perform machine learning at large scale too. So in a lot of ways, Apache Spark does a lot of what SageMaker does, but it does even more, because it’s really good at preprocessing data. Basically, the way it works is that you load up your data into something called a DataFrame within Spark, and you can distribute the processing of that DataFrame, to manipulate and massage that data, across an entire cluster in Spark. So wouldn’t it be cool if you could combine SageMaker and Spark together, and actually use the power of AWS as well as the power of Spark? Well, it turns out you can.
And there’s a SageMaker Spark library that AWS provides that basically lets you use SageMaker within a Spark driver script. So what does that look like? How do you use it? Well, you would preprocess your data as normal with Apache Spark. So whatever processing you need to do to collect that data, map it, reduce it, or whatever else, you would still do that using Apache Spark as you normally would. And when you’re done, at least in the world of Python, you would end up with what’s called a DataFrame object from Spark that contains all of your preprocessed data.
At that point, instead of using Spark’s MLlib, you could use what’s called a SageMakerEstimator, which works in much the same way. It exposes a few of the more popular algorithms within SageMaker as things you can use directly in Spark: for example K-Means, PCA, and XGBoost. XGBoost is a very popular algorithm these days that’s winning a lot of competitions, PCA is for dimensionality reduction, and K-Means is for clustering. That will then produce a SageMakerModel that you can use to make inferences. So it looks a lot like normal Spark code, if you’re familiar with Spark code, but instead of using a Spark MLlib implementation, we’re using a SageMakerEstimator and a SageMakerModel instead. So for the machine learning portion, we’re handing things off to SageMaker to run within its own framework, as opposed to the Spark cluster itself. SageMaker will go off and spin up its own ML instances to perform that final stage, while we still use Spark for all the preprocessing. The way this works in practice, you can take a SageMaker notebook and connect that to a remote Elastic MapReduce (EMR) cluster running Spark.
So remember, EMR can run Spark; we just need to connect our SageMaker notebook to that Spark cluster so we can use it, or you can use Zeppelin if you prefer. The training DataFrame that you’re preprocessing and creating in Spark should end up with a features column that’s a vector of doubles (double-precision values), and an optional labels column of doubles as well if you’re doing supervised stuff. Then you just create a SageMakerEstimator, call fit on it using that DataFrame, and that will give you back a SageMakerModel. You can then call transform on the SageMakerModel to make inferences with that trained model.
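Here’s a rough sketch of that flow using the sagemaker_pyspark library, assuming an IAM role ARN in role_arn and a preprocessed Spark DataFrame raw_df with some hypothetical numeric columns; the algorithm, instance types, and column names are just illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
import sagemaker_pyspark
from sagemaker_pyspark import IAMRole
from sagemaker_pyspark.algorithms import KMeansSageMakerEstimator

# Make the SageMaker Spark jars visible to the Spark driver.
spark = (SparkSession.builder
         .config("spark.driver.extraClassPath",
                 ":".join(sagemaker_pyspark.classpath_jars()))
         .getOrCreate())

# Finish preprocessing in Spark, ending with a "features" column of doubles.
assembler = VectorAssembler(inputCols=["col_a", "col_b", "col_c"],  # hypothetical columns
                            outputCol="features")
training_df = assembler.transform(raw_df).select("features")

# Hand the ML step off to SageMaker: training (and the endpoint) run on
# SageMaker ML instances, not on the Spark cluster itself.
kmeans = KMeansSageMakerEstimator(
    sagemakerRole=IAMRole(role_arn),
    trainingInstanceType="ml.m5.xlarge",
    trainingInstanceCount=1,
    endpointInstanceType="ml.m5.large",
    endpointInitialInstanceCount=1)
kmeans.setK(10)
kmeans.setFeatureDim(3)  # must match the length of the features vector

model = kmeans.fit(training_df)             # returns a SageMakerModel
predictions = model.transform(training_df)  # inference goes through the SageMaker endpoint
```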
This also works with Spark Pipelines, too. So there’s pretty good integration between SageMaker and Spark. Why would you bother with all this? Well, it allows you to combine the power of preprocessing big data sets in Spark with training and inference in SageMaker, so it’s kind of the best of those two worlds. And yes, Spark can actually do massive-scale machine learning as well, but if you have AWS resources you want to use, and you want to take advantage of all the special capabilities of SageMaker, such as automatic hyperparameter tuning, you might want to use both together. So it’s good to know that you can do that.