Amazon AWS Certified Machine Learning Specialty – Modeling Part 16
44. Lab: Tuning a Convolutional Neural Network on EC2, Part 2
And we’re ready to go. So let’s go ahead and execute this first block of code here. The first thing we need to do is import TensorFlow itself and the packages that we need. Again, the Python code itself is not important here. You will not be expected to actually understand or interpret or even look at Python code on the actual exam. The first block usually takes a little bit longer because it has to go and load the guts of that environment first and import TensorFlow itself. All right, that completed. And if this is your first time using a Jupyter notebook, the way that I ran that was I just clicked inside the block and hit Shift+Enter to run that code. You could also just hit the Run button here.
It would have the same effect. The way notebooks work is that you can intersperse documentation along with the code. So we have this block of explanatory text here written in Markdown, and then some actual Python code that we can execute by selecting that block and hitting Shift+Enter. Now that we’ve loaded up all the packages we need, we can run this next block. All this is doing is loading up the MNIST data set itself and partitioning it into training and testing images and labels. So it’s going to split up our handwriting recognition data into a training set and a test set that we can hold out for evaluation purposes.
And for each of those sets, we will have a set of image data and a set of label data indicating which number that person was actually trying to write. Shift+Enter to run that. It downloads the data that it needs, and it’s done. All right, now we need to massage our data a bit. Usually when you’re doing machine learning, you can’t just use the data in the form that it came in. You have to transform it in some way first, and fundamentally that’s what’s going on here. We’re going to reshape what is originally one-dimensional pixel data into 2D images of 28 by 28 pixels each, because convolutional neural networks deal with 2D images. Once we’re done with that, we cast it to floating point values, which is what our neural network expects, and we also need to normalize the data. The raw image data is pixel values that range from 0 to 255, because it’s 8-bit data. We need to scale that down to the range 0 to 1, which is what this division by 255 is doing. So this block of code is preparing our data. It’s formatting it in the structure that our neural network expects, and it’s scaling the data down to the range that our neural network expects. Generally speaking, neural networks like normalized data centered around zero. So that takes care of massaging our image data. We need to deal with the label data as well.
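The image preparation described above can be sketched without any framework. This is a minimal illustration in plain Python, not the lab’s notebook code: `reshape_and_scale` is a hypothetical helper name, and the input is a synthetic pattern rather than real MNIST pixels.

```python
def reshape_and_scale(flat_pixels, height=28, width=28):
    """Turn a flat list of 8-bit pixel values into a 2D image scaled to [0, 1]."""
    if len(flat_pixels) != height * width:
        raise ValueError(
            "expected %d pixels, got %d" % (height * width, len(flat_pixels))
        )
    image = []
    for row in range(height):
        start = row * width
        # Cast each pixel to float and divide by 255 so values land in [0.0, 1.0].
        image.append([float(p) / 255.0 for p in flat_pixels[start:start + width]])
    return image


# Synthetic stand-in for one flattened 28x28 image (NOT real MNIST data).
flat = [i % 256 for i in range(28 * 28)]
image = reshape_and_scale(flat)

print(len(image), len(image[0]))                   # rows and columns
print(min(map(min, image)), max(map(max, image)))  # smallest and largest value
```

In the lab itself the same two steps are done on whole arrays at once; the loop here only makes the reshape and the division by 255 explicit.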
The labels also need to be manipulated. As you might recall, neural networks like one-hot encoded data. So the number 1, for example, would be encoded as 0 1 0 0 0 0 0 0 0 0, and so forth. We basically need to represent these in some sort of binary fashion, and that’s what this to_categorical function does for us. It takes all of those training and testing labels, which are integers from 0 to 9, and converts them to one-hot format for us. All right, so now we can make sure that our data makes sense. All this is doing is displaying some of the sample data inline so we can see what it looks like. So, for example, we have a function here that will display an image from our training set, and we’ll call it to display sample number 1234. The library that actually displays the graphics needs to load up first, so that takes a moment. And there we have it. We extracted an image that happens to be the number 3. This is the image itself, the label associated with it is the number 3, and we’re also looking at the one-hot encoded format of that. So again, for the number 3 in one-hot format we count 0, 1, 2, 3, and we have a 1 there in that fourth slot.
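As a rough sketch of what one-hot encoding does to a single label, here is a plain Python version. The helpers `one_hot` and `decode` are hypothetical names used only for illustration; the lab relies on the Keras `to_categorical` utility instead.

```python
def one_hot(label, num_classes=10):
    """Encode an integer class label as a list with a single 1.0 entry."""
    if not 0 <= label < num_classes:
        raise ValueError("label %r is outside 0..%d" % (label, num_classes - 1))
    encoded = [0.0] * num_classes
    encoded[label] = 1.0  # slot index equals the digit, counting from zero
    return encoded


def decode(encoded):
    """Recover the integer label: the index of the largest entry."""
    return encoded.index(max(encoded))


three = one_hot(3)
print(three)
print(decode(three))
```

The digit 3 puts its 1 in the fourth slot because the slots are counted from zero.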
All right, so it looks like all of our data transformation worked. Our data is intact and it seems to make sense. Now we can actually set up the neural network itself. So we’re going to start by creating your most basic convolutional neural network here. It’s just going to be two convolution layers going into a max pooling layer. We’re then going to flatten that to one dimension and pass that into a dense layer of 128 neurons, which in turn goes into ten output neurons that will be used to actually deliver our final classification results. So the world’s simplest convolutional neural network here, but it works. So let’s go ahead and set that up. We’re using the Keras library to set up our model here in just a few lines of code.
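To get a feel for what a model summary of this architecture reports, the shape and parameter arithmetic can be worked out by hand. The sketch below ASSUMES 3x3 kernels, 32 and 64 filters, no padding, and 2x2 pooling. None of those values are stated in this part of the lab, so treat them as placeholders; only the 28x28 input, the 128-neuron dense layer, and the 10 outputs come from the text.

```python
def conv_output(size, kernel):
    """Spatial size after a stride-1 convolution with no padding."""
    return size - kernel + 1


def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


# ASSUMED hyperparameters (not given in the transcript).
KERNEL, FILTERS_1, FILTERS_2, POOL = 3, 32, 64, 2

side = conv_output(28, KERNEL)        # after the first convolution layer
side = conv_output(side, KERNEL)      # after the second convolution layer
side = side // POOL                   # after max pooling
flattened = side * side * FILTERS_2   # length of the flattened vector

total_params = (
    conv_params(KERNEL, 1, FILTERS_1)            # grayscale input: 1 channel
    + conv_params(KERNEL, FILTERS_1, FILTERS_2)
    + dense_params(flattened, 128)
    + dense_params(128, 10)
)

print(side, flattened, total_params)
```

Under these assumptions almost all of the parameters sit in the 128-neuron dense layer, since it is connected to every value of the flattened feature maps.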
So it’s pretty cool that we were able to create a convolutional neural network from scratch, if you will, with just a few lines of code here. We can call model.summary() to get a printout of what that structure looks like and make sure that it’s what we wanted. You can double check that the output shapes and the layer types are all what we want. This is what we want the structure of our neural network to be. And it does qualify as deep learning because there is much more than one layer going on here, right? All right, next we need to actually compile that model.
And to do that, we need a loss function and an optimizer. Since this is a multi-class classification problem, we’re going to use categorical cross-entropy for our loss function, and we have a wide choice of optimizers. We could use RMSprop, which is what Keras uses in its own examples, but Adam is usually a good choice, too. So I’m going to use the Adam optimizer instead. And the metric we’ll be evaluating on is accuracy, which makes sense for this data set. We’re trying to measure how accurately we can predict what number a given handwriting sample maps to. We’re not dealing with any crazy edge cases here where recall is more important because of imbalanced data or anything like that. The classes are pretty evenly distributed, so accuracy makes sense. Go ahead and run that. All right. And now we can actually train the model. So this is where it’s actually going to train the neural network. On my P2 instance, it will take a few minutes to run. If you’re using a smaller compute instance, it could take even longer than that. So be prepared to go grab a cup of coffee while this takes place. I’m going to use a batch size of 32 over ten epochs and just let it run. Let’s go ahead and kick that off.
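For intuition about the loss function named above, categorical cross-entropy on a single example is the negative log of the probability the model assigns to the true class. The following is a minimal plain Python sketch; `softmax` and `categorical_cross_entropy` are illustrative helper names, not the Keras implementation.

```python
import math


def softmax(logits):
    """Convert raw output scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot_label, probabilities, eps=1e-12):
    """Negative log-probability assigned to the true class."""
    return -sum(
        y * math.log(max(p, eps)) for y, p in zip(one_hot_label, probabilities)
    )


label = [0.0] * 10
label[3] = 1.0                       # the true digit is 3

uniform = softmax([0.0] * 10)        # a model with no idea: 10% per class
scores = [0.0] * 10
scores[3] = 8.0                      # a model that strongly favors class 3
confident = softmax(scores)

print(categorical_cross_entropy(label, uniform))
print(categorical_cross_entropy(label, confident))
```

A model that guesses uniformly over ten classes scores a loss of ln(10), and the loss shrinks toward zero as more probability is placed on the correct digit.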
Okay, after a couple of minutes here, it’s just about done. So the exam is really going to focus on how you interpret these results and what you do in response to them. That’s what this is all about. So let’s take a look at what happened here. We’re still waiting for that final epoch to finish up, but what we have here as we go through each epoch of training is both the accuracy on the training set that it’s using as it optimizes itself internally, and also the validation results. So at every epoch, it’s also testing against the validation data, the test data that we held out, to see how well the neural network performs against data that it has not been trained on. All right, so we can see that pretty quickly we’re getting to a point where we’re overfitting, because the accuracy on the training data keeps increasing and is getting bigger than the accuracy on the test data. It means that we’ve built a neural network, a deep learning model, that’s better at its own training data than in the general case of trying to predict data that it hasn’t seen before. So that’s a sign that we might be overfitting.
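The comparison described here, training accuracy pulling ahead of validation accuracy, is easy to check mechanically. The sketch below uses a hypothetical helper, `overfitting_epochs`, and the accuracy values are made up purely for illustration; they are not the numbers from this lab run.

```python
def overfitting_epochs(train_acc, val_acc, tolerance=0.0):
    """Return the 1-based epochs where training accuracy exceeds
    validation accuracy by more than `tolerance`."""
    return [
        epoch
        for epoch, (train, val) in enumerate(zip(train_acc, val_acc), start=1)
        if train - val > tolerance
    ]


# MADE-UP per-epoch accuracies, for illustration only.
train_history = [0.950, 0.985, 0.991, 0.995, 0.998]
val_history = [0.980, 0.987, 0.989, 0.990, 0.990]

print(overfitting_epochs(train_history, val_history, tolerance=0.001))
```

A gap that keeps widening over later epochs, as in this invented history, is the pattern to watch for; a single epoch with a tiny gap is not conclusive on its own.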
What do we do? In general, to prevent overfitting, you need to perform some sort of regularization technique, and different algorithms will have different techniques that you use. In the world of deep learning, one popular choice is what’s called dropout layers. The way they work is by dropping out some random set of neurons in the network on each training pass. By doing that, it forces the learning to be more distributed throughout the entire network, so things can’t be learned in one specific part of the neural network in a way that results in overfitting. It’s forced to spread that information out more widely, and this has a regularization effect. It causes it to be a more general purpose model and perform a little bit better on data that it hasn’t seen before.
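The mechanism can be sketched in a few lines of plain Python. This is an illustration of the common "inverted dropout" formulation, where surviving activations are scaled up so their expected sum is unchanged and nothing is dropped at inference time. The `dropout` function and the 0.5 rate are assumptions for the example, not values taken from the lab.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` during training and
    scale the survivors by 1 / (1 - rate). At inference, pass through."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)           # seeded so the sketch is repeatable
layer = [1.0] * 1000             # stand-in for one layer's activations

dropped = dropout(layer, rate=0.5, rng=rng)
print(dropped.count(0.0), "of", len(layer), "activations were dropped")
print(sum(dropped) / len(dropped))   # stays close to the original mean of 1.0

unchanged = dropout(layer, rate=0.5, rng=rng, training=False)
```

Because a different random subset is silenced each time the function is called, no single neuron can be relied on, which is the regularization effect described above.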
So let’s do that and see what effect it has. What we’re going to do is create that neural network again, that deep learning network, the CNN, if you will, and we’re going to add a couple of new layers in here. You can see that we have the convolutional layers, the pooling layer, the flatten layer, and the dense layers that we had before, but we’re also going to add in two dropout layers. So we’re going to have a dropout layer after the convolution and max pooling steps, and another dropout before our final categorization layer that produces the final results. So at a high level, we’re adding dropout layers to our deep learning network in order to prevent overfitting. Let’s go ahead and create that new model, and you can see that I moved that into a function here so I can do this more quickly in the future. We’ll go ahead and compile that again with the same loss function and optimizer as before, and we will run it again and see how our results differ this time. Again, we’ll come back when it’s done.
45. Lab: Tuning a Convolutional Neural Network on EC2, Part 3
All right, that finished, and these results are a lot nicer. First of all, our accuracy on the validation set is higher than it was before. We’re up to 99.23%, and before we only achieved 99.07%. So by any metric, this is a better model. It’s actually performing better on the validation data set, meaning that it’s more general and does a better job of classifying images that it hasn’t seen before. And we can also look at the training accuracy here. We can see that they ended up at more or less the same place, and in fact the training accuracy is a little bit lower than the test one, so we’re not overfitting. It did take a little bit longer for that accuracy to build up because of those dropout layers. It had fewer neurons to work with, so it took a little bit longer to train. But the result was a model that was better suited for predicting values of images that it hasn’t seen before.
So this is a good thing. We have a more generalized model here. It is not overfitting, and it is better at predicting stuff it hasn’t seen before. All right, so that’s regularization, and using dropout as a regularization technique to prevent overfitting on a deep neural network. We’ve seen firsthand here that it works. You’re also going to need to know things like what effect does that batch size have, how did I choose 32, how many epochs do I need, and what’s the learning rate? When I’m specifying an optimizer, I can specify how that optimizer works. These all have different effects on how training works, so let’s play around and find out. Let’s explore batch size. I’m going to create that same model again with the same optimizer and same loss function, but this time I’m going to use a batch size of 1000 instead of 32. Let’s see what effect that has.
Go ahead and run that. And again, we’ll come back when that’s done. All right, that finished, and you can see the results weren’t as good as before. Our final accuracy on the validation set is only 99.06%. We did better before with the smaller batch size. So what’s going on here? Well, another thing that you might notice is that if you were to run this multiple times, you’re going to get very different results every time, too. The danger of having a batch size that’s too large is that you get stuck in what are called local minima. There’s a danger that you’ll end up converging on the wrong solution because you aren’t using small enough batches to explore the different nooks and valleys inside the space of solutions, if you will. It’s a little bit counterintuitive, but large batches tend to get stuck at the wrong solution, and they can do this at random. It’s a bad thing.
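Two concrete consequences of the batch size can be shown with a small simulation: how many weight updates happen per epoch, and how noisy each update is. This sketch ASSUMES the standard MNIST training split of 60,000 images and uses random numbers as a stand-in for per-example gradients, so it illustrates the general idea rather than reproducing the lab.

```python
import math
import random
import statistics

TRAIN_EXAMPLES = 60000  # ASSUMED: the standard MNIST training split


def steps_per_epoch(num_examples, batch_size):
    """Number of weight updates performed in one pass over the data."""
    return math.ceil(num_examples / batch_size)


def batch_mean_spread(values, batch_size, trials, rng):
    """Standard deviation of the mini-batch average across random batches."""
    means = []
    for _ in range(trials):
        batch = rng.sample(values, batch_size)
        means.append(sum(batch) / batch_size)
    return statistics.pstdev(means)


rng = random.Random(0)
# Synthetic stand-in for one gradient component per training example.
per_example = [rng.gauss(0.0, 1.0) for _ in range(TRAIN_EXAMPLES)]

small_spread = batch_mean_spread(per_example, 32, 200, rng)
large_spread = batch_mean_spread(per_example, 1000, 200, rng)

print(steps_per_epoch(TRAIN_EXAMPLES, 32), steps_per_epoch(TRAIN_EXAMPLES, 1000))
print(small_spread, large_spread)
```

The small batch produces many more, and much noisier, updates per epoch. That noise is one common explanation for the extra "jiggle" that helps small batches escape poor solutions.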
Large batches do tend to train a little bit faster, but at the expense of potentially giving you the wrong solution. Smaller batch sizes tend to be a little bit more resilient to that. They will give you more consistency and they tend not to get stuck in the wrong solution. They can jiggle around a little bit more and work their way out of those situations. Let’s try something else. Let’s play with the learning rate as well. By default, the learning rate for the Adam optimizer is 0.001. Let’s see what happens if I increase that by an order of magnitude to 0.01. So again, same model as before, same loss function and optimizer, but I’m specifying a different learning rate on that optimizer and increasing it. Let’s see what that does. And I’m going to go back to the batch size of 32 just to make things even again. Let’s let that complete and see what happens.
All right, that finished, and the results are pretty terrible. We’re down to just 98% accuracy on the validation set and only 95% internally on its own training data. So with that higher learning rate, it was not able to converge on a good solution at all. Now, this is because small batch sizes pair well with lower learning rates, and a large learning rate like we tried here has a tendency to overshoot the correct solution entirely. That’s a bad thing. So again, you don’t want too high of a learning rate. Generally, if you have a small batch size, you also want a small learning rate. And in any case, a learning rate that’s too big is going to have bad effects. So this is very interesting, right? There’s way more to getting good results out of deep learning than just your choice of a model. We chose to use a convolutional neural network, and that is sort of a textbook thing to do for image classification, handwriting classification, which is what we did here.
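Overshooting is easiest to see on a toy problem. The sketch below runs plain gradient descent, not Adam, on f(x) = x², whose minimum is at x = 0. The two learning rates are chosen only to make the effect obvious on this toy function and are unrelated to the values used in the lab.

```python
def gradient_descent(learning_rate, start=1.0, steps=20):
    """Minimize f(x) = x**2 with plain gradient descent and return the final x."""
    x = start
    for _ in range(steps):
        gradient = 2.0 * x              # derivative of x**2
        x -= learning_rate * gradient   # step against the gradient
    return x


small_lr_result = gradient_descent(learning_rate=0.1)
large_lr_result = gradient_descent(learning_rate=1.1)

print(abs(small_lr_result))   # shrinks toward the minimum at 0
print(abs(large_lr_result))   # grows: every step jumps past the minimum
```

With the small rate each step multiplies x by 0.8, so it settles toward zero. With the large rate each step multiplies x by -1.2, so it lands on the far side of the minimum and farther away than before.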
However, we got a huge range of results here just by tweaking these hyperparameters. We started off without any sort of regularization at all and we ended up with a deep learning network that overfit quite a bit. It overtrained on its own training data and it wasn’t able to generalize as well to data it hadn’t seen before. What we did to counteract that was introduce dropout layers to have a regularization effect on that neural network. Maybe just reducing the number of neurons and the number of layers would have had a similar effect as well. That would be another way to address that problem. We also saw that playing with the batch size can have an effect. Too large of a batch size can yield inconsistent results and can cause your network to get stuck in local minima that it can’t get out of. We also saw the effect of the learning rate, and that having too large of a learning rate can cause it to overshoot the correct result entirely and yield a bad result in the end as well. So things like the batch size and the learning rate are what we call hyperparameters.
And a large part of successful machine learning is just being able to tune those hyperparameters and choose the best values for them. It’s all a bit of a dark art, but there are some principles behind it that we talked about here, and learning how to apply those is a big part of how to be successful in the world of machine learning in practice. And that’s a big part of what this exam is trying to get at. All right, so I hope you learned something there.
Before we go away, though, remember to shut down this EC2 instance because we don’t want to be billed for it forever. Let’s shut down this notebook first. Close and Halt under the File menu, just to clean things up nicely. Then we can quit the Jupyter notebook itself, go back to PuTTY, to our terminal here, and it did, in fact, shut it down. Type exit to shut down that session. And finally, we can close that out and go back to the EC2 dashboard and, most importantly, make sure you terminate that instance. So I’m going to click on the EC2 instance that we’ve been playing with and go to Actions, Instance State, Terminate. Only once I do this can I be sure that I’m not going to be billed for that instance any longer. So that is now shutting down and we are done racking up charges on that. All right, so I hope you learned something here about regularization and identifying overfitting and how to tune your deep learning models effectively.