Amazon AWS Certified Machine Learning Specialty – Modeling
1. Section Intro: Modeling
We’re about to dive into the most involved section of this course, and the domain that carries the most weight on the exam: modeling. This is where we finally do machine learning, after collecting, analyzing, and preparing our training data. Deep learning has taken over the field of machine learning, so we’re going to lead off with a crash course in how deep neural networks work and some common flavors of them, such as convolutional and recurrent neural networks. Then we’ll get into the nuances of tuning these networks that aren’t often taught. How do we identify and address overfitting? How do we decide on the topology and depth of our networks? How do we optimize their performance? These real-world concerns are tested heavily on the exam, and that catches a lot of people by surprise. But you’ll be ready for it. This is still an AWS exam, so you can expect to be tested on AWS’s own machine learning services in depth.
That means we’re going to talk a lot about Amazon SageMaker, the wide variety of built-in algorithms it offers, and how to do automatic model tuning using SageMaker. We’ll also cover AWS’s higher-level services, including Comprehend, Translate, Polly, Transcribe, Lex, DeepLens, and more. Passing this exam also requires a lot of depth on how to evaluate the results of your training. When should you look at accuracy, precision, recall, or F1 scores? How are they computed, and how are they interpreted? What’s a confusion matrix and an ROC curve? We’ll dive into all of that and make sure you know it like the back of your hand. We’ll wrap up this section with a hands-on lab with a real convolutional neural network. We’ll evaluate its results, improve it using various regularization techniques, and experiment with the effects of different hyperparameters so you can get a real-world feel for how they work. Ultimately, that’s what you’re being tested on for this certification. Let’s get started. There’s a lot to cover here.
2. Introduction to Deep Learning
So we can’t talk about machine learning without talking about deep learning. I mean, that is the latest hotness in the field. Now, you’re not going to need a whole lot of depth about the internals of deep learning itself for the exam, but we at least need to recognize the different types of neural networks that are out there and the best way to deploy them in AWS to learn from big data. So let’s dive in. Overall, it’s pretty amazing stuff. This whole field of artificial intelligence is based on an understanding of how our own brains work. Over millions of years of evolution, nature has come up with a way to make us think, and if we just reverse engineer the way that our brains work, we can gain some insights into how to make machines that think. Within your brain, specifically within your cerebral cortex, which is where all of your thinking happens, you have a bunch of neurons. These are individual nerve cells, and they are connected to each other via axons and dendrites. You can think of these as connections, wires if you will, that connect different neurons together.
Now, an individual neuron will fire, or send a signal to all the neurons it is connected to, when enough of its input signals are activated. At the individual neuron level, it’s a very simple mechanism. You just have this neuron with a bunch of input signals coming into it, and if enough of those input signals reach a certain threshold, it will in turn fire off a set of signals to the neurons that it, in turn, is connected to. But when you start to have many, many, many of these neurons connected together in many, many different ways, with different strengths between each connection, things get very complicated. This is a perfect example of emergent behavior. You have a very simple concept, a very simple model, but when you stack enough of them together, you can create very complex behavior that can yield learning. This actually works. Not only does it work in your brain, it works in our computers as well. Now, think about the scale of your brain. You have billions of neurons, each of them with thousands of connections.
That’s what it takes to actually create a human mind. And this is a scale that we can still only dream about in the field of deep learning and artificial intelligence. But it’s the same basic concept. You just have a bunch of neurons with a bunch of connections that individually behave very simply. But once you get enough of them together, wired in enough complex ways, you can actually create very complex thoughts and maybe even consciousness. The plasticity of your brain is basically tuning where these connections go to and how strong each one is. And that’s where all the magic happens. Furthermore, if we look deeper into the biology of your brain, you can see that within your cortex, neurons seem to be arranged into stacks or cortical columns that process information in parallel. So for example, in your visual cortex, different areas of what you see might be getting processed in parallel by different columns or cortical columns of neurons.
Each one of these columns is in turn made of mini-columns of around 100 neurons per mini-column. Mini-columns are then organized into larger hypercolumns, and within your cortex there are about 100 million of these mini-columns. So again, they just add up really quickly. Coincidentally, this is a similar architecture to how the 3D video card in your computer works. It has a bunch of very simple, very small processing units, each responsible for computing a little group of pixels on your screen. It just so happens that that’s a very useful architecture for mimicking how your brain works. So it’s sort of a happy accident that the research behind your favorite video games lent itself to the same technology that made artificial intelligence possible on a grand scale and at low cost. The same GPUs on the same video cards that you’re using to play your video games can also be used to perform deep learning and create artificial neural networks.
Think about how much better it would be if we actually made chips that were purpose-built specifically for simulating artificial neural networks. Well, it turns out some people are designing chips like that right now, and by the time you watch this, they might even be a reality. I think Google’s working on one right now. So how do deep neural networks work? Well, we’ve translated all those ideas that were inspired by the biology of your brain into artificial neurons. And to be honest, these days the research and development happening in the field of AI has diverged from the biological basis, and we’re sort of improving on the artificial ones that we’ve made at this point. Now, an artificial neuron still works in the same way as a biological one. It just sums up the weighted inputs from the layer below it, applies some sort of activation function to that sum, and passes the result up to the next layer. So you can think of this as just another machine learning model.
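To make that concrete, here’s a minimal illustration of what a single artificial neuron computes: a weighted sum of its inputs plus a bias, passed through an activation function such as ReLU. The numbers are made up for illustration; they aren’t from the course.

```python
import numpy as np

def relu(x):
    # ReLU activation: pass positive values through, clamp negatives to zero.
    return np.maximum(0.0, x)

# Hypothetical inputs from the layer below, with learned weights and a bias term.
inputs  = np.array([0.5, -1.2, 3.0])
weights = np.array([0.8,  0.1, -0.4])
bias = 0.2

# The neuron sums its weighted inputs, adds the bias, and applies the activation.
output = relu(np.dot(weights, inputs) + bias)
print(output)  # this single value is what gets passed up to the next layer
```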
You input your feature data, or attributes, at the bottom of the neural network, and predicted labels come out of the top. It’s usually a classification of some sort. The network is trained using data with known labels that are probably one-hot encoded, like we talked about in the feature engineering section. During the training process, it figures out the ideal weights between each neuron to get the right answers at the top. You see those lines between each neuron that connect everything; there are a lot of them, and every one of those lines has a weight associated with it. There’s also a bias term that’s added in as well. So the job of a deep neural network is to learn the appropriate weights and biases throughout this network to generate the classifications that you want at the end. Now, the details of how that training works aren’t terribly important for the exam, but you do need to know that it’s called deep learning because there is more than one layer of neurons here. If we just had one layer, that would just be an ordinary neural network; when you have multiple layers, that’s when we talk about deep learning. Now, let’s talk about how we actually make these things a reality and the frameworks you might use. Neural networks lend themselves very well to parallelization, and that’s a great thing. It means we can use GPUs to do this, because individual neurons are simple enough to be modeled on a GPU, and GPUs are made to parallelize processing at massive scale, originally to generate all the pixels on your computer screen at once. But we can use that same technology to execute an entire neural network all at once.
So a GPU is capable of parallelizing a lot of neurons, and you can also have more than one GPU on one computing node, and you can have many nodes in a cluster. So you could really scale this up and build something massive, maybe even brain-scale someday. However, we need some sort of framework to use this from a programming standpoint and actually define the networks that we want to set up before they’re sent out for training on your GPUs. A very popular choice is TensorFlow, which is made by Google, and it also incorporates a higher-level API called Keras that we’ll look at here. This is a snippet of real Keras code that sets up a neural network similar to the one we just looked at. Basically, it has a dense layer of 64 neurons with an input dimension of 20, so we’re feeding in 20 input neurons, which might be a one-hot encoded category of 20 different values, for example. Then there’s a dropout layer for regularization (we’ll talk about that later), another layer of 64 neurons in the middle with an activation function called ReLU (which we’ll also talk about), another dropout layer, and finally an output layer with ten output classification neurons, using softmax to choose one of them. We also define the optimizer function, in this case SGD, and we compile the model; at that point we can train it and make predictions from it.
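The original snippet isn’t reproduced here, but a minimal Keras sketch along those lines might look like this; the dropout rate, learning rate, and loss function are assumptions for illustration, not values from the course.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import SGD

# A dense layer of 64 neurons taking 20 input features (e.g., a one-hot encoded category),
# dropout for regularization, another 64-neuron ReLU layer, more dropout, and a
# 10-way softmax output layer.
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

# Define the SGD optimizer and compile; after this we could call model.fit() to train
# and model.predict() to make predictions.
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(learning_rate=0.01),
              metrics=['accuracy'])
```

Dropout randomly disables a fraction of neurons during training, which is one of the regularization techniques we’ll come back to later.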
Now, in addition to TensorFlow and Keras, there’s also something called Apache MXNet. It’s a very similar thing; it does pretty much the same stuff, but it’s not made by Google, it’s an Apache project. And maybe that has something to do with why Amazon tends to gravitate toward MXNet more than TensorFlow. You’ll find that AWS supports both, but most of Amazon’s own deep learning products tend to be built on top of MXNet, which is basically an alternative to TensorFlow. There are three main types of neural networks out there in the wild. One is a basic feedforward neural network, which is the one we just looked at: you have a bunch of layers of neurons stacked on top of each other, you feed in your features at the bottom, and predictions or classifications come out at the top. So that’s the most straightforward kind.
There are also convolutional neural networks, or CNNs for short. Those are commonly used for image classification because they’re built to deal with two-dimensional data. So if you need to figure out whether there’s a stop sign in an image, a CNN is probably what you’ll use for that. There are also recurrent neural networks, or RNNs, and these are generally made for dealing with sequences of some sort, sequences in time perhaps, like making predictions of stock prices over time, or just things that have some order to them. For example, you might use an RNN for machine translation. If you want to understand the words in a sentence, and the order of those words matters, an RNN might be a way of capturing those relationships and predicting how to complete a sentence or how to translate a sentence from one language to another. A couple of key acronyms in that space are LSTM, which stands for Long Short-Term Memory, and GRU, the Gated Recurrent Unit. Those are basically different flavors of RNNs that work really well, so if you see the terms LSTM or GRU, remember that those are just types of RNNs. Let’s dive into more detail on CNNs and RNNs next.
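Before we do, here’s a quick, hypothetical Keras sketch just to show what the two look like in code; the input shapes and layer sizes are made up for illustration, not taken from the course.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, LSTM

# CNN sketch: 2D convolutions scan a 28x28 grayscale image for local patterns,
# ending in a 10-way softmax classifier.
cnn = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(10, activation='softmax'),
])

# RNN sketch: an LSTM reads a sequence of 100 time steps with 8 features each
# and predicts a single value (e.g., the next point in a time series).
rnn = Sequential([
    LSTM(64, input_shape=(100, 8)),
    Dense(1),
])
```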