Amazon AWS Certified Machine Learning Specialty – Modeling Part 2


3. Convolutional Neural Networks

Let’s dive into more depth with CNNs first. Usually you hear about CNNs in the context of image analysis. Their whole point is to find things in your data that might not be exactly where you expected them to be. Technically we call this feature location invariance. That means that if you’re looking for some pattern or some feature in your data, but you don’t know exactly where it might be, a CNN can scan your data and find those patterns for you wherever they might be. So for example, in this picture here, that stop sign could be anywhere in the image, but a CNN is able to find that stop sign no matter where it might be. Now, it’s not just limited to image analysis; it can also be used for any sort of problem where you don’t know where the features you care about might be located within your data. Machine translation and natural language processing come to mind for that. You don’t necessarily know where the noun or the verb or the phrase you care about might be in some paragraph or sentence you’re analyzing, but a CNN can find it and pick it out for you. Sentiment analysis might be another application of CNNs.

So you might not know exactly where a phrase might be that indicates some happy sentiment or some frustrated sentiment or whatever you might be looking for, but CNNs can scan your data and pluck it out, and you’ll see that the idea behind it isn’t really as complicated as it sounds. This is another example of people using fancy words to make things sound more complicated than they really are. I should point out that for using CNNs for things like language and sentiment analysis, there are variants of CNNs that are more appropriate, such as ones that use attention. In those specific kinds of problems, it does matter where those features are, and you actually need to keep track of that. But anyway, that’s a detail that’s not that important. So how do CNNs work? Well, CNNs, convolutional neural networks, again are inspired by the biology of your visual cortex. They take cues from how your brain actually processes images from your retina, and it’s another fascinating example of emergent behavior. The way your eyes work is that individual groups of neurons service a specific part of your field of vision. We call these local receptive fields.

They are just groups of neurons responding to a part of what your eyes see. They subsample the image coming in from your retinas and they have specialized groups of neurons for processing specific parts of the field of view that you see with your eyes. Now, these little areas from each local receptive field overlap each other to cover your entire visual field. And this is called convolution. Convolution is just a fancy way of saying I’m going to break up this data into little chunks and process those chunks individually. And then the system assembles a bigger picture of what you’re seeing higher up in the chain. The way it works within your brain is that you have many layers, just like a deep neural network that identifies various complexities of features, if you will. So the first layer that you go into from your convolutional neural network inside your head might just identify horizontal lines or lines at different angles or specific kinds of edges. We call these filters, and they feed into a layer above them that would then assemble those lines that it identified at the lower level into shapes. Maybe there’s a layer above that that would be able to recognize objects based on the patterns of shapes that you see.

So we have this hierarchy that detects lines and edges, then shapes from the lines, and then objects from the shapes. If you’re dealing with color images, we have to multiply everything by three, because you actually have specialized cells within your retina for detecting red, green, and blue light. These are processed individually and assembled together later. So that’s all a CNN is. It’s just taking a source image, or source data of any sort really, breaking it up into little chunks called convolutions, and then assembling those and looking for patterns at increasingly higher complexities at higher levels in your neural network. So how does your brain know that you’re looking at a stop sign there? Well, let’s talk about this in more colloquial language. Like we said, you have individual local receptive fields that are responsible for processing specific parts of what you see, and those local receptive fields are scanning your image and they overlap with each other, looking for edges. You might notice that your brain is very sensitive to contrast and edges that it sees in the world. Those tend to catch your attention, right? That’s why the letters on this slide catch your attention, because there’s high contrast between the letters and the white background behind them. So at a very low level, you’re picking up the edges of that stop sign and the edges of the letters on the stop sign.

Now, a higher level might take those edges and recognize the shape of that stop sign. That layer says, oh, there’s an octagon there, that means something special to me. Or those letters form the word stop; that means something special to me too. And ultimately, that will get matched against whatever classification pattern your brain has of a stop sign. So no matter which receptive field picked up that stop sign, at some layer it will be recognized as a stop sign. And furthermore, because you’re processing data in color, you can also use the information that the stop sign is red and further use that to aid in the classification of what this object really is. So somewhere in your head there’s a neural network that says, hey, if I see edges arranged in an octagon pattern that has a lot of red in it and says stop in the middle, that means I should probably hit the brakes on my car. And at some even higher level, your brain is doing higher reasoning.

That’s what happens: there’s a pattern that says, hey, there’s a stop sign coming up here, I’d better hit the brakes on my car. And if you’ve been driving long enough, you don’t even really think about it anymore, do you? It feels like it’s hardwired, and that literally may be the case. Anyway, an artificial convolutional neural network works the same way. It’s the exact same idea. So how do you build a CNN with Keras and TensorFlow? Well, you probably don’t want to do this at a very low level, even though you could. CNNs can get pretty complicated, so higher-level APIs such as Keras become essential. First of all, you need to make sure that your source data is of the appropriate dimensions and the appropriate shape. You’re going to be preserving the actual 2D structure of an image if you’re dealing with image data here.

So the shape of your data might be the width times the length times the number of color channels. If it’s a black-and-white image, there’s only one color channel that indicates some grayscale value between black and white at every point in the image; we can do that with a single value at each point. But if it’s a color image, you’d have three color channels, one for red, one for green, and one for blue, because you can create any color by combining red, green, and blue together. There are some specialized types of layers in Keras that you can use when you’re dealing with convolutional neural networks. For example, there’s the Conv2D layer type that does the actual convolution on a 2D image. And again, convolution is just breaking up that image into little subfields that overlap each other for individual processing.
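To make the data shapes concrete before we get to the layer types, here is a minimal sketch using NumPy; the 28x28 image size and the batch size of 100 are just illustrative assumptions:

```python
import numpy as np

# A single 28x28 grayscale image: width x height x 1 color channel
gray_image = np.zeros((28, 28, 1), dtype=np.float32)

# A single 28x28 color image: width x height x 3 channels (red, green, blue)
color_image = np.zeros((28, 28, 3), dtype=np.float32)

# Keras expects a batch dimension in front, so a batch of 100 color images
# would have the shape (100, 28, 28, 3)
batch = np.zeros((100, 28, 28, 3), dtype=np.float32)

print(gray_image.shape, color_image.shape, batch.shape)
```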

There’s also a Conv1D and a Conv3D layer available as well. You don’t have to use CNNs with images, like we said; they can also be used with text data, for example, and that might be an example of one-dimensional data. The Conv3D layer is available if you’re dealing with 3D volumetric data of some sort, so there are a lot of possibilities there. Another specialized layer in Keras for CNNs is MaxPooling2D, and there are 1D and 3D variants of that as well. The idea there is just to reduce the size of your data. It takes the maximum value seen in a given block of the image and reduces the layer down to just those maximum values. It’s a way of shrinking the image in such a way that it reduces the processing load on the CNN.
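To make those layer types concrete, here is a hedged sketch of how they might be declared in Keras; the filter counts and kernel sizes shown are arbitrary choices for illustration, not recommendations:

```python
from tensorflow.keras import layers

# 2D convolution over images: 32 filters, each scanning 3x3 patches of the image
conv2d = layers.Conv2D(32, kernel_size=(3, 3), activation='relu')

# 1D convolution over sequences, such as text represented as vectors
conv1d = layers.Conv1D(32, kernel_size=3, activation='relu')

# 3D convolution over volumetric data, such as a 3D scan
conv3d = layers.Conv3D(32, kernel_size=(3, 3, 3), activation='relu')

# Max pooling shrinks the data by keeping only the largest value in each 2x2 block
pool = layers.MaxPooling2D(pool_size=(2, 2))
```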

As you can see, processing a CNN is a very compute-intensive operation, and the more you can do to reduce the work, the better. So if you have more data in your image than you need, a MaxPooling2D layer can be useful for distilling that down to the bare essence of what you need to analyze. Finally, at some point you need to feed this data into a flat layer of neurons; at some point, it’s just going to go into a perceptron. At this stage, we just need to flatten that 2D layer into a 1D layer so we can pass it into a layer of neurons. From that point, it just looks like any other feedforward neural network or multilayer perceptron. So the magic of CNNs really happens at a lower level. It ultimately gets converted into what looks like the same types of multilayer perceptrons that we’ve been using before.

The magic happens in actually processing your data, convolving it, and reducing it down to something that’s manageable. So, typical usage of image processing with a CNN would look like this. You might start with a Conv2D layer that does the actual convolution of your image data. You might follow that up with a MaxPooling2D layer on top of that, which distills that image down, just shrinking the amount of data that you have to deal with. You might then do a Dropout layer on top of that, which just prevents overfitting. At that point, you might apply a Flatten layer to actually be able to feed that data into a perceptron. That’s where a Dense layer might come into play; a Dense layer in Keras is just a perceptron, really. It’s a hidden layer of neurons. From there, we might do another Dropout pass to further prevent overfitting, and finally do a softmax to choose the final classification that comes out of your neural network. As I said, CNNs are very computationally intensive. They are very heavy on your CPU, your GPU, and your memory requirements. Shuffling all that data around and convolving it adds up really, really fast. And beyond that, there are a lot of what we call hyperparameters, a lot of different knobs and dials that you can adjust on your CNN.
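Putting that typical stack together, a minimal sketch of such a model in Keras might look like the following; the 28x28 grayscale input, the layer sizes, the dropout rates, and the ten output classes are all illustrative assumptions rather than recommendations:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),               # 28x28 grayscale images
    layers.Conv2D(32, (3, 3), activation='relu'),  # convolve the image
    layers.MaxPooling2D((2, 2)),                   # shrink the data down
    layers.Dropout(0.25),                          # prevent overfitting
    layers.Flatten(),                              # flatten 2D feature maps to 1D
    layers.Dense(128, activation='relu'),          # a hidden layer of neurons (a perceptron)
    layers.Dropout(0.5),                           # another dropout pass
    layers.Dense(10, activation='softmax'),        # softmax picks the final classification
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```

From there you would call model.fit() on your training images, just like any other Keras model.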

So in addition to the usual stuff, you can tune the topology of your neural network, or what optimizer you use, or what loss function you use, or what activation function you use. There are also choices to make about the kernel sizes, that is, the area that you actually convolve across. How many layers do you have? How many units do you have? How much pooling do you do when you reduce the image down? There’s a lot of variation here; there’s almost an infinite number of possibilities for configuring a CNN. But often just obtaining the data to train your CNN is the hardest part. So, for example, if you own a Tesla, it’s actually taking pictures of the world around you, the road around you, and all the street signs and traffic lights as you drive. Every night, it sends all those images back to some data server somewhere, so Tesla can actually run training on its own neural networks based on that data. So if you slam on the brakes while you’re driving a Tesla at night, that information is going to be fed into a big data center somewhere, and Tesla is going to crunch on that and say, hey, is there a pattern here to be learned from what the cameras on the car saw that means you should slam on the brakes?

When you think about the scope of that problem, just the sheer magnitude of obtaining, processing, and analyzing all that data, that becomes very challenging in and of itself. Now, fortunately, the problem of tuning the parameters doesn’t have to be as hard as I described it to be. There are specialized architectures of convolutional neural networks that do some of that work for you. A lot of research goes into trying to find the optimal topologies and parameters for a CNN for a given type of problem, and you can think of this as a library you can draw from. So for example, there’s the LeNet-5 architecture, which is suitable for handwriting recognition in particular. There’s also one called AlexNet, which is appropriate for image classification; it’s a deeper neural network than LeNet. In the example we talked about on the previous slides, we only had a single hidden layer, but you can have as many as you want.

It’s just a matter of how much computational power you have available. There’s also something called GoogLeNet, and you can probably guess who came up with that one. It’s even deeper, but it has better performance because it introduces a concept called inception modules. Inception modules group convolution layers together, and that’s a useful optimization for how it all works. Finally, the most sophisticated one today is called ResNet, which stands for Residual Network. It’s an even deeper neural network, but it maintains performance by using what are called skip connections. It has special connections between the layers of the perceptron to further accelerate things, so it builds upon the fundamental architecture of a neural network to optimize its performance, because as you’ll see, CNNs can be very demanding on performance. ResNet comes up a lot in the world of AWS. Typically, variations of ResNet, like ResNet-50 specifically, come up in the world of image classification algorithms within, say, SageMaker and other services.
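As a quick illustration of drawing from that library of proven architectures, Keras ships a pretrained ResNet-50 in its applications module; this sketch assumes you are willing to download the ImageNet weights and uses a random placeholder image just to show the call pattern:

```python
import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions

# Load ResNet-50 with weights pretrained on ImageNet (downloaded on first use)
model = ResNet50(weights='imagenet')

# A placeholder 224x224 RGB image; in practice you would load a real photo here
image = np.random.rand(1, 224, 224, 3) * 255.0

# Preprocess the pixels the way ResNet-50 expects, then classify
predictions = model.predict(preprocess_input(image))
print(decode_predictions(predictions, top=3))
```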

4. Recurrent Neural Networks

Now let’s go into some more depth on recurrent neural networks and what they’re all about. So what’s an RNN for? Well, a couple of things. Basically, they’re for sequences of data, and that might be a sequence in time. So you might use one for processing time series data, where you’re trying to look at a sequence of data points over time and predict the future behavior of something. In other words, RNNs are fundamentally for sequential data of some sort. Some examples of time series data might be web logs, where you’re receiving different hits to your website over time, or sensor logs, where you’re getting different inputs from sensors in the Internet of Things. Or maybe you’re trying to predict stock behavior by looking at historical stock trading information. These are all potential applications for recurrent neural networks, because they can look at behavior over time and take that behavior into account when making future projections.

Another example might be if you’re trying to develop a self-driving car: you might have a history of where your car has been, its past trajectories, and maybe that can inform how your car might want to turn in the future. So you might take into account the fact that your car has been turning along a curve to predict that perhaps it should continue to drive along that curve until the road straightens out. The sequence doesn’t have to just be in time; it can be any kind of sequence of arbitrary length. Something else that comes to mind is language. Sentences are just sequences of words, right? So you can apply RNNs to language, or machine translation, or producing captions for videos or images. These are examples of where the order of words in a sentence might matter, and where the structure of the sentence and how those words are put together could convey more meaning than you could get by just looking at those words individually without context. So again, an RNN can make use of that ordering of the words and try to use it as part of its model. Another interesting application of RNNs is machine-generated music. You can think of music sort of like text, where instead of a sequence of words or letters, you have a sequence of musical notes. You can actually build a neural network that can take an existing piece of music and extend upon it, using a recurrent neural network to try to learn the patterns that were aesthetically pleasing in the music of the past. Conceptually, this is what a single recurrent neuron looks like in terms of a model. It looks a lot like the artificial neurons that we’ve looked at before. The big difference is that there’s a little loop here going around it. As we run a training step on this neuron, some training data gets fed into it.

Or maybe this is an input from a previous layer in our neural network. It will apply some sort of step function after summing all the inputs into it. In this case, we’re going to be using something more like a hyperbolic tangent because mathematically we want to be sure that we preserve some of the information over time in a smooth manner. Now, usually we would just output the result of that summation and that activation function as the output of this neuron. But we’re also going to feed that back into the same neuron. So the next time we run some data through this neuron, that data from the previous run also gets summed into the results as an extra input.

So as we keep running this thing over and over again, we’ll have some new data coming in that gets blended together with the output from the previous run through this neuron and it just keeps happening over and over and over again. So you can see that over time, the past behavior of this neuron influences its future behavior and influences how it learns. Another way of thinking about this is by unrolling it in time. So what this diagram shows is the same single neuron just at three different time steps. When you start to dig into the mathematics of how RNNs work, this is a more useful way of thinking about it. So if we consider this to be time step zero at the left, you can see there’s some sort of data input coming into this recurrent neuron and that will produce some sort of output.

After going through its activation function, that output also gets fed into the next time step. So if this is time step one in the middle, with this same neuron, you can see that this neuron is receiving not only a new input, but also the output from the previous time step, and those get summed together. The activation function gets applied to that, and it gets output as well. The output of that combination then gets fed into the next time step, call it time step two, where a new input for time step two gets fed into this neuron and the output from the previous step also gets fed in. They get summed together, the activation function is run, and we have a new output.
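As a tiny numerical sketch of those unrolled time steps (using NumPy, with made-up weights and inputs), each step blends the new input with the previous output before applying the hyperbolic tangent:

```python
import numpy as np

# Illustrative scalar weights for the input and the recurrent (feedback) connection
w_input, w_recurrent = 0.5, 0.8

inputs = [1.0, 0.2, -0.4]   # one input value per time step
output = 0.0                # there is no previous output at time step zero

for t, x in enumerate(inputs):
    # Sum the new input with the previous output, then apply the tanh activation
    output = np.tanh(w_input * x + w_recurrent * output)
    print(f"time step {t}: output = {output:.4f}")
```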

This arrangement is called a memory cell, because it does maintain memory of its previous outputs over time. And you can see that even though everything gets summed together at each time step, over time those earlier behaviors kind of get diluted. We’re adding time step zero into time step one, and then the sum of those two things ends up working into time step two. So one property of memory cells is that more recent behavior tends to have more of an influence on the current time step. This can be a problem in some applications, but there are ways to work against that that we’ll talk about later. Stepping this up, you can have a whole layer of recurrent neurons. In this diagram, we are looking at four individual recurrent neurons that are working together as part of a layer.

You can have some input going into this layer as a whole that gets fed into those four different recurrent neurons, and then the output of those neurons can get fed back into every neuron in that layer at the next step. So all we’re doing is scaling this out horizontally: instead of a single recurrent neuron, we have a layer of four recurrent neurons, where all the output of those neurons feeds into the behavior of those neurons in the next learning step. You can scale this out to have more neurons and learn more complicated patterns as a result. RNNs open up a wide range of possibilities, because now we have the ability to deal not just with vectors of information, or static snapshots of some sort of state; we can deal with sequences of data as well. There are four different combinations here that an RNN can deal with. We can have sequence-to-sequence neural networks: if our input is a time series or some sort of sequence of data, we can also have an output that is a time series or some sequence of data. So if you’re trying to predict stock prices in the future based on historical trades, that might be an example of a sequence-to-sequence topology. We can also mix and match sequences with the older static vector states that we predicted just using multilayer perceptrons.

We would call that sequence-to-vector. If we were starting with a sequence of data, we could produce just a snapshot of some state as a result of analyzing that sequence. An example might be looking at the sequence of words in a sentence to produce some idea of the sentiment that sentence conveys. You can go the other way around too: you can go from a vector to a sequence. An example of that might be taking an image, which is a static vector of information, and then producing a sequence from that vector, for example the words in a sentence, creating a caption from an image. We can chain these things together in interesting ways too. We can have encoders and decoders built up that feed into each other. For example, we might start with a sequence of information, say a sentence in some language, embody what that sentence means as some sort of vector representation, and then turn that around into a new sequence of words in some other language.
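In Keras terms, the difference between a sequence-to-vector and a sequence-to-sequence topology often comes down to whether a recurrent layer returns only its final state or an output at every time step; here is a hedged sketch with arbitrary sequence lengths and layer sizes:

```python
from tensorflow.keras import layers, models

# Sequence-to-vector: read 20 time steps of 8 features each and
# boil the whole sequence down to a single sentiment-style score
seq_to_vec = models.Sequential([
    layers.Input(shape=(20, 8)),
    layers.SimpleRNN(32),                         # returns only the final state
    layers.Dense(1, activation='sigmoid'),
])

# Sequence-to-sequence: emit an output at every time step
seq_to_seq = models.Sequential([
    layers.Input(shape=(20, 8)),
    layers.SimpleRNN(32, return_sequences=True),  # returns the full sequence of states
    layers.Dense(1),                              # applied at each time step
])

seq_to_vec.summary()
seq_to_seq.summary()
```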

That kind of encoder-decoder chain might be how a machine translation system could work. You might start with a sequence of words in French, build up what we call an embedding layer, just a vector that embodies the meaning of that sentence, and then produce a new sequence of words in English or whatever language you want. That’s an example of using a recurrent neural network for machine translation. There are lots of exciting possibilities here. Training RNNs, just like CNNs, is hard, and in some ways it’s even harder. The main twist here is that we need to backpropagate not only through the neural network itself and all of its layers while we’re training, but also through time. From a practical standpoint, every one of those time steps ends up looking like another layer in our neural network while we’re trying to train it, and those time steps can add up fast. Over time, we end up with a deeper and deeper neural network that we need to train, and the cost of actually performing gradient descent on that increasingly deep neural network becomes increasingly large.

So to put an upper cap on that training time, we often limit the backpropagation to a limited number of time steps. We call this truncated backpropagation through time. It’s something to keep in mind when you’re training an RNN: you not only need to backpropagate through the neural network topology that you’ve created, you also need to backpropagate through all the time steps that you’ve built up to that point. Now, we talked earlier about the fact that as you’re building up an RNN, the state from earlier time steps ends up getting diluted over time, because we just keep feeding in behavior from the previous time step into the current step. This can be a problem if you have a system where older behavior matters just as much as newer behavior. For example, if you’re looking at words in a sentence, the words at the beginning of a sentence might be even more important than words toward the end. If you’re trying to learn the meaning of a sentence, there is no inherent relationship between where each word is and how important it might be. That’s an example of where you might want to do something to counteract that dilution effect. One way to do that is something called the LSTM cell, which stands for Long Short-Term Memory cell.

The idea here is that it maintains separate ideas of both short-term and long-term state, and it does this in a fairly complex way. Now, fortunately, you don’t really need to understand the nitty-gritty details of how this works. There’s an image of it here for you to look at if you’re curious, but the libraries that you use will implement it for you. The important thing to understand is that if you’re dealing with a sequence of data where you don’t want to give preferential treatment to more recent data, you probably want to use an LSTM cell instead of just using a straight-up RNN. There’s also an optimization on top of LSTM cells called the GRU cell, which stands for Gated Recurrent Unit. It’s just a simplification of LSTM cells that performs almost as well. So if you need to strike a compromise between performance in terms of how well your model works and performance in terms of how long it takes to train it, a GRU cell might be a good choice. GRU cells are very popular in practice.
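In practice you usually just swap in the LSTM or GRU layer type that Keras provides; a minimal sketch (the sequence length and layer sizes here are arbitrary) might look like this:

```python
from tensorflow.keras import layers, models

# LSTM: maintains separate long-term and short-term state internally
lstm_model = models.Sequential([
    layers.Input(shape=(50, 10)),   # 50 time steps, 10 features each
    layers.LSTM(64),
    layers.Dense(1),
])

# GRU: a simplified variant of the LSTM that trains faster
# and usually performs almost as well
gru_model = models.Sequential([
    layers.Input(shape=(50, 10)),
    layers.GRU(64),
    layers.Dense(1),
])
```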

Training an RNN is really hard. If you thought CNNs were hard, wait till you see RNNs. They are very sensitive to the topologies you choose and to the choice of hyperparameters, and since we have to simulate things over time, not just through the static topology of your network, they can become extremely resource intensive if you make the wrong choices. You might end up with a recurrent neural network that does not converge at all; it might be completely useless even after you’ve run it for hours to see if it actually works. So again, it’s important to build upon previous research when you can, and try to find sets of topologies and parameters that work well for problems similar to what you’re trying to do.
