Amazon AWS Certified Machine Learning Specialty – Modeling Part 11
30. IP Insights in SageMaker
And let’s cover the IP Insights algorithm in SageMaker. IP Insights is all about finding fishy behavior in your web logs. So it’s an unsupervised technique that learns the usage patterns of specific IP addresses, and it automatically identifies suspicious behavior from given IP addresses. So it can identify login attempts from anomalous IP addresses, and it can identify accounts that are creating resources from anomalous IP addresses. So basically it’s used as a security tool, as a way of analyzing your web logs for suspicious behavior that might cause you to flag something or maybe shut down a session. It can take in usernames and account IDs directly, so you don’t really need to preprocess your data a whole lot. It has a training channel, obviously, but since it is unsupervised, the validation channel is optional.
If you want, you can use that to compute an area under the curve score, which we talked about earlier. Remember, the input has to be CSV data, and it’s a very simple CSV file. It’s just entity and IP address. So that entity again can be a username or an account ID or whatever identifier you use, followed by the IP address associated with that entity. That’s it. Under the hood, it’s using a neural network to learn latent vector representations of entities and IP addresses. So it’s doing some pretty fancy modeling there to try to learn what specific IP addresses do. Those entities are hashed and embedded, so it has an embedding layer to try to organize those IP addresses together. You need a sufficiently large hash size for this to work, so that’s going to end up being one of our important hyperparameters, and it will automatically generate negative samples during training by randomly pairing entities and IPs.
So that’s kind of a neat little twist to the algorithm there. This is a case where we have an unbalanced data set, right? So it’s kind of like fraud detection, where the vast majority of transactions are not going to be anomalous. So it actually generates anomalous examples by just randomly pairing together entities and IP addresses, and those random pairings probably represent anomalous behavior. So kind of a neat idea there. The important hyperparameters: the number of entity vectors, num_entity_vectors, is the hash size that we talked about earlier. They recommend that you set this to twice the number of unique entity identifiers, so a little bit of a manual step there. Also, the size of the embedding vectors is given by vector_dim, something else you might want to tune.
Too large a value there could result in overfitting, so something to be careful of. And since it is a neural network under the hood, we have the usual suspects for tuning neural networks: the number of training epochs, the learning rate, and the batch size. You can use a CPU or GPU, but since it’s a neural network, GPUs are recommended if you can; a p3.2xlarge or higher is recommended, and it can use multiple GPUs on one machine. The size of a CPU instance would depend on the parameters that you chose if you’re going to go with CPUs instead. So again, the main thing with IP Insights is that it’s used to identify anomalous behavior from IP addresses using a neural network under the hood. That’s the main takeaway.
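Before we move on to the next section, let’s make all of that a little more concrete. The training data is just comma-separated entity and IP address pairs with no header row, one pair per line, something like user_alice,192.0.2.44 (the names and addresses here are made up). And below is a rough sketch of what kicking off an IP Insights training job could look like with the SageMaker Python SDK. This is only a sketch: the bucket, role ARN, and hyperparameter values are placeholders for illustration, not recommendations.

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder role ARN

# Look up the built-in IP Insights container image for the current region
image = image_uris.retrieve("ipinsights", session.boto_region_name)

estimator = Estimator(
    image_uri=image,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",  # GPU recommended since it's a neural network
    sagemaker_session=session,
)

# The hyperparameters we just discussed -- these values are purely illustrative
estimator.set_hyperparameters(
    num_entity_vectors=20000,  # hash size: roughly 2x the number of unique entities
    vector_dim=128,            # embedding size: too large can overfit
    epochs=10,
    learning_rate=0.001,
    mini_batch_size=1000,      # batch size
)

# Train on entity,ip CSV data; the validation channel is optional and enables AUC reporting
estimator.fit({
    "train": TrainingInput("s3://my-bucket/ipinsights/train.csv", content_type="text/csv"),
    "validation": TrainingInput("s3://my-bucket/ipinsights/val.csv", content_type="text/csv"),
})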
31. Reinforcement Learning in SageMaker
Now let’s dive into the world of reinforcement learning in SageMaker. And while I could frame this as just yet another built-in algorithm of SageMaker, it’s really its own entirely different beast. So let’s go into a little bit more depth on this one. Reinforcement learning isn’t like the other algorithms in SageMaker. You don’t, like, train a model and then deploy a model to make inferences for classifications or regressions. It’s more about learning about some virtual environment and how to navigate that environment in an optimal manner as you encounter different states within that environment. So the example I’m going to use here is an AI-driven Pacman. Hopefully you’re familiar with the old game Pacman. If you grew up in the 80s, I’m sure you did. But the idea is that you have some sort of an agent. In this case, the agent is Pacman, and he’s exploring some space.
In this case, that space is the game board of Pacman itself. And as it goes through this space, it learns the values, the rewards, associated with different state changes and different conditions. So, for example, if I turn left, what happens? I’m going to hit a wall. That’s not good. If I go right a little bit there, I might hit a power pill. That’s probably a good thing; there would be a reward associated with that. But if I go down, I’m going to run into a ghost and die. So that would be a very negative reward in that case. So it just learns, for a given position within this environment and a given set of things around me, what’s the best thing to do? And it just does this by randomly exploring the space over time and building up this model of how to navigate this thing most efficiently.
Once it’s been trained, once it’s explored this entire space and learned about it, it’s very quick for it to be deployed and actually run in practice, because it has a very fast lookup table of: okay, I’m in this spot, this is the state around me, here’s what I should do. So the online performance is very fast once you’ve actually gone through the work of exploring this space and training it. And although you do see this a lot in the world of games, and you hear a lot of press about AI winning different types of games using reinforcement learning, because it’s a fun example where you can pit a machine against a man and watch the machine win, it also has some more practical applications. For example, supply chain management, HVAC systems, industrial robotics, dialogue systems, and even autonomous vehicles. You can think of those as just an agent in a giant environment of the world, if you will. So that’s what reinforcement learning is all about.
Let’s dive into more of the mathematical notation around it. So a very specific implementation of reinforcement learning is called Q-learning. It just formalizes what we talked about a little bit more. So, again, we start with a set of environmental states. We’ll call that S, and possible states are the surrounding conditions of the agent. So is there a ghost next to me? Is there a power pill in front of me? Things like that. Those are states, and I have a set of possible actions that I can take in those states. We’ll call that set of actions A. And in the case of Pacman, those possible actions will be things like move up, down, left, or right. Finally, we’ll have a value for each state/action pair, and we’ll call that value Q. That’s why we call it Q-learning. So, for each state, for a given state of conditions surrounding Pacman, a given action will have a value Q. So moving up might have a given value of Q. Moving down would have a negative Q value if it means encountering a ghost. So we start off with a Q value of zero for every possible state that Pacman can be in.
And as Pacman explores the maze, as bad things happen to Pacman, we reduce the Q value for the state that Pacman was in at that time. So if Pacman ends up getting eaten by a ghost, we penalize whatever he did in that current state. And as good things happen to Pacman, as he eats a power pill or eats a ghost, we’ll increase the Q value for that action for the state that he was in. And then what we can do is use those Q values to inform Pacman’s future choices. And we’ve built a little intelligent agent that can perform optimally and make a perfect little Pacman. So, getting back to a real example here, let’s look at some states and actions.
We can define the current state of Pacman by the fact that he has a wall to the west, open space to the north and east, and a ghost to the south. And we can look at the actions he can take. He can’t actually move left at all, but he can move up, down, or right. And we can assign a value to all of those actions. So by going up or right, nothing really happens at all. There’s no power pill or dots to consume. But if he goes down, that’s definitely a negative value. So we can say, for the state given by the current conditions that Pacman is surrounded by here, moving down would be a really bad choice. There should be a negative Q value for that. Moving left just can’t be done at all; that would have basically an infinitely negative Q value. And moving up or right are just neutral, so the Q value would remain zero for those action choices for that given state. Now, you can also look ahead a little bit more to make an even more intelligent agent. So say I’m actually two steps away from getting a power pill here. If Pacman were to explore this state and then hit the case of eating that power pill in the next state, I could actually factor that into the Q value for the previous state.
And if you just have some sort of a discount factor, based on how far away you are in time, how many steps away you are, you can factor that all in together. So that’s actually a way of building a little bit of memory into the system. So the Q value that I experience when I consume that power pill might actually give a boost to the previous Q values that I encountered along the way. It’s a way of kind of propagating that value back in time and giving a little bit of a boost to the actions that led to this positive Q value later on.
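If you’d like to see that idea written down, here is a minimal, generic Q-learning update sketch in Python. The state and action representations are left abstract, and the learning rate and discount factor values are arbitrary, just to show how the discounted value of the next state feeds back into the previous one.

from collections import defaultdict

ALPHA = 0.1   # learning rate: how far each new experience moves the Q value
GAMMA = 0.9   # discount factor: how much future rewards count toward this state

# Q values start at zero for every (state, action) pair we haven't seen yet
Q = defaultdict(float)

def update_q(state, action, reward, next_state, possible_actions):
    # Standard Q-learning update: nudge Q(state, action) toward the observed
    # reward plus the discounted value of the best action from the next state.
    best_next = max(Q[(next_state, a)] for a in possible_actions)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

That GAMMA term is the discount factor doing the work just described: when eating the power pill produces a big reward, the states and actions that led there pick up a share of it on later passes through the maze.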
So one problem that we have in reinforcement learning is the exploration problem. How do I make sure that I efficiently cover all of the different states, and actions within those states, during the exploration phase, or the training phase, if you will? So a sort of naive approach is to always choose the action for a given state with the highest Q value that I’ve computed so far, and if there’s a tie, just choose one at random. So initially all of my Q values might be zero, and I’ll just pick actions at random at first. And as I start to gain information about better Q values for given actions in given states, I’ll start to use those as I go. But that ends up being pretty inefficient, and I can actually miss a lot of paths that way if I just tie myself into this rigid algorithm of always choosing the best Q value that I’ve computed so far. So a better way of doing exploration is to introduce a little random variation into my actions as I’m exploring. We call that an epsilon term. So we have some value where I roll the dice and I have a random number, and if it ends up being less than this epsilon value, I don’t actually follow the highest Q value. I don’t do the thing that makes sense.
Instead, I just take a path at random to try it out and see what happens. And that actually lets me explore a much wider range of possibilities. It just lets us cover a much wider range of actions and states than we could otherwise. So what we just did can be described in very fancy mathematical terms, but conceptually it’s still pretty simple. I explore some set of actions that I can take for a given set of states. I use that to inform the rewards associated with a given action for a given set of states. And after that exploration is done, I can use that information, those Q values, to intelligently navigate through an entirely new maze. But this can also be called a Markov decision process. So again, a lot of data science is just assigning fancy, intimidating names to simple concepts. And there’s a ton of that in the world of reinforcement learning.
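Before we get to that fancier terminology, here’s what that epsilon-greedy exploration trick might look like in code, building on the Q table from the earlier sketch; the epsilon value of 0.1 is just an arbitrary example.

import random

EPSILON = 0.1  # fraction of the time we explore at random instead of exploiting

def choose_action(state, possible_actions):
    # Roll the dice: occasionally ignore what we've learned and just try something
    if random.random() < EPSILON:
        return random.choice(possible_actions)
    # Otherwise pick the action with the highest Q value, breaking ties randomly
    best_q = max(Q[(state, a)] for a in possible_actions)
    best_actions = [a for a in possible_actions if Q[(state, a)] == best_q]
    return random.choice(best_actions)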
So if you look up the definition of Markov decision processes, it is a mathematical framework for modeling decision making. Decision making, like: what action do we take, given a set of possibilities for a given state, in situations where the outcomes are partly random? Well, that kind of sounds like the random exploration that we just talked about. And partly under the control of a decision maker? That decision maker is our Q values that we computed. So MDPs, Markov decision processes, are just a fancy way of describing the exploration algorithm that we just described for reinforcement learning. And the notation is even similar. States are still described as s, and s prime is the next state that we encounter. We have state transition functions that are just defined as P sub a for a given pair of states s and s prime. And we have our Q values, which are basically represented as a reward function. So the R sub a value for a given s and s prime is basically the same thing as our Q value. Moving from one state to another has a given reward associated with it, and moving from one state to another is defined by a state transition function. So again, we’re describing what we just did, just with a different mathematical notation and a fancier-sounding name: Markov decision processes. And if you want to sound even smarter, you can also call a Markov decision process by another name: a discrete time stochastic control process. Holy cow, that sounds intelligent. But the concept itself is the same simple thing we just described. So to recap, you can make an intelligent Pacman agent, or anything else, by just having it semi-randomly explore different choices of movement given different conditions, where those choices are actions and those conditions are states. We keep track of the reward or penalty associated with each action in each state as we go, and we can actually propagate those rewards and penalties backwards multiple steps if we want to make it even better.
And then we store those Q values that we ended up associating with each state, and we can use that to inform its future choices. So we can go into a whole new maze and have a really smart Pacman that can avoid the ghosts and eat them up pretty effectively, all on its own. It’s a pretty simple concept, but it’s very powerful. And you can also say that you understand a bunch of fancy terms now, because it’s all the same thing: Q-learning, reinforcement learning, Markov decision processes, dynamic programming, it’s all tied up in the same concept. So, I mean, I think it’s pretty cool that you can actually make a sort of artificially intelligent Pacman through such a simple technique. And it really does work. Let’s tie this back into SageMaker now.
So SageMaker offers an implementation of reinforcement learning that’s built on deep learning, and it uses TensorFlow and MXNet to do this. It also supports different toolkits. So when we talk about reinforcement learning in SageMaker, we have frameworks, those are TensorFlow or MXNet; we have toolkits, which include Intel Coach and Ray RLlib; and we have environments, and it supports a wide variety of environments. They can be custom ones, open source, or commercial ones: MATLAB and Simulink, EnergyPlus, Roboschool, PyBullet, Amazon Sumerian, and AWS RoboMaker are examples that they give in the documentation. The other cool thing about SageMaker’s reinforcement learning is that it can be distributed, so that exploration, that training stage, can be distributed amongst many machines. And you can also distribute the environment rollout as well. So you can deploy that trained model, where it learned all those Q values for different states and what actions to take, in a distributed manner as well.
It can do both multi-core and multi-instance, so you can take advantage of multiple cores on one PC and an entire fleet of multiple PCs as well. So to recap again, here are some of the key terms associated with reinforcement learning. The environment, that’s the layout of the board or the maze or whatever it is you’re working within. The state would be where the player or pieces are, like, where exactly is our agent right now? An action would be the things that agent can do, like moving in a given direction. And a reward is the value associated with the action from a given state. So we have a given state; what’s the reward associated with a given action from that state? Finally, observation, which would be the surroundings in a maze or the state of a chessboard, basically, what’s the state of the environment right now? All right, so talking about hyperparameters, it’s a little bit weird with reinforcement learning because, again, it’s not a traditional machine learning model. We’re not doing old-school train and test here. It’s a little bit different. So the parameters that you might want to optimize are probably going to be very specific to your particular implementation.
So reinforcement learning in SageMaker just allows you to abstract away whatever hyperparameters you want. Internally, there’s nothing built in, but if there are things that you want to expose to be tuned, you can do that, and then you can use SageMaker’s hyperparameter tuning capabilities to optimize them automatically. So there’s no set list of hyperparameters with reinforcement learning, but you can make your own if you want to. And again, due to the general nature of reinforcement learning, there’s not a lot of specific guidance for what instance types to use for it with SageMaker. But, you know, keep in mind, it is built on deep learning frameworks like TensorFlow and MXNet, so a GPU is probably going to be helpful. And we do know that it supports multiple instances and multiple cores, so you can have more than one machine, even if you’re going with CPUs.
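To give you a rough idea of what this looks like in practice, here is a hedged sketch of launching a SageMaker RL training job with the Coach toolkit on TensorFlow using the SageMaker Python SDK. The entry point script, source directory, role ARN, toolkit version, and the hyperparameter names passed through to that script are all placeholders you would swap for your own.

from sagemaker.rl import RLEstimator, RLFramework, RLToolkit

estimator = RLEstimator(
    entry_point="train_my_env.py",     # placeholder: your script defining the environment and agent
    source_dir="src",                  # placeholder: directory containing that script
    toolkit=RLToolkit.COACH,           # Intel Coach; Ray RLlib is the other supported toolkit
    toolkit_version="1.0.0",           # placeholder version string
    framework=RLFramework.TENSORFLOW,  # MXNet is also supported
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder role ARN
    instance_count=1,                  # can scale out, since training supports multiple instances
    instance_type="ml.p3.2xlarge",     # GPU instance, since it's deep learning under the hood
    hyperparameters={
        # Nothing is built in here: you expose whatever knobs your own script reads,
        # and SageMaker's hyperparameter tuning can then optimize them for you.
        "discount_factor": 0.9,
        "exploration_epsilon": 0.1,
    },
)

estimator.fit()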