Amazon AWS Certified Advanced Networking Specialty – Networking & AWS Primer Part 7
19. Lambda@Edge Practical Demo
Hey everyone and welcome back. In the earlier video we discussed Lambda@Edge, but mostly from a theoretical perspective. With theory alone it becomes difficult to see what exactly an origin request is or how it looks in practice. So what I will do is quickly give you a demo of how Lambda@Edge would look if you go ahead and configure it within your AWS environment.
So Lambda@Edge integrates with CloudFront. I have a sample CloudFront distribution over here, and I have a simple Lambda function; this is the function that we use at the edge. Now within this CloudFront distribution, if you go to Behaviors, there is currently one behavior. Let me quickly edit it. Here you have the viewer protocol policy and various other settings, but if you look down at the Lambda function associations, there is a CloudFront event.
Here you have various event types: viewer request, viewer response, origin request and origin response. Depending upon the use case your Lambda@Edge function serves, you will have to place it at one of these locations. In my case, this specific function is placed at the origin request. Now if you look into my Lambda function and scroll a bit down, this is the code which is used as the Lambda@Edge function.
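As a point of reference, this association is also visible in the CloudFront API. Here is a minimal Python sketch of what the behavior's association block might look like when fetched with boto3; the ARN, account ID and function name are placeholders, not values taken from the demo:

```python
# Sketch of a cache behavior's Lambda association as returned by
# cloudfront.get_distribution_config() (all identifiers are placeholders).
association = {
    'LambdaFunctionAssociations': {
        'Quantity': 1,
        'Items': [{
            # Lambda@Edge requires a published function version in us-east-1.
            'LambdaFunctionARN': 'arn:aws:lambda:us-east-1:123456789012:function:edge:1',
            # One of: viewer-request, viewer-response, origin-request, origin-response.
            'EventType': 'origin-request',
            'IncludeBody': False,
        }],
    },
}
```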
Now let me also show you the S3 bucket. If you look into the distribution's origin, it points to the website S3 bucket. Within the S3 console I do have a bucket called s7-website, and within it there is a directory called experiment-group which contains two image files. Now if I directly open up the CloudFront distribution's domain name, you see the image automatically loads.
However, this redirect is actually being done by the Lambda@Edge function. If that function were not present, you would have to specify the entire URI yourself: you would have to specify the experiment-group directory and then the object, something like control-pixel.jpg. But since we have the Lambda@Edge function listening at the origin request, we do not really have to specify this URI.
The function takes care of that. If you look into the function, it basically says that if the experiment URI is not present, it logs that the experiment cookie has not been found and throws a dice: using a random function, the experiment URI becomes either pathExperimentA or pathExperimentB.
If you go a bit up, pathExperimentA is the control image, experiment-group/control-pixel.jpg, and pathExperimentB is experiment-group/treatment-pixel.jpg. So whenever the page loads and the cookie is not found, one of these images is automatically chosen by the Lambda@Edge function and our image loads accordingly.
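Putting that logic together, here is a minimal Python sketch of what such an origin-request function could look like. The demo's actual code is not reproduced here, so the cookie name, the path variables and the 50/50 split are assumptions for illustration:

```python
import random

# Assumed paths mirroring the demo's control/treatment images.
PATH_EXPERIMENT_A = '/experiment-group/control-pixel.jpg'    # control
PATH_EXPERIMENT_B = '/experiment-group/treatment-pixel.jpg'  # treatment
COOKIE_NAME = 'X-Experiment-Uri'  # assumed cookie name

def handler(event, context):
    # Origin-request events carry the request under Records[0].cf.request.
    request = event['Records'][0]['cf']['request']

    experiment_uri = None
    for header in request['headers'].get('cookie', []):
        for cookie in header['value'].split(';'):
            name, _, value = cookie.strip().partition('=')
            if name == COOKIE_NAME:
                experiment_uri = value

    if experiment_uri is None:
        # This is the log line we later see in CloudWatch Logs.
        print('Experiment cookie has not been found. Throwing dice...')
        experiment_uri = PATH_EXPERIMENT_A if random.random() < 0.5 else PATH_EXPERIMENT_B
        print('Setting request uri to %s' % experiment_uri)

    # Rewrite the URI before CloudFront forwards the request to the S3 origin.
    request['uri'] = experiment_uri
    return request
```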
Now for this Lambda function, if you go into Actions you do have the option to deploy to Lambda@Edge, and here you can specify the distribution ID as well as the CloudFront event: origin request, origin response, viewer request or viewer response. All right, that is one thing. The second thing I wanted to show you, which comes up whenever you work with Lambda@Edge, is what happens when someone visits your page while the function is in place. Look at this specific log statement: currently you see it is outputting a console log.
That output lands in the region where the request came from, that is, where the edge location is. Currently I am in Mumbai, and Mumbai does have a CloudFront edge location, so the log associated with the query we just made will be stored in CloudWatch Logs in the Mumbai region. Let's look into that as well. Let me open up CloudWatch. Within the CloudWatch console I do have a log group named /aws/lambda/us-east-1.edge.
Here, edge is the Lambda function's name. If you look into the log, it states that the experiment cookie has not been found, that it is throwing the dice, and that it is automatically setting the request URI to experiment-group/control-pixel.jpg. So this URI has been set by the Lambda@Edge function; if you remember, we never set this specific URI ourselves.
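If you wanted to pull those edge logs programmatically rather than browsing the console, a boto3 sketch along these lines would work. Lambda@Edge log groups follow the /aws/lambda/us-east-1.<function-name> convention in the region of the edge location; the function name edge and the Mumbai region are taken from this demo:

```python
import boto3

# Lambda@Edge writes logs in the Region of the edge location that served
# the request; ap-south-1 is Mumbai, where this demo's request was handled.
logs = boto3.client('logs', region_name='ap-south-1')

response = logs.filter_log_events(
    logGroupName='/aws/lambda/us-east-1.edge',  # 'edge' is the function's name
    limit=10,
)
for event in response['events']:
    print(event['message'].strip())
```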
Remember, all we did was make a request to the CloudFront distribution, and that's about it; we never specified the URI. Once the request goes out, it reaches the Lambda function, which checks whether the URI is present. If it is not present, the function generates one of the URIs based on the random function, sets it on the request, and you get the output according to the URI set by the Lambda@Edge function. So this is a high-level overview of how exactly Lambda@Edge might look. I hope this video has been informative for you, and I look forward to seeing you in the next video.
20. Elastic MapReduce (EMR)
Hey everyone and welcome back to the Knowledge Pool video series. In today's lecture we will be speaking about Elastic MapReduce. This is quite an important service, specifically when it comes to the big data world. Before we understand more about what exactly EMR is all about, let's spend some time understanding what big data is. Big data is data that goes beyond generic storage capacity and generic processing power.
So what do I mean by this? From the term itself we can tell that big data is data which is going to be huge, and when it comes to huge amounts of data, traditional log monitoring or log processing tools are not good enough. Take, for example, the ELK stack (Elasticsearch, Logstash, Kibana): when you put terabytes of data into it and try to process it, it will not work as well. So there was a need to develop a new technology which would specifically solve the big data use case.
So what do I mean when I talk about a huge amount of data? Think of sensors, social networks, or even online shopping websites; these are some examples of big data sources. One concrete example is the Twitter feed: if you want to count how many tweets are coming in every second or every minute, the volume is huge. Typically that data will be in terabytes.
The technology which can process those terabytes of data is completely different. To see how it would work, let's take this simple example where you have large clickstream logging data, which is in terabytes.
How would you process it? You split the entire log file into multiple small pieces. On the left-hand side we have the large clickstream logging data; you split that data into multiple small chunks, process those chunks individually, and then aggregate the results to get your final answer. Let's assume that from all the Twitter feeds you want the activity of one specific user, and all you have is a huge log file, typically in terabytes. You split that log file into smaller chunks, create individual servers where each server is responsible for processing one part of the log file, and at the end all the servers aggregate their results and you come to know what that individual user did.

This is how big data solutions really work: data is broken down into smaller portions, each portion is handled by an individual set of nodes, and the results are then aggregated. I hope you understood how the technology related to big data is built, because this is essentially the way EMR, or Elastic MapReduce, works.

EMR is a hosted version of the Apache Hadoop clustering software. There is a really nice piece of software called Apache Hadoop; let me just show you. If you search for Apache Hadoop, you will see it is one of the big data software packages commonly used everywhere, and it has certain components, which include MapReduce and the HDFS file system.
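To make the split-process-aggregate idea concrete, here is a minimal single-machine Python sketch of the pattern; on a real EMR cluster, Hadoop's MapReduce framework runs the same map and reduce phases across many nodes instead of local worker processes:

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

def map_chunk(lines):
    """Map phase: count the words in one small chunk of the big file."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def word_count(lines, workers=4):
    # Split the input into smaller chunks, one per "node".
    chunks = [lines[i::workers] for i in range(workers)]
    # Process each chunk independently, in parallel.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(map_chunk, chunks)
    # Reduce phase: aggregate the partial results into the final answer.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == '__main__':
    sample = ['the quick brown fox', 'the lazy dog', 'the fox again']
    print(word_count(sample).most_common(3))
```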
Let's go back to our presentation so we can understand more about it. There are certain primary components of EMR, and since EMR is a hosted version of Apache Hadoop, most things remain similar as far as the concepts are concerned. There are three primary components of an EMR cluster that we have to remember: the master node, the core node and the task node. When I say EMR is a cluster, a cluster is basically a collection of EC2 instances which have the Hadoop software installed. In this diagram you see an Amazon EMR cluster; thanks to the AWS documentation, I was able to use their diagram instead of drawing the entire thing myself.
You have a master node, on the left-hand side you have a core node, and on the right-hand side you have the task node. The core node, as you can see, has the HDFS file system; this is where the data can be stored. The task nodes are specifically for completing or processing tasks: when you divide large clickstream data into smaller chunks, there needs to be some software or some servers which will process or analyze those small chunks of data, and those servers are called the task nodes. One important thing to remember is that core nodes can also perform tasks, and that behavior is optional; so core nodes can run tasks as well, and they are additionally assigned the HDFS file system where the data can be stored. These are the three components that we need to remember. Now, before we go ahead, let's do one thing: let's go to the console and select the EMR service. When you click on Create cluster, let me just show you, you first have to give the cluster a name.
If you go a bit down, you can select the EMR release version; there are a lot of versions available, and you can select one depending upon your requirement. Then comes the hardware configuration. The number of instances is user defined, but the master node is compulsory, just remember that, and only the core node count changes: if I move the instance count from three to five, you see what happens is that the master node stays at one and the core node count increases. There always has to be a master node present. Now let's go to the advanced configuration so that we can understand things in a much better way. Here you see I have one master node, one core node and one task node, and each of these nodes is responsible for certain operations.
As we already discussed, the master node is responsible for coordinating and distributing data, and it also tracks whether all the other nodes are healthy. The master node checks whether the core nodes and the task nodes are running properly; that is its responsibility, and whenever you create an EMR cluster, a master node should always be present. Second is the core node: as we discussed, a core node can also be used as a task node to perform processing on the log files, and the core node is where the HDFS file system resides. And last is the task node.
The task node is only and only responsible for task processing. Now, each of these nodes can be either On-Demand or Spot, depending upon what you need, and in the exam there might be certain questions related to these purchasing options. The master node is something you should never run as Spot, because if someone puts in a higher bid, your master node will be terminated. The core node, again, should ideally be On-Demand, because this is where your HDFS file system is present; if you are using its storage, core nodes should never be Spot. The task node is something you can put on Spot: let's assume I add five task nodes and select Spot instances for them; even if they get terminated, it is not a major issue for my cluster. This is something you need to remember.
Let's go back to the PowerPoint presentation. We already discussed that the master node is responsible for coordination and distribution of data: whenever data arrives, the master node distributes it to the core nodes as well as the task nodes, and it also keeps track of the status and overall health of the cluster. The core node runs both the DataNode and the TaskTracker daemons, which means it can store data on HDFS and also run tasks on the data that is given to the EMR cluster. Task nodes, on the other hand, run only the TaskTracker daemon and can only perform tasks. Now, at times a use case will be given to you in the exam where cost is one of the factors; in cases where cost is a factor, you can select task nodes as Spot instances instead of On-Demand.
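To tie the three node roles and the purchasing options together, here is a hedged boto3 sketch of creating such a cluster with On-Demand master and core groups and a Spot task group; the instance types and counts, the release label, the region and the log bucket are placeholders, not values from the lecture:

```python
import boto3

emr = boto3.client('emr', region_name='us-east-1')

response = emr.run_job_flow(
    Name='demo-cluster',
    ReleaseLabel='emr-5.36.0',        # pick the EMR release you need
    LogUri='s3://my-emr-logs/',       # placeholder log bucket
    Instances={
        'InstanceGroups': [
            # Master and core stay On-Demand: losing them kills the
            # cluster (master) or the HDFS data (core).
            {'Name': 'Master', 'InstanceRole': 'MASTER', 'Market': 'ON_DEMAND',
             'InstanceType': 'm5.xlarge', 'InstanceCount': 1},
            {'Name': 'Core', 'InstanceRole': 'CORE', 'Market': 'ON_DEMAND',
             'InstanceType': 'm5.xlarge', 'InstanceCount': 2},
            # Task nodes hold no HDFS data, so Spot interruptions are tolerable.
            {'Name': 'Task', 'InstanceRole': 'TASK', 'Market': 'SPOT',
             'InstanceType': 'm5.xlarge', 'InstanceCount': 5},
        ],
        'KeepJobFlowAliveWhenNoSteps': True,
    },
    JobFlowRole='EMR_EC2_DefaultRole',
    ServiceRole='EMR_DefaultRole',
)
print(response['JobFlowId'])
```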
So a certain kind of scenario will be given to you where you have to decide whether the task nodes should be On-Demand or Spot, and whether the master node should be On-Demand or Spot; now you know you cannot really have a master node on Spot, because that is too risky. As a sample EMR task, consider calculating the number of repeated words in a specific text document. Now, there are two ways in which you can put data within an EMR cluster.
One of the easiest and most ideal ways is through an S3 bucket. What you can do is put whatever log files you have into an S3 bucket and fetch the data from that bucket; once the processing is done within the EMR cluster, you put the analyzed output into a destination S3 bucket. This approach is much more flexible, because once the processing is done you can delete the entire EMR cluster. Otherwise, if you store the results only on HDFS, you cannot really delete the EMR cluster, because the data is still present there. So fetching the data from an S3 bucket and storing the results back to an S3 bucket is one of the ideal solutions.
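As a rough sketch of that S3-in, S3-out flow, you could submit a Hadoop streaming step to the cluster like the one below; the cluster ID and bucket names are placeholders, and the classic cat/wc mapper and reducer pair merely stands in for a real word-count job:

```python
import boto3

emr = boto3.client('emr', region_name='us-east-1')

# Read raw input from one S3 bucket, write aggregated output to another;
# once the step finishes, the cluster itself can be terminated.
emr.add_job_flow_steps(
    JobFlowId='j-XXXXXXXXXXXXX',      # cluster ID returned by run_job_flow
    Steps=[{
        'Name': 'word-count',
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': [
                'hadoop-streaming',
                '-input', 's3://my-input-bucket/logs/',
                '-output', 's3://my-output-bucket/results/',
                '-mapper', '/bin/cat',      # placeholder mapper
                '-reducer', '/usr/bin/wc',  # placeholder reducer
            ],
        },
    }],
)
```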
21. Overview of Hybrid DNS Architectures
Hey everyone and welcome back. In today's video we will be discussing hybrid DNS architectures on AWS. Before we go ahead and discuss them in detail, let's understand one problem statement: Route 53 does not respond to queries which do not originate from within the VPC. This specific statement really changes the way you architect hybrid DNS within your organization. It can be understood with the diagram below, where on the left-hand side you have on-premises and on the right-hand side you have AWS; this is a typical hybrid setup. You have a virtual private gateway on the AWS side, and the on-premises side can connect to your AWS environment either through Direct Connect or through a VPN. Now, if the client wants to resolve a domain such as cloud.acme.com, the setup might seem very simple, because the client can ask the on-premises DNS server.
The on-premises DNS server could then forward the request to Route 53 over either Direct Connect or the VPN gateway. It might seem pretty straightforward, but it is not, and the reason is that Route 53 will not respond to queries which do not originate from within the VPC. So even though there is a virtual private gateway, you need to have something inside the VPC, such as an EC2 instance or something similar, through which the queries can be sent to Route 53. To get around this kind of problem, what organizations began to do, and this was pretty common, is to run a custom DNS server. This custom DNS server can be an EC2 instance, so let's say you put an EC2 instance inside the VPC.
So here is what happens: on the AWS side you create one more EC2 instance, and the on-premises DNS server, through Direct Connect or through the VPN, sends the request to this DNS server, the EC2 instance inside the VPC. Since this EC2 instance is inside the VPC, it can forward the request to Route 53. So let's say the on-premises DNS server sends a request asking for the A record associated with ec2.aws.internal, and it sends it to this DNS server.
The DNS server forwards it to Route 53, Route 53 responds back to the custom DNS server, and the custom DNS server sends the reply back to your on-premises DNS server. This is one of the pretty common architectures that you will see within organizations. So let me quickly show you how exactly this might look. I'm in my EC2 console, and you can see there are two instances: one is my EC1 instance and the second is the DNS server. Along with that, let me quickly show you the Route 53 console: there are two private hosted zones here, one is local and the second is kplabs.internal. The kplabs.internal private hosted zone has certain test entries available. Now, coming back to the hybrid DNS architecture, we already discussed that over a Direct Connect or VPN setup alone you will not be able to reach Route 53, so you always need some kind of proxy server; in this case, this DNS server instance is that proxy. Let me quickly show you how exactly this works.
Currently, for demo purposes, I have put this specific DNS server in a public subnet. Let me copy the IP address of this EC2 instance and run an nslookup on ab.kplabs.internal against the IP address of the instance which is acting as the DNS server. And as you can see, it gave back a response with the A record associated with ab.kplabs.internal, which is 173.198; you can quickly verify that this is the A record it sent back. Now, in a similar way to how we sent the query to the public IP address here, if we have a similar setup with an EC2 instance inside the VPC, then the on-premises DNS server, either through Direct Connect or through the VPN gateway, will send the query to this specific DNS server. The DNS server in turn will forward the request to Route 53, get the reply back, and send it to the on-premises DNS server.
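If you wanted to script the same check instead of running nslookup by hand, a small dnspython sketch like this would do it; the forwarder IP here is a placeholder, and the record name assumes the ab.kplabs.internal entry shown in the demo:

```python
import dns.resolver  # pip install dnspython

# Query the custom DNS forwarder directly, mirroring the nslookup demo.
resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ['203.0.113.10']  # placeholder: the forwarder's IP

answer = resolver.resolve('ab.kplabs.internal', 'A')
for record in answer:
    print(record.address)
```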
Now, do remember that this specific EC2 instance must actually run some kind of DNS server software. Let me quickly show you; I do have one. I'll quickly log into the EC2 instance, and if I check the service status, you see I have BIND, a domain name server, up and running. And if you quickly run netstat -lp, you will see the named process listening on port 53.

Along with that, let me show you what I did for this setup: within my BIND configuration file I have an ACL called trusted, and within trusted I have put my IP address so that my queries will be resolved. If you go with the BIND option, you should also put the IP addresses or the netmask associated with your on-premises network here; let's say your on-premises network has a CIDR of 172.16.0.0/16, then you should put that in the ACL list so that names would be resolvable from on-premises.

Now, the problem associated with this type of architecture is high availability. Let's say you have one EC2 instance over here and it goes down; then your name resolution will stop working. So either you run two EC2 instances and configure BIND on both of them, or, as AWS also suggests, you make use of Simple AD. Simple AD is highly available, and you can use it instead of configuring BIND on an EC2 instance; that Simple AD can act as the forwarding DNS server. So just remember that in this type of architecture you can either have a custom DNS server or make use of Simple AD.
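On the client side, one hedged way to take advantage of two BIND forwarders is simply to list both of them as resolvers, so a query that times out against the first instance is retried against the second; both IPs below are placeholders:

```python
import dns.resolver  # pip install dnspython

resolver = dns.resolver.Resolver(configure=False)
# Two forwarder instances (placeholder IPs); if the first times out,
# dnspython retries the query against the second within `lifetime`.
resolver.nameservers = ['203.0.113.10', '203.0.113.11']
resolver.timeout, resolver.lifetime = 2.0, 5.0

print(resolver.resolve('ab.kplabs.internal', 'A')[0].address)
```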