1z0-821 Oracle Solaris 11 System Administration – System Processes and Tasks Part 3
7. Performance Issues
Now, continuing our discussion on monitoring performance, we have to learn how to deal with system performance issues. System performance issues are almost always attributable to hardware or to the way the system interacts with the hardware. Obviously, you can have hardware that doesn't work because it's broken, you can have a bad NIC, or you can have a thrashing drive that's not working correctly and is about to fail. You can also have poorly written software that interacts badly with the hardware and takes up too many hardware resources: it uses too much RAM, has memory leaks, takes up too much CPU time, or doesn't allocate its threads and processes properly.
You can have programs, applications, or processes that have started too many threads or too many child processes and have to be killed. You can also have insufficient hardware. Sometimes you're loading a lot onto a machine that's great hardware for a desktop but not so great for a server. So you may have a hardware issue simply in terms of not having enough of it.
Now, most of the hardware issues that we see are attributable to four things: the CPU, RAM, the network card, and the disks. Those four things can cause poor performance when you overload them, and you may need to look at adding more of those resources to the system. Now, how do we discover these performance issues? Well, one important indicator is when the system stops functioning the way we think it ought to.
Unfortunately, that's the way we see it happen most of the time. We're running a lot of apps, we're running some important things, and the system just isn't doing what it's supposed to do. The proactive way is to monitor the system, and that was the subject of our last session: monitoring performance so we can see how the system does. We can put the system under normal load and baseline it to see how it behaves under normal conditions. And if we see that it's not performing well, then we start looking at how to troubleshoot these performance issues.
Now, during our last session, we also looked at some of the monitoring tools that we can use, and there are additional ones out there that we didn't mention, things like the logs and the messages utilities and so forth. We did look at some of the command-line utilities for performance monitoring, and we didn't even scratch the surface of how many utilities and tools there are to monitor performance on the system. Now, what are some of the things we typically need to do to solve performance issues and performance problems?
Well, one of the big things you can do is upgrade your hardware. If you're using a laptop, it's probably difficult to do that, but if you've got a full desktop workstation, that might be easy: things like your RAM, your disk drive, and your network card are straightforward to swap, whereas to replace the CPU you'd probably have to replace the motherboard, so that might get a little expensive. If you're using a server-class system, then you definitely want to buy the biggest bang for your buck that you can afford and get good hardware the first time.
Because you're not going to buy low-end, cheap hardware and then try to run an enterprise on it; that just won't work. The other thing you can do is replace faulty hardware, and you can tell when you have faulty hardware because things just won't work right. Your disk may thrash too much, or maybe your network card will chatter or its throughput will be very low. So you may look at replacing some of that faulty hardware. Typically if RAM goes bad, you'll know it because of beep codes when the system boots up, or the operating system may not even load, because Solaris 11 actually requires a fair amount of memory right off the bat, typically at least 1.5 GB of RAM. So if a RAM stick goes bad, you'll typically know it because the OS won't load.
Now the CPU: if it goes bad, typically you won't even get a computer that boots. So if any of those things happen, you'll typically know it, and you'll be able to replace the faulty hardware in most cases, although in the case of the CPU you might wind up replacing the motherboard as well. Now, for software issues, you can kill processes, or at least change their priorities so they're not consuming resources like memory and CPU at a high priority. You can also update applications. Sometimes there are issues with applications that cause them not to perform well, and updating them with a patch or a fix may solve the problem, so you may look at that if a particular application is causing issues. The best way to tackle this, though, is to monitor your resource usage and look at what your system is using, and I'll take another look at the Performance Monitor for you in a moment.
And you can use that as a baseline to watch what your system is doing, and when we start seeing issues, that may be when it's time to start troubleshooting, upgrading, or replacing hardware. Let's take a look at that monitor really quickly. I've already got System Monitor open, and you can see, as we've seen before, the CPU history, the memory and swap history, and the network history. Those are the three graphs that really reflect the four items I mentioned: the CPU, the memory and disk, and the network. If you see consistently high usage on the CPU history, typically above 40% on a sustained basis, then there are things you have to look at.
If your CPU is consistently above 40% usage, you might want to look at upgrading or replacing it, and sometimes that's difficult to do without replacing the motherboard. Memory is easy to add, and if you see your swap usage increase, you might want to look at adding memory so that the swap usage comes back down. With the network, if you see a lot of high utilization on the network card, or the usage is high but erratic, you might need a new network card, or a faster one if it's an older card. So monitoring your system is probably the best way to head off performance issues for those four big hardware items we spoke of.
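If you'd rather watch those same four resources from the command line instead of the System Monitor GUI, here's a minimal sketch using standard Solaris 11 utilities; the 5-second interval is just an example value.

  vmstat 5              # virtual memory, paging, and CPU summary every 5 seconds
  iostat -xn 5          # extended per-device disk statistics (watch for a thrashing drive)
  prstat -s cpu -n 10   # top 10 processes sorted by CPU usage
  df -h                 # disk space usage per file system

Sustained high numbers in any of these outputs point you back at the same four suspects: CPU, RAM, the network card, and the disks.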
8. Monitor System Logs
Now, in addition to the other tools we've talked about that help you monitor performance, troubleshoot performance issues, manage processes, and troubleshoot process issues, we have another tool that's extremely valuable to us, and it may be the most valuable tool of all: the system logs. The system logs give you a lot of in-depth information, probably more information than you could possibly use, and sometimes they can be a little unwieldy. But they will tell you almost every single thing that has ever happened on your system. Performance monitoring tools may tell you what's going on at the moment, but everything may be fine at the moment; you may need to know what happened two days ago. That's where you have to go back and look at the logs. And Unix and Linux in general have always been extremely good about logging just about everything.
Some examples you'll see, and these are by no means exhaustive, are installation logs, service logs, and security logs. You can have all kinds of logs for a system, probably even down to the component level. And again, there's no end to the logs you can have on Unix and Linux, whereas you only have two or three logs on a Windows system. So that's great, but it also can make those logs difficult to manage. Now, there are several services that help manage the logging facilities on Linux and Unix; two in particular are syslogd and auditd. syslogd pretty much takes care of just about every system log you can think of: application, process, install, everything. There have been different variations of the syslogd facility, but generically, that's what it does for Unix and Linux. There's also auditd, which isn't always turned on by default on a Linux or Unix system; it typically audits security events. Those are just two of the services that help manage logs, and there are a few more. Logs are stored in various locations, and unfortunately there's sometimes no rhyme or reason to where they are. Most of the logs you need to look at, however, on most variations of Unix and Linux, and Solaris 11 is no exception, will be in a subdirectory somewhere under /var.
That's typically where you're going to find all of your system logs. Now again, viewing these logs can be unwieldy. They're all text files, but even when you cat them into less or more, there can be miles and miles of information to sift through. There are several utilities you can use to monitor the system logs, to look at them and view them, and there are both GUI tools and command-line interfaces. We've used a lot of command-line tools in this course, but I think the System Log Viewer that's in the GUI is actually excellent, and I'd probably take a look at it, even as a seasoned Unix user, before I'd look at the command-line interface.
It's that good, so I'll show it to you in just a second. Logs can also be monitored in real time; you don't have to just go back and look at what happened two weeks ago. If you're monitoring a log, you can use something like the tail command, for example. If you're an old Unix user, you know what that is: it shows you the last few lines of a log file, typically the newest lines, and it can do so in real time. But you can also use the GUI and get a lot of use out of it for monitoring logs. Before we actually take a look at the GUI, I want to show you where the logs are typically stored, and we'll take a look at a couple of them really quickly. We go to /var, and you can see there are all kinds of things there; it depends on what you're really looking for.
For example, the service logs are in the svc directory. So let's look there, and we'll do another ls, and we see a log directory, and if we take a peek in there we see a ton of logs. Now you can do a cat; in fact, I did one in the past few minutes, and I can show you what one looks like. Here I cat a system log, the system-zones default log. If you want to take a look at that, there are tons and tons of data, and it's actually not too hard to understand, but there is a lot of it. We can also go to the Applications menu on the top line and go to System Tools, and right below Performance Monitor we'll see the System Log Viewer. Some people still prefer to look at logs at the command line, and that's fine; there's obviously more flexibility at the command line than there is in a GUI. But if you look in the System Log Viewer, it actually consolidates all the logs in the system, at least probably 99% of them. And you can look at the different logs that are here, and you can add logs as well; you're not just limited to these logs, by the way.
You can add logs for other parts of the system, and you can just thumb through here and look at the logs that you see, and you'll see the dates, services starting and stopping, the reason why, and so forth. So if you're in the mindset of troubleshooting, or just going back in time and seeing what happened on your system, you can actually do that. This will let you go back by date: you can click over here and just see what happened. You can scroll around, there are different logs you can open if they're not open by default, and you can copy them, export them, and so forth. So it's actually a very good utility to help you view logs. But again, some people like the old command-line utilities, and that's okay; there are plenty of them out there. If you get the chance, just go into the system and take a look at the System Log Viewer, and maybe even some of the command-line utilities that are out there for looking at the logs. Obviously, catting them is probably the most efficient way to do it at the command line. So that's a quick look at viewing the system logs and managing them.
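As a rough command-line sketch of what we just did in the GUI demo, assuming the default log locations on Solaris 11 (the zones service log name below is just the example file we catted a moment ago):

  ls /var/adm                                       # classic system logs, including the messages file
  ls /var/svc/log                                   # SMF service logs
  tail -f /var/adm/messages                         # follow the main system log in real time
  tail -20 /var/svc/log/system-zones:default.log    # last 20 lines of one service log

The tail -f form is the real-time monitoring trick mentioned above: it keeps printing new lines as they are appended to the log.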
9. Core and Dump Files
Despite all of our efforts to monitor the system and its performance and troubleshoot issues, occasionally things happen. Sometimes the system actually crashes. What will happen is the system will crash, it will throw an error message on the screen, and it will dump the contents of its physical memory to a dump file. In the Windows world we call this the blue screen of death, and we see it happen infrequently these days, but it used to happen quite a bit. Well, Unix and Linux are not immune to these types of things; we do see that happen on those systems as well. And on Solaris 11 there are a few things that happen that you need to be aware of, for the exam and for real life. Now, system crashes typically happen because of hardware failures or malfunctions, software errors, and the occasional input/output problem. Again, when the crash happens, the system will throw an error message, dump the contents of physical memory to a file, and then reboot. Once the system reboots, it will try to boot back up, and when it does, a command called savecore is executed that retrieves the dump data files.
And these dump data files are typically not in human-readable form at that point. savecore automatically puts them in the savecore directory, and once they're in there, you can take a look at them later. They're written to a file called vmdump.N, where N is a number, so it could be a different number for each crash. Then you would use the mdb command, /usr/bin/mdb, to view the crash dump files. You would want to review them so that you can figure out exactly what happened and what error caused the crash, and possibly that might lead you to a way to fix it. So you should always configure your system to save crash dump files. You configure your system to do this using the dumpadm command, and we'll take a look at that command in a moment. One important thing to note is that you must be root, or in the root role, to run the dumpadm command, to configure and manage crash dump files, and to view crash dump files. So if nothing else, you've got to be able to get back to root to do that. Now, a second type of file is a core file. A core dump can happen when an application, not the system itself, terminates abnormally: it stops functioning, the application crashes, not the whole system. When it does that, it dumps its core, and these core files are saved application memory images created when the app or program terminates abnormally. Now, there's a command called coreadm that manages core dump files, and just like the dumpadm command, coreadm helps you configure the location where the dump is saved and various other parameters. So we'll go ahead and take a look at dumpadm and coreadm right now. Let's go to a command shell, and in order to work with dumpadm and coreadm, you have to be root, so let's change to root really quickly. And the first thing we're going to do is go ahead and bring up dumpadm.
Now, if you bring up the help for dumpadm, you get all of the different options you can use with it, and configuring this can be a little tricky. But if you just type dumpadm by itself at the command line, you'll see what it's currently set for: it will dump kernel pages, that's the dump device it's using, /var/crash is the savecore directory, savecore is enabled so it will automatically recover the dump files on a reboot, and it compresses the files when they dump, because they can be quite large if your physical memory is large. If you've got 4 GB of memory, you may have a 4 GB crash dump file. Typically you would not need to play with dumpadm much unless you really want to change some of its parameters. The next thing we'll look at is the coreadm command, which manages core dumps from applications. We've got several options we can use here as well, so let's just show you what coreadm is configured to use by default. So those are defaults that you can set.
Some things are enabled, some things are disabled, and some things are default. You may want to change some of these settings if you've got a specific reason to, but other than that, there may not be anything you really need to configure here, or with the dumpadm command for that matter. So we've just basically touched on dumpadm, coreadm, and the core and dump files. This is a bit of an advanced topic. You will see a little bit of it on the exam, since it is one of the objectives, but you're probably not going to see much, because again, it's an advanced topic. It would be a good idea for you to play around a little with the commands and maybe read up on them in the Oracle documentation, because it can be a little complex to set the different options for core dumps and crash dump files. I'm just giving you a surface introduction to these things here, with a rough sketch of the commands below. Bye.
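For reference, here is a hedged sketch of the crash dump and core dump commands touched on above. You'd run these as root, and the dump number 0 and the /var/cores path are example values only, not required settings.

  dumpadm                                  # show the current crash dump configuration
  dumpadm -s /var/crash                    # example: set the savecore directory
  dumpadm -y                               # make sure savecore runs automatically on reboot
  savecore -vf /var/crash/vmdump.0         # uncompress a saved dump into unix.0 and vmcore.0
  mdb unix.0 vmcore.0                      # examine the dump; try ::status, ::msgbuf, ::stack at the mdb prompt
  coreadm                                  # show the current core file configuration
  coreadm -g /var/cores/core.%f.%p -e global   # example: global core file pattern, enable global core dumps

In the coreadm pattern, %f expands to the executable name and %p to the process ID, which makes it easier to tell which application dumped its core.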
10. For the Exam
Well, we've covered a lot of information for this particular topic on managing system processes, scheduling tasks, and so forth. Let's briefly go over what we covered and talk about what you may need to know for the exam. We first talked about getting system information, because it's important to gather information on the hardware, software, OS, and various other parts of your system before trying to monitor its performance, manage its processes, and so forth. So we looked at things like uname for getting the OS information, getting the package information, and looking at the system hardware by running the prtconf command. We also looked at how to get the services configuration and the disk usage with df. Then we talked about managing system processes, and we looked at the ps command and some of its related commands. We learned how to do a ps -ef to list the processes and find a process ID.
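As a quick, hedged refresher, these are the sorts of commands used for gathering that system information; the output details will vary by system.

  uname -a          # OS name, release, and hardware platform
  prtconf | head    # system configuration, including installed memory
  pkg list | head   # installed package information from IPS
  svcs | head       # service states from SMF
  df -h             # disk space usage per file system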
Then we learned to send that output to grep to find a specific process or a specific word we were looking for, and we also looked at how to do it with pgrep. We also looked at how to get other information about processes so that we could see what's running on the system; that helps us determine what our resource usage is, and that helps us troubleshoot. Now, when we troubleshoot process issues, we talked about some of the things that can happen: systems locking up, processes using too much RAM or too much CPU time, possibly multiple instances of a process running, and so forth. We talked about how to troubleshoot those things by looking at the information ps gives us. We also looked at changing the priority of a process with the nice command. And finally we looked at how to kill a process if it's necessary to do so, to completely stop a process so that it releases its system resources and stops running.
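Here's a minimal sketch of those process-management commands; the process name sshd, the PID 1234, and the long_job.sh script are hypothetical examples, not values from this system.

  ps -ef | grep sshd        # find a process and its PID by name
  pgrep -l sshd             # the same idea in one step: prints the PID and name
  nice -n 10 ./long_job.sh  # start a (hypothetical) job at a lower priority
  kill 1234                 # politely terminate the process with PID 1234
  kill -9 1234              # forcibly kill it if it won't stop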
Then we looked at scheduling system administration tasks, and there were two commands: crontab and the at command. We talked about how the at command only runs a job once; it schedules a one-time job and doesn't work on a daily or periodic basis or anything like that. And we talked a little bit about how to configure at, including the at.deny file that explicitly keeps people from running at. We also looked at crontab and some of its files. We know that we can use a crontab to schedule multiple jobs at many different, recurring times if necessary. Basically, we run crontab by doing a crontab -e with the username, because crontabs are stored under the username of the user who created them, and they can be long, complex files that you'd write in a text editor like vi. And we looked at the cron.allow and cron.deny files as ways to restrict crontab so that only certain people can use it, or certain people cannot. Then we looked at monitoring performance.
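A quick sketch of those scheduling commands follows; the backup.sh script path and the times are hypothetical, used only for illustration, and the access-control files live under /etc/cron.d on Solaris 11.

  crontab -e                       # edit your own crontab in your default editor (often vi)
  crontab -l                       # list your current crontab entries
  # example crontab line: run a hypothetical backup script at 2:30 a.m. every day
  # 30 2 * * * /usr/local/bin/backup.sh
  echo "/usr/local/bin/backup.sh" | at now + 1 hour   # run the same script once, an hour from now
  atq                              # list pending at jobs
  ls /etc/cron.d                   # look for cron.allow, cron.deny, and at.deny, which control access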
We looked at a couple of different commands we could use to do so. We looked at disk usage and the Performance Monitor. We looked at tools like vmstat to monitor virtual memory, df to monitor disk space usage, and so forth. So we looked at different ways we could monitor performance so that we could establish a baseline and determine when something goes wrong: what's out of the norm, and when the system is performing outside its normal operating parameters. Then we talked about performance issues that can happen and how performance issues are almost always attributable to hardware. There are four big pieces of hardware that typically cause performance issues: CPUs, RAM, network cards, and disk drives.
And any one of those four can cause issues, bottlenecks, and performance problems, and can even cause unintended system crashes. We looked at how to fix some of those problems by upgrading hardware, repairing or replacing faulty hardware, and possibly patching applications that run poorly on the hardware, and so forth. We also looked at monitoring the system logs. Solaris 11 has a great GUI system log viewer that we looked at, but there are command-line utilities as well. We talked about how most of the logs are in /var or a subdirectory of /var, depending on what the logs are, like the service logs or the install logs or whatever. And we actually looked at those; you can view them with the cat command because they're simple text files, but it's much easier to look at them using the system log viewer in the GUI tools. And finally, we looked at core and dump files. We talked about what a crash dump file is versus a core file: a crash dump file is from the system itself when it crashes and reboots, while a core dump file is from an application that terminates abnormally.
And we looked at the commands we can use to configure crash dumps and core dumps, dumpadm and coreadm, and how you need to be root in order to run and manage them. We talked about the savecore facility that runs when a system crashes and reboots: savecore takes that physical memory crash dump and saves it to the savecore directory, and this can be configured with the dumpadm command. So we've talked about many different things during this discussion on system monitoring and performance, monitoring logs, core and dump files, and scheduling tasks and processes, a lot of system-maintenance-type discussions, and they all fit together. Some of the things we included are not in the OCA objectives, but they needed to be there anyway. So you've got a lot of information here to digest. Look it over, and you will see some of these things on the exam.