ISACA COBIT 5 – Measure (BOK V) Part 9
27. Descriptive statistics – Part 3(BOK V.D.3)
So earlier we talked about central tendency in descriptive statistics. Now look at the measurement of variability. So in measurement of variability we have three approaches here. One is range. To find a range, to find interquartile range and to find standard deviation. So let’s look at each of these one by one, starting with range on the next slide. Range as a measurement of variation is the easiest one. And how do you calculate range? You calculate range by looking at the highest value in the data and the lowest value in the data and finding out the difference between these two.
So for example, if we have a data which is given here on the slide as an example which is 6910, 1111, 14, what’s the range? So you look at the highest value here, which is 14, and you look at the lowest value which is six, and the difference between these two is eight. Here again the problem is this is affected by the outliers or the extreme values. So if you get extreme value, so for example, instead of 14, if you get 140, this is going to change the range big way. So this value is affected by extreme values or the outliers. So if there is a way to remove top few values or the bottom few values, that will be a good indicator and that will not be affected by extreme values and that is achieved by interquartile range. So what we do in interquartile range is we leave away the bottom 25, we leave away the top 25 and the middle 50% only we look at that and find the range of that.
So how do we do that? Let’s see that on the next slide. So here we are talking about interquartile range. So interquartile range, as I earlier said, we’ll leave the bottom 25, we’ll leave the top 25 and we just focus on the middle 50% of the data. So interquartile range is Q three minus Q one. And as you remember earlier when we talked about the percentile. So here is our data. The central value was the median or the Q two and we had Q one which is at 25% level and at 75 we have Q three. So we leave this data, forget about this, forget about upper extreme and you just look at the range of the middle 50% of the data and which is Q three minus Q one. Just take a simple example here. Example here is 6910, 1111, 14, these six numbers and we have already done how to calculate Q one, Q two and Q three of these. So Q two is the middle value here the average of ten and eleven, q one was nine and Q three was eleven. So this we have already done that.
So interquartile range will be Q three minus Q one, q three is eleven, Q one is nine. So interquartile range comes out to be two. Now this is not affected by extreme values because instead of 14, if we had 140, that also wouldn’t have changed the interquartile range. So when we talk of interquartile range we need to think about a graphical tool which is box and whisker plot and box and whisker plot will be something like this, a box here and the central value here and something like this. So we will be talking about this box and whisker plot later on. But just to tell you that this is q two, this is q one and this is q three. So that plot uses these quartile values q one, q two and q three to plot the graph. So we will be looking at that later on in this course. So after talking about range and interquartile range now let’s look at standard deviation as a measurement of variability.
This is the most important concept in six sigma because when you say six sigma the term sigma is standard deviation. So I hope that you are aware of this concept, but if not take a little bit of time to understand this. So in range what we did was we took the top value and the bottom value and we found the difference to find the variability, how much variation is in the data. But what we did in that case was we just took two numbers.
We were not really bothered about what was in between the top and the bottom value. So in range we just look at two values, we didn’t look at values in between that. What we do here in standard deviation is that each value in the data is considered when we are calculating standard deviation. And as we have earlier talked when we were talking about sampling, that there are two notations, two way you can represent standard division as s or as sigma. Sigma is when you are talking of population, standard division. So this is sigma is for population and s is for sample. We have similar formula for both but there is slight difference when you calculate the standard division for sample and when you calculate standard division for population. So here on this slide I have two formulas. The first one is for sample.
So this one is for sample and the second one is for population. One big difference which you will see here is n minus one and n. So when you are taking standard deviation of population you take n, the total number of data in that and when you are calculating the standard deviation for sample you use n minus one. We will not go into the reason of that, but this is what you need to remember. So even in your calculator also you will see that sigma n minus one and sigma n. So if you have a sample and you are calculating the standard division of the sample, use sigma n minus one. If it is for population then use sigma n. And how do we calculate here and let’s look at this formula which is X minus X bar square divided by let’s say N minus one for square root of this. So this is S which is standard division. For sample S square is the variance. If you take the square of standard division, that is variance. So, S here would be X minus X bar square divided by N minus one.
Let me put a line here to differentiate. So this is S square and S square is the variance. So two things you need to understand one is variance and one is standard deviation. What you have is X minus X bar. So what you do is you calculate the average of all the values, which is your X bar. And then you subtract that average from each individual value. Take square of that and then divide by N minus one. And that’s what I have written here, that average of squared division. About the arithmetic mean, if it still confuses you, don’t worry about that. Let’s look at a simple example how to calculate standard division. And that calculation we are doing manually with hand. Most of the time you will be doing this calculation either using a calculator, a scientific calculator or statistical calculator, or using Excel or any other tool. But to understand the concept, let’s do a hand calculation on the next slide to find out. Standard deviation. So, here is an example of calculating standard deviation.
Standard deviation for a sample. So, in this sample, we have these six values which are 100 100 and 199 100 and 298 and 100. These are six samples which we have drawn from population. So the first thing which we do is find the average of this. So average of this will be 100 plus 101 plus 99 plus 102 plus 98 plus 100 and divide this by six. This comes out to be 100. This was a simple case. Now what we do is since the average is 100. We subtract every item here all the six items from 100. So 100 -100 is equal to zero 101 -100 is equal to plus 199 -100 is equal to minus one and so on so this is how we get x minus X bar here, which are zero one minus one, two minus two and zero. Now the next thing you want to do is take the square of each of these. So square of zero is zero square of one is one minus one is also plus one the square of plus two is four and minus two is also four and the zero square is zero. So once you have done that, then you add all these X minus X bar square and which comes out to be ten. Going to the formula of S square, s square is variance. So variance is equal to X minus X bar square and sum of that which is ten here. So ten divided by five and y five because five is N minus one.
N here is six. So six minus one is equal to five. So this gives us s square is equal to two. So variance in this data is in this sample is two. And once you take square root of that, so this gives you the standard deviation. Standard deviation is square root, two is equal to 1. 4014. So that’s how you calculate standard deviation for a sample. Same thing you could have done for population also. But in population the difference would have been instead of n minus one, you would have used capital n here. The formula here would have been sigma x minus instead of x bar you would have used Mu, because that’s how we represent the central value and that’s how we represent the mean for the population that’s represented by Mu. So this would have been the formula for population. Similarly, that’s what we have done for a sample here. So this completes our discussion on descriptive statistics. We talked about finding the central tendency, mean, mode, median and percentile. And in that we looked at quartile as well. And then when we came to measurement of variability, we talked about range, we talked about interquartile range and we talked about standard deviation. And we also did manual calculation for finding out the standard deviation.
28. Descriptive statistics – Using Microsoft Excel and SigmaXL (BOK V.D.3)
Hey, welcome back. In this video we will be looking at how to find out descriptive statistics using Microsoft Excel. Later on we will be looking at how how to find out descriptive statistics using Sigma Excel software. So, what I have here is a sheet which is the customer data which I have taken from SigmaXL. So this sheet is coming from Sigma XL software. And here I need to find out the descriptive statistics related to average number of orders per month which is in column number D. So this is what I want to find out. And when I say descriptive statistics, I mean I need to find out what’s the average, what’s the standard division, what’s the lowest value, what’s the highest value? All that information I need to find out for column number D and which is starting from row number two to row number 10 one. So there are 100 items here. How do I find out using Excel without using any other software? Let’s do that. So to do that, I go to Data. I click on Data and if you see right here on the right side, I don’t have an analysis pack here. Data Analysis pack is missing from here.
Let’s add that because if you need to do any statistical analysis using Microsoft Excel, you need to activate that which is the data analysis tool. Let’s do that. So for that I go to File, I go to Options and in Option I go to Add Ins and in Add in here is the one which I need to add in which is Analysis Data Pack. Once I add this analysis data pack I will be able to do a lot of statistical analysis. So to do that, I click on Manage Excel addins and I press on Go. And here is the dialog box. And in this I click on Analysis tool pack and press OK. So now what you have seen here is you have Data analysis here on the right side of the data ribbon. This is what will help you in doing lot of statistical analysis. So let’s do that. So let’s click on data analysis. And now what I need to do is I need to find out the descriptive statistics. So let’s choose that. Click on Descriptive Statistics, press OK.
And now it’s looking for input range. What all data I need to find out descriptive statistics for. So my input data I select here and I select my input data as column number D. And what do I need here? I need to find out summary statistics and that’s the only thing I need to find out. And where would the output will be. So let’s put that output on new worksheet ply and the new and the description of that would be descriptive stats. So it will create a new sheet here and put all the descriptive statistics on that. And another thing is that levels in first row. Yes, I have labels on the first row, because my first row is the level of that data, which is here, which says that average number of orders per month. So let’s click on that. And with this, I press. OK. Here. So here I get my descriptive statistics for column number D. So let me make it a little bit bigger here. And this as well.
If you look at the count here, it tells that there were 100 items there. Some of that what was the minimum value, what was the maximum value? It tells you that, it tells you the range as well. And it tells you the mean mode and median and standard division. So mean is here mode, median and standard division. Square of standard division is variance. So it gives me the sample variance as well. So all the descriptive statistics I could get with few simple clicks. So that’s how you calculate descriptive statistics using Microsoft Excel. So previously we looked at how to find out descriptive statistics using Microsoft Excel. Now let’s do the same thing using Sigma Excel software. So now what I have is I have opened SigmaXL. So you can see Sigma Excel here. So I click on that. So this is my Sigma Excel sheet. Now I need to open that database. So for that I go to help and then I go to sample data. I press yes, I want to continue and I open the sample data, which is the customer data. So that’s the data I need to open. So I double click on that, yes, I want to open that.
So here is the data. This was exactly the same data which we used earlier to demonstrate how to find out descriptive statistics using Excel. Microsoft Excel. Now let’s do the same thing using SigmaXL. And the column D was the one for which earlier we found out the descriptive statistics. Let’s do the same thing using SigmaXL. And for that I go to Sigma XL, I go to I go to Statistical Tools. And in Statistical Tool there is descriptive statistics. So I click on descriptive statistics. Now this is looking for data. Select the data. What I will do is I will use the entire data table here and I press next. And I know I need to find out descriptive statistics for column D, which is the average number of orders. So here is the average number of orders. I add that as a numeric.
Data variables. I don’t need to do any grouping here without that grouping. Let me press OK here, and that gives me the descriptive statistics in the next tab. And here I have the count as 100 mean standard deviation range. Earlier, a few things which you couldn’t find using Microsoft Axel you have here, which is 25% quartile, which was Q one, and 75% quartile, which is Q three. Those things you find here. And in addition to that you have things like Anderson Darling Normality Test and which gives here the p value of zero point 210, point 21, which is greater than 0. 5. So let me put it here. So this is this is greater than 0. 05. And once this is greater than 0. 05, with the 95% confidence, you can say that this particular data is normally distributed data also, so that also you could find out straight away in the descriptive statistics. So this completes our discussion on how to find out descriptive statistics using Sigma Excel.
29. Graphical Methods – 1 Box and Whisker Plots(BOK V.D.4)
So after talking about descriptive statistics, now let’s look at some of graphical methods. So here on this slide I have a list of some of the commonly used charts or diagrams. Let’s look at that. How do we construct these? How do we interpret these? So things which will be covered in this section are box and whisker plot, scatter diagrams, histograms normal probability plots, frequency distribution and cumulative frequency distributions. So let’s start this discussion with box and whisker plots. Box and whisker plots are also known as box plots. So earlier when we talked about descriptive statistics, we talked about how to calculate median, we talked about how to calculate q one and q three, which are the first and the third quartile. And we also talked about that the median is q two, which is the second quartile. And the difference between q one and q three is interquartile range. So all these things we have already talked in descriptive statistics. So what we do in box and whisker plot is basically draw these things q one the median and q three and the extreme limits. Let’s see how do we do that? So suppose we have a list of data, let’s put it here.
So there are few numbers here. So as you know, to find out the median we have to arrange these numbers in ascending or descending order. So let’s say we put all these numbers in ascending order, then the middle value is your median, median or q two. So this divides q two divides your data into two halves, first half and second half. Now if these halfs we further divide into another half. This gives us q one and q three. This is something which we have already talked about that. Now if I want to draw a box and whisker plot, the only thing I need to do is put a box between q one and q three. So this is q one, this is q three. So this is my q one, this is my q three and here is my median which is q two. So let me put this as median and then the data which is on left and right extreme is shown as a whisker. So here we have two whiskers for the box and whisker plot. So this is how you draw your box and whisker plot. Now in this also there are two variations. Sometimes you will see that these lower and upper limits are shown whatever is as per the data. So whatever is the lowest is shown here as the lowest one and whatever is the highest value is shown as the highest value.
But there is another trend where you limit these whisker sizes. So you don’t want to have a whisker. Something like let’s say you have an example where you have a median here, q one, q three and you have lowest point is here. But the highest point is let’s say there was an abnormal high value in this group of data. So your Whisker becomes very long. So what is done here is number of times the length of Whisker is limited to 1. 5 times the interquartile range and your interquartile range is this. This is your interquartile range and you limit the size of Whisker to 1. 5 times interquartile range. So here you put the whisker. But what about this data which is beyond this? This is shown as outlier and this is what you see here in this sample box and whisker plot. You see this as a outlier because the Whisker at the top is limited to 1. 5 times the interquartile range and your interquartile range is this range and anything beyond that is shown as outliers.
So let’s take an example and construct a box and whisker plot using Sigma XL software. So for that, what I’m going to do is I will open Sigma Excel, then I will open Customer Data which is the sample file attached with the program SigmaXL and then I will be drawing box and Whisker for average number of orders per month. And this is what this box and whisker plot is. So let me open my Sigma XL and I will show you how do you make box and whisker plot using Sigma Excel. So let’s do that. So here I have my Sigma Excel open and now let me open my Customer Data file. So for that I go to help and I go to sample data. And in sample data, I look for the customer data file. So I open this. So this opens my customer data file. Now this is the column column number D for which I want to plot box and whisker plot or also known as box plot. So for that I go to Sigma Excel Tab. So here I go to graphical tools and I go to box plots. Click on that and select the entire table next and the column D which is the average number of orders per month. This is what I want to draw the box and whisker plot for.
So I select that as a variable Y. I don’t want to do the grouping here and with this let me press OK, so this gives me the box and whisker plot and this you have already seen on a slide where we talked about how to draw box and whisker plot. At the top we have Q three. So we have a Q three. At the top we have a Q one which is first quartile at the bottom and this is your Q two or the median. These are your Whiskers upper and lower and this is your outlier which was beyond 1. 5 inter quartile range. So this is how you draw box and whisker plot. So let me go back to my previous dialogue. I recall my dialog box here. Let’s say I want to do grouping, grouping by the size of customer and let’s say the size of customers I group by size of customer the average number of orders per month and my guess here is if the size of customer is bigger, the number of orders will be bigger per month. So let’s see whether that is the case or not. So with this I press OK, so now I get box and whisker plots by different customer sizes. So here I have a small size customer and large size customers.
So, as you can very apparently see from this, that large size customers have average number of orders per month, which are much more higher than the smaller customer, small size customers. And you can see some outliers here in small customer you can see these two outliers, which show that there are some customers, small customers with a very low number of average orders per month. And here for the large size customer also there are some customers with a very high number of orders per month. And here, if you see at the bottom, the Whisker size is very small. So there is no problem with that. The only concern becomes when your Whisker size is high too long then you make it only 1. 5 times interquartile range and this is what has been done here and balance one data has been shown.
30. Graphical Methods – 2 Scatter Plots (BOK V.D.4)
So after talking about box and Whisker plot, let’s talk about the next diagram which is Scatter diagram. Scatter diagram is one of seven basic quality tools and this is used to show the relationship between two variables, how two variables are linked to each other. The first thing when you are doing this, make sure that that relationship makes some practical sense. For example, let’s say we want to find out the relationship between the temperature, outside temperature and the ice cream sale. So generally we would think that as temperature goes up, as heat goes up, weather becomes hot, there is more sale of ice cream but we want to see the relationship between that.
So what we have done is we have collected data related to temperature on a particular day and the number of ice cream sales. And based on that, if you draw a plot between these two and that will look like something like this which will be on an x axis, let’s put temperature and on y axis let’s put the ice cream sale. And if you draw the point for each day, let’s say day number one, there was a specific temperature and specific number of ice cream sale. So you draw a point. So this is sale ice cream sale and you keep on drawing these points based on the data which you have collected. Now, if you look at this, this will show visually that there is a relationship between the temperature and sale. If you can draw a line which is joining, roughly joining all these points, then you will see that this line is rising up. So this shows that as temperature goes up, the ice cream sale also goes up.
We will be talking about this and correlation regression when we come to analyze phase of the project and probably this is all you needed to know at this time that Scatter diagram is to find out the relationship between two variables. One important thing which you need to note here is that sometimes this relationship could be caused by a third variable. So when you are looking at two variables, let’s say if you are looking at two variables and those are the ice cream sale and the heat stroke cases. So in hospital they note down how many heat stroke cases came and they noted down the ice cream sale on that day. So probably they will find a relationship something like this which shows that sale versus heat stroke cases. So there is a relationship between that. But thing to note down here is that there is no direct relationship between the heat stroke and the ice cream sale. This relationship is because of the third thing which is the weather or the temperature.
So temperature, as temperature goes high that causes heat stroke, as temperature goes high that causes sale of the ice cream. So the relationship is actually between temperature and the sale of ice cream and between the temperature and the heat stroke. There is no direct relationship. There is no practical relationship between this. So when you use Scatter Diagram make sure that this particular relationship makes some practical sense. Let’s leave it here and let’s talk about this topic once we reach to correlation and regression in analyze phase of the process.
Interesting posts
The Growing Demand for IT Certifications in the Fintech Industry
The fintech industry is experiencing an unprecedented boom, driven by the relentless pace of technological innovation and the increasing integration of financial services with digital platforms. As the lines between finance and technology blur, the need for highly skilled professionals who can navigate both worlds is greater than ever. One of the most effective ways… Read More »
CompTIA Security+ vs. CEH: Entry-Level Cybersecurity Certifications Compared
In today’s digital world, cybersecurity is no longer just a technical concern; it’s a critical business priority. With cyber threats evolving rapidly, organizations of all sizes are seeking skilled professionals to protect their digital assets. For those looking to break into the cybersecurity field, earning a certification is a great way to validate your skills… Read More »
The Evolving Role of ITIL: What’s New in ITIL 4 Managing Professional Transition Exam?
If you’ve been in the IT service management (ITSM) world for a while, you’ve probably heard of ITIL – the framework that’s been guiding IT professionals in delivering high-quality services for decades. The Information Technology Infrastructure Library (ITIL) has evolved significantly over the years, and its latest iteration, ITIL 4, marks a substantial shift in… Read More »
SASE and Zero Trust: How New Security Architectures are Shaping Cisco’s CyberOps Certification
As cybersecurity threats become increasingly sophisticated and pervasive, traditional security models are proving inadequate for today’s complex digital environments. To address these challenges, modern security frameworks such as SASE (Secure Access Service Edge) and Zero Trust are revolutionizing how organizations protect their networks and data. Recognizing the shift towards these advanced security architectures, Cisco has… Read More »
CompTIA’s CASP+ (CAS-004) Gets Tougher: What’s New in Advanced Security Practitioner Certification?
The cybersecurity landscape is constantly evolving, and with it, the certifications that validate the expertise of security professionals must adapt to address new challenges and technologies. CompTIA’s CASP+ (CompTIA Advanced Security Practitioner) certification has long been a hallmark of advanced knowledge in cybersecurity, distinguishing those who are capable of designing, implementing, and managing enterprise-level security… Read More »
Azure DevOps Engineer Expert Certification: What’s Changed in the New AZ-400 Exam Blueprint?
The cloud landscape is evolving at a breakneck pace, and with it, the certifications that validate an IT professional’s skills. One such certification is the Microsoft Certified: DevOps Engineer Expert, which is validated through the AZ-400 exam. This exam has undergone significant changes to reflect the latest trends, tools, and methodologies in the DevOps world.… Read More »