The Standard Deviants present the high-stakes world of statistics, starring Chaz Mastin, Alyssa Rosen, Kalila Asrads, and Michael LaForte. Whoa, oops. Sorry. Hi, I'm Chaz Mastin. And welcome to the high-stakes world of basic statistics. Over the next two hours, we'll do our best to supplement and enhance your learning of the principles and concepts of statistics, including statistical problems and data sets, probability and distributions, sampling, and much, much, much more. Realize that we move very fast, so you'll have to pay attention. But keep in mind that we're on video, so you can pause the tape, stop the tape, or rewind the tape any time you want. If you miss something or need to go over something again, don't freak out. It's not like class. You can just rewind and watch us again. Also, because we'll be covering a lot of statistics, we've included a video outline at every large break in our three-part program, detailing exactly what's in each of these sections.

Well, we're on our way. But before we move on, here's a word from a concerned member of our community. Hey, although the Deviants have done a wonderful job putting this very thorough statistics supplement together, you must take a few things into account. To truly learn statistics, you must attend your classes, read your books, and most importantly, listen to your professors. So all I'm saying is, just go to class. I get no kicks from statistics. Those little wacky numeric values always get me down. Down, down, down, down. Yeah, I'm sorry. It gets me down. Yeah.

Part one, statistical problems and data sets.

Section A, what is statistics?

Statistics is a lot like cooking in a microwave. And stat problems are a lot like a microwave pizza or microwave popcorn, whatever you like. Just like when you need to nuke your food in the microwave, when you need to solve a stat problem, you just follow the directions and plug in the numbers. In order to do the right plugging, you'll need logical thinking and a good understanding of the basics of statistics. Hopefully, we can all add, multiply, and divide, right? Well, that's a great start. You don't have to have stat phobia. We'll supply you with the right directions and explanations, and then you'll be on your way. Yeah, by the end of this video, instead of hearing this when we say statistics, you'll hear this.

There are a couple of ways for us to look at the word statistics. First, let's look at statistics in the big picture. In this way, statistics refers to the overall science of extracting information from a group of numerical data and using the information you found to make inferences about that larger group of data, which leads us to our second way of looking at statistics. Now, this broad science of statistics is built from the individual numeric values that you use in the first place to investigate the data. These values are statistics themselves. It's a classic case of the parts making up the whole. So our angle is to make you comfortable with the most basic individual statistics so that you can put them all together and build an understanding of the bigger picture. Our video is a progression from the most basic statistic itself to far more elaborate ways of solving statistical equations and tests. Everything we use in the beginning of the video we use in the end, just with a couple of different directions. But if you build with us as we move along, you'll eat them alive.

Section B, statistical problems.

This science of statistics is ultimately concerned with examining statistical problems.
And a statistical problem exists when there's something unknown about your population. A population is exactly that, a population. But to be a little more formal, it's a set of values representing all of the measurements that you're interested in.

Meet Five-Card Charlie. Now, Charlie is a notorious entrepreneur who has a reputation for products that flop miserably. But he presses on for the one cutting-edge product that will allow him to open his own little boat casino. Well, Charlie's latest product is a soft drink called Fuzzy Dice Cola. And he wants to know whether or not it will meet with the same fate as many of his previous products. If Charlie wants to know how much of a success his cola is, he's faced with a statistical problem. You see, he wants to know the percent of the population that enjoys the taste of his cola. What he needs to do is make inferences using data he collects from those who have tasted his new product, Fuzzy Dice Cola. Oftentimes, the statistician, or Charlie in this case, is faced with a population that's virtually immeasurable due to extreme size. The solution is often to take a sample from the entire population and use this sample data to make inferences about the population. We'll go into sampling later on in the video. But for now, let's deal with Five-Card Charlie. To collect the data, Charlie has to perform experiments on his population of Fuzzy Dice Cola drinkers. Or he may use a sample. A sample is a part or subset of the population. In this case, a subset of the population of drinkers.

In order to solve statistical problems like Charlie's, there are certain things that must be established. These are known as the elements of a statistical problem. Here's the first element. You have to know exactly what you're trying to find out. In other words, you have to know the question you want answered. What we mean by this is you must identify the question itself and the population for which it exists. Secondly, you need to know the method you'll use to get information from your sample. This means you have to establish the design of your experiment, or how you'll choose the sample from the population. Third, you must determine the way you'll analyze the data you collect. As we'll learn, there are a lot of ways to analyze your data. Fourth, after you analyze your data, you need to follow a procedure to make specific inferences, predictions, and decisions concerning your statistical problem. There are often varied methods of interpreting data, and we'll describe them as we move along. The fifth and last step is obvious. You want to make sure that the work you've performed is correct, or at least heading in the right direction. So as you'll see later on in our presentation, you'll need to set up certain measures of reliability to assess the accuracy of the inferences and conclusions you've made about the population you're working with.

These five elements are the basic steps to solving a stat problem. You need to be clear on these steps so you're not totally lost as we work through them during the video. So let's go over them again quickly. One, identify the question being asked and identify the population. Two, determine the design of the experiment or the sampling procedure. Three, establish the method you'll use to collect and analyze your data. Four, choose a procedure for making inferences about the population based on the data. And finally, five, find a measure of reliability for whatever inferences you've made. OK, great job. We're on our way.
Bolo, cali, bolo. Baby needs a Vienna sausage. Eleven, winner! Woo! Bring it on, baby. Bring it down here. All right. You're working with that.

Section C, data sets.

Before going any further into the intricacies of statistical problems themselves, you need to know more about how to present the data collected from the sample or population. You should understand right now that over and over, we'll be looking at presentations of collected data, or what we call distributions of data. As we move through the material, our ability to work with them and analyze them will increase dramatically. So let's keep on trucking. After statisticians collect all their data, they're left with a set of numbers representing some sort of information from the sample or population. These are called our data sets. Here's an example of a data set. Suppose that these numbers represent the grade point averages collected from 12 statistics students. We'll use this data set as we continue to explain different ways of presenting data.

The first way to present the data from your data set is to do so graphically. And the type of graphical method we'll use now is called a relative frequency histogram. These are really easy to construct, so we'll do it quickly. Due to the graphic nature of the relative frequency histogram, this graphic has been rated E for explanation. Parental guidance is suggested. A relative frequency histogram divides the data you've collected into subintervals or classes. This method provides us with a way each measurement, or each individual value in the data set, can be classified. All it means for a measurement to be classified is that it falls into one and only one class based on its value. This way, it's placed specifically where it belongs. Normally, when statisticians use this type of graphic representation of data, they use between 5 and 20 classes, depending on how many measurements the data set contains. Basically, a histogram sorts out the data into orderly classifications, which oftentimes simply make the best sense. So now we know that after the measurements of a data set are collected, they're categorized according to class and represented graphically. It's important that you understand the histogram now because of its relationship to the distribution curve. The distribution curve is what statisticians use most often when representing data.

There we go. That's the strangest thing. One day, I was just walking through the woods, just like a little boy does. And I was looking at fossils and little beaver dams and stuff and sticks and jugs and floats through the rivers. And all of a sudden, I got this very strange feeling that came around me. And I was, oh boy, here it goes again. Da da da da da da da da. Stat boy, here it goes, it's happening. Da da da da da da da da. Stat boy. Da da da da da da da da. Stat boy, stat boy, stat boy.

Let's show you how to build a histogram for yourself. Of course, there are a few rules you'll have to follow when constructing a relative frequency histogram. And here's Alyssa to tell you what they are. First, the number of classes will, again, depend on the total size of your data set. If there aren't enough classes, then important characteristics of the data may not be accurately represented by the histogram. If there are too many classes, empty classes may result. And the presentation of your data, or your distribution of data, won't be as useful.
To be honest, the particular number of classes is fairly subjective, so sometimes you'll just have to wing it. Second, you need to determine the width of your classes. Now, when determining what the class width will be, the difference between the largest and the smallest measurement should normally be divided by the number of classes you want. After you divide these numbers, you can round the quotient up to the most convenient figure if it's helpful. We'll do an example in just a second. All classes are usually of equal width, allowing for equal comparison of your measurements. In some cases, though, classes of equal width are not suitable to accurately represent the distribution of the data. For example, when summarizing income data, some income brackets may require a larger range than others. You know, $25,000 to $50,000, $50,000 to $125,000. These are obviously not equal. They do this so that each bracket, or class width, has a similar number of measurements in it, say for taxing or other reasons. Of course, this income data example is a very practical application of this material in the real world. When we do the GPA example that we started, we'll use a more textbook method of determining the class width.

Here's rule number three. When locating the classes, the lowest class must at least contain the smallest measurement. The remaining classes are determined by adding the class width we just discussed to the upper boundary of the previous class. The boundaries are often set so that no measurement can fall on, or be equal to, any class boundary. It's possible, just so you know, to classify your data in a less strict, textbook fashion by simply establishing broader, reasonable intervals. In the case that a measurement falls on a boundary, you can simply make the decision to include that measurement in the upper class. It's almost like rounding. In keeping, though, with our more traditional procedure to get the basics down pat, we'll make sure that none of our measurements fall on a class boundary.

In our grade point average example, the GPAs range from 2.0 to 3.2. This means that the total span is 1.2. And according to our calculation, if the number of classes were 6, each class would span about 0.20. However, as we said before, this choice is a fairly subjective one, with the standard usually between 5 and 20 classes. Because we don't want any empty classes, or classes that are too full, or measurements that fall on the class boundaries themselves, 7 classes is a more convenient choice because it ensures that all of these conditions will be fulfilled. In order to ensure a greater degree of accuracy, we'll use an additional significant digit to define the classes. We'll begin the first class at 1.95. Once again, it's important that the classes are divided so that each measurement falls into only one of the classes. Now, we can set up our intervals according to the rules we've just established. The first class contains GPAs between 1.95 and 2.15, the second class between 2.15 and 2.35, the third between 2.35 and 2.55, and so on. These are the classes we'll use as intervals in order to present our data. Now that we have our classes and boundaries determined, we can categorize the measurements in the data set. The first GPA measurement of 2.0 falls into the first class, the second measurement of 2.4 falls into the third class, and so on and so on. In this graphic, we see the tallies of the different GPAs and the classes in which they fall.
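To make that tallying step concrete, here's a minimal sketch in Python. The video's 12 GPAs appear only on screen, so the list below is hypothetical, chosen just to be consistent with the summary figures quoted later (a mean of about 2.66, a median of 2.7, and modes of 2.7 and 2.8); the class rules are the ones we just set up.

```python
# Hypothetical stand-in for the video's on-screen data set of 12 GPAs.
gpas = [2.0, 2.4, 2.7, 2.7, 2.8, 2.8, 3.2, 2.5, 2.6, 2.9, 2.3, 3.0]

num_classes = 7
width = 0.20
start = 1.95  # the extra significant digit keeps GPAs off the boundaries

# Class boundaries: 1.95, 2.15, 2.35, ..., 3.35.
boundaries = [round(start + i * width, 2) for i in range(num_classes + 1)]

# Tally each measurement into the one class it belongs to.
tallies = [0] * num_classes
for gpa in gpas:
    for i in range(num_classes):
        if boundaries[i] < gpa < boundaries[i + 1]:
            tallies[i] += 1
            break

for i, count in enumerate(tallies):
    print(f"{boundaries[i]:.2f}-{boundaries[i + 1]:.2f}: {'|' * count}")
```

Dividing each tally by 12 gives the relative frequencies, which is exactly where we're headed next.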
We also show the class frequencies, or how frequently the measurements fall into each class, and the class relative frequencies, or the proportions of the measurements falling into each class. This simple arrangement of data set information is what we use to form our histogram. To create this graphic representation, rectangles are constructed over each interval or class. Because the class widths in our example are equal, the height of each rectangle in the histogram is the relative frequency of that specific class. Thus, the relative frequency histogram for our set of GPAs looks like this. Our graph also shows how the measurements in the data set are distributed along the horizontal axis of this typical statistical graph setup. Don't forget again that if you need to go back and look at any example, or if you're just a little bit confused, you can very easily press Rewind on your VCR. Also, realize that a lot of what we're covering right now is laying the groundwork for the rest of our material. So make sure you have these basics covered.

Let's go back to our original data set of the 12 GPAs for a second and then move right along. Suppose we wrote each GPA measurement on a slip of paper and put it in a hat. The chance that we would pull out a slip of paper with a GPA in a specific class is equal to the relative frequency of that class. To put it another way, the relative frequency of the third class, for example, is 2 divided by 12, or about 17%. Therefore, there's a 17% chance that, of the 12 slips of paper in the hat, we would choose one with a GPA between 2.35 and 2.55. With this quick example, we started to touch upon the ever-so-popular subject of probability. Now, we just wanted to wet your whistle a little bit to show you how all this material is so intricately intertwined. But before we go full force into probability, we need to go over some other material to cover all of our bases.

Now, it's time for another graphical method of presenting the data set of measurements. This new method is something called a stem-and-leaf display. OK, class, is everyone ready to show their stem-and-leaf displays? Alyssa? Wonderful. Chucky? That's nice. Michael? Well. The stem-and-leaf display is somewhat similar to the relative frequency histogram in that it serves as a picture-like representation of your data. But unlike the relative frequency histogram, the stem-and-leaf display allows you to retain the actual observed values or measurements of the data set. You don't create any classes or change the numbers at all. Let's use this data set in order to construct a stem-and-leaf display. This table of values shows the top 40 colleges and universities by percentage of student population expelled for various reasons over a four-year period. OK, when you're creating a stem-and-leaf display, each measurement of the data set is divided into two parts, the stem and the leaf. In our current example, we can separate each value at the decimal point so that whatever comes before the decimal point is the stem and whatever comes after the decimal point is the leaf. This is easy stuff, right? Here's a graphic to help you visualize it. For example, the first value of 7.98 is separated so that 7 is the stem and 98 is the leaf. There may be several ways to break up the values, but the easiest way to break up the values in our example is to use the decimal point.
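Here's a minimal sketch of that splitting-and-listing idea in Python. Only the first value, 7.98, is quoted from the video's table of 40 schools, so the rest of this short list is hypothetical.

```python
from collections import defaultdict

# Hypothetical expulsion percentages; only 7.98 comes from the video.
values = [7.98, 6.21, 7.03, 5.44, 6.75, 5.12, 8.30, 6.48]

stems = defaultdict(list)
for v in values:
    stem, leaf = f"{v:.2f}".split(".")  # split each value at the decimal point
    stems[int(stem)].append(leaf)

for stem in sorted(stems):  # stems vertically, in ascending order
    print(f"{stem} | {' '.join(sorted(stems[stem]))}")  # leaves left to right
print("Key: 7 | 98 means 7.98")
```

Each printed row is a stem with its leaves lined up beside it, and the key at the bottom decodes the display.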
For other data sets, it's your choice to break up the stems and leaves at any significant digit in the measurement, whichever best displays the distribution of the data set. A stem-and-leaf display is constructed by vertically listing the stems of the data set in ascending numerical order from top to bottom, then horizontally listing each corresponding leaf in ascending numerical order from left to right, placed in a line to the right of its stem. The stems are like classes, and their leaves show a picture-like representation of the data measurements. This picture is defined by the number of leaves per stem. Also, it's practical to include a key which shows how the measurements were broken up. The result of our stem-and-leaf display of the top 40 colleges to boot students looks like this. The key at the bottom indicates that we've separated the measurements at their decimal points by providing an example. The purpose of the key is to allow you to reconstruct the data set by just looking at the stem-and-leaf display. With this graph, it's easy to see the representation of the data set in this form. Great job, you crazy statisticians.

Here's a quick recap on how to construct a stem-and-leaf display. First, decide how to break up the measurements into stems and leaves. Next, list the stems in a vertical column in ascending numerical order. Then, list the leaf for each measurement in ascending order horizontally in a row next to the corresponding stem. Finally, provide a key in order to decode the stem-and-leaf display if needed. Remember, we're showing you the simple ways we start our stat problems, so let's move on.

The different display methods for our data we've just gone over obviously use graphics to do the job. However, the methods we're going to explore right now don't rely on any sort of visual representation of the data. We're now going into numerical methods, which rely on numeric values, rather than pictures, to describe samples and populations.

Oh, jeez, listen, it's obvious that I'm a statistics genius. We got that established, but I've also got some more good stat humor. Oh, boy.

These numerically descriptive measures are given names that really make a lot of sense. Now, throughout your investigation into the broad body of data, if you get a number that's calculated from a population, it's called a parameter. And if you get a number that's calculated from a sample, it's called a statistic. Everybody together, please. Statistic. Great job. Ha ha ha. OK, here's a joke for you, all right? A statistic walks into a bar. The bartender says, hey, hey, sorry. We don't serve statistics here. The statistic says, hey, I was counting on that. Thank you, thank you, my mother wrote that one.

In terms of statistics, the first type of numeric representation is measures of central tendency. These measures describe and locate the center of the distribution of data that you're working with. A common measure of central tendency is called the arithmetic mean, or just plain and simple, the mean. Now, the mean is nothing more than the arithmetic average of the set. Let's keep it simple. Here's the definition. The mean of a set of measurements is equal to the sum of all the values in the set divided by the number of values in the set. It's really easy, so don't make anything more out of it than it is. It's an average. There's a distinction, however, in the symbols we use to identify the mean for a sample and the mean for an entire population.
The mean of a sample is identified as x bar, and it looks just like this. On the other hand, the mean of an entire population is identified as the Greek lowercase letter mu. Mu. So how does a statistician display the method for calculating the mean? Take a look. The procedure for calculating the mean is represented by a formula, as are most statistics. Here is the statistical formula for the arithmetic mean of a data distribution. The large E-shaped character is the Greek uppercase letter sigma, which indicates that you should take the sum of a set of values. We'll be using this throughout the video, so learn it now. Sigma equals summation. The lowercase i equals 1 indicates that you begin the summation with the first value of the set. If the formula were to say i equals 2, then we would begin the summation with the second value of the set. The lowercase n indicates the number of values in the set. X sub i represents the values to add up. Then, to find the average of the data set, or the mean, divide the summation by the number of values in the set. In symbols, x bar equals the sum of the x sub i, for i from 1 to n, divided by n. Remember, the summation begins with the first value of the set and ends with the nth value. Let's go back to the example of the 12 stat students and their GPAs. We'll calculate the mean of that set of data. If we add up the 12 GPAs and divide that total by 12, we get a mean of about 2.66. In this case, n is equal to 12 because there are 12 GPAs in the data set, and the average of those 12 is about 2.66.

Talk to me. Oh, hi, Mom. Yeah, Mom, how you doing? Good. Well, you know, how'd I do? I don't know, a little below average. All right.

After the mean, the next measure of central tendency is known as the median. The median is identified as the lowercase m and is the measurement that falls in the middle position when the data is ranked in order from smallest to largest. Remember, median equals middle. Take a look here. Here are five salami sandwiches in a row, from the largest sandwich, a big sandwich, to the smallest sandwich, a very small sandwich. Now, I know these aren't numbers, but this is just to illustrate a point. Now, the median of these salami sandwiches is this sandwich, because it's the middle sandwich, ranked third. Here are a couple of good rules to follow. If the total number of measurements in a set is odd, then the median is the measurement at rank position n plus 1 over 2. If the total number of measurements in a set is even, then the median is halfway between the two middle measurements, or between the measurements ranked n over 2 and n over 2 plus 1. For our GPA example, the median would be between the sixth and seventh values of the set, both of which happen to be 2.7. Therefore, the median of this data set is 2.7.

In addition to these two measures of central tendency, the mean and the median, there's another very common statistic called the mode. The mode is a really simple statistic which determines the measurement that occurs most often in a set of data. As you can see here from our original data set, there are two measurements which occur most often. They're the measurements 2.7 and 2.8. So our data set is a bit different in that it has two modes, or, again, two specific measurements that occur more often than the rest. Okay. If you're given any data set, you should be able to figure out the mean, median, and mode right away. Now on to other types of numerical representations. Another type of numerical representation of your measurements is a measure of variability, or dispersion, of the data set.
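Before we get to those measures of variability, here's a minimal sketch of all three measures of central tendency, reusing the hypothetical GPA list from the histogram sketch above.

```python
from collections import Counter

# The same hypothetical stand-in for the video's 12 GPAs.
gpas = [2.0, 2.4, 2.7, 2.7, 2.8, 2.8, 3.2, 2.5, 2.6, 2.9, 2.3, 3.0]

# Mean: the sum of the values divided by the number of values.
mean = sum(gpas) / len(gpas)

# Median: the middle value once the data are ranked; with an even number
# of measurements, halfway between the two middle values.
ranked = sorted(gpas)
n = len(ranked)
if n % 2 == 1:
    median = ranked[(n + 1) // 2 - 1]  # rank position (n + 1) / 2
else:
    median = (ranked[n // 2 - 1] + ranked[n // 2]) / 2

# Mode(s): the value(s) that occur most often.
counts = Counter(gpas)
top = max(counts.values())
modes = sorted(v for v, c in counts.items() if c == top)

print(f"mean = {mean:.2f}, median = {median}, modes = {modes}")
```

Run on this sample, it prints a mean of 2.66, a median of 2.7, and the two modes, 2.7 and 2.8.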
The simplest measure of variability is the range of the data set. Finding the range is really easy. All you need to do is find the difference between the largest and smallest measurements in the set of data. That's it. Remember earlier on, in the GPA example, we used the term total span to describe the difference between the highest and lowest measurements? From now on, this will officially be known as the range.

Our next numeric representation is the deviation. Like the range, the deviation measures variability. So what's the deviation? The deviation is the numeric distance a measurement is from the mean, or, again, the average of the set. The formula for calculating the deviation of a measurement x in the following example is x minus x bar. Don't forget that x bar represents the mean of a sample data set. So it's our x measurement minus the mean. Take a look at this graphic to help you calculate deviation. For our example, the deviations of each of these randomly selected measurements are as follows. Remember that the mean, or x bar, is equal to about 2.66. Also, make sure you note that the sum of all the deviations from the mean is 0. Why is this? Because measurements that are less than the mean have a negative deviation, and those that are greater than the mean have a positive deviation, thereby canceling each other out to a total sum of 0.

In order to avoid the difficulty of working with negative deviations, statisticians developed another statistic called the variance. We have to be honest. The variance is a little bit strange, but it serves a very specific purpose and will be extremely helpful later on when we'll need to compute other statistics, such as the standard deviation. To continue, the variance of a data set is based on the sum of the squares of the deviations. Dealing with the squares of the numbers ensures that they'll all be positive. However, just as the mean had different symbols for populations and samples, the variance of a population and of a sample are represented in slightly different ways. For a population, the variance is designated by the Greek lowercase letter sigma squared and is equal to the sum of the squares of the deviations divided by the number of measurements in the set. So it's the average squared deviation. Remember that the Greek lowercase letter mu is the mean of a population. Most of the time, the measurements of a population are not able to be obtained, so you'll have to use the measurements of a sample. In this case, we need to use a slightly different formula. Here's the formula for the variance of a sample. Lowercase s squared is equal to the sum of the squares of the deviations divided by 1 less than the number of values in the sample.

To help grasp the idea of variance a little bit better, let's assume that our set of GPAs is a sample from a larger population. In this case, we'll use the sample variance formula to find the statistic we're looking for. As you know, to find the variance, we need to find the deviations. Then we need to find the sum of the squares of the deviations. You'll notice that this sum is positive because we've eliminated the negative signs by squaring. We then divide by 1 less than the number of measurements in the set, or 11. This table shows the squares of the x minus x bar values and their sum of 1.3188. The variance of the sample is calculated by dividing 1.3188 by 11. After the calculation, we find that the variance is about 0.1199.
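Here's a minimal sketch of that whole chain of calculations. It reuses the hypothetical GPA list from the earlier sketches, so the printed variance won't match the video's 0.1199 exactly, but the steps are identical.

```python
# The same hypothetical stand-in for the video's 12 GPAs.
gpas = [2.0, 2.4, 2.7, 2.7, 2.8, 2.8, 3.2, 2.5, 2.6, 2.9, 2.3, 3.0]

x_bar = sum(gpas) / len(gpas)            # the sample mean
deviations = [x - x_bar for x in gpas]   # x minus x bar, for each x
print(f"sum of deviations = {sum(deviations):.6f}")  # always (about) zero

# Sample variance: sum of the squared deviations, divided by n - 1.
s_squared = sum(d ** 2 for d in deviations) / (len(gpas) - 1)
print(f"variance = {s_squared:.4f}")

# The statistic coming up next, the standard deviation, is simply the
# positive square root of this variance.
print(f"standard deviation = {s_squared ** 0.5:.4f}")
```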
One more time: the variance of a sample is equal to the sum of the squares of the deviations divided by 1 less than the number of sample values. That's all for calculating variance, but it's pivotal in determining our next statistic, the standard deviation. Here to describe the last measure of variability will be, oddly enough, the Standard Deviance. Good evening, and thank you. I present to you now the standard deviation. The positive. Square root. Of the variance. Thank you. OK, anyway, the standard deviation is equal to the positive square root of the variance. So for our GPA example, it's equal to the positive square root of 0.1199, which was our variance, or about 0.3463. Listen carefully. When you're finding the standard deviation of a distribution, remember that the number is always positive. But what is the standard deviation, aside from a statistic that uses a lot of other statistics to find it? Well, basically, the standard deviation represents where measurements lie in the data set in relation to the mean of the set. Those measurements that are very large or very small with respect to the mean may be several standard deviations from the mean, while those measurements that are close to the mean on either side are within one or two standard deviations. The standard deviation is a good measure of variability because it indicates where measurements lie on the x-axis. Due to its nature, the standard deviation does determine the flatness or concentration of the distribution curve once it's eventually drawn, meaning that the smaller the standard deviation, the narrower the curve, and the larger the standard deviation, the wider the curve. You should realize what an important statistic the standard deviation is, and we'll be using it in our more detailed work later on in the video. Time for a deep breath. Now that you know how to calculate all these statistics, you know, mean, range, deviation, we can start showing you how they're used in more practical ways.

All right, boxcars, baby. Let's go. Come on. Come on. Get it. Set. Hey, you're Five-Card Charlie, aren't you? Yeah, what's it to you? My name's Caesar, you know, like the salad. I couldn't help noticing you were calculating ferociously. Yeah, I'm working out a relative frequency histogram for this game. I think I have something that can help. I use it whenever I have a problem. I think we should bet it all, baby. Charlie, I think our luck's beginning to change.

Let's think way back to the relative frequency histogram and how it graphically shows the distribution of the measurements of your set of data. Well, we've got some news for you. Graphical representations of most population distributions end up as curved lines. These distributions often represent extremely large populations or approximations of populations. Distribution graphs for these larger data sets take on the shape of a curve because there are so many distinct measurements that each single measurement could conceivably be a single class. Because there are so many individual measurements in these classes, the distribution must be represented by smoothing out the data into a curve. The frequency on a distribution curve is shown along the vertical axis, while the possible values for x are shown along the horizontal axis. From now on, when we discuss and represent our data, and subsequently do our analysis, it will be in the form of a curved line. Typically, these graphs look like this.
This graph in particular illustrates what's referred to as a normal distribution, or one that's shaped like a bell. We'll discuss this specific type when we get into our distribution section. But for now, we'll show you how to use the mean and the standard deviation with respect to this example of a bell curve. In a normal distribution like the one we just saw, 68% of the measurements are within plus or minus 1 standard deviation from the mean of the distribution. Another way of stating this is that 34% of the measurements are within 1 standard deviation on each side of the distribution's mean on the graph. But realize that the left side does not indicate a negative standard deviation. To continue, 95% of the measurements are within plus or minus 2 standard deviations from the mean. And almost all of the measurements in this type of distribution are within plus or minus 3 standard deviations of the mean. This is a very important result in statistics, and it's known as the empirical rule. Remember, the empirical rule states that nearly every measurement is within 3 standard deviations from the mean, unless it is rebel scum known as an outlier. Don't underestimate the dark side of the empirical rule. Heh, heh, heh. Again, the empirical rule states that nearly every measurement in your distribution is within plus or minus 3 standard deviations from the mean, unless, of course, it lies dramatically outside of this area and is given a special designation as an outlier. We'll use this quite often throughout the video, so keep it in mind as we continue.

There are a few statistics that describe where a measurement lies with respect to other measurements in the distribution. This is known as relative standing. The first measure of relative standing is called the z-score. I am z-score, z-a-b-a, a measure of relative standing. I am not your relative, and I am no longer standing. The z-score measures the distance a specific measurement is from the mean in terms of the standard deviation. In other words, the z-score tells us how many standard deviations a measurement is away from the mean of the distribution. Here's the z-score formula. The z-score equals x minus x bar, divided by the standard deviation. Once again, the z-score identifies a measurement along the x-axis in terms of the standard deviation of the distribution of data. For example, suppose that for a certain stat course, there are two different sections of the class, section 1 and section 2. On the midterm exam, the distribution of the exam scores for section 1 has a mean of 75 and a standard deviation of 5. The scores of section 2 also have a mean of 75, but the standard deviation is 7. Now let's consider a score of 85 on the exam as a measurement on the distribution curve. In section 1, the z-score for an 85 is 2.0, using our z-score formula. And in section 2, the z-score for an 85 is about 1.43. As you can see, an exam score of 85 in section 1 is relatively higher than the exact same score in section 2. This is shown by the z-scores of the same measurement in the two different data distributions. The higher z-score of 2.0 in section 1 indicates a better exam score relative to the rest of the scores in that section. According to the empirical rule, most or all of the measurements of any distribution curve should be within 3 standard deviations of the mean on either side, or, in other words, have a z-score of 3 or less in absolute value. A measurement with a z-score greater than 3 in absolute value is called an outlier.
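Here's a minimal sketch of that two-section exam comparison:

```python
# Compare the same exam score across two sections with different spreads.
def z_score(x, mean, std_dev):
    """How many standard deviations the measurement x lies from the mean."""
    return (x - mean) / std_dev

# Section 1: mean 75, standard deviation 5; section 2: mean 75, sd 7.
print(f"section 1: z = {z_score(85, 75, 5):.2f}")  # 2.00
print(f"section 2: z = {z_score(85, 75, 7):.2f}")  # 1.43
```

By the empirical rule, neither score is an outlier; both z-scores are comfortably under 3.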
Obviously, we call it an outlier because it lies outside of the normally expected area. It may help you to understand that an outlier is either very large or very small in relation to the other measurements. This accounts for its far-out positioning.

Another measure of relative standing that you should be aware of is called the percentile. This is an easy concept, because everyone knows what percentiles are. And all a percentile does in statistics is place a measurement in the distribution as a percentage ranking in relation to all other measurements. For example, if Alyssa here gets a score on her stat exam that's higher than 70% of all the other scores but lower than the other 30%, then that exam is said to be at the 70th percentile in the distribution of scores. Statisticians commonly use the 25th, 50th, and 75th percentiles, which are usually referred to as quartiles, of which there are three. These three quartiles separate the data into four quarters. Let's say that under this curve, areas A, B, C, and D are all equal. By this, we mean that they contain the same number of measurements. The dotted line on the left represents the boundary of the first quartile, because it's greater than 25% of the other measurements. The median represents the second quartile, because it's greater than 50% of the other measurements. The dotted line on the right represents the upper quartile, because it's greater than 75% of the other measurements. And the measurements lying in the portion marked D are at percentiles greater than 75. You should know that in terms of the distribution curve, the z-score is very important because it's used to determine the area under the curve between two points. The determined area shows the percentage of measurements in that area. An example of this is in one of the quartiles in the previous graphic. Each quarter, A, B, C, and D, has an area of 0.25, or 25% of the total number of measurements in the set of data you're working with.

Good morning, my little bearings. You are so helpful in calculating standard deviation. Time for morning exercises. Ow, my eyes. Let's go, my little bearings. Come on.

Before we move into a more detailed discussion of distributions and probability, make sure you understand everything. As we said in the beginning of the video, this science of statistics is cumulative, and a solid understanding of the most basic concepts is vital to much more challenging work in statistics. So again, rewind if you need to. Thank you. Hey, what's the matter, buddy? You in debt? Ha ha ha. Woo hoo! Ah!

Part two, probability and distributions.

Section A, probability.

When we look at probability, let's keep in mind that we're dealing with the base root of the word, which is probable, referring to the chance that something may happen. Now, there's a really easy way to explain probability, and practically everybody uses it. All you have to do is flip a coin. This side's heads, this side's tails. Blue captain, call it in the air. Heads! If you flip a regular balanced two-sided coin, there's a one-in-two chance that the coin will land heads up. Therefore, the probability that heads will appear face up is one in two, or, as we just saw with percentages, a 50% chance. Probability is extremely useful in statistics because it helps us identify characteristics of a sample from its greater population in repeated sampling situations. In order to obtain measurements for a sample, statisticians usually perform controlled experiments.
Each time an experiment like this is performed, it basically draws a measurement, or a probable result, from the population for the sample. Normally in statistics, we use the sample to describe the population. With probability, however, we move in the opposite direction, from the entire population to the sample itself. Also, keep in mind that when dealing with probability, we're working with populations, so we need to use the correct population formulas.

Good morning, and welcome to Cathy's Kitchen, where this morning I'm preparing for a wonderful event. I'm preparing eggs penado with hollandaise sauce and little English muffins. I'm preparing a wonderful event.

A possible event of rolling a six-sided die would be to roll a four, for example. Another possible event could be to roll an odd number. Because these are possibilities of the experiment of rolling a six-sided die, they're called events. Events aren't always measured in numbers. Let's say, for example, that the experiment takes the form of asking a voter what party they favor in an election poll. Each party type is a different possible event within the sample space of the population of voters. So in this case, a possible event could be that the voter answers Republican, or Democrat, or, even in my case, Communist. Again, probability is extremely helpful because it allows the statistician to identify the nature of a sample without having to actually perform the experiment over and over by choosing measurements from the population. Okay, now Kalila will begin to discuss how we classify the different types of events that occur in statistical experiments.

There are two ways to classify events, and it's really pretty easy. Events are said to be either simple or compound. Let's take the first one. A simple event is one that cannot be broken down any further. In other words, there's only one possible outcome for that event, and it cannot be identified with any other simple event. A compound event, on the other hand, is a combination of two or more of these simple events we just described.

This is Cathy's Kitchen still, and I'm preparing for another wonderful event. But this time I'm making poached eggs, poached eggs, and poached eggs.

How about if we go back and roll some dice, okay? A simple event would be to roll a five. This is a simple event because it defines only one possible result of the experiment. A compound event would be to roll a number less than three. This is considered a compound event because it contains several single possible outcomes of the experiment, all under the heading of rolling a number less than three. Each one of these single possibilities which make up the compound event is itself a simple event. Therefore, the compound event of rolling a number less than three is composed of two simple events. The simple events in this case are rolling a one and rolling a two.

Look at me, I'm making sandwiches here. I'm a sandwich maker. Rolling on, baby. Three, craps.

Section B, probability of an event.

We already have a good idea of what probability is, right? It's simply the chance that something may happen. Now, in statistics, we talk about the probability of an event. Now, this is no more than the chance of an event occurring when we perform an experiment, like rolling a certain number with a die.
A way of determining the probability of an event is to perform the experiment a large number of times and then record how many times the event occurs. The statistical notation looks just like this, and it's read: the probability of event A is equal to the number of times event A occurred, represented by a lowercase n, divided by the number of times the entire experiment was performed, represented by an uppercase N, when N is very large. As uppercase N, or the number of times the experiment is performed, becomes much, much larger, the calculated probability of event A becomes a much more specific and precise value. This formula, as easy as it is, describes the relative frequency concept of probability. To put it even more simply, how probable it is that an event will occur is measured by how often it occurs relative to how many times it's tested. For example, a selection of one card from a deck of cards is itself an experiment with 52 simple events with predetermined outcomes. Each denomination is represented four times, one in each suit. Therefore, the probability of drawing an ace is 4 divided by 52, or about .077. The probability of drawing a face card is 12 divided by 52, or about .23. This is easy probability. When calculating probability, you'll realize that the probability of an event will always be a value between zero and one. The closer it is to one, the more likely the event is to occur, and the closer it is to zero, the less likely it is to occur.

Remember earlier when we told you the difference between simple and compound events? Well, whether an event is simple or compound is determined by the event's composition. You know, what it's made of. And when we determine the probability of an event, it also depends on, you guessed it, the event's composition. The composition of a simple event is the one possible outcome that the simple event describes. This means that it's composed of only one thing, or one possibility. Therefore, it's not possible for two simple events to occur at the same time, meaning on the same performance of the experiment. In this kind of relationship, simple events of the same experiment are mutually exclusive, because when one simple event occurs, no others can. However, simple events are not the only events that are mutually exclusive. Regardless of whether they're simple or compound, if two events of the same experiment have absolutely no results in common, then they're also mutually exclusive. For example, the event of rolling an odd number and the event of rolling an even number are not simple events. However, they're mutually exclusive, because none of their results are the same.

If we look at all the possible events as being contained in a two-dimensional shape, say a circle, then this circle encompasses every result the experiment may have. Inside the circle are all of the events. This circle is called a sample space, because it contains all the possibilities for our sample. Remember, the goal of probability is to determine the likelihood of choosing a certain measurement for a sample from the population. Looking at our sample space, we see how events can be mutually exclusive. Each one of the smaller circles represents one of the simple events of rolling a six-sided die. In this diagram, none of these circles cross each other, because the results of their events have nothing in common, and they are therefore considered mutually exclusive. If you toss a coin, it will come up either heads or tails. Both of these results are simple events and are mutually exclusive.
Therefore, if on one toss of the coin heads comes up, then it's impossible that tails can come up on the same toss. This can obviously go for more than just coins. Think about this. An experiment can have many simple events as possible outcomes, and remember, simple events can't be broken down any further. Now, if you take the sum of the probabilities of the simple events of the experiment, they must equal one. Once again, if you toss a coin, the probability of the simple event of heads coming up is one in two, or one half. The probability of the simple event of tails coming up is also one in two, or one half. The only two events that can possibly occur in a coin toss experiment are heads or tails. Therefore, the sum of the probabilities of these two simple events equals one. Alright, good work.

Let's roll into the event composition of compound events. The composition of a compound event depends, again, on the simple events that make it up. If we want to find the probability of a compound event occurring, we have to add the probabilities of the simple events that make it up. This should be fresh in your mind. The probability of a compound event is equal to the sum of the probabilities of its simple events. Keeping things simple, let's go back and roll some dice. The probability of rolling a number less than four, which is a compound event, is equal to the sum of the probabilities of the simple events of rolling a one, two, or three. The probability of rolling a one is one in six, because there are six possible simple outcomes to the experiment, and rolling a one is one of them. Therefore, the probability of rolling a one is one sixth. Now, this is the same for the probabilities of the simple events of rolling a two or a three. Okay, the probability, then, of rolling a number less than four is equal to one sixth plus one sixth plus one sixth, which is three over six, or one half. If you want to deal in percentages, we find there's a fifty percent probability of the compound event of rolling under a four occurring.

There are two new events to learn which can be created from our simple and compound events. They are the intersection and the union. Let's begin with the intersection. If we have two events, let's call them A and B, then the intersection of events A and B is whatever simple events A and B have in common. The intersection of events A and B is shown as either A intersection B or, more often, as simply AB. For intersections of any two events, say events A and B again, the intersection occurs where both compound events find a simple event that they have in common. So the key word for an intersection is and. For event A, we'll use rolling a number greater than three as our event, and for event B, we'll use rolling an odd number. The intersection of these two events is the simple events they have in common. When we break down both compound events into their simple events, we see this more clearly. Event A consists of rolling a number greater than three, or, in other words, rolling a four, five, or six. Event B consists of rolling an odd number, or, in other words, rolling a one, three, or five. Again, event A is a four, five, or six, and event B is a one, three, or five. The simple event that both A and B have in common is the event of rolling a five. Therefore, the intersection of rolling a number greater than three and rolling an odd number is rolling a five. This intersection is represented by the shaded circle in this diagram, which is the simple event of rolling a five.
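Here's a minimal sketch of these two dice events as Python sets; the union printed alongside the intersection is the very next topic.

```python
# The two dice events from the example, modeled as sets of simple events.
sample_space = {1, 2, 3, 4, 5, 6}         # all simple events for one die
A = {x for x in sample_space if x > 3}    # event A: roll greater than 3
B = {x for x in sample_space if x % 2}    # event B: roll an odd number

print(A & B)  # intersection, the key word "and": {5}
print(A | B)  # union, the key word "or": {1, 3, 4, 5, 6}

# Probability of a compound event: the sum of its simple events'
# probabilities, here P(roll less than 4) = 1/6 + 1/6 + 1/6 = 0.5.
p = {outcome: 1 / 6 for outcome in sample_space}
print(round(sum(p[x] for x in sample_space if x < 4), 2))
```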
Getting back to the key word and: the intersection occurs where the roll is both greater than three and an odd number. Again, as you see here, it occurs with rolling a five. Let's take the same two events, A and B, and find the union of the two sets. A union consists of any simple events in A, in B, or in both. Because of this stipulation, the key word for a union is or. This is simply due to the fact that a union exists where any of the simple events of one compound event or any simple event of another compound event occur. In this graphic, we see that the union of rolling a number greater than three and rolling an odd number is the combination of all the simple events in both event A and event B. The union is equal to rolling a one, three, four, five, or six. Rolling a four, five, or six is greater than three, while one, three, and five are all odd numbers. This union is represented by the shaded areas in the diagram. Though it's possible to roll a two with the die, it's not part of either compound event, and therefore it is not a part of the union of the two. Try to remember, when events are mutually exclusive, they don't intersect at all, because it's impossible for one result to occur at the same time as the other, and their union is simply the sum of the two events occurring.

Before we move on to our next topic, here's a quick FYI on something called a complement, in case you face it in your next class. The complement of an event is equal to all the simple events of the experiment that are not contained in that event. And because the sum of the probabilities of all the simple events in an experiment equals one, the sum of the probabilities of an event and its complement is always one. Alright, let's wrap this up. An event and its complement are mutually exclusive, because it's impossible for an event and its complement to occur at the same time. In terms of our two types of compound events, the intersection of an event and its complement has zero probability, and their union has a probability of one, because an event and its complement encompass all of the simple events of an experiment.

Sometimes the probability of an occurrence of an event changes, often depending on whether or not another event has occurred. This leads us to finding the probability of an event when conditions are applied to it, and guess what it's called? Conditional probability. Conditional probability is kind of like revised probability, in the sense that it's probability under the advisement of a little extra information, or with stipulations attached. For instance, let's take for example a person playing blackjack. Suppose the hand dealt to her is comprised of a nine and an eight, for a total of seventeen. She's faced with the question of whether or not to draw another card to either stay under twenty-one or hit twenty-one right on the money. So her dilemma is written as the probability of selecting a card less than or equal to four, or the probability of selecting an ace, two, three, or four. Because there are four of these denominations, with four cards of each in the deck, the probability of selecting one of these cards is sixteen over fifty-two. After the calculation, the probability is point three zero seven, or about a thirty-one percent chance of selecting one of these cards if nothing else is known. The point here is that we assume our player knows nothing else, and therefore this probability is unconditional. But now suppose our same player is counting cards as she moves along and has noticed that twenty cards have already been played.
She noticed fifteen of the cards were greater than four and five were less than or equal to four. Because of her additional knowledge of the situation, her probability has changed. Now the probability of selecting a card less than or equal to four is eleven over thirty-two, or point three four four, about thirty-four percent. So her probability has increased a bit. This is an example of a conditional probability problem.

Conditional probability can be calculated using a formula. Let's work through a conditional probability problem. The conditional probability of event A, given that event B has already occurred, is shown like this, and is read just like a regular probability formula, but you add the condition, which here is event B. So with conditional probability problems, you say the probability of A given B. Let's go on with the rest of the equation. Continuing on with our established events A and B, the equation for a conditional probability problem is as follows. The probability of A given B is equal to the probability of the intersection of A and B, or AB, divided by the probability of B. In this equation, the probability of B is given the qualification that it can't equal zero, because, well, you just can't divide anything by zero. Okay, let's work one out. For our example of events A and B, the probability of A given B is the probability of rolling a number greater than three, given that the number is odd. Using the formula for conditional probability, the probability of the intersection AB divided by the probability of B is equal to the probability of rolling a five divided by the probability of rolling a one, three, or five. Numerically, this is one over six divided by three over six, or one half, which calculates to one third. Therefore, the probability of rolling a number greater than three, given that it's an odd number, is one third, or about thirty-three percent.

Oftentimes, conditions are applied to your stat problems, and these conditions are figured into the probability of events occurring. Let's roll on. What are the chances of rolling a seven, shall we? I don't know. All right. Baby needs a new pair of retreads. Seven, winner.

It's really helpful in conditional probability problems to think of events as either dependent or independent. We've given you an example of how the probability of an event is dependent on another condition. However, two events can also be independent of each other. If we have two events, and the probability of event one given event two is simply equal to the probability of event one, then the occurrence of event two does not change the chance of event one occurring, and the two events are independent.

Now we're going to test you in calculating probability using everything we just learned. Let's say that the experts at the casino are analyzing their clientele in terms of the games customers play and the customers' composition. So the experiment conducted by the experts is randomly selecting customers from their casino. The three games are blackjack, craps, and roulette. Each customer plays only one game, and there is no crossing over. And playing the games are men, women, and horses. That's right, horses. The casino experts want to determine the probability of what a single customer is playing, in terms of whether that customer is a man, a woman, or a horse. Here's the table which provides us the information. This type of table is known as a two-way classification, because it classifies the customers by game and by biological makeup.
The games are listed along the top, and the player description is listed along the left side. Let's now take two possible events from our table of information and calculate their respective probabilities, the probabilities of their intersection and union, and also a conditional example. Event A is that the customer is a horse, and event B is that the customer is playing blackjack. Both of these are compound events. The probability of event A, selecting a horse, is the sum of the probabilities of its three simple events. A horse playing blackjack is 0.13, a horse playing craps is 0.03, and a horse playing roulette is 0.04. Thus, the probability of any randomly selected customer being a horse is 0.20, or 20 percent. The probability of event B, selecting a customer playing blackjack, is the sum of the probabilities of its three simple events. A man playing blackjack is 0.17, a woman playing blackjack is 0.10, and a horse playing blackjack is 0.13. Thus, the probability of a randomly selected customer playing blackjack is 0.40, or 40 percent. Now we'll need to find the probability of the intersection of A and B, the probability of the union of A and B, and the probability of B given A. The intersection of A and B is that a horse playing blackjack is selected. This is only one simple event, and its probability is 0.13, as you can see from the table. The union of A and B is that our randomly selected customer is a horse, is playing blackjack, or both. Looking at the table, five simple events make up this union. The total of their probabilities is 0.47. Lastly, we can calculate the probability that the customer is playing blackjack given that the customer is a horse. Recalling our formula for conditional probability, we divide the probability of the intersection, 0.13, by the probability that the customer is a horse, 0.20, and see that there's a 65 percent chance that the customer is playing blackjack, given also that the customer's a horse. Well, how'd you do? I'm sure you did great. You can go over that section again if you need to. Now, let's move on to distributions.

Section C, distributions.

Now that we can consider ourselves statisticians, when we perform experiments, we use some sort of measurement. This measurement is called a random variable, or our x variable, because its outcome is unknown before the experiment occurs, and the outcome will differ with each performance of the experiment. So a random variable is a quantity whose value depends on the outcome of a random experiment. Alright, there are two types of these random variables, discrete random variables and continuous random variables. A discrete random variable can assume only a countable number of values, or, basically, you can count the values. A continuous random variable, on the other hand, takes its value from an experiment with an unlimited, or uncountable, number of possibilities. Think of it this way: discrete, countable; continuous, uncountable. Here's an example to help you understand this point. An example of a discrete random variable would be the number of people who, let's say, live in a randomly selected dorm room. You know, zero, one, two, three, four. You just can't have 2.5 or 3.6 people in a dorm room. This is a measurement that can be taken from a countable number of choices. An example of a continuous random variable would be the exact weight of a randomly selected newborn baby. This population is continuous because the weights of newborn babies have an essentially infinite range of possible values. With pounds, ounces, and tenths of ounces, there are so many possibilities.
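Before going on with random variables, here's a minimal sketch of the casino table calculations from above. Only the blackjack column and the horse row are quoted in the video, so the other four entries below are hypothetical values chosen just so that all nine probabilities sum to 1.

```python
# A two-way classification table: (player type, game) -> probability.
# The blackjack column and the horse row come from the video; the rest
# are hypothetical placeholders.
table = {
    ("man", "blackjack"): 0.17,   ("man", "craps"): 0.15,   ("man", "roulette"): 0.13,
    ("woman", "blackjack"): 0.10, ("woman", "craps"): 0.12, ("woman", "roulette"): 0.13,
    ("horse", "blackjack"): 0.13, ("horse", "craps"): 0.03, ("horse", "roulette"): 0.04,
}

p_A = sum(p for (who, _), p in table.items() if who == "horse")        # 0.20
p_B = sum(p for (_, game), p in table.items() if game == "blackjack")  # 0.40
p_AB = table[("horse", "blackjack")]                                   # 0.13
p_union = p_A + p_B - p_AB                                             # 0.47
p_B_given_A = p_AB / p_A                                               # 0.65

print(round(p_A, 2), round(p_B, 2), p_AB,
      round(p_union, 2), round(p_B_given_A, 2))
```

Note that the union, 0.47, is the same whether you add the five simple events directly or use the addition rule shown in the code.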
There are obviously different probability models for these two types of random variables, so it's really important to be able to distinguish between the two. Remember that discrete random variables are countable, and continuous random variables are uncountable. Okay, we need to get acquainted with exactly what a probability distribution is before learning the specific types for each of our new random variables. Let's take probability distributions in the most general terms, for discrete random variables. A probability distribution is the representation of the probability associated with each value of the random variable. We'll assign a lowercase x to denote our random variable. Therefore, according to what we've just gone over, a probability distribution for x shows the values x can take and the probability for each value. Again, in order to have a probability distribution, a few things must be established first. The probability of each value must be between 0 and 1, or 0% and 100%, as we said earlier when going over probability. And the sum of the probabilities of all possible values of the variable x must be equal to 1. Suppose we toss two coins and we want to measure the number of heads that appear. This means that our random variable, or variable x, now represents the number of heads we observe. The simple events of this experiment and their probabilities, designated by uppercase E1 through E4, are shown here in this table. The first possible event is to toss a head on both the first and second coin. The next possible event would be to toss a head on the first coin and a tail on the second, and so on. For each event, the value for x is also listed. The probability for each of these simple events is one-fourth, because the coins are fair and all outcomes are equally likely to occur. In order to form the actual probability distribution for this example, we have to list all the possible values of the variable x and the probability for each of its respective values. When our variable x is 0, or no heads have appeared, E4 has occurred, and the probability of E4 is one-fourth. When x is 1, or one head has appeared, either E2 or E3 has occurred, and the probability is equal to the sum of the probabilities of these two events, or one-half. When x is 2, E1 has occurred, and the probability of E1 is one-fourth. Therefore, the probability of x being 0 is one-fourth, of x being 1 is one-half, and of x being 2 is one-fourth. Now that we have our probabilities listed, and as you might have guessed from the heading of this section, we need to create some type of representation of our findings. We can do this in the form of a histogram, drawing out the distribution to represent the data we found. When working with probability distributions, the mean, or average, of the distribution is known as the expected value. It's represented by the symbol mu, because we're dealing with a population, not a sample. The mean of a probability distribution, or expected value, is equal to the sum of each x value multiplied by its probability. So the formula looks and sounds like this: the summation of x times the probability of x. Remember that the uppercase sigma, that big, mean, E-looking thing, means summation of the whole thing. In our example, you multiply the number of times that heads occurs by the probability of that number of heads occurring.
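Here's a minimal sketch that builds the two-coin distribution just described, by enumerating the four simple events E1 through E4.

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Build the probability distribution of x = number of heads in two fair
# tosses by listing the four simple events: HH, HT, TH, TT.
dist = Counter()
for toss in product("HT", repeat=2):
    x = toss.count("H")                 # the value of x for this simple event
    dist[x] += Fraction(1, 4)           # each simple event has probability 1/4

print(dict(dist))    # {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}
print(sum(dist.values()) == 1)          # True: the probabilities sum to 1
```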
So, using this expected value formula for our example, the summation of x times the probability of x would be, get ready for this, 0 times one-fourth plus 1 times one-half plus 2 times one-fourth, which equals one. Therefore, we find after doing the calculation that the mean, or expected value for x, is equal to one. Hey, how's it going? I just got in from the grocery store. The woman there says, hey, you want paper or plastic? I said, what the f*** do you think, okay? Look at my head. Alright, it's not like I'm really tired and I've got bags under my eyes or something, okay? I am a bag. Look at my head. Let's move back to our old friends, variance and standard deviation. The variance and standard deviation can also be calculated for our random variable, x, in this example. Recall our formula for variance in the previous section. Remember to rewind if you need to. Oh, there it is. In the case of probability distributions, though, we use a slightly modified formula to find the variance of x. Let's take a look. For a probability distribution, the variance, lowercase sigma squared, is equal to the summation of the squared deviations, x minus mu squared, times the probability of x. In this case, the lowercase mu is still the mean, but it's now the mean of the probability distribution, or what we now call the expected value, which for our last coin-tossing example was one. The standard deviation for a random variable, designated by lowercase sigma, is simply the square root of the variance. We've graduated a bit in difficulty and jargon, but the meaning is exactly the same. Moving right along with our double coin-tossing example, recall that our expected value is one. Now we'll use our new formula for variance, the three x values, and each of their probabilities. If we insert the three values for x, zero, one, and two, and their respective probabilities into the formula, and add everything together, we end up with a variance of one half. And the standard deviation, or the square root of the variance, is roughly 0.707. Let's go over the results of our experiment again. When we tossed the two coins and observed the number of heads as the variable x, we found the expected value is one, the variance is one half, and the standard deviation is 0.707. Keep in mind as we move through our material that all we've done thus far is employ the basic statistics we've learned from the beginning of the video, starting with data sets. We've taken these stats and simply applied them to the more complex probability distributions for our variable x. As we continue to move, keep in mind that we're building on the principles we've already learned. Now, we'll continue to build on our understanding of probability distributions. In the high-stakes world of statistics, there are several types of probability distribution models that are used, and a great deal of what we'll cover in the remainder of the video is frequently used in physical science, social science, business, and economics. Yes, that is correct. The specific type of probability distribution model that we'll be discussing, of course, is particular to discrete random variables and is known as the binomial probability distribution. Of course, not related to my cousin, Biafrid. It does sound a little bit intimidating, I'll give you that, but don't sweat it. Just follow along. A binomial probability distribution describes the probabilities of the possible results of a binomial-type experiment.
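Before going on to the binomial, here's a sketch of the expected value, variance, and standard deviation calculation we just did for the two-coin distribution.

```python
from fractions import Fraction
import math

# The two-coin distribution from the example above.
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mu = sum(x * p for x, p in dist.items())               # sum of x * P(x) = 1
var = sum((x - mu) ** 2 * p for x, p in dist.items())  # sum of (x - mu)^2 * P(x) = 1/2
sigma = math.sqrt(var)                                 # square root of the variance

print(mu, var, round(sigma, 3))                        # 1 1/2 0.707
```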
This is just a specific type of probability distribution, and it involves binomial coefficients, which you'll recognize in just a little bit. All this means is that one of the elements we'll use to calculate probabilities in this type of distribution or experiment comes from this particular type of algebraic function. I use this material all the time in my line of work as a professional economics nebbish, and perhaps you should use it too. I like you, and I am attracted to your statistics. We already know that this type of probability distribution is for a discrete random variable, but there's a whole lot to this model. A binomial experiment has the following characteristics. One, the experiment has a certain number of trials, n, and all the trials are performed in exactly the same manner. This is effectively the sample size of the experiment. Two, each trial has one of two outcomes. The outcome will either be called a success, s, or a failure, f. Three, the probability of observing a success on a single trial is the same for each and every trial, and is shown as a lowercase p. The probability of observing a failure is also the same for every trial, and is shown in our formula as a lowercase q, which is equal to 1 minus p. Four, each single trial is independent of all the others, meaning that the outcome of each consecutive trial of the experiment does not depend on the results of the previous trial, nor any other trial in the experiment for that matter. And five, the experiment observes the number of successes, denoted by the variable x, in n trials. Let's summarize a little here. In a binomial experiment, which we've just described, we have the following elements: n, which equals the number of trials; x, which equals the number of successes; p, which equals the probability of a success; q, or 1 minus p, which equals the probability of a failure; s, which signifies a success; and finally, f, which signifies a failure. The binomial probability calculates the probability of observing x number of successes, anywhere from 0 all the way up to the specified n, which is the number of trials in the experiment or the size of the sample. All right, here's our formula. The formula for observing x number of successes in n trials is as follows: the probability of x equals n factorial divided by the quantity x factorial times n minus x factorial, all multiplied by p to the x power and by q to the n minus x power. Just as a quick reminder, when we say n factorial, represented by n exclamation point, as in the previous formula, we mean n times n minus 1 times n minus 2, all the way down to 1. Oh, and just another quick reminder in case your algebra is a little rusty: zero factorial always equals 1. Therefore, 4 factorial, as you can see here, is 4 times 3 times 2 times 1. After doing the multiplication, you'll see that 4 factorial is equal to 24. With the information we have, let's go on to another coin-tossing example. Suppose we're tossing one coin 10 times. Therefore, n equals 10, so our experiment has 10 trials. This time, let's observe the number of tails. That means that x, our variable, now represents the number of tails that we observe in 10 trials. The probability of observing a tail on a single toss of the coin, as we know, is one half, or .5. Therefore p, the probability of a success, equals .5, and q, the probability of a failure, or 1 minus p, also equals .5. With that stated, now we need to find the probability of observing 0 through 10 tails in 10 trials.
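Here's the binomial formula as a small Python function, written with factorials exactly as the video states it.

```python
import math

# P(x) = n! / (x! * (n - x)!) * p**x * q**(n - x), as given in the video.
def binomial_pmf(x: int, n: int, p: float) -> float:
    q = 1 - p
    coeff = math.factorial(n) // (math.factorial(x) * math.factorial(n - x))
    return coeff * p**x * q**(n - x)

print(binomial_pmf(0, 10, 0.5))   # 0.0009765625, i.e. roughly .001
```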
What we're basically doing is finding the probability of how many tails you'll see in 10 tries, meaning the probability of 0 tails in 10 tries, the probability of 1 tail in 10 tries, and so on. We need to evaluate the probability of x, observing a tail, for all its possible values in 10 tries. Plugging into our formula for a binomial problem, the probability of observing 0 tails in 10 trials would mean that x, our variable, is 0. Here's what we get if we plug our numbers into the equation: the probability of 0 equals 10 factorial divided by the quantity 0 factorial times 10 minus 0 factorial, multiplied by .5 to the 0 power, multiplied by .5 to the 10 minus 0 power. After we do the math, we end up with the probability of observing 0 tails, x, in 10 trials, n, being .001, or .1%. Using the exact same formula, we can calculate the probabilities of all the other possible values of x, and end up with the following values: the probability of 0 tails is .001, of 1 tail is .010, of 2 tails is .043, of 3 tails is .117, of 4 tails is .205, of 5 tails is .246, of 6 tails is .205, of 7 tails is .117, of 8 tails is .043, of 9 tails is .010, and of 10 tails is .001, or .1%. Suppose in this experiment that we wanted to find the probability of observing a number of tails less than 5. This is, of course, a compound event, and it's calculated by adding the individual probabilities of observing 0, 1, 2, 3, and 4 tails. Thus, the probability of observing a number of tails less than 5 would equal .205 plus .117 plus .043 plus .010 plus .001, which equals .376. Our results show there's almost a 38% chance of observing a number of tails less than 5 in 10 trials. For a binomial distribution, the mean, which is a lowercase mu because we're dealing with a population, is equal to the number of trials, n, multiplied by the probability of a success on a single trial, p. The variance, shown as a lowercase sigma squared, is equal to the number of trials multiplied by the probability of a success multiplied by the probability of a failure. And the standard deviation, shown simply as a lowercase sigma, is easily calculated by taking the square root of the value you found for the variance. These formulas follow the same algebraic principles as the ones we used earlier, but they are adjusted to the binomial model. Let's plug some numbers in and go through the calculations. The mean of our most recent example is equal to 10 trials times .5, which equals 5. The variance is equal to 10, times the .5 probability of success, times the .5 probability of failure, which equals 2.5. The standard deviation equals the square root of the variance, or 1.58. I did not think so. Now listen up as I enlighten you. We have the relative frequency histogram and the stem-and-leaf display. With a numeric representation, we have measures of central tendency, variability, and relative standing. Central tendency leads us to mean, median, or mode. I want to hear you say it. Good. I want to hear you say sir. Good. Now, variability leads us to range, deviation, variance, or standard deviation. Let me hear you say that. You are idiots, and you are no longer in your brother's kitchen making sandwiches. You are here. Now, relative standing leads us to z-score and percentiles, but all of this is only important in terms of probability. Probability. Probability. Do you understand probability? Probability leads to events, simple events, or compound events, and eventually to the binomial probability distribution. Do you understand what I've been talking about? No, sir. Well then, let's dance. The binomial distribution also applies to urn models, which conceptualize several practical sampling experiments.
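Before moving on to urns, here's a sketch pulling the whole coin example together: the full table of probabilities, the compound event, and the binomial mean, variance, and standard deviation.

```python
import math

def binomial_pmf(x, n, p):
    # same helper as in the previous sketch
    coeff = math.factorial(n) // (math.factorial(x) * math.factorial(n - x))
    return coeff * p**x * (1 - p)**(n - x)

n, p = 10, 0.5
for x in range(n + 1):                 # the full table of P(0) through P(10)
    print(x, round(binomial_pmf(x, n, p), 3))   # note: P(2) rounds to .044
                                                # where the video shows .043

p_less_than_5 = sum(binomial_pmf(x, n, p) for x in range(5))
print(round(p_less_than_5, 3))         # 0.377 -- almost a 38 percent chance

mu = n * p                             # mean: 10 * .5 = 5
var = n * p * (1 - p)                  # variance: 10 * .5 * .5 = 2.5
sigma = math.sqrt(var)                 # standard deviation: about 1.58
print(mu, var, round(sigma, 2))
```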
Urn models are basically descriptions of random sampling using the analogy of balls in an urn. Let's first consider sampling with replacement. In other words, each ball chosen for the sample is thrown back in the urn before the next ball is chosen. In our urn, we have 30 balls, 12 red and 18 blue, such that the probability of randomly selecting a red ball is 12 out of 30, or .4, and a blue ball is 18 out of 30, or .6. If we want to find the probability of selecting a certain number of blue balls from the urn using replacement and a sample size of 5, then we can use the binomial model. In this case n, the sample size, is 5. X, the discrete variable, is the number of blue balls in the sample. P, the probability of success, or selecting a blue ball, is .6, and the probability of failure is .4. The example of the binomial distribution we just went through follows the five rules of the binomial to a T. However, there's another situation where an experiment is treated as a binomial, though it does not strictly follow the five rules and regulations of the binomial. This situation occurs where sampling without replacement is performed on a population that's extremely large. For sampling without replacement, of course, the probability of selecting a blue ball changes as the number of balls left in the urn diminishes due to our sampling. Therefore, the probability of selecting a blue ball the first time is 18 out of 30, but the probability of selecting a blue ball the second time is 17 out of 29, because we have just taken one blue ball from the urn. However, when the population is extremely large, the removal of one item changes the probability of success only slightly. Ignoring the slight variation, we assume that the probability of success remains the same, and we can use the binomial model to describe the distribution of the population. Suppose, however, that we're sampling without replacement in a population that's not so large, and we cannot use the binomial model. Well, here's an example of what this type of distribution is all about. Let's say we have a room with 30 people, all of whom have either blonde or green hair. The distribution of these colors is 20 blonde and 10 green. Our sample size is 3 people. Therefore, in our sampling, the probability of choosing a person with a certain color hair is equal to the number of people in the room with that hair color divided by the total number of people in the room. Remember that we're sampling without replacement, so every time we choose a person for the sample, the population goes down by one, and the probability of choosing each hair color changes. So for this sample, we need to calculate the probability of getting a certain number of blonde heads and a certain number of green heads. We'll calculate the probability of selecting an n-equals-3 sample of blonde, blonde, green, in that order. So our sample consists of two blonde heads and one green head. The specific order indicates that we're using conditional probability, because each new selection depends on the result of the previous selection. We calculate this by multiplying together the probability of each selection, updating the probabilities as each new person is taken from the population. Thus, the probability of choosing a sample of blonde, blonde, green is about .16, or 16 percent. We've just gone over sampling without replacement to explain not only this type of sample, but to illustrate a discrete random sampling experiment that does not use the binomial.
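Here's a sketch of that hair-color calculation: sampling without replacement, so each selection's probability is conditional on the selections before it.

```python
from fractions import Fraction

blonde, green = 20, 10
total = blonde + green

# P(blonde, then blonde, then green): multiply the conditional probabilities,
# shrinking the population by one after each selection.
p = (Fraction(blonde, total)
     * Fraction(blonde - 1, total - 1)
     * Fraction(green, total - 2))

print(p, round(float(p), 3))    # 95/609, about 0.156 -- roughly 16 percent
```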
Before we go on to our other X variable, realize that we've just covered probability distributions for discrete random variables. What are they again? Variables that have a clear, countable number of values. As we said earlier, continuous random variables can take any value in an interval, to any degree of accuracy, so the number of possibilities is uncountable. The probability of any specific value is 0. The total area under the curve, however, is 1, and the area under the curve between two points A and B is the probability that the random variable will take a value between A and B. Well, we are on our way to lunch, but first I pose a very critical question. How many standard deviations are there between my left hand on the steering wheel and my right hand on the steering wheel? Approximately 17 standard deviations, give or take one or two or three, depending on what I'm having for lunch. But I'm quite excited, so let us... we're going to be in a car accident. There are several types of distributions for continuous random variables, and the first of the two types we'll cover is called the uniform distribution. In this distribution, the variable can assume any value between two points on a line, say A and B. There's an equal probability of drawing any measurement on the line. Thus, the difference between B and A is the range in which the measurements lie. Say, for example, that this difference is 10. Then the height of the curve is 1 over 10, which is equal to 0.1, as the total area equals 1. It should be clear that in a uniform probability distribution, there's an equal probability of drawing any measurement on the line, and the measurements on the line do vary, but they don't vary any further than the two points on the line will allow. Equal probability, varied measurements. Yes, and how many measurements can occur on a line between point A and point B? Well, the answer, my friend, is uniform distribution. The answer is uniform distribution. This means that the distribution is a straight line between A and B at a height of 1 over the quantity B minus A, and the shape of this distribution is a rectangle. Because of the nature of the measurements, it's easy to see where the term uniform comes from. It's also very easy to calculate probability using this model, because we're dealing with the areas of rectangles and not curves. Here are the formulas for the mean and standard deviation of a uniform probability distribution. The mean is equal to A plus B divided by 2. The standard deviation is equal to B minus A divided by the square root of 12. The square root of 12 is a constant in determining the standard deviation for a uniform distribution. An example of this kind of distribution is a person's waiting time for a bus that arrives every 10 minutes. The waiting time could be anywhere from 0 minutes, one endpoint, to 10 minutes, the other endpoint, depending on when the person arrives at the bus stop. The possible waiting times in a statistical situation like this are virtually uncountable, which is perfectly suited to its nature as a continuous random variable. Using our explanation of the uniform distribution, because this is a continuous random variable, there's an enormous number of possible measurements from 0 to 10 minutes. And although the bus has an equal probability of arriving at any time in that period, the measurements are vast. The bus could possibly arrive at any tenth or hundredth of a second in that 10-minute span of time.
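Here's a sketch of the uniform model for the bus example, with the mean and standard deviation formulas just given, plus the interval probability that gets worked out next.

```python
import math

# Waiting time X is uniform on the interval [A, B] = [0, 10] minutes.
A, B = 0.0, 10.0
height = 1 / (B - A)              # 0.1, so the total area under the line is 1

mean = (A + B) / 2                # (0 + 10) / 2 = 5 minutes
sigma = (B - A) / math.sqrt(12)   # about 2.89 minutes

# The probability of waiting between two points is the sub-rectangle's area:
p_3_to_5 = (5 - 3) * height       # 0.2, as worked out in the next passage
print(mean, round(sigma, 2), p_3_to_5)
```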
The probability that the value of X will fall between two specific measurements is equal to the area of the rectangle formed by those two measurements on the X axis. For example, the probability that a person will wait between three and five minutes for the bus is equal to the area of the sub-rectangle formed by three and five in our uniform distribution. As we said earlier, the most commonly used probability distribution curves for continuous random variables end up as smooth curves, usually in a bell shape. The most important of such curves is known as the normal probability distribution, which was also the first curve we showed in the video. It's the second type of probability distribution in our treatment of the continuous random variable. A normal probability distribution curve has an area underneath it of 1. This is true for all continuous distributions. It's symmetric about its mean, lowercase mu, and is also determined by the population's standard deviation, lowercase sigma. The larger the value of the standard deviation, the shorter the height of the distribution curve. Since the curve is symmetric about its mean, half the area under the curve, or .5, lies to the right of the mean and half to the left. Recall one of our measures of relative standing, the Z-score. Sometimes when Z-score is waiting for his food, he thinks about when he was a baby Z-score, measuring little itty-bitty things, like strained carrots and peas and rattles and other things he got from his mummy. But now Z-score is all alone, ah, cursed to measure standard deviations across the land. But he will get over it. Z-score is strong. Once again, the Z-score denotes the distance a measurement lies from the mean in terms of the standard deviation of the distribution. Therefore, a Z-score of zero means that the measurement is equal to the mean. Also, a measurement with a positive Z-score lies to the right of the mean, while a measurement with a negative Z-score lies to the left of the mean. In this vein, a Z-score of one means that the measurement is one standard deviation away from the mean. For example, if we calculate that a measurement from a distribution has a Z-score of 1.5, then that measurement lies one and a half standard deviations to the right of the mean of the population. If we show the placement of the Z-score on the graph, we see that the area under the curve between the mean and the Z-score, denoted as capital A, is the percentage of the measurements that lie between the mean and that Z-score of 1.5. The value of this area is also the probability of choosing a measurement with a Z-score between zero and 1.5. Whether the area lies to the right or left of the mean depends on whether the Z-score is positive or negative. There's a table which helps us determine this area. This area actually represents the probability of choosing a measurement with a Z-score between zero and some Z-score value. Rather than having to calculate this, statisticians use a prefabricated table, which you should be able to find in your statistics book. The left-hand column lists the Z-score in increments of one tenth. The top row lists further increments of one hundredth. The numbers inside the table are the corresponding areas for each Z-score, combining its value to the tenth and to the hundredth.
If we wanted to find the area for a Z-score of 0.21, we would look down the left column for 0.2 and across the top row for 0.01, and then find the area at which they intersect. This area is 0.0832, and it is the probability of choosing a measurement with a Z-score between zero and 0.21. Be careful here: 0.0832 is not the probability of choosing a measurement with a Z-score of exactly 0.21, but of choosing a measurement with a Z-score between zero and 0.21. As an example, suppose we have a normal distribution with a mean of ten and a standard deviation of two. What if we needed to find the probability of choosing a measurement of X between nine and twelve? First, we'll need to find the Z-scores for nine and twelve. Remember that the Z-score is equal to X minus the mean, with that quantity divided by the standard deviation. So for nine, we take nine minus the mean of ten, divided by the standard deviation of two, and get negative 0.5. For twelve, we take twelve minus ten, divided by the standard deviation of two, and arrive at one. So we get a Z-score of negative 0.5 for nine and of one for twelve. To find the probability associated with these Z-scores, we'll have to use the table. Because of the symmetry of the normal distribution, the area associated with a negative Z-score is equal to the area associated with the positive value of that Z-score. Therefore, we'll need to find the area between zero and 0.5 and between zero and one. These areas, according to the table, are 0.1915 and 0.3413, respectively. Now that we have these two areas from the table, we must add them together in order to find the total area between the Z-scores of negative 0.5 and one. By adding 0.1915 and 0.3413, we get a total area of 0.5328. This total area represents the probability of choosing a measurement between nine and twelve in our distribution. This area of 0.5328 also represents the percentage of measurements in the data set that lie between nine and twelve. In other words, 53.28 percent of the measurements in the data set are between nine and twelve. I'm in trouble. I don't see what the problem is. Why don't you guys understand this stuff? I don't get it. I don't even have a blue book. Hey, come on, you guys. It's not so bad. Let's just go over this again. Yeah, Chaz, would you explain it just one more time? Alright, just listen up and I'll explain it to you. Beginning with probability, or the chance that an event will occur, we discussed events, event composition, and conditional probability. An event's composition, whether it's simple or compound, determines how to calculate its probability. Simple events are mutually exclusive of each other and cannot be broken down into smaller events. And compound events are made of several of these building-block simple events. The union of two events means that either or both events can occur. The intersection of two events consists only of the simple events which the two have in common. And don't forget about conditional probability. Events can be either dependent or independent. Then we moved on to our random X variables, which are either discrete or continuous. Discrete random variables can take on a countable number of values, while continuous random variables can take an unlimited or uncountable number of values.
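As a companion to that z-score walkthrough, here's a sketch that reproduces the table's areas with the error function instead of a printed page, then reworks the nine-to-twelve example.

```python
import math

def area_zero_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and z -- the quantity
    the z-table gives you -- computed from the error function."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

print(round(area_zero_to_z(0.21), 4))     # 0.0832, matching the table lookup

# The worked example: mean 10, standard deviation 2, P(9 < X < 12).
mu, sigma = 10, 2
z_low = (9 - mu) / sigma                  # -0.5
z_high = (12 - mu) / sigma                # 1.0

print(round(area_zero_to_z(z_low), 4))    # 0.1915
print(round(area_zero_to_z(z_high), 4))   # 0.3413
print(round(area_zero_to_z(z_low) + area_zero_to_z(z_high), 4))
                                          # 0.5328, i.e. 53.28 percent
```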
In statistics, we're concerned with distributions, or ranges, for the values of a certain variable in a population. For discrete random variables, the probability distribution of a population shows the likelihood of the variable taking on a certain value within the population. We also calculated the expected value, or mean, the variance, and the standard deviation, which are common things you need to calculate when you work with probability distributions. Don't forget the importance of the binomial probability distribution. If you remember, it's one type of model for discrete random variable probability distributions, which deals with the probability of successes and failures, or whether a certain event occurs or doesn't occur. Michael? Right. Moving right along with distributions for continuous random variables, a probability is represented as the area under a curve. To determine areas under the normal distribution curve, we use the standard deviation and the z-score of measurements. We also discussed the uniform distribution, where there's an equal likelihood for all the possible measurements of the population. Remember, too, that the normal probability distribution is distributed so that the graph is a bell curve, and the population follows the empirical rule. Also, don't forget that we learned even more about our favorite measure of relative standing, the z-score. There's a term that you all know: I am z-scoring, and now I am going to determine how many standard deviations there are between these sockets, and this little... In terms of the z-score, we not only learned where a measurement lies in relation to the mean, but also how to get the probabilities of certain z-scores with the z-score table, which brings us up to where we are now. Hopefully, you're feeling comfortable and caught up, and we can move on to part three. In the immortal words of my dog Varine: woof. Part three, sampling and sampling distributions. Section A, a look at sampling distributions. As we've said over and over, when populations are too large to measure, or too large to determine anything from them, statisticians analyze samples in order to make inferences about the population. In order to assess the reliability of inferences about the population based on a particular statistic from the sample, statisticians look at the sampling distribution of that one statistic. Here's a nice way to visualize how a sampling distribution is created. Let's imagine that there are bins set up along the x-axis of the graph of the sampling distribution. We've created hypothetical numbers for the calculations, so you don't have to worry about the specifics. But every time the mean of a sample is calculated, a ball is dropped into one of the bins, in a location along the x-axis specific to its value. From the calculation of the first sample to the nth sample, a ball is dropped each and every time there's a new sample mean. As you can see, the sampling distribution is quite vividly formed in this way. Now, the means of these samples tend to have an approximately bell-shaped distribution. This is called the central limit theorem. Theorem, theorem, theorem, complex, pedantic theorem, theorem, theorem, blah, blah, blah, blah. Yes. Question, question, confusion, bewilderment, question? Response. Theorem, theorem, theorem. Yes. Combative response. Disgusted reply. Theorem, theorem, theorem. Yes. My pen blew up.
This theorem states that when n measurements are drawn from a population with a mean of mu and a standard deviation of sigma, the sampling distribution of the means of the samples, x-bar, is approximately normally distributed, provided that n is very large. Now, this is very similar to what we saw with our distributions of continuous random variables. Basically, what we have is a population, and we take samples with an equal number of measurements from that population. Then we find the mean of each of the samples. The means of the samples form their own distribution, which is a sampling distribution. According to the central limit theorem, this type of sampling distribution should be normally distributed, or shaped like a bell curve, when the sample size is large. Here are a couple of other important points. The mean of the distribution of the sample means should be equal to the mean of the original population, and the standard deviation of the distribution of the sample means should equal the standard deviation of the population divided by the square root of the number of measurements in the sample, or n. These approximations become more accurate as n, or our sample size, becomes larger. Also realize that the standard deviation of the mean sampling distribution is the best measure of the reliability of a sample. It tells us how good an estimate the sample mean is of the population mean. This is our goal: to make estimations about our population from our samples. According to the central limit theorem, the mean of the sampling distribution is just about equal to the mean of the population. Therefore, the standard deviation of the sampling distribution tells us how different the mean for each sample is from the mean of the population. For this reason, the standard deviation of a sampling distribution of the means is also called the standard error of the mean, which basically gives us an idea of how far away we are from the population mean. Statisticians use what are called estimators to make inferences from the sample about the population itself. In other words, we use the mean, or any other measure from a sample, to make inferences about the greater population. Now, what exactly is an estimator? Well, an estimator is simply the rule that tells us how to calculate an estimate. It's used to draw conclusions about the population from the sample in the form of this estimate. So an estimator walks into a bar. All right, the bartender says, hey, estimator, what time is it? The estimator says, I don't know, about three o'clock. Thank you. Thank you. I wrote that one myself. There are two types of estimators, point and interval. Both of these types use some statistic from the sample to calculate a value which helps us make inferences about the population parameter. Remember from the beginning of the video: population, parameter. Let's take the first one. A point estimator of a population parameter is a statistic which indicates how to calculate a single number using sample data. Now, this number is called a point estimate. A point estimate allows us to draw a conclusion about a parameter of the population. There's another type called interval estimators of a population parameter. Interval estimators allow us to calculate two numbers forming an interval within which the population parameter is believed to lie. This pair of numbers is called the interval estimate, also known as the confidence interval.
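Before we get to estimates, here's a quick simulation of the central limit theorem and the standard error at work. The population here, uniform on 0 to 10, is an assumption chosen for illustration because it is decidedly not bell-shaped.

```python
import random
import statistics

random.seed(0)
n = 30                                   # measurements per sample
num_samples = 10_000                     # how many samples we draw

population_mu = 5.0                      # mean of a uniform [0, 10] population
population_sigma = 10 / 12 ** 0.5        # about 2.89, from B - A over root 12

# Draw many samples and record each sample's mean.
sample_means = [
    statistics.mean(random.uniform(0, 10) for _ in range(n))
    for _ in range(num_samples)
]

print(round(statistics.mean(sample_means), 2))    # close to mu = 5
print(round(statistics.stdev(sample_means), 3))   # close to the standard error:
print(round(population_sigma / n ** 0.5, 3))      # sigma / sqrt(n), about 0.527
```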
Unlike the point estimator, which estimates the population parameter as a single value, the interval estimator indicates that the population parameter lies within a certain range. Also, statisticians use measures of accuracy in order to determine how true these estimators are. Such measures of accuracy are the error of estimation and the confidence coefficient. Let's start with the first one. The error of estimation is the distance from, or the difference between, the calculated estimate and the parameter we're estimating. The confidence coefficient, on the other hand, is the probability that the calculated confidence interval will enclose the parameter that we're estimating. Thus, if we're trying to estimate the mean of a population, we would use the mean of the sample as our estimator. Alright, thank you, thank you. Now I want to show you a little dance that was invented by my cousin Shecky. It's called the interval dance. Watch. See, I was dancing at intervals. Thank you, thank you, cousin Shecky. You helped out a lot with my act. It's the last time I ever give you that rent check. Alright, thank you, thank you. Remember that the standard deviation of the sampling distribution is calculated by dividing the standard deviation of the population by the square root of the number of measurements in the sample, n. This calculated value is useful in determining the error of estimation only if the number of measurements in each sample is greater than or equal to 30. n equals 30 is a standard in statistics for using the central limit theorem. Let's go back to five-card Charlie as an example. Remember that Charlie needed information about his population of fuzzy-dice cola drinkers. Suppose he takes a random sample of 50 measurements, which produces a mean of 871 ounces of cola consumed and a standard deviation of 21 ounces. Charlie wants to estimate the mean of the population, mu, using the mean of the sample, x-bar. He does this by defining a range called the error bound. For this example, the error bound is defined by plus or minus 1.96 multiplied by the quantity 21 divided by the square root of 50. This value is 5.82 ounces. If you recall, the central limit theorem says that x-bar is normally distributed. Here, we use the z-score of 1.96 to identify a 95% interval, because the z-score of 1.96 contains 47.5% of the measurements on either side of the mean. We are also using the standard error of x-bar. Therefore, we're 95% confident that the mean of the population is within plus or minus 5.82 ounces of our estimate of 871 ounces of cola consumed. In other words, we are 95% confident that the actual mean of the population is between 876.82 and 865.18. These two values are the upper confidence limit and the lower confidence limit, respectively. Excuse me. I was wondering if you'd like to come back with me to my cabin in Saskatchewan. Please. I've got a wood stove. Please. How you doing, baby? How about you and me take off to my cabin in Rio? Yes. Let's do it. As we said earlier, statisticians make inferences from the sample about the population. Hypothesis testing is one way to make inferences. Using this method, statisticians make a hypothesis and then need to test this hypothesis. In a statistical test, this hypothetical statement is given the name of the alternate hypothesis. In calculations like this, the alternate hypothesis is represented as H sub A. In order to test this hypothesis, there must be something to test it against.
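Hold that thought on hypotheses for a moment; here's a sketch of five-card Charlie's 95 percent confidence interval, exactly as computed above.

```python
import math

n = 50
x_bar = 871.0      # sample mean, ounces of cola consumed
s = 21.0           # sample standard deviation, ounces
z = 1.96           # z-score enclosing 95 percent of a normal distribution

error_bound = z * s / math.sqrt(n)       # about 5.82 ounces
lower = x_bar - error_bound              # lower confidence limit, about 865.18
upper = x_bar + error_bound              # upper confidence limit, about 876.82

print(round(error_bound, 2), round(lower, 2), round(upper, 2))
```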
Therefore, the exact opposite of the alternate hypothesis is created, and it's called the null hypothesis. It's shown in a similar way to the alternate hypothesis in calculations, as H sub 0. When set up properly, if the alternate hypothesis is true, then the null hypothesis is false. Basically, the null hypothesis and the alternate hypothesis must be written so that accepting one of them means rejecting the other. If we test the null hypothesis, the result of the test will determine whether to accept or reject it. In order to go ahead and test the null hypothesis, there must be a decision-making process. This process is based on what's called a test statistic. Remember that we're testing the null hypothesis. Therefore, if the test statistic lies in the rejection region, then we're accepting the alternate hypothesis, thereby rejecting the null hypothesis. If the test statistic lies in the acceptance region, we're accepting the null hypothesis, thereby rejecting the alternate hypothesis. There are two types of errors that can be made in the testing process, a Type I error and a Type II error. A Type I error is made by rejecting the null hypothesis when it's true. The probability of making a Type I error is denoted as lowercase alpha. A Type II error is made by accepting the null hypothesis when it's false. The probability of making a Type II error is denoted as lowercase beta. This table represents the decisions a statistician can make from a statistical test. If he or she rejects the null hypothesis and the null hypothesis is true, then a Type I error has been made. However, if the null hypothesis is false, then the correct decision has been made. If he or she accepts the null hypothesis and the null hypothesis is true, then a correct decision has been made. However, if the null hypothesis is false, a Type II error has been made. In these types of tests, the value which separates the rejection region from the acceptance region is called the critical value. In our case, the critical values would have z-scores of 1.96 and negative 1.96, because they are the boundaries between the acceptance region and the rejection region. Because we're still dealing with normal distributions, keep in mind we still have a symmetrical bell curve about the mean. Therefore, the rejection region can lie on either or both sides of the curve. Why is this important? Take a look at the graph. A statistical test that locates the rejection region on only one side of the curve is called a one-tailed test, while one that locates the rejection region on both sides of the curve is called a two-tailed test. It's easy to see why, because they really do look like tails. Let's go back to our example with five-card Charlie. Suppose we want to test the null hypothesis that mu is equal to 880 against the alternate hypothesis that mu is not 880. Remember, n equals 50, the mean is 871, and the standard deviation is 21. Also, we're 95% confident that the mean, mu, is between 865.18 and 876.82. Since this interval does not include the value 880 given by the null hypothesis, we're 95% confident that the null hypothesis is not true and should therefore be rejected. Usually, statisticians test a hypothesis using a test statistic, which is a z-score in our case. Oh, how many standard deviations is this z-score? Oh, I think it's six standard deviations, or maybe five. Here's how to calculate the test statistic for this test.
Since we are testing our hypothesis using the z-score distribution, our test statistic is a z-score which corresponds to the null hypothesis in our distribution of the means of the samples. Remember that the point estimate for mu is x-bar. Therefore, our test statistic is z equals the quantity x-bar minus mu sub zero, divided by the quantity sigma over the square root of n. Mu sub zero indicates the value of our null hypothesis. Here's what we get when we insert the values: z equals the quantity 871 minus 880, divided by 21 over the square root of 50. This is our test statistic, and it calculates to z equals negative 3.03. Therefore, we should reject the null hypothesis that mu equals 880. All right. With the foundation of large samples established, you should do just fine in the next section. Section C, small samples. Statistical tests for smaller samples and for larger samples are basically the same procedure, with one variation. The statistical test for smaller samples uses the t-statistic, which is similar to the z-score used with larger samples. The t-statistic is calculated by subtracting the mean of the population from the mean of the sample and dividing that by the quantity of the standard deviation of the sample divided by the square root of the sample size. In order to use the t-statistic, we also have to determine what's referred to as degrees of freedom. Degrees of freedom is equal to the sample size minus 1, or n minus 1. The degrees of freedom is used to adjust the t-statistic to the specific size of the small sample. The critical values for a test based on the t-statistic come from a t distribution. For a given t distribution and a given value alpha, we use t sub alpha as the critical value in our test. T sub alpha is such that the area to the right of that value on the distribution is equal to alpha, meaning the probability of a t-statistic taking on a value larger than t sub alpha is equal to alpha. There's a table in your statistics book which helps you easily determine certain critical values of the t distribution. The degrees of freedom are down the first column, and the values of alpha, the percentage of the area to the right of t, are along the top row. For example, if we wanted to find the critical value for t such that the degrees of freedom is 6 and 5% of the area under the curve is to t's right, then we look down the degrees of freedom column to 6 and along the top row for t sub .050, since we're looking for 5% of the area to be to the right of t. The corresponding value for t in this case is 1.943. We should note that for a one-tailed test, we find the value for t using t sub alpha along the top row. However, for a two-tailed test, the value for t is determined by using alpha divided by 2, because the critical region is split between both ends of the distribution on the curve. As an example, let's say for a sampling distribution we have n equal to 6, x-bar equal to .53, and s equal to .0559. We want to test the null hypothesis that mu equals .5 against the alternate hypothesis that mu is greater than .5, making this a one-tailed test. Using the formula for the t statistic with the null hypothesis, we find that t equals 1.32. Now we need to find the rejection region such that alpha equals .05. Looking at the table, we find degrees of freedom of n minus 1, or 5, and t sub .050. This value is 2.015. Since the t statistic of the null hypothesis does not fall in the rejection region, we must accept the null hypothesis, and thus we cannot accept the alternate hypothesis that mu is greater than .5.
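Here's a sketch putting both test statistics from these examples side by side, along with the decision rule of comparing each statistic against its critical value.

```python
import math

def z_statistic(x_bar, mu_0, sigma, n):
    # large-sample test statistic: z = (x-bar - mu_0) / (sigma / sqrt(n))
    return (x_bar - mu_0) / (sigma / math.sqrt(n))

def t_statistic(x_bar, mu_0, s, n):
    # small-sample version: same shape, but uses the sample standard
    # deviation s, and is compared against a t distribution with
    # n - 1 degrees of freedom
    return (x_bar - mu_0) / (s / math.sqrt(n))

# Charlie's two-tailed z test, critical values at plus or minus 1.96:
z = z_statistic(871, 880, 21, 50)
print(round(z, 2), abs(z) > 1.96)    # -3.03 True -> reject the null hypothesis

# The one-tailed small-sample t test, critical value 2.015 (5 df, alpha .05):
t = t_statistic(0.53, 0.5, 0.0559, 6)
print(round(t, 2), t > 2.015)        # about 1.31 (the video rounds to 1.32)
                                     # False -> accept the null hypothesis
```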
Guys, wake up. That's a test for smaller samples. As you can see, it's pretty much like a statistical test for large samples, except that you use the t statistic instead of the z score. You've come a long way. You now know how to test inferences about a population using sample statistics and sampling distributions, which are based on probability distributions, which are based on the models for discrete and continuous random variables, which of course come from our data sets, which is where this whole thing started. You've done a great job. Pat yourself on the back. Sometimes when you're studying your statistics, it's really helpful to sing a song or two or just take a little bit of a study break. I like to take a study break with a good cup of Joe, and I've got a song about it. Coffee, coffee, the wonderful drink. You can make it from the water in the sink. You drink it down and it wakes you up. It's made from beans and it comes in a cup. It's coffee, coffee, coffee, coffee. Thank you. Well, we hope you enjoyed watching this as much as we enjoyed making it. In this video, we covered everything from statistical problems and data sets to probability and distributions to sampling and sampling distributions. That's right. And make sure you look in your bookstore for more video reviews with the Standard Deviants. Bye. Bye, mom. Bye-bye. The stems of the data set in ascending numerical order. Sorry, did I say data or data? And to represent the data we found. Oh, but I said data. The stems of the data set. Data, data. Burp, Charlie, burp. Burp, Charlie. Why are you going towards me? This was a really good video. How long have we been studying? A long time. Hey, Steve, what did you think of that? Steve? Steve? Oh my God. Oh my God. I am Z-score. In N trials. In N trials. In M trials. In N trials. In M trials. In N... Action. How you... How you doing, baby? How about you and me go to my cabin in Rio? Yes! What was it doing? Thank you. Thank you.