Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data. In applying statistics to, e.g., a scientific, industrial, or societal problem, it is conventional to begin with a statistical population or a statistical model to be studied.
These are homework exercises to accompany the Textmap created for “Introductory Statistics” by Shafer and Zhang. Complementary General Chemistry question banks can be found for other Textmaps. In addition to these publicly available questions, access to a private problem bank for use in exams and homework is available to faculty only, on an individual basis; please contact Delmar Larsen for an account with access permission.
Descriptive Statistics
Statistics naturally divides into two branches, descriptive statistics and inferential statistics. Our main interest is in inferential statistics, which tries to infer from the data what the population might think, or to evaluate the probability that an observed difference between groups is dependable rather than one that might have happened by chance in this study. Nevertheless, the starting point for dealing with a collection of data is to organize, display, and summarize it effectively. These are the objectives of descriptive statistics, the topic of this chapter.
A well-known adage is that “a picture is worth a thousand words.” This saying proves true when it comes to presenting statistical information in a data set. There are many effective ways to present data graphically. The three graphical tools that are introduced in this section are among the most commonly used and are relevant to the subsequent presentation of the material in this book.
Measures of Central Location – Three Kinds of Averages
This section could be titled “three kinds of averages” because any kind of average can be used to answer the question “where is the center of the data?” We will see that the nature of the data set, as indicated by a relative frequency histogram, will determine what constitutes a good answer. Different shapes of the histogram call for different measures of central location.
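As a rough illustration of why the shape of the data matters, the following sketch computes the mean, median, and mode of a small right-skewed data set using Python's standard `statistics` module. The data values are hypothetical, chosen only to show how one large value pulls the mean away from the other two averages.

```python
import statistics

# Hypothetical right-skewed data: most values are small, one is very large.
data = [2, 3, 3, 3, 4, 5, 50]

mean_value = statistics.mean(data)      # sum of the values divided by their count
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequently occurring value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

In this made-up example the median and mode agree, while the single value 50 drags the mean well above both, so the median arguably gives a better picture of where most of the data lie.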
The Mean
The first measure of central location is the usual “average” that is familiar to everyone: add up all the values, then divide by the number of values. Before writing a formula for the mean let us introduce some handy mathematical notation.
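The verbal recipe above translates almost directly into code. The sketch below is a minimal illustration, not the text's own notation; the function name `mean` and the sample values are assumptions made for the example.

```python
def mean(values):
    """Add up all the values, then divide by the number of values."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Hypothetical sample of five measurements.
sample = [1, 2, 3, 4, 10]
sample_mean = mean(sample)
print(sample_mean)  # 4.0
```

Here the values add up to 20 and there are 5 of them, so the mean is 20 / 5 = 4.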
Measures of Variability
The two sets of ten measurements each center at the same value: both have the same mean, median, and mode. Nevertheless a glance at the figure shows that they are markedly different. In Data Set I the measurements vary only slightly from the center, while for Data Set II the measurements vary greatly. Just as we have attached numbers to a data set to locate its center, we now wish to associate to each data set numbers that measure quantitatively how the data either scatter away from the center or cluster close to it. These new quantities are called measures of variability, and we will discuss three of them.
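To make the idea concrete, the sketch below builds two hypothetical data sets of ten values each that share the same center, then computes three common measures of variability: the range, the sample variance, and the sample standard deviation. The data values are invented for illustration and are not those of the figure, and the choice of these three measures is an assumption, since the passage above does not name them.

```python
import statistics

# Hypothetical data sets: both have mean, median, and mode equal to 40.
data_set_1 = [38, 39, 39, 40, 40, 40, 40, 41, 41, 42]  # clusters near the center
data_set_2 = [10, 20, 30, 40, 40, 40, 40, 50, 60, 70]  # scatters far from the center


def variability(values):
    """Return (range, sample variance, sample standard deviation)."""
    value_range = max(values) - min(values)
    sample_variance = statistics.variance(values)  # divides by n - 1
    sample_std_dev = statistics.stdev(values)      # square root of the variance
    return value_range, sample_variance, sample_std_dev


range_1, var_1, std_1 = variability(data_set_1)
range_2, var_2, std_2 = variability(data_set_2)

print("Set 1:", range_1, round(var_1, 2), round(std_1, 2))
print("Set 2:", range_2, round(var_2, 2), round(std_2, 2))
```

All three numbers are far larger for the second set, even though the two sets are indistinguishable by their measures of central location.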