CHAPTER 1 Exploring Data
1.2 Displaying Quantitative Data with Graphs
Displaying Quantitative Data with Graphs
- MAKE and INTERPRET dotplots and stemplots of quantitative data
- DESCRIBE the overall pattern of a distribution and IDENTIFY any outliers
- IDENTIFY the shape of a distribution
- MAKE and INTERPRET histograms of quantitative data
- COMPARE distributions of quantitative data
Ways to chart quantitative data
- Histograms, stemplots, dotplots, and boxplots: These are summary graphs for a single variable. They are very useful for understanding the pattern of variability in the data.
- Line graphs (time plots): Use these when there is a meaningful sequence, such as time. The line connecting the points helps emphasize any change over time.
Dotplots
A dotplot is a simple display. It just places a dot along an axis for each case in the data. The dotplot to the right shows Kentucky Derby winning times, plotting each race as its own dot. You might see a dotplot displayed horizontally or vertically.
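To see how a dotplot is built, here is a minimal text-only sketch in Python. It assumes whole-number data and draws the axis vertically, one row per value, with one `*` per case. The function name `text_dotplot` and the sample values are hypothetical; they are not the Kentucky Derby times.

```python
from collections import Counter


def text_dotplot(data):
    """Return a vertical text dotplot: one row per whole-number value, one '*' per case."""
    counts = Counter(data)  # how many cases share each value
    rows = []
    for value in range(min(data), max(data) + 1):
        # Values with no cases still get a row, so gaps stay visible on the axis.
        rows.append((f"{value:>4} | " + "*" * counts[value]).rstrip())
    return "\n".join(rows)


# Hypothetical whole-number data, for illustration only.
print(text_dotplot([3, 5, 5, 6, 6, 6, 9]))
```

Keeping empty rows for values with no cases matters: it is what lets a dotplot show gaps and stragglers.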
Examining the Distribution of a Quantitative Variable
The purpose of a graph is to help us understand the data. After you make a graph, always ask, “What do I see?”
How to Examine the Distribution of a Quantitative Variable
In any graph, look for the overall pattern and for striking departures from that pattern.
- Describe the overall pattern of a distribution by its shape, center, and spread.
- Note individual values that fall outside the overall pattern. These departures are called outliers.
Don’t forget your SOCS! (Shape, Outliers, Center, Spread)
Describing Shape
When you describe a distribution’s shape, concentrate on the main features. Look for rough symmetry or clear skewness.
Definitions:
- A distribution is roughly symmetric if the right and left sides of the graph are approximately mirror images of each other.
- A distribution is skewed to the right (right-skewed) if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side.
- It is skewed to the left (left-skewed) if the left side of the graph is much longer than the right side.
[Figure: example distributions labeled Symmetric, Skewed-left, and Skewed-right]
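The definitions of skewness compare how far each side of a distribution stretches. A rough sketch of that comparison might measure each half from the median (the value that splits the observations into a lower and an upper half) out to the minimum and maximum. The function name `rough_shape` and the cutoff ratio of 1.5 for "much longer" are assumptions for illustration, not a standard rule. Because a single outlier can stretch one side, this is only a heuristic; always look at the graph.

```python
from statistics import median


def rough_shape(data, ratio=1.5):
    """Compare the lengths of the two halves of the data (hypothetical heuristic)."""
    center = median(data)  # splits the observations into a lower and an upper half
    left_length = center - min(data)
    right_length = max(data) - center
    # 'ratio' is an assumed cutoff for "much longer"; it is not a standard rule.
    if right_length > ratio * left_length:
        return "skewed right"
    if left_length > ratio * right_length:
        return "skewed left"
    return "roughly symmetric"


print(rough_shape([1, 2, 2, 3, 3, 3, 10, 20]))  # skewed right
```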
Examine these data
The table and dotplot below display the Environmental Protection Agency’s estimates of highway gas mileage in miles per gallon (MPG) for a sample of 24 model year 2009 midsize cars.
Describe the shape, center, and spread of the distribution. Are there any outliers?
Comparing Distributions
Some of the most interesting statistics questions involve comparing two or more groups. Whenever you compare distributions of a quantitative variable, always discuss shape, center, spread, and possible outliers, in context.
Example, page 30: Compare the distributions of household size for these two countries. Don’t forget your SOCS!
[Figure: household size distributions for the U.K. and South Africa]
Stemplots (Stem-and-Leaf Plots)
Another simple graphical display for small data sets is a stemplot. Stemplots give us a quick picture of the distribution while including the actual numerical values.
How to Make a Stemplot
1. Separate each observation into a stem (all but the final digit) and a leaf (the final digit).
2. Write all possible stems from the smallest to the largest in a vertical column and draw a vertical line to the right of the column.
3. Write each leaf in the row to the right of its stem.
4. Arrange the leaves in increasing order out from the stem.
5. Provide a key that explains in context what the stems and leaves represent.
Stemplots
These data represent the responses of 20 female AP Statistics students to the question, “How many pairs of shoes do you have?” Construct a stemplot.
50 26 26 31 57 19 24 22 23 38 13 50 13 34 23 30 49 13 15 51
Write the stems 1 through 5, add each leaf to the right of its stem, put the leaves in increasing order, and add a key:
1 | 3 3 3 5 9
2 | 2 3 3 4 6 6
3 | 0 1 4 8
4 | 9
5 | 0 0 1 7
Key: 4|9 represents a female student who reported having 49 pairs of shoes.
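The five steps for making a stemplot can be sketched in Python as follows. This minimal sketch assumes non-negative whole numbers with at most two digits, so the stem is the tens digit and the leaf is the ones digit; the function name `stemplot` is hypothetical. The `females` list holds the 20 female shoe responses, including repeated values.

```python
def stemplot(data, key):
    """Build a stemplot (stem = tens digit, leaf = ones digit) as a list of text rows."""
    leaves_by_stem = {}
    for x in data:
        stem, leaf = divmod(x, 10)  # step 1: split each value into stem and leaf
        leaves_by_stem.setdefault(stem, []).append(leaf)  # step 3: leaf beside its stem
    rows = []
    # Step 2: list every stem from smallest to largest, even stems with no leaves.
    for stem in range(min(data) // 10, max(data) // 10 + 1):
        leaves = sorted(leaves_by_stem.get(stem, []))  # step 4: order the leaves
        rows.append((f"{stem} | " + " ".join(str(leaf) for leaf in leaves)).rstrip())
    rows.append(f"Key: {key}")  # step 5: explain the stems and leaves in context
    return rows


females = [50, 26, 26, 31, 57, 19, 24, 22, 23, 38, 13, 50, 13, 34, 23, 30, 49, 13, 15, 51]
for row in stemplot(females, "4|9 represents a female student who reported having 49 pairs of shoes."):
    print(row)
```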
Stemplots
When data values are “bunched up,” we can get a better picture of the distribution by splitting stems. Two distributions of the same quantitative variable can be compared using a back-to-back stemplot with common stems.
Females: 50 26 26 31 57 19 24 22 23 38 13 50 13 34 23 30 49 13 15 51
Males: 14 7 6 5 12 38 8 7 10 10 10 11 4 5 22 7 5 10 35 7
Back-to-back stemplot with “split stems” (leaves 0-4 on the first line of each stem, leaves 5-9 on the second):
Females |   | Males
        | 0 | 4
        | 0 | 555677778
    333 | 1 | 0000124
     95 | 1 |
   4332 | 2 | 2
     66 | 2 |
    410 | 3 |
      8 | 3 | 58
        | 4 |
      9 | 4 |
    100 | 5 |
      7 | 5 |
Key: 4|9 represents a student who reported having 49 pairs of shoes.
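A back-to-back stemplot with split stems can be sketched the same way. In this minimal sketch (function names hypothetical; whole numbers under 100 assumed), each stem gets two rows, one for leaves 0-4 and one for leaves 5-9, and the left group's leaves are reversed so they increase moving away from the stem.

```python
def split_stem_rows(data):
    """Group leaves by (stem, half): half 0 holds leaves 0-4, half 1 holds leaves 5-9."""
    rows = {}
    for x in data:
        stem, leaf = divmod(x, 10)
        rows.setdefault((stem, leaf // 5), []).append(leaf)
    return {k: sorted(v) for k, v in rows.items()}


def back_to_back(left_data, right_data):
    """Back-to-back stemplot with split stems; left leaves grow outward to the left."""
    left, right = split_stem_rows(left_data), split_stem_rows(right_data)
    everything = left_data + right_data
    width = max((len(v) for v in left.values()), default=0)
    lines = []
    for stem in range(min(everything) // 10, max(everything) // 10 + 1):
        for half in (0, 1):
            # Reverse the left side so its leaves increase moving away from the stem.
            left_leaves = "".join(str(d) for d in reversed(left.get((stem, half), [])))
            right_leaves = "".join(str(d) for d in right.get((stem, half), []))
            lines.append(f"{left_leaves:>{width}} | {stem} | {right_leaves}".rstrip())
    return lines


females = [50, 26, 26, 31, 57, 19, 24, 22, 23, 38, 13, 50, 13, 34, 23, 30, 49, 13, 15, 51]
males = [14, 7, 6, 5, 12, 38, 8, 7, 10, 10, 10, 11, 4, 5, 22, 7, 5, 10, 35, 7]
print("\n".join(back_to_back(females, males)))
```

Splitting stems doubles the number of rows, which spreads out the males’ values that are bunched between 4 and 14.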
Stemplots versus histograms
Stemplots are quick-and-dirty histograms that can easily be made by hand, so they are very convenient for smaller data sets. However, they rarely appear in scientific or general-audience publications.
When might you NOT want to use a stemplot?
Displaying quantitative data: Histograms
Quantitative variables often take many values. A graph of the distribution may be clearer if nearby values are grouped together. The most common graph of the distribution of one quantitative variable is a histogram.
- Displays counts or percents
- Shows the shape of the distribution
- User defines the number of classes
- Good for large data sets
- Does not display actual data values
The bars have the same width and touch (the edges of the bars fall on the class boundaries), except where a class is empty. The horizontal axis represents a quantitative variable x, such as age, rather than a category. The height of each bar indicates the frequency of its class.
How to Make a Histogram
1. Divide the range of the data into classes of equal width.
2. Find the count (frequency) or percent (relative frequency) of individuals in each class.
3. Label and scale your axes and draw the histogram. The height of each bar equals its class’s frequency. Adjacent bars should touch, unless a class contains no individuals.
To find the class width, first compute:
class width = (largest value - smallest value) / (desired number of classes)
Then increase the result to the next whole number, even if it is already a whole number. This ensures the classes cover all the data: if the width were exactly the range divided by the number of classes, the largest value would land on the upper boundary of the last class and fall outside it.
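The class-width rule and the counting step can be sketched in Python. This is a minimal sketch with hypothetical function names; it assumes the first class starts at the smallest value and that each class includes its left boundary but not its right one. The data are the 20 female shoe responses.

```python
import math


def class_width(data, num_classes):
    """Slide rule: (largest - smallest) / classes, then move up to the next whole number."""
    raw = (max(data) - min(data)) / num_classes
    # floor(raw) + 1 rounds 8.8 up to 9 and also bumps an exact 9 up to 10.
    return math.floor(raw) + 1


def histogram_table(data, num_classes):
    """Return (width, class boundaries, counts) for equal-width classes starting at the minimum."""
    width = class_width(data, num_classes)
    lower = min(data)
    boundaries = [lower + i * width for i in range(num_classes + 1)]
    counts = [0] * num_classes
    for x in data:
        # Class i covers [lower + i*width, lower + (i+1)*width): left end in, right end out.
        counts[int((x - lower) // width)] += 1
    return width, boundaries, counts


females = [50, 26, 26, 31, 57, 19, 24, 22, 23, 38, 13, 50, 13, 34, 23, 30, 49, 13, 15, 51]
width, boundaries, counts = histogram_table(females, 5)
print(width, boundaries, counts)  # 9 [13, 22, 31, 40, 49, 58] [5, 7, 3, 0, 5]
```

Here the range is 57 - 13 = 44, and 44 / 5 = 8.8 rounds up to a width of 9. The empty class from 40 to 49 is exactly where adjacent bars would not touch.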
How to create a histogram
It is an iterative process: try and try again. What bin size should you use?
- Not so many bins that many of them have counts of 0 or 1
- Not so summarized that you lose all the information
- Not so detailed that it is no longer a summary
Rule of thumb: start with 5 to 10 bins, look at the distribution, and refine your bins. (There isn’t a unique or “perfect” solution.)
Using the TI-83 to make histograms
The TI-83 can be used to make histograms, and will allow you to change the location and widths of the ranges. Turn to page 36 in your textbook and follow the directions in the Technology Corner.
Using the TI-83 to make histograms
You can also change the size and location of the ranges by using the WINDOW button. Use the Xscl (x-scale) setting to change the number of classes: enter the CLASS WIDTH there. Press the GRAPH button to see the results.
When do we use the frequency key?
Suppose that the distribution of scores for a class on the AP test were:
SCORE   FREQUENCY
1       3
2       5
3       14
4       6
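When data arrive as a frequency table rather than as a list of individual values, the Freq setting in the TI-83’s STAT PLOT menu tells the calculator how many times to count each value. Conceptually, that is the same as expanding the table back into individual scores, as in this sketch. The helper name `expand_frequency_table` is hypothetical, and the table holds only the rows shown above.

```python
def expand_frequency_table(table):
    """Turn {value: frequency} into a plain list with each value repeated 'frequency' times."""
    data = []
    for value, frequency in sorted(table.items()):
        data.extend([value] * frequency)
    return data


ap_scores = {1: 3, 2: 5, 3: 14, 4: 6}  # SCORE: FREQUENCY rows from the table
scores = expand_frequency_table(ap_scores)
print(len(scores))  # 28 individual students
```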
Histogram Tips
- Be sure to choose classes that are all the same width.
- Use your judgment in choosing classes to display the shape.
- Too few classes will give a 'skyscraper' graph; too many will produce a 'pancake' graph.
[Figure: two histograms of the same data set, one labeled "Not summarized enough" and one labeled "Too summarized"]
Describing the Shape of a Histogram
Does the histogram have a single, central hump or several separated bumps? Humps in a histogram are called modes. A histogram with one main peak is dubbed unimodal; histograms with two peaks are bimodal; histograms with three or more peaks are called multimodal.
Humps and Bumps (cont.)
A histogram that doesn’t appear to have any mode and in which all the bars are approximately the same height is called uniform.
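The vocabulary for modes can be made concrete with a naive sketch that labels a list of bar heights. It counts every bar (or flat run of equal bars) taller than both neighbors as a peak, and it calls a histogram uniform when every bar is within 20% of the average height. Both rules, and all the function names, are assumptions for illustration. In practice you judge the main peaks by eye, and this sketch would also count small bumps.

```python
def count_peaks(counts):
    """Count bars (or flat runs of bars) that are taller than both neighbors."""
    heights = []
    for c in counts:
        if not heights or c != heights[-1]:  # merge runs of equal heights
            heights.append(c)
    peaks = 0
    for i, h in enumerate(heights):
        left = heights[i - 1] if i > 0 else float("-inf")
        right = heights[i + 1] if i < len(heights) - 1 else float("-inf")
        if h > left and h > right:
            peaks += 1
    return peaks


def looks_uniform(counts, tolerance=0.2):
    """Assumed rule: every bar is within 20% of the average bar height."""
    average = sum(counts) / len(counts)
    return all(abs(c - average) <= tolerance * average for c in counts)


def describe_modes(counts):
    if looks_uniform(counts):
        return "uniform"
    peaks = count_peaks(counts)
    if peaks == 1:
        return "unimodal"
    if peaks == 2:
        return "bimodal"
    return "multimodal"


print(describe_modes([2, 8, 2, 8, 2]))  # bimodal
```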
Most common distribution shapes
- Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.
- Skewed distribution: A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.
- Complex, multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.
Anything Unusual?
Don’t forget to note any unusual features in the shape of the distribution. Sometimes it’s the unusual features that tell us something interesting or exciting about the data.
- You should always mention any stragglers, or outliers, that stand off away from the body of the distribution.
- Are there any gaps in the distribution? If so, we might have data from more than one group.
Outliers
Always look for outliers and try to explain them.
For example: The overall pattern is fairly symmetric except for 2 states that clearly do not belong to the main trend. Alaska and Florida have unusual representation of the elderly in their populations. A large gap between a value and the rest of the distribution is often a sign of an outlier.
[Figure: histogram of the 50 states by percentage of residents aged 65 or over, with Alaska and Florida labeled]
This is from the book. Imagine you are doing a study of health care in the 50 US states and need to know how they differ in terms of their elderly population. This is a histogram of the number of states grouped by the percentage of their residents who are 65 or over. You can see there is one very small value and one very large value, each separated by a gap from the rest of the distribution. Values that fall outside the overall pattern are called outliers. They might be interesting, or they might be mistakes (I get those in my data from typos when entering RNA sequence data into the computer). They might only indicate that you need more samples. We will be paying a lot of attention to them throughout the class, both for what we can learn about biology and because they can cause trouble with your statistics. Guess which states they are (Florida and Alaska).
Histograms are similar to bar graphs, and so:
A relative frequency histogram displays the percentage of cases in each bin instead of the count. Relative frequency histograms are good for comparing distributions with unequal counts.
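Converting counts to relative frequencies is a single division per bin. The sketch below (function name and counts hypothetical) shows why this helps when group sizes differ: a group of 10 and a group of 100 with the same pattern give identical percents, so their relative frequency histograms have the same shape.

```python
def relative_frequencies(counts):
    """Convert class counts to percents of the total so groups of different sizes compare fairly."""
    total = sum(counts)
    return [100 * c / total for c in counts]


small_group = [2, 5, 3]     # hypothetical counts, n = 10
large_group = [20, 50, 30]  # hypothetical counts, n = 100
print(relative_frequencies(small_group))  # [20.0, 50.0, 30.0]
print(relative_frequencies(large_group))  # [20.0, 50.0, 30.0]
```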
Notice that the shape does not change when comparing frequency and relative frequency histograms.
[Figure: frequency and relative frequency histograms of the same data (AP Statistics, Section 1.1, Part 4)]
Using Histograms Wisely
Here are several cautions based on common mistakes students make when using histograms.
Cautions
- Although they are similar, don’t confuse histograms and bar graphs.
- Don’t use counts (in a frequency table) or percents (in a relative frequency table) as data.
- Use percents instead of counts on the vertical axis when comparing distributions with different numbers of observations.
- Just because a graph looks nice, it’s not necessarily a meaningful display of data.
Constructing Frequency Polygons
1. Make a frequency table that includes class midpoints and frequencies.
2. For each class, place a dot above the class midpoint at the height of the class frequency.
3. Put dots on the horizontal axis one class width to the left of the first class midpoint and one class width to the right of the last class midpoint.
4. Connect the dots with straight lines.
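The first three steps amount to computing a list of (x, y) dots; connecting them is left to whatever plotting tool you use. This minimal sketch (function name hypothetical) assumes each class midpoint is halfway between its lower and upper boundaries. The example uses five classes of width 9 starting at 13 with counts 5, 7, 3, 0, 5, which is what the 20 female shoe responses give with those classes.

```python
def frequency_polygon_points(first_lower_boundary, width, counts):
    """Return the (x, y) dots of a frequency polygon, including the two zero-height end dots."""
    midpoints = [first_lower_boundary + width * (i + 0.5) for i in range(len(counts))]
    points = [(midpoints[0] - width, 0)]       # one class width left of the first midpoint
    points += list(zip(midpoints, counts))      # one dot per class at its frequency
    points.append((midpoints[-1] + width, 0))  # one class width right of the last midpoint
    return points


print(frequency_polygon_points(13, 9, [5, 7, 3, 0, 5]))
# [(8.5, 0), (17.5, 5), (26.5, 7), (35.5, 3), (44.5, 0), (53.5, 5), (62.5, 0)]
```

The two zero-height end dots are what tie the polygon down to the horizontal axis, so its outline encloses the same area pattern as the histogram bars.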
[Figure: Frequency Polygon]
Line graphs: time plots
In a time plot, time always goes on the horizontal (x) axis. We describe time series by looking for an overall pattern and for striking deviations from that pattern. In a time series:
- A trend is a rise or fall that persists over time, despite small irregularities.
- A pattern that repeats itself at regular intervals of time is called seasonal variation.
Retail price of fresh oranges over time
Time is on the horizontal (x) axis. The variable of interest, here “retail price of fresh oranges,” goes on the vertical (y) axis.
This time plot shows a regular pattern of yearly variations. These are seasonal variations in fresh orange pricing, most likely due to similar seasonal variations in the production of fresh oranges.
There is also an overall upward trend in pricing over time. It could simply reflect inflation or a more fundamental change in this industry.
A time plot can be used to compare two or more data sets covering the same time period.
The pattern over time for the number of flu diagnoses closely resembles that for the number of deaths from the flu. Comparing the two series shows that about 8% to 10% of the people diagnosed that year died shortly afterward from complications of the flu.
Scales matter
How you stretch the axes and choose your scales can give a different impression.
A picture is worth a thousand words, BUT there is nothing like hard numbers. Look at the scales.
Section 1.2 Displaying Quantitative Data with Graphs
Summary
In this section, we learned that…
- You can use a dotplot, stemplot, or histogram to show the distribution of a quantitative variable.
- When examining any graph, look for an overall pattern and for notable departures from that pattern. Describe the shape, center, spread, and any outliers. Don’t forget your SOCS!
- Some distributions have simple shapes, such as symmetric or skewed. The number of modes (major peaks) is another aspect of overall shape.
- When comparing distributions, be sure to discuss shape, center, spread, and possible outliers.
- Histograms are for quantitative data; bar graphs are for categorical data. Use relative frequency histograms when comparing data sets of different sizes.
Looking Ahead…
In the next section, we’ll learn how to describe quantitative data with numbers:
- Mean and Standard Deviation
- Median and Interquartile Range
- Five-number Summary and Boxplots
- Identifying Outliers
We’ll also learn how to calculate numerical summaries with technology and how to choose appropriate measures of center and spread.