1 Statistics 200 Lecture #3
Tuesday, August 30, 2016
Textbook: Sections 2.4 through 2.6
Objectives (all relating to quantitative variables):
• Recognize and interpret two plots:
– Histogram
– Boxplot
• Calculate and interpret two measures of center:
– Mean
– Median
• Calculate and interpret the five-number summary
• Recognize and understand the effects of outliers and skewness
2 Motivating example A group of students was randomly assigned to one of two classes. One class was taught by teacher A and the other by teacher B. At the end of the semester, all students took the same exam. Investigate whether there is any difference in exam scores between the two teachers.
3 Summarizing Quantitative Variables
The distribution of a quantitative variable is the overall pattern of how often the possible values occur. Four key aspects of the distribution are:
• Location: center, average
• Spread: variability
• Shape: symmetric, bell-shaped, skewed
• Outliers
Let's begin with the shape, which is best seen with a visual summary.
4 Visual summaries for quantitative variables: Histogram and Boxplot
Histogram: a chart of the data that shows how many observations fall in each equally spaced interval.
• Usually uses 6 to 15 intervals
• Can show frequency or relative frequency
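The counting behind a histogram can be sketched in a few lines. This is a minimal illustration, not the course software: the function name `histogram_counts` and the exam scores are hypothetical, and the sketch assumes every value lies between `low` and `high`.

```python
def histogram_counts(values, low, high, n_bins):
    """Count observations in n_bins equally spaced intervals on [low, high]."""
    if any(v < low or v > high for v in values):
        raise ValueError("all values must lie between low and high")
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # A value equal to `high` is placed in the last interval.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical exam scores (not the lecture's data).
scores = [52, 61, 64, 68, 70, 71, 73, 75, 75, 78, 80, 82, 85, 88, 94]

# Frequency: number of observations per interval (6 intervals of width 10).
counts = histogram_counts(scores, low=40, high=100, n_bins=6)

# Relative frequency: proportion of observations per interval.
relative = [c / len(scores) for c in counts]

print(counts)
print([round(r, 2) for r in relative])
```

Plotting `counts` gives a frequency histogram; plotting `relative` gives the same shape with proportions on the vertical axis.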
5 Histograms [Figure: histograms of exam scores; panels: Teacher A Scores, Teacher B Scores]
6 Outlier: an individual value that is unusual compared to the bulk of the other values. [Figure annotation: Outlier!]
7 Example
When considering study hours/week, what percent of the students spend:
• at most 3 hours?
• at least 11 hours?
• between 5 and 9 hours?
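Questions like these reduce to counting the observations that meet a condition and dividing by the total. The sketch below assumes a hypothetical list of study hours (the lecture's data are shown only in a histogram) and a hypothetical helper named `percent`.

```python
def percent(values, condition):
    """Percent of observations for which condition(value) is true."""
    matching = sum(1 for v in values if condition(v))
    return 100 * matching / len(values)


# Hypothetical study hours per week for 20 students (not the lecture's data).
study_hours = [1, 2, 3, 3, 4, 5, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 6, 7, 8, 20]

at_most_3 = percent(study_hours, lambda h: h <= 3)
at_least_11 = percent(study_hours, lambda h: h >= 11)
between_5_and_9 = percent(study_hours, lambda h: 5 <= h <= 9)

print(at_most_3, at_least_11, between_5_and_9)
```

Note that "at most" and "at least" include the endpoint, and "between 5 and 9" is treated here as including both 5 and 9.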
8 Shapes of distributions
• Symmetric: the shape of the data is similar on both sides of the center. Bell-shaped is a special case of symmetric.
• Skewed: values are more spread out on one side than the other.
– Left-skewed: lower values are more spread out than higher values.
– Right-skewed: higher values are more spread out than lower values.
9 Shape Examples: Symmetric
Question: What is the fastest you have ever driven a car?
10 Shape Examples: Right-skewed and Left-skewed
Right-skewed. Question: How many coins are you carrying?
Left-skewed. Question: What is your grade point average?
11 Breakdown of Descriptive Statistical Methods: Quantitative Data
• Graphs (did one: histogram)
• Numbers (statistics): measures of center (do now)
12 Quantitative Data: Measures of Center
• Mean: the average of all the numbers. The symbol for the sample mean is x̄. Its value is sensitive to outliers.
• Median: the middle observation of the ordered data. Its value is resistant to outliers.
• Mode: the observation that occurs most frequently. We don't really use it in this course.
13 Example: Center and outliers
Sample 1 (n = 5): Mean = ( )/5 = 32/5 = 6.4; Ordered data / Median: Median = ____
Sample 2 (n = 6): Mean = ( )/5 = 55/5 = 11; Ordered data / Median: Median = ____
14 Sensitive vs. Resistant statistics
Sensitive statistic: calculated using ALL observations; affected by skewness and/or unusual observations. Example: mean.
Resistant statistic: calculated using only some observations; not affected much by outliers. Example: median.
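A short computation shows the difference between a sensitive and a resistant statistic. The two samples below are hypothetical: they were chosen so that their sums are 32 and 55 (the totals that appear in the example slide), and they differ only in the largest value.

```python
import statistics

# Hypothetical samples; only the largest observation differs.
sample_without_outlier = [2, 4, 6, 8, 12]   # sum = 32
sample_with_outlier = [2, 4, 6, 8, 35]      # sum = 55

mean_1 = statistics.mean(sample_without_outlier)
mean_2 = statistics.mean(sample_with_outlier)

# statistics.median sorts the data and returns the middle observation.
median_1 = statistics.median(sample_without_outlier)
median_2 = statistics.median(sample_with_outlier)

print("means:  ", mean_1, mean_2)      # the mean moves with the outlier
print("medians:", median_1, median_2)  # the median does not
```

Replacing 12 with 35 pulls the mean from 6.4 up to 11, while the median stays at 6, because the median depends only on the middle of the ordered data.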
15 Examples:
Fastest speed: mean = 94.8 mph, median = 95 mph
Coins carried: mean = 17.3 coins, median = 9 coins
16 Work together question: Which is most likely true when considering salaries ($) in a company that employs:
1. 20 factory workers and 2 very highly paid executives: one would find with the salaries that the:
• mean > median
• mean < median
• mean ≈ median
2. 2 factory workers and 20 very highly paid executives: one would find with the salaries that the: (same three choices)
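One way to check an answer to a question like this is to try it with made-up numbers. The salaries below are purely hypothetical (40,000 for each factory worker, 500,000 for each executive); only the ordering of mean and median matters.

```python
import statistics

FACTORY_SALARY = 40_000      # hypothetical value
EXECUTIVE_SALARY = 500_000   # hypothetical value

# Company 1: 20 factory workers and 2 executives (a few very high values).
company_1 = [FACTORY_SALARY] * 20 + [EXECUTIVE_SALARY] * 2

# Company 2: 2 factory workers and 20 executives (a few very low values).
company_2 = [FACTORY_SALARY] * 2 + [EXECUTIVE_SALARY] * 20

for name, salaries in [("company 1", company_1), ("company 2", company_2)]:
    mean = statistics.mean(salaries)
    median = statistics.median(salaries)
    print(f"{name}: mean = {mean:,.0f}, median = {median:,.0f}")
```

In this sketch the few high salaries in company 1 pull the mean above the median, and the few low salaries in company 2 pull the mean below the median.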
18 Percentiles
A percentile tells us how much of the data is below a specific value.
What is the value (in study hours/week) for the:
• 5th percentile?
• 90th percentile?
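The two directions of the percentile idea (value to percent, and percent to value) can be sketched as follows. Conventions differ between textbooks and software; this sketch assumes "at or below" counting and the nearest-rank method, and the data and function names are hypothetical.

```python
import math


def percent_at_or_below(values, x):
    """Percent of observations that are less than or equal to x."""
    return 100 * sum(1 for v in values if v <= x) / len(values)


def nearest_rank_percentile(values, p):
    """Smallest observation with at least p percent of the data at or below it.

    Nearest-rank method; statistical software often interpolates instead,
    so results can differ slightly.
    """
    data = sorted(values)
    rank = max(1, math.ceil(p * len(data) / 100))
    return data[rank - 1]


# Hypothetical study hours per week: 1, 2, ..., 20 (not the lecture's data).
hours = list(range(1, 21))

p5 = nearest_rank_percentile(hours, 5)
p90 = nearest_rank_percentile(hours, 90)
print(p5, p90, percent_at_or_below(hours, p90))
```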
19 Percentiles of Interest
• 25th percentile: Lower Quartile (QL) or First Quartile (Q1)
• 50th percentile: Second Quartile (Q2) or Median
• 75th percentile: Upper Quartile (QU) or Third Quartile (Q3)
20 We use quartiles for the Five-Number Summary
A numerical method for summarizing quantitative data:
• smallest number (min)
• lower or first quartile
• median
• upper or third quartile
• largest number (max)
21 Example: 5-Number summary
Descriptive Statistics: Fastest_Speed
Variable N Minimum Q1 Median Q3 Maximum
Fastest_Speed
Fill in the five-number summary:
Min | Q1 (25th) | Median (50th) | Q3 (75th) | Max
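A five-number summary can be computed by hand-style rules in a few lines. This sketch assumes one common textbook convention (Q1 and Q3 are the medians of the lower and upper halves of the ordered data); statistical packages may use slightly different quartile rules. The speeds are hypothetical values constructed for illustration, not the class survey data.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two observations.

    Q1 and Q3 are taken as the medians of the lower and upper halves
    of the ordered data (the middle value is excluded when n is odd).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + (n % 2):]
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )


# Hypothetical fastest speeds in mph (not the lecture's data).
speeds = [45, 85, 90, 90, 95, 95, 100, 100, 110, 135]
print(five_number_summary(speeds))
```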
22 Another look: 5-number summary
The 5-number summary divides your data into 4 quarters, each holding about 25% of the observations.
23 Approximately what percent of the fastest speeds:
Min Q1 Median Q3 Max
45 90 95 100 135
• are at least 100 mph?
• are at most 90 mph?
24 Approximately what percent of the fastest speeds lie:
Min Q1 Median Q3 Max
45 90 95 100 135
• between 90 and 100 mph?
• (at most 95) or (at least 100)?
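Because each gap between neighboring values of the five-number summary holds roughly 25% of the data, these questions come down to counting quarters. The helper below is a hypothetical illustration that only accepts endpoints appearing in the summary.

```python
# Five-number summary from the slide: Min, Q1, Median, Q3, Max (mph).
summary = [45, 90, 95, 100, 135]


def approx_percent_between(summary, low, high):
    """Approximate percent of data between two values of the summary.

    Each step from one summary value to the next covers about 25%.
    """
    quarters = summary.index(high) - summary.index(low)
    return 25 * quarters


at_least_100 = approx_percent_between(summary, 100, 135)
at_most_90 = approx_percent_between(summary, 45, 90)
between_90_and_100 = approx_percent_between(summary, 90, 100)
# "At most 95" and "at least 100" do not overlap, so the percents add.
at_most_95_or_at_least_100 = (
    approx_percent_between(summary, 45, 95)
    + approx_percent_between(summary, 100, 135)
)

print(at_least_100, at_most_90, between_90_and_100, at_most_95_or_at_least_100)
```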
25 Visual summaries for quantitative variables: Histogram and Boxplot
Histogram: a chart of the data that shows how many observations fall in each equally spaced interval.
• Usually uses 6 to 15 intervals
• Can show frequency or relative frequency
Boxplot: a visualization of the 5-number summary.
• Shows Q1, median, and Q3 as lines around and through a middle box
• Identifies outliers
26 Boxplots: Examples
Fastest speed: Max 135 mph, Q3 100 mph, Median 95 mph, Q1 90 mph, Min 45 mph
Coins carried: Max 80 coins, Q3 25 coins, Median 9 coins, Q1 5 coins, Min 0 coins
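The slides do not say how a boxplot decides which points to mark as outliers. A widely used convention, assumed here and not taken from the lecture, flags any value more than 1.5 times the interquartile range (IQR = Q3 − Q1) beyond the quartiles. Applied to the quartiles shown on the slide:

```python
def outlier_fences(q1, q3):
    """Lower and upper cutoffs under the 1.5 x IQR convention (an assumption)."""
    iqr = q3 - q1
    return (q1 - 1.5 * iqr, q3 + 1.5 * iqr)


# Quartiles read from the boxplot examples.
speed_fences = outlier_fences(q1=90, q3=100)   # fastest speed, mph
coin_fences = outlier_fences(q1=5, q3=25)      # coins carried

print("speed fences:", speed_fences)
print("coin fences: ", coin_fences)

# Under this convention the maximum of 80 coins lies above the upper fence,
# and the maximum speed of 135 mph lies above its upper fence as well.
print(80 > coin_fences[1], 135 > speed_fences[1])
```

Other software or textbooks may use a different cutoff, so the points flagged in a given boxplot can vary.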
27 Boxplot shows same shape as histogram: Symmetric
28 Boxplot shows same shape as histogram: Right-skewed
29 Boxplot shows same shape as histogram: Left-skewed
30 Link measures of center to shape
31 Another example: Parties per month [Figure annotation: Outliers!]
32 Parties per month, without the outliers
33 Median: 50% of students surveyed partied fewer than 4.5 times per month.
Right-skewed: mean > median
34 Consider the variables Party and Year
Party: How many parties do you attend in a month? (Response)
Year: What year are you in school? (Explanatory)
35 Consider the variables Party and Year
Party: How many parties do you attend in a month? (Quantitative)
Year: What year are you in school? (Categorical, ordinal)
36 Explore relationship with boxplot
37 Which year has the highest median? The largest box? The most outliers? Do we observe a trend?
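Side-by-side boxplots compare one summary per group, and the same comparison can be made numerically. The sketch below uses entirely hypothetical party counts for each year (the survey data are not in the slides), so the pattern it shows is illustrative only.

```python
import statistics

# Hypothetical parties per month, grouped by year in school.
parties_by_year = {
    "First-year": [2, 4, 5, 8, 10],
    "Sophomore": [1, 3, 4, 6, 9],
    "Junior": [0, 2, 4, 5, 7],
    "Senior": [0, 1, 3, 4, 6],
}

# Median of the response (Party) within each level of the explanatory variable (Year).
medians = {year: statistics.median(counts) for year, counts in parties_by_year.items()}

highest_year = max(medians, key=medians.get)

for year, median in medians.items():
    print(f"{year}: median = {median}")
print("highest median:", highest_year)
```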
38 Review: If you understood today's lecture, you should be able to solve
Objectives (all relating to quantitative variables):
• Recognize and interpret two plots:
– Histogram
– Boxplot
• Calculate and interpret two measures of center:
– Mean
– Median
• Calculate and interpret the five-number summary
• Recognize and understand the effects of outliers and skewness