Stat 31, Section 1, Last Time Linear transformations

Author: Todd Small

1 Stat 31, Section 1. Last Time: Linear transformations; Standardization (subtract mean / divide by SD); 5 Number Summary & Outlier Rule; Modelling distributions: density; Normal distributions: Density, Interpretation, Computation

2 Computation of Normal Areas. Classical Approach: Tables. See inside covers of text. Summarizes area computations, needed because the areas can’t be found by calculus (the Normal density has no closed-form antiderivative). Constructed by “computers” (a job description in the early 1900s!)

3 Computation of Normal Areas. EXCEL Computation: works in terms of “lower areas”, e.g. the area below the cutoff 1.3

4 Computation of Normal Areas. Interactive Version (used for above pic). From Webster West’s Website:

5 Computation of Normal Areas. EXCEL Computation (of above e.g.): Enter parameters; x is “cutoff point”; Return is Area below x
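
The same “lower area” idea can be sketched outside of Excel. The slide does not name the Excel function or the parameters behind the picture, so this minimal Python sketch assumes a standard Normal curve (mean 0, SD 1) and the cutoff 1.3; `lower_area` is a hypothetical helper name. In Excel the corresponding call would be NORMDIST with its cumulative flag set to TRUE.

```python
from statistics import NormalDist


def lower_area(x, mean=0.0, sd=1.0):
    """Area under the Normal(mean, sd) density to the left of the cutoff x."""
    return NormalDist(mu=mean, sigma=sd).cdf(x)


# Assumed example: standard Normal curve, cutoff point x = 1.3.
area = lower_area(1.3)
print(f"Area below 1.3: {area:.4f}")
```

Passing a different mean and SD plays the role of “Enter parameters” on the slide.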

6 Computation of Normal Areas. Computation of areas over intervals (use subtraction): Area between a and b = (Area below b) - (Area below a)

7 Computation of Normal Areas. Computation of areas over intervals (use subtraction for EXCEL too). E.g. Use Excel to check the 68-95-99.7% Rule: https://www.unc.edu/~marron/UNCstat /Stat31Eg10.xls
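
A minimal sketch of the subtraction idea in Python, assuming the rule being checked is the familiar 68-95-99.7 rule for areas within 1, 2 and 3 SDs of the mean. The helper name `area_between` is hypothetical and is not taken from the class spreadsheet.

```python
from statistics import NormalDist


def area_between(a, b, mean=0.0, sd=1.0):
    """Area over the interval (a, b): lower area at b minus lower area at a."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(b) - dist.cdf(a)


# Areas within k standard deviations of the mean on the standard Normal curve.
for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {area_between(-k, k):.2%}")
```

Because the rule is stated in SD units, the same percentages come out for any mean and SD once the interval is written as mean ± k·SD.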

8 Normal Area HW. HW: 1.89, 1.92 (Hint: the % above 500 = 100% - % below 500), 1.97 (50%, 9.18%, 0.38%, 40.82%). Caution: Don’t just “twiddle EXCEL until answer appears”. Understand it!!!

9 Inverse of Area Function. Inverse of Frequencies: “Quantiles”. Idea: Given area, find “cutoff” x, e.g. for Area = 80%. This x is the “quantile”

10 Inverse of Area Function. EXCEL Computation of Quantiles: Use NORMINV. Continue Class Example: https://www.unc.edu/~marron/UNCstat /Stat31Eg10.xls “Probability” is “Area”. Enter mean and SD parameters
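
To see that the quantile really is the inverse of the area function, here is a small Python sketch, assuming a standard Normal curve and the 80% area used as the example. `quantile` is a hypothetical helper that plays the role of Excel's NORMINV(probability, mean, standard_dev).

```python
from statistics import NormalDist


def quantile(area, mean=0.0, sd=1.0):
    """Cutoff x such that the area below x under Normal(mean, sd) equals `area`."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(area)


x = quantile(0.80)
print(f"80% quantile of the standard Normal: {x:.4f}")

# Round trip: feeding the quantile back into the area function recovers 80%.
print(f"Area below that cutoff: {NormalDist().cdf(x):.2%}")
```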

11 Inverse Area Example. When a machine works normally, it fills bottles with mean = 25 oz and SD = 0.2 oz. The machine is “out of control” when it overfills. Choose an “alarm level” which will give only 1% false alarms. Want: cutoff, x, so that Area above = 1%. Note: Area below = 100% - Area above = 99%. https://www.unc.edu/~marron/UNCstat /Stat31Eg10.xls
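
The bottle example can be worked through in a few lines of Python. This is only a sketch of the calculation described on the slide (mean 25 oz, SD 0.2 oz, 1% false alarms), not a copy of the class spreadsheet; the variable names are illustrative. The Excel equivalent would be NORMINV(0.99, 25, 0.2).

```python
from statistics import NormalDist

# Fill amounts when the machine works normally: mean 25 oz, SD 0.2 oz.
fill = NormalDist(mu=25.0, sigma=0.2)

# Want Area above the cutoff = 1%, so Area below = 100% - 1% = 99%.
false_alarm_rate = 0.01
alarm_level = fill.inv_cdf(1.0 - false_alarm_rate)

print(f"Alarm level: {alarm_level:.3f} oz")
print(f"Area above alarm level: {1.0 - fill.cdf(alarm_level):.2%}")
```

The key step is converting the upper area (1%) to a lower area (99%) before asking for the quantile, since the quantile function works in terms of lower areas.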

12 Inverse Area HW. 1.105 (-0.675, 0.675, 89.9, 110.1, 1.35, 0.7%), 1.107

13 Normal Diagnostic. When is the Normal Model “good”? Useful Graphical Device: Q-Q plot = Normal Quantile Plot. Won’t devote class time: useful info in text about this
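
For readers who want to see what goes into a Normal Quantile Plot, here is a rough Python sketch. It assumes the common plotting positions (i - 0.5)/n, which is one convention among several and may differ from the text's, and the sample values are made up for illustration. Points that fall close to a straight line suggest the Normal model is reasonable.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Pair each sorted observation with the standard Normal quantile for its rank."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n stay strictly between 0 and 1.
    return [
        (std_normal.inv_cdf((i - 0.5) / n), value)
        for i, value in enumerate(ordered, start=1)
    ]


# Hypothetical sample (not class data): fill amounts in oz.
sample = [25.1, 24.7, 25.3, 24.9, 25.0]
for z, value in normal_quantile_points(sample):
    print(f"Normal quantile {z:+.3f}   data value {value}")
```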

14 Variable Relationships. Chapter 2 in Text. Idea: Look beyond single quantities, to how quantities relate to each other. E.g. How do HW scores “relate” to Exam scores? Section 2.1: Useful graphical device: Scatterplot

15 Recall Scatterplot E.g. Toy Example: (1,2) (3,1) (-1,0) (2,-1)
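
Since the slide's picture is not reproduced here, the toy example can be redrawn as a crude text scatterplot. This is only an illustrative sketch: `text_scatterplot` is a hypothetical helper, it assumes integer coordinates, and a spreadsheet chart (as used in class) would normally do this job.

```python
def text_scatterplot(points):
    """Return a crude text scatterplot of integer (x, y) points, one '*' per point."""
    marked = set(points)
    xs = range(min(x for x, _ in points), max(x for x, _ in points) + 1)
    # Largest y first, so that "up" on the page means larger y.
    ys = range(max(y for _, y in points), min(y for _, y in points) - 1, -1)
    rows = []
    for y in ys:
        cells = "".join(("*" if (x, y) in marked else ".").rjust(3) for x in xs)
        rows.append(f"{y:>3} |{cells}")
    rows.append("    +" + "---" * len(xs))           # horizontal axis
    rows.append("     " + "".join(f"{x:>3}" for x in xs))  # x tick labels
    return "\n".join(rows)


toy = [(1, 2), (3, 1), (-1, 0), (2, -1)]
print(text_scatterplot(toy))
```

Each pair is plotted with its first coordinate on the horizontal axis and its second on the vertical axis.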

16 Scatterplot E.g. Data from related Intro. Stat. Class (actual scores). How does HW score predict Final Exam? x = HW, y = Final Exam. https://www.unc.edu/~marron/UNCstat /Stat31Eg11.xls In top half of HW scores: Better HW → Better Final. For lower HW: Final is much more “random”

17 Scatterplots. Common Terminology: When thinking about “X causes Y”, call X the “Explanatory Var.” or “Indep. Var.”, and call Y the “Response Var.” or “Dep. Var.” (think of “Y as function of X”) (although not always sensible)

18 Scatterplots. Note: Sometimes think about causation; other times: “Explore Relationship”. HW:

19 Class Scores Scatterplots. https://www.unc.edu/~marron/UNCstat /Stat31Eg11.xls How does HW predict Midterm 1? x = HW, y = MT1. Still better HW → better Exam. But for each HW, wider range of MT1 scores, i.e. HW doesn’t predict MT1 as well as Final. “Outliers” in scatterplot may not be outliers in either individual variable, e.g. HW = 72, MT1 = 94 (bad HW, but good MT1?, fluke???)

20 Class Scores Scatterplots. https://www.unc.edu/~marron/UNCstat /Stat31Eg11.xls How does MT1 predict MT2? x = MT1, y = MT2. Idea: less “causation”, more “exploration”. Still higher MT1 associated with higher MT2. For each MT1, wider range of MT2, i.e. “not good predictor”. Interesting Outliers: MT1 = 100, MT2 = 56 (oops!); MT1 = 23, MT2 = 74 (woke up!)

21 Important Aspects of Relations. I. Form of Relationship; II. Direction of Relationship; III. Strength of Relationship

22 I. Form of Relationship. Linear: Data approximately follow a line. Previous Class Scores Example: https://www.unc.edu/~marron/UNCstat /Stat31Eg11.xls Final vs. High values of HW is “best”. Nonlinear: Data follow a different pattern. Nice Example: Bralower’s Fossil Data https://www.unc.edu/~marron/UNCstat /Stat31Eg12.xls

23 Bralower’s Fossil Data. https://www.unc.edu/~marron/UNCstat /Stat31Eg12.xls From T. Bralower, formerly of Geological Sci. Studies Global Climate, millions of years ago: Ratios of Isotopes of Strontium. Reflects Ice Ages, via Sea Level (50 meter difference!). As function of time: Clearly nonlinear relationship

24 II. Direction of Relationship. Positive Association: X bigger → Y bigger. Negative Association: X bigger → Y smaller. E.g. X = alcohol consumption, Y = Driving Ability: Clear negative association

25 III. Strength of Relationship. Idea: How close are points to lying on a line? Revisit Class Scores Example: https://www.unc.edu/~marron/UNCstat /Stat31Eg11.xls Final Exam is “closely related to HW”. Midterm 1 less closely related to HW. Midterm 2 even less closely related to Midterm 1

26 Linear Relationship HW: 2.3, 2.5, 2.7, 2.9