Confidence Intervals: Bootstrap Distribution

Author: Beverly Powell

1 Confidence Intervals: Bootstrap Distribution. STAT 250, Dr. Kari Lock Morgan. Sections 3.3, 3.4: Bootstrap distribution (3.3); 95% CI using standard error (3.3); Percentile method (3.4)

2 Question of the Day What is the average mercury level of fish (Large Mouth Bass) in Florida lakes?

3 Mercury Levels in Fish Lange, T., Royals, H. and Connor, L. (2004). Mercury accumulation in largemouth bass (Micropterus salmoides) in a Florida Lake. Archives of Environmental Contamination and Toxicology, 27(4),

4 Mercury in Fish The sample mean is 0.527 ppm. In the US, the FDA action level is 1 ppm: is the average safely below the US limit? In Canada, the safety limit is 0.5 ppm: is the average clearly above the Canadian limit? We need a confidence interval!

6 Confidence Intervals [Diagram: population → many samples → calculate the statistic for each sample → sampling distribution] Confidence interval: statistic ± ME. Margin of Error (ME): for a 95% CI, ME = 2×SE. Standard Error (SE): standard deviation of the sampling distribution.

7 Reality One small problem… WE ONLY HAVE ONE SAMPLE!!!! How do we know how much sample statistics vary, if we only have one sample?!?

8 Assessing Uncertainty [Diagram: population (unknown) → one sample → best guess at the population → many simulated samples → calculate the statistic for each sample → distribution of the statistic] GOAL: statistic ± ME. Margin of Error (ME): for a 95% CI, ME = 2×SE. Standard Error (SE): standard deviation of the statistic.

9 Simulating Samples What is our best guess at the population, given sample data? Draw samples of the same sample size repeatedly from the sample data; this is known as bootstrapping. Simulate many bootstrap samples, calculate the statistic for each, and find the SE as the standard deviation of these statistics.
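
The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's StatKey workflow: the function name `bootstrap_statistics`, the seed, and the data values are all assumptions made up for the example (the data are not the Florida mercury measurements).

```python
import numpy as np

def bootstrap_statistics(sample, statistic=np.mean, n_boot=1000, seed=0):
    """Return n_boot bootstrap statistics computed from resamples of `sample`."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = len(sample)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        # Resample WITH replacement, keeping the original sample size n.
        resample = rng.choice(sample, size=n, replace=True)
        stats[i] = statistic(resample)
    return stats

# Hypothetical data, for illustration only.
sample = [0.21, 0.34, 0.45, 0.48, 0.52, 0.61, 0.73, 0.87, 1.10, 1.23]
boot_means = bootstrap_statistics(sample)

# SE = standard deviation of the bootstrap statistics.
se = boot_means.std(ddof=1)
print(f"sample mean = {np.mean(sample):.3f}, bootstrap SE = {se:.3f}")
```

The exact SE printed depends on the seed and on the number of bootstrap samples, so it will differ slightly from run to run if the seed is changed.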

10 Reality One small problem… WE ONLY HAVE ONE SAMPLE!!!! How do we know how much sample statistics vary, if we only have one sample?!? BOOTSTRAP!

11 Bootstrap Sample Sample with replacement from the original sample, using the same sample size. Remember: sample size matters! [Figure: an original sample next to a bootstrap sample]

12 Bootstrap Sample Your original sample has data values 18, 19, 19, 20, 21. Is the following a possible bootstrap sample? 18, 19, 20, 21, 22 (a) Yes (b) No

13 Bootstrap Sample Your original sample has data values 18, 19, 19, 20, 21. Is the following a possible bootstrap sample? 18, 19, 20, 21 (a) Yes (b) No

14 Bootstrap Sample Your original sample has data values 18, 19, 19, 20, 21. Is the following a possible bootstrap sample? 18, 18, 19, 20, 21 (a) Yes (b) No
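
One way to reason through the three questions above is to turn the definition of a bootstrap sample into a check: same size as the original, and every value drawn from the original. The helper name `is_possible_bootstrap_sample` is an assumption for this sketch.

```python
def is_possible_bootstrap_sample(original, candidate):
    """A bootstrap sample has the original's size and uses only its values."""
    same_size = len(candidate) == len(original)
    # Sampling is WITH replacement, so a value may repeat any number of times,
    # but every value must come from the original sample.
    values_ok = set(candidate) <= set(original)
    return same_size and values_ok

original = [18, 19, 19, 20, 21]
print(is_possible_bootstrap_sample(original, [18, 19, 20, 21, 22]))  # False: 22 is not in the original
print(is_possible_bootstrap_sample(original, [18, 19, 20, 21]))      # False: only 4 values, not 5
print(is_possible_bootstrap_sample(original, [18, 18, 19, 20, 21]))  # True: 18 was simply drawn twice
```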

15 Bootstrap Distribution [Diagram: original sample (with its sample statistic) → many bootstrap samples → one bootstrap statistic from each → bootstrap distribution]

16 Mercury Levels in Fish Create a bootstrap distribution with StatKey

17 Why “bootstrap”? “Pull yourself up by your bootstraps”: lift yourself in the air simply by pulling up on the laces of your boots. A metaphor for accomplishing an “impossible” task without any outside help.

18 Sampling Distribution [Figure: the population as a “tree” and the samples as its “seeds”] BUT, in practice we don’t see the “tree” or all of the “seeds”; we only have ONE seed.

19 Bootstrap Distribution What can we do with just one seed? Grow a NEW tree! Use the sample as a bootstrap “population”, and estimate the distribution and variability (SE) of the x̄’s from the bootstraps.

20 Center The sampling distribution is centered around the population parameter. The bootstrap distribution is centered around the: (a) population parameter (b) sample statistic (c) bootstrap statistic (d) bootstrap parameter
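
A small simulation can make the question of centering concrete. The sketch below assumes a normal population with a mean we choose ourselves (so it is known only because we simulate it), draws one sample, and then compares the center of the bootstrap distribution with both the sample mean and the population mean. All numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
population_mean = 10.0   # assumed; known here only because we simulate the population
sample = rng.normal(loc=population_mean, scale=2.0, size=30)

# 5000 bootstrap samples at once: each row is one resample of size 30.
resamples = rng.choice(sample, size=(5000, sample.size), replace=True)
boot_means = resamples.mean(axis=1)

print(f"population mean               = {population_mean:.3f}")
print(f"sample mean                   = {sample.mean():.3f}")
print(f"center of bootstrap distribution = {boot_means.mean():.3f}")
```

The center of the bootstrap distribution tracks the sample mean, because the bootstrap samples are drawn from the sample rather than from the population.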

21 Standard Error The variability of the bootstrap statistics is similar to the variability of the sample statistics. The standard error of a statistic can be estimated using the standard deviation of the bootstrap distribution!

22 Confidence Intervals [Diagram: sample → many bootstrap samples → calculate the statistic for each bootstrap sample → bootstrap distribution] Confidence interval: statistic ± ME. Margin of Error (ME): for a 95% CI, ME = 2×SE. Standard Error (SE): standard deviation of the bootstrap distribution.

23 Mercury Levels in Fish SE = 0.047. 95% CI: 0.527 ± 2 × 0.047 = (0.433, 0.621). We are 95% confident that the average mercury level in fish in Florida lakes is between 0.433 and 0.621 ppm.
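
The arithmetic behind this interval is just statistic ± 2 × SE. A short sketch, using the sample mean and SE quoted on the slide (the helper name `ci_from_se` is an assumption):

```python
def ci_from_se(statistic, se, multiplier=2):
    """Return (lower, upper) for statistic ± multiplier × SE."""
    margin_of_error = multiplier * se
    return statistic - margin_of_error, statistic + margin_of_error

lower, upper = ci_from_se(0.527, 0.047)
print(f"({lower:.3f}, {upper:.3f})")   # (0.433, 0.621)
```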

24 Same process for every parameter! Estimate the standard error and/or a confidence interval for: proportion (𝑝), difference in means (µ1 − µ2), difference in proportions (𝑝1 − 𝑝2), standard deviation (𝜎), correlation (𝜌), ... Generate samples with replacement, calculate the sample statistic, repeat...
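
To illustrate that only the statistic changes from one parameter to the next, here is a hedged sketch in which the statistic is passed in as a function. The name `bootstrap_se` and both data sets are hypothetical, and the sketch covers single-sample statistics only; a difference between two groups would need each group resampled separately.

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Estimate the SE of `statistic` by resampling `data` with replacement."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    boot_stats = [statistic(rng.choice(data, size=n, replace=True))
                  for _ in range(n_boot)]
    return np.std(boot_stats, ddof=1)

# Hypothetical data, for illustration only.
heights = [61, 64, 65, 66, 67, 68, 68, 70, 71, 74]   # a quantitative variable
smoker = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]              # 1 = yes, 0 = no

se_mean = bootstrap_se(heights, np.mean)
se_sd = bootstrap_se(heights, lambda x: np.std(x, ddof=1))
se_prop = bootstrap_se(smoker, np.mean)               # mean of 0/1 values = proportion

print("SE of the mean:              ", round(se_mean, 3))
print("SE of the standard deviation:", round(se_sd, 3))
print("SE of the proportion:        ", round(se_prop, 3))
```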

25 Mercury and pH in Lakes For Florida lakes, what is the correlation between average mercury level (ppm) in fish taken from a lake and acidity (pH) of the lake? Find r and give a 95% CI for ρ. (Lange, Royals, and Connor, Transactions of the American Fisheries Society, 1993)

26 Mercury and pH in Lakes r = -0.575. Give a 95% CI for ρ: -0.575 ± 2 × 0.085 = (-0.745, -0.405). We are 95% confident that the true correlation between mercury and pH level in Florida lakes is between -0.745 and -0.405.
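
For a correlation, the one detail worth showing in code is that each bootstrap sample must resample whole (x, y) pairs, so that each lake's pH stays attached to its own mercury level. The sketch below uses simulated stand-in data (not the Florida lakes data), and the function name `bootstrap_correlation_ci` is an assumption.

```python
import numpy as np

def bootstrap_correlation_ci(x, y, n_boot=2000, seed=0):
    """Return (r, se, lower, upper) using the r ± 2×SE rule."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    boot_r = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)   # row indices: resample whole (x, y) pairs
        boot_r[i] = np.corrcoef(x[idx], y[idx])[0, 1]
    se = boot_r.std(ddof=1)
    return r, se, r - 2 * se, r + 2 * se

# Simulated stand-in data with a negative association, for illustration only.
data_rng = np.random.default_rng(7)
ph = data_rng.uniform(4, 9, size=40)
mercury = 1.5 - 0.15 * ph + data_rng.normal(0, 0.25, size=40)

r, se, lower, upper = bootstrap_correlation_ci(ph, mercury)
print(f"r = {r:.3f}, SE = {se:.3f}, 95% CI: ({lower:.3f}, {upper:.3f})")
```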

27 Confidence Interval What if we want to be more than 95% confident? A P% confidence interval contains the true parameter value for P% of all samples; P% is known as the confidence level. How might we use the bootstrap distribution to get a P% confidence interval?

28 Percentile Method For a P% confidence interval: keep the middle P% of the bootstrap statistics, chopping off (100 − P)/2 % in each tail.
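
A minimal sketch of the percentile method, assuming it means "keep the middle P% of the bootstrap statistics", which matches the "chop 2.5% in each tail" description used for the 95% case later in these slides. The function name `percentile_ci` and the simulated sample are assumptions.

```python
import numpy as np

def percentile_ci(boot_stats, confidence=95):
    """Return the endpoints of the middle `confidence`% of the bootstrap statistics."""
    tail = (100 - confidence) / 2          # percent chopped from each tail
    lower, upper = np.percentile(boot_stats, [tail, 100 - tail])
    return lower, upper

# Hypothetical sample and its bootstrap distribution of means.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.5, scale=0.3, size=50)
resamples = rng.choice(sample, size=(1000, sample.size), replace=True)
boot_means = resamples.mean(axis=1)

for level in (90, 95, 99):
    lower, upper = percentile_ci(boot_means, level)
    print(f"{level}% CI: ({lower:.3f}, {upper:.3f}), width = {upper - lower:.3f}")
```

Running this for several confidence levels also shows how the width of the interval changes with the level.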

29 Level of Confidence Which is wider, a 90% confidence interval or a 95% confidence interval? (a) 90% CI (b) 95% CI

30 Average Mercury in Fish 99% Confidence Interval: chop 0.005 × 1000 = 5 dots from each tail. We are 99% confident that the average mercury level of fish in Florida lakes is between ___ and ___ ppm.
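
The "5 dots" count can be reproduced by sorting the bootstrap statistics and dropping the same number from each end. This sketch uses a simulated stand-in sample, so its endpoints are not the mercury interval from the slide; only the counting logic is the point.

```python
import numpy as np

n_boot = 1000
confidence = 0.99
k = round((1 - confidence) / 2 * n_boot)   # 0.005 * 1000 = 5 dots per tail

# Hypothetical sample of size 53 and its bootstrap means, for illustration only.
rng = np.random.default_rng(3)
sample = rng.gamma(shape=2.0, scale=0.25, size=53)
resamples = rng.choice(sample, size=(n_boot, sample.size), replace=True)
sorted_means = np.sort(resamples.mean(axis=1))

kept = sorted_means[k:n_boot - k]          # drop k dots from each tail
lower, upper = kept[0], kept[-1]
print(f"chopped {k} dots per tail, kept {len(kept)} of {n_boot}")
print(f"99% CI: ({lower:.3f}, {upper:.3f})")
```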

31 Two Approaches for 95% CI Standard error method: SE = 0.046, so 0.527 ± 2 × 0.046 = (0.435, 0.619). Percentile method: chop 2.5% in each tail and keep the 95% in the middle.

32 Bootstrap Cautions These methods for creating a confidence interval only work if the bootstrap distribution is smooth and symmetric. ALWAYS look at a plot of the bootstrap distribution! If the bootstrap distribution is skewed or looks “spiky” with gaps, you will need to go beyond intro stat to create a confidence interval.
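
As a rough numeric companion to looking at the plot (not a replacement for it), the sketch below compares the bootstrap distribution of the mean with that of the median for a small hypothetical sample. Counting distinct values is one crude way to see "spikiness", and the skewness helper is an assumed, simple moment-based measure.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical small sample, for illustration only.
sample = np.array([18, 19, 19, 20, 21, 24, 25, 30, 31, 45], dtype=float)
resamples = rng.choice(sample, size=(1000, sample.size), replace=True)

boot_means = resamples.mean(axis=1)
boot_medians = np.median(resamples, axis=1)

def skewness(values):
    """Moment-based skewness: about 0 for a symmetric distribution."""
    centered = values - values.mean()
    return (centered ** 3).mean() / values.std() ** 3

for name, stats in [("mean", boot_means), ("median", boot_medians)]:
    print(f"{name:>6}: {len(np.unique(stats))} distinct values, "
          f"skewness = {skewness(stats):.2f}")
```

With a sample this small, the bootstrap medians can take only a limited set of values, which is what shows up as spikes and gaps in a dotplot.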

35 Number of Bootstrap Samples The number of bootstrap samples is NOT the sample size. Increasing the number of bootstrap samples will minimize random fluctuation from simulation to simulation, but as long as it is large, this number does not matter much.
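
To see how little the number of bootstrap samples matters once it is large, one can repeat the simulation with different seeds and compare the SE estimates. The sample below is simulated (size 53 is chosen only to mirror the example in these slides), and the function name `bootstrap_se_of_mean` is an assumption.

```python
import numpy as np

# Hypothetical sample of size n = 53, for illustration only.
sample = np.random.default_rng(42).gamma(shape=2.0, scale=0.25, size=53)

def bootstrap_se_of_mean(sample, n_boot, seed):
    """Bootstrap SE of the mean from n_boot resamples of the same size as `sample`."""
    rng = np.random.default_rng(seed)
    resamples = rng.choice(sample, size=(n_boot, sample.size), replace=True)
    return resamples.mean(axis=1).std(ddof=1)

for n_boot in (100, 1000, 10000):
    # Five independent simulations with the same number of bootstrap samples.
    estimates = [bootstrap_se_of_mean(sample, n_boot, seed) for seed in range(5)]
    spread = max(estimates) - min(estimates)
    print(f"{n_boot:>6} bootstrap samples: SE ≈ {np.mean(estimates):.4f}, "
          f"run-to-run spread = {spread:.4f}")
```

The sample size stays at 53 throughout; only the number of resamples, and therefore the number of dots in the bootstrap distribution, changes.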

36 Two Numbers You generate a bootstrap distribution based on a sample of size n = 53, and simulate 1000 bootstrap samples. How many dots will be in the bootstrap distribution? (a) 53 (b) 1000

37 Two Numbers How many dots will be in the dotplot of the original data (mercury levels in fish)? (a) 53 (b) 1000

38 Summary The standard error of a statistic is the standard deviation of the sample statistic, which can be estimated from a bootstrap distribution. 95% confidence intervals can be created using the standard error or the percentiles of a bootstrap distribution. Increasing the number of bootstrap samples will not change anything (except for random fluctuation). Confidence intervals can be created this way for any parameter, as long as the bootstrap distribution is approximately symmetric and continuous.

39 To Do HW 3.1, 3.2 (due Wednesday, 2/15)