1 Frequentist approach Bayesian approach Statistics I
2
3 2
4
5 PHYSTAT 05 - Oxford 12th - 15th September 2005 Statistical problems in Particle Physics, Astrophysics and Cosmology
6
7
8
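9 Frequentist confidence intervals [Figure: confidence belt in the (x, θ) plane with interval limits θ1 and θ2]
10 [Figure: interval limits θ1 < θ read off at the observed x]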
11 Neyman integrals: elementary statistics may be WRONG!! [Figure: limits θ1 and θ2 versus x]
12
13 Search for pivotal variables [Diagram labels: θ1, x, θ2; Neyman integrals; Bootstrap; Search for pivotal variables]. This method avoids the graphical procedure and the solution of the Neyman integrals.
14 Because P{Q} does not contain the parameter!
15 Estimation of the sample mean: due to the Central Limit Theorem we have a pivot quantity when N >> 1. Hence:
16 t_α is the quantile of the normal distribution: t = 1, area 84%; quantile α = 0.84. P[|f-p|
17 p1 p2 p
18 The 90% CL Gaussian upper limit [Figure: 90% area, 10% area, 1.28 σ, observed value]. Meaning I: with this upper limit, values less than the observed one are possible with a probability < 10%. Meaning II: a larger upper limit should give values less than the observed one in less than 10% of the experiments. Meaning III: the probability to be wrong is 10%.
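As a quick numerical check of the Gaussian quantiles quoted on the slides (t = 1 for an 84% area, 1.28 σ for a 90% one-sided limit), here is a minimal Python sketch using only the standard library. The variable names are illustrative and do not come from the slides.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Area below t = 1 for a standard normal (quoted as 84% on the slide)
area_below_one_sigma = std_normal.cdf(1.0)

# Quantile that leaves 10% in the upper tail (quoted as 1.28 sigma)
t_90 = std_normal.inv_cdf(0.90)

print(f"area below t = 1      : {area_below_one_sigma:.4f}")
print(f"90% one-sided quantile: {t_90:.3f} sigma")
```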
19
20
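21 The trigger problem: the probability to be a muon after the trigger, P(μ|T):
22 Prior: 10,000 particles, of which 9000 π and 1000 μ. Trigger: of the 9000 π, 8550 are rejected and 450 accepted; of the 1000 μ, 50 are rejected and 950 accepted. Enrichment: 950/(950 + 450) = 68%. Efficiency: (950 + 450)/10,000 = 14%.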
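The trigger numbers can be reproduced with Bayes' theorem. The sketch below is only an illustration: the function and variable names are assumptions, and the input rates are the ones read off the slide (1000 μ in 10,000 particles, 950 of 1000 μ and 450 of 9000 π firing the trigger).

```python
def posterior(prior, p_trigger_given_mu, p_trigger_given_pi):
    """P(mu | T) from Bayes' theorem for a beam of muons and pions."""
    numerator = prior * p_trigger_given_mu
    denominator = numerator + (1.0 - prior) * p_trigger_given_pi
    return numerator / denominator

# Inputs taken from the slide
prior_mu = 1000 / 10000      # P(mu) before the trigger
p_t_mu = 950 / 1000          # P(T | mu)
p_t_pi = 450 / 9000          # P(T | pi)

enrichment = posterior(prior_mu, p_t_mu, p_t_pi)               # P(mu | T)
trigger_efficiency = prior_mu * p_t_mu + (1 - prior_mu) * p_t_pi  # P(T)

print(f"P(mu|T) = {enrichment:.3f}")
print(f"P(T)    = {trigger_efficiency:.3f}")
```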
23 Bayesian
24
25
26
27
28 Bayesian credible interval
29
30 From coin tossing to physics: the efficiency measurement. ArXiv:physics/ v1. Valid also for k=0 and k=n.
31 Elementary example: 20 events have been generated and 5 passed the cut. What is the estimate of the efficiency with CL = 90%? x = 5, n = 20, CL = 90%. Frequentist result: [ε1, ε2] = [0.104, 0.455]. Bayesian result: [ε1, ε2] = [0.122, 0.423]. What meaning??
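A sketch of how an exact frequentist interval can be obtained for x = 5, n = 20, CL = 90%. It assumes that the frequentist result on the slide is the central Clopper-Pearson interval with (1 - CL)/2 in each tail, which the slide does not state explicitly. The helper names are hypothetical and the tail equations are solved by plain bisection.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve(f, target, increasing, tol=1e-10):
    """Bisection on [0, 1] for f(p) = target, with f monotonic in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def clopper_pearson(x, n, cl):
    """Central interval: each tail probability equals (1 - cl)/2 (assumption)."""
    alpha = 1.0 - cl
    # Lower limit: P(X >= x | p_lo) = alpha/2, increasing in p
    p_lo = 0.0 if x == 0 else solve(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2, True)
    # Upper limit: P(X <= x | p_hi) = alpha/2, decreasing in p
    p_hi = 1.0 if x == n else solve(
        lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return p_lo, p_hi

eps_lo, eps_hi = clopper_pearson(5, 20, 0.90)
print(f"[{eps_lo:.3f}, {eps_hi:.3f}]")
```

The printed limits can be compared with the frequentist interval quoted on the slide.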
32 Efficiency calculation: an OPEN PROBLEM!! Wilson interval (1927). Wald (1950): standard in physics. Exact frequentist: Clopper-Pearson (1934) (PDG). Bayes: this is not frequentist but can be tested in a frequentist way.
33 Coverage simulation [Flow chart]: x = gRandom → Binomial(p,N) → x; 1-CL = α; TMath::BinomialI(p,N,x) → p1, p2; if p1 < p < p2 then k++; e = k/n. One expects e ~ CL.
34 Simulate many x with a true p and check when the intervals contain the true value p. Compare this frequency with the stated CL. CHAOS? CL = 0.95, n = 50
35 Simulate many x with a true p and check when the intervals contain the true value p. Compare this frequency with the stated CL. CL = 0.90, n = 20
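The coverage check described above can be sketched in plain Python as a toy Monte Carlo with a fixed seed. Two intervals are compared: the Wald interval f ± t·sqrt(f(1-f)/n) and a central Clopper-Pearson interval. The helper names, the seed, the number of trials and the choice p = 0.25 are assumptions made for illustration only.

```python
import random
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve(f, target, increasing, tol=1e-9):
    """Bisection on [0, 1] for f(p) = target, with f monotonic in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def clopper_pearson(x, n, cl):
    """Central exact interval with (1 - cl)/2 in each tail."""
    a = (1.0 - cl) / 2
    p1 = 0.0 if x == 0 else solve(lambda p: 1 - binom_cdf(x - 1, n, p), a, True)
    p2 = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p), a, False)
    return p1, p2

def wald(x, n, cl):
    """Gaussian-approximation interval f +/- t * sqrt(f(1-f)/n)."""
    t = NormalDist().inv_cdf(0.5 + cl / 2)
    f = x / n
    half = t * sqrt(f * (1 - f) / n)
    return f - half, f + half

def coverage(interval, p_true, n, cl, trials, rng):
    """Fraction of simulated experiments whose interval contains p_true."""
    table = [interval(x, n, cl) for x in range(n + 1)]  # one interval per x
    k = 0
    for _ in range(trials):
        x = sum(rng.random() < p_true for _ in range(n))  # binomial draw
        p1, p2 = table[x]
        if p1 <= p_true <= p2:
            k += 1
    return k / trials

rng = random.Random(12345)
cov_wald = coverage(wald, 0.25, 20, 0.90, 20000, rng)
cov_cp = coverage(clopper_pearson, 0.25, 20, 0.90, 20000, rng)
print(f"Wald coverage           : {cov_wald:.3f}")
print(f"Clopper-Pearson coverage: {cov_cp:.3f}")
```

Scanning p_true over a grid instead of a single value gives the coverage-versus-p curves that the slides show.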
36 BYE-BYE. In the estimation of the efficiency (probability) the coverage is “chaotic”. The new standard (not yet for physicists) is to use the exact frequentist or the formula [formula image]. The standard formula should be abandoned.
37 The problem persists also with large samples! [Plot: coverage, axis ticks 0.86, 0.90, 0.95]
38 (2001)
39 Counting experiments: Poisson case. Wilson interval (1927). Wald (1950): standard in physics. Exact frequentist: Clopper-Pearson (1934) (PDG). Bayes: this is not frequentist but can be tested in a frequentist way.
40 Poissonian coverage simulation, CL = 68%
41 Poissonian coverage simulation, CL = 90%
42 Poissonian coverage simulation, maximum probability constraint [Plot labels: CL, k, n]
43 Poissonian coverage simulation, max likelihood constraint. Feldman & Cousins, Phys. Rev. D 57 (1998) 3873 [Plot labels: k, n]
44 Poissonian coverage simulation, CL = 68%
45 Poissonian coverage simulation, CL = 90%
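For the Poisson case the coverage can also be computed exactly, by summing the Poisson probabilities of all counts whose interval contains the true mean. The sketch below does this for the interval n ± t·sqrt(n). Identifying this interval with the "standard in physics" prescription is an assumption, and the function names are hypothetical. It replaces the simulation by a direct sum and is not the procedure used for the plots.

```python
from math import exp, sqrt

def poisson_pmf_table(mu, n_max):
    """Poisson probabilities P(0..n_max) via P(n) = P(n-1) * mu / n."""
    probs = [exp(-mu)]
    for n in range(1, n_max + 1):
        probs.append(probs[-1] * mu / n)
    return probs

def coverage_sqrt_n(mu, t=1.0):
    """Exact coverage of the interval n +/- t*sqrt(n) for true mean mu."""
    n_max = int(mu + 20 * sqrt(mu) + 50)   # tail beyond this is negligible
    total = 0.0
    for n, prob in enumerate(poisson_pmf_table(mu, n_max)):
        half = t * sqrt(n)
        if n - half <= mu <= n + half:
            total += prob
    return total

for mu in (1.0, 2.5, 5.0, 10.0, 20.0):
    print(f"mu = {mu:5.1f}   coverage = {coverage_sqrt_n(mu):.3f}")
```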
46 By adopting a practical attitude, Bayesian formulae can also be tested in a frequentist way. Frequentism is the best way to give the result of an experiment in physics: x ± σ
47 Quantum Mechanics: frequentist or Bayesian? Born or Bohr? The standard interpretation is frequentist.
48 Statistics II. Signal over Background in Physics: analysis of counting experiments. Some case studies.
49
50 From the Curtis Meyer review (Miami 2004)
51 The first result PRL 91(2003)012002
52 4.6 sigma! Is it convincing???
53 Hypothesis test I true density N
54 Hypothesis test II mb + ms true density N
55 Parameter estimation Nb N= Ns + Nb
56 PRL 91(2003)012002
57 [Formula labels] This is the most common. Seldom used. Recently proposed (hypothesis test).
58
59 ???
60 HERMES: 27.6 GeV positron beam on deuterium (2004)
61
62 No 5σ effect!!
63 A powerful method: Maximum Likelihood [Diagram labels: hypothesis, observation]. The form p(x;θ) is fitted to the data by maximizing the ordinates of the observed data.
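A minimal illustration of the maximum likelihood idea, assuming an exponential density p(x; τ) = exp(-x/τ)/τ as the model. This model, the toy data and the golden-section minimiser are choices made here for illustration and do not come from the slides. For this density the analytic maximum is the sample mean, which gives a check on the numerical result.

```python
import math
import random

def neg_log_likelihood(tau, data):
    """-ln L for the density p(x; tau) = exp(-x/tau) / tau."""
    return sum(math.log(tau) + x / tau for x in data)

def golden_section_min(f, a, b, tol=1e-8):
    """Minimise a unimodal function f on the interval [a, b]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c                  # minimum lies in [a, old d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                  # minimum lies in [old c, b]
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

rng = random.Random(1)
data = [rng.expovariate(1 / 2.0) for _ in range(500)]   # toy data, true tau = 2

tau_hat = golden_section_min(lambda t: neg_log_likelihood(t, data), 0.1, 10.0)
sample_mean = sum(data) / len(data)
print(f"tau_hat = {tau_hat:.4f}, sample mean = {sample_mean:.4f}")
```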
64
65
66
67 ... in Physics [Figure: densities P(H0) and P(H1) with areas 1-α, 1-β, α, β; experimental value; power]
68 A Milestone: the Neyman-Pearson theorem. Likelihood Ratio Test
69 Likelihood Ratio ni from MC samples!
70
71 Steps of the likelihood ratio test. Determine the ratio si/bi for each bin (model + MC simulation). Find the lnQ pdf by simulating ni from background (with the same experimental statistics). Find the lnQ pdf by simulating ni with signal + background. Calculate lnQ for the data ni and make the test. [Plot labels: -lnQ, ni]
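The steps above can be sketched with toy experiments. The statistic is taken here to be ln Q = ln[L(s+b)/L(b)] = -Σ si + Σ ni ln(1 + si/bi) for independent Poisson bins. This form and its sign convention are assumptions, since the formula is not recoverable from the slide text. The bin contents si, bi and the observed ni are hypothetical numbers.

```python
import math
import random

def poisson(rng, mu):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def ln_q(n, s, b):
    """ln Q = -sum(s_i) + sum(n_i * ln(1 + s_i/b_i))."""
    return -sum(s) + sum(ni * math.log(1 + si / bi)
                         for ni, si, bi in zip(n, s, b))

def toy_ln_q(rng, means, s, b, n_toys):
    """ln Q distribution for toy counts drawn with the given bin means."""
    return [ln_q([poisson(rng, m) for m in means], s, b)
            for _ in range(n_toys)]

# Hypothetical per-bin expectations (in practice from model + MC simulation)
s = [0.5, 2.0, 4.0, 2.0, 0.5]
b = [6.0, 5.0, 4.0, 3.0, 2.0]
n_obs = [7, 8, 9, 4, 2]              # hypothetical observed counts

rng = random.Random(2005)
lnq_b = toy_ln_q(rng, b, s, b, 5000)                               # background only
lnq_sb = toy_ln_q(rng, [si + bi for si, bi in zip(s, b)], s, b, 5000)
lnq_data = ln_q(n_obs, s, b)

# Fraction of background-only toys at least as signal-like as the data
p_value_b = sum(q >= lnq_data for q in lnq_b) / len(lnq_b)
# Fraction of signal+background toys at most as signal-like as the data
cl_sb = sum(q <= lnq_data for q in lnq_sb) / len(lnq_sb)
print(f"lnQ(data) = {lnq_data:.2f}, p-value (b) = {p_value_b:.4f}, CL(s+b) = {cl_sb:.3f}")
```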
72
73 ALEPH, DELPHI, L3, OPAL, 2003. One can sum over the bins of histograms from different experiments and construct a GLOBAL statistic!
74 ALEPH DELPHI L3 OPAL 2003 mH 5% mH ≥ GeV/c2 CL=95%
75 Conclusions
76
77
78
79
80
81 Bayes formula s [P(S|T)]
82
83
84
85
86 The nonparametric sampling methods: the best one!!!
87 Non parametric Bootstrap
88
89 Same element in different samples (sampling with and without replacement); same element in the same sample (sampling with replacement).
90 check the bootstrap with MC !!!
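A minimal sketch of the nonparametric bootstrap checked against a toy MC sample with known truth. The Gaussian toy data, the seed and the function name are assumptions made for illustration. The statistic here is the mean, whose error is also known analytically as s/sqrt(n).

```python
import random
from statistics import mean, stdev

def bootstrap_std_error(data, statistic, n_boot, rng):
    """Spread of `statistic` over resamples drawn with replacement."""
    n = len(data)
    replicas = [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]
    return stdev(replicas)

rng = random.Random(42)
sample = [rng.gauss(10.0, 2.0) for _ in range(200)]   # toy MC sample

boot_err = bootstrap_std_error(sample, mean, 2000, rng)
analytic_err = stdev(sample) / len(sample) ** 0.5      # s / sqrt(n)
print(f"bootstrap error = {boot_err:.4f}, s/sqrt(n) = {analytic_err:.4f}")
```

Apart from Monte Carlo noise, the plug-in bootstrap variance of the mean is (n-1)/n times s²/n, so the bootstrap error is expected to sit slightly below s/sqrt(n).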
91
92 The dual bootstrap. Fix the background on one sample and calculate the peak signal with another sample to avoid biases!! Repeat on bootstrap samples (dual bootstrap).
93
94 Conclusions. Poissonian counting: most of the tests do not consider the error on the background and overestimate the signal. Often true (mean) values and measured values are improperly confused. Binomial counting: a general theory exists and should be applied. The errors should be calculated by MC methods and the procedure checked with MC toy models. Nonparametric bootstrap methods should also be used by physicists.
95 end
96 The other branch of Statistics: Hypothesis Testing
97 2σ
98
99
100 Photoproduction on a deuterium target
101
102
103
104 The extended likelihood. Since N is a function of θ, as in the case of a detector efficiency, [formula image]. If there is no functional relation between N and θ, the result is the same as for the non-extended likelihood.
105 error on the mean 1/√n bootstrap underestimates
106 error on the mean 1/√n
107
108 Bootstrap of B1 and Bi data. Trimmed mean 50%. Correlation between measurements. Weighted resampling: Int(1/σ²) times. The error on measurements is not considered. Scope of the analysis: to test whether the errors only or the data themselves are unreliable.
109
110 Results Bootstrap: Some data are unreliable Standard analysis:
111
112
113
114 Useful when the two samples are signal and background....
115 if 110 GeV H is true...
116
117 A word on the permutation tests
118
119
120
121
122 Hypothesis test III
123 end
124
125 Elementary example II. There is a large number of marbles, which are either white or black, and you wish information on the white fraction, m. You draw a single marble, and it is white. What is the fraction m with 90% confidence? Classical: p1 = 1 - CL = 0.1, so m ≥ 0.1. Bayesian, flat prior: p1² = 1 - CL = 0.1, so m ≥ 0.316. Bayesian, 1/m prior: p1 = 1 - CL = 0.1, so m ≥ 0.100. Bayesian, m prior: p1³ = 1 - CL = 0.1, so m ≥ 0.464.
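The three Bayesian limits follow from one formula. With a prior proportional to m^a and a single white marble (likelihood m), the posterior is proportional to m^(a+1) and its cumulative distribution on [0, 1] is m^(a+2). The lower limit therefore solves m^(a+2) = 1 - CL. The small sketch below evaluates it; the function name and the parametrisation by the prior power a are assumptions.

```python
def lower_limit(cl, prior_power):
    """Lower limit on m after one white draw, prior proportional to m**prior_power.

    The posterior is proportional to m**(prior_power + 1), so its cumulative
    distribution on [0, 1] is m**(prior_power + 2).
    """
    return (1.0 - cl) ** (1.0 / (prior_power + 2))

for label, power in (("1/m prior", -1), ("flat prior", 0), ("m prior", 1)):
    print(f"{label:10s}  m >= {lower_limit(0.90, power):.3f}")
```

The 1/m prior reproduces the classical limit, which is why the two agree at m ≥ 0.100.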
126 A medical test is 100% effective on sick people, but is also positive on 5% of healthy people. If the disease affects 1% of the population, what is the probability of being sick if the test is positive? The probability of being sick if the test is positive is P(M|P).
127 100 people: 99 healthy, 1 sick. Healthy: 94 negative, 5 positive. Sick: 0 negative, 1 positive. sick/positive = 1/6 ~ 17%
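The same number follows from Bayes' theorem without rounding the counts to whole people. The sketch uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

p_sick = Fraction(1, 100)               # prevalence of the disease
p_pos_given_sick = Fraction(1)          # test is 100% effective on sick people
p_pos_given_healthy = Fraction(5, 100)  # false-positive rate

# Total probability of a positive test
p_pos = p_sick * p_pos_given_sick + (1 - p_sick) * p_pos_given_healthy

# Bayes' theorem: P(M|P) = P(P|M) P(M) / P(P)
p_sick_given_pos = p_sick * p_pos_given_sick / p_pos

print(p_sick_given_pos, f"= {float(p_sick_given_pos):.3f}")
```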
128
129 Check with the simulation: simulate many x with n = 20 and the true ε = 0.25 and check when the frequentist and Bayesian intervals [p1, p2] contain the true value 0.25. Frequentist result (nominal CL = 90%): CL = 93.6 ± 0.3%. Bayesian result: CL = 86.8 ± 0.3%. Bayes tends to underestimate.
130 wrong Least Informative Prior (LIP)
131
132
133 Uniform Jeffreys’ Prior
134
135 Simulate many x with a true p and check when the intervals contain the true value p. Compare this frequency with the stated CL. CHAOS!!! CL = 0.90, n = 20
136 Poissonian Coverage simulation
137
138