Noise Analysis and Outlier Detection

1 Noise Analysis and Outlier Detection. Saniya Behzadpour, Institute of Geodesy, WG Satellite Geodesy and Theoretical Geodesy. Advanced Satellite Geodesy, SS 2016. ASG

2 Outline: Introduction; Linear regression; Residual analysis; Outliers and influential observations; Robust regression; Robust estimation of gravity field parameters.

3 Introduction

8 Gravity field determination from GRACE data. Based on GRACE observations, a steady, step-by-step improvement in global gravity field determination has occurred in recent years: gravity models based on decades of tracking geodetic satellites; the GRACE gravity model 01, based on 111 days of data; gravity anomalies from four years of GRACE data (GGM03S); and the ITSG-GRACE 2014 monthly solutions. Every improvement in accuracy and spatial resolution of the results is very beneficial for geophysical Earth system studies.

9 Gravity parameter estimation procedure. A nonlinear system model combines instrument data (KBR instrument, GPS receiver, star cameras, accelerometer) with physical models. After linearization, parameter estimation in the LS sense yields the post-fit residuals and the gravity parameters.

10 Possible sources of error in the process: the instruments (KBR instrument, GPS receiver, star cameras, accelerometer) and the physical models. Error sources include systematic sensor and system modeling errors as well as the mathematical model itself.

12 Linear Regression

13 Linear regression. Regression: a statistical process for estimating the relationships among variables, with independent variables a (explanatory variables, predictors, ...) and dependent variables b (response variables, observed values, ...). Variants: simple linear regression, multiple linear regression, multivariate linear regression.

14 Linear regression. Example: simple linear regression. The true (theoretical) relationship between observation and independent variable is b = x₀ + x₁a. The observed values vary about the regression line; taking this variability into account, b_i = x₀ + x₁a_i + ε_i, with unknown parameters x₀, x₁ and a random error component ε_i.

15 Least squares regression. Theoretical relationship: b = x₀ + x₁a + ε. Estimated regression line: b̂ = x̂₀ + x̂₁a. Least squares method: the parameters are chosen to minimize the sum of squared residuals, a residual e_i = b_i − b̂_i being the difference between an observation and its predicted value.

16 Least squares regression. Simple LS regression solution: x̂₁ = s_ab/s_a², x̂₀ = b̄ − x̂₁ā, where s_a² is the sample variance of the predictors, s_b² the sample variance of the observations, and s_ab their sample covariance.
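The closed-form simple LS solution can be sketched in plain Python (the helper name `simple_ls` is hypothetical; the slope is the sample covariance divided by the predictor variance):

```python
def simple_ls(x, y):
    """Simple least-squares line fit from sample moments:
    slope = s_xy / s_xx, intercept = mean(y) - slope * mean(x)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    s_xx = sum((xi - mx) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = my - slope * mx
    return intercept, slope

# Data lying exactly on b = 1 + 2a is recovered exactly.
b0, b1 = simple_ls([0, 1, 2, 3], [1, 3, 5, 7])
```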

17 Least squares regression. Multivariate LS regression solution: x̂ = (AᵀA)⁻¹Aᵀb. The Gauss-Markov theorem: "In a linear model in which the errors have expectation zero conditional on the independent variables, are uncorrelated and have equal variances, the best linear unbiased estimator of any linear combination of the observations is its least-squares estimator." In a linear model, if the errors belong to a normal distribution, the least squares estimators are also the maximum likelihood estimators.

18 Least squares assumptions. Assumptions under which LS is the best linear unbiased estimator: the theoretical random errors are uncorrelated, normally distributed, and homoscedastic. For a given problem, if these assumptions are true, the observed residuals should behave in a similar fashion: plots of the error values against the fitted values or against time should show no structure, and their histogram should look normal.

19 Residual Plots

20 Standardizing residuals. Why is standardizing necessary? The variance of the residuals is not always constant, even though the variance of the errors is, and residuals have the same unit as the observations. Methods for standardizing: 1) Standardized residuals, using the Median Absolute Deviation (MAD) as a robust approximation of the scale of the residuals: MAD(e) = median(|e_i − median(e)|); u_i = e_i/σ̂, where σ̂ = MAD/0.6745. The tuning constant 0.6745 makes σ̂ an approximately unbiased estimator if the data set is large and the error distribution is normal.
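A minimal stdlib-only sketch of MAD-standardized residuals (the helper names `mad` and `standardized_residuals` are hypothetical):

```python
import statistics

def mad(e):
    """Median absolute deviation: median(|e_i - median(e)|)."""
    med = statistics.median(e)
    return statistics.median(abs(v - med) for v in e)

def standardized_residuals(e, c=0.6745):
    """u_i = e_i / sigma_hat with sigma_hat = MAD / c; the tuning
    constant c makes the scale roughly unbiased for normal errors."""
    sigma = mad(e) / c
    return [v / sigma for v in e]
```

Because the scale comes from the median, a single gross outlier barely inflates sigma_hat, so its standardized residual stays conspicuously large.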

21 Standardizing residuals. Methods for standardizing: 2) Studentized residuals, which improve the residual scaling by dividing e_i by the standard deviation of the i-th residual. From var(e) = σ²(I − H), where H = A(AᵀA)⁻¹Aᵀ is the hat matrix, it follows that var(e_i) = σ²(1 − h_i), with h_i the i-th element on the diagonal of H, so r_i = e_i/(σ̂√(1 − h_i)). The studentized residuals follow approximately a t-distribution, with mean near 0 and variance slightly larger than 1. In large data sets u_i and r_i do not differ dramatically.
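A sketch of studentized residuals for the simple-regression case, where the hat-matrix diagonal has the closed form h_i = 1/n + (x_i − x̄)²/Σ(x_j − x̄)² (the function name is hypothetical and σ̂ is passed in):

```python
import math

def studentized_residuals(x, e, sigma):
    """r_i = e_i / (sigma * sqrt(1 - h_i)), using the simple-regression
    closed form of the hat-matrix diagonal h_i."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [ei / (sigma * math.sqrt(1 - hi)) for ei, hi in zip(e, h)]
```

Note that points at the edges of the x-range have larger h_i, so equal raw residuals get scaled up more there.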

27 Assessing non-constant variance. Plot the residuals against the fitted values b̂ or the predictors a, not against the observations b, because b = b̂ + e creates a built-in correlation. Plotting against the fitted values also gives control over them (discrete values). A healthy residual plot shows approximately constant variability and no systematic curvature.

33 Assessing non-constant variance. Wedge pattern (the variance increases with the mean): use weighted least squares. Bend pattern (systematic curvature): the linear-relation assumption is not correct; use nonlinear regression.

37 Assessing non-constant variance. Systematic pattern in the residuals vs. fitted values: another predictor is needed. Pattern in the residuals vs. time: the errors are time dependent.

38 Assessing non-constant variance - test. Comparing group variances: divide the data into m groups and let e_ij denote the i-th of the n_j values in group j. The within-group sample variances are s_j² = Σ_{i=1..n_j} (e_ij − ē_j)²/(n_j − 1). For non-normal distributions this test is not valid, because the mean is not a good summary of the data.

39 Assessing non-constant variance - test. Levene's test: replace each value with Z_ij = |e_ij − ẽ_j|, where ẽ_j is the median (or a median-dependent value) of group j. The test statistic W is defined as W = [(n − m) Σ_{j=1..m} n_j (Z̄_•j − Z̄_••)²] / [(m − 1) Σ_{j=1..m} Σ_{i=1..n_j} (Z_ij − Z̄_•j)²]. If the p-value of W, referred to the F(m − 1, n − m) distribution, is below α (usually 0.05), the assumption of constant variance is violated.
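The W statistic of the median-based (Brown-Forsythe) variant can be sketched with the stdlib alone; converting W to a p-value needs an F-distribution table or a stats library, so this hypothetical helper returns only W:

```python
import statistics

def levene_w(groups):
    """Levene's W with group medians: Z_ij = |e_ij - median_j|,
    W = ((n - m)/(m - 1)) * between-group SS of Z / within-group SS of Z."""
    m = len(groups)
    n = sum(len(g) for g in groups)
    z = [[abs(v - statistics.median(g)) for v in g] for g in groups]
    zbar_j = [sum(zj) / len(zj) for zj in z]       # group means of Z
    zbar = sum(sum(zj) for zj in z) / n            # grand mean of Z
    num = sum(len(zj) * (zm - zbar) ** 2 for zj, zm in zip(z, zbar_j))
    den = sum((v - zm) ** 2 for zj, zm in zip(z, zbar_j) for v in zj)
    return (n - m) / (m - 1) * num / den
```

Groups with identical spread give W = 0; strongly unequal spreads push W above the F critical value.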

40 Assessing non-normality. If the residuals are normal, model inference is valid. Checking normality: if a set of residuals is approximately normally distributed, a normal quantile-quantile (NQQ) plot of the residuals will result in an approximately straight line.

42 Assessing non-normality. Example: QQ plot for a sample of size 9 from a normally distributed population: 3.89, 4.75, 6.33, 4.75, 7.21, 5.78, 5.80, 5.20, 6.64. Step 1: order the data from smallest to largest: 3.89, 4.75, 4.75, 5.20, 5.78, 5.80, 6.33, 6.64, 7.21. Step 2: plot these values against the appropriate quantiles of the standard normal distribution: divide the distribution into n + 1 = 10 equal areas and read the corresponding z values (-1.28, -0.84, -0.52, -0.25, 0, 0.25, 0.52, 0.84, 1.28) from computer software or a normal table.

43 Assessing non-normality. Example continued: plotting each ordered value against its corresponding quantile of the normal distribution gives the QQ plot (sample quantiles vs. theoretical quantiles).

44 Assessing non-normality. How is the QQ plot constructed? The i-th ordered value is plotted against the i/(n + 1)-th quantile of the standard normal distribution. Instead of i/(n + 1) there is a variety of other plotting positions, such as (i − 1/2)/n or, more generally, (i − a)/(n + 1 − 2a).
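The plotting positions and theoretical quantiles can be computed with the stdlib `statistics.NormalDist` (the helper name `qq_points` is hypothetical; a = 0 reproduces the i/(n + 1) positions used in the size-9 example):

```python
from statistics import NormalDist

def qq_points(sample, a=0.5):
    """Pair the i-th ordered value with the standard-normal quantile at
    plotting position (i - a)/(n + 1 - 2a); a=0 gives i/(n+1),
    a=0.5 gives (i - 1/2)/n."""
    s = sorted(sample)
    n = len(s)
    nd = NormalDist()
    probs = [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]
    return [(nd.inv_cdf(p), v) for p, v in zip(probs, s)]

pts = qq_points([3.89, 4.75, 6.33, 4.75, 7.21, 5.78, 5.80, 5.20, 6.64], a=0)
```

For this sample the first theoretical quantile is the 0.1 quantile, about -1.28, matching the slide's axis.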

45 Assessing non-normality. Quantile-quantile plot of a sample from a uniform distribution (sample quantiles vs. theoretical quantiles).

46 Assessing non-normality. Quantile-quantile plot of a sample from a heavy-tailed distribution (sample quantiles vs. theoretical quantiles).

47 Assessing non-normality. Quantile-quantile plot of a sample from a right-skewed distribution (sample quantiles vs. theoretical quantiles).

48 Outliers and Influential Observations

49 Outliers and influential observations. An outlier is a point that falls far from the other data points. If the parameter estimates change a great deal when a point is removed from the calculations, the point is said to be influential. (Figures: fits with outlier 1 and outlier 2 each included and omitted.)

53 Outliers and influential observations. Points with extreme values of X are said to have high leverage. How influential a point is combines how much leverage it has with how extreme it is in the Y direction: outlier 1 has high leverage and high influence; outlier 2 has low leverage and some influence; outliers 3 and 4 have high leverage but low influence.

54 Numeric measures. Leverage: recall H = A(AᵀA)⁻¹Aᵀ. The leverage score of the i-th data point is h_i = H_ii; in simple linear regression, h_i = 1/n + (x_i − x̄)²/Σ_{j=1..n}(x_j − x̄)². High leverage: hat values exceeding about twice the average hat value. Influential observations: Cook's distance D_i = [e_i²/(MSE(k + 1))] · [h_i/(1 − h_i)²], the product of a discrepancy term and a leverage term; high influence: D_i > 4/n.
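Both measures are easy to sketch for the simple-regression case (the helper names are hypothetical; MSE and the number of predictors k are passed in):

```python
def leverage_simple(x):
    """Hat-matrix diagonal in simple linear regression:
    h_i = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

def cooks_distance(e, h, mse, k=1):
    """D_i = e_i^2 / (MSE * (k + 1)) * h_i / (1 - h_i)^2;
    flag points with D_i > 4/n as highly influential."""
    return [ei ** 2 / (mse * (k + 1)) * hi / (1 - hi) ** 2
            for ei, hi in zip(e, h)]

h = leverage_simple([0, 1, 2, 3, 4])
d = cooks_distance([0.0, 0.0, 5.0, 0.0, 0.0], h, 1.0)
```

A useful sanity check: the hat values always sum to the number of estimated parameters (here 2).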

55 Robust Regression

56 Robust regression methods. We expect a robust regression procedure to dampen the effect of observations that would be highly influential if LSM were used, making the identification of outliers much easier; to produce the same results as LSM when there are no outliers; and to be relatively easy to perform.

57 Robust regression methods. Modifying the cost function: LSM: J(e) = Σ_i e_i² (L2 regression); LAD: J(e) = Σ_i |e_i| (L1 regression). Peter Huber (1964) proposed a compromise between L1 and L2 regression. Idea: penalize small residuals quadratically and large residuals linearly: J(e) = Σ_i ρ(e_i), with ρ(e_i) = e_i² for |e_i| < c and ρ(e_i) = c(2|e_i| − c) for |e_i| ≥ c. This corresponds to assuming a heavy-tailed error distribution.
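The piecewise Huber cost in the slide's form (quadratic inside the threshold c, linear outside, continuous at |e| = c) can be written directly:

```python
def huber_rho(e, c=1.345):
    """Huber loss in the slide's parameterization:
    e^2 for |e| < c, and c*(2|e| - c) for |e| >= c.
    The two branches agree at |e| = c, so rho is continuous."""
    return e * e if abs(e) < c else c * (2 * abs(e) - c)
```

A residual of 10 therefore costs only about c*(20 - c) instead of 100, which is what limits an outlier's pull on the fit.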

58 Robust regression methods. M-estimators (maximum likelihood type) are obtained by minimizing a cost function, or equivalently by solving Σ_i ψ(e_i)·a_i = 0, where ψ(e) = ρ′(e). A convenient way to solve for M-estimators is an iteratively reweighted least squares (WLS) algorithm: calculate appropriate weights for the observations, solve the weighted least squares problem, recalculate the weights, re-solve, and so on until convergence.
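The IRLS loop can be sketched for a line fit with stdlib Python only (hypothetical helper names; the robust scale here is simply the median absolute residual divided by 0.6745, a simplification of the MAD scale from slide 20):

```python
def huber_weight(u, k=1.345):
    """Huber weight function: 1 inside the threshold, k/|u| outside."""
    return 1.0 if abs(u) <= k else k / abs(u)

def irls_line(x, y, weight_fn=huber_weight, iters=50):
    """Iteratively reweighted least squares for y ~ b0 + b1*x:
    compute scaled residuals, weight them, solve the 2x2 weighted
    normal equations, and repeat until the weights stabilize."""
    b0 = b1 = 0.0
    for _ in range(iters):
        r = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        # simple robust scale: median absolute residual / 0.6745
        s = max(sorted(abs(v) for v in r)[len(r) // 2] / 0.6745, 1e-8)
        w = [weight_fn(ri / s) for ri in r]
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        det = sw * swxx - swx * swx
        b0 = (swxx * swy - swx * swxy) / det
        b1 = (sw * swxy - swx * swy) / det
    return b0, b1

# One gross outlier barely moves the robust fit of y = 2x.
x = list(range(10))
y = [2.0 * xi for xi in x]
y[5] = 100.0
b0, b1 = irls_line(x, y)
```

Ordinary LS on the same data gives a slope near 2.5; the downweighted outlier leaves the robust slope at essentially 2.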

59 Maximum likelihood estimation - example. Probability density function (PDF): the density of a random variable, a function describing the relative likelihood of this random variable taking on a given value; used before data are available to describe possible future outcomes. Normal density function: f(ε; μ, σ²) = (1/√(2πσ²)) exp(−(ε − μ)²/(2σ²)). Double exponential distribution: f(ε) = (1/(2σ)) exp(−|ε|/σ).

60 Maximum likelihood estimation - example. Likelihood function: L(x) = f(all observations | x), used after the data are available, as a function of the parameters for a given outcome. For double-exponential errors, L(x) = Π_{i=1..n} (1/(2σ)) exp(−|e_i|/σ) = (2σ)⁻ⁿ exp(−Σ_{i=1..n}|e_i|/σ), so maximizing the likelihood amounts to minimizing the sum of absolute residuals.
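This L1/likelihood connection can be checked numerically with a small, hypothetical helper: for a double-exponential model the negative log-likelihood of a location estimate is (up to a constant) the sum of absolute residuals, so the median beats the mean:

```python
import math

def laplace_nll(data, x, sigma=1.0):
    """Negative log-likelihood of the double-exponential model:
    n*log(2*sigma) + sum(|d_i - x|)/sigma, so maximizing the
    likelihood is minimizing the sum of absolute residuals."""
    return (len(data) * math.log(2 * sigma)
            + sum(abs(d - x) for d in data) / sigma)

data = [1.0, 2.0, 10.0]
nll_median = laplace_nll(data, 2.0)            # the L1 minimizer
nll_mean = laplace_nll(data, sum(data) / 3)    # the L2 minimizer
```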

61 Weighted least squares. Applications: focusing accuracy (we may care strongly about predicting the response for certain values of the input; larger weights near one region pull the regression towards matching the data in that region); discounting imprecision (in case of heteroscedasticity, LS is no longer efficient even in the presence of Gaussian noise); and other optimization problems can be transformed into, or approximated by, weighted least squares (M-estimators).

62 Different robust estimators and their weight functions w(u): Least squares: w(u) = 1. Huber M-estimator (k = 1.345): w(u) = 1 for |u| ≤ k; k/|u| for |u| > k. Andrews' wave function (a = 1.339): w(u) = sin(u/a)/(u/a) for |u| < aπ; 0 for |u| ≥ aπ. Welsch M-estimator (k = 4.9675): w(u) = exp(−(u/k)²). IGG-3 scheme (k₁ = 1.5, k₂ = 5.0): w(u) = 1 for |u| ≤ k₁; (k₁/|u|)·((k₂ − |u|)/(k₂ − k₁))² for k₁ < |u| ≤ k₂; 0 for |u| > k₂. In each case ψ(u) = u·w(u).
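The redescending entries of the table can be sketched directly (hypothetical function names; the formulas are the standard Andrews, Welsch, and IGG-3 weight functions with the table's tuning constants):

```python
import math

def andrews_weight(u, a=1.339):
    """Andrews' wave: sin(u/a)/(u/a) for |u| < a*pi, else 0 (redescending)."""
    if abs(u) >= a * math.pi:
        return 0.0
    return 1.0 if u == 0 else math.sin(u / a) / (u / a)

def welsch_weight(u, k=4.9675):
    """Welsch: smooth exponential downweighting, never exactly zero."""
    return math.exp(-(u / k) ** 2)

def igg3_weight(u, k1=1.5, k2=5.0):
    """IGG-3 scheme: full weight, tapered weight, then rejection."""
    if abs(u) <= k1:
        return 1.0
    if abs(u) <= k2:
        return (k1 / abs(u)) * ((k2 - abs(u)) / (k2 - k1)) ** 2
    return 0.0
```

All three give full weight to small scaled residuals; Andrews and IGG-3 reject extreme ones outright, while Welsch only tapers them.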

63 When not to use robust methods. Robust methods assume that the underlying distribution is roughly normal (and therefore unimodal and symmetric) but contaminated with outliers and heavy tails. The methods will give misleading results if they are applied to data sets that are markedly skewed or multimodal, or if a large proportion of the data are identical in value (examples: heavy-tailed, skewed, and uniform distributions).

64 Outlier detection and robust estimation of gravity field parameters

65 Range rate residuals plot - May 2006. (Figures: standardized range rate residuals vs. time (MJD) and vs. the predicted range rate perturbations (μm/s).)

66 Normal quantile-quantile plot - May 2006. The residuals show a heavy-tailed distribution (the largest values are larger than they would be expected under normality): outliers!

67 Weighting matrix, ITSG-GRACE 2014. Taking noise correlation into account by Variance Component Estimation (VCE): W_VCE = Σ⁻¹, the inverse of the noise covariance matrix. The weighting matrix is modified by combining a robust estimator with VCE: W_total = W_VCE + αW_Robust, using the implemented robust estimators and their corresponding weighting functions.

68 Outlier detection in range rate observations. In reality, correlation still exists, while all the statistical tests and assumptions are based on uncorrelated residuals. Idea: eliminate the low-frequency part and treat the rest as uncorrelated. (Figures: the post-fit range rate residuals of the test month (May 2006) with the low-frequency region eliminated, and the detected outliers in the range rate observations.)

69 Results - comparing gravity field solutions. Degree variances of the different gravity field solutions for May 2006.

70 Results - frequency domain. PSD of the range rate residuals of the different gravity field solutions for May 2006. In the … Hz region, the residual values are reduced particularly by the Huber and Welsch estimators (losing signal?).

71 Results - spatial domain (EWH). Gravity variation (equivalent water height) of the VCE solution relative to GOCO05s (May 2006), together with the Welsch M-estimator - VCE, Huber M-estimator - VCE, and IGG-3 scheme - VCE solutions.

73 Summary. Fitting a model is just the first step: check whether the assumptions are satisfied. Residual plots are our friend: residuals vs. predicted values, and QQ plots to check for normality. For outliers and influential observations: eliminate correlation, detect outliers, and use a robust method that down-weights them.

74 Questions?
