1 Noise Analysis and Outlier Detection
Saniya Behzadpour, Institute of Geodesy, WG Satellite Geodesy and Theoretical Geodesy
Advanced Satellite Geodesy, SS 2016, ASG
2 Outline
Introduction
Linear regression
Residual analysis
Outliers and influential observations
Robust regression
Robust estimation of gravity field parameters
3 Introduction
8 Gravity field determination from GRACE data
Based on GRACE observations, a steady, step-by-step improvement in global gravity field determination has taken place in recent years:
Gravity models based on decades of tracking geodetic satellites
GRACE Gravity Model 01 (GGM01), based on 111 days of data
Gravity anomalies from four years of GRACE data (GGM03S)
ITSG-Grace2014 monthly solutions
Every improvement in the accuracy and spatial resolution of the results is highly beneficial for geophysical Earth system studies.
9 Gravity parameter estimation procedure
Instrument data (KBR instrument, GPS receiver, star cameras, accelerometer) and a physical model form the nonlinear system model; after linearization, parameter estimation in the LS sense yields the postfit residuals and the gravity parameters.
10 Possible sources of errors in the process
Error sources: systematic sensor and system errors (KBR instrument, GPS receiver, star cameras, accelerometer) and modeling errors (physical models, mathematical model).
12 Linear Regression
13 Linear regression
Regression: a statistical process for estimating the relationships among variables.
a: independent variables (explanatory variables, predictors, ...)
b: dependent variables (response variables, observed values, ...)
Variants: simple linear regression, multiple linear regression, multivariate linear regression.
14 Linear regression
Example: simple linear regression. True (theoretical) relationship between observation and independent variable: $b = x_0 + x_1 a$. The observed values vary about the regression line; taking this variability into account: $b_i = x_0 + x_1 a_i + \varepsilon_i$, with unknown parameters $x_0, x_1$ and random error component $\varepsilon_i$.
15 Least squares regression
Theoretical relationship: $b = x_0 + x_1 a + \varepsilon$. Estimated regression line: $\hat b = \hat x_0 + \hat x_1 a$.
Least squares method: the parameters are chosen to minimize the sum of squared residuals $\sum_i e_i^2$, where each residual $e_i = b_i - \hat b_i$ is the difference between observation and predicted value. This criterion decides which candidate line is the best.
16 Least squares regression
Simple LS regression solution:
$\hat x_1 = s_{ab}/s_a^2, \qquad \hat x_0 = \bar b - \hat x_1 \bar a$
where $s_a^2$ is the sample variance of the predictors, $s_b^2$ the sample variance of the observations, and $s_{ab}$ the sample covariance.
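The closed-form simple LS solution can be checked numerically; a minimal sketch in plain Python, assuming the model $b = x_0 + x_1 a$ (function and variable names are illustrative):

```python
def simple_ls(a, b):
    """Fit b ~ x0 + x1*a by least squares via sample moments."""
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    # sample covariance s_ab and sample variance s_a^2 of the predictors
    s_ab = sum((ai - a_mean) * (bi - b_mean) for ai, bi in zip(a, b)) / (n - 1)
    s_aa = sum((ai - a_mean) ** 2 for ai in a) / (n - 1)
    x1 = s_ab / s_aa           # slope
    x0 = b_mean - x1 * a_mean  # intercept
    return x0, x1

# data on the exact line b = 1 + 2a is recovered exactly
print(simple_ls([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```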
17 Least squares regression
Multivariate LS regression solution: $\hat{x} = (A^T A)^{-1} A^T b$.
The Gauss-Markov theorem: "In a linear model in which the errors have expectation zero conditional on the independent variables, are uncorrelated, and have equal variances, the best linear unbiased estimator of any estimable linear combination of the parameters is its least-squares estimator."
In a linear model, if the errors follow a normal distribution, the least-squares estimators are also the maximum likelihood estimators.
18 Least squares assumptions
Assumptions needed for LS to be the best linear unbiased estimator: the theoretical random errors are uncorrelated, normally distributed, and homoscedastic.
[Figures: error values vs. fitted values; error values vs. time; histogram of error values]
For a given problem, if these assumptions are true, the observed residuals should behave in a similar fashion.
19 Residual Plots
20 Standardizing residuals
Why is standardizing necessary? The variance of the residuals is not always constant, even when the variance of the errors is; and residuals carry the same unit as the observations.
Methods for standardizing, 1) standardized residuals: use the Median Absolute Deviation (MAD) to approximate the standard deviation of the $i$th residual,
$\mathrm{MAD}(e_i) = \mathrm{median}\left(|e_i - \mathrm{median}(e_i)|\right)$
$u_i = e_i / \hat\sigma$, where $\hat\sigma = \mathrm{MAD}(e_i)/0.6745$.
The tuning constant 0.6745 makes $\hat\sigma$ an approximately unbiased estimator when the data set is large and the error distribution is normal.
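A stdlib-only sketch of the MAD-based standardization described above (0.6745 is the tuning constant for consistency at the normal distribution; names are illustrative):

```python
from statistics import median

def mad_standardize(e):
    """Return u_i = e_i / sigma_hat with sigma_hat = MAD(e)/0.6745."""
    m = median(e)
    mad = median([abs(ei - m) for ei in e])  # MAD(e)
    sigma = mad / 0.6745                     # robust scale estimate
    return [ei / sigma for ei in e]

# the gross outlier stands out clearly on the standardized scale
u = mad_standardize([1.0, 2.0, 3.0, 4.0, 100.0])
```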
21 Standardizing residuals
Methods for standardizing, 2) studentized residuals: improve the residual scaling by dividing $e_i$ by the standard deviation of the $i$th residual.
$\mathrm{var}(\boldsymbol{e}) = \sigma^2 (I - H)$, where $H = A (A^T A)^{-1} A^T$ is the hat matrix
$\mathrm{var}(e_i) = \sigma^2 (1 - h_i)$, with $h_i$ the $i$th diagonal element of $H$
$r_i = \dfrac{e_i}{\hat\sigma \sqrt{1 - h_i}}$
The $r_i$ approximately follow a t-distribution, with mean near 0 and variance slightly larger than 1. In large data sets $u_i$ and $r_i$ do not differ dramatically.
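A sketch of studentized residuals via the hat matrix, assuming numpy and a simple-regression design matrix (names illustrative):

```python
import numpy as np

def studentized_residuals(A, b):
    H = A @ np.linalg.inv(A.T @ A) @ A.T        # hat matrix
    e = b - H @ b                               # residuals e = (I - H) b
    h = np.diag(H)                              # leverages h_i
    sigma2 = e @ e / (A.shape[0] - A.shape[1])  # residual variance estimate
    return e / np.sqrt(sigma2 * (1.0 - h))

# intercept column plus one predictor; last observation is off the line
a = np.arange(5.0)
A = np.column_stack([np.ones_like(a), a])
b = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
r = studentized_residuals(A, b)  # largest |r_i| at the discrepant point
```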
27 Assessing non-constant variance
Plot residuals vs. fitted values $\hat b$ or predictors $a$, not against the observations $b$: since $b = \hat b + e$, there is a built-in correlation between residuals and observations.
A good residual plot shows approximately constant variability and no systematic curvature. A plot with discrete fitted values (where one has control over the fitted values) can also be acceptable.
[Figures: residuals vs. fitted values; acceptable patterns marked ✓]
33 Assessing non-constant variance
Wedge pattern (X): variance increasing with the mean, use weighted least squares.
Bend pattern (X): systematic curvature, the linear-relation assumption is not correct, use nonlinear regression.
[Figures: residuals vs. fitted values showing wedge and bend patterns]
37 Assessing non-constant variance
Systematic pattern (X): another predictor is needed.
Time-dependent pattern (X): plot the residuals against time.
[Figures: residuals vs. fitted value; residuals vs. time]
38 Assessing non-constant variance - Test
Comparing group variances: divide the data into $m$ groups, and let $e_{ij}$ denote the $i$th of the $n_j$ values in group $j$. The within-group sample variances are
$s_j^2 = \dfrac{\sum_{i=1}^{n_j} (e_{ij} - \bar e_j)^2}{n_j - 1}$
For a non-normal distribution a test based on these variances is not valid, because the mean is not a good summary of the data.
39 Assessing non-constant variance - Test
Levene's test (median-centered variant): replace each value with
$Z_{ij} = |e_{ij} - \tilde e_j|$, where $\tilde e_j$ is the median of group $j$.
The test statistic $W$ is defined as
$W = \dfrac{(n-m) \sum_{j=1}^{m} n_j (\bar Z_{\cdot j} - \bar Z_{\cdot\cdot})^2}{(m-1) \sum_{j=1}^{m} \sum_{i=1}^{n_j} (Z_{ij} - \bar Z_{\cdot j})^2}$
If the p-value of $W$ (referred to the $F(m-1,\,n-m)$ distribution) is below $\alpha$ (usually 0.05), the assumption of constant variance is rejected.
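A stdlib-only sketch of the median-centered Levene statistic $W$ above; the p-value lookup against $F(m-1, n-m)$ is omitted to keep the sketch dependency-free:

```python
from statistics import median, mean

def levene_W(groups):
    m = len(groups)
    n = sum(len(g) for g in groups)
    # absolute deviations from the group medians
    Z = [[abs(e - median(g)) for e in g] for g in groups]
    Zbar_j = [mean(z) for z in Z]        # per-group means of Z
    Zbar = sum(map(sum, Z)) / n          # grand mean of Z
    num = (n - m) * sum(len(z) * (zj - Zbar) ** 2 for z, zj in zip(Z, Zbar_j))
    den = (m - 1) * sum((zij - zj) ** 2 for z, zj in zip(Z, Zbar_j) for zij in z)
    return num / den

# equal spread in both groups -> W near 0; very different spread -> large W
```

For real analyses, `scipy.stats.levene(*groups, center='median')` computes the same statistic together with the p-value.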
40 Assessing non-normality
If the residuals are normal, model inference is valid.
Checking normality: normal quantile-quantile plot. If a set of residuals is approximately normally distributed, a normal quantile-quantile (NQQ) plot of the residuals will result in an approximately straight line.
42 Assessing non-normality
Example: QQ plot for a sample of size 9 from a normally distributed population:
3.89, 4.75, 6.33, 4.75, 7.21, 5.78, 5.80, 5.20, 6.64
Step 1: order the data from smallest to largest: 3.89, 4.75, 4.75, 5.20, 5.78, 5.80, 6.33, 6.64, 7.21
Step 2: plot these values against the corresponding quantiles of the standard normal distribution. Divide the distribution into $n+1 = 10$ equal areas and read the quantile values on the z-axis from software or a normal table (e.g. $-1.28, -0.52, \ldots, 0.52, 1.28$).
43 Assessing non-normality
Example: QQ plot for a sample of size 9 from a normally distributed population. Plot each ordered value against its corresponding quantile of the normal distribution.
[Figure: sample quantiles vs. theoretical quantiles]
44 Assessing non-normality
How is the QQ plot constructed? The $i$th ordered value is plotted against the $\frac{i}{n+1}$ quantile of the standard normal distribution. Instead of $\frac{i}{n+1}$ there is a variety of other plotting positions, such as $\frac{i - 1/2}{n}$ or, more generally, $\frac{i - a}{n + 1 - 2a}$.
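The construction above can be sketched with the standard library alone, using the $i/(n+1)$ plotting positions (function name is illustrative):

```python
from statistics import NormalDist

def qq_points(sample):
    """Pairs (theoretical quantile, ordered sample value) for a normal QQ plot."""
    n = len(sample)
    inv = NormalDist().inv_cdf  # standard normal quantile function
    theo = [inv(i / (n + 1)) for i in range(1, n + 1)]
    return list(zip(theo, sorted(sample)))

# for the n = 9 example above, the outer theoretical quantiles are about -1.28 and +1.28
pts = qq_points([3.89, 4.75, 6.33, 4.75, 7.21, 5.78, 5.80, 5.20, 6.64])
```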
45 Assessing non-normality
[Figure: quantile-quantile plot of a uniform distribution]
46 Assessing non-normality
[Figure: quantile-quantile plot of a heavy-tailed distribution]
47 Assessing non-normality
[Figure: quantile-quantile plot of a right-skewed distribution]
48 Outliers and Influential Observations
49 Outliers and Influential Observations
An outlier is a point that falls far from the other data points. If the parameter estimates change a great deal when a point is removed from the calculations, the point is said to be influential.
[Figures: fits with outlier 1 included vs. omitted; fits with outlier 2 included vs. omitted]
53 Outliers and Influential Observations
Points with extreme values of X are said to have high leverage. How influential a point is, is a combination of how much leverage it has and how extreme it is in the Y direction.
Outlier 1: high leverage, high influence
Outlier 2: low leverage, some influence
Outlier 3: high leverage, low influence
Outlier 4: high leverage, low influence
[Figure: fits with outlier 2 included vs. omitted]
54 Numeric measures
Leverage: recall $H = A (A^T A)^{-1} A^T$. The leverage score for the $i$th observation is $h_i = H_{ii}$. In simple linear regression,
$h_i = \dfrac{1}{n} + \dfrac{(X_i - \bar X)^2}{\sum_{j=1}^{n} (X_j - \bar X)^2}$
High leverage: hat values exceeding about twice the average hat value.
Influential observation, Cook's distance (combining discrepancy and leverage):
$D_i = \dfrac{e_i^2}{\mathrm{MSE}\,(k+1)} \cdot \dfrac{h_i}{(1 - h_i)^2}$
High influence: $D_i > 4/n$.
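A numpy sketch of both diagnostics above, where $A$ is the design matrix with $p = k+1$ columns (function name is illustrative):

```python
import numpy as np

def influence_measures(A, b):
    n, p = A.shape
    H = A @ np.linalg.inv(A.T @ A) @ A.T
    h = np.diag(H)                               # leverage scores h_i
    e = b - H @ b                                # residuals
    mse = e @ e / (n - p)
    D = e**2 / (mse * p) * h / (1.0 - h) ** 2    # Cook's distance
    return h, D

# last point is extreme in X (high leverage) and off the trend (influential)
a = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 20.0])
A = np.column_stack([np.ones_like(a), a])
b = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 0.0])
h, D = influence_measures(A, b)
# h[-1] exceeds twice the average hat value 2*p/n, and D[-1] > 4/n
```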
55 Robust Regression
56 Robust regression methods
We expect a robust regression procedure to:
dampen the effect of observations that would be highly influential if LS were used, making the identification of outliers much easier;
produce the same results as LS when there are no outliers;
be relatively easy to perform.
57 Robust regression methods
Modifying the cost function:
LSM: $J(\boldsymbol{e}) = \sum_i e_i^2$ (L2 regression)
LAD: $J(\boldsymbol{e}) = \sum_i |e_i|$ (L1 regression)
Peter Huber (1964) proposed a compromise between L1 and L2 regression. Idea: penalize small residuals quadratically and large residuals linearly:
$J(\boldsymbol{e}) = \sum_i \rho(e_i), \qquad \rho(e_i) = \begin{cases} e_i^2 & |e_i| < c \\ c\,(2|e_i| - c) & |e_i| \ge c \end{cases}$
[Figure: heavy-tailed residual distribution]
58 Robust regression methods
M-estimators (maximum likelihood type) are obtained by minimizing a cost function or, equivalently, by solving
$\sum_i \psi(e_i)\,\boldsymbol{x}_i = 0$, where $\psi(e_i) = \rho'(e_i)$ and $\boldsymbol{x}_i$ is the $i$th row of the design matrix.
A convenient way to solve for M-estimators is an iteratively reweighted least squares (IRLS) algorithm: calculate appropriate weights for the observations, solve the weighted least squares problem, recalculate the weights, re-solve, and so on until convergence.
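A numpy sketch of the IRLS loop for a Huber M-estimator (k = 1.345, as in the table of estimators later in these slides); the residual scale is re-estimated each iteration with the MAD, an assumption for illustration rather than part of the slide:

```python
import numpy as np

def huber_irls(A, b, k=1.345, iters=50):
    x = np.linalg.lstsq(A, b, rcond=None)[0]            # start from plain LS
    for _ in range(iters):
        e = b - A @ x
        s = np.median(np.abs(e - np.median(e))) / 0.6745  # MAD scale
        s = s if s > 0 else 1.0                         # guard against zero scale
        u = e / s
        w = np.where(np.abs(u) <= k, 1.0, k / np.abs(u))  # Huber weights
        sw = np.sqrt(w)
        # weighted LS step: scale rows of A and b by sqrt(w)
        x = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]
    return x

# line b = 1 + 2a with one gross outlier; robust fit stays near (1, 2)
a = np.arange(10.0)
A = np.column_stack([np.ones_like(a), a])
b = 1.0 + 2.0 * a
b[-1] += 50.0
x_rob = huber_irls(A, b)
```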
59 Maximum likelihood estimation - Example
Probability density function (PDF): the density of a random variable, a function describing the relative likelihood of the variable taking a given value; used before data are available to describe possible future outcomes.
Normal density function: $f(\varepsilon; \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\dfrac{(\varepsilon - \mu)^2}{2\sigma^2}\right)$
Double exponential (Laplace) distribution: $f(\varepsilon) = \dfrac{1}{2\sigma} \exp\!\left(-\dfrac{|\varepsilon|}{\sigma}\right)$
60 Maximum likelihood estimation - Example
Likelihood function: $L(x) = f(\text{all observations} \mid x)$; used after data are available, as a function of the parameters for a given outcome.
For double exponential errors:
$L(x) = \prod_{i=1}^{n} \dfrac{1}{2\sigma} \exp\!\left(-\dfrac{|e_i|}{\sigma}\right) = \dfrac{1}{(2\sigma)^n} \exp\!\left(-\sum_{i=1}^{n} \dfrac{|e_i|}{\sigma}\right)$
Maximizing the likelihood is therefore equivalent to minimizing the sum of absolute residuals.
61 Weighted least squares
Applications:
Focusing accuracy: we may care strongly about predicting the response for certain values of the input; larger weights near one region pull the regression towards matching the data in that region.
Discounting imprecision: in the case of heteroscedasticity, even in the presence of Gaussian noise, ordinary LS is no longer efficient.
Other optimization problems can be transformed into, or approximated by, weighted least squares (M-estimators).
62 Different robust estimators

Type                                 ψ(u)                                        w(u)
Least squares                        u                                           1
Huber M-estimator (k = 1.345)        u if |u| ≤ k; k·sign(u) if |u| > k          1 if |u| ≤ k; k/|u| if |u| > k
Andrews' wave function (a = 1.339)   sin(u/a) if |u| < aπ; 0 otherwise           sin(u/a)/(u/a) if |u| < aπ; 0 otherwise
Welsch M-estimator (k = 4.9675)      u·exp(−(u/k)²)                              exp(−(u/k)²)
IGG-3 scheme (k1 = 1.5, k2 = 5.0)    u·w(u)                                      1 if |u| ≤ k1; (k1/|u|)·((k2−|u|)/(k2−k1))² if k1 < |u| ≤ k2; 0 if |u| > k2
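The weight functions of the table can be written out directly; a stdlib-only sketch, where the IGG-3 branch is taken as the standard IGG-III scheme (an assumption where the table cells were not explicit):

```python
import math

def w_huber(u, k=1.345):
    return 1.0 if abs(u) <= k else k / abs(u)

def w_andrews(u, a=1.339):
    if u == 0.0:
        return 1.0  # limit of sin(u/a)/(u/a) as u -> 0
    return math.sin(u / a) / (u / a) if abs(u) < a * math.pi else 0.0

def w_welsch(u, k=4.9675):
    return math.exp(-((u / k) ** 2))

def w_igg3(u, k1=1.5, k2=5.0):
    # standard IGG-III scheme (assumed form): keep, down-weight, reject
    au = abs(u)
    if au <= k1:
        return 1.0
    if au <= k2:
        return (k1 / au) * ((k2 - au) / (k2 - k1)) ** 2
    return 0.0
```

All four return weight 1 near zero and reduce (or zero out) the weight of large standardized residuals, which is exactly what the IRLS loop needs.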
63 When not to use robust methods
Robust methods assume that the underlying distribution is roughly normal (and therefore unimodal and symmetric) but contaminated with outliers and heavy tails. They will give misleading results if applied to data sets that are markedly skewed or multimodal, or if a large proportion of the data are identical in value.
[Figures: heavy-tailed distribution ✓; skewed distribution X; uniform distribution X]
64 Outlier detection and robust estimation of gravity field parameters
65 Range-rate residuals plot - May 2006
[Figures: standardized range-rate residuals vs. time (MJD); standardized range-rate residuals vs. predicted values, range-rate perturbation (μm/s)]
66 Normal quantile-quantile plot - May 2006
Heavy-tailed distribution: the largest values are larger than would be expected, i.e. outliers!
[Figure: sample quantiles vs. theoretical quantiles]
67 Weighting matrix - ITSG-Grace2014
Taking noise correlation into account via variance component estimation (VCE): $W_{VCE} = \Sigma^{-1}$, the inverse of the noise covariance matrix.
Modifying the weighting matrix by combining a robust estimator with VCE: $W_{total} = W_{VCE} + \alpha\, W_{Robust}$
[Table: implemented robust estimators and their corresponding weighting functions]
68 Outlier detection in range rate observations
In reality, correlation still exists, yet all the statistical tests and assumptions above are based on uncorrelated residuals. Idea: eliminate the low-frequency part and treat the rest as uncorrelated.
[Figures: low-frequency region eliminated from the post-fit range-rate residuals of the test month (May 2006); detected outliers in the range-rate observations of the test month (May 2006)]
69 Results - Comparing gravity field solutions
[Figure: degree variances of different gravity field solutions for May 2006]
70 Results - Frequency domain
[Figure: PSD of range-rate residuals of different gravity field solutions for May 2006]
In the … Hz region, the residual values are reduced, particularly by the Huber and Welsch estimators (losing signal?).
71 Results - Spatial domain (EWH)
Gravity variation (EWH) relative to GOCO05s (May 2006).
[Figures: VCE solution; Welsch M-estimator + VCE; Huber M-estimator + VCE; IGG-3 scheme + VCE]
73 Summary
Fitting a model is just the first step: check whether the assumptions are satisfied.
Residual plots are our friend: residuals vs. predicted values, and QQ plots to check for normality.
Outliers and influential observations: eliminate correlation, detect outliers, and use a robust method that down-weights them.
74 Questions?