Statistical Forecasting

1 Statistical Forecasting. Jan Verkade, November 3, 2016
Author: Marianna Ryan

2 Statistical Forecasting = forecasting from data. What does that mean? What other types of forecasting do you know?

3 Regression analysis: predicting future values of a variable using information about other variables. Predictand: the variable that you want to forecast. Predictor: the variable that you use as input. What we hope to find is that the different variables do not vary independently (in a statistical sense), but that they tend to vary together. We assume that the future will behave like the past.

4 Regression models: a predictand may depend on its predictor(s) in varying ways: y ~ x; y ~ a + bx; y ~ x^2; …

5 The linear (regression) model: $Y_t = b_0 + b_1 X_{1t} + b_2 X_{2t} + \dots + b_k X_{kt}$. The prediction for Y is a straight-line function of each of the X variables. The contributions of the different X variables to the prediction are additive. The slopes b1, b2, etc. are the coefficients of the variables; b0 is the intercept.
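A minimal sketch of how the linear model turns predictor values into a prediction. The function name `predict` and the coefficient values are illustrative assumptions, not taken from the slides.

```python
def predict(b0, coefs, xs):
    """Return b0 + b1*x1 + ... + bk*xk for a single time step."""
    if len(coefs) != len(xs):
        raise ValueError("need exactly one coefficient per predictor")
    return b0 + sum(b * x for b, x in zip(coefs, xs))


# Hypothetical intercept and slopes for two predictors.
b0 = 1.0
coefs = [2.0, -0.5]

y_base = predict(b0, coefs, [3.0, 4.0])
# Raising X1 by one unit changes the prediction by exactly b1,
# whatever the value of X2: the contributions are additive.
y_shifted = predict(b0, coefs, [4.0, 4.0])
```

Here `y_shifted - y_base` equals the first slope, which is what "additive contributions" means in practice.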

6 Justification of the linear model for regression. Why should we assume that relationships between variables are linear? Because linear relationships are the simplest non-trivial relationships that can be imagined (hence the easiest to work with), and because the "true" relationships between our variables are often at least approximately linear over the range of values that are of interest to us. Even if they are not, we can often transform the variables in such a way as to linearize the relationships.
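As an illustration of linearizing by transformation, assume (purely as an example, not from the slides) that the predictand grows exponentially with the predictor. Taking logarithms gives a relationship that is linear in the transformed variable:

```latex
Y_t = a \, e^{b X_t}
\quad\Longrightarrow\quad
\ln Y_t = \ln a + b X_t
```

A linear model fitted to $\ln Y_t$ against $X_t$ then has intercept $\ln a$ and slope $b$.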

7 Fitting a linear model: we fit a linear model through an objective function: minimise the mean squared error (MSE). Steps: (1) Standardize the variables: convert them to units of standard-deviations-from-the-mean. (2) Calculate the average product of the standardized values. (3) Minimize the mean squared error. (4) Substitute, re-arrange and solve for b0 and b1.

8 Fitting a linear model. (1) Standardize the variables: convert them to units of standard-deviations-from-the-mean.
$X_t^* = \dfrac{X_t - \operatorname{mean}(X)}{\operatorname{stdev}(X)}$
$Y_t^* = \dfrac{Y_t - \operatorname{mean}(Y)}{\operatorname{stdev}(Y)}$

9 Fitting a linear model. (1) Standardize the variables: convert them to units of standard-deviations-from-the-mean. (2) Calculate the average product of the standardized values.
$r_{XY} = \dfrac{1}{n} \left( X_1^* Y_1^* + X_2^* Y_2^* + \dots + X_n^* Y_n^* \right)$

10 Fitting a linear model. (1) Standardize the variables: convert them to units of standard-deviations-from-the-mean. (2) Calculate the average product of the standardized values. (3) Minimize the mean squared error.
$Y_t^* = r_{XY} X_t^*$

11 Fitting a linear model. (1) Standardize the variables: convert them to units of standard-deviations-from-the-mean. (2) Calculate the average product of the standardized values. (3) Minimize the mean squared error. (4) Substitute, re-arrange and solve for b0 and b1.
$Y_t^* = r_{XY} X_t^*$
$\dfrac{Y_t - \operatorname{mean}(Y)}{\operatorname{stdev}(Y)} = r_{XY} \, \dfrac{X_t - \operatorname{mean}(X)}{\operatorname{stdev}(X)}$
$Y_t = b_0 + b_1 X_{1t}$
$b_1 = r_{XY} \, \dfrac{\operatorname{stdev}(Y)}{\operatorname{stdev}(X)}$
$b_0 = \operatorname{mean}(Y) - b_1 \operatorname{mean}(X)$
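The four steps can be sketched in a few lines of Python using only the standard library. The data values and the function name `fit_simple_regression` are made up for illustration. The sketch uses the population standard deviation (`pstdev`) so that it matches the 1/n in the average product; this pairing is an assumption about the convention the slides intend.

```python
from statistics import mean, pstdev


def fit_simple_regression(xs, ys):
    """Fit Y = b0 + b1*X via standardization and the correlation coefficient."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)  # population stdev, consistent with 1/n below

    # Step 1: standardize both variables.
    x_std = [(x - mx) / sx for x in xs]
    y_std = [(y - my) / sy for y in ys]

    # Step 2: the average product of standardized values is the correlation r_XY.
    r = sum(a * b for a, b in zip(x_std, y_std)) / n

    # Steps 3 and 4: the MSE-minimizing line in standardized units is Y* = r X*;
    # substituting the definitions back gives the slope and the intercept.
    b1 = r * sy / sx
    b0 = my - b1 * mx
    return b0, b1, r


# Illustrative data only (not from voorhavendijk.xls).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, r = fit_simple_regression(xs, ys)
```

For these made-up numbers the slope comes out close to 2 and the intercept close to 0, with a correlation just below 1.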

12 Exercise: piezometric head within a levee

13 Exercise: piezometric head within a levee. [Figure labels: river water level; water pressure sensor]

14 Exercise: piezometric head within a levee. Use voorhavendijk.xls. Explore the data by building a scatter (x,y) plot. Determine the mean and standard deviations. Determine the standardized values; then explore the marginal distributions (ecdf of either variable) and the joint distribution (scatter plot). Determine the coefficient of correlation. Determine the coefficients of the regression equation. Verify by using Excel's built-in function to show the regression line.

15 Exercise: piezometric head within a levee

16 Exercise: piezometric head within a levee. Discuss: is the linear model a good model?

17 Exercise: piezometric head within a levee. How to use / interpret the regression line?

18 Exercise: piezometric head within a levee. Use voorhavendijk.xls. Explore the data by building a scatter (x,y) plot. Determine the mean and standard deviations. Determine the standardized values; then explore the marginal distributions (ecdf of either variable) and the joint distribution (scatter plot). Determine the coefficient of correlation. Determine the coefficients of the regression equation. Verify by using Excel's built-in function to show the regression line. Explore the residuals by plotting an empirical cumulative distribution function. What is the mean value? How are the residuals distributed?
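The residual part of the exercise could be sketched as follows, outside Excel. The data are made up for illustration and are not the contents of voorhavendijk.xls; the helper name `ecdf` is likewise an assumption.

```python
from statistics import mean

# Illustrative data only; replace with the two columns of the exercise file.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least-squares slope and intercept (equivalent to b1 = r * stdev(Y) / stdev(X)).
mx, my = mean(xs), mean(ys)
b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b0 = my - b1 * mx

# Residual = observed value minus value on the regression line.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
mean_residual = mean(residuals)


def ecdf(values):
    """Return (value, cumulative fraction) pairs, sorted by value."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


residual_ecdf = ecdf(residuals)
```

For a least-squares line with an intercept, the mean residual is zero up to floating-point rounding, so a very small number rather than an exact 0 is the expected answer to "What is the mean value?".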

19 LM-model: residuals

20 LM-model: residuals mean: e-18 stdev:

21 Exercise: piezometric head within a levee. How to use / interpret the regression line?

22 Forecasting errors. Intrinsic risk: signal v noise. Parameter risk: uncertain parameter values. Model risk: the risk of choosing the wrong model (linear model v quadratic model, for example).

23 Confidence Intervals v Prediction Intervals

24 An alternative statistical technique: Quantile Regression. Principles: QR is a method for describing conditional quantiles. Rather than minimising the mean squared error (MSE), QR is based on minimising the mean absolute error (MAE). This yields not the mean but the median. Other quantiles may be derived by adding weights to the errors, e.g. weight = 0.1 for positive errors and 0.9 for negative errors, which yields the 0.1 quantile when the error is defined as observed minus fitted. Fitting models may be done in transformed space to account for heteroscedasticity.
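The weighting idea can be sketched for the simplest possible case, a constant model with no predictors. The names `weighted_abs_error` and `best_constant` and the data are illustrative assumptions, and the error is taken as observation minus fitted value. A real quantile regression would minimise the same loss over the regression coefficients instead of a single constant.

```python
def weighted_abs_error(q, ys, tau):
    """Mean absolute error with weight tau on positive and 1 - tau on negative errors."""
    total = 0.0
    for y in ys:
        err = y - q  # error = observation minus fitted value
        total += tau * err if err >= 0 else (tau - 1.0) * err
    return total / len(ys)


def best_constant(ys, tau):
    """Pick the sample value that minimises the weighted absolute error."""
    return min(sorted(ys), key=lambda q: weighted_abs_error(q, ys, tau))


# Illustrative data only.
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

median_fit = best_constant(ys, 0.5)  # equal weights: plain MAE, gives the median
low_fit = best_constant(ys, 0.1)     # weight 0.1 positive, 0.9 negative: low quantile
```

With equal weights the minimiser is the sample median; with weights 0.1 and 0.9 it moves to the low end of the sample.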

27 Application in real-time hydrologic forecasting: post-processing. Ensemble techniques. Post-processing techniques.

28 Application in real-time hydrologic forecasting: post-processing. Once a record of past forecasts is in place, this record can be analysed for 'forecast errors', and these errors can be assumed to recur in future forecasts as well.

29 1. Find a relationship between forecast and observations (5 December 2017)

30 2. Apply that relation to new forecasts
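The two post-processing steps can be sketched with a simple linear regression of past observations on past forecasts. This is only one possible choice of relationship; the slides do not state which model is used, and all numbers and names below are made up for illustration.

```python
from statistics import mean

# Hypothetical record of past raw forecasts and the matching observations.
past_forecasts = [1.0, 2.0, 3.0, 4.0, 5.0]
past_observations = [1.4, 2.3, 3.2, 4.1, 5.0]


def fit_line(xs, ys):
    """Least-squares intercept and slope for ys ~ b0 + b1 * xs."""
    mx, my = mean(xs), mean(ys)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1


# Step 1: find a relationship between forecast and observation.
b0, b1 = fit_line(past_forecasts, past_observations)


# Step 2: apply that relation to a new raw forecast.
def post_process(raw_forecast):
    return b0 + b1 * raw_forecast


corrected = post_process(6.0)
```

In this made-up record the observations follow 0.5 + 0.9 × forecast exactly, so a new raw forecast of 6.0 is corrected to about 5.9.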

31 And here's your forecast

32 Famous forecasting quotes. "I have seen the future and it is very much like the present, only longer." --Kehlog Albran, The Profit. A pretty concise description of statistical forecasting: we search for statistical properties of a time series that are constant in time (levels, trends, seasonal patterns, correlations and autocorrelations, etc.). We then predict that those properties will describe the future as well as the present.

33 Famous forecasting quotes. "Prediction is very difficult, especially if it's about the future." --Niels Bohr, Nobel laureate in Physics. A warning of the importance of validating a forecasting model out-of-sample. It's often easy to find a model that fits the past data well (perhaps too well!), but quite another matter to find a model that correctly identifies those patterns in the past data that will continue to hold in the future.
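Out-of-sample validation can be sketched by fitting on the early part of a record and scoring on the later part that the fit never saw. The data, the 7/3 split and the helper names are illustrative assumptions.

```python
from statistics import mean

# Illustrative record, ordered in time.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [2.2, 3.9, 6.1, 8.0, 9.8, 12.1, 14.0, 15.9, 18.2, 20.1]

# Fit on the first 7 points only; hold out the last 3 for validation.
split = 7
x_train, y_train = xs[:split], ys[:split]
x_test, y_test = xs[split:], ys[split:]

mx, my = mean(x_train), mean(y_train)
b1 = sum((x - mx) * (y - my) for x, y in zip(x_train, y_train)) / sum(
    (x - mx) ** 2 for x in x_train
)
b0 = my - b1 * mx


def mse(x_values, y_values):
    """Mean squared error of the fitted line on the given points."""
    return mean((y - (b0 + b1 * x)) ** 2 for x, y in zip(x_values, y_values))


in_sample_mse = mse(x_train, y_train)
out_of_sample_mse = mse(x_test, y_test)
```

A model that fits the past "too well" shows up as an out-of-sample error much larger than the in-sample error.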