This study develops a new error modelling method for ensemble short-term and real-time streamflow forecasting, called error reduction and representation in stages (ERRIS). The novelty of ERRIS is that it does not rely on a single complex error model but runs a sequence of simple error models through four stages. At each stage, an error model attempts to incrementally improve over the previous stage. Stage 1 establishes parameters of a hydrological model and parameters of a transformation function for data normalization, Stage 2 applies a bias correction, Stage 3 applies autoregressive (AR) updating, and Stage 4 applies a Gaussian mixture distribution to represent model residuals. In a case study, we apply ERRIS for one-step-ahead forecasting at a range of catchments. The forecasts at the end of Stage 4 are shown to be much more accurate than at Stage 1 and to be highly reliable in representing forecast uncertainty. Specifically, the forecasts become more accurate by applying the AR updating at Stage 3, and more reliable in uncertainty spread by using a mixture of two Gaussian distributions to represent the residuals at Stage 4. ERRIS can be applied to any existing calibrated hydrological models, including those calibrated to deterministic (e.g. least-squares) objectives.

Streamflow forecasts have long been used to support management
of river conditions, such as flood emergency response and optimal water
allocation. Recently, much research has been carried out on ensemble
streamflow forecasting (e.g. Alfieri et al., 2013; Bennett et al., 2014a;
Demargne et al., 2014; Thielen et al., 2009), encouraged by research
communities such as the Hydrological Ensemble Prediction Experiment (HEPEX –

Streamflow forecasts are usually made by initializing hydrological models (e.g. conceptual rainfall–runoff models) and then forcing them with forecast rainfall. There are a number of sources of errors in streamflow forecasts, including errors in measurement of rainfall and streamflow, errors in hydrological model structure, errors in model parameters, and errors in forecast rainfall. Ideal hydrological error quantification would account for each individual source of errors explicitly and reliably, such that all sources of errors would accumulate to accurately represent overall errors in the streamflow forecasts. Various attempts have been made to identify and decompose the sources of errors, using methods such as sequential optimization and data assimilation (Vrugt et al., 2005), sequential assimilation (Moradkhani et al., 2005), the Bayesian total error analysis (BATEA) (Kavetski et al., 2006a, b; Kuczera et al., 2006), and the integrated Bayesian Uncertainty estimator (IBUNE) (Ajami et al., 2007). Such methods are useful for attempting to separate the major sources of errors, identifying deficiencies of model structure, performing parameter sensitivity analyses and comparing different hydrological models, without confounding input and output errors. However, because of a lack of information on the different sources of errors and on how they interact with each other, it is highly challenging to apply an error decomposition approach to arrive at statistically reliable overall errors in streamflow forecasts (Renard et al., 2010).

An alternative approach is to consider only the overall errors of forecasts, without attempting to explain the sources of errors. An estimate of the overall error of a forecast is the residual, defined as the difference between modelled streamflow and observations. We now concentrate our discussion on residuals, but we will continue to refer to models of residuals as “error models”, following common practice. Residuals of a series of forecasts form a time series. The most traditional and simplest error model, related to the classical least-squares calibration, is based on the assumption of uncorrelated homoscedastic Gaussian residuals in the time series of residuals (Diskin and Simon, 1977). This assumption is generally not valid for hydrological applications, where residuals are frequently autocorrelated, heteroscedastic and non-Gaussian (Kuczera, 1983; Sorooshian and Dracup, 1980). More sophisticated error models have been developed to address correlation, variance structure and the distribution of residuals. Autoregressive models have been widely used to account for autocorrelation of residuals (e.g. Bates and Campbell, 2001; Xiong and O'Connor, 2002). Heteroscedasticity may be explicitly dealt with by describing the variance of residuals as a function of some state-dependent variables (e.g. observed streamflow, dry/wet seasons) (e.g. Evin et al., 2013; Pianosi and Raso, 2012; Schaefli et al., 2007; Yang et al., 2007). Non-Gaussianity of residuals may be explicitly represented by non-Gaussian probability distributions (e.g. Marshall et al., 2006; Schaefli et al., 2007; Schoups and Vrugt, 2010). Heteroscedasticity and non-Gaussianity of residuals may also be dealt with implicitly, and often more conveniently, by using data transformation to normalize the residuals and stabilize their variance, such as the normal quantile transform (Kelly and Krzysztofowicz, 1997; Krzysztofowicz, 1997; Montanari and Brath, 2004), the Box–Cox transformation (Thyer et al., 2002) and the log–sinh transformation (Wang et al., 2012). Solomatine and Shrestha (2009) presented an alternative method of predicting residual error distributions using machine learning techniques. They built a non-linear regression model to predict the forecast quantiles at each lead time. Their method is not based on an autoregressive model but captured recent information about the model error with a non-linear regression.

Broadly, previous attempts to model residuals can be divided into “post-processor” methods that separate the estimation of hydrological model parameters from the estimation of error model parameters, and “joint inference” methods that estimate all parameters at once. Post-processor methods (e.g. Evin et al., 2014) are often held to be less theoretically desirable than joint inference methods (e.g. Kuczera, 1983; Bates and Campbell, 2001). This is because joint inference methods aspire to a complete description of the behaviour of errors, including behaviours that arise from interactions between parameters from hydrological and error models (see discussion in Evin et al., 2014). Unfortunately, joint inference methods can have serious limitations for operational forecasting of streamflows. Li et al. (2015) showed that a joint inference method caused poor performance in the hydrological model when it was isolated from the error model (we will call this the “base” hydrological model). Error models that account for autocorrelated residuals usually have less influence on forecasts as lead time increases. Thus, as lead time increases, and the influence of the error model decreases, the quality of the forecast relies on the performance of the base hydrological model and the quality of meteorological forecasts (Bennett et al., 2014a). Evin et al. (2014) demonstrated another (and perhaps more egregious) limitation of joint inference methods: joint estimation can result in deleterious interference between error model and hydrological model parameters, leading to poor out-of-sample streamflow predictions. In our experience, interactions between parameters of the hydrological model and the error model can make it very difficult to calibrate the models jointly. The shape of the distribution of forecast residuals can change markedly after hydrological model forecasts are updated, for example with an autoregressive error model. Despite considerable progress in hydrological uncertainty modelling, few studies in the literature present model forecasts (or simulations) that are practically reliable when error updating is applied (e.g. Gragne et al., 2015; Schoups and Vrugt, 2010).

This paper develops a new error modelling method, called error reduction and
representation in stages (ERRIS), for real-time and short-term streamflow
forecasting applications. ERRIS is a further development of the restricted
autoregressive model (Li et al., 2015) and a seasonal error model developed
by Li et al. (2013). ERRIS is a post-processing method developed to deal with
the overall errors of streamflow forecasts resulting from hydrological
uncertainty only. We assume that errors in streamflow forecasts due to
weather forecasts (precipitation in particular) will be considered separately
by using ensemble weather forecasts (Bennett et al., 2014a; Robertson et
al., 2013; Shrestha et al., 2013), and we do not consider these in this
paper. For convenience, in this study we use the term

In this study we use the term “ensemble” to mean a set of equally probable realizations of future streamflow that represents the hydrological model uncertainty. The forecasts based on ERRIS are not typical probabilistic forecasts (Gneiting and Katzfuss, 2014), which explicitly provide the predictive distribution of future streamflow. For ERRIS, the probability distribution may be theoretically derived for one-step-ahead forecasts based on the distributional assumption of model residuals. However, we can only obtain the predictive distribution of ERRIS forecasts at multiple step by means of Monte Carlo simulation.

The novelty of ERRIS is that it does not rely on a single complex error model but runs a sequence of simple error models through multiple stages. We start with a very simple model of independent Gaussian residuals after data transformation to determine hydrological model parameters. At each subsequent stage, an error model is introduced to improve over the previous stage and to finalize the representation, including associated parameter values, of one particular statistical feature (bias, correlation in residuals or a non-Gaussian distribution). ERRIS progressively refines model features, focusing only on a small number of model parameters at each stage. This is achieved by estimating the values for a core set of parameters at each stage and holding them constant at subsequent stages. In doing so, ERRIS avoids the problems associated with parameter interactions that can occur under joint inference methods.

This paper is organized as follows. The ERRIS method is described in detail in Sect. 2. A case study is introduced in Sect. 3. Major results are presented in Sect. 4, followed by discussion and further results in Sect. 5. Conclusions are made in Sect. 6.

We start from a simplified version of the seasonally
invariant error model described by Li et al. (2013) to calibrate the
hydrological model in the ERRIS method. At Stage 1, we apply the log–sinh
transformation (Wang et al., 2012)

We denote the observed and simulated streamflows at day

Summary of the ERRIS method.

At Stage 1, we assume that the hydrological simulation is overall unbiased.
However, the hydrological model often overestimates low flows and
underestimates high flows. At Stage 2, we adopt a simple but effective
bias-correction scheme firstly introduced by Wang et al. (2014) to revise
the forecast value made at Stage 1. This bias correction describes the
forecast bias in the transformed domain by a linear function. Because the
bias correction is applied to transformed data, it is able to cope with
conditional biases (biases that vary with flow magnitude) that are often
present in hydrological model simulations, even if these vary in a strongly
non-linear way. We express the specific error model structure of Stage 2 as

At the end of Stage 2, the forecast median in the original space is revised
to

At Stage 3, we no longer assume that forecast residuals are independent, and
use an AR-based error model to describe the correlation structure of forecast
residuals. The AR-based error model enables the ERRIS method to correct
forecast residuals based on the latest available observations of streamflow.
Specifically, we assume that the forecast residuals at Stage 2 follow a
restricted AR error model described by Li et al. (2015). The error model at
Stage 3 can be written as

In Sect. 4, we will demonstrate that the residuals after Stages 1 and 2 are well described by Gaussian distributions, but the shape of the residual distribution after Stage 3 dramatically changes. In particular, the distribution of the residuals after Stage 3 looks more peaked and has longer tails than a Gaussian distribution. The reason for the non-Gaussian residuals after Stage 3 is as follows. The AR updating at Stage 3 is very effective in correcting small residuals especially at hydrograph recession and therefore reducing residuals to very small values. The updating, however, is not very effective around peaks, where the residuals remain large even in the transformed space. This results in a centrally peaked and long-tailed distribution of residuals after Stage 3.

At Stage 4, we use a non-Gaussian distribution to describe the model
residuals from Stage 3. Several long-tailed distributions have been used in
hydrological modelling studies, such as the finite mixture distribution
(Schaefli et al., 2007; Smith et al., 2010), the exponential power
distribution (Schoups and Vrugt, 2010) and Student's

Using a two-component Gaussian mixture distribution, we express the residual
model at Stage 4 as

The maximum likelihood estimation (Li et al., 2013; Wang et al., 2009) is
used to estimate model parameters at all four stages. Denote the parameter
set as

At Stage 1, the hydrological model parameters, transformation parameters (

At Stage 2, the bias-correction parameters (

The shuffled complex evolution (SCE) algorithm (Duan et al., 1994) is used to maximize the log likelihood function at Stage 1, where a number of parameters are required to be calibrated. The simplex algorithm (Nelder and Mead, 1965) is used in the likelihood-based calibration at other stages, where fewer parameters are present. We use different optimization algorithms because the simplex algorithm is more computationally efficient when the number of parameters is small.

We use several performance measures to evaluate the ensemble forecasts
derived at each stage. The evaluation criteria suggested by Engeland et
al. (2010) are used to test for important attributes of ensemble forecasts
including

The

We evaluate the overall forecast skill with a skill score derived from the
widely used continuous ranked probability score (CRPS) (Gneiting and
Katzfuss, 2014; Grimit et al., 2006; Wang et al., 2009) (denoted by
CRPS_SS). CRPS is a negatively oriented score: a smaller value of CRPS
indicates a better forecast. As with the relative AWCI, the skill score
CRPS_SS is defined as the normalized version of CRPS with respect to a
reference forecast:

While a decomposition of CRPS is available that gives an indication of reliability (Hersbach, 2000), we do not use this. PIT plots are a stronger test of reliability (Candille and Talagrand, 2005), and accordingly we focus on PIT plots to discuss reliability.

We select six catchments in southeastern Australia and three catchments in
the United States (US) for this study (Fig. 1), from a range of climatic and
hydrological conditions. The streamflow data for the Australian catchments
are obtained from the Catchment Water Yield Estimation Tool (CWYET) data set
(Vaze et al., 2011). The rainfall and potential evaporation data for the
Australian catchments are taken from the Australian Water Availability
Project (AWAP) data set (Jones et al., 2009). All data for the US catchments
are taken from the Model Intercomparison Experiment (MOPEX) data set (Duan et
al., 2006). The Abercrombie and Emu catchments have many instances of zero
flow (Table 2), and accurate streamflow forecasting is particularly
challenging for such dry catchments. AWCI

Map of the catchments used in this study.

Basic catchment characteristics (1992–2005).

Daily streamflow is simulated with the GR4J rainfall–runoff model (Perrin et al., 2003) and then forecasted with ERRIS as described in Sect. 3. GR4J is a widely used conceptual model that was designed to be as parsimonious as possible. Its four parameters describe the depth of a production store (X1), groundwater exchange (X2), the depth of a routing store (X3) and the length of unit hydrographs (X4). Forecasts are generated from “perfect” (observed) deterministic rainfall forecasts at a lead time of 1 day (i.e. one time step ahead). All results reported in this study are based on cross-validation unless specified. Cross-validation allows us to generalize the forecast skill to data outside the sample period. Because of data availability, we choose different study periods for Australian and US catchments. For Australian catchments, data from 1990 to 1991 are used to warm up the hydrological model and the data from 1992 to 2005 are used to generate a leave-two-years-out cross-validation (i.e. effectively 14-fold cross-validation). For a particular year, we remove the streamflow data from this year and the following year and apply ERRIS to forecast the streamflow for the year. The removal of the data from the following year aims to minimize the impact of streamflow memory on model performance. For US catchments, the data from 1979 to 1980 are used in the warm-up period and the data from 1981 to 1998 are used for a leave-two-years-out cross-validation (i.e. effectively 18-fold cross-validation).

AWCI and CRPS calculated from the reference forecast for each catchment.

An example of streamflow time-series plots for the Mitta Mitta catchment in the period between 1 July 2000 and 31 December 2000.

Figure 2 compares forecasts at different stages for an example period. In this example, we generate daily streamflow forecasts for the Mitta Mitta catchment in the period between 1 July 2000 and 31 December 2000. The forecast mean and the 95 % forecast interval are plotted against observations. The forecast at Stage 1 (the base hydrological model forecast) frequently overestimates low flows, such as in the period between July and September. For high-flow periods (e.g. October), the forecast mean is generally more accurate but virtually all observations lie within the 95 % forecast intervals, suggesting that the forecast intervals are too wide (i.e. the forecasts may be underconfident). The forecast mean at Stage 2 is closer to the observations and the 95 % forecast intervals tend to be narrower. Stage 2 tends to overestimate high flows less than Stage 1, but introduces the problem of underestimating high flows in some instances (e.g. September).

Comparison of performance metrics for each catchment and each stage.

Comparison of the cumulative probability distribution of the PIT at different stages (light-blue shaded strips indicate the 95 % significant band of the Kolmogorov–Smirnov test).

The AR error updating applied in Stage 3 significantly reduces the forecast residuals, as we expect given that streamflows are usually heavily autocorrelated. The forecasts at Stage 3 are not only more accurate but also more certain, indicated by the considerably narrower 95 % forecast intervals. The differences between Stage 3 and Stage 4 are not evident in the time-series plots, in essence because Stage 4 is an attempt to address issues of reliability, which is difficult to see when forecast intervals are so narrow. We give a detailed view of changes to reliability at each stage below.

Figure 3 summarizes the performance at each stage for all catchments, and generally confirms the improvements in performance at each stage observed in Fig. 2. In general, Stage 1 and Stage 2 are similarly efficient (Fig. 3b), skillful (Fig. 3c), sharp (Fig. 3d) and reliable (Fig. 3e). As we expect, Stage 2 forecasts are consistently less biased than Stage 1 (Fig. 3a) (except for the Hope catchment, where many instances of zero flow occur; see Table 2). Stage 3 is generally much more efficient and skillful than Stages 1 and 2. A partial exception to this is the Abercrombie catchment, which is less efficient at Stage 3 than Stage 2. The Abercrombie catchment experiences low (to zero) flows, but is also punctuated by abrupt high flows. Stage 3 is based on the time persistence of the residuals and may introduce more errors when flows change abruptly, which sometimes occurs in the Abercrombie catchment. In addition, residuals tend to be larger at higher flows and because NSE is a measure of squared residuals, it tends to give more weights to residuals at high flows. This causes the Abercrombie Stage 3 forecasts to be less efficient than those of Stage 2.

As we expect, Stage 3 forecasts are notably sharper than those at Stage 2
(Fig. 3d). However, this sharpness is not supported by reliability: Stage 3
forecasts tend to be much less reliable than all other stages (Fig. 3e).
Figure 4 illustrates the reliability of the forecasts at each stage in more
detail with the PIT plots. As PIT values are highly autocorrelated, we have
to “thin” them in order to make the Kolmogorov–Smirnov significant band
applicable (Zhao et al., 2015). We generate PIT plots from every 30th
forecast to eliminate the autocorrelation. The PIT plots show that the
forecasts at the first two stages are reliable (as with the

Comparison of the different probability density functions fitted to the model residuals at Stage 3 for each catchment.

At Stage 3, unreliable forecasts are caused by representing the model
residual by an inappropriate (Gaussian) probability distribution. We compare
the underlying density of the model residuals at Stage 3,

To provide a basis for any future comparisons with this study, we include
example parameter values for each stage in Table 4 (derived by calibrating
each stage to the full set of data – i.e. without cross-validation). We note
that (1) the variance parameter at Stage 3 is always much smaller than at
Stage 1 and Stage 2, which leads to the dramatic reduction in the width of
forecast intervals at this stage, and (2) that the

The calibrated error model parameters for the selected catchments.

It is possible to use long-tailed distributions other than the Gaussian
mixture distribution at Stage 4. For example, Student's

We first examine how well Student's

Comparison of the upper quantile of the model residuals fitted by different distributions for each catchment.

A disadvantage of the very long tail of Student's

The calibrated parameters when Student's

Comparison of streamflow observations with streamflow forecast mean
for each catchment when the residual distribution is fitted by Student's

Same as Fig. 3 but the hydrological model is calibrated by the least-squares method.

Same as Fig. 4 but the hydrological model is calibrated by the least-squares method.

In this study, we apply a likelihood-based calibration at Stage 1 to derive the distribution of the forecast residuals. However, in operational practice forecasters may prefer to use their own methods for calibrating hydrological models (or it may be onerous to recalibrate large numbers of hydrological models, whatever method is used). It is possible to simply “bolt on” the ERRIS method to existing hydrological models. We simply need to calibrate the transformation parameters and the model residual standard deviation at Stage 1 while fixing the hydrological parameters to those already calibrated. We demonstrate this by first calibrating hydrological models with a simple least-squares objective. We then apply the ERRIS method and repeat the cross-validation analysis.

Figure 8, an analog to Fig. 3, summarizes forecast performance when the hydrological model is calibrated to a least-squares objective. The least-squares calibration essentially maximizes NSE as an objective, but the corresponding cross-validated NSE is not necessarily always greater than that of the likelihood-based calibration. The forecast performance from the two different calibrations can differ markedly at Stage 1 but is largely similar after the AR error updating at Stages 3 and 4. Thus, ERRIS is flexible enough to accommodate existing hydrological models.

Figure 9, an analogue to Fig. 4, compares the PIT plots for different catchments when the hydrological model is least-squares-calibrated. The main change is that the forecasts at Stage 1 are no longer reliable in many instances. This is caused by the least-squares calibration, which does not ensure the forecast residuals are Gaussian (even after the log–sinh transformation). The PIT plots derived from Stage 2 and Stage 3 in Fig. 9 show a very similar pattern to their counterparts in Figure 4. It suggests that poor reliability at Stage 3 occurs irrespective of the calibration strategy employed for the hydrological model. As with Fig. 4, Fig. 9 shows the Gaussian mixture distribution used at Stage 4 effectively ameliorates the problems with the reliability of Stage 3.

There are several advantages of using a multi-stage error model compared to a single complex error model. (1) The parameter estimation in ERRIS is relatively simple, and hence computationally efficient. Only a small number of parameters are estimated at each stage. Joint parameter estimations associated with a single complicated error model are often more computationally demanding. (2) Interference between parameters is minimized. The parameters of a single complex model can confound each other and the contribution of one parameter can sometimes be explained by others. For example, the hydrological model parameters describing soil moisture storage capacity may interfere strongly with the error parameters describing bias. Interference between parameters can make the parameter estimation unstable, because more than one set of parameters can achieve a similar objective function value, and thus over-fit parameters. (3) In operational forecasting it is often important that individual components of the forecasting model can function independently. For example, if forecasts are issued to long lead times, the influence of an AR model diminishes as lead time extends. Thus forecasts at long lead times rely strongly on the hydrological model (and, in our case, a bias correction) to be plausible, even with perfect meteorological forcings. If all parameters are estimated jointly, it is difficult to guarantee that each component of a forecasting model can operate independently. In addition, because stages are independent, it is possible to change a stage without affecting other stages, making the ERRIS approach easy to extend or modify.

This paper is aimed at developing a staged error model suitable for eventual use in an operational ensemble forecasting system. We have focused on presenting the theoretical underpinnings of this approach, and have limited its testing to forecasting with “perfect” (observed) rainfall forecasts at a lead time of 1 day. Operational systems routinely forecast to long lead times, and use uncertain rainfall forecasts to force hydrological models. In future work we will extend the validation of this model to forecast multiple lead times, and couple the ERRIS approach with reliable ensemble rainfall forecasts (Robertson et al., 2013; Shrestha et al., 2015).

The staged approach of ERRIS sets it apart from several predecessors, for example the hydrological uncertainty processor (HUP) and the dynamic uncertainty model by regression on absolute error (DUMBRAE). HUP is a Bayesian forecasting system to produce probabilistic streamflow forecasts (Kelly and Krzysztofowicz, 1997; Krzysztofowicz, 1999, 2001; Krzysztofowicz and Kelly, 2000; Reggiani et al., 2009; Todini, 2008). HUP and ERRIS have some similarities: (1) both are post-processors of deterministic hydrological models for hydrological uncertainty quantification, (2) both apply transformation to normalize data, (3) both use a linear regression in the transformed space for bias correction, and (4) both use an autoregressive model to update hydrological simulation. However, ERRIS differs fundamentally from HUP by being implemented in stages. As we have noted, the staged approach avoids unwanted interaction between parameters, and ensures the base hydrological model performs as strongly as possible. In addition, some other technical advances distinguish ERRIS from HUP. For instance, ERRIS applies a restricted autoregressive model in order to avoid the possible overcorrection from the ordinary autoregressive model used in HUP. ERRIS uses a mixture of two Gaussian distributions for the residual distribution, which is more flexible than a Gaussian distribution used in HUP to describe the peak, shoulder and tail of the distribution.

Pianosi and Raso (2012) developed DUMBRAE to quantify predictive uncertainty of deterministic hydrological models. Unlike ERRIS, DUMBRAE does not apply data transformation and considers an error model in the original space. To deal with heteroscedastic residual errors, DUMBRAE explicitly formulates the error variance as a function of time series of absolute hydrological model errors and several independent predictors (such as precipitation). The dynamic variance model of DUMBRAE is an interesting alternative to the method we have presented here. As with HUP, another major difference between ERRIS and DUMBRAE is staged error modelling that allows ERRIS to characterize the forecast error in stages and to avoid potential parameter interference and ensure robust performance of the base hydrological model.

In this study, we introduce the error
reduction and representation in stages (ERRIS) method to update errors and
quantify uncertainty in streamflow forecasts. The first stage of ERRIS
employs a simple error model that assumes independent Gaussian residuals
after the log–sinh transformation. The second stage applies a
bias correction that is able to correct conditional and unconditional biases,
including the sometimes strongly non-linear biases that occur in ephemeral
catchments. The third stage exploits autocorrelation in residuals with an AR
model to dramatically reduce forecast residuals, but this results in
unreliable ensemble forecasts. In the fourth stage a Gaussian mixture
distribution is used to describe the residuals, resulting in ensemble
forecasts that are both highly accurate and very reliable. Based on extensive
validation of ERRIS, the accuracy of the forecast mean is slightly improved
by the bias correction at Stage 2 and is considerably improved by the
updating at Stage 3. The reliability of the forecasts at Stage 3 becomes a
problem, because the shape of the residual distribution dramatically changes.
The revision of the residual distribution at Stage 4 is effective for
representing non-Gaussian residuals and leading to highly reliable forecasts.
The Gaussian mixture distribution is showed to be more suitable than the
Student's

All data for the US catchments are taken from the MOPEX data set, which are available at

This research was supported by the Water Information Research and Development Alliance (WIRADA) between the Bureau of Meteorology and CSIRO Land & Water Flagship. We would like to thank Andrew Schepen for valuable suggestions to improve quality of the manuscript. Edited by: D. Solomatine Reviewed by: two anonymous referees