The increasing need for resources, the expansion of settlements, and climate change have put humanity in a difficult relationship with the environment.
The Earth Science and Climate Change research group at the Institute of Machine Learning focuses on building deep learning models to better describe environmental systems. Our current research revolves mainly around water, the most vital natural resource. Within this field, we introduced a new approach to predict the amount of water in river systems. Unlike traditional simulations with coarse, hand-crafted abstractions, we use Machine Learning to generate fully data-driven predictions. Our approach has been shown to outperform traditional hydrological models (the outcome of decades of expertise and model development) by a large margin. Ultimately, these new models will allow us to provide early flood warnings, to predict future droughts, to operate hydropower plants more efficiently, to improve access to drinking water, and more.
Modelling these processes is an innately spatio-temporal problem. Our models thus make use of a wide range of different inputs, ranging from weather data to land-surface characteristics to satellite observations. In the future, we are also looking forward to extending the application and scope of our research to other areas of geosciences.
Recent publications in AI 4 Earth:
Uncertainty Estimation with Deep Learning for Rainfall-Runoff Modelling
Klotz, D.,
Kratzert, F.,
Gauch, M.,
Keefe Sampson, A.,
Brandstetter, J.,
Klambauer, G.,
Hochreiter, S.,
and Nearing, G.
Hydrology and Earth System Sciences Discussions
2021
Deep Learning is becoming an increasingly important way to produce accurate hydrological predictions across a wide range of spatial and temporal scales. Uncertainty estimations are critical for actionable hydrological forecasting, and while standardized community benchmarks are becoming an increasingly important part of hydrological model development and research, similar tools for benchmarking uncertainty estimation are lacking. This contribution demonstrates that accurate uncertainty predictions can be obtained with Deep Learning. We establish an uncertainty estimation benchmarking procedure and present four Deep Learning baselines. Three baselines are based on Mixture Density Networks and one is based on Monte Carlo dropout. The results indicate that these approaches constitute strong baselines, especially the former ones. Additionally, we provide a post-hoc model analysis to put forward some qualitative understanding of the resulting models. This analysis extends the notion of performance and shows that the models learn nuanced behaviors in different situations.
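A Mixture Density Network outputs the parameters of a probability distribution instead of a single discharge value, and it is typically trained by minimizing the negative log-likelihood of the observations. The sketch below assumes a Gaussian mixture for a single time step; the distributions used in the paper may differ, and the function name and argument layout are hypothetical. It also assumes the network has already turned its raw outputs into positive weights that sum to 1 (e.g. via a softmax) and positive standard deviations.

```python
import math


def gaussian_mixture_nll(y, weights, means, stds):
    """Negative log-likelihood of a scalar observation y under a Gaussian mixture.

    weights, means and stds are per-component lists, as a hypothetical MDN head
    might emit them for one time step. Weights must be positive and sum to 1.
    """
    if not math.isclose(sum(weights), 1.0, rel_tol=1e-9):
        raise ValueError("mixture weights must sum to 1")
    # log of each weighted component density: log w_k + log N(y | mu_k, sigma_k)
    log_terms = []
    for w, mu, sigma in zip(weights, means, stds):
        z = (y - mu) / sigma
        log_pdf = -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
        log_terms.append(math.log(w) + log_pdf)
    # log-sum-exp over components for numerical stability
    m = max(log_terms)
    log_likelihood = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return -log_likelihood
```

For instance, `gaussian_mixture_nll(2.0, [0.7, 0.3], [1.5, 4.0], [0.5, 1.5])` scores an observed discharge of 2.0 against a two-component predictive distribution; during training, such a loss would be averaged over time steps and basins.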
A note on leveraging synergy in multiple meteorological datasets with deep learning for rainfall-runoff modeling
Kratzert, F.,
Klotz, D.,
Hochreiter, S.,
and Nearing, G.
Hydrology and Earth System Sciences
2021
A deep learning rainfall-runoff model can take multiple meteorological forcing products as inputs and learn to combine them in spatially and temporally dynamic ways. This is demonstrated using Long Short-Term Memory networks (LSTMs) trained over basins in the continental US using the CAMELS data set. Using multiple precipitation products (NLDAS, Maurer, DayMet) in a single LSTM significantly improved simulation accuracy relative to using only individual precipitation products. A sensitivity analysis showed that the LSTM learned to utilize different precipitation products in different ways in different basins and for simulating different parts of the hydrograph in individual basins.
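One simple way to feed several forcing products to a single LSTM is to place their values for the same time step side by side in one input vector, leaving it to the network to learn how to weight them. The sketch below assumes daily precipitation series that are already aligned on the same dates for one basin; the function name and the dictionary keys are hypothetical, and it does not reproduce the paper's full input configuration.

```python
def stack_forcings(products):
    """Combine aligned forcing products into one feature vector per time step.

    products maps a product name (e.g. "nldas", "maurer", "daymet") to a list
    of daily precipitation values for the same basin and the same dates.
    Returns (feature_names, rows) where rows[t] holds all products at step t.
    """
    if not products:
        raise ValueError("need at least one forcing product")
    names = sorted(products)  # fixed, reproducible feature order
    lengths = {len(products[n]) for n in names}
    if len(lengths) != 1:
        raise ValueError("all products must cover the same time steps")
    n_steps = lengths.pop()
    return names, [[products[n][t] for n in names] for t in range(n_steps)]
```

Fixing the feature order matters: the LSTM's input weights are tied to positions in the vector, so the same product must always occupy the same slot during training and inference.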
Rainfall–runoff prediction at multiple timescales with a single Long Short-Term Memory network
Gauch, M.,
Kratzert, F.,
Klotz, D.,
Nearing, G.,
Lin, J.,
and Hochreiter, S.
Hydrology and Earth System Sciences
2021
Long Short-Term Memory (LSTM) networks have been applied to daily discharge prediction with remarkable success. Many practical applications, however, require predictions at more granular timescales. For instance, accurate prediction of short but extreme flood peaks can make a lifesaving difference, yet such peaks may escape the coarse temporal resolution of daily predictions. Naively training an LSTM on hourly data, however, entails very long input sequences that make learning difficult and computationally expensive. In this study, we propose two multi-timescale LSTM (MTS-LSTM) architectures that jointly predict multiple timescales within one model, as they process long-past inputs at a different temporal resolution than more recent inputs. In a benchmark on 516 basins across the continental United States, these models achieved significantly higher Nash–Sutcliffe efficiency (NSE) values than the US National Water Model. Compared to naive prediction with distinct LSTMs per timescale, the multi-timescale architectures are computationally more efficient with no loss in accuracy. Beyond prediction quality, the multi-timescale LSTM can process different input variables at different timescales, which is especially relevant to operational applications where the lead time of meteorological forcings depends on their temporal resolution.
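The core idea of processing long-past inputs at a coarser resolution than recent inputs can be illustrated on the data side alone. The sketch below, which is an assumption-laden illustration rather than the MTS-LSTM implementation, splits an hourly series into daily means for the older part and raw hourly values for the most recent window; the function name and the use of means (sums may suit precipitation better) are hypothetical choices.

```python
def split_multi_timescale(hourly, recent_hours, hours_per_step=24):
    """Split an hourly series into a coarse (daily) past and a fine recent part.

    Returns (daily_means, recent_hourly): the older part aggregated to daily
    means, and the last `recent_hours` values kept at hourly resolution.
    """
    if recent_hours % hours_per_step != 0 or len(hourly) % hours_per_step != 0:
        raise ValueError("series length and recent window must be whole days")
    if recent_hours > len(hourly):
        raise ValueError("recent window is longer than the series")
    cut = len(hourly) - recent_hours
    past, recent = hourly[:cut], hourly[cut:]
    # Aggregate each block of `hours_per_step` values to one daily value.
    daily = [
        sum(past[i:i + hours_per_step]) / hours_per_step
        for i in range(0, len(past), hours_per_step)
    ]
    return daily, recent
```

With a year of hourly data and a recent window of a few days, the model would step through roughly 365 daily values plus a short hourly tail instead of about 8,760 hourly steps, which is the kind of sequence-length reduction the abstract refers to.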
What Role Does Hydrological Science Play in the Age of Machine Learning?
Nearing, G.,
Kratzert, F.,
Sampson, A.,
Pelissier, C.,
Klotz, D.,
Frame, J.,
Prieto, C.,
and Gupta, H.
Water Resources Research
2020
This paper is derived from a keynote talk given at Google’s 2020 Flood Forecasting Meets Machine Learning Workshop. Recent experiments applying deep learning to rainfall-runoff simulation indicate that there is significantly more information in large-scale hydrological data sets than hydrologists have been able to translate into theory or models. While there is a growing interest in machine learning in the hydrological sciences community, in many ways, our community still holds deeply subjective and nonevidence-based preferences for models based on a certain type of “process understanding” that has historically not translated into accurate theory, models, or predictions. This commentary is a call to action for the hydrology community to focus on developing a quantitative understanding of where and when hydrological process understanding is valuable in a modeling discipline increasingly dominated by machine learning. We offer some potential perspectives and preliminary examples about how this might be accomplished.
Toward Improved Predictions in Ungauged Basins: Exploiting the Power of Machine Learning
Kratzert, F.,
Klotz, D.,
Herrnegger, M.,
Sampson, A.,
Hochreiter, S.,
and Nearing, G.
2019
Long short-term memory (LSTM) networks offer unprecedented accuracy for prediction in ungauged basins. We trained and tested several LSTMs on 531 basins from the CAMELS data set using k-fold validation, so that predictions were made in basins that supplied no training data. The training and test data set included ∼30 years of daily rainfall-runoff data from catchments in the United States ranging in size from 4 to 2,000 km² with aridity index from 0.22 to 5.20, and including 12 of the 13 IGBP vegetated land cover classifications. This effectively “ungauged” model was benchmarked over a 15-year validation period against the Sacramento Soil Moisture Accounting (SAC-SMA) model and also against the NOAA National Water Model reanalysis. SAC-SMA was calibrated separately for each basin using 15 years of daily data. The out-of-sample LSTM had higher median Nash-Sutcliffe Efficiencies across the 531 basins (0.69) than either the calibrated SAC-SMA (0.64) or the National Water Model (0.58). This indicates that there is (typically) sufficient information in available catchment attributes data about similarities and differences between catchment-level rainfall-runoff behaviors to provide out-of-sample simulations that are generally more accurate than current models under ideal (i.e., calibrated) conditions. We found evidence that adding physical constraints to the LSTM models might improve simulations, which we suggest motivates future research related to physics-guided machine learning.
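The headline numbers above are median Nash-Sutcliffe efficiencies (NSE) across basins. NSE compares the squared simulation error to the variance of the observations: 1 is a perfect fit, and 0 means the model is no better than predicting the observed mean. A minimal sketch of the metric and of the per-basin median follows; the function names and the dictionary layout are hypothetical.

```python
import statistics


def nse(simulated, observed):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs from their mean."""
    if len(simulated) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty series")
    mean_obs = statistics.fmean(observed)
    sse = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    variance_term = sum((o - mean_obs) ** 2 for o in observed)
    if variance_term == 0:
        raise ValueError("observed series is constant; NSE is undefined")
    return 1.0 - sse / variance_term


def median_nse(per_basin):
    """Median NSE over basins; per_basin maps basin id -> (simulated, observed)."""
    scores = [nse(sim, obs) for sim, obs in per_basin.values()]
    return statistics.median(scores)
```

Reporting the median rather than the mean keeps a few basins with strongly negative NSE from dominating the summary.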
Towards Learning Universal, Regional, and Local Hydrological Behaviors via Machine Learning Applied to Large-Sample Datasets
Kratzert, F.,
Klotz, D.,
Shalev, G.,
Klambauer, G.,
Hochreiter, S.,
and Nearing, G.
2019
Regional rainfall–runoff modeling is an old but still mostly outstanding problem in the hydrological sciences. The problem currently is that traditional hydrological models degrade significantly in performance when calibrated for multiple basins together instead of for a single basin alone. In this paper, we propose a novel, data-driven approach using Long Short-Term Memory networks (LSTMs) and demonstrate that under a “big data” paradigm, this is not necessarily the case. By training a single LSTM model on 531 basins from the CAMELS dataset using meteorological time series data and static catchment attributes, we were able to significantly improve performance compared to a set of several different hydrological benchmark models. Our proposed approach not only significantly outperforms hydrological models that were calibrated regionally, but also achieves better performance than hydrological models that were calibrated for each basin individually. Furthermore, we propose an adaptation to the standard LSTM architecture, which we call an Entity-Aware-LSTM (EA-LSTM), that allows for learning catchment similarities as a feature layer in a deep learning model. We show that these learned catchment similarities correspond well to what we would expect from prior hydrological understanding.
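In the EA-LSTM, the input gate is computed from the static catchment attributes only, while the forget gate, cell update and output gate see the dynamic meteorological inputs and the previous hidden state. The static input gate is therefore constant for a basin and acts as the learned "feature layer" describing catchment similarity. The pure-Python forward pass below is a sketch under that reading; the parameter dictionary layout and all function names are hypothetical, and no training is shown.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _affine(weights, vector, bias):
    """Return weights @ vector + bias, with weights given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]


def _gate_preactivation(name, x, h, p):
    """W_name @ x + U_name @ h + b_name for one dynamic gate."""
    wx = _affine(p["W_" + name], x, p["b_" + name])
    uh = _affine(p["U_" + name], h, [0.0] * len(h))
    return [a + b for a, b in zip(wx, uh)]


def ea_lstm_forward(static, dynamic_seq, p):
    """Run a (hypothetical) EA-LSTM over one basin's input sequence.

    static: catchment attributes; dynamic_seq: one meteorological vector per step.
    p: parameters W_i, b_i (static input gate) and W_*, U_*, b_* for * in f, g, o.
    Returns (hidden states per step, static input gate).
    """
    hidden = len(p["b_i"])
    # The input gate sees only static attributes: computed once per basin.
    i_gate = [_sigmoid(a) for a in _affine(p["W_i"], static, p["b_i"])]
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x in dynamic_seq:
        # Forget gate, cell update and output gate use x_t and h_{t-1}.
        f = [_sigmoid(a) for a in _gate_preactivation("f", x, h, p)]
        g = [math.tanh(a) for a in _gate_preactivation("g", x, h, p)]
        o = [_sigmoid(a) for a in _gate_preactivation("o", x, h, p)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i_gate, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        outputs.append(h)
    return outputs, i_gate
```

Because `i_gate` depends only on `static`, comparing these vectors across basins is one way to inspect which catchments the model treats as similar.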
NeuralHydrology – Interpreting LSTMs in Hydrology
Kratzert, F.,
Herrnegger, M.,
Klotz, D.,
Hochreiter, S.,
and Klambauer, G.
2019
Despite the huge success of Long Short-Term Memory networks, their applications in environmental sciences are scarce. We argue that one reason is the difficulty of interpreting the internals of trained networks. In this study, we look at the application of LSTMs for rainfall-runoff forecasting, one of the central tasks in the field of hydrology, in which the river discharge has to be predicted from meteorological observations. LSTMs are particularly well-suited for this problem since memory cells can represent dynamic reservoirs and storages, which are essential components in state-space modelling approaches of the hydrological system. On the basis of two different catchments, one with snow influence and one without, we demonstrate how the trained model can be analyzed and interpreted. In the process, we show that the network internally learns to represent patterns that are consistent with our qualitative understanding of the hydrological system.
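To make the reservoir analogy concrete, the toy comparison below (our illustration, not the analysis from the paper) shows that a linear reservoir, which releases a fixed fraction `k` of its storage per step, follows the same recursion as an LSTM cell state whose forget gate is held at `1 - k` and whose input gate is held at 1. Both function names are hypothetical.

```python
def linear_reservoir(precip, k, storage=0.0):
    """Toy linear reservoir: each step releases a fraction k of the storage at the start of the step."""
    outflows, states = [], []
    for p in precip:
        q = k * storage           # outflow from current storage
        storage = storage - q + p  # mass balance: S_t = (1 - k) * S_{t-1} + P_t
        outflows.append(q)
        states.append(storage)
    return outflows, states


def cell_state_with_fixed_gates(inputs, forget, input_gate=1.0, c=0.0):
    """LSTM cell-state recursion c_t = f * c_{t-1} + i * g_t with constant gates f and i."""
    states = []
    for g in inputs:
        c = forget * c + input_gate * g
        states.append(c)
    return states
```

For example, `linear_reservoir([1.0, 0.0, 0.0, 2.0], k=0.2)` and `cell_state_with_fixed_gates([1.0, 0.0, 0.0, 2.0], forget=0.8)` produce matching storage trajectories up to floating-point rounding. In a trained LSTM the gates depend on the inputs and the hidden state, so a single cell can behave like a storage whose retention changes over time.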