Improved interpretability of AI weather models via intermediate decoding

15.12.2025

Despite the rapid progress and strong performance of AI weather models, we still know little about how they construct forecasts internally, which is why they are often considered black boxes. This is because they operate in a high-dimensional intermediate latent space that is difficult to interpret directly. While there is extensive work on the interpretability of AI models in general, especially Large Language Models and vision models, much less has been done on the interpretability of AI weather models.

For Language Models, a popular approach is the "logit lens". In this method, intermediate latent states are fed into the model's decoder, which produces a so-called logit distribution indicating the most likely next token, such as a word or part of a word. By observing how this predicted next token changes as more processor layers are included before decoding, we gain insight into how the model gradually forms its prediction, essentially showing what the model "thinks" the next word should be at different depths within the network.
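The logit lens idea can be sketched on a toy model: the hidden state after each layer is projected through the *final* decoder (unembedding) matrix, and the top-scoring token is read off at every depth. All dimensions, weights, and the residual-style layer update below are random stand-ins for illustration, not the parameters of any real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): vocabulary of 50 tokens, hidden size 16,
# a stack of 4 processor layers.
vocab, d_model, n_layers = 50, 16, 4

# Random stand-ins for trained weights: per-layer transforms and the
# unembedding (decoder) matrix that maps hidden states to logits.
layers = [rng.normal(scale=0.3, size=(d_model, d_model)) for _ in range(n_layers)]
unembed = rng.normal(size=(d_model, vocab))

h = rng.normal(size=d_model)  # hidden state after the encoder/embedding

# Logit lens: decode the hidden state after each layer with the final
# unembedding and watch the top predicted token evolve with depth.
for depth, W in enumerate(layers, start=1):
    h = h + np.tanh(h @ W)   # residual-style layer update
    logits = h @ unembed     # project the intermediate state to logits
    print(f"after layer {depth}: top token = {int(np.argmax(logits))}")
```

In a trained model, the sequence of top tokens across depth shows the prediction being gradually refined; here the point is only the mechanics of decoding intermediate states with the final-layer decoder.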

To understand how AI weather models build forecasts, we adapt the logit lens method for Google's GraphCast AI weather model, sending intermediate model states directly to the decoder. However, the decoder is trained to produce a forecast only at the end of all processing blocks, and the latent representations change their form with increasing model depth. To address this, we train translators: simple linear transformations that align each intermediate latent state with the form expected by the decoder. A similar extension of the logit lens method using translators has also been applied to Language Models, where changes in the latent basis across depth have likewise been observed.
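A minimal sketch of such a translator, assuming it is fitted by ordinary least squares on paired intermediate and final latent states (the actual training objective and data used for GraphCast may differ):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: latent states at an intermediate layer (H_mid) sit in
# a different basis than the states the decoder was trained on (H_final).
# Here the basis change is simulated by an invertible linear mixing.
n_samples, d_latent = 200, 32
H_final = rng.normal(size=(n_samples, d_latent))           # decoder-ready states
mix = rng.normal(scale=0.1, size=(d_latent, d_latent))
H_mid = H_final @ (np.eye(d_latent) + mix)                 # rotated/scaled basis

# Fit the translator A so that H_mid @ A ≈ H_final, by least squares.
A, *_ = np.linalg.lstsq(H_mid, H_final, rcond=None)

# Translated intermediate states can now be fed to the unchanged decoder.
H_translated = H_mid @ A
err = np.linalg.norm(H_translated - H_final) / np.linalg.norm(H_final)
print(f"relative alignment error: {err:.2e}")
```

Because the translator is a single linear map, it can realign the latent basis without adding new predictive capacity of its own, which is what makes the decoded intermediate forecasts attributable to the processor layers rather than to the probe.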

The figure shows the results of intermediate decoding of a 6-hour GraphCast prediction for different numbers of processor layers included. Instead of full processing through all 16 GraphCast layers, intermediate states can be decoded: in this example, the gradual development of the 500 hPa geopotential field in the middle of the atmosphere is shown for April 1, 2020, 18 UTC, first (top) without any processor layers (only encoding and decoding), then after processing through half of the layers, and finally after complete processing.

The input state has been subtracted so that only the predicted change relative to the input is visible. Red areas show 6-hour positive changes of geopotential in m² s⁻² (related to increases in pressure), blue areas show negative changes. Remarkably, even when bypassing all processor layers and using only the encoder and decoder, the model produces a physically plausible pattern, including sensible changes along the Rossby wave structure in the Southern Hemisphere. As more processor layers are included, the forecasts become progressively more detailed and the forecast quality increases, showing that each layer contributes incrementally to constructing the final prediction.
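The residual view described above amounts to subtracting the input field from each decoded forecast; a sketch with synthetic data (field names, values, and grid sizes are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical fields on a small lat-lon grid: the input 500 hPa geopotential
# and an intermediate-depth forecast decoded from the model, both in m^2 s^-2.
lat, lon = 10, 20
z500_input = 49000.0 + rng.normal(scale=50.0, size=(lat, lon))
z500_forecast = z500_input + rng.normal(scale=20.0, size=(lat, lon))

# Subtract the input so only the predicted 6-hour change remains;
# positive values (red in the figure) mean rising geopotential,
# negative values (blue) mean falling geopotential.
change = z500_forecast - z500_input
print(f"max increase: {change.max():.1f}  max decrease: {change.min():.1f}")
```

Plotting such a change field for each decoding depth, rather than the full geopotential, is what makes the incremental contribution of each processor layer visible.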

[Figure: 6-hour change of 500 hPa geopotential decoded after 0, 8, and 16 GraphCast processor layers]