Learning a weather dictionary of atmospheric patterns using Latent Dirichlet Allocation

The mid-latitude atmospheric circulation is challenging to describe due to the turbulent and chaotic nature of the underlying flow, driven by the unstable dynamics of the jet stream. In general, the phase space of such turbulent geophysical flows appears to be large. Even so, previous work using dimensionality reduction methods has shown that mid-latitude dynamics can often be captured with only a limited number of degrees of freedom, expressed as superpositions of cyclonic and anti-cyclonic structures in fields such as sea-level pressure or geopotential height. The characterization of these patterns was initially based on atmospheric indices built upon physical arguments. Later, unsupervised pattern recognition, based on a variety of classification or dimensionality reduction methods, made it possible to detect similar patterns in arbitrary situations.

In a new paper, LML Fellow Davide Faranda and colleagues offer a change of perspective on atmospheric pattern recognition by applying to geophysical flows a machine learning technique initially developed in linguistics.

Specifically, they employ a soft clustering technique called Latent Dirichlet Allocation (LDA), which is typically applied to collections of discrete data, especially to describe corpora of text documents. In such work, each document is assumed to be a mixture of a small number of topics, each of which is characterized by a distribution over the words of a finite vocabulary. In their new paper, Faranda and colleagues apply LDA instead to a set of daily sea-level pressure anomaly maps over the North Atlantic from 1948 to 2018, treating grid points as the equivalent of words and looking for a classification of the pressure anomaly maps in terms of distinct patterns equivalent to the topics. As they demonstrate, the LDA method expresses any sea-level pressure map (document) from the dataset (corpus) in terms of well-known cyclonic or anti-cyclonic structures (or “motifs,” which are here the equivalent of topics).
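
The analogy between maps and documents can be made concrete with a small sketch. The code below is not the authors' implementation: it is a minimal collapsed Gibbs sampler for LDA, written with the Python standard library only and run on synthetic one-dimensional "maps". The helper names (anomalies_to_counts, fit_lda_gibbs), the two toy motifs, the hyperparameters alpha and beta, and the choice to encode positive and negative anomalies as separate "words" so that all counts are non-negative are assumptions made for illustration.

```python
import random


def anomalies_to_counts(anomaly_map, scale=1.0):
    """Turn a signed anomaly map into non-negative 'word' counts.

    Assumption for this sketch: positive and negative anomalies at each grid
    point are treated as two separate words, so G grid points give a
    vocabulary of 2 * G words.
    """
    pos = [int(round(max(a, 0.0) * scale)) for a in anomaly_map]
    neg = [int(round(max(-a, 0.0) * scale)) for a in anomaly_map]
    return pos + neg


def fit_lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of count vectors.

    Returns (theta, phi): theta[d][k] is the weight of motif k in map d,
    phi[k][w] is the probability of word (grid point) w under motif k.
    """
    rng = random.Random(seed)
    vocab_size = len(docs[0])
    # Expand each count vector into a list of word tokens.
    tokens = [[w for w, c in enumerate(doc) for _ in range(c)] for doc in docs]
    n_dk = [[0] * n_topics for _ in docs]                # tokens per (map, motif)
    n_kw = [[0] * vocab_size for _ in range(n_topics)]   # tokens per (motif, word)
    n_k = [0] * n_topics                                 # tokens per motif
    z = []
    for d, toks in enumerate(tokens):                    # random initial assignment
        z.append([rng.randrange(n_topics) for _ in toks])
        for w, k in zip(toks, z[d]):
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
    for _ in range(n_iter):
        for d, toks in enumerate(tokens):
            for i, w in enumerate(toks):
                k = z[d][i]                              # remove the current token
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k                              # add it back under motif k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    theta = [
        [(n_dk[d][k] + alpha) / (len(tokens[d]) + n_topics * alpha)
         for k in range(n_topics)]
        for d in range(len(docs))
    ]
    phi = [
        [(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(n_topics)
    ]
    return theta, phi


# Synthetic data: two toy motifs on a six-point grid (hypothetical values).
low_west = [-3.0, -2.0, -1.0, 0.0, 0.0, 0.0]
high_east = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
data_rng = random.Random(1)
maps = []
for _ in range(12):
    w = data_rng.random()
    maps.append([w * a + (1.0 - w) * b for a, b in zip(low_west, high_east)])

docs = [anomalies_to_counts(m, scale=5.0) for m in maps]
theta, phi = fit_lda_gibbs(docs, n_topics=2)
print("motif weights for the first map:", theta[0])
```

Each row of theta gives the mixture of motifs that makes up one map, and each row of phi gives one motif as a distribution over grid points. This is the "soft" part of the clustering: a map is not assigned to a single pattern but to a weighted combination of them.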

As the authors note, when applying LDA, it is first necessary to select the number of motifs needed to describe the whole dataset. In the current work, they find that 28 motifs are sufficient to obtain relevant patterns that are representative of the dynamics of the mid-latitude circulation. This value agrees well with the upper bound on the number of degrees of freedom obtained by computing the local attractor dimensions for the same dataset. This suggests that each motif corresponds to an active degree of freedom in the maps, reinforcing the intuitive view that cyclones and anticyclones are the building blocks of the dynamics of mid-latitude flows. This may also indicate that the LDA method could be more suitable than others to define the modes of variability associated with the number of active degrees of freedom estimated by the local dimension.

Overall, the researchers argue, these results suggest that the motifs are not only a practical way to represent a complex map, but also that they correspond to actual coherent sea-level pressure patterns. This study should pave the way for several exciting applications of LDA in the context of present and future climates, namely in studying precursors of extreme events, comparing climate models in terms of their ability to represent specific motifs, or analyzing the emergence or disappearance of motifs under different climate conditions.

The paper is available as a pre-print at https://hal.archives-ouvertes.fr/hal-03258523
