Advancing computational power has encouraged the analysis of large, high-dimensional data sets with machine learning and data mining techniques, as well as the use of algorithms to compute dynamical indicators such as Lyapunov exponents or generalized dimensions. Yet these efforts still suffer from the so-called curse of dimensionality, as algorithms designed for low-dimensional systems fail in the high dimensional context. In a new paper, LML External Felllow Davide Faranda and colleagues examine this issue, focussing on methods for estimating the Hausdorff dimension d of a dynamical attractor, and derive some results on how the expected accuracy of estimates depends on d.
A recently developed technique based on extreme value theory allows the estimation of d without introducing any embedding dimension. This approach does not measure d directly but samples recurrences around different points in phase space to estimate a local dimension, which reflects how a set of hyperspheres densely covers the attractor. An average over these local dimensions yields an estimate for d. The authors first examine how this technique behaves when applied to truly random data sets, and show that it leads generically to an underestimation of the attractor dimension in high dimensional phase spaces. They then show how useful estimates can be made for non-random data by comparing analysis outcomes to the random benchmark.
The study highlights an inherent problem in estimating the attractor dimension, especially when d is large. For highly coupled systems with a small number of active degrees of freedom, the method gives an accurate estimate of d independently of the phase space dimension. But the use of the extreme value theory is only justified whenever d is of the order 10 or less, a result that is consistent with earlier studies. They also show that this estimator can provide further important information about complex dynamical systems with large d. For any dynamical system, it is possible to produce a random system with the same marginal probability distribution, and compute the corresponding value of the estimator. Then, the difference between the result for the random vector and for the dynamical system reflects the level of non-randomness of the system itself, and provides at least order-of-magnitude information on the dimension d.
The paper is available at https://hal.inria.fr/hal-01650250/