Modern computing technology has ushered in the era of Big Data, and with it a host of new challenges in analysing high-dimensional data, in fields ranging from climate science and genetics to biomedical imaging and economics. In a typical situation, researchers work with an N×N covariance matrix that records the empirical correlations among N variables. In practice, most elements of this matrix are small or zero: the matrix is sparse, because only a few of the many variables are significantly correlated. A key challenge is to extract meaningful information from such matrices.
One powerful statistical approach compares empirical covariance matrices with a random benchmark derived from random matrix theory. The simplest null model assumes that the entries are independent Gaussian random variables, and yields predictions for quantities such as the eigenvalue statistics. However, these classic results do not carry over to sparse covariance matrices, for which analytical results for the joint distribution of eigenvalues are still lacking. In a new paper, LML Fellows Isaac Pérez Castillo and Fernando Metz develop an analytical approach that extends random matrix theory to sparse random covariance matrices. Using the replica method from the theory of disordered systems, they derive an analytical expression for the large-N behaviour of the cumulant generating function of the number of eigenvalues smaller than a given threshold. This function provides a full picture of the eigenvalue fluctuations for this class of random matrices.
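To make the counting statistic concrete, the following is a minimal numerical sketch, not the authors' analytical method. It samples small sparse covariance matrices, counts the eigenvalues below a threshold, and estimates the mean, variance and cumulant generating function of that count by brute force. The sparsity model (each data entry nonzero with probability `mean_nonzeros / n`), the sign convention log E[exp(-mu * count)], the function names and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def sparse_covariance(n, m, mean_nonzeros, rng):
    """Sample an n x n matrix C = X X^T / m from a sparse n x m data matrix X."""
    # Assumed sparsity model: each entry of X is nonzero with
    # probability mean_nonzeros / n, and nonzero entries are Gaussian.
    mask = rng.random((n, m)) < mean_nonzeros / n
    x = mask * rng.standard_normal((n, m))
    return x @ x.T / m


def count_below(matrix, threshold):
    """Count the eigenvalues of a symmetric matrix that lie below threshold."""
    eigenvalues = np.linalg.eigvalsh(matrix)
    return int(np.sum(eigenvalues < threshold))


def empirical_cgf(counts, mu):
    """Estimate log E[exp(-mu * count)] from sampled counts (assumed convention)."""
    counts = np.asarray(counts, dtype=float)
    return float(np.log(np.mean(np.exp(-mu * counts))))


rng = np.random.default_rng(0)
n, m, samples, threshold = 50, 100, 200, 0.05  # illustrative values only

counts = [
    count_below(sparse_covariance(n, m, 3.0, rng), threshold)
    for _ in range(samples)
]

# The first two cumulants of the count are its mean and variance.
print("mean:", np.mean(counts), "variance:", np.var(counts))
print("CGF at mu = 0.1:", empirical_cgf(counts, 0.1))
```

Direct sampling of this kind only probes typical fluctuations at small N. An analytical expression for the large-N cumulant generating function also covers rare events that such sampling would almost never produce.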
Preprint at https://arxiv.org/pdf/1801.03726.pdf