# Root and community inference on latent network growth processes using noisy attachment models

Complex networks can be analysed using a variety of statistical models including classic Erdös–Rényi random graphs, latent space models, configuration graphs and others. These models generally specify some structure – arising from the existence of communities, for example – but the order in which the edges are added is of no importance. In contrast, real world networks often form through the gradual addition of vertices and edges, motivating the development of Markovian preferential attachment (PA) models for networks, which produce a sequence of networks G1,G2,…,Gn. The first network G1 starts as a single node – called the root node – and each iteration adds a new node and new edges, thereby producing networks with sparse edges, heavy-tailed degree distributions, and strands of chains as well as pendants (several degree 1 vertices linked to a single vertex), all important features of real world networks.

Several recent papers have established that, when Gn is a random PA tree, one can infer the early history of Gn, such as the root node, even as the size of the tree tends to infinity. Unfortunately, these algorithms apply only to tree-shaped networks, which limits their application in practice. In recent work, in an effort to overcome these problems, LML External Fellow Harry Crane and colleagues propose a Markovian model for networks which they call PAPER, for Preferential Attachment Plus Erdös–Rényi. They define a network Gm to have the PAPER distribution if it is generated by adding independent random edges to a preferential attachment tree T: the latent PA tree captures the growth process of the network whereas the ER random edges represent additional noise. The authors then study how to infer the early history of the latent tree T from a single snapshot of the final graph G, focusing on the concrete problem of constructing confidence sets for the root node.

As they note, most existing methods are not directly applicable, lacking knowledge of which edges of G correspond to the tree and which are noise. Consequently, the researchers propose an approach in which they first give the nodes new random labels which induce, for a given observation of the network G, a posterior distribution of both the latent tree and the latent arrival ordering of the nodes. Then, they sample from the posterior distribution to construct a credible set for the inferential target, e.g. the root node. In the presence of noisy Erdös–Rényi edges in the PAPER model, a particularly interesting question is the following: how does the size of the confidence set increase with the noise level? In the paper, the authors give an initial answer to this question under two specific settings of the preferential attachment mechanism: linear preferential attachment (LPA) and uniform attachment (UA). For LPA, we prove that the size of our proposed confidence set does not increase with the number of nodes n so long as the noisy edge probability is less than n^{-1/2 }and for UA, we prove that the size is bounded by n^{γ} for some γ < 1 so long as the noisy edge probability is less than log(n)/n. The analysis shows that a phenomenon recently discovered by Bubeck, Devroye and Lugosi (2017) – establishing the existence of confidence sets for the root node of O(1) size – is robust to the presence of noise.

For real world networks with community structures, it is often unrealistic to assume that a network originates from a single root node. The researchers therefore also propose variations of the PAPER model in which K growth processes occur simultaneously from K root nodes. In the multiple roots model, there is no longer a latent tree but rather a latent forest (union of disjoint trees), where the components of the forest can naturally be interpreted as the different communities of the network. The authors provide a model formulation that allows K to be either be fixed or random. Overall, the research demonstrates that this approach has competitive performance on two benchmark datasets and that the community membership estimate is more accurate for nodes with high posterior root probability than for the more peripheral nodes.

The paper is available at https://arxiv.org/pdf/2107.00153.pdf

## Leave a Reply

Want to join the discussion?Feel free to contribute!