Inference on the History of a Randomly Growing Tree

The skeletal outlines of many natural processes resemble growing tree-like networks. Examples include the movement of a disease through a human community, news or rumours through social media or a computer virus through a web of connected computers. The elements in these networks correspond to nodes in the tree, and links or edges between nodes reflect some kind of connection between individuals – defined by the transmission of a disease, or passing of a message between two people or computers, for example. In all cases, these trees grow over time from a first node, called the root, through a complete set of N nodes arriving at discrete later times.
In a new paper, LML Fellow Harry Crane and Min Xu of Rutgers University consider the problem of inferring the nature of such growth trees based in partial information.
For example, a common problem is estimating the likely original or “root” element of the network given only partial knowledge of the links between the network elements, and no information on the time they entered the network. This issue arises frequently in epidemiology, as in the  2020 outbreak of Covid-19, where long incubation periods and asymptomatic spreading led to incompleteness in the observed infection history (i.e., some or many edges in the infection tree were unobserved) as well as uncertainty about the direction of spread for those edges which were observed.
As Crane and Xu note, the problem of root estimation has been studied only recently, with some results showing that one can construct a confidence set of the root whose size does not increase with the size of the network N. These and related theoretical results, however, do not offer a practical method for inferring the root of a tree based on an observed contact pattern. In their paper – illustrated in the context of an epidemiological setting — the authors demonstrate a method for efficiently computing the conditional probability of the disease transmission history given the observed contact process. These conditional probabilities enable them to construct valid confidence sets for the inference questions of interest.
The researchers demonstrate their method on a flu transmission network, the data originating from an A(H1N1)v flu outbreak in a London school in April 2009. The patient-zero was a student who returned from travel abroad. After the outbreak, researchers used contact tracing to reconstruct a network of inter-personal contacts between 33 pupils in the same class as patient-zero. Using knowledge of the true patient-zero, times of symptom onset among all the infected students, and epidemiological models, earlier research has reconstructed a plausible infection tree. As a challenge, Crane and Xu tried to determine the correct patient-zero using only the connectivity structure of the tree. In constructing both 85% and 95% confidence sets, they found both to include the true patient-zero, this being the node with the third highest conditional root probability.
The authors suggest these inference procedures can also be applied in social media networks to help identify key spreaders and promoters of misinformation or fake news.
The paper is available at

Leave a Reply

Your email address will not be published. Required fields are marked *