Studying language evolution in the age of big data

The availability of large digital corpora of cross-linguistic data is revolutionizing many branches of linguistics, shifting the focus of study from detailed questions about individual features to more global patterns amenable to statistical analysis. Like biological evolution, linguistic change is not only vertical, as words and grammatical structures pass from one generation to the next, but also horizontal, as elements of one language spread into another. As LML Fellow Hyejin Youn and colleagues describe in a new review article, the aim of statistical linguistic analysis is not to devise a complete model of language evolution that takes all horizontal effects into account, but to reliably disentangle vertical evolution from horizontal influences. Efforts to do so can draw on modern practice in biology, where separating vertical from horizontal processes is also a key challenge.
The authors argue, in particular, that linguistics has much to gain from systematically adopting and adapting the workflows of data-driven bioscience. However, they also emphasize that the formal computational approach cannot replace human thought, and should be considered only as a helpful tool. As they note, the unaided human mind clearly does not offer the widest possible window through which to experience the world. New devices for measuring, organizing and computing enlarge the conceptual reach of human thinking. The promise of computational methods in historical linguistics is to let researchers see the linguistic world in new ways, especially by detecting patterns that traditional approaches might miss.
The paper is available at
