

25 September 2014

Cultural Computing at Literature Scale

D-Lib Magazine


As the mass-scale computational study of culture has expanded from digitized books (Michel et al., 2011) to news media (Leetaru, 2011), to social media (Leetaru et al., 2013), to television (Macdonald & Leetaru, 2014), a data source which has remained largely absent is the vast archive of academic literature that forms one of the primary outputs of the humanities and social science disciplines.

From Political Science to Anthropology, Sociology to History, Literature to Linguistics, these academic journals chronicle our best scholarly understanding of how societies function and the beliefs, ideals, and ethnic, religious, and tribal contexts that unify or divide civilizations. Yet the size, breadth, and depth of academic scholarship in the humanities and social sciences, its spread across tens of thousands of journals in so many disciplines and specialties, often delving into uniquely characterized and detailed cases, and the increasing use of the web as a publication venue make this material far more difficult to consider at scale than other forms of media.

A recent study by the World Bank (Doemeland & Trevino, 2014) reports that more than a third of the reports it produces have never been accessed even a single time. More than three-quarters of its reports have been downloaded fewer than 100 times and just 13% have ever been cited. There is so much information available today that the majority of it is never seen. No scholar could possibly read every journal article and book published in the past half-century on the Hutus and the Tutsis, let alone map the exhaustive known geography of their mutual interactions or the thematic and societal contexts in which each of those interactions has occurred.

Even a large team of scholars would find it difficult to read half a million journal articles on Africa and construct a network diagram visualizing the portrayal of the region’s ethnic groups over time. Moreover, the number of distinct disciplines and specialties across which that knowledge is distributed makes it difficult to form a unified understanding of a particular group, topic, or geography. Policymakers, in turn, are unable to comprehensively incorporate this scholarly knowledge into national policy or rapidly identify experts across disciplines to inform key policy discussions.

