Summary

Wikipedia’s view of world history is explored and visualized through spatial, temporal, and emotional data mining using a Big Data approach to historical research. Unlike previous studies, which have looked only at Wikipedia’s metadata, this study focuses on the complete fulltext of all four million English-language entries to identify every mention of a location and date across every entry, automatically disambiguating and converting each location to an approximate geographic coordinate for mapping and every date to a numeric year. More than 80 million locations and 42 million dates between 1000 AD and 2012 are extracted, averaging 19 locations and 11 dates per article. Wikipedia is seen to have four periods of growth over the past millennium: 1001-1500 (Middle Ages), 1501-1729 (Early Modern Period), 1730-2003 (Age of Enlightenment), and 2004-2011 (Wikipedia Era). Since 2007 Wikipedia has hit a limit of around 1.7-1.9 million new mentions of each year, with the majority of its growth coming in the form of enhanced historical coverage rather than increasing documentation of the present. Two animation sequences visualize Wikipedia’s view of the world over the past two centuries, while an interactive Google Earth display allows the browsing of Wikipedia’s knowledgebase in time and space. The one-way nature of connections in Wikipedia, the lack of links, and the uneven distribution of Infoboxes all point to the limitations of metadata-based data mining of collections such as Wikipedia, and to the ability of fulltext analysis, and spatial and temporal analysis in particular, to overcome these limitations. Along the way, the underlying challenges and opportunities facing Big Data analysis in the Humanities, Arts, and Social Sciences (HASS) disciplines are explored, including computational approaches, the data acquisition workflow, data storage, metadata construction, and translating text into knowledge.

Part 1: Background

This part of the article describes the project background, purpose and some of the challenges of data collection.

Part 2: Data processing and Analytical methodologies

The methods by which the Wikipedia data was stored, processed, and analysed are presented in this part of the article.

Part 3: Data analytics and Visualization

This part of the article describes the analytical methodologies and visualization of knowledge extracted from the Wikipedia data.

 
