The ANR Corpus Geomedia project was launched in February 2013. It brings together researchers and engineers from several fields: geography, media and communication studies, and computer science.
It aims to build a tool that captures the international news RSS feeds of about a hundred newspapers worldwide (French-, English- and Spanish-language). This tool will then be used to investigate several research questions: What is an event? How can we explain the under- or over-representation of certain places or actors? Can we model the circulation of information at a global scale?
Jointly led by teams of computer scientists and specialists in geography and media information modelling, this project should offer an innovative database for future research on globalization, far beyond the fields of geography and media studies. Because it stores information that is both volatile (RSS feeds enriched with spatial parameters) and royalty-free (unlike newspaper articles), it will be a useful archive for historians of the present as well as for future researchers.
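To give a rough idea of what capturing an RSS feed involves, here is a minimal sketch in Python that extracts items from a standard RSS 2.0 document. It is not the Geomedia collecting application: the function name `parse_rss_items`, the sample feed and its URLs are hypothetical, and the spatial enrichment mentioned above is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document, standing in for a newspaper's
# international news feed (titles and URLs are invented).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example International News</title>
    <language>en</language>
    <item>
      <title>Summit opens in Paris</title>
      <link>http://example.org/a1</link>
      <pubDate>Wed, 01 Oct 2014 08:00:00 GMT</pubDate>
      <description>Leaders meet in France.</description>
    </item>
    <item>
      <title>Floods hit coastal region</title>
      <link>http://example.org/a2</link>
      <pubDate>Thu, 02 Oct 2014 09:30:00 GMT</pubDate>
      <description>Heavy rain causes flooding.</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> of an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        return []
    # The feed language is declared once, at channel level.
    language = channel.findtext("language", default="")
    items = []
    for item in channel.findall("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "summary": item.findtext("description", default=""),
            "language": language,
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_RSS):
        print(entry["published"], "|", entry["title"])
```

Because feed items disappear once a publisher refreshes its feed, a collector along these lines would need to run at regular intervals and store each item it has not seen before.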
The Geomedia application was launched at the beginning of 2014. As of June 2015, 300 RSS feeds are connected to the collecting application, and over 6 million RSS items (articles) have been collected in 8 languages (French, English, Spanish, Portuguese, German, Italian, Polish and Catalan).
Distribution of collected RSS feeds, by language
The ultimate objective of ANR Corpus Geomedia is to provide free access to the whole database by the end of the project (June 2016). In the meantime, a sample is already available in order to open a discussion with the scientific community about methods for processing, enriching and visualizing these RSS feeds.
The sample contains all RSS items (articles) collected from 8 international RSS feeds between October 1 and December 31, 2014.
Three languages are represented in this sample: English (5 feeds), Spanish (2 feeds) and French (1 feed). Each of the 8 feeds comes from a different country (Australia, Chile, China, France, India, Mexico, United Kingdom, USA).
- Download this documented sample (zip file)