This project explores the importance and role of soft data in public decision-making on the planning and management of cities. Soft data, so called in contrast to the traditional "hard" data produced by statistical institutions, are new types of data, mainly from Web 2.0 (Facebook, Twitter, RSS feeds, etc.), which offer the public decision-maker an original and rich source of information on the social phenomena taking place in cities. What makes these data particularly interesting is that they can integrate geographical information into media data (for example, the Facebook check-in). Given the abundance of these data, the project aims to take stock of the existing data and to develop a shared reflection on the methodological ("digital methods") and theoretical (the relationship between the digital and the physical) issues raised by their use in city policies.
In recent years, new technologies have radically changed several sectors of society such as the economy, health and transport. One of the most revolutionary changes certainly concerns the dissemination of digital technologies, especially the Internet (Castells, 2000). What makes this change particularly interesting is the fact that it affects both society itself and the way in which it is studied and managed (Benkler, 2006).
On the one hand, the invention of digital media has transformed the conditions of access to communication by a growing part of the world's population. In less than twenty years, the Internet and the World Wide Web have played a crucial role in extending the spatio-temporal limits of human interactions: by reducing communication costs, overcoming the boundaries between different forms of communication (written/oral, public/private...) and accelerating the circulation of ideas and knowledge.
On the other hand (and at the same time), digital communication has shaken up the conditions of research and politics, multiplying the availability of traces of collective phenomena. The advantage of electronic media is that all the interactions that pass through them leave digital traces that can be easily recorded, massively stored, then retrieved and analysed. Thus, digital media offer huge new databases that can be used to improve the analysis of social phenomena and, consequently, the decision-making process related to them (Rogers, 2013).
Digital traces are not only produced automatically by digital technologies: today we also have large amounts of data from new data providers such as members of online social networks and users of content-sharing platforms. In the context of Web 2.0, the success of social networks is no longer in doubt and their diffusion rates have reached unprecedented levels. Hundreds of millions of users are registered: they exchange via forums and blogs, maintain Facebook pages, share their latest thoughts, moods or activities in a few words, and post different types of content. The development of mobile devices such as smartphones and tablets has favoured the emergence of these new practices. As a result, users of social networks leave traces of their online and offline activities, which can become new sources of information (so-called "soft" data) that are extremely useful for territorial studies and public policies.
Soft data, called "soft" as opposed to the traditional "hard" data produced by statistical institutions, can be defined in a very general way as information freely available on the Internet, not controlled by a public administration. They consist mainly of the new types of data from Web 2.0 (Facebook, Twitter, RSS feeds, etc.) which offer the public decision-maker an original and rich source of information on the social phenomena taking place in cities.
What makes these data particularly interesting is their geo-media nature, i.e. the fact that they integrate geographical information into the media data (see for example the Facebook check-in).
Traditionally, public decision-making on the management of the city is based on the collection, transformation, analysis and interpretation of what can be described as "hard" data, namely official statistics and, more generally, data produced by the public administration at different levels (local, urban, regional, national, international). These data have been carefully harmonised and stored in databases, subjected to various controls, and supplemented with metadata and estimates of missing values. They represent an exceptional added value for those interested in urban and territorial cohesion policy. Nevertheless, in recent years policy-makers have pointed to some important gaps in, or frustrations with, these data:
- Too long a delay in publication (official data are subject to a long technical and sometimes political process of harmonisation and validation)
- Insufficient coverage of certain subjects of interest for territorial cohesion (the attractiveness of places, citizens' feelings, perception of the actions of public decision-makers) which are not easy to represent with territorial data. These topics are addressed by large surveys, but at a low spatial resolution (country level), which makes the results difficult to apply at the scale of cities.
- Top-down definition of the data of interest, which is an inherent feature of hard data. There is, however, a growing demand for participatory and open data elaborated by citizens, businesses, and local and regional authorities. This bottom-up approach to defining the data of interest is a dimension that public decision-makers involved in the management of the city cannot ignore.
None of these criticisms were very important ten years ago. As long as "hard" data were the main source of information for decision-makers and citizens, people were likely to accept a certain delay in the process of monitoring territories. However, the territorial cohesion agenda is being strongly modified by the development of the crisis (economic, demographic, social, environmental) combined with the exponential growth of the information available on the Internet. A large amount of information concerning the territorial development of cities is now available on the web, creating clear competition for classical data producers.
The challenge of this project is not to criticise this new source of data, but rather to examine its potential interest for the public policies of the city. Indeed, soft data provide, at first glance, interesting solutions to the above-mentioned shortcomings of hard data:
- A shorter publication period, useful for public action. A classic example of this responsiveness is the recording of earthquakes through social media such as Twitter. Many researchers have demonstrated that geolocated users of social networks can be considered as sensors, capable of locating catastrophic events in real time and monitoring their development.
- Coverage of new topics of interest such as transportation modes in urban areas, poverty and social exclusion, citizens' feelings towards city policies. This is clearly an effect of the traceability inherent in digital media.
- Bottom-up elaboration of tailor-made information: "soft" data can sometimes be the result of bottom-up elaboration, as shown by the example of OpenStreetMap, which offers an alternative to the official maps produced by geographical institutes. These participatory data can also be reused for purposes not intended by their creators, to produce tailor-made information useful to the public decision-maker.
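The "users as sensors" idea in the first point above can be illustrated with a minimal sketch. It assumes that posts are already available as records with a timestamp, coordinates and text; the field names, the sample posts, the one-degree grid and the threshold of three posts are all hypothetical choices made for illustration, not a method used by the project or an interface of any platform. The sketch simply counts keyword mentions per grid cell and per hour and reports the buckets that exceed the threshold.

```python
import math
from collections import Counter
from datetime import datetime


def cell_and_hour(post, cell_deg=1.0):
    """Map a post to a (grid cell, hour) bucket."""
    ts = datetime.fromisoformat(post["time"])
    cell = (math.floor(post["lat"] / cell_deg), math.floor(post["lon"] / cell_deg))
    return cell, ts.replace(minute=0, second=0, microsecond=0)


def detect_bursts(posts, keyword, min_posts=3, cell_deg=1.0):
    """Return the buckets in which at least `min_posts` posts mention `keyword`."""
    counts = Counter(
        cell_and_hour(post, cell_deg)
        for post in posts
        if keyword.lower() in post["text"].lower()
    )
    return {bucket: n for bucket, n in counts.items() if n >= min_posts}


# Hypothetical sample data, invented for this illustration.
posts = [
    {"time": "2014-03-01T10:05:00", "lat": 43.71, "lon": 7.26, "text": "Earthquake? The whole building shook"},
    {"time": "2014-03-01T10:07:00", "lat": 43.70, "lon": 7.27, "text": "earthquake in Nice just now"},
    {"time": "2014-03-01T10:12:00", "lat": 43.66, "lon": 7.21, "text": "Did anyone feel that EARTHQUAKE"},
    {"time": "2014-03-01T10:20:00", "lat": 48.85, "lon": 2.35, "text": "Reading about an earthquake on the news"},
    {"time": "2014-03-01T11:30:00", "lat": 43.70, "lon": 7.25, "text": "Lunch by the sea"},
]

bursts = detect_bursts(posts, "earthquake")
for (cell, hour), n in bursts.items():
    print(cell, hour.isoformat(), n)
```

A fixed threshold is the simplest possible rule; a real study would more plausibly compare each bucket with its usual level of activity, and would have to face the representativeness issues discussed under the methodological issue.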
Given the abundance of these new types of data, this project aims to explore the importance and role of these soft data in the public decision-making process concerning the management of the city. Although several empirical studies have been carried out, a theoretical reflection on the use of these data in public policies is still lacking. Several questions need to be addressed. Considering the limits of time and budget of this project, we propose to start this reflection by addressing mainly two questions.
The methodological issue
There has been a lot of enthusiasm for big data, but working with them is anything but simple. In addition to the technical problems raised by the mass of data, the researcher also has to deal with political, social and ethical issues. In particular, three issues must be taken into account: data representativeness (we cannot control the equivalence between the traces available on the Internet and the population we would like to study), data protection (right to confidentiality, copyright) and the participatory nature of the data. What makes Web 2.0 data interesting is that they are user-generated. However, this participatory nature must be handled carefully when the data are included in territorial studies: they are often generated by unknown sources, so they can be false or highly heterogeneous.
All these limitations may call into question the quality of these data, but it is important to stress that they are also opportunities. On the one hand, we need to keep these issues in mind and, if possible, seek solutions to address them. On the other hand, the heterogeneous, unexpected and sometimes unmanageable facet of these data guarantees their interest and richness. We work with these data because we expect that these characteristics can help us find new ideas in the territorial analysis that can be integrated into the results of the official analysis.
It is therefore important to have adequate methods to collect these data and prepare them for analysis. In recent years, a new group of methods, called "digital methods" (Rogers, 2013), has been developed to deal with this type of data. By "digital methods" we refer to a series of techniques aimed at exploring traces of online interactions as a source of information on social phenomena. In this project, we aim to build a shared reflection on the methodological issues raised by these data, through the creation of a working group of French and foreign researchers (in particular the Digital Methods Initiative of the University of Amsterdam, led by Richard Rogers, the group that contributes the most today to the development of digital methods in Europe).
From a methodological point of view, we will also be able to build on the work already carried out within the framework of the ANR Géomédia project. The first results of this project strongly suggest that the analysis of RSS feeds of carefully selected daily newspapers can provide very interesting territorial information.
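As an indication of what such an analysis of RSS feeds can look like in its simplest form, the following sketch counts how many items of a feed mention each place in a list of place names. It is an illustration under stated assumptions, not the method of the Géomédia project: the feed is an invented inline sample rather than a real newspaper feed, the three-name list `GAZETTEER` is hypothetical, and matching is a plain whole-word search in the title and description of each item.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Invented RSS 2.0 sample standing in for a newspaper feed.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example daily</title>
<item><title>Talks on Syria resume in Geneva</title><description>Delegates met in Switzerland.</description></item>
<item><title>Syria: new clashes reported</title><description>Fighting near the border with Turkey.</description></item>
<item><title>Local elections this weekend</title><description>Polls open on Sunday.</description></item>
</channel></rss>"""

# Hypothetical list of place names to look for.
GAZETTEER = {"Syria", "Turkey", "Switzerland"}


def count_place_mentions(rss_text, places):
    """Count, for each place, the number of feed items that mention it."""
    root = ET.fromstring(rss_text)
    counts = Counter()
    for item in root.iter("item"):
        # Combine title and description, skipping any that are missing.
        parts = (item.findtext("title"), item.findtext("description"))
        text = " ".join(part for part in parts if part)
        for place in places:
            pattern = r"\b" + re.escape(place) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                counts[place] += 1
    return counts


mentions = count_place_mentions(SAMPLE_RSS, GAZETTEER)
for place, n in mentions.most_common():
    print(place, n)
```

Repeating such counts over time and across several newspapers would give a first, rough picture of which territories the media are talking about; resolving ambiguous or multilingual place names is left out of this sketch.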
The theoretical challenge of soft data
The use of this information, which comes from new suppliers and concerns new themes, should be strongly encouraged in the city management process as a complement to official data. However, beyond the initial enthusiasm, the use of these methods, and of digital traces in general, raises several theoretical questions. One of the most problematic elements in the application of these methods is the management of the relationship between offline and online. The success of digital traces is notably due to their power to reveal characteristics of phenomena that take place in physical space. Indeed, through these traces, researchers can study an urban phenomenon that they could otherwise have studied only through a field survey, which is much more costly in terms of resources and time.
Of course, the question of the relationship between online and offline is not new, and the distinction itself has been debated on several occasions. Without falling into the excesses of either affirming this distinction in an absolute manner or rejecting it a priori, we want to question the type of continuity or discontinuity generated today by the digital traces of the city. We find the case of the city particularly intriguing because the city is essentially physical and, at the same time, increasingly digital. The aim of this project is to step back from empirical experiments with digital methods in order to reflect on the significance of digital traces in the context of urban studies. When we study an urban phenomenon through the traces that the actors linked to this phenomenon have left on a blog or a social network, are we studying the offline phenomenon that takes place in physical space? Or are we studying the online projection of the phenomenon that takes place offline?
Or should this distinction be completely abandoned in the context of the city? Can digital traces connected to a territorial object such as the city have an 'existence' only online or not?
These issues will be addressed in a study day organised in two parts (a workshop and a conference for the general public), to which researchers from other disciplines, including philosophers of the digital and digital humanities experts, will be invited.
Douay N., Severo M. & Giraud T. (2012), « La carte du sang de l’immobilier chinois, un cas de cyberactivisme », L’information géographique, vol. 76, n° 1, pp. 74-88.
Gautreau P., Severo M., Giraud T. & Noucher M., « Formes et fonctions de la « donnée » dans trois webs environnementaux sud-américains (Argentine, Bolivie, Brésil) », NETCOM, forthcoming.
Giraud T. & Severo M. (2011), « La blogosphère tunisienne », L’espace géographique, n° 2, p. 190.
Giraud T., Grasland C., Lamarche-Perrin R., Demazeau Y. & Vincent J.-M. (2013), « Identification of international media events by spatial and temporal aggregation of newspapers RSS flows. Application to the case of the Syrian Civil War between May 2011 and December 2012 », Proceedings ECTQG 2013, Paris.
Grasland C., Giraud T. & Severo M. (2012), « Un capteur géomédiatique d’événements internationaux », Fonder les sciences du territoire (eds. Beckouche P. et al.), Karthala, Paris.
Rogers R. (2013), Digital methods, MIT Press.
Severo M. (2012), « Media representations of the Solar Mediterranean Plan: a techno-political controversy », PCST Conference 2012, 18-20 April, Florence.
Severo M., Giraud T. & Douay N. (2012), « The Wukan’s protests: from local activism to global media event », Just-in-time workshop, Social informatics conference, Lausanne.
Severo M. & Zuolo E. (2012), « Egyptian e-diaspora: migrant websites without a network? », Social Science Information, n° 51, pp. 521-533.
Vienne F., Douay N., Le Goix R. & Severo M. (2014), « Lieux et hauts lieux des densités intermédiaires : une analyse par les réseaux sociaux numériques », conference Aux frontières de l’urbain, Avignon, January 2014.
This project is supported by the CIST Media and territories axis through its manager, Marta Severo. This research axis brings together many researchers from different disciplines (communication, geography, planning, political science, computer science) and belonging to several teams interested in the territorial representations generated by media data. Among these, the following will participate in the PEPS project:
– Groupe d’étude et de recherche interdisciplinaire en information et communication (GERiiCO): Marta Severo (MCF, communication science, Université de Lille 3) and Camille Masse (project manager, Université de Lille 3)
– Savoirs, textes, langage (STL): Christian Berner (PR, philosophy, Université de Lille 3)
– Géographie-cités: Claude Grasland (professor, geography and spatial analysis, Université Paris Diderot), Nicolas Douay (MCF, planning, Université Paris Diderot) and Renaud Le Goix (MCF, physical, human, economic and regional geography, Université Paris 1)
– Réseau interdisciplinaire pour l’aménagement du territoire européen (RIATE): Timothée Giraud (IE, geomatics)
– Collège international des sciences du territoire (CIST): Hugues Pecout (IE, cartography) and François Vienne (IE, planning)
– Pôle de recherche pour l’organisation et la diffusion de l’information géographique (PRODIG): Pierre Gautreau (MCF, geography, Université Paris 1)
– Centre population et développement (CEPED): Marina Lafay (associate researcher, sociology, coordinator of the Emergence Minweb project)
– Centre de recherche Textes et francophonies (CRTF): Romain Badouard (MCF, communication science, Université de Cergy-Pontoise)
– Aménagement, développement, environnement, santé et sociétés (ADESS): Marina Duféal (MCF, geography, Université de Bordeaux 3)
– Digital Methods Initiative: Richard Rogers (PR, media studies, University of Amsterdam)
This PEPS funding is conceived in connection with the ESPON-ORATE funding (European Agency for Spatial Planning) that our team has just received to work on the same subject (call for tender “Tools (2011-2014). Feasibility Study on Analytical Tool based on Big Data”). The ESPON funding is aimed at the empirical exploration of soft data for planning. In particular, that project will develop two cases of soft data use for the study of a group of European cities.
We believe that, for the project to be implemented successfully, it is fundamental to accompany the empirical study with theoretical reflection, and that the CIST's Media and territories research axis provides the ideal context for this research. We therefore submit this application for PEPS funding in order to develop the following three actions:
1. Development of the state of the art and theoretical thinking (March-July 2014)
Organisation of several meetings with the project participants in order to develop a shared reflection on the use of soft data for the city's public policies. Participation of 3 or 4 people in the Summer School of the Digital Methods Initiative "On Geolocation: Remote Event Analysis".
2. Study day on soft data for city management, open to public decision-makers, organised in two parts (October 2014)
Methodological workshop on data and tools with external guests on an international level and a closing lecture for the general public on the theoretical issues of digital traces as a representation of the city.
3. Scientific publication (October-December 2014)
Production of a publication with the theoretical and empirical results of the project. The publication could be produced as a book in the collection "Le débat du numérique" at the Presses des Mines or in the "Collection du CIST" at Karthala (preliminary agreements have been made with both partners).