Pablo Barberá

Moore-Sloan Data Science Fellow
New York University




I am a Moore-Sloan Fellow at the NYU Center for Data Science. In July 2016, I will be joining the faculty of the School of International Relations at the University of Southern California as an Assistant Professor. I received my PhD in Political Science from New York University in 2015, where I was also a graduate research associate in the Social Media and Political Participation lab. My primary areas of research include social media and politics, quantitative political methodology, and electoral behavior and political representation. On this website you will find information about myself and my research.

Recent Publications

Big data, social media, and protest: foundations for a research agenda

Chapter in "Computational Social Science", edited by Michael Alvarez, Cambridge University Press, 2016.
Co-authored with Joshua Tucker, Jonathan Nagler, Megan Metzger, Duncan Penfold-Brown, and Richard Bonneau.


The Critical Periphery in the Growth of Social Protests

PLOS ONE, 2015, 10 (11).
Co-authored with Ning Wang, Richard Bonneau, John T. Jost, Jonathan Nagler, Joshua Tucker and Sandra González-Bailón

Link | Online appendix | Replication data

Social media have provided instrumental means of communication in many recent political protests. The efficiency of online networks in disseminating timely information has been praised by many commentators; at the same time, users are often derided as “slacktivists” because of the shallow commitment involved in clicking a forwarding button. Here we consider the role of these peripheral online participants, the vast majority of users who surround the small epicenter of protests, representing layers of diminishing online activity around the committed minority. We analyze three datasets tracking protest communication in different languages and political contexts through the social media platform Twitter and employ a network decomposition technique to examine their hierarchical structure. We provide consistent evidence that peripheral participants are critical in increasing the reach of protest messages and generating online content at levels that are comparable to core participants. Although committed minorities may constitute the heart of protest movements, our results suggest that their success in maximizing the number of online citizens exposed to protest messages depends, at least in part, on activating the critical periphery. Peripheral users are less active on a per capita basis, but their power lies in their numbers: their aggregate contribution to the spread of protest messages is comparable in magnitude to that of core participants. An analysis of two other datasets unrelated to mass protests strengthens our interpretation that core-periphery dynamics are characteristically important in the context of collective action events. Theoretical models of diffusion in social networks would benefit from increased attention to the role of peripheral nodes in the propagation of information and behavior.
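The abstract does not name the decomposition technique. A common way to separate a network's core from its periphery is k-core decomposition, which repeatedly peels off nodes with fewer than k remaining connections; each node's core number then locates it in a nested shell. The sketch below assumes that technique and an undirected edge list of user interactions; the function name `core_numbers` is hypothetical and not taken from the paper's replication materials.

```python
from collections import defaultdict

def core_numbers(edges):
    """Return each node's core number (the largest k such that the node
    belongs to the k-core) for an undirected graph given as (u, v) pairs."""
    # Adjacency sets ignore duplicate edges; self-loops are skipped.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    core = {}
    remaining = set(adj)
    k = 0
    while remaining:
        # Raise k to the smallest remaining degree, then peel every node
        # whose remaining degree is at most k (cascading as neighbors drop).
        k = max(k, min(degree[n] for n in remaining))
        peel = [n for n in remaining if degree[n] <= k]
        while peel:
            node = peel.pop()
            if node not in remaining:
                continue  # already peeled via another neighbor
            remaining.discard(node)
            core[node] = k
            for nbr in adj[node]:
                if nbr in remaining:
                    degree[nbr] -= 1
                    if degree[nbr] <= k:
                        peel.append(nbr)
    return core
```

For a triangle a-b-c with a pendant node d attached to a, the triangle forms the 2-core and d sits in the outer 1-shell. In this framing, users in the highest shells would be the protest's core and those in low shells its periphery.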

Tweeting from Left to Right: Is Online Political Communication More Than an Echo Chamber?

Psychological Science, 2015, 26 (10), 1531-1542.
Co-authored with John T. Jost, Jonathan Nagler, Joshua Tucker, and Richard Bonneau.

Link | Online appendix | Replication materials and data

We estimated ideological preferences of 3.8 million Twitter users and, using a dataset of 150 million tweets concerning 12 political and non-political issues, explored whether online communication resembles an "echo chamber" due to selective exposure and ideological segregation or a "national conversation." We observed that information was exchanged primarily among individuals with similar ideological preferences for political issues (e.g., presidential election, government shutdown) but not for many other current events (e.g., Boston marathon bombing, Super Bowl). Discussion of the Newtown shootings in 2012 reflected a dynamic process, beginning as a "national conversation" before being transformed into a polarized exchange. With respect to political and non-political issues, liberals were more likely than conservatives to engage in cross-ideological dissemination, highlighting an important asymmetry with respect to the structure of communication that is consistent with psychological theory and research. We conclude that previous work may have overestimated the degree of ideological segregation in social media usage.

Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data

Political Analysis, 2015, 23 (1), 76-91.

Link | Pre-print | Online appendix | Replication materials | GitHub tutorial

Politicians and citizens increasingly engage in political conversations on social media outlets such as Twitter. In this paper I show that the structure of the social networks in which they are embedded can be a source of information about their ideological positions. Under the assumption that social networks are homophilic, I develop a Bayesian Spatial Following model that considers ideology as a latent variable, whose value can be inferred by examining which political actors each user is following. This method allows us to estimate ideology for more actors than any existing alternative, at any point in time and across many polities. I apply this method to estimate ideal points for a large sample of both elite and mass public Twitter users in the US and five European countries. The estimated positions of legislators and political parties replicate conventional measures of ideology. The method is also able to successfully classify individuals who state their political preferences publicly and a sample of users matched with their party registration records. To illustrate the potential contribution of these estimates, I examine the extent to which online behavior during the 2012 US presidential election campaign is clustered along ideological lines.
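The intuition behind a spatial following model can be sketched as a logistic function of ideological distance: a user is more likely to follow a political account when they are ideologically close, adjusted for how popular the account is and how politically interested the user is. The sketch below assumes the form logit⁻¹(α_j + β_i − γ(θ_i − φ_j)²) in one dimension, with α_j the account's popularity, β_i the user's political interest, θ_i and φ_j the two ideal points, and γ a scaling parameter. It only evaluates the probability and the Bernoulli log-likelihood for given parameter values; the Bayesian estimation step is not shown, and the function names are hypothetical.

```python
import math

def follow_probability(alpha_j, beta_i, gamma, theta_i, phi_j):
    """Probability that user i follows political account j: inverse logit of
    account popularity plus user interest minus scaled squared distance."""
    eta = alpha_j + beta_i - gamma * (theta_i - phi_j) ** 2
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(follows, alpha, beta, gamma, theta, phi):
    """Bernoulli log-likelihood of a users-by-accounts 0/1 follow matrix,
    treating each follow decision as independent given the parameters."""
    total = 0.0
    for i, row in enumerate(follows):
        for j, y in enumerate(row):
            p = follow_probability(alpha[j], beta[i], gamma, theta[i], phi[j])
            total += math.log(p) if y else math.log(1.0 - p)
    return total
```

Holding popularity and interest fixed, the follow probability falls as |θ_i − φ_j| grows, which is what lets observed follow decisions carry information about the latent ideal points.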

Political Expression and Action on Social Media: Exploring the Relationship Between Lower- and Higher-Threshold Political Activities Among Twitter Users in Italy

Journal of Computer-Mediated Communication, 2015, 20 (2), 221–239.
Co-authored with Cristian Vaccari, Augusto Valeriani, Richard Bonneau, John T. Jost, Jonathan Nagler, and Joshua Tucker.

Link

Scholars and commentators have debated whether lower-threshold forms of political engagement on social media should be treated as being conducive to higher-threshold modes of political participation or a diversion from them. Drawing on an original survey of a representative sample of Italians who discussed the 2013 election on Twitter, we demonstrate that the more respondents acquire political information via social media and express themselves politically on these platforms, the more likely they are to contact politicians via e-mail, campaign for parties and candidates using social media, and attend offline events to which they were invited online. These results suggest that lower-threshold forms of political engagement on social media do not distract from higher-threshold activities, but are strongly associated with them.

Understanding the political representativeness of Twitter users

Social Science Computer Review, 2015, 33 (6), 712-729.
Co-authored with Gonzalo Rivero.

Link | Pre-print

In this article we analyze the structure and content of the political conversations that took place through the micro-blogging platform Twitter in the context of the 2011 Spanish legislative elections and the 2012 US presidential elections. Using a unique database of nearly 70 million tweets collected during both election campaigns, we find that Twitter replicates most of the existing inequalities in public political exchanges. Twitter users who write about politics tend to be male, to live in urban areas, and to have extreme ideological preferences. Our results have important implications for future research on the relationship between social media and politics, since they highlight the need to correct for potential biases derived from these sources of inequality.

Rooting out corruption or rooting for corruption? The Heterogeneous Electoral Consequences of Scandals

Political Science Research and Methods, 2016, 4 (2), 379-397.
Co-authored with Pablo Fernández-Vázquez and Gonzalo Rivero.

Link | Pre-print | Replication materials

Corruption scandals have been found to have significant but mild electoral effects in the comparative literature (Golden, 2006). However, most studies have assumed that voters punish all kinds of illegal practices. This article challenges this assumption by distinguishing between two types of corruption, according to the type of welfare consequences they have for the constituency. We test this argument using data from the 2011 Spanish local elections. We exploit the abundance of corruption allegations associated with the Spanish housing boom, which generated income gains for a wide segment of the electorate in the short term. We find that voters ignore corruption when there are side benefits to it, and that punishment is only administered in those cases in which they do not receive compensation.

Social Media and Political Communication: A survey of Twitter users during the 2013 Italian general election

Italian Political Science Review, 2013.
Co-authored with Cristian Vaccari, Augusto Valeriani, Richard Bonneau, John T. Jost, Jonathan Nagler, and Joshua Tucker.

Link

Social media have become increasingly relevant in election campaigns, as both politicians and citizens have integrated them into their communication repertoires. However, little is known about which types of citizens employ these tools to discuss politics and stay informed about current affairs, and how they integrate the content and connections they encounter online with their offline repertoires of political action. To address these questions, we devised an innovative online survey of a representative random sample of Italians who communicated about the 2013 general election on Twitter. Our results show that Twitter political users in Italy are disproportionately male, younger, better educated, more interested in politics, and ideologically more left-wing than the population as a whole. Moreover, there is a strong correlation between online and offline political communication, and Twitter users often relay the political content they encounter on the web in their face-to-face conversations. Although the political users of social media are not representative of the population, their greater propensity to engage in political conversations both online and offline makes them important channels of personal communication and allows the content that circulates on the web to diffuse among populations that are much broader than those that engage with social media. The electoral significance of these digital platforms thus reaches well beyond the immediate audiences that are exposed to political content through them.

Work in progress

Less is More? How Demographic Sample Weights Can Improve Public Opinion Estimates Based on Twitter Data

Working paper, April 2016

An important limitation in previous studies of political behavior using Twitter data is the lack of information about the sociodemographic characteristics of individual users. This paper addresses this challenge by developing new machine learning methods that allow researchers to estimate the age, gender, race, party affiliation, propensity to vote, and income of any Twitter user in the U.S. with high accuracy. The training dataset for these classifiers was obtained by matching a massive dataset of 1 billion geolocated Twitter messages with voter registration records and estimates of home values across 15 different states, resulting in a sample of nearly 250,000 Twitter users whose sociodemographic traits are known. I illustrate the value of these new methods with two applications. First, I explore how attention to different candidates in the 2016 presidential primary election varies across demographic groups within a panel of randomly selected Twitter users. I argue that these covariates can be used to adjust estimates of sentiment towards political actors based on Twitter data, and provide a proof of concept using presidential approval. Second, I examine whether social media can reduce inequalities in potential exposure to political messages. In particular, I show that retweets (a proxy for inadvertent exposure) have a large equalizing effect in access to information.
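The abstract does not spell out the weighting scheme. One standard way to turn estimated demographics into sample weights is post-stratification: each user is weighted by the ratio of their demographic cell's share in the population to its share in the Twitter sample, so over-represented groups count less. The sketch below assumes categorical cells (for example, age group) and known population shares; the function names, cells, and numbers are hypothetical.

```python
from collections import Counter

def poststratification_weights(sample_cells, population_shares):
    """Weight each sampled user by (population share / sample share) of their
    demographic cell, so the weighted sample matches the population margins."""
    counts = Counter(sample_cells)
    n = len(sample_cells)
    return [population_shares[cell] / (counts[cell] / n) for cell in sample_cells]

def weighted_mean(values, weights):
    """Weighted average, e.g. of a per-user approval indicator (1 or 0)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical example: young users are 75% of the sample but 50% of the population.
cells = ["young", "young", "young", "old"]
weights = poststratification_weights(cells, {"young": 0.5, "old": 0.5})
approval = [1, 1, 1, 0]
adjusted = weighted_mean(approval, weights)  # unweighted mean would be 0.75
```

Here the adjusted estimate falls to 0.5 because the single older user, who disapproves, is up-weighted to represent half the population.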

How Social Media Reduces Mass Political Polarization: Evidence from Germany, Spain, and the United States

Working paper, August 2015

A growing proportion of citizens rely on social media to gather political information and to engage in political discussions within their personal networks. Existing studies argue that social media create “echo chambers,” where individuals are primarily exposed to like-minded views. However, this literature has overlooked the fact that social media platforms facilitate exposure to messages from those with whom individuals have weak ties, which are more likely to provide novel information to which individuals would not otherwise be exposed through offline interactions. Because weak ties tend to be with people who are more politically heterogeneous than citizens' immediate personal networks, this exposure reduces political extremism. To test this hypothesis, I develop a new method to estimate dynamic ideal points for social media users. I apply this method to measure the ideological positions of millions of individuals in Germany, Spain, and the United States over time, as well as the ideological composition of their personal networks. Results from this panel design show that most social media users are embedded in ideologically diverse networks, and that exposure to political diversity has a positive effect on political moderation. This result is robust to the inclusion of covariates measuring offline political behavior, obtained by matching Twitter user profiles with publicly available voter files in several U.S. states. I also provide evidence from survey data in these three countries that bolsters these findings. Contrary to conventional wisdom, my analysis provides evidence that social media usage reduces mass political polarization.
Media coverage: New York Times, Wall Street Journal, Nieman Lab, Wired UK, Slate FR, Le Monde

The Empirical Determinants of Social Media Adoption by World Leaders and its Empirical Consequences

Co-authored with Thomas Zeitzoff

Under review | Working paper, February 2016

The emergence of social media has led scholars to focus on its effects on mass behavior and protest. A key understudied question is what explains variation in the adoption and use of social media by world leaders. Social media, and in particular Twitter and Facebook, have emerged as important new channels for political communication. By the end of 2014, over 76% of world leaders had an active presence on social media platforms, which are being used to communicate with domestic and international audiences. We test several potential explanations for the adoption of social media by world leaders, including modernization, social pressure, level of democratization, and diffusion. We find strong support for two explanations: increased political pressure from social unrest and higher levels of democratization both increase the likelihood of leaders adopting social media. Taken together, these findings show how institutional and political pressures shape political communication and leader behavior.

Vague concepts in survey questions: A general problem illustrated with the left-right scale

Co-authored with Paul Bauer, Kathrin Ackermann and Aaron Venetz.

Under review | Working paper, July 2015

Vague concepts in survey questions trigger different associations and thus impact respondents’ answers. If these associations vary systematically with other explanatory variables they may bias the empirical relationships we observe. We illustrate this problem using a unique survey conducted in Germany that asked respondents open-ended questions regarding the meanings they attribute to the concepts “left” and “right,” which we categorize using topic modeling techniques. Our analysis shows that variation in respondents’ associations is systematically related to their self-placement on the left-right scale and to other explanatory variables, which indicates that the interpersonal comparability of the left-right scale across individuals is impaired. We recommend replacing the left-right scale with a battery of questions about issues with more specific ideological content in future surveys.

A Bad Workman Blames His Tweets: The Consequences of Citizens' Uncivil Twitter Use when Interacting with Party Candidates

Co-authored with Yannis Theocharis, Sebastian Adrian Popa, and Zoltan Fazekas

Under review | Working paper, May 2016

The recent emergence of microblogs has had a significant effect on the contemporary political landscape. Their potential to enhance information availability and make interactive discussions between politicians and citizens feasible is especially important. Existing studies focusing on politicians' adoption of Twitter have found that, far from exploiting the platform's two-way communication potential, politicians use it mainly as a broadcasting tool, thus wasting a valuable opportunity to interact with citizens. We argue that citizens’ impolite and/or uncivil behaviour is one potential explanation for such decisions. Social media conversations are rife with trolling and harassment, and politicians are often a prime target for such behaviour, a phenomenon that alters their incentives to engage in dialogue on social media. We use all tweets sent by Spanish, Greek, German, and UK candidates during the run-up to the recent EU election, along with the responses they elicited, and rely on automated text analysis to measure their level of civility. Our contribution is an actor-oriented theory of political dialogue that incorporates the specificity of the social media platform, further clarifying how and why the democratic promise of social media platforms is fulfilled or limited.

Local Cartels: Parliamentary Representation and Subnational Electoral Success

Co-authored with Elias Dinas and Pedro Riera

Working paper, March 2015

This article investigates how parties’ access to resources provided by the state improves their subsequent electoral performance. Previous cross-national research has emphasized the impact of legal rules on deterring new party entry. However, no clear consensus exists regarding the exact mechanisms that sustain insider parties while excluding outsiders. This article aims to fill this gap by arguing that the capacity of insider parties to ensure their own survival is higher whenever the benefits associated with presence in parliament are larger. Our main hypothesis is that, ceteris paribus, the greater the economic and informational resources that parliamentary representation provides, the more likely it is that obtaining at least one seat improves a party's future electoral fortunes. We test this hypothesis by exploiting the discontinuities generated by legal thresholds of representation at the subnational level in Spain, which allows us to causally identify the effect of parliamentary representation. We demonstrate that the magnitude of this effect is crucially shaped by the availability of important state subventions for parliamentary parties, the existence of a public television station at the regional level, the level of fiscal decentralization, and the absence of a single party with a parliamentary majority.

Prospects of Ideological Realignment(s) in the 2014 EP elections? Analyzing the Common Multidimensional Political Space for Voters, Parties, and Legislators in Europe

Co-authored with Sebastian Adrian Popa and Hermann Schmitt

Working paper, April 2015

Given the current economic and political crisis in Europe, many argue that the 2014 EP elections shifted the electoral competition from national politics to a debate about the extent and scope of the EU level of governance. We contribute to this discussion by analyzing the European ideological space at the time of the elections. We build upon existing scaling techniques applied to social media networks and develop a new method to measure the positions of political parties and individual legislators in a multidimensional political space. We apply this method to estimate the ideological positions of candidates for the European Parliament and of sitting MPs in all 28 EU member states, relying on a new dataset of social media accounts. To validate our estimates, we compare them with the aggregate perceptions of parties’ positions from the Voter Study of the European Election Study 2014. Our final goal is to analyze to what extent the 2014 EP elections brought the expected changes. We achieve this by establishing the relative importance of the left-right and European integration dimensions in each country. We also examine whether and why the positions of parties and candidates in EP elections differ from those of parties and legislators in national parliaments.

Leaders or Followers? Measuring Political Responsiveness in the U.S. Congress Using Social Media Data

Co-authored with Richard Bonneau, Patrick Egan, John T. Jost, Jonathan Nagler and Joshua Tucker

Working paper, June 2014

Are legislators responsive to their constituents in their public communication? To what extent are they able to shape the agenda that the mass public cares about, as expressed by the issues they discuss? We address this twofold question with an analysis of all tweets sent by Members of the U.S. Congress and a random sample of their followers from January 2013 to March 2014. Using a Latent Dirichlet Allocation model, we extract topics that represent the diversity of issues that legislators and ordinary citizens discuss on this social networking site. Then, we exploit variation in the distribution of topics over time to test whether Members of Congress lead or follow their constituents in their selection of issues to discuss, employing a Granger-causality framework. We find that legislators are responsive in their public statements to their constituents, but also that they have limited influence on their followers’ public agenda. To further understand the mechanisms that explain political responsiveness, we also examine whether Members of Congress are more responsive to specific constituent groups, showing that they are more influenced by co-partisans, politically interested citizens, and social media users located within their constituency.



This R package, available on CRAN, provides access to Twitter's Streaming API via R. See the vignette for a tutorial on how to use it. The latest version is on this GitHub repository.


This R package, available on CRAN, provides access to Facebook's Graph API via R. See the vignette for a tutorial on how to use it. The latest version is on this GitHub repository.


Internal R package used by the Social Media and Political Participation Lab at NYU. See GitHub repository for latest version and documentation.


Python tools for the analysis of Twitter data in JSON format or in MongoDB collections, and network visualization using Gephi. See GitHub repository.
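As an illustration of the kind of task such tools handle (not their actual interface), a minimal sketch that reads tweets stored one JSON object per line, a common way to save Streaming API output, and counts hashtags using the `entities.hashtags` field of the classic (v1.1) tweet object. The function name and the input layout are assumptions.

```python
import json
from collections import Counter

def count_hashtags(lines):
    """Count hashtags (case-insensitively) in an iterable of line-delimited
    tweet JSON, skipping blank lines and messages without entities."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        # Non-tweet messages (e.g. deletion notices) have no "entities" key.
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts
```

Passing an open file object works because iterating over it yields lines, so a hypothetical call would look like `count_hashtags(open("tweets.json")).most_common(10)`.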

For more code and other materials, see my GitHub repositories.

Teaching Materials

NYU PhD Course "Quantitative Methods for Political Science 3"

See this GitHub repository for recitation materials on maximum likelihood, duration models, time-series analysis, and Bayesian statistics.

NYU Abu Dhabi Course "Social Media and Political Participation"

The recitation materials for the course, available on this GitHub repository, provide an introduction to statistical analysis using R, and show how to harvest and analyze data from Twitter and Facebook.

NYU Politics DataLab Workshops

These two-hour workshops, taught in Spring and Fall 2013, give an overview of how to scrape Twitter and web data with R and how to visualize data with R and ggplot2.

Workshop: Data Science and Social Science (NYU)

This 3-day workshop provides an introduction to the R programming language, modeling and visualization, automated textual analysis, social network analysis, and web scraping & APIs. Click here for materials.

Recent Blog Posts