Pablo Barberá

Computational Political Scientist

I have designed and taught a variety of semester-length courses, summer schools, and workshops on topics related to computational social science, automated text analysis, statistical computing, social data analysis, and Big Data. My teaching generally follows a "learning-by-doing" approach, with short guided coding sessions followed by data challenges that prompt students to practice new methods and skills. All my original course materials are licensed under a Creative Commons Attribution 4.0 International License.

POIR613 - Computational Social Science

University of Southern California, Fall 2017, Fall 2019, Fall 2021.

Course website

Citizens across the globe spend an increasing proportion of their daily lives online. Their activities leave behind granular, time-stamped footprints of human behavior and personal interactions that represent a new and exciting source of data to study standing questions about political and social behavior. At the same time, the volume and heterogeneity of digital data present unprecedented methodological challenges. The goal of this course is to introduce students to new computational social science methods and tools required to explore and harness the potential of “Big Data” using the R programming language.

POIR611 - Introduction to Regression Analysis

University of Southern California, Spring 2017, Fall 2020.

Syllabus

This course introduces PhD students to quantitative analysis in the social sciences. By the end of the semester, you will be able to: (1) Read and evaluate quantitative research in Political Science and IR. (2) Test hypotheses about relationships between variables using quantitative methods, including regression analysis. (3) Read and manipulate data in multiple formats for large-n research projects. (4) Understand what additional training and skills you will need to conduct research, with the grounding necessary to keep learning on your own. (5) Use R with enough fluency to carry out tasks (2) to (4).

The course is roughly divided into three parts. Weeks 1-5 focus on description and inference for a single variable, covering the basics of probability theory and hypothesis testing. Weeks 6-11 introduce the workhorse of quantitative analysis: linear regression. This part will focus on the derivation, estimation, and interpretation of the linear model, and then on solutions to violations of the linear regression assumptions. The final weeks of the semester will discuss more advanced topics, including techniques for causal inference, matrix algebra, time series analysis, and data visualization.

MY459 - Quantitative Text Analysis

London School of Economics, LT 2018, LT 2019.
Co-taught with Ken Benoit in LT 2018.

Course website

The course surveys methods for systematically extracting quantitative information from political text for social scientific purposes, ranging from classical content analysis and dictionary-based methods to classification methods, state-of-the-art scaling methods, and topic models for estimating quantities from text using statistical techniques. The course lays a theoretical foundation for text analysis but mainly takes a practical, applied approach, so that students learn how to apply these methods in actual research. The common thread across all methods is that they can be reduced to a three-step process: first, identifying texts and units of texts for analysis; second, extracting quantitatively measured features from the texts (such as coded content categories, word counts, word types, dictionary counts, or parts of speech) and converting these into a quantitative matrix; and third, using quantitative or statistical methods to analyse this matrix in order to generate inferences about the texts or their authors. The course covers these methods in a logical progression, with a hands-on approach in which each technique is applied to real texts using appropriate software.
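As a rough illustration of the three-step process described above, the minimal Python sketch below takes two made-up sentences as the units of analysis, turns them into a document-feature matrix of word counts, and then scores each text with a tiny dictionary. The texts, the whitespace tokenizer, and the two-word dictionary are all hypothetical, and the sketch is not meant to reflect the software used in the course.

```python
from collections import Counter

# Step 1: identify texts and units of analysis (hypothetical toy "speeches")
texts = {
    "speech_a": "We will cut taxes and cut spending",
    "speech_b": "We will raise spending on schools",
}

# Step 2: extract features (lowercased word counts) and build a
# document-feature matrix: one row per text, one column per word type
def tokenize(text):
    return text.lower().split()

counts = {doc: Counter(tokenize(t)) for doc, t in texts.items()}
vocab = sorted(set().union(*counts.values()))
dfm = [[counts[doc][word] for word in vocab] for doc in texts]

# Step 3: analyse the matrix; here a simple dictionary-based score that
# sums the counts of words in a hypothetical "fiscal restraint" dictionary
dictionary = {"cut", "taxes"}
scores = {doc: sum(counts[doc][w] for w in dictionary) for doc in texts}

print(vocab)
for doc, row in zip(texts, dfm):
    print(doc, row, "score:", scores[doc])
```

Real applications replace each step with something richer (stemming, stopword removal, supervised classifiers, scaling models, or topic models), but the matrix in the middle stays the same.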

MY472 - Data for Data Scientists

London School of Economics, MT 2018.
Co-taught with Akitaka Matsuo.

Course website

This course will cover the principles of digital methods for storing and structuring data, including data types, relational and non-relational database design, and query languages. Students will learn to build, populate, manipulate and query databases based on datasets relevant to their fields of interest. The course will also cover workflow management for typical data transformation and cleaning projects, frequently the starting point and most time-consuming part of any data science project. This course uses a project-based learning approach towards the study of performance computation and group-based collaboration, essential ingredients of modern data science projects. The coverage of data sharing will include key skills in online publishing, including the elements of web design, the technical elements of web technologies and web programming, as well as the use of revision-control and group collaboration tools such as GitHub.

IR211 - Approaches to Research in International Relations

University of Southern California, Fall 2017.
Designed based on materials by Ben Graham.

Syllabus

This class is an introduction to social science research methodology. Our main goal is to teach you the basics of creating and consuming research in the social sciences, and international relations in particular. The course will lead you through conceptualization and theory construction, the derivation of testable hypotheses, and how to use data analysis methods to evaluate these hypotheses. We will cover causal inference, observation and measurement, ethics of social science research, and quantitative research methods. We will also discuss the way in which academic articles in the social sciences are written, and how they should be read. This course includes some introductory statistics and requires the use of Excel for some class assignments. The statistical topics covered include descriptive statistics, contingency tables, correlation analysis, and significance tests for relationships between variables from different quantitative datasets.

IR312 - Introduction to Data Analysis

University of Southern California, Fall 2016.
Designed based on materials by Kosuke Imai.

Syllabus

Are democratic countries less likely to engage in interstate disputes? Do cash transfer programs reduce poverty in developing countries? What factors predict bilateral trade flows? Is it possible to detect electoral fraud just by looking at the distribution of vote counts across districts? Has income inequality increased across and within countries over the past few decades? Academic researchers and policy-makers increasingly rely on quantitative methods to answer these questions. As the sheer volume of available data grows, the ability to analyze data, interpret the results, and effectively communicate key findings has become an essential skill for conducting empirical research in the social sciences. The ability to extract valuable insights from quantitative data, often referred to as "data science", is also in high demand among employers in the private sector.

IR468 - European Integration

University of Southern California, Fall 2016.

Syllabus

The recent political and economic crisis in the European Union (EU) highlights the considerable challenges that the process of European integration currently faces. How should the European institutional framework be designed in order to strike a balance between protecting national sovereignty and ensuring an efficient policy process? Is there a “democratic deficit” within the EU, and if so, what type of reforms may foster political accountability and democratic legitimacy? After the financial crisis and the prospect of Greece leaving the Eurozone, is a common currency still viable? What should be the role of the EU as an international actor, regarding its trade, defense, and foreign policies? These are some of the broad themes we will address throughout the course.

The course will begin with an overview of the history of the EU, and the key theories and debates on European integration. We will draw on this historical and theoretical background to discuss its governing institutions and to examine the main aspects of legislative, executive, and judicial politics in the Union, as well as the role of citizens, political parties, and interest groups. During the second half of the semester, we will focus on some of the important questions regarding the future of the EU: the potential enlargement of the Union, the viability of the single currency, and the EU’s role as a global actor.

Collecting and Analyzing Social Media Data with R

4 hours. Last offered at LSE, January 2018.

Workshop website, January 2018

Citizens across the globe spend an increasing proportion of their daily lives on social media websites, such as Twitter and Facebook. Their activities leave behind granular, time-stamped footprints of human behavior and personal interactions that represent a new and exciting source of data to study standing questions about political and social behavior. At the same time, the volume and heterogeneity of social media data present unprecedented methodological challenges. The goal of this workshop is to gain the skills necessary to automate the process of downloading, cleaning, and analyzing social media data using the R programming language for statistical computing.

The workshop follows a “learning-by-doing” approach, with short guided coding sessions followed by data challenges that prompt participants to practice what they have just learned. Most of the applications will be related to Political Science and International Relations questions, but the workshop should be of interest to social science students more generally.

Querying large-scale online datasets: SQL and Google BigQuery

4 hours. Last offered at LSE, December 2018.

Workshop website, December 2018

The volume and heterogeneity of the new datasets available in the digital age present unprecedented opportunities for social scientists, but also new methodological challenges. Computing a simple average for a variable across groups can take minutes when a researcher is working with government records, large-scale survey studies, or social media datasets with millions of rows. The goal of this workshop is to learn how to overcome the challenges associated with massive-scale online databases. We will learn the basics of SQL, a language designed to query relational databases that is currently employed by most tech companies, and how to use it from R with the DBI package. Of all the available options for storing online databases, we will focus on BigQuery, which relies on Google’s infrastructure to efficiently store and query databases at scale. We will learn how to process, upload, and query databases of up to a billion rows in a matter of seconds, and how to export the results of our queries.
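The "simple average for a variable across groups" mentioned above is exactly what a SQL `GROUP BY` query expresses. The minimal sketch below uses Python's built-in `sqlite3` module and an in-memory database as a local stand-in for a cloud service such as BigQuery; the `tweets` table, its columns, and the five rows are hypothetical toy data, not results from any real dataset.

```python
import sqlite3

# In-memory SQLite database as a local stand-in for a large cloud table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tweets (user_id INTEGER, party TEXT, retweets INTEGER)")
con.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [(1, "A", 10), (2, "A", 20), (3, "B", 5), (4, "B", 7), (5, "B", 9)],
)

# The database engine computes the per-group averages and counts;
# only the small summary table is returned to the client
query = """
    SELECT party, AVG(retweets) AS avg_retweets, COUNT(*) AS n
    FROM tweets
    GROUP BY party
    ORDER BY party
"""
rows = con.execute(query).fetchall()
for party, avg_retweets, n in rows:
    print(party, avg_retweets, n)

con.close()
```

The design point is that the aggregation runs where the data lives, so a table with millions of rows never has to be loaded into memory on the analyst's machine. The workshop sends queries of this kind from R through the DBI package rather than from Python.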

Automated Collection of Web and Social Data

15 hours. ECPR Methods Summer School, Central European University, 2017-18.

ECPR course description | Course website

An increasingly vast wealth of data is freely available on the web -- from election results and legislative speeches to social media posts, newspaper articles, and press releases, among many other examples. Although this data is easily accessible, in most cases it is available in an unstructured format, which makes its analysis challenging. The goal of this course is to gain the skills necessary to automate the process of downloading, cleaning, and reshaping web and social data using the R programming language for statistical computing. We will cover all the most common scenarios: scraping data available in multiple pages or behind web forms, interacting with APIs and RSS feeds such as those provided by most media outlets, collecting data from Facebook and Twitter, extracting text and table data from PDF files, and manipulating datasets into a format ready for analysis. The course will follow a "learning-by-doing" approach, with short theoretical sessions followed by "data challenges" where participants will need to apply new methods.

Big Data Analysis in the Social Sciences

15 hours. ECPR Methods Summer School, Central European University, 2017-18.

ECPR course description | Course website

Massive-scale datasets from web sources and social media, newly digitized text sources, and large longitudinal survey studies present exciting opportunities for the study of social and political behaviour, but at the same time their size and heterogeneity present significant challenges. This course will introduce participants to new computational methods and tools required to explore and analyse Big Data in the social sciences using the R programming language. It will be structured around techniques to deal with the 3 V's of Big Data: volume, variety, and veracity. First, we will cover the basics of parallel programming and cloud computing to analyse large-scale datasets. Second, we will learn how to scale human tasks through the use of machine learning methods. Finally, we will discuss how to automatically discover insights from large text and network datasets and validate the output of this analysis. The course will follow a "learning-by-doing" approach, with short theoretical sessions followed by "data challenges" where participants will need to apply new methods.

Big Data and Social Media Research

12 hours. Barcelona Summer School in Survey Methodology, Universitat Pompeu Fabra, 2018.

UPF course description | Course website

Citizens across the globe spend an increasing proportion of their daily lives online. Their activities leave behind granular, time-stamped footprints of human behavior and personal interactions that represent a new and exciting source of data to study standing questions about political and social behavior. At the same time, the volume and heterogeneity of web data present unprecedented methodological challenges. The goal of this course is to introduce participants to new computational methods and tools required to explore and analyze Big Data from online sources using the R programming language. We will focus in particular on data collected from social networking sites, such as Twitter, whose use is becoming widespread in the social sciences.

Social Media Research

15 hours. EITM Europe Summer Institute, 2018.

EITM course description | Course website

Quantitative Text Analysis

20 hours. University of Vienna, 2018.

Course website
