Quantitative Text Analysis

with Applications to Social Media Research

University of Vienna, October 15–19, 2018

Citizens across the globe spend an increasing proportion of their daily lives on social media websites such as Twitter and Facebook. Their activities leave behind granular, time-stamped footprints of human behavior and personal interactions, which represent a new and exciting source of data for studying long-standing questions about political and social behavior. At the same time, the volume and heterogeneity of social media data present unprecedented methodological challenges. The goal of this short course is for participants to gain the skills necessary to automate the process of downloading, cleaning, and analyzing textual data from social media sites using the R programming language.


Pablo Barberá (Instructor) P.Barbera@lse.ac.uk @p_barbera


Monday October 15, 2018 Session 1 Social media research. 14:00–16:30
Tuesday October 16, 2018 Session 1 R refresher. Text analysis. 09:00–12:00
Session 2 Dictionary methods. 13:00–15:00
Wednesday October 17, 2018 Session 1 Supervised text classification. 09:00–12:00
Session 2 Topic models. 14:00–17:30
Thursday October 18, 2018 Session 1 Collecting Twitter data. 09:00–12:00
Session 2 Analyzing Twitter data. 16:00–19:00
Friday October 19, 2018 Session 1 Advanced topics. 11:00–13:00


The course assumes intermediate familiarity with the R statistical programming language. Participants should know how to read datasets into R, work with vectors and data frames, and run basic statistical analyses, such as linear regression. More advanced knowledge of statistical computing, such as writing functions and loops, is helpful but not required.
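As a rough self-check of the assumed level, participants who can follow a sketch like the one below should be well prepared. It is only an illustration, not course material: the built-in mtcars dataset, the temporary CSV file, and the helper name standardize are all assumptions chosen for the example.

```r
# Write a built-in dataset to a temporary CSV file, then read it back in
# (mtcars and the temporary file are illustrative assumptions)
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)
cars <- read.csv(csv_path)

# Vectors: extract a column and compute its mean
mpg <- cars$mpg
mean_mpg <- mean(mpg)

# Data frames: keep rows above the mean and a subset of columns
efficient <- cars[cars$mpg > mean_mpg, c("mpg", "wt", "hp")]

# Basic statistical analysis: linear regression of fuel efficiency on weight
fit <- lm(mpg ~ wt, data = cars)
print(summary(fit))

# Helpful but not required: a small function and a loop
# (standardize is a hypothetical helper defined here)
standardize <- function(x) (x - mean(x)) / sd(x)
for (v in c("mpg", "wt")) {
  cars[[paste0(v, "_z")]] <- standardize(cars[[v]])
}
```

The first three steps correspond to the required skills; the function and loop at the end correspond to the optional ones.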

Students are expected to bring a laptop to class and follow along with the coding portion of each session.


This course will use R, which is a free and open-source programming language primarily used for statistics and data analysis. We will also use RStudio, which is an easy-to-use interface to R.

Installing R or RStudio prior to the workshop is not necessary. The instructor will provide individual login details to an RStudio Server that all workshop participants can access to run their code.

License and credit

Science should be open, and this course builds on other openly licensed material. Unless otherwise noted, all materials for this class are licensed under a Creative Commons Attribution 4.0 International License.

The layout for this website was designed by Jeffrey Arnold (thanks!).

The source for the materials of this course is on GitHub at pablobarbera/text-analysis-vienna


If you have any feedback on the course or find any typos or errors on this website, go to the issues page of the GitHub repository, click on the “New Issue” button to create a new issue, and add your suggestion or describe the problem.