Topic Modeling: LDA

While supervised learning is used when we know the categories we want to produce, unsupervised learning (including topic modeling) is used when we do not know the categories in advance. In topic modeling, a document is not assumed to belong to a single topic or category. Instead, each document is a mixture of several topics, and the topic proportions vary from document to document.

The workhorse function for the topic model is LDA, which stands for Latent Dirichlet Allocation, the technical name for this particular kind of model.
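
To make the idea of a document as a mixture of topics concrete, here is a minimal base R sketch of the generative story that LDA assumes. The vocabulary, the number of topics, and the Dirichlet parameters are all made up for illustration, and `rdirichlet1` is a hypothetical helper defined here, not a function from topicmodels.

```r
set.seed(1)
K <- 3                                   # number of topics (toy value)
vocab <- c("jobs", "workers", "stocks", "market", "tax", "budget")

# hypothetical helper: one Dirichlet draw via normalized gamma variates
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha)
  g / sum(g)
}

# topic-word distributions: K rows, one column per vocabulary term
beta <- t(replicate(K, rdirichlet1(rep(0.5, length(vocab)))))
# topic proportions for a single document
theta <- rdirichlet1(rep(0.3, K))

# generate a 20-word document: pick a topic for each word position,
# then pick a word from that topic's distribution
z <- sample(K, 20, replace = TRUE, prob = theta)
words <- vapply(z, function(k) sample(vocab, 1, prob = beta[k, ]), character(1))

round(theta, 2)   # the document's mixture over topics
table(words)      # the resulting bag of words
```

Fitting the model runs this story in reverse: given only the word counts, it estimates the topic-word distributions and each document's topic proportions.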

We will now use a dataset that contains the lead paragraph of around 5,000 articles about the economy published in the New York Times between 1980 and 2014. As before, we will preprocess the text using the standard set of techniques.

The number of topics must be chosen by the analyst and is somewhat arbitrary, so it is worth fitting the model with several values and checking which one yields the most interpretable topics. We start here with 30 topics.

library(topicmodels)
# reading data and preparing corpus object
nyt <- read.csv("../data/nytimes.csv", stringsAsFactors = FALSE)
library(quanteda)
## Warning: package 'quanteda' was built under R version 3.4.2
nytcorpus <- corpus(nyt$lead_paragraph)
nytdfm <- dfm(nytcorpus, remove=stopwords("english"), verbose=TRUE,
               remove_punct=TRUE, remove_numbers=TRUE)
cdfm <- dfm_trim(nytdfm, min_docfreq = 2)

# estimate LDA with K topics
K <- 30
lda <- LDA(cdfm, k = K, method = "Gibbs", 
                control = list(verbose=25L, seed = 123, burnin = 100, iter = 500))
## K = 30; V = 11492; M = 5000
## Sampling 600 iterations!
## Iteration 25 ...
## Iteration 50 ...
## Iteration 75 ...
## Iteration 100 ...
## Iteration 125 ...
## Iteration 150 ...
## Iteration 175 ...
## Iteration 200 ...
## Iteration 225 ...
## Iteration 250 ...
## Iteration 275 ...
## Iteration 300 ...
## Iteration 325 ...
## Iteration 350 ...
## Iteration 375 ...
## Iteration 400 ...
## Iteration 425 ...
## Iteration 450 ...
## Iteration 475 ...
## Iteration 500 ...
## Iteration 525 ...
## Iteration 550 ...
## Iteration 575 ...
## Iteration 600 ...
## Gibbs sampling completed!

We can use get_terms to extract the top n terms for each topic, and get_topics to predict the top k topics for each document. This will help us interpret the results of the model.

terms <- get_terms(lda, 15)
terms[,5]
##  [1] "jobs"         "unemployment" "labor"        "last"        
##  [5] "workers"      "job"          "week"         "million"     
##  [9] "work"         "number"       "people"       "benefits"    
## [13] "employment"   "statistics"   "force"
topics <- get_topics(lda, 1)
head(topics)
## text1 text2 text3 text4 text5 text6 
##    16    11     6    20    22    10
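
As a rough illustration of what these two calls return, the sketch below reproduces the same logic in base R on made-up matrices. The objects `beta` and `gamma` here are toy stand-ins with invented numbers, not the slots of the fitted model.

```r
# toy stand-ins for the model's matrices (made-up numbers)
vocab <- c("jobs", "workers", "stocks", "market")
beta <- rbind(c(0.50, 0.40, 0.05, 0.05),   # topic 1: word probabilities
              c(0.10, 0.05, 0.45, 0.40))   # topic 2: word probabilities
gamma <- rbind(text1 = c(0.8, 0.2),        # document-topic proportions
               text2 = c(0.3, 0.7))

# top 2 terms per topic, one column per topic (the shape get_terms gives)
top_terms <- apply(beta, 1, function(p) vocab[order(p, decreasing = TRUE)[1:2]])
# most likely topic for each document (what get_topics with k = 1 gives)
top_topic <- apply(gamma, 1, which.max)

top_terms
top_topic
```

Reducing each document to its single most likely topic discards the rest of the mixture, which is convenient for sampling example documents but loses information.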

Let’s take a closer look at some of these topics. To help us interpret the output, we can look at the words associated with each topic and take a random sample of documents highly associated with each topic.

# Topic 5
paste(terms[,5], collapse=", ")
## [1] "jobs, unemployment, labor, last, workers, job, week, million, work, number, people, benefits, employment, statistics, force"
sample(nyt$lead_paragraph[topics==5], 1)
## [1] "In a recession replete with mysteries, one of the most puzzling is why so few of America's 9.5 million idle workers are collecting unemployment benefits. Figures on the nation's jobless from Townsend-Greenspan & Company, the economic consulting firm, indicate that only 42 percent now receive benefits under various Federal unemployment programs."
# Topic 11
paste(terms[,11], collapse=", ")
## [1] "economic, president, reagan, administration, officials, program, policy, white, washington, today, congress, mr, house, reagan's, policies"
sample(nyt$lead_paragraph[topics==11], 1)
## [1] "With a quick trip to St. Louis Thursday, President Reagan continues a busy week of work that seems designed to divert attention from the afflictions of his economic program. The apparent diversionary strategy reflects what one Presidential aide called the ''creeping pessimism'' within the White House about the chances for a long-term economic recovery. Since the start of July, Mr. Reagan has hopscotched through a series of events that, whether held in the California desert, the harbor in Baltimore or the Capitol steps, have given prominence to some issue other than the economic recovery program that up until now has been the Administration's main concern. In St. Louis, Mr. Reagan will turn his energies to yet another new concern: raising funds for the United States Olympic Committee. The President's staff, struggling to position Mr. Reagan for a fall election they expect to be hard on the Republican Party, is also using this trip and the powerful hospitality of the White House to court voters who regard this Administration with suspicion."
# Topic 12
paste(terms[,12], collapse=", ")
## [1] "economy, next, may, even, say, many, likely, economists, weeks, already, still, analysts, less, continue, coming"
sample(nyt$lead_paragraph[topics==12], 1)
## [1] "It is 5:30 in the evening as Adriana makes her way to work against a flow of people streaming out of the lattice of downtown stores and office towers here. She punches a time card, dons a uniform and sets out to clean her first bathroom of the night. A few miles away, Ana arrives at a suburban Target store at 10 p.m. to clean the in-house restaurant for the next day's shoppers. At 5:30 the next morning, Emilio starts his rounds at the changing rooms at a suburban department store. A half-hour later, Polo rushes to clean the showers and locker room at a university here before the early birds in the pool finish their morning swim."
# Topic 16
paste(terms[,16], collapse=", ")
## [1] "editor, income, people, poverty, americans, poor, article, school, programs, living, study, million, university, black, bureau"
sample(nyt$lead_paragraph[topics==16], 1)
## [1] "The Postal Service said its revenue for the last three months was nearly a half-billion dollars less than expected because mail volume continued to be flat. The Postal Service has struggled to deal with declining volume, a weak economy and the repercussions of the terrorist attacks in 2001, including anthrax-laden letters sent to lawmakers in Washington. Mail volume has dropped, in part, because more people are beginning to use electronic mail and bill-paying services, officials said. Postal revenue for the third quarter of the fiscal year was about $16 billion, $483 million below expectations, the Postal Service said."

You will find that some topics often do not make much sense. They just capture the remaining cluster of words, and often correspond to stopwords. For example:

# Topic 3
paste(terms[,3], collapse=", ")
## [1] "economic, growth, economy, recovery, economists, strong, still, signs, forecast, expansion, recent, second, despite, slowdown, evidence"
sample(nyt$lead_paragraph[topics==3], 1)
## [1] "That was some party we had. But the boom years are long over now, a rapidly receding memory, and we're still reeling a bit from our drunken excess. Unfortunately, this is one hangover that won't be cured with a couple of aspirins and a nap. In fact, this thing could last awhile. There was some optimistic buzz about a recovery this winter, but that has died down. This isn't the typical recession in which a few interest-rate cuts perk things right up. In fact, there is no recent precedent for our current economic condition of anemic growth amid quiescent inflation and low interest rates. It happened last in 1957-58, when a slump was followed by a slow recovery and another drop. In the 1960 presidential campaign, John F. Kennedy promised to ''get the country moving again,'' which he did, sort of. But it wasn't until 1965 that unemployment rates retreated to boom-era levels."
# Topic 4
paste(terms[,4], collapse=", ")
## [1] "new, york, country, city, across, every, week, day, later, well, south, last, lost, cities, summer"
sample(nyt$lead_paragraph[topics==4], 1)
## [1] "This is a town of old money and new, bound together by country clubs, riding stables and costly water views in one of the wealthiest counties in the nation. But here in Greenwich, and in dozens of other towns across the metropolitan region, there are people who are having a hard time putting food on their tables. They conceal it as best they can, as life in the suburbs often dictates. But left jobless by the recession, caught in the vise of their mortgages and with no more unemployment benefits available, thousands of men and women in the region who never sought assistance of any kind before are beginning to accept free turkeys, jars of peanut butter and boxes of spaghetti in an effort to make ends meet."

With timestamped data, looking at how certain topics evolve over time can also help interpret their meaning. Let’s look, for example, at Topic 13, which appears to be related to the stock market.

# Topic 13
paste(terms[,13], collapse=", ")
## [1] "s, many, americans, much, years, great, now, less, past, time, real, decade, depression, longer, course"
sample(nyt$lead_paragraph[topics==13], 1)
## [1] "No bull market in American history has been so avidly tracked as that of the 1990's. For eight years, we have watched this expanding balloon with a sense of collective wonder, transfixed by our unending prosperity. Measured purely in terms of popular participation, this market has eclipsed anything seen before. Where only a quarter of Americans dabbled in stocks before the 1987 plunge, nearly half now own stocks, either directly or through mutual funds. By comparison, the roaring market of the Jazz Age, with its paltry three million investors, seems a mere sideshow."
# add predicted topic to dataset
nyt$pred_topic <- topics
nyt$year <- substr(nyt$datetime, 1, 4) # extract year
# frequency table with articles about stock market, per year
tab <- table(nyt$year[nyt$pred_topic==13])
plot(tab)
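
Raw counts also reflect how many articles the corpus holds for each year, so one possible variant is to look at the yearly share of articles assigned to the topic. The sketch below uses made-up years and topic assignments, not the New York Times data.

```r
# made-up predicted topics and years for 8 articles
year  <- c(1999, 1999, 1999, 1999, 2000, 2000, 2001, 2001)
topic <- c(13,   2,    13,   5,    13,   13,   4,    13)

counts <- table(year[topic == 13])          # raw counts per year, as in the text
shares <- tapply(topic == 13, year, mean)   # fraction of each year's articles

counts
shares
```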

But we can actually do better than this. LDA is a probabilistic model, which means that for each document, it actually computes a distribution over topics. In other words, each document is considered to be about a mixture of topics.

This information is stored in the gamma matrix of the LDA object (theta in the notation we used in the slides). For example, article 1 is 11% about topic 16, 10% about topic 4, 8% about topic 27, and 5% or less about each of the remaining topics.

round(lda@gamma[1,], 2)
##  [1] 0.02 0.02 0.03 0.10 0.03 0.02 0.02 0.03 0.05 0.02 0.02 0.02 0.04 0.03
## [15] 0.02 0.11 0.02 0.02 0.03 0.02 0.03 0.02 0.02 0.03 0.02 0.02 0.08 0.04
## [29] 0.02 0.03
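
A quick way to read a row like this is to sort it. The sketch below copies the rounded values printed above into a plain vector (so it runs without the fitted model) and lists the three largest shares.

```r
# rounded topic proportions for article 1, copied from the output above
g1 <- c(0.02, 0.02, 0.03, 0.10, 0.03, 0.02, 0.02, 0.03, 0.05, 0.02,
        0.02, 0.02, 0.04, 0.03, 0.02, 0.11, 0.02, 0.02, 0.03, 0.02,
        0.03, 0.02, 0.02, 0.03, 0.02, 0.02, 0.08, 0.04, 0.02, 0.03)

top3 <- order(g1, decreasing = TRUE)[1:3]   # indices of the three largest shares
data.frame(topic = top3, share = g1[top3])

sum(g1)   # close to 1; the small gap comes from rounding to two decimals
```

With the fitted model available, the same ordering could be applied directly to `lda@gamma[1, ]`.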

We can therefore aggregate this matrix to compute, for each year, the average probability that an article is about a particular topic. Let’s now choose Topic 21, which appears to be related to the financial crisis.

# Topic 21
paste(terms[,21], collapse=", ")
## [1] "can, now, just, people, get, think, right, us, going, things, way, know, really, fact, something"
# add probability to df
nyt$prob_topic_21 <- lda@gamma[,21]
# now aggregate at the year level
agg <- aggregate(nyt$prob_topic_21, by=list(year=nyt$year), FUN=mean)
# and plot it
plot(agg$year, agg$x, type="l", xlab="Year", ylab="Avg. prob. of article about topic 21",
     main="Estimated proportion of articles about the financial crisis")
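
The same aggregation can be done for every topic at once. The following is a minimal sketch on simulated data: `year` and `gamma` are made-up stand-ins for `nyt$year` and `lda@gamma`.

```r
set.seed(42)
# made-up stand-ins: 6 documents, 3 topics, 2 years
year  <- c(1990, 1990, 1990, 1991, 1991, 1991)
gamma <- matrix(runif(18), nrow = 6)
gamma <- gamma / rowSums(gamma)   # each row sums to 1, like topic proportions

# yearly mean of every topic: rows are years, columns are topics
avg_by_year <- apply(gamma, 2, function(p) tapply(p, year, mean))

# the single-topic version used in the text gives the same numbers
agg1 <- aggregate(gamma[, 1], by = list(year = year), FUN = mean)

round(avg_by_year, 3)
```

Because each document's proportions sum to one, each row of `avg_by_year` also sums to one, so the yearly averages can be read as the estimated share of attention given to each topic.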

Topic Modeling: Structural Topic Model

Most text corpora include not only the documents themselves but also associated metadata: we know the author, characteristics of the author, when the document was produced, etc. The structural topic model takes advantage of this metadata to improve the discovery of topics. Here we will learn how it works, how we can interpret the output, and some issues related to its usage for research.

We will continue with the previous example, but now adding one covariate: the party of the president.

library(stm)
## Warning: package 'stm' was built under R version 3.4.1
## stm v1.3.0 (2017-09-08) successfully loaded. See ?stm for help.
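# extracting covariates
year <- as.numeric(substr(nyt$datetime, 1, 4))
# 1 for years under a Republican president (Reagan and Bush 1981-1992,
# G. W. Bush 2001-2008); 2000 was still the Clinton administration
repub <- ifelse(year %in% c(1981:1992, 2001:2008), 1, 0)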
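
A one-sided formula such as `~repub` describes which covariates enter the model. As a rough illustration, base R's model.matrix shows the design matrix such a formula implies; the covariate values below are made up, and the way stm builds its design matrix internally may differ in detail.

```r
# made-up covariate values for four documents
meta <- data.frame(repub = c(0, 1, 1, 0))

# intercept column plus one column for the binary covariate
X <- model.matrix(~ repub, data = meta)
X
```

With a single binary covariate, the model has a baseline level of topic prevalence and a shift for documents published under a Republican president.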

And now we’re ready to run stm!

# running STM
stm <- stm(documents=cdfm, K=30, prevalence=~repub, max.em.its=100)
## Beginning Spectral Initialization 
##   Calculating the gram matrix...
##   Using only 10000 most frequent terms during initialization...
##   Finding anchor words...
##      ..............................
##   Recovering initialization...
##      ....................................................................................................
## Initialization complete.
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 1 (approx. per word bound = -7.823) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 2 (approx. per word bound = -7.286, relative change = 6.856e-02) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 3 (approx. per word bound = -7.233, relative change = 7.307e-03) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 4 (approx. per word bound = -7.217, relative change = 2.164e-03) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 5 (approx. per word bound = -7.210, relative change = 1.044e-03) 
## Topic 1: bill, senate, house, d, stimulus 
##  Topic 2: billion, deficit, budget, trade, year 
##  Topic 3: federal, rates, interest, reserve, fed 
##  Topic 4: even, way, make, people, day 
##  Topic 5: president, mr, obama, bush, economy 
##  Topic 6: prices, dollar, oil, yesterday, new 
##  Topic 7: street, wall, financial, many, money 
##  Topic 8: said, new, prime, long, minister 
##  Topic 9: one, editor, us, can, america's 
##  Topic 10: corporate, year, companies, america, last 
##  Topic 11: economic, president, company, said, none 
##  Topic 12: percent, prices, said, rose, last 
##  Topic 13: country, cut, time, federal, now 
##  Topic 14: american, editor, economic, article, policy 
##  Topic 15: united, states, said, american, foreign 
##  Topic 16: economy, recession, economic, states, united 
##  Topic 17: inflation, index, said, survey, economy 
##  Topic 18: time, now, new, oil, last 
##  Topic 19: market, stock, economy, stocks, investors 
##  Topic 20: gov, new, mr, york, business 
##  Topic 21: market, new, business, black, year 
##  Topic 22: financial, banks, crisis, mortgage, bank 
##  Topic 23: percent, quarter, rate, annual, year 
##  Topic 24: economic, president, mr, reagan, tax 
##  Topic 25: jobs, unemployment, workers, people, job 
##  Topic 26: economic, seem, consumers, like, depression 
##  Topic 27: states, united, new, economic, global 
##  Topic 28: health, care, new, education, tax 
##  Topic 29: percent, said, department, orders, commerce 
##  Topic 30: said, economy, center, office, one 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 6 (approx. per word bound = -7.205, relative change = 5.929e-04) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 7 (approx. per word bound = -7.203, relative change = 4.002e-04) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 8 (approx. per word bound = -7.201, relative change = 2.395e-04) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 9 (approx. per word bound = -7.199, relative change = 2.108e-04) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 10 (approx. per word bound = -7.198, relative change = 1.927e-04) 
## Topic 1: bill, senate, house, stimulus, tax 
##  Topic 2: billion, deficit, budget, trade, year 
##  Topic 3: federal, rates, interest, reserve, fed 
##  Topic 4: even, way, people, make, day 
##  Topic 5: president, mr, obama, bush, economy 
##  Topic 6: prices, dollar, oil, yesterday, new 
##  Topic 7: street, wall, financial, money, treasury 
##  Topic 8: said, long, new, prime, minister 
##  Topic 9: one, editor, us, can, people 
##  Topic 10: corporate, year, companies, america, last 
##  Topic 11: president, economic, company, said, none 
##  Topic 12: percent, prices, rose, last, year 
##  Topic 13: country, cut, time, now, government 
##  Topic 14: editor, american, economic, policy, article 
##  Topic 15: united, states, foreign, american, said 
##  Topic 16: economy, recession, war, economic, world 
##  Topic 17: inflation, index, said, economy, consumer 
##  Topic 18: time, now, new, oil, last 
##  Topic 19: market, stock, economy, investors, stocks 
##  Topic 20: new, york, gov, mr, business 
##  Topic 21: market, new, business, black, years 
##  Topic 22: financial, banks, crisis, bank, credit 
##  Topic 23: percent, quarter, rate, annual, year 
##  Topic 24: economic, president, mr, reagan, administration 
##  Topic 25: jobs, unemployment, workers, job, labor 
##  Topic 26: economic, consumers, seem, spend, like 
##  Topic 27: states, united, economic, new, world 
##  Topic 28: health, care, education, new, social 
##  Topic 29: percent, department, said, sales, commerce 
##  Topic 30: said, center, economy, think, lead 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 11 (approx. per word bound = -7.197, relative change = 1.516e-04) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 12 (approx. per word bound = -7.196, relative change = 1.164e-04) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 13 (approx. per word bound = -7.195, relative change = 1.027e-04) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 14 (approx. per word bound = -7.195, relative change = 1.093e-04) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 15 (approx. per word bound = -7.194, relative change = 9.860e-05) 
## Topic 1: house, bill, senate, tax, stimulus 
##  Topic 2: billion, deficit, budget, spending, trade 
##  Topic 3: federal, rates, reserve, interest, fed 
##  Topic 4: even, way, people, lot, day 
##  Topic 5: president, mr, obama, bush, economy 
##  Topic 6: prices, dollar, oil, yesterday, new 
##  Topic 7: street, wall, financial, money, treasury 
##  Topic 8: said, long, new, prime, minister 
##  Topic 9: one, editor, us, can, people 
##  Topic 10: corporate, year, companies, america, last 
##  Topic 11: economic, president, company, said, none 
##  Topic 12: percent, prices, rose, last, increase 
##  Topic 13: country, cut, time, government, now 
##  Topic 14: editor, american, policy, economic, article 
##  Topic 15: united, states, foreign, american, said 
##  Topic 16: recession, economy, war, economic, american 
##  Topic 17: said, index, economy, economic, inflation 
##  Topic 18: time, now, new, oil, texas 
##  Topic 19: market, stock, investors, stocks, economy 
##  Topic 20: new, york, mr, gov, advertising 
##  Topic 21: new, market, business, black, american 
##  Topic 22: financial, banks, crisis, credit, bank 
##  Topic 23: percent, rate, quarter, annual, year 
##  Topic 24: economic, president, mr, reagan, administration 
##  Topic 25: jobs, unemployment, workers, labor, job 
##  Topic 26: economic, consumers, seem, spend, spending 
##  Topic 27: states, united, economic, world, countries 
##  Topic 28: health, care, education, social, new 
##  Topic 29: percent, sales, department, said, commerce 
##  Topic 30: lead, said, center, think, million 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 16 (approx. per word bound = -7.193, relative change = 8.774e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 17 (approx. per word bound = -7.193, relative change = 8.819e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 18 (approx. per word bound = -7.192, relative change = 8.691e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 19 (approx. per word bound = -7.191, relative change = 7.822e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 20 (approx. per word bound = -7.191, relative change = 7.849e-05) 
## Topic 1: house, bill, senate, tax, congress 
##  Topic 2: billion, deficit, budget, spending, year 
##  Topic 3: federal, rates, reserve, interest, fed 
##  Topic 4: even, way, people, lot, day 
##  Topic 5: president, mr, obama, bush, economy 
##  Topic 6: prices, dollar, oil, yesterday, new 
##  Topic 7: street, wall, money, financial, treasury 
##  Topic 8: said, long, new, prime, auto 
##  Topic 9: one, editor, us, can, people 
##  Topic 10: corporate, year, companies, america, last 
##  Topic 11: company, said, economic, president, none 
##  Topic 12: percent, prices, rose, last, increase 
##  Topic 13: country, cut, time, government, now 
##  Topic 14: editor, american, policy, economic, conservative 
##  Topic 15: united, states, foreign, american, trade 
##  Topic 16: recession, economy, economic, war, economists 
##  Topic 17: said, economy, economic, index, consumer 
##  Topic 18: time, now, new, oil, texas 
##  Topic 19: market, stock, investors, stocks, economy 
##  Topic 20: new, york, advertising, mr, gov 
##  Topic 21: new, market, business, black, american 
##  Topic 22: financial, crisis, banks, credit, bank 
##  Topic 23: percent, rate, quarter, annual, year 
##  Topic 24: economic, president, mr, reagan, administration 
##  Topic 25: unemployment, jobs, workers, labor, job 
##  Topic 26: economic, consumers, seem, spend, spending 
##  Topic 27: states, united, world, economic, countries 
##  Topic 28: health, care, social, education, new 
##  Topic 29: percent, sales, department, said, commerce 
##  Topic 30: lead, million, short, think, center 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 21 (approx. per word bound = -7.190, relative change = 7.834e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 22 (approx. per word bound = -7.190, relative change = 6.686e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 23 (approx. per word bound = -7.189, relative change = 7.216e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 24 (approx. per word bound = -7.189, relative change = 6.782e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 25 (approx. per word bound = -7.188, relative change = 6.329e-05) 
## Topic 1: house, tax, bill, senate, congress 
##  Topic 2: billion, deficit, budget, spending, year 
##  Topic 3: federal, rates, reserve, interest, fed 
##  Topic 4: even, way, people, day, lot 
##  Topic 5: president, mr, obama, bush, economy 
##  Topic 6: prices, dollar, oil, yesterday, new 
##  Topic 7: street, wall, money, financial, treasury 
##  Topic 8: said, long, new, prime, auto 
##  Topic 9: one, editor, us, people, column 
##  Topic 10: corporate, year, companies, profits, america 
##  Topic 11: company, said, economic, president, chief 
##  Topic 12: percent, prices, inflation, increase, rose 
##  Topic 13: country, cut, time, government, now 
##  Topic 14: editor, american, policy, economic, industrial 
##  Topic 15: foreign, united, states, american, trade 
##  Topic 16: recession, economy, economic, war, economists 
##  Topic 17: said, economy, economic, index, report 
##  Topic 18: time, now, new, oil, texas 
##  Topic 19: market, stock, investors, stocks, markets 
##  Topic 20: new, york, advertising, mr, gov 
##  Topic 21: new, market, business, black, years 
##  Topic 22: financial, crisis, banks, credit, bank 
##  Topic 23: percent, rate, quarter, annual, year 
##  Topic 24: economic, president, mr, reagan, administration 
##  Topic 25: unemployment, jobs, workers, labor, job 
##  Topic 26: economic, consumers, seem, spend, can 
##  Topic 27: states, united, world, economic, american 
##  Topic 28: health, care, social, education, new 
##  Topic 29: percent, sales, department, said, commerce 
##  Topic 30: lead, million, short, think, center 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 26 (approx. per word bound = -7.188, relative change = 5.409e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 27 (approx. per word bound = -7.188, relative change = 5.409e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 28 (approx. per word bound = -7.187, relative change = 5.513e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 29 (approx. per word bound = -7.187, relative change = 5.677e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 30 (approx. per word bound = -7.186, relative change = 5.935e-05) 
## Topic 1: tax, house, bill, senate, congress 
##  Topic 2: billion, deficit, budget, spending, year 
##  Topic 3: federal, rates, reserve, interest, fed 
##  Topic 4: even, way, people, new, day 
##  Topic 5: president, mr, obama, bush, economy 
##  Topic 6: prices, dollar, oil, yesterday, treasury 
##  Topic 7: street, wall, money, financial, treasury 
##  Topic 8: said, long, new, auto, prime 
##  Topic 9: one, editor, us, column, people 
##  Topic 10: corporate, companies, year, profits, america 
##  Topic 11: company, said, economic, monday, thursday 
##  Topic 12: percent, prices, inflation, increase, consumer 
##  Topic 13: country, cut, time, government, now 
##  Topic 14: editor, american, policy, economic, industrial 
##  Topic 15: foreign, united, american, states, said 
##  Topic 16: economy, recession, recovery, economic, economists 
##  Topic 17: said, economic, economy, report, index 
##  Topic 18: time, now, new, texas, oil 
##  Topic 19: market, stock, investors, stocks, markets 
##  Topic 20: york, new, advertising, mr, gov 
##  Topic 21: new, market, business, black, years 
##  Topic 22: financial, crisis, banks, credit, bank 
##  Topic 23: percent, rate, quarter, annual, year 
##  Topic 24: economic, president, mr, reagan, administration 
##  Topic 25: unemployment, jobs, workers, labor, job 
##  Topic 26: economic, can, seem, consumers, spend 
##  Topic 27: states, united, world, economic, american 
##  Topic 28: health, care, social, education, new 
##  Topic 29: percent, sales, department, said, commerce 
##  Topic 30: lead, million, short, center, said 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 31 (approx. per word bound = -7.186, relative change = 5.604e-05) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 32 (approx. per word bound = -7.185, relative change = 5.339e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 33 (approx. per word bound = -7.185, relative change = 5.228e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 34 (approx. per word bound = -7.185, relative change = 5.668e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 35 (approx. per word bound = -7.184, relative change = 5.242e-05) 
## Topic 1: tax, house, bill, congress, senate 
##  Topic 2: billion, deficit, budget, spending, year 
##  Topic 3: federal, rates, reserve, interest, fed 
##  Topic 4: even, people, way, new, day 
##  Topic 5: president, mr, obama, bush, economy 
##  Topic 6: prices, dollar, oil, yesterday, treasury 
##  Topic 7: street, wall, money, treasury, financial 
##  Topic 8: said, long, auto, prime, new 
##  Topic 9: one, editor, us, column, people 
##  Topic 10: corporate, companies, year, profits, america 
##  Topic 11: company, said, thursday, monday, economic 
##  Topic 12: percent, prices, inflation, increase, consumer 
##  Topic 13: country, cut, time, government, now 
##  Topic 14: editor, american, policy, economic, industrial 
##  Topic 15: foreign, united, american, said, states 
##  Topic 16: economy, recession, recovery, economists, economic 
##  Topic 17: said, economic, economy, report, index 
##  Topic 18: time, now, new, texas, oil 
##  Topic 19: market, stock, investors, stocks, markets 
##  Topic 20: york, new, advertising, mr, gov 
##  Topic 21: new, market, business, black, years 
##  Topic 22: financial, crisis, banks, credit, bank 
##  Topic 23: percent, rate, quarter, annual, year 
##  Topic 24: economic, president, mr, reagan, administration 
##  Topic 25: unemployment, jobs, workers, labor, job 
##  Topic 26: economic, can, seem, spend, consumers 
##  Topic 27: states, united, world, economic, american 
##  Topic 28: health, care, social, education, new 
##  Topic 29: percent, sales, department, said, commerce 
##  Topic 30: lead, million, short, said, center 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 36 (approx. per word bound = -7.184, relative change = 5.060e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 37 (approx. per word bound = -7.184, relative change = 4.812e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 38 (approx. per word bound = -7.183, relative change = 4.286e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 39 (approx. per word bound = -7.183, relative change = 4.424e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 40 (approx. per word bound = -7.183, relative change = 4.250e-05) 
## Topic 1: tax, house, bill, congress, senate 
##  Topic 2: billion, deficit, budget, spending, year 
##  Topic 3: federal, rates, reserve, interest, fed 
##  Topic 4: even, people, way, new, day 
##  Topic 5: president, mr, obama, bush, economy 
##  Topic 6: prices, dollar, oil, yesterday, treasury 
##  Topic 7: street, wall, money, treasury, financial 
##  Topic 8: said, long, prime, auto, new 
##  Topic 9: one, editor, us, column, people 
##  Topic 10: corporate, companies, year, profits, america 
##  Topic 11: thursday, chief, said, monday, tuesday 
##  Topic 12: percent, prices, inflation, increase, consumer 
##  Topic 13: country, cut, time, government, now 
##  Topic 14: editor, american, policy, economic, business 
##  Topic 15: foreign, united, said, american, states 
##  Topic 16: economy, recession, recovery, economists, economic 
##  Topic 17: said, economic, economy, report, index 
##  Topic 18: time, now, new, last, texas 
##  Topic 19: market, stock, investors, stocks, markets 
##  Topic 20: york, new, mr, advertising, gov 
##  Topic 21: new, market, business, black, years 
##  Topic 22: financial, crisis, banks, credit, bank 
##  Topic 23: percent, rate, quarter, annual, year 
##  Topic 24: president, economic, mr, reagan, administration 
##  Topic 25: unemployment, jobs, workers, labor, job 
##  Topic 26: economic, can, seem, spend, consumers 
##  Topic 27: states, united, world, economic, american 
##  Topic 28: health, care, social, education, new 
##  Topic 29: percent, sales, department, said, commerce 
##  Topic 30: lead, million, week, short, said 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 41 (approx. per word bound = -7.182, relative change = 4.074e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 42 (approx. per word bound = -7.182, relative change = 3.747e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 43 (approx. per word bound = -7.182, relative change = 3.751e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 44 (approx. per word bound = -7.182, relative change = 3.862e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 45 (approx. per word bound = -7.181, relative change = 3.648e-05) 
## Topic 1: tax, house, bill, congress, senate 
##  Topic 2: billion, deficit, budget, spending, year 
##  Topic 3: federal, rates, reserve, interest, fed 
##  Topic 4: even, people, way, new, day 
##  Topic 5: president, mr, obama, bush, economy 
##  Topic 6: prices, dollar, oil, yesterday, treasury 
##  Topic 7: street, wall, money, treasury, financial 
##  Topic 8: said, long, prime, auto, last 
##  Topic 9: one, editor, us, column, people 
##  Topic 10: corporate, companies, year, profits, america 
##  Topic 11: thursday, chief, wednesday, tuesday, monday 
##  Topic 12: percent, prices, inflation, consumer, increase 
##  Topic 13: country, cut, time, government, now 
##  Topic 14: editor, american, policy, economic, business 
##  Topic 15: foreign, said, united, american, states 
##  Topic 16: economy, recession, recovery, economic, economists 
##  Topic 17: said, economic, economy, report, index 
##  Topic 18: time, now, new, last, texas 
##  Topic 19: market, stock, investors, stocks, markets 
##  Topic 20: york, new, mr, advertising, gov 
##  Topic 21: new, market, business, black, years 
##  Topic 22: financial, crisis, banks, credit, bank 
##  Topic 23: percent, rate, annual, quarter, year 
##  Topic 24: president, economic, mr, reagan, administration 
##  Topic 25: unemployment, jobs, workers, labor, job 
##  Topic 26: economic, can, seem, spend, like 
##  Topic 27: states, united, world, american, economic 
##  Topic 28: health, care, social, education, national 
##  Topic 29: percent, sales, department, said, commerce 
##  Topic 30: lead, million, week, year, short 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 46 (approx. per word bound = -7.181, relative change = 2.892e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 47 (approx. per word bound = -7.181, relative change = 2.633e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 48 (approx. per word bound = -7.181, relative change = 2.573e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 49 (approx. per word bound = -7.181, relative change = 2.542e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 50 (approx. per word bound = -7.180, relative change = 2.404e-05) 
## Topic 1: tax, house, congress, bill, senate 
##  Topic 2: billion, deficit, budget, spending, year 
##  Topic 3: federal, rates, reserve, interest, fed 
##  Topic 4: even, people, way, new, day 
##  Topic 5: president, mr, obama, bush, campaign 
##  Topic 6: prices, dollar, oil, yesterday, treasury 
##  Topic 7: street, wall, money, treasury, financial 
##  Topic 8: said, long, prime, last, auto 
##  Topic 9: one, editor, us, column, people 
##  Topic 10: corporate, companies, year, profits, america 
##  Topic 11: chief, thursday, wednesday, tuesday, monday 
##  Topic 12: percent, prices, inflation, consumer, increase 
##  Topic 13: country, cut, time, government, now 
##  Topic 14: editor, american, policy, economic, business 
##  Topic 15: foreign, said, united, american, states 
##  Topic 16: economy, recession, recovery, economic, economists 
##  Topic 17: said, economic, economy, report, index 
##  Topic 18: time, new, now, last, texas 
##  Topic 19: market, stock, investors, stocks, points 
##  Topic 20: york, new, mr, advertising, gov 
##  Topic 21: new, market, business, black, years 
##  Topic 22: financial, crisis, banks, credit, bank 
##  Topic 23: percent, rate, annual, quarter, year 
##  Topic 24: president, economic, mr, reagan, administration 
##  Topic 25: unemployment, jobs, workers, labor, job 
##  Topic 26: economic, can, seem, like, people 
##  Topic 27: states, united, world, american, economic 
##  Topic 28: health, care, social, education, national 
##  Topic 29: percent, sales, department, said, commerce 
##  Topic 30: lead, million, week, year, said 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 51 (approx. per word bound = -7.180, relative change = 2.909e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 52 (approx. per word bound = -7.180, relative change = 3.217e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 53 (approx. per word bound = -7.180, relative change = 3.287e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 54 (approx. per word bound = -7.180, relative change = 2.428e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 55 (approx. per word bound = -7.179, relative change = 2.914e-05) 
## Topic 1: tax, house, congress, bill, senate 
##  Topic 2: billion, deficit, budget, spending, year 
##  Topic 3: federal, rates, reserve, interest, fed 
##  Topic 4: even, people, way, new, day 
##  Topic 5: president, mr, bush, obama, campaign 
##  Topic 6: prices, dollar, oil, yesterday, treasury 
##  Topic 7: street, wall, money, financial, treasury 
##  Topic 8: said, long, prime, last, auto 
##  Topic 9: one, editor, us, column, people 
##  Topic 10: corporate, companies, year, profits, retailers 
##  Topic 11: chief, thursday, wednesday, monday, tuesday 
##  Topic 12: percent, prices, inflation, consumer, increase 
##  Topic 13: country, cut, time, government, now 
##  Topic 14: editor, american, policy, economic, business 
##  Topic 15: foreign, said, united, american, states 
##  Topic 16: economy, recession, recovery, economic, economists 
##  Topic 17: said, economic, economy, report, index 
##  Topic 18: time, new, now, last, texas 
##  Topic 19: market, stock, investors, stocks, points 
##  Topic 20: york, new, mr, advertising, gov 
##  Topic 21: new, market, business, black, years 
##  Topic 22: financial, crisis, banks, credit, bank 
##  Topic 23: percent, rate, annual, quarter, year 
##  Topic 24: president, economic, mr, reagan, administration 
##  Topic 25: unemployment, jobs, workers, labor, job 
##  Topic 26: can, economic, people, seem, like 
##  Topic 27: states, united, world, american, economic 
##  Topic 28: health, care, social, education, national 
##  Topic 29: percent, sales, department, said, commerce 
##  Topic 30: lead, million, week, year, last 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 56 (approx. per word bound = -7.179, relative change = 3.323e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 57 (approx. per word bound = -7.179, relative change = 2.947e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 58 (approx. per word bound = -7.179, relative change = 3.075e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 59 (approx. per word bound = -7.178, relative change = 2.775e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 60 (approx. per word bound = -7.178, relative change = 2.437e-05) 
## Topic 1: tax, house, congress, bill, senate 
##  Topic 2: billion, deficit, budget, spending, year 
##  Topic 3: federal, rates, reserve, interest, fed 
##  Topic 4: even, people, way, day, new 
##  Topic 5: president, mr, bush, obama, campaign 
##  Topic 6: prices, dollar, oil, yesterday, treasury 
##  Topic 7: street, wall, money, financial, treasury 
##  Topic 8: said, long, prime, auto, last 
##  Topic 9: one, editor, us, column, people 
##  Topic 10: corporate, companies, year, retailers, profits 
##  Topic 11: chief, thursday, wednesday, monday, tuesday 
##  Topic 12: percent, prices, inflation, consumer, increase 
##  Topic 13: country, cut, time, government, new 
##  Topic 14: editor, policy, american, economic, business 
##  Topic 15: foreign, said, united, american, states 
##  Topic 16: economy, recession, recovery, economic, economists 
##  Topic 17: said, economic, economy, report, index 
##  Topic 18: time, new, now, last, texas 
##  Topic 19: market, stock, investors, stocks, points 
##  Topic 20: york, new, mr, advertising, gov 
##  Topic 21: new, market, business, black, years 
##  Topic 22: financial, crisis, banks, credit, bank 
##  Topic 23: percent, rate, annual, quarter, year 
##  Topic 24: president, economic, mr, reagan, administration 
##  Topic 25: unemployment, jobs, workers, labor, percent 
##  Topic 26: can, economic, people, like, seem 
##  Topic 27: states, united, world, american, economic 
##  Topic 28: health, care, social, education, national 
##  Topic 29: percent, sales, department, said, commerce 
##  Topic 30: lead, million, week, year, last 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 61 (approx. per word bound = -7.178, relative change = 2.335e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 62 (approx. per word bound = -7.178, relative change = 2.354e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 63 (approx. per word bound = -7.178, relative change = 2.603e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 64 (approx. per word bound = -7.178, relative change = 2.441e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 65 (approx. per word bound = -7.177, relative change = 2.302e-05) 
## Topic 1: tax, house, congress, bill, senate 
##  Topic 2: billion, deficit, budget, spending, year 
##  Topic 3: federal, rates, reserve, interest, fed 
##  Topic 4: even, people, way, new, day 
##  Topic 5: president, mr, bush, obama, campaign 
##  Topic 6: prices, dollar, oil, yesterday, treasury 
##  Topic 7: street, wall, financial, money, treasury 
##  Topic 8: said, auto, long, prime, industry 
##  Topic 9: one, editor, us, column, people 
##  Topic 10: corporate, companies, year, retailers, profits 
##  Topic 11: chief, thursday, wednesday, monday, tuesday 
##  Topic 12: percent, prices, inflation, consumer, increase 
##  Topic 13: country, cut, time, government, new 
##  Topic 14: editor, policy, american, economic, business 
##  Topic 15: foreign, said, united, american, trade 
##  Topic 16: economy, recession, recovery, economic, economists 
##  Topic 17: said, economic, economy, report, index 
##  Topic 18: time, new, now, last, texas 
##  Topic 19: market, stock, investors, stocks, points 
##  Topic 20: york, new, mr, advertising, gov 
##  Topic 21: new, market, business, black, years 
##  Topic 22: financial, crisis, banks, credit, bank 
##  Topic 23: percent, rate, annual, quarter, year 
##  Topic 24: president, economic, mr, reagan, administration 
##  Topic 25: unemployment, jobs, workers, labor, percent 
##  Topic 26: can, economic, people, like, seem 
##  Topic 27: states, united, world, american, economic 
##  Topic 28: health, care, social, education, national 
##  Topic 29: percent, sales, department, said, reported 
##  Topic 30: lead, million, week, year, last 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 66 (approx. per word bound = -7.177, relative change = 1.783e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 67 (approx. per word bound = -7.177, relative change = 1.876e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 68 (approx. per word bound = -7.177, relative change = 1.886e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 69 (approx. per word bound = -7.177, relative change = 1.680e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 70 (approx. per word bound = -7.177, relative change = 1.547e-05) 
## Topic 1: tax, house, congress, bill, senate 
##  Topic 2: billion, deficit, budget, spending, fiscal 
##  Topic 3: federal, rates, reserve, interest, fed 
##  Topic 4: even, people, way, new, day 
##  Topic 5: president, mr, bush, obama, campaign 
##  Topic 6: prices, dollar, oil, yesterday, treasury 
##  Topic 7: street, wall, financial, money, treasury 
##  Topic 8: said, auto, long, prime, industry 
##  Topic 9: one, editor, us, column, money 
##  Topic 10: corporate, companies, year, retailers, sales 
##  Topic 11: chief, thursday, wednesday, monday, tuesday 
##  Topic 12: percent, prices, inflation, consumer, increase 
##  Topic 13: country, cut, time, government, new 
##  Topic 14: editor, policy, american, economic, business 
##  Topic 15: foreign, said, united, american, trade 
##  Topic 16: economy, recession, recovery, economic, economists 
##  Topic 17: said, economic, economy, report, index 
##  Topic 18: time, new, now, last, texas 
##  Topic 19: market, stock, investors, stocks, dow 
##  Topic 20: york, new, mr, advertising, gov 
##  Topic 21: new, market, business, black, companies 
##  Topic 22: financial, crisis, banks, credit, bank 
##  Topic 23: percent, rate, annual, quarter, year 
##  Topic 24: president, economic, mr, reagan, administration 
##  Topic 25: unemployment, jobs, workers, labor, percent 
##  Topic 26: can, economic, people, like, seem 
##  Topic 27: states, united, world, american, economic 
##  Topic 28: health, care, social, education, national 
##  Topic 29: percent, sales, department, said, reported 
##  Topic 30: lead, million, week, year, last 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 71 (approx. per word bound = -7.177, relative change = 1.735e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 72 (approx. per word bound = -7.177, relative change = 1.900e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 73 (approx. per word bound = -7.176, relative change = 1.808e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 74 (approx. per word bound = -7.176, relative change = 1.739e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 75 (approx. per word bound = -7.176, relative change = 1.515e-05) 
## Topic 1: tax, house, congress, bill, senate 
##  Topic 2: billion, deficit, budget, spending, fiscal 
##  Topic 3: federal, rates, reserve, interest, fed 
##  Topic 4: even, people, way, day, new 
##  Topic 5: president, mr, bush, obama, campaign 
##  Topic 6: prices, dollar, oil, yesterday, treasury 
##  Topic 7: street, wall, financial, money, treasury 
##  Topic 8: said, auto, long, prime, industry 
##  Topic 9: one, editor, us, column, money 
##  Topic 10: corporate, companies, year, retailers, sales 
##  Topic 11: chief, thursday, wednesday, monday, tuesday 
##  Topic 12: percent, prices, inflation, consumer, increase 
##  Topic 13: country, cut, time, government, new 
##  Topic 14: editor, policy, american, economic, business 
##  Topic 15: foreign, said, united, american, trade 
##  Topic 16: economy, recession, recovery, economic, economists 
##  Topic 17: said, economic, economy, report, index 
##  Topic 18: time, new, now, last, texas 
##  Topic 19: market, stock, investors, stocks, dow 
##  Topic 20: york, new, mr, advertising, gov 
##  Topic 21: new, market, business, black, companies 
##  Topic 22: financial, crisis, banks, credit, bank 
##  Topic 23: percent, rate, annual, quarter, year 
##  Topic 24: president, economic, mr, reagan, administration 
##  Topic 25: unemployment, jobs, workers, labor, percent 
##  Topic 26: can, economic, people, like, much 
##  Topic 27: states, united, world, american, economic 
##  Topic 28: health, care, social, education, national 
##  Topic 29: percent, sales, department, said, reported 
##  Topic 30: lead, million, week, year, last 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 76 (approx. per word bound = -7.176, relative change = 1.895e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 77 (approx. per word bound = -7.176, relative change = 1.951e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 78 (approx. per word bound = -7.176, relative change = 1.855e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 79 (approx. per word bound = -7.176, relative change = 1.563e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 80 (approx. per word bound = -7.176, relative change = 1.568e-05) 
## Topic 1: tax, house, congress, bill, senate 
##  Topic 2: billion, budget, deficit, spending, fiscal 
##  Topic 3: federal, rates, reserve, interest, fed 
##  Topic 4: even, people, way, day, new 
##  Topic 5: president, mr, bush, obama, campaign 
##  Topic 6: prices, dollar, oil, yesterday, treasury 
##  Topic 7: street, wall, financial, money, treasury 
##  Topic 8: said, auto, long, prime, industry 
##  Topic 9: one, editor, us, column, money 
##  Topic 10: corporate, companies, year, sales, retailers 
##  Topic 11: chief, thursday, wednesday, monday, tuesday 
##  Topic 12: percent, prices, inflation, consumer, increase 
##  Topic 13: country, cut, time, government, new 
##  Topic 14: editor, policy, american, economic, business 
##  Topic 15: foreign, said, united, american, states 
##  Topic 16: economy, recession, recovery, economic, economists 
##  Topic 17: said, economic, report, economy, index 
##  Topic 18: time, new, now, texas, last 
##  Topic 19: market, stock, investors, stocks, dow 
##  Topic 20: new, york, mr, advertising, gov 
##  Topic 21: new, market, business, black, companies 
##  Topic 22: financial, crisis, banks, credit, bank 
##  Topic 23: percent, rate, quarter, annual, year 
##  Topic 24: president, mr, economic, reagan, administration 
##  Topic 25: unemployment, jobs, workers, labor, percent 
##  Topic 26: can, economic, people, like, much 
##  Topic 27: states, united, world, american, economic 
##  Topic 28: health, care, social, education, national 
##  Topic 29: percent, sales, department, said, reported 
##  Topic 30: week, lead, million, year, last 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 81 (approx. per word bound = -7.175, relative change = 1.802e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 82 (approx. per word bound = -7.175, relative change = 1.657e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 83 (approx. per word bound = -7.175, relative change = 1.481e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 84 (approx. per word bound = -7.175, relative change = 1.231e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 85 (approx. per word bound = -7.175, relative change = 1.021e-05) 
## Topic 1: tax, house, congress, bill, senate 
##  Topic 2: billion, budget, deficit, spending, fiscal 
##  Topic 3: federal, rates, reserve, interest, fed 
##  Topic 4: even, people, way, day, new 
##  Topic 5: president, mr, bush, obama, campaign 
##  Topic 6: prices, dollar, oil, yesterday, bond 
##  Topic 7: street, wall, financial, money, treasury 
##  Topic 8: said, auto, long, prime, industry 
##  Topic 9: one, editor, us, column, op-ed 
##  Topic 10: corporate, companies, year, sales, retailers 
##  Topic 11: chief, thursday, wednesday, monday, tuesday 
##  Topic 12: percent, prices, inflation, consumer, increase 
##  Topic 13: country, cut, time, government, new 
##  Topic 14: editor, policy, american, economic, business 
##  Topic 15: foreign, said, united, states, american 
##  Topic 16: economy, recession, economic, recovery, economists 
##  Topic 17: said, economic, report, economy, index 
##  Topic 18: time, new, now, texas, last 
##  Topic 19: market, stock, investors, stocks, dow 
##  Topic 20: new, york, mr, advertising, gov 
##  Topic 21: new, market, business, black, companies 
##  Topic 22: financial, crisis, banks, credit, bank 
##  Topic 23: percent, rate, quarter, annual, year 
##  Topic 24: president, mr, economic, reagan, administration 
##  Topic 25: unemployment, jobs, workers, labor, percent 
##  Topic 26: can, economic, people, like, much 
##  Topic 27: states, united, world, american, economic 
##  Topic 28: health, care, social, education, national 
##  Topic 29: percent, sales, department, said, reported 
##  Topic 30: week, lead, million, year, last 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Completing Iteration 86 (approx. per word bound = -7.175, relative change = 1.023e-05) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## Completed M-Step. 
## Model Converged
save(stm, file="../backup/stm-output.Rdata")

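The stm package offers several functions for exploring the output. First, just as with LDA, we can look at the words most associated with each topic. For each topic, labelTopics reports four rankings: the highest-probability words, plus three metrics (FREX, Lift and Score) that favor words more exclusive to that topic.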

load("../backup/stm-output.Rdata")
# looking at a few topics
labelTopics(stm, topics=1)
## Topic 1 Top Words:
##       Highest Prob: tax, house, congress, bill, senate, president, plan 
##       FREX: y, senate, bill, package, stimulus, tax, republicans 
##       Lift: ackerman, gingrich's, boren, moynihan, newt, reid, y 
##       Score: y, tax, senate, democrats, republicans, bill, congress
labelTopics(stm, topics=4)
## Topic 4 Top Words:
##       Highest Prob: even, people, way, day, new, like, lot 
##       FREX: park, friend, tables, lot, death, ms, always 
##       Lift: aspect, bedford, brand-new, condos, hyde, installed, jeep 
##       Score: park, lot, starbucks, lake, young, death, ms
labelTopics(stm, topics=5)
## Topic 5 Top Words:
##       Highest Prob: president, mr, bush, obama, campaign, economy, republican 
##       FREX: romney, kerry, election, presidential, campaign, voters, obama 
##       Lift: adversaries, advocacy, audacious, bachmann, beirut, big-spending, boasts 
##       Score: obama, voters, republican, bush, democrats, senator, campaign
labelTopics(stm, topics=7)
## Topic 7 Top Words:
##       Highest Prob: street, wall, financial, money, treasury, paulson, plan 
##       FREX: street, paulson, goldman, wall, sachs, henry, rescue 
##       Lift: aviation, hobbled, lehman's, backstop, bondholders, disliked, likewise 
##       Score: wall, street, paulson, goldman, sachs, bailout, henry
labelTopics(stm, topics=9)
## Topic 9 Top Words:
##       Highest Prob: one, editor, us, column, op-ed, money, like 
##       FREX: column, op-ed, re, editor, aug, us, generation 
##       Lift: 19th-century, admonitions, arisen, austere, backs, brooks, burn 
##       Score: editor, op-ed, column, re, us, krugman, galbraith
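
To make the "Highest Prob" row concrete, here is a minimal sketch on a toy topic-word matrix. The matrix `beta`, its vocabulary, its probabilities and the helper `top_words` are all made up for illustration and are not taken from the fitted model. The sketch simply ranks words within each topic by probability, which is what the "Highest Prob" row reports.

```r
# Toy topic-word probabilities (hypothetical numbers, each row sums to 1)
vocab <- c("tax", "senate", "market", "stock", "people")
beta <- rbind(
  topic1 = c(0.40, 0.30, 0.05, 0.05, 0.20),
  topic2 = c(0.05, 0.05, 0.45, 0.35, 0.10)
)
colnames(beta) <- vocab

# For each topic, return the n words with the highest probability
top_words <- function(beta, n = 3) {
  lapply(seq_len(nrow(beta)), function(k) {
    names(sort(beta[k, ], decreasing = TRUE))[seq_len(n)]
  })
}

top <- top_words(beta, n = 3)
print(top)
```

Note that "people" makes the top three in both toy topics. Words like this, frequent everywhere, are what the exclusivity-based rankings are meant to push down.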

But unlike LDA, we can now estimate the effect of the document-level features we included in the model on the prevalence of different topics.

# effects
est <- estimateEffect(~repub, stm,
    uncertainty="None")
summary(est, topics=1)
## 
## Call:
## estimateEffect(formula = ~repub, stmobj = stm, uncertainty = "None")
## 
## 
## Topic 1:
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.054347   0.002644  20.556  < 2e-16 ***
## repub       -0.026158   0.003269  -8.001 1.53e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(est, topics=4)
## 
## Call:
## estimateEffect(formula = ~repub, stmobj = stm, uncertainty = "None")
## 
## 
## Topic 4:
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.054213   0.002583  20.991  < 2e-16 ***
## repub       -0.025965   0.003245  -8.001 1.52e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(est, topics=5)
## 
## Call:
## estimateEffect(formula = ~repub, stmobj = stm, uncertainty = "None")
## 
## 
## Topic 5:
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.054358   0.002631  20.660  < 2e-16 ***
## repub       -0.026228   0.003211  -8.168 3.93e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(est, topics=7)
## 
## Call:
## estimateEffect(formula = ~repub, stmobj = stm, uncertainty = "None")
## 
## 
## Topic 7:
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.054318   0.002542  21.364   <2e-16 ***
## repub       -0.026018   0.003110  -8.366   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(est, topics=9)
## 
## Call:
## estimateEffect(formula = ~repub, stmobj = stm, uncertainty = "None")
## 
## 
## Topic 9:
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.054526   0.002553  21.354  < 2e-16 ***
## repub       -0.026283   0.003258  -8.068 8.87e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
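
To build intuition for what these coefficients mean: with a single binary covariate, regressing a topic's document-level proportion on that covariate gives an intercept equal to the mean proportion in the group coded 0 and a slope equal to the difference in means between the two groups. The sketch below shows this with simulated data and base R's `lm`. The vectors `theta_k` and `repub_toy` are hypothetical stand-ins, and `estimateEffect` does more than this toy regression (for instance, it can propagate uncertainty in the estimated topic proportions).

```r
set.seed(1)
n <- 200
repub_toy <- rbinom(n, size = 1, prob = 0.5)

# simulated topic proportions: lower on average when repub_toy == 1
theta_k <- pmin(pmax(0.05 - 0.02 * repub_toy + rnorm(n, sd = 0.01), 0), 1)

fit <- lm(theta_k ~ repub_toy)
coef(fit)

# the slope is the difference in group means
diff_means <- mean(theta_k[repub_toy == 1]) - mean(theta_k[repub_toy == 0])
all.equal(unname(coef(fit)["repub_toy"]), diff_means)
```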

Let’s say we’re interested in finding the most partisan topics. How would we do this?

# let's look at the structure of the output object...
names(est)
## [1] "parameters"  "topics"      "call"        "uncertainty" "formula"    
## [6] "data"        "modelframe"  "varlist"
length(est$parameters)
## [1] 30
est$parameters[[1]]
## [[1]]
## [[1]]$est
## (Intercept)       repub 
##  0.05431745 -0.02605660 
## 
## [[1]]$vcov
##               [,1]          [,2]
## [1,]  6.882598e-06 -6.882598e-06
## [2,] -6.882598e-06  1.058536e-05
# aha! we'll just extract the coefficients for each topic
coef <- se <- rep(NA, 30)
for (i in 1:30){
    coef[i] <- est$parameters[[i]][[1]]$est[2]
    se[i] <- sqrt(est$parameters[[i]][[1]]$vcov[2,2])
}

df <- data.frame(topic = 1:30, coef=coef, se=se)
df <- df[order(df$coef),] # sorting by "partisanship"
head(df)
##    topic        coef          se
## 1      1 -0.02605660 0.003253514
## 25    25 -0.01984025 0.003217467
## 4      4 -0.01932438 0.003585274
## 5      5 -0.01674535 0.004176024
## 26    26 -0.01529345 0.002361583
## 22    22 -0.01330527 0.003146045
tail(df)
##    topic        coef          se
## 29    29 0.008094138 0.004829440
## 11    11 0.009420292 0.002704950
## 30    30 0.015715169 0.001280006
## 16    16 0.017828492 0.002750289
## 23    23 0.024470820 0.003255611
## 24    24 0.024503662 0.003501498
# three most "democratic" topics
labelTopics(stm, topics=df$topic[1])
## Topic 1 Top Words:
##       Highest Prob: tax, house, congress, bill, senate, president, plan 
##       FREX: y, senate, bill, package, stimulus, tax, republicans 
##       Lift: ackerman, gingrich's, boren, moynihan, newt, reid, y 
##       Score: y, tax, senate, democrats, republicans, bill, congress
labelTopics(stm, topics=df$topic[2])
## Topic 25 Top Words:
##       Highest Prob: unemployment, jobs, workers, labor, percent, americans, job 
##       FREX: unemployment, jobs, census, bureau, workers, poverty, employment 
##       Lift: snail's, staple, 11.7, 18,000, 2007-9, 247,000, 39,000 
##       Score: jobs, unemployment, poverty, workers, labor, bureau, census
labelTopics(stm, topics=df$topic[3])
## Topic 4 Top Words:
##       Highest Prob: even, people, way, day, new, like, lot 
##       FREX: park, friend, tables, lot, death, ms, always 
##       Lift: aspect, bedford, brand-new, condos, hyde, installed, jeep 
##       Score: park, lot, starbucks, lake, young, death, ms
# three most "republican" topics
labelTopics(stm, topics=df$topic[30])
## Topic 24 Top Words:
##       Highest Prob: president, mr, economic, reagan, administration, today, white 
##       FREX: reagan, advisers, reagan's, secretary, president's, administration, council 
##       Lift: approves, federation's, fulfill, hubbard, jordan, panetta, regan's 
##       Score: reagan, secretary, white, reagan's, president, mr, administration
labelTopics(stm, topics=df$topic[29])
## Topic 23 Top Words:
##       Highest Prob: percent, rate, quarter, annual, year, growth, ago 
##       FREX: after-tax, quarter, gross, q, annual, iv, product 
##       Lift: 0.0, 1,150, 118,116, 118,350, 128.8, 133.4, 134.5 
##       Score: quarter, q, rate, gross, annual, percent, product
labelTopics(stm, topics=df$topic[28])
## Topic 16 Top Words:
##       Highest Prob: economy, recession, economic, recovery, economists, growth, months 
##       FREX: recovery, recession, economists, slow, outlook, predicted, economy 
##       Lift: anti-keynesian, chester, chew, destabilizing, discomfort, downfall, fosler 
##       Score: recession, recovery, economists, growth, economy, war, outlook
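
The extraction loop above hard-codes the number of topics. A slightly more general variant uses `sapply` and `seq_along`. It is sketched here on a mock list that mimics the nesting printed by `est$parameters[[1]]`; the object `params`, the helper `make_draw` and all values are made up for illustration.

```r
# Mock object with the same nesting as est$parameters: one element per topic,
# each holding a list whose first element has $est and $vcov (made-up values)
make_draw <- function(b, v) {
  list(est = c("(Intercept)" = 0.05, repub = b),
       vcov = matrix(c(1e-5, -1e-5, -1e-5, v), nrow = 2))
}
params <- list(list(make_draw(0.01, 9e-6)),
               list(make_draw(-0.02, 4e-6)),
               list(make_draw(0.03, 1e-6)))

# coefficient on repub and its standard error, for every topic
coefs <- sapply(params, function(p) p[[1]]$est[["repub"]])
ses <- sapply(params, function(p) sqrt(p[[1]]$vcov[2, 2]))

ranking <- data.frame(topic = seq_along(params), coef = coefs, se = ses)
ranking <- ranking[order(ranking$coef), ]
ranking
```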

Let’s now try running a slightly more complex example where both prevalence and content are functions of covariates. Here we assume that the prevalence of each topic may vary over time and with the party of the president, and also that the words used to discuss a given topic may differ between Republican and Democratic administrations.

# metadata into a data frame
meta <- data.frame(year=year, repub=repub)
# another run
stm <- stm(documents=cdfm, K=30, prevalence=~s(year)+repub,
    max.em.its=100, content=~repub, data=meta)
## Beginning Spectral Initialization 
##   Calculating the gram matrix...
##   Using only 10000 most frequent terms during initialization...
##   Finding anchor words...
##      ..............................
##   Recovering initialization...
##      ....................................................................................................
## Initialization complete.
## ....................................................................................................
## Completed E-Step (3 seconds). 
## ....................................................................................................
## Completed M-Step (60 seconds). 
## Completing Iteration 1 (approx. per word bound = -7.823) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## ....................................................................................................
## Completed M-Step (59 seconds). 
## Completing Iteration 2 (approx. per word bound = -7.409, relative change = 5.283e-02) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## ....................................................................................................
## Completed M-Step (54 seconds). 
## Completing Iteration 3 (approx. per word bound = -7.327, relative change = 1.111e-02) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## ....................................................................................................
## Completed M-Step (50 seconds). 
## Completing Iteration 4 (approx. per word bound = -7.298, relative change = 3.947e-03) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## ....................................................................................................
## Completed M-Step (50 seconds). 
## Completing Iteration 5 (approx. per word bound = -7.286, relative change = 1.595e-03) 
## Topic 1: dodd, moynihan, overwhelmingly, vote, schumer 
##  Topic 2: folly, cumulative, shortfall, widened, totaling 
##  Topic 3: strengthens, slumps, unaffected, inflation-fighting, groundwork 
##  Topic 4: manhattan's, wretched, park, jacket, gilded 
##  Topic 5: challenger, presumptive, referendum, combative, candidacy 
##  Topic 6: cocoa, mercantile, pared, intraday, ounce 
##  Topic 7: resurrect, wall, street, chrysler, hobbled 
##  Topic 8: thatcher, stocked, margaret, hall, truism 
##  Topic 9: ancient, down-and-out, stranger, sport, neighbor 
##  Topic 10: disheartening, suits, takeovers, franchise, integrity 
##  Topic 11: family-owned, stuart, incorrectly, copies, charles 
##  Topic 12: 7.05, 55.4, eke, d14, latitude 
##  Topic 13: dam, appalled, hoover, blowing, span 
##  Topic 14: kevin, enterprise, phillips, winner, tells 
##  Topic 15: harriman, decreed, creature, naturalization, player 
##  Topic 16: destabilizing, anti-keynesian, efficiently, downfall, bartlett 
##  Topic 17: 56.6, 61.5, cooled, purchasing, risking 
##  Topic 18: gadgets, livestock, sue, aluminum, 12,000 
##  Topic 19: long-predicted, ignited, beware, drifting, unknown 
##  Topic 20: apocalypse, sarah, smile, palin, flight 
##  Topic 21: madness, rite, high-quality, eric, entrepreneurship 
##  Topic 22: conjuring, bean, regulators, comptroller, giants 
##  Topic 23: flash, gross, 3.2, revision, quarter 
##  Topic 24: imperatives, repaired, buttresses, altman, untested 
##  Topic 25: rolls, 9.5, proportion, eliminated, nonfarm 
##  Topic 26: novelty, free-spending, strange, chicken, style 
##  Topic 27: disagreements, insights, euro, italy, germany 
##  Topic 28: wellness, epic, amazingly, mightiest, twists 
##  Topic 29: 353,000, 34,000, totaled, first-time, stockpiles 
##  Topic 30: rockefeller, building's, economic, said, new 
## Aspect 1: masses, blasted, draws, multiple, sweet 
##  Aspect 2: portion, anniversary, become, consulting, activities 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## ....................................................................................................
## Completed M-Step (46 seconds). 
## Completing Iteration 6 (approx. per word bound = -7.278, relative change = 1.116e-03) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## ....................................................................................................
## Completed M-Step (59 seconds). 
## Completing Iteration 7 (approx. per word bound = -7.274, relative change = 5.656e-04) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## ....................................................................................................
## Completed M-Step (52 seconds). 
## Completing Iteration 8 (approx. per word bound = -7.271, relative change = 3.678e-04) 
## ....................................................................................................
## Completed E-Step (2 seconds). 
## ....................................................................................................
## Completed M-Step (56 seconds). 
## Completing Iteration 9 (approx. per word bound = -7.270, relative change = 1.562e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## ....................................................................................................
## Completed M-Step (60 seconds). 
## Completing Iteration 10 (approx. per word bound = -7.269, relative change = 1.139e-04) 
## Topic 1: reid, dodd, moynihan, approved, overwhelmingly 
##  Topic 2: folly, shortfall, cumulative, billion, widened 
##  Topic 3: slumps, groundwork, strengthens, inflation-fighting, unaffected 
##  Topic 4: creators, park, manhattan's, pop, jacket 
##  Topic 5: challenger, referendum, candidacy, compassion, commander 
##  Topic 6: 4.50, cocoa, pared, mercantile, intraday 
##  Topic 7: resurrect, wall, street, stake, demanding 
##  Topic 8: thatcher, stocked, stone, margaret, stung 
##  Topic 9: ancient, down-and-out, sport, maureen, tolerance 
##  Topic 10: marcus, disheartening, takeovers, ambitions, suits 
##  Topic 11: family-owned, stuart, incorrectly, charles, copies 
##  Topic 12: vitner, 7.05, 55.4, 12-month, inched 
##  Topic 13: dam, appalled, hoover, blowing, dividend 
##  Topic 14: kevin, dies, auspices, enterprise, engineering 
##  Topic 15: harriman, seoul, decreed, creature, naturalization 
##  Topic 16: industrialized, slows, destabilizing, anti-keynesian, downfall 
##  Topic 17: purchasing, 52.9, 52.1, 56.6, 61.5 
##  Topic 18: gadgets, sue, aluminum, 12,000, trailer 
##  Topic 19: long-predicted, ignited, beware, dow, drifting 
##  Topic 20: apocalypse, sarah, palin, smile, newspapers 
##  Topic 21: rite, madness, high-quality, eric, sex 
##  Topic 22: conjuring, bean, regulators, mae, banks 
##  Topic 23: april-june, quarter, flash, gross, annual 
##  Topic 24: imperatives, council's, bentsen, repaired, altman 
##  Topic 25: unemployed, 9.5, employers, rolls, paychecks 
##  Topic 26: novelty, free-spending, climate, depression, chicken 
##  Topic 27: disagreements, insights, nations, italy, germany 
##  Topic 28: epic, wellness, amazingly, twists, mightiest 
##  Topic 29: 353,000, big-ticket, 34,000, durable, orders 
##  Topic 30: economic, said, building's, rockefeller, new 
## Aspect 1: alignment, masses, multiple, draws, sweet 
##  Aspect 2: affects, portion, anniversary, become, consulting 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## ....................................................................................................
## Completed M-Step (53 seconds). 
## Model Converged
save(stm, file="../backup/stm-small-output.Rdata")
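
In the prevalence formula, `s(year)` lets topic prevalence vary smoothly over time rather than linearly. The sketch below assumes, as we understand the stm documentation to describe, that `s()` wraps `bs()` from the splines package with 10 degrees of freedom by default; check the documentation before relying on that detail. It shows what such a basis expansion looks like for a hypothetical `year_toy` vector covering the period of the data.

```r
library(splines)

year_toy <- 1980:2014               # hypothetical covariate values
basis <- bs(year_toy, df = 10)      # cubic B-spline basis with 10 columns

dim(basis)
# each row holds the basis-function values for one year; a model using this
# basis estimates one coefficient per column instead of a single slope
round(basis[1:3, 1:4], 3)
```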

stm offers other functions to explore how content varies as a function of covariates. Let’s take a look.

load("../backup/stm-small-output.Rdata")

# summary
plot(stm, type = "summary", xlim = c(0, .3))

# how topics are different under republican (TRUE) presidents
plot(stm, type = "perspectives", topics = 6)

plot(stm, type = "perspectives", topics = 4)

# we can also compare specific topics
plot(stm, type = "perspectives", topics = c(1,10))

# prevalence over time
est <- estimateEffect(~s(year)+repub, 
    stm, uncertainty = "None", metadata=meta)

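# method="difference" plots cov.value1 minus cov.value2, so positive values
# (to the right) mean the topic is more prevalent under Republican presidents
plot(est, covariate="repub", topics=1:30,
    model=stm, method="difference",
    cov.value1=1, cov.value2=0,
    xlab = "More Democrats ... More Republicans",
    labeltype="custom", custom.labels=paste("Topic", 1:30))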

plot(stm, type = "perspectives", topics = 12)

plot(est, "year", method="continuous", topics=1:2)

Choosing the number of topics

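Finally, this is the code to generate the figure in the slides. It uses 5-fold cross-validation to compare, across different numbers of topics, the perplexity on held-out documents and the log-likelihood of the fitted model. Many moving parts here…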

require(cvTools)
## Loading required package: cvTools
## Loading required package: lattice
## 
## Attaching package: 'lattice'
## The following object is masked from 'package:stm':
## 
##     cloud
## Loading required package: robustbase
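
The function defined next relies on `cvFolds` to assign documents to folds. What that amounts to can be sketched in base R as follows. This is a simplified stand-in with hypothetical names (`n_docs`, `n_folds`, `fold_id`), not the cvTools implementation: every document is randomly assigned to exactly one fold, and each fold takes a turn as the held-out test set.

```r
set.seed(123)
n_docs <- 5000
n_folds <- 5

# random fold label for each document, with equal fold sizes here
fold_id <- sample(rep(seq_len(n_folds), length.out = n_docs))

for (f in seq_len(n_folds)) {
  test_idx <- which(fold_id == f)
  train_idx <- which(fold_id != f)
  cat("fold", f, ": train =", length(train_idx),
      " test =", length(test_idx), "\n")
}
table(fold_id)
```

With 5,000 documents and 5 folds, each model is fitted on 4,000 documents and evaluated on the remaining 1,000, which matches the `M = 4000` and `M = 1000` lines in the sampler output.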
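# K-fold cross-validation for an LDA model with Ntopics topics
cvLDA <- function(Ntopics, dtm, K=5) {
  # randomly split the documents into K folds
  folds <- cvFolds(nrow(dtm), K, 1)
  perplex <- rep(NA, K)
  llk <- rep(NA, K)
  for (i in unique(folds$which)) {
    cat(i, " ")
    # documents held out in fold i; all the others are used for training
    which.test <- folds$subsets[folds$which==i]
    which.train <- seq_len(nrow(dtm))[-which.test]
    dtm.train <- dtm[which.train,]
    dtm.test <- dtm[which.test,]
    lda.fit <- LDA(dtm.train, k=Ntopics, method="Gibbs",
        control=list(verbose=50L, iter=100))
    # perplexity on the held-out documents
    perplex[i] <- perplexity(lda.fit, convert(dtm.test, to="topicmodels"))
    # log-likelihood of the model fitted to the training documents
    llk[i] <- logLik(lda.fit)
  }
  return(list(K=Ntopics, perplexity=perplex, logLik=llk))
}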
K <- c(20, 30, 40, 50, 60, 70)

results <- list()

i = 1
for (k in K){
    cat("\n\n\n##########\n ", k, "topics", "\n")
    res <- cvLDA(k, cdfm)
    results[[i]] <- res
    i = i + 1
}
## 
## 
## 
## ##########
##   20 topics 
## 1  K = 20; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 20; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 2  K = 20; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 20; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 3  K = 20; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 20; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 4  K = 20; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 20; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 5  K = 20; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 20; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 
## 
## 
## ##########
##   30 topics 
## 1  K = 30; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 30; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 2  K = 30; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 30; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 3  K = 30; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 30; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 4  K = 30; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 30; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 5  K = 30; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 30; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 
## 
## 
## ##########
##   40 topics 
## 1  K = 40; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 40; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 2  K = 40; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 40; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 3  K = 40; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 40; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 4  K = 40; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 40; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 5  K = 40; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 40; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 
## 
## 
## ##########
##   50 topics 
## 1  K = 50; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 50; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 2  K = 50; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 50; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 3  K = 50; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 50; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 4  K = 50; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 50; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 5  K = 50; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 50; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 
## 
## 
## ##########
##   60 topics 
## 1  K = 60; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 60; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 2  K = 60; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 60; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 3  K = 60; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 60; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 4  K = 60; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 60; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 5  K = 60; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 60; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 
## 
## 
## ##########
##   70 topics 
## 1  K = 70; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 70; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 2  K = 70; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 70; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 3  K = 70; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 70; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 4  K = 70; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 70; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## 5  K = 70; V = 11492; M = 4000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## K = 70; V = 11492; M = 1000
## Sampling 100 iterations!
## Iteration 50 ...
## Iteration 100 ...
## Gibbs sampling completed!
## plot
df <- data.frame(
    k = rep(K, each=5),
    perp =  unlist(lapply(results, '[[', 'perplexity')),
    loglk = unlist(lapply(results, '[[', 'logLik')),
    stringsAsFactors=FALSE)

min(df$perp)
## [1] 1878.043
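# express each value relative to the worst one: the highest perplexity and
# the lowest (most negative) log-likelihood, so smaller ratios are better
df$ratio_perp <- df$perp / max(df$perp)
df$ratio_lk <- df$loglk / min(df$loglk)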

df <- data.frame(cbind(
    aggregate(df$ratio_perp, by=list(df$k), FUN=mean),
    aggregate(df$ratio_perp, by=list(df$k), FUN=sd)$x,
    aggregate(df$ratio_lk, by=list(df$k), FUN=mean)$x,
    aggregate(df$ratio_lk, by=list(df$k), FUN=sd)$x),
    stringsAsFactors=FALSE)
names(df) <- c("k", "ratio_perp", "sd_perp", "ratio_lk", "sd_lk")
library(reshape)
pd <- melt(df[,c("k","ratio_perp", "ratio_lk")], id.vars="k")
pd2 <- melt(df[,c("k","sd_perp", "sd_lk")], id.vars="k")
pd$sd <- pd2$value
levels(pd$variable) <- c("Perplexity", "LogLikelihood")

library(ggplot2)
library(grid)

p <- ggplot(pd, aes(x=k, y=value, linetype=variable))
pq <- p + geom_line() + geom_point(aes(shape=variable), 
        fill="white", shape=21, size=1.40) +
    geom_errorbar(aes(ymax=value+sd, ymin=value-sd), width=4) +
    scale_y_continuous("Ratio wrt worst value") +
    scale_x_continuous("Number of topics", 
        breaks=K) +
    theme_bw() 
pq
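
For reference, perplexity is the exponential of the negative average log-likelihood per token, so lower values mean the model predicts held-out words better. A useful sanity check, sketched below with made-up token probabilities, is that a model assigning equal probability to each of V words has perplexity exactly V. The minimum of about 1,878 reported above is therefore much better than the 11,492 that a uniform guess over this vocabulary would give. The helper `perplexity_from_probs` is hypothetical and is not the topicmodels implementation.

```r
# perplexity = exp(- sum(log p(w)) / number of tokens)
perplexity_from_probs <- function(token_probs) {
  exp(-sum(log(token_probs)) / length(token_probs))
}

V <- 11492                        # vocabulary size, as in the output above
n_tokens <- 1000                  # made-up number of held-out tokens

# a uniform model gives every token probability 1/V
uniform_probs <- rep(1 / V, n_tokens)
perplexity_from_probs(uniform_probs)

# a model that puts more mass on the words that actually occur does better
better_probs <- rep(1 / 2000, n_tokens)
perplexity_from_probs(better_probs)
```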