I am happy to announce that our R package for semi-supervised document classification, newsmap, is available on CRAN. The package uses a simple algorithm but comes with well-maintained geographical seed dictionaries in English, German, Spanish, Russian and Japanese.
This package was originally created for geographical classification of news articles, but it can also be used for other tasks such as topic classification. For example, I performed two-dimensional classification (by country and by topic) in my latest paper on conspiracy theories using tentative topical seed words:
    economy: [market*, money, bank*, stock*, bond*, industry, company, shop*]
    politics: [parliament*, congress*, party leader*, party member*, voter*, lawmaker*, politician*]
    society: [police, prison*, school*, hospital*]
    diplomacy: [ambassador*, diplomat*, embassy, treaty]
    military: [military, soldier*, air force, marine, navy, army]
    nature: [water, wind, sand, forest, mountain, desert, animal, human]
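For readers who want to try topic classification themselves, here is a minimal sketch of how a seed dictionary like this one might be passed to newsmap. It assumes the quanteda and newsmap packages are installed. The toy texts and the object names (`dict`, `txt`, `feat_dfm`, `label_dfm`) are made up for illustration and are not the data or code from the paper. In real use you would train on a large corpus and probably select features more carefully.

```r
library(quanteda)
library(newsmap)

# Topical seed dictionary from above (glob patterns, multi-word entries allowed)
dict <- dictionary(list(
  economy   = c("market*", "money", "bank*", "stock*", "bond*", "industry", "company", "shop*"),
  politics  = c("parliament*", "congress*", "party leader*", "party member*",
                "voter*", "lawmaker*", "politician*"),
  society   = c("police", "prison*", "school*", "hospital*"),
  diplomacy = c("ambassador*", "diplomat*", "embassy", "treaty"),
  military  = c("military", "soldier*", "air force", "marine", "navy", "army"),
  nature    = c("water", "wind", "sand", "forest", "mountain", "desert", "animal", "human")
))

# Hypothetical toy corpus, only to make the sketch self-contained
txt <- c(
  d1 = "The stock market fell as banks lost money on bonds.",
  d2 = "Lawmakers in parliament debated while voters waited.",
  d3 = "Police visited the school and the hospital.",
  d4 = "The ambassador signed the treaty at the embassy.",
  d5 = "Soldiers from the army and navy joined the air force drill.",
  d6 = "Wind carried sand from the desert to the mountain forest.",
  d7 = "Investors worried about prices after the announcement."
)

toks <- tokens(txt, remove_punct = TRUE)

# Labels: which seed words occur in each document
label_dfm <- dfm(tokens_lookup(toks, dict))

# Features: all words in the documents
feat_dfm <- dfm(toks)

# Fit the semi-supervised model and predict the most likely topic per document
model <- textmodel_newsmap(feat_dfm, label_dfm)
pred <- predict(model)
print(pred)
```

With a corpus this small the predictions are not meaningful; the point is only the shape of the workflow: seed words produce the labels, and the model learns associations between those labels and all other features.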
In the mosaic plot, the width and height of the columns show the proportions of countries and topics, respectively. Since the categories are pre-defined here, the result is much easier to interpret than in unsupervised topic classification with LDA.
By the way, if you want to produce the plot, just pass a cross table of country and topic to mosaicplot():
    # The 20 most frequent countries
    top <- head(sort(table(data$country), decreasing = TRUE), 20)
    # Country-by-topic cross table, restricted to those countries
    tb <- table(data$country, data$topic)[names(top), ]
    mosaicplot(tb, border = 0, col = RColorBrewer::brewer.pal(6, "Set2"), main = "")
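The snippet above depends on a `data` object holding the classification results, which is not shown here. As a rough, self-contained sketch, the same steps can be tried on made-up data. The data frame below is randomly generated and purely hypothetical, and the base R palette from `hcl.colors()` stands in for RColorBrewer so that no extra package is needed.

```r
# Hypothetical toy data standing in for the classified articles
set.seed(1)
data <- data.frame(
  country = sample(c("US", "GB", "RU", "CN"), 200, replace = TRUE,
                   prob = c(0.4, 0.3, 0.2, 0.1)),
  topic   = sample(c("economy", "politics", "society"), 200, replace = TRUE)
)

# Keep only the 3 most frequent countries (the post uses the top 20)
top <- head(sort(table(data$country), decreasing = TRUE), 3)

# Country-by-topic cross table, rows ordered by country frequency
tb <- table(data$country, data$topic)[names(top), ]

# One colour per topic; column widths reflect country shares
mosaicplot(tb, border = 0, color = hcl.colors(ncol(tb), "Set 2"), main = "")
```

Indexing the cross table by `names(top)` both filters the rows and orders them from the most to the least frequent country, which is why the widest column appears first in the plot.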