I am happy to announce that our R package for semi-supervised document classification, newsmap, is now available on CRAN. The package is algorithmically simple but comes with well-maintained geographical seed dictionaries in English, German, Spanish, Russian and Japanese. It was originally created for geographical classification of news articles, but it can also […]
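As a minimal sketch of how the package might be used for geographical classification (assuming quanteda and newsmap are installed; the corpus object `corp` is a placeholder for your own collection of news articles):

```r
library(quanteda)
library(newsmap)

# Tokenize the news articles (corp is a placeholder corpus object)
toks <- tokens(corp, remove_punct = TRUE)

# Weakly label documents by looking up the English seed dictionary
# shipped with the package (levels = 3 selects country-level keys)
label_dfm <- dfm(tokens_lookup(toks, data_dictionary_newsmap_en, levels = 3))
feat_dfm  <- dfm(toks)

# Train the semi-supervised model and predict a country for each document
model <- textmodel_newsmap(feat_dfm, label_dfm)
pred  <- predict(model)
```

The seed dictionary supplies the initial labels, so no manually coded training set is needed.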
I presented my latest study on Sputnik News at ECPR in Hamburg. The study shows that Russia uses conspiracy theories in Sputnik News articles to promote anti-establishment sentiment in the United States and Britain. The paper and slides are available.
I presented Quantitative Analysis of Textual Data with R at a TokyoR event on 15th July hosted by Yahoo Japan. This was a great opportunity for me to reach a broad audience of Japanese R users and show them how easy it is to analyze Asian texts using quanteda. It was also really nice to meet […]
In a presentation titled Internationalizing Text Analysis at a workshop on the 27th June at Waseda University, Oul Han and I discussed what obstructs the adoption of quantitative text analysis techniques in Japan and Korea. Our question is why so few people perform quantitative analysis of Japanese and Korean texts, although it is […]
We have added a new page to the Quanteda Tutorials website on special handling of Japanese texts. The page will be used in Quantitative Political Methodology at Kobe University next week. It summarizes my blog posts on Japanese text analysis. We are planning to add pages about other languages.
I presented a technique for longitudinal analysis of media content at the BEAMS (Behavioral and Experimental Analyses in Macro-finance) workshop at Waseda University.
I was invited to deliver a workshop on quantitative text analysis at the PolText Incubator Workshop at the Hungarian Academy of Sciences on 9 May 2018. Workshop materials are available in my GitHub repository.
At the LSE Computational Social Science hackathon, I presented how to develop text analysis models using quanteda's core APIs such as as.tokens(), as.dfm() and pattern2id(). All slides and files are available in my GitHub repository.
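As a small illustration, here is a sketch of two of these coercion functions (assuming quanteda is installed; the objects and values are made up for demonstration):

```r
library(quanteda)

# as.tokens() turns a named list of character vectors into a tokens
# object, so output from an external tokenizer can feed quanteda's
# downstream functions
toks <- as.tokens(list(doc1 = c("a", "corpus", "of", "texts"),
                       doc2 = c("another", "document")))

# as.dfm() turns an ordinary matrix of counts into a
# document-feature matrix
mat <- matrix(c(1, 0, 2, 1), nrow = 2,
              dimnames = list(c("doc1", "doc2"), c("word1", "word2")))
dfmat <- as.dfm(mat)
```

These entry points let developers build custom models on top of quanteda's data structures without reimplementing tokenization or matrix handling.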
A lot of people use quanteda to analyze social media posts because it is very fast and flexible, but they sometimes experience dramatic slowdowns due to memory swapping caused by insufficient RAM. quanteda requires RAM roughly five times the size of the data being analyzed, but it can […]
Until quanteda v1.1, our users needed to have R 3.4.0 installed, but we have relaxed the requirement to R 3.1.0, because people working in companies or other large organizations often do not have the latest version of R on their computers and therefore could not use our package. To quickly investigate why quanteda required R 3.4.0, I wrote […]