April 2017 – Kohei Watanabe

Text analysisApril 21, 2017December 22, 2019

Upcoming presentation at Waseda University

I am invited to present a new approach to comparative text analysis in a research seminar at Waseda Universtiy (Tokyo) on 17th. My talk is titled Data-driven approach to bilingual text analysis: representation of US foreign policy in Japanese and British newspapers in 1985-2016. Kohei Watanabe will present a new approach to text analysis of […]

Text analysisApril 20, 2017December 22, 2019

Redefining word boundaries by collocation analysis

Quanteda’s tokenizer can segment Japanese and Chinese texts thanks to stringi, but its results are not always good, because its underlying function, ICU, recognizes only limited number of words. For example, this Japanese text “ニューヨークのケネディ国際空港” can be translated to “Kennedy International Airport (ケネディ国際空港) in (の) New York (ニューヨーク)”. Quanteda’s tokenizer (tokens function) segments this into […]

Text analysisApril 20, 2017December 22, 2019

Analyzing Asian texts in R on English Windows machines

R is generally good with Unicode, and we do not see garbled texts as far as we use stringi package. But there are some known bugs. The worst is probably the bug that have been discussed on the online community. On Windows, R prints character vectors properly, but not character vectors in data.frame: > sessionInfo() […]

Text analysisApril 20, 2017December 22, 2019

R and Python text analysis packages performance comparison

Like many other people, I started text analysis in Python, because R was notoriously slow. Python looked like a perfect language for text analysis, and I did a lot of work during my PhD using gensim with home-grown tools. I loved gensim’s LSA that quickly and consistently decomposes very large document-feature matrices. However, I faced […]

Develop efficient custom functions using quanteda v4.0 – Kohei Watanabe on New tokens object in quanteda v4.0April 16, 2024
[…] most important change in quanteda v4.0 is the creation of the external pointer-based tokens object, called tokens_xptr, that allows…
Setting fonts to plot Chinese polarity words in LSS – Kohei Watanabe on New paper on historical geopolitical threats to the USFebruary 19, 2024
[…] models are measuring to others. I am using this function myself in my project on construction of a geopolitical…
New paper on semantic temporality analysis – Kohei Watanabe on New paper on Latent Semantic ScalingAugust 29, 2023
[…] on temporal orientation of texts appeared in Research & Politics. In this study we applied latent semantic scaling (LSS)…
Kohei on Tutorial websites on LSS and Seeded LDAAugust 26, 2023
Please use base R's set.seed() before running the command.
Marli Fernandes on Tutorial websites on LSS and Seeded LDAAugust 24, 2023
I am currently using the seededlda package. I am using the following code: slda <- textmodel_seededlda(dfmt, dict, residual = 2)…