Computing document similarity in a large corpus


Since early this year, many people have asked me how to compute document (or feature) similarity in a large corpus. They said their function stops because of a lack of RAM:

Error in .local(x, y, ...) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_sparse.c, line 92 

This happened in our textstat_simil(margin = "documents") too, because the matrix multiplication in the function produces a dense matrix with (ndoc(x) ^ 2) / 2 elements: that is 5,000,000,000 cells if your corpus has 100,000 documents!
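
A quick back-of-the-envelope calculation (8 bytes per double) shows why this exhausts memory:

ndoc <- 100000
cells <- ndoc ^ 2 / 2    # 5,000,000,000 cells
cells * 8 / 1024 ^ 3     # roughly 37 GB of RAM for a dense matrix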

A solution to this problem is not to record values that are below a certain threshold. When you study document reuse, for example, you might only be interested in documents with cosine similarity larger than 0.9. We upgraded our functions for document similarity computation, which are used in textstat_simil() and textstat_dist(), to achieve this in the latest GitHub version of quanteda. We also parallelized the computation in C++ to make it faster.

The new function is called textstat_proxy(). It is still experimental but has two new arguments, min_proxy and rank, to reduce the number of recorded values and save storage. If you set min_proxy, the function records only values larger than that threshold; if you use rank, it records only the top-n largest values for each document (or feature).
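
As a minimal sketch of how the two arguments work (assuming the development version of quanteda described here), a toy dfm makes the effect easy to see:

library(quanteda)
toy <- dfm(tokens(c(d1 = "a b c d", d2 = "a b c e", d3 = "x y z")))
textstat_proxy(toy, method = "cosine", min_proxy = 0.5)  # drop pairs with similarity below 0.5
textstat_proxy(toy, method = "cosine", rank = 2)         # keep only the 2 largest values per document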

Benchmarking on my Core i7 laptop showed that the new function is twice as fast as the old one. If either min_proxy or rank is used, it becomes four times faster.

> library(quanteda)
> mt <- readRDS("data_corpus_guardian_2016-2017.RDS") %>% 
+       dfm(remove_punct = TRUE, remove = stopwords()) %>% 
+       dfm_trim(min_termfreq = 10)
> dim(mt)
[1] 84599 83573 # 84599 documents
>
> # subset the corpus because it is too large for the old function
> mt_sub <- dfm_sample(mt, 10000) 
> dim(mt_sub)
[1] 10000 83573 # 10000 documents
> 
> quanteda_options(threads = 8)
> microbenchmark::microbenchmark(
+     old = textstat_simil_old(mt_sub, method = "cosine"),  
+     new = textstat_simil(mt_sub, method = "cosine"),  
+     new_min = textstat_proxy(mt_sub, method = "cosine", min_proxy = 0.9),
+     new_rank = textstat_proxy(mt_sub, method = "cosine", rank = 10),
+     times = 10
+ )
Unit: seconds
     expr       min        lq      mean    median        uq       max neval
      old 22.426574 22.641949 22.788590 22.745563 22.960467 23.160844    10
      new 13.376352 13.417328 13.641411 13.638641 13.699010 14.226246    10
  new_min  4.977046  5.010795  5.119516  5.114965  5.201249  5.314574    10
 new_rank  5.303440  5.322976  5.411015  5.385124  5.482439  5.583506    10

More importantly, we can compute the document similarity between all 84,599 documents in the corpus without problems if min_proxy is used. It took only 15 minutes on my laptop, and the resulting object is as small as 12 MB.

> new_min <- textstat_proxy(mt, method = "cosine", min_proxy = 0.9)
> print(object.size(new_min), units = "Mb")
12.4 Mb

If you want to know which documents are similar to which, you can write a simple conversion function and run it:

> matrix2list <- function(x) {
+   # x is a sparse matrix in triplet form: @x holds the values, @i and @j the zero-based row and column indices
+   names(x@x) <- rownames(x)[x@i + 1]
+   split(x@x, factor(x@j + 1, levels = seq(ncol(x)), labels = colnames(x)))
+ }
> simil <- matrix2list(new_min)
> head(simil[lengths(simil) > 1])
$text119559
text119554 text119559 
 0.9929825  1.0000000 

$text119561
text119553 text119561 
  0.994557   1.000000 

$text119562
text119557 text119562 
 0.9975438  1.0000000 

$text119564
text119553 text119561 text119564 
 0.9854428  0.9908825  1.0000000 

$text119568
text119555 text119568 
 0.9963637  1.0000000 

$text119570
text119551 text119570 
 0.9586148  1.0000000

textstat_proxy() has great potential, but it is still experimental because we are not sure what the best format for the resulting objects is. If you have any opinions, please post a comment on the GitHub page.

Quantitative text analysis of Japanese


To get more Japanese researchers interested in quantitative text analysis, I wrote a paper titled "Quantitative Analysis of Japanese" (『日本語の量的分析』) with Amy Catalinac of New York University. So far, we have received many positive reactions on Twitter.

This paper discusses the use of quantitative text analysis, a methodology that has gained popularity among political scientists in Europe and North America in recent years, for Japanese-language texts. We first describe the background against which quantitative text analysis emerged and explain how it is used in European and American political science. Next, so that readers can apply quantitative text analysis in their own research, we walk through the workflow in concrete terms, noting the points that require special attention when analyzing Japanese. Finally, we introduce the statistical models used in Europe and North America and show, with research examples, that they can also be applied to Japanese documents. We argue that recent technical and methodological developments have made quantitative text analysis of Japanese entirely feasible, but also note that institutional issues, such as the availability of data, must be addressed for the method to spread widely in Japanese political science.

Newsmap is available on CRAN


I am happy to announce that newsmap, our R package for semi-supervised document classification, is available on CRAN. The package is algorithmically simple but comes with well-maintained geographical seed dictionaries in English, German, Spanish, Russian, and Japanese.

The package was originally created for geographical classification of news articles, but it can also be used for other tasks such as topic classification. For example, in my latest paper on conspiracy theories, I performed two-dimensional classification using tentative topical seed words (see the sketch after the list below):

economy: [market*, money, bank*, stock*, bond*, industry, company, shop*]
politics: [parliament*, congress*, party leader*, party member*, voter*, lawmaker*, politician*]
society: [police, prison*, school*, hospital*]
diplomacy: [ambassador*, diplomat*, embassy, treaty]
military: [military, soldier*, air force, marine, navy, army]
nature: [water, wind, sand, forest, mountain, desert, animal, human]
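
The sketch below shows roughly how such a topical classification can be set up with newsmap; it is not the exact code from the paper, and corp stands for a hypothetical corpus of news articles.

library(quanteda)
library(newsmap)

# topical seed dictionary built from the list above
dict_topic <- dictionary(list(
    economy = c("market*", "money", "bank*", "stock*", "bond*", "industry", "company", "shop*"),
    politics = c("parliament*", "congress*", "party leader*", "party member*", "voter*", "lawmaker*", "politician*"),
    society = c("police", "prison*", "school*", "hospital*"),
    diplomacy = c("ambassador*", "diplomat*", "embassy", "treaty"),
    military = c("military", "soldier*", "air force", "marine", "navy", "army"),
    nature = c("water", "wind", "sand", "forest", "mountain", "desert", "animal", "human")
))

toks <- tokens(corp, remove_punct = TRUE)        # corp: a hypothetical corpus of news articles
label <- dfm(tokens_lookup(toks, dict_topic))    # seed-word matches per document
feat <- dfm_remove(dfm(toks), stopwords("en"))   # full feature matrix
model <- textmodel_newsmap(feat, label)          # semi-supervised classifier
topic <- predict(model)                          # predicted topic for each document

Geographical classification works in the same way with the seed dictionaries shipped with the package (e.g. data_dictionary_newsmap_en).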

In the mosaic plot, the width and height of the columns show the proportions of countries and topics, respectively. Since the categories are pre-defined here, the result is much easier to interpret than unsupervised topic classification by LDA.

By the way, if you want to produce the plot, just pass a cross-table of countries and topics to mosaicplot():

top <- head(sort(table(data$country), decreasing = TRUE), 20)   # 20 most frequent countries
tb <- table(data$country, data$topic)[names(top), ]              # country-by-topic cross-table
mosaicplot(tb, border = 0, col = RColorBrewer::brewer.pal(6, "Set2"), main = "")

Obstacles to Asian-language text analysis


In a presentation titled Internationalizing Text Analysis at a workshop at Waseda University on 27 June, Oul Han and I discussed what is obstructing the adoption of quantitative text analysis techniques in Japan and Korea. Our question was why so few people do quantitative analysis of Japanese and Korean texts, despite it becoming one of the mainstream methodologies in North America and Europe. To explain this, we identified four key areas: tools, data, skills, and literature.

Tools

We have seen exciting developments in text analysis tools in recent years. Support for Unicode has improved dramatically thanks to the stringi package. We have released quanteda, which enables analysis of Asian-language texts in the same way as English. Morphological analysis tools have been available in R for some time (RMeCab and RMecabKo), and RcppMeCab, which supports both Japanese and Korean, has been released recently. In terms of available tools, there is no reason not to embark on quantitative analysis of Asian-language texts.
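
As a small illustration of the point about quanteda (a sketch, not taken from the presentation): stringi's ICU word segmentation lets tokens() split Japanese text without an external morphological analyzer, so the workflow is the same as for English.

library(quanteda)
txt <- c(en = "Quantitative text analysis is feasible.",
         ja = "日本語の量的テキスト分析は十分に可能になった。")
toks <- tokens(txt, remove_punct = TRUE)   # ICU word boundaries via stringi
dfm(toks)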

Data

Official political documents are publicly available in both Japan and Korea, but unofficial political documents, such as election manifestos, are not. Further, media texts are generally more difficult to collect because of copyright protection. While Korean newspaper articles are available in KINDS and the Dow Jones Factiva database, Japanese newspaper articles are only available in the publishers' commercial databases. It will take time to improve access to textual data, but we should begin by making exhaustive lists of Japanese and Korean sources.

Skills

You need different skills at different stages of a text analysis project. Designing social scientific research that uses quantitative text analysis requires broad knowledge of the techniques and their applications. Data collection often involves access to APIs or the use of scrapers, which demands knowledge of machine-readable formats (HTML, XML, JSON) and computer programming. Quantitative text analysis is not always statistical, but you still need to know descriptive and inferential statistics (e.g. chi-squared tests, t-tests, regression analysis). These skills can be acquired through lectures and seminars, but very few text analysis courses, if any, are offered at Japanese and Korean universities. Until such courses become widely available, we need to organize workshops to train future text analysts.

Literature

The lack of a standard textbook on social scientific text analysis has been one of the biggest problems, limiting the opportunity to acquire the above-mentioned skills to people based in North America or Europe. Aiming to address this problem, I created an online textbook with Stefan Müller, but its pages are all in English. I recently added a section explaining language-specific pre-processing, but there is only one page for Japanese. We should translate the online textbook into other languages and add more pages on how to handle Asian-language texts.

If you want to know more, please see the slides.