Mitigating the performance agony of R

R users often complain about the language's performance. To keep running time short, we should avoid computationally expensive loops, but we sometimes have no choice but to use long loops, since R offers only a few core functions for textual analysis. I wish the R core developers would address this issue, but in the meantime we can mitigate the performance agony with the cmpfun function.

The function compiles user-defined functions to byte code, and performance can improve by 3 to 4 times. It is very easy to use: just wrap your function like this:

#Before
yourFunction <- function(){...}

#After
library(compiler)
yourFunction <- cmpfun(function(){...})
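
For example, a loop-heavy toy function can be compiled and timed side by side. This is only a sketch: the functions below are made up for illustration, and the actual speed-up depends on the machine and the workload.

library(compiler)

#Interpreted version: an explicit loop over a numeric vector
slowSum <- function(x){
  total <- 0
  for(i in seq_along(x)){
    total <- total + x[i]
  }
  return(total)
}

#Byte-compiled version of the same function
fastSum <- cmpfun(slowSum)

x <- runif(1000000)
system.time(for(j in 1:20) slowSum(x)) #interpreted
system.time(for(j in 1:20) fastSum(x)) #compiled, usually noticeably faster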

I learned this technique from a blog post by Noam Ross.

Import UK parliamentary debate data in Python

I tried to import UK parliamentary debates into R, but it seems that the Hansard reports are too large for R. R is also very poor at handling different character encodings, so I gave up on R and wrote an importer in Python. The Python script imports the XML files into a MySQL database.

#!/usr/bin/python
# -*- coding: utf-8 -*-

from __future__ import division
import os, sys, string, re, time, datetime
import xml.etree.ElementTree as ET
import MySQLdb as MySQL
import HTMLParser as HTML
    
def outputConsole(values):
    timestamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    print(timestamp + ' ' + ' '.join(values))
    
def getEid(node):
    if('id' in node.attrib):
        return node.attrib['id']
    else:
        return ''
	
def getDate(node):
    if('id' in node.attrib):
        return node.attrib['id'].split('/')[2][0:10]
    else:
        return ''

def getSid(node):
    if('speakerid' in node.attrib and len(node.attrib['speakerid'].split('/')) == 3):
        #print(node.attrib['speakerid'])
        return node.attrib['speakerid'].split('/')[2]
    else:
        return 0
        
def getSpeaker(node):
    if('speakername' in node.attrib):
        return node.attrib['speakername'].encode('utf-8')
    else:
        return ''
    
def getTime(node):
    if('time' in node.attrib):
        parts = node.attrib['time'].split(':')
        return '%02d:%02d:%02d' % (int(parts[0]), int(parts[1]), int(parts[2]))
    else:
        return '00:00:00'
        
def getText(node):
    texts = []
    for p in node.findall('p'):
        if p.text != None and len(p.text) > 0:
            texts.append(p.text.encode('utf-8'))
    return(' | '.join(texts))

def execute(query):
    try:
        cur.execute(query)
    except MySQL.Error as e:
        print('Query error: ' + query + str(e))

if __name__ == '__main__':
    
    html = HTML.HTMLParser()
    
    db = MySQL.connect(host="localhost", user="username", passwd="password", db="immigration", charset='utf8')
    db.autocommit(True)
    cur = db.cursor()
    
    xmlDir = '/home/kohei/Documents/UK immigration dictionary/UK Parlimentary debates/scrapedxml/debates'
    if os.path.isdir(xmlDir) == False:
        outputConsole(['Directory does not exist', xmlDir])
        sys.exit()
    xmlFiles = [ xmlDir + '/' + xmlFile for xmlFile in os.listdir(xmlDir) if os.path.isfile(xmlDir + '/' + xmlFile) ]
    
    execute("TRUNCATE `debate`")
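    # Assumed layout of the `debate` table (the schema is not shown in the post):
    # columns eid, date, time, sid, speaker and text, matching the INSERT below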
    #print(xmlFiles)
    for xmlFile in xmlFiles:
        outputConsole(['Import', xmlFile])
        doc = ET.parse(xmlFile, parser=None)
        for speech in doc.findall('speech'):
            eid = getEid(speech)
            date = getDate(speech)
            time = getTime(speech)
            sid = getSid(speech)
            speaker = db.escape_string(getSpeaker(speech))
            text = db.escape_string(html.unescape(getText(speech)))
            #print(db.escape_string(text) + '\n')
            query = "INSERT IGNORE INTO `debate` (`eid`, `date`, `time`, `sid`, `speaker`, `text`) VALUES ('%s', '%s', '%s', '%s', '%s', '%s')" % (eid, date, time, sid, speaker, text)
            execute(query)
    db.close()
    sys.exit()
   

Import UK parliamentary debate data into R

Debates in the UK Parliament are transcribed and published online as Hansard, but it is not easy to scrape all the texts from the website. A much better source of parliamentary debate data is ParlParse, a project of TheyWorkForYou, which provides the Hansard reports as XML files. Still, we have to write a script to import the data into R.

library(XML)

importDebates <- function(){
  
  dir <- '/home/kohei/Documents/UK immigration dictionary/UK Parlimentary debates/scrapedxml/debates'
  df <- readDebateDir(dir)
  return(df)
}

readFile <- function(fileName){
  file <- file(fileName, encoding = "ISO-8859-1")
  lines <- readLines(file)
  close(file)
  lines2 <- iconv(lines, "ISO-8859-1", "UTF-8")
  text <- paste(lines2, collapse = '\n')
  text2 <- gsub('encoding="ISO-8859-1"', 'encoding="UTF-8"', text, fixed = TRUE)
  return(text2)
}
readDebateDir <- function(dir){
  files <- list.files(dir, full.names = TRUE, recursive = TRUE)
  df <- data.frame(date = c(), time = c(), speaker = c(), speakerId = c(), text = c())
  
  for(file in files){
    if(grepl('\\.xml$', file, ignore.case = TRUE)){
      df <- rbind(df, readDebateXML(file))
    }
  }
  return(df)
}

readDebateXML <- function(file){
  cat('Reading', file, '\n')
  #xml <- xmlParse(file, encoding = 'ISO-8859-1')

  xml <- xmlParse(readFile(file))
  #speeches <- xpathApply(xml, '//speech', 'xmlAttrs')
  dates <- c()
  times <- c()
  speakers <- c()
  speakerIds <- c()
  texts <- c()
  speeches <- getNodeSet(xml, '//speech')
  for(speech in speeches){
    values <- getSpeech(speech)
    dates <- append(dates, values[[1]])
    times <- append(times, values[[2]])
    speakers <- append(speakers, values[[3]])
    speakerIds <- append(speakerIds, values[[4]])
    texts <- append(texts, values[[5]])
  }

  df <- data.frame(date = dates, time = times, speaker = speakers, speakerId = speakerIds, text = texts)
  return(df)
}

getSpeech <- function(speech){
  #str(speech[['speakername']])
  attribs <- xmlAttrs(speech)
  #print(xmlAttrs(speech, "speakername"))
  if("speakername" %in% names(attribs)){
    speaker = getSpeaker(attribs[['speakername']])
  }else{
    speaker = ''
  }
  if("speakerid" %in% names(attribs)){
    speakerId = getSpeakerId(attribs[['speakerid']])
  }else{
    speakerId = ''
  }
  if("id" %in% names(attribs)){
    date = getDate(attribs[['id']])
  }else{
    date = ''
  }
  if("time" %in% names(attribs)){
    time = getTime(attribs[['time']])
  }else{
    time = ''
  }
  text <- getSpeechText(speech)
  return(list(date, time, speaker, speakerId, text))
  
}

getSpeaker <- function(speaker){
  speaker2 <- iconv(speaker, "ISO-8859-1", "UTF-8")
  return(speaker2)
}

getSpeechText <- function(speech){
  ps <- unlist(xpathApply(speech, './p', 'xmlValue'))
  ps2 <- removeSpecialChar(ps)
  text <- paste(unlist(ps2), collapse=' | ')
  text2 <- gsub("^\\s+|\\s+$", "", text)
  return(text2)
}

removeSpecialChar <- function(text){
  text2 <- gsub('&[a-z]+;', ' ' , text)
  return(text2)
}

getTime <- function(time){
  
  parts <- unlist(strsplit(time, ':', fixed = TRUE))
  h <- as.numeric(parts[1])
  m <- as.numeric(parts[2])
  s <- as.numeric(parts[3])
  time2 <- paste(h, m, s, sep = ':')
  return(time2)
}

getDate <- function(id){
  parts <- unlist(strsplit(id, '/', fixed = TRUE))
  date <- substr(parts[3], 1, 10)
  return(date)
}

getSpeakerId <- function(id){
  parts <- unlist(strsplit(id, '/', fixed = TRUE))
  speakerId <- parts[3]
  return(speakerId)
}
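
The functions can then be used like this; the directory path below is only a placeholder for wherever the ParlParse XML files are stored.

debates <- readDebateDir('~/data/scrapedxml/debates')
str(debates) #date, time, speaker, speakerId, text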

News data importer for R

This April, I created an R script to import files downloaded from Nexis and Factiva. Factiva does not offer a file download function, but its search result pages can be saved as HTML files and imported into R using this script.

library(XML) #might need libxml2-dev via apt-get command

readNewsDir <- function(dir, type){
  files <- list.files(dir, full.names = TRUE, recursive = TRUE)
  df <- data.frame(head = c(), body = c(), pub = c(), datetime = c(), edition = c(), length = c(), stringsAsFactors = FALSE)
  for(file in files){
    #print(file)
    if(grepl('\\.html$|\\.htm$|\\.xhtml$', file, ignore.case = TRUE)){
      #print(file)
      if(type == 'nexis'){
        df <- rbind(df, readNexisHTML(file))
      }else if(type == 'factiva'){
        df <- rbind(df, readFactivaHTML(file))
      }
    }
  }
  return(df)
}

#readNexisHTML('/home/kohei/Documents/Syria report/nexis.html')
readNexisHTML <- function(file, sep = ' | '){
    
  heads = c()
  bodies <- c()
  pubs <- c()
  datetimes <- c()
  timezones <- c()
  editions <- c()
  lengths <- c()
  
  #Convert format
  cat('Reading', file, '\n')
  f <- file(file, encoding = "UTF-8")
  lines <- readLines(f)
  close(f)
  docnum <- 0
  for(i in 1:length(lines)){
    lines[i] <- gsub('<!-- Hide XML section from browser', '', lines[i])
    if(grepl('<DOC NUMBER=1>', lines[i])) docnum <- docnum + 1
    lines[i] <- gsub('<DOC NUMBER=1>', paste0('<DOC ID="doc_id_', docnum, '">'), lines[i])
    lines[i] <- gsub('<DOCFULL> -->', '<DOCFULL>', lines[i])
    lines[i] <- gsub('</DOC> -->', '</DOC>', lines[i])
  }
  html <- paste(lines, collapse='\n')

  #Write to debug
  cat(html, file="/home/kohei/Documents/R/converted.html", sep="", append=FALSE)
  
  #Extract elements
  doc <- htmlParse(html , encoding="UTF-8")
  ids <- paste0('doc_id_', 1:500)
  for(id in ids){
    query <- paste0('/html/body//doc[@id="', id , '"]')
    nodes <- getNodeSet(doc, query)
    if(length(nodes)){
      node <- nodes[[1]]
    }else{
      next #can not break since there are errors in HTML
    }

    pub <- ''
    #pubns <- getNodeSet(node, './/div[@class="c10"]/p[@class="c11"]/span[@class="c12"]')
    pubns <- getNodeSet(node, './/div[@class="c0"]/p[@class="c1"]/span[@class="c2"]')
    if(length(pubns)){
      i <- 1
      for(pubn in pubns){
        if(grepl('DOCUMENTS$', xmlValue(pubn))){
          docnum <- i
        }
        if(i == docnum + 1){
          pub <- xmlValue(pubn)
          pub <- cleanNews(pub)
        }
        i <- i + 1
      }
    }
    if(nchar(pub) == 0) pub <- NA
    #print(pub)
    
    date <- ''
    #datelinens <- getNodeSet(node, './/div[@class="c13" or @class="c14"]/p[@class="c11"]')
    datelinens <- getNodeSet(node, './/div[@class="c3" or @class="c4"]/p[@class="c1"]')
    dateline <- ''
    if(length(datelinens)) dateline <- xmlValue(datelinens[[1]])
    #print(datelinens)
    dates <- strsplit(sub(',', '', dateline, fixed = TRUE), ' ', fixed = TRUE)[[1]]
    date <- paste(dates[1], dates[2], dates[3], sep=' ')
    if(nchar(date) == 0) date <- NA
    
    edition <- ''
    if(length(dates) >= 5){
      edition <- paste(dates[5:length(dates)], collapse = ' ')
      edition <- cleanNews(edition)
    }
    time <- ''
    timezone <- ''
    if(grepl('^[0-9]{1,2}:[0-9]{1,2} (AM|PM)', edition)){
      timeline <- edition
      timeline.parts <- strsplit(timeline, ' ')[[1]]
      #print(edition)
      #print(timeline.parts)
      time <- paste(timeline.parts[1], timeline.parts[2])
      if(length(timeline.parts) > 2){
        timezone <- paste(timeline.parts[3:length(timeline.parts)], collapse = ' ')
      }
      edition <- ''
      #print(time)
    }
    if(nchar(time) == 0) time <- '12:00 AM'
    if(nchar(edition) == 0) edition <- ''
    if(nchar(timezone) == 0) timezone <- ''
        
    body <- ''
    #bodyns <- getNodeSet(node, './/div[@class="c0"]/p[@class="c17" or @class="c18"]/span[@class="c12" or @class="c14"]')
    #bodyns <- getNodeSet(node, './/div[@class="c5"]/p[@class="c15" or @class="c9" or @class="c9"]/span[@class="c2" or @class="c3"]')
    bodyns1 <- getNodeSet(node, './/div[(@class="c5") and count(.//p) > 1]//p')
    bodyns2 <- getNodeSet(node, './/div[(@class="c4") and count(.//p) > 1]//p')
    if(length(bodyns1) > length(bodyns2)){
      bodyns <- bodyns1
    }else{
      bodyns <- bodyns2
    }
    #if(is.null(bodyns)) print(node)
    if(length(bodyns)){
      paras <- c()
      for(bodyn in bodyns){
        para <- xmlValue(bodyn)
        para <- cleanNews(para)
        paras <- append(paras, para)
      }
      body <- paste(paras, sep = '', collapse=sep)
      if(nchar(body) == 0) body <- NA
    } 
    
    by <- ''
    code <- ''
    head <- ''
    length <- 0
    #attribns <- getNodeSet(node, './/div[@class="c0"]/p[@class="c5" and .//@class="c12"]')
    attribns1 <- getNodeSet(node, './/div[@class="c5"]/p[@class="c6"]')
    attribns2 <- getNodeSet(node, './/div[@class="c4"]/p[@class="c5"]')
    if(length(attribns1) > length(attribns2)){
      attribns <- attribns1
    }else{
      attribns <- attribns2
    }
    if(length(attribns)){
      for(attribn in attribns){
        attrib <- xmlValue(attribn)
        attrib <- gsub("^\\s+|\\s+$", "", attrib)
        #print(attrib)
        if(grepl('^BYLINE: ', attrib)){
          by <- gsub('BYLINE: ', '', attrib)
        }
        if(grepl('^LENGTH: ', attrib)){
          length <- as.numeric(gsub('[^0-9]', '', attrib))
        }
        if(grepl('^JOURNAL-CODE: ', attrib)){
          code <- gsub('JOURNAL-CODE: ', '', attrib)
        }
        if(!grepl('^[A-Z\\-]{6,20}: ', attrib)){ #exclude "BYLINE:", "SECTION:", "PUBLICATION-TYPE:" etc.
          head <- cleanNews(attrib)
        }
      }
    }
    if(nchar(by) == 0) by <- NA
    if(nchar(code) == 0) code <- NA
    
    #print(paste(date, time, sep=' '))
    datetime = format(strptime(paste(date, time), format='%B %d %Y %I:%M %p'), '%Y-%m-%d %H:%M:%S UTC')
    #print(paste(date, time))
    
    heads = append(heads, head)
    bodies <- append(bodies, body)
    pubs <- append(pubs, pub)
    datetimes <- append(datetimes, datetime)
    
    timezones <- append(timezones, timezone)
    editions <- append(editions, edition)
    lengths <- append(lengths, length)
  }
  #print(datetimes)
  return(data.frame(head = as.character(heads), 
                    pub = as.character(pubs), 
                    datetime = as.POSIXct(datetimes, tz = 'UTC'), 
                    timezone = as.character(timezones), 
                    edition = as.factor(editions), 
                    length = as.numeric(lengths),
                    body = as.character(bodies), 
                    stringsAsFactors = FALSE))
}

#readFactivaHTML('/home/kohei/Documents/Syria report/factiva.html')
readFactivaHTML <- function(file, sep = ' | '){
  
  heads = c()
  bodies <- c()
  pubs <- c()
  datetimes <- c()
  timezones <- c()
  editions <- c()
  lengths <- c()
  
  cat('Reading', file, '\n')
  
  library(XML)
  doc <- htmlParse(file, encoding="UTF-8")
  nodes <- getNodeSet(doc, '/html/body//div[contains(@class, "article")]')
  
  for(node in nodes){
    #print(node)
    
    head <- ''
    headns <- getNodeSet(node, './div[@id="hd"]')
    if(length(headns)){
      head <- xmlValue(headns[[1]])
      head <- cleanNews(head)
    }
    if(nchar(head) == 0) head <- NA
    
    body <- ''
    bodyns <- getNodeSet(node, './p[contains(@class, "articleParagraph") and not(.//pre)]')
    if(length(bodyns)){
      paras <- c()
      for(bodyn in bodyns){
        para <- xmlValue(bodyn)
        para <- cleanNews(para)
        paras <- append(paras, para)
      }
      body <- paste(paras, sep = '', collapse = sep)
    }
    if(nchar(body) == 0) body <- NA
    
    pub <- ''
    length <- 0
    date <- ''
    time <- ''
    pos <- 0
    posTime <- 0
    attribns <- getNodeSet(node, './div[not(@id) and not(@class)]')
    if(length(attribns)){
      for(attribn in attribns){
        pos <- pos + 1
        #print(paste(posTime, pos))
        attrib <- xmlValue(attribn)
        attrib <- gsub("^\\s+|\\s+$", "", attrib)
        #print(attrib)
        if(grepl(' words$', attrib)){
          length <- as.numeric(gsub(' words$', '', attrib))
        }else if(grepl('[0-9]{1,2} [a-zA-Z]+ [0-9]{4}', attrib)){
          date <- attrib
          #date <- strsplit(attrib, ' ', fixed = TRUE)[[1]]
        }else if(grepl('[0-9]{2}:[0-9]{2}', attrib)){
          time <- attrib
          posTime <- pos
        }else if(pos == (posTime + 1)){ #publication name must be next to the time
          pub <- attrib
        }
      }
    }
    if(nchar(pub) == 0) pub <- NA
    if(nchar(date) == 0) date <- NA
    if(nchar(time) == 0) time <- NA
    
    #print(paste(pub, date[1], date[2], date[3], time, head, length))
    #print(paste(date, time, sep=' '))
    datetime = format(strptime(paste(date, ' ', time, sep=''), format='%d %B %Y %H:%M'), '%Y-%m-%d %H:%M:%S UTC')
    #print(paste(pub, datetime, head, length))
    #print(body)
    
    heads = append(heads, head)
    bodies <- append(bodies, body)
    pubs <- append(pubs, pub)
    datetimes <- append(datetimes, datetime)
    timezones <- append(timezones, '')
    editions <- append(editions, '')
    lengths <- append(lengths, length)
  }
  return(data.frame(head = as.character(heads), 
                    pub = as.character(pubs), 
                    datetime = as.POSIXct(datetimes, tz = 'UTC'),
                    timezone = as.character(timezones), 
                    edition = editions,
                    length = as.numeric(lengths), 
                    body = as.character(bodies), 
                    stringsAsFactors = FALSE))  
}

cleanNews <- function(text){
  text <- gsub("\\r\\n|\\r|\\n|\\t", " ", text)
  text <- gsub("[[:cntrl:]]", " ", text, perl = TRUE)
  text <- gsub("\\s\\s+", " ", text)
  text <- gsub("^\\s+|\\s+$", "", text)
  return(text)
}

cleanReuters <- function(text){
  text <- gsub('\\(Writing by.{1,500}\\)$', '', text, ignore.case = TRUE, perl = TRUE)
  text <- gsub('\\(Editing by.{1,500}\\)$', '', text, ignore.case = TRUE, perl = TRUE)
  text <- gsub('\\(Reporting by.{1,500}\\)$', '', text, ignore.case = TRUE, perl = TRUE)
  text <- gsub('\\(Additional reporting by.{1,500}\\)$', '', text, ignore.case = TRUE, perl = TRUE)
  text <- gsub('Reuters', '', text, ignore.case = TRUE)
  text <- cleanNews(text)
  return(text)
}
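
The importer is called on a directory of downloaded files, with type = 'nexis' or type = 'factiva' selecting the parser. The paths below are only placeholders.

nexis <- readNewsDir('~/data/nexis', type = 'nexis')
factiva <- readNewsDir('~/data/factiva', type = 'factiva')
str(nexis) #head, pub, datetime, timezone, edition, length, body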

International Newsmap

I have been running a website called International Newsmap. It collects international news stories from news sites and classifies them according to their geographic focus, using a Bayesian classifier and a lexicon expansion technique. The news sources are English-language websites in the US, the UK, New Zealand, India, Singapore, Kenya, and South Africa.

International Newsmap screen shot

The main difference between International Newsmap and event discovery systems such as GDELT is its ability to create its own geographic dictionary automatically.
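
The core idea can be sketched in a few lines of R. This is only an illustration of seed keywords plus a Naive Bayes style association score, not the actual Newsmap code; the documents, seed words and scoring below are made up.

#Toy documents and seed geographic keywords (purely illustrative)
docs <- c('Westminster debates the budget in London',
          'Nairobi traders react to the shilling slide',
          'The London mayor opens a new railway line',
          'Drought hits farmers outside Nairobi')
seed <- list(gb = 'london', ke = 'nairobi')

#Label documents that contain a seed keyword
tokens <- lapply(tolower(docs), function(x) strsplit(x, '[^a-z]+')[[1]])
labels <- sapply(tokens, function(toks){
  hit <- names(seed)[sapply(seed, function(w) any(w %in% toks))]
  if(length(hit) == 1) hit else NA
})

#Smoothed word-country frequencies (Naive Bayes style)
vocab <- unique(unlist(tokens))
score <- sapply(names(seed), function(cl){
  toks <- unlist(tokens[which(labels == cl)])
  counts <- table(factor(toks, levels = vocab))
  as.numeric(counts + 1) / sum(counts + 1)
})
rownames(score) <- vocab

#Words most strongly associated with each country form the expanded dictionary
assoc <- score / rowSums(score)
head(sort(assoc[, 'gb'], decreasing = TRUE))
head(sort(assoc[, 'ke'], decreasing = TRUE))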