Sentence segmentation

I believe that the sentence is the optimal unit for sentiment analysis, but splitting whole news articles into sentences is often tricky because news texts contain a lot of quotations. If we simply chop up texts at punctuation marks, quoted passages get split across sentences. This code is meant to avoid such problems as much as possible. It was originally written for Russian-language texts, but it should now work with English as well.

library(stringi)
unitize <- function(df_items, len_min=10, quote='"'){ # Input has to be a data frame with 'tid' and 'body' variables

  df_units <- data.frame()
  for(i in 1:nrow(df_items)){
    print(i)
    body <- insertSeparator(df_items$body[i], len_min, quote)
    if(nchar(body)){
      units <- unlist(strsplit(body, '|', fixed=TRUE))
      flags <- unlist(lapply(units, function(x) grepl('[a-zA-Z0-9]', x))) # Language dependent
      units <- units[flags]
      len <- length(units)
      if(len == 0) next # Skip items with no alphanumeric units
      #print(body)
      #print(len)
      units <- stri_replace_all_fixed(units, '|', ' ') # Remove separator
      units <- stri_replace_all_regex(units, '\\s\\s+', ' ') # Remove duplicated spaces
      units <- stri_trim_both(units)
      df_temp <- data.frame(tid=rep(df_items$tid[i], len), uid=1:len, text=units, stringsAsFactors=FALSE)
      df_units <- rbind(df_units, df_temp)
    }
  }
  write.table(df_units, file='item_units.csv', sep="\t", quote=TRUE, qmethod="double")
}


insertSeparator <- function(text, len_min=10, quote){
  flag_quote <- FALSE
  flag_bracket <- FALSE
  
  text <- stri_replace_all_regex(text, '([^.!?]) \\| ', '$1 ') # Remove wrong paragraph separator
  tokens <- stri_split_fixed(text, ' ', simplify=TRUE)
  tokens2 <- c()
  len <- 0
  for(token in tokens){
    
    # Reset flag by the paragraph separator
    if(stri_detect_fixed(token, '|')){ 
      flag_quote <- FALSE 
      flag_bracket <- FALSE
      #cat("==Reset========\n")
    }
    
    # Set flags
    flag_quote <- xor(flag_quote, stri_count_fixed(token, quote) == 1) # A one-token quotation (both quote marks in one token) does not toggle the flag
    if(stri_detect_fixed(token, '(') != stri_detect_fixed(token, ')')){ # A one-token bracket does not change the flag
      if(stri_detect_fixed(token, '(')) flag_bracket <- TRUE
      if(stri_detect_fixed(token, ')')) flag_bracket <- FALSE
      #cat("---------------\n")
    }
  
    if(len < len_min){
      if(!stri_detect_fixed(token, '|')){
        tokens2 <- c(tokens2, token)
        len <- len + 1
      }
    }else{
      if(stri_detect_fixed(token, '|')){
        tokens2 <- c(tokens2, token)
        len <- 0 
      }else if(!flag_quote & !flag_bracket & stri_detect_regex(token, '([.!?])$')){
        tokens2 <- c(tokens2, token, '|') # Insert split mark
        len <- 0
      }else{
        tokens2 <- c(tokens2, token)
        len <- len + 1
      }
    }
    #cat(token, flag_quote, flag_bracket, len, "\n")
  }
  text2 <- paste(tokens2, collapse=' ')
  return(text2)
}
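A minimal usage sketch (the sample data frame below is made up for illustration):

# Hypothetical input: a data frame with 'tid' and 'body' columns, as unitize() expects
df_items <- data.frame(
  tid = c(1, 2),
  body = c('The minister said: "We will not negotiate. Talks are over." The opposition disagreed.',
           'Markets fell sharply on Monday. Analysts expect a rebound later this week.'),
  stringsAsFactors = FALSE
)
unitize(df_items, len_min = 5) # Writes the segmented units to item_units.csv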

Nexis news importer updated

I posted the code for a Nexis importer last year, but it turned out that the HTML format of the database service is less consistent than I thought, so I changed the logic. The new version depends less on the structure of the HTML files and more on the format of the content.

library(XML) #might need libxml2-dev via apt-get command

readNewsDir <- function(dir,...){
  names <- list.files(dir, full.names = TRUE, recursive = TRUE)
  df <- data.frame(head = c(), body = c(), pub = c(), datetime = c(), edition = c(), length = c(), stringsAsFactors = FALSE)
  for(name in names){
    #print(file)
    if(grepl('\\.html$|\\.htm$|\\.xhtml$', name, ignore.case = TRUE)){
      #print(file)
      df <- rbind(df, readNexisHTML(name, ...))
    }
  }
  return(df)
}

#readNexisHTML('/home/kohei/Documents/Syria report/nexis.html')
readNexisHTML <- function(name, sep = ' '){
  
  heads <- c()
  bodies <- c()
  bys <- c()
  pubs <- c()
  datetimes <- c()
  editions <- c()
  lengths <- c()
  
  #Convert format
  cat('Reading', name, '\n')
  
  # HTML cleaning------------------------------------------------
  
  lines <- scan(name, what="character", sep='\n', quiet=TRUE, encoding = "UTF-8")
  docnum <- 0
  for(i in 1:length(lines)){
    #print(lines[i])
    lines[i] <- gsub('<!-- ', '', lines[i]) # Remove HTML comment markers
    lines[i] <- gsub(' -->', '', lines[i])
  }
  lines[i+1] <- '' # Fix EOF problem
  html <- paste(lines, collapse='\n')
  
  # Write to debug
  #cat(html, file="converted.html", sep="", append=FALSE)
  
  # Main process------------------------------------------------
  
  #Load as DOM object
  doc <- htmlParse(html , encoding="UTF-8")
  
  # Remove index
  indexns <- getNodeSet(doc, '/html/body//doc[.//table]')
  for(indexn in indexns){
    #print(xmlValue(indexn))
    removeNodes(indexn)
  }
  
  for(node in getNodeSet(doc, '/html/body//doc')){
    
    pub <- NA
    datetime <- NA
    head <- NA
    by <- NA
    edition <- NA
    section <- NA
    length <- NA
    body <- NA # Initialize so the is.na() check below works when no body is found
    
    i <- 1
    for(div in getNodeSet(node, './/div')){
      value <- cleanNews(xmlValue(div))
      #print(paste(i, value))
      
      if(i == 1 & grepl('\\d+ of \\d+ DOCUMENTS', value)){
        i <- 2
      }else if(i == 2){
        #print(paste('pub', value))
        pub <- value
        i <- 3
      }else if(i == 3 & grepl('^(January|February|March|April|May|June|July|August|September|October|November|December)', value)){
        dateline <- value
        #print(paste('date', value))
        match <- regexpr(paste0('(January|February|March|April|May|June|July|August|September|October|November|December)',
                     '[, ]+([0-9]{1,2})',
                     '[, ]+([0-9]{4})',
                     '([,; ]+(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday))?',
                     '([, ]+(.+))?'), value, perl=TRUE)
        date <- c()
        for(j in 1:length(attr(match, "capture.start"))){
          from <- attr(match, "capture.start")[j]
          to <- attr(match, "capture.start")[j] + attr(match, "capture.length")[j] - 1 # substr() is inclusive, so subtract one
          date <- c(date, substr(dateline, from, to))
        }
        month <- gsub('[^a-zA-Z]', '', date[1])
        day <- gsub('[^0-9]', '', date[2])
        year <- gsub('[^0-9]', '', date[3])
        
        datetime <- format(strptime(paste(month, day, year, '12:00 AM'), 
                                   format='%B %d %Y %I:%M %p'), '%Y-%m-%d %H:%M:%S UTC')
        
        if(length(date) == 7){
          edition <- cleanNews(date[7])
          #print(date)
          #print(dateline)
          #print(edition)
        }
        i <- 4
        
      }else if(i == 4 & !grepl('[A-Z]+:', value)){
        head <- value # Sometimes does not exist
        i <- 8
      }else if(i >= 4 & grepl('BYLINE:', value)){
        by <- sub('BYLINE: ', '', value)
        i <- 8
      }else if(i >= 4 & grepl('SECTION:', value)){
        section <- sub('SECTION: ', '', value)
        i <- 8
      }else if(i >= 4 & grepl('LENGTH:', value)){
        length <- strsplit(value, ' ')[[1]][2]
        i <- 8
      }else if(i >= 4 & grepl('[A-Z]+:', value)){
        i <- 8
      }else if(i == 8){
        paras <- c()
        for(p in getNodeSet(div, 'p')){ 
          paras <- c(paras, cleanNews(xmlValue(p)))
        }
        if(length(paras) > 0){
          body <- paste(paras, sep = '', collapse=sep)
        }
        break
      }
    }
    if(!is.na(body)){
      heads <- c(heads, head)
      bodies <- c(bodies, body)
      bys <- c(bys, by)
      pubs <- c(pubs, pub)
      datetimes <- c(datetimes, datetime)
      editions <- c(editions, edition)
      lengths <- c(lengths, length)
    }
  }

  
  return(data.frame(head = as.character(heads), 
                    pub = as.character(pubs), 
                    datetime = as.POSIXct(datetimes, tz = 'UTC'),
                    by = as.factor(bys), 
                    edition = as.character(editions), 
                    length = as.numeric(lengths),
                    body = as.character(bodies), 
                    stringsAsFactors = FALSE))
}

cleanNews <- function(text){
  text <- gsub("\\r\\n|\\r|\\n|\\t", " ", text)
  text <- gsub("[[:cntrl:]]", " ", text, perl = TRUE)
  text <- gsub("\\s\\s+", " ", text)
  text <- gsub("^\\s+|\\s+$", "", text)
  return(text)
}
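A minimal usage sketch (the directory path is hypothetical):

df <- readNewsDir('~/Documents/nexis_html', sep = ' ') # Reads every HTML file under the directory
nrow(df)           # Number of imported articles
range(df$datetime) # Publication date range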

Text analysis dictionary on psychology

My automated dictionary creation project is making good progress, and I have created a psychology dictionary from a large corpus of UK news on psychology from 1990 to 2011. The score given to each entry word is interpreted as the strength of its association with psychology, and the list can be truncated based on the scores.

The words are extracted using a technique that I call collocation-of-collocation. In this technique, a pattern ‘psycholog*’ that matches ‘psychology’, ‘psychologist’, ‘psychological’, and ‘psychologically’ is given to the system, and the system finds collocations of those words. Those collocations are then used to extract words that are semantically close to psychology. This technique is meant to overcome a limitation of collocation analysis in synonym extraction, namely that words with the same meaning rarely co-occur; a toy sketch of the idea follows.
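The sketch below is my own illustrative simplification of the two-step idea, not the actual system: the function names, window size, and scoring by raw co-occurrence counts are assumptions made for brevity.

# Toy sketch of collocation-of-collocation (base R; illustrative only)

# Words within +/- 'window' tokens of the given positions
windowWords <- function(toks, hits, window){
  unlist(lapply(hits, function(i)
    toks[setdiff(max(1, i - window):min(length(toks), i + window), i)]))
}

collocOfColloc <- function(texts, pattern = '^psycholog', window = 5, n_seed = 20){
  tokens <- lapply(texts, function(x){
    toks <- tolower(unlist(strsplit(x, '[^a-zA-Z]+')))
    toks[toks != '']
  })
  # Step 1: find collocations of the words matched by the seed pattern
  seed_counts <- table(as.character(unlist(lapply(tokens, function(toks)
    windowWords(toks, which(grepl(pattern, toks)), window)))))
  seeds <- head(names(sort(seed_counts, decreasing = TRUE)), n_seed)
  # Step 2: score every word by co-occurrence with those collocations, so that
  # words close in meaning to 'psychology' score highly even if they never co-occur with it
  scores <- table(as.character(unlist(lapply(tokens, function(toks)
    windowWords(toks, which(toks %in% seeds), window)))))
  sort(scores, decreasing = TRUE)
}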

Testing immigration dictionary

After making some changes to my automated dictionary creation system, I ran a test to validate the word choice for the new immigration dictionary. Unlike the original version, the latest version contains fewer intuitively negative words with positive scores.

The test was performed by comparing the computer content analysis with human coding of the 2010 UK manifestos. X is the automated coding by the dictionary and Y is the human coding. Green and Conservative are off the 45-degree line, but the automated coding still corresponds strongly to the human coding.

[Scatter plot 'UK 2010': automated coding (X) against human coding (Y) of the manifestos]

News data importer for R

This April, I created an R script to import files downloaded from Nexis and Factiva. Factiva does not offer a file download function, but its search results pages can be saved as HTML files and imported into R using this script.

library(XML) #might need libxml2-dev via apt-get command

readNewsDir <- function(dir, type){
  files <- list.files(dir, full.names = TRUE, recursive = TRUE)
  df <- data.frame(head = c(), body = c(), pub = c(), datetime = c(), edition = c(), length = c(), stringsAsFactors = FALSE)
  for(file in files){
    #print(file)
    if(grepl('\\.html$|\\.htm$|\\.xhtml$', file, ignore.case = TRUE)){
      #print(file)
      if(type == 'nexis'){
        df <- rbind(df, readNexisHTML(file))
      }else if(type == 'factiva'){
        df <- rbind(df, readFactivaHTML(file))
      }
    }
  }
  return(df)
}

#readNexisHTML('/home/kohei/Documents/Syria report/nexis.html')
readNexisHTML <- function(file, sep = ' | '){
    
  heads <- c()
  bodies <- c()
  pubs <- c()
  datetimes <- c()
  timezones <- c()
  editions <- c()
  lengths <- c()
  
  #Convert format
  cat('Reading', file, '\n')
  f <- file(file, encoding = "UTF-8")
  lines <- readLines(f)
  close(f)
  docnum <- 0
  for(i in 1:length(lines)){
    lines[i] <- gsub('<!-- ', '', lines[i]) # Remove HTML comment markers
    lines[i] <- gsub(' -->', '', lines[i])
  }
  html <- paste(lines, collapse='\n')

  #Write to debug
  #cat(html, file="/home/kohei/Documents/R/converted.html", sep="", append=FALSE)
  
  #Extract elements
  doc <- htmlParse(html , encoding="UTF-8")
  ids <- paste0('doc_id_', 1:500)
  for(id in ids){
    query <- paste0('/html/body//doc[@id="', id , '"]')
    nodes <- getNodeSet(doc, query)
    if(length(nodes)){
      node <- nodes[[1]]
    }else{
      next #can not break since there are errors in HTML
    }

    pub <- ''
    #pubns <- getNodeSet(node, './/div[@class="c10"]/p[@class="c11"]/span[@class="c12"]')
    pubns <- getNodeSet(node, './/div[@class="c0"]/p[@class="c1"]/span[@class="c2"]')
    if(length(pubns)){
      i <- 1
      for(pubn in pubns){
        if(grepl('DOCUMENTS$', xmlValue(pubn))){
          docnum <- i
        }
        if(i == docnum + 1){
          pub <- xmlValue(pubn)
          pub <- cleanNews(pub)
        }
        i <- i + 1
      }
    }
    if(nchar(pub) == 0) pub <- NA
    #print(pub)
    
    date <- ''
    dateline <- ''
    #datelinens <- getNodeSet(node, './/div[@class="c13" or @class="c14"]/p[@class="c11"]')
    datelinens <- getNodeSet(node, './/div[@class="c3" or @class="c4"]/p[@class="c1"]')
    if(length(datelinens)) dateline <- xmlValue(datelinens[[1]])
    #print(datelinens)
    dates <- strsplit(sub(',', '', dateline, fixed = TRUE), ' ', fixed = TRUE)[[1]]
    date <- paste(dates[1], dates[2], dates[3], sep=' ')
    if(nchar(date) == 0) date <- NA
    
    edition <- ''
    if(length(dates) >= 5){
      edition <- paste(dates[5:length(dates)], collapse = ' ')
      edition <- cleanNews(edition)
    }
    time <- ''
    timezone <- ''
    if(grepl('^[0-9]{1,2}:[0-9]{1,2} (AM|PM)', edition)){
      timeline <- edition
      timeline.parts <- strsplit(timeline, ' ')[[1]]
      #print(edition)
      #print(timeline.parts)
      time <- paste(timeline.parts[1], timeline.parts[2])
      if(length(timeline.parts) > 2){
        timezone <- paste(timeline.parts[3:length(timeline.parts)], collapse = ' ')
      }
      edition <- ''
      #print(time)
    }
    if(nchar(time) == 0) time <- '12:00 AM'
    if(nchar(edition) == 0) edition <- ''
    if(nchar(timezone) == 0) timezone <- ''
        
    body <- ''
    #bodyns <- getNodeSet(node, './/div[@class="c0"]/p[@class="c17" or @class="c18"]/span[@class="c12" or @class="c14"]')
    #bodyns <- getNodeSet(node, './/div[@class="c5"]/p[@class="c15" or @class="c9" or @class="c9"]/span[@class="c2" or @class="c3"]')
    bodyns1 <- getNodeSet(node, './/div[(@class="c5") and count(.//p) > 1]//p')
    bodyns2 <- getNodeSet(node, './/div[(@class="c4") and count(.//p) > 1]//p')
    if(length(bodyns1) > length(bodyns2)){
      bodyns <- bodyns1
    }else{
      bodyns <- bodyns2
    }
    #if(is.null(bodyns)) print(node)
    if(length(bodyns)){
      paras <- c()
      for(bodyn in bodyns){
        para <- xmlValue(bodyn)
        para <- cleanNews(para)
        paras <- append(paras, para)
      }
      body <- paste(paras, sep = '', collapse=sep)
      if(nchar(body) == 0) body <- NA
    } 
    
    by <- ''
    code <- ''
    head <- ''
    length <- 0
    #attribns <- getNodeSet(node, './/div[@class="c0"]/p[@class="c5" and .//@class="c12"]')
    attribns1 <- getNodeSet(node, './/div[@class="c5"]/p[@class="c6"]')
    attribns2 <- getNodeSet(node, './/div[@class="c4"]/p[@class="c5"]')
    if(length(attribns1) > length(attribns2)){
      attribns <- attribns1
    }else{
      attribns <- attribns2
    }
    if(length(attribns)){
      for(attribn in attribns){
        attrib <- xmlValue(attribn)
        attrib <- gsub("^\\s+|\\s+$", "", attrib)
        #print(attrib)
        if(grepl('^BYLINE: ', attrib)){
          by <- gsub('BYLINE: ', '', attrib)
        }
        if(grepl('^LENGTH: ', attrib)){
          length <- as.numeric(gsub('[^0-9]', '', attrib))
        }
        if(grepl('^JOURNAL-CODE: ', attrib)){
          code <- gsub('JOURNAL-CODE: ', '', attrib)
        }
        if(!grepl('^[A-Z\\-]{6,20}: ', attrib)){ #exclude "BYLINE:", "SECTION:", "PUBLICATION-TYPE:" etc.
          head <- cleanNews(attrib)
        }
      }
    }
    if(nchar(by) == 0) by <- NA
    if(nchar(code) == 0) code <- NA
    
    #print(paste(date, time, sep=' '))
    datetime <- format(strptime(paste(date, time), format='%B %d %Y %I:%M %p'), '%Y-%m-%d %H:%M:%S UTC')
    #print(paste(date, time))
    
    heads <- append(heads, head)
    bodies <- append(bodies, body)
    pubs <- append(pubs, pub)
    datetimes <- append(datetimes, datetime)
    
    timezones <- append(timezones, timezone)
    editions <- append(editions, edition)
    lengths <- append(lengths, length)
  }
  #print(datetimes)
  return(data.frame(head = as.character(heads), 
                    pub = as.character(pubs), 
                    datetime = as.POSIXct(datetimes, tz = 'UTC'), 
                    timezone = as.character(timezones), 
                    edition = as.factor(editions), 
                    length = as.numeric(lengths),
                    body = as.character(bodies), 
                    stringsAsFactors = FALSE))
}

#readFactivaHTML('/home/kohei/Documents/Syria report/factiva.html')
readFactivaHTML <- function(file, sep = ' | '){
  
  heads <- c()
  bodies <- c()
  pubs <- c()
  datetimes <- c()
  timezones <- c()
  editions <- c()
  lengths <- c()
  
  cat('Reading', file, '\n')
  
  library(XML)
  doc <- htmlParse(file, encoding="UTF-8")
  nodes <- getNodeSet(doc, '/html/body//div[contains(@class, "article")]')
  
  for(node in nodes){
    #print(node)
    
    head <- ''
    headns <- getNodeSet(node, './div[@id="hd"]')
    if(length(headns)){
      head <- xmlValue(headns[[1]])
      head <- cleanNews(head)
    }
    if(nchar(head) == 0) head <- NA
    
    body <- ''
    bodyns <- getNodeSet(node, './p[contains(@class, "articleParagraph") and not(.//pre)]')
    if(length(bodyns)){
      paras <- c()
      for(bodyn in bodyns){
        para <- xmlValue(bodyn)
        para <- cleanNews(para)
        paras <- append(paras, para)
      }
      body <- paste(paras, sep = '', collapse = sep)
    }
    if(nchar(body) == 0) body <- NA
    
    pub <- ''
    length <- 0
    date <- ''
    time <- ''
    pos <- 0
    posTime <- 0
    attribns <- getNodeSet(node, './div[not(@id) and not(@class)]')
    if(length(attribns)){
      for(attribn in attribns){
        pos <- pos + 1
        #print(paste(posTime, pos))
        attrib <- xmlValue(attribn)
        attrib <- gsub("^\\s+|\\s+$", "", attrib)
        #print(attrib)
        if(grepl(' words$', attrib)){
          length <- as.numeric(gsub(' words$', '', attrib))
        }else if(grepl('[0-9]{1,2} [a-zA-Z]+ [0-9]{4}', attrib)){
          date <- attrib
          #date <- strsplit(attrib, ' ', fixed = TRUE)[[1]]
        }else if(grepl('[0-9]{2}:[0-9]{2}', attrib)){
          time <- attrib
          posTime <- pos
        }else if(pos == (posTime + 1)){ #publication name must be next to the time
          pub <- attrib
        }
      }
    }
    if(nchar(pub) == 0) pub <- NA
    if(nchar(date) == 0) date <- NA
    if(nchar(time) == 0) time <- NA
    
    #print(paste(pub, date[1], date[2], date[3], time, head, length))
    #print(paste(date, time, sep=' '))
    datetime <- format(strptime(paste(date, ' ', time, sep=''), format='%d %B %Y %H:%M'), '%Y-%m-%d %H:%M:%S UTC')
    #print(paste(pub, datetime, head, length))
    #print(body)
    
    heads <- append(heads, head)
    bodies <- append(bodies, body)
    pubs <- append(pubs, pub)
    datetimes <- append(datetimes, datetime)
    timezones <- append(timezones, '')
    editions <- append(editions, '')
    lengths <- append(lengths, length)
  }
  return(data.frame(head = as.character(heads), 
                    pub = as.character(pubs), 
                    datetime = as.POSIXct(datetimes, tz = 'UTC'),
                    timezone = as.character(timezones), 
                    edition = editions,
                    length = as.numeric(lengths), 
                    body = as.character(bodies), 
                    stringsAsFactors = FALSE))  
}

cleanNews <- function(text){
  text <- gsub("\\r\\n|\\r|\\n|\\t", " ", text)
  text <- gsub("[[:cntrl:]]", " ", text, perl = TRUE)
  text <- gsub("\\s\\s+", " ", text)
  text <- gsub("^\\s+|\\s+$", "", text)
  return(text)
}

cleanReuters <- function(text){
  text <- gsub('\\(Writing by.{1,500}\\)$', '', text, ignore.case = TRUE, perl = TRUE)
  text <- gsub('\\(Editing by.{1,500}\\)$', '', text, ignore.case = TRUE, perl = TRUE)
  text <- gsub('\\(Reporting by.{1,500}\\)$', '', text, ignore.case = TRUE, perl = TRUE)
  text <- gsub('\\(Additional reporting by.{1,500}\\)$', '', text, ignore.case = TRUE, perl = TRUE)
  text <- gsub('Reuters', '', text, ignore.case = TRUE)
  text <- cleanNews(text)
  return(text)
}
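A minimal usage sketch (the paths are hypothetical):

df_nexis <- readNewsDir('~/Documents/news/nexis', type = 'nexis')
df_factiva <- readNewsDir('~/Documents/news/factiva', type = 'factiva')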

International Newsmap

I have been running a website called International Newsmap. It collects international news stories from news sites and classifies them according to their geographic focus, using a Bayesian classifier and a lexicon-expansion technique. The sources of news are English-language websites in the US, the UK, New Zealand, India, Singapore, Kenya, and South Africa.

[Screenshot of International Newsmap]

The main difference between International Newsmap and event-discovery systems such as GDELT is its ability to create its own geographic dictionary automatically.
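As a toy illustration of the classification step (not the actual Newsmap code; the function name, seed counts, and smoothing below are made-up assumptions), a naive Bayes scorer over per-country word counts might look like this:

# Toy naive Bayes geographic classifier (illustrative assumptions throughout)
classifyCountry <- function(text, counts, smooth = 1){
  vocab <- unique(unlist(lapply(counts, names)))
  toks <- tolower(unlist(strsplit(text, '[^a-zA-Z]+')))
  toks <- toks[toks %in% vocab] # Ignore out-of-vocabulary words
  scores <- sapply(counts, function(freq){
    f <- ifelse(is.na(freq[toks]), 0, freq[toks])
    sum(log((f + smooth) / (sum(freq) + smooth * length(vocab)))) # Add-one smoothing
  })
  names(which.max(scores))
}

# Made-up seed counts for two countries
counts <- list(kenya = c(nairobi = 20, kenyan = 15, africa = 5),
               india = c(delhi = 25, indian = 18, mumbai = 10))
classifyCountry('Protests erupted in Nairobi on Monday', counts) # Returns "kenya"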